Get item
Transformation to wrap around the pyspark getItem function
koheesio.spark.transformations.get_item.GetItem #
Get item from list or map (dictionary)
Wrapper around pyspark.sql.Column.getItem
GetItem is strict about the data type of the column: if the column is not a list or a map, an error is raised.
Note
Only MapType and ArrayType are supported.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| columns | Union[str, List[str]] | The column (or list of columns) to get the item from. Alias: column | required | 
| target_column | Optional[str] | The column to store the result in. If not provided, the result will be stored in the source column. Alias: target_suffix. If multiple source columns are given, this value is used as a suffix. | None | 
| key | Union[int, str] | The key (or index) to get from the list or map. If the column is a list (ArrayType), this should be an integer. If the column is a dict (MapType), this should be a string. Alias: index | required | 
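The type rules in the table above can be summarized as a small plain-Python check. This is only a model of the documented rules, not koheesio's validation code: the helper name `check_get_item_args` is hypothetical, Python lists and dicts stand in for ArrayType and MapType, and the source does not say whether GetItem actually enforces the key type (it only says what the key "should be").

```python
from typing import Any, Union


def check_get_item_args(value: Any, key: Union[int, str]) -> None:
    """Raise TypeError when value/key break the documented rules (hypothetical helper)."""
    if isinstance(value, list):
        # list stands in for ArrayType: the key should be an integer index
        if not isinstance(key, int):
            raise TypeError("list (ArrayType) columns expect an integer index")
    elif isinstance(value, dict):
        # dict stands in for MapType: the key should be a string
        if not isinstance(key, str):
            raise TypeError("dict (MapType) columns expect a string key")
    else:
        # anything else is unsupported, mirroring "Only MapType and ArrayType are supported"
        raise TypeError("only list (ArrayType) and dict (MapType) values are supported")


check_get_item_args([1, 2, 3], 1)                 # accepted: list with integer index
check_get_item_args({"key1": "value1"}, "key1")   # accepted: dict with string key
```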
Example
Example with list (ArrayType)#
When the parameter "key" is an integer, GetItem returns the element at that index of the list (indexing starts at 0). If the list has no element at that index, the result is null.
input_df:
| id | content | 
|---|---|
| 1 | [1, 2, 3] | 
| 2 | [4, 5] | 
| 3 | [6] | 
| 4 | [] | 
from koheesio.spark.transformations.get_item import GetItem

output_df = GetItem(
    column="content",
    index=1,  # get the second element of the list
    target_column="item",
).transform(input_df)
output_df:
| id | content | item | 
|---|---|---|
| 1 | [1, 2, 3] | 2 | 
| 2 | [4, 5] | 5 | 
| 3 | [6] | null | 
| 4 | [] | null | 
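The null results in the table above can be reproduced with a plain-Python model of the lookup. This is a sketch of the semantics shown in the example, not Spark or koheesio code; the names `get_list_item`, `list_rows` and `list_items` are made up for illustration, and None stands in for null.

```python
from typing import Any, Dict, List, Optional


def get_list_item(values: List[Any], index: int) -> Optional[Any]:
    """Return values[index], or None when the list has no element at that index."""
    if 0 <= index < len(values):
        return values[index]
    return None


# Same rows as input_df above, keyed by id
list_rows: Dict[int, List[int]] = {1: [1, 2, 3], 2: [4, 5], 3: [6], 4: []}

# Look up index 1 (the second element) for every row
list_items = {row_id: get_list_item(content, 1) for row_id, content in list_rows.items()}
print(list_items)  # {1: 2, 2: 5, 3: None, 4: None}
```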
Example with a dict (MapType)#
input_df:
| id | content | 
|---|---|
| 1 | {key1 -> value1} | 
| 2 | {key1 -> value2} | 
| 3 | {key2 -> hello} | 
| 4 | {key2 -> world} | 
output_df:
| id | content | item | 
|---|---|---|
| 1 | {key1 -> value1} | null | 
| 2 | {key1 -> value2} | null | 
| 3 | {key2 -> hello} | hello | 
| 4 | {key2 -> world} | world | 
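The call that produces this output is not shown. The result is consistent with looking up key2, for example GetItem(column="content", key="key2", target_column="item").transform(input_df); the key value is inferred from the output table rather than taken from the source. The lookup itself can be modeled in plain Python, with a dict standing in for MapType and None for null. The names `get_map_item`, `map_rows` and `map_items` are hypothetical.

```python
from typing import Dict, Optional


def get_map_item(mapping: Dict[str, str], key: str) -> Optional[str]:
    """Return mapping[key], or None when the key is absent."""
    return mapping.get(key)


# Same rows as input_df above, keyed by id
map_rows: Dict[int, Dict[str, str]] = {
    1: {"key1": "value1"},
    2: {"key1": "value2"},
    3: {"key2": "hello"},
    4: {"key2": "world"},
}

# Look up "key2" for every row; rows without that key yield None
map_items = {row_id: get_map_item(content, "key2") for row_id, content in map_rows.items()}
print(map_items)  # {1: None, 2: None, 3: 'hello', 4: 'world'}
```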
key class-attribute instance-attribute #
key: Union[int, str] = Field(
    default=...,
    alias="index",
    description="The key (or index) to get from the list or map. If the column is a list (ArrayType), this should be an integer. If the column is a dict (MapType), this should be a string. Alias: index",
)
ColumnConfig #
Limit the data types to ArrayType and MapType.
koheesio.spark.transformations.get_item.get_item #
Wrapper around pyspark.sql.Column.getItem
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| column | Column | The column to get the item from | required | 
| key | Union[str, int] | The key (or index) to get from the list or map. If the column is a list (ArrayType), this should be an integer. If the column is a dict (MapType), this should be a string. | required | 
Returns:
| Type | Description | 
|---|---|
| Column | The column with the item |
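The description suggests a thin delegation to the getItem method of a PySpark Column. A minimal sketch under that assumption follows; the name `get_item_sketch` is hypothetical and this is not koheesio's actual source.

```python
from typing import Any, Union


def get_item_sketch(column: Any, key: Union[str, int]) -> Any:
    """Return the item at `key` by delegating to the column's getItem method.

    `column` is expected to be a pyspark.sql.Column; it is annotated as Any
    so that this sketch can be imported without Spark installed.
    """
    return column.getItem(key)
```

With a DataFrame df that has an array column named content, this could be applied as df.withColumn("item", get_item_sketch(df["content"], 1)).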