Get item
Transformation to wrap around the pyspark getItem function
koheesio.spark.transformations.get_item.GetItem
Get item from list or map (dictionary)
Wrapper around pyspark.sql.Column.getItem
GetItem is strict about the data type of the column: if the column is not a list (ArrayType) or a map (MapType), an error is raised.
Note
Only MapType and ArrayType are supported.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Union[str, List[str]] | The column (or list of columns) to get the item from. Alias: column | required |
target_column | Optional[str] | The column to store the result in. If not provided, the result will be stored in the source column. Alias: target_suffix - if multiple columns are given as source, this will be used as a suffix. | None |
key | Union[int, str] | The key (or index) to get from the list or map. If the column is a list (ArrayType), this should be an integer. If the column is a dict (MapType), this should be a string. Alias: index | required |
Example
Example with list (ArrayType)
When the parameter "key" (alias "index") is an integer, GetItem returns the element at that index of the list. Indexing starts at 0, and an index beyond the end of the list yields null.
input_df:
id | content |
---|---|
1 | [1, 2, 3] |
2 | [4, 5] |
3 | [6] |
4 | [] |
from koheesio.spark.transformations.get_item import GetItem

output_df = GetItem(
    column="content",
    index=1,  # get the second element of the list
    target_column="item",
).transform(input_df)
output_df:
id | content | item |
---|---|---|
1 | [1, 2, 3] | 2 |
2 | [4, 5] | 5 |
3 | [6] | null |
4 | [] | null |
Example with a dict (MapType)
input_df:
id | content |
---|---|
1 | {key1 -> value1} |
2 | {key1 -> value2} |
3 | {key2 -> hello} |
4 | {key2 -> world} |
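The call for this example does not appear on the page. Judging by the output table that follows, it presumably mirrors the list example with a string key. The sketch below assumes the key is "key2" and that passing `key=` by field name works alongside the `index` alias. `get_key2_item` is a hypothetical helper, used so that the import only runs once a DataFrame is actually passed in.

```python
def get_key2_item(input_df):
    """Hypothetical helper: read "key2" from the map column "content"."""
    # Imported lazily, so defining this helper does not require koheesio or Spark.
    from koheesio.spark.transformations.get_item import GetItem

    return GetItem(
        column="content",
        key="key2",  # assumed from the output table; a string key selects a map entry
        target_column="item",
    ).transform(input_df)
```

Calling `get_key2_item(input_df)` on the input above would then be expected to add the "item" column.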
output_df:
id | content | item |
---|---|---|
1 | {key1 -> value1} | null |
2 | {key1 -> value2} | null |
3 | {key2 -> hello} | hello |
4 | {key2 -> world} | world |
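Both examples follow the same rule: a missing index or key gives null rather than an error. The sketch below is a plain-Python model of that rule, not koheesio's or Spark's implementation. `lookup` is a hypothetical name, and the model covers only the behaviour shown in the tables above.

```python
from typing import Any, Optional, Union


def lookup(value: Union[list, dict], key: Union[int, str]) -> Optional[Any]:
    """Plain-Python model of the lookup: return None where Spark shows null."""
    if isinstance(value, dict):
        # MapType: a missing key gives None
        return value.get(key)
    # ArrayType: only non-negative, in-range indices are modelled here
    if isinstance(key, int) and 0 <= key < len(value):
        return value[key]
    return None


list_rows = [[1, 2, 3], [4, 5], [6], []]
map_rows = [{"key1": "value1"}, {"key1": "value2"}, {"key2": "hello"}, {"key2": "world"}]

list_items = [lookup(row, 1) for row in list_rows]
map_items = [lookup(row, "key2") for row in map_rows]

print(list_items)  # [2, 5, None, None]
print(map_items)   # [None, None, 'hello', 'world']
```

The two printed lists match the "item" columns of the list and dict output tables.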
key (class-attribute, instance-attribute)
key: Union[int, str] = Field(
    default=...,
    alias="index",
    description="The key (or index) to get from the list or map. If the column is a list (ArrayType), this should be an integer. If the column is a dict (MapType), this should be a string. Alias: index",
)
ColumnConfig
Limit the data types to ArrayType and MapType.
koheesio.spark.transformations.get_item.get_item
Wrapper around pyspark.sql.Column.getItem
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column | Column | The column to get the item from | required |
key | Union[str, int] | The key (or index) to get from the list or map. If the column is a list (ArrayType), this should be an integer. If the column is a dict (MapType), this should be a string. | required |
Returns:
Type | Description |
---|---|
Column | The column with the item |
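As a rough illustration of what such a wrapper amounts to, the sketch below delegates to the column's `getItem` method. `get_item_sketch` is a hypothetical stand-in, not the library's source, and it relies only on the object passed in exposing `getItem`, as a `pyspark.sql.Column` does.

```python
def get_item_sketch(column, key):
    """Hypothetical stand-in for get_item: delegate to the column's getItem method."""
    # `column` is expected to be a pyspark.sql.Column; an int key indexes an
    # ArrayType column, a str key looks up an entry in a MapType column.
    return column.getItem(key)
```

With PySpark available, `get_item_sketch(col("content"), "key2")` builds the same expression as `col("content").getItem("key2")`, where `col` comes from `pyspark.sql.functions`.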