Get item

Transformation to wrap around the pyspark getItem function

koheesio.spark.transformations.get_item.GetItem

Get item from list or map (dictionary)

Wrapper around pyspark.sql.Column.getItem

GetItem is strict about the data type of the column. If the column is not a list or a map, an error will be raised.


Only MapType and ArrayType are supported.


Parameters:

columns (Union[str, List[str]]): The column (or list of columns) to get the item from. Alias: column

target_column (Optional[str]): The column to store the result in. If not provided, the result will be stored in the source column. Alias: target_suffix - if multiple columns are given as source, this will be used as a suffix.

key (Union[int, str]): The key (or index) to get from the list or map. If the column is a list (ArrayType), this should be an integer. If the column is a dict (MapType), this should be a string. Alias: index

Example with list (ArrayType)

By specifying an integer for the parameter "key", getItem knows to get the element at index n of a list (index starts at 0).


id content
1 [1, 2, 3]
2 [4, 5]
3 [6]
4 []
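output_df = GetItem(
    column="content", target_column="item",
    index=1,  # get the second element of the list
).transform(input_df)  # input_df (assumed name) is the DataFrame above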


id content item
1 [1, 2, 3] 2
2 [4, 5] 5
3 [6] null
4 [] null
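
The lookup rule behind this table can be modeled in plain Python. This is only a sketch of the behavior shown above, not Koheesio or Spark code: `item_at` is a hypothetical helper, and treating every out-of-range index as null is an assumption taken from the example output.

```python
from typing import Any, List, Optional


def item_at(values: List[Any], index: int) -> Optional[Any]:
    """Return values[index], or None when the index is out of range.

    Hypothetical helper: a plain-Python stand-in for the lookup, not Koheesio code.
    """
    # Assumption: only 0 <= index < len(values) counts as a hit, as in the table above.
    if 0 <= index < len(values):
        return values[index]
    return None


# The "content" column from the example, keyed by id.
rows = {1: [1, 2, 3], 2: [4, 5], 3: [6], 4: []}
items = {row_id: item_at(content, 1) for row_id, content in rows.items()}
print(items)  # {1: 2, 2: 5, 3: None, 4: None}
```

Rows 3 and 4 have no element at index 1, so they map to None, which corresponds to the null values in the item column.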
Example with a dict (MapType)


id content
1 {key1 -> value1}
2 {key1 -> value2}
3 {key2 -> hello}
4 {key2 -> world}

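output_df = GetItem(
    column="content",
    key="key2", target_column="item",
).transform(input_df)  # input_df (assumed name) is the DataFrame above

Because the requested key is "key2", the item is null for the first two rows, whose maps do not contain "key2".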


id content item
1 {key1 -> value1} null
2 {key1 -> value2} null
3 {key2 -> hello} hello
4 {key2 -> world} world
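
The map case can be modeled the same way with a plain Python dict. Again this is a sketch rather than the library's implementation: `value_for` is a hypothetical helper, and None stands in for Spark's null.

```python
from typing import Dict, Optional


def value_for(mapping: Dict[str, str], key: str) -> Optional[str]:
    """Return mapping[key], or None when the key is absent.

    Hypothetical helper: a plain-Python stand-in for the lookup, not Koheesio code.
    """
    return mapping.get(key)


# The "content" column from the example, keyed by id.
rows = {
    1: {"key1": "value1"},
    2: {"key1": "value2"},
    3: {"key2": "hello"},
    4: {"key2": "world"},
}
items = {row_id: value_for(content, "key2") for row_id, content in rows.items()}
print(items)  # {1: None, 2: None, 3: 'hello', 4: 'world'}
```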

key class-attribute instance-attribute

key: Union[int, str] = Field(default=..., alias='index', description='The key (or index) to get from the list or map. If the column is a list (ArrayType), this should be an integer. If the column is a dict (MapType), this should be a string. Alias: index')

ColumnConfig

Limit the data types to ArrayType and MapType.

data_type_strict_mode class-attribute instance-attribute

data_type_strict_mode = True

limit_data_type class-attribute instance-attribute

limit_data_type = run_for_all_data_type

run_for_all_data_type class-attribute instance-attribute

run_for_all_data_type = [ARRAY, MAP]
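
To illustrate what limiting the data types in strict mode amounts to, here is a minimal plain-Python sketch. It is not Koheesio's implementation: `check_columns`, the string type names, and the choice of `ValueError` are all assumptions made for illustration. The source only states that an error is raised for columns that are not a list or a map.

```python
from typing import Dict, List

# Assumption: type names are plain strings here; Koheesio uses its own ARRAY and MAP constants.
ALLOWED_TYPES = {"array", "map"}


def check_columns(schema: Dict[str, str], columns: List[str]) -> List[str]:
    """Return `columns` unchanged if every one has an allowed type, else raise.

    Hypothetical helper: models strict mode, not Koheesio's actual implementation.
    """
    for name in columns:
        if schema[name] not in ALLOWED_TYPES:
            # The exception type is an assumption; the source only says an error is raised.
            raise ValueError(
                f"Column '{name}' has type '{schema[name]}', expected array or map"
            )
    return columns


schema = {"id": "int", "content": "array"}
print(check_columns(schema, ["content"]))  # ['content']

try:
    check_columns(schema, ["id"])
except ValueError as err:
    print(err)
```

Failing early on an unsupported column type keeps a misconfigured transformation from silently producing a column of nulls.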

func

func(column: Column) -> Column
Source code in src/koheesio/spark/transformations/
def func(self, column: Column) -> Column:
    return get_item(column, self.key)

koheesio.spark.transformations.get_item.get_item

get_item(column: Column, key: Union[str, int])

Wrapper around pyspark.sql.Column.getItem


Parameters:

column (Column): The column to get the item from

key (Union[str, int]): The key (or index) to get from the list or map. If the column is a list (ArrayType), this should be an integer. If the column is a dict (MapType), this should be a string.

Returns:

The column with the item

Source code in src/koheesio/spark/transformations/
def get_item(column: Column, key: Union[str, int]):
    """
    Wrapper around pyspark.sql.Column.getItem

    Parameters
    ----------
    column : Column
        The column to get the item from
    key : Union[str, int]
        The key (or index) to get from the list or map. If the column is a list (ArrayType), this should be an integer.
        If the column is a dict (MapType), this should be a string.

    Returns
    -------
    Column
        The column with the item
    """
    return column.getItem(key)