Change case

Convert the case of a string column to upper case, lower case, or title case.

Classes:

| Name | Description |
| --- | --- |
| `LowerCase` | Converts a string column to lower case. |
| `UpperCase` | Converts a string column to upper case. |
| `TitleCase` or `InitCap` | Converts a string column to title case, where each word starts with a capital letter. |

koheesio.spark.transformations.strings.change_case.InitCap module-attribute

InitCap = TitleCase

koheesio.spark.transformations.strings.change_case.LowerCase

Converts the contents of a column to lower case.

Wraps the pyspark.sql.functions.lower function.

Warnings

If a column is not of string type, LowerCase is not applied to it and a warning is issued instead.
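The skip-and-warn behavior described above can be illustrated without Spark. The following is a minimal plain-Python sketch, not the library's implementation: `lower_if_string`, the `dtype` strings, and the use of `warnings.warn` are all assumptions made for illustration, and the real class may report the warning through a different mechanism.

```python
import warnings


def lower_if_string(name, dtype, values):
    """Lower-case `values` only when `dtype` is "string"; otherwise warn and skip."""
    if dtype != "string":
        # Stand-in for the warning described above; the library's mechanism may differ.
        warnings.warn(f"Column '{name}' is of type {dtype}, not string; skipping.")
        return values
    # None models a null cell and is passed through untouched.
    return [v.lower() if v is not None else None for v in values]


strings = lower_if_string("product", "string", ["Beans", None])
numbers = lower_if_string("amount", "int", [1000])  # warns, returns input as-is
```

A non-string column comes back unchanged rather than raising, so a pipeline that passes a mixed list of columns keeps running.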

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `columns` | `Union[str, List[str]]` | The name of the column or columns to convert to lower case. Alias: `column`. Lower case is applied to every column in the list. Each column must be of string type. | required |
| `target_column` | `Optional[str]` | The name of the column to store the result in. If `None`, the result is stored in the same column as the input. | `None` |

Example

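input_df:

| product | amount | country |
| --- | --- | --- |
| Banana lemon orange | 1000 | USA |
| Carrots Blueberries | 1500 | USA |
| Beans | 1600 | USA |

from koheesio.spark.transformations.strings.change_case import LowerCase

output_df = LowerCase(
    column="product", target_column="product_lower"
).transform(input_df)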

output_df:

| product | amount | country | product_lower |
| --- | --- | --- | --- |
| Banana lemon orange | 1000 | USA | banana lemon orange |
| Carrots Blueberries | 1500 | USA | carrots blueberries |
| Beans | 1600 | USA | beans |

In this example, the contents of the product column are converted to lower case and stored in a new column, product_lower. The original product column is left unchanged.
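To make the role of target_column concrete, here is a small plain-Python sketch that mimics the two cases on rows modelled as dictionaries. It is only an illustration of the semantics described in the parameter table; `lower_column` and the row representation are hypothetical and are not part of koheesio or PySpark.

```python
def lower_column(rows, column, target_column=None):
    """Return new rows with `column` lower-cased into `target_column`.

    When `target_column` is None, the result overwrites `column` itself.
    """
    target = target_column or column
    return [{**row, target: row[column].lower()} for row in rows]


rows = [
    {"product": "Banana lemon orange", "amount": 1000, "country": "USA"},
    {"product": "Beans", "amount": 1600, "country": "USA"},
]

# With a target column: the original value is kept and a new key is added.
added = lower_column(rows, "product", target_column="product_lower")

# Without a target column: the original value is replaced in place.
in_place = lower_column(rows, "product")
```

Choosing a target column is the safer default when later steps still need the original casing.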

ColumnConfig

Limit data type to string

limit_data_type class-attribute instance-attribute

limit_data_type = [STRING]

run_for_all_data_type class-attribute instance-attribute

run_for_all_data_type = [STRING]

func

func(column: Column)
Source code in src/koheesio/spark/transformations/strings/change_case.py
def func(self, column: Column):
    return lower(column)

koheesio.spark.transformations.strings.change_case.TitleCase

Converts the contents of a column to title case, meaning that every word starts with an upper-case letter.

Wraps the pyspark.sql.functions.initcap function.

Warnings

If a column is not of string type, TitleCase is not applied to it and a warning is issued instead.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `columns` | `ListOfColumns` | The name of the column or columns to convert to title case. Alias: `column`. Title case is applied to every column in the list. Each column must be of string type. | required |
| `target_column` | `Optional[str]` | The name of the column to store the result in. If `None`, the result is stored in the same column as the input. | `None` |

Example

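input_df:

| product | amount | country |
| --- | --- | --- |
| Banana lemon orange | 1000 | USA |
| Carrots blueberries | 1500 | USA |
| Beans | 1600 | USA |

from koheesio.spark.transformations.strings.change_case import TitleCase

output_df = TitleCase(
    column="product", target_column="product_title"
).transform(input_df)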

output_df:

| product | amount | country | product_title |
| --- | --- | --- | --- |
| Banana lemon orange | 1000 | USA | Banana Lemon Orange |
| Carrots blueberries | 1500 | USA | Carrots Blueberries |
| Beans | 1600 | USA | Beans |

In this example, the contents of the product column are converted to title case (each word now starts with an upper-case letter) and stored in a new column, product_title.
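Spark's documentation for initcap describes the result as having the first letter of each word in upper case and all other letters in lower case, with words delimited by white space. The plain-Python approximation below sketches that rule so its effect on already-capitalized values is easy to see. `initcap_sketch` is a hypothetical helper that splits on single spaces only, which is a simplification, and it is not the Spark implementation.

```python
def initcap_sketch(text):
    """Approximate title-casing: upper-case each word's first letter, lower-case the rest."""
    return " ".join(word[:1].upper() + word[1:].lower() for word in text.split(" "))


print(initcap_sketch("Carrots blueberries"))  # Carrots Blueberries
print(initcap_sketch("USA"))                  # Usa
```

Because the remaining letters are lower-cased, applying title case to a column of acronyms such as country would change `USA` to `Usa`, so it is worth restricting the transformation to the columns that need it.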

func

func(column: Column)
Source code in src/koheesio/spark/transformations/strings/change_case.py
def func(self, column: Column):
    return initcap(column)

koheesio.spark.transformations.strings.change_case.UpperCase

Converts the contents of a column to upper case.

Wraps the pyspark.sql.functions.upper function.

Warnings

If a column is not of string type, UpperCase is not applied to it and a warning is issued instead.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `columns` | `Union[str, List[str]]` | The name of the column or columns to convert to upper case. Alias: `column`. Upper case is applied to every column in the list. Each column must be of string type. | required |
| `target_column` | `Optional[str]` | The name of the column to store the result in. If `None`, the result is stored in the same column as the input. | `None` |

Example

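input_df:

| product | amount | country |
| --- | --- | --- |
| Banana lemon orange | 1000 | USA |
| Carrots Blueberries | 1500 | USA |
| Beans | 1600 | USA |

from koheesio.spark.transformations.strings.change_case import UpperCase

output_df = UpperCase(
    column="product", target_column="product_upper"
).transform(input_df)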

output_df:

| product | amount | country | product_upper |
| --- | --- | --- | --- |
| Banana lemon orange | 1000 | USA | BANANA LEMON ORANGE |
| Carrots Blueberries | 1500 | USA | CARROTS BLUEBERRIES |
| Beans | 1600 | USA | BEANS |

In this example, the contents of the product column are converted to upper case and stored in a new column, product_upper. The original product column is left unchanged.

func

func(column: Column)
Source code in src/koheesio/spark/transformations/strings/change_case.py
def func(self, column: Column):
    return upper(column)
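The three classes on this page differ only in the body of func. A rough plain-Python sketch of that pattern is given below: a shared base applies func to the chosen column, and each subclass overrides func alone. Every name here (`CaseSketch`, `LowerSketch`, `UpperSketch`, `TitleSketch`, `InitCapSketch`) is hypothetical, rows are modelled as dictionaries instead of a Spark DataFrame, and the actual class hierarchy in koheesio may be organised differently.

```python
from abc import ABC, abstractmethod


class CaseSketch(ABC):
    """Hypothetical base: applies `func` to one column of every row."""

    def __init__(self, column, target_column=None):
        self.column = column
        # Fall back to overwriting the source column when no target is given.
        self.target_column = target_column or column

    @abstractmethod
    def func(self, value):
        """Return the case-converted value; overridden per subclass."""

    def transform(self, rows):
        return [{**row, self.target_column: self.func(row[self.column])} for row in rows]


class LowerSketch(CaseSketch):
    def func(self, value):
        return value.lower()


class UpperSketch(CaseSketch):
    def func(self, value):
        return value.upper()


class TitleSketch(CaseSketch):
    def func(self, value):
        # Simplified title case: split on single spaces only.
        return " ".join(w[:1].upper() + w[1:].lower() for w in value.split(" "))


# Alias in the same spirit as InitCap = TitleCase.
InitCapSketch = TitleSketch

result = UpperSketch("product", "product_upper").transform([{"product": "Beans"}])
```

Keeping the column handling in one place means a new case conversion only needs a one-line func.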