Trim

Trim whitespace from the beginning and/or end of a string.

Classes:

- `Trim`: Trim whitespace from the beginning and/or end of a string.
- `LTrim`: Trim whitespace from the beginning of a string.
- `RTrim`: Trim whitespace from the end of a string.

See class docstrings for more information.

koheesio.spark.transformations.strings.trim.trim_type module-attribute #

trim_type = Literal['left', 'right', 'left-right']
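
As a small illustration of how a `Literal` alias like this restricts the allowed values, the sketch below inspects it with `typing.get_args`. The helper `is_valid_direction` is hypothetical and not part of koheesio; in the library itself, validation is presumably handled by the model's field typing.

```python
from typing import Literal, get_args

# Same definition as the module attribute above
trim_type = Literal["left", "right", "left-right"]


def is_valid_direction(value: str) -> bool:
    """Hypothetical helper: True if value is one of the trim_type literals."""
    return value in get_args(trim_type)


print(get_args(trim_type))          # ('left', 'right', 'left-right')
print(is_valid_direction("left"))   # True
print(is_valid_direction("both"))   # False
```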

koheesio.spark.transformations.strings.trim.LTrim #

Trim whitespace from the beginning of a string. Alias: LeftTrim

direction class-attribute instance-attribute #

direction: trim_type = 'left'

koheesio.spark.transformations.strings.trim.RTrim #

Trim whitespace from the end of a string. Alias: RightTrim

direction class-attribute instance-attribute #

direction: trim_type = 'right'

koheesio.spark.transformations.strings.trim.Trim #

Trim whitespace from the beginning and/or end of a string.

This is a wrapper around the PySpark ltrim() and rtrim() functions.

The direction parameter can be changed to apply either a left or a right trim. Defaults to left AND right trim.

Note: If a column is not of string type, Trim is not applied to it and a warning is issued instead.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| columns | ListOfColumns | The column (or list of columns) to trim. Alias: column. If no columns are provided, all string columns will be trimmed. | "*" |
| target_column | ListOfColumns | The column to store the result in. If not provided, the result will be stored in the source column. Alias: target_suffix. If multiple source columns are given, this value is used as a suffix. | optional |
| direction | trim_type | On which side to remove the spaces. Either "left", "right" or "left-right". | "left-right" |
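
The `target_column` / `target_suffix` behaviour described in the parameters can be sketched in plain Python as follows. This is an illustration only: `resolve_targets` is a hypothetical helper, and joining the source name and suffix with an underscore is an assumption, not something the text above confirms.

```python
from typing import Iterator, List, Optional, Tuple


def resolve_targets(
    columns: List[str], target_column: Optional[str] = None
) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) pairs. Hypothetical helper, not koheesio API."""
    for column in columns:
        if not target_column:
            # No target given: the result overwrites the source column
            target = column
        elif len(columns) > 1:
            # Several sources: the target acts as a suffix
            # (the underscore separator is an assumption)
            target = f"{column}_{target_column}"
        else:
            # Single source: the target is used as-is
            target = target_column
        yield column, target


print(list(resolve_targets(["a"])))                 # [('a', 'a')]
print(list(resolve_targets(["a"], "trimmed")))      # [('a', 'trimmed')]
print(list(resolve_targets(["a", "b"], "trimmed")))
# [('a', 'a_trimmed'), ('b', 'b_trimmed')]
```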

Examples:

input_df:

| column    |
|-----------|
| " hello " |

Trim whitespace from the beginning of a string #
output_df = Trim(
    column="column", target_column="trimmed_column", direction="left"
).transform(input_df)

output_df:

| column    | trimmed_column |
|-----------|----------------|
| " hello " | "hello "       |

Trim whitespace from both sides of a string #
output_df = Trim(
    column="column",
    target_column="trimmed_column",
    direction="left-right",  # default value
).transform(input_df)

output_df:

| column    | trimmed_column |
|-----------|----------------|
| " hello " | "hello"        |

Trim whitespace from the end of a string #
output_df = Trim(
    column="column", target_column="trimmed_column", direction="right"
).transform(input_df)

output_df:

| column    | trimmed_column |
|-----------|----------------|
| " hello " | " hello"       |

columns class-attribute instance-attribute #

columns: ListOfColumns = Field(
    default="*",
    alias="column",
    description="The column (or list of columns) to trim. Alias: column. If no columns are provided, all string columns will be trimmed.",
)

direction class-attribute instance-attribute #

direction: trim_type = Field(
    default="left-right",
    description="On which side to remove the spaces. Either 'left', 'right' or 'left-right'",
)

ColumnConfig #

Limit data types to string only.

limit_data_type class-attribute instance-attribute #

limit_data_type = [STRING]

run_for_all_data_type class-attribute instance-attribute #

run_for_all_data_type = [STRING]
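
To make the effect of this configuration concrete, here is a plain-Python sketch of how string-only selection could work. It is a simplified model, not koheesio's implementation: `select_trimmable` is hypothetical, the schema is modelled as a name-to-type dict, and the reading that `run_for_all_data_type` governs the `"*"` case while `limit_data_type` governs explicitly named columns is an assumption.

```python
import warnings
from typing import Dict, List


def select_trimmable(schema: Dict[str, str], columns: List[str]) -> List[str]:
    """Hypothetical helper: return the columns that would actually be trimmed."""
    if columns == ["*"]:
        # "*" case: run for every string column in the schema
        return [name for name, dtype in schema.items() if dtype == "string"]

    selected = []
    for name in columns:
        if schema[name] == "string":
            selected.append(name)
        else:
            # Non-string column: skip it and warn instead of failing
            warnings.warn(
                f"Column '{name}' has type {schema[name]}, not string; skipping"
            )
    return selected


schema = {"name": "string", "age": "int", "city": "string"}
print(select_trimmable(schema, ["*"]))            # ['name', 'city']
print(select_trimmable(schema, ["name", "age"]))  # ['name'], plus a warning for 'age'
```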

func #

func(column: Column)
Source code in src/koheesio/spark/transformations/strings/trim.py
def func(self, column: Column):
    if self.direction == "left":
        return f.ltrim(column)

    if self.direction == "right":
        return f.rtrim(column)

    # both (left-right)
    return f.rtrim(f.ltrim(column))
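
The same three-way dispatch can be tried without a Spark session using plain Python strings. This is only an analogue of `func`: `trim_value` is a hypothetical stand-in, and it assumes that `ltrim()` and `rtrim()` strip space characters only, which is why `lstrip(" ")` and `rstrip(" ")` are used rather than the argument-free forms that strip all whitespace.

```python
def trim_value(value: str, direction: str = "left-right") -> str:
    """Plain-Python stand-in for Trim.func; not part of koheesio."""
    if direction == "left":
        return value.lstrip(" ")

    if direction == "right":
        return value.rstrip(" ")

    # both (left-right)
    return value.lstrip(" ").rstrip(" ")


print(repr(trim_value(" hello ", "left")))   # 'hello '
print(repr(trim_value(" hello ", "right")))  # ' hello'
print(repr(trim_value(" hello ")))           # 'hello'
```

The outputs match the `trimmed_column` values in the examples above for the same `" hello "` input.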