Replace

String replacements without using regular expressions.

koheesio.spark.transformations.strings.replace.Replace

Replace all instances of a string in a column with another string.

This transformation is built on PySpark's when(...).otherwise(...) expression.

Notes
  • If original_value is not set, the transformation replaces all null values with new_value.
  • If original_value is set, the transformation replaces all values equal to original_value with new_value.
  • Numeric values may be passed for original_value and new_value, but they are cast to string in the process.
  • Replace is meant for simple string replacements. If more advanced replacements are needed, use the RegexpReplace transformation instead.

Parameters:

columns : Union[str, List[str]], required
    The column (or list of columns) to replace values in. Alias: column

target_column : Optional[str], default None
    The column to store the result in. If not provided, the result will be stored in the source column. Alias: target_suffix (if multiple columns are given as source, this will be used as a suffix).

original_value : Optional[str], default None
    The original value that needs to be replaced. Alias: from

new_value : str, required
    The new value to replace this with. Alias: to

Examples:

input_df:

column
hello
world
None

Replace all null values with a new value
from koheesio.spark.transformations.strings.replace import Replace

output_df = Replace(
    column="column",
    target_column="replaced_column",
    original_value=None,  # This is the default value, so it can be omitted
    new_value="programmer",
).transform(input_df)

output_df:

column    replaced_column
hello     hello
world     world
None      programmer

Replace all instances of a string in a column with another string
output_df = Replace(
    column="column",
    target_column="replaced_column",
    original_value="world",
    new_value="programmer",
).transform(input_df)

output_df:

column    replaced_column
hello     hello
world     programmer
None      None

new_value class-attribute instance-attribute

new_value: str = Field(
    default=...,
    alias="to",
    description="The new value to replace this with",
)

original_value class-attribute instance-attribute

original_value: Optional[str] = Field(
    default=None,
    alias="from",
    description="The original value that needs to be replaced",
)
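
Because `from` is a reserved word in Python, that alias cannot be written as a plain keyword argument; it has to be passed through dictionary unpacking. The sketch below illustrates this with a hypothetical stand-in model, `ReplaceArgs`, which copies only the two Field definitions shown above. It is not koheesio's class, it assumes pydantic v2 is installed, and `populate_by_name=True` is an assumption made so that the field names are accepted alongside the aliases, as the examples on this page imply.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field


class ReplaceArgs(BaseModel):
    """Hypothetical stand-in for illustration; not part of koheesio."""

    # Assumption: accept field names as well as aliases on construction.
    model_config = ConfigDict(populate_by_name=True)

    original_value: Optional[str] = Field(default=None, alias="from")
    new_value: str = Field(default=..., alias="to")


# `from` is a Python keyword, so the alias must go through ** unpacking.
by_alias = ReplaceArgs(**{"from": "world", "to": "programmer"})

# The field names work as ordinary keyword arguments.
by_name = ReplaceArgs(original_value="world", new_value="programmer")

print(by_alias.original_value, by_alias.new_value)  # world programmer
```

If the real class accepts aliases in the same way, the same unpacking pattern should carry over to Replace itself, for example `Replace(column="column", **{"from": "world", "to": "programmer"})`.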

cast_values_to_str

cast_values_to_str(value)

Cast values to string if they are not None

Source code in src/koheesio/spark/transformations/strings/replace.py
@field_validator("original_value", "new_value", mode="before")
def cast_values_to_str(cls, value):
    """Cast values to string if they are not None"""
    if value:
        return str(value)
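
One detail in the source above is worth noting: the check is a truthiness test (`if value:`), not `is not None`. Falsy inputs such as `0` or an empty string therefore fall through, and the function returns None for them as well. The sketch below reproduces the same body as a plain function, without the pydantic decorator, so the behaviour can be checked in isolation. `cast_if_not_none` is a hypothetical stricter variant shown only for comparison; it is not part of koheesio.

```python
def cast_values_to_str(value):
    """Same body as the validator above, minus the pydantic decorator."""
    if value:
        return str(value)
    # No explicit return here, so falsy inputs yield None.


def cast_if_not_none(value):
    """Hypothetical variant that only skips None (not in koheesio)."""
    if value is not None:
        return str(value)
    return None


samples = ["world", 42, None, 0, ""]
as_written = [cast_values_to_str(v) for v in samples]
stricter = [cast_if_not_none(v) for v in samples]

print(as_written)  # ['world', '42', None, None, None]
print(stricter)    # ['world', '42', None, '0', '']
```

In practice this suggests that an original_value of `0` or `""` would be normalized to None, which per the Notes selects the null-replacement behaviour.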

func

func(column: Column)

Source code in src/koheesio/spark/transformations/strings/replace.py
def func(self, column: Column):
    when_statement = (
        when(column.isNull(), lit(self.new_value))
        if not self.original_value
        else when(
            column == self.original_value,
            lit(self.new_value),
        )
    )
    return when_statement.otherwise(column)
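
The branch logic of func can be modelled per value in plain Python, which makes it easy to check against the example tables on this page without a Spark session. This is only a sketch of the semantics: `replace_value` is a hypothetical helper, not part of koheesio, and it works on single Python values, not on a Column. In Spark, comparing a null to a string yields null, which when() treats as not matched, so the row falls through to otherwise(column); Python's `None == "world"` being False gives the same outcome here.

```python
def replace_value(value, new_value, original_value=None):
    """Hypothetical per-value model of Replace.func (not koheesio API)."""
    if not original_value:
        # No original_value given: only nulls are replaced.
        return new_value if value is None else value
    # original_value given: only exact matches are replaced.
    return new_value if value == original_value else value


column = ["hello", "world", None]

nulls_replaced = [replace_value(v, "programmer") for v in column]
world_replaced = [replace_value(v, "programmer", "world") for v in column]

print(nulls_replaced)  # ['hello', 'world', 'programmer']
print(world_replaced)  # ['hello', 'programmer', None]
```

Both results match the replaced_column values in the two examples above.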