Replace
String replacements without using regular expressions.
koheesio.spark.transformations.strings.replace.Replace
Replace all instances of a string in a column with another string.
This transformation uses PySpark when().otherwise() functions.
Notes
- If original_value is not set, the transformation replaces all null values with new_value.
- If original_value is set, the transformation replaces all values matching original_value with new_value.
- Numeric values are supported, but will be cast to string in the process.
- Replace is meant for simple string replacements. If more advanced replacements are needed, use the RegexpReplace transformation instead.
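The rules in the notes can be modelled without Spark. The following is a minimal pure-Python sketch, not Koheesio's implementation: `replace_value` is a hypothetical helper, and comparing values by their string form is an assumption based on the note about numeric values.

```python
from typing import Optional


def replace_value(
    value: object,
    new_value: str,
    original_value: Optional[str] = None,
) -> Optional[str]:
    """Model the documented rules for a single cell (hypothetical helper)."""
    if original_value is None:
        # original_value not set: only null values are replaced.
        return new_value if value is None else str(value)
    if value is None:
        # original_value set: a null never matches, so it stays null.
        return None
    text = str(value)  # assumption: values are compared in their string form
    return new_value if text == original_value else text


column = ["hello", "world", None]

# Replace nulls (original_value omitted).
nulls_replaced = [replace_value(v, "programmer") for v in column]

# Replace exact matches of "world".
matches_replaced = [replace_value(v, "programmer", "world") for v in column]

print(nulls_replaced)    # ['hello', 'world', 'programmer']
print(matches_replaced)  # ['hello', 'programmer', None]
```

In PySpark terms, each branch corresponds to a `when(condition, lit(new_value)).otherwise(column)` expression, where the condition is either a null check or an equality check against original_value.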
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Union[str, List[str]] | The column (or list of columns) to replace values in. Alias: column | required |
target_column | Optional[str] | The column to store the result in. If not provided, the result will be stored in the source column. Alias: target_suffix - if multiple columns are given as source, this will be used as a suffix. | None |
original_value | Optional[str] | The original value that needs to be replaced. Alias: from | None |
new_value | str | The new value to replace this with. Alias: to | required |
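How target_column interacts with one or several source columns can be illustrated with a small sketch. `resolve_targets` is a hypothetical helper written for this illustration, and the underscore used to join the source name and the suffix is an assumption; the parameter description only states that the value is used as a suffix.

```python
from typing import List, Optional, Tuple, Union


def resolve_targets(
    columns: Union[str, List[str]],
    target_column: Optional[str] = None,
) -> List[Tuple[str, str]]:
    """Return (source, target) column name pairs (hypothetical helper)."""
    sources = [columns] if isinstance(columns, str) else list(columns)
    pairs = []
    for source in sources:
        if target_column is None:
            # No target given: the result overwrites the source column.
            target = source
        elif len(sources) > 1:
            # Several sources: target_column acts as a suffix.
            # Assumption: the two parts are joined with an underscore.
            target = f"{source}_{target_column}"
        else:
            # Single source: target_column is used as the column name.
            target = target_column
        pairs.append((source, target))
    return pairs


single = resolve_targets("column", "replaced_column")
multiple = resolve_targets(["first", "second"], "replaced")
in_place = resolve_targets(["first", "second"])

print(single)
print(multiple)
print(in_place)
```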
Examples:
input_df:
column |
---|
hello |
world |
None |
Replace all null values with a new value
output_df = Replace(
column="column",
target_column="replaced_column",
original_value=None, # This is the default value, so it can be omitted
new_value="programmer",
).transform(input_df)
output_df:
column | replaced_column |
---|---|
hello | hello |
world | world |
None | programmer |
Replace all instances of a string in a column with another string
output_df = Replace(
column="column",
target_column="replaced_column",
original_value="world",
new_value="programmer",
).transform(input_df)
output_df:
column | replaced_column |
---|---|
hello | hello |
world | programmer |
None | None |
new_value (class-attribute, instance-attribute)
new_value: str = Field(
default=...,
alias="to",
description="The new value to replace this with",
)
original_value (class-attribute, instance-attribute)
original_value: Optional[str] = Field(
default=None,
alias="from",
description="The original value that needs to be replaced",
)
cast_values_to_str
Cast values to string if they are not None
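As a rough illustration of that casting rule, the sketch below normalizes parameter values before use. It is a standalone approximation, not the library's validator: `cast_value_to_str` is a hypothetical name, and applying it to original_value and new_value is an assumption drawn from the note that numeric values are supported.

```python
from typing import Optional


def cast_value_to_str(value: object) -> Optional[str]:
    """Cast a value to string, leaving None untouched (hypothetical helper)."""
    return None if value is None else str(value)


# Numeric arguments end up as their string form; None is preserved.
params = {"original_value": 1, "new_value": 2.5}
normalized = {name: cast_value_to_str(value) for name, value in params.items()}

print(normalized)               # {'original_value': '1', 'new_value': '2.5'}
print(cast_value_to_str(None))  # None
```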