Substring
Extracts a substring from a string column starting at the given position.
koheesio.spark.transformations.strings.substring.Substring
Extracts a substring from a string column starting at the given position.
This is a wrapper around the PySpark `substring()` function.
Notes
- Numeric columns are cast to string.
- `start` is 1-indexed, not 0-indexed!
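The Notes above can be modelled in plain Python. The helper below, `substring_1_indexed`, is hypothetical and not part of koheesio; it only illustrates the string cast and the 1-indexed `start`. Treating a negative `length` (such as the default `-1`) as "until the end of the string" is an assumption based on the parameter description, not a statement about how koheesio implements it.

```python
from typing import Optional, Union


def substring_1_indexed(
    value: Union[str, int], start: int, length: Optional[int] = -1
) -> str:
    """Hypothetical plain-Python model of the Notes (not koheesio's code)."""
    if start < 1:
        raise ValueError("start must be a positive, 1-indexed position")
    text = str(value)  # mirrors "numeric columns are cast to string"
    begin = start - 1  # convert the 1-indexed start to a 0-indexed slice
    # Assumption: None or a negative length (e.g. the default -1) means "to the end"
    if length is None or length < 0:
        return text[begin:]
    return text[begin:begin + length]


print(substring_1_indexed("skyscraper", 3, 4))  # yscr
print(substring_1_indexed(20240131, 1, 4))      # 2024
```

Converting `start` once (`start - 1`) keeps the off-by-one adjustment in a single place, which is exactly the mistake the "1-indexed!" note warns about.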
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `columns` | `Union[str, List[str]]` | The column (or list of columns) to take the substring of. Alias: `column`. | required |
| `target_column` | `Optional[str]` | The column to store the result in. If not provided, the result is stored in the source column. Alias: `target_suffix`; if multiple source columns are given, this value is used as a suffix. | `None` |
| `start` | `PositiveInt` | Positive integer defining where the substring begins. The first character of the field has index 1! | required |
| `length` | `Optional[int]` | Number of characters to extract. If not provided, the substring runs until the end of the string. | `-1` |
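How `columns` and `target_column` combine can be sketched with a hypothetical helper, `resolve_target_columns`, which is not a koheesio API. It follows the table: no `target_column` overwrites the source column, a single source column gets `target_column` as its new name, and multiple source columns get it as a suffix. The underscore separator used for the suffix is an assumption; check the koheesio source for the actual naming before relying on it.

```python
from typing import Dict, List, Optional, Union


def resolve_target_columns(
    columns: Union[str, List[str]], target_column: Optional[str] = None
) -> Dict[str, str]:
    """Hypothetical mapping of source column -> output column (not koheesio's code)."""
    sources = [columns] if isinstance(columns, str) else list(columns)
    if target_column is None:
        # No target given: each result overwrites its source column
        return {src: src for src in sources}
    if len(sources) == 1:
        # Single source: target_column is the new column's name
        return {sources[0]: target_column}
    # Multiple sources: target_column acts as a suffix (underscore is an assumption)
    return {src: f"{src}_{target_column}" for src in sources}


print(resolve_target_columns("column", "substring_column"))
print(resolve_target_columns(["a", "b"], "sub"))
```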
Example
Extract a substring from a string column starting at the given position.
input_df:

| column |
|---|
| skyscraper |
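```python
from koheesio.spark.transformations.strings.substring import Substring

# input_df: an existing Spark DataFrame with a string column named "column"
output_df = Substring(
    column="column",
    target_column="substring_column",
    start=3,  # 1-indexed! So this will start at the 3rd character
    length=4,
).transform(input_df)
```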
output_df:

| column | substring_column |
|---|---|
| skyscraper | yscr |