Drop column
This module defines the DropColumn class, a subclass of ColumnsTransformation.
koheesio.spark.transformations.drop_column.DropColumn
Drop one or more columns
The DropColumn class is used to drop one or more columns from a PySpark DataFrame.
It wraps the pyspark.sql.DataFrame.drop method and accepts either a single string
or a list of strings as input.
If a specified column does not exist in the DataFrame, no error or warning is raised, and the DataFrame keeps all of its existing columns.
Expected behavior
- When the column does not exist, all columns will remain (no error or warning is thrown)
- Either a single string, or a list of strings can be specified
Example
df:
| product | amount | country |
|---|---|---|
| Banana lemon orange | 1000 | USA |
| Carrots Blueberries | 1500 | USA |
| Beans | 1600 | USA |
output_df:
| amount | country |
|---|---|
| 1000 | USA |
| 1500 | USA |
| 1600 | USA |
In this example, the product column is dropped from the DataFrame df.
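The behavior described above can be illustrated without a Spark session. The sketch below is a plain-Python model of the documented semantics, not koheesio's implementation: `drop_columns` is a hypothetical helper, and the DataFrame is represented as a list of dictionaries purely for illustration.

```python
from typing import Dict, List, Union

Row = Dict[str, object]


def drop_columns(rows: List[Row], columns: Union[str, List[str]]) -> List[Row]:
    """Hypothetical helper modelling the documented drop semantics."""
    # Accept either a single column name or a list of names.
    to_drop = {columns} if isinstance(columns, str) else set(columns)
    # Names that are not present in a row are skipped silently:
    # no error or warning is raised.
    return [{k: v for k, v in row.items() if k not in to_drop} for row in rows]


# Same data as the example table above.
df = [
    {"product": "Banana lemon orange", "amount": 1000, "country": "USA"},
    {"product": "Carrots Blueberries", "amount": 1500, "country": "USA"},
    {"product": "Beans", "amount": 1600, "country": "USA"},
]

output_df = drop_columns(df, "product")              # single string
both = drop_columns(df, ["product", "country"])      # list of strings
unchanged = drop_columns(df, "does_not_exist")       # missing column: no-op
```

With koheesio itself, the equivalent call is presumably along the lines of `DropColumn(column="product").transform(df)`. The `column` parameter name and the `transform` method are assumptions here and should be checked against the class signature.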