Drop column
This module defines the DropColumn class, a subclass of ColumnsTransformation.
koheesio.spark.transformations.drop_column.DropColumn
Drop one or more columns
The DropColumn class is used to drop one or more columns from a PySpark DataFrame.
It wraps the pyspark.DataFrame.drop function and can handle either a single string or a list of strings as input.
If a specified column does not exist in the DataFrame, no error or warning is thrown, and all existing columns will remain.
Expected behavior
- When the column does not exist, all columns will remain (no error or warning is thrown)
- Either a single string, or a list of strings can be specified
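The two behaviors listed here can be modelled in plain Python without a Spark session. This is only a rough sketch of the semantics, not the koheesio implementation: `remaining_columns` is a hypothetical helper introduced for illustration, and it assumes exact, case-sensitive name matching, which is a simplification.

```python
from typing import List, Union


def remaining_columns(existing: List[str], column: Union[str, List[str]]) -> List[str]:
    """Hypothetical model: return the column names left after a drop."""
    # A single string is treated the same as a one-element list.
    to_drop = [column] if isinstance(column, str) else list(column)
    # Names that are not present simply match nothing, so no error is raised.
    return [name for name in existing if name not in to_drop]


columns = ["product", "amount", "country"]

single = remaining_columns(columns, "product")
several = remaining_columns(columns, ["product", "country"])
missing = remaining_columns(columns, "does_not_exist")
```

Here `single` keeps `amount` and `country`, `several` keeps only `amount`, and `missing` keeps all three columns unchanged.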
Example
df:
product | amount | country |
---|---|---|
Banana lemon orange | 1000 | USA |
Carrots Blueberries | 1500 | USA |
Beans | 1600 | USA |
output_df:
amount | country |
---|---|
1000 | USA |
1500 | USA |
1600 | USA |
In this example, the product column is dropped from the DataFrame df.