Repartition Transformation
koheesio.spark.transformations.repartition.Repartition
Wrapper around DataFrame.repartition
With repartition, the number of partitions can be given as an optional value. If it is not provided, a default is used: the value of the Spark config 'spark.sql.shuffle.partitions' (200 by default), capped at the number of rows in the DataFrame (whichever value is lower).
If columns are omitted, the entire DataFrame is repartitioned without considering the particular values in the columns.
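The default partition count described above can be sketched as a small helper (illustrative only; `default_num_partitions` and its parameter names are assumptions for this sketch, not part of koheesio's API):

```python
def default_num_partitions(shuffle_partitions: int, row_count: int) -> int:
    """Illustrative sketch: the effective default partition count is the
    'spark.sql.shuffle.partitions' value (200 unless overridden), capped
    at the number of rows in the DataFrame."""
    return min(shuffle_partitions, row_count)

# With the stock config value of 200 and a 50-row DataFrame,
# the data ends up in 50 partitions, not 200.
print(default_num_partitions(200, 50))  # 50
```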
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| columns | Optional[Union[str, List[str]]] | Name of the source column(s). If omitted, the entire DataFrame is repartitioned without considering the particular values in the columns. Alias: column | None |
| num_partitions | Optional[int] | The number of partitions to repartition to. If omitted, the default number of partitions is used as defined by the spark config 'spark.sql.shuffle.partitions'. | None |
columns (class-attribute, instance-attribute)

```python
columns: Optional[ListOfColumns] = Field(
    default="",
    alias="column",
    description="Name of the source column(s)",
)
```
num_partitions (class-attribute, instance-attribute)

```python
num_partitions: Optional[int] = Field(
    default=None,
    alias="numPartitions",
    description="The number of partitions to repartition to. If omitted, the default number of partitions is used as defined by the spark config 'spark.sql.shuffle.partitions'.",
)
```
execute

```python
execute() -> Output
```
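Since the class wraps `DataFrame.repartition`, the two optional fields ultimately map onto that method's positional arguments (`numPartitions` first, then column names). A rough sketch of that mapping, under the assumption that this is how the wrapper dispatches (`repartition_args` is an illustrative helper, not koheesio code; default-partition handling is omitted for brevity):

```python
from typing import List, Optional, Tuple, Union

def repartition_args(
    columns: Optional[Union[str, List[str]]] = None,
    num_partitions: Optional[int] = None,
) -> Tuple:
    """Illustrative sketch: build the positional arguments that would be
    passed to DataFrame.repartition from the two optional fields.
    A single column name (the 'column' alias) is normalized to a list."""
    cols = [columns] if isinstance(columns, str) else list(columns or [])
    if num_partitions is not None:
        # numPartitions leads, followed by any partitioning columns.
        return (num_partitions, *cols)
    # Columns only: Spark picks the partition count itself.
    return tuple(cols)

print(repartition_args("country", 10))  # (10, 'country')
```

A usage note on the design: keeping `num_partitions` first matches `DataFrame.repartition`'s own signature, so the sketch stays a thin pass-through rather than re-deriving Spark's behavior.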