Spark

Spark step module

koheesio.spark.AnalysisException module-attribute

AnalysisException = AnalysisException

koheesio.spark.SparkStep

Base class for a Spark step

Extends the Step class with SparkSession support. The following applies:
- Spark steps are expected to return a Spark DataFrame as output.
- The spark property is available to access the active SparkSession instance.
- The SparkSession instance can be provided as an argument to the constructor through the spark parameter.

spark class-attribute instance-attribute

spark: Optional[SparkSession] = Field(
    default=None,
    description="The SparkSession instance. If not provided, the active SparkSession will be used.",
    validate_default=False,
)
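The fallback described by this field (use the session passed to the constructor, otherwise the active one) can be illustrated with a minimal, library-free sketch. Everything here is a hypothetical stand-in: `FakeSession`, `get_active_session`, and `SketchStep` are not part of koheesio or PySpark, and the sketch does not claim to mirror how SparkStep resolves the session internally.

```python
from dataclasses import dataclass
from typing import Optional


class FakeSession:
    """Hypothetical stand-in for a SparkSession (not the real class)."""

    def __init__(self, name: str) -> None:
        self.name = name


# Hypothetical "active" session that a real runtime would normally provide.
_ACTIVE_SESSION = FakeSession("active")


def get_active_session() -> FakeSession:
    """Hypothetical lookup of the currently active session."""
    return _ACTIVE_SESSION


@dataclass
class SketchStep:
    """Illustrative step with an optional session field defaulting to None."""

    spark: Optional[FakeSession] = None

    def __post_init__(self) -> None:
        # Fall back to the active session only when none was passed in.
        if self.spark is None:
            self.spark = get_active_session()


default_step = SketchStep()
custom_step = SketchStep(spark=FakeSession("custom"))
```

In this sketch, `default_step.spark` resolves to the active stand-in session, while `custom_step.spark` keeps the session that was passed explicitly.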

Output

Output class for SparkStep

df class-attribute instance-attribute

df: Optional[DataFrame] = Field(
    default=None, description="The Spark DataFrame"
)

koheesio.spark.current_timestamp_utc

current_timestamp_utc(spark)
Source code in src/koheesio/spark/__init__.py
import warnings


def current_timestamp_utc(spark):
    # Deprecated entry point: warn the caller, then forward to the new location.
    warnings.warn(
        message=(
            "The current_timestamp_utc function has been moved to the koheesio.spark.functions module. "
            "Import it from there instead. Current import path will be deprecated in the future."
        ),
        category=DeprecationWarning,
        stacklevel=2,
    )
    # Imported inside the function so the new module is only loaded when the old path is used.
    from koheesio.spark.functions import current_timestamp_utc as _current_timestamp_utc

    return _current_timestamp_utc(spark)
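The function above is a forwarding shim: it emits a DeprecationWarning and then delegates to the relocated implementation. The sketch below reproduces that pattern with the standard library only, so it can be run without Spark. The names `_relocated_current_timestamp_utc` and `call_and_capture` are hypothetical helpers introduced for illustration, and the stand-in returns a plain string rather than anything Spark-related.

```python
import warnings


def _relocated_current_timestamp_utc(value):
    """Hypothetical stand-in for the function at its new location."""
    return f"utc:{value}"


def current_timestamp_utc(value):
    """Old entry point: warn about the move, then forward the call."""
    warnings.warn(
        "current_timestamp_utc has moved; import it from the new module instead.",
        category=DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not to this shim
    )
    return _relocated_current_timestamp_utc(value)


def call_and_capture(value):
    """Call the old entry point and collect the warning categories it raised."""
    with warnings.catch_warnings(record=True) as caught:
        # DeprecationWarning is often filtered out by default; show everything here.
        warnings.simplefilter("always")
        result = current_timestamp_utc(value)
    return result, [w.category for w in caught]


result, categories = call_and_capture("x")
```

Callers who want to avoid the warning can import current_timestamp_utc from koheesio.spark.functions directly, as the warning message advises.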