Skip to content

Spark

Spark step module

koheesio.spark.AnalysisException module-attribute #

AnalysisException = AnalysisException

koheesio.spark.DataFrame module-attribute #

DataFrame = DataFrame

koheesio.spark.SparkSession module-attribute #

SparkSession = SparkSession

koheesio.spark.SparkStep #

Base class for a Spark step

Extends the Step class with SparkSession support. The following: - Spark steps are expected to return a Spark DataFrame as output. - spark property is available to access the active SparkSession instance.

spark property #

Get active SparkSession instance

Output #

Output class for SparkStep

df class-attribute instance-attribute #

df: Optional[DataFrame] = Field(
    default=None, description="The Spark DataFrame"
)

koheesio.spark.current_timestamp_utc #

current_timestamp_utc(spark: SparkSession) -> Column

Get the current timestamp in UTC

Source code in src/koheesio/spark/__init__.py
def current_timestamp_utc(spark: SparkSession) -> Column:
    """Get the current timestamp in UTC"""
    return F.to_utc_timestamp(F.current_timestamp(), spark.conf.get("spark.sql.session.timeZone"))