

This module provides a DateTimeColumn class that extends the Column class from PySpark. It allows for adding or subtracting an interval value from a datetime column.

This can be used to reflect a change in a given date / time column in a more human-readable way.

Please refer to the Spark SQL documentation for a list of valid interval values:


The aim is to easily add or subtract an 'interval' value to a datetime column. An interval value is a string that represents a time interval. For example, '1 day', '1 month', '5 years', '1 minute 30 seconds', '10 milliseconds', etc. These can be used to reflect a change in a given date / time column in a more human-readable way.

Typically, this is done with the date_add() and date_sub() functions in Spark SQL. However, these functions only add or subtract a whole number of days. An interval gives us much more flexibility, but the Python API does not directly provide a function to add an interval value to, or subtract it from, a datetime column. We therefore use the expr() function, which lets us write the operation in SQL.
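As a rough illustration of the expr() route described above, the snippet below only builds the SQL text that would be handed to pyspark.sql.functions.expr, so it runs without a Spark session. The helper name interval_expression is hypothetical and is not part of this module.

```python
def interval_expression(column_name: str, sign: str, interval: str) -> str:
    """Build a Spark SQL expression that shifts a column by an interval literal.

    Hypothetical helper for illustration only; not part of koheesio.
    """
    if sign not in ("+", "-"):
        raise ValueError(f"sign must be '+' or '-', got {sign!r}")
    return f"{column_name} {sign} interval '{interval}'"


added = interval_expression("my_column", "+", "1 day")
subtracted = interval_expression("my_column", "-", "1 minute 30 seconds")

# With a Spark session available, each string could be passed to
# pyspark.sql.functions.expr(), e.g. df.withColumn("later", expr(added)).
print(added)
print(subtracted)
```

Unlike date_add() and date_sub(), the interval literal is free to mix units, which is the flexibility this module is after.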

This module provides a DateTimeColumn class that extends the Column class from PySpark. It allows for adding or subtracting an interval value from a datetime column using the + and - operators.
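The operator support can be pictured with a small, Spark-free sketch. The class below is a hypothetical stand-in that tracks an expression string. It is not the DateTimeColumn implementation, which wraps a PySpark Column and whose operator methods are not shown on this page. The try_add / try_subtract wording follows the adjust_time() source further down.

```python
class IntervalAdjustable:
    """Hypothetical stand-in for a column wrapper; not koheesio's DateTimeColumn."""

    def __init__(self, expression: str) -> None:
        self.expression = expression

    def __add__(self, interval: str) -> "IntervalAdjustable":
        # Python routes `obj + "1 day"` to this method
        return IntervalAdjustable(f"try_add({self.expression}, interval '{interval}')")

    def __sub__(self, interval: str) -> "IntervalAdjustable":
        # Python routes `obj - "1 month"` to this method
        return IntervalAdjustable(f"try_subtract({self.expression}, interval '{interval}')")


later = IntervalAdjustable("my_column") + "1 day"
earlier = IntervalAdjustable("my_column") - "1 month"
print(later.expression)
print(earlier.expression)
```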

Additionally, this module provides two transformation classes that can be used as a transformation step in a pipeline:

  • DateTimeAddInterval: adds an interval value to a datetime column
  • DateTimeSubtractInterval: subtracts an interval value from a datetime column

These classes are subclasses of ColumnsTransformationWithTarget and hence can be used to perform transformations on multiple columns at once.

Both of the above transformations use the provided adjust_time() function to perform the actual transformation.

See also:

Related Koheesio classes:

From the koheesio.spark.transformations module:



Classes:

Name Description
DateTimeColumn

A datetime column that can be adjusted by adding or subtracting an interval value using the + and - operators.

DateTimeAddInterval

A transformation that adds an interval value to a datetime column. This class is a subclass of ColumnsTransformationWithTarget and hence can be used as a transformation step in a pipeline. See ColumnsTransformationWithTarget for more information.

DateTimeSubtractInterval

A transformation that subtracts an interval value from a datetime column. This class is a subclass of ColumnsTransformationWithTarget and hence can be used as a transformation step in a pipeline. See ColumnsTransformationWithTarget for more information.


The DateTimeAddInterval and DateTimeSubtractInterval classes are very similar. The only difference is that one adds an interval value to a datetime column, while the other subtracts it.


Functions:

Name Description
dt_column

Converts a column to a DateTimeColumn. This function aims to be a drop-in replacement for pyspark.sql.functions.col that returns a DateTimeColumn instead of a Column.

adjust_time

Adjusts a datetime column by adding or subtracting an interval value.

validate_interval

Validates a given interval string.

Various ways to create and interact with DateTimeColumn:
  • Create a DateTimeColumn from a string: dt_column("my_column")
  • Create a DateTimeColumn from a Column: dt_column(df.my_column)
  • Use the + and - operators to add or subtract an interval value from a DateTimeColumn:
    • dt_column("my_column") + "1 day"
    • dt_column("my_column") - "1 month"

Functional examples using adjust_time():
  • Add 1 day to a column: adjust_time("my_column", operation="add", interval="1 day")
  • Subtract 1 month from a column: adjust_time("my_column", operation="subtract", interval="1 month")

As a transformation step:

from koheesio.spark.transformations.date_time.interval import (
    DateTimeAddInterval,
)

input_df = spark.createDataFrame(
    [(1, "2022-01-01 00:00:00")], ["id", "my_column"]
)

# add 1 day to my_column and store the result in a new column called 'one_day_later'
output_df = DateTimeAddInterval(
    column="my_column", target_column="one_day_later", interval="1 day"
).transform(input_df)

id my_column one_day_later
1 2022-01-01 00:00:00 2022-01-02 00:00:00

DateTimeSubtractInterval works in a similar way, but subtracts an interval value from a datetime column.
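The expected values in the table above can be sanity-checked in plain Python. This is only an arithmetic cross-check with the standard library, not a Spark run. Note that timedelta has no month or year unit, so it only mirrors fixed-length intervals such as days, hours, or seconds.

```python
from datetime import datetime, timedelta

start = datetime.strptime("2022-01-01 00:00:00", "%Y-%m-%d %H:%M:%S")

# the same shift the '1 day' interval describes, in both directions
one_day_later = start + timedelta(days=1)
one_day_earlier = start - timedelta(days=1)

print(one_day_later)    # 2022-01-02 00:00:00
print(one_day_earlier)  # 2021-12-31 00:00:00
```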

koheesio.spark.transformations.date_time.interval.Operations module-attribute #

Operations = Literal['add', 'subtract']
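Outside a validating model, a Literal alias like this is only enforced by static type checkers, so a plain function that receives an operation still needs a runtime guard. A minimal sketch of one, using typing.get_args, follows. The check_operation helper is hypothetical and not part of this module.

```python
from typing import Literal, get_args

Operations = Literal["add", "subtract"]


def check_operation(operation: str) -> str:
    """Hypothetical runtime guard for values that should match Operations."""
    allowed = get_args(Operations)  # the tuple of permitted literal values
    if operation not in allowed:
        raise ValueError(
            f"Operation '{operation}' is not valid. Must be one of {allowed}."
        )
    return operation


print(check_operation("add"))
```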

koheesio.spark.transformations.date_time.interval.DateTimeAddInterval #

A transformation that adds or subtracts a specified interval from a datetime column.

See also:



Name Type Description Default
interval str

The interval to add to the datetime column.

operation Operations

The operation to perform. Must be either 'add' or 'subtract'.

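add 1 day to a column
DateTimeAddInterval(
    column="my_column",
    interval="1 day",
).transform(df)
subtract 1 month from my_column and store the result in a new column called one_month_earlier
DateTimeSubtractInterval(
    column="my_column",
    target_column="one_month_earlier",
    interval="1 month",
).transform(df)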

interval class-attribute instance-attribute #

interval: str = Field(
    description="The interval to add to the datetime column.",
    examples=["1 day", "5 years", "3 months"],
)

operation class-attribute instance-attribute #

operation: Operations = Field(
    description="The operation to perform. Must be either 'add' or 'subtract'.",
)

validate_interval class-attribute instance-attribute #

validate_interval = field_validator("interval")(validate_interval)

func #

func(column: Column)
Source code in src/koheesio/spark/transformations/date_time/
def func(self, column: Column):
    return adjust_time(column, operation=self.operation, interval=self.interval)

koheesio.spark.transformations.date_time.interval.DateTimeColumn #

A datetime column that can be adjusted by adding or subtracting an interval value using the + and - operators.

from_column classmethod #

from_column(column: Column)

Create a DateTimeColumn from an existing Column

Source code in src/koheesio/spark/transformations/date_time/
@classmethod
def from_column(cls, column: Column):
    """Create a DateTimeColumn from an existing Column"""
    return cls(column._jc)

koheesio.spark.transformations.date_time.interval.DateTimeSubtractInterval #

Subtracts a specified interval from a datetime column.

Works in the same way as DateTimeAddInterval, but subtracts the specified interval from the datetime column. See DateTimeAddInterval for more information.

operation class-attribute instance-attribute #

operation: Operations = Field(
    description="The operation to perform. Must be either 'add' or 'subtract'.",
)

koheesio.spark.transformations.date_time.interval.adjust_time #

adjust_time(
    column: Column, operation: Operations, interval: str
) -> Column

Adjusts a datetime column by adding or subtracting an interval value.

This can be used to reflect a change in a given date / time column in a more human-readable way.

See also

Please refer to the Spark SQL documentation for a list of valid interval values:

add 1 day to a column
adjust_time("my_column", operation="add", interval="1 day")
subtract 1 month from a column
adjust_time("my_column", operation="subtract", interval="1 month")
or, a much more complicated example

In this example, we add 5 days, 3 hours, 7 minutes, 30 seconds, and 1 millisecond to a column called my_column.

adjust_time(
    "my_column",
    operation="add",
    interval="5 days 3 hours 7 minutes 30 seconds 1 millisecond",
)


Name Type Description Default
column Column

The datetime column to adjust.

operation Operations

The operation to perform. Must be either 'add' or 'subtract'.

interval str

The value to add or subtract. Must be a valid interval string.



Type Description
Column

The adjusted datetime column.

Source code in src/koheesio/spark/transformations/date_time/
def adjust_time(column: Column, operation: Operations, interval: str) -> Column:
    """
    Adjusts a datetime column by adding or subtracting an interval value.

    This can be used to reflect a change in a given date / time column in a more human-readable way.

    See also
    --------
    Please refer to the Spark SQL documentation for a list of valid interval values:

    ### pyspark.sql.functions:

    Example
    -------
    ### add 1 day to a column
    adjust_time("my_column", operation="add", interval="1 day")

    ### subtract 1 month from a column
    adjust_time("my_column", operation="subtract", interval="1 month")

    ### or, a much more complicated example

    In this example, we add 5 days, 3 hours, 7 minutes, 30 seconds, and 1 millisecond to a column called `my_column`.
    adjust_time(
        "my_column",
        operation="add",
        interval="5 days 3 hours 7 minutes 30 seconds 1 millisecond",
    )

    Parameters
    ----------
    column : Column
        The datetime column to adjust.
    operation : Operations
        The operation to perform. Must be either 'add' or 'subtract'.
    interval : str
        The value to add or subtract. Must be a valid interval string.

    Returns
    -------
    Column
        The adjusted datetime column.
    """

    # check that value is a valid interval
    interval = validate_interval(interval)

    column_name = column._jc.toString()

    # determine the operation to perform
    try:
        operation = {
            "add": "try_add",
            "subtract": "try_subtract",
        }[operation]
    except KeyError as e:
        raise ValueError(f"Operation '{operation}' is not valid. Must be either 'add' or 'subtract'.") from e

    # perform the operation
    _expression = f"{operation}({column_name}, interval '{interval}')"
    column = expr(_expression)

    return column

koheesio.spark.transformations.date_time.interval.dt_column #

dt_column(column: Union[str, Column]) -> DateTimeColumn

Convert a column to a DateTimeColumn

Aims to be a drop-in replacement for pyspark.sql.functions.col that returns a DateTimeColumn instead of a Column.

create a DateTimeColumn from a string
dt_column("my_column")
create a DateTimeColumn from a Column
dt_column(df.my_column)


Name Type Description Default
column Union[str, Column]

The column (or name of the column) to convert to a DateTimeColumn

Source code in src/koheesio/spark/transformations/date_time/
def dt_column(column: Union[str, Column]) -> DateTimeColumn:
    """Convert a column to a DateTimeColumn

    Aims to be a drop-in replacement for `pyspark.sql.functions.col` that returns a DateTimeColumn instead of a Column.

    Example
    -------
    ### create a DateTimeColumn from a string
    dt_column("my_column")

    ### create a DateTimeColumn from a Column
    dt_column(df.my_column)

    Parameters
    ----------
    column : Union[str, Column]
        The column (or name of the column) to convert to a DateTimeColumn
    """
    if isinstance(column, str):
        column = col(column)
    elif not isinstance(column, Column):
        raise TypeError(f"Expected column to be of type str or Column, got {type(column)} instead.")
    return DateTimeColumn.from_column(column)

koheesio.spark.transformations.date_time.interval.validate_interval #

validate_interval(interval: str)

Validate an interval string


Name Type Description Default
interval str

The interval string to validate



Type Description
ValueError

If the interval string is invalid

Source code in src/koheesio/spark/transformations/date_time/
def validate_interval(interval: str):
    """Validate an interval string

    Parameters
    ----------
    interval : str
        The interval string to validate

    Raises
    ------
    ValueError
        If the interval string is invalid
    """
    try:
        expr(f"interval '{interval}'")
    except ParseException as e:
        raise ValueError(f"Value '{interval}' is not a valid interval.") from e
    return interval
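validate_interval() delegates the check to Spark's own parser, which needs a running Spark session. Where a cheap pre-check is wanted before Spark is involved, something like the regex sketch below could be used. This is a swapped-in technique, not how the function above works: the helper name and the unit list are assumptions, and the pattern may reject strings that Spark accepts, or accept ones it rejects.

```python
import re

# Unit names are an assumption based on common Spark interval literals;
# Spark's own parser remains the authority on what is valid.
_UNITS = "year|month|week|day|hour|minute|second|millisecond|microsecond"
_INTERVAL_RE = re.compile(
    rf"^\s*(?:[+-]?\d+\s+(?:{_UNITS})s?\s*)+$", re.IGNORECASE
)


def looks_like_interval(interval: str) -> bool:
    """Cheap, Spark-free pre-check; hypothetical helper, not part of koheesio."""
    return bool(_INTERVAL_RE.match(interval))


print(looks_like_interval("1 minute 30 seconds"))
print(looks_like_interval("one day"))
```

A pre-check like this can fail fast on obvious typos, but the Spark-backed validation should still have the final say.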