Stream

This module defines the DeltaTableStreamWriter class, which is used to write streaming dataframes to Delta tables.

koheesio.spark.writers.delta.stream.DeltaTableStreamWriter #

Delta table stream writer

Options #

Options for DeltaTableStreamWriter

allow_population_by_field_name class-attribute instance-attribute #

allow_population_by_field_name: bool = Field(
    default=True,
    description=" To do convert to Field and pass as .options(**config)",
)

maxBytesPerTrigger class-attribute instance-attribute #

maxBytesPerTrigger: Optional[str] = Field(
    default=None,
    description="How much data to be processed per trigger. The default is 1GB",
)

maxFilesPerTrigger class-attribute instance-attribute #

maxFilesPerTrigger: int = Field(
    default=1000,
    description="The maximum number of new files to be considered in every trigger (default: 1000).",
)
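
The field descriptions above suggest these options end up as keyword arguments to `.options(**config)`. A minimal sketch of that mapping follows, assuming unset (`None`) values are simply omitted. `StreamOptionsSketch` and `to_spark_options` are hypothetical names used only for illustration and are not part of koheesio, and the `"512m"` value is an illustrative string rather than a confirmed format.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class StreamOptionsSketch:
    """Hypothetical stand-in mirroring the documented fields and their defaults."""

    maxBytesPerTrigger: Optional[str] = None
    maxFilesPerTrigger: int = 1000


def to_spark_options(opts: StreamOptionsSketch) -> dict:
    """Build a dict suitable for `.options(**config)`, skipping unset (None) values."""
    return {key: value for key, value in asdict(opts).items() if value is not None}


# With defaults only, maxBytesPerTrigger is left out of the resulting config.
default_config = to_spark_options(StreamOptionsSketch())

# Both options set explicitly (values are illustrative assumptions).
tuned_config = to_spark_options(
    StreamOptionsSketch(maxBytesPerTrigger="512m", maxFilesPerTrigger=200)
)
```

Omitting `None` values is one possible design choice: it lets the underlying engine apply its own default instead of receiving an explicit null.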

execute #

execute()
Source code in src/koheesio/spark/writers/delta/stream.py
def execute(self):
    if self.batch_function:
        self.streaming_query = self.writer.start()
    else:
        self.streaming_query = self.writer.toTable(tableName=self.table.table_name)
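
The branching in `execute` can be illustrated without a Spark session. The sketch below is a simplified stand-in, not the koheesio implementation: `FakeStreamWriter` and `run_execute` are hypothetical names, and the fake only records which of the two writer methods (`start` or `toTable`) was called.

```python
class FakeStreamWriter:
    """Hypothetical stand-in for a streaming writer; records the calls it receives."""

    def __init__(self):
        self.calls = []

    def start(self):
        self.calls.append(("start", None))
        return "query-from-start"

    def toTable(self, tableName):
        self.calls.append(("toTable", tableName))
        return "query-from-toTable"


def run_execute(writer, table_name, batch_function=None):
    """Mirror the documented branching: start() if a batch function is set, else toTable()."""
    if batch_function:
        return writer.start()
    return writer.toTable(tableName=table_name)


# Case 1: a batch function is set, so the query is started directly.
writer_with_batch = FakeStreamWriter()
query_with_batch = run_execute(
    writer_with_batch, "my_table", batch_function=lambda df, batch_id: None
)

# Case 2: no batch function, so the stream is written to the named table.
writer_without_batch = FakeStreamWriter()
query_without_batch = run_execute(writer_without_batch, "my_table")
```

In the first case the table name is not passed to the writer at all, presumably because the batch function is responsible for deciding where each micro-batch goes.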