Writers
The Writer class is used to write the DataFrame to a target.
koheesio.spark.writers.BatchOutputMode
For Batch:
- append: append the contents of the DataFrame to the output table (the default in Koheesio).
- overwrite: overwrite the existing data.
- ignore: ignore the operation (i.e. no-op).
- error or errorifexists: throw an exception at runtime.
- merge: update matching data in the table and insert rows that do not exist.
- merge_all: update matching data in the table and insert rows that do not exist.
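To make the batch modes above concrete, here is a minimal, illustrative sketch of such an output-mode enum in plain Python. This is not the actual Koheesio class definition, only a stand-in that mirrors the mode names listed above:

```python
from enum import Enum

class BatchOutputMode(str, Enum):
    """Illustrative sketch of a batch output-mode enum (not Koheesio's actual class)."""
    APPEND = "append"                # add new rows to the target table (Koheesio's default)
    OVERWRITE = "overwrite"          # replace the existing data
    IGNORE = "ignore"                # no-op if the target already exists
    ERROR = "error"                  # raise at runtime
    ERRORIFEXISTS = "errorifexists"  # alias for error
    MERGE = "merge"                  # update matching rows, insert rows that do not exist
    MERGE_ALL = "merge_all"          # update matching rows, insert rows that do not exist

# Because members are str-valued, they compare equal to the plain mode strings:
assert BatchOutputMode.APPEND == "append"
```

Using `str` as a mixin base means a mode can be passed anywhere a plain string mode is expected.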
koheesio.spark.writers.StreamingOutputMode
For Streaming:
- append: only new rows in the streaming DataFrame are written to the sink.
- complete: all rows in the streaming DataFrame/Dataset are written to the sink whenever there are updates.
- update: only rows that were updated in the streaming DataFrame/Dataset are written to the sink whenever there are updates. If the query contains no aggregations, this is equivalent to append mode.
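The streaming modes can be sketched the same way. Again, this is an illustrative stand-in mirroring the mode names above, not Koheesio's actual definition:

```python
from enum import Enum

class StreamingOutputMode(str, Enum):
    """Illustrative sketch of a streaming output-mode enum (not Koheesio's actual class)."""
    APPEND = "append"      # write only the new rows to the sink
    COMPLETE = "complete"  # rewrite all rows to the sink on every update
    UPDATE = "update"      # write only updated rows; behaves like append without aggregations

assert [m.value for m in StreamingOutputMode] == ["append", "complete", "update"]
```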
koheesio.spark.writers.Writer
The Writer class is used to write the DataFrame to a target.
df (class-attribute, instance-attribute)

```python
df: Optional[DataFrame] = Field(
    default=None,
    description="The Spark DataFrame",
    exclude=True,
)
```
format (class-attribute, instance-attribute)

```python
format: str = Field(
    default="delta", description="The format of the output"
)
```
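Koheesio declares these attributes as Pydantic fields. As a rough, dependency-free analogue, the same shape can be expressed with a stdlib dataclass; `DataFrame` below is a hypothetical stand-in so the sketch runs without Spark:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for pyspark.sql.DataFrame, so this sketch needs no Spark.
DataFrame = object

@dataclass
class WriterConfig:
    """Rough dataclass analogue of the Writer's fields (illustrative only)."""
    df: Optional[DataFrame] = None  # the Spark DataFrame; excluded from dumps in Koheesio
    format: str = "delta"           # output format, defaulting to Delta

cfg = WriterConfig()
assert cfg.df is None and cfg.format == "delta"
```

In the real class, `exclude=True` keeps the DataFrame out of serialized model output, which a plain dataclass does not replicate.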
execute (abstractmethod)

```python
execute() -> Output
```

The execute method of a Writer should, at a minimum, handle writing self.df (the input DataFrame) to the target.
write

Write the DataFrame to the output using execute() and return the output. If no DataFrame is passed, self.df is used. If self.df is not set either, a RuntimeError is raised.
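The contract described above can be sketched in a few lines. This is a minimal illustration of the execute/write split and the RuntimeError fallback, under the assumption that write() falls back to self.df; it is not Koheesio's implementation, and `ListWriter` is a toy subclass invented for the example:

```python
from abc import ABC, abstractmethod
from typing import Optional

class Writer(ABC):
    """Minimal sketch of the Writer contract (illustrative, not Koheesio's code)."""

    def __init__(self, df: Optional[object] = None, format: str = "delta"):
        self.df = df
        self.format = format

    @abstractmethod
    def execute(self) -> None:
        """Subclasses must write self.df to the target here."""

    def write(self, df: Optional[object] = None) -> None:
        # Fall back to self.df when no DataFrame is passed explicitly.
        if df is not None:
            self.df = df
        if self.df is None:
            raise RuntimeError("No DataFrame is set to write")
        self.execute()

class ListWriter(Writer):
    """Toy writer that 'writes' rows to an in-memory list instead of a real sink."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.sink = []

    def execute(self) -> None:
        self.sink.extend(self.df)

w = ListWriter()
w.write([{"id": 1}])
assert w.sink == [{"id": 1}]
```

Calling `ListWriter().write()` with no DataFrame and no self.df raises RuntimeError, matching the behavior described above.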