File writer
File writers for different formats:
- CSV
- Parquet
- Avro
- JSON
- ORC
- Text
The FileWriter class is a configurable Writer that can write to any of these file formats and accepts whatever options the chosen format needs.
CsvFileWriter, ParquetFileWriter, AvroFileWriter, JsonFileWriter, OrcFileWriter, and TextFileWriter are convenience classes that only set the format field to the corresponding file format.
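The relationship between FileWriter and the convenience classes can be sketched in plain Python. This is a simplified stand-in, not the koheesio implementation: the names FileFormatSketch, FileWriterSketch, and ParquetFileWriterSketch are hypothetical, and dataclasses are assumed here only to keep the example free of Spark and other dependencies.

```python
from dataclasses import dataclass
from enum import Enum


class FileFormatSketch(str, Enum):
    """Hypothetical stand-in for FileFormat; members mirror the formats listed above."""

    csv = "csv"
    parquet = "parquet"
    avro = "avro"
    json = "json"
    orc = "orc"
    text = "text"


@dataclass
class FileWriterSketch:
    """Hypothetical configurable writer: the caller has to choose the format."""

    path: str
    format: FileFormatSketch


@dataclass
class ParquetFileWriterSketch(FileWriterSketch):
    """Hypothetical convenience subclass: same fields, format already filled in."""

    format: FileFormatSketch = FileFormatSketch.parquet


# Both objects describe the same write; the subclass just saves one argument.
generic = FileWriterSketch(path="path/to/file.parquet", format=FileFormatSketch.parquet)
convenient = ParquetFileWriterSketch(path="path/to/file.parquet")
print(generic.format == convenient.format)
```

The point of the design is that a convenience class carries no behaviour of its own: it differs from the generic writer only in the default value of one field.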
koheesio.spark.writers.file_writer.AvroFileWriter
Writes a DataFrame to an Avro file.
This class is a convenience class that sets the format field to FileFormat.avro.
Extra parameters can be passed to the writer as keyword arguments.
Examples:
koheesio.spark.writers.file_writer.CsvFileWriter
koheesio.spark.writers.file_writer.FileFormat
Supported file formats for the FileWriter class
koheesio.spark.writers.file_writer.FileWriter
A configurable Writer that can write to different file formats and accepts any option the chosen format needs.
Extra parameters can be passed to the writer as keyword arguments.
Examples:
writer = FileWriter(
    df=df,
    path="path/to/file.parquet",
    output_mode=BatchOutputMode.APPEND,
    format=FileFormat.parquet,
    compression="snappy",
)
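As a rough illustration of how an extra keyword argument such as compression can travel alongside the named fields, the sketch below separates the two. The helper build_write_call is hypothetical and only assumes that extras end up as writer options; how FileWriter forwards them internally is not shown in this reference.

```python
from typing import Any, Dict


def build_write_call(
    path: str, format: str, output_mode: str = "append", **extra: Any
) -> Dict[str, Any]:
    """Hypothetical helper: keep named fields apart from extra keyword arguments."""
    return {
        "path": path,
        "format": format,
        "mode": output_mode,
        # Anything not named in the signature is collected here unchanged.
        "options": dict(extra),
    }


call = build_write_call("path/to/file.parquet", "parquet", compression="snappy")
print(call["options"])
```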
format (class-attribute, instance-attribute)
format: FileFormat = Field(
    ...,
    description="The file format to use when writing the data.",
)
output_mode (class-attribute, instance-attribute)
output_mode: BatchOutputMode = Field(
    default=BatchOutputMode.APPEND, description="The output mode to use"
)
path (class-attribute, instance-attribute)
ensure_path_is_str
Ensure that the path is a string as required by Spark.
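A minimal sketch of the idea, assuming the check simply converts path-like objects to str. The function name path_to_str is hypothetical, and PurePosixPath is used so the result does not depend on the operating system; the actual validator in koheesio may differ.

```python
from pathlib import PurePosixPath
from typing import Union


def path_to_str(path: Union[str, PurePosixPath]) -> str:
    """Hypothetical helper: return the path as a plain string."""
    # str() leaves an existing string untouched and renders a path object
    # in its textual form.
    return str(path)


print(path_to_str(PurePosixPath("path/to/file.parquet")))
print(path_to_str("path/to/file.parquet"))
```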
execute
execute() -> Output
Source code in src/koheesio/spark/writers/file_writer.py
koheesio.spark.writers.file_writer.JsonFileWriter
Writes a DataFrame to a JSON file.
This class is a convenience class that sets the format field to FileFormat.json.
Extra parameters can be passed to the writer as keyword arguments.
Examples:
koheesio.spark.writers.file_writer.OrcFileWriter
Writes a DataFrame to an ORC file.
This class is a convenience class that sets the format field to FileFormat.orc.
Extra parameters can be passed to the writer as keyword arguments.
Examples:
koheesio.spark.writers.file_writer.ParquetFileWriter
Writes a DataFrame to a Parquet file.
This class is a convenience class that sets the format field to FileFormat.parquet.
Extra parameters can be passed to the writer as keyword arguments.
Examples:
writer = ParquetFileWriter(
    df=df,
    path="path/to/file.parquet",
    output_mode=BatchOutputMode.APPEND,
    compression="snappy",
)
koheesio.spark.writers.file_writer.TextFileWriter
Writes a DataFrame to a text file.
This class is a convenience class that sets the format field to FileFormat.text.
Extra parameters can be passed to the writer as keyword arguments.
Examples: