Autoloader
Read from a location using Databricks' Autoloader.
Autoloader can ingest JSON, CSV, PARQUET, AVRO, ORC, TEXT, and BINARYFILE file formats.
koheesio.spark.readers.databricks.autoloader.AutoLoader #
Notes
Autoloader is a Spark Structured Streaming function! Although most transformations are compatible with Spark Structured Streaming, not all of them are. As a result, be mindful of your downstream transformations.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
format | Union[str, AutoLoaderFormat] | The file format, used in `cloudFiles.format` | required |
location | str | The location where the files are located, used in `cloudFiles.location` | required |
schema_location | str | The location for storing the inferred schema and supporting schema evolution, used in `cloudFiles.schemaLocation` | required |
options | Optional[Dict[str, str]] | Extra inputs to provide to the autoloader. For a full list of inputs, see https://docs.databricks.com/ingestion/auto-loader/options.html | {} |
Example
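As a rough illustration of what these parameters amount to, the sketch below assembles the `cloudFiles.*` option dictionary that Auto Loader consumes. The helper name is hypothetical and this is not the koheesio implementation; the option keys follow the field descriptions above.

```python
from typing import Dict, Optional


def build_cloudfiles_options(
    fmt: str, schema_location: str, extra: Optional[Dict[str, str]] = None
) -> Dict[str, str]:
    """Hypothetical helper: map AutoLoader-style parameters onto Spark
    `cloudFiles` reader options (keys per the field descriptions above)."""
    options = {
        "cloudFiles.format": fmt,
        "cloudFiles.schemaLocation": schema_location,
    }
    options.update(extra or {})
    return options


# On Databricks, these options would then feed a streaming read, roughly:
#   spark.readStream.format("cloudFiles").options(**opts).load(location)
opts = build_cloudfiles_options(
    "json", "dbfs:/tmp/schema", {"cloudFiles.includeExistingFiles": "true"}
)
```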
See Also
Some other useful documentation:
- autoloader: https://docs.databricks.com/ingestion/auto-loader/index.html
- Spark Structured Streaming: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
format class-attribute instance-attribute #
format: Union[str, AutoLoaderFormat] = Field(default=..., description=__doc__)
location class-attribute instance-attribute #
location: str = Field(default=..., description='The location where the files are located, used in `cloudFiles.location`')
options class-attribute instance-attribute #
options: Optional[Dict[str, str]] = Field(default_factory=dict, description='Extra inputs to provide to the autoloader. For a full list of inputs, see https://docs.databricks.com/ingestion/auto-loader/options.html')
schema_location class-attribute instance-attribute #
schema_location: str = Field(default=..., alias='schemaLocation', description='The location for storing inferred schema and supporting schema evolution, used in `cloudFiles.schemaLocation`.')
execute #
get_options #
Get the options for the autoloader.
reader #
validate_format #
Validate the format value.
Source code in src/koheesio/spark/readers/databricks/autoloader.py
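The format check is presumably along these lines; a hypothetical sketch (not the actual koheesio source) that accepts any of the supported formats case-insensitively and rejects everything else:

```python
# Supported formats, per the documentation above.
SUPPORTED_FORMATS = {"json", "csv", "parquet", "avro", "orc", "text", "binaryfile"}


def validate_format(value: str) -> str:
    """Hypothetical sketch: normalize the format string and verify membership."""
    normalized = str(value).lower()
    if normalized not in SUPPORTED_FORMATS:
        raise ValueError(
            f"format must be one of {sorted(SUPPORTED_FORMATS)}, got {value!r}"
        )
    return normalized
```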
koheesio.spark.readers.databricks.autoloader.AutoLoaderFormat #
The file format, used in `cloudFiles.format`.
Autoloader supports JSON, CSV, PARQUET, AVRO, ORC, TEXT, and BINARYFILE file formats.
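The enum is presumably shaped roughly as follows; member names and values are assumptions for illustration, not taken from the source:

```python
from enum import Enum


class AutoLoaderFormat(str, Enum):
    """Hypothetical sketch of the supported-format enum; values map to the
    strings accepted by `cloudFiles.format`."""

    JSON = "json"
    CSV = "csv"
    PARQUET = "parquet"
    AVRO = "avro"
    ORC = "orc"
    TEXT = "text"
    BINARYFILE = "binaryfile"
```

A `str`-backed enum compares equal to its underlying string, which makes it straightforward to accept `Union[str, AutoLoaderFormat]` in the reader.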