Autoloader
Read from a location using Databricks' autoloader
Autoloader can ingest JSON, CSV, PARQUET, AVRO, ORC, TEXT, and BINARYFILE file formats.
koheesio.spark.readers.databricks.autoloader.AutoLoader
Notes
autoloader is a Spark Structured Streaming function!
Although most transformations are compatible with Spark Structured Streaming, not all of them are. As a result, be mindful with your downstream transformations.
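As an illustration of the note above, the sketch below guards a batch-only action against streaming input. It relies on the `isStreaming` attribute that PySpark DataFrames expose; the helper name `safe_count` and the stand-in objects are hypothetical and exist only so the snippet runs without a Spark session.

```python
from types import SimpleNamespace


def safe_count(df):
    """Return df.count() for a batch DataFrame; refuse a streaming one."""
    # PySpark DataFrames expose `isStreaming`; it is True for streaming sources.
    if getattr(df, "isStreaming", False):
        raise ValueError(
            "count() is a batch action; write a streaming DataFrame out with "
            "writeStream instead"
        )
    return df.count()


# Hypothetical stand-ins that mimic only the two members used above.
batch_df = SimpleNamespace(isStreaming=False, count=lambda: 3)
stream_df = SimpleNamespace(isStreaming=True, count=lambda: 0)

print(safe_count(batch_df))
```

A check like this fails early with a clear message, rather than leaving Spark to reject the query at analysis time.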
Parameters:
Name | Type | Description | Default |
---|---|---|---|
format | Union[str, AutoLoaderFormat] | The file format, used in `cloudFiles.format` | required |
location | str | The location where the files are located, used in `cloudFiles.location` | required |
schema_location | str | The location for storing inferred schema and supporting schema evolution, used in `cloudFiles.schemaLocation` | required |
options | Optional[Dict[str, str]] | Extra inputs to provide to the autoloader. For a full list of inputs, see https://docs.databricks.com/ingestion/auto-loader/options.html | {} |
Example
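A minimal usage sketch, assuming koheesio is installed and the code runs on a Databricks runtime. The paths and the `multiLine` option are placeholders, the helper names are hypothetical, and the call to `read()` assumes this reader follows the usual Koheesio reader convention of running `execute()` and returning the resulting DataFrame.

```python
from typing import Any, Dict


def autoloader_kwargs(location: str, schema_location: str) -> Dict[str, Any]:
    """Collect the documented AutoLoader parameters (placeholder values)."""
    return {
        "format": "json",  # a plain string; an AutoLoaderFormat member is also accepted
        "location": location,
        "schema_location": schema_location,
        "options": {"multiLine": "true"},  # illustrative extra reader option
    }


def read_with_autoloader(kwargs: Dict[str, Any]):
    """Build the reader and return its streaming DataFrame."""
    # Imported lazily: koheesio and a Databricks runtime are only needed here.
    from koheesio.spark.readers.databricks.autoloader import AutoLoader

    # Assumption: read() runs execute() and returns the resulting DataFrame.
    return AutoLoader(**kwargs).read()


kwargs = autoloader_kwargs(
    location="/mnt/landing/events",  # hypothetical input path
    schema_location="/mnt/landing/_schemas/events",  # hypothetical schema path
)
# On Databricks: df = read_with_autoloader(kwargs)
```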
See Also
Some other useful documentation:
- autoloader: https://docs.databricks.com/ingestion/auto-loader/index.html
- Spark Structured Streaming: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
format
class-attribute
instance-attribute
format: Union[str, AutoLoaderFormat] = Field(
    default=..., description=__doc__
)
location
class-attribute
instance-attribute
location: str = Field(
    default=...,
    description="The location where the files are located, used in `cloudFiles.location`",
)
options
class-attribute
instance-attribute
options: Optional[Dict[str, Any]] = Field(
    default_factory=dict,
    description="Extra inputs to provide to the autoloader. For a full list of inputs, see https://docs.databricks.com/ingestion/auto-loader/options.html",
)
schema_
class-attribute
instance-attribute
schema_: Optional[
    Union[
        str,
        StructType,
        List[str],
        Tuple[str, ...],
        AtomicType,
    ]
] = Field(
    default=None,
    description="Explicit schema to apply to the input files.",
    alias="schema",
)
schema_location
class-attribute
instance-attribute
schema_location: str = Field(
    default=...,
    alias="schemaLocation",
    description="The location for storing inferred schema and supporting schema evolution, used in `cloudFiles.schemaLocation`.",
)
execute
get_options
Get the options for the autoloader
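The page does not show the body of `get_options`. As a rough sketch of what assembling the options could look like, the hypothetical helper below merges the user-supplied extras with the two `cloudFiles.*` keys named in the field descriptions. It is an assumption for illustration, not the library's implementation.

```python
from typing import Any, Dict, Optional


def build_cloudfiles_options(
    fmt: str,
    schema_location: str,
    extra: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge extra reader options with the cloudFiles keys (hypothetical helper)."""
    # Copy first so the caller's dictionary is never mutated.
    options: Dict[str, Any] = dict(extra or {})
    # The dedicated fields are written last, so they win over same-named extras.
    options["cloudFiles.format"] = fmt
    options["cloudFiles.schemaLocation"] = schema_location
    return options


opts = build_cloudfiles_options("json", "/tmp/schemas", {"multiLine": "true"})
```

Writing the dedicated fields last is one possible design choice: it keeps `format` and `schema_location` authoritative even if the same keys appear in `options`.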
reader
Source code in src/koheesio/spark/readers/databricks/autoloader.py
validate_format
Validate format value
Source code in src/koheesio/spark/readers/databricks/autoloader.py
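The validator body is not reproduced on this page. The sketch below shows one way a `Union[str, Enum]` format value could be checked against the supported formats listed above. `FileFormat` and `normalize_format` are hypothetical stand-ins, not the library's `AutoLoaderFormat` or `validate_format`, and the lowercase member values are an assumption.

```python
from enum import Enum
from typing import Union


class FileFormat(str, Enum):
    """Hypothetical stand-in for AutoLoaderFormat (member values are assumed)."""

    JSON = "json"
    CSV = "csv"
    PARQUET = "parquet"
    AVRO = "avro"
    ORC = "orc"
    TEXT = "text"
    BINARYFILE = "binaryfile"


def normalize_format(value: Union[str, FileFormat]) -> str:
    """Return the lowercase format name, or raise ValueError if unsupported."""
    if isinstance(value, FileFormat):
        return value.value
    candidate = str(value).strip().lower()
    try:
        return FileFormat(candidate).value
    except ValueError:
        supported = ", ".join(member.value for member in FileFormat)
        raise ValueError(
            f"Unsupported format {value!r}; expected one of: {supported}"
        ) from None
```

Accepting both the enum member and its string form mirrors the `Union[str, AutoLoaderFormat]` type of the `format` field.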
koheesio.spark.readers.databricks.autoloader.AutoLoaderFormat
The file format, used in cloudFiles.format
Autoloader supports JSON, CSV, PARQUET, AVRO, ORC, TEXT, and BINARYFILE file formats.