Hyper
koheesio.integrations.spark.tableau.hyper.HyperFile #
Base class for all HyperFile classes
schema_
class-attribute
instance-attribute
#
schema_: str = Field(
default="Extract",
alias="schema",
description="Internal schema name within the Hyper file",
)
table
class-attribute
instance-attribute
#
table: str = Field(
default="Extract",
description="Table name within the Hyper file",
)
table_name
property
#
Return TableName object for the Hyper file TableDefinition.
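In effect, this presumably combines the schema_ and table fields above into a Tableau Hyper API TableName; a minimal sketch under that assumption:

from tableauhyperapi import TableName

# hypothetical illustration: schema_ + table -> fully qualified TableName
schema_, table = "Extract", "Extract"
table_name = TableName(schema_, table)
print(table_name)  # "Extract"."Extract"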
koheesio.integrations.spark.tableau.hyper.HyperFileDataFrameWriter #
Write a Spark DataFrame to a Hyper file. The DataFrame is first written to a parquet file, which is then written to the Hyper file using the HyperFileParquetWriter.
Examples:
hw = HyperFileDataFrameWriter(
df=spark.createDataFrame([(1, "foo"), (2, "bar")], ["id", "name"]),
name="test",
).execute()
# or in Databricks
hw = HyperFileDataFrameWriter(
df=spark.createDataFrame([(1, "foo"), (2, "bar")], ["id", "name"]),
name="test",
path="dbfs:/tmp/hyper/",
).execute()
# do something with the returned file path
hw.hyper_path
df
class-attribute
instance-attribute
#
table_definition
class-attribute
instance-attribute
#
table_definition: Optional[TableDefinition] = None
clean_dataframe #
- Replace NULLs for string and numeric columns
- Convert data types to ensure compatibility with Tableau Hyper API
Source code in src/koheesio/integrations/spark/tableau/hyper.py
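A rough sketch of the kind of cleaning described above, assuming a plain PySpark DataFrame (the actual implementation lives in the linked source):

from pyspark.sql import DataFrame
from pyspark.sql.types import NumericType, StringType

def clean_dataframe(df: DataFrame) -> DataFrame:
    # hypothetical sketch, not the actual implementation
    string_cols = [f.name for f in df.schema.fields if isinstance(f.dataType, StringType)]
    numeric_cols = [f.name for f in df.schema.fields if isinstance(f.dataType, NumericType)]
    # replace NULLs for string and numeric columns
    df = df.fillna("", subset=string_cols).fillna(0, subset=numeric_cols)
    # remaining type conversions for Hyper API compatibility would go here
    # (assumption: specifics depend on the Hyper API version)
    return df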
table_definition_column
staticmethod
#
Convert a Spark StructField to a Tableau Hyper SqlType
Source code in src/koheesio/integrations/spark/tableau/hyper.py
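A hedged sketch of what such a conversion might look like (the real mapping in the linked source covers more types):

from pyspark.sql.types import (
    BooleanType, DateType, DoubleType, IntegerType, LongType,
    StringType, StructField, TimestampType,
)
from tableauhyperapi import NOT_NULLABLE, NULLABLE, SqlType, TableDefinition

def table_definition_column(field: StructField) -> TableDefinition.Column:
    # hypothetical Spark-to-Hyper type mapping
    type_mapping = {
        StringType(): SqlType.text(),
        IntegerType(): SqlType.int(),
        LongType(): SqlType.big_int(),
        DoubleType(): SqlType.double(),
        BooleanType(): SqlType.bool(),
        DateType(): SqlType.date(),
        TimestampType(): SqlType.timestamp(),
    }
    return TableDefinition.Column(
        name=field.name,
        type=type_mapping[field.dataType],
        nullability=NULLABLE if field.nullable else NOT_NULLABLE,
    )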
write_parquet #
Source code in src/koheesio/integrations/spark/tableau/hyper.py
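Presumably this step persists the DataFrame as parquet next to the target Hyper file; a minimal sketch (the file name and layout are assumptions):

from pathlib import PurePath
from pyspark.sql import DataFrame

def write_parquet(df: DataFrame, path: PurePath, name: str) -> str:
    # hypothetical: write a single parquet file that the
    # HyperFileParquetWriter can pick up afterwards
    target = str(path / f"{name}.parquet")
    df.coalesce(1).write.mode("overwrite").parquet(target)
    return target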
koheesio.integrations.spark.tableau.hyper.HyperFileListWriter #
Write list of rows to a Hyper file.
Reference
See https://tableau.github.io/hyper-db/docs/sql/datatype/ for supported data types.
Examples:
hw = HyperFileListWriter(
name="test",
table_definition=TableDefinition(
table_name=TableName("Extract", "Extract"),
columns=[
TableDefinition.Column(name="string", type=SqlType.text(), nullability=NOT_NULLABLE),
TableDefinition.Column(name="int", type=SqlType.int(), nullability=NULLABLE),
TableDefinition.Column(name="timestamp", type=SqlType.timestamp(), nullability=NULLABLE),
],
),
data=[
["text_1", 1, datetime(2024, 1, 1, 0, 0, 0, 0)],
["text_2", 2, datetime(2024, 1, 2, 0, 0, 0, 0)],
["text_3", None, None],
],
).execute()
# do something with the returned file path
hw.hyper_path
data
class-attribute
instance-attribute
#
data: conlist(List[Any], min_length=1) = Field(
default=...,
description="List of rows to write to the Hyper file",
)
execute #
Source code in src/koheesio/integrations/spark/tableau/hyper.py
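Under the hood this presumably follows the standard Hyper API insert pattern; a self-contained sketch under that assumption (not the actual source):

from tableauhyperapi import (
    Connection, CreateMode, HyperProcess, Inserter, TableDefinition, Telemetry,
)

def write_rows(path: str, table_definition: TableDefinition, data: list) -> None:
    # hypothetical: create the Hyper file and bulk-insert the rows
    with HyperProcess(telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU) as hp:
        with Connection(
            endpoint=hp.endpoint,
            database=path,
            create_mode=CreateMode.CREATE_AND_REPLACE,
        ) as connection:
            connection.catalog.create_schema(table_definition.table_name.schema_name)
            connection.catalog.create_table(table_definition)
            with Inserter(connection, table_definition) as inserter:
                inserter.add_rows(rows=data)
                inserter.execute()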
koheesio.integrations.spark.tableau.hyper.HyperFileParquetWriter #
Read one or multiple parquet files and write them to a Hyper file.
Notes
This writer is much faster than HyperFileListWriter for large files.
References
Copy from external format: https://tableau.github.io/hyper-db/docs/sql/command/copy_from
Supported data types: https://tableau.github.io/hyper-db/docs/sql/datatype/
Parquet format limitations: https://tableau.github.io/hyper-db/docs/sql/external/formats/#external-format-parquet
Examples:
hw = HyperFileParquetWriter(
name="test",
table_definition=TableDefinition(
table_name=TableName("Extract", "Extract"),
columns=[
TableDefinition.Column(name="string", type=SqlType.text(), nullability=NOT_NULLABLE),
TableDefinition.Column(name="int", type=SqlType.int(), nullability=NULLABLE),
TableDefinition.Column(name="timestamp", type=SqlType.timestamp(), nullability=NULLABLE),
],
),
files=[
"/my-path/parquet-1.snappy.parquet",
"/my-path/parquet-2.snappy.parquet",
],
).execute()
# do something with the returned file path
hw.hyper_path
file
class-attribute
instance-attribute
#
file: conlist(Union[str, PurePath], min_length=1) = Field(
default=...,
alias="files",
description="One or multiple parquet files to write to the Hyper file",
)
execute #
execute() -> Output
Source code in src/koheesio/integrations/spark/tableau/hyper.py
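Per the COPY reference above, the bulk load presumably reduces to a single COPY ... FROM ARRAY command; a hedged sketch, not the actual source:

from tableauhyperapi import Connection, CreateMode, HyperProcess, TableDefinition, Telemetry

def copy_parquet(path: str, table_definition: TableDefinition, files: list) -> None:
    # hypothetical: load all parquet files into the Hyper table in one command
    file_array = ", ".join(f"'{f}'" for f in files)
    with HyperProcess(telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU) as hp:
        with Connection(
            endpoint=hp.endpoint,
            database=path,
            create_mode=CreateMode.CREATE_AND_REPLACE,
        ) as connection:
            connection.catalog.create_schema(table_definition.table_name.schema_name)
            connection.catalog.create_table(table_definition)
            connection.execute_command(
                f"COPY {table_definition.table_name} "
                f"FROM ARRAY [{file_array}] WITH (format => 'parquet')"
            )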
koheesio.integrations.spark.tableau.hyper.HyperFileReader #
Read a Hyper file and return a Spark DataFrame.
Examples:
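A minimal usage sketch, assuming hw is the result of one of the writer examples above:

df = HyperFileReader(
    path=PurePath(hw.hyper_path),
).execute().df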
path
class-attribute
instance-attribute
#
path: PurePath = Field(
default=...,
description="Path to the Hyper file",
examples=["PurePath(~/data/my-file.hyper)"],
)
execute #
execute() -> Output
Source code in src/koheesio/integrations/spark/tableau/hyper.py
koheesio.integrations.spark.tableau.hyper.HyperFileWriter #
Base class for all HyperFileWriter classes
Reference
HyperProcess parameters: https://tableau.github.io/hyper-db/docs/hyper-api/hyper_process/#process-settings
hyper_process_parameters
class-attribute
instance-attribute
#
hyper_process_parameters: dict = Field(
default={"log_config": ""},
description="Set HyperProcess parameters, see Tableau Hyper API documentation for more details: https://tableau.github.io/hyper-db/docs/hyper-api/hyper_process/#process-settings",
)
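For example, to redirect Hyper's own logging instead of suppressing it (assuming the log_dir process setting documented at the link above):

# hypothetical: write Hyper's log files to a custom directory
hw = HyperFileListWriter(
    name="test",
    table_definition=table_definition,  # as in the HyperFileListWriter example
    data=data,
    hyper_process_parameters={"log_dir": "/tmp/hyper-logs"},
).execute()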
name
class-attribute
instance-attribute
#
name: str = Field(
default="extract", description="Name of the Hyper file"
)
path
class-attribute
instance-attribute
#
path: PurePath = Field(
default=name,
description="Path to the Hyper file. If executing in Databricks, set the path manually and make sure to specify the scheme `dbfs:/`.",
examples=[
"PurePath(/tmp/hyper/)",
"PurePath(dbfs:/tmp/hyper/)",
],
)
table_definition
class-attribute
instance-attribute
#
table_definition: TableDefinition = Field(
default=None,
description="Table definition to write to the Hyper file as described in https://tableau.github.io/hyper-db/lang_docs/py/tableauhyperapi.html#tableauhyperapi.TableDefinition",
)