Delta
Module for creating and managing Delta tables.
koheesio.spark.delta.DeltaTableStep #
Class for creating and managing Delta tables.
DeltaTableStep aims to provide a simple interface to create and manage Delta tables. It is a wrapper around the Spark SQL API for Delta tables.
Example:

```python
from koheesio.steps import DeltaTableStep

DeltaTableStep(
    table="my_table",
    database="my_database",
    catalog="my_catalog",
    create_if_not_exists=True,
    default_create_properties={
        "delta.randomizeFilePrefixes": "true",
        "delta.checkpoint.writeStatsAsStruct": "true",
        "delta.minReaderVersion": "2",
        "delta.minWriterVersion": "5",
    },
)
```
Methods:

Name | Description |
---|---|
get_persisted_properties | Get persisted properties of table. |
add_property | Alter table and set table property. |
add_properties | Alter table and add properties. |
execute | Nothing to execute on a Table. |
max_version_ts_of_last_execution | Max version timestamp of last execution. If no timestamp is found, returns 1900-01-01 00:00:00. Note: will raise an error if column … |
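The 1900-01-01 fallback described for `max_version_ts_of_last_execution` can be sketched in plain Python. This is an illustrative helper, not koheesio's actual implementation:

```python
from datetime import datetime

def max_version_ts(timestamps: list) -> datetime:
    """Return the max version timestamp, or 1900-01-01 00:00:00 when none is found.

    Sketch of the fallback behavior only; the real method reads the
    Delta table's history, which is omitted here.
    """
    if not timestamps:
        return datetime(1900, 1, 1, 0, 0, 0)
    return max(timestamps)
```

The sentinel date makes the result safe to use directly in "greater than" filters for incremental loads, instead of forcing callers to handle `None`.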
Properties:

- name -> str: Deprecated. Use `.table_name` instead.
- table_name -> str: Table name.
- dataframe -> DataFrame: Returns a DataFrame to be able to interact with this table.
- columns -> Optional[List[str]]: Returns all column names as a list.
- has_change_type -> bool: Checks if a column named `_change_type` is present in the table.
- exists -> bool: Check if table exists.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
table | str | Table name. | required |
database | str | Database or Schema name. | None |
catalog | str | Catalog name. | None |
create_if_not_exists | bool | Force table creation if it doesn't exist. Note: Default properties will be applied to the table during CREATION. | False |
default_create_properties | Dict[str, str] | Default table properties to be applied during CREATION if `create_if_not_exists` is True. | {"delta.randomizeFilePrefixes": "true", "delta.checkpoint.writeStatsAsStruct": "true", "delta.minReaderVersion": "2", "delta.minWriterVersion": "5"} |
catalog (class-attribute, instance-attribute) #

```python
catalog: Optional[str] = Field(
    default=None,
    description="Catalog name. Note: Can be ignored if using a SparkCatalog that does not support catalog notation (e.g. Hive)",
)
```
columns (property) #
Returns all column names as a list.
create_if_not_exists (class-attribute, instance-attribute) #

```python
create_if_not_exists: bool = Field(
    default=False,
    alias="force_creation",
    description="Force table creation if it doesn't exist. Note: Default properties will be applied to the table during CREATION.",
)
```
database (class-attribute, instance-attribute) #
dataframe (property) #
Returns a DataFrame to be able to interact with this table.
default_create_properties (class-attribute, instance-attribute) #

```python
default_create_properties: Dict[str, Union[str, bool, int]] = Field(
    default={
        "delta.randomizeFilePrefixes": "true",
        "delta.checkpoint.writeStatsAsStruct": "true",
        "delta.minReaderVersion": "2",
        "delta.minWriterVersion": "5",
    },
    description="Default table properties to be applied during CREATION if `create_if_not_exists` True",
)
```
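Since user-supplied values replace the matching defaults, the effective create properties are a merge of the two dictionaries. A minimal sketch of that merge (illustrative names only, not koheesio's code), including the string normalization Delta's TBLPROPERTIES expects:

```python
# Defaults copied from the field definition above.
DEFAULT_CREATE_PROPERTIES = {
    "delta.randomizeFilePrefixes": "true",
    "delta.checkpoint.writeStatsAsStruct": "true",
    "delta.minReaderVersion": "2",
    "delta.minWriterVersion": "5",
}

def resolve_create_properties(user_properties: dict) -> dict:
    """Merge user-supplied properties over the defaults; user keys win.

    Values are normalized to strings (booleans become 'true'/'false')
    since TBLPROPERTIES values are strings.
    """
    merged = {**DEFAULT_CREATE_PROPERTIES, **user_properties}
    return {
        k: (str(v).lower() if isinstance(v, bool) else str(v))
        for k, v in merged.items()
    }
```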
has_change_type (property) #

has_change_type: bool

Checks if a column named `_change_type` is present in the table.
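The check itself amounts to a membership test against the table's column list. A pure-Python sketch (hypothetical helper, not the library's code):

```python
def has_change_type(columns: list) -> bool:
    """True when a `_change_type` column is present, e.g. on tables
    materialized from a Change Data Feed read."""
    return "_change_type" in columns
```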
is_cdf_active (property) #

is_cdf_active: bool

Check if CDF property is set and activated.

Returns:

Type | Description |
---|---|
bool | True if the `delta.enableChangeDataFeed` property is set to 'true'. |
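Conceptually this inspects the persisted table properties for `delta.enableChangeDataFeed`. A minimal sketch over a plain properties dict, assuming the property is absent or a string (illustrative only):

```python
def is_cdf_active(persisted_properties: dict) -> bool:
    """True when delta.enableChangeDataFeed is set to 'true' in the
    table's persisted properties; False when unset or disabled."""
    value = persisted_properties.get("delta.enableChangeDataFeed", "false")
    return value.lower() == "true"
```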
table_name (property) #

table_name: str

Fully qualified table name in the form of `catalog.database.table`.
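Since `catalog` and `database` are both optional, the qualified name drops any part that is not set. A sketch of that assembly (hypothetical helper, not koheesio's implementation):

```python
from typing import Optional

def fully_qualified_name(
    table: str,
    database: Optional[str] = None,
    catalog: Optional[str] = None,
) -> str:
    """Join catalog.database.table, skipping parts that are None/empty."""
    return ".".join(part for part in (catalog, database, table) if part)
```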
add_properties #
Alter table and add properties.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
properties | Dict[str, Union[str, int, bool]] | Properties to be added to table. | required |
override | bool | Enable override of existing value for property in table. | False |

Source code in src/koheesio/spark/delta.py
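The `override` flag governs what happens when a property already exists on the table: by default an existing value is left untouched. A pure-Python sketch of that merge rule (illustrative, not the library's code, which issues ALTER TABLE statements):

```python
def merge_properties(persisted: dict, new: dict, override: bool = False) -> dict:
    """Return the resulting property set: keys already persisted are
    kept as-is unless override=True; new keys are always added."""
    result = dict(persisted)
    for key, value in new.items():
        if override or key not in result:
            result[key] = str(value)
    return result
```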
add_property #
Alter table and set table property.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
key | str | Property key (name). | required |
value | Union[str, int, bool] | Property value. | required |
override | bool | Enable override of existing value for property in table. | False |

Source code in src/koheesio/spark/delta.py
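Setting a single property boils down to an `ALTER TABLE ... SET TBLPROPERTIES` statement. A sketch of how such a statement could be built (hypothetical helper; the exact SQL koheesio emits may differ):

```python
def set_property_sql(table_name: str, key: str, value) -> str:
    """Build an ALTER TABLE ... SET TBLPROPERTIES statement for one
    property, normalizing booleans to 'true'/'false'."""
    normalized = str(value).lower() if isinstance(value, bool) else str(value)
    return f"ALTER TABLE {table_name} SET TBLPROPERTIES ('{key}' = '{normalized}')"
```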
execute #
Nothing to execute on a Table.
get_column_type #
Get the type of a column in the table.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
column | str | Column name. | required |

Returns:

Type | Description |
---|---|
Optional[DataType] | Column type. |

Source code in src/koheesio/spark/delta.py
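The `Optional` return means a missing column yields `None` rather than an error. A minimal sketch over a name-to-type mapping (illustrative; the real method inspects the Spark schema and returns a `DataType`):

```python
from typing import Optional

def get_column_type(schema: dict, column: str) -> Optional[str]:
    """Look up a column's type in a name->type mapping;
    None when the column is absent."""
    return schema.get(column)
```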
get_persisted_properties #
Get persisted properties of table.

Returns:

Type | Description |
---|---|
Dict[str, str] | Persisted properties as a dictionary. |
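Persisted properties come back from Spark as key/value rows (as with `SHOW TBLPROPERTIES`), which are then collected into a dictionary. A sketch of that collection step (illustrative names, not koheesio's code):

```python
def parse_tblproperties(rows: list) -> dict:
    """Turn (key, value) rows, as returned by a
    SHOW TBLPROPERTIES query, into a plain dict."""
    return {key: value for key, value in rows}
```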