Hash
Module for hashing data using the SHA-2 family of hash functions.
See the docstring of the Sha2Hash class for more information.
koheesio.spark.transformations.hash.HASH_ALGORITHM (module attribute)
HASH_ALGORITHM = Literal[224, 256, 384, 512]
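Because HASH_ALGORITHM is a typing.Literal, its allowed bit lengths can be read back at runtime with typing.get_args. The sketch below redefines the alias locally so it runs on its own; the is_valid_num_bits helper is hypothetical and not part of koheesio.

```python
from typing import Literal, get_args

# Local copy of the documented alias: the only accepted SHA-2 bit lengths.
HASH_ALGORITHM = Literal[224, 256, 384, 512]


def is_valid_num_bits(num_bits: int) -> bool:
    """Hypothetical helper: True if num_bits is one of the HASH_ALGORITHM values."""
    return num_bits in get_args(HASH_ALGORITHM)


print(get_args(HASH_ALGORITHM))  # (224, 256, 384, 512)
print(is_valid_num_bits(256))    # True
print(is_valid_num_bits(128))    # False
```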
koheesio.spark.transformations.hash.Sha2Hash
Hash the values of one or more columns using the SHA-2 family of hash functions.
Thin wrapper around pyspark.sql.functions.sha2.
Returns the hex-string result of the SHA-2 family of hash functions (SHA-224, SHA-256, SHA-384, and SHA-512).
Note
This function allows concatenating the values of multiple columns together prior to hashing.
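A minimal plain-Python sketch of the concatenate-then-hash idea, assuming each value is converted to a string and the strings are joined with the delimiter before a single SHA-256 digest is taken. The concat_and_hash helper is hypothetical and uses hashlib instead of Spark; for plain strings and integers it should yield the same hex string Spark computes for the joined text, but Spark's string casts of other types (booleans, floats, timestamps) can differ from Python's str().

```python
import hashlib


def concat_and_hash(values, delimiter="|"):
    """Hypothetical helper: join values with delimiter, return the SHA-256 hex digest."""
    # Assumption: values are stringified and joined before hashing.
    joined = delimiter.join(str(v) for v in values)
    # SHA-256 corresponds to num_bits=256, the documented default.
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


row = ["alice", 42]
print(concat_and_hash(row))  # digest of the string "alice|42"
```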
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Union[str, List[str]] | The column (or list of columns) to hash. Alias: column | required |
delimiter | Optional[str] | Separator placed between the column values when they are concatenated into the string that is hashed | '\|' |
num_bits | Optional[HASH_ALGORITHM] | SHA-2 digest length in bits; one of 224, 256, 384, 512 | 256 |
target_column | str | Name of the column the generated hash is written to | required |
delimiter (class attribute, instance attribute)
delimiter: Optional[str] = Field(
default="|",
description="Optional separator for the string that will eventually be hashed. Defaults to '|'",
)
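The delimiter matters because concatenation alone is ambiguous: without a separator, the values "ab", "c" and "a", "bc" both become "abc" and hash identically. A minimal sketch with Python's hashlib (the joined_sha256 helper is hypothetical) shows the collision and how the default '|' keeps the two inputs apart.

```python
import hashlib


def joined_sha256(values, delimiter):
    """Hypothetical helper: join string values with delimiter, return SHA-256 hex."""
    return hashlib.sha256(delimiter.join(values).encode("utf-8")).hexdigest()


# Without a separator, different column values produce the same input string.
no_sep_1 = joined_sha256(["ab", "c"], "")
no_sep_2 = joined_sha256(["a", "bc"], "")
print(no_sep_1 == no_sep_2)  # True: both hash "abc"

# With the default "|" separator the inputs differ ("ab|c" vs "a|bc").
sep_1 = joined_sha256(["ab", "c"], "|")
sep_2 = joined_sha256(["a", "bc"], "|")
print(sep_1 == sep_2)  # False
```

A separator only removes this ambiguity when it cannot occur inside the values themselves: ["a|b", "c"] and ["a", "b|c"] both join to "a|b|c", so pick a delimiter that does not appear in the data.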
num_bits (class attribute, instance attribute)
num_bits: Optional[HASH_ALGORITHM] = Field(
default=256,
description="Algorithm to use for sha2 hash. Defaults to 256. Should be one of 224, 256, 384, 512",
)
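The num_bits value selects the SHA-2 variant and therefore the length of the hex string that is produced: each hex character encodes 4 bits, so the result has num_bits / 4 characters. The sketch below checks this with Python's hashlib; the ALGORITHMS mapping and the sha2_hex helper are assumptions for illustration, not part of koheesio.

```python
import hashlib

# Assumed mapping from the documented num_bits values to hashlib algorithm names.
ALGORITHMS = {224: "sha224", 256: "sha256", 384: "sha384", 512: "sha512"}


def sha2_hex(text: str, num_bits: int = 256) -> str:
    """Hypothetical helper: SHA-2 hex digest of text for the given bit length."""
    return hashlib.new(ALGORITHMS[num_bits], text.encode("utf-8")).hexdigest()


for bits in (224, 256, 384, 512):
    # Each hex character encodes 4 bits, so the digest has bits // 4 characters.
    print(bits, len(sha2_hex("alice|42", bits)))  # 56, 64, 96, 128 characters
```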
target_column (class attribute, instance attribute)
target_column: str = Field(
default=...,
description="The generated hash will be written to the column name specified here",
)
execute
Source code in src/koheesio/spark/transformations/hash.py
koheesio.spark.transformations.hash.sha2_hash
sha2_hash(
columns: List[str],
delimiter: Optional[str] = "|",
num_bits: Optional[HASH_ALGORITHM] = 256,
)
Hash the values of one or more columns using the SHA-2 family of hash functions.
Thin wrapper around pyspark.sql.functions.sha2.
Returns the hex-string result of the SHA-2 family of hash functions (SHA-224, SHA-256, SHA-384, and SHA-512). This function allows concatenating the values of multiple columns together prior to hashing.
If a null is passed, the result will also be null.
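Null propagation for a single value can be sketched in plain Python, with None standing in for a SQL null. The sha256_or_none helper is hypothetical and only mirrors the single-value case; with several columns, the outcome depends on how the values are concatenated (Spark's concat_ws, for example, skips null inputs rather than returning null), so that case is worth confirming against the source.

```python
import hashlib
from typing import Optional


def sha256_or_none(value: Optional[str]) -> Optional[str]:
    """Hypothetical helper: hash one value, propagating None like SQL NULL."""
    if value is None:
        return None  # null in, null out
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


print(sha256_or_none(None))                 # None
print(sha256_or_none("alice") is not None)  # True
```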
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | List[str] | The columns to hash | required |
delimiter | Optional[str] | Separator placed between the column values when they are concatenated into the string that is hashed | '\|' |
num_bits | Optional[HASH_ALGORITHM] | SHA-2 digest length in bits; one of 224, 256, 384, 512 | 256 |