Hash
Module for hashing data using the SHA-2 family of hash functions.
See the docstring of the Sha2Hash class for more information.
koheesio.spark.transformations.hash.HASH_ALGORITHM
module-attribute
HASH_ALGORITHM = Literal[224, 256, 384, 512]
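Because `HASH_ALGORITHM` is a `Literal` type, the allowed values can be read back at runtime with `typing.get_args`. The sketch below shows one way a caller might check a `num_bits` value before building a transformation. The helper `is_supported_num_bits` is hypothetical and not part of koheesio.

```python
from typing import Literal, get_args

# Mirrors the module attribute documented above.
HASH_ALGORITHM = Literal[224, 256, 384, 512]


def is_supported_num_bits(num_bits: int) -> bool:
    """Hypothetical helper: True if num_bits is one of the allowed literals."""
    return num_bits in get_args(HASH_ALGORITHM)
```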
koheesio.spark.transformations.hash.Sha2Hash
Hashes the value of one or more columns using the SHA-2 family of hash functions.
Thin wrapper around pyspark.sql.functions.sha2.
Returns the hex string result of the SHA-2 family of hash functions (SHA-224, SHA-256, SHA-384, and SHA-512).
Note
This function allows concatenating the values of multiple columns together prior to hashing.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | Union[str, List[str]] | The column (or list of columns) to hash. Alias: column | required |
delimiter | Optional[str] | Optional separator for the string that will eventually be hashed. Defaults to '\|' | '\|' |
num_bits | Optional[HASH_ALGORITHM] | Algorithm to use for sha2 hash. Defaults to 256. Should be one of 224, 256, 384, 512 | 256 |
target_column | str | The generated hash will be written to the column name specified here | required |
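The role of the delimiter is easiest to see outside Spark. The following is a plain-Python illustration of the concatenate-then-hash idea using `hashlib`, not the koheesio or PySpark implementation. It assumes the column values are already strings and are hashed as UTF-8 bytes, and it does not model null handling. The helper `concat_and_hash` is hypothetical.

```python
import hashlib
from typing import Sequence


def concat_and_hash(values: Sequence[str], delimiter: str = "|") -> str:
    """Hypothetical helper: join values with the delimiter, then SHA-256 the UTF-8 bytes."""
    joined = delimiter.join(values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Without a separator, both rows collapse to the same input string "abc".
row_1 = ["ab", "c"]
row_2 = ["a", "bc"]

same_without_delimiter = (
    concat_and_hash(row_1, delimiter="") == concat_and_hash(row_2, delimiter="")
)
same_with_delimiter = concat_and_hash(row_1) == concat_and_hash(row_2)
```

With an empty delimiter the two rows hash identically. With the default `'|'` the inputs become `ab|c` and `a|bc`, so the hashes differ. Choosing a delimiter that cannot occur in the data keeps distinct rows from colliding this way.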
delimiter
class-attribute
instance-attribute
delimiter: Optional[str] = Field(default='|', description="Optional separator for the string that will eventually be hashed. Defaults to '|'")
num_bits
class-attribute
instance-attribute
num_bits: Optional[HASH_ALGORITHM] = Field(default=256, description='Algorithm to use for sha2 hash. Defaults to 256. Should be one of 224, 256, 384, 512')
target_column
class-attribute
instance-attribute
target_column: str = Field(default=..., description='The generated hash will be written to the column name specified here')
execute
Source code in src/koheesio/spark/transformations/hash.py
koheesio.spark.transformations.hash.sha2_hash
sha2_hash(columns: List[str], delimiter: Optional[str] = '|', num_bits: Optional[HASH_ALGORITHM] = 256)
Hashes the value of one or more columns using the SHA-2 family of hash functions.
Thin wrapper around pyspark.sql.functions.sha2.
Returns the hex string result of the SHA-2 family of hash functions (SHA-224, SHA-256, SHA-384, and SHA-512). This function allows concatenating the values of multiple columns together prior to hashing.
If a null is passed, the result will also be null.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
columns | List[str] | The columns to hash | required |
delimiter | Optional[str] | Optional separator for the string that will eventually be hashed. Defaults to '\|' | '\|' |
num_bits | Optional[HASH_ALGORITHM] | Algorithm to use for sha2 hash. Defaults to 256. Should be one of 224, 256, 384, 512 | 256 |
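The `num_bits` value determines the digest size, and therefore the length of the hex string that ends up in the output column. The sketch below uses Python's `hashlib` as a stand-in for the Spark function to show the resulting lengths. The mapping `ALGORITHMS` and the helper `sha2_hex` are hypothetical names introduced for illustration.

```python
import hashlib

# Assumed mapping from a num_bits value to the matching hashlib algorithm name.
ALGORITHMS = {224: "sha224", 256: "sha256", 384: "sha384", 512: "sha512"}


def sha2_hex(text: str, num_bits: int = 256) -> str:
    """Hypothetical helper: hex digest of text for the chosen SHA-2 variant."""
    if num_bits not in ALGORITHMS:
        raise ValueError(f"num_bits must be one of {sorted(ALGORITHMS)}")
    return hashlib.new(ALGORITHMS[num_bits], text.encode("utf-8")).hexdigest()


# Each hex character encodes 4 bits, so the length is num_bits / 4.
hex_lengths = {bits: len(sha2_hex("a|b", bits)) for bits in ALGORITHMS}
```

Here `hex_lengths` is `{224: 56, 256: 64, 384: 96, 512: 128}`, which can help when sizing a fixed-width target column.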