Hash
Module for hashing data using the SHA-2 family of hash functions.
See the docstring of the Sha2Hash class for more information.
koheesio.spark.transformations.hash.HASH_ALGORITHM module-attribute #
HASH_ALGORITHM = Literal[224, 256, 384, 512]
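As an illustration of how this `Literal` type limits the accepted digest sizes, the sketch below redefines the alias locally (so it runs without koheesio installed) and checks values against it with `typing.get_args`. The helper `is_valid_num_bits` is hypothetical and not part of koheesio.

```python
from typing import Literal, get_args

# Local copy of the alias shown above, so the example is self-contained.
HASH_ALGORITHM = Literal[224, 256, 384, 512]


def is_valid_num_bits(value: int) -> bool:
    """Hypothetical helper: True if value is one of the allowed digest sizes."""
    return value in get_args(HASH_ALGORITHM)


print(get_args(HASH_ALGORITHM))  # (224, 256, 384, 512)
print(is_valid_num_bits(256))    # True
print(is_valid_num_bits(128))    # False
```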
koheesio.spark.transformations.hash.Sha2Hash #
Hash the value of one or more columns using the SHA-2 family of hash functions.
Thin wrapper around pyspark.sql.functions.sha2.
Returns the hex string result of SHA-2 family of hash functions (SHA-224, SHA-256, SHA-384, and SHA-512).
Note
This function allows concatenating the values of multiple columns together prior to hashing.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| columns | Union[str, List[str]] | The column (or list of columns) to hash. Alias: column | required | 
| delimiter | Optional[str] | Optional separator for the string that will eventually be hashed. Defaults to '\|' | '\|' |
| num_bits | Optional[HASH_ALGORITHM] | Algorithm to use for sha2 hash. Defaults to 256. Should be one of 224, 256, 384, 512 | 256 | 
| target_column | str | The generated hash will be written to the column name specified here | required | 
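To make the hashing semantics concrete, here is a minimal pure-Python sketch of the idea described above: join the column values with the delimiter, then take the hex digest of the chosen SHA-2 variant. It uses the standard-library `hashlib` instead of Spark, and `hash_row` is a hypothetical name. Treat it as an approximation for illustration, not as koheesio's implementation; it does not model null handling.

```python
import hashlib
from typing import Sequence

# Map each allowed num_bits value to the matching hashlib constructor.
_SHA2 = {
    224: hashlib.sha224,
    256: hashlib.sha256,
    384: hashlib.sha384,
    512: hashlib.sha512,
}


def hash_row(values: Sequence[str], delimiter: str = "|", num_bits: int = 256) -> str:
    """Hypothetical helper: hex SHA-2 digest of the delimiter-joined values."""
    if num_bits not in _SHA2:
        raise ValueError(f"num_bits must be one of {sorted(_SHA2)}, got {num_bits}")
    joined = delimiter.join(values)
    return _SHA2[num_bits](joined.encode("utf-8")).hexdigest()


print(hash_row(["abc"]))
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
print(len(hash_row(["a", "b"], num_bits=512)))  # 128 hex characters
```

The digest length in hex characters is always `num_bits / 4`, so the target column holds strings of 56, 64, 96, or 128 characters depending on the variant chosen.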
delimiter class-attribute instance-attribute #
delimiter: Optional[str] = Field(
    default="|",
    description="Optional separator for the string that will eventually be hashed. Defaults to '|'",
)
num_bits class-attribute instance-attribute #
num_bits: Optional[HASH_ALGORITHM] = Field(
    default=256,
    description="Algorithm to use for sha2 hash. Defaults to 256. Should be one of 224, 256, 384, 512",
)
target_column class-attribute instance-attribute #
target_column: str = Field(
    default=...,
    description="The generated hash will be written to the column name specified here",
)
execute #
Source code in src/koheesio/spark/transformations/hash.py
              
koheesio.spark.transformations.hash.sha2_hash #
sha2_hash(
    columns: List[str],
    delimiter: Optional[str] = "|",
    num_bits: Optional[HASH_ALGORITHM] = 256,
)
Hash the value of one or more columns using the SHA-2 family of hash functions.
Thin wrapper around pyspark.sql.functions.sha2.
Returns the hex string result of SHA-2 family of hash functions (SHA-224, SHA-256, SHA-384, and SHA-512). This function allows concatenating the values of multiple columns together prior to hashing.
If a null is passed, the result will also be null.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| columns | List[str] | The columns to hash | required | 
| delimiter | Optional[str] | Optional separator for the string that will eventually be hashed. Defaults to '\|' | '\|' |
| num_bits | Optional[HASH_ALGORITHM] | Algorithm to use for sha2 hash. Defaults to 256. Should be one of 224, 256, 384, 512 | 256 |
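One reason to keep a delimiter between concatenated values: without it, different rows can produce the same input string and therefore the same hash. The sketch below shows this with the standard-library `hashlib`; `digest_of` is a hypothetical helper and not part of koheesio or PySpark.

```python
import hashlib
from typing import Sequence


def digest_of(values: Sequence[str], delimiter: str) -> str:
    """Hypothetical helper: SHA-256 hex digest of the joined values."""
    return hashlib.sha256(delimiter.join(values).encode("utf-8")).hexdigest()


row_a = ["ab", "c"]
row_b = ["a", "bc"]

# With an empty delimiter both rows collapse to "abc" and collide.
print(digest_of(row_a, "") == digest_of(row_b, ""))    # True

# With "|" the inputs are "ab|c" and "a|bc", so the digests differ.
print(digest_of(row_a, "|") == digest_of(row_b, "|"))  # False
```

A delimiter does not help if the values themselves can contain the delimiter character, so picking one that cannot occur in the data is the safer choice.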