Lookup
Lookup transformation for joining two dataframes together
Classes:
| Name | Description |
|---|---|
| JoinMapping | |
| TargetColumn | |
| JoinType | Supported join types |
| JoinHint | |
| DataframeLookup | Lookup transformation for joining two dataframes together |
koheesio.spark.transformations.lookup.DataframeLookup
Lookup transformation for joining two dataframes together
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | The left Spark DataFrame | None |
| other | DataFrame | The right Spark DataFrame | None |
| on | List[JoinMapping] \| JoinMapping | List of join mappings. A single mapping can be passed as a single object instead of a list. | required |
| targets | List[TargetColumn] \| TargetColumn | List of target columns. A single target can be passed as a single object instead of a list. | required |
| how | JoinType | What type of join to perform. Defaults to left. See JoinType for more information. | LEFT |
| hint | JoinHint | What type of join hint to use. Defaults to None. See JoinHint for more information. | None |
Example
from pyspark.sql import SparkSession
from koheesio.spark.transformations.lookup import (
    DataframeLookup,
    JoinMapping,
    TargetColumn,
    JoinType,
)
spark = SparkSession.builder.getOrCreate()
# create the dataframes
left_df = spark.createDataFrame([(1, "A"), (2, "B")], ["id", "value"])
right_df = spark.createDataFrame(
    [(1, "A"), (3, "C")], ["id", "value"]
)
# perform the lookup
lookup = DataframeLookup(
    df=left_df,
    other=right_df,
    on=JoinMapping(source_column="id", joined_column="id"),
    targets=TargetColumn(
        target_column="value", target_column_alias="right_value"
    ),
    how=JoinType.LEFT,
)
output_df = lookup.transform()
output_df:
| id | value | right_value | 
|---|---|---|
| 1 | A | A | 
| 2 | B | null | 
In this example, left_df and right_df are joined on the id column. The value column from right_df is
aliased as right_value in the output dataframe. Because this is a left join and id 2 has no match in
right_df, its right_value is null.
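To make the join semantics concrete without a Spark session, here is a minimal plain-Python sketch of a left lookup over lists of dicts. It is not how DataframeLookup is implemented. `left_lookup`, `Row`, and the variable names are hypothetical and exist only for illustration, and the sketch assumes at most one right-side row per join key.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def left_lookup(
    left: List[Row],
    right: List[Row],
    source_column: str,
    joined_column: str,
    target_column: str,
    target_column_alias: str,
) -> List[Row]:
    """Hypothetical helper: emulate a left join that pulls one aliased column from `right`."""
    # Index the right side by its join key (assumes keys are unique on the right).
    index: Dict[Any, Row] = {row[joined_column]: row for row in right}
    result: List[Row] = []
    for row in left:
        match: Optional[Row] = index.get(row[source_column])
        # Unmatched left rows are kept; the target column becomes None (null in Spark).
        value = match[target_column] if match is not None else None
        result.append({**row, target_column_alias: value})
    return result


left_rows = [{"id": 1, "value": "A"}, {"id": 2, "value": "B"}]
right_rows = [{"id": 1, "value": "A"}, {"id": 3, "value": "C"}]

output_rows = left_lookup(left_rows, right_rows, "id", "id", "value", "right_value")
for row in output_rows:
    print(row)
```

The right-side row with id 3 never appears in the result, which mirrors the output table above: a left join keeps every left row and only the right rows that match.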
df (class-attribute, instance-attribute)
df: Optional[DataFrame] = Field(
    default=None, description="The left Spark DataFrame"
)
hint (class-attribute, instance-attribute)
hint: Optional[JoinHint] = Field(
    default=None,
    description="What type of join hint to use. Defaults to None. "
    + str(__doc__),
)
how (class-attribute, instance-attribute)
how: Optional[JoinType] = Field(
    default=LEFT,
    description="What type of join to perform. Defaults to left. "
    + str(__doc__),
)
on (class-attribute, instance-attribute)
on: Union[List[JoinMapping], JoinMapping] = Field(
    default=...,
    alias="join_mapping",
    description="List of join mappings. If only one mapping is passed, it can be passed as a single object.",
)
other (class-attribute, instance-attribute)
other: Optional[DataFrame] = Field(
    default=None, description="The right Spark DataFrame"
)
targets (class-attribute, instance-attribute)
targets: Union[List[TargetColumn], TargetColumn] = Field(
    default=...,
    alias="target_columns",
    description="List of target columns. If only one target is passed, it can be passed as a single object.",
)
Output
execute
execute() -> Output
Execute the lookup transformation
Source code in src/koheesio/spark/transformations/lookup.py
get_right_df
set_list
set_list(
    value: Union[
        List[JoinMapping],
        JoinMapping,
        List[TargetColumn],
        TargetColumn,
    ],
) -> List
Ensure that we can pass either a single object, or a list of objects
Source code in src/koheesio/spark/transformations/lookup.py
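The single-or-list behavior described for set_list can be sketched in plain Python. This is an illustration of the idea only, not the library's source: `ensure_list` is a hypothetical name, and the real method operates on JoinMapping and TargetColumn objects rather than strings.

```python
from typing import List, TypeVar, Union

T = TypeVar("T")


def ensure_list(value: Union[List[T], T]) -> List[T]:
    """Hypothetical stand-in for the single-or-list normalization that set_list describes."""
    # A list is returned unchanged; anything else is wrapped in a one-element list.
    return value if isinstance(value, list) else [value]


single = ensure_list("id")
many = ensure_list(["id", "country"])
print(single, many)
```

Normalizing once at validation time lets the rest of the transformation iterate over `on` and `targets` without checking whether it received one object or several.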
              
koheesio.spark.transformations.lookup.JoinHint
koheesio.spark.transformations.lookup.JoinMapping
koheesio.spark.transformations.lookup.JoinType
Supported join types
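This section does not list the members of JoinType; only LEFT appears, as the default for how. As an illustration of the general pattern, and not the koheesio definition, a string-valued enum can carry the join names that Spark's DataFrame.join accepts for its how argument. `SketchJoinType` and every member other than LEFT are assumptions made for this sketch.

```python
from enum import Enum


class SketchJoinType(str, Enum):
    """Hypothetical enum for illustration; not the koheesio JoinType definition."""

    # Values are join names that Spark's DataFrame.join accepts for `how`.
    INNER = "inner"
    LEFT = "left"
    RIGHT = "right"
    FULL = "full"


# Members can be looked up by value, and .value yields the plain string
# that would be handed to a join call.
default_how = SketchJoinType.LEFT
parsed = SketchJoinType("inner")
print(default_how.value, parsed.name)
```

Subclassing str keeps each member usable wherever a plain join-name string is expected, while still restricting callers to a fixed set of choices.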