Lookup
Lookup transformation for joining two dataframes together
Classes:
Name | Description |
---|---|
JoinMapping | |
TargetColumn | |
JoinType | Supported join types |
JoinHint | |
DataframeLookup | Lookup transformation for joining two dataframes together |
koheesio.spark.transformations.lookup.DataframeLookup
Lookup transformation for joining two dataframes together
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The left Spark DataFrame | required |
other | DataFrame | The right Spark DataFrame | required |
on | List[JoinMapping] \| JoinMapping | List of join mappings. A single mapping can be passed as a single object instead of a list. | required |
targets | List[TargetColumn] \| TargetColumn | List of target columns. A single target can be passed as a single object instead of a list. | required |
how | JoinType | What type of join to perform. See JoinType for more information. | JoinType.LEFT |
hint | JoinHint | What type of join hint to use. See JoinHint for more information. | None |
Example
from pyspark.sql import SparkSession
from koheesio.spark.transformations.lookup import (
    DataframeLookup,
    JoinMapping,
    TargetColumn,
    JoinType,
)

spark = SparkSession.builder.getOrCreate()

# create the dataframes
left_df = spark.createDataFrame([(1, "A"), (2, "B")], ["id", "value"])
right_df = spark.createDataFrame([(1, "A"), (3, "C")], ["id", "value"])

# perform the lookup
lookup = DataframeLookup(
    df=left_df,
    other=right_df,
    on=JoinMapping(source_column="id", joined_column="id"),
    targets=TargetColumn(
        target_column="value", target_column_alias="right_value"
    ),
    how=JoinType.LEFT,
)
output_df = lookup.transform()
output_df:
id | value | right_value |
---|---|---|
1 | A | A |
2 | B | null |
In this example, the left_df and right_df dataframes are joined together using the id column. The value column from right_df is aliased as right_value in the output dataframe.
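To make the result above easier to follow, the left-join lookup can be mimicked on plain Python dictionaries. This is only an illustrative sketch, not how DataframeLookup is implemented: the helper name left_lookup and its parameters are hypothetical, no Spark is involved, and it assumes at most one matching right-hand row per key (a real Spark left join would emit one output row per match).

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def left_lookup(
    left_rows: List[Row],
    right_rows: List[Row],
    source_column: str,
    other_column: str,
    target_column: str,
    target_alias: str,
) -> List[Row]:
    """Hypothetical helper: left-join lookup on lists of dicts."""
    # Index the right-hand rows by their join key (first row per key wins).
    index: Dict[Any, Row] = {}
    for row in right_rows:
        index.setdefault(row[other_column], row)

    output: List[Row] = []
    for row in left_rows:
        match = index.get(row[source_column])
        joined = dict(row)  # keep every left-hand column
        # Unmatched left rows get None, mirroring null in a left join.
        joined[target_alias] = match[target_column] if match is not None else None
        output.append(joined)
    return output


left_rows = [{"id": 1, "value": "A"}, {"id": 2, "value": "B"}]
right_rows = [{"id": 1, "value": "A"}, {"id": 3, "value": "C"}]

result = left_lookup(
    left_rows,
    right_rows,
    source_column="id",
    other_column="id",
    target_column="value",
    target_alias="right_value",
)
for row in result:
    print(row)
```

The row with id 2 has no partner in the right-hand data, so its right_value is None, which corresponds to the null shown in the output table. The right-hand row with id 3 is dropped because a left join keeps only keys present on the left.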
df class-attribute instance-attribute
hint class-attribute instance-attribute
hint: Optional[JoinHint] = Field(
    default=None,
    description="What type of join hint to use. Defaults to None. "
    + __doc__,
)
how class-attribute instance-attribute
how: Optional[JoinType] = Field(
    default=LEFT,
    description="What type of join to perform. Defaults to left. "
    + __doc__,
)
on class-attribute instance-attribute
on: Union[List[JoinMapping], JoinMapping] = Field(
    default=...,
    alias="join_mapping",
    description="List of join mappings. If only one mapping is passed, it can be passed as a single object.",
)
other class-attribute instance-attribute
targets class-attribute instance-attribute
targets: Union[List[TargetColumn], TargetColumn] = Field(
    default=...,
    alias="target_columns",
    description="List of target columns. If only one target is passed, it can be passed as a single object.",
)
Output
execute
execute() -> Output
Execute the lookup transformation
Source code in src/koheesio/spark/transformations/lookup.py
get_right_df
set_list
Ensure that we can pass either a single object, or a list of objects
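The source of set_list is not shown on this page, so the following is only a sketch of the normalization it describes: accept either a single object or a list, and always end up with a list. The function name as_list is hypothetical, and the real method is presumably wired in as a validator on the on and targets fields rather than called directly.

```python
from typing import List, TypeVar, Union

T = TypeVar("T")


def as_list(value: Union[T, List[T]]) -> List[T]:
    """Hypothetical helper: wrap a single object in a list, pass lists through."""
    if isinstance(value, list):
        return value
    return [value]


# A single item becomes a one-element list; a list is returned unchanged.
print(as_list("id"))
print(as_list(["id", "country"]))
```

Normalizing early like this lets the rest of the transformation iterate over mappings and targets without checking whether it received one item or many.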
koheesio.spark.transformations.lookup.JoinHint
koheesio.spark.transformations.lookup.JoinMapping
koheesio.spark.transformations.lookup.JoinType
Supported join types