Dummy

A simple DummyReader that returns a DataFrame with an id-column of the given range

koheesio.spark.readers.dummy.DummyReader

Can be used in place of any Reader without having to read from a real source.

Wraps SparkSession.range(). The output DataFrame has a single column named "id" of type Long, with as many rows as the given range.

Parameters:

Name | Type | Description | Default
range | int | How large to make the DataFrame | 100

Example
from koheesio.spark.readers.dummy import DummyReader

output_df = DummyReader(range=100).read()

output_df is a DataFrame with a single column named "id" of type Long, containing 100 rows (0-99).

id
0
1
...
99
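
The idea of using a dummy reader in place of a real one can be illustrated without Spark or Koheesio installed. The sketch below is a plain-Python stand-in, not Koheesio code: `RowReader`, `FakeRangeReader`, and `count_rows` are hypothetical names, and rows are modelled as a list of dicts rather than a Spark DataFrame. It only mirrors the documented behaviour, namely a single "id" column holding 0 through range - 1, with a default range of 100.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


class RowReader(Protocol):
    """Hypothetical minimal reader interface: anything with a read() method."""

    def read(self) -> List[Dict[str, int]]: ...


@dataclass
class FakeRangeReader:
    """Plain-Python stand-in for DummyReader (illustrative only)."""

    range: int = 100  # mirrors the default shown in the field definition

    def read(self) -> List[Dict[str, int]]:
        # Inside a method, the bare name `range` still resolves to the
        # builtin; the field is only reachable as `self.range`.
        return [{"id": i} for i in range(self.range)]


def count_rows(reader: RowReader) -> int:
    """Hypothetical downstream logic that accepts any reader."""
    return len(reader.read())


rows = FakeRangeReader(range=5).read()
print(rows)                           # [{'id': 0}, {'id': 1}, {'id': 2}, {'id': 3}, {'id': 4}]
print(count_rows(FakeRangeReader()))  # 100
```

Because `count_rows` depends only on the `read()` method, the same function could be exercised against a real source or against the stand-in, which is the role DummyReader plays for Spark-based readers.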

range class-attribute instance-attribute

range: int = Field(
    default=100,
    description="How large to make the Dataframe",
)

execute

execute()
Source code in src/koheesio/spark/readers/dummy.py
def execute(self):
    self.output.df = self.spark.range(self.range)