
Getting Started with Koheesio




If you're using Poetry, add the following source entry to the pyproject.toml file:

[[tool.poetry.source]]
name = "nike"
url = ""
secondary = true

Then add Koheesio as a dependency:

poetry add koheesio


If you're using pip, run the following command to install Koheesio:

Requires pip.

pip install koheesio
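
To confirm that the installation succeeded, you can query the package metadata from Python. The snippet below is a minimal sketch that uses only the standard library; the helper name `installed_version` is made up for this example and is not part of Koheesio.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    Hypothetical helper for illustration; not part of Koheesio.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if Koheesio is installed in this environment,
# otherwise prints None.
print(installed_version("koheesio"))
```

If this prints `None`, the package is not visible to the interpreter you ran it with, which usually means a different virtual environment is active.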

Basic Usage

Once you've installed Koheesio, you can start using it in your Python scripts. Here's a basic example:

from koheesio import Step

# Define a step
class MyStep(Step):
    def execute(self):
        # Your step logic here
        print("Hello, Koheesio!")

# Create an instance of the step
step = MyStep()

# Run the step
step.execute()

Advanced Usage

from pyspark.sql.functions import lit
from pyspark.sql import DataFrame, SparkSession

# Step 1: import Koheesio dependencies
from koheesio.context import Context
from koheesio.steps.readers.dummy import DummyReader
from koheesio.steps.transformations.camel_to_snake import CamelToSnakeTransformation
from koheesio.steps.writers.dummy import DummyWriter
from koheesio.tasks.etl_task import EtlTask

# Step 2: Set up a SparkSession
spark = SparkSession.builder.getOrCreate()

# Step 3: Configure your Context
context = Context({
    "source": DummyReader(),
    "transformations": [CamelToSnakeTransformation()],
    "target": DummyWriter(),
    "my_favorite_movie": "inception",
})

# Step 4: Create a Task
class MyFavoriteMovieTask(EtlTask):
    my_favorite_movie: str

    def transform(self, df: DataFrame = None) -> DataFrame:
        df = df.withColumn("MyFavoriteMovie", lit(self.my_favorite_movie))
        return super().transform(df)

# Step 5: Run your Task
task = MyFavoriteMovieTask(**context)
task.run()
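
The transform step in the task above first adds a MyFavoriteMovie column and then hands the DataFrame to the configured transformations. The sketch below mimics that flow in plain Python, without Spark or Koheesio, to show the expected effect on column names. It assumes that CamelToSnakeTransformation renames camel-case columns to snake case, as its name suggests; `camel_to_snake` and `run_pipeline` are hypothetical helpers written for this illustration, not Koheesio functions.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a camel-case name such as 'MyFavoriteMovie' to snake case."""
    # Insert an underscore wherever a lowercase letter or digit is directly
    # followed by an uppercase letter, then lowercase the whole name.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def run_pipeline(rows, my_favorite_movie):
    """Plain-Python stand-in for the transform step, using dicts as rows."""
    # Part 1: add a constant column, similar to withColumn(..., lit(...)).
    with_movie = [{**row, "MyFavoriteMovie": my_favorite_movie} for row in rows]
    # Part 2: rename every column to snake case (assumed behavior).
    return [
        {camel_to_snake(key): value for key, value in row.items()}
        for row in with_movie
    ]


rows = [{"Id": 1}, {"Id": 2}]
result = run_pipeline(rows, "inception")
print(result)
```

Under that assumption, the column added in the task would end up named my_favorite_movie in the written output, because the renaming runs after the column is added.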


If you want to contribute to Koheesio, check out the contribution guidelines in this repository. They describe the contribution process, including how to submit issues and pull requests.


To run the tests for Koheesio, use the following command:

make dev-test

This will run all the tests in the tests directory.