Setup
Prerequisites¶
Python¶
- Supported versions: 3.9, 3.10, 3.11, 3.12 (recommended: latest 3.12.x)
- Check version:
- Install Python: You can install Python by downloading it from python.org or by using a version manager such as pyenv.
Hatch¶
- Install Hatch
Java¶
- Supported versions: 8, 11, 17 (recommended: latest 17.x)
- Recommendation: Use a JDK version manager such as SDKMAN! or OpenJDK(Linux/macOS) or an equivalent tool for your operating system to install and manage Java versions.
- JAVA_HOME: If your tools or environment require it, set the
JAVA_HOME
environment variable to point to your installed JDK. Refer to your JDK version manager’s documentation for instructions.
IDE¶
- Recommended: Visual Studio Code
- Other options: PyCharm
Docker¶
- Requirement: A container engine is required for running integration tests. You can use Docker, Podman, containerd, Rancher, or any equivalent container runtime for your operating system.
Github Configuration¶
- Required: A GitHub account to access the source code and documentation.
Create GPG and SSH Keys¶
- GPG: Use GPG for signing commits and tags.
See GitHub Docs – Generate a GPG Key. - SSH: Use SSH for connecting to and interacting with remote repositories.
See GitHub Docs – Generate an SSH Key. - Clone the repository:
Environment Setup¶
We recommend using version manager tool like:
Create a virtual environment:
pyenv install -l | less # List available versions
pyenv install 3.12.11 # Install specific python version
pyenv versions # List installed versions
pyenv prefix # Display full path to the current installed python directory
pyenv global <version> # sets default for all shells
pyenv local <version> # sets version for current directory (creates .python-version)
Install Dependencies:
This project uses Hatch for Python environment and dependency management.
All required and optional dependencies are managed via pyproject.toml
.
To initialize dev environment (python virtual environments) and install dependencies run following makefile target:
# Configures Hatch dev Environment
# Initializes Python virtual environments ( 3.10,3.11,3.12)
# Installs Dependencies
make dev
To view Hatch Environments:
This shows which environments are available for development, testing, and other workflows, and how they are configured for your project.
Since Hatch
is used as dependency management tool output of this command will be creation of multiple python virtual environments stored under project root .venv/env/virtual/<dev.py3.X>
Troubleshooting Hatch Environments:
If you encounter issues, try cleaning and recreating the environment.
make env-remove-all
# Manual option would be to delete <project_root>./venv/ directory to delete hatch python virtual environments
# Run following to re-create environments
make dev
Running Tests¶
To execture all spark-expectations tests run following command:
Warning
Previous command will spin up docker container needed to execute some tests so make sure docker is running.
Checkout ./spark_expectations/examples/docker_scripts
and test fixtures to better understand what is being spin up.
For Example
- Build the required Docker image
- Start a Kafka service in a Docker container
- Make Kafka available for integration tests or local development
IDE Debugging¶
VSCode Interpreter and Launch Config
settings.json
Setting default python interpreter be project created virtual environment
{
"python.testing.pytestEnabled": true,
"python.testing.autoTestDiscoverOnSaveEnabled": true,
"python.defaultInterpreterPath": ".venv/env/virtual/dev.py3.12/bin/python",
"terminal.integrated.profiles.osx": {
"Hatch Shell": {
"path": "hatch",
"args": ["shell"],
"icon": "terminal"
}
},
"terminal.integrated.defaultProfile.osx": "zsh",
"python.testing.pytestArgs": [
"--maxfail=1",
"--disable-warnings"
]
}
launch.json
debug launch configurations
{
"version": "0.2.0",
"configurations": [
{
"name": "Coverage: Pytest (All files)",
"type": "debugpy",
"request": "launch",
"module": "coverage",
"args": [
"run",
"--source=spark_expectations",
"--omit=spark_expectations/examples/*",
"-m", "pytest",
"-v",
"-x"
],
"console": "integratedTerminal",
"justMyCode": false
},
{
"name": "Coverage: Pytest (Selected File)",
"type": "debugpy",
"request": "launch",
"module": "coverage",
"args": [
"run",
"--source=spark_expectations",
"--omit=spark_expectations/examples/*",
"-m", "pytest",
"-v",
"-x",
"${file}"
],
"console": "integratedTerminal",
"justMyCode": false
}
]
}
Adding Certificates¶
To enable trusted SSL/TLS communication during Spark-Expectations testing, you may need to provide custom Certificate Authority (CA) certificates.
Place any required .crt
files in the spark_expectations/examples/docker_scripts/certs
directory.
During test container startup, all certificates in this folder will be automatically imported into the container’s trusted certificate store, ensuring that your Spark jobs and dependencies can establish secure connections as needed.