Brickflow Projects

Prerequisites

  1. Install Locally (optional):

    1. Python >= 3.8
  2. Configure the Databricks CLI config file: install the databricks-cli package, then run databricks configure -t to set up token-based authentication.

    pip install databricks-cli
    databricks configure -t
    
  3. Install the brickflow CLI:

    pip install brickflows
    

Confirming the installation

  • To confirm the setup run the following command:

    bf --help
    
  • Also confirm connectivity to Databricks:

    databricks workspace list /
    
    or, if you use a specific profile:

    databricks workspace list /  --profile <profile>
    

Brickflow Projects Setup

Brickflow introduced projects in version 0.9.2 for managing monorepos that contain multiple projects or workflows which need to be deployed in groups. Projects help with the following:

  1. Managing state files and simplifying deployment.
  2. Managing cleanup of state.
  3. Helping the framework resolve imports for the Python modules in your repo.

Concepts

  1. Project - A project is a collection of workflows that are deployed together. Concretely, it is a folder with an entrypoint and a set of workflows.
  2. Workflow - A workflow is a collection of tasks that are deployed together; tasks can be DLT pipelines, notebooks, wheels, jars, etc.

Monorepo Style

A monorepo style project is a single repository containing multiple folders and modules, which in turn can contain multiple brickflow projects.

Folder structure:

repo-root/
├── .git
├── projects/
│   ├── project_abc/
│   │   ├── lib/
│   │   │   ├── __init__.py
│   │   │   └── shared_functions.py
│   │   ├── workflows/
│   │   │   ├── __init__.py
│   │   │   ├── entrypoint.py
│   │   │   └── workflow_abc.py
│   │   ├── setup.py
│   │   └── .brickflow-project-root.yml
│   └── project_xyz/
│       ├── workflows_geo_b/
│       │   ├── entrypoint.py
│       │   └── workflow_xyz.py
│       ├── workflows_geo_a/
│       │   ├── entrypoint.py
│       │   └── workflow_xyz.py
│       └── .brickflow-project-root.yml
├── .gitignore
├── brickflow-multi-project.yml
└── README.md
  1. entrypoint.py: This is the entrypoint for your project; it is the file used to identify all the workflows to be deployed.
  2. brickflow-multi-project.yml: This is the project file generated by brickflow. It contains the list of projects and the relative path to each project root config. It is created in the git repository root (where your .git folder is).

Example for monorepo with multiple projects:

```yaml
project_roots:
  project_abc:
    root_yaml_rel_path: projects/project_abc
  project_xyz_geo_a:
    root_yaml_rel_path: projects/project_xyz
  project_xyz_geo_b:
    root_yaml_rel_path: projects/project_xyz
version: v1
```
  3. .brickflow-project-root.yml: This is the project root config file. It contains the list of projects rooted at this path and, for each, the path to its workflows directory.

Example for monorepo with multiple projects for repo-root/projects/project_xyz/.brickflow-project-root.yml:

```yaml
# DO NOT MODIFY THIS FILE - IT IS AUTO GENERATED BY BRICKFLOW AND RESERVED FOR FUTURE USAGE
projects:
  project_xyz_geo_a:
    brickflow_version: auto # automatically determine the brickflow version based on cli version
    deployment_mode: bundle
    name: project_xyz_geo_a
    path_from_repo_root_to_project_root: projects/project_xyz # path from the repo root (where your .git folder is) to the project root
    path_project_root_to_workflows_dir: workflows_geo_a
  project_xyz_geo_b:
    brickflow_version: auto  # automatically determine the brickflow version based on cli version
    deployment_mode: bundle
    name: project_xyz_geo_b
    path_from_repo_root_to_project_root: projects/project_xyz
    path_project_root_to_workflows_dir: workflows_geo_b
version: v1
```

The important fields are:

  • path_from_repo_root_to_project_root: The path from the repo root (where your .git folder is) to the project root. It is used to find the entrypoint file.
  • path_project_root_to_workflows_dir: The path from the project root to the workflows directory. It is used to find and load modules into Python.
    • This is what makes the imports in your notebooks work.
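As a sketch of how these two fields compose, using the project_xyz_geo_a values from the example above (the resolution logic shown is a simplified assumption for illustration, not brickflow's actual implementation):

```python
from pathlib import Path

# Field values taken from the project_xyz_geo_a example above.
path_from_repo_root_to_project_root = "projects/project_xyz"
path_project_root_to_workflows_dir = "workflows_geo_a"

repo_root = Path("repo-root")  # where your .git folder is
project_root = repo_root / path_from_repo_root_to_project_root
workflows_dir = project_root / path_project_root_to_workflows_dir

# The entrypoint is expected inside the workflows directory.
entrypoint = workflows_dir / "entrypoint.py"
print(entrypoint.as_posix())
# repo-root/projects/project_xyz/workflows_geo_a/entrypoint.py
```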

Polyrepo Style

A polyrepo style setup splits projects across multiple repositories; each repository has its own project root, which can still contain multiple brickflow projects.

Folder structure

repo-root/
├── .git
├── src/
│   ├── lib/
│   │   ├── __init__.py
│   │   └── shared_functions.py
│   ├── workflows_a/
│   │   ├── __init__.py
│   │   ├── entrypoint.py
│   │   └── workflow_a.py
│   ├── workflows_b/
│   │   ├── __init__.py
│   │   ├── entrypoint.py
│   │   └── workflow_b.py
│   └── __init__.py
├── .gitignore
├── .brickflow-project-root.yml
├── brickflow-multi-project.yml
└── README.md
  1. entrypoint.py: This is the entrypoint for your project; it is the file used to identify all the workflows to be deployed.
  2. brickflow-multi-project.yml: This is the project file generated by brickflow. It contains the list of projects and the relative path to each project root config. It is created in the git repository root (where your .git folder is).

Example for polyrepo with multiple projects:

```yaml
project_roots:
  project_abc:
    root_yaml_rel_path: .
  project_abc_workflows_2:
    root_yaml_rel_path: .
  project_xyz:
    root_yaml_rel_path: .
version: v1
```
  3. .brickflow-project-root.yml: This is the project root config file. It contains the list of projects rooted at this path and, for each, the path to its workflows directory.

Example for polyrepo with multiple projects:

```yaml
# DO NOT MODIFY THIS FILE - IT IS AUTO GENERATED BY BRICKFLOW AND RESERVED FOR FUTURE USAGE
projects:
  project_abc:
    brickflow_version: auto # automatically determine the brickflow version based on cli version
    deployment_mode: bundle
    name: project_abc
    path_from_repo_root_to_project_root: . # path from the repo root (where your .git folder is) to the project root
    path_project_root_to_workflows_dir: workflows
  project_abc_workflows_2:
    brickflow_version: auto  # automatically determine the brickflow version based on cli version
    deployment_mode: bundle
    name: project_abc_workflows_2
    path_from_repo_root_to_project_root: .
    path_project_root_to_workflows_dir: workflows2
version: v1
```

The important fields are:

* path_from_repo_root_to_project_root: The path from the repo root (where your .git folder is) to the project root. It is used to find the entrypoint file.
* path_project_root_to_workflows_dir: The path from the project root to the workflows directory. It is used to find and load modules into Python.
    * This is what makes the imports in your notebooks work.
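One way to picture why this matters for imports (a simplified assumption about the mechanism, not brickflow's actual code): once the project root is on sys.path, packages under it, such as workflows_a or lib in the tree above, resolve with ordinary Python imports regardless of where the interpreter was started.

```python
import sys
from pathlib import Path

# Simplified sketch: placing the project root on sys.path is what lets
# statements like `import workflows_a.workflow_a` resolve. Here the
# project root is the current directory, matching
# path_from_repo_root_to_project_root: "." in the polyrepo example.
project_root = Path(".").resolve()
if str(project_root) not in sys.path:
    sys.path.insert(0, str(project_root))

print(str(project_root) in sys.path)  # True
```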

Initialize Project

The first step is to create a new project.

Warning

Make sure you are in the repository root (where your .git folder is) when you do this! Otherwise you will run into validation issues.

Note

Please note that if you are an advanced user and understand both files described above, you can manually create the files that bf projects add generates.

  1. Run the following command:
    bf projects add
    
  2. Update your .gitignore file with the correct directories to ignore. .databricks and bundle.yml should be ignored.

  3. It will prompt you for the:

Project Name: # (1)!
Path from repo root to project root (optional) [.]: # (2)!
Path from project root to workflows dir: # (3)!
Git https url: # (4)!
Brickflow version [auto]: # (5)!
Spark expectations version [0.8.0]: # (6)!  
Skip entrypoint [y/N]: # (7)!
  1. A name that is not already in use; please use only alphanumeric characters.
  2. If you have a polyrepo, leave this as the default (.). See the polyrepo and monorepo sections above for guidance.
  3. See the polyrepo and monorepo sections above for guidance.
  4. Used to populate the entrypoint and for deployment to higher environments.
  5. auto, or hard-code a specific version to be shipped with the project during deployment.
  6. Only set this if you want to use spark expectations. Visit spark-expectations for more information.
  7. If you already have an entrypoint in that folder, you can skip this step.
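For example, a filled-in prompt for the monorepo layout above might look like this (all values are illustrative):

```
Project Name: project_abc
Path from repo root to project root (optional) [.]: projects/project_abc
Path from project root to workflows dir: workflows
Git https url: https://github.com/example-org/example-repo.git
Brickflow version [auto]: auto
Spark expectations version [0.8.0]: 0.8.0
Skip entrypoint [y/N]: N
```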

Validating your project

  • To test your configuration run the following command:

    bf projects synth --project <project_name> --profile <profile> # profile is optional; it is your databricks profile
    
  • This will generate the following output at the end:

    SUCCESSFULLY SYNTHESIZED BUNDLE.YML FOR PROJECT: <project_name>
    
  • This should create a bundle.yml file in your project root and it should contain all the information for your workflow.

  • Anything else would indicate an error.

gitignore

  • For now, all bundle.yml files are code generated, so you can add the following to your .gitignore file:

    **/bundle.yml
    

Deploying your Project

  • To deploy the workflow, run the following command:

    bf projects deploy --project <project> -p <profile> --force-acquire-lock # force acquire lock is optional
    

By default this will deploy to the local environment.

Important

Keep in mind that environments are logical, your profile controls where the workflows are deployed and your code may have business logic based on which environment you are on.
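As an illustration of environment-based business logic in your workflow code (the BRICKFLOW_ENV variable name is an assumption to verify against your brickflow version, and the catalog names are made up):

```python
import os

# The logical environment; brickflow is assumed to expose it via the
# BRICKFLOW_ENV environment variable (verify for your version).
env = os.environ.get("BRICKFLOW_ENV", "local")

# Hypothetical example: route reads to an environment-specific catalog.
catalog = "prod_catalog" if env == "prod" else f"{env}_catalog"
print(f"{catalog}.sales")
```

The same code deploys everywhere; only the environment flag (and your profile) decides what it targets.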

If you want to deploy to a higher environment you can use the following command:

  • dev:

    bf projects deploy --project <project> -p <profile> -e dev --force-acquire-lock # force acquire lock is optional
    
  • test:

    bf projects deploy --project <project> -p <profile> -e test --force-acquire-lock # force acquire lock is optional
    
  • prod:

    bf projects deploy --project <project> -p <profile> -e prod --force-acquire-lock # force acquire lock is optional
    

Deployments By Release Candidates or PRs

Sometimes you may want to deploy multiple RC branches into the same "test" environment. Your objective will be to:

  1. Deploy the workflows
  2. Run and test the workflows
  3. Destroy the workflows after confirming the tests pass

To do this you can use the BRICKFLOW_WORKFLOW_PREFIX and BRICKFLOW_WORKFLOW_SUFFIX environment variables.

  • Doing it based on release candidates:

    BRICKFLOW_WORKFLOW_SUFFIX="0.1.0-rc1" bf projects deploy --project <project> -p <profile> -e test --force-acquire-lock # force acquire lock is optional
    
  • Doing it based on PRs:

    BRICKFLOW_WORKFLOW_SUFFIX="0.1.0-pr34" bf projects deploy --project <project> -p <profile> -e test --force-acquire-lock # force acquire lock is optional
    

When using a prefix or suffix, make sure you also destroy those deployments when you are done; each prefix/suffix combination is an independent deployment with its own state.

  • Doing it based on release candidates:

    BRICKFLOW_WORKFLOW_SUFFIX="0.1.0-rc1" bf projects destroy --project <project> -p <profile> -e test --force-acquire-lock # force acquire lock is optional
    
  • Doing it based on PRs:

    BRICKFLOW_WORKFLOW_SUFFIX="0.1.0-pr34" bf projects destroy --project <project> -p <profile> -e test --force-acquire-lock # force acquire lock is optional
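A sketch of how the prefix and suffix are assumed to affect deployed workflow names (the exact concatenation and separator are assumptions; check the names in your workspace after deploying):

```python
import os

# Hypothetical illustration of BRICKFLOW_WORKFLOW_PREFIX/SUFFIX: each
# prefix/suffix combination yields distinct workflow names, which is why
# each combination is an independent deployment with its own state.
os.environ["BRICKFLOW_WORKFLOW_SUFFIX"] = "0.1.0-rc1"

def deployed_name(base_name: str) -> str:
    prefix = os.environ.get("BRICKFLOW_WORKFLOW_PREFIX", "")
    suffix = os.environ.get("BRICKFLOW_WORKFLOW_SUFFIX", "")
    return f"{prefix}{base_name}{suffix}"

print(deployed_name("workflow_abc"))  # workflow_abc0.1.0-rc1
```

Because the names differ per suffix, destroying with the same suffix you deployed with is what removes the right set of workflows.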

Destroying your project

  • To destroy the workflow, run the following command:

    bf projects destroy --project <project> -p <profile> --force-acquire-lock # force acquire lock is optional