PagerDuty
spark-expectations uses the PagerDuty Events API V2 to create new incidents. A PagerDuty service must already exist before spark-expectations can create incidents for it.
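For orientation, here is a minimal sketch of what a "trigger" event for the Events API V2 looks like. It is not the code spark-expectations runs internally; the helper names `build_trigger_event` and `build_request`, the summary text, and the placeholder key are all hypothetical. The endpoint and the `routing_key` / `event_action` / `payload` fields follow the public Events API V2 format.

```python
import json
import urllib.request

# Public Events API V2 endpoint for sending events.
EVENTS_API_URL = "https://events.pagerduty.com/v2/enqueue"


def build_trigger_event(integration_key, summary, source, severity="critical"):
    """Return the body of an Events API V2 'trigger' event (hypothetical helper)."""
    return {
        "routing_key": integration_key,  # the service's integration key
        "event_action": "trigger",       # 'trigger' opens a new incident
        "payload": {
            "summary": summary,
            "source": source,
            "severity": severity,        # one of: critical, error, warning, info
        },
    }


def build_request(event, url=EVENTS_API_URL):
    """Wrap the event in a JSON POST request without sending it."""
    return urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


event = build_trigger_event(
    "<your-integration-key>",            # placeholder, not a real key
    "spark-expectations job failed",     # illustrative summary text
    "spark-expectations",
)
request = build_request(event)
# urllib.request.urlopen(request) would send the event; it is left out here
# so that running the sketch does not open an incident.
```

Building the request separately from sending it makes it possible to inspect the payload before any incident is created.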
Pre-requisites
By default, PagerDuty notifications (and therefore incident creation) are disabled. To enable them, pass the required user configuration parameters to spark-expectations.
Notification Config Parameters
user_config.se_notifications_enable_pagerduty
Master toggle to enable PagerDuty notifications (this will create incidents for your service!)
PagerDuty Failure-Only Behavior
PagerDuty incidents are only created for critical failure scenarios, regardless of which notification triggers are enabled. This ensures PagerDuty is used appropriately for alerting on issues that require immediate attention, rather than routine status updates or informational notifications.
PagerDuty incidents will be triggered for:
- Job failures (se_notifications_on_fail)
- Error threshold breaches (se_notifications_on_error_drop_exceeds_threshold_breach)
PagerDuty incidents will NOT be triggered for:
- Job start notifications (se_notifications_on_start)
- Job completion notifications (se_notifications_on_completion)
- Rules with 'ignore' action that fail (se_notifications_on_rules_action_if_failed_set_ignore). These are informational only.
Rules with 'ignore' action
When se_notifications_on_rules_action_if_failed_set_ignore is enabled, notifications will be sent to other channels (email, Slack, Teams, etc.) for informational purposes, but PagerDuty incidents will NOT be created. Rules marked with action_if_failed='ignore' are not considered critical failures requiring immediate incident response.
Other notification channels (email, Slack, Teams, etc.) will continue to respect all configured triggers.
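The failure-only rule can be sketched as a small predicate. This is an illustration of the behavior described above, not the library's implementation; the function name `creates_pagerduty_incident` and the `FAILURE_TRIGGERS` set are hypothetical, and only the trigger names come from this page.

```python
# Triggers that this page lists as creating PagerDuty incidents.
FAILURE_TRIGGERS = frozenset({
    "se_notifications_on_fail",
    "se_notifications_on_error_drop_exceeds_threshold_breach",
})


def creates_pagerduty_incident(trigger, pagerduty_enabled, enabled_triggers):
    """Return True only if PagerDuty is on, the trigger is enabled,
    and the trigger is a failure-related one (hypothetical helper)."""
    return bool(
        pagerduty_enabled
        and trigger in enabled_triggers
        and trigger in FAILURE_TRIGGERS
    )


enabled = {
    "se_notifications_on_start",
    "se_notifications_on_fail",
    "se_notifications_on_rules_action_if_failed_set_ignore",
}
for trigger in sorted(enabled):
    print(trigger, creates_pagerduty_incident(trigger, True, enabled))
```

With the three triggers enabled in this example, only se_notifications_on_fail evaluates to True; the start and 'ignore' triggers would still reach the other channels but would not open an incident.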
Notification triggers
These parameters control when notifications are sent during spark-expectations runs. Note: PagerDuty will only create incidents for failure-related triggers (when enabled).
- user_config.se_notifications_enable_pagerduty
- user_config.se_notifications_on_start
- user_config.se_notifications_on_completion
- user_config.se_notifications_on_fail
- user_config.se_notifications_on_error_drop_exceeds_threshold_breach
- user_config.se_notifications_on_rules_action_if_failed_set_ignore
- user_config.se_notifications_on_error_drop_threshold
PagerDuty Configs
Additional configuration needed to create incidents with spark-expectations.
- user_config.se_notifications_pagerduty_integration_key
- user_config.se_notifications_pagerduty_webhook_url
User Configuration Example
Show example user configuration
Links to example notebooks
An example notebook that sets up PagerDuty is available here.
This notebook will:
- Retrieve the integration key using the Databricks secret manager (default)
    - An option to use Cerberus Secrets Manager is present but commented out. Uncomment it if you would like to use this method instead.
- Configure spark-expectations
- Load sample data and then run some validation rules.
If everything has been configured correctly, this will create a new incident based on the triggers you have enabled.
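The key-retrieval step of the notebook can be sketched as follows. This is not the notebook's code: the function name `get_integration_key`, the scope and key names, and the environment-variable fallback are hypothetical. The only external call assumed is `dbutils.secrets.get(scope=..., key=...)`, which Databricks notebooks provide through the built-in `dbutils` object.

```python
import os


def get_integration_key(
    dbutils=None,
    scope="my-secret-scope",              # hypothetical scope name
    key="pagerduty-integration-key",      # hypothetical secret name
    env_var="PAGERDUTY_INTEGRATION_KEY",  # hypothetical fallback variable
):
    """Read the PagerDuty integration key from Databricks secrets when a
    dbutils object is available, otherwise from an environment variable."""
    if dbutils is not None:
        return dbutils.secrets.get(scope=scope, key=key)
    value = os.environ.get(env_var)
    if not value:
        raise RuntimeError(
            f"No dbutils object was passed and {env_var} is not set"
        )
    return value
```

In a Databricks notebook the call would look like `get_integration_key(dbutils)`; the returned value would then be assigned to user_config.se_notifications_pagerduty_integration_key in the user configuration.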