Skip to content

Kafka Streaming Custom Configuration

See this example config page for context about the user config: examples page.

Many users will set their streaming configuration to use Databricks or Cerberus secrets. However, if you are running locally or not using Cerberus or Databricks and want to specify the streaming topic name and Kafka bootstrap server you can enable the following custom parameters.

Setup Note

Please note that the specified streaming topic and Kafka bootstrap server have to exist when running Spark Expectations (they will not be generated for you).

Kafka Custom Configuration Parameters

user_config.se_streaming_stats_kafka_custom_config_enable

Master toggle to enable using custom Kafka parameters

user_config.se_streaming_stats_kafka_bootstrap_server

Used to set the Kafka bootstrap server

user_config.se_streaming_stats_topic_name

Used to set the streaming topic name

Defaults

If user_config.se_streaming_stats_kafka_custom_config_enable is set to True but the topic and server options are not specified, the defaults from the spark_expectations/config/spark-expectations-default-config.yaml file will be used.

Configuration Example

from typing import Dict, Union
from spark_expectations.config.user_config import Constants as user_config

stats_streaming_config_dict: Dict[str, Union[bool, str]] = {
    user_config.se_enable_streaming: True,
    user_config.se_streaming_stats_kafka_custom_config_enable: True,
    user_config.se_streaming_stats_topic_name: "dq-sparkexpectations-stats",
    user_config.se_streaming_stats_kafka_bootstrap_server: "localhost:9092",
}