Classes¶
spark_expectations.core.context.SparkExpectationsContext
dataclass
¶
This class provides the context for SparkExpectations
Attributes¶
get_agg_dq_rule_type_name: str
property
¶
This function is used to get aggregation data quality rule type name
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
Returns _agg_dq_rule_type_name" |
get_cerberus_cred_path: str
property
¶
This functions implemented to return cerberus credentials path Returns:
get_cerberus_token: str
property
¶
This functions implemented to return cerberus token Returns:
get_cerberus_url: str
property
¶
This functions implemented to return cerberus url Returns:
get_client_id: Optional[str]
property
¶
This function helps in getting key / path for client id Returns: client id key / path in Optional[str]
get_config_file_path: str
property
¶
This function returns config file abs path Returns: str: Returns _config_file_path(str)
get_debugger_mode: bool
property
¶
This function returns a debugger Returns: bool: return debugger
get_dq_run_status: str
property
¶
This function is used to get data quality pipeline status
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
Returns _dq_status" |
get_dq_run_time: float
property
¶
This function implements time diff for dq run Returns: float: time in float
get_dq_stats_table_name: str
property
¶
Get dq_stats_table_name to which the final stats of the dq job will be written into
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
returns the dq_stats_table_name |
get_enable_mail: bool
property
¶
This function return whether mail notification to enable or not Returns: str: Returns _enable_mail(bool)
get_enable_slack: bool
property
¶
This function returns whether to enable slack notification or not Returns: Returns _enable_slack(bool)
get_env: Optional[str]
property
¶
functions returns running environment type Returns: str: Returns _env
get_error_count: int
property
¶
This functions return error count Returns: int: Returns _error_count(int)
get_error_drop_percentage: float
property
¶
This function returns error drop percentage percentage Returns: float: error drop percentage
get_error_drop_threshold: int
property
¶
This function return error threshold breach Returns: int: error threshold breach
get_error_percentage: float
property
¶
This function returns error percentage Returns: float: error percentage
get_error_table_name: str
property
¶
Get dq_stats_table_name to which the final stats of the dq job will be written into
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
returns the dq_stats_table_name |
get_final_agg_dq_result: Optional[List[Dict[str, str]]]
property
¶
This function return status of the final_agg_dq_result Returns: dict: Returns final_agg_dq_result which in list of dict with str(key) and str(value)
get_final_agg_dq_run_time: float
property
¶
This function implements time diff for final agg dq run Returns: float: time in float
get_final_agg_dq_status: str
property
¶
This function is used to get final aggregation data quality status
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
Returns _final_agg_dq_status" |
get_final_query_dq_result: Optional[List[Dict[str, str]]]
property
¶
This function return status of the final_query_dq_result Returns: dict: Returns final_query_dq_result which in list of dict with str(key) and str(value)
get_final_query_dq_run_time: float
property
¶
This function implements time diff for final query dq run Returns: float: time in float
get_final_query_dq_status: str
property
¶
This function is used to get final query dq data quality status
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
Returns _final_query_dq_status" |
get_final_table_name: str
property
¶
Get dq_stats_table_name to which the final stats of the dq job will be written into
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
returns the dq_stats_table_name |
get_input_count: int
property
¶
This function return input count Returns: int: Returns _input_count(int)
get_mail_from: str
property
¶
This function returns mail id to send email Returns:
get_mail_smtp_port: int
property
¶
This functions returns smtp port Returns: int: returns _mail_smtp_server port
get_mail_smtp_server: str
property
¶
This functions returns smtp server host Returns: str: returns _mail_smtp_server
get_mail_subject: str
property
¶
This function returns mail subject Returns: str: Returns _mail_subject(str)
get_notification_on_completion: bool
property
¶
This function returns notification on completion Returns: bool: Returns _notification_on_completion
get_notification_on_fail: bool
property
¶
This function returns notification on fail Returns: bool: Returns _notification_on_fail
get_notification_on_start: bool
property
¶
This function returns notification on start Returns: bool: Returns _notification_on_start
get_num_agg_dq_rules: dict
property
¶
This function returns number agg dq rules applied for batch run Returns: int: number of rules in int
get_num_dq_rules: int
property
¶
This function returns number dq rules applied for batch run Returns: int: number of rules in int
get_num_query_dq_rules: dict
property
¶
This function returns number query dq rules applied for batch run Returns: int: number of rules in int
get_num_row_dq_rules: int
property
¶
This function returns number row dq rules applied for batch run Returns: int: number of rules in int
get_output_count: int
property
¶
This function returns output count Returns: int: Returns _output(int)
get_output_percentage: float
property
¶
This function return output percentage Returns: float: output percentage
get_query_dq_rule_type_name: str
property
¶
This function is used to get query data quality rule type name
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
Returns _query_dq_rule_type_name" |
get_row_dq_rule_type_name: str
property
¶
This function is used to get row data quality rule type name
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
Returns _row_dq_rule_type_name" |
get_row_dq_run_time: float
property
¶
This function implements time diff for row dq run Returns: float: time in float
get_row_dq_status: str
property
¶
This function is used to get row data quality status
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
Returns _row_dq_status" |
get_rules_exceeds_threshold: Optional[List[dict]]
property
¶
This function returns error percentage for each rule
get_run_date: str
property
¶
Get run_date for the instance of the spark-expectations class
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
returns the run_date |
get_run_date_name: str
property
¶
This function returns name for the run_date column Returns: str: name of run_date in str
get_run_date_time_name: str
property
¶
This function returns name for the run_date_time column Returns: str: name of run_date_time in str
get_run_id: str
property
¶
Get run_id for the instance of spark-expectations class
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
returns the run_id |
get_run_id_name: str
property
¶
This function returns name for the run_id column Returns: str: name of run_id in str
get_se_streaming_stats_dict: Dict[str, str]
property
¶
This function returns secret keys dict
get_se_streaming_stats_topic_name: str
property
¶
This function returns kafka topic name Returns: str: Returns _se_streaming_stats_topic_name
get_secret_type: Optional[str]
property
¶
This function helps in getting secret type Returns: secret type in Optional[str]
get_server_url_key: Optional[str]
property
¶
This function helps in getting key / path for kafka server url Returns: kafka server url key / path in Optional[str]
get_slack_webhook_url: str
property
¶
This function returns sack webhook url Returns: str: Returns _webhook_url(str)
get_source_agg_dq_result: Optional[List[Dict[str, str]]]
property
¶
This function return status of the source_agg_dq_result Returns: dict: Returns source_agg_dq_result which in list of dict with str(key) and str(value)
get_source_agg_dq_run_time: float
property
¶
This function implements time diff for source agg dq run Returns: float: time in float
get_source_agg_dq_status: str
property
¶
This function is used to get source aggregation data quality status
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
Returns _source_agg_dq_status" |
get_source_query_dq_result: Optional[List[Dict[str, str]]]
property
¶
This function return status of the source_query_dq_result Returns: dict: Returns source_query_dq_result which in list of dict with str(key) and str(value)
get_source_query_dq_run_time: float
property
¶
This function implements time diff for source query dq run Returns: float: time in float
get_source_query_dq_status: str
property
¶
This function is used to get source query data quality status
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
Returns _source_query_dq_status" |
get_stats_table_writer_config: dict
property
¶
This function returns stats table writer config Returns: dict: Returns stats_table_writer_config which in dict
get_success_percentage: float
property
¶
This function returns success percentage Returns: float: success percentage
get_summarised_row_dq_res: Optional[List[Dict[str, str]]]
property
¶
This function returns row dq summarised res Returns: list(dict): Returns summarised_row_dq_res which in list of dict with str(key) and str(value) of rule meta data
get_supported_df_query_dq: DataFrame
property
¶
This function returns the place holder dataframe for query check Returns: DataFrame: returns dataframe for query dq
get_table_name: str
property
¶
This function returns table name Returns: str: Returns _table_name(str)
get_target_and_error_table_writer_config: dict
property
¶
This function returns target and error table writer config Returns: dict: Returns target_and_error_table_writer_config which in dict
get_to_mail: str
property
¶
This function returns list of mail id's Returns: str: Returns _mail_id(str)
get_token: Optional[str]
property
¶
This function helps in getting key / path for token Returns: token key / path in Optional[str]
get_token_endpoint_url: Optional[str]
property
¶
This function helps in getting key / path for end point url Returns: end point url key / path in Optional[str]
get_topic_name: Optional[str]
property
¶
This function helps in getting key / path for topic name Returns: topic name key / path in Optional[str]
product_id: str
instance-attribute
¶
spark: SparkSession
instance-attribute
¶
Functions¶
get_time_diff(start_time: Optional[datetime], end_time: Optional[datetime]) -> float
¶
This function implements time diff Args: start_time: end_time:
Returns:
Source code in spark_expectations/core/context.py
print_dataframe_with_debugger(df: DataFrame) -> None
¶
This function has a debugger that can print out the DataFrame Returns:
reset_num_agg_dq_rules() -> None
¶
This function used to reset the_num_agg_dq_rules Returns: None
Source code in spark_expectations/core/context.py
reset_num_dq_rules() -> None
¶
reset_num_query_dq_rules() -> None
¶
This function used to rest the _num_query_dq_rules Returns: None
Source code in spark_expectations/core/context.py
reset_num_row_dq_rules() -> None
¶
This function used to reset the _num_row_dq_rules Returns: None
set_debugger_mode(debugger_mode: bool) -> None
¶
set_dq_end_time() -> None
¶
set_dq_run_status(dq_run_status: str = 'Failed') -> None
¶
set_dq_start_time() -> None
¶
set_dq_stats_table_name(dq_stats_table_name: str) -> None
¶
set_enable_mail(enable_mail: bool) -> None
¶
set_enable_slack(enable_slack: bool) -> None
¶
set_end_time_when_dq_job_fails() -> None
¶
function used to set end time when job fails in any one of the stages by using start time Returns:
Source code in spark_expectations/core/context.py
set_env(env: Optional[str]) -> None
¶
Parameters:
Name | Type | Description | Default |
---|---|---|---|
env |
Optional[str]
|
which accepts env type |
required |
Returns:
Type | Description |
---|---|
None
|
None |
set_error_count(error_count: int = 0) -> None
¶
set_error_drop_threshold(error_drop_threshold: int) -> None
¶
set_error_table_name(error_table_name: str) -> None
¶
set_final_agg_dq_end_time() -> None
¶
This function sets end time final agg dq computation Returns: None
set_final_agg_dq_result(final_agg_dq_result: Optional[List[Dict[str, str]]] = None) -> None
¶
set_final_agg_dq_start_time() -> None
¶
This function sets start time final agg dq computation Returns: None
set_final_agg_dq_status(final_agg_dq_status: str = 'Skipped') -> None
¶
set_final_query_dq_end_time() -> None
¶
This function sets end time final query dq computation Returns: None
set_final_query_dq_result(final_query_dq_result: Optional[List[Dict[str, str]]] = None) -> None
¶
set_final_query_dq_start_time() -> None
¶
This function sets start time final query dq computation Returns: None
set_final_query_dq_status(final_query_dq_status: str = 'Skipped') -> None
¶
set_final_table_name(final_table_name: str) -> None
¶
set_input_count(input_count: int = 0) -> None
¶
set_mail_from(mail_from: str) -> None
¶
set_mail_smtp_port(mail_smtp_port: int) -> None
¶
set_mail_smtp_server(mail_smtp_server: str) -> None
¶
set_mail_subject(mail_subject: str) -> None
¶
set_notification_on_completion(notification_on_completion: bool) -> None
¶
set_notification_on_fail(notification_on_fail: bool) -> None
¶
set_notification_on_start(notification_on_start: bool) -> None
¶
set_num_agg_dq_rules(source_agg_enabled: bool = False, final_agg_enabled: bool = False) -> None
¶
This function sets number of applied agg dq rules for batch run source_agg_enabled: Marked True when agg rules set for source, by default False final_agg_enabled: Marked True when agg rules set for final, by default False Returns: None
Source code in spark_expectations/core/context.py
set_num_query_dq_rules(source_query_enabled: bool = False, final_query_enabled: bool = False) -> None
¶
This function sets number of applied query dq rules for batch run source_query_enabled: Marked True when query rules set for source, by default False final_query_enabled: Marked True when query rules set for final, by default False Returns: None
Source code in spark_expectations/core/context.py
set_num_row_dq_rules() -> None
¶
This function sets number of applied row dq rules for batch run Returns: None
set_output_count(output_count: int = 0) -> None
¶
set_row_dq_end_time() -> None
¶
set_row_dq_start_time() -> None
¶
set_row_dq_status(row_dq_status: str = 'Skipped') -> None
¶
set_rules_exceeds_threshold(rules: Optional[List[dict]] = None) -> None
¶
This function implements error percentage for each rule type
set_run_date() -> str
staticmethod
¶
This function is used to generate the current datatime in UTC
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
Returns the current utc datatime in the format - "%Y-%m-%d %H:%M:%S" |
Source code in spark_expectations/core/context.py
set_se_streaming_stats_dict(se_streaming_stats_dict: Dict[str, str]) -> None
¶
This function helps to set secret keys dict
set_se_streaming_stats_topic_name(se_streaming_stats_topic_name: str) -> None
¶
set_slack_webhook_url(slack_webhook_url: str) -> None
¶
set_source_agg_dq_end_time() -> None
¶
This function sets end time source agg dq computation Returns: None
set_source_agg_dq_result(source_agg_dq_result: Optional[List[Dict[str, str]]] = None) -> None
¶
set_source_agg_dq_start_time() -> None
¶
This function sets start time source agg dq computation Returns: None
set_source_agg_dq_status(source_agg_dq_status: str = 'Skipped') -> None
¶
set_source_query_dq_end_time() -> None
¶
This function sets end time source query dq computation Returns: None
set_source_query_dq_result(source_query_dq_result: Optional[List[Dict[str, str]]] = None) -> None
¶
set_source_query_dq_start_time() -> None
¶
This function sets start time source query dq computation Returns: None
set_source_query_dq_status(source_query_dq_status: str = 'Skipped') -> None
¶
set_stats_table_writer_config(config: dict) -> None
¶
This function sets stats table writer config Args: config: dict Returns: None
set_summarised_row_dq_res(summarised_row_dq_res: Optional[List[Dict[str, str]]] = None) -> None
¶
This function implements or supports to set row dq summarised res Args: summarised_row_dq_res: list(dict) Returns: None
Source code in spark_expectations/core/context.py
set_supported_df_query_dq() -> DataFrame
¶
set_table_name(table_name: str) -> None
¶
set_target_and_error_table_writer_config(config: dict) -> None
¶
This function sets target and error table writer config Args: config: dict Returns: None