Classes¶
spark_expectations.core.context.SparkExpectationsContext
dataclass
¶
This class provides the context for SparkExpectations
Attributes¶
get_agg_dq_rule_type_name: str
property
¶
This function is used to get aggregation data quality rule type name
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
Returns _agg_dq_rule_type_name" |
get_cerberus_cred_path: str
property
¶
This functions implemented to return cerberus credentials path
get_cerberus_token: str
property
¶
This functions implemented to return cerberus token
get_cerberus_url: str
property
¶
This functions implemented to return cerberus url
get_client_id: Optional[str]
property
¶
This function helps in getting key / path for client id
Returns:
Type | Description |
---|---|
Optional[str]
|
client id key / path in Optional[str] |
get_config_file_path: str
property
¶
This function returns config file abs path
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
Returns _config_file_path(str) |
get_debugger_mode: bool
property
¶
This function returns a debugger
Returns:
Name | Type | Description |
---|---|---|
bool |
bool
|
return debugger |
get_dq_run_status: str
property
¶
This function is used to get data quality pipeline status
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
Returns _dq_status" |
get_dq_run_time: float
property
¶
This function implements time diff for dq run
Returns:
Name | Type | Description |
---|---|---|
float |
float
|
time in float |
get_dq_stats_table_name: str
property
¶
Get dq_stats_table_name to which the final stats of the dq job will be written into
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
returns the dq_stats_table_name |
get_enable_mail: bool
property
¶
This function return whether mail notification to enable or not
Returns:
Name | Type | Description |
---|---|---|
str |
bool
|
Returns _enable_mail(bool) |
get_enable_slack: bool
property
¶
This function returns whether to enable slack notification or not
get_env: Optional[str]
property
¶
functions returns running environment type
Returns:
Name | Type | Description |
---|---|---|
str |
Optional[str]
|
Returns _env |
get_error_count: int
property
¶
This functions return error count
Returns:
Name | Type | Description |
---|---|---|
int |
int
|
Returns _error_count(int) |
get_error_drop_percentage: float
property
¶
This function returns error drop percentage percentage
Returns:
Name | Type | Description |
---|---|---|
float |
float
|
error drop percentage |
get_error_drop_threshold: int
property
¶
This function return error threshold breach
Returns:
Name | Type | Description |
---|---|---|
int |
int
|
error threshold breach |
get_error_percentage: float
property
¶
This function returns error percentage
Returns:
Name | Type | Description |
---|---|---|
float |
float
|
error percentage |
get_error_table_name: str
property
¶
Get dq_stats_table_name to which the final stats of the dq job will be written into
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
returns the dq_stats_table_name |
get_final_agg_dq_result: Optional[List[Dict[str, str]]]
property
¶
This function return status of the final_agg_dq_result
Returns:
Name | Type | Description |
---|---|---|
dict |
Optional[List[Dict[str, str]]]
|
Returns final_agg_dq_result which in list of dict with str(key) and str(value) |
get_final_agg_dq_run_time: float
property
¶
This function implements time diff for final agg dq run
Returns:
Name | Type | Description |
---|---|---|
float |
float
|
time in float |
get_final_agg_dq_status: str
property
¶
This function is used to get final aggregation data quality status
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
Returns _final_agg_dq_status" |
get_final_query_dq_result: Optional[List[Dict[str, str]]]
property
¶
This function return status of the final_query_dq_result
Returns:
Name | Type | Description |
---|---|---|
dict |
Optional[List[Dict[str, str]]]
|
Returns final_query_dq_result which in list of dict with str(key) and str(value) |
get_final_query_dq_run_time: float
property
¶
This function implements time diff for final query dq run
Returns:
Name | Type | Description |
---|---|---|
float |
float
|
time in float |
get_final_query_dq_status: str
property
¶
This function is used to get final query dq data quality status
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
Returns _final_query_dq_status" |
get_final_table_name: str
property
¶
Get dq_stats_table_name to which the final stats of the dq job will be written into
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
returns the dq_stats_table_name |
get_input_count: int
property
¶
This function return input count
Returns:
Name | Type | Description |
---|---|---|
int |
int
|
Returns _input_count(int) |
get_mail_from: str
property
¶
This function returns mail id to send email
get_mail_smtp_port: int
property
¶
This functions returns smtp port
Returns:
Name | Type | Description |
---|---|---|
int |
int
|
returns _mail_smtp_server port |
get_mail_smtp_server: str
property
¶
This functions returns smtp server host
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
returns _mail_smtp_server |
get_mail_subject: str
property
¶
This function returns mail subject
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
Returns _mail_subject(str) |
get_notification_on_completion: bool
property
¶
This function returns notification on completion
Returns:
Name | Type | Description |
---|---|---|
bool |
bool
|
Returns _notification_on_completion |
get_notification_on_fail: bool
property
¶
This function returns notification on fail
Returns:
Name | Type | Description |
---|---|---|
bool |
bool
|
Returns _notification_on_fail |
get_notification_on_start: bool
property
¶
This function returns notification on start
Returns:
Name | Type | Description |
---|---|---|
bool |
bool
|
Returns _notification_on_start |
get_num_agg_dq_rules: dict
property
¶
This function returns number agg dq rules applied for batch run
Returns:
Name | Type | Description |
---|---|---|
int |
dict
|
number of rules in int |
get_num_dq_rules: int
property
¶
This function returns number dq rules applied for batch run
Returns:
Name | Type | Description |
---|---|---|
int |
int
|
number of rules in int |
get_num_query_dq_rules: dict
property
¶
This function returns number query dq rules applied for batch run
Returns:
Name | Type | Description |
---|---|---|
int |
dict
|
number of rules in int |
get_num_row_dq_rules: int
property
¶
This function returns number row dq rules applied for batch run
Returns:
Name | Type | Description |
---|---|---|
int |
int
|
number of rules in int |
get_output_count: int
property
¶
This function returns output count
Returns:
Name | Type | Description |
---|---|---|
int |
int
|
Returns _output(int) |
get_output_percentage: float
property
¶
This function return output percentage
Returns:
Name | Type | Description |
---|---|---|
float |
float
|
output percentage |
get_query_dq_rule_type_name: str
property
¶
This function is used to get query data quality rule type name
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
Returns _query_dq_rule_type_name" |
get_row_dq_rule_type_name: str
property
¶
This function is used to get row data quality rule type name
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
Returns _row_dq_rule_type_name" |
get_row_dq_run_time: float
property
¶
This function implements time diff for row dq run
Returns:
Name | Type | Description |
---|---|---|
float |
float
|
time in float |
get_row_dq_status: str
property
¶
This function is used to get row data quality status
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
Returns _row_dq_status" |
get_rules_exceeds_threshold: Optional[List[dict]]
property
¶
This function returns error percentage for each rule
get_run_date: str
property
¶
Get run_date for the instance of the spark-expectations class
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
returns the run_date |
get_run_date_name: str
property
¶
This function returns name for the run_date column
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
name of run_date in str |
get_run_date_time_name: str
property
¶
This function returns name for the run_date_time column
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
name of run_date_time in str |
get_run_id: str
property
¶
Get run_id for the instance of spark-expectations class
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
returns the run_id |
get_run_id_name: str
property
¶
This function returns name for the run_id column
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
name of run_id in str |
get_se_streaming_stats_dict: Dict[str, str]
property
¶
This function returns secret keys dict
get_se_streaming_stats_topic_name: str
property
¶
This function returns nsp topic name
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
Returns _se_streaming_stats_topic_name |
get_secret_type: Optional[str]
property
¶
This function helps in getting secret type
Returns:
Type | Description |
---|---|
Optional[str]
|
secret type in Optional[str] |
get_server_url_key: Optional[str]
property
¶
This function helps in getting key / path for kafka server url Returns: kafka server url key / path in Optional[str]
get_slack_webhook_url: str
property
¶
This function returns sack webhook url
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
Returns _webhook_url(str) |
get_source_agg_dq_result: Optional[List[Dict[str, str]]]
property
¶
This function return status of the source_agg_dq_result
Returns:
Name | Type | Description |
---|---|---|
dict |
Optional[List[Dict[str, str]]]
|
Returns source_agg_dq_result which in list of dict with str(key) and str(value) |
get_source_agg_dq_run_time: float
property
¶
This function implements time diff for source agg dq run
Returns:
Name | Type | Description |
---|---|---|
float |
float
|
time in float |
get_source_agg_dq_status: str
property
¶
This function is used to get source aggregation data quality status
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
Returns _source_agg_dq_status" |
get_source_query_dq_result: Optional[List[Dict[str, str]]]
property
¶
This function return status of the source_query_dq_result
Returns:
Name | Type | Description |
---|---|---|
dict |
Optional[List[Dict[str, str]]]
|
Returns source_query_dq_result which in list of dict with str(key) and str(value) |
get_source_query_dq_run_time: float
property
¶
This function implements time diff for source query dq run
Returns:
Name | Type | Description |
---|---|---|
float |
float
|
time in float |
get_source_query_dq_status: str
property
¶
This function is used to get source query data quality status
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
Returns _source_query_dq_status" |
get_success_percentage: float
property
¶
This function returns success percentage
Returns:
Name | Type | Description |
---|---|---|
float |
float
|
success percentage |
get_summarised_row_dq_res: Optional[List[Dict[str, str]]]
property
¶
This function returns row dq summarised res
Returns:
Name | Type | Description |
---|---|---|
list |
dict
|
Returns summarised_row_dq_res which in list of dict with str(key) and |
Optional[List[Dict[str, str]]]
|
str(value) of rule meta data |
get_supported_df_query_dq: DataFrame
property
¶
This function returns the place holder dataframe for query check
Returns:
Name | Type | Description |
---|---|---|
DataFrame |
DataFrame
|
returns dataframe for query dq |
get_table_name: str
property
¶
This function returns table name
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
Returns _table_name(str) |
get_to_mail: str
property
¶
This function returns list of mail id's
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
Returns _mail_id(str) |
get_token: Optional[str]
property
¶
This function helps in getting key / path for token
Returns:
Type | Description |
---|---|
Optional[str]
|
token key / path in Optional[str] |
get_token_endpoint_url: Optional[str]
property
¶
This function helps in getting key / path for end point url Returns: end point url key / path in Optional[str]
get_topic_name: Optional[str]
property
¶
This function helps in getting key / path for topic name
Returns:
Type | Description |
---|---|
Optional[str]
|
topic name key / path in Optional[str] |
product_id: str
instance-attribute
¶
Functions¶
get_time_diff(start_time: Optional[datetime], end_time: Optional[datetime]) -> float
¶
This function implements time diff
Parameters:
Name | Type | Description | Default |
---|---|---|---|
start_time |
Optional[datetime]
|
|
required |
end_time |
Optional[datetime]
|
|
required |
Source code in spark_expectations/core/context.py
print_dataframe_with_debugger(df: DataFrame) -> None
¶
This function has a debugger that can print out the DataFrame
reset_num_agg_dq_rules() -> None
¶
This function used to reset the_num_agg_dq_rules
Returns:
Type | Description |
---|---|
None
|
None |
Source code in spark_expectations/core/context.py
reset_num_dq_rules() -> None
¶
reset_num_query_dq_rules() -> None
¶
This function used to rest the _num_query_dq_rules
Returns:
Type | Description |
---|---|
None
|
None |
Source code in spark_expectations/core/context.py
reset_num_row_dq_rules() -> None
¶
This function used to reset the _num_row_dq_rules
Returns:
Type | Description |
---|---|
None
|
None |
set_debugger_mode(debugger_mode: bool) -> None
¶
set_dq_end_time() -> None
¶
set_dq_run_status(dq_run_status: str = 'Failed') -> None
¶
set_dq_start_time() -> None
¶
This function sets start time dq computation
Returns:
Type | Description |
---|---|
None
|
None |
set_dq_stats_table_name(dq_stats_table_name: str) -> None
¶
set_enable_mail(enable_mail: bool) -> None
¶
set_enable_slack(enable_slack: bool) -> None
¶
set_end_time_when_dq_job_fails() -> None
¶
function used to set end time when job fails in any one of the stages by using start time
Source code in spark_expectations/core/context.py
set_env(env: Optional[str]) -> None
¶
Parameters:
Name | Type | Description | Default |
---|---|---|---|
env |
Optional[str]
|
which accepts env type |
required |
Returns:
Type | Description |
---|---|
None
|
None |
set_error_count(error_count: int = 0) -> None
¶
set_error_drop_threshold(error_drop_threshold: int) -> None
¶
set_error_table_name(error_table_name: str) -> None
¶
set_final_agg_dq_end_time() -> None
¶
This function sets end time final agg dq computation
Returns:
Type | Description |
---|---|
None
|
None |
set_final_agg_dq_result(final_agg_dq_result: Optional[List[Dict[str, str]]] = None) -> None
¶
set_final_agg_dq_start_time() -> None
¶
This function sets start time final agg dq computation
None
set_final_agg_dq_status(final_agg_dq_status: str = 'Skipped') -> None
¶
set_final_query_dq_end_time() -> None
¶
This function sets end time final query dq computation
Returns:
Type | Description |
---|---|
None
|
None |
set_final_query_dq_result(final_query_dq_result: Optional[List[Dict[str, str]]] = None) -> None
¶
set_final_query_dq_start_time() -> None
¶
This function sets start time final query dq computation
Returns:
Type | Description |
---|---|
None
|
None |
set_final_query_dq_status(final_query_dq_status: str = 'Skipped') -> None
¶
set_final_table_name(final_table_name: str) -> None
¶
set_input_count(input_count: int = 0) -> None
¶
set_mail_from(mail_from: str) -> None
¶
set_mail_smtp_port(mail_smtp_port: int) -> None
¶
set_mail_smtp_server(mail_smtp_server: str) -> None
¶
set_mail_subject(mail_subject: str) -> None
¶
set_notification_on_completion(notification_on_completion: bool) -> None
¶
set_notification_on_fail(notification_on_fail: bool) -> None
¶
set_notification_on_start(notification_on_start: bool) -> None
¶
set_num_agg_dq_rules(source_agg_enabled: bool = False, final_agg_enabled: bool = False) -> None
¶
This function sets number of applied agg dq rules for batch run source_agg_enabled: Marked True when agg rules set for source, by default False final_agg_enabled: Marked True when agg rules set for final, by default False
Returns:
Type | Description |
---|---|
None
|
None |
Source code in spark_expectations/core/context.py
set_num_query_dq_rules(source_query_enabled: bool = False, final_query_enabled: bool = False) -> None
¶
This function sets number of applied query dq rules for batch run source_query_enabled: Marked True when query rules set for source, by default False final_query_enabled: Marked True when query rules set for final, by default False
Returns:
Type | Description |
---|---|
None
|
None |
Source code in spark_expectations/core/context.py
set_num_row_dq_rules() -> None
¶
This function sets number of applied row dq rules for batch run
Returns:
Type | Description |
---|---|
None
|
None |
set_output_count(output_count: int = 0) -> None
¶
set_row_dq_end_time() -> None
¶
This function sets end time row dq computation
Returns:
Type | Description |
---|---|
None
|
None |
set_row_dq_start_time() -> None
¶
This function sets start time row dq computation
Returns:
Type | Description |
---|---|
None
|
None |
set_row_dq_status(row_dq_status: str = 'Skipped') -> None
¶
set_rules_exceeds_threshold(rules: Optional[List[dict]] = None) -> None
¶
This function implements error percentage for each rule type
set_run_date() -> str
staticmethod
¶
This function is used to generate the current datatime in UTC
Returns:
Name | Type | Description |
---|---|---|
str |
str
|
Returns the current utc datatime in the format - "%Y-%m-%d %H:%M:%S" |
Source code in spark_expectations/core/context.py
set_se_streaming_stats_dict(se_streaming_stats_dict: Dict[str, str]) -> None
¶
This function helps to set secret keys dict
set_se_streaming_stats_topic_name(se_streaming_stats_topic_name: str) -> None
¶
set_slack_webhook_url(slack_webhook_url: str) -> None
¶
set_source_agg_dq_end_time() -> None
¶
This function sets end time source agg dq computation
Returns:
Type | Description |
---|---|
None
|
None |
set_source_agg_dq_result(source_agg_dq_result: Optional[List[Dict[str, str]]] = None) -> None
¶
set_source_agg_dq_start_time() -> None
¶
This function sets start time source agg dq computation
Returns:
Type | Description |
---|---|
None
|
None |
set_source_agg_dq_status(source_agg_dq_status: str = 'Skipped') -> None
¶
set_source_query_dq_end_time() -> None
¶
This function sets end time source query dq computation
Returns:
Type | Description |
---|---|
None
|
None |
set_source_query_dq_result(source_query_dq_result: Optional[List[Dict[str, str]]] = None) -> None
¶
set_source_query_dq_start_time() -> None
¶
This function sets start time source query dq computation
Returns:
Type | Description |
---|---|
None
|
None |
set_source_query_dq_status(source_query_dq_status: str = 'Skipped') -> None
¶
set_summarised_row_dq_res(summarised_row_dq_res: Optional[List[Dict[str, str]]] = None) -> None
¶
This function implements or supports to set row dq summarised res
Parameters:
Name | Type | Description | Default |
---|---|---|---|
summarised_row_dq_res |
Optional[List[Dict[str, str]]]
|
list(dict) |
None
|