Classes

spark_expectations.core.expectations.SparkExpectations(product_id: str, rules_df: DataFrame, stats_table: str, target_and_error_table_writer: Union[WrappedDataFrameWriter, WrappedDataFrameStreamWriter], stats_table_writer: WrappedDataFrameWriter, debugger: bool = False, stats_streaming_options: Optional[Dict[str, Union[str, bool]]] = None)
dataclass

This dataclass supports running data quality rules on a DataFrame returned by a function.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `product_id` | `str` | Name of the product | required |
| `rules_df` | `DataFrame` | DataFrame that contains the rules. The user is responsible for reading the rules table from whichever system it is stored in | required |
| `stats_table` | `str` | Name of the table where the stats/audit info is written | required |
| `debugger` | `bool` | Set to `True` to enable debugger mode; defaults to `False` | `False` |
| `stats_streaming_options` | `Optional[Dict[str, Union[str, bool]]]` | Options to override the defaults when writing into the stats streaming table | `None` |
Attributes

- `debugger: bool = False` (class-attribute, instance-attribute)
- `product_id: str` (instance-attribute)
- `rules_df: DataFrame` (instance-attribute)
- `stats_streaming_options: Optional[Dict[str, Union[str, bool]]] = None` (class-attribute, instance-attribute)
- `stats_table: str` (instance-attribute)
- `stats_table_writer: WrappedDataFrameWriter` (instance-attribute)
- `target_and_error_table_writer: Union[WrappedDataFrameWriter, WrappedDataFrameStreamWriter]` (instance-attribute)
Functions

with_expectations(target_table: str, write_to_table: bool = False, write_to_temp_table: bool = False, user_conf: Optional[Dict[str, Union[str, int, bool, Dict[str, str]]]] = None, target_table_view: Optional[str] = None, target_and_error_table_writer: Optional[Union[WrappedDataFrameWriter, WrappedDataFrameStreamWriter]] = None) -> Any

This decorator wraps a function that returns a DataFrame and applies the configured data quality rules to it.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `target_table` | `str` | Name of the table where the final dataframe is written | required |
| `write_to_table` | `bool` | Set to `True` to write the dataframe as a table | `False` |
| `write_to_temp_table` | `bool` | Set to `True` to write the input dataframe to a temp table, breaking the Spark plan | `False` |
| `user_conf` | `Optional[Dict[str, Union[str, int, bool, Dict[str, str]]]]` | Options to override the defaults when writing into the stats streaming table | `None` |
| `target_table_view` | `Optional[str]` | View created after the `_row_dq` process to run the target `agg_dq` and `query_dq`. If no value is provided, defaults to `{target_table}_view` | `None` |
| `target_and_error_table_writer` | `Optional[Union[WrappedDataFrameWriter, WrappedDataFrameStreamWriter]]` | Writer for the target and error tables; takes precedence over the class-level writer | `None` |
Returns:

| Name | Type | Description |
|---|---|---|
| `Any` | `Any` | A function that applies the expectations on the dataset |
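A usage sketch of the decorator, assuming `se` is an already-configured `SparkExpectations` instance (see the class above) and `spark` is a live `SparkSession`. The table names and the source query are illustrative assumptions:

```python
@se.with_expectations(
    target_table="prod.customer_orders",  # final table the dataframe is written to
    write_to_table=True,                  # persist the result as a table
    write_to_temp_table=False,
    user_conf=None,                       # keep default notification/stats settings
    target_table_view=None,               # defaults to "{target_table}_view"
)
def build_orders():
    # Any function returning a DataFrame; the configured rules
    # are applied to its result when it is called.
    return spark.table("raw.orders").filter("order_status IS NOT NULL")

# Calling the wrapped function runs the data quality checks,
# writes stats to the stats table, and returns the dataframe.
df = build_orders()
```

Passing `target_and_error_table_writer` here overrides the class-level writer for this one table, which is useful when a single product writes to sinks with different formats or modes.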
Source code in `spark_expectations/core/expectations.py`, lines 893–1032.