Spark SQL reader
This module contains the SparkSqlReader class, which reads a Spark SQL compliant query and returns the resulting DataFrame.
koheesio.spark.readers.spark_sql_reader.SparkSqlReader #
SparkSqlReader reads a Spark SQL compliant query and returns the resulting DataFrame.

This SQL can originate from a string or a file and may contain placeholders (parameters) for templating.

- Placeholders are identified with `${placeholder}`.
- Placeholders can be passed as explicit params (`params`) or as implicit params (kwargs).
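The substitution step can be sketched with Python's standard `string.Template`, which uses the same `${placeholder}` syntax. This is a simplified illustration of the templating behavior, not the reader's actual implementation:

```python
from string import Template


def render_sql(sql: str, params: dict) -> str:
    # Replace ${placeholder} tokens in the SQL text with values from params.
    return Template(sql).substitute(params)


query = render_sql(
    "SELECT ${dynamic_column} FROM ${table_name}",
    {"dynamic_column": "name", "table_name": "my_table"},
)
print(query)  # SELECT name FROM my_table
```

Because kwargs are merged into `params`, passing `dynamic_column="name"` to the reader has the same effect as putting it in the `params` dict.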
Example
SQL script (example.sql):
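The script's contents are not reproduced in this extract; a minimal `example.sql` consistent with the parameters passed below would be:

```sql
SELECT ${dynamic_column} FROM ${table_name}
```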
Python code:
```python
from koheesio.spark.readers import SparkSqlReader

reader = SparkSqlReader(
    sql_path="example.sql",
    # params can also be passed as kwargs
    dynamic_column="name",
    table_name="my_table",
)
reader.execute()
```
In this example, the SQL script is read from a file and the placeholders are replaced with the given params. The query is then executed and the resulting DataFrame is stored in the `output.df` attribute.
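Assuming `example.sql` contains `SELECT ${dynamic_column} FROM ${table_name}`, the rendered query submitted to Spark would be:

```sql
SELECT name FROM my_table
```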
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `sql_path` | `str` or `Path` | Path to a SQL file | required |
| `sql` | `str` | SQL query to execute | required |
| `params` | `dict` | Placeholders (parameters) for templating. These are identified with `${placeholder}` in the SQL script. | required |
Notes
Any arbitrary kwargs passed to the class will be added to `params`.