Excel

Excel reader for Spark

Note

Ensure the 'excel' extra is installed before using this reader. The default implementation uses openpyxl as the engine for reading Excel files. Other engines can be used by passing the appropriate keyword arguments to the reader.

See Also

koheesio.spark.readers.excel.ExcelReader

Read data from an Excel file

This class is a wrapper around the PandasExcelReader class. It first reads the Excel file with pandas, then converts the resulting pandas DataFrame to a Spark DataFrame.

Attributes:

Name        Type  Description
path        str   The path to the Excel file
sheet_name  str   The name of the sheet to read
header      int   The row to use as the column names
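
The two-step flow described above (read with pandas, then convert to Spark) can be sketched roughly as follows. This is a minimal illustration, not koheesio's implementation: `excel_to_spark` is a hypothetical helper, and it assumes the attributes listed above are passed through to `pandas.read_excel` under the same names, with `engine` and any extra keyword arguments forwarded unchanged.

```python
from typing import Any


def excel_to_spark(
    spark: Any,
    path: str,
    sheet_name: str,
    header: int = 0,
    engine: str = "openpyxl",
    **read_kwargs: Any,
) -> Any:
    """Read one Excel sheet with pandas and return it as a Spark DataFrame.

    Hypothetical helper for illustration only; it is not part of koheesio.
    `spark` is expected to be an active SparkSession.
    """
    # Imported lazily so pandas (and the Excel engine) is only required
    # when the function is actually called.
    import pandas as pd

    # Step 1: pandas loads the sheet into an in-memory pandas DataFrame.
    pdf = pd.read_excel(
        path,
        sheet_name=sheet_name,
        header=header,
        engine=engine,
        **read_kwargs,
    )

    # Step 2: the pandas DataFrame is converted to a Spark DataFrame.
    return spark.createDataFrame(pdf)
```

With an active SparkSession named `spark`, a call might look like `excel_to_spark(spark, "report.xlsx", sheet_name="Sheet1")`, where the file name is a placeholder. Because pandas loads the whole sheet into memory before Spark sees it, this approach is best suited to files that fit comfortably in driver memory.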

execute

execute() -> Output
Source code in src/koheesio/spark/readers/excel.py
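def execute(self) -> Reader.Output:
    # Build a PandasExcelReader from this step's fields and read the file with pandas
    pdf: PandasDataFrame = PandasExcelReader.from_step(self).execute().df
    # Convert the pandas DataFrame to a Spark DataFrame and store it on the output
    self.output.df = self.spark.createDataFrame(pdf)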