Box
Box Module
The module facilitates interactions with the Box service. The implementation is based on the functionality available in the Box Python SDK: https://github.com/box/box-python-sdk
Prerequisites
- A Box application has been created in the developer portal using the JWT auth method (Developer Portal - My Apps - Create)
- The application has been authorized for the enterprise (Developer Portal - MyApp - Authorization)
koheesio.integrations.box.Box #
Configuration details required for the authentication can be obtained in the Box Developer Portal by generating the Public / Private key pair in "Application Name -> Configuration -> Add and Manage Public Keys".
The downloaded JSON file will look like this:
{
"boxAppSettings": {
"clientID": "client_id",
"clientSecret": "client_secret",
"appAuth": {
"publicKeyID": "public_key_id",
"privateKey": "private_key",
"passphrase": "pass_phrase"
}
},
"enterpriseID": "123456"
}
Examples:
b = Box(
client_id="client_id",
client_secret="client_secret",
enterprise_id="enterprise_id",
jwt_key_id="jwt_key_id",
rsa_private_key_data="rsa_private_key_data",
rsa_private_key_passphrase="rsa_private_key_passphrase",
)
b.client
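The key names in the downloaded JSON file differ from the constructor arguments used in the example, but they correspond to the field aliases documented for this class (clientID, clientSecret, enterpriseID, publicKeyID, privateKey, passphrase). As a minimal sketch, the nested settings could be flattened into keyword arguments as follows. The helper name `box_kwargs_from_config` is hypothetical and not part of koheesio, and the values are the placeholders from the sample file.

```python
import json

# Sample content in the same shape as the downloaded JSON file (placeholder values).
CONFIG_JSON = """
{
  "boxAppSettings": {
    "clientID": "client_id",
    "clientSecret": "client_secret",
    "appAuth": {
      "publicKeyID": "public_key_id",
      "privateKey": "private_key",
      "passphrase": "pass_phrase"
    }
  },
  "enterpriseID": "123456"
}
"""


def box_kwargs_from_config(raw: str) -> dict:
    """Flatten the Box JSON config into constructor-style keyword arguments.

    Hypothetical helper for illustration; not part of koheesio.
    """
    config = json.loads(raw)
    app = config["boxAppSettings"]
    auth = app["appAuth"]
    return {
        "client_id": app["clientID"],
        "client_secret": app["clientSecret"],
        "enterprise_id": config["enterpriseID"],
        "jwt_key_id": auth["publicKeyID"],
        "rsa_private_key_data": auth["privateKey"],
        "rsa_private_key_passphrase": auth["passphrase"],
    }


auth_params = box_kwargs_from_config(CONFIG_JSON)
print(sorted(auth_params))
```

The resulting dictionary could then be unpacked into the constructor, in the same way `auth_params` is used in the other examples on this page.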
Source code in src/koheesio/integrations/box.py
auth_options
property
#
Get a dictionary of authentication options that can be reused in the child classes.
client_id
class-attribute
instance-attribute
#
client_id: Union[SecretStr, SecretBytes] = Field(
default=...,
alias="clientID",
description="Client ID from the Box Developer console.",
)
client_secret
class-attribute
instance-attribute
#
client_secret: Union[SecretStr, SecretBytes] = Field(
default=...,
alias="clientSecret",
description="Client Secret from the Box Developer console.",
)
enterprise_id
class-attribute
instance-attribute
#
enterprise_id: Union[SecretStr, SecretBytes] = Field(
default=...,
alias="enterpriseID",
description="Enterprise ID from the Box Developer console.",
)
jwt_key_id
class-attribute
instance-attribute
#
jwt_key_id: Union[SecretStr, SecretBytes] = Field(
default=...,
alias="publicKeyID",
description="PublicKeyID for the public/private generated key pair.",
)
rsa_private_key_data
class-attribute
instance-attribute
#
rsa_private_key_data: Union[SecretStr, SecretBytes] = Field(
default=...,
alias="privateKey",
description="Private key generated in the app management console.",
)
rsa_private_key_passphrase
class-attribute
instance-attribute
#
rsa_private_key_passphrase: Union[
SecretStr, SecretBytes
] = Field(
default=...,
alias="passphrase",
description="Private key passphrase generated in the app management console.",
)
execute #
koheesio.integrations.box.BoxCsvFileReader #
Reads one or multiple CSV files with the same structure directly from Box and produces a Spark DataFrame.
Notes
To manually identify the ID of a file in Box, open the file in the Web UI and copy the ID from the page URL, e.g. https://foo.ent.box.com/file/1234567890 , where 1234567890 is the ID.
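If file URLs are collected rather than IDs, the same manual step can be sketched in code. This assumes the URL shape shown in the note (`.../file/<id>`); the helper name `box_file_id_from_url` is hypothetical and not part of koheesio or the Box SDK.

```python
from urllib.parse import urlparse


def box_file_id_from_url(url: str) -> str:
    """Return the trailing ID from a Box file URL of the form .../file/<id>.

    Hypothetical helper for illustration; not part of koheesio or the Box SDK.
    """
    # Split the URL path into its non-empty segments, e.g. ["file", "1234567890"].
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) < 2 or segments[-2] != "file" or not segments[-1].isdigit():
        raise ValueError(f"Not a Box file URL: {url}")
    return segments[-1]


file_id = box_file_id_from_url("https://foo.ent.box.com/file/1234567890")
print(file_id)  # 1234567890
```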
Examples:
from koheesio.integrations.box import BoxCsvFileReader
from pyspark.sql.types import StructType
schema = StructType(...)
b = BoxCsvFileReader(
client_id="",
client_secret="",
enterprise_id="",
jwt_key_id="",
rsa_private_key_data="",
rsa_private_key_passphrase="",
file=["1", "2"],
schema=schema,
).execute()
b.df.show()
file
class-attribute
instance-attribute
#
file: Union[str, list[str]] = Field(
default=...,
description="ID or list of IDs for the files to read.",
)
execute #
Loop through the list of provided file identifiers and load data into dataframe. For traceability purposes the following columns will be added to the dataframe:
* meta_file_id: the identifier of the file on Box
* meta_file_name: name of the file
Returns:
| Type | Description |
|---|---|
| DataFrame | |
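The effect of the two traceability columns can be pictured with a plain-Python analogy. This is only a sketch of the idea, not the Spark implementation: the in-memory `FILES` mapping and the helper `read_with_meta` are hypothetical stand-ins for files downloaded from Box.

```python
import csv
import io

# Hypothetical in-memory stand-ins for two Box files: file ID -> (file name, CSV content).
FILES = {
    "1": ("jan.csv", "id,amount\n1,10\n2,20\n"),
    "2": ("feb.csv", "id,amount\n3,30\n"),
}


def read_with_meta(files: dict) -> list:
    """Read every CSV and tag each row with the file it came from."""
    rows = []
    for file_id, (file_name, content) in files.items():
        for row in csv.DictReader(io.StringIO(content)):
            row["meta_file_id"] = file_id
            row["meta_file_name"] = file_name
            rows.append(row)
    return rows


rows = read_with_meta(FILES)
for row in rows:
    print(row)
```

Every row of the combined result keeps a pointer to its source file, which is what makes it possible to trace a record back to a specific file on Box.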
koheesio.integrations.box.BoxCsvPathReader #
Read all CSV files from the specified path into the dataframe. Files can be filtered using the regular expression in the 'filter' parameter. The default behavior is to read all CSV / TXT files from the specified path.
Notes
The class does not include archival capability, as it is presumed that the user wants to make sure that the full pipeline is successful (for example, the source data was transformed and saved) prior to moving the source files. Use the BoxToBoxFileMove class for archiving and provide the list of IDs from the 'file_id' output.
Examples:
from koheesio.integrations.box import BoxCsvPathReader
auth_params = {...}
b = BoxCsvPathReader(**auth_params, path="foo/bar/").execute()
b.df # Spark Dataframe
... # do something with the dataframe
from koheesio.integrations.box import BoxToBoxFileMove
bm = BoxToBoxFileMove(**auth_params, file=b.file_id, path="/foo/bar/archive").execute()
filter
class-attribute
instance-attribute
#
filter: Optional[str] = Field(
default=".csv|.txt$",
description="[Optional] Regexp to filter folder contents",
)
execute #
Identify the list of files from the source Box path that match desired filter and load them into Dataframe
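The default 'filter' value is an unescaped pattern, so it is worth checking which names it accepts. The sketch below assumes the pattern is applied to file names with search semantics (`re.search`), which is an assumption about the implementation; `STRICT_FILTER` is a hypothetical stricter alternative and the file names are made up.

```python
import re

DEFAULT_FILTER = ".csv|.txt$"  # default value of the 'filter' field
STRICT_FILTER = r"\.(csv|txt)$"  # hypothetical stricter alternative

names = ["data.csv", "notes.txt", "report.csv.bak", "my_csv_notes.docx", "image.png"]

# In the default pattern, "." matches any character and "$" only anchors ".txt".
default_matches = [n for n in names if re.search(DEFAULT_FILTER, n)]
# The strict pattern requires a literal dot and anchors both extensions to the end.
strict_matches = [n for n in names if re.search(STRICT_FILTER, n)]

print(default_matches)  # ['data.csv', 'notes.txt', 'report.csv.bak', 'my_csv_notes.docx']
print(strict_matches)  # ['data.csv', 'notes.txt']
```

Under that assumption, passing an explicit pattern to 'filter' is the safer choice when a folder may contain files with "csv" elsewhere in their names.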
koheesio.integrations.box.BoxFileBase #
Generic class to facilitate interactions with Box files.
The Box SDK provides the File class, which has various properties and methods to interact with Box files. The object can be obtained in multiple ways:
* provide the Box file identifier to the 'file' parameter (the identifier can be obtained, for example, from the URL)
* provide an existing object to the 'file' parameter (boxsdk.object.file.File)
Notes
Refer to BoxFolderBase for more info about the 'folder' and 'path' parameters.
See Also
boxsdk.object.file.File
files
class-attribute
instance-attribute
#
files: conlist(Union[File, str], min_length=1) = Field(
default=...,
alias="file",
description="List of Box file objects or identifiers",
)
folder
class-attribute
instance-attribute
#
folder: Optional[Union[Folder, str]] = Field(
default=None,
description="Existing folder object or folder identifier",
)
path
class-attribute
instance-attribute
#
path: Optional[str] = Field(
default=None,
description="Path to the Box folder, for example: `folder/sub-folder/lz`",
)
action #
execute #
Generic execute method for all BoxToBox interactions. Deals with getting the correct folder and file objects from various parameter inputs
koheesio.integrations.box.BoxFileWriter #
Write file or a file-like object to Box.
Examples:
from koheesio.integrations.box import BoxFileWriter
auth_params = {...}
f1 = BoxFileWriter(
**auth_params, path="/foo/bar", file="path/to/my/file.ext"
).execute()
# or
import io
b = io.BytesIO(b"my-sample-data")
f2 = BoxFileWriter(
**auth_params, path="/foo/bar", file=b, name="file.ext"
).execute()
description
class-attribute
instance-attribute
#
description: Optional[str] = Field(
None,
description="Optional description to add to the file in Box",
)
file
class-attribute
instance-attribute
#
file_name
class-attribute
instance-attribute
#
file_name: Optional[str] = Field(
default=None,
description="When file path or name is provided to 'file' parameter, this will override the original name. When binary stream is provided, the 'name' should be used to set the desired name for the Box file.",
)
Output #
action #
validate_name_for_binary_data #
Validate 'file_name' parameter when providing a binary input for 'file'.
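The rule behind this validator follows from the field description: a file path carries its own name, while a binary stream does not. A minimal sketch of such a check is shown below. The function `check_file_name` is a hypothetical stand-in; the actual validator's rules and error type may differ.

```python
import io
from typing import Optional, Union


def check_file_name(file: Union[str, bytes, io.IOBase], file_name: Optional[str]) -> None:
    """Raise if binary input is given without a name for the target file.

    Hypothetical stand-in for the validator; the real rules may differ.
    """
    is_binary = isinstance(file, (bytes, io.IOBase))
    if is_binary and not file_name:
        raise ValueError("'file_name' is required when 'file' is a binary stream")


check_file_name("path/to/my/file.ext", None)  # a path carries its own name
check_file_name(io.BytesIO(b"my-sample-data"), "file.ext")  # named stream is accepted

try:
    check_file_name(io.BytesIO(b"my-sample-data"), None)  # unnamed stream
    rejected = False
except ValueError:
    rejected = True

print(rejected)  # True
```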
koheesio.integrations.box.BoxFolderBase #
Generic class to facilitate interactions with Box folders.
The Box SDK provides the Folder class, which has various properties and methods to interact with Box folders. The object can be obtained in multiple ways:
* provide the Box folder identifier to the 'folder' parameter (the identifier can be obtained, for example, from the URL)
* provide an existing object to the 'folder' parameter (boxsdk.object.folder.Folder)
* provide a filesystem-like path to the 'path' parameter
See Also
boxsdk.object.folder.Folder
folder
class-attribute
instance-attribute
#
folder: Optional[Union[Folder, str]] = Field(
default=None,
description="Existing folder object or folder identifier",
)
path
class-attribute
instance-attribute
#
path: Optional[str] = Field(
default=None,
description="Path to the Box folder, for example: `folder/sub-folder/lz`",
)
root
class-attribute
instance-attribute
#
root: Optional[Union[Folder, str]] = Field(
default="0",
description="Folder object or identifier of the folder that should be used as root",
)
Output #
action #
Placeholder for 'action' method, that should be implemented in the child classes
Returns:
| Type | Description |
|---|---|
| Folder or None | |
validate_folder_or_path #
Validations for 'folder' and 'path' parameter usage
koheesio.integrations.box.BoxFolderCreate #
Explicitly create the new Box folder object and parent directories.
Examples:
from koheesio.integrations.box import BoxFolderCreate
auth_params = {...}
folder = BoxFolderCreate(**auth_params, path="/foo/bar").execute()
koheesio.integrations.box.BoxFolderDelete #
Delete existing Box folder based on object, identifier or path.
Examples:
from koheesio.integrations.box import BoxFolderDelete, BoxFolderGet
auth_params = {...}
BoxFolderDelete(**auth_params, path="/foo/bar").execute()
# or
BoxFolderDelete(**auth_params, folder="1").execute()
# or
folder = BoxFolderGet(**auth_params, path="/foo/bar").execute().folder
BoxFolderDelete(**auth_params, folder=folder).execute()
action #
Delete folder action
Returns:
| Type | Description |
|---|---|
| None | |
koheesio.integrations.box.BoxFolderGet #
Get the Box folder object for an existing folder or create a new folder and parent directories.
Examples:
from koheesio.integrations.box import BoxFolderGet
auth_params = {...}
folder = BoxFolderGet(**auth_params, path="/foo/bar").execute().folder
# or
folder = BoxFolderGet(**auth_params, folder="1").execute().folder
create_sub_folders
class-attribute
instance-attribute
#
create_sub_folders: Optional[bool] = Field(
False,
description="Create sub-folders recursively if the path does not exist.",
)
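The interplay between a filesystem-like path and 'create_sub_folders' can be sketched with a nested dictionary standing in for the Box folder tree. This is an illustration only: `get_folder` and `FolderNotFound` are hypothetical, and raising an error when the flag is off is an assumption modelled on the BoxFolderNotFoundError described on this page.

```python
class FolderNotFound(Exception):
    """Hypothetical stand-in for BoxFolderNotFoundError."""


def get_folder(tree: dict, path: str, create_sub_folders: bool = False) -> dict:
    """Walk a nested dict as if it were a folder tree, one path segment at a time."""
    current = tree
    for name in (segment for segment in path.split("/") if segment):
        if name not in current:
            if not create_sub_folders:
                raise FolderNotFound(f"Folder '{name}' does not exist in '{path}'")
            current[name] = {}  # create the missing level and keep walking
        current = current[name]
    return current


root = {"foo": {}}  # a root folder that only contains 'foo'

try:
    get_folder(root, "/foo/bar")  # 'bar' is missing and creation is off
    found = True
except FolderNotFound:
    found = False

get_folder(root, "/foo/bar/archive", create_sub_folders=True)
print(found, root)  # False {'foo': {'bar': {'archive': {}}}}
```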
action #
Get folder action
Returns:
| Name | Type | Description |
|---|---|---|
| folder | Folder | Box Folder object as specified in Box SDK |
koheesio.integrations.box.BoxFolderNotFoundError #
Error when a provided Box path does not exist.
koheesio.integrations.box.BoxPathIsEmptyError #
Exception when provided Box path is empty or no files matched the mask.
koheesio.integrations.box.BoxReaderBase #
Base class for Box readers.
params
class-attribute
instance-attribute
#
params: Optional[Dict[str, Any]] = Field(
default_factory=dict,
description="[Optional] Set of extra parameters that should be passed to the Spark reader.",
)
schema_
class-attribute
instance-attribute
#
schema_: Optional[StructType] = Field(
None,
alias="schema",
description="[Optional] Schema that will be applied during the creation of Spark DataFrame",
)
Output #
koheesio.integrations.box.BoxToBoxFileCopy #
Copy one or multiple files to the target Box path.
Examples:
from koheesio.integrations.box import BoxFolderGet, BoxToBoxFileCopy
auth_params = {...}
BoxToBoxFileCopy(**auth_params, file=["1", "2"], path="/foo/bar").execute()
# or
BoxToBoxFileCopy(**auth_params, file=["1", "2"], folder="1").execute()
# or
folder = BoxFolderGet(**auth_params, path="/foo/bar").execute().folder
BoxToBoxFileCopy(**auth_params, file=[File(), File()], folder=folder).execute()
action #
Copy file to the desired destination and extend file description with the processing info
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file | File | File object as specified in Box SDK | required |
| folder | Folder | Folder object as specified in Box SDK | required |
koheesio.integrations.box.BoxToBoxFileMove #
Move one or multiple files to the target Box path.
Examples:
from koheesio.integrations.box import BoxFolderGet, BoxToBoxFileMove
auth_params = {...}
BoxToBoxFileMove(**auth_params, file=["1", "2"], path="/foo/bar").execute()
# or
BoxToBoxFileMove(**auth_params, file=["1", "2"], folder="1").execute()
# or
folder = BoxFolderGet(**auth_params, path="/foo/bar").execute().folder
BoxToBoxFileMove(**auth_params, file=[File(), File()], folder=folder).execute()
action #
Move file to the desired destination and extend file description with the processing info
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| file | File | File object as specified in Box SDK | required |
| folder | Folder | Folder object as specified in Box SDK | required |