Models
The Models package provides base models that other classes can build on.
- Every model should be at least a pydantic BaseModel, but can also be a Step, or a StepOutput.
- Every model is expected to be an ABC (Abstract Base Class)
- Optionally, a model can inherit ExtraParamsMixin, which unpacks kwargs into an extra_params dict property, removing the need to create a dict before passing kwargs to a model initializer.
A Model class can be exceptionally handy when you need similar Pydantic models in multiple places, for example across Transformation and Reader classes.
koheesio.models.ListOfColumns module-attribute #
Annotated type for a list of column names. Will ensure that there are no duplicate columns, empty strings, etc. In case an individual column is passed, the value will be coerced to a list.
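The behavior described above can be approximated in plain Python. The sketch below is not the Koheesio implementation: normalize_columns is a hypothetical helper, and it assumes that empty strings and duplicates are dropped (rather than rejected) and that first-seen order is kept.

```python
from typing import List, Union


def normalize_columns(value: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper: coerce to a list, drop empty strings, de-duplicate."""
    # A single column name is wrapped in a list
    columns = [value] if isinstance(value, str) else list(value)
    # Drop empty strings (assumption: they are removed, not rejected)
    non_empty = [column for column in columns if column]
    # dict.fromkeys keeps the first occurrence of each name, in order
    return list(dict.fromkeys(non_empty))


single = normalize_columns("id")
cleaned = normalize_columns(["id", "name", "id", ""])
print(single, cleaned)
```

Keeping the original order matters for column lists, which is why the sketch de-duplicates through a dict rather than a set.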
koheesio.models.BaseModel #
Base model for all models.
Extends pydantic BaseModel with some additional configuration. To be used as a base class for all models in Koheesio instead of pydantic.BaseModel.
Additional methods and properties:
Fields#
Every Koheesio BaseModel has two predefined fields: name and description. These fields are used to provide a name and a description to the model.
- name: This is the name of the Model. If not provided, it defaults to the class name.
- description: This is the description of the Model. It has several default behaviors:
  - If not provided, it defaults to the docstring of the class.
  - If the docstring is not provided, it defaults to the name of the class.
  - For multi-line descriptions, it has the following behaviors:
    - Only the first non-empty line is used.
    - Empty lines are removed.
    - Only the first 3 lines are considered.
    - Only the first 120 characters are considered.
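One plausible reading of the description rules above, written as a standalone function. This is a sketch, not the Koheesio validator: derive_description and the two constants are hypothetical names, and the order in which the rules are applied is an assumption.

```python
from typing import Optional

MAX_LINES = 3    # assumption: matches "only the first 3 lines are considered"
MAX_CHARS = 120  # assumption: matches "only the first 120 characters"


def derive_description(
    description: Optional[str], docstring: Optional[str], class_name: str
) -> str:
    """Hypothetical helper that applies the fallback and trimming rules."""
    # Fallback chain: explicit description, then docstring, then class name
    text = description or docstring or class_name
    # Remove empty lines and look only at the first few remaining lines
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    lines = lines[:MAX_LINES]
    # Use the first non-empty line, or the class name if nothing is left
    first_line = lines[0] if lines else class_name
    return first_line[:MAX_CHARS]


print(derive_description(None, "\n    Reads data.\n\n    More detail.\n", "Reader"))
```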
Validators#
- _set_name_and_description: Set the name and description of the Model as per the rules mentioned above.
Properties#
- log: Returns a logger with the name of the class.
Class Methods#
- from_basemodel: Returns a new BaseModel instance based on the data of another BaseModel.
- from_context: Creates BaseModel instance from a given Context.
- from_dict: Creates BaseModel instance from a given dictionary.
- from_json: Creates BaseModel instance from a given JSON string.
- from_toml: Creates BaseModel object from a given toml file.
- from_yaml: Creates BaseModel object from a given yaml file.
- lazy: Constructs the model without doing validation.
Dunder Methods#
- __add__: Allows adding two BaseModel instances together.
- __enter__: Allows for using the model in a with-statement.
- __exit__: Allows for using the model in a with-statement.
- __setitem__: Set Item dunder method for BaseModel.
- __getitem__: Get Item dunder method for BaseModel.
Instance Methods#
- hasattr: Check if given key is present in the model.
- get: Get an attribute of the model, but don't fail if not present.
- merge: Merge key,value map with self.
- set: Allows for subscribing / assigning to class[key].
- to_context: Converts the BaseModel instance to a Context object.
- to_dict: Converts the BaseModel instance to a dictionary.
- to_json: Converts the BaseModel instance to a JSON string.
- to_yaml: Converts the BaseModel instance to a YAML string.
Different Modes
This BaseModel class supports lazy mode. This means that validation of the items stored in the class can be called at will instead of being forced to run it upfront.
- Normal mode: you need to know the values ahead of time.
- Lazy mode: you can defer the validation until later.
  The prime advantage of using lazy mode is that you don't have to know all your outputs up front, and can add them as they become available, while still being able to validate at the end that you have collected all your output.
- With statements: with-statements are also supported. Validation runs upon exit of the with-statement.
  Note that a lazy mode BaseModel object is required to work with a with-statement.
Examples:
from koheesio.models import BaseModel

class Person(BaseModel):
    name: str
    age: int

# Using the lazy method to create an instance without immediate validation
person = Person.lazy()
# Setting attributes
person.name = "John Doe"
person.age = 30
# Now we validate the instance
person.validate()
print(person)
In this example, the Person instance is created without immediate validation. The attributes name and age are set afterward. The validate method is then called to validate the instance.
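The combination of lazy mode and a with-statement can be illustrated without Koheesio at all. The sketch below uses a hypothetical stand-in class, LazyRecord, with a hand-written required_fields check. It only shows the pattern of collecting values first and validating on exit, and makes no claim about how Koheesio implements it.

```python
class LazyRecord:
    """Hypothetical stand-in for a lazily validated model (not the Koheesio BaseModel)."""

    required_fields = ("name", "age")

    def validate(self) -> "LazyRecord":
        # Validation is deferred until this method is called
        missing = [field for field in self.required_fields if not hasattr(self, field)]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        return self

    def __enter__(self) -> "LazyRecord":
        return self

    def __exit__(self, exc_type, exc_value, traceback) -> bool:
        # Only validate when the body finished without raising
        if exc_type is None:
            self.validate()
        return False  # never swallow exceptions from the body


# Values are collected as they become available; validation runs on exit
with LazyRecord() as record:
    record.name = "John Doe"
    record.age = 30

# Leaving the block with a missing field fails validation
try:
    with LazyRecord() as incomplete:
        incomplete.name = "Jane Doe"
    error_message = ""
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Running the check in __exit__ means a forgotten field surfaces at the end of the block rather than at construction time, which is the trade-off lazy mode makes.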
Koheesio specific configuration:
Koheesio models are configured differently from Pydantic defaults. The configuration looks like this:
- extra="allow": This setting allows for extra fields that are not specified in the model definition. If a field is present in the data but not in the model, it will not raise an error. Pydantic default is "ignore", which means that extra attributes are ignored.
- arbitrary_types_allowed=True: This setting allows for fields in the model to be of any type. This is useful when you want to include fields in your model that are not standard Python types. Pydantic default is False, which means that fields must be of a type Pydantic knows how to validate.
- populate_by_name=True: This setting allows an aliased field to be populated by its name as given by the model attribute, as well as the alias. This was known as allow_population_by_field_name in pydantic v1. Pydantic default is False, which means that an aliased field can only be populated by its alias.
- validate_assignment=False: This setting determines whether the model should be revalidated when the data is changed. If set to True, every time a field is assigned a new value, the entire model is validated again. Pydantic default is (also) False, which means that the model is not revalidated when the data is changed. By default, Pydantic validates the data when creating the model. If the user changes the data after creating the model, it does not revalidate the model.
- revalidate_instances="subclass-instances": This setting determines whether to revalidate models during validation if the instance is a subclass of the model. This is important as inheritance is used a lot in Koheesio. Pydantic default is never, which means that the model and dataclass instances are not revalidated during validation.
- validate_default=True: This setting determines whether to validate default values during validation. When set to True, default values are checked during the validation process. We opt to set this to True, as we are attempting to make sure that the data is valid prior to running / executing any Step. Pydantic default is False, which means that default values are not validated during validation.
- frozen=False: This setting determines whether the model is immutable. If set to True, once a model is created, its fields cannot be changed. Pydantic default is also False, which means that the model is mutable.
- coerce_numbers_to_str=True: This setting determines whether to convert number fields to strings. When set to True, enables automatic coercion of any Number type to str. Pydantic doesn't allow number types (int, float, Decimal) to be coerced as type str by default.
- use_enum_values=True: This setting determines whether to use the values of Enum fields. If set to True, the actual value of the Enum is used instead of the reference. Pydantic default is False, which means that the reference to the Enum is used.
description class-attribute instance-attribute #
model_config class-attribute instance-attribute #
model_config = ConfigDict(
    extra="allow",
    arbitrary_types_allowed=True,
    populate_by_name=True,
    validate_assignment=False,
    revalidate_instances="subclass-instances",
    validate_default=True,
    frozen=False,
    coerce_numbers_to_str=True,
    use_enum_values=True,
)
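To see what a few of these settings do in practice, the sketch below applies three of them to a plain Pydantic v2 model (it assumes pydantic is installed and recent enough to support coerce_numbers_to_str). Color and Example are hypothetical names used only for illustration; the Koheesio BaseModel itself is not involved.

```python
from enum import Enum

from pydantic import BaseModel, ConfigDict


class Color(Enum):
    RED = "red"


class Example(BaseModel):
    # A subset of the settings listed above, on a plain Pydantic model
    model_config = ConfigDict(
        extra="allow",
        coerce_numbers_to_str=True,
        use_enum_values=True,
    )

    label: str
    color: Color


example = Example(label=42, color=Color.RED, note="not declared on the model")

print(example.label)  # the int was coerced to a str
print(example.color)  # the Enum member was replaced by its value
print(example.note)   # the undeclared field was kept
```

With Pydantic's defaults, the same call would reject label=42 as not being a string and would silently drop note.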
name class-attribute instance-attribute #
from_basemodel classmethod #
Returns a new BaseModel instance based on the data of another BaseModel
Source code in src/koheesio/models/__init__.py
from_context classmethod #
Creates BaseModel instance from a given Context
You have to make sure that the Context object has the necessary attributes to create the model.
Examples:
from koheesio.context import Context
from koheesio.models import BaseModel

class SomeStep(BaseModel):
    foo: str

context = Context(foo="bar")
some_step = SomeStep.from_context(context)
print(some_step.foo)  # prints 'bar'
Parameters:
Name | Type | Description | Default |
---|---|---|---|
context | Context | | required |
Returns:
Type | Description |
---|---|
BaseModel | |
from_dict classmethod #
from_json classmethod #
Creates BaseModel instance from a given JSON string
BaseModel offloads the serialization and deserialization of the JSON string to Context class. Context uses jsonpickle library to serialize and deserialize the JSON string. This is done to allow for objects to be stored in the BaseModel object, which is not possible with the standard json library.
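The limitation of the standard json library mentioned above is easy to reproduce. The sketch below uses only the standard library; the payload and its keys are made up for illustration, and the default=str workaround shown at the end is a generic Python technique, not what Koheesio or jsonpickle do.

```python
import json
from pathlib import Path

# A payload that contains an object the standard encoder does not know
payload = {"name": "reader", "location": Path("data") / "input.csv"}

try:
    json.dumps(payload)
    serializable = True
except TypeError:
    # The standard encoder only handles dict, list, tuple, str, int, float, bool and None
    serializable = False

# Generic workaround: turn unknown objects into strings (the original type is lost)
as_text = json.dumps(payload, default=str)

print(serializable)
print(as_text)
```

Because default=str is lossy (the Path comes back as a plain string), a library that records type information is the better fit when objects have to survive a round trip.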
See Also
Context.from_json : Deserializes a JSON string to a Context object
Parameters:
Name | Type | Description | Default |
---|---|---|---|
json_file_or_str | Union[str, Path] | Pathlike string or Path that points to the json file or string containing json | required |
Returns:
Type | Description |
---|---|
BaseModel | |
from_toml classmethod #
Creates BaseModel object from a given toml file
Note: BaseModel offloads the serialization and deserialization of the TOML string to Context class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
toml_file_or_str | Union[str, Path] | Pathlike string or Path that points to the toml file, or string containing toml | required |
Returns:
Type | Description |
---|---|
BaseModel | |
from_yaml classmethod #
Creates BaseModel object from a given yaml file
Note: BaseModel offloads the serialization and deserialization of the YAML string to Context class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
yaml_file_or_str | str | Pathlike string or Path that points to the yaml file, or string containing yaml | required |
Returns:
Type | Description |
---|---|
BaseModel | |
get #
Get an attribute of the model, but don't fail if not present
Similar to dict.get()
Examples:
step_output = StepOutput(foo="bar")
step_output.get("foo") # returns 'bar'
step_output.get("non_existent_key", "oops") # returns 'oops'
Parameters:
Name | Type | Description | Default |
---|---|---|---|
key | str | name of the key to get | required |
default | Optional[Any] | Default value in case the attribute does not exist | None |
Returns:
Type | Description |
---|---|
Any | The value of the attribute |
hasattr #
lazy classmethod #
Constructs the model without doing validation
Essentially an alias to Pydantic's BaseModel.construct() (named model_construct() in Pydantic v2).
merge #
Merge key,value map with self
Functionally similar to adding two dicts together; like running {**dict_a, **dict_b}.
Examples:
step_output = StepOutput(foo="bar")
step_output.merge({"lorem": "ipsum"}) # step_output will now contain {'foo': 'bar', 'lorem': 'ipsum'}
Parameters:
Name | Type | Description | Default |
---|---|---|---|
other | Union[Dict, BaseModel] | Dict or another instance of a BaseModel class that will be added to self | required |
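For reference, this is how the dict expression that merge is compared to behaves in plain Python. The dictionaries are made-up examples; whether merge treats overlapping keys exactly the same way is an assumption based on the comparison above.

```python
dict_a = {"foo": "bar", "shared": "from a"}
dict_b = {"lorem": "ipsum", "shared": "from b"}

# Later unpacked dicts win when a key appears in both
merged = {**dict_a, **dict_b}

print(merged)
```

Neither input dict is modified by the expression; a new dict is built from both.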
set #
Allows for subscribing / assigning to class[key].
Examples:
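A minimal sketch of the idea, using a hypothetical stand-in class rather than the Koheesio BaseModel. AttrBag and its internals are assumptions made for illustration; only the set(key, value) shape follows the parameters listed below.

```python
from typing import Any


class AttrBag:
    """Hypothetical stand-in, not the Koheesio BaseModel."""

    def __init__(self, **kwargs: Any) -> None:
        for key, value in kwargs.items():
            setattr(self, key, value)

    def set(self, key: str, value: Any) -> None:
        # Assign the value to the attribute named by key
        setattr(self, key, value)

    def __setitem__(self, key: str, value: Any) -> None:
        # bag[key] = value is routed through set
        self.set(key, value)

    def __getitem__(self, key: str) -> Any:
        return getattr(self, key)


bag = AttrBag(foo="bar")
bag.set("foo", "baz")   # overwrite an existing attribute
bag["lorem"] = "ipsum"  # subscript assignment adds a new one

print(bag.foo, bag["lorem"])
```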
Parameters:
Name | Type | Description | Default |
---|---|---|---|
key | str | The key of the attribute to assign to | required |
value | Any | Value that should be assigned to the given key | required |
to_context #
to_context() -> Context
to_json #
Converts the BaseModel instance to a JSON string
BaseModel offloads the serialization and deserialization of the JSON string to Context class. Context uses jsonpickle library to serialize and deserialize the JSON string. This is done to allow for objects to be stored in the BaseModel object, which is not possible with the standard json library.
See Also
Context.to_json : Serializes a Context object to a JSON string
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pretty | bool | Toggles whether to return a pretty json string or not | False |
Returns:
Type | Description |
---|---|
str | containing all parameters of the BaseModel instance |
to_yaml #
Converts the BaseModel instance to a YAML string
BaseModel offloads the serialization and deserialization of the YAML string to Context class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
clean | bool | Toggles whether to remove | False |
Returns:
Type | Description |
---|---|
str | containing all parameters of the BaseModel instance |
validate #
validate() -> BaseModel
Validate the BaseModel instance
This method is used to validate the BaseModel instance. It is used in conjunction with the lazy method to validate the instance after all the attributes have been set.
This method is intended to be used with the lazy method. The lazy method is used to create an instance of the BaseModel without immediate validation. The validate method is then used to validate the instance afterward.
Note: in the Pydantic BaseModel, the validate method throws a deprecation warning. This is because Pydantic recommends using the model_validate method instead. However, we are using the validate method here in a different context and a slightly different way.
Examples:
class FooModel(BaseModel):
    foo: str
    lorem: str

foo_model = FooModel.lazy()
foo_model.foo = "bar"
foo_model.lorem = "ipsum"
foo_model.validate()
The foo_model instance is created without immediate validation. The attributes foo and lorem are set afterward. The validate method is then called to validate the instance.
Returns:
Type | Description |
---|---|
BaseModel | The BaseModel instance |
koheesio.models.ExtraParamsMixin #
Mixin class that adds support for arbitrary keyword arguments to Pydantic models.
The keyword arguments are extracted from the model's values and moved to a params dictionary.
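The splitting step described above can be sketched as a plain function. This is not the mixin's code: split_extra_params and the example field names are hypothetical, and the sketch assumes that "extra" simply means "not one of the declared fields".

```python
from typing import Any, Dict, Iterable, Tuple


def split_extra_params(
    declared_fields: Iterable[str], **kwargs: Any
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: separate declared fields from arbitrary extras."""
    declared = set(declared_fields)
    # Keyword arguments that match a declared field stay on the model
    known = {key: value for key, value in kwargs.items() if key in declared}
    # Everything else is collected into a params dictionary
    params = {key: value for key, value in kwargs.items() if key not in declared}
    return known, params


known, params = split_extra_params(
    ["path", "format"], path="data.csv", format="csv", header=True, sep=";"
)
print(known)
print(params)
```

The caller passes everything as ordinary keyword arguments and never has to build the params dictionary by hand, which is the convenience the mixin is described as providing.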