Marshal
Kale’s marshalling system is a small, extensible dispatcher that serializes data flowing between pipeline steps. See Data Passing & Marshalling for the conceptual overview; this page is the API reference.
Dispatcher and base class
Marshal dispatcher and backend base class.
Defines MarshalBackend, the abstract base class for type-aware
serializers that Kale uses to pass data between pipeline steps, and
Dispatcher, which routes objects to the correct backend at
save/load time.
- class kale.marshal.backend.MarshalBackend(name: str = None, display_name: str = None, obj_type_regex: str = None, file_type: str = None)
Bases: object
Base class for marshalling Python objects.
This class is meant to be subclassed by specialized backends that implement the save and load functions to marshal library-specific objects.
A backend registers itself to specific objects/file types using the following class attributes:
- file_type: The file extension of the files/folders the backend is able
to restore. NOTE: Currently only a single extension is supported.
- obj_type_regex: A regex which is matched against the type of an
object.
See backend.py for examples of how to create custom marshal backends.
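As an illustration of the two class attributes described above, here is a minimal, self-contained sketch of a custom backend. It deliberately does not import Kale: `ToyBackend` and `JsonDictBackend` are hypothetical stand-ins, and the `save(obj, path)` / `load(path)` signatures are assumptions, since this page does not document them.

```python
import json
import os
import tempfile
from typing import Any


class ToyBackend:
    """Hypothetical stand-in for the base class (not Kale's implementation)."""

    file_type: str = None       # the single file extension the backend restores
    obj_type_regex: str = None  # regex matched against the type of an object

    def save(self, obj: Any, path: str) -> None:  # assumed signature
        raise NotImplementedError

    def load(self, path: str) -> Any:  # assumed signature
        raise NotImplementedError


class JsonDictBackend(ToyBackend):
    """Hypothetical backend that marshals plain dicts as JSON files."""

    file_type = "json"
    obj_type_regex = r"<class 'dict'>"

    def save(self, obj: Any, path: str) -> None:
        with open(path, "w") as f:
            json.dump(obj, f)

    def load(self, path: str) -> Any:
        with open(path) as f:
            return json.load(f)


# Round trip: save a dict, then restore it from the file.
backend = JsonDictBackend()
path = os.path.join(tempfile.mkdtemp(), "config." + backend.file_type)
backend.save({"lr": 0.1}, path)
restored = backend.load(path)
```

A real backend would subclass MarshalBackend instead and be registered with the Dispatcher.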
- fallback_on_missing_lib = True
- wrapped_save(obj: Any, name: str)
Wrapper around the public save function.
This function provides common logging and exception handling for every class that extends the base MarshalBackend. The Dispatcher calls this function directly instead of save.
Returns the path (<data_dir>/<basename>.<backend_extension>) to the saved file.
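The wrapper pattern described for wrapped_save can be sketched as follows. This is not Kale's code: `ToyBackend`, its `data_dir` constructor argument, the `save(obj, path)` signature, and the use of pickle are all assumptions made so the example runs on its own.

```python
import logging
import os
import pickle
import tempfile
from typing import Any

log = logging.getLogger("toy_marshal")


class ToyBackend:
    """Hypothetical backend; only the wrapper pattern mirrors the text above."""

    file_type = "pkl"

    def __init__(self, data_dir: str):
        self.data_dir = data_dir

    def save(self, obj: Any, path: str) -> None:
        # Backend-specific serialization; pickle is only a placeholder.
        with open(path, "wb") as f:
            pickle.dump(obj, f)

    def wrapped_save(self, obj: Any, name: str) -> str:
        # Build <data_dir>/<basename>.<backend_extension>.
        path = os.path.join(self.data_dir, f"{name}.{self.file_type}")
        log.info("Saving %s to %s", type(obj).__name__, path)
        try:
            self.save(obj, path)
        except Exception:
            # Shared error handling: log with traceback, then re-raise.
            log.exception("Failed to save '%s'", name)
            raise
        return path


data_dir = tempfile.mkdtemp()
saved_path = ToyBackend(data_dir).wrapped_save([1, 2, 3], "numbers")
```

Keeping logging and error handling in one wrapper means each subclass only has to implement the serialization itself.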
- kale.marshal.backend.get_dispatcher()
Get the unique instance of the dispatcher.
Prefer this accessor over creating a new Dispatcher: the Dispatcher registers every MarshalBackend that is decorated with the register function, and that registration process should not be repeated on every use.
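The "unique instance" idea can be sketched with a cached module-level object. The names `ToyDispatcher` and `get_toy_dispatcher` are hypothetical, and how Kale actually caches its dispatcher is not documented on this page.

```python
class ToyDispatcher:
    """Hypothetical dispatcher whose (costly) construction should run once."""

    instances_created = 0

    def __init__(self):
        ToyDispatcher.instances_created += 1
        self.backends = {}


_dispatcher = None  # module-level cache for the single instance


def get_toy_dispatcher() -> ToyDispatcher:
    """Create the dispatcher on first use, then keep returning the same one."""
    global _dispatcher
    if _dispatcher is None:
        _dispatcher = ToyDispatcher()
    return _dispatcher


first = get_toy_dispatcher()
second = get_toy_dispatcher()
```

Both calls return the same object, so any setup done in the constructor happens only once.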
- class kale.marshal.backend.Dispatcher
Bases: object
Dispatch backend classes based on obj types or file extensions.
This class holds a reference to all the marshalling backends that register themselves with the register function. Dispatcher is the main mechanism by which a specialized backend is chosen to either save or load an object to/from memory.
The public functions that users should be aware of:
- save: Dispatches to a specialized backend based on the input object
type, by filtering through the backends’ obj_type_regex attribute.
- load: Dispatches to a specialized backend based on the input file path
by filtering through the backends’ file_type attribute.
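The two dispatch rules above can be illustrated with a small standalone sketch. Everything here is hypothetical (`ToyBackend`, `BACKENDS`, the two helper functions), and matching the regex against `str(type(obj))` is an assumption about how "the type of an object" is compared.

```python
import os
import re
from typing import Any


class ToyBackend:
    """Hypothetical backend record; only the two matching attributes matter."""

    def __init__(self, name: str, obj_type_regex: str, file_type: str):
        self.name = name
        self.obj_type_regex = obj_type_regex
        self.file_type = file_type


BACKENDS = [
    ToyBackend("mapping", r"<class 'dict'>", "json"),
    ToyBackend("sequence", r"<class '(list|tuple)'>", "seq"),
]


def backend_for_object(obj: Any) -> ToyBackend:
    """Save-side choice: filter on obj_type_regex."""
    for backend in BACKENDS:
        if re.match(backend.obj_type_regex, str(type(obj))):
            return backend
    raise LookupError(f"no backend for {type(obj)}")


def backend_for_path(path: str) -> ToyBackend:
    """Load-side choice: filter on file_type (the file extension)."""
    ext = os.path.splitext(path)[1].lstrip(".")
    for backend in BACKENDS:
        if backend.file_type == ext:
            return backend
    raise LookupError(f"no backend for extension '{ext}'")


save_choice = backend_for_object((1, 2)).name
load_choice = backend_for_path("marshal_dir/model.json").name
```

Because load dispatches on the extension alone, each backend needs an extension that no other backend claims.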
Users and external code are not supposed to interact directly with the singleton instance of this class. Rather, they should just call the two publicly exposed functions save and load like so:
`from kale.marshal import save, load`
- END_USER_EXC_MSG = '\n\nThe error was:\n%s\n\nPlease help us improve Kale by opening a new issue at:\nhttps://github.com/kubeflow-kale/kale/issues.'
- backends: dict[str, MarshalBackend]
- register(cls: type[MarshalBackend]) → type[MarshalBackend]
Register a new marshalling backend.
- Parameters:
cls (type[MarshalBackend]) – Marshal backend class
- Returns:
the class itself
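Because register returns the class itself, it can be applied as a class decorator. The sketch below shows that pattern with a hypothetical `ToyDispatcher`; storing an instance keyed by the class name is an assumption, since this page only states that backends maps strings to MarshalBackend objects.

```python
class ToyDispatcher:
    """Hypothetical registry (not Kale's Dispatcher)."""

    def __init__(self):
        self.backends = {}

    def register(self, cls: type) -> type:
        # Assumption: keep one instance per backend, keyed by class name.
        self.backends[cls.__name__] = cls()
        # Returning the class unchanged is what makes decorator use possible.
        return cls

    def get_backends(self) -> dict:
        return self.backends


toy = ToyDispatcher()


@toy.register
class CsvBackend:
    """Hypothetical backend used only to exercise registration."""

    file_type = "csv"


registered_names = list(toy.get_backends())
```

After decoration, `CsvBackend` is still the original class and can be subclassed or instantiated as usual.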
- get_backends() → dict[str, MarshalBackend]
Get all registered backends.
Built-in backends
Built-in marshal backends.
Concrete MarshalBackend implementations for the
Python types Kale supports out of the box: numpy arrays, pandas DataFrames,
scikit-learn estimators, PyTorch / Keras / TensorFlow models, XGBoost
boosters and DMatrices, and plain Python functions.
- class kale.marshal.backends.FunctionBackend(name: str = None, display_name: str = None, obj_type_regex: str = None, file_type: str = None)
Bases: MarshalBackend
Marshal Python functions.
- class kale.marshal.backends.SKLearnBackend(name: str = None, display_name: str = None, obj_type_regex: str = None, file_type: str = None)
Bases: MarshalBackend
Marshal SKLearn objects.
- fallback_on_missing_lib = False
- class kale.marshal.backends.NumpyBackend(name: str = None, display_name: str = None, obj_type_regex: str = None, file_type: str = None)
Bases: MarshalBackend
Marshal Numpy objects.
- class kale.marshal.backends.PandasBackend(name: str = None, display_name: str = None, obj_type_regex: str = None, file_type: str = None)
Bases: MarshalBackend
Marshal Pandas objects.
- class kale.marshal.backends.XGBoostModelBackend(name: str = None, display_name: str = None, obj_type_regex: str = None, file_type: str = None)
Bases: MarshalBackend
Marshal XGBoost Model object.
- class kale.marshal.backends.XGBoostDMatrixBackend(name: str = None, display_name: str = None, obj_type_regex: str = None, file_type: str = None)
Bases: MarshalBackend
Marshal XGBoost DMatrix object.
- class kale.marshal.backends.PyTorchBackend(name: str = None, display_name: str = None, obj_type_regex: str = None, file_type: str = None)
Bases: MarshalBackend
Marshal PyTorch objects.
- class kale.marshal.backends.KerasBackend(name: str = None, display_name: str = None, obj_type_regex: str = None, file_type: str = None)
Bases: MarshalBackend
Marshal Keras objects.
Decorator
- class kale.marshal.decorator.PipelineParam(param_type: str, param_value: Any)
Bases: NamedTuple
A pipeline parameter.
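For readers unfamiliar with NamedTuple, the sketch below mirrors the documented field names in a hypothetical `ToyPipelineParam`. The `"float"` type string is an assumption; this page does not list the accepted values of param_type.

```python
from typing import Any, NamedTuple


class ToyPipelineParam(NamedTuple):
    """Hypothetical mirror of the documented (param_type, param_value) pair."""

    param_type: str
    param_value: Any


learning_rate = ToyPipelineParam("float", 0.01)

# Fields are readable by name and by position, like any NamedTuple.
param_type, param_value = learning_rate
```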
- kale.marshal.decorator.marshal(ins: list, outs: list, parameters: dict[str, PipelineParam | Any] = None, marshal_dir: str = None, introspect: bool = False)
Decorator that ensures proper marshalling happens when the decorated function is run.
- class kale.marshal.decorator.Marshaller(func, ins: list, outs: list, parameters: dict[str, PipelineParam] = None, marshal_dir=None, introspect=False)
Bases: object
Wrap a function to perform marshalling around its execution.
This class acts as a wrapper around a function that runs in a pipeline step. Before the function runs, its input arguments are loaded from a marshal directory; after it returns, its outputs are saved there as well.
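The load-run-save cycle described above can be sketched with a small standalone decorator. It is not Kale's Marshaller: `toy_marshal` is hypothetical, pickle stands in for the backend dispatch, and the conventions that inputs are passed as keyword arguments and outputs are taken from the return value are assumptions.

```python
import functools
import os
import pickle
import tempfile


def toy_marshal(ins: list, outs: list, marshal_dir: str):
    """Hypothetical decorator: load `ins` before the call, save `outs` after."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper():
            # Load every declared input from the marshal directory.
            kwargs = {}
            for name in ins:
                with open(os.path.join(marshal_dir, name + ".pkl"), "rb") as f:
                    kwargs[name] = pickle.load(f)
            results = func(**kwargs)
            # Assumption: a single output is returned bare, several as a tuple.
            if len(outs) == 1:
                results = (results,)
            # Save every declared output back to the marshal directory.
            for name, value in zip(outs, results):
                with open(os.path.join(marshal_dir, name + ".pkl"), "wb") as f:
                    pickle.dump(value, f)

        return wrapper

    return decorator


marshal_dir = tempfile.mkdtemp()


@toy_marshal(ins=[], outs=["x"], marshal_dir=marshal_dir)
def produce():
    return 20


@toy_marshal(ins=["x"], outs=["y"], marshal_dir=marshal_dir)
def double(x):
    return x * 2


# Two "steps" that share data only through files in marshal_dir.
produce()
double()

with open(os.path.join(marshal_dir, "y.pkl"), "rb") as f:
    y = pickle.load(f)
```

The two functions never call each other; the second step sees the first step's result only because it was written to and read from the shared directory.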