Marshal

Kale’s marshalling system is a small, extensible dispatcher that serializes data flowing between pipeline steps. See Data Passing & Marshalling for the conceptual overview; this page is the API reference.

Dispatcher and base class

Marshal dispatcher and backend base class.

Defines MarshalBackend, the abstract base class for type-aware serializers that Kale uses to pass data between pipeline steps, and Dispatcher, which routes objects to the correct backend at save/load time.

kale.marshal.backend.set_data_dir(path)[source]

Set the data directory where marshalling happens.

kale.marshal.backend.get_data_dir()[source]

Get the data directory where marshalling happens.

class kale.marshal.backend.MarshalBackend(name: str = None, display_name: str = None, obj_type_regex: str = None, file_type: str = None)[source]

Bases: object

Base class for marshalling Python objects.

This class is supposed to be subclassed by specialized backends that implement the save and load functions to marshal library-specific objects.

A backend registers itself to specific objects/file types using the following class attributes:

  • file_type: The file extension of the files/folders the backend is able to restore. NOTE: Currently this can be just one extension.

  • obj_type_regex: A regex which is matched against the type of an object.

Take a look at backend.py for examples of how to create custom marshal backends.
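As a rough illustration of how these two attributes could drive backend selection, the sketch below matches a regex against an object's fully qualified type name and compares a file's extension with file_type. The helper names (qualified_type_name, matches_object, matches_file) are hypothetical, and the exact string that Kale matches the regex against is an assumption; backend.py remains the authoritative reference.

```python
import os
import re
from collections import OrderedDict


def qualified_type_name(obj) -> str:
    """Return 'module.ClassName' for obj (assumed matching target)."""
    t = type(obj)
    return f"{t.__module__}.{t.__qualname__}"


def matches_object(obj_type_regex: str, obj) -> bool:
    """True if the regex matches from the start of the object's type name."""
    return re.match(obj_type_regex, qualified_type_name(obj)) is not None


def matches_file(file_type: str, path: str) -> bool:
    """True if the path's extension equals the backend's single file_type."""
    return os.path.splitext(path)[1].lstrip(".") == file_type


# A regex shaped like the built-in backends' patterns, applied to a stdlib type.
type_matches = matches_object(r"collections\..*", OrderedDict())
file_matches = matches_file("npy", "/tmp/data/x.npy")
```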

predictor_type: str = None
fallback_on_missing_lib = True
name: str = 'Default backend'
display_name: str = 'generic'
obj_type_regex: str = None
file_type: str = 'dillpkl'
wrapped_save(obj: Any, name: str)[source]

Wrapper around the public save function.

This function provides common logging and exception handling for every class that extends the base MarshalBackend. The Dispatcher calls this function directly instead of save.

Returns the path (<data_dir>/<basename>.<backend_extension>) to the saved file.
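The wrapper pattern described here can be sketched as follows. SketchBackend is a hypothetical stand-in, not Kale's MarshalBackend: it uses pickle and a "pkl" extension purely for illustration, and the logging calls are assumptions about what "common logging and exception handling" might look like.

```python
import logging
import os
import pickle
import tempfile

log = logging.getLogger(__name__)


class SketchBackend:
    """Hypothetical stand-in for a backend; not Kale's MarshalBackend."""

    file_type = "pkl"

    def __init__(self, data_dir: str):
        self.data_dir = data_dir

    def save(self, obj, path: str) -> None:
        # Backend-specific serialization; subclasses would override this.
        with open(path, "wb") as f:
            pickle.dump(obj, f)

    def wrapped_save(self, obj, name: str) -> str:
        # Compose <data_dir>/<basename>.<backend_extension>.
        path = os.path.join(self.data_dir, f"{name}.{self.file_type}")
        try:
            log.info("Saving %s to %s", name, path)
            self.save(obj, path)
        except Exception:
            # Log once in the shared wrapper, then re-raise to the caller.
            log.exception("Failed to save %s", name)
            raise
        return path


data_dir = tempfile.mkdtemp()
saved_path = SketchBackend(data_dir).wrapped_save({"a": 1}, "params")
```

Keeping the logging and error handling in the wrapper means a subclass only has to implement save and load.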

save(obj: Any, path: str)[source]

Save obj to file.

wrapped_load(name: str) → Any[source]

Wrapper around the public load function.

This function provides common logging and exception handling for every class that extends the base MarshalBackend. The Dispatcher calls this function directly instead of load.

load(file_path: str) → Any[source]

Restore file_path to memory.

kale.marshal.backend.get_dispatcher()[source]

Get the unique instance of dispatcher.

Reusing a single instance is preferred because the Dispatcher registers all MarshalBackends that are decorated with the register function, and this registration process should not be repeated every time.
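One common way to obtain a single shared instance is to cache the result of a factory function. The sketch below uses hypothetical names (SketchDispatcher, get_sketch_dispatcher) and is not necessarily how get_dispatcher is implemented; it only shows that the construction cost is paid once.

```python
import functools


class SketchDispatcher:
    """Hypothetical stand-in; counts how often it is constructed."""

    instances = 0

    def __init__(self):
        SketchDispatcher.instances += 1
        self.backends = {}


@functools.lru_cache(maxsize=None)
def get_sketch_dispatcher() -> SketchDispatcher:
    """Build the dispatcher on the first call and reuse it afterwards."""
    return SketchDispatcher()


first = get_sketch_dispatcher()
second = get_sketch_dispatcher()
```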

class kale.marshal.backend.Dispatcher[source]

Bases: object

Dispatch backend classes based on obj types or file extensions.

This class holds a reference to all the marshalling backends that register themselves with the register function. Dispatcher is the main mechanism by which a specialized backend is chosen to either save or load an object to/from memory.

The public functions that users should be aware of:

  • save: Dispatches to a specialized backend based on the input object type, by filtering through the backends’ obj_type_regex attribute.

  • load: Dispatches to a specialized backend based on the input file path, by filtering through the backends’ file_type attribute.

Users and external code are not supposed to interact directly with the singleton instance of this class. Rather, they should just call the two publicly exposed functions save and load like so:

`from kale.marshal import save, load`
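To make the two dispatch rules concrete, here is a self-contained sketch that does not import Kale. All class names (PickleBackend, DictJsonBackend, MiniDispatcher) are hypothetical, and the fallback to a generic backend when nothing matches is an assumption made for the example.

```python
import json
import os
import pickle
import re
import tempfile


class PickleBackend:
    """Hypothetical generic fallback backend."""
    obj_type_regex = None
    file_type = "pkl"

    def save(self, obj, path):
        with open(path, "wb") as f:
            pickle.dump(obj, f)

    def load(self, path):
        with open(path, "rb") as f:
            return pickle.load(f)


class DictJsonBackend(PickleBackend):
    """Hypothetical specialized backend that stores dicts as JSON."""
    obj_type_regex = r"builtins\.dict"
    file_type = "json"

    def save(self, obj, path):
        with open(path, "w") as f:
            json.dump(obj, f)

    def load(self, path):
        with open(path) as f:
            return json.load(f)


class MiniDispatcher:
    def __init__(self, data_dir):
        self.data_dir = data_dir
        self.default = PickleBackend()
        self.backends = {"dict-json": DictJsonBackend()}

    def save(self, obj, obj_name):
        # Choose the first backend whose regex matches the object's type name.
        type_name = f"{type(obj).__module__}.{type(obj).__qualname__}"
        backend = next(
            (b for b in self.backends.values()
             if re.match(b.obj_type_regex, type_name)),
            self.default,
        )
        path = os.path.join(self.data_dir, f"{obj_name}.{backend.file_type}")
        backend.save(obj, path)
        return path

    def load(self, basename):
        # Locate <basename>.<ext> and choose the backend by file extension.
        for entry in os.listdir(self.data_dir):
            stem, ext = os.path.splitext(entry)
            if stem == basename:
                ext = ext.lstrip(".")
                backend = next(
                    (b for b in self.backends.values() if b.file_type == ext),
                    self.default,
                )
                return backend.load(os.path.join(self.data_dir, entry))
        raise FileNotFoundError(basename)


dispatcher = MiniDispatcher(tempfile.mkdtemp())
dict_path = dispatcher.save({"lr": 0.1}, "params")
list_path = dispatcher.save([1, 2, 3], "values")
restored_params = dispatcher.load("params")
restored_values = dispatcher.load("values")
```

The caller never names a backend: the object type decides on save, and the file extension decides on load.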

END_USER_EXC_MSG = '\n\nThe error was:\n%s\n\nPlease help us improve Kale by opening a new issue at:\nhttps://github.com/kubeflow-kale/kale/issues.'
backends: dict[str, MarshalBackend]
register(cls: type[MarshalBackend]) → type[MarshalBackend][source]

Register a new marshalling backend.

Parameters:

cls (type[MarshalBackend]) – Marshal backend class

Returns: the class itself
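Because register returns the class it receives, it can be applied as a class decorator. The sketch below shows that contract with a hypothetical SketchRegistry; storing an instance keyed by display_name is an assumption made for the example, not a statement about Dispatcher's internals.

```python
class SketchRegistry:
    """Hypothetical registry; mirrors only the 'return the class' contract."""

    def __init__(self):
        self.backends = {}

    def register(self, cls):
        # Keep an instance under the display name (assumed key), then hand
        # the class back unchanged so the decorated name still refers to it.
        self.backends[cls.display_name] = cls()
        return cls


registry = SketchRegistry()


@registry.register
class CsvBackend:
    """Hypothetical custom backend used only for this illustration."""
    display_name = "csv"
    file_type = "csv"
```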

get_backend(obj: Any)[source]

Get the backend registered for the input object type.

get_backends() → dict[str, MarshalBackend][source]

Get all registered backends.

get_backend_by_name(name: str)[source]

Get a registered backend by its display name.

save(obj: Any, obj_name: str)[source]

Save an object to file.

Parameters:
  • obj (Any) – Object to be marshalled

  • obj_name (str) – Name of the object to be saved

load(basename: str)[source]

Restore a file to memory.

Parameters:

basename (str) – The name of the serialized object to be loaded

Returns: restored object

get_path(basename: str)[source]

Get the absolute path of a serialized KFP artifact.

Parameters:

basename (str) – The name of the artifact whose absolute path is retrieved

Returns: the marshalled artifact path

Built-in backends

Built-in marshal backends.

Concrete MarshalBackend implementations for the Python types Kale supports out of the box: numpy arrays, pandas DataFrames, scikit-learn estimators, PyTorch / Keras / TensorFlow models, XGBoost boosters and DMatrices, and plain Python functions.

class kale.marshal.backends.FunctionBackend(name: str = None, display_name: str = None, obj_type_regex: str = None, file_type: str = None)[source]

Bases: MarshalBackend

Marshal Python functions.

name: str = 'Function backend'
display_name: str = 'function'
file_type: str = 'pyfn'
obj_type_regex: str = 'function'
class kale.marshal.backends.SKLearnBackend(name: str = None, display_name: str = None, obj_type_regex: str = None, file_type: str = None)[source]

Bases: MarshalBackend

Marshal SKLearn objects.

name: str = 'SKLearn backend'
display_name: str = 'scikit-learn'
file_type: str = 'joblib'
obj_type_regex: str = 'sklearn\\..*'
predictor_type: str = 'sklearn'
fallback_on_missing_lib = False
save(obj, path)[source]

Save a SKLearn object.

load(file_path)[source]

Restore a SKLearn object.

class kale.marshal.backends.NumpyBackend(name: str = None, display_name: str = None, obj_type_regex: str = None, file_type: str = None)[source]

Bases: MarshalBackend

Marshal Numpy objects.

name: str = 'Numpy backend'
display_name: str = 'numpy'
file_type: str = 'npy'
obj_type_regex: str = 'numpy\\..*'
save(obj, path)[source]

Save a Numpy object.

load(file_path)[source]

Restore a Numpy object.

class kale.marshal.backends.PandasBackend(name: str = None, display_name: str = None, obj_type_regex: str = None, file_type: str = None)[source]

Bases: MarshalBackend

Marshal Pandas objects.

name: str = 'Pandas backend'
display_name: str = 'pandas'
file_type: str = 'pdpkl'
obj_type_regex: str = 'pandas\\..*(DataFrame|Series)'
save(obj, path)[source]

Save a Pandas object.

load(file_path)[source]

Restore a Pandas object.

class kale.marshal.backends.XGBoostModelBackend(name: str = None, display_name: str = None, obj_type_regex: str = None, file_type: str = None)[source]

Bases: MarshalBackend

Marshal XGBoost Model object.

name: str = 'XGBoost Model backend'
display_name: str = 'xgboost'
file_type: str = 'json'
obj_type_regex: str = 'xgboost\\.core\\.Booster'
predictor_type: str = 'xgboost'
save(obj, path)[source]

Save an XGBoost Model object.

load(file_path)[source]

Restore an XGBoost Model object.

class kale.marshal.backends.XGBoostDMatrixBackend(name: str = None, display_name: str = None, obj_type_regex: str = None, file_type: str = None)[source]

Bases: MarshalBackend

Marshal XGBoost DMatrix object.

name: str = 'XGBoost DMatrix backend'
display_name: str = 'xgboost-dmatrix'
file_type: str = 'dmatrix'
obj_type_regex: str = 'xgboost\\.core\\.DMatrix'
save(obj, path)[source]

Save an XGBoost DMatrix object.

load(file_path)[source]

Restore an XGBoost DMatrix object.

class kale.marshal.backends.PyTorchBackend(name: str = None, display_name: str = None, obj_type_regex: str = None, file_type: str = None)[source]

Bases: MarshalBackend

Marshal PyTorch objects.

name: str = 'PyTorch backend'
display_name: str = 'pytorch'
file_type: str = 'pt'
obj_type_regex: str = 'torch\\.nn\\.modules\\.module\\.Module'
save(obj, path)[source]

Save a PyTorch object.

load(file_path)[source]

Restore a PyTorch object.

class kale.marshal.backends.KerasBackend(name: str = None, display_name: str = None, obj_type_regex: str = None, file_type: str = None)[source]

Bases: MarshalBackend

Marshal Keras objects.

name: str = 'Keras backend'
display_name: str = 'keras'
file_type: str = 'keras'
obj_type_regex: str = 'keras\\..*'
save(obj, path)[source]

Save a Keras object.

load(file_path)[source]

Restore a Keras object.

class kale.marshal.backends.TensorflowKerasBackend(name: str = None, display_name: str = None, obj_type_regex: str = None, file_type: str = None)[source]

Bases: MarshalBackend

Marshal Tensorflow Keras objects.

name: str = 'Tensorflow backend'
display_name: str = 'tensorflow'
file_type: str = 'tfkeras'
obj_type_regex: str = 'tensorflow\\.python\\.keras.*'
predictor_type: str = 'tensorflow'
save(obj, path)[source]

Save a Tensorflow Keras object.

load(file_path)[source]

Restore a Tensorflow Keras object.

Decorator

class kale.marshal.decorator.PipelineParam(param_type: str, param_value: Any)[source]

Bases: NamedTuple

A pipeline parameter.

param_type: str

Alias for field number 0

param_value: Any

Alias for field number 1
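Since PipelineParam is a NamedTuple, its fields can be read by name, by index, or by unpacking. The snippet re-declares an equivalent class locally so that it runs without Kale installed; the "int" value for param_type is only an illustrative guess at what a type string might look like.

```python
from typing import Any, NamedTuple


class PipelineParam(NamedTuple):
    """Local re-declaration for illustration; same two fields as documented."""
    param_type: str
    param_value: Any


param = PipelineParam(param_type="int", param_value=5)
kind, value = param  # tuple unpacking follows the field order
```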

kale.marshal.decorator.marshal(ins: list, outs: list, parameters: dict[str, PipelineParam | Any] = None, marshal_dir: str = None, introspect: bool = False)[source]

Decorator that ensures proper marshalling happens when the function is run.

class kale.marshal.decorator.Marshaller(func, ins: list, outs: list, parameters: dict[str, PipelineParam] = None, marshal_dir=None, introspect=False)[source]

Bases: object

Wrap a function to perform marshalling around its execution.

This class acts as a wrapper around a function that runs in a pipeline step and needs input arguments to be loaded from a marshal directory and its outputs saved likewise.

__call__()[source]

Run the function by passing loaded vars and saving the results.
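The load, run, save cycle described for Marshaller can be sketched with a plain decorator. Everything here is hypothetical: sketch_marshal is not Kale's marshal decorator, pickle stands in for the backend dispatch, and passing the loaded inputs as keyword arguments is an assumption about the calling convention.

```python
import functools
import os
import pickle
import tempfile


def sketch_marshal(ins, outs, marshal_dir):
    """Hypothetical decorator: load `ins`, call the function, save `outs`."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper():
            # Load each named input from <marshal_dir>/<name>.pkl.
            kwargs = {}
            for name in ins:
                with open(os.path.join(marshal_dir, f"{name}.pkl"), "rb") as f:
                    kwargs[name] = pickle.load(f)
            results = func(**kwargs)
            # Normalize a single return value to a tuple, then save by name.
            if len(outs) == 1:
                results = (results,)
            for name, value in zip(outs, results or ()):
                with open(os.path.join(marshal_dir, f"{name}.pkl"), "wb") as f:
                    pickle.dump(value, f)
            return results
        return wrapper

    return decorator


# Simulate a previous step that left an input in the marshal directory.
workdir = tempfile.mkdtemp()
with open(os.path.join(workdir, "x.pkl"), "wb") as f:
    pickle.dump(20, f)


@sketch_marshal(ins=["x"], outs=["doubled"], marshal_dir=workdir)
def step(x):
    return x * 2


step()
with open(os.path.join(workdir, "doubled.pkl"), "rb") as f:
    doubled = pickle.load(f)
```

The wrapped step takes no arguments at call time; its inputs come from files written by earlier steps, and its outputs become files for later ones.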