Troubleshooting¶
Most Kale problems fall into one of a few categories. This page covers the
common ones. For anything not listed here, open an issue on
GitHub with the notebook and the
generated .kale/<name>.kale.py.
ModuleNotFoundError in a pipeline step¶
You’ll see this most often on your first pipeline. A step fails with:
ModuleNotFoundError: No module named 'seaborn'
even though you can clearly import seaborn in your notebook.
Cause. Kale does not rebuild a Docker image for your pipeline. It
uses the base image configured in pipeline metadata (or overridden with
--docker_image), and it installs extra Python packages at step startup
via packages_to_install. That list is built from the imports cell only
— if you import a package in a step or functions cell, Kale won’t add
it to the install list and the step will fail.
Fix.
Move every
importstatement into animportscell.For packages the AST can’t resolve to a pip name, add them to the pipeline’s base docker image (or build a custom one).
See Cell Types & Annotations for details on the imports rule.
Pickle / marshal errors¶
TypeError: cannot pickle '_thread.RLock' object
or
AttributeError: Can't pickle local object 'train.<locals>.inner'
Cause. A variable flowing between steps can’t be serialized. Kale’s
marshalling system handles many ML-framework types natively (numpy, pandas,
sklearn, PyTorch, Keras, TF, XGBoost), but falls back to dill for
anything unknown. dill can serialize a lot, but not:
Open file handles, sockets, database connections.
Objects holding native thread locks or OS resources.
Closures that capture local variables.
Fix.
Don’t pass the offending object between steps. Recreate it from scratch at the start of each step that needs it, using a helper in a
functionscell — see Data Passing: Non-serializable objects.If the object is a model from an ML library Kale doesn’t yet support, consider contributing a new
MarshalBackend— see Extending the dispatcher.
Compiler errors from KFP¶
Internal compiler error: Compiler has produced Argo-incompatible workflow.
Kale’s compile step hands the generated DSL to the KFP SDK’s compiler,
which runs argo lint if argo is on your PATH. An outdated argo
binary can produce false positives here.
Fix. Remove argo from your PATH or upgrade it to the version
recommended by your KFP installation. Kale itself does not require argo.
Aliasing: a variable’s value doesn’t seem to propagate¶
# step:A
model1 = model2 = SomeModel()
# step:B (prev: A)
model2.add_layer(SomeLayer())
# step:C (prev: B)
print(model1) # still the old model
Cause. Kale saves model1 and model2 separately in step A, so
mutations to model2 in step B don’t affect what C loads as model1.
Fix. Avoid aliasing across steps. Use one name for each object. See Data Passing: Aliasing.
Global state mutated inside a step is not visible to other steps¶
Each step runs in its own container, so library configuration like
warnings.simplefilter, logging.basicConfig, or
plt.rcParams[...] does not persist across steps.
Fix. Put all global configuration in the imports or functions
cell, so every step applies it at startup.
kale command not found after installing¶
If you installed Kale inside a uv or virtualenv, make sure you’re running
kale from an activated environment, or use uv run kale --nb .... The
kale script is registered as a [project.scripts] entry in
pyproject.toml, so it only shows up inside the environment where the
package was installed.
Still stuck?¶
Check the generated
.kale/<name>.kale.py— it often reveals what Kale thinks the pipeline looks like.Run with
--debugfor verbose output.File an issue on GitHub with a minimal reproduction notebook.