Annotating Notebooks¶
Kale turns cell tags into pipeline structure. You can set those tags two ways:
From the Kale JupyterLab extension — point-and-click, recommended for interactive development.
By editing notebook metadata directly — useful for scripting, code review, and when you don’t have JupyterLab running.
Both produce the same tags, so the resulting .ipynb looks identical either
way.
Using the JupyterLab extension¶
After you run make jupyter (or start JupyterLab any other way with the
Kale extension installed), open a notebook and click the Kale icon in the
left sidebar. The Kale panel appears with a master toggle at the top.
When you enable Kale, two things happen:
Every notebook cell grows a Kale metadata row at the top showing its cell type (Imports, Functions, Step, …), step name, and dependencies.
The side panel unlocks pipeline-level settings: pipeline name, experiment name, and description.
Setting a cell’s type¶
Click the cell type dropdown on the cell’s Kale row. You’ll see:
Imports — tag the cell as
imports.Functions — tag the cell as
functions.Pipeline Parameters — tag the cell as
pipeline-parameters.Pipeline Metrics — tag the cell as
pipeline-metrics.Step — tag the cell as a pipeline step. When you pick this, you’ll be prompted for the step name and given a dropdown of possible dependencies (other steps in the notebook).
Skip Cell — tag the cell as
skip.
Setting step dependencies¶
For cells tagged as Step, the panel lets you pick zero or more previous
steps. Each choice becomes a prev:<name> tag. Kale validates the choices
so you can’t create cycles.
Pipeline-level settings¶
The side panel’s pipeline settings form is mapped to fields on
kale.pipeline.PipelineConfig. Changes are saved into the
notebook’s top-level metadata under the kubeflow_notebook key, so they
travel with the notebook and show up on the next open.
Submitting¶
The Compile and Run button invokes the Kale backend from the JupyterLab
extension, which compiles the pipeline with exactly the same code path as
kale --nb ... on the CLI, then uploads and runs it against the configured
KFP host.
Editing metadata by hand¶
Each cell in a notebook is JSON. Kale tags live in metadata.tags as a list
of strings. A minimal step cell looks like this:
{
"cell_type": "code",
"metadata": {
"tags": ["step:load_data"]
},
"source": [
"df = pd.read_csv('data.csv')\n"
]
}
A step with a dependency and a GPU request:
{
"cell_type": "code",
"metadata": {
"tags": [
"step:train",
"prev:load_data",
"limit:nvidia.com/gpu:1",
"image:pytorch/pytorch:2.0-cuda12"
]
},
"source": ["..."]
}
Pipeline-level Kale settings live on the notebook (not on a cell) under the
metadata.kubeflow_notebook key — these are the same fields the side panel
exposes.
Organising a notebook for Kale¶
A notebook that compiles well with Kale usually follows this order:
One
importscell at the top with everyimportstatement.One or more
functionscells below with pure function and class definitions.One
pipeline-parameterscell declaring tunable inputs.A sequence of
stepcells, each doing one logical thing, withprev:tags describing the DAG.Optional
pipeline-metricscells at the end of training steps to surface accuracy / loss / etc. in the KFP UI.skipcells wherever you want exploratory code to live without affecting the pipeline.
See the examples gallery for notebooks that follow this pattern.