Hooks
Kedro framework hooks for tracking pipeline execution status. The ExecutionStatusHook reports
node-level progress (pending, running, completed, failed) via callbacks, enabling real-time
UI updates during pipeline runs.
hooks
Kedro execution hooks for pipeline status tracking.
Provides `ExecutionStatusHook`, which reports node-level execution status
(pending, running, completed, failed) via callbacks, enabling real-time UI
updates during pipeline runs.
DataInjectionHook
Inject in-memory DataFrames into the catalog before pipeline execution.
Registered when inputs use format="MEMORY" (Toolkit path). The hook
fills MemoryDataset catalog entries with actual data before any
node runs. Uses the official before_pipeline_run hook point.
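The page does not show the hook's source, so here is a minimal sketch of the injection idea, assuming the hook receives a mapping of dataset names to in-memory objects and writes each one into the catalog with `save(name, data)`. Kedro's `DataCatalog` provides such a method. `InMemoryCatalog` and `DataInjectionHookSketch` are hypothetical names, and plain lists stand in for DataFrames so the example runs without Kedro or pandas.

```python
from typing import Any, Dict


class InMemoryCatalog:
    """Hypothetical stand-in for a Kedro DataCatalog: maps dataset names to data."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def save(self, name: str, data: Any) -> None:
        self._data[name] = data

    def load(self, name: str) -> Any:
        return self._data[name]


class DataInjectionHookSketch:
    """Hypothetical sketch: fill in-memory catalog entries before any node runs."""

    def __init__(self, frames: Dict[str, Any]) -> None:
        # Assumed input: dataset name -> in-memory object (a DataFrame in the real hook).
        self._frames = frames

    def before_pipeline_run(self, catalog) -> None:
        # Save each object under its dataset name so nodes can load it
        # exactly like any other catalog input.
        for name, data in self._frames.items():
            catalog.save(name, data)
```

In a real Kedro project the method would carry the `@hook_impl` decorator from `kedro.framework.hooks`. Declaring only `catalog` out of the spec's arguments is allowed because pluggy passes hook implementations just the arguments they name.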
Source code in src/choregraph/hooks.py
DtypeInferenceHook
Run `infer_dtypes` on every DataFrame loaded from the catalog.
This ensures that object-typed columns carrying numeric/date strings are
converted to their proper pandas dtype before they enter pipeline nodes.
Without this, pd.concat (union) on DataFrames with inconsistent dtypes
produces mixed-type object columns that pyarrow cannot serialize to Parquet.
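The page does not document what `infer_dtypes` does internally. The following is a pure-Python illustration of the general idea, assuming a column is converted only when every non-empty string in it parses as the target type, trying int, then float, then ISO date. `infer_column` is a hypothetical helper working on plain lists, not the project's function, which presumably operates on pandas object columns.

```python
from datetime import date
from typing import List, Optional


def _try_all(values: List[Optional[str]], parse) -> Optional[list]:
    """Return the parsed values if every non-empty string parses, else None."""
    out = []
    for v in values:
        if v is None or v == "":
            out.append(None)  # keep missing values as None
            continue
        try:
            out.append(parse(v))
        except ValueError:
            return None  # one failure means this type does not fit the column
    return out


def infer_column(values: List[Optional[str]]) -> list:
    """Hypothetical: convert a string column to int, float or date when all values allow it."""
    for parse in (int, float, date.fromisoformat):
        parsed = _try_all(values, parse)
        if parsed is not None:
            return parsed
    return values  # genuinely textual columns are left untouched
```

Ordering matters: int is tried before float so that `["1", "2"]` stays integral, and a single unparseable value keeps the whole column as text rather than producing a mixed column. A pandas version of the same idea might build on `pd.to_numeric` and `pd.to_datetime`.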
ExecutionStatusHook
Kedro hook that tracks the execution status of each node and triggers a callback on every update. A node's status is one of 'pending', 'running', 'completed' or 'failed'.
Initialize the execution status hook.
| PARAMETER | DESCRIPTION |
|---|---|
| `on_update` | Callback invoked with the full status dict whenever a node's status changes. |
| `excluded_nodes` | Set of node names to skip tracking for. |
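An `on_update` callback might consume the status dict as shown below. This assumes the dict maps node names to one of the four status strings, which the page implies but does not spell out. `summarize_progress` and `print_progress` are hypothetical helpers, not part of the module.

```python
from collections import Counter
from typing import Dict

# The four statuses documented for ExecutionStatusHook.
STATUSES = ("pending", "running", "completed", "failed")


def summarize_progress(status: Dict[str, str]) -> str:
    """Render a one-line summary from an (assumed) node-name -> status dict."""
    counts = Counter(status.values())
    finished = counts["completed"] + counts["failed"]
    parts = [f"{s}={counts[s]}" for s in STATUSES]
    return f"{finished}/{len(status)} finished ({', '.join(parts)})"


def print_progress(status: Dict[str, str]) -> None:
    """Hypothetical on_update callback: print a summary on every status change."""
    print(summarize_progress(status))
```

Assuming the parameters in the table are accepted as keywords, this would be wired up as `ExecutionStatusHook(on_update=print_progress, excluded_nodes={...})`. A UI would replace `print` with a push to its own state store.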
before_pipeline_run
Initialize all nodes to pending.
before_node_run
Mark node as running.
after_node_run
Mark node as completed.
on_node_error
Mark node as failed.
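Taken together, the four methods form a simple state machine per node. Below is a minimal sketch of that lifecycle, not the actual source (which this page does not show). It assumes the status dict maps node names to status strings and that excluded nodes are left out of the dict entirely; both are assumptions. Each method declares only the arguments it uses, which pluggy (the plugin system behind Kedro hooks) permits. A real Kedro hook would also decorate each method with `hook_impl` from `kedro.framework.hooks`. `ExecutionStatusSketch` is a hypothetical name.

```python
from types import SimpleNamespace
from typing import Callable, Dict, Optional, Set

# Assumed callback shape: receives a copy of the node-name -> status dict.
StatusCallback = Callable[[Dict[str, str]], None]


class ExecutionStatusSketch:
    """Hypothetical stand-in for ExecutionStatusHook (not the real source)."""

    def __init__(self, on_update: StatusCallback,
                 excluded_nodes: Optional[Set[str]] = None) -> None:
        self._on_update = on_update
        self._excluded = set(excluded_nodes or ())
        self.status: Dict[str, str] = {}

    def _set(self, node_name: str, state: str) -> None:
        # Excluded nodes are never tracked, so no update is emitted for them.
        if node_name in self._excluded:
            return
        self.status[node_name] = state
        # Pass a copy so the callback cannot mutate the hook's own state.
        self._on_update(dict(self.status))

    def before_pipeline_run(self, pipeline) -> None:
        # Every tracked node starts out as "pending".
        self.status = {n.name: "pending" for n in pipeline.nodes
                       if n.name not in self._excluded}
        self._on_update(dict(self.status))

    def before_node_run(self, node) -> None:
        self._set(node.name, "running")

    def after_node_run(self, node) -> None:
        self._set(node.name, "completed")

    def on_node_error(self, error, node) -> None:
        self._set(node.name, "failed")


if __name__ == "__main__":
    # Stand-in objects exposing only the attributes the sketch reads.
    a = SimpleNamespace(name="clean_orders")
    hook = ExecutionStatusSketch(on_update=print)
    hook.before_pipeline_run(SimpleNamespace(nodes=[a]))
    hook.before_node_run(a)
    hook.after_node_run(a)
```

Sending the full dict on every change, rather than a single (node, status) event, lets a UI redraw from one message without keeping its own copy of earlier updates.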
MetadataStatsHook
Kedro hook to capture dataset statistics during pipeline execution.
- after_node_run: Captures stats for inputs and outputs while DataFrames are in memory
- after_pipeline_run: Saves all collected stats to catalogue_stats.json
This avoids expensive reloading of datasets just for metadata extraction.
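The collect-then-save pattern can be sketched as follows. This assumes `after_node_run` receives the node's `inputs` and `outputs` as name-to-data dicts, as in Kedro's hook specification. `describe` is a hypothetical stats function (row count and column names of a list of dict rows) standing in for whatever DataFrame statistics the real hook computes, and `MetadataStatsSketch` is likewise hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def describe(data: Any) -> Dict[str, Any]:
    """Hypothetical stats: row count and sorted column names of a list of dict rows."""
    rows = list(data)
    columns = sorted({key for row in rows for key in row})
    return {"rows": len(rows), "columns": columns}


class MetadataStatsSketch:
    """Hypothetical stand-in for MetadataStatsHook: collect in memory, save once."""

    def __init__(self, output_dir: str) -> None:
        self._path = Path(output_dir) / "catalogue_stats.json"
        self._stats: Dict[str, Dict[str, Any]] = {}

    def after_node_run(self, inputs: Dict[str, Any], outputs: Dict[str, Any]) -> None:
        # The data is still in memory here, so nothing is reloaded from storage.
        # (The documented hook records only visible outputs; this keeps them all.)
        for name, data in {**inputs, **outputs}.items():
            self._stats[name] = describe(data)

    def after_pipeline_run(self) -> None:
        # One write at the end of the run instead of one per node.
        self._path.write_text(json.dumps(self._stats, indent=2, sort_keys=True))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        hook = MetadataStatsSketch(d)
        hook.after_node_run({"raw": [{"a": 1}]}, {"clean": [{"a": 1, "b": 2}]})
        hook.after_pipeline_run()
        print((Path(d) / "catalogue_stats.json").read_text())
```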
after_node_run
Capture stats for inputs and outputs while data is in memory.
Type and visibility are derived from the spec, not stored in the cache. We store stats for:
- All inputs (regardless of visibility)
- Only visible outputs (visibility=True in spec)
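The selection rule above fits in one small function. This sketch assumes visibility is looked up from the spec as a set of visible dataset names at call time, so it is never cached. `datasets_to_record` and its `visible` parameter are hypothetical.

```python
from typing import Any, Dict, Set


def datasets_to_record(inputs: Dict[str, Any], outputs: Dict[str, Any],
                       visible: Set[str]) -> Dict[str, Any]:
    """Hypothetical: pick datasets that get stats, i.e. every input plus visible outputs."""
    selected = dict(inputs)  # all inputs, regardless of visibility
    # Outputs are kept only when the spec marks them visible.
    selected.update({name: data for name, data in outputs.items() if name in visible})
    return selected
```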
after_pipeline_run
Ensure all inputs are processed, even if not used in the pipeline run.
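One plausible reading of this step is a final pass over the inputs declared in the spec, computing stats only for those no node consumed. In that case only the missing inputs are loaded. The sketch below follows that reading. The `spec_inputs`, `load` and `describe` parameters are hypothetical. With Kedro, `load` could be a catalog's `load` method.

```python
from typing import Any, Callable, Dict, Iterable


def fill_missing_input_stats(stats: Dict[str, Any], spec_inputs: Iterable[str],
                             load: Callable[[str], Any],
                             describe: Callable[[Any], Any]) -> Dict[str, Any]:
    """Hypothetical: add stats for declared inputs that no node consumed during the run."""
    for name in spec_inputs:
        if name not in stats:
            # Only these datasets are loaded again; the rest were captured in memory.
            stats[name] = describe(load(name))
    return stats
```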