Choregraph (Facade)
The main entry point for pipeline lifecycle management. Orchestrates XML spec parsing, Kedro project generation, pipeline execution, data caching, and DIVE VisuSpec export.
This is the only class exported from the choregraph package — all interaction starts here.
choregraph
Choregraph facade: the main entry point for pipeline lifecycle management.
This module provides the Choregraph class, which orchestrates XML spec
parsing, Kedro project generation, pipeline execution, data caching, and DIVE
VisuSpec export. It delegates to the parser, builder, wrapper, and connectors
modules internally.
Choregraph
Main facade for Choregraph pipeline lifecycle management.
Orchestrates XML spec parsing, Kedro project generation, pipeline execution,
data caching, and DIVE VisuSpec export. Supports both programmatic pipeline
construction (via add_input / add_node) and loading from XML.
Can be used as a context manager::

    from choregraph import Choregraph

    with Choregraph(xml_spec="pipeline.xml") as cg:
        cg.run()
        df = cg.get_dataset("my_output")
Source code in src/choregraph/choregraph.py
get_xsd
Get the XSD content as a string (bundled with the package).
run
Execute the pipeline using a Kedro session.
Generates Kedro project files, dumps external inputs to disk, and runs
the pipeline via SequentialRunner. Supports lazy evaluation — if the
spec and input files haven't changed, cached results are returned.
| PARAMETER | DESCRIPTION |
|---|---|
| lazy | If True, skip execution when the spec hash is unchanged. |
| RETURNS | DESCRIPTION |
|---|---|
| Tuple[bool, str] | A tuple of a success flag and error_message. The flag is True when the pipeline executed (or was skipped) without error, and error_message contains the failure description otherwise. |
| RAISES | DESCRIPTION |
|---|---|
| ValueError | If … |
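The lazy evaluation described above can be illustrated with a small standalone sketch. It is not the library's implementation: the function names (`fingerprint`, `run_lazily`), the SHA-256 choice, the hash file, and the empty error message on success are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Callable, Iterable, Tuple


def fingerprint(spec_text: str, input_paths: Iterable[Path]) -> str:
    """Hash the spec text together with the name and bytes of every input file."""
    digest = hashlib.sha256(spec_text.encode("utf-8"))
    for path in sorted(input_paths):
        digest.update(path.name.encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()


def run_lazily(
    spec_text: str,
    input_paths: Iterable[Path],
    hash_file: Path,
    execute: Callable[[], None],
    lazy: bool = True,
) -> Tuple[bool, str]:
    """Call `execute` unless the stored fingerprint still matches (hypothetical helper)."""
    current = fingerprint(spec_text, list(input_paths))
    if lazy and hash_file.exists() and hash_file.read_text() == current:
        return True, ""  # nothing changed: skip execution and report success
    try:
        execute()
    except Exception as exc:  # report the failure as a message instead of raising
        return False, str(exc)
    hash_file.write_text(current)  # remember the fingerprint of this successful run
    return True, ""
```

Storing the fingerprint only after a successful run means a failed execution is retried on the next call, even with `lazy=True`.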
get_dataset
Load a dataset by ID.
For PartitionedDataset entries (temporal collections, etc.):
- time=None: loads the first partition (representative).
- time=N: loads the Nth partition.
Returns whatever the underlying dataset produces (DataFrame, Image, dict, etc.).
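The partition rule above can be sketched independently of the library. Kedro's PartitionedDataset loads as a mapping from partition IDs to loader callables, so the sketch takes such a mapping. The helper name `load_partition`, the sorted-key ordering, and zero-based indexing for `time` are assumptions, not documented behaviour.

```python
from typing import Any, Callable, Dict, Optional


def load_partition(
    partitions: Dict[str, Callable[[], Any]],
    time: Optional[int] = None,
) -> Any:
    """Pick one partition from a mapping of partition IDs to loader callables."""
    if not partitions:
        raise ValueError("no partitions available")
    keys = sorted(partitions)            # assumed ordering: sorted partition IDs
    index = 0 if time is None else time  # time=None selects the first partition
    if not 0 <= index < len(keys):
        raise IndexError(f"time={time} is outside 0..{len(keys) - 1}")
    return partitions[keys[index]]()     # only the selected loader is called
```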
list_data
List all available datasets including dynamically generated multi-table outputs.
get_id_for_name
Reverse lookup: get ID for a dataset name.
get_datasets_metadata
Get full datasets metadata from catalogue_stats.json.
Delegates to MetadataResult.to_api_format.
update_from_spec
Replace the current pipeline specification by parsing new XML.
| PARAMETER | DESCRIPTION |
|---|---|
| xml_spec | Path to an XML file or an XML string. |
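Because xml_spec accepts either a file path or inline XML, a caller-side helper may need to tell the two apart. The sketch below shows one possible heuristic. `read_xml_source` is a hypothetical name, and the "starts with `<`" test is an assumption rather than the rule the library applies.

```python
from pathlib import Path


def read_xml_source(xml_spec: str) -> str:
    """Return XML text from either an inline XML string or a file path."""
    if xml_spec.lstrip().startswith("<"):
        return xml_spec  # looks like inline XML: use it unchanged
    # Anything else is treated as a path to an XML file.
    return Path(xml_spec).read_text(encoding="utf-8")
```

Checking for the leading `<` first avoids passing a long XML document to the file system as if it were a file name.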
export_to_xml
Serialize the current pipeline specification to an XML file.
| PARAMETER | DESCRIPTION |
|---|---|
| save_to_path | Destination file path for the XML output. |
add_input
add_input(id, location='', format='CSV', label=None, visibility=False, url=None, data=None, **options)
Add an input data source.
| PARAMETER | DESCRIPTION |
|---|---|
| id | Unique input ID (string). |
| location | File path or URL. Not required for in-memory data. |
| format | Data format (CSV, JSON, MEMORY, etc.). Set automatically to … |
| label | Human-readable label (auto-generated if None). |
| visibility | Whether input is visible in visualization. |
| url | Origin URL for URL-based data sources. |
| data | Optional in-memory data (pandas DataFrame, dict, or list). When provided, the input is stored in … |
| **options | Additional format-specific options. |
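A short usage sketch for the signature above. To stay runnable without the package installed, it calls add_input on a stand-in object. `RecordingFacade`, `register_inputs`, the IDs, the label, and the path `data/sales.csv` are all hypothetical, and only the parameter names and defaults come from the documented signature.

```python
from typing import Any, Dict, List


class RecordingFacade:
    """Hypothetical stand-in that only records add_input calls."""

    def __init__(self) -> None:
        self.inputs: List[Dict[str, Any]] = []

    def add_input(self, id, location="", format="CSV", label=None,
                  visibility=False, url=None, data=None, **options):
        self.inputs.append({
            "id": id, "location": location, "format": format, "label": label,
            "visibility": visibility, "url": url, "data": data, "options": options,
        })


def register_inputs(cg) -> None:
    """Register one file-backed and one in-memory input on a facade-like object."""
    # File-backed input: location points at the data, format names how to read it.
    cg.add_input("1", location="data/sales.csv", format="CSV", label="Sales")
    # In-memory input: no location is needed, rows are passed through `data`.
    cg.add_input("2", data=[{"region": "north", "total": 10}], visibility=True)


facade = RecordingFacade()
register_inputs(facade)
print([entry["id"] for entry in facade.inputs])
```

With the real package, the same `register_inputs` function would presumably be called with a Choregraph instance in place of the stand-in.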
add_node
Add a node to the pipeline.
| PARAMETER | DESCRIPTION |
|---|---|
| id | Unique node ID. |
| type | Transform function name. |
| input_ports | List of input port specifications. |
| output_ports | List of output port specifications (auto-generated if None). |
| label | Human-readable label (auto-generated if None). |
remove_node
Remove a node from the pipeline.
remove_input
Remove an input from the pipeline.
This removes the input from both the inputs list and outputs list (if visible). It also triggers catalog regeneration to ensure catalog.yml is updated.
| PARAMETER | DESCRIPTION |
|---|---|
| id | The input ID to remove. |
subscribe
Register a listener for pipeline events.
| PARAMETER | DESCRIPTION |
|---|---|
| callback | Function called with … |
unsubscribe
Remove a previously registered event listener.
| PARAMETER | DESCRIPTION |
|---|---|
| callback | The callback function to remove. |
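The subscribe / unsubscribe pair follows the usual listener pattern, sketched below in isolation. `EventHub`, its `emit` method, and the single `event` argument passed to each callback are assumptions, since the callback's actual arguments are not shown in this reference.

```python
from typing import Any, Callable, List

Listener = Callable[[Any], None]


class EventHub:
    """Hypothetical minimal listener registry."""

    def __init__(self) -> None:
        self._listeners: List[Listener] = []

    def subscribe(self, callback: Listener) -> None:
        if callback not in self._listeners:  # avoid duplicate registrations
            self._listeners.append(callback)

    def unsubscribe(self, callback: Listener) -> None:
        if callback in self._listeners:      # unknown callbacks are ignored
            self._listeners.remove(callback)

    def emit(self, event: Any) -> None:
        # Iterate over a copy so a listener may unsubscribe while being notified.
        for listener in list(self._listeners):
            listener(event)
```

Keeping a reference to the exact callable passed to `subscribe` matters, because removal relies on finding that same callable again.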
close
reset_spec
Reset the spec to an empty state, clearing all inputs, nodes, and outputs.
load
Load or reload a pipeline specification (compatibility layer).
Re-parses the XML spec and regenerates Kedro project files if the spec content has changed since the last call.
| PARAMETER | DESCRIPTION |
|---|---|
| xml_spec | Path to an XML file or an XML string. |
| external_inputs | Dict mapping input IDs to in-memory data objects. |
| workspace_path | Override the workspace directory. |
get_inputs
get_visibles
Get list of datasets marked as visible (visibility=True) as (id, name) tuples.
get_leaves
Get list of terminal output ports (not consumed by any downstream node) as (id, name) tuples.
Note: Only output ports of nodes are returned, never inputs. If there are no nodes, the result is an empty list (inputs are already inputs and need no promotion).
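The definition of a leaf (an output port that no downstream node consumes) can be sketched with plain dictionaries. The node layout used here (`"inputs"` as a list of consumed port IDs, `"outputs"` as a list of `{"id", "name"}` dicts) and the name `find_leaves` are hypothetical and do not reflect the library's internal data model.

```python
from typing import Any, Dict, List, Tuple


def find_leaves(nodes: List[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Return (id, name) for every output port that no node consumes."""
    # Every port ID that appears as an input of some node is consumed.
    consumed = {port_id for node in nodes for port_id in node["inputs"]}
    return [
        (port["id"], port["name"])
        for node in nodes
        for port in node["outputs"]
        if port["id"] not in consumed
    ]
```

With no nodes the comprehension yields an empty list, which matches the note above.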
find_node_for_output_port
Find the node that owns a given output port ID.
| PARAMETER | DESCRIPTION |
|---|---|
| output_port_id | The ID of the output port. |
| RETURNS | DESCRIPTION |
|---|---|
| Optional[NodeSpec] | The NodeSpec containing this output port, or None if not found. |
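The lookup amounts to a linear search over nodes that returns None on a miss. A minimal sketch follows, using a simplified stand-in dataclass. `SketchNode` and its fields are hypothetical, and the real NodeSpec may be structured differently.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for a node spec: an ID plus its output port IDs."""
    id: str
    output_port_ids: List[str] = field(default_factory=list)


def find_owner(nodes: Iterable[SketchNode], output_port_id: str) -> Optional[SketchNode]:
    """Return the first node that owns the given output port, or None."""
    return next(
        (node for node in nodes if output_port_id in node.output_port_ids),
        None,
    )
```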
give_id
Give the next available integer ID as a string. This can be used for generating unique IDs for nodes and ports.
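One plausible way to produce "the next available integer ID as a string" is to take the largest numeric ID already in use and add one. The sketch below does that. The name `next_id`, the start value of 1, and the choice to ignore non-numeric IDs are assumptions, not the documented algorithm.

```python
from typing import Iterable


def next_id(existing_ids: Iterable[str]) -> str:
    """Return max(existing numeric IDs) + 1 as a string, starting at "1"."""
    numeric = [
        int(value)
        for value in existing_ids
        if value.isascii() and value.isdigit()  # skip IDs such as "raw_sales"
    ]
    return str(max(numeric, default=0) + 1)
```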
promote_leaves
Promote all leaf outputs as inputs, optionally removing their source nodes.
For each terminal output port (not consumed downstream):
- Single-file outputs are promoted via _promote_output.
- Partitioned outputs are promoted via _promote_partitioned.
Nodes are only removed when all their outputs were successfully promoted.
| RETURNS | DESCRIPTION |
|---|---|
| List[Tuple[str, str]] | List of … |
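The removal rule stated above (a node is removed only when all of its outputs were promoted) can be expressed as a small filter. The function name `removable_nodes` and the mapping from node IDs to output port IDs are hypothetical. The sketch also treats a node without outputs as not removable, which is an assumption.

```python
from typing import Dict, List, Set


def removable_nodes(outputs_by_node: Dict[str, List[str]], promoted: Set[str]) -> List[str]:
    """Return IDs of nodes whose outputs were all promoted successfully."""
    return [
        node_id
        for node_id, outputs in outputs_by_node.items()
        # A node with no outputs has nothing promoted, so it is kept.
        if outputs and all(port_id in promoted for port_id in outputs)
    ]
```

A node with one failed promotion stays in the pipeline, so its remaining output can still be produced by running the node.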
pushd
Context manager to temporarily change the working directory. Restores the original directory automatically when exiting the 'with' block.
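A context manager with this behaviour is commonly written with contextlib. The following is a generic sketch of the pattern and not necessarily the code shipped in the package. The parameter name `target` is an assumption.

```python
import os
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Union


@contextmanager
def pushd(target: Union[str, Path]) -> Iterator[None]:
    """Temporarily switch the working directory to `target`."""
    previous = Path.cwd()   # remember where we started
    os.chdir(target)
    try:
        yield
    finally:
        os.chdir(previous)  # restore even if the body raised
```

The `try`/`finally` is what guarantees restoration when the body of the `with` block raises. On Python 3.11 and later, `contextlib.chdir` in the standard library offers the same behaviour.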