Exercise 11: Pipeline Observability
Objective
Understand why observability matters for multi-agent pipelines and create a pipeline-trace.json artifact that records what happened during a pipeline run. This is the foundation for cost analysis and debugging in production.
Required Reading
- .cursor/skills/jg-pipeline-artifact-io/SKILL.md -- Directory layout including pipeline-trace.json
- .cursor-practitioner/pipeline/schema.py -- Schema showing required fields for pipeline-trace.json
- Putting It Together | Cursor Learn -- End-to-end workflows
Traces record which agents ran and what they produced.
Observability concepts are IDE-agnostic. In any system, tracking which agent ran, how long it took, and what it produced is essential for debugging and cost management.
Context
When a pipeline runs, you get the final artifacts (plan, worker-result, test-result, etc.) but not the execution timeline. Without a trace, you cannot answer:
- How long did each stage take?
- Which agent was invoked at each step?
- Were there retries or failures along the way?
- What was the total cost of the run?
A pipeline-trace.json artifact fills this gap by recording the execution timeline in structured form.
Tasks
Part 1: Create a pipeline trace
Create sandbox/.pipeline/ISSUE-42/pipeline-trace.json that reconstructs the execution timeline for the Issue-42 walkthrough. Use the existing artifacts in sandbox/.pipeline/ISSUE-42/ to inform your trace.
Required fields (validated by schema.py):
- issue_id: "ISSUE-42"
- stages: Array of stage records, each with:
  - stage: Stage name (e.g., "plan", "implement", "test", "debug", "review", "git")
  - agent: Agent that executed the stage (e.g., "jg-subplanner")
  - started_at: ISO 8601 timestamp
  - duration_ms: Duration in milliseconds
  - result: "pass" or "fail"
  - artifact: Path to the output artifact
- total_duration_ms: Sum of all stage durations
- produced_by: "jg-planner"
The trace should reflect the Issue-42 narrative: plan, implement, test (fail), debug, implement (retry), test (pass), review, git.
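One way to keep started_at and total_duration_ms consistent is to generate the trace from an ordered list of steps instead of typing the JSON by hand. The sketch below is only an illustration under stated assumptions: the durations, the start time, the artifact file names, and every agent name except "jg-subplanner" are hypothetical placeholders, so replace them with values taken from the real artifacts in sandbox/.pipeline/ISSUE-42/.

```python
import json
from datetime import datetime, timedelta, timezone

BASE = "sandbox/.pipeline/ISSUE-42/"

# (stage, agent, duration_ms, result, artifact)
# All durations, artifact names, and agent names other than "jg-subplanner"
# are hypothetical placeholders, not values from a real run.
NARRATIVE = [
    ("plan", "jg-subplanner", 42_000, "pass", BASE + "plan.json"),
    ("implement", "example-worker", 95_000, "pass", BASE + "worker-result.json"),
    ("test", "example-tester", 18_000, "fail", BASE + "test-result.json"),
    ("debug", "example-debugger", 30_000, "pass", BASE + "debug-result.json"),
    ("implement", "example-worker", 60_000, "pass", BASE + "worker-result.json"),
    ("test", "example-tester", 20_000, "pass", BASE + "test-result.json"),
    ("review", "example-reviewer", 25_000, "pass", BASE + "review-result.json"),
    ("git", "example-git", 5_000, "pass", BASE + "git-result.json"),
]


def build_trace(issue_id, steps, start):
    """Build a trace dict, deriving each started_at from the previous stage."""
    stages = []
    cursor = start
    for stage, agent, duration_ms, result, artifact in steps:
        stages.append({
            "stage": stage,
            "agent": agent,
            "started_at": cursor.isoformat(),  # ISO 8601 with UTC offset
            "duration_ms": duration_ms,
            "result": result,
            "artifact": artifact,
        })
        cursor += timedelta(milliseconds=duration_ms)
    return {
        "issue_id": issue_id,
        "stages": stages,
        "total_duration_ms": sum(s["duration_ms"] for s in stages),
        "produced_by": "jg-planner",
    }


# The start time is an arbitrary placeholder.
trace = build_trace(
    "ISSUE-42", NARRATIVE, datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
)
print(json.dumps(trace, indent=2))
```

Deriving the timestamps this way assumes the stages ran back to back with no idle time between them; if the real run had gaps, record each started_at explicitly instead.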
Part 2: Write an observability analysis
Write to docs/practitioner/tutorials/outputs/11-observability-analysis.md explaining:
- Why traces matter -- What questions can you answer with a trace that you cannot answer from artifacts alone?
- Cost visibility -- How would you extend the trace to track token usage and model costs per stage?
- Failure debugging -- How does the trace help when a pipeline produces unexpected results?
- Production monitoring -- What metrics would you derive from traces across many pipeline runs? (e.g., average cycle time, retry rate, cost per issue)
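To make the production-monitoring question concrete, the following sketch aggregates a few metrics over a list of trace dicts. It assumes the field names from Part 1, and it defines retry rate as the fraction of runs containing at least one failed stage; that definition is an assumption chosen for illustration, and other definitions (for example, retries per stage) are equally reasonable. The function name `summarize` is hypothetical.

```python
from statistics import mean


def summarize(traces):
    """Derive run-level metrics from a non-empty list of trace dicts."""
    cycle_times = [t["total_duration_ms"] for t in traces]

    # Assumed definition: a run counts as retried if any stage failed.
    runs_with_failure = sum(
        1 for t in traces if any(s["result"] == "fail" for s in t["stages"])
    )

    # Accumulate total time per stage name to find hotspots.
    per_stage_ms = {}
    for t in traces:
        for s in t["stages"]:
            per_stage_ms[s["stage"]] = per_stage_ms.get(s["stage"], 0) + s["duration_ms"]

    return {
        "avg_cycle_time_ms": mean(cycle_times),
        "retry_rate": runs_with_failure / len(traces),
        "slowest_stage": max(per_stage_ms, key=per_stage_ms.get),
    }


# Two tiny made-up traces, trimmed to the fields the function reads.
sample = [
    {"total_duration_ms": 3000, "stages": [
        {"stage": "plan", "duration_ms": 1000, "result": "pass"},
        {"stage": "test", "duration_ms": 2000, "result": "pass"},
    ]},
    {"total_duration_ms": 5000, "stages": [
        {"stage": "plan", "duration_ms": 1000, "result": "pass"},
        {"stage": "test", "duration_ms": 2000, "result": "fail"},
        {"stage": "test", "duration_ms": 2000, "result": "pass"},
    ]},
]
metrics = summarize(sample)
print(metrics)
```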
Output
- sandbox/.pipeline/ISSUE-42/pipeline-trace.json -- Valid trace artifact
- docs/practitioner/tutorials/outputs/11-observability-analysis.md -- Analysis with 4 sections
Validation
python3 docs/practitioner/tutorials/verify.py --exercise 11
Checks: pipeline-trace.json exists, passes schema validation, has at least 6 stage entries, includes both pass and fail results. Analysis file exists with 4 sections, each with sufficient depth.
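Before running the verifier, it can help to check the trace locally against the conditions listed above. The sketch below is not the logic of verify.py or schema.py, whose implementations are not shown here; `check_trace` is a hypothetical helper that only mirrors the trace-related checks as described in this exercise.

```python
REQUIRED_TOP = {"issue_id", "stages", "total_duration_ms", "produced_by"}
REQUIRED_STAGE = {"stage", "agent", "started_at", "duration_ms", "result", "artifact"}


def check_trace(trace):
    """Return a list of problems; an empty list means every check passed."""
    problems = []

    missing = REQUIRED_TOP - set(trace)
    if missing:
        problems.append(f"missing top-level fields: {sorted(missing)}")

    stages = trace.get("stages", [])
    if len(stages) < 6:
        problems.append(f"expected at least 6 stages, found {len(stages)}")

    for index, stage in enumerate(stages):
        gap = REQUIRED_STAGE - set(stage)
        if gap:
            problems.append(f"stage {index} is missing fields: {sorted(gap)}")

    results = {stage.get("result") for stage in stages}
    if not {"pass", "fail"} <= results:
        problems.append("expected both pass and fail results")

    return problems


# An empty trace fails every top-level check.
for problem in check_trace({}):
    print(problem)
```

To check the real artifact, load it with `json.load` and pass the resulting dict to `check_trace`.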
Answer
pipeline-trace.json must include issue_id, a stages array with at least 6 entries (the full Issue-42 narrative has 8: plan, implement, test-fail, debug, implement-retry, test-pass, review, git), total_duration_ms, and produced_by: "jg-planner". The stages must include both pass and fail results.
Observability analysis: Traces answer questions artifacts cannot (timing, retry count, execution path). Extend with input_tokens, output_tokens, cost_usd per stage for cost visibility. Production metrics: average cycle time, retry rate, cost per issue, stage hotspots, failure classification distribution.
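The cost extension can be sketched as a small helper that attaches input_tokens, output_tokens, and cost_usd to a stage record. The per-token prices below are made-up placeholders, not the rates of any real model, and the helper name `with_cost` is hypothetical; the three field names come from the answer above.

```python
# Hypothetical prices in USD per million tokens; substitute real rates.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}


def with_cost(stage, input_tokens, output_tokens):
    """Return a copy of a stage record extended with token and cost fields."""
    cost = (
        input_tokens * PRICE_PER_MTOK["input"]
        + output_tokens * PRICE_PER_MTOK["output"]
    ) / 1_000_000
    return {
        **stage,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cost_usd": round(cost, 6),
    }


# Token counts here are illustrative, not measured.
stages = [
    with_cost({"stage": "plan", "duration_ms": 42_000}, 12_000, 2_000),
    with_cost({"stage": "implement", "duration_ms": 95_000}, 30_000, 8_000),
]
total_cost_usd = round(sum(s["cost_usd"] for s in stages), 6)
print(total_cost_usd)
```

Summing cost_usd over the stages of one trace gives cost per issue; grouping the same field by stage name across many traces shows which stages dominate spend.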