Skills

Skills are SKILL.md files with YAML frontmatter. They provide specialized capabilities and domain knowledge. Skills are activated on-demand when an agent needs them for a task (e.g. reading pipeline artifacts or running benchmarks).

Skills Overview

| Name | Description | Available in Tiers |
|------|-------------|--------------------|
| jg-pipeline-artifact-io | Read/write layout for pipeline artifacts in `.pipeline/` | Practitioner, Expert |
| jg-benchmark-ops | Benchmark collection and evaluation workflow for agent model assignment reviews | Practitioner, Expert |

jg-pipeline-artifact-io

Frontmatter: name: jg-pipeline-artifact-io, description: "Read/write layout for pipeline artifacts in .pipeline/. Use when any jg- agent reads upstream artifacts or writes its output."

Purpose: Defines the directory layout, reading/writing conventions, and per-agent mapping for pipeline artifacts. Ensures agents pass file paths (not inline content) and validate with schema.py.
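The path-passing convention can be sketched as below. This is an illustrative sketch, not the skill's actual implementation: the helper names (`write_artifact`, `read_artifact`) are hypothetical, and the `schema.py` validation step the skill requires is omitted here.

```python
import json
from pathlib import Path

PIPELINE_DIR = Path(".pipeline")

def write_artifact(name: str, payload: dict) -> str:
    """Write an artifact under .pipeline/ and return its path.

    Downstream agents receive this path, never the inline JSON content.
    """
    PIPELINE_DIR.mkdir(exist_ok=True)
    path = PIPELINE_DIR / name
    path.write_text(json.dumps(payload, indent=2))
    return str(path)

def read_artifact(path: str) -> dict:
    """Read an upstream artifact given its path."""
    return json.loads(Path(path).read_text())
```

In this sketch, a worker agent would call `write_artifact("worker-result.json", result)` and hand only the returned path to the reviewer.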

Per-agent mapping:

| Agent | Reads | Writes |
|-------|-------|--------|
| jg-subplanner | (issue) | plan.json |
| jg-worker | plan.json, debug-diagnosis.json | worker-result.json |
| jg-tester | (runs commands) | test-result.json |
| jg-reviewer | plan.json, worker-result.json | review-result.json |
| jg-debugger | test-result.json, plan.json | debug-diagnosis.json |
| jg-git | (git) | git-result.json |
| jg-planner | all (read-only) | state.yaml if used |

Expert tier extension: The Expert version adds tier tracking fields that agents must include when writing artifacts:

  • tier_used (string): "fast" | "standard" | "high"
  • cost_estimate (string): Human-readable cost estimate
  • escalation_history (array): For worker-result.json and test-result.json only — [{ from_tier, to_tier, reason }]

These fields enable the stage-gate checker to enforce tier routing invariants (e.g. complex tasks must not use fast-tier agents).
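A minimal sketch of such an invariant check is below, assuming a `task_complexity` label is available to the checker; the function name and the `"complex"` label are hypothetical, not part of the skill.

```python
def check_tier_routing(artifact: dict, task_complexity: str) -> list[str]:
    """Return a list of tier-routing violations for one artifact (empty if clean)."""
    violations = []
    tier = artifact.get("tier_used")
    if tier not in {"fast", "standard", "high"}:
        violations.append(f"invalid tier_used: {tier!r}")
    # Invariant from the skill: complex tasks must not use fast-tier agents.
    if task_complexity == "complex" and tier == "fast":
        violations.append("complex task routed to fast-tier agent")
    return violations
```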

jg-benchmark-ops

Frontmatter: name: jg-benchmark-ops, description: "Benchmark collection and evaluation workflow for agent model assignment reviews. Use when pulling benchmarks, evaluating cost/performance, or deciding which models to use for which agents."

Purpose: Guides benchmark collection from sources (LiveBench, SWE-Bench, Artificial Analysis), storage in timestamped snapshots, validation, and evaluation. Produces verdicts (Excellent, Correct, Monitor, Tune, Upgrade) and cost/performance recommendations.

When to trigger:

  • New model release available for any agent
  • User requests benchmark collection or model assignment review
  • Periodic review (e.g. quarterly)

Verdict definitions:

| Verdict | Meaning |
|---------|---------|
| Excellent | Current model leads its cost tier; no change needed |
| Correct | Adequate; within ~5% of the tier leader |
| Monitor | Trails the leader by ~5–15%; schedule a review |
| Tune | A same-cost or cheaper model outperforms by >5%; recommend a change |
| Upgrade | A higher-cost model outperforms on the critical path; recommend if the cost is justified |
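The verdict thresholds above can be expressed as a simple decision function. This is a sketch of the logic only; the function name, signature, and the percentage-gap formulation are assumptions, and the real evaluation also weighs critical-path impact and cost justification.

```python
def verdict(current_score: float, leader_score: float, leader_is_cheaper: bool) -> str:
    """Map a score gap versus the tier leader to a verdict label."""
    gap_pct = (leader_score - current_score) / leader_score * 100
    if gap_pct <= 0:
        return "Excellent"  # current model leads its cost tier
    if leader_is_cheaper and gap_pct > 5:
        return "Tune"       # a cheaper model outperforms by >5%
    if gap_pct <= 5:
        return "Correct"    # within ~5% of the tier leader
    if gap_pct <= 15:
        return "Monitor"    # trails by ~5-15%; schedule a review
    return "Upgrade"        # large gap; recommend only if cost is justified
```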

Anti-patterns: Do not record scores without source URL and date. Do not overwrite existing snapshots. Do not apply model assignment changes without explicit approval.
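Two of these anti-patterns (missing provenance, overwritten snapshots) can be guarded mechanically. A minimal sketch, assuming a flat `benchmarks/` directory of date-stamped JSON files; the layout, filenames, and helper name are hypothetical:

```python
import datetime
import json
from pathlib import Path

def save_snapshot(scores: dict, source_url: str, base: str = "benchmarks") -> Path:
    """Save a timestamped benchmark snapshot with provenance; never overwrite."""
    date = datetime.date.today().isoformat()
    path = Path(base) / f"snapshot-{date}.json"
    if path.exists():
        raise FileExistsError(f"refusing to overwrite {path}")
    path.parent.mkdir(parents=True, exist_ok=True)
    # Every score is stored alongside its source URL and collection date.
    record = {"date": date, "source_url": source_url, "scores": scores}
    path.write_text(json.dumps(record, indent=2))
    return path
```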