Skills¶
Skills are SKILL.md files with YAML frontmatter. They provide specialized capabilities and domain knowledge. Skills are activated on-demand when an agent needs them for a task (e.g. reading pipeline artifacts or running benchmarks).
Skills Overview¶
| Name | Description | Available in Tiers |
|---|---|---|
| jg-pipeline-artifact-io | Read/write layout for pipeline artifacts in .pipeline/ | Practitioner, Expert |
| jg-benchmark-ops | Benchmark collection and evaluation workflow for agent model assignment reviews | Practitioner, Expert |
jg-pipeline-artifact-io¶
Frontmatter: name: jg-pipeline-artifact-io, description: "Read/write layout for pipeline artifacts in .pipeline/. Use when any jg- agent reads upstream artifacts or writes its output."
Purpose: Defines the directory layout, reading/writing conventions, and per-agent mapping for pipeline artifacts. Ensures agents pass file paths (not inline content) and validate with schema.py.
Per-agent mapping:
| Agent | Reads | Writes |
|---|---|---|
| jg-subplanner | (issue) | plan.json |
| jg-worker | plan.json, debug-diagnosis.json | worker-result.json |
| jg-tester | (runs commands) | test-result.json |
| jg-reviewer | plan.json, worker-result.json | review-result.json |
| jg-debugger | test-result.json, plan.json | debug-diagnosis.json |
| jg-git | (git) | git-result.json |
| jg-planner | all (read-only) | state.yaml if used |
Expert tier extension: The Expert version adds tier tracking fields that agents must include when writing artifacts:
- tier_used (string): "fast" | "standard" | "high"
- cost_estimate (string): Human-readable cost estimate
- escalation_history (array): For worker-result.json and test-result.json only —
[{ from_tier, to_tier, reason }]
These fields enable the stage-gate checker to enforce tier routing invariants (e.g. complex tasks must not use fast-tier agents).
jg-benchmark-ops¶
Frontmatter: name: jg-benchmark-ops, description: "Benchmark collection and evaluation workflow for agent model assignment reviews. Use when pulling benchmarks, evaluating cost/performance, or deciding which models to use for which agents."
Purpose: Guides benchmark collection from sources (LiveBench, SWE-Bench, Artificial Analysis), storage in timestamped snapshots, validation, and evaluation. Produces verdicts (Excellent, Correct, Monitor, Tune, Upgrade) and cost/performance recommendations.
When to trigger:
- New model release available for any agent
- User requests benchmark collection or model assignment review
- Periodic review (e.g. quarterly)
Verdict definitions:
| Verdict | Meaning |
|---|---|
| Excellent | Current model leads its cost tier; no change needed |
| Correct | Adequate; within ~5% of tier leader |
| Monitor | Trails leader by ~5–15%; schedule review |
| Tune | Same-cost or cheaper model outperforms by >5%; recommend change |
| Upgrade | Higher-cost model outperforms on critical-path; recommend if cost justified |
Anti-patterns: Do not record scores without source URL and date. Do not overwrite existing snapshots. Do not apply model assignment changes without explicit approval.