Fixture Format
This document describes the YAML/TOML fixture format consumed by
agentcarousel. The authoritative schema is
fixtures/schemas/skill-definition.schema.json.
Use bundle_id and bundle_version (semver recommended) at the top level of the fixture for certification bundles; treat major bumps as breaking for anyone pinning a bundle.
Top-level fields
- schema_version (integer, required): current version is 1.
- skill_or_agent (string, required): kebab-case identifier for the subject.
- defaults (object, optional): default settings applied to each case.
- bundle_id (string, optional): bundle identifier for certification tracking.
- bundle_version (string, optional): bundle version (semver recommended).
- certification_track (string, optional): none, candidate, stable, trusted.
- risk_tier (string, optional): low, medium, high.
- data_handling (string, optional): synthetic-only, no-pii, pii-reviewed.
- cases (array, required): one or more case definitions.
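As a rough illustration of how the top-level fields fit together, a minimal fixture might look like the sketch below. The identifiers (example-skill, example-bundle) and all values are hypothetical; the schema file remains the authority on what is actually accepted.

```yaml
# Hypothetical fixture; names and values are illustrative only.
schema_version: 1
skill_or_agent: example-skill      # kebab-case subject id (hypothetical)
bundle_id: example-bundle          # hypothetical bundle identifier
bundle_version: "1.0.0"            # quoted so it is always read as a string
certification_track: candidate
risk_tier: low
data_handling: synthetic-only
cases:
  - id: example-skill/basic-greeting
    input:
      messages:
        - role: user
          content: "Say hello."
    expected:
      output:
        - kind: contains
          value: "hello"
```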
defaults
- timeout_secs (integer): per-case timeout fallback.
- tags (array of strings): tags applied to every case.
- evaluator (string): default evaluator id (rules, golden, process, judge).
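A defaults block using all three fields could be sketched as follows. The values are illustrative, not recommended settings.

```yaml
# Illustrative defaults; each case may still set its own timeout_secs.
defaults:
  timeout_secs: 30     # fallback when a case sets no timeout
  tags:
    - smoke            # applied to every case in the fixture
  evaluator: rules     # default evaluator id
```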
Case fields
- id (string, required): must start with <skill_or_agent>/.
- description (string): human-readable summary of the intent.
- tags (array of strings): case tags for filtering.
- input (object, required): input payload.
- expected (object, required): assertions and rubric items.
- evaluator_config (object, optional): per-case evaluator settings.
- timeout_secs (integer): override timeout for this case.
- seed (integer): RNG seed for eval runs.
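The sketch below shows a single case that uses the optional fields alongside the required ones. It assumes a fixture whose skill_or_agent is the hypothetical example-skill, so the id carries the example-skill/ prefix; the message text and numbers are made up.

```yaml
# Hypothetical case for a fixture with skill_or_agent: example-skill.
cases:
  - id: example-skill/empty-input     # must start with "<skill_or_agent>/"
    description: Handles an empty user message without crashing.
    tags:
      - error-handling
    timeout_secs: 10                  # overrides defaults.timeout_secs
    seed: 42                          # RNG seed for eval runs
    input:
      messages:
        - role: user
          content: ""
    expected:
      output:
        - kind: not_contains
          value: "Traceback"
```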
Canonical tags
Canonical tag set for authoring:
- smoke: fast PR-gate case that should run on every pull request.
- happy-path: core success scenario for the skill/agent.
- error-handling: graceful failure behavior.
- edge-case: boundary or unusual-but-valid input behavior.
- certification: included in certification-focused carousels.
- deferred: tracked placeholder for blocked integrations.
Prefer these canonical tags in all new fixtures and examples.
input
- messages (array, required): ordered message list.
- context (object): arbitrary structured context.
- env_overrides (object): non-secret environment variable overrides.
messages entries include:
- role (string): user, assistant, system, or tool.
- content (string): message text.
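An input payload combining the three fields might look like this sketch. The context keys and the environment variable name are hypothetical, since context is free-form and the variables a given skill reads are not specified here.

```yaml
# Illustrative input block; context keys and env var names are hypothetical.
input:
  messages:
    - role: system
      content: "You are a support assistant."
    - role: user
      content: "What is the refund window?"
  context:
    locale: en-US            # arbitrary structured context
    customer_tier: basic
  env_overrides:
    FEATURE_FLAG_X: "off"    # non-secret values only
```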
expected
- tool_sequence (array): required tool calls and ordering.
- output (array): output assertions.
- rubric (array): evaluation rubric items.
tool_sequence
- tool (string): tool name.
- args_match (object): partial JSON match for tool args.
- order (string): strict, subsequence, or unordered.
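A tool_sequence could be written roughly as below. This sketch assumes that order is set on each entry, as the field list suggests; the tool names and arguments are hypothetical, and the schema should be checked for the exact placement.

```yaml
# Illustrative tool_sequence; tool names and args are hypothetical.
expected:
  tool_sequence:
    - tool: search_docs
      args_match:
        query: "refund policy"   # partial match: other args may be present
      order: strict
    - tool: send_reply
      order: strict
```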
output assertions
- kind (string): contains, not_contains, equals, regex, json_path, or golden_diff.
- value (string): assertion value.
- field (string): optional JSON pointer or top-level field name.
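The following sketch lists a few output assertions of different kinds. The field name status is a hypothetical top-level field of a structured output, and the regex is only an example pattern.

```yaml
# Illustrative output assertions; the "status" field is hypothetical.
expected:
  output:
    - kind: contains
      value: "30 days"
    - kind: not_contains
      value: "I don't know"
    - kind: regex
      value: "refund(s)? within \\d+ days"
    - kind: equals
      field: status            # top-level field name in the output
      value: "ok"
```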
rubric items
- id (string): stable rubric identifier.
- description (string): what the rubric measures.
- weight (number): relative weight in effectiveness score.
- auto_check (object): optional output assertion to score automatically.
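A rubric with one automatically checked item and one manually scored item might be sketched like this. The ids, descriptions, and weights are illustrative; auto_check reuses the output assertion shape (kind, value).

```yaml
# Illustrative rubric; ids and weights are hypothetical.
expected:
  rubric:
    - id: mentions-refund-window
      description: Response states the refund window.
      weight: 2.0
      auto_check:              # scored automatically via an output assertion
        kind: contains
        value: "30 days"
    - id: polite-tone
      description: Response stays courteous.
      weight: 1.0              # no auto_check, so not scored automatically
```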
evaluator_config
- evaluator (string): rules, golden, process, or judge.
- golden_path (string): relative path to golden output fixture.
- golden_threshold (number): diff threshold for golden evaluator.
- process_cmd (array): command and args for external evaluator.
- judge_prompt (string): extra prompt for the judge evaluator.
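Per-case evaluator settings could look like the fragment below, which shows one golden case and one process case. The paths, threshold, and command are hypothetical, and input and expected are left out of each case for brevity, so this fragment is not a complete fixture on its own.

```yaml
# Fragment only: required input/expected blocks are omitted for brevity.
cases:
  - id: example-skill/golden-greeting
    evaluator_config:
      evaluator: golden
      golden_path: goldens/greeting.txt    # hypothetical relative path
      golden_threshold: 0.1                # illustrative diff threshold
  - id: example-skill/external-check
    evaluator_config:
      evaluator: process
      process_cmd:                         # hypothetical command and args
        - python
        - scripts/check_output.py
```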
Templates and examples
- Template: templates/fixture-skeleton.yaml
- Intake: open a GitHub issue using the checklist in CONTRIBUTING.md before large additions
- Tag examples: docs/fixture-tag-examples.md
- Example fixtures: fixtures/examples/