Contracts, Not Skills
The right primitive for AI systems.
Skills tell AI how to do things and decay with every model generation. Contracts tell AI what you want and survive unchanged. The difference compounds.
A contract defines what we want — the end state, the constraints, the guardrails — without prescribing how to get there. A skill is the opposite: step-by-step instructions tuned to today's model. As models improve, contracts become more powerful while skills become dead weight.
Code review
The contract defines what "reviewed" means. The model decides how to review. The skill it replaces is 150 lines of step-by-step instructions that will need rewriting when the next model ships.
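Most of what this contract asks for is judgment, and the model supplies it. A few rules are mechanical, though: the enums, the 120-character summary limit, and "Verdict is block if any finding is critical." The sketch below checks only those rules. It assumes the model's review has already been parsed into a Python dict. `check_review` is a hypothetical helper, not part of any library. A real pipeline might hand the field-level rules to a JSON Schema validator and keep hand-written code only for the cross-field verdict rule.

```python
from typing import Any

# Enums and limits copied from the Code Review contract.
SEVERITIES = {"critical", "warning", "suggestion"}
VERDICTS = {"approve", "request-changes", "block"}
REQUIRED = ("file", "findings", "verdict", "summary")
FINDING_REQUIRED = ("line", "severity", "issue", "fix")
MAX_SUMMARY = 120


def check_review(review: dict[str, Any]) -> list[str]:
    """Return the contract violations in a parsed review; [] means it passes.

    Hypothetical helper: it covers only the mechanical rules. Whether a
    finding is actionable stays a judgment for the model or a human.
    """
    problems = [f"missing required field: {k}" for k in REQUIRED if k not in review]
    if problems:
        return problems  # the checks below assume every top-level field exists

    if review["verdict"] not in VERDICTS:
        problems.append(f"unknown verdict: {review['verdict']!r}")
    if len(review["summary"]) > MAX_SUMMARY:
        problems.append(f"summary longer than {MAX_SUMMARY} characters")

    has_critical = False
    for i, finding in enumerate(review["findings"]):
        problems += [f"finding {i}: missing {k}" for k in FINDING_REQUIRED if k not in finding]
        severity = finding.get("severity")
        if "severity" in finding and severity not in SEVERITIES:
            problems.append(f"finding {i}: unknown severity {severity!r}")
        has_critical = has_critical or severity == "critical"
        # "Every finding must include a concrete fix": at minimum, not blank.
        if "fix" in finding and not str(finding["fix"]).strip():
            problems.append(f"finding {i}: empty fix")

    # Cross-field rule: "Verdict is block if any finding is critical".
    if has_critical and review["verdict"] != "block":
        problems.append("critical finding present but verdict is not block")
    return problems


if __name__ == "__main__":
    # Based on the example review in this section, with the issue text shortened.
    review = {
        "file": "src/auth/session.py",
        "findings": [
            {"line": 42, "severity": "critical",
             "issue": "SQL injection via f-string",
             "fix": "cursor.execute('SELECT * FROM users WHERE id = %s', (user_id,))"},
            {"line": 87, "severity": "warning",
             "issue": "Bare except catches KeyboardInterrupt and SystemExit",
             "fix": "except Exception as e:"},
        ],
        "verdict": "block",
        "summary": "SQL injection in auth path; must fix before merge.",
    }
    print(check_review(review))  # []
    print(check_review({**review, "verdict": "approve"}))
```

The second call reports the cross-field violation because the first finding is critical. Nothing here tries to judge whether a fix is good. That part of the contract is left to the model.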
Contract:
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Code Review",
  "purpose": "Produce a structured review of a source file. Every finding must be actionable — if the developer can't fix it from the review alone, the finding isn't specific enough. The verdict gates the merge.",
  "context": "Read the file in the context of the full repo. Check imports, callers, and tests. A finding that ignores how the file is used is a false positive.",
  "constraints": [
    "Security findings always override style",
    "Every finding must include a concrete fix",
    "Verdict is block if any finding is critical",
    "Do not flag style preferences as warnings",
    "One summary sentence, not a paragraph"
  ],
  "type": "object",
  "required": ["file", "findings", "verdict", "summary"],
  "properties": {
    "file": { "type": "string" },
    "findings": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["line", "severity", "issue", "fix"],
        "properties": {
          "line": { "type": "integer" },
          "severity": {
            "enum": ["critical", "warning", "suggestion"]
          },
          "issue": { "type": "string" },
          "fix": { "type": "string" }
        }
      }
    },
    "verdict": {
      "enum": ["approve", "request-changes", "block"]
    },
    "summary": {
      "type": "string",
      "maxLength": 120
    }
  }
}

Example output:
{
  "file": "src/auth/session.py",
  "findings": [
    {
      "line": 42,
      "severity": "critical",
      "issue": "SQL injection — user input interpolated directly into query string via f-string",
      "fix": "cursor.execute('SELECT * FROM users WHERE id = %s', (user_id,))"
    },
    {
      "line": 87,
      "severity": "warning",
      "issue": "Bare except catches KeyboardInterrupt and SystemExit",
      "fix": "except Exception as e:"
    }
  ],
  "verdict": "block",
  "summary": "SQL injection in auth path; must fix before merge."
}

The skill it replaces:
Step 1: Read the file. Understand its role.
Step 2: Check for security issues (OWASP top 10, injection, XSS...)
Step 3: Check for performance (N+1, blocking calls, memory leaks...)
Step 4: Check error handling (bare except, swallowed errors...)
Step 5: Check logic (off-by-one, race conditions, null derefs...)
Step 6: Format as line-by-line findings with severity...
Step 7: Assign verdict: approve if 0 critical, request-changes if...
...eight more steps with edge cases and format rules...

Deployment
The contract defines what "deployed" means. The model decides how to deploy.
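Two of this contract's constraints, "Status is healthy only if ALL checks pass" and "Do not mark healthy if error rate > 0.1%", can be enforced after the model files its receipt. This is a minimal sketch. It assumes the receipt arrives as a parsed dict. It also assumes the error rate is measured separately and passed in, because the schema has no field for it. `check_receipt` is hypothetical. It treats a missing check as a failing one, which is one reading of "ALL checks pass".

```python
import re
from typing import Any

# Rules lifted from the Deployment Verification contract.
SHA_PATTERN = re.compile(r"^[a-f0-9]{7,40}$")
STATUSES = {"healthy", "degraded", "failed"}
CHECKS = ("health_endpoint", "smoke_test", "logs_clean", "correct_sha_serving")
ROLLBACK_FIELDS = ("previous_sha", "command", "migration_reversible")
MAX_ERROR_RATE = 0.001  # "Do not mark healthy if error rate > 0.1%"


def check_receipt(receipt: dict[str, Any], error_rate: float) -> list[str]:
    """Return contract violations in a parsed deploy receipt; [] means it passes.

    Hypothetical helper. error_rate is a parameter because the schema has no
    field for it; how the caller measures it is an assumption.
    """
    problems: list[str] = []
    # fullmatch rejects a trailing newline that re.match with "$" would allow.
    if not SHA_PATTERN.fullmatch(str(receipt.get("sha", ""))):
        problems.append("sha is not 7-40 lowercase hex characters")

    status = receipt.get("status")
    if status not in STATUSES:
        problems.append(f"unknown status: {status!r}")

    checks = receipt.get("checks", {})
    missing = [name for name in CHECKS if name not in checks]
    if missing:
        problems.append("missing checks: " + ", ".join(missing))

    rollback = receipt.get("rollback", {})
    missing = [name for name in ROLLBACK_FIELDS if name not in rollback]
    if missing:
        problems.append("rollback missing: " + ", ".join(missing))

    if status == "healthy":
        # "Status is healthy only if ALL checks pass": absent counts as failing.
        failing = [name for name in CHECKS if checks.get(name) is not True]
        if failing:
            problems.append("healthy but failing: " + ", ".join(failing))
        if error_rate > MAX_ERROR_RATE:
            problems.append(f"healthy but error rate {error_rate:.2%} exceeds 0.1%")
    return problems
```

On the eidos-mail receipt with an error rate of 0, it returns an empty list. The same receipt with `smoke_test` set to false comes back as ["healthy but failing: smoke_test"]. A "degraded" or "failed" status is never second-guessed, because the contract only restricts when "healthy" may be claimed.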
Contract:
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Deployment Verification",
  "purpose": "Verify that a service is running correctly after deploy. The output is a receipt: what was deployed, whether it's healthy, and exactly how to undo it if something breaks at 2am.",
  "context": "Check the running service, not just the code. Hit the health endpoint. Read the last 50 log lines. Confirm the correct SHA is actually serving traffic.",
  "constraints": [
    "Status is healthy only if ALL checks pass",
    "Rollback command must be copy-pasteable",
    "If migration is irreversible, say so",
    "Log window is 5 minutes post-deploy",
    "Do not mark healthy if error rate > 0.1%"
  ],
  "type": "object",
  "required": ["service", "sha", "environment", "status", "checks", "rollback"],
  "properties": {
    "service": { "type": "string" },
    "sha": {
      "type": "string",
      "pattern": "^[a-f0-9]{7,40}$"
    },
    "environment": {
      "enum": ["staging", "production"]
    },
    "status": {
      "enum": ["healthy", "degraded", "failed"]
    },
    "checks": {
      "type": "object",
      "required": ["health_endpoint", "smoke_test", "logs_clean", "correct_sha_serving"],
      "properties": {
        "health_endpoint": { "type": "boolean" },
        "smoke_test": { "type": "boolean" },
        "logs_clean": { "type": "boolean" },
        "correct_sha_serving": { "type": "boolean" }
      }
    },
    "rollback": {
      "type": "object",
      "required": ["previous_sha", "command", "migration_reversible"],
      "properties": {
        "previous_sha": { "type": "string" },
        "command": { "type": "string" },
        "migration_reversible": { "type": "boolean" }
      }
    }
  }
}

Example output:
{
  "service": "eidos-mail",
  "sha": "a4b39a7",
  "environment": "production",
  "status": "healthy",
  "checks": {
    "health_endpoint": true,
    "smoke_test": true,
    "logs_clean": true,
    "correct_sha_serving": true
  },
  "rollback": {
    "previous_sha": "f015d50",
    "command": "railway deploy --sha f015d50",
    "migration_reversible": true
  }
}

The skill it replaces:
Step 1: Detect service type (FastAPI, Flask, MCP...)
Step 2: Check health endpoint exists, returns 200...
Step 3: Verify SIGTERM handling for graceful shutdown...
Step 4: Audit environment variables...
Step 5: Check Dockerfile or Procfile...
Step 6: Document rollback strategy...
...six more steps with Railway-specific checks...

Project improvement
The contract defines what "better" means. The model decides how to improve.
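The snapshot is plain data, so two of its promises can be checked in code. The first is "Only improve one dimension per run": the delta should name exactly the focus dimension. The second is "comparable across runs". This is a minimal sketch. `check_snapshot` and `diff_snapshots` are hypothetical helpers. Comparing the `scores` objects of two runs directly assumes both runs used the same scoring rubric.

```python
from typing import Any, Optional

# The six dimensions named in the Project Improvement Snapshot contract.
DIMENSIONS = ("reliability", "performance", "ux", "security", "code_quality", "documentation")


def check_snapshot(snap: dict[str, Any]) -> list[str]:
    """Return violations of the mechanical snapshot rules; [] means it passes.

    Hypothetical helper. It cannot verify "Delta must reflect a real,
    committed change"; that still needs the commit itself.
    """
    problems: list[str] = []
    for dim, score in snap["scores"].items():
        if dim not in DIMENSIONS:
            problems.append(f"unknown dimension: {dim}")
        # null means "not applicable"; anything else must be an integer 1-10.
        elif score is not None and not (isinstance(score, int) and 1 <= score <= 10):
            problems.append(f"{dim}: score {score!r} is not an integer from 1 to 10")

    # "Only improve one dimension per run": the delta names exactly the focus.
    if list(snap["delta"]) != [snap["focus"]]:
        problems.append(f"delta {sorted(snap['delta'])} does not match focus {snap['focus']!r}")
    return problems


def diff_snapshots(prev: dict[str, Any], curr: dict[str, Any]) -> dict[str, tuple[Optional[int], Optional[int]]]:
    """Map each dimension whose score changed between two runs to (before, after)."""
    changes: dict[str, tuple[Optional[int], Optional[int]]] = {}
    for dim in DIMENSIONS:
        before, after = prev["scores"].get(dim), curr["scores"].get(dim)
        if before != after:
            changes[dim] = (before, after)
    return changes
```

On the eidos-mail snapshot, `check_snapshot` returns an empty list. Diffing that snapshot against a later run where documentation scores 5 yields {"documentation": (2, 5)}.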
Contract:
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Project Improvement Snapshot",
  "purpose": "Score a project across six dimensions, fix the highest-impact gap, and record what changed. The snapshot is the proof — diffable, auditable, and comparable across runs.",
  "context": "Read the entire project: source, tests, docs, CI config, README, CLAUDE.md. Score based on what exists, not what's planned. A missing test suite is a 1, not a null.",
  "constraints": [
    "Score honestly — 5 is average, not bad",
    "Only improve one dimension per run",
    "Delta must reflect a real, committed change",
    "next_priority picks highest pain, not easiest fix",
    "null means not applicable, not unscored"
  ],
  "type": "object",
  "required": ["project", "scores", "focus", "improvement", "delta", "next_priority"],
  "properties": {
    "project": { "type": "string" },
    "scores": {
      "type": "object",
      "properties": {
        "reliability": { "type": ["integer", "null"], "minimum": 1, "maximum": 10 },
        "performance": { "type": ["integer", "null"], "minimum": 1, "maximum": 10 },
        "ux": { "type": ["integer", "null"], "minimum": 1, "maximum": 10 },
        "security": { "type": ["integer", "null"], "minimum": 1, "maximum": 10 },
        "code_quality": { "type": ["integer", "null"], "minimum": 1, "maximum": 10 },
        "documentation": { "type": ["integer", "null"], "minimum": 1, "maximum": 10 }
      }
    },
    "focus": { "type": "string" },
    "improvement": { "type": "string" },
    "delta": {
      "type": "object",
      "additionalProperties": {
        "type": "object",
        "required": ["before", "after"],
        "properties": {
          "before": { "type": "integer" },
          "after": { "type": "integer" }
        }
      }
    },
    "next_priority": { "type": "string" }
  }
}

Example output:
{
  "project": "eidos-mail",
  "scores": {
    "reliability": 3,
    "performance": 5,
    "ux": 4,
    "security": 3,
    "code_quality": 6,
    "documentation": 2
  },
  "focus": "documentation",
  "improvement": "Added README with install, usage, and architecture diagram. Added CLAUDE.md with key files, build commands, and deployment gotchas.",
  "delta": {
    "documentation": { "before": 2, "after": 5 }
  },
  "next_priority": "reliability — no tests for the IMAP connection handler, which has crashed twice in production"
}

The skill it replaces:
Step 1: Read CLAUDE.md, vision, README, guardrails...
Step 2: Read directory structure, source files, tests...
Step 3: Score each dimension using calibration rubric...
Step 4: Pick highest-impact gap...
Step 5: Make the fix...
Step 6: Run tests, verify build...
Step 7: Write snapshot JSON...
Step 8: Output summary...
...plus rules for project type detection, previous runs, forge pipelines...

This pattern orders groceries
A GitHub repo with three JSON Schema contracts and five YAML preference files — no code at all — orders $260 of groceries weekly. The AI reads the schemas, understands the constraints, and places the order. The agent has changed three times; the contracts haven't changed once. Full case study with video and data.
The bitter lesson
Richard Sutton's 2019 essay "The Bitter Lesson" makes one argument: as compute scales, general methods that leverage computation ultimately beat hand-crafted human knowledge. This is the bitter lesson, and researchers keep relearning it.
To teach a computer chess, give it two things: the rules (a contract) and a goal (win). Don't give it opening theory or middlegame strategy; those are skills. They work for beginners and fail at grandmaster level. AlphaZero learned chess from nothing but the rules and self-play. In its 2017 match against Stockfish, an engine built on decades of hand-tuned human expertise, AlphaZero never lost a game: 28 wins, 72 draws, 0 losses.
Skill files are chess strategy books: useful for today's model, obsolete for tomorrow's. Contracts are the rules of chess: permanent, terse, and more powerful as the player gets smarter.