eidos agi

Contracts, Not Skills

The right primitive for AI systems.

Skills tell an AI how to do things, and they decay with every model generation. Contracts tell it what you want, and they survive unchanged. The difference compounds.

A contract defines what we want — the end state, the constraints, the guardrails — without prescribing how to get there. A skill is the opposite: step-by-step instructions tuned to today's model. As models improve, contracts become more powerful while skills become dead weight.

Code review

The contract defines what "reviewed" means. The model decides how to review. The skill it replaces is 150 lines of step-by-step instructions that will need rewriting when the next model ships.

code-review.schema.json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Code Review",

  "purpose": "Produce a structured review of a source file. Every finding must be actionable: if the developer can't fix it from the review alone, the finding isn't specific enough. The verdict gates the merge.",

  "context": "Read the file in the context of the full repo. Check imports, callers, and tests. A finding that ignores how the file is used is a false positive.",

  "constraints": [
    "Security findings always override style",
    "Every finding must include a concrete fix",
    "Verdict is block if any finding is critical",
    "Do not flag style preferences as warnings",
    "One summary sentence, not a paragraph"
  ],

  "required": ["file", "findings", "verdict",
    "summary"],
  "properties": {
    "file": { "type": "string" },
    "findings": {
      "type": "array",
      "items": {
        "required": ["line", "severity",
          "issue", "fix"],
        "properties": {
          "line": { "type": "integer" },
          "severity": {
            "enum": ["critical", "warning",
              "suggestion"]
          },
          "issue": { "type": "string" },
          "fix": { "type": "string" }
        }
      }
    },
    "verdict": {
      "enum": ["approve", "request-changes",
        "block"]
    },
    "summary": {
      "type": "string",
      "maxLength": 120
    }
  }
}
produces
Example output
{
  "file": "src/auth/session.py",
  "findings": [
    {
      "line": 42,
      "severity": "critical",
      "issue": "SQL injection: user input interpolated directly into query string via f-string",
      "fix": "cursor.execute('SELECT * FROM users WHERE id = %s', (user_id,))"
    },
    {
      "line": 87,
      "severity": "warning",
      "issue": "Bare except catches KeyboardInterrupt and SystemExit",
      "fix": "except Exception as e:"
    }
  ],
  "verdict": "block",
  "summary": "SQL injection in auth path. Must fix before merge."
}
The skill this replaces (150 lines)
Step 1: Read the file. Understand its role.
Step 2: Check for security issues
  (OWASP top 10, injection, XSS...)
Step 3: Check for performance
  (N+1, blocking calls, memory leaks...)
Step 4: Check error handling
  (bare except, swallowed errors...)
Step 5: Check logic
  (off-by-one, race conditions, null derefs...)
Step 6: Format as line-by-line findings
  with severity...
Step 7: Assign verdict: approve if 0 critical,
  request-changes if...
...eight more steps with edge cases
  and format rules...
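
Several constraints in this contract are prose, so a JSON Schema validator alone would not enforce them. The following is a minimal Python sketch of how a caller might check a returned review against the rules that can be tested mechanically. The function name check_review and its messages are hypothetical and not part of the original contract, and treating "concrete fix" as "non-empty string" is an assumption.

```python
# Hypothetical helper: checks a review object against the contract's
# mechanically testable rules. Names and messages are illustrative only.
SEVERITIES = {"critical", "warning", "suggestion"}
VERDICTS = {"approve", "request-changes", "block"}


def check_review(review: dict) -> list[str]:
    """Return contract violations; an empty list means the review passes."""
    problems = [
        f"missing required field: {key}"
        for key in ("file", "findings", "verdict", "summary")
        if key not in review
    ]
    if problems:
        return problems

    for i, finding in enumerate(review["findings"]):
        for key in ("line", "severity", "issue", "fix"):
            if key not in finding:
                problems.append(f"finding {i}: missing {key}")
        if "severity" in finding and finding["severity"] not in SEVERITIES:
            problems.append(f"finding {i}: unknown severity")
        # "Every finding must include a concrete fix": approximated here
        # as a non-empty string (an assumption of this sketch).
        if "fix" in finding and not str(finding["fix"]).strip():
            problems.append(f"finding {i}: fix is empty")

    if review["verdict"] not in VERDICTS:
        problems.append("unknown verdict")
    # "Verdict is block if any finding is critical"
    has_critical = any(
        f.get("severity") == "critical" for f in review["findings"]
    )
    if has_critical and review["verdict"] != "block":
        problems.append("verdict must be block when any finding is critical")
    # The schema caps the summary at 120 characters.
    if len(review["summary"]) > 120:
        problems.append("summary exceeds 120 characters")
    return problems


example = {
    "file": "src/auth/session.py",
    "findings": [
        {
            "line": 42,
            "severity": "critical",
            "issue": "SQL injection via f-string",
            "fix": "Use a parameterized query",
        }
    ],
    "verdict": "request-changes",  # deliberately wrong for a critical finding
    "summary": "SQL injection in auth path. Must fix before merge.",
}
print(check_review(example))
```

The check looks only at the result, never at how the review was carried out, which is the point of a contract.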

Deployment

The contract defines what "deployed" means. The model decides how to deploy.

deployment.schema.json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Deployment Verification",

  "purpose": "Verify that a service is running correctly after deploy. The output is a receipt: what was deployed, whether it's healthy, and exactly how to undo it if something breaks at 2am.",

  "context": "Check the running service, not just the code. Hit the health endpoint. Read the last 50 log lines. Confirm the correct SHA is actually serving traffic.",

  "constraints": [
    "Status is healthy only if ALL checks pass",
    "Rollback command must be copy-pasteable",
    "If migration is irreversible, say so",
    "Log window is 5 minutes post-deploy",
    "Do not mark healthy if error rate > 0.1%"
  ],

  "required": ["service", "sha", "environment",
    "status", "checks", "rollback"],
  "properties": {
    "service": { "type": "string" },
    "sha": {
      "type": "string",
      "pattern": "^[a-f0-9]{7,40}$"
    },
    "environment": {
      "enum": ["staging", "production"]
    },
    "status": {
      "enum": ["healthy", "degraded", "failed"]
    },
    "checks": {
      "required": ["health_endpoint",
        "smoke_test", "logs_clean",
        "correct_sha_serving"],
      "properties": {
        "health_endpoint": {"type":"boolean"},
        "smoke_test": {"type":"boolean"},
        "logs_clean": {"type":"boolean"},
        "correct_sha_serving": {
          "type":"boolean"}
      }
    },
    "rollback": {
      "required": ["previous_sha", "command",
        "migration_reversible"],
      "properties": {
        "previous_sha": {"type":"string"},
        "command": {"type":"string"},
        "migration_reversible": {
          "type":"boolean"}
      }
    }
  }
}
produces
Example output
{
  "service": "eidos-mail",
  "sha": "a4b39a7",
  "environment": "production",
  "status": "healthy",
  "checks": {
    "health_endpoint": true,
    "smoke_test": true,
    "logs_clean": true,
    "correct_sha_serving": true
  },
  "rollback": {
    "previous_sha": "f015d50",
    "command": "railway deploy --sha f015d50",
    "migration_reversible": true
  }
}
The skill this replaces (100 lines)
Step 1: Detect service type
  (FastAPI, Flask, MCP...)
Step 2: Check health endpoint exists,
  returns 200...
Step 3: Verify SIGTERM handling for
  graceful shutdown...
Step 4: Audit environment variables...
Step 5: Check Dockerfile or Procfile...
Step 6: Document rollback strategy...
...six more steps with
  Railway-specific checks...
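
The rule "Status is healthy only if ALL checks pass" ties two fields together, which the property definitions in the schema do not express on their own. A minimal Python sketch of a receipt check under that reading follows. The function name check_receipt and its messages are hypothetical, and "copy-pasteable" is approximated as "non-empty", which is an assumption.

```python
import re

# Same pattern the contract gives for "sha": 7 to 40 lowercase hex characters.
SHA_RE = re.compile(r"[a-f0-9]{7,40}")
REQUIRED_CHECKS = (
    "health_endpoint",
    "smoke_test",
    "logs_clean",
    "correct_sha_serving",
)


def check_receipt(receipt: dict) -> list[str]:
    """Hypothetical helper: return contract violations for a deploy receipt."""
    problems = []

    sha = receipt.get("sha")
    if not isinstance(sha, str) or not SHA_RE.fullmatch(sha):
        problems.append("sha is not 7-40 lowercase hex characters")

    checks = receipt.get("checks", {})
    missing = [name for name in REQUIRED_CHECKS if name not in checks]
    for name in missing:
        problems.append(f"missing check: {name}")

    # "Status is healthy only if ALL checks pass"
    all_pass = not missing and all(checks[n] is True for n in REQUIRED_CHECKS)
    if receipt.get("status") == "healthy" and not all_pass:
        problems.append("status is healthy but not every check passed")

    rollback = receipt.get("rollback", {})
    # "Rollback command must be copy-pasteable": approximated as non-empty.
    if not str(rollback.get("command", "")).strip():
        problems.append("rollback command is missing or empty")
    # "If migration is irreversible, say so": the flag must be present.
    if "migration_reversible" not in rollback:
        problems.append("rollback does not state migration reversibility")
    return problems


receipt = {
    "service": "eidos-mail",
    "sha": "a4b39a7",
    "environment": "production",
    "status": "healthy",
    "checks": {
        "health_endpoint": True,
        "smoke_test": True,
        "logs_clean": True,
        "correct_sha_serving": True,
    },
    "rollback": {
        "previous_sha": "f015d50",
        "command": "railway deploy --sha f015d50",
        "migration_reversible": True,
    },
}
print(check_receipt(receipt))
```

A receipt that claims "healthy" while any check is false would be rejected, whatever steps the model took to produce it.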

Project improvement

The contract defines what "better" means. The model decides how to improve.

improvement.schema.json
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Project Improvement Snapshot",

  "purpose": "Score a project across six dimensions, fix the highest-impact gap, and record what changed. The snapshot is the proof: diffable, auditable, and comparable across runs.",

  "context": "Read the entire project: source, tests, docs, CI config, README, CLAUDE.md. Score based on what exists, not what's planned. A missing test suite is a 1, not a null.",

  "constraints": [
    "Score honestly — 5 is average, not bad",
    "Only improve one dimension per run",
    "Delta must reflect a real, committed change",
    "next_priority picks highest pain, not easiest fix",
    "null means not applicable, not unscored"
  ],

  "required": ["project", "scores", "focus",
    "improvement", "delta", "next_priority"],
  "properties": {
    "project": { "type": "string" },
    "scores": {
      "properties": {
        "reliability": {
          "type": ["integer", "null"],
          "minimum": 1, "maximum": 10
        },
        "performance": {
          "type": ["integer", "null"],
          "minimum": 1, "maximum": 10
        },
        "ux": {
          "type": ["integer", "null"],
          "minimum": 1, "maximum": 10
        },
        "security": {
          "type": ["integer", "null"],
          "minimum": 1, "maximum": 10
        },
        "code_quality": {
          "type": ["integer", "null"],
          "minimum": 1, "maximum": 10
        },
        "documentation": {
          "type": ["integer", "null"],
          "minimum": 1, "maximum": 10
        }
      }
    },
    "focus": { "type": "string" },
    "improvement": { "type": "string" },
    "delta": {
      "type": "object",
      "additionalProperties": {
        "required": ["before", "after"],
        "properties": {
          "before": {"type":"integer"},
          "after": {"type":"integer"}
        }
      }
    },
    "next_priority": { "type": "string" }
  }
}
produces
Example output
{
  "project": "eidos-mail",
  "scores": {
    "reliability": 3,
    "performance": 5,
    "ux": 4,
    "security": 3,
    "code_quality": 6,
    "documentation": 2
  },
  "focus": "documentation",
  "improvement": "Added README with install, usage, and architecture diagram. Added CLAUDE.md with key files, build commands, and deployment gotchas.",
  "delta": {
    "documentation": {
      "before": 2, "after": 5
    }
  },
  "next_priority": "reliability: no tests for the IMAP connection handler, which has crashed twice in production"
}
The skill this replaces (150 lines)
Step 1: Read CLAUDE.md, vision, README,
  guardrails...
Step 2: Read directory structure, source
  files, tests...
Step 3: Score each dimension using
  calibration rubric...
Step 4: Pick highest-impact gap...
Step 5: Make the fix...
Step 6: Run tests, verify build...
Step 7: Write snapshot JSON...
Step 8: Output summary...
...plus rules for project type detection,
  previous runs, forge pipelines...
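
Two rules in this contract, "Only improve one dimension per run" and the 1 to 10 score range, can be checked mechanically on any snapshot. A minimal Python sketch follows. The function name check_snapshot and its messages are hypothetical, and rejecting dimensions outside the six named in the schema is an assumption, since the schema itself does not forbid extra keys. Whether a delta reflects a committed change cannot be verified from the JSON alone, so the sketch only rejects a delta that shows no change.

```python
# The six dimensions listed under "scores" in the contract.
DIMENSIONS = (
    "reliability",
    "performance",
    "ux",
    "security",
    "code_quality",
    "documentation",
)


def check_snapshot(snapshot: dict) -> list[str]:
    """Hypothetical helper: return contract violations for a snapshot."""
    problems = []

    for name, value in snapshot.get("scores", {}).items():
        if name not in DIMENSIONS:
            problems.append(f"unknown dimension: {name}")
        # None stands for JSON null, which the contract reads as
        # "not applicable". type(...) is int also excludes booleans.
        elif value is not None and not (type(value) is int and 1 <= value <= 10):
            problems.append(f"{name}: score must be an integer 1-10 or null")

    delta = snapshot.get("delta", {})
    # "Only improve one dimension per run"
    if len(delta) > 1:
        problems.append("only one dimension may change per run")
    for name, change in delta.items():
        if name not in DIMENSIONS:
            problems.append(f"unknown dimension in delta: {name}")
        elif change.get("before") == change.get("after"):
            problems.append(f"{name}: delta shows no change")
    return problems


snapshot = {
    "project": "eidos-mail",
    "scores": {
        "reliability": 3,
        "performance": 5,
        "ux": 4,
        "security": 3,
        "code_quality": 6,
        "documentation": 2,
    },
    "focus": "documentation",
    "delta": {"documentation": {"before": 2, "after": 5}},
}
print(check_snapshot(snapshot))
```

Because each snapshot has the same shape, the same check can run unchanged on every run, which is what makes the snapshots comparable.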

This pattern orders groceries

A GitHub repo with three JSON Schema contracts and five YAML preference files — no code at all — orders $260 of groceries weekly. The AI reads the schemas, understands the constraints, and places the order. The agent has changed three times; the contracts haven't changed once. Full case study with video and data.

The bitter lesson

Richard Sutton's essay "The Bitter Lesson" makes one claim: as compute scales, general methods that leverage computation ultimately beat methods built on hand-crafted human knowledge. Researchers keep re-learning it.

We teach a computer to play chess by giving it two things: the rules (a contract) and a goal (win). We don't give it opening theory or middlegame strategy. Those are skills. They work for beginners and fail at grandmaster level. AlphaZero learned chess with nothing but the rules and self-play, then beat Stockfish, the engine built on decades of hand-tuned human expertise. In the 100-game match DeepMind reported in 2017, AlphaZero won 28 games, drew 72, and lost none.

Skills files are chess strategy books: useful for today's model, obsolete for tomorrow's. Contracts are the rules of chess: permanent, terse, and more powerful as the player gets smarter.