Reveals

Each tool in 3 seconds. Click a row to see its bespoke visual.

scribe — Technical writer. Keeps docs coherent with code. MCP

An agent ships a schema migration. Scribe detects that the change touches the data-layer section of the README, that the CHANGELOG has no entry for this release, and that the feature list in .scribe/card.md is outdated. It authors fresh copy for each, groups the updates into one commit, and hands the agent a clean PR.

1. Scribe sees the change, authors the updates

scribe · backend

  agent-7 committed: migrations/0042_user_preferences.sql
  scribe scans the repo for docs the change should touch.

  stale  README.md ← "data layer" section claims 2 tables; code has 3
  stale  CHANGELOG.md ← no entry since v1.2.0; 6 commits ago
  stale  .scribe/card.md ← feature list predates /session rename
  fresh  docs/schema.md ← authored in this commit, ok

  author README.md · data layer · +user_preferences table
  author CHANGELOG.md · v1.3.0 entry · 6 commits grouped
  author .scribe/card.md · feature list refreshed

  Grouped into one commit. Ready for review.
  "chore: scribe · coherence pass after schema migration"

2. One PR, every doc in lockstep with the code

eidos-agi / backend Public

⎇ scribe/coherence-128 / PR #128 · coherence pass

▸ Files changed · 3 +18 −6 · 3a91c20

README.md · data layer section

  ## Data layer
- Two tables: `users`, `sessions`. Postgres 16.
+ Three tables: `users`, `sessions`, `user_preferences`. Schema v4. Postgres 16.
  Migrations via Alembic. See `docs/schema.md`.

CHANGELOG.md · v1.3.0 entry (6 commits grouped)

  # Changelog
+ ## [1.3.0] — 2026-04-22
+ ### Added
+ - `user_preferences` table + schema migration to v4

.scribe/card.md · feature list refresh

- features: [auth, sessions]
+ features: [sessions, user_preferences, redis_pool]

One PR, three files, all coherent. Reviewers see the code change and every piece of documentation that had to move with it — in the same place, the same tooling, the same review flow they already use.

hone — Self-improvement loop. Per-target notebook. MCP

Every tick runs the same four phases, then repeats. Rhythm, not ritual.

Observe — read real state — one structured observation
Diagnose + Change — one-paragraph diagnosis, then act proportionally
Measure — re-observe; revert if the signal got worse
Retain — write the turn to .hone/ — trajectory travels with the code

Run it manually when you want a nudge. Set /loop 15m hone for a cadence. Either way, every tick leaves a turn log in .hone/turns/ so the history of how the target is evolving is always one git log away.
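The four phases can be sketched as one function. This is a minimal illustration, not hone's implementation: the `tick` function, the `Turn` and `Notebook` types, and the callback names are all hypothetical, and the notebook is kept in memory here rather than written to .hone/turns/.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Turn:
    before: float
    after: float
    diagnosis: str
    kept: bool  # False when the change was reverted


@dataclass
class Notebook:
    # Stand-in for the on-disk turn log (assumption: real turns live in .hone/turns/).
    turns: List[Turn] = field(default_factory=list)


def tick(
    observe: Callable[[], float],
    diagnose: Callable[[float], str],
    change: Callable[[str], None],
    revert: Callable[[], None],
    notebook: Notebook,
) -> Turn:
    before = observe()               # Observe: one structured observation
    diagnosis = diagnose(before)     # Diagnose: one-paragraph diagnosis
    change(diagnosis)                # Change: act on the diagnosis
    after = observe()                # Measure: re-observe
    worse = after < before           # Assumes a higher signal is better
    if worse:
        revert()                     # Revert if the signal got worse
    turn = Turn(before, after, diagnosis, kept=not worse)
    notebook.turns.append(turn)      # Retain: record the turn
    return turn


# Usage: a change that lowers the signal gets reverted and still leaves a turn.
state = {"score": 0.70}
snapshot = dict(state)
nb = Notebook()
turn = tick(
    observe=lambda: state["score"],
    diagnose=lambda s: f"score {s:.2f} is below target",
    change=lambda _d: state.update(score=state["score"] - 0.05),
    revert=lambda: state.update(snapshot),
    notebook=nb,
)
```

The reverted turn is retained too, so the log records what was tried as well as what was kept.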

stepproof — Ceremony enforcement. Evidence gates + hash-chained audit. MCP

An agent can't just "deploy." It has to show evidence that the runbook's gates are met. Stepproof reads the evidence (PR approvals, test results, migration logs) and decides. No evidence, no advance. Every decision is recorded in a hash-chained audit log the reviewer can replay.
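As a rough sketch of the "no evidence, no advance" rule, the snippet below evaluates a step against a list of gates. The gate names and evidence keys mirror the deploy example on this page, but `DEPLOY_GATES` and `decide` are hypothetical names, and the checks only look at the submitted values; the real tool presumably verifies the evidence against its source.

```python
from typing import Callable, Dict, List, Tuple

Evidence = Dict[str, str]
Gate = Tuple[str, Callable[[Evidence], bool]]

# Hypothetical gates for a "deploy" step.
DEPLOY_GATES: List[Gate] = [
    ("PR approved", lambda ev: ev.get("pr", "").isdigit()),
    ("tests green", lambda ev: ev.get("tests") == "pass"),
    ("migrations applied", lambda ev: ev.get("migrations") == "applied"),
]


def decide(evidence: Evidence, gates: List[Gate] = DEPLOY_GATES) -> Tuple[str, List[str]]:
    """Return ("ALLOW", []) only when every gate is met; otherwise DENY plus the unmet gates."""
    unmet = [name for name, check in gates if not check(evidence)]
    return ("ALLOW" if not unmet else "DENY", unmet)


first = decide({"tests": "pass", "migrations": "dry-run"})
retry = decide({"pr": "128", "tests": "pass", "migrations": "applied"})
```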

1. First attempt — gates unmet, stepproof denies

agent-7 · deploy

  # agent-7 requests step "deploy" under runbook rb-deploy-prod
  › stepproof step-complete deploy --evidence tests=pass --evidence migrations=dry-run
  stepproof checking step deploy against rb-deploy-prod...
    ✗ PR approved  no PR reference provided
    ✓ tests green  3/3 suites pass
    ✗ migrations  dry-run only
  DENY deploy — 2 of 3 evidence gates unmet
  audit: .stepproof/events.jsonl (hash-chained, entry #246)
  agent: halted. Collect approvals, run migrations for real, retry.

2. After addressing the gaps — stepproof allows

agent-7 · deploy (retry)

  # agent re-submits after opening PR #128 and applying migrations
  › stepproof step-complete deploy --evidence pr=128 --evidence tests=pass --evidence migrations=applied
  stepproof checking step deploy against rb-deploy-prod...
    ✓ PR approved  #128 · 2 reviewers
    ✓ tests green  3/3 suites pass
    ✓ migrations  applied 0042..0044 (12.4s)
  ALLOW deploy — all evidence verified
  audit: entry #247, sha 3a91c20 → links back to #246
  agent: proceeding with deploy.

The denial and the allowance are both durable. The next person (or next tick, or next auditor) reads .stepproof/events.jsonl and replays exactly what happened — what the agent asked for, what evidence it had, how stepproof decided, and why.
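For readers unfamiliar with hash chaining, here is a minimal sketch of the idea. It is not stepproof's format: the field names (`seq`, `prev`, `event`, `hash`), the SHA-256 choice, and the in-memory list standing in for events.jsonl are all assumptions made for illustration.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: Dict) -> str:
    # Canonical JSON (sorted keys) so the same body always hashes the same way.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_event(log: List[Dict], event: Dict) -> Dict:
    """Append an entry whose hash covers the event and the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    body = {"seq": len(log) + 1, "prev": prev, "event": event}
    entry = {**body, "hash": _digest(body)}
    log.append(entry)
    return entry


def verify_chain(log: List[Dict]) -> bool:
    """Replay the log; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        body = {key: entry[key] for key in ("seq", "prev", "event")}
        if entry["prev"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True


log: List[Dict] = []
append_event(log, {"step": "deploy", "decision": "DENY", "unmet": ["PR approved", "migrations applied"]})
append_event(log, {"step": "deploy", "decision": "ALLOW", "unmet": []})
intact = verify_chain(log)

# Rewriting history after the fact is detectable.
log[0]["event"]["decision"] = "ALLOW"
tampered_ok = verify_chain(log)
```

Because each entry commits to the one before it, a reviewer who trusts the latest hash can trust every earlier decision in the replay.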

railguey — Railway lifecycle without a dashboard. Deploy, rollback, logs. MCP

An agent shouldn't need a dashboard to operate production. Railguey exposes Railway's lifecycle — status, logs, deploys, rollbacks, variables — as MCP tools. When a deploy misbehaves, the agent sees it, rolls it back, and confirms health without leaving the conversation.

1. Last deploy is leaking 500s — agent pulls HTTP logs

agent · production

  › railguey status ~/apps/web
  service web · deployment d2f9 (5m ago) · DEGRADED
  http p50 120ms · p95 4.2s · 500s: 8.1%

  › railguey http_logs web --status 500 --lines 5
  20:14:02 POST /api/checkout 500 "TypeError: Cannot read..."
  20:14:07 POST /api/checkout 500 "TypeError: Cannot read..."
  20:14:15 POST /api/checkout 500 "TypeError: Cannot read..."

  agent  same stack every time → rollback.

2. Rollback + confirm — no dashboard needed

agent · production (rollback)

  › railguey rollback web --to d8a1
  railguey  d2f9 → d8a1 · deploying...
  railguey  d8a1 LIVE (22s)

  › railguey status ~/apps/web
  service web · deployment d8a1 (30s ago) · HEALTHY
  http p50 110ms · p95 380ms · 500s: 0.0%

  agent  filing issue with the stack trace; d2f9 stays parked.

Same operator flow a human would run in the Railway dashboard, expressed as tool calls the agent can chain. 17 tools across status, logs, deploys, variables, domains, volumes, and doctor audits.

apple-a-day — Agent-native Mac health. 9 checks, severity + fix commands, zero deps. Agent Tool

Agents run on machines that drift. Disks fill up, Homebrew links break after upgrades, LaunchAgents crash-loop in the background, and nobody notices until the agent starts hitting weird errors. Apple-a-day gives the agent awareness of its own host — 9 checks, each finding tagged with a severity and a fix command the agent can run directly.

1. aad checkup — 9 checks, the agent sees its host

agent · host

  › aad checkup --min-severity warning
  apple-a-day  running 9 checks...

  ⚠ crash-loops  com.docker.vmnetd dying 14× in last hour
      fix: sudo launchctl unload /Library/Launch.../com.docker.vmnetd.plist
  ⚠ disk-health  4.2 GB free on /, Time Machine snapshots using 38 GB
      fix: tmutil deletelocalsnapshots /
  ⚠ homebrew  12 outdated packages, 3 broken dylib links
      fix: brew update && brew upgrade && brew doctor
  ✓ memory-pressure  green · 18% wired · swap 0 B
  ✓ security  SIP on · Gatekeeper on · FileVault on
  ...

  summary  3 warnings · 0 critical · 6 green

2. Agent runs the fixes, confirms clean

agent · host (after)

  # agent executes the three fix commands in sequence
  › sudo launchctl unload .../com.docker.vmnetd.plist  ✓
  › tmutil deletelocalsnapshots /  ✓ 37.8 GB reclaimed
  › brew update && brew upgrade  ✓ 12 packages updated

  › aad checkup
  ✓ crash-loops  no recurring failures
  ✓ disk-health  42 GB free on /
  ✓ homebrew  0 outdated · 0 broken links
  ✓ memory-pressure  green
  ✓ security  green
  ...

  summary  9 green · 0 warnings · 0 critical

Zero dependencies, pure Python. The --json flag makes every finding machine-readable with severity + fix so agents can triage and act without human translation.
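A sketch of how an agent might triage that JSON. The page does not show the actual output schema, so the `check`, `severity`, and `fix` keys, the severity names, and the `SAMPLE` payload below are assumptions modeled on the checkup transcript (including its truncated plist path, left as shown).

```python
import json
from typing import Dict, List

# Assumed severity ordering; lower rank means more urgent.
SEVERITY_RANK = {"critical": 0, "warning": 1, "ok": 2}

# Hypothetical --json payload shaped after the checkup shown on this page.
SAMPLE = json.dumps([
    {"check": "memory-pressure", "severity": "ok", "fix": None},
    {"check": "crash-loops", "severity": "warning",
     "fix": "sudo launchctl unload /Library/Launch.../com.docker.vmnetd.plist"},
    {"check": "disk-health", "severity": "warning",
     "fix": "tmutil deletelocalsnapshots /"},
    {"check": "homebrew", "severity": "warning",
     "fix": "brew update && brew upgrade && brew doctor"},
])


def triage(raw_json: str, min_severity: str = "warning") -> List[Dict]:
    """Keep findings at or above min_severity that carry a fix, most urgent first."""
    cutoff = SEVERITY_RANK[min_severity]
    findings = json.loads(raw_json)
    actionable = [
        f for f in findings
        if SEVERITY_RANK[f["severity"]] <= cutoff and f.get("fix")
    ]
    # sorted() is stable, so findings of equal severity keep their reported order.
    return sorted(actionable, key=lambda f: SEVERITY_RANK[f["severity"]])


plan = triage(SAMPLE)
```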

clawdflare — Cloudflare audits with a human-gated write. Reads are free, writes take a PIN. MCP

Agents are great at reading infrastructure and telling you what's wrong. They're less great at having unsupervised write access to DNS. Clawdflare splits the difference — the read token lives in $CLOUDFLARE_API_TOKEN and the agent uses it freely; the write token is encrypted on disk and only decrypts when a human enters a PIN at a macOS popup. The agent never sees the write credential.

1. Agent audits the zone — surfaces real issues

agent · cloudflare

  › clawdflare audit example.com
  clawdflare  reading zone example.com (read token)...

  ✗ hsts  not enabled — MITM exposure on first visit
  ✗ min-tls-version  1.0 — should be 1.2 minimum
  ✗ caa-record  missing — any CA can issue for this domain
  ✗ dns/orphaned  A api-old → 192.0.2.33 (unreachable 30d)
  ✓ ssl-mode  Full (strict)
  ✓ dnssec  active

  agent  4 fixes available. Dry-run first:
  › clawdflare fix example.com  # dry-run — no writes
  would: enable HSTS · min-tls 1.2 · add CAA · remove api-old A

2. Agent requests the apply — PIN popup gates the write

agent · cloudflare (apply)

  › clawdflare fix example.com --apply
  clawdflare  write token is encrypted · requesting PIN...

  ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
  │ Clawdflare wants to apply 4 writes to example.com │
  │ enable HSTS · min-tls 1.2 · add CAA · remove A api-old │
  │ PIN: •••• │
  ⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯

  human enters PIN · write token decrypted
  applied  ✓ HSTS · ✓ min-tls 1.2 · ✓ CAA · ✓ removed api-old

  agent: audit re-run — 6 green, 0 findings. Done.

The agent drives every step — the read, the dry-run, the fix request, the re-audit. The one thing it cannot do is apply writes without a human at the keyboard. Perfect division of labor: agents see everything, humans authorize the things with blast radius.

resume-resume — Sessions survive the crash. Dirty repos + interrupted-session recovery. Agent Tool

Claude Code sessions die. The terminal crashes, the laptop sleeps, the model hiccups. The context is gone; the next session starts from zero and the work has to be re-explained from scratch. Resume-resume treats sessions as durable artifacts — indexed on disk, searchable, restorable — so the next boot picks up where the last one left off.

1. New morning, fresh session — boot_up surfaces what's pending

agent · fresh boot

  › boot_up hours=24

  dirty repos  3 repos with uncommitted changes (urgency-scored):
    ◉ eidosagi.com  12 files · 8 new · last commit 2h ago
    ◉ checkout-svc  5 files · 1 new · mid-TASK-031
    ◉ scribe  2 files · 0 new · yesterday

  interrupted sessions  2 crashed in last 24h:
    session-af12  checkout-svc · feat/auth-migration · 3h ago
      last msg: "running migration on staging..." · TASK-031
      resume: cr --resume af12
    session-b7e9  eidosagi.com · feat/reveals · 8h ago
      resume: cr --resume b7e9

2. Pick up the crashed session — context restored, agent continues

agent · resumed session-af12

  › merge_context id=session-af12 mode=hybrid
  resume-resume  reading session-af12.jsonl (412 turns)...

  summary  you were migrating auth off legacy session tokens
  task  TASK-031 · "stand up blue-green cutover"
  adr  ADR-042 blue-green (scored 7.7)
  state  PR #214 open · 3 / 5 DoD checks green
  last action  applied migration 0042 on staging · crash before 0043
  next  run migration 0043; re-run 42/42 test suite

  agent  context merged · picking up at migration 0043.
  zero re-explanation, no "what was I doing again?"

Dirty repos bypass time filters: uncommitted work doesn't age out of the list. BM25 search scans 5,000+ prior sessions in ~3 seconds, so a question like "where did we test the eidos vs claude thing?" resolves in seconds instead of being re-litigated.
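To make the BM25 mention concrete, here is a small self-contained ranking sketch using the standard Okapi BM25 formula. It is illustrative only: the whitespace tokenizer, the `k1` and `b` defaults, and the sample session texts (including the hypothetical `session-c001`) are assumptions, and resume-resume's actual index is not shown on this page.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def bm25_rank(query: str, docs: Dict[str, str], k1: float = 1.5, b: float = 0.75) -> List[Tuple[str, float]]:
    """Score every document against the query with Okapi BM25 (docs must be non-empty)."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized.values() for term in set(toks))

    scores: Dict[str, float] = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


sessions = {
    "session-af12": "running migration on staging for auth cutover",
    "session-b7e9": "reveals page layout tweaks",
    "session-c001": "test eidos vs claude on the checkout prompt",  # hypothetical session
}
ranked = bm25_rank("eidos vs claude", sessions)
```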

slack-cc — Two-way Slack for Claude Code. Approve tool calls from your phone. MCP

Your team lives in Slack. Your terminal has the tokens, the VPN, the branch, the half-finished work. Slack-cc is the wire between them — when the agent hits an approval gate, the prompt mirrors to the channel, a teammate replies from their phone, the agent proceeds. The terminal has the power, Slack has the reach.

1. Agent hits an approval gate — mirrors to the channel

claude · prod-deploy

  # agent is about to run a migration on prod. Tool-call needs approval.
  › Bash(psql prod -f migrations/0043.sql)
  approval pending · prompt #42 mirrored to #proj-checkout
  reply "yes 42" or "no 42" in slack — first answer wins

2. Human responds from Slack — agent proceeds

#proj-checkout

2 members

Eidos (app) · 4:12 PM

Approval #42 — about to run psql prod -f migrations/0043.sql. Reply yes 42 or no 42.

daniel · 4:12 PM

yes 42

Eidos (app) · 4:12 PM

Approved. Applying migration 0043 · 12 tables affected · logs in thread.

Socket Mode means no public URL, no deployment, no exposed endpoints — the MCP server runs locally alongside Claude Code and keeps an outbound WebSocket open to Slack. Works behind firewalls, NAT, anywhere. Sender gating, outbound gating, bot filtering, and token lockdown keep the bridge safe.
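The "yes 42 / no 42, first answer wins" rule combined with sender gating can be sketched without any Slack dependency. This is a hypothetical illustration of the decision logic only: the `Approvals` class, its method names, and the reply grammar are assumptions, and the Socket Mode transport is deliberately left out.

```python
import re
from typing import Dict, Iterable, Optional, Set, Tuple

# Assumed reply grammar: "yes <id>" or "no <id>", case-insensitive.
REPLY = re.compile(r"^\s*(yes|no)\s+(\d+)\s*$", re.IGNORECASE)


def parse_reply(text: str) -> Optional[Tuple[int, bool]]:
    match = REPLY.match(text)
    if match is None:
        return None
    return int(match.group(2)), match.group(1).lower() == "yes"


class Approvals:
    def __init__(self, allowed_senders: Iterable[str]) -> None:
        self.allowed: Set[str] = set(allowed_senders)
        self.pending: Set[int] = set()
        self.resolved: Dict[int, bool] = {}

    def request(self, prompt_id: int) -> None:
        self.pending.add(prompt_id)

    def handle(self, sender: str, text: str) -> Optional[bool]:
        """Return the decision if this message resolved a prompt, else None."""
        if sender not in self.allowed:       # sender gating
            return None
        parsed = parse_reply(text)
        if parsed is None:
            return None
        prompt_id, approved = parsed
        if prompt_id not in self.pending:    # first answer wins; later replies are ignored
            return None
        self.pending.discard(prompt_id)
        self.resolved[prompt_id] = approved
        return approved


gate = Approvals(allowed_senders={"daniel"})
gate.request(42)
from_stranger = gate.handle("mallory", "yes 42")
from_teammate = gate.handle("daniel", "yes 42")
late_reply = gate.handle("daniel", "no 42")
```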

The trilogy

research earns the decision · Governor records it · Docket executes against it

See the three threaded in one session →

research.md — Decision forge. Evidence-graded, phase-gated, peer-reviewed. The research is the receipt. Decisions

Consequential decisions — the database, the auth model, the deploy target — get made casually in chat, and then quietly re-litigated every quarter because nobody remembers the tradeoffs. Research.md is the forge that earns the decision: candidates, locked criteria, graded evidence, peer review. The output is an ADR Governor can enforce and Docket can cite. The research is the receipt.

1. The drift case — a decision made "in chat"

agent · chat

  # question: "should we pick Postgres or DynamoDB for the new service?"
  agent  "both work, let's go with Postgres — we know it."

  › research status --project checkout-store
  NO_RECORD  no candidates, no criteria, no scoring, no ADR
  governor  0 ADRs for this choice · future agents will re-ask it

  why: this decision will be re-litigated in 3 months.
  earn it — run it through research.md.

2. The earned decision — phase-gated, scored, adopted

agent · research.md

  › research candidate create "Postgres" && candidate create "DynamoDB"
  › research criteria lock
  weighted: RLS=3 · scalability=2 · ops-cost=2 · dx=1
  locked  4 criteria · no further changes without a supersede

  › research candidate score --scores-from evidence/
    Postgres  RLS 9 · scale 7 · ops 8 · dx 9 → 7.8
    DynamoDB  RLS 4 · scale 9 · ops 7 · dx 5 → 6.2
  peer_review_log  2 reviewers signed off · no open objections

  › research project decide --winner Postgres
  DECIDED  Postgres · ADR-017 authored into Governor
  docket: future tasks cite ADR-017; re-litigation requires supersede.

research.md feeds Governor; Governor feeds Docket. The flow is one-way — a decision skipped here is a contract that was never earned, and becomes the thing a future agent re-opens at the worst possible time.
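One plausible reading of "locked, weighted criteria" is a weighted mean over per-criterion scores, sketched below with the weights and scores from the transcript. The page does not state research.md's formula, so this is an assumption: a plain weighted mean yields different totals from the 7.8 and 6.2 displayed above, which suggests the real scoring folds in something not shown here (evidence grades, perhaps). The winner is the same either way.

```python
from typing import Dict

# Weights and scores copied from the transcript on this page.
WEIGHTS: Dict[str, int] = {"RLS": 3, "scalability": 2, "ops-cost": 2, "dx": 1}
SCORES: Dict[str, Dict[str, int]] = {
    "Postgres": {"RLS": 9, "scalability": 7, "ops-cost": 8, "dx": 9},
    "DynamoDB": {"RLS": 4, "scalability": 9, "ops-cost": 7, "dx": 5},
}


def weighted_mean(scores: Dict[str, int], weights: Dict[str, int]) -> float:
    """Assumed formula: sum(score * weight) / sum(weights)."""
    total_weight = sum(weights.values())
    return sum(scores[criterion] * w for criterion, w in weights.items()) / total_weight


totals = {name: weighted_mean(s, WEIGHTS) for name, s in SCORES.items()}
winner = max(totals, key=totals.get)
```

Locking the weights before scoring is what keeps the comparison honest: the criteria cannot be tuned after the evidence comes in.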

Governor — St. Peter for the project. Vision, guardrails, ADRs — the contracts all execution honors. Governance

Every project has contracts: vision, guardrails, ADRs. They usually live in a wiki nobody reads and drift silently. Governor makes them readable by agents and checks each proposed action against them before the work starts. If the task violates a guardrail, St. Peter has already said no. The agent doesn't get to decide otherwise.

1. Agent proposes a fast fix — Governor cites the contract

agent · migration

  # agent drafts a migration: DROP users.legacy_email, backfill NULL
  › visionlog_boot project_id=checkout-svc
  vision  "trusted checkout — no data loss under any refactor"
  guardrails  3 active · GR-01 soft deletes only
  adrs  ADR-014 column removal requires 2-phase migration

  › guardrail_inject action="drop_column users.legacy_email"
  BLOCK  violates GR-01 + ADR-014
  why: we got burned in the 2025-Q4 incident — hard deletes
       broke CSAT replay. Soft-delete, then drop in a later ADR.

2. After reframing to honor the contract — proceeds cleanly

agent · migration (reframed)

  # agent rewrites: add deleted_at, backfill timestamps, keep the column
  › guardrail_inject action="add users.deleted_at; soft-delete legacy rows"
  vision  ✓ no data loss
  guardrails  ✓ GR-01 soft deletes only
  adrs  ✓ ADR-014 two-phase respected (phase 1 of 2)
  ALLOW  proceed — queue ADR-015 for eventual column drop

  agent: writing migration + scheduling phase 2 review.

Vision doesn't drift. Guardrails stay active until explicitly deprecated. ADRs are immutable once accepted — new ADRs supersede rather than overwrite. Every agent session starts with visionlog_boot and honors whatever it reads there.
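The "immutable once accepted, supersede rather than overwrite" rule maps naturally onto an append-only register. The sketch below is an assumption about how such a rule could be modeled, not Governor's data model: `ADR`, `AdrRegister`, and the `ADR-A` / `ADR-B` identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)  # frozen: an accepted ADR's fields cannot be reassigned
class ADR:
    id: str
    decision: str
    supersedes: Optional[str] = None


class AdrRegister:
    def __init__(self) -> None:
        self._accepted: Dict[str, ADR] = {}

    def accept(self, adr: ADR) -> None:
        if adr.id in self._accepted:
            raise ValueError(f"{adr.id} is already accepted; write a superseding ADR instead")
        if adr.supersedes is not None and adr.supersedes not in self._accepted:
            raise KeyError(f"{adr.supersedes} is not an accepted ADR")
        self._accepted[adr.id] = adr

    def in_force(self) -> List[str]:
        """ADRs that no later ADR has superseded, in acceptance order."""
        replaced = {a.supersedes for a in self._accepted.values() if a.supersedes}
        return [a.id for a in self._accepted.values() if a.id not in replaced]


register = AdrRegister()
register.accept(ADR("ADR-A", "hard deletes allowed"))
register.accept(ADR("ADR-B", "soft deletes only", supersedes="ADR-A"))
current = register.in_force()
```

The superseded record stays in the register, so the history of why a contract changed remains readable.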

Docket — Execution forge. Tasks, milestones, Definition of Done — honest progress bars. Execution

Docket is the execution forge. Tasks, milestones, Definition of Done — the contracts a task must honor before it's allowed to close. Every task links back to the goal it serves and the research.md decision that authorized the approach. When the agent completes work, Docket verifies the DoD, unblocks downstream tasks, and moves the milestone gauge. The whole plan is always legible.

1. Where are we — one glance at the milestone

agent · checkout-svc

  › docket milestone view M2-auth-rewrite
  milestone  M2 · auth-rewrite · 3 / 5 tasks ▰▰▰▱▱
  goal  GOAL-012 "auth without legacy sessions"
  due  2026-05-01 (9 days)

  › docket task list --milestone M2
    ✓ TASK-021  replace session middleware  done
    ✓ TASK-022  rotate signing keys  done
    ✓ TASK-023  migrate users.session_token → deleted_at  done
    ▸ TASK-024  drop deprecated /auth/legacy endpoint  in-progress
    ⊘ TASK-025  cut v2.0 release  blocked on TASK-024

2. Complete the blocker — the next task unlocks, milestone advances

agent · checkout-svc (after)

  › docket task complete TASK-024 --notes "/auth/legacy removed in PR #214, 0 callers in 30d"
  dod  ✓ PR linked · ✓ callers=0 · ✓ Governor guardrails met
  docket  TASK-024 DONE · unblocked TASK-025

  › docket milestone view M2-auth-rewrite
  milestone  M2 · auth-rewrite · 4 / 5 tasks ▰▰▰▰▱
  next  TASK-025 cut v2.0 release (ready · no blockers)

  agent: picking up TASK-025 next.

Three verbs cover the lifecycle: task create, task complete, milestone view. Dependencies and Definition of Done make the progress bar honest: a task isn't done because the agent says so, it's done because its DoD checks passed.
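A compact sketch of that rule: completion is refused unless every DoD check passes and every blocker is done, and closing a task reports which tasks it unblocked. The `Task` and `Milestone` classes and the DoD check names are hypothetical; task IDs follow the M2 example on this page.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Task:
    id: str
    blocked_by: Set[str] = field(default_factory=set)
    done: bool = False


class Milestone:
    def __init__(self, tasks: List[Task]) -> None:
        self.tasks: Dict[str, Task] = {t.id: t for t in tasks}

    def complete(self, task_id: str, dod: Dict[str, bool]) -> List[str]:
        """Close a task only if its DoD passes; return the tasks this unblocks."""
        task = self.tasks[task_id]
        blockers = sorted(b for b in task.blocked_by if not self.tasks[b].done)
        failed = sorted(name for name, ok in dod.items() if not ok)
        if blockers or failed:
            raise ValueError(f"{task_id} cannot close: blockers={blockers} failed_dod={failed}")
        task.done = True
        return sorted(
            t.id for t in self.tasks.values()
            if not t.done
            and task_id in t.blocked_by
            and all(self.tasks[b].done for b in t.blocked_by)
        )

    def progress(self) -> Tuple[int, int]:
        return sum(t.done for t in self.tasks.values()), len(self.tasks)


m2 = Milestone([
    Task("TASK-021", done=True),
    Task("TASK-022", done=True),
    Task("TASK-023", done=True),
    Task("TASK-024"),
    Task("TASK-025", blocked_by={"TASK-024"}),
])
unblocked = m2.complete(
    "TASK-024",
    {"PR linked": True, "callers=0": True, "guardrails met": True},
)
```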