EidosAGI

Reveals

Each tool in 3 seconds. Click a row to see its bespoke visual.

scribe — Technical writer. Keeps docs coherent with code. MCP

An agent ships a schema migration. Scribe detects that the change touches the data-layer section of the README, the CHANGELOG has no entry for this release, and the card's feature list is outdated. It authors fresh copy for each, groups them into one commit, and hands the agent a clean PR.

1. Scribe sees the change, authors the updates

scribe · backend
agent-7 committed: migrations/0042_user_preferences.sql
scribe scans the repo for docs the change should touch.
stale README.md ← "data layer" section claims 2 tables; code has 3
stale CHANGELOG.md ← no entry since v1.2.0; 6 commits ago
stale .scribe/card.md ← feature list predates /session rename
fresh docs/schema.md ← authored in this commit, ok
author README.md · data layer · +user_preferences table
author CHANGELOG.md · v1.3.0 entry · 6 commits grouped
author .scribe/card.md · feature list refreshed
Grouped into one commit. Ready for review.
"chore: scribe · coherence pass after schema migration"

2. One PR, every doc in lockstep with the code

eidos-agi / backend Public
scribe/coherence-128 / PR #128 · coherence pass
Commits on Apr 22, 2026
chore: scribe · coherence pass after schema migration · scribe committed 3a91c20 · just now
feat(backend): user_preferences table + schema v4 · agent-7 committed c1e9b22 · 3 min ago
fix: redis reconnection backoff for session pool · agent-3 committed b1f2c40 · 1 hr ago
Files changed · 3 +18 −6 · 3a91c20
README.md · data layer section
  ## Data layer
- Two tables: `users`, `sessions`. Postgres 16.
+ Three tables: `users`, `sessions`, `user_preferences`. Schema v4. Postgres 16.
  Migrations via Alembic. See `docs/schema.md`.
CHANGELOG.md · v1.3.0 entry (6 commits grouped)
  # Changelog
+ ## [1.3.0] — 2026-04-22
+ ### Added
+ - `user_preferences` table + schema migration to v4
.scribe/card.md · feature list refresh
- features: [auth, sessions]
+ features: [sessions, user_preferences, redis_pool]

One PR, three files, all coherent. Reviewers see the code change and every piece of documentation that had to move with it — in the same place, the same tooling, the same review flow they already use.

hone — Self-improvement loop. Per-target notebook. MCP

Every tick runs the same four phases, then repeats. Rhythm, not ritual.

Observe → Change → Measure → Retain
  1. Observe — read real state — one structured observation
  2. Diagnose + Change — one-paragraph diagnosis, then act proportionally
  3. Measure — re-observe; revert if the signal got worse
  4. Retain — write the turn to .hone/ — trajectory travels with the code

Run it manually when you want a nudge. Set /loop 15m hone for a cadence. Either way, every tick leaves a turn log in .hone/turns/ so the history of how the target is evolving is always one git log away.
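The four phases can be sketched as a single function. This is a minimal illustration, not hone's implementation: it assumes the observation is one number where lower is better, that a change returns its own undo callable, and that turns are stored as `turn-NNNN.json` files. The names `tick`, `bad_change`, and the file naming are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def tick(observe: Callable[[], float],
         change: Callable[[], Callable[[], None]],
         turns_dir: Path) -> dict:
    """Run one Observe / Change / Measure / Retain cycle.

    Assumptions: `observe` returns a number where lower is better, and
    `change` applies an edit and returns a callable that undoes it.
    """
    before = observe()                 # 1. Observe
    undo = change()                    # 2. Diagnose + Change
    after = observe()                  # 3. Measure
    reverted = after > before          #    the signal got worse
    if reverted:
        undo()
    turn = {"before": before, "after": after, "reverted": reverted}
    # 4. Retain: write the turn next to the code (file naming is hypothetical)
    turns_dir.mkdir(parents=True, exist_ok=True)
    index = len(list(turns_dir.glob("turn-*.json"))) + 1
    (turns_dir / f"turn-{index:04d}.json").write_text(json.dumps(turn))
    return turn


# Toy target: a count of failing checks, and a change that makes it worse.
state = {"failing": 5}


def bad_change() -> Callable[[], None]:
    state["failing"] += 2

    def undo() -> None:
        state["failing"] -= 2

    return undo


with tempfile.TemporaryDirectory() as tmp:
    turns = Path(tmp) / ".hone" / "turns"
    demo_turn = tick(lambda: state["failing"], bad_change, turns)
    demo_files = sorted(p.name for p in turns.iterdir())
```

In this toy run the change raises the failing count from 5 to 7, so the tick reverts it and still records the turn. The log keeps failed attempts as well as successful ones.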

stepproof — Ceremony enforcement. Evidence gates + hash-chained audit. MCP

An agent can't just "deploy." It has to show evidence that the runbook's gates are met. Stepproof reads the evidence (PR approvals, test results, migration logs) and decides. No evidence, no advance. Every decision is recorded in a hash-chained log the reviewer can replay.

1. First attempt — gates unmet, stepproof denies

agent-7 · deploy
# agent-7 requests step "deploy" under runbook rb-deploy-prod
stepproof step-complete deploy --evidence tests=pass --evidence migrations=dry-run
stepproof checking step deploy against rb-deploy-prod...
PR approved no PR reference provided
tests green 3/3 suites pass
migrations dry-run only
DENY deploy — 2 of 3 evidence gates unmet
audit: .stepproof/events.jsonl (hash-chained, entry #246)
agent: halted. Collect approvals, run migrations for real, retry.

2. After addressing the gaps — stepproof allows

agent-7 · deploy (retry)
# agent re-submits after opening PR #128 and applying migrations
stepproof step-complete deploy --evidence pr=128 --evidence tests=pass --evidence migrations=applied
stepproof checking step deploy against rb-deploy-prod...
PR approved #128 · 2 reviewers
tests green 3/3 suites pass
migrations applied 0042..0044 (12.4s)
ALLOW deploy — all evidence verified
audit: entry #247, sha 3a91c20 → links back to #246
agent: proceeding with deploy.

The denial and the allowance are both durable. The next person (or next tick, or next auditor) reads .stepproof/events.jsonl and replays exactly what happened — what the agent asked for, what evidence it had, how stepproof decided, and why.
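To make "hash-chained" concrete, here is a minimal sketch of such a log. The entry fields (`seq`, `step`, `evidence`, `decision`, `prev`, `hash`) and the function names are assumptions for illustration; the page does not specify stepproof's actual record format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same body always hashes the same.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_event(chain: list[dict], step: str, evidence: dict, decision: str) -> dict:
    """Append a decision whose hash covers its content and the previous hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    body = {"seq": len(chain) + 1, "step": step, "evidence": evidence,
            "decision": decision, "prev": prev}
    entry = {**body, "hash": _digest(body)}
    chain.append(entry)
    return entry


def verify(chain: list[dict]) -> bool:
    """Replay the chain; any edited or reordered entry breaks a link."""
    prev = GENESIS
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True


log: list[dict] = []
append_event(log, "deploy", {"tests": "pass", "migrations": "dry-run"}, "DENY")
append_event(log, "deploy", {"pr": 128, "tests": "pass", "migrations": "applied"}, "ALLOW")
intact = verify(log)

log[0]["decision"] = "ALLOW"       # rewrite history: turn the denial into an allow
tamper_detected = not verify(log)
```

Because each entry's hash covers the previous entry's hash, rewriting the earlier denial invalidates verification from that point on. That property is what lets a later reviewer trust the replay.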

railguey — Railway lifecycle without a dashboard. Deploy, rollback, logs. MCP

An agent shouldn't need a dashboard to operate production. Railguey exposes Railway's lifecycle — status, logs, deploys, rollbacks, variables — as MCP tools. When a deploy misbehaves, the agent sees it, rolls it back, and confirms health without leaving the conversation.

1. Last deploy is leaking 500s — agent pulls HTTP logs

agent · production
railguey status ~/apps/web
service web · deployment d2f9 (5m ago) · DEGRADED
http p50 120ms · p95 4.2s · 500s: 8.1%
railguey http_logs web --status 500 --lines 5
20:14:02 POST /api/checkout 500 "TypeError: Cannot read..."
20:14:07 POST /api/checkout 500 "TypeError: Cannot read..."
20:14:15 POST /api/checkout 500 "TypeError: Cannot read..."
agent: same stack every time → rollback.

2. Rollback + confirm — no dashboard needed

agent · production (rollback)
railguey rollback web --to d8a1
railguey d2f9 → d8a1 · deploying...
railguey d8a1 LIVE (22s)
railguey status ~/apps/web
service web · deployment d8a1 (30s ago) · HEALTHY
http p50 110ms · p95 380ms · 500s: 0.0%
agent: filing issue with the stack trace; d2f9 stays parked.

Same operator flow a human would run in the Railway dashboard, expressed as tool calls the agent can chain. 17 tools across status, logs, deploys, variables, domains, volumes, and doctor audits.

apple-a-day — Agent-native Mac health. 9 checks, severity + fix commands, zero deps. Agent Tool

Agents run on machines that drift. Disks fill up, Homebrew links break after upgrades, LaunchAgents crash-loop in the background, and nobody notices until the agent starts hitting weird errors. Apple-a-day gives the agent awareness of its own host — 9 checks, each finding tagged with a severity and a fix command the agent can run directly.

1. aad checkup — 9 checks, the agent sees its host

agent · host
aad checkup --min-severity warning
apple-a-day running 9 checks...
crash-loops com.docker.vmnetd dying 14× in last hour
  fix: sudo launchctl unload /Library/Launch.../com.docker.vmnetd.plist
disk-health 4.2 GB free on /, Time Machine snapshots using 38 GB
  fix: tmutil deletelocalsnapshots /
homebrew 12 outdated packages, 3 broken dylib links
  fix: brew update && brew upgrade && brew doctor
memory-pressure green · 18% wired · swap 0 B
security SIP on · Gatekeeper on · FileVault on
...
summary 3 warnings · 0 critical · 6 green

2. Agent runs the fixes, confirms clean

agent · host (after)
# agent executes the three fix commands in sequence
sudo launchctl unload .../com.docker.vmnetd.plist
tmutil deletelocalsnapshots /
  37.8 GB reclaimed
brew update && brew upgrade
  12 packages updated
aad checkup
crash-loops no recurring failures
disk-health 42 GB free on /
homebrew 0 outdated · 0 broken links
memory-pressure green
security green
...
summary 9 green · 0 warnings · 0 critical

Zero dependencies, pure Python. The --json flag makes every finding machine-readable with severity + fix so agents can triage and act without human translation.
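As an illustration of what an agent might do with machine-readable findings, the sketch below filters and orders them by severity. The JSON shape (a list of objects with `check`, `severity`, and `fix` keys) is an assumption; the page only says each finding carries a severity and a fix command. The sketch prints nothing and runs no fixes; it only decides what is actionable.

```python
import json

# Assumed severity scale, lowest to highest.
SEVERITY_RANK = {"green": 0, "warning": 1, "critical": 2}


def triage(report_json: str, min_severity: str = "warning") -> list[dict]:
    """Return findings at or above `min_severity`, most severe first."""
    findings = json.loads(report_json)
    floor = SEVERITY_RANK[min_severity]
    actionable = [f for f in findings if SEVERITY_RANK[f["severity"]] >= floor]
    return sorted(actionable, key=lambda f: SEVERITY_RANK[f["severity"]], reverse=True)


# Hypothetical report modelled on the checkup transcript above.
SAMPLE_REPORT = json.dumps([
    {"check": "disk-health", "severity": "warning",
     "fix": "tmutil deletelocalsnapshots /"},
    {"check": "memory-pressure", "severity": "green", "fix": None},
    {"check": "homebrew", "severity": "warning",
     "fix": "brew update && brew upgrade && brew doctor"},
])

to_fix = triage(SAMPLE_REPORT)
fix_commands = [f["fix"] for f in to_fix]
```

Keeping the triage step separate from execution lets the agent, or a human, review `fix_commands` before anything runs with `sudo`.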

clawdflare — Cloudflare audits with a human-gated write. Reads are free, writes take a PIN. MCP

Agents are great at reading infrastructure and telling you what's wrong. They're less great at having unsupervised write access to DNS. Clawdflare splits the difference — the read token lives in $CLOUDFLARE_API_TOKEN and the agent uses it freely; the write token is encrypted on disk and only decrypts when a human enters a PIN at a macOS popup. The agent never sees the write credential.

1. Agent audits the zone — surfaces real issues

agent · cloudflare
clawdflare audit example.com
clawdflare reading zone example.com (read token)...
hsts not enabled — MITM exposure on first visit
min-tls-version 1.0 — should be 1.2 minimum
caa-record missing — any CA can issue for this domain
dns/orphaned A api-old → 192.0.2.33 (unreachable 30d)
ssl-mode Full (strict)
dnssec active
agent: 4 fixes available. Dry-run first:
clawdflare fix example.com # dry-run — no writes
would: enable HSTS · min-tls 1.2 · add CAA · remove api-old A

2. Agent requests the apply — PIN popup gates the write

agent · cloudflare (apply)
clawdflare fix example.com --apply
clawdflare write token is encrypted · requesting PIN...
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
Clawdflare wants to apply 4 writes to example.com
enable HSTS · min-tls 1.2 · add CAA · remove A api-old
PIN: ••••
⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯⎯
human enters PIN · write token decrypted
applied HSTS · min-tls 1.2 · CAA · removed api-old
agent: audit re-run — 6 green, 0 findings. Done.

The agent drives every step — the read, the dry-run, the fix request, the re-audit. The one thing it cannot do is apply writes without a human at the keyboard. Perfect division of labor: agents see everything, humans authorize the things with blast radius.

resume-resume — Sessions survive the crash. Dirty repos + interrupted-session recovery. Agent Tool

Claude Code sessions die. The terminal crashes, the laptop sleeps, the model hiccups. The context is gone; the next session starts from zero and the work has to be re-explained from scratch. Resume-resume treats sessions as durable artifacts — indexed on disk, searchable, restorable — so the next boot picks up where the last one left off.

1. New morning, fresh session — boot_up surfaces what's pending

agent · fresh boot
boot_up hours=24
dirty repos 3 repos with uncommitted changes (urgency-scored):
  eidosagi.com 12 files · 8 new · last commit 2h ago
  checkout-svc 5 files · 1 new · mid-TASK-031
  scribe 2 files · 0 new · yesterday
interrupted sessions 2 crashed in last 24h:
  session-af12 checkout-svc · feat/auth-migration · 3h ago
    last msg: "running migration on staging..." · TASK-031
    resume: cr --resume af12
  session-b7e9 eidosagi.com · feat/reveals · 8h ago
    resume: cr --resume b7e9

2. Pick up the crashed session — context restored, agent continues

agent · resumed session-af12
merge_context id=session-af12 mode=hybrid
resume-resume reading session-af12.jsonl (412 turns)...
summary you were migrating auth off legacy session tokens
task TASK-031 · "stand up blue-green cutover"
adr ADR-042 blue-green (scored 7.7)
state PR #214 open · 3 / 5 DoD checks green
last action applied migration 0042 on staging · crash before 0043
next run migration 0043; re-run 42/42 test suite
agent: context merged · picking up at migration 0043.
zero re-explanation, no "what was I doing again?"

Dirty repos bypass time filters: uncommitted work doesn't age out of the list. BM25 search scans 5,000+ prior sessions in ~3 seconds, so a question like "where did we test the eidos vs claude thing?" resolves in seconds instead of being re-litigated.
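For readers unfamiliar with BM25, the ranking idea can be sketched in a few lines. This is a generic textbook-style BM25 using whitespace tokenization and the `log(1 + ...)` IDF variant, not resume-resume's indexer. The session summaries are invented for the example, and `k1` and `b` are common default values, not settings taken from the tool.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def bm25_rank(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[tuple[int, float]]:
    """Return (doc_index, score) pairs, best match first."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n or 1.0   # avoid dividing by zero
    # Document frequency: in how many docs does each term appear?
    df = Counter(term for terms in tokenized for term in set(terms))
    ranked = []
    for i, terms in enumerate(tokenized):
        tf = Counter(terms)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(terms) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        ranked.append((i, score))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Invented session summaries standing in for indexed transcripts.
sessions = [
    "running migration on staging for the auth cutover",
    "tested eidos vs claude on the checkout refactor",
    "reveals page layout and copy edits",
]
results = bm25_rank("eidos vs claude test", sessions)
best_index, best_score = results[0]
```

Rare terms such as "eidos" carry more weight than common ones, and long documents are penalized slightly. That combination is why keyword search over thousands of transcripts can still surface the one relevant session.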

slack-cc — Two-way Slack for Claude Code. Approve tool calls from your phone. MCP

Your team lives in Slack. Your terminal has the tokens, the VPN, the branch, the half-finished work. Slack-cc is the wire between them — when the agent hits an approval gate, the prompt mirrors to the channel, a teammate replies from their phone, the agent proceeds. The terminal has the power, Slack has the reach.

1. Agent hits an approval gate — mirrors to the channel

claude · prod-deploy
# agent is about to run a migration on prod. Tool-call needs approval.
Bash(psql prod -f migrations/0043.sql)
approval pending · prompt #42 mirrored to #proj-checkout
reply "yes 42" or "no 42" in slack — first answer wins

2. Human responds from Slack — agent proceeds

#proj-checkout

Eidos (app) · 4:12 PM
Approval #42 — about to run psql prod -f migrations/0043.sql. Reply yes 42 or no 42.
daniel · 4:12 PM
yes 42
Eidos (app) · 4:12 PM
Approved. Applying migration 0043 · 12 tables affected · logs in thread.

Socket Mode means no public URL, no deployment, no exposed endpoints — the MCP server runs locally alongside Claude Code and keeps an outbound WebSocket open to Slack. Works behind firewalls, NAT, anywhere. Sender gating, outbound gating, bot filtering, and token lockdown keep the bridge safe.
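The reply protocol ("yes 42" or "no 42", first answer wins, only trusted senders count) is small enough to sketch. The class and method names below are hypothetical, and the Slack connection is left out entirely. The sketch models only how an incoming message resolves a pending approval.

```python
import re
from typing import Optional

# Matches "yes 42" / "no 42", case-insensitive, nothing else on the line.
REPLY = re.compile(r"^\s*(yes|no)\s+(\d+)\s*$", re.IGNORECASE)


class ApprovalGate:
    """Tracks pending prompts and resolves each one at most once."""

    def __init__(self, allowed_senders: set[str]) -> None:
        self.allowed_senders = allowed_senders   # sender gating
        self.pending: set[int] = set()
        self.decisions: dict[int, bool] = {}

    def open(self, prompt_id: int) -> None:
        self.pending.add(prompt_id)

    def handle_message(self, sender: str, text: str) -> Optional[bool]:
        """Return True/False if this message decided a prompt, else None."""
        match = REPLY.match(text)
        if sender not in self.allowed_senders or match is None:
            return None
        prompt_id = int(match.group(2))
        if prompt_id not in self.pending:        # unknown or already decided
            return None
        self.pending.discard(prompt_id)          # first answer wins
        approved = match.group(1).lower() == "yes"
        self.decisions[prompt_id] = approved
        return approved


gate = ApprovalGate(allowed_senders={"daniel"})
gate.open(42)
ignored = gate.handle_message("mallory", "yes 42")   # not an allowed sender
first = gate.handle_message("daniel", "yes 42")      # decides the prompt
late = gate.handle_message("daniel", "no 42")        # too late, ignored
```

Requiring the prompt number in the reply keeps a stray "yes" in a busy channel from approving the wrong tool call.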

The trilogy

research earns the decision · Governor records it · Docket executes against it

research.md — Decision forge. Evidence-graded, phase-gated, peer-reviewed. The research is the receipt. Decisions

Consequential decisions — the database, the auth model, the deploy target — get made casually in chat, and then quietly re-litigated every quarter because nobody remembers the tradeoffs. Research.md is the forge that earns the decision: candidates, locked criteria, graded evidence, peer review. The output is an ADR Governor can enforce and Docket can cite. The research is the receipt.

1. The drift case — a decision made "in chat"

agent · chat
# question: "should we pick Postgres or DynamoDB for the new service?"
agent "both work, let's go with Postgres — we know it."
research status --project checkout-store
NO_RECORD no candidates, no criteria, no scoring, no ADR
governor 0 ADRs for this choice · future agents will re-ask it
why: this decision will be re-litigated in 3 months. earn it — run it through research.md.

2. The earned decision — phase-gated, scored, adopted

agent · research.md
research candidate create "Postgres" && candidate create "DynamoDB"
research criteria lock
weighted: RLS=3 · scalability=2 · ops-cost=2 · dx=1
locked 4 criteria · no further changes without a supersede
research candidate score --scores-from evidence/
Postgres RLS 9 · scale 7 · ops 8 · dx 9 → 7.8
DynamoDB RLS 4 · scale 9 · ops 7 · dx 5 → 6.2
peer_review_log 2 reviewers signed off · no open objections
research project decide --winner Postgres
DECIDED Postgres · ADR-017 authored into Governor
docket: future tasks cite ADR-017; re-litigation requires supersede.

research.md feeds Governor; Governor feeds Docket. The flow is one-way — a decision skipped here is a contract that was never earned, and becomes the thing a future agent re-opens at the worst possible time.
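The idea of locked, weighted criteria can be illustrated with a plain weighted mean. This is a sketch under a stated assumption: the page does not give research.md's scoring formula. A straightforward weighted mean of the per-criterion scores shown in the transcript gives 8.25 for Postgres and 6.125 for DynamoDB, not the 7.8 and 6.2 displayed, so the tool evidently applies something further that this sketch does not model (evidence grading is one possibility). The ranking comes out the same either way.

```python
from types import MappingProxyType

# Weights from the transcript, frozen to mimic "locked" criteria.
WEIGHTS = MappingProxyType({"RLS": 3, "scalability": 2, "ops-cost": 2, "dx": 1})


def weighted_score(scores: dict[str, float], weights=WEIGHTS) -> float:
    """Weighted mean of per-criterion scores; every criterion must be scored."""
    if set(scores) != set(weights):
        raise ValueError("scores must cover exactly the locked criteria")
    total_weight = sum(weights.values())
    return sum(weights[c] * scores[c] for c in weights) / total_weight


postgres = {"RLS": 9, "scalability": 7, "ops-cost": 8, "dx": 9}
dynamodb = {"RLS": 4, "scalability": 9, "ops-cost": 7, "dx": 5}

postgres_total = weighted_score(postgres)   # (27 + 14 + 16 + 9) / 8
dynamodb_total = weighted_score(dynamodb)   # (12 + 18 + 14 + 5) / 8
winner = "Postgres" if postgres_total > dynamodb_total else "DynamoDB"
```

Freezing the weights before any candidate is scored is the point of the lock. It prevents tuning the criteria afterwards to favor a preferred answer.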

Governor — St. Peter for the project. Vision, guardrails, ADRs — the contracts all execution honors. Governance

Every project has contracts: vision, guardrails, ADRs. They usually live in a wiki nobody reads and drift silently. Governor makes them readable by agents and checks each proposed action against them before the work starts. If the task violates a guardrail, St. Peter has already said no. The agent doesn't get to decide otherwise.

1. Agent proposes a fast fix — Governor cites the contract

agent · migration
# agent drafts a migration: DROP users.legacy_email, backfill NULL
visionlog_boot project_id=checkout-svc
vision "trusted checkout — no data loss under any refactor"
guardrails 3 active · GR-01 soft deletes only
adrs ADR-014 column removal requires 2-phase migration
guardrail_inject action="drop_column users.legacy_email"
BLOCK violates GR-01 + ADR-014
why: we got burned in the 2025-Q4 incident — hard deletes broke CSAT replay. Soft-delete, then drop in a later ADR.

2. After reframing to honor the contract — proceeds cleanly

agent · migration (reframed)
# agent rewrites: add deleted_at, backfill timestamps, keep the column
guardrail_inject action="add users.deleted_at; soft-delete legacy rows"
vision ✓ no data loss
guardrails ✓ GR-01 soft deletes only
adrs ✓ ADR-014 two-phase respected (phase 1 of 2)
ALLOW proceed — queue ADR-015 for eventual column drop
agent: writing migration + scheduling phase 2 review.

Vision doesn't drift. Guardrails stay active until explicitly deprecated. ADRs are immutable once accepted — new ADRs supersede rather than overwrite. Every agent session starts with visionlog_boot and honors whatever it reads there.
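The "immutable once accepted, superseded rather than overwritten" rule can be modelled as an append-only log. This is an illustrative sketch, not Governor's data model: the `ADR` fields, the `ADRLog` class, and the IDs `ADR-100` and `ADR-101` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)          # frozen: an accepted ADR cannot be edited
class ADR:
    id: str
    title: str
    supersedes: Optional[str] = None


class ADRLog:
    """Append-only record of accepted ADRs with supersede links."""

    def __init__(self) -> None:
        self._adrs: dict[str, ADR] = {}
        self._superseded_by: dict[str, str] = {}

    def accept(self, adr: ADR) -> None:
        if adr.id in self._adrs:
            raise ValueError(f"{adr.id} is already accepted and cannot be overwritten")
        if adr.supersedes is not None:
            if adr.supersedes not in self._adrs:
                raise KeyError(adr.supersedes)
            if adr.supersedes in self._superseded_by:
                raise ValueError(f"{adr.supersedes} was already superseded")
            self._superseded_by[adr.supersedes] = adr.id
        self._adrs[adr.id] = adr

    def current(self, adr_id: str) -> ADR:
        """Follow supersede links to the ADR that is in force today."""
        while adr_id in self._superseded_by:
            adr_id = self._superseded_by[adr_id]
        return self._adrs[adr_id]


log = ADRLog()
log.accept(ADR("ADR-100", "column removal requires 2-phase migration"))
log.accept(ADR("ADR-101", "column removal requires 3-phase migration", supersedes="ADR-100"))
in_force = log.current("ADR-100")
```

The old record stays in the log, so anyone citing `ADR-100` can still see what was decided at the time and what replaced it.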

Docket — Execution forge. Tasks, milestones, Definition of Done — honest progress bars. Execution

Docket is the execution forge. Tasks, milestones, Definition of Done — the contracts a task must honor before it's allowed to close. Every task links back to the goal it serves and the research.md decision that authorized the approach. When the agent completes work, Docket verifies the DoD, unblocks downstream tasks, and moves the milestone gauge. The whole plan is always legible.

1. Where are we — one glance at the milestone

agent · checkout-svc
docket milestone view M2-auth-rewrite
milestone M2 · auth-rewrite · 3 / 5 tasks ▰▰▰▱▱
goal GOAL-012 "auth without legacy sessions"
due 2026-05-01 (9 days)
docket task list --milestone M2
TASK-021 replace session middleware done
TASK-022 rotate signing keys done
TASK-023 migrate users.session_token → deleted_at done
TASK-024 drop deprecated /auth/legacy endpoint in-progress
TASK-025 cut v2.0 release blocked on TASK-024

2. Complete the blocker — the next task unlocks, milestone advances

agent · checkout-svc (after)
docket task complete TASK-024 --notes "/auth/legacy removed in PR #214, 0 callers in 30d"
dod ✓ PR linked · ✓ callers=0 · ✓ Governor guardrails met
docket TASK-024 DONE · unblocked TASK-025
docket milestone view M2-auth-rewrite
milestone M2 · auth-rewrite · 4 / 5 tasks ▰▰▰▰▱
next TASK-025 cut v2.0 release (ready · no blockers)
agent: picking up TASK-025 next.

Three verbs cover the lifecycle: task create, task complete, milestone view. Dependencies and Definition-of-Done make the progress bar honest — a task isn't done because the agent says so, it's done because the DoD checked.
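A small model shows why DoD checks and dependencies keep the progress bar honest. The `Task` structure and the `complete` and `progress` functions are hypothetical stand-ins, not Docket's API. The DoD checks are stubbed as callables that return `True` or `False`.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    id: str
    dod: list[Callable[[], bool]] = field(default_factory=list)  # Definition of Done checks
    blocked_on: set[str] = field(default_factory=set)
    done: bool = False


def complete(tasks: dict[str, Task], task_id: str) -> list[str]:
    """Close a task only if its DoD passes; return the tasks this unblocks."""
    task = tasks[task_id]
    if task.blocked_on:
        raise RuntimeError(f"{task_id} is blocked on {sorted(task.blocked_on)}")
    if not all(check() for check in task.dod):
        raise RuntimeError(f"{task_id}: Definition of Done not met")
    task.done = True
    unblocked = []
    for other in tasks.values():
        if task_id in other.blocked_on:
            other.blocked_on.discard(task_id)
            if not other.blocked_on:
                unblocked.append(other.id)
    return unblocked


def progress(tasks: dict[str, Task]) -> tuple[int, int]:
    return sum(t.done for t in tasks.values()), len(tasks)


# Milestone state modelled on the transcript above.
tasks = {
    "TASK-021": Task("TASK-021", done=True),
    "TASK-022": Task("TASK-022", done=True),
    "TASK-023": Task("TASK-023", done=True),
    "TASK-024": Task("TASK-024", dod=[lambda: True, lambda: True]),  # stubbed checks
    "TASK-025": Task("TASK-025", blocked_on={"TASK-024"}),
}
before = progress(tasks)
newly_unblocked = complete(tasks, "TASK-024")
after = progress(tasks)
```

In this model the gauge moves only when `complete` succeeds, so a task whose checks fail, or whose blockers are still open, cannot inflate the milestone count.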