Reveals
Each tool in 3 seconds. Click a row to see its bespoke visual.
scribe — Technical writer. Keeps docs coherent with code. MCP
An agent ships a schema migration. Scribe detects that the change touches the data-layer section of the README, the CHANGELOG has no entry for this release, and the card's feature list is outdated. It authors fresh copy for each, groups them into one commit, and hands the agent a clean PR.
1. Scribe sees the change, authors the updates
2. One PR, every doc in lockstep with the code
One PR, three files, all coherent. Reviewers see the code change and every piece of documentation that had to move with it — in the same place, the same tooling, the same review flow they already use.
hone — Self-improvement loop. Per-target notebook. MCP
Every tick runs the same four phases, then repeats. Rhythm, not ritual.
- Observe — read real state — one structured observation
- Diagnose + Change — one-paragraph diagnosis, then act proportionally
- Measure — re-observe; revert if the signal got worse
- Retain — write the turn to .hone/ — trajectory travels with the code
Run it manually when you want a nudge. Set /loop 15m hone for a cadence. Either way, every tick leaves a turn log in .hone/turns/ so the history of how the target is evolving is always one git log away.
stepproof — Ceremony enforcement. Evidence gates + hash-chained audit. MCP
An agent can't just "deploy." It has to show evidence that the runbook's gates are met. Stepproof reads the evidence — PR approvals, test results, migration logs — and decides. No evidence, no advance. Every decision is audited to a hash-chained log the reviewer can replay.
1. First attempt — gates unmet, stepproof denies
2. After addressing the gaps — stepproof allows
The denial and the allowance are both durable. The next person (or next tick, or next auditor) reads .stepproof/events.jsonl and replays exactly what happened — what the agent asked for, what evidence it had, how stepproof decided, and why.
railguey — Railway lifecycle without a dashboard. Deploy, rollback, logs. MCP
An agent shouldn't need a dashboard to operate production. Railguey exposes Railway's lifecycle — status, logs, deploys, rollbacks, variables — as MCP tools. When a deploy misbehaves, the agent sees it, rolls it back, and confirms health without leaving the conversation.
1. Last deploy is leaking 500s — agent pulls HTTP logs
2. Rollback + confirm — no dashboard needed
Same operator flow a human would run in the Railway dashboard, expressed as tool calls the agent can chain. 17 tools across status, logs, deploys, variables, domains, volumes, and doctor audits.
apple-a-day — Agent-native Mac health. 9 checks, severity + fix commands, zero deps. Agent Tool
Agents run on machines that drift. Disks fill up, Homebrew links break after upgrades, LaunchAgents crash-loop in the background, and nobody notices until the agent starts hitting weird errors. Apple-a-day gives the agent awareness of its own host — 9 checks, each finding tagged with a severity and a fix command the agent can run directly.
1. aad checkup — 9 checks, the agent sees its host
2. Agent runs the fixes, confirms clean
Zero dependencies, pure Python. The --json flag makes every finding machine-readable with severity + fix so agents can triage and act without human translation.
clawdflare — Cloudflare audits with a human-gated write. Reads are free, writes take a PIN. MCP
Agents are great at reading infrastructure and telling you what's wrong. They're less great at having unsupervised write access to DNS. Clawdflare splits the difference — the read token lives in $CLOUDFLARE_API_TOKEN and the agent uses it freely; the write token is encrypted on disk and only decrypts when a human enters a PIN at a macOS popup. The agent never sees the write credential.
1. Agent audits the zone — surfaces real issues
2. Agent requests the apply — PIN popup gates the write
The agent drives every step — the read, the dry-run, the fix request, the re-audit. The one thing it cannot do is apply writes without a human at the keyboard. Perfect division of labor: agents see everything, humans authorize the things with blast radius.
resume-resume — Sessions survive the crash. Dirty repos + interrupted-session recovery. Agent Tool
Claude Code sessions die. The terminal crashes, the laptop sleeps, the model hiccups. The context is gone; the next session starts from zero and the work has to be re-explained from scratch. Resume-resume treats sessions as durable artifacts — indexed on disk, searchable, restorable — so the next boot picks up where the last one left off.
1. New morning, fresh session — boot_up surfaces what's pending
2. Pick up the crashed session — context restored, agent continues
Dirty repos bypass time filters — uncommitted work doesn't age out of the list. BM25 search scans 5,000+ prior sessions in ~3 seconds, so a question like "where did we test the eidos vs claude thing?" resolves instantly instead of being re-litigated.
slack-cc — Two-way Slack for Claude Code. Approve tool calls from your phone. MCP
Your team lives in Slack. Your terminal has the tokens, the VPN, the branch, the half-finished work. Slack-cc is the wire between them — when the agent hits an approval gate, the prompt mirrors to the channel, a teammate replies from their phone, the agent proceeds. The terminal has the power, Slack has the reach.
1. Agent hits an approval gate — mirrors to the channel
2. Human responds from Slack — agent proceeds
#proj-checkout
2 membersSocket Mode means no public URL, no deployment, no exposed endpoints — the MCP server runs locally alongside Claude Code and keeps an outbound WebSocket open to Slack. Works behind firewalls, NAT, anywhere. Sender gating, outbound gating, bot filtering, and token lockdown keep the bridge safe.
research.md — Decision forge. Evidence-graded, phase-gated, peer-reviewed. The research is the receipt. Decisions
Consequential decisions — the database, the auth model, the deploy target — get made casually in chat, and then quietly re-litigated every quarter because nobody remembers the tradeoffs. Research.md is the forge that earns the decision: candidates, locked criteria, graded evidence, peer review. The output is an ADR Governor can enforce and Docket can cite. The research is the receipt.
1. The drift case — a decision made "in chat"
2. The earned decision — phase-gated, scored, adopted
research.md feeds Governor; Governor feeds Docket. The flow is one-way — a decision skipped here is a contract that was never earned, and becomes the thing a future agent re-opens at the worst possible time.
Governor — St. Peter for the project. Vision, guardrails, ADRs — the contracts all execution honors. Governance
Every project has contracts: vision, guardrails, ADRs. They usually live in a wiki nobody reads and drift silently. Governor makes them readable by agents — and asks them the question before the work starts. If the task violates a guardrail, St. Peter has already said no. The agent doesn't get to decide otherwise.
1. Agent proposes a fast fix — Governor cites the contract
2. After reframing to honor the contract — proceeds cleanly
Vision doesn't drift. Guardrails stay active until explicitly deprecated. ADRs are immutable once accepted — new ADRs supersede rather than overwrite. Every agent session starts with visionlog_boot and honors whatever it reads there.
Docket — Execution forge. Tasks, milestones, Definition of Done — honest progress bars. Execution
Docket is the execution forge. Tasks, milestones, Definition of Done — the contracts a task must honor before it's allowed to close. Every task links back to the goal it serves and the research.md decision that authorized the approach. When the agent completes work, Docket verifies the DoD, unblocks downstream tasks, and moves the milestone gauge. The whole plan is always legible.
1. Where are we — one glance at the milestone
2. Complete the blocker — the next task unlocks, milestone advances
Three verbs cover the lifecycle: task create, task complete, milestone view. Dependencies and Definition-of-Done make the progress bar honest — a task isn't done because the agent says so, it's done because the DoD checked.