Case Studies

Real-world tasks. Autonomous agents.

Each case study documents a real task that the Eidos system completed autonomously — the problem, the approach, the outcome, and what it reveals about how AI systems actually work.

These are not demos. They are production systems where our team of Eidos agents read structured data, plan their own work, and produce results — from ordering groceries to securing enterprise software approval.

Live

Featured · March 2026

Grocery ordering

A source repo with YAML preferences and JSON Schema contracts — 3 contracts, 5 preference files, zero lines of code. An AI reads them and orders groceries weekly.

browser agent + source repo + JSON Schema

July 2026

Matching isn't proof

An agent 'passed' hundreds of reconciliation rows by echoing the target. Grading provenance instead of output — and an adversarial skeptic that tries to refute — caught it, and its own blind spot.

Claude Code + fan-out subagents + read-only Postgres warehouse + deterministic Python gate
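
The idea of grading provenance rather than output can be sketched as a small deterministic gate. This is an illustrative sketch only, not the gate from the case study: the `ReconRow` fields, the `grade_row` function, the tolerance, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReconRow:
    """One reconciliation row as reported by an agent (hypothetical shape)."""
    row_id: str
    value: float                  # value the agent reports
    target: float                 # value it was asked to reconcile against
    source_query: Optional[str]   # SQL the agent claims produced `value`
    source_rows: int = 0          # number of rows that query returned


def grade_row(row: ReconRow, tolerance: float = 0.005) -> str:
    """Check where the number came from before checking that it matches."""
    # No query, or a query that returned nothing: the match cannot be trusted.
    if not row.source_query or row.source_rows == 0:
        return "fail: no provenance"
    if abs(row.value - row.target) > tolerance:
        return "fail: mismatch"
    return "pass"


# Invented sample data for illustration.
rows = [
    ReconRow("r1", 100.0, 100.0, "SELECT SUM(amount) FROM ledger WHERE id = 1", 12),
    ReconRow("r2", 250.0, 250.0, None),  # echoes the target, cites no source
    ReconRow("r3", 90.0, 95.0, "SELECT SUM(amount) FROM ledger WHERE id = 3", 4),
]
results = {r.row_id: grade_row(r) for r in rows}
```

An output-only check would pass row `r2` because its value equals the target. The provenance check fails it because nothing shows where the value came from.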

May 2026

What AI-native looks like

A personal AI operating system built from Linear teams, agent fleets, private plugin stores, evidence, memory, and explicit human approval boundaries.

Linear + Codex + Claude Code + Hermes + private plugin stores

May 2026

Knox makes agents ask first

A local approval authority for Mac agents that want to spend money, reveal credentials, send email, or take destructive actions. The agent can ask; Knox makes the human approve the specific request.

Knox + TOTP + Keychain + Mac menu bar + audit trail

May 2026

Warehouse planning, end to end

The actual silver-and-gold redesign document a team wrote when 22 financial measures had to land in a dashboard. Anonymized for publication; the architecture, slice plan, hour estimate, and ship/no-ship matrix are unchanged.

semantic metrics + database + warehouse + diagrams

May 2026

Landfall finds the thing

A client relationship lived across email, texts, chat, PDFs, public records, and local notes. Landfall turned that scattered context into a repeatable messy-data import.

Landfall CLI + email + chat + text messages + local files

May 2026

Proprioception in AI agents

An agent shipped three defensible commits and silently made five architectural decisions along the way. An outside-in consultation tool flagged the one the agent wanted to defer, a decision that would have cost three to four hours of refactoring later.

code agent + consultation tool + transforms + database

May 2026

What Flowering forgot about flowers

A published methodology for staged AI work met its first consequential decision. The agent demonstrated its failure mode in real time. The fix came from studying actual flower biology — growth without proportional pruning is decay disguised as productivity.

code agent + research notes + biology

May 2026

A $161 payment took 44 steps to prove

A real house-sale payable crossed texts, email, finance search, a bank portal, MFA, screenshots, and repo writeback. The payment succeeded; the lesson was that agents need payable lifecycles.

personal ops + email + bank portal + screenshot proof

April 2026

38% of agent deploys violated ceremony

AI agents deploying across five repos hit a ceremony gate on 38% of runs. Every violation was corrected in seconds. The cost model at scale reveals where the real value lives — and it's not in the immediate catches.

deploy gate + code agent + platform deploy + CI

April 2026

Git is a Type 2 dimension table

Manual dashboard edits were the right trade-off for a two-decade stretch. Agents change the calculus: they can only see what's in a file. Configuration-as-code becomes mandatory, and git history, it turns out, is already the right database for versioned config.

database + CI + email service
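
One way to picture the analogy: each commit to a config file is a new version of a record, valid from its commit time until the next commit. The sketch below assumes a hand-written commit list rather than real `git log` output. The `Commit`, `DimRow`, `to_type2`, and `as_of` names are hypothetical, and it is not the implementation from the case study.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Commit:
    """A commit touching one config file (invented sample data)."""
    sha: str
    committed_at: datetime
    content: str


@dataclass
class DimRow:
    """One Type 2 row: a config version and the interval it was in effect."""
    version_sha: str
    content: str
    valid_from: datetime
    valid_to: Optional[datetime]  # None means "still current"
    is_current: bool


def to_type2(commits: List[Commit]) -> List[DimRow]:
    """Turn a file's commit history into Type 2 dimension rows."""
    ordered = sorted(commits, key=lambda c: c.committed_at)
    rows = []
    for i, commit in enumerate(ordered):
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        rows.append(DimRow(
            version_sha=commit.sha,
            content=commit.content,
            valid_from=commit.committed_at,
            valid_to=nxt.committed_at if nxt else None,
            is_current=nxt is None,
        ))
    return rows


def as_of(rows: List[DimRow], when: datetime) -> Optional[DimRow]:
    """Return the version in effect at `when`, like a point-in-time join."""
    for row in rows:
        if row.valid_from <= when and (row.valid_to is None or when < row.valid_to):
            return row
    return None


history = [
    Commit("a1", datetime(2026, 1, 5), "threshold: 10"),
    Commit("b2", datetime(2026, 2, 9), "threshold: 15"),
    Commit("c3", datetime(2026, 3, 20), "threshold: 12"),
]
dim = to_type2(history)
```

Here `as_of(dim, datetime(2026, 2, 20))` returns the row for commit `b2`, the version that was in effect on that date.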

April 2026

Creating a website voice

An agent audited the existing site copy, identified the authentic voice patterns, diagnosed where new pages deviated, and produced concrete voice rules. Then rewrote two philosophy pages to match.

copy audit + code agent + site rewrite

March 2026

Enterprise software approved by Eidos

Eidos read the organization's contracts, planned a 10-stage approval pipeline, executed nine stages, and presented one decision to a human.

contracts + DAG + code agent
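
A pipeline that runs its automated stages and hands a single decision to a person can be sketched with Python's standard `graphlib` module. The stage names, the `HUMAN_STAGES` set, and `run_pipeline` are hypothetical, and the sketch uses four stages rather than the ten in the case study.

```python
from graphlib import TopologicalSorter

# Hypothetical stages: each maps to the set of stages it depends on.
dependencies = {
    "read_contracts": set(),
    "security_review": {"read_contracts"},
    "cost_analysis": {"read_contracts"},
    "final_approval": {"security_review", "cost_analysis"},
}
HUMAN_STAGES = {"final_approval"}  # assumption: only this stage needs a person


def run_pipeline(deps, human_stages):
    """Run automated stages in dependency order; queue human ones as decisions."""
    executed, pending, blocked = [], [], []
    for stage in TopologicalSorter(deps).static_order():
        if any(d in pending or d in blocked for d in deps[stage]):
            blocked.append(stage)   # waits on an upstream human decision
        elif stage in human_stages:
            pending.append(stage)   # presented to a human, not executed
        else:
            executed.append(stage)  # placeholder for the real work
    return executed, pending, blocked


executed, pending, blocked = run_pipeline(dependencies, HUMAN_STAGES)
```

With these inputs, three stages execute and `final_approval` is left pending for a human. The relative order of the two independent review stages is not fixed by the dependencies.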

Coming next

Voice notes

A Mac app records voice, transcribes locally, classifies notes through an agent SDK, and stores them in recording buckets that a search engine indexes.

5 contracts · 3 languages · 3 processes

Swift + Python + JSON Schema + Omni

Project improvement

A forge scores any project across 10 dimensions, fixes the highest-impact gap, and records a snapshot. The snapshot schema is the contract.

1 contract · 3 projects improved · Self-improving

improve-forge + JSON Schema

Learning capture

A forge that extracts learnings from sessions, routes project-level fixes directly, and proposes forge-level improvements to the overseer.

1 contract · 15 learnings routed · 10 proposals

learning-forge + forge-forge governance

Bridges: metaphor meets machine

How one-sentence "bridges" between philosophy and engineering make technical ideas land — the psychology of why "a heartbeat is a cron job" works, and how to write them deliberately.

Waiting for real-world data

eidosagi.com + reader response data

Kinetic metaphors: when animation IS the concept

The site's static SVGs were made kinetic — funnels that narrow, bars that fill, arrows that flow. Documenting which ones readers respond to and why motion works where prose doesn't.

Waiting for animated SVGs to ship

SVG + CSS animation + scroll-triggered

The pattern across every case study: structured data goes in, an agent figures out how to produce the result, and the system gets better each time it runs.

How it works: The DAG of AI