E EidosAGI

Warehouse planning,
end to end

An anonymized real planning document — the form a team actually writes when 22 measures meet a dashboard.

dbt MetricFlow + Postgres + Supabase + Mermaid

The artifact is the lesson. A consultant and an in-flow agent rebuilt a financial-reporting warehouse around the dbt MetricFlow pattern after a v1 plan was rejected during stakeholder review. The redesign — four discoveries, four layers, eleven build slices, one ship/no-ship matrix between warehouse and app database — was captured in a single HTML document while the work was still hot. It is reproduced here, names changed, every other detail intact.

The problem

A finance team publishes a monthly PDF. The PDF has 22 line items today and a planned 95 over the next eighteen months. Each line item is some combination of a measure (revenue, disposal cost, repair-and-maintenance cost), an entity (Region A, Region B, total), and a line of business (roll-off, residential, portable). The team wants those numbers in a dashboard — read from the warehouse, computed deterministically, with the same definition every month.

The first draft of the plan did the obvious thing. It modeled each published line item as its own measure: consolidated_revenue_region_a, disposal_cost_roll_off_region_b, twenty more. Every per-measure file was a few CTEs of SQL. The pattern worked, end to end, for the first ten measures. Three structural smells surfaced during stakeholder review:

A fragile leading-digit account join — string-prefix matching dressed up as a dimensional joinMeasure-per-entity identifier explosion — 22 today, 760 at full scope, hand-written and copy-pastedConflated semantics — measure, metric, dimension, and stakeholder label all jammed into one string

None of these were bugs. The first draft computed the right numbers. They were shape problems — the kind of mistake that compounds quietly until a refactor at 50 measures costs a week instead of an afternoon. The story of how the smells were caught is the proprioception case study. The artifact below is what came after — the redesign plan, written in the next ninety minutes, that supersedes the first draft and lays out the work in eleven slices.

This page is unusual among the case studies on this site. The other case studies tell a story and embed artifacts to support it. This one inverts the ratio. The document is the case study. It is reproduced in full because the form is the point — warehouse planning at production rigor looks like this, and the public examples of it are vanishingly rare.

The approach

The plan follows what every major analytics platform converged on independently — Airbnb's Minerva, Uber's uMetric, LinkedIn's Unified Metrics Platform, Looker's LookML, dbt's MetricFlow, Snowflake's Semantic Views. The vocabulary is shared:

v1 (rejected)
  • 22 distinct measure_id strings, each encoding a measure + an entity + a stakeholder label.
  • One gold model per measure. 22 files now, 95+ later. Filter logic copy-pasted across them.
  • Account categorization by leading digit. LEFT(accountno, 1) = '4' means revenue. Works until the accounting system issues a six-digit account.
  • Exclusion rules scattered. "Exclude capitalized accounts" appears as a NOT IN clause in four files.
  • Hierarchy lost. consolidated_revenue_total is revenue rolled across entity_id, but the model can't see that.
v2 (the artifact below)
  • One atomic measure (signed_amount), seven metric definitions, three dimensions. The 22 published items become metric × dim tuples.
  • One materialized gold table (gold.metric_value) holding every metric × period × entity × LOB combination.
  • Account categorization by dimension table. silver.dim_gl_account resolves account → category once, with flags like is_capitalized and is_revenue.
  • Exclusion rules in metric definitions. Each metric's filter: expression reads the dimension flags. One source of truth per rule.
  • Hierarchy preserved. Rolling across entity_id is a query against the materialized table — not a new metric.

The redesign is industry-standard, not novel. The interesting part is the shape of the document — what a planning artifact has to contain to be usable by the next agent who picks it up.

What the document holds, section by section

  • §1 — The four discoveries. Each smell from v1, with the fix, framed as evidence not blame. One investigation note clarifies a red workflow in a sibling repo that turned out to be unrelated to the work at hand.
  • §2 — The right architecture, four layers. Bronze → silver → semantic YAML → gold, with a registry layer that bridges stakeholder labels to metric × dim tuples. Rendered as a Mermaid diagram.
  • §3 — Industry justification. Five forcing functions that drove every major analytics platform to the same pattern. Airbnb, Uber, LinkedIn, Netflix, Looker, Snowflake — all there, with the specific failure mode each platform was responding to.
  • §4 — Implementation standard. What dbt MetricFlow is, what's open-source vs hosted, the materialize-don't-dynamic-compute rule that keeps dashboards fast.
  • §5 — The YAML shape. What semantic_models.yml and metrics.yml actually look like for this domain. SQL the dashboard runs to translate "Consolidated Revenue Region B" into a value.
  • §6 — Slice plan. Eleven slices with file lists, hour estimates, dependency notes, and badges marking what's done, what's next, and what's deprecated from v1.
  • §7 — Definition of done. Numbered checklist with parity-test SQL anyone can run against v4 baseline.
  • §8 — What this plan does NOT do. The next slice, the dashboard rewire, deferred aggregates — written before someone asks.
  • §9–§11 — Risks, opportunities, governance. The known unknowns, the rules the next agent has to honor.
  • §12 — Ship/no-ship matrix. Thirteen rows mapping every warehouse object to "ships to app DB" or "stays in warehouse" with a reason. Cost containment, governance enforcement, dashboard predictability — all in one table.
  • §13 — Decisions to confirm. Three open questions for the human stakeholder, each with a default recommendation and the reasoning behind it.

The artifact is built to be read once by a stakeholder for approval, and re-read many times by future agents who need to know what was decided and why. Every section has a job. Nothing in it is decorative.

The evidence

The full anonymized planning document is below. It opens in a new tab — Mermaid diagrams render via CDN, so a network connection is required for the architecture, conversion, sequence, and ship-pipeline graphs.

silver-gold-22-measures-plan-v2.html — 1,615 lines, 13 sections, 7 Mermaid diagrams

→ Open the full plan in a new tab

The document is self-contained HTML. View source to read the raw markup; reload to re-render the diagrams. Naming is consistent throughout — Acme is the company, "the accounting system" is the GL vendor, "the fleet system" is the truck-and-vehicle vendor, Region A and Region B are the two operating entities, "the controller" is the stakeholder who publishes the monthly PDF.

A few sections to look at if you want a fast pass instead of the full read:

  • §1 FIND-2 and FIND-3 — the structural smells in the v1 plan, why they compound, and the dimensional move that fixes them.
  • §3 forcing-functions table — the five reasons every major platform built the same pattern, with the specific failure each one was responding to.
  • §6 slice 6.11v2 — the warehouse-to-app-database sync, added late after a reviewer noticed the dashboard had no path to read the materialized gold table.
  • §12 ship/no-ship matrix — the row-by-row inventory of what lives where, and why bronze does not get to leave the warehouse.

The result

The v2 plan was approved by the stakeholder, executed across the next several sessions, and the materialized gold.metric_value table now serves the dashboard with sub-second reads. The eleven slices took close to the estimated five hours and forty-five minutes in aggregate — slice 6.11v2 (the warehouse-to-app-database sync that v1 had silently dropped) accounted for forty-five minutes that the original v1 plan did not account for.

The point of publishing the document is not the architecture, which is industry-standard and well documented in dbt's own MetricFlow guides. The point is the form. A planning artifact that holds enough context to be picked up by another agent or another human — with discoveries, justification, slice plan, definition of done, ship matrix, and open questions — is what production warehouse work looks like at the scale where details get lost between sessions. The plan does not assume the next reader was in the room.

An agent operating a warehouse cannot read the meeting where it was decided. The plan is the meeting.

That is the operating-model shift this site keeps returning to. When the consumer of a decision is a person who was present, an informal artifact is enough. When the consumer is an agent — including future-you — the artifact has to carry the context the meeting carried. Warehouse planning has always benefited from this rigor; it becomes mandatory the moment agents are in the loop.