How Flow Works
Plain English for a software engineer with no hardware background. Read this first so the run sheets make sense.
The one-paragraph version
Hardware teams write specs. A spec is a list of things the product must do: reach 200°C, weigh under 50 kg, survive 40,000 hours before failing. Each line in the spec has a target number. The engineering team then builds and tests to prove those numbers are met. Traditionally, the spec lives in a Word doc or a tool like IBM DOORS, and keeping it up to date is a full-time manual job. Flow is a live database version of that spec. The numbers in it update automatically from the tools engineers are already using (Python scripts, CAD files, test runners). When a script runs, the spec updates. When a test passes, the relevant spec items flip to verified. No manual filing.
┌─────────────────────────┐ ┌──────────────────────────┐
│ Python script │ │ Test run │
│ in GitHub / CI │ │ (script or bench) │
└────────────┬────────────┘ └─────────────┬────────────┘
│ CI runs it │ pass / fail logged
▼ ▼
┌─────────────────────────┐ ┌──────────────────────────┐
│ output number │ │ Flow test ledger │
│ max_temp = 233.3 °C │ │ TC11 PASS │
└────────────┬────────────┘ └─────────────┬────────────┘
│ Flow reads it │ auto-links to spec items
▼ │
┌─────────────────────────┐ │
│ Design Value (live) │ │
│ max_temp_achieved_c │ │
└────────────┬────────────┘ │
│ compared to target │
└──────────────────┬────────────────┘
▼
┌───────────────────────────────────┐
│ THE SPEC │
│ │
│ REQ-26 temp ≥ 200 °C → PASS │
│ REQ-37 MTBF 40k hrs → PASS │
│ REQ-35 power ≤ 3 kW → PASS │
│ │
└──────────────────┬────────────────┘
│ snapshot at any gate
▼
┌────────────────────────┐
│ Gate baseline │
│ (PDR Dec 10) │
│ evidence: done │
└────────────────────────┘
The features
A "requirement" in Flow is one row in the spec. It has: a statement ("the chamber shall reach 200°C"), a target value (200), a verification status (Passed / Failed / Open), a stage (Draft / In Review / Approved), and an owner.
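Those fields map naturally onto a small record type. A minimal sketch (the field names are illustrative, not Flow's actual schema):

```python
from dataclasses import dataclass

@dataclass
class Requirement:
    """One row in the spec. Names are illustrative, not Flow's schema."""
    statement: str     # "the chamber shall reach 200°C"
    target: float      # 200
    verification: str  # "Passed" / "Failed" / "Open"
    stage: str         # "Draft" / "In Review" / "Approved"
    owner: str

req = Requirement("the chamber shall reach 200°C", 200.0, "Open", "Draft", "thermal team")
```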
Requirements are organized in a tree: the top-level system has requirements, and each subsystem below it inherits some of those and adds its own. If the parent changes a constraint, every subsystem that depends on it sees the update automatically. No one has to manually propagate it.
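The inheritance rule can be sketched in a few lines, assuming a simple override-merge down the tree (the class and field names are hypothetical, not Flow's data model):

```python
class SpecNode:
    """One level of the requirements tree (illustrative, not Flow's schema)."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.own = {}  # rows this level defines itself

    def effective_spec(self):
        # Inherited rows come from the parent chain; local rows add to
        # or override them. A parent edit is visible on the next read,
        # with no manual propagation step.
        inherited = self.parent.effective_spec() if self.parent else {}
        return {**inherited, **self.own}

system = SpecNode("chamber")
thermal = SpecNode("thermal", parent=system)
system.own["power_budget_w"] = 3000
print(thermal.effective_spec())      # {'power_budget_w': 3000}
system.own["power_budget_w"] = 2150  # parent constraint changes...
print(thermal.effective_spec())      # ...the subsystem sees it immediately
```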
You can view the same data four ways: Table (spreadsheet pivot), Tree (nested hierarchy), Report (rendered spec doc with section numbers, the format an auditor would sign), and Design Reviews (gate-by-gate summary).
The "inherited" rows are spec items that come from a parent system. If you are the thermal engineer and the power budget spec changes at the top level, it shows up in your subsystem view automatically. You see it before you write code against the old number.
This is the piece that was hardest to understand before reading the research. Here is what it actually means:
A thermal engineer writes a Python script that simulates heat flow through a chamber. They run it and get back a number: max_temp_achieved_c = 233.3. Normally they would have to manually copy that into the spec ("REQ-26 says the chamber must reach 200°C; my model shows 233.3°C, so it passes"). Flow automates that copy step.
In Flow, you create an "Analysis Model" that points to your Python script in GitHub. Flow knows which variable is the output. When the script runs (in CI or manually), Flow reads the output and stores it as a Design Value: a named live number inside the spec. The spec item that references that number automatically checks whether it passes.
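As a concrete sketch of what such a script looks like from Flow's side: the physics below is a toy placeholder and the JSON-on-stdout convention is an assumption; the only real contract is the named output variable the Analysis Model is configured to read.

```python
import json

def run_thermal_model(ambient_c: float, heater_w: float) -> float:
    # Toy placeholder physics; a real model would integrate a heat equation.
    # 0.071 is a made-up thermal resistance, chosen only for illustration.
    return ambient_c + heater_w * 0.071

if __name__ == "__main__":
    result = {"max_temp_achieved_c": round(run_thermal_model(25.0, 2934.0), 1)}
    # Flow (or the CI step feeding it) reads this named value; the variable
    # name is what the Analysis Model points at.
    print(json.dumps(result))
```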
The same pattern works for CAD files (Flow connects to Onshape, NX, SolidWorks: when the CAD updates, mass and dimensions pull automatically) and Excel spreadsheets (for teams that won't give up Excel).
CI/CD stands for Continuous Integration / Continuous Deployment. In software, it's the automated pipeline that runs tests whenever someone pushes code to GitHub. At hardware companies, that same pipeline often runs the Python simulation scripts too.
The Flow integration works like this: the Python thermal model lives in GitHub. The CI pipeline (GitHub Actions, Jenkins, whatever they use) runs it automatically on every push. Flow has a webhook or a polling hook that reads the script output after the run. So the full chain is:
┌──────────────────────────────┐
│ engineer pushes to GitHub │
└──────────────┬───────────────┘
│ CI triggers automatically
▼
┌──────────────────────────────┐
│ Python thermal script runs │
└──────────────┬───────────────┘
│ outputs a number
▼
┌──────────────────────────────┐
│ Flow reads the output │
│ max_temp_achieved_c = 233.3 │
└──────────────┬───────────────┘
│ updates live in the spec
▼
┌──────────────────────────────┐
│ spec item re-checks │
│ REQ-26 temp ≥ 200°C PASS │
└──────────────┬───────────────┘
│
▼
┌──────────────────────────────┐
│ systems engineer sees it │
│ no email, no manual filing │
└──────────────────────────────┘
From the engineer's point of view: they pushed code like they always do. They did not do anything extra. The spec update was a side effect of their normal workflow.
In the demo, you don't actually show CI triggering live (that would take minutes). You show the result: the Design Value is already there, updated, and the spec item is already marked PASS. The story is "this happened because the script ran, not because someone filed it."
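The "spec item re-checks" step in that chain is nothing exotic: it is a comparison of the live Design Value against the spec item's target. A minimal sketch:

```python
import operator

OPS = {">=": operator.ge, "<=": operator.le}

def recheck(design_value: float, op: str, target: float) -> str:
    """Compare a live Design Value against a spec item's target."""
    return "PASS" if OPS[op](design_value, target) else "FAIL"

print(recheck(233.3, ">=", 200.0))    # REQ-26: temp >= 200 °C  -> PASS
print(recheck(1950.0, "<=", 3000.0))  # REQ-35: power <= 3000 W -> PASS
```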
Each spec item eventually needs to be tested. You run a test, get a result (pass/fail, plus a measured value), and that result has to be recorded against the right spec item. The question is: how does it get there without someone copy-pasting from a spreadsheet into a different app?
There are three paths. Be honest about which one is which.
Fully automatic, but requires one-time setup
If your test scripts run in CI (GitHub Actions, Jenkins, etc.), you add one extra step at the end of the job that calls the Flow API: create a test run, set status PASS or FAIL, link it to the spec items it covers. After that, every CI run posts its result to Flow automatically. The test labeled "Automated Run TC11" in the sandbox is this pattern.
Honest caveat: someone has to wire this up once. It's not zero work. An FDE's job is to help the customer set it up for their first test case, then it runs forever.
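The one-time wiring amounts to building a small payload and POSTing it. A sketch of that final CI step; the field names and endpoint shape are assumptions, not Flow's real API schema:

```python
import os

def flow_test_run_payload(test_id: str, passed: bool, covers: list) -> dict:
    """Body a CI job's final step might POST to Flow's public REST API.
    Field names are illustrative; check Flow's API docs for the real schema."""
    return {
        "name": test_id,
        "status": "PASS" if passed else "FAIL",
        "requirement_ids": covers,  # spec items this run verifies
        # Traceability back to the CI job; env var name is hypothetical.
        "ci_run_url": os.environ.get("CI_RUN_URL", ""),
    }

payload = flow_test_run_payload("Automated Run TC11", True, ["REQ-28", "REQ-34"])
# e.g. requests.post(f"{FLOW_API}/test-runs", json=payload, headers=auth)
```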
The Iterations dropdown lets you snapshot the entire project state at any moment: all spec items, all verification statuses, which tests covered which items. Name it "PDR Dec 10" and that snapshot is your gate evidence package. When the review comes, you click the baseline instead of assembling a 200-page PDF. The evidence accumulates as a side effect of the sprint, not the week before the gate.
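Conceptually, a baseline is a named, frozen copy of project state that later edits cannot touch. A sketch under that assumption (the structure is illustrative, not Flow's storage model):

```python
import copy
import datetime

class Project:
    def __init__(self):
        self.requirements = {}  # id -> row dict (illustrative)
        self.baselines = {}

    def snapshot(self, name: str):
        """Freeze the entire current state under a name ("PDR Dec 10")."""
        self.baselines[name] = {
            "taken_at": datetime.date.today().isoformat(),
            "requirements": copy.deepcopy(self.requirements),
        }

p = Project()
p.requirements["REQ-26"] = {"status": "PASS", "value": 233.3}
p.snapshot("PDR Dec 10")
p.requirements["REQ-26"]["value"] = 240.0  # work continues after the gate...
# ...but the baseline stays an immutable record of the state at the gate
```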
Once a spec item has been approved, you can't just edit the number directly. You have to open a Change Request (CR). A CR is a proposal to change a specific spec item: from what value, to what value, and why. A reviewer approves or rejects it. If approved, the spec item updates and the CR becomes the audit trail.
This is where the AI governance story lands. When an AI client (via MCP) tries to move a spec item from Draft to Released, the Flow API intercepts it and opens a CR instead. The stage does not change until a human reviews and approves it. The AI proposed; it did not execute. The CR is the record of what was proposed and who signed off.
This is the answer to the objection "what if the AI changes something and nobody signed off?" It can't. The governance layer catches it. The API returns 200 (success) but on the server side the stage transition is converted into a CR. You only discover this by reading the requirement back after the write, which is why the MCP does a read-back after every mutation.
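That write-then-read-back pattern is worth seeing in code. The fake API below simulates the server-side governance described above purely for illustration; all names and behavior are assumptions, not Flow's real endpoints.

```python
class FakeFlow:
    """Stand-in for Flow's REST API, simulating the governance layer."""
    def __init__(self):
        self.req = {"id": "REQ-26", "stage": "Approved", "target": 200.0}
        self.change_requests = []

    def patch(self, body: dict) -> int:
        if "stage" in body:
            # Stage transitions are intercepted: a CR is opened instead,
            # but the HTTP response is still a success.
            self.change_requests.append({"proposed": body, "status": "open"})
        else:
            self.req.update(body)  # plain value updates apply directly
        return 200

    def get(self) -> dict:
        return dict(self.req)

def update_with_readback(api, body: dict) -> dict:
    """Write, then read back. Because the API can return 200 while routing
    the change into a CR, the read-back is the only way to learn what
    actually got applied."""
    api.patch(body)
    after = api.get()
    applied = all(after.get(k) == v for k, v in body.items())
    return {"applied": applied, "requirement": after}

api = FakeFlow()
print(update_with_readback(api, {"target": 210.0})["applied"])     # True
print(update_with_readback(api, {"stage": "Released"})["applied"])  # False: CR opened
```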
Flow's built-in AI is a chatbot panel (bottom right of the app). It can answer questions about the project, generate new spec items from natural language, and decompose requirements into a hierarchy. It runs on GPT-5.4, GPT-5.4 Mini, Claude Sonnet, or Claude Opus (served via AWS Bedrock; the feature is multi-model). In the candidate sandbox, it is disabled.
The MCP server is what was built for this demo. MCP (Model Context Protocol) is an open standard for wiring external tools into AI clients. The MCP server exposes 10 Flow tools (read requirements, update a value, update a stage, create a test run, link test to spec items, etc.) by calling Flow's public REST API directly. Any MCP-compatible AI client gets those tools by pointing at one URL.
The distinction between value updates (direct write) and stage updates (triggers a CR) is the governance story. An AI client can update numbers freely because that's data entry. It cannot move something to Released, because that requires human sign-off. Flow enforces the difference server-side.
The "bring your own AI" pitch: for customers who want to run their own model inside their VPC (AWS Bedrock Claude, an on-prem fine-tune, anything), the public API is the contract. The MCP is a 400-line demonstration of that contract. They don't need to wait for Flow to ship the AI feature they want. They can wire their own model in over a weekend.
The sandbox product
The demo project is a TestEquity 123H. Picture a refrigerator-sized box with a sealed chamber inside. You put a circuit board, a sensor, or any electronic component in it, dial in the conditions you want (say, -40°C and 95% humidity), and run it for hours or days to see if the part survives. If your product needs to work on a drone in the Arctic, in the Mojave Desert, and underwater, you prove it in a box like this before you ever ship it.

The spec for the chamber itself is what Flow is tracking in the sandbox: things like "must reach -40°C," "must hold temperature within ±0.5°C," "sensors must last 40,000 hours before failing." You never need to explain the product in the demo. Just treat it as "the thing being built."
| REQ ID | What it tracks | Target | Used in |
|---|---|---|---|
| REQ-28 | Temperature accuracy | ±0.5°C | Beat 1 (inherited row) |
| REQ-34 | Temperature uniformity across 9 sensor points | ≤1.0°C delta | Beat 1 (inherited row) |
| REQ-35 | Total electrical power draw | ≤3000 W (actual: 1950 W) | Beat 5: Claude raises to 2150 W |
| REQ-37 | Humidity sensor lifespan before failure | ≥40,000 hrs | Beat 1 side-panel drill-down |
| REQ-39 | Full system lifespan at 90% confidence | ≥20,000 hrs (actual: 54,173) | Beat 5a pre-commit query |