Three Leaps
From Acceleration to Liberation — AI’s path to autonomous coding under framework and harness constraints.
Chinese version: three-leaps.md
This methodology takes the v7 deck’s “Three Layers × Three Leaps” as its main thread, weaving in v3’s first-principles derivation, measurement formulas, and Capital-theoretic analysis. Companion slide deck: deck/index.html. Zero-to-L3 bootstrap: three-leaps-bootstrap.en.md.
0 · Reading Map
| # | Chapter | Question it answers | Source |
|---|---|---|---|
| 1 | The Current Tension | Why governance is mandatory | v7 §02 + v3 three decays |
| 2 | The Map: Three Layers × Three Leaps | One picture for the whole system | v7 §03 |
| 3 | First-Principles Derivation | Why these three layers, not others | v3 §1 |
| 4 | L0 · Engineering Gravity Field | The foundation before AI enters | v7 §04 |
| 5 | Leap ① · State Visible | From code → state | v7 §05 + v3 §8 |
| 6 | Bridge · Signal → Intent | L1 to L2 | v7 §06 |
| 7 | Leap ② · Intent Expressible | From command → intent | v7 §07 + v3 §6.4 |
| 8 | Bridge · Intent → Execution | L2 to L3 | v7 §08 |
| 9 | Leap ③ · Autonomous Loop | From inference → continuous self-governance | v7 §09 + v3 §6 |
| 10 | Autonomy Gradient R0–R5 | Delegate reversible actions to AI | v7 §10 |
| 11 | Full Loop · Order Group-Buy Case | End-to-end walkthrough | v7 §11 |
| 12 | Anti-Patterns and Boundaries | When to stop | v7 §12 + v3 §11 |
| 13 | Role Evolution | Where humans go | v7 §13 + v3 §10 |
| 14 | Measurement Framework | How we know it’s improving | v3 §12 |
| 15 | Future Shape | The 2030 engineer | v7 §14 |
| 16 | Value Finale | Enterprise / Individual / AI | v7 §15 |
| Appendix A | Capital · C/V/M Mapping | Value-flow analysis | v3 §3 |
| Appendix B | V-Model in the AI Era | Verification re-armament | v3 §4 |
| Appendix C | CALMS · DORA · Kanban | Flow execution rhythm | v3 §5 |
| Appendix D | MDM Declarative Paradigm Leap | Philosophical foundation | v3 §6 |
One-Line Thesis
At every delivery moment, every AI-generated artifact is simultaneously Needed, Trusted, and Understood.
SCQA
- Situation: AI coding has driven marginal production cost toward zero
- Complication: Human review capacity grows only linearly; the AI code defect rate is ~3× higher than that of human-written code
- Question: How do we keep the organization Needing, Trusting, Understanding its artifacts without throttling AI throughput?
- Answer: First build the engineering gravity field (L0), then make module state visible (L1), then make business intent declarable (L2), then let the system self-converge (L3) — three leaps stacked layer by layer, placing AI inside a trustworthy engineering system.
1 · The Current Tension
1.1 Code generation has outpaced human verification
capability
   ↑
   │              AI coding output
   │            / exponential growth
   │          /      ┌────────────────┐
   │        /        │ capability gap │
   │      /          └────────────────┘
   │    /      ─ ─ ─ ─ ─ ─  human review
   │  /        ─ ─ ─ ─ ─ ─  linear growth
   │ /
   │/
   └─────────────────────────→ time
Sonar research: the defect rate of AI-generated code is ~3× higher than that of human-written code. When AI output grows exponentially while human review grows linearly, the capability gap is not a temporary staffing shortage — it is structural.
1.2 Three Decays
Software assets are simultaneously eroded by three forces:
| Decay | Source | Symptom | Cost of failure |
|---|---|---|---|
| Value decay | Business shifts, stale requirements | Features no longer used | Assets rot into zombie code |
| Architecture decay | Dependency rot, entropy | Modules entangle each other | Codebase rots into a big ball of mud |
| Knowledge decay | Staff churn, lost context | “Legacy code no one dares to touch” | Quality rots into technical debt |
AI acceleration amplifies all three decays simultaneously — it writes fast, but it also writes things that aren’t needed, that violate boundaries, that no one understands.
1.3 The purpose of governance: Needed / Trusted / Understood
| Three states | Decay it counters |
|---|---|
| Needed | Value decay |
| Trusted | Architecture decay |
| Understood | Knowledge decay |
Conclusion: We can’t just let AI accelerate — we must place AI inside a trustworthy engineering system.
2 · The Map: Three Layers × Three Leaps
L3 · AUTONOMOUS LOOP                      ← Leap ③ system self-converges
     Harness · Agent · Reconciliation       (Apple DDM paradigm reborn)
          ▲
          │
L2 · INTENT EXPRESSIBLE                   ← Leap ② intent expressible
     INTENT → CONTRACT → VERIFIER           (business intent → declaration)
          ▲
          │
L1 · STATE VISIBLE                        ← Leap ① state visible
     identity · 5-state FSM · 3-D health    (systems engineering · DDD)
          ▲
          │
╔═══════════════════════════════════════╗
║ L0 · GRAVITY FIELD                    ║ ← Engineering gravity field
║ framework · module identity · CI/CD   ║   (foundation · NOT a phase)
║ · runtime · DevOps · sandbox          ║   precondition
╚═══════════════════════════════════════╝   skip L0 = AI in a swamp
2.1 Reading conventions
- L0 is foundation, not phase: it precedes everything; without it, leaps are impossible
- L1 → L2 → L3 are leaps: each one is a paradigm change, not incremental upgrade
- Leap ① from code → state
- Leap ② from command → intent
- Leap ③ from one-shot inference → continuous self-governance
- L0–L3 are not sequential phases, they are collaborative layers: after going live, every layer is working at the same time (see §11)
2.2 What’s inside each leap
Leap ③ internals: HARNESS → AGENT → RECONCILE → GRADIENT → SIGNAL
Leap ② internals: INTENT FILE → CONTRACT → VERIFIER
Leap ① internals: IDENTITY → 5-STATE FSM → 3-D HEALTH
2.3 Running case throughout
Every chapter below uses the same case to anchor the abstraction: the order service supports a group-buy feature.
3 · First-Principles Derivation
This system is not an empirical checklist — it is a closed system derived from 4 irreducible facts. If any later mechanism feels “plausible but I don’t know why,” return to this chapter.
3.1 Four irreducible facts
| Fact | Statement | Supporting evidence |
|---|---|---|
| F1 | The marginal cost of AI coding approaches zero | LLM compute scaling is industry reality |
| F2 | Human review capacity grows linearly, cannot follow AI throughput | Sonar: AI code defect rate ~3× human |
| F3 | Software assets undergo value/architecture/knowledge triple decay | Business change / dependency rot / staff churn |
| F4 | Organizational survival depends on accumulating trustworthy assets; untrustworthy assets are liabilities | Marx’s “congealed labor” + engineering practice |
These four facts cannot be decomposed into more basic propositions — they are the “physical constraints” of the AI-coding era.
3.2 From facts to three axioms
| Derivation | Axiom |
|---|---|
| F2 + F4 → AI output must align with intent | Axiom 1: Intent Fidelity |
| F2 + F3 → AI decisions must be rollback-able | Axiom 2: Action Reversibility |
| F1 + F3 + F4 → Quality must be encoded into the system, not patched in afterward | Axiom 3: Quality Emergence |
3.3 From axioms to three pillars
| Axiom | Pillar | Engineering axis | Cost of failure |
|---|---|---|---|
| Intent Fidelity | Value Anchoring | Time | Zombie code |
| Action Reversibility | Boundary Control | Space | Big ball of mud |
| Quality Emergence | Built-in Process | Causality | Technical debt |
The three pillars work on three independent axes — time, space, causality — pairwise orthogonal and jointly exhaustive. Each pillar’s collapse produces a distinct failure mode.
3.4 Mapping axioms onto the three leaps
| Axiom | Where it directly lands |
|---|---|
| Axiom 1 Intent Fidelity | Leap ② Intent Expressible (turn intent into a verifiable contract) |
| Axiom 2 Action Reversibility | Leap ③ Autonomous Loop + R0–R5 gradient (delegate by reversibility) |
| Axiom 3 Quality Emergence | L0 + Leap ① State Visible (encode quality into substrate and state machine) |
3.5 Why L0 is not a leap
A “leap” implies a paradigm change. L0’s six items (framework / module identity / CI / runtime / DevOps / sandbox) are all explicit, automated versions of classical software engineering — they are physical constraints, not paradigm leaps.
Calling them “foundation” rather than “phase 1” prevents organizations from treating L0 as an entry-level checkpoint to rush past. Without L0, the leaps are just slideware.
⚠️ “Foundation” ≠ “frozen object”: L0 framework itself is a slow variable but still a variable, evolving via the §10.6 Reflux Loop. See §4.1 “Framework is a living slow variable”.
What this section answers: This is the reasoning engine — proof that the three leaps are not a choice, they are a necessity.
4 · L0 · Engineering Gravity Field (Precondition · NOT a Phase)
Before placing AI inside a system, you must first build the “system” — this is a physical-law constraint.
L0 has six pillars. The first three continue classical engineering governance; the latter three are re-emphasized in the AI era:
| # | Pillar | Content | Why it matters in the AI era |
|---|---|---|---|
| 01 | Framework | Domain abstraction · contract skeleton · quality substrate | Better framework · greater AI value |
| 02 | Module identity | module.yaml: identity · intent · state · signals | The entry point to Leap ① |
| 03 | CI/CD gates | lint · type · test · contract · sec scan | Left side accelerated 100× · right side re-armed 3× |
| 04 | Runtime (NEW) | Containers · K8s · service mesh · registry | AI-written code must actually run |
| 05 | DevOps stack (NEW) | deploy · log · trace · metric · secret · alert | When AI errs · rollback must be seconds |
| 06 | Sandbox | Isolation · observability · DDD bounded contexts | AI experiments only inside the sandbox |
4.1 Framework: the machine tool
Borrowing v3’s Capital-theoretic view: the framework is the most highly congealed constant capital (C).
| Framework function | Cost of failure |
|---|---|
| Domain isolation (physical boundaries of bounded contexts) | Modules entangle; refactor cost grows quadratically over time |
| Base abstractions (storage / messaging / auth / observability) | Adapter sprawl; dependency chaos |
| Contract skeleton (API / event / error code standards) | Modules cannot interoperate |
| Quality substrate (test base classes / mocks / sandbox / archtest) | Every module reinvents the wheel |
Excellent framework × 100 modules = 100 high-quality outputs
Missing framework × 100 modules = 100 messes (chaos replicated at the same speed)
Framework is not a single capability — it is the cumulative output of the entire L0 layer
The four functions cannot all be put in place at once; they are built up in stages:
Blueprint (pre-ADR + 3-tier skeleton + SDK signatures + contract location + test-base placeholder)
↓
CI lets lint/test/build run on the skeleton (constraints start being enforced)
↓
hello-world routes through the skeleton, not naked code (blueprint validated runnable)
↓
Domain layering fills the skeleton (skeleton becomes content-bearing)
↓
archtest turns README boundaries into machine rules (constraints truly bind)
↓
Complete framework = blueprint + 6 enforced capabilities
Important: the framework’s “constraints” need CI and archtest to enforce them. Without enforcement, the framework is just README text — any AI autonomy promise built on top stands on loose constraints. This is the shared root of anti-patterns A1 (over-governance) and A3 (gradient breach).
See three-leaps-bootstrap.en.md §3 for the staged construction path and the P0.0 blueprint capability.
Framework is a living slow variable — the Reflux Loop (a.k.a. Evolution Engine)
The framework is congealed capital, but congealed ≠ frozen. The framework is not designed once and then frozen for use — it continuously evolves through the Reflux Loop:
Module replacement ──► Multi-agent parallel rewrite ──► Candidate diffs + cross-review findings
                                                        + serendipitous patterns
                                                                    │
Framework upgrade queue ◄──┐                                        │
Intent revision drafts  ◄──┴── harvest (extract commonalities) ◄────┘
           │
           ▼
Next-generation modules start from a more refined skeleton ──► (feeds the next replacement round)
- Slow variable: framework evolves quarterly, intent monthly, modules weekly — three different cadences, but all alive
- Reflux direction: by-products of module replacement (candidate diffs / common findings / new patterns) flow back into framework and intent, making the next batch of modules more refined
- Evolution Engine: this is the true mechanism of §10.6 module-as-commodity — when a commodity breaks, you don’t just replace the commodity, the machine tool itself is improved
The Reflux Loop is the source of life for the entire methodology. Without reflux, the framework degenerates into frozen dogma → modules struggle on outdated framework → eventual return to the big ball of mud. See §9.4 reconcile harvest step, §9.5 multi-agent reflux, §10.6 evolution engine, §12 anti-pattern A8 “replace-without-reflux”, §13.2 evolution curator role, §14.2 reflux hit rate metric.
4.2 Module identity: module.yaml
Every module must carry a machine-readable identity card. This is the entry point to Leap ① — without manifests, you can’t even say “which module is failing.”
```yaml
module:
  name: order-service.group-buy
  domain: order
  intent:
    goal: group-buy ordering
    metric: GMV ↑ 15%
  contracts:
    exposed: [GroupOrder]
    consumed: [payment.coupon]
  lifecycle:
    state: experimental   # → candidate → asset → maintenance → retired
  signals:
    collected_by: [ci, otel, sonar]
```
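The identity card above can be checked mechanically before a module is admitted. A minimal sketch, assuming the field names shown in the example; the `validate_manifest` helper is illustrative, not part of the methodology:

```python
# Sketch: validate a module.yaml manifest (represented here as a dict).
# Field names follow the example above; the checks are stand-ins.
REQUIRED_TOP = {"name", "domain", "intent", "contracts", "lifecycle", "signals"}
VALID_STATES = ["experimental", "candidate", "asset", "maintenance", "retired"]

def validate_manifest(module: dict) -> list:
    """Return a list of problems; an empty list means the identity card is complete."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_TOP - module.keys())]
    state = module.get("lifecycle", {}).get("state")
    if state not in VALID_STATES:
        problems.append(f"unknown lifecycle state: {state!r}")
    return problems

manifest = {
    "name": "order-service.group-buy",
    "domain": "order",
    "intent": {"goal": "group-buy ordering", "metric": "GMV up 15%"},
    "contracts": {"exposed": ["GroupOrder"], "consumed": ["payment.coupon"]},
    "lifecycle": {"state": "candidate"},
    "signals": {"collected_by": ["ci", "otel", "sonar"]},
}
print(validate_manifest(manifest))  # → []
```

In practice the validator would run as a CI gate, so a module without a complete manifest never enters the system at all.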
4.3 CI/CD gates: left-accelerated × right-rearmed
The V-model in the AI era (see Appendix B):
Left side accelerated 100× Right side re-armed 3×
AI coding in seconds ───────────► static analysis + fuzzing + contract + formal verification
CI/CD is not “lint runs once” — it is 5–7 layers of independent gates in series: lint → type → test → contract → sec scan → SBOM → coverage gate.
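The “gates in series” idea can be sketched as a fail-closed pipeline: each gate is an independent check, and the first failure stops the run. Gate names follow the list above; the checks themselves are stand-ins:

```python
# Sketch of serial, fail-closed CI gates. Each gate is an independent
# predicate on the change; the first failing gate aborts the pipeline.
def run_gates(change, gates):
    for name, check in gates:
        if not check(change):
            return (False, name)   # fail-closed: stop at the first failing gate
    return (True, None)

gates = [
    ("lint",     lambda c: c["lint_clean"]),
    ("type",     lambda c: c["types_ok"]),
    ("test",     lambda c: c["tests_pass"]),
    ("contract", lambda c: c["contract_ok"]),
    ("sec scan", lambda c: c["sec_ok"]),
]
change = {"lint_clean": True, "types_ok": True, "tests_pass": False,
          "contract_ok": True, "sec_ok": True}
print(run_gates(change, gates))  # → (False, 'test')
```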
4.4 Runtime (NEW · v7 emphasis)
AI-written code must actually run — not merely look correct. This requires:
- Containerization (Docker / OCI image)
- Orchestration (Cloud Run / Container Apps / K8s, scaled to size)
- Service mesh (mTLS / traffic mirroring / canary routing)
- Image registry + provenance
Anti-pattern: AI writes code, PR merges, but no runtime verification — equivalent to letting AI write code on a whiteboard.
4.5 DevOps stack (NEW · v7 emphasis)
When AI errs, rollback must take seconds. This requires:
| Capability | Purpose |
|---|---|
| Deploy (progressive delivery) | canary 1% → 10% → 50% → 100% |
| Logs (structured JSON) | Trace AI decision paths after the fact |
| Tracing (OTel) | Cross-service fault localization |
| Metrics (SLI/SLO + budget) | Objectively judge “is it good” |
| Secrets management | AI should never see plain-text credentials |
| Alert + on-call | Drift detected early |
Rollback is the physical foundation of the R-gradient: R1–R3 can be delegated to AI precisely because seconds-level rollback exists. Without the DevOps stack, the entire §10 autonomy gradient is hollow.
4.6 Sandbox: controlled trial-and-error
AI must experiment in a sandbox first, then unlock permission level by level:
- Isolation (namespace / network policy / resource quota)
- Observability (every action inside the sandbox is auditable)
- DDD bounded contexts (the sandbox is itself a bounded experimental site)
4.7 Capital view: L0 = the most highly congealed C
All six L0 pillars are “congealed past labor.” Every additional ounce of effort organizations invest in L0 raises AI’s output precision in L1–L3 by a tier.
What this section answers: L0 is a physical-law constraint. Skipping L0 = placing AI in a swamp.
5 · Leap ① · State Visible (L1)
From code → state. Every module has identity, a state machine, and health-signal scores.
5.1 Two dimensions
L1 is not just “tag a state on a single module” — it establishes two dimensions simultaneously:
| Dimension | Content | Cost of failure |
|---|---|---|
| Single-module lifecycle | 5-state FSM + 3-D health | Cannot judge “should this module still be alive” |
| Inter-module composition | System composition view + contract graph | Modules without system assembly = systems engineering failure |
5.2 Five-state machine
intent → experimental → candidate → asset → maintenance → retired
       ↑              ↑           ↑        ↑                  ↓
  value review   biz signal  human+gates  biz decay   tombstone (24h soft-delete)
| State | Entry condition | Rollback path |
|---|---|---|
| experimental | Intent passes value review | Delete skeleton + close intent |
| candidate | Business signals ≥ threshold + smoke pass | Back to experimental |
| asset | Full gates passed + human approval | Back to candidate |
| maintenance | Business signals decay but dependencies remain | Reactivate to asset |
| retired | No dependencies + business signals zero | Recoverable within 24h |
Tombstone: module name / final version / outward contract snapshot / dependency snapshot / retirement reason / retirement date / 24h soft-delete window. A tombstone is an “auditable death certificate,” not a one-line log.
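The five states and their rollback paths can be encoded as an explicit transition table, so an illegal jump (say, experimental straight to asset) is rejected mechanically. A sketch of the transitions listed above; the `advance` helper is hypothetical:

```python
# Sketch: the 5-state machine as an explicit transition table.
# Forward edges and rollback paths follow the table above.
TRANSITIONS = {
    "experimental": {"candidate"},                # rollback = delete skeleton + close intent
    "candidate":    {"asset", "experimental"},    # promote, or roll back
    "asset":        {"maintenance", "candidate"}, # decay, or roll back
    "maintenance":  {"retired", "asset"},         # retire, or reactivate
    "retired":      {"tombstone"},                # 24h soft-delete window
}

def advance(state: str, target: str) -> str:
    if target not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {target}")
    return target

print(advance("candidate", "asset"))  # → asset
```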
5.3 Three-dimensional health score
value score       = w1·active demands + w2·contract subscribers + w3·traffic          (0–100)
structure score   = w1·framework compliance + w2·boundary consistency + w3·contracts  (0–100)
engineering score = w1·coverage + w2·build pass rate + w3·activity (decay-weighted)   (0–100)
Derived signals:
- Health warning: any dimension < 30
- Forced retirement suggestion: any dimension < 10
- Mandatory human review: all three < 50 for two weeks
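The derived signals are pure threshold checks over the three scores. A minimal sketch: the thresholds (<30 warn, <10 force-retire suggestion, all three <50 → human review) come from the text; the weights and the sustained-for-two-weeks persistence check are omitted as placeholders:

```python
# Sketch: derive governance signals from the 3-D health scores.
# Thresholds follow the text above; the two-week persistence check
# for human review is omitted for brevity.
def health_signals(dims: dict) -> dict:
    return {
        "warning":            [d for d, s in dims.items() if s < 30],
        "force_retire":       [d for d, s in dims.items() if s < 10],
        "needs_human_review": all(s < 50 for s in dims.values()),
    }

scores = {"value": 80, "structure": 25, "engineering": 90}
print(health_signals(scores))
# → {'warning': ['structure'], 'force_retire': [], 'needs_human_review': False}
```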
5.4 Signal collection (mechanized checklist)
| Dimension | Signal | Collection method |
|---|---|---|
| Business | Active demand count / contract subscribers / traffic reach | Demand system API + traffic instrumentation |
| Architecture | Framework usage compliance / cross-domain dependencies / contract registry consistency | Static analysis + archtest |
| Engineering | Test coverage / build pass rate / defect density / activity decay-weighted | CI + git + defect tracking |
Strictly forbidden: humans filling in signals by hand. All signals must be mechanically collected derived quantities.
5.5 System composition view
A single module’s health does not equal system health. L1 must simultaneously maintain the module composition graph:
user-svc ──► [order.group-buy] ──► payment.coupon
   │                                     │
   ├──► notify-svc                       │
   └──► inventory                        │
              contract: GroupOrder ◄─────┘

SYSTEM · 5 hops assembly · weakest node 65 ──► system score ≠ module mean
System score = the minimum score across all modules and dimensions in the chain, not the mean. A single 65-point critical dependency drags down the entire chain.
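The min rule is deliberately pessimistic and takes one line to implement. A sketch, using the running case’s numbers:

```python
# Sketch: system score of a dependency chain is the minimum dimension
# score across all modules in it, never the mean (min rule from the text).
def system_score(chain: dict) -> float:
    return min(score for dims in chain.values() for score in dims.values())

chain = {
    "order.group-buy": {"value": 80, "structure": 65, "engineering": 90},
    "payment.coupon":  {"value": 91, "structure": 90, "engineering": 88},
}
print(system_score(chain))  # → 65
```

A mean over the same numbers would report ~84 and hide the weak node entirely, which is exactly why the min rule is used.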
5.6 Case: current state of the group-buy module
case · order-service.group-buy
state · candidate (collecting business signals)
3-D health · value 80 · structure 65 · engineering 90
L1’s output is state visibility — the next step is to declare intent on top of it (→ §6 bridge).
What this section answers: Leap ① turns every module from code into an observable object.
Modules themselves are alive: during the asset state they may be patched many times or replaced once (see §10.6). The state machine’s “asset → maintenance → retired” is not a fated path of decay — when framework or intent evolves, an asset can also be refreshed via REPLACE. The state machine covers the module’s full living lifecycle.
6 · Bridge · Signal → Intent (L1 → L2)
Seeing state → enables describing intent. L1 turns each module into an “observable object” · L2 turns each change into a “declarable event.”
6.1 Raw signals are a data deluge
L1 OUTPUT · signals
─────────────────────────────────────
order-service.discount · call rate ↓ 73% / 30d
order-service.group-buy · error rate 2.1%
payment.coupon · health 91 / 90 / 88
…live portraits of hundreds of modules
→ no direction · cannot act
Live portraits of hundreds of modules, without “what they should be” as a comparison baseline, are just noise.
6.2 Intent turns signals into judgments
L2 INPUT · intent
─────────────────────────────────────
"Let the order service support group-buy,
with coupon-code payment, p99 < 200ms"
│
▼ AI translation
contract: POST /orders/group
sla.p99: 200ms · sla.error: 0.5%
depends_on: payment.coupon (≥85)
success: signals.usage ≥ 1k/d
Now every signal can be compared:
- error rate 2.1% vs sla.error 0.5% → divergent
- payment.coupon health 88 ≥ 85 → satisfied
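The comparison step is mechanical once intent thresholds exist. A sketch, reusing the numbers above; field names are simplified stand-ins for the intent file’s fields:

```python
# Sketch: intent thresholds turn raw signals into judgments.
# Field names are illustrative stand-ins for intent-file fields.
def judge(signals: dict, intent: dict) -> dict:
    return {
        "error_rate": ("divergent" if signals["error_rate"] > intent["sla_error"]
                       else "satisfied"),
        "dependency_health": ("satisfied" if signals["dep_health"] >= intent["dep_min"]
                              else "divergent"),
    }

signals = {"error_rate": 0.021, "dep_health": 88}   # from L1
intent  = {"sla_error": 0.005, "dep_min": 85}       # from L2
print(judge(signals, intent))
# → {'error_rate': 'divergent', 'dependency_health': 'satisfied'}
```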
6.3 The bridge thesis
Without L1’s state visibility, L2’s intent is just a blank check.
Intent cannot be declared in a vacuum — it must be grounded in “what is real now.” The L1→L2 bridge guarantees intent is always rooted in reality.
7 · Leap ② · Intent Expressible (L2)
From command → intent. Not by directly applying MDM — by re-embedding the “declare + converge” paradigm at the code layer.
7.1 Intent file: four-block structure
The intent file is not an abstract concept — it is these four blocks:
┌──────────────────────────┬──────────────────────────┐
│ 01 · BUSINESS            │ 02 · CONTRACT            │
│ business intent          │ technical contract       │
│ goal: group-buy ordering │ api: POST /orders/group  │
│ metric: GMV ↑ 15%        │ schema: GroupOrder       │
│ deadline: 2026-Q2        │ events: [GroupCreated]   │
├──────────────────────────┼──────────────────────────┤
│ 03 · QUALITY             │ 04 · LIFECYCLE           │
│ quality thresholds       │ lifecycle                │
│ p99: 200ms               │ state: candidate         │
│ error: < 0.5%            │ sunset_if: usage<100/d   │
│ coverage: ≥ 95%          │ review_at: 30d           │
└──────────────────────────┴──────────────────────────┘
Every field maps directly to a verifier (see §7.3).
7.2 Creative application of MDM
Not directly applying Apple DDM — borrowing the paradigm by isomorphism:
| Apple DDM (device management) | Software module governance |
|---|---|
| Declaration | Intent file |
| Reconciliation | Drift detect + AI agent loop |
| Convergence | Continuous convergence toward intent |
| Predicates | Signal-threshold conditions |
| Status channels | Three-dimensional health write-back |
Isomorphic, not identical: devices pull declarations · modules pull intents; devices auto-configure · modules self-evolve.
7.3 Direct field-to-verifier binding
Not documentation · executable:
| Intent field | Auto-bound verifier |
|---|---|
| sla.p99: 200ms | k6 performance test, regression fails |
| contract: GroupOrder | Contract test (pact / openapi diff) |
| coverage: ≥ 95% | New-code coverage gate |
| sunset_if: usage<100/d | Reconciler scheduled check + auto alert |
| depends_on: payment.coupon (≥85) | Composition view health linkage |
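The binding itself is a dispatch table: each intent field name maps to a verifier callable. A sketch with stand-in verifiers (real ones would shell out to k6, contract tests, or coverage gates):

```python
# Sketch: field → verifier dispatch. The table is the idea; the lambdas
# are stand-ins for k6 runs, contract tests, and coverage gates.
VERIFIERS = {
    "sla.p99":  lambda declared, observed: observed <= declared,  # ms budget
    "coverage": lambda declared, observed: observed >= declared,  # percent floor
}

def verify(field: str, declared, observed) -> bool:
    return VERIFIERS[field](declared, observed)

print(verify("sla.p99", 200, 180))  # → True  (180ms within the 200ms budget)
print(verify("coverage", 95, 92))   # → False (92% below the 95% floor)
```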
7.4 What business need does this solve
The group-buy requirement is no longer a PRD doc + scheduling meeting + Jira ticket — it is an intent file that simultaneously declares, delivers, and monitors.
This is the real grounding of v3 §6 “paradigm leap”: from “humans inside the decision loop” to “humans inside the desired-state-definition loop.”
What this section answers: Leap ② turns intent into a verifiable, convergent, auditable engineering object.
Intent itself is also alive: business change → intent v2 → triggers module REPLACE (see §10.6 trigger layer). The intent revision rate (§14.2) should be a healthy non-zero value — long-unchanged intents usually indicate disconnection from business, not stability.
8 · Bridge · Intent → Execution (L2 → L3)
Once you declare → you need an “executor.” L1’s state machine says “where we are now” · L2’s intent says “where we want to go” · L3’s Harness is “how to automatically get there.”
8.1 Drift detect: gap between declaration and reality
┌────────── DESIRED STATE ──────────┐    ┌─── CURRENT STATE (L1) ───┐
│ group-buy ordering                │    │ api: not implemented     │
│ p99 200ms · 95% cov               │    │ delta: missing           │
│ contract: GroupOrder              │    └──────────────────────────┘
└───────────────────────────────────┘                 │
                  │                                   │
                  └────────► drift detected ◄─────────┘
                             desired ≠ current
Only when desired ≠ current is an “executor” needed. The essence of reconciliation is closing the drift.
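Drift detection is a field-by-field diff of desired against current. A minimal sketch; field names follow the diagram above:

```python
# Sketch: drift = fields where the L2 intent (desired) and the L1
# signals (current) disagree. Empty dict means converged.
def detect_drift(desired: dict, current: dict) -> dict:
    return {k: {"desired": v, "current": current.get(k)}
            for k, v in desired.items() if current.get(k) != v}

desired = {"api": "POST /orders/group", "p99_ms": 200, "coverage": 95}
current = {"api": None, "p99_ms": 200, "coverage": 95}
drift = detect_drift(desired, current)
print(bool(drift), drift)
# → True {'api': {'desired': 'POST /orders/group', 'current': None}}
```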
8.2 AI Agent · autonomous task loop
01 · read intent + current signals
02 · plan tasks · decompose steps
03 · invoke tools inside Harness
04 · write code · run tests · self-verify
05 · open PR · wait for gates · collect signals
06 · failed → back to 01
Until desired = current (CONVERGED), the loop continues.
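The loop above can be sketched as iterate-until-converged. The attempt budget is an assumption added here to guard against non-convergence; it is not part of the text:

```python
# Sketch: the agent task loop as iterate-until-converged. Each round
# closes one piece of the gap (a stand-in for "write code, pass gates").
# max_rounds is an assumed budget, not from the methodology.
def converge(desired: set, current: set, max_rounds: int = 10):
    rounds = 0
    while current != desired and rounds < max_rounds:
        missing = desired - current
        current = current | {next(iter(missing))}  # close one gap per round
        rounds += 1
    return current == desired, rounds

print(converge({"api", "tests", "contract"}, {"tests"}))  # → (True, 2)
```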
8.3 The bridge thesis
L2 without L3 is just a promise written on paper.
Declaration alone won’t make the group-buy feature live. You need an executor that translates intent into PRs, runs tests, and retries on failure. That executor is the next chapter’s Harness + Agent.
8.4 Drift is bidirectional
The reconciler’s default scenario: current drifts from desired (the module has a problem), and current must be pulled back to desired. But v7’s Reflux Loop (§4.1) makes drift bidirectional:
| Drift direction | Trigger scenario | Handling path |
|---|---|---|
| current → drifts from desired | Code bug / perf regression / dependency outage | Traditional PATCH (§9.4) |
| desired → actively evolves | framework v2 / intent v2 / business change | REPLACE + harvest (§9.4 / §10.6) |
The second kind of drift is not “the module broke” — it’s “the upper layers have advanced.” Current didn’t err; the changing desired makes it “outdated.” The REPLACE path is purpose-built for this kind of drift.
9 · Leap ③ · Autonomous Loop (L3)
From one-shot inference → continuous self-governance. The Agent’s context and progress can be progressively disclosed from L1 system state — they don’t have to be conjured from nothing.
9.1 Harness is the engineering shell, not a prompt
The key is not a smarter AI · it is a more stable “shell + heartbeat.”
Harness Five-Pack (based on Anthropic’s official guidance):
| Component | Purpose |
|---|---|
| System context | Inject project knowledge, constraints, conventions (CLAUDE.md / cursor rules) |
| Tool constraints | Fail-closed permissions + command blocklists |
| Context injection | Agent Skills + Progressive Disclosure |
| Memory & progress | Cross-session state preservation (git log + ADR + memory) |
| Evaluation loop | Continuous convergence, not one-shot inference (CI green + eval suite) |
9.2 Progressive disclosure: context comes from L1 system state
The Agent’s context need not be conjured from nothing — most of it comes from L1’s already-recorded system state:
L1 STATE · source                      HARNESS · engineering shell
┌──────────────────────┐               ┌──────────────────────────────┐
│ system design (DDD)  │               │ AGENT · brain                │
│ API · contract       │  progressive  │ AI agent                     │
│ UI design (tokens)   │ ────────────► │ plan · act · obs             │
│ progress (tasks/PRs) │               │                              │
│ health (3-D scores)  │               │ TOOLS · constrained          │
│ ALL RECORDED         │               │ whitelisted actions          │
└──────────────────────┘               │ EVAL · heartbeat self-verify │
                                       └──────────────────────────────┘
                                                      │
          ┌────────────── write back ◄────────────────┘
          │
          ▼
RECONCILE · system state converges to intent · ∞ loop
This is why §5 and §6 must come first — L1’s state visibility is the source of L3 Harness context.
9.3 Isomorphic to K8s controllers / Apple DDM
L3 is not a new paradigm — it borrows mature ones:
| Dimension | Apple DDM | Kubernetes | L3 software governance |
|---|---|---|---|
| Desired state | Declarations | CRDs | Intent file |
| Pull mechanism | Device pulls | Controller watches | Reconciliation agent |
| Conditional application | Predicates | Label selectors | Signal thresholds |
| Status reporting | Status channels | Status subresources | 3-D health |
| Offline behavior | Offline self-maintains | Controller restart idempotent | Soft-delete window + tombstone |
9.4 Reconciliation Loop pseudocode
loop:
1. pull module manifests + intent files
2. collect signals (CI / git / dependency / traffic)
3. compute current health score (3-D)
4. detect drift (current vs desired)
5. choose remediation strategy: PATCH or REPLACE
PATCH (apply patch) ── fits: localized drift / sound module structure / low fix cost
REPLACE (rewrite) ── fits: systemic drift / framework has evolved / rewrite cost ≤ patch cost
↳ Module-as-commodity (see §10.6): under strong framework
constraints, regenerating from scaffolding + letting intent
drive AI to write a fresh one is often more controllable
than patching years of accumulated patches
6. apply remediation, delegated by reversibility R0–R5:
R0–R1 (read-only / experimental): AI executes directly
R2 (controlled external): AI auto-released + audit log
R3 (cross-domain write): AI proposes + review (see §10.5 for reviewer evolution)
R4 (user impact): blocked + mandatory human decision
R5 (financial / physical): never granted
7. if REPLACE: harvest (Reflux Loop · see §4.1 / §10.6)
(a) collect diffs across N agent candidate implementations → distill candidate patterns
(b) collect common findings from multi-agent cross-review → framework / CI / archtest improvement queue
(c) collect intent under-specifications (candidate divergence = intent gaps) → intent revision drafts
↳ harvest outputs do not take effect immediately; they enter the Evolution Curator (§13.2) review queue
8. report status to dashboard + decision audit store
9. state changes feed back to L1 · input for next cycle
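Steps 5 and 6 of the loop are the decision core, and can be sketched directly. The cost threshold and the policy table follow the pseudocode above; the numeric inputs are placeholders:

```python
# Sketch of reconcile steps 5-6: choose PATCH vs REPLACE, then gate the
# chosen action by its R0-R5 reversibility level. Inputs are placeholders.
def choose_strategy(drift_scope: str, framework_evolved: bool,
                    patch_cost: float, rewrite_cost: float) -> str:
    # REPLACE fits systemic drift, framework evolution, or cheap rewrites.
    if drift_scope == "systemic" or framework_evolved or rewrite_cost <= patch_cost:
        return "REPLACE"
    return "PATCH"

def gate(action_level: str) -> str:
    # Delegation policy per reversibility level, from the pseudocode above.
    policy = {"R0": "auto", "R1": "auto", "R2": "auto+audit",
              "R3": "propose+review", "R4": "human-only", "R5": "never"}
    return policy[action_level]

print(choose_strategy("localized", False, patch_cost=3, rewrite_cost=8))  # → PATCH
print(gate("R3"))  # → propose+review
```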
What this section answers: Leap ③ turns “one-off prompts” into “continuously running environment-level feedback control systems.”
9.5 Multi-agent multi-model · ensemble play
Single agent / single model is v7’s initial stance. With a strong-enough framework and quality substrate, every step of intent → state → module → launch can be run in parallel by multiple agents and multiple models: candidates race, and the winning combination is selected.
9.5.1 Why ensemble
Different models have different strengths (long context / reasoning depth / coding precision / tool use / vision); a single model is suboptimal at every stage. Letting multiple models specialize: combined strengths > strongest single model.
9.5.2 Stage-by-stage ensemble examples
| Stage | How to use multiple agents | Selection rule |
|---|---|---|
| Intent comprehension | Long-context model (Gemini 2M) parses PRD + reasoning model (Claude) extracts intent.yaml + verifier model cross-checks | All three agree → pass; disagree → escalate to human |
| Module implementation | 3 agents implement in parallel → run same test suite | PR with highest coverage + perf wins |
| Code review | Security agent + perf agent + arch agent + style agent in parallel | Composite score; any critical issue blocks |
| Launch decision | Reconciler combines multiple independent health-score evaluations | Majority rules; minority dissent enters audit |
9.5.3 Selection strategy encoded in intent
```yaml
intent:
  execution:
    strategy: race | majority | ensemble | tournament
    # race       = multiple agents race; first to pass wins
    # majority   = multi-model vote, majority rules
    # ensemble   = weighted combination of multiple outputs
    # tournament = multiple elimination rounds, best wins
    judges:
      - model: claude-opus-4-7
        role: architecture compliance
      - model: claude-sonnet-4-6
        role: performance optimization
      - model: gpt-5
        role: security risk
    quorum: 2/3   # threshold for release
```
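The `majority` strategy with a quorum reduces to tallying independent verdicts against a fraction. A sketch with hard-coded judge verdicts (the three judge roles mirror the YAML above):

```python
# Sketch: quorum-gated majority release. Each judge returns an
# independent pass/fail verdict; release requires quorum agreement.
from fractions import Fraction

def quorum_pass(verdicts: list, quorum: Fraction) -> bool:
    return Fraction(sum(verdicts), len(verdicts)) >= quorum

# architecture / performance / security judges
verdicts = [True, True, False]
print(quorum_pass(verdicts, Fraction(2, 3)))  # → True
```

Using exact fractions instead of floats avoids a 2/3 threshold silently failing to rounding error.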
9.5.4 Capital-theoretic view: V multiplied
Single agent = 1× V; ensemble = N× V, but you only pay for the highest-quality output. The other N–1 drafts become “scrap,” but at the system level the cost is often far lower than human rework. The key is: token cost < human cost × reworks saved.
9.5.5 Anti-patterns
- Blind voting — if different models share highly homogeneous training data, majority rule degenerates to single rule; pick judges with model heterogeneity
- Judge = player — same model both writes and reviews code, drifts toward self-rationalization; the judge must be an independent model
- Infinite rounds — tournament must have a max-round cap, otherwise it loops forever
9.5.6 Multi-agent is not only for selection — also for reflux
“Best wins” is unidirectional (N candidates → 1 selected). But multi-agent ensembles in replacement scenarios produce bidirectional feedback:
| By-product | Reflux direction |
|---|---|
| N candidate implementation diffs | Expose intent under-specification — “three agents wrote three different payment fault-tolerance strategies = intent didn’t say clearly” → intent revision draft |
| A serendipitous new pattern from one agent | Distilled into a new framework best practice — “this abstraction fits this scenario better” → framework upgrade queue |
| Common findings across multi-agent cross-review | Reflux into framework / CI / archtest — “this boundary is frequently violated” → add new rule, add archtest, change SDK default |
This is the true power source of §10.6 module-as-commodity — module replacement does not just produce a new module, it refluxes upward to evolve framework and intent.
Key insight: single-agent replacement = fixing one bug; multi-agent replacement = system evolves once. The diffs across N candidates are more valuable than “the best answer” — they are test cases for framework / intent.
What this section answers: Ensemble play evolves v7’s “AI soldier” into an “AI cluster.” It is §10’s R-gradient run in parallel — the same reversible action, multiple agents racing, best wins — and it is also the core engine of the §10.6 Reflux Loop: selection is only the start; candidate diffs flowing back to the upper layers are the key.
§9.5 is the engineering implementation of §10.5: when the independent AI reviewer agent is played by “another model,” §9.5’s “the judge must be an independent model” principle provides the credibility §10.5’s “independent AI review” needs.
10 · Autonomy Gradient R0–R5
Not “AI in full control” · but “AI taking over reversible actions.” Give reversible actions to AI · keep irreversible actions for humans — that is true liberation.
10.1 Six levels
REVERSIBILITY
reversible ←─────────────────────────────────────────────────► irreversible
┌─────────┬─────────┬─────────┬─────────┬─────────┬─────────┐
│ R0 READ │ R1 LOCAL│ R2 CTRL │ R3 CROSS│ R4 USER │ R5 IRREV│
├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ read- │ local │ sandbox │ modify │ delete │ funds │
│ only │ edits │ APIs │ other │ data │ physical│
│ inspect │ own │ test │ services│ change │ device │
│ propose │ repo │ env │ migrate │ billing │ control │
├─────────┼─────────┼─────────┼─────────┼─────────┼─────────┤
│ AI auto │ AI auto │ AI auto │ AI prop │ human · │ human · │
│ │ │ release │ + human │ never │ red │
│ │ │ │ review │ granted │ line │
│ no human│ git roll│audit log│ staged │human │never │
│ │ │ │rollout │ gate │ auto │
└─────────┴─────────┴─────────┴─────────┴─────────┴─────────┘
10.2 Mapping gradient to Harness configuration
| Level | Harness configuration | Safety net |
|---|---|---|
| R0 read-only | Context + evaluation only | Writes nothing |
| R1 local edits | + Restricted file editing tool | git rollback |
| R2 controlled external | + Sandbox API + test env write | Audit log + sandbox isolation |
| R3 cross-domain write | + Cross-service / migration permissions | Staged rollout + human review |
| R4 user impact | – | Human gate · never granted |
| R5 financial / physical | – | Never auto · red line |
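The mapping above is directly executable as a permission gate. A minimal sketch with a three-way outcome (auto / propose+review / blocked); the enum and function names are hypothetical:

```python
# Sketch of the R-gradient (§10.1/§10.2) as an executable permission gate.
# Level names follow the table; the enforcement API itself is hypothetical.
from enum import IntEnum

class R(IntEnum):
    R0_READ = 0   # read-only: inspect, propose
    R1_LOCAL = 1  # local edits in own repo
    R2_CTRL = 2   # sandbox APIs, test env
    R3_CROSS = 3  # cross-service writes, migrations
    R4_USER = 4   # user-impacting: delete data, billing
    R5_IRREV = 5  # funds, physical, device control

def gate(action_level: R) -> str:
    if action_level >= R.R4_USER:
        return "blocked"          # R4/R5: human gate / red line, never granted
    if action_level == R.R3_CROSS:
        return "propose+review"   # AI proposes + review + staged rollout
    return "auto"                 # R0-R2: AI auto, with rollback/audit safety net

print(gate(R.R1_LOCAL))   # → auto
print(gate(R.R3_CROSS))   # → propose+review
print(gate(R.R5_IRREV))   # → blocked
```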
10.3 Permanent boundary
R5 is never granted — this is the system’s hard boundary regardless of organizational trust or AI capability.
Even in the long-term vision phase, R4 remains “block + force human decision” — it is never brought within automation.
10.4 Gradient breach = catastrophe
Handing R4/R5 to the Agent is the core symptom of anti-pattern A3 “Gradient Breach” (see §12). Anti-pattern cost rises with layer — L0 breach is waste, L3 breach is incident.
What this section answers: The R-gradient turns “AI autonomy” from a slogan into an executable, auditable engineering parameter with a red line.
10.5 Reviewer evolution (v7+ direction)
In §10.1’s table, R3 = “AI proposes + human review” — that is v7’s initial stance. With a strong-enough framework and DevOps stack, the “human review” itself can evolve — not by abandoning review, but by freeing the reviewer from being the bottleneck.
Three reviewer kinds
| Reviewer | Fits | Cadence | Trust source |
|---|---|---|---|
| Human review | Framework / contract / spec changes (meta layer) | Slow but authoritative | Seniority + accountability |
| Independent AI review | Instance code changes (within the framework) | 24×7 | Model heterogeneity + strong framework + health-score safety net |
| Multi-agent cross-review | High-disagreement / high-risk scenarios | Parallel | Multi-model majority vote (see §9.5) |
Boundaries between the three
Key: upgrading the reviewer ≠ removing review. The three reviewer kinds coexist across different scenarios.
┌────────────────────────────────┬────────────────────────────────┐
│ Human review preserved (never │ AI review can take over │
│ goes away) │ │
├────────────────────────────────┼────────────────────────────────┤
│ · L0 framework changes │ · Instance code (within the │
│ (affects all modules) │ framework boundary) │
│ · New boundary rules between │ · Verification scenarios bound │
│ R3 → R4 │ to intent.yaml fields │
│ · Changes to the R-gradient │ · Routine changes covered by │
│ itself │ health thresholds │
│ · ADR-level decisions │ · OOB attempts already caught │
│ · Anything touching R4/R5 │ by archtest │
│ │ · Changes safe under staged │
│ │ rollout │
└────────────────────────────────┴────────────────────────────────┘
R3 upgrade prerequisites (all four must hold)
- Complete L0 framework in place — archtest / contract / quality substrate all enforced
- DevOps stack in place — seconds-level rollback + canary + 3-D health + on-call
- Independent reviewer agent — uses heterogeneous models vs the coding agent, avoiding judge = player (echoes §9.5.5 anti-pattern)
- Complete decision audit — every AI autonomous decision is structurally queryable, post-hoc replayable
When met, R3 evolves:
v7 initial R3: AI proposes + human review
↓ once prerequisites met
v7+ R3: AI proposes + independent AI review (multi-agent cross-review)
+ health-score safety net + reversible
↓
R4/R5: Never change — human review / never granted
What this section answers: Turn “human review” from a fixed role into an engineerable bottleneck that can be upgraded — provided strong framework + heterogeneous models + health-score safety net are all in place.
10.6 Module-as-commodity — economics of patch vs replace
A module is a commodity stamped out on the framework machine tool. Under quality-substrate constraints, “fix” is no longer the default option — “rewrite the whole module” is often the more economical choice.
Why “fix-first” used to be the rule
Traditional software engineering defaulted to patch over replace because:
- Average module dev cost = several person-weeks
- Rewrite = re-paying the entire dev cost
- Rewrite-failure risk ≥ patch-failure risk
Why v7+ enables “replace-first”
The legitimacy of replacement comes from two layers of preconditions: the infrastructure layer (capability available) + the trigger layer (when to actually start). Both must hold simultaneously, otherwise replacement is either infeasible or descends into anti-pattern A8 “replace-without-reflux”.
Infrastructure layer (necessary conditions):
| Precondition | Cost impact |
|---|---|
| L0 strong framework constraints | Modules start from scaffolding, reuse framework SDKs / contracts / test substrate → rewrite cost ↓ 70%+ |
| L2 intent fully executable | intent.yaml is the module’s “DNA” — rewrite is essentially “regenerate from intent via AI” |
| L3 multi-agent ensemble | Same intent rewritten by N agents in parallel → best wins (§9.5) → rewrite time drops from days to hours |
Trigger layer (change-driven):
| Trigger | Meaning |
|---|---|
| Framework has evolved | L0 v1 → v2 introduces new abstractions / new SDKs / new contracts → old modules must be rewritten to leverage the new framework’s precision |
| Intent has evolved | Business change drives intent revision (intent v2) → rewrite is essentially “rebirth per the new intent”; patch is meaningless here |
⚠️ Without the trigger layer, replacement is anti-pattern: if neither framework nor intent has evolved, “replacement” means swapping a module with the same module — exactly the symptom of anti-pattern A8 “replace-without-reflux” and the §10.6 anti-pattern “replace without touching intent”.
Reconciler decision point: patch, replace, or harvest
when drift detected:
if drift_localized AND patch_cost < replace_cost:
propose PATCH # traditional path
elif framework_drifted_significantly OR # trigger layer (must have evolved)
intent_has_changed_substantially: # trigger layer (must have evolved)
propose REPLACE # new path
# flows through same R-gradient (§10.5)
# REPLACE must be followed by harvest (next subsection)
elif module_age > 6mo AND patch_count > 10:
# old module accumulated patches, but neither framework nor intent has evolved
# signal Evolution Curator (§13.2) — should framework / intent be upgraded?
signal: refactor_pressure
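The decision pseudocode above can be made runnable. A sketch in which the boolean inputs stand in for real drift detectors and cost estimators (all field names are assumptions):

```python
# Runnable form of the patch / replace / harvest decision pseudocode above.
# The DriftReport fields stand in for real drift detectors (assumptions).
from dataclasses import dataclass

@dataclass
class DriftReport:
    drift_localized: bool
    patch_cost: float
    replace_cost: float
    framework_drifted: bool      # trigger layer: L0 has evolved
    intent_changed: bool         # trigger layer: intent has evolved
    module_age_months: int
    patch_count: int

def decide(r: DriftReport) -> str:
    if r.drift_localized and r.patch_cost < r.replace_cost:
        return "PATCH"                     # traditional path
    if r.framework_drifted or r.intent_changed:
        return "REPLACE+HARVEST"           # new path; harvest is mandatory (A8)
    if r.module_age_months > 6 and r.patch_count > 10:
        return "SIGNAL:refactor_pressure"  # escalate to Evolution Curator (§13.2)
    return "NOOP"

print(decide(DriftReport(False, 5, 3, True, False, 3, 2)))  # → REPLACE+HARVEST
```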
Reflux Engine: replacement’s true value is not the new module, it’s the upward reflux
Module REPLACE
↓
N agents rewrite in parallel ────► candidate 1 / candidate 2 / ... candidate N
↓
├──► best wins (§9.5.2) — pick 1 to ship
│
└──► harvest (§9.4 step 7) — extract candidate diffs / cross-review findings / new patterns
↓
├──► framework upgrade queue: common boundary violations across candidates, missing abstractions
├──► intent revision drafts: under-specifications exposed by candidate divergence
└──► pattern candidate library: novel solutions from individual agents
↓
Evolution Curator (§13.2) review
↓
Approved → framework / intent upgrade → next-gen modules start from a more refined skeleton
Key insight: single-agent replacement = fixing one bug; multi-agent replacement = system evolves once. Framework is the machine, modules are the commodities — when a commodity gets replaced, the machine is also being improved. This is the core mechanism of §4.1’s Reflux Loop.
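The harvest branch of the diagram above amounts to a routing step that groups replacement by-products into the three reflux queues. A minimal sketch — the item kinds and queue names follow the diagram, but the API is illustrative:

```python
# Sketch of the harvest step: route replacement by-products into the three
# reflux queues (framework / intent / pattern) for Evolution Curator review.
# Item kinds and queue names follow the diagram; the API is illustrative.
ROUTES = {
    "boundary_violation": "framework_upgrade_queue",
    "missing_abstraction": "framework_upgrade_queue",
    "intent_underspecification": "intent_revision_drafts",
    "novel_pattern": "pattern_candidate_library",
}

def harvest(findings: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group cross-review findings into reflux queues for Curator review."""
    queues: dict[str, list[str]] = {}
    for kind, detail in findings:
        queues.setdefault(ROUTES[kind], []).append(detail)
    return queues

q = harvest([
    ("intent_underspecification", "payment fault-tolerance strategy unspecified"),
    ("novel_pattern", "retry-budget abstraction from candidate 3"),
])
print(sorted(q))  # → ['intent_revision_drafts', 'pattern_candidate_library']
```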
Anti-patterns (4 within §10.6)
- Replace addiction — every bug becomes a rewrite: surrenders patch’s cost advantage on small changes
- Replace without touching intent — rewrite but don’t update intent.yaml: same requirement produces the same bug
- Replace bypassing review — treating replace as R0 auto-execute: replace cost ≥ patch, must traverse the same R-gradient
- Replace-without-reflux (A8 · global anti-pattern · see §12) — REPLACE done, but candidate diffs / cross-review findings / serendipitous patterns never enter the harvest queue: replacement degenerates to “regenerate the same thing again,” missing the system-evolution opportunity
What this section answers: Module replaceability is the new engineering economics enabled by v7+. But what truly changes engineering economics is not “commodities are replaceable” but “commodities reflux upward to improve the machine when replaced” — this is §4.1’s Reflux Loop’s source of life.
11 · Full Loop · Order Group-Buy Case
L0 → L1 → L2 → L3 are not phases, they are collaboration — at the same time, every layer is working.
11.1 One requirement from declaration to live
┌──────────────┬──────────────┬──────────────┬──────────────┬──────────────┐
│  STEP 1·L2   │  STEP 2·L3   │  STEP 3·L0   │  STEP 4·L3   │  STEP 5·L1   │
│              │              │              │     GATE     │              │
├──────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ human        │ Agent        │ gates+test   │ human PR     │ go live      │
│ writes       │ takes over   │ CI/CD        │ R3·cross     │ enter FSM    │
│ intent       │ reads intent │ contract     │ 5 evidence   │ candidate    │
│ "group-buy"  │ decompose    │ check        │ 1-click      │ → asset      │
│ + p99 200ms  │ write code   │ 3 fail→fix   │ release      │ observe 30d  │
├──────────────┼──────────────┼──────────────┼──────────────┼──────────────┤
│ 3 min        │ 22 min       │ 9 min        │ 5 min        │ continuous   │
│ human · L2   │ AI auto      │ system·auto  │ human·gate   │ system·evolve│
└──────────────┴──────────────┴──────────────┴──────────────┴──────────────┘
TOTAL · 39 MIN HUMAN-BLOCKING + ∞ SYSTEM-AUTO
For comparison · same requirement, traditional flow: 3 weeks scheduling + 4 meetings + 2 reworks
11.2 Which layer each step works in
| Step | Primarily in | Also in |
|---|---|---|
| 1 Write intent | L2 | L0 (intent file passes schema check) |
| 2 Agent takes over | L3 | L1 (reads current signals) + L0 (sandbox execution) |
| 3 Gates + tests | L0 | L1 (health updated) + L3 (drift loop monitoring) |
| 4 Human PR review | L3 gate | L0 (CI data as approval evidence) |
| 5 Go live | L1 FSM | L3 (continuous reconciliation) |
11.3 Key insight: collaboration ≠ phases
The waterfall mental model treats L0–L3 as sequential dependencies — first build L0, then L1. The v7 view is collaboration:
- L0 is 24×7 running foundation, used in every PR
- L1 is 24×7 updating state machine, every module produces new signals every second
- L2 is the declaration layer used every time a developer writes an intent
- L3 is the 24×7 reconciliation agent loop
At the same time, every layer is working — that is the real form of “system self-convergence.”
11.4 Case outcomes
| Dimension | Traditional flow | Three-leaps full loop |
|---|---|---|
| Human-blocking time | 3 weeks + 4 meetings + 2 reworks | 39 minutes |
| Decision record | Scattered in Jira / Slack / email | Intent file + decision audit store |
| Post-launch traceability | From memory | trace_id + 3-D health + state machine |
| Retirement judgment | No one dares to touch | sunset_if auto-triggers |
What this section answers: The full loop is the only test of whether the entire system actually works — completing one real requirement in 39 minutes.
12 · Anti-Patterns and Boundaries
Every leap has its failure modes · knowing the boundaries matters more than knowing the methods.
12.1 Anti-patterns by “stage × consequence”
consequence ↑ ANTI-PATTERNS
CATASTROPHIC ┌─────────────┐
│ A3 Gradient │
│ Breach │
│ R4/R5 to AI │
│ → disaster │
└─────────────┘
SYSTEMIC ┌─────────────┐
│ A2 Intent ≠ │
│ Execution │
│ declared but│
│ unverified │
└─────────────┘
REVERSIBLE ┌─────────────┐
│ A1 Over- │
│ governance │
│ gates > code│
└─────────────┘
L0 L1 L2 L3 →
Gravity State Intent Autonomous
field visible expressible loop
↗ higher layer · greater cost of breach
12.2 Eight anti-patterns (v3 + v7 + Reflux Loop merged)
| # | Anti-pattern | Symptom | Correction |
|---|---|---|---|
| A1 | Over-governance | Experimental modules also run full gates; governance time exceeds 30% of coding time | Strict tiering, prefer leniency over uniformity |
| A2 | Intent ≠ Execution (declared but unverified) | intent.yaml goes unchecked; declaration becomes decoration | Field-bound verifiers (§7.3) |
| A3 | Gradient breach | R4/R5 also delegated to Agent; irreversible disaster | R5 never granted; R4 forced human decision |
| A4 | Signal fill-in | Humans manually fill business signals | Must be mechanically collected |
| A5 | AI suggestion worship | Acceptance rate 100%, rejection 0% | Healthy rejection rate ≥ 10% as floor |
| A6 | State machine rigidity | Modules stuck in one state for half a year | Add “overdue migration” alerts |
| A7 | Approval ritualism | Click approve without checking evidence | Mandate “evidence checked” toggle |
| A8 | Replace-without-reflux | REPLACE done, but candidate diffs / cross-review findings / serendipitous patterns never enter the harvest queue; framework / intent never evolve despite signals | Reconciler’s REPLACE path must include harvest step (§9.4 step 7); Evolution Curator (§13.2) reviews harvest queue weekly; framework / intent monthly evolution rates included in North Star metrics (§14.2) |
12.3 Out of scope
Do not try to use this methodology to solve:
- Wrong business direction — no amount of governance can save “doing the wrong thing the right way”
- Organizational collaboration issues — this method governs artifacts, not meetings, processes, or interpersonal dynamics
- Exploratory / research code — exploration / prototyping / innovation cannot be re-armed
These are human territory.
12.4 When to stop advancing
If any of the following appear, stop at the current layer rather than advancing to the next leap:
- Governance activities > 30% of engineer time for two consecutive quarters
- False alarm / false retirement rate persistently exceeds target
- Team < 10 people and modules < 30 (governance ROI inverts)
What this section answers: Boundary awareness is the precondition for this system’s survival — governance itself must be governed (§14.4).
13 · Role Evolution
Labor flows toward higher value. AI is not here to replace engineers · it is here to liberate them.
13.1 Migration of engineer time allocation
TODAY · alienated labor                TOMORROW · creative labor
75% in low-value activities            100% in high-value activities
┌────────┬──────────────┬────────┐     ┌──────────────┬───────┬──────────┐
│ design │ CRUD · typo  │  mtg   │     │ system design│ gate  │ research │
│  25%   │ deps · 50%   │  25%   │ ──► │ intent 50%   │ audit │ explore  │
│        │              │        │     │              │ 25%   │ 25%      │
└────────┴──────────────┴────────┘     └──────────────┴───────┴──────────┘
 high V · low M (alienation)            high V · high M (creation)
13.2 Five new roles
Every team needs:
| # | Role | Responsibility | From which pillar |
|---|---|---|---|
| 01 | Architect | Design L0 gravity field; rules as capital | L0 congealed capital |
| 02 | Intent Designer | Write intent files; translate business into declarations | L2 Leap ② |
| 03 | Harness Engineer | Build Agent engineering shells; manage heartbeat rhythm | L3 Leap ③ |
| 04 | AI Decision Auditor | Independent of tuning engineer; guard R3-R5 red lines | R-gradient boundary |
| 05 | Evolution Curator | Reviews the Reflux Loop’s (§4.1 / §10.6) harvest queue: selects from candidate diffs / cross-review findings / serendipitous patterns which enter framework upgrades, intent revisions, or the pattern library | Reflux Loop’s gatekeeper |
13.3 Independence of the AI Decision Auditor
Key principle: the AI Decision Auditor must be independent of the AI Agent tuning engineer — to avoid referee = player. This role is mandatory only when AI autonomy enters the R3 cross-domain-write level.
13.3a Evolution Curator — gatekeeper of the Reflux Loop
Why this role: the Reflux Loop (§4.1) makes by-products of module replacement (candidate diffs / cross-review findings / serendipitous patterns) flow continuously into the harvest queue. But not all harvested items belong in the framework — blind inclusion = framework bloat out of control. A role is needed to periodically review the harvest queue and decide:
- Which common findings enter the framework / CI / archtest improvement queue
- Which intent under-specifications enter intent revision drafts
- Which serendipitous patterns are worth distilling into new framework idioms
Difference from Auditor: the Auditor guards boundaries (R3-R5 not breached), the Curator picks evolution direction (where should framework / intent go). The two are complementary, not in conflict.
Cadence: weekly review of harvest queue; monthly review of framework upgrade queue; quarterly review of intent revision drafts.
13.4 Capital view of labor reallocation
Rising AI autonomy = V being replaced by C. Released V must flow to higher-value areas, otherwise it is “replacement” rather than “liberation”:
| Leap | Released V | Destination high-value V |
|---|---|---|
| L0 → L1 | Manual module tagging | Framework architecture / rule design |
| L1 → L2 | Health inspection, triage | Health-model design / intent design |
| L2 → L3 | Experimental module ops | Lifecycle rule design / rollback design |
| L3 → Reflux Loop | Module code maintenance | Evolution curation (selecting harvested patterns into the framework) |
| Vision | Approval labor | Strategy / safety / ethics gating |
What this section answers: Role evolution is the engineering answer to the “AI replaces engineers” fear — not replacement, but migration to higher value.
14 · Measurement Framework
How do we know it’s improving.
14.1 North Star metric
Asset Health Rate = (asset modules with all 3 dimensions ≥ 60) / (total asset modules)
This is the single most worth-tracking metric — it simultaneously reflects all three pillars (value / structure / engineering).
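The North Star formula translates directly into code. A minimal sketch — the module records and the `(value, structure, engineering)` score tuple are illustrative:

```python
# Direct computation of the North Star metric defined above:
#   Asset Health Rate = (asset modules with all 3 dimensions ≥ 60) / (total asset modules)
# Module records are illustrative; "health" holds the 3-D scores
# (value, structure, engineering) as an assumption about their shape.
def asset_health_rate(modules: list[dict]) -> float:
    assets = [m for m in modules if m["state"] == "asset"]
    healthy = [m for m in assets if all(score >= 60 for score in m["health"])]
    return len(healthy) / len(assets) if assets else 0.0

modules = [
    {"state": "asset", "health": (82, 71, 64)},      # all dimensions ≥ 60
    {"state": "asset", "health": (90, 55, 70)},      # one dimension < 60
    {"state": "candidate", "health": (40, 40, 40)},  # not an asset: excluded
]
print(asset_health_rate(modules))  # → 0.5
```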
14.2 Per-leap secondary metrics
| Leap | Secondary metrics |
|---|---|
| L0 | Governance coverage / gate effectiveness / CI green rate |
| L1 | Module manifest coverage / 3-D health trend / state migration rate |
| L2 | Intent file coverage / field-verifier binding rate / drift detection rate |
| L3 | AI suggestion acceptance / false alarm rate / R-gradient violations / auto-retirement rollback rate |
| Reflux Loop | Framework evolution velocity (new abstractions / retired abstractions per quarter) / Intent revision rate (% of mature intents replaced by v2 per quarter) / Reflux hit rate (% of harvest queue items adopted into framework / intent) |
Healthy thresholds for reflux metrics:
- Framework evolution velocity: 1–3 changes per quarter (too low = evolution stalled; too high = framework unstable)
- Intent revision rate: 5%–20% per quarter (too low = intent decoupled from business; too high = intent written too specifically)
- Reflux hit rate: 20%–50% (too low = harvest signal-to-noise low; too high = Curator filtering too loose)
14.3 DORA five metrics
| DORA metric | Governance meaning | High-performance threshold |
|---|---|---|
| Deployment Frequency | Module migration velocity | Multiple per day |
| Lead Time for Changes | Single-module end-to-end engineering efficiency | < 1 day |
| Change Failure Rate | Governance quality | < 5% |
| Failed Deployment Recovery Time | Rollback effectiveness | < 1 hour |
| Rework Rate (added 2024) | Manifest / intent quality | Trending down |
14.4 Quarterly review (governing the governance)
Review the methodology itself every quarter:
- Which signals turned out to be useless?
- Which gates only intercept false positives?
- Which state-migration rules feel rigid to the team?
- Which anti-patterns never appeared and can be removed from the list?
The methodology itself must enter governance — it cannot become an untouchable sacred text.
14.5 Governance is necessary supervisory labor (boundary condition)
By Marx’s criterion: supervisory labor is a “gray zone” — productive only when it directly creates value for capital accumulation.
Governance is productive if and only if:
M_preserved (surplus value preserved by avoiding decay) ≥ V_governance (labor consumed by governance)
Violating this boundary → trigger §12.4 “when to stop advancing.”
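The boundary condition is a one-line check; an organization would estimate both sides per quarter. A sketch with illustrative person-day figures:

```python
# The §14.5 productivity boundary as a check. The person-day estimates
# below are illustrative assumptions an org would produce each quarter.
def governance_productive(m_preserved: float, v_governance: float) -> bool:
    """Governance pays for itself only if preserved surplus ≥ its labor cost.
    A False result is the signal to apply §12.4 'when to stop advancing'."""
    return m_preserved >= v_governance

# e.g. decay avoided worth 120 person-days vs 80 person-days spent governing
print(governance_productive(120, 80))  # → True
```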
What this section answers: The measurement framework gives the entire scheme a measurable, criticizable, refutable interface to the world.
15 · Future Shape (2030+)
Software is no longer “maintained” · it continuously evolves. Engineers no longer “write code” · they steward a system that grows itself.
15.1 Four orbits of an autonomous system
SELF-EVOLVING
autonomous system
Code as Living Infrastructure
●
╱│╲
┌─HUMAN ╱ │ ╲ AI ─────┐
│paint │ self- │
│intent │ converge│
└──────── │ ────────┘
│
┌─SYSTEM │ VALUE──┐
│evolve │ compound│
└──────────●──────────┘
Concrete mechanism of self-evolving = Reflux Loop (§4.1 / §10.6): framework and intent are not designed once and then frozen — they continuously absorb feedback through the harvest step during module replacement, filtered by the Evolution Curator (§13.2) before being upgraded. Compounding comes from this loop: each replacement raises the starting point of the next.
15.2 No timeline commitment
Entry conditions for the vision phase (full autonomy) are strict:
- Continuously running ≥ 6 months without major incident
- Industry has mature AI-decision explainability / auditability technology
- Organization has established the independent AI Decision Auditor role
If any condition is unmet, do not enter. This is not a roadmap clause — it is a defense against premature closure.
15.3 Permanent red line
No matter how high autonomy reaches, R5 (financial / physical / irreversible user impact) never enters automation. This is a system boundary, not a phase issue.
What this section answers: The future shape paints a direction, not a commitment — direction matters more than a timeline.
16 · Value Finale · Enterprise / Individual / AI
What this is all for.
Three subjects · three “work” philosophies:
16.1 Enterprise · COMPOUND · ACCELERATE
Asset compound interest · anti-entropy
- Asset health rate replaces lines-of-code as the new North Star
- Constant delivery speed · doesn’t decay with codebase age
- Failure rate ↓ 50% · onboarding ↓ 60%
- Self-feedback · domain experts directly optimize the system
- Framework is not designed, it is evolved — the Reflux Loop (§4.1) lets framework and intent accumulate wisdom with every module replacement; compounding comes from this loop
16.2 Individual · LIBERATE · CREATE
work everywhere · every status
- work everywhere · workspace travels with you
- work every status · productive in any state
- time to dive into business · truly understand users
- workers in every industry can optimize their own systems · feeding back to enterprise and country
16.3 AI · AMPLIFY · NEVER REPLACE
work everytime
- 24×7 without rest
- inside the sandbox for reversible experiments
- gate and execute rather than decide
- amplify humans rather than replace them
16.4 Final proposition
AI is not here to replace engineers · AI is here to liberate engineers.
Letting AI accelerate output without losing the organization’s grip on its assets is the fundamental software-engineering question of our era.
This methodology’s answer: humans live in the desired-state-definition loop; AI lives in the continuous-convergence execution loop.
Appendix A · Capital · C/V/M Mapping
An independent “value-flow” lens to judge whether governance actions are productive.
A.1 Precise mapping in AI software engineering
| Marx concept | Traditional software engineering | AI-era evolution | Governance meaning |
|---|---|---|---|
| C constant capital (congealed labor) | Framework / infrastructure / contract skeleton / codebase | + Pre-trained models / vector stores / eval sets | The more refined C, the higher module output precision |
| V variable capital (labor power) | Engineer hours | Engineer + AI tool labor combination | AI is the V multiplier (same V → N× output) |
| M surplus value (asset deposit) | Paying users + trustworthy code asset | + Eval feedback data | M eroded by three decays |
| W total value | C + V + M | C + V + M | The equation unchanged, the structure shifts |
A.2 Key proposition: AI is not a new source of value
The core proposition of the labor theory of value: only living labor (V) creates new value. AI is congealed past labor (belongs to C); it transfers its own value into the product but does not create surplus value.
AI is a machine tool, not a worker — treating it as “a new programmer” mistakes congealed labor (C) for living labor (V). Two symmetric corollaries:
- Overestimating AI autonomy = mistaking C for V → the organization loses its conscious source of value creation
- Underestimating AI as a multiplier = under-using means of production → the organization is outpaced in throughput
A.3 L0 = the most highly congealed C
See §4.1 / §4.7. The framework is the congealed crystallization of an organization’s highest architectural wisdom, infinitely amplified by replication across modules.
A.4 Three decays = M devoured by entropy
| Decay | Capital interpretation |
|---|---|
| Value decay | Already-deposited M loses market recognition due to business shifts |
| Architecture decay | Already-deposited M needs rewriting due to architectural rot |
| Knowledge decay | Already-deposited M needs re-understanding due to staff churn |
Governance is the engineered mechanism against entropy — turning M from “perishable” to “long-term.”
Appendix B · V-Model in the AI Era
Land the abstract “quality emergence” onto concrete V-model form.
B.1 The transformed shape
Left side (AI accelerated) Right side (re-armed verification)
Intent definition ───────────► End-to-end acceptance
↓ ↑
Architecture design ─────────► Integration test + contract
↓ ↑
Detailed design ───────────► Unit test + formal verification
↓ ↑
┌──────────────┐ │
│ AI coding │ ─────────────► Static analysis + fuzzing
│ (seconds) │ │
└──────────────┘ │
▲ │
【accelerated 100×】 【verification re-armed 3×】
B.2 Transformation logic
The traditional V-model assumed left-right manpower symmetry — one developer paired with one tester. In the AI era the left side is accelerated 100×; if the right side is not re-armed, the failure rate inevitably rises 3× (Sonar data).
The V-model’s essence rewrite in the AI era: left-side coding value → right-side verification value. Verification is not after-the-fact patching — it is the core channel of V flowing into M.
B.3 SWEBOK v4 newly added KAs in the three leaps
| New KA | Position in three leaps |
|---|---|
| Software Architecture | Carrier of L0 framework, ceiling of AI output precision |
| Software Operations (DevOps) | L0 fifth pillar (DevOps stack) |
| Software Security | L0 third pillar (sec scan in CI gates) + R-gradient safety constraints |
Appendix C · CALMS · DORA · Kanban (Flow Execution Rhythm)
C.1 CALMS redefined in the AI era
| Pillar | Traditional DevOps | AI era |
|---|---|---|
| Culture | Dev/ops collaboration | + Human/AI collaboration (trust calibration) |
| Automation | Deployment automation | + AI decision automation (constrained by Harness) |
| Lean | Eliminate waste | Focus on culling “AI-produced low-value code” |
| Measurement | DORA | DORA + 3-D health |
| Sharing | Knowledge accumulation | + AI decision audit log |
C.2 Little’s Law breaks the AI throughput bottleneck
Core problem: AI produces 50 functions per day, team reviews 10 → backlog → escape rate skyrockets.
Little’s Law formula:
Average cycle time = WIP / throughput
Application: to keep the review cycle under 1 day when the team’s daily review capacity is 5 items, WIP must be ≤ 5.
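Little’s Law rearranges into a WIP cap. A minimal sketch of that rearrangement (the function name is ours):

```python
# Little's Law from C.2: average cycle time = WIP / throughput.
# Rearranged, the WIP cap is: WIP ≤ target_cycle_time × throughput.
def max_wip(target_cycle_days: float, reviews_per_day: float) -> int:
    """Largest WIP that still keeps average review cycle within target."""
    return int(target_cycle_days * reviews_per_day)

print(max_wip(1.0, 5))   # → 5  (matches the example above)
print(max_wip(1.0, 10))  # → 10 (double the review capacity, double the cap)
```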
C.3 Tiered fast lanes (pull system)
Schedule review WIP by R-gradient:
| Tier | Risk | WIP | Decision mode |
|---|---|---|---|
| L1 low risk (90% auto-test pass) | R0–R1 | 20 | AI-driven, human spot-check 10% |
| L2 medium risk (new features) | R2 | 10 | AI drafts + human review |
| L3 high risk (security/concurrency/external) | R3 | 3 | Expert + formal verification |
Weighted WIP: a high-risk card consumes 2× review capacity, so the pull system throttles it automatically.
C.4 Six Kanban practices (David J. Anderson)
- Visualize: dual-loop kanban (intent loop + module loop)
- Limit WIP: WIP cap (C.2 core)
- Manage Flow: flow rate monitoring (linked to DORA Lead Time)
- Make Policies Explicit: differentiated gate policies in writing
- Implement Feedback Loops: reconciliation loop (§9)
- Improve Collaboratively: quarterly review (§14.4)
Appendix D · MDM Declarative Paradigm Leap
D.1 Imperative → Declarative
| Paradigm | Form | Cost |
|---|---|---|
| Imperative | Humans decide when to migrate | Humans become the bottleneck |
| Declarative | Manifests declare desired state, system continuously reconciles | Designed once, runs forever |
This is the deepest paradigm leap from v3 to v7 — governance shifts from “humans inside the decision loop” to “humans inside the desired-state-definition loop.”
D.2 Apple DDM 2024+ three elements
| Element | Definition | Implementation in three leaps |
|---|---|---|
| Declaration | Desired state manifest defined by manager | L2 intent file (§7.1 four-block structure) |
| Reconciliation | Devices pull manifests and auto-converge | L3 reconciliation loop (§9.4) |
| Convergence | Continuously enforced: maintain desired state even offline | Three-dimensional health continuous write-back + soft-delete window |
D.3 Harness + DDM unified
┌──────── Desired state (intent file) ────────┐
│ Defined by humans: intent, contracts, │
│ constraints, health thresholds │
└────────────────────┬─────────────────────────┘
▼
┌──────────────────────────────────────────────┐
│ Reconciliation Loop (continuous) │
│ │
│ ┌──────────┐ ┌──────────────┐ │
│ │ Signal │ ◄─────►│ Harness Five │ │
│ │ collect │ │ Pack │ │
│ │ (auto) │ │ │ │
│ └──────────┘ └──────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────────────────────────┐ │
│ │ Apply remediation │ │
│ │ (delegated by R0–R5) │ │
│ └─────────────────────────────────┘ │
└──────────────────────────────────────────────┘
│
▼
Reality converges to desired state
Core proposition: Harness + DDM = environment-level feedback control system. The two are isomorphic; both shift from “imperative one-shot intervention” to “declarative continuous convergence.”
Related documents
- Companion deck: deck/en/index.html
- Bootstrap handbook: three-leaps-bootstrap.en.md
- Chinese version: three-leaps.md