Claude Council

A Decision Engine That Disagrees With Itself

AI Orchestration

Case Study

AI, Multi-Agent, Orchestration, Decision Systems

Summary

A decision engine that pressure-tests a high-stakes question through thirteen roles instead of answering it once. A single model asked a hard question gives one perspective with one set of blind spots, and asking again returns much the same answer with the same gaps. The council is built to make one model disagree with itself.

Five advisor agents analyze the question in parallel through fixed lenses, from failure analysis to execution. A quality gate screens their output, the responses are anonymized to letters and written to disk, and five peer reviewers stress-test the anonymized set, including one that attacks the strongest answer and one that defends the weakest. A chairman synthesizes a verdict, and a second gate audits the synthesis before release.

The result is a defensible recommendation with a paper trail: where the analysis converges, where it clashes, what every angle missed, and the single next action, delivered as a self-contained report alongside a full transcript. Thirteen roles, two audit gates, one verdict you can argue with.

Hiro-Inagawa/claude-council

Overview

A multi-agent decision engine that runs a high-stakes question through thirteen roles before returning a verdict
Five advisor agents analyze in parallel through fixed lenses: failure analysis, first principles, maximum upside, fresh eyes, and execution
Five peer reviewers then stress-test the anonymized responses, including one that attacks the strongest answer and one that defends the weakest
Two quality gates screen advisor output and audit the final synthesis before anything is released
Advisor responses are anonymized to letters and written to disk before review, so the critique judges arguments rather than their source
Output is a self-contained HTML report and a full markdown transcript, not a single opinion

Role

Designer and builder of the full council architecture
Wrote the thirteen agent definitions and their output contracts
Designed the two-gate quality system and the anonymization-to-disk protocol
Built the parallel dispatch and file-based state on Claude Code subagent orchestration
Designed the chairman synthesis and the fixed verdict format

00. Table of Contents

01. The Problem

A single model gives one answer with the blind spots that come with it.

02. The Design

How to make a single model disagree with itself.

03. The Five Advisors

Five fixed lenses analyzing the same question in parallel.

04. Anonymization

Why the responses are stripped to letters and written to disk.

05. The Five Reviewers

Stress-testing the strongest and weakest answers on purpose.

06. The Two Gates

Where the pipeline is built to stop itself.

07. The Verdict

What the council returns, and why it can be trusted.

01. The Problem

A single model asked a hard question returns one answer shaped by one set of assumptions. Ask it again and it produces a close variant of the same answer, carrying the same blind spots, because the second pass reasons from the same priors as the first. For a low-stakes question that is fine. For a decision where being wrong is expensive, a single confident answer is a risky output, because its confidence says little about whether it is right.

What is missing is disagreement. A real advisory board is useful precisely because its members see the same problem differently, and one person’s blind spot is another’s focus. The hard part is reproducing that with one model, which left alone collapses toward a single consensus voice no matter how many times it is asked.

Asking the same model twice gives you the same blind spot twice.

02. The Design

The council makes one model disagree with itself by assigning each pass a fixed lens it cannot abandon, then keeping the passes from seeing each other until each has committed to a position. The shape is set. An optional research pass feeds five advisors, a quality gate screens them, five peer reviewers stress-test the result, a chairman synthesizes a verdict, a second gate audits it, and two artifacts are written.

The cost is deliberate. A full run is thirteen to fourteen agent calls, which is why the council is reserved for decisions where being wrong is expensive rather than used as a default. It refuses the cases it is not built for, including factual lookups, creation tasks, and casual questions with no real stakes.
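The call count can be tallied from the stage order described above. This is a minimal sketch, not the council's actual code: the stage names are illustrative, and it assumes each gate and the chairman cost one agent call apiece and that no synthesis retry occurs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str          # illustrative label, not taken from the repository
    agent_calls: int   # assumed number of agent calls for this stage
    optional: bool = False


# Stage order follows the text; per-stage counts are assumptions.
PIPELINE = [
    Stage("research", 1, optional=True),
    Stage("advisors", 5),
    Stage("advisor_gate", 1),
    Stage("peer_review", 5),
    Stage("chairman", 1),
    Stage("synthesis_gate", 1),
]


def agent_calls(include_research: bool) -> int:
    """Total agent calls for one run, with or without the research pass."""
    return sum(
        s.agent_calls for s in PIPELINE if include_research or not s.optional
    )


calls_without_research = agent_calls(False)
calls_with_research = agent_calls(True)
```

Under these assumptions the totals come out to thirteen without the research pass and fourteen with it, which matches the range quoted above.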

03. The Five Advisors

Five advisor agents analyze the framed question in parallel, each locked to one lens so the five reads cannot converge prematurely.

Advisor	What it does
Failure Analysis	Finds the specific flaw that breaks the decision under real conditions.
First Principles	Strips the question back to what it is actually asking.
Maximum Upside	Surfaces the upside and adjacent opportunities nobody is naming.
Fresh Eyes	Approaches with zero prior context, catching what familiarity hides.
Execution	Ignores theory and asks whether this can be done and what the first step is.

Each advisor returns a structured response with its lens, its primary read, its evidence, and a confidence level, so the synthesis later has something specific to weigh rather than five essays to average.
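One way to picture the output contract and the parallel dispatch is sketched below. It is an assumption-laden illustration: the field names, the confidence scale, and `run_advisor` (a stand-in for a real subagent call) are hypothetical, and `asyncio` is used here only to show the fan-out shape, not because the council is known to use it.

```python
import asyncio
from dataclasses import dataclass, field

# The five lenses named in the table above.
LENSES = [
    "failure_analysis",
    "first_principles",
    "maximum_upside",
    "fresh_eyes",
    "execution",
]


@dataclass
class AdvisorResponse:
    lens: str
    primary_read: str
    evidence: list[str] = field(default_factory=list)
    confidence: str = "medium"  # assumed scale: low / medium / high


async def run_advisor(lens: str, question: str) -> AdvisorResponse:
    # Hypothetical stand-in for a subagent call locked to one lens.
    await asyncio.sleep(0)
    return AdvisorResponse(lens=lens, primary_read=f"[{lens}] {question}")


async def run_advisors(question: str) -> list[AdvisorResponse]:
    # All five advisors start together; none sees another's output.
    return await asyncio.gather(*(run_advisor(l, question) for l in LENSES))


responses = asyncio.run(run_advisors("Should we enter the new market?"))
```

Because each response carries the same four fields, a later stage can compare reads and confidence levels directly instead of parsing free-form essays.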

The advisors do not improvise. Each pulls the one or two frameworks most relevant to its lens from a curated library of twenty-five works before it reasons. The library spans:

Decision science and forecasting: Kahneman, Tetlock, Annie Duke, calibration and pre-mortem methods
Risk and the unknown: Taleb on fat tails and unknown unknowns
Strategy and positioning: Rumelt, Thiel, Christensen, Playing to Win, Obviously Awesome
Systems and causality: Meadows on leverage points, Pearl on cause and effect
Multidisciplinary mental models: Munger’s latticework
Power and human behavior: Machiavelli, Le Bon, The Elephant in the Brain, signaling and self-deception
Execution under pressure: Goldratt, Horowitz, the Stoics

The orchestration makes the agents argue. This library is what they argue from.

04. Anonymization

Before any review happens, each advisor is randomly mapped to a letter from A to E, and the mapping plus the full responses are written to disk immediately. Writing the mapping down before review is a safeguard against a long session losing or confusing which response came from which lens.

The lens label is then stripped from each response before the reviewers see it. If a reviewer could see that response C came from the failure-analysis pass, the label would tell it what to conclude before it read a word. Removing it forces the reviewers to judge the argument on its content, not its origin.
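The protocol described here can be sketched in a few lines. The file names (`mapping.json`, `responses.json`) and the function name are hypothetical, and the sketch assumes responses arrive keyed by lens. It does not scrub lens names that appear inside a response body.

```python
import json
import random
import tempfile
from pathlib import Path


def anonymize(
    responses: dict[str, str], out_dir: Path, rng: random.Random
) -> dict[str, str]:
    """Map each lens to a random letter, persist the mapping, return blind text."""
    lenses = list(responses)
    letters = [chr(ord("A") + i) for i in range(len(lenses))]
    rng.shuffle(lenses)
    mapping = dict(zip(letters, lenses))  # letter -> lens

    # Write the mapping and the full responses to disk before any review.
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "mapping.json").write_text(json.dumps(mapping, indent=2))
    (out_dir / "responses.json").write_text(json.dumps(responses, indent=2))

    # Reviewers receive only letter-keyed text, with no lens label attached.
    return {letter: responses[lens] for letter, lens in mapping.items()}


advisor_text = {
    "failure_analysis": "text 1",
    "first_principles": "text 2",
    "maximum_upside": "text 3",
    "fresh_eyes": "text 4",
    "execution": "text 5",
}
with tempfile.TemporaryDirectory() as tmp:
    blind = anonymize(advisor_text, Path(tmp), random.Random())
    saved = json.loads((Path(tmp) / "mapping.json").read_text())
```

The saved mapping is the only route from a letter back to its lens, so the transcript can restore attribution after review without the reviewers ever having seen it.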

Strip the label, and a reviewer has to judge the argument rather than its source.

05. The Five Reviewers

Multi-agent review has an obvious failure mode. Point one model at another model’s output and ask whether it is correct, and it tends to affirm what it reads, because nothing in the setup pushes against it. The council’s review is built to push the other way.

Let one model check another’s work and you have not automated verification, you have automated agreement.

Five peer reviewers then work over the anonymized set, each with its own job, so the critique is as structured as the analysis it examines.

Reviewer	What it does
Convergence	Finds the agreement across responses that is more than surface overlap.
Gap Finder	Identifies what every response missed.
Skeptic	Argues against the strongest response.
Devil’s Advocate	Defends the weakest or most unpopular response.
Integrator	Finds where separate responses combine into something better.

The skeptic and the devil’s advocate are the load-bearing pair. One attacks the answer most likely to be accepted on reflex, and the other rescues the answer most likely to be dismissed, so neither the popular choice nor the unpopular one escapes a fair test.
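As a rough illustration of how each reviewer might receive the same anonymized set with a different job, the sketch below builds one prompt per reviewer. The brief wording is adapted from the table above; the prompt layout and the function name are assumptions, not the council's actual agent definitions.

```python
# Reviewer briefs, adapted from the table above.
REVIEWER_BRIEFS = {
    "convergence": "Find the agreement across responses that is more than surface overlap.",
    "gap_finder": "Identify what every response missed.",
    "skeptic": "Argue against the strongest response.",
    "devils_advocate": "Defend the weakest or most unpopular response.",
    "integrator": "Find where separate responses combine into something better.",
}


def build_review_prompt(brief: str, blind: dict[str, str]) -> str:
    """Combine one reviewer's brief with the letter-keyed responses."""
    body = "\n\n".join(
        f"Response {letter}:\n{text}" for letter, text in sorted(blind.items())
    )
    return f"Your job: {brief}\n\n{body}"


# Placeholder anonymized set; real text would come from the advisors.
blind_set = {letter: f"(text of response {letter})" for letter in "ABCDE"}
review_prompts = {
    name: build_review_prompt(brief, blind_set)
    for name, brief in REVIEWER_BRIEFS.items()
}
```

Every reviewer reads identical material, so any difference between the five reviews comes from the brief rather than from what each one was shown.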

06. The Two Gates

Two audit points let the pipeline stop itself rather than carry a weak result all the way to the end.

The first gate runs after the advisors and before review. It checks each response for whether it committed to its lens and said something specific, and a failure surfaces to the user before the council spends five more agent calls reviewing thin material.

The second gate runs after synthesis and before the report is written. It independently checks that the verdict represents the inputs rather than quietly favoring one pass, and a failure sends the synthesis back once with specific corrections.
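The control flow of the two gates might look like the sketch below. Everything named here is hypothetical. In particular, the word-count check is only a placeholder for the agent judgment the first gate actually makes, and the text does not say what happens if the revised synthesis fails again, so this sketch simply returns the revised draft.

```python
from typing import Callable, Optional


class GateFailure(Exception):
    """Raised so a gate failure reaches the user instead of passing silently."""


def first_gate(responses: dict[str, str], min_words: int = 40) -> None:
    # Placeholder heuristic: flag responses too short to say anything specific.
    thin = [k for k, text in responses.items() if len(text.split()) < min_words]
    if thin:
        raise GateFailure(f"thin advisor output: {thin}")


def synthesize_with_audit(
    synthesize: Callable[[Optional[list[str]]], str],
    audit: Callable[[str], list[str]],
) -> str:
    draft = synthesize(None)
    corrections = audit(draft)
    if corrections:
        # Sent back exactly once, with the specific corrections attached.
        draft = synthesize(corrections)
    return draft


# Stub chairman and auditor to exercise the retry path.
calls: list = []


def fake_synthesize(corrections):
    calls.append(corrections)
    return "v2" if corrections else "v1"


def fake_audit(draft):
    return ["overweights one response"] if draft == "v1" else []


final = synthesize_with_audit(fake_synthesize, fake_audit)
```

Capping the retry at one pass keeps the cost of a failed audit bounded at a single extra synthesis call.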

Both gates are cheap relative to what they protect. Catching an off-angle advisor early, or a synthesis that overweights one voice, saves the whole run from producing a confident verdict built on a weak foundation.

07. The Verdict

The chairman synthesizes the verdict starting from the strongest disagreement rather than the easy consensus, on the principle that the place the analysis clashes is where the real decision lives. The output is fixed and identical in shape every run, which is what makes it usable under pressure.

Where the passes agree: the convergence that is more than surface overlap
Where they clash: the real tradeoff the decision turns on
Blind spots: what every angle missed
Recommendation: the defensible call
One thing first: the single next action
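A fixed verdict shape is straightforward to express as a data structure. The sketch below is one possible rendering, with hypothetical field names and an assumed markdown heading layout. It is not the council's actual report template.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    where_they_agree: str
    where_they_clash: str
    blind_spots: str
    recommendation: str
    one_thing_first: str


def to_markdown(v: Verdict) -> str:
    """Render the five sections in the same order on every run."""
    sections = [
        ("Where the passes agree", v.where_they_agree),
        ("Where they clash", v.where_they_clash),
        ("Blind spots", v.blind_spots),
        ("Recommendation", v.recommendation),
        ("One thing first", v.one_thing_first),
    ]
    return "\n\n".join(f"## {title}\n\n{body}" for title, body in sections)


# Placeholder content, for illustration only.
example = Verdict(
    where_they_agree="Placeholder convergence.",
    where_they_clash="Placeholder tradeoff.",
    blind_spots="Placeholder gap.",
    recommendation="Placeholder call.",
    one_thing_first="Placeholder next action.",
)
verdict_md = to_markdown(example)
```

Making all five fields required means a synthesis that skips a section fails at construction time rather than producing a report with a missing heading.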

Every run produces two artifacts: a self-contained HTML report for reading, and a full markdown transcript that preserves the anonymization mapping, all five analyses, all five reviews, the synthesis, and the audit result. The recommendation is not a black box. The entire reasoning path that produced it stays on the record, which is the difference between a verdict you can interrogate and an answer you have to take on faith.

A single model gives you an answer. The council gives you a recommendation you can interrogate.