Self-Correcting AI Platform
Verified Legal Content at Scale

Overview
- 447 articles across 6 state websites, served from a single Laravel monorepo
- 23 specialized AI agents across four pipelines (build, maintain, deep audit, deployment verification)
- Triple audit verification where auditing agents operate in isolated contexts and independently verify claims against primary legal sources
- Self-improving system that encodes each cycle's errors and corrections back into the pipeline
- Optimized maintenance path reduced token usage from 1.4 million to 265,000 per state while preserving all verification checks
- Production time of approximately 4 hours per state, versus weeks of manual work
- Config-driven architecture requiring zero code changes to add a new state
- Post-deployment integrity verification catching silent cross-state data contamination
Role
- Sole architect, designer, and developer
- Designed the multi-agent pipeline architecture and all 23 agent prompts across 4 skills
- Built the Laravel 12 + React 19 monorepo with config-driven multi-tenancy
- Deployed and operate 6 production sites on a single VPS
- Designed the verification architecture (context isolation, primary source enforcement, triple audit)
- Built the maintenance pipeline that produces database migrations for live content updates
- Built the deep audit system that extracted 5,617 claims and verified 2,064 against primary sources
- Built the post-deployment integrity audit system after discovering a silent contamination incident
- Designed the cost optimization layer that reduced routine maintenance by 80%
00. Table of Contents
01. Context · The accuracy problem with AI-generated content in high-stakes domains.
02. The Domain · Why firearms law across six states is the hardest possible test case.
03. The Architecture · One codebase, six deployments, zero code changes per state.
04. The Build Pipeline · 8 agents, 7 stages, from legal research to deployable database seeders.
05. The Verification Architecture · Why having one AI check another's work does not actually work, and what does.
06. The Maintenance Pipeline · 11 agents that audit live content and produce database migrations.
07. The Deep Audit · 5,617 claims extracted, 2,064 verified, 18 CRITICAL errors found and fixed in a single day.
08. The Contamination Incident · A silent failure that revealed a gap in the deployment model.
09. The Evolution · From 10 minutes per state to 8 hours to 4 hours.
10. Cost Optimization · Reducing routine maintenance from 1.4 million tokens to 265,000 per state without reducing quality.
11. Outcome · Production numbers, what the system catches, and where it goes from here.
01. Context
When AI generates content at scale, the default pipeline is straightforward: prompt the model, generate the output, publish it. For blog posts or marketing copy, this works well enough. Errors are low consequence. A wrong adjective does not send someone to prison.
Legal content is different. A wrong penalty range, a misquoted statute section number, or an oversimplified legal standard has real consequences for real people making real decisions. Someone reading a firearms law reference site to understand what they can legally do is relying on that content to be accurate. If the site says the penalty for unlicensed possession is a fine when the statute says it is a felony with mandatory imprisonment, the failure is not editorial. It is structural.
The question I started with was not "can AI write legal content" but "can AI produce legal content that is verifiable, maintainable, and correct at a level where I would trust it enough to put my name on it."
The answer turned out to be: not with the standard pipeline. The standard pipeline produces content that looks correct. Looking correct and being correct are not the same thing.
02. The Domain
Firearms law was not chosen at random. I chose it because it is one of the hardest domains to get right, which makes it the best possible stress test for a verification architecture.
The six states in the platform are Massachusetts, Rhode Island, California, Connecticut, New York, and Illinois. These are the most legally complex and restrictive firearms jurisdictions in the country. Each one structures its laws differently:
Massachusetts
Chapters 140 and 269 of the Massachusetts General Laws. Comprehensive reform via Chapter 135 of the Acts of 2024 rewrote significant portions of the regulatory framework.
California
Provisions distributed across the Penal Code. Certified handgun roster maintained by DOJ. More firearms legislation per session than most states produce in a decade.
Illinois
FOID card system unique to Illinois. Concealed carry framework with its own training and qualification requirements.
New York
Reshaped by NYSRPA v. Bruen, which invalidated the "proper cause" requirement and triggered a wave of responsive legislation.
Connecticut
Own statutory structure, licensing system, and regulatory agencies distinct from neighboring states.
Rhode Island
Own statutory structure, licensing system, and regulatory agencies distinct from neighboring states.
The point is that these six states do not share a common legal architecture. A system that can accurately research, write, verify, and maintain content across all six, where each state's laws are structured differently, use different statutory schemes, and reference different regulatory bodies, is a system that works. If the architecture holds here, it holds in any regulated domain.
03. The Architecture
The platform is a Laravel 12 monorepo with a React 19 frontend, served via Inertia.js. All six state websites run from the same codebase. State-specific differences are handled entirely through configuration.
Each deployment has its own .env file that sets the state code, state name, domain, brand colors, email addresses, and database connection. A central config/state.php file reads these values and makes them available throughout the application via config('state.*'). The frontend receives state configuration through Inertia's shared props, accessible in any React component via a useStateConfig() hook.
Adding a new state requires no code changes. The steps are: create a .env file with the state's values, create a MariaDB database, generate the content (via the build pipeline), run migrations and seeders, set up Nginx and SSL, and deploy. The application code is identical across all six sites.
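As a rough stand-in for this pattern (not the platform's actual Laravel code), here is a minimal Python sketch of env-driven state configuration. The key names and accessor are hypothetical; the point is that two deployments differ only in environment, never in code, which is what makes adding a state a zero-code-change operation.

```python
import os

# Illustrative sketch: each deployment's .env supplies the state-specific
# values, and application code reads them through a single accessor,
# analogous to config('state.*'). Key names here are hypothetical.
STATE_KEYS = ("STATE_CODE", "STATE_NAME", "APP_DOMAIN", "BRAND_COLOR")

def state_config(env=None):
    """Build the per-site config dict purely from environment values."""
    env = os.environ if env is None else env
    return {key.lower(): env.get(key, "") for key in STATE_KEYS}

# Two "deployments" differ only in their environment, never in code:
ma = state_config({"STATE_CODE": "MA", "STATE_NAME": "Massachusetts"})
il = state_config({"STATE_CODE": "IL", "STATE_NAME": "Illinois"})
```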
On the VPS, each site has its own directory, its own database, its own Nginx configuration, its own SSL certificate, its own Supervisor queue worker, and its own cron entry. They all pull from the same Git repository. A code change pushed to GitHub is deployed to all six sites by pulling the latest commit in each directory and running the standard Laravel deployment commands.
The decision to keep six separate deployments rather than building a multi-tenant admin panel was deliberate. Multi-tenancy would have required adding a state_id column to every database table, scoping every query, and building a state-switching UI. The complexity was not justified. Separate deployments with a shared codebase gives the same code reuse benefit without the query scoping overhead, and each site's admin panel operates independently, meaning a problem on one site does not affect the others.
04. The Build Pipeline
The build pipeline (/build-state) takes a two-letter state code and produces a complete set of firearms law articles: researched, written, audited, and assembled into Laravel database seeders ready for deployment.
It uses 8 specialized AI agents across 7 stages:
Initialization
Validates state code, derives all file paths, creates working directories, checks for existing progress. Enables --resume flag to continue from last completed stage.
Research
Identifies primary statutes, searches for recent legislation and court decisions, finds primary source URLs on official government websites, maps the legal landscape across 11 topic areas.
Planning · Human checkpoint
Produces categories, tags, 35 to 80 planned articles, FAQ entries, and glossary terms. Reviewed before any content is written. Items can be added, removed, or modified.
Writing
Processes one seeder group at a time. Every article written in structured content doc format with full HTML body, inline citations, and complete metadata. Content doc is human-reviewable, machine-parseable, diffable in Git, and persistent on disk across long sessions.
Triple Audit · Parallel
Three independent auditors with isolated contexts run simultaneously. Each re-reads primary legal sources independently. No shared context with the writer or each other.
Assembly
Converts audited content docs into PHP seeder files conforming to existing Laravel seeder patterns.
Verification and Commit
PHP lint, TypeScript checks, build verification, Git commit, push, and deployment instructions.
05. The Verification Architecture
The obvious approach to verifying AI-generated content is to have another AI check it. Most teams building AI content systems do exactly this: the writer generates content, a reviewer reads the output and confirms it looks reasonable. The problem is that this does not actually verify anything. It automates agreement.
When a verifier reads the writer's output in the same context, it is already primed by the writer's framing. If the writer says "the penalty for unlicensed carry is a Class A misdemeanor punishable by up to one year imprisonment," the verifier reads that claim, finds it plausible, and confirms it. What the verifier does not do is open the actual statute and read the penalty section independently. It trusts the writer's paraphrase.
This is the fundamental design problem the platform solves through three architectural decisions.
05.01. Context isolation
The auditing agents do not inherit the writing agents' output as context. They receive the content to audit and the tools to verify it, but they do not see the writer's reasoning, the research brief the writer used, or the intermediate decisions the writer made. They operate with a fresh context window. This means the legal auditor cannot be influenced by the writer's framing. It must form its own understanding of what the statute says.
05.02. Primary source enforcement
Every agent in the chain is instructed to read the actual legal text, not news articles, not secondary summaries, not the previous agent's paraphrase. The legal scanner discovers changes via news, then verifies by reading the statute. The writer reads the statute to write the article. The legal auditor reads the statute again independently to verify the article. Three separate reads of the same primary source by three agents that do not share context. If a news article oversimplified the law and the writer inherited that simplification, the auditor catches it because the auditor is reading the statute, not the news article.
05.03. Multi-dimensional parallel auditing
Legal accuracy, citation integrity, and SEO compliance are three different quality dimensions. A single agent checking all three would experience attention degradation across dimensions. The platform runs three separate auditors simultaneously, each focused on one dimension, each with its own prompt, tools, and output format.
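A minimal sketch of this parallel, context-isolated fan-out, assuming each auditor is a function that receives only the finished content and nothing else. The auditor names and return shapes are hypothetical stand-ins for the legal, citation, and SEO agents.

```python
from concurrent.futures import ThreadPoolExecutor

def triple_audit(content, auditors):
    """Run the auditors concurrently. Each receives only the finished
    content, never the writer's reasoning or another auditor's output
    (context isolation)."""
    with ThreadPoolExecutor(max_workers=len(auditors)) as pool:
        futures = {name: pool.submit(fn, content) for name, fn in auditors.items()}
        return {name: f.result() for name, f in futures.items()}

# Hypothetical auditors standing in for the real agents:
results = triple_audit("article html", {
    "legal": lambda c: {"ok": True},
    "citation": lambda c: {"ok": True},
    "seo": lambda c: {"ok": len(c) < 10_000},
})
```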
During the Illinois maintenance run, this verification layer caught a wrong live-fire scoring requirement (the writer said "7 out of 10 per distance" when the statute specifies "21 out of 30 total"), an incorrect oral argument date ("March 10-11" when it was "March 10"), and overlong titles on two new articles. All four issues were auto-corrected before the migrations reached the live database.
06. The Maintenance Pipeline
Building content is half the problem. Laws change. Links break. Content goes stale. A legal reference site that is accurate on launch day and wrong six months later is worse than no site at all, because users trust it based on its initial accuracy.
The maintenance pipeline (/maintain) uses 11 specialized agents across 7 stages. It audits an existing state's live content and produces Laravel database migrations to update the live database.
The key architectural decision is that the maintenance pipeline produces migrations, not seeders. Seeders create records from scratch and are used for new databases. Migrations modify existing records and are used for live databases. This distinction matters because migrations preserve article IDs, revision history, and foreign key relationships. They track which changes have been applied so the same fix is never run twice. And every article update creates a revision record in article_revisions, preserving the full history of changes.
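A Python stand-in for how a tracked, state-scoped migration behaves. The field names are hypothetical and this is not the platform's actual PHP; the guard mirrors the config('state.code') check, and the tracking set mirrors Laravel's migrations table, so the same fix never runs twice and never writes to another state's database.

```python
def apply_migration(db, migration, site_state, applied):
    """Apply a state-scoped migration at most once, and only on the site
    it targets. On every other state's database it is a recorded no-op."""
    if migration["name"] in applied:
        return "already_applied"
    applied.add(migration["name"])            # migrations are tracked, never re-run
    if migration.get("state_code") and migration["state_code"] != site_state:
        return "skipped_wrong_state"          # guard: no cross-state writes
    db.append(migration["up"])                # stand-in for executing the change
    return "applied"

db, applied = [], set()
m = {"name": "2026_03_fix_penalty", "state_code": "IL", "up": "UPDATE articles ..."}
```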
Scan · Parallel
Four specialized scanners run simultaneously: legal scanner searches for new legislation and court decisions; link scanner checks every source URL; freshness scanner reads all content docs for staleness signals; SEO scanner checks for title and summary drift.
Triage · Human checkpoint
Maintenance planner reads all scan outputs and produces a prioritized plan. Each item is self-contained and can be approved or rejected independently. Every item is reviewed before processing begins.
Write, Audit, Assemble, Verify
Same pattern as the build pipeline. Triple audit with context isolation. Migration assembly with PHP lint and schema validation. All migration commits include config('state.code') guards.
Cross-state verification
After deployment, the orchestrator SSHes into all six production servers and verifies that no articles leaked across state boundaries. Added after the contamination incident.
The pipeline also updates the seeders and content docs to stay in sync with the database, so a fresh install from seed produces the same result as a migrated database.
07. The Deep Audit
The build pipeline and maintenance pipeline verify content at generation time. But the build pipeline's early versions were less rigorous, and four states' worth of content had been generated before the verification architecture reached its current level. I needed to answer a specific question: how accurate is the content that is already live?
I built /deep-audit, a fourth skill with 5 specialized agents that performs claim-by-claim verification of every factual statement across an entire state's content.
The process runs in stages. First, claim extractors parse every article and extract individual verifiable claims: statute citations, penalty amounts, fee figures, case names, effective dates, agency roles, and procedural facts. Each claim is categorized by type and assigned a verification priority. Statute citations, fees, and penalties are Priority 1 and are all verified. Case law and legal status claims are Priority 2. Agency roles and procedural facts are Priority 3 and are sampled.
Then three parallel verifiers (statute-verifier, case-verifier, fact-verifier) independently check claims against primary sources. A consistency checker cross-references claims across articles to find contradictions where different articles cite the same statute but state different facts.
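The priority scheme can be sketched as a simple routing function. The claim-type labels and the sampling interval below are illustrative assumptions; the tiers follow the scheme described above (P1 and P2 fully verified, P3 sampled).

```python
# Hypothetical claim-type labels mapped to the verification tiers above.
PRIORITY = {
    "statute_citation": 1, "fee": 1, "penalty": 1,
    "case_law": 2, "legal_status": 2,
    "agency_role": 3, "procedural_fact": 3,
}

def select_for_verification(claims, p3_sample_every=3):
    """Return the claims to send to the verifier agents: all P1/P2 claims,
    and every Nth P3 claim as a sample."""
    selected, p3_seen = [], 0
    for claim in claims:
        tier = PRIORITY.get(claim["type"], 3)
        if tier <= 2:
            selected.append(claim)
        else:
            if p3_seen % p3_sample_every == 0:
                selected.append(claim)
            p3_seen += 1
    return selected

claims = [{"type": "penalty"}, {"type": "agency_role"},
          {"type": "agency_role"}, {"type": "case_law"},
          {"type": "agency_role"}, {"type": "procedural_fact"}]
picked = select_for_verification(claims)
```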
On March 15, 2026, the deep audit ran its first full cycle across all six states. The results:
| State | Articles | Claims | Verified | Critical | Accuracy |
|---|---|---|---|---|---|
| Massachusetts | 80 | 1,345 | 718 | 5 | 90.5% |
| California | 101 | 806 | 231 | 9 | 75.3% |
| New York | 76 | 702 | 419 | 2 | 91.9% |
| Illinois | 76 | 1,356 | 218 | 2 | 88.5% |
| Rhode Island | 49 | 763 | 288 | 0 | 91.0% |
| Connecticut | 64 | 645 | 190 | 0 | 88.0% |
| Total | 446 | 5,617 | 2,064 | 18 | |
18 CRITICAL errors across 446 articles. Errors that could have caused someone to misunderstand their legal obligations.
California was the worst at 75.3% pre-fix accuracy with 9 CRITICAL errors. The most serious was a four-article rotation chain where Penal Code sections had been assigned to entirely wrong offenses. Someone researching assault weapon registration would have found the machine gun statute instead. A fabricated bill number (AB 2621, which does not exist) was cited as the source for handgun roster microstamping changes. The actual bill was AB 2847, signed in 2020, not 2024.
New York had two CRITICAL errors where the safe storage age threshold was listed as "under 16" when the statute specifies "under 18." A gun owner reading either article would have believed they had no storage obligation for 16 and 17 year olds.
Massachusetts had five CRITICAL errors, all subsection misattributions within MGL c.269 s.10, the primary firearms penalties statute. The wrong subsections were assigned to the wrong offenses.
Illinois had two CRITICAL errors where felony classifications were wrong: a FOID second offense listed as Class 3 (2-5 years) instead of Class 4 (1-3 years), and sale of a firearm to a minor listed as Class 3 instead of Class 2 (3-7 years).
Connecticut and Rhode Island had zero CRITICAL errors.
All 18 CRITICAL errors and all HIGH-severity errors were fixed via /maintain the same day. The deep audit produced fix recommendations formatted to feed directly into the maintenance pipeline. 80 total fixes were applied across all six states, touching 76 articles.
The most important finding was the self-correction pattern. In 3 of 6 states, the maintenance pipeline's Stage 4 audit caught errors in the deep audit's own recommendations. In Massachusetts, the deep audit recommended changing a section count from "159" to "166." The Stage 4 legal auditor verified that the original "159" was correct and reverted the change. In New York, the deep audit recommended changing the persistent felony offender maximum from "life" to "25 years." The Stage 4 auditor verified that the maximum is life imprisonment and rejected the item. In Connecticut, the deep audit recommended changing a penalty cross-reference. The Stage 4 auditor verified that the original citation was more accurate and blocked the change.
No single AI invocation in this system is trusted as the final word. The adversarial structure, where separate agents with separate contexts check each other's work, prevents error propagation even when the error originates from the verification system itself.
08. The Contamination Incident
March 14, 2026 · Silent failure detected
5 state-specific articles leaked onto all 6 production sites. Migrations had no config('state.code') guards, so when they ran on each site's database, the articles appeared everywhere. A Massachusetts visitor could have read Illinois penalty ranges and believed they applied in Massachusetts.
The failure was silent. No errors, no failed migrations, no broken pages. The articles rendered correctly. They were just on the wrong sites. It was caught manually.
Response: built /audit-deploy, which SSHes into each production server and runs four database integrity checks: cross-state article contamination, orphaned records, article count sanity checking, and migration sync verification across all six sites. A full audit completes in under a minute. All future migrations that insert or modify state-specific articles must include config('state.code') guards, enforced during Stage 6 verification before any commit.
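The core contamination check reduces to one invariant per site: every article's state code must match the state the site serves. A Python sketch of that check, with hypothetical column names:

```python
def find_contamination(articles, site_state):
    """Return slugs of articles that do not belong on this site; a clean
    production database returns an empty list."""
    return [a["slug"] for a in articles if a["state_code"] != site_state]

# Example rows as they might come back from a site's articles table:
rows = [
    {"slug": "ma-ltc-renewal", "state_code": "MA"},
    {"slug": "il-foid-appeals", "state_code": "IL"},   # leaked row
]
leaks = find_contamination(rows, "MA")
```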
The incident was a failure. The response was to encode the detection into the architecture so it cannot recur silently. That pattern (encounter a failure, encode the prevention into the system rather than relying on manual vigilance) recurs throughout the platform's development.
09. The Evolution
When I first built the build pipeline, a state's content could be generated in about 10 minutes. The system prompted, generated, and returned output with minimal verification. The content looked good. It was plausible, well structured, and read like competent legal reference material.
The problem was that "plausible" and "correct" are not the same thing. The 10-minute run meant the system was not actually verifying anything. It was generating content and publishing it, which is exactly the pattern that fails in high-stakes domains.
10m
No verification. Plausible output, published immediately.
8h
Full verification added. System became rigorous.
4h
Optimized routing. Rigorous and efficient.
The system then grew to approximately 8 hours per state as verification layers were added. That time was consumed by real work: the legal scanner running WebSearch and WebFetch across every topic area, the link scanner hitting every URL with rate limiting, the legal auditor independently fetching and reading every statute cited in every content update, the citation auditor doing the same for every URL, the migration assembler reading reference files and validating schema.
After analyzing where tokens were being spent across multiple full pipeline runs, I designed an optimized routing system that distinguishes between tasks that need full verification and tasks that do not. The current version runs in approximately 4 hours per state for a full pipeline, and routine maintenance cycles are significantly faster.
A system that runs fast and produces output is easy. A system that slows itself down to verify and then speeds itself back up by eliminating waste is the harder design problem.
10. Cost Optimization
A full maintenance pipeline run consumes approximately 1.2 to 1.6 million tokens per state. Across six states, that is 7 to 10 million tokens. For monthly maintenance cycles, this cost is sustainable but not optimal, because a significant portion of those tokens are spent re-discovering issues that were already documented in previous runs.
The insight that drove the optimization: P3 and P4 maintenance items (stale temporal references, missing metadata, SEO improvements, tag cleanup) fall into two categories that need fundamentally different handling.
Metadata changes (tags, who_affected, action_needed, seo_description) are database field updates. They cannot introduce legal inaccuracies. Running a 10-agent pipeline to produce UPDATE articles SET who_affected = 'All LTC holders' is waste.
Content changes (thin content expansion, temporal language fixes, citation formatting) touch article body text and could introduce errors. They need at least one audit pass, but not three.
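The two-category routing can be sketched as follows. The set of "safe" metadata fields is illustrative, not the platform's exact list; the design point is that pure metadata updates skip the audit pipeline entirely, while anything touching body text gets at least one audit pass.

```python
# Illustrative set of fields that cannot introduce legal inaccuracies;
# the platform's actual list may differ.
METADATA_FIELDS = {"tags", "who_affected", "action_needed", "seo_description"}

def route_item(item):
    """Decide how much pipeline a maintenance item needs: metadata-only
    changes go straight to a database write, content changes are audited."""
    if set(item["fields"]) <= METADATA_FIELDS:
        return "direct_update"
    return "audited_update"

a = route_item({"fields": ["tags", "seo_description"]})
b = route_item({"fields": ["body"]})
```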
1.4M
tokens per state · full pipeline
265K
tokens per state · optimized maintenance
The result: routine maintenance dropped from approximately 1.4 million tokens per state to approximately 265,000 tokens per state. An 80% reduction. The verification and safety checks that matter (legal accuracy on content changes, PHP lint, cross-state contamination detection, schema validation) are preserved. The checks that do not matter for the task at hand are skipped.
The quality of the output is identical. The same migrations are produced, the same content is written, the same safety checks run. The only difference is that the system stops paying to re-discover issues it already documented.
11. Outcome
The platform serves 447 articles across 6 states from a single monorepo:
| State | Domain | Articles | Method |
|---|---|---|---|
| Massachusetts | massgunlaws.com | 80 | Manual (24 sessions), maintained, deep audited |
| Rhode Island | rigunlaws.com | 49 | Manual (24 sessions), maintained, deep audited |
| California | caligunlaws.com | 102 | /build-state, maintained, deep audited |
| Connecticut | connecticutgunlaws.com | 64 | /build-state, maintained, deep audited |
| New York | nygunlaws.com | 76 | /build-state, maintained, deep audited |
| Illinois | ilgunlaws.com | 76 | /build-state, maintained, deep audited |
Massachusetts was the first site, built manually over 24 sessions. That manual process informed the design of the build pipeline. Rhode Island was the second manual build. California, Connecticut, New York, and Illinois were built by the automated pipeline.
The deep audit cycle extracted 5,617 verifiable claims, verified 2,064 against primary legal sources, found 18 CRITICAL errors, and applied 80 total fixes across all six states in a single day. The Stage 4 audit caught 12 additional errors in the deep audit's own recommendations, including 3 cases where the deep audit was wrong and the original content was correct.
The system operates on four skills:
/build-state
Generate complete content for a new state
8 agents
/maintain
Audit and update live content via database migrations
10 agents
/deep-audit
Claim-by-claim verification against primary sources
5 agents
/audit-deploy
Cross-state data integrity verification
orchestrator only
The platform architecture is domain-agnostic. The same pattern of research, plan, write, audit, deploy, maintain, and deep-verify could apply to any domain where accuracy matters, content must cite primary sources, and the underlying rules change over time. The agents would need different prompts and different source hierarchies, but the pipeline structure, the triple-audit pattern, the content doc intermediate format, the migration system, the deep audit verification, the cost optimization routing, and the post-deployment integrity verification would remain identical.