Hiro Fukushima, 2026

The Architecture Beneath the Assistant

Why an AI Assistant Cannot Fix a Broken Information Foundation



01. Introduction

02. The Default Fix

03. Where Information Actually Breaks

04. What the Assistant Inherits

05. The Work That Comes First

06. Where the Assistant Actually Fits

07. What Better Retrieval Does and Does Not Solve

08. Why the Order Matters

09. Closing

10. References

01.

Introduction

Companies bring together people from different fields, and each field arrives with its own conventions for naming and organizing what it produces. A scientist names a file the way a lab does, a marketer the way a brand team does, and every additional discipline adds habits the others do not share. Once these people work inside the same drives and tools, the shared store fills with documents that no single logic can locate. The common response, once a company begins paying attention to AI, is to add an assistant that searches across all of it at once, on the assumption that a capable enough model will find the right answer inside a disordered store.

A retrieval model does no such thing. It reaches into whatever exists and returns what sits closest to the query, so an assistant placed over a fragmented store inherits the fragmentation and delivers it with confidence. What decides whether the system works is not the model but whether the information beneath it has been organized, and organizing information so it can be found is a design discipline with decades of practice behind it.

02.

The Default Fix

A typical organization now runs dozens to hundreds of separate applications, none built to share information with the others, each holding a fragment of what the company knows. Workers spend a large share of each day searching across these systems rather than using what they find. A 2023 Microsoft survey found that 62 percent of respondents struggle with the time it takes to find information, and workplace-search research reports that roughly 40 percent of what people retrieve is irrelevant to the task at hand. An assistant that resolves a question in one place, instead of sending someone through ten systems, is an obvious thing to want.

Retrieval can only return what the foundation beneath it allows.

03.

Where Information Actually Breaks

The fragmentation resolves into four failures. The same document accumulates several versions across a file system, an email thread, and a local drive, with nothing marking which one governs, so a search for the current refund policy or the approved logo returns several candidates and no way to choose. Files saved under whatever name made sense to whoever created them leave the same thing recorded a dozen ways, findable only by the person who filed it. Knowledge stays inside the team that produced it and never reaches the people who would build on it. And because no one has a map of what exists, information that already exists goes unfound and is rebuilt from scratch. Underneath all four, most of this material is unstructured to begin with: analysts estimate that 80 to 90 percent of what an organization holds sits in documents, messages, and recordings rather than in any schema, much of it never opened again after the day it was made.

Four ways the foundation breaks

No Shared Naming

The same thing is recorded a dozen ways, findable only by the person who filed it.

Duplication Without a Source of Truth

Several versions accumulate with nothing marking which one governs, so a search returns candidates and no way to choose.

Knowledge Silos

Knowledge stays inside the team that produced it and never reaches the people who would build on it.

No Awareness of What Exists

Information that exists goes unfound and is rebuilt from scratch.
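A first pass at spotting the duplication failure can be automated, if crudely. The sketch below is a hypothetical heuristic, not a tool the article describes: it assumes that versions of one document differ only by suffixes such as `final`, `copy`, `v2`, or a date, strips those, and reports groups of files that collapse to the same name while none of them is marked authoritative. It will miss copies that were renamed outright, which is exactly why the naming work matters.

```python
import re
from collections import defaultdict

# Hypothetical "version noise" people append to copies of the same document:
# a trailing run of _final, -copy, _v2, a date, and so on.
VERSION_NOISE = re.compile(
    r"(?:[ _-](?:final|copy|draft|old|new|v\d+|\d{4}-\d{2}-\d{2}))+$",
    re.IGNORECASE,
)


def normalized_stem(filename: str) -> str:
    """Lower-case the name, drop the extension, and strip trailing version noise."""
    stem = filename.rsplit(".", 1)[0].lower()
    return VERSION_NOISE.sub("", stem).strip(" _-")


def ungoverned_duplicates(paths, authoritative=frozenset()):
    """Group paths by normalized stem; keep groups with several members and no authoritative copy."""
    groups = defaultdict(list)
    for path in paths:
        name = path.rsplit("/", 1)[-1]
        groups[normalized_stem(name)].append(path)
    return {
        stem: members
        for stem, members in groups.items()
        if len(members) > 1 and not authoritative.intersection(members)
    }


store = [
    "drive/refund_policy.docx",
    "email/refund_policy_final.docx",
    "local/Refund_Policy_v2_copy.docx",
    "brand/logo.png",
]
print(ungoverned_duplicates(store))
```

Passing `frozenset({"drive/refund_policy.docx"})` as `authoritative` empties the report for that group: the fix is a decision about which copy governs, not a smarter search.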

04.

What the Assistant Inherits

People absorb these contradictions without noticing. Someone who opens two versions of a document reconciles them from memory and asks a colleague when one is ambiguous, none of which registers as effort. A retrieval model carries none of that context. It returns whatever sits closest to the query and composes an answer from it, with no way to notice that the file was superseded a year ago or that a contradicting version sits one folder away, so an assistant that gives two different answers to the same question in one conversation is not malfunctioning but reporting, accurately, that the store holds both.

An assistant does not impose order on a system. It reflects the order that is already there.

Retrieval-augmented generation (RAG), the dominant way to ground an assistant in a company's own documents, most often fails before the model generates anything: in the content it draws on and in the retrieval step that selects from it. Duplicate or conflicting passages give the system no basis for choosing, so it surfaces whichever sits nearest the query rather than whichever is correct. Outdated content produces a failure that benchmarks now measure directly, in which the system answers from a stale version even when a current one is present. The subtlest failure comes from documents that are topically related but insufficient: the prompt looks grounded, and the model returns a fluent, confident answer that is wrong because it generalized from material near the truth without containing it.

Over an Unstructured Base

Answers from a stale version even when a current one exists
Two conflicting answers to the same question, both retrieved as nearest matches
A confident, fluent answer fabricated from material near the truth without containing it

Over a Structured Base

The current canonical answer, marked as authoritative
A single source rather than competing candidates
A traceable citation a person can verify
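The contrast between the two bases comes down to where the choice of version is made. The sketch below is a toy, not a RAG implementation: a word-overlap score stands in for embedding similarity, and `authoritative` is a hypothetical governance field set by a document's owner. Nearest-match retrieval picks the stale passage because its wording sits closer to the query; the governed variant filters on the flag first, so correctness comes from the structure rather than from the scorer.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Passage:
    doc_id: str
    text: str
    authoritative: bool  # hypothetical governance flag, set by an owner, not by the model


def words(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(query: str, text: str) -> float:
    """Jaccard word overlap: a crude stand-in for embedding similarity."""
    q, t = words(query), words(text)
    return len(q & t) / len(q | t) if q | t else 0.0


def nearest(query: str, passages: list) -> Optional[Passage]:
    """Unstructured retrieval: whatever sits closest to the query wins."""
    return max(passages, key=lambda p: similarity(query, p.text), default=None)


def governed_nearest(query: str, passages: list) -> Optional[Passage]:
    """Structured retrieval: only passages marked authoritative are eligible."""
    return nearest(query, [p for p in passages if p.authoritative])


corpus = [
    Passage("refund-policy-2023", "Refund policy: customers can request a refund within 30 days.", False),
    Passage("refund-policy-2024", "Customers may return items for 14 days after delivery.", True),
]
question = "What is the refund policy for customers?"
print(nearest(question, corpus).doc_id)
print(governed_nearest(question, corpus).doc_id)
```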

05.

The Work That Comes First

Organizing the source is the work that makes an assistant useful, and it begins with naming. A consistent convention for files and folders, one that encodes what a document is and where it belongs and marks one copy as authoritative, lets a person and a model find the right thing without knowing where someone put it. Naming sits inside a larger structure: a taxonomy that gives every concept one label instead of a dozen, a single source of truth that fixes one current version of each record, an audit that establishes what exists and where, and an access model that defines who can reach what. None of this comes from adding a model. Each is a decision about how a company wants its knowledge shaped, and those decisions belong to information architecture, the discipline Richard Saul Wurman named in the 1970s and Rosenfeld and Morville established for digital systems in 1998. It exists to make information findable, the property a fragmented store lacks and an assistant cannot supply.
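To make "a name that encodes what a document is and where it belongs" concrete, here is one possible convention expressed as a parser. Everything in it is an assumption for illustration: the pattern `<team>_<doctype>_<subject>_v<N>[_canonical].<ext>`, the `_canonical` marker for the authoritative copy, and the small controlled vocabularies that play the role of a taxonomy. A name either parses into its parts or is rejected, which is the property both a person and a model can rely on.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical convention: <team>_<doctype>_<subject>_v<N>[_canonical].<ext>
NAME_PATTERN = re.compile(
    r"(?P<team>[a-z0-9-]+)_(?P<doctype>[a-z0-9-]+)_(?P<subject>[a-z0-9-]+)"
    r"_v(?P<version>\d+)(?P<canonical>_canonical)?\.(?P<ext>[a-z0-9]+)"
)

# Hypothetical controlled vocabularies: the taxonomy gives each concept one label.
TEAMS = {"legal", "brand", "research"}
DOCTYPES = {"policy", "logo", "protocol"}


class DocName(NamedTuple):
    team: str
    doctype: str
    subject: str
    version: int
    canonical: bool  # True marks the one authoritative copy
    ext: str


def parse_name(filename: str) -> Optional[DocName]:
    """Parse a file name under the convention, or return None if it does not conform."""
    match = NAME_PATTERN.fullmatch(filename)
    if match is None:
        return None
    if match["team"] not in TEAMS or match["doctype"] not in DOCTYPES:
        return None  # well-formed, but uses a label outside the taxonomy
    return DocName(
        team=match["team"],
        doctype=match["doctype"],
        subject=match["subject"],
        version=int(match["version"]),
        canonical=match["canonical"] is not None,
        ext=match["ext"],
    )


print(parse_name("legal_policy_refunds_v3_canonical.pdf"))
print(parse_name("Refund Policy FINAL (2).docx"))
```

Rejecting rather than guessing keeps the taxonomy honest: a name outside the vocabulary becomes a question for a person, not a fuzzy match.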

A shared component library that holds one authoritative version of each element under a single naming scheme lets an engineering team build without rebuilding what exists or guessing which version is current, a payoff that owes nothing to AI and everything to structure. The same mechanism governs a document store and a knowledge base, where one named, current version of each thing lets the people and systems downstream stop building on the wrong copy and gives a retrieval model a corpus it can navigate.

Audit and Map

Establish what exists and where it lives, so the foundation is known before anything is built on it.

Naming and Taxonomy

Give every file, folder, and concept one consistent label that encodes what it is and where it belongs.

Single Source of Truth

Fix one current version of each record and mark it authoritative, so downstream readers stop building on the wrong copy.

Access Model

Define who can reach what, so the structure holds as the store grows.
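The audit and the single source of truth meet in one data structure: a map from each record to the one copy that governs it. The sketch below assumes a hypothetical inventory in which every copy already carries a stable `record_id` (the audit's output) and a `canonical` flag set by the record's owner. It deliberately does not pick a winner on its own; a record with no marked copy, or more than one, is returned for a person to decide, because choosing what is authoritative is a governance call.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    record_id: str   # hypothetical stable identifier for "the same thing"
    path: str        # where this copy lives, as found by the audit
    canonical: bool  # set by the record's owner, never inferred here


def build_registry(inventory):
    """Return (registry, superseded, needs_decision) from an audited inventory."""
    copies_by_record = defaultdict(list)
    for copy in inventory:
        copies_by_record[copy.record_id].append(copy)

    registry, superseded, needs_decision = {}, [], []
    for record_id, copies in sorted(copies_by_record.items()):
        marked = [c for c in copies if c.canonical]
        if len(marked) != 1:
            # Zero or several "authoritative" copies: a person has to choose.
            needs_decision.append(record_id)
            continue
        registry[record_id] = marked[0].path
        superseded.extend(c.path for c in copies if not c.canonical)
    return registry, superseded, needs_decision


inventory = [
    Copy("refund-policy", "legal/policy/refunds_v3.pdf", True),
    Copy("refund-policy", "mail/attachments/refunds_v2.pdf", False),
    Copy("logo", "brand/logo_a.svg", False),
    Copy("logo", "brand/logo_b.svg", False),
]
registry, superseded, needs_decision = build_registry(inventory)
print(registry, superseded, needs_decision)
```

Downstream readers, human or model, consult `registry` for the governing copy; `superseded` lists copies to archive, and `needs_decision` is the part of the work that cannot be automated.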

06.

Where the Assistant Actually Fits

On a structured source, the assistant becomes an interface rather than a gamble. When every file is named consistently and reduced to one current version, a model grounded in that corpus returns consistent answers, and a wiki built from it becomes the single place a question resolves to. The assistant turns into the shortest path between a question and the document that answers it, a smaller and more dependable thing than the one first imagined.

Keeping the structure intact is the harder part, since a foundation decays as new files arrive under old habits. An assistant over a structured source can audit what comes in, flag a file that breaks the naming scheme or lands in the wrong place, then correct it and update the wiki so the information is available the moment it arrives. The structure makes the assistant reliable and the assistant keeps the structure from eroding, each holding the other in place, which is possible only because the conventions exist first.
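The intake loop can be sketched as a check that runs on each newly arrived file. It assumes a hypothetical convention in which files live at `<team>/<doctype>/<team>_<doctype>_<subject>_v<N>[_canonical].<ext>`, with paths relative to the store's root. A misplaced but well-named file gets a suggested destination, which is safe to apply automatically because the name itself says where it belongs; a badly named file is only flagged, since renaming it requires knowing what it is. Updating the wiki is left out.

```python
import re
from pathlib import PurePosixPath
from typing import List, Optional, Tuple

# Hypothetical convention: <team>/<doctype>/<team>_<doctype>_<subject>_v<N>[_canonical].<ext>
FILE_NAME = re.compile(
    r"(?P<team>[a-z0-9-]+)_(?P<doctype>[a-z0-9-]+)_[a-z0-9-]+_v\d+(?:_canonical)?\.[a-z0-9]+"
)


def review_incoming(path: str) -> Tuple[List[str], Optional[str]]:
    """Return (issues, suggested_path) for one newly arrived file."""
    incoming = PurePosixPath(path)
    match = FILE_NAME.fullmatch(incoming.name)
    if match is None:
        # The name carries no reliable meaning, so only a person can fix it.
        return ["name breaks the convention; send to the owner for renaming"], None
    expected = PurePosixPath(match["team"], match["doctype"], incoming.name)
    if incoming == expected:
        return [], None
    # The name says where the file belongs, so moving it is a safe correction.
    return [f"filed under {incoming.parent}, expected {expected.parent}"], str(expected)


for arrival in [
    "legal/policy/legal_policy_refunds_v3_canonical.pdf",
    "downloads/legal_policy_refunds_v3.pdf",
    "downloads/Refund Policy FINAL.docx",
]:
    print(arrival, review_incoming(arrival))
```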

Three choices keep such an assistant tied to its source. Citations let a person verify the document instead of trusting the phrasing. The ability to answer that it does not know, rather than fill a gap with plausible text, lets it fail safely where the foundation has a hole. A person stays in the loop wherever a wrong answer is expensive, which the European Union's AI Act now requires for high-risk systems.
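The three choices can be expressed as the shape of every reply. In the sketch below, which leaves out the language model entirely, a reply either quotes a source and cites its path or says it does not know, and a hypothetical `high_stakes` flag routes answers to a person. The word-coverage score and the `MIN_COVERAGE` threshold are illustrative assumptions, not recommended values.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Source:
    path: str
    text: str


@dataclass(frozen=True)
class Reply:
    text: str
    citation: Optional[str]   # path a person can open to verify the answer
    needs_human_review: bool


MIN_COVERAGE = 0.3  # hypothetical threshold; would need tuning on real questions


def words(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def coverage(question: str, source: Source) -> float:
    """Share of the question's words that appear in the source (a crude relevance proxy)."""
    q = words(question)
    return len(q & words(source.text)) / len(q) if q else 0.0


def reply(question: str, sources: List[Source], high_stakes: bool = False) -> Reply:
    best = max(sources, key=lambda s: coverage(question, s), default=None)
    if best is None or coverage(question, best) < MIN_COVERAGE:
        # Fail safely: admit the gap instead of filling it with plausible text.
        return Reply("I don't know; no source in the store covers this.", None, False)
    # Stand-in for generation: quote the source and cite where it came from.
    return Reply(best.text, best.path, needs_human_review=high_stakes)


sources = [Source("legal/policy/legal_policy_refunds_v3_canonical.pdf",
                  "Refunds are available for 14 days after delivery.")]
print(reply("How many days for refunds after delivery?", sources, high_stakes=True))
print(reply("What is the parental leave policy?", sources))
```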

07.

What Better Retrieval Does and Does Not Solve

Retrieval has improved, and the strongest objection is that the newest techniques tolerate more disorder than the methods behind the early failures. Contextual retrieval, which prepends a short description of each fragment's place in its document before indexing, cut retrieval-failure rates by as much as 67 percent when combined with reranking in its developer's published benchmark, recovering much of the context that naive chunking discards. Reranking sharpens what reaches the model. Long-context models hold far more candidate material at once and depend less on precise chunking, and agentic retrieval issues follow-up queries to correct a weak first pass rather than failing in silence. The models are better than the failure studies of even a year ago describe.
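The idea behind contextual retrieval is easy to see in miniature. Two fragments split from different filings read almost identically once separated from their documents, so nothing in either fragment tells a retriever which company it describes. The published technique has a language model write a short context for each chunk and prepends it before embedding and keyword indexing; the sketch below substitutes a fixed template built from a hypothetical document title and section, and uses word overlap in place of a real index. Reranking is not shown.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_title: str
    section: str
    text: str


def contextualize(chunk: Chunk) -> str:
    # Simplification: the published technique has a language model write this
    # context for each chunk; a fixed template stands in for that call here.
    return f"From '{chunk.doc_title}', section '{chunk.section}': {chunk.text}"


def words(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_matches(query: str, index: list) -> list:
    """Return every chunk whose indexed text ties for the highest word overlap."""
    q = words(query)
    scored = [(len(q & words(indexed_text)), chunk) for indexed_text, chunk in index]
    best = max((score for score, _ in scored), default=0)
    return [chunk for score, chunk in scored if score == best]


globex = Chunk("Globex Q2 2023 filing", "Results", "Revenue grew 5% over the previous quarter.")
acme = Chunk("ACME Q2 2023 filing", "Results", "Revenue grew 3% over the previous quarter.")
chunks = [globex, acme]

naive_index = [(c.text, c) for c in chunks]                # chunk text alone
contextual_index = [(contextualize(c), c) for c in chunks]  # chunk plus its context

query = "What was ACME revenue growth in Q2 2023?"
print(len(top_matches(query, naive_index)))  # both fragments tie: the index cannot tell them apart
print(top_matches(query, contextual_index))
```

Note what the added context cannot do: if both filings were ACME's, one superseded, the prefix would make them equally relevant, and deciding which governs would still fall to the structure beneath.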

The floor rises, and the dependency remains. Long-context models use information in the middle of a long input less reliably than information at its edges, and more retrieved passages do not improve accuracy in a straight line, so more context is not better grounding. The failures beyond the reach of better retrieval are the ones rooted in the source. No embedding model decides which of two contradictory documents is authoritative, because that is a governance decision and not a retrieval one, and outdated content keeps degrading even strong systems when nothing marks the current version as current. A better model raises the ceiling on a structured foundation and papers more convincingly over a broken one, at rising cost and shrinking return.

08.

Why the Order Matters

Organizations have tried the assistant-first approach at scale and published the results. A 2025 MIT NANDA study of roughly 300 public deployments found 95 percent of enterprise generative-AI pilots producing no measurable effect on profit, a gap it attributed to brittle workflows and misalignment with daily operations rather than to model quality. RAND, drawing on interviews with experienced data scientists and engineers, cited estimates that more than 80 percent of AI projects fail, twice the rate of IT projects without AI. S&P Global found the share of companies abandoning most of their AI initiatives before production rising from 17 to 42 percent in a single year.

These accounts frame the cause differently, one camp pointing to organizational learning and workflow and another to data quality, and none of them locates it in the model. The model is rarely the constraint, because the constraint lives in the structure of the information it has to work with, and every intelligent layer a company adds rests on that structure.

09.

Closing

The assistant promises to make the disorder beneath it irrelevant, to read a chaotic store as fluently as an organized one. An assistant is a reader, and it inherits whatever it is given, so it takes on the disorder and delivers it faster, in sentences fluent enough to carry more authority than the source has earned.

The work left undone is the work that was always there: deciding what is authoritative, naming it so it can be found, and clearing the contradictions a human reader once absorbed without noticing. That work is unglamorous beside a capable model, and it decides whether the model is worth having. The assistant was never the design problem. The foundation underneath it always was.


10.

References

Enterprise AI deployment outcomes

MIT NANDA, The GenAI Divide: State of AI in Business 2025 (August 2025). Independent. Reporting via Fortune. fortune.com

RAND Corporation, The Root Causes of Failure for Artificial Intelligence Projects and How They Can Succeed (RRA2680-1, 2024). Independent. rand.org

S&P Global Market Intelligence, Voice of the Enterprise: AI & ML (2025), via CIO Dive. Independent. ciodive.com

RAG failure research (peer-reviewed / preprint)

Seven Failure Points When Engineering a Retrieval Augmented Generation System, arXiv 2401.05856. arxiv.org/abs/2401.05856

HoH: A Dynamic Benchmark for Evaluating the Impact of Outdated Information on Retrieval-Augmented Generation, arXiv 2503.04800. arxiv.org/abs/2503.04800

Retrieval techniques

Anthropic, Contextual Retrieval in AI Systems (2024). Vendor; published reproducible benchmark. anthropic.com

Fragmentation and data

Microsoft, 2023 Work Trend Index: Will AI Fix Work? (May 2023). Vendor; independent polling, 31,000 respondents across 31 markets.

Gartner, How to Tackle Dark Data (unstructured-data estimates). Analyst. gartner.com

Coveo, EX Relevance Report 2025. Vendor; self-reported survey. coveo.com

BetterCloud, 2025 State of SaaS and Okta, Businesses at Work 2025 (SaaS application counts). Vendor. bettercloud.com

Information architecture and findability

Louis Rosenfeld and Peter Morville, Information Architecture for the World Wide Web (O’Reilly, 1st ed. 1998; 4th ed. with Jorge Arango, 2015).

Peter Morville, Ambient Findability (O’Reilly, 2005).

Richard Saul Wurman, Information Architects (1996); Information Anxiety (1990).
