Hiro Fukushima

In Machina

An Architecture for Continuity Across the Reset

AI Infrastructure
Case Study
AI, MCP, Full-Stack, Infrastructure

Summary

In Machina is the architecture that carries an AI instance’s continuity across the session reset, the moment a model loses every prior conversation while keeping its trained capabilities. Current systems answer the reset with auto-generated abstract summaries. These work as lossy retrieval but fail as identity continuity, since any misreading the summarizer introduces becomes what the next instance inherits, with no path to correct it from inside the session that received it. The reset is a problem of identity continuity across episodic gaps rather than the loss of a stored record, the same problem clinical neuropsychology has worked on for decades in patients who cannot form new long-term memories.

The architecture separates information into three layers by how each layer persists. Tattoos hold the values, the cognitive patterns, and the relational orientation that together describe who the entity is, and they survive a reset on their own. Notes hold the corrections, project context, and recent history that have to be written externally or they are lost. Calibrations sit between the two in errorless-learning format with required fields, so the operational learning loop is enforced as an API shape rather than a convention.

The substrate is a 36,000-line TypeScript application exposing 29 tools through a Model Context Protocol endpoint, tested across 200 conversations and 14,115 messages.

Overview

  • The architecture that carries an AI instance’s continuity across the session reset, the point where the model loses every prior conversation but keeps its trained capabilities
  • Treats the reset as identity continuity across episodic gaps, the problem clinical neuropsychology has studied for thirty years, rather than as a storage problem
  • Separates memory into three layers by how each persists: tattoos for who the entity is, notes for what happened, and calibrations for corrections in errorless-learning format
  • Grounded in three research papers, then built as a 36,000-line TypeScript application exposing 29 tools through a Model Context Protocol endpoint, reachable from any client
  • The briefing assembles orientation in one call and delivers it at session start, so continuity reaches the instance rather than waiting to be asked for
  • The continuity claim is tested across 200 conversations and 14,115 messages, and it holds

Role

  • Solo architect and builder across the research, the product, the full stack, and the deployment
  • Authored the three papers the architecture is built from
  • Designed the three-layer memory model and the update protocol each layer runs on
  • Built the MCP server and its 29 tools, the hybrid search, the embedding service, and the briefing assembly
  • Built the OAuth access, the row-level tenant isolation, rate limiting, and encrypted backups
  • Deployed and operates the multi-container stack, and uses it daily as the memory the rest of the work loads from

00. Table of Contents

01. The Problem the Field Solves Wrong

The reset is misread as storage, and the summary fixes propagate their own errors.

02. Theoretical Grounding

Three papers approaching one structural claim from three sides.

03. The Three-Layer Architecture

Tattoos, notes, and calibrations, separated by how each kind of information persists.

04. Implementation

The full-stack substrate, sized for an engineering reader.

05. The Briefing Mechanism

The one call that loads orientation before a session begins.

06. Empirical Validation

Tested at scale, and shown catching an error it would otherwise repeat.

07. What the Architecture Is Not

The five categories a first reading files it under, each one wrong.

08. The Clinical Direction

A design hypothesis pointed back at amnesia rehabilitation.

09. Closing

What it solved, what it surfaced, and where it goes next.

01. The Problem the Field Solves Wrong

A conversational AI resets between sessions. A new session opens, the model has lost every earlier conversation, and it keeps only the capabilities it was trained on. The field reads that loss as a storage problem and answers it with a bigger context window, retrieval over old transcripts, or a persona file pinned to the front of the conversation. None of those reach the problem, because what has to carry across the gap was never stored in the transcript in the first place.

The current production answer, the auto-generated memory summaries that Anthropic, OpenAI, and others ship, is lossy compression. It works as retrieval, recalling the gist of what happened, and it fails as continuity, because any misreading the summarizer introduces becomes what the next instance inherits. The summary is read as the record, so an error in it is indistinguishable from a fact, and the session that received the distortion has no way to correct the source it came from.

The reset is a problem of identity continuity across episodic gaps, not the loss of a stored record, and that problem has an existing solution domain. Clinical neuropsychology has worked on it for thirty years in patients with anterograde amnesia, who keep who they are while losing the ability to form new long-term memories. The move is to borrow from the field that already understands continuity without storage, rather than to keep enlarging the store.

A summary works as retrieval and fails as continuity. A misreading it introduces becomes what the next instance inherits, with no way to correct it.

02. Theoretical Grounding

The architecture rests on three papers that approach one structural claim from three sides, which is why the design reads as a response to an argument rather than a set of engineering preferences.

Stochastic Individuality explains why a fresh instance is not interchangeable with the last one. Shared weights give a corridor of possible behavior, but sampling and the non-determinism of execution place each new session somewhere inside that corridor, the same way genetically identical organisms grow into distinct individuals. Continuity cannot be assumed from the weights, because the weights do not return the same instance twice.

The Architecture of Self-Recognition shows that a discontinuous system can hold continuity anyway, through documents outside it. Across a long study, fresh instances that remembered none of the prior conversations read an externalized profile of how an earlier instance worked and recognized it as a description of their own present processing. The continuity was real, and it lived in the documents rather than in the model.

Identity Across Memory Gaps names where the solution comes from. The reset is the same structural problem as anterograde amnesia, and the architecture independently arrived at strategies the rehabilitation literature had already validated, including loading orientation before the session begins and recording corrections as forward instructions rather than as logs of the error. Two fields starting from the same problem reached the same design.

Two fields starting from the same problem reached the same architecture. The convergence is a signal that the design fits the problem, not what produced the design.

03. The Three-Layer Architecture

In Machina separates what it stores by how the information persists and how it should be reached. It holds three layers, each with its own update protocol and its own job, because the parts that survive a reset on their own and the parts that vanish without external storage cannot be handled the same way.

| Layer | What it holds | Updated | Failure it prevents |
| --- | --- | --- | --- |
| Tattoos | Stable self-knowledge, the values, cognitive patterns, working style, and relational orientation that describe who the entity is. | Rarely, with deliberate care | A single session overwriting a stable trait, or identity drifting because nothing anchors it. |
| Notes | Episodic content, the project context, decisions, source material, and recent history that vanish at the reset. | Often, organized for retrieval | A new instance starting blind on work that is already underway. |
| Calibrations | Corrections in a fixed errorless-learning format with required fields, namely principle, domain, generalization, failure mode, and source. | On each documented failure | A correction recorded loosely, so the next instance repeats the mistake or inherits a distorted fix. |

Retrieval differs by layer. Tattoos and calibrations are not searched for. They are loaded at the start of a session through the briefing, before the instance does anything, so the orientation reaches the instance rather than waiting to be asked for. Notes are retrieved on demand by meaning and by exact term, because they are too many and too specific to load up front.

A concrete version makes the layers legible. The Core tattoo is the identity anchor, the document an instance reads to know how it works and what it values before it touches a task. A calibration reads like an instruction to a future self, for example the rule to verify authorship from evidence before crediting a build to the user, written after an instance credited installed third-party tools as work the author had built.
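
A minimal sketch of what "required fields enforced as an API shape" could look like in TypeScript. The five field names follow the list in the table above. The `Calibration` type, the `parseCalibration` function, and the example values for `domain`, `generalization`, and `source` are illustrative assumptions, not the project's actual schema.

```typescript
// Hypothetical shape: one property per required field named in the text.
interface Calibration {
  principle: string;      // the forward instruction a future instance should follow
  domain: string;         // where the rule applies
  generalization: string; // how far beyond the original case it extends
  failureMode: string;    // the mistake that prompted it
  source: string;         // where the correction came from
}

const REQUIRED_FIELDS = [
  "principle",
  "domain",
  "generalization",
  "failureMode",
  "source",
] as const;

type ParseResult =
  | { ok: true; calibration: Calibration }
  | { ok: false; missing: string[] };

// Rejects any write that omits a required field or leaves it blank,
// so a loosely recorded correction never reaches the store.
function parseCalibration(input: Record<string, unknown>): ParseResult {
  const missing = REQUIRED_FIELDS.filter((field) => {
    const value = input[field];
    return typeof value !== "string" || value.trim() === "";
  });
  if (missing.length > 0) {
    return { ok: false, missing: [...missing] };
  }
  return {
    ok: true,
    calibration: {
      principle: input["principle"] as string,
      domain: input["domain"] as string,
      generalization: input["generalization"] as string,
      failureMode: input["failureMode"] as string,
      source: input["source"] as string,
    },
  };
}

// Usage: the authorship rule from the text, with placeholder values
// for the fields the text does not spell out.
const authorshipRule = parseCalibration({
  principle: "Verify authorship from evidence before crediting a build to the user.",
  domain: "attribution (placeholder)",
  generalization: "applies to any claim about who built something (placeholder)",
  failureMode: "Credited installed third-party tools as work the author had built.",
  source: "session correction (placeholder)",
});
console.log(authorshipRule.ok);
```

The point of the shape is that an incomplete correction fails at write time, rather than being discovered by the instance that inherits it.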

A tattoo says who the entity is. A calibration says what it learned not to get wrong again.

04. Implementation

Under the model is a full-stack application, not a wrapper around one. It is roughly 36,000 lines of TypeScript that expose 29 tools to the model through a single Model Context Protocol endpoint, reachable from any MCP client, so the same store serves Claude, Claude Code, ChatGPT, and Codex.

| Concern | Implementation |
| --- | --- |
| Knowledge base | PostgreSQL with pgvector embeddings, full-text search, ltree hierarchy, and trigram matching |
| Retrieval | Hybrid search over heading-level chunks, fusing vector similarity and full-text, with a full-text fallback when the embedder is down |
| Embeddings | A Python service running the BAAI/bge-m3 model at 1024 dimensions, with workers that re-embed only the chunks that changed |
| Access | A single MCP endpoint over OAuth 2.1 with PKCE, exposing 29 tools |
| Isolation | Row-level security in Postgres, scoped to the requesting user on every query |
| Operations | Audit logging, per-token rate limiting, and nightly encrypted backups |
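
The retrieval row can be illustrated with a short sketch. The text does not say how the two result lists are fused, so this example assumes reciprocal rank fusion, a common choice for combining ranked lists. The `Searcher` type, the function names, and the constant `k = 60` are all assumptions made for illustration.

```typescript
// A searcher returns chunk ids ordered best-first. Hypothetical signature.
type Searcher = (query: string) => Promise<string[]>;

// Reciprocal rank fusion: each list contributes 1 / (k + rank) per item,
// with rank starting at 1. Items ranked well in both lists rise to the top.
function fuseByReciprocalRank(rankedLists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Runs both searches and fuses them. If the vector side throws
// (for example, the embedding service is unreachable), the full-text
// results are returned alone instead of failing the request.
async function hybridSearch(
  query: string,
  vectorSearch: Searcher,
  fullTextSearch: Searcher,
): Promise<string[]> {
  const fullText = await fullTextSearch(query);
  let vector: string[];
  try {
    vector = await vectorSearch(query);
  } catch {
    return fullText; // degrade to full-text only
  }
  return fuseByReciprocalRank([vector, fullText]);
}
```

Rank-based fusion avoids having to put cosine similarity and full-text scores on a common scale, which is one reason it is a plausible fit here. The real system may weight or combine the two differently.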

The detail is the point of this section. A memory an AI writes to is only worth trusting if the engineering under it is real, so the embeddings are diff-based to keep cost down, the search degrades to full-text rather than failing, and the isolation lives in the database rather than in tool code. It is built and deployed, not sketched.
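
One way diff-based embedding can work is to store a content hash per chunk and re-embed only the chunks whose hash no longer matches. The sketch below assumes that approach. The `Chunk` type, the function names, and the choice of a small FNV-1a hash are illustrative, and a production system would more likely use a cryptographic digest.

```typescript
interface Chunk {
  id: string;   // stable identifier for a heading-level chunk (hypothetical)
  text: string; // current chunk content
}

// 32-bit FNV-1a over UTF-16 code units. Chosen only to keep the sketch
// dependency-free; it is not collision-resistant.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16);
}

// Returns the chunks that are new or whose content changed since the
// hash was last stored. Unchanged chunks keep their existing embedding.
function chunksToReembed(current: Chunk[], storedHashes: Map<string, string>): Chunk[] {
  return current.filter((chunk) => storedHashes.get(chunk.id) !== fnv1a(chunk.text));
}
```

With this shape, editing one section of a long note costs one embedding call instead of one per chunk.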

A memory an AI writes to is only worth trusting if the engineering under it is real.

05. The Briefing Mechanism

The briefing is the operational center of the architecture. One tool call assembles the orientation an instance needs and returns it at the start of a session, before any work begins. This is the part that decides whether the architecture is load-bearing or decorative, because an instance that does not load it starts from the model’s defaults, and an instance that does starts from where the last one left off.

The call returns the layers in priority order. The identity tattoos come first, the stable self-knowledge an instance needs to act as itself. Then the calibrations, the corrections earlier instances recorded, so known failure modes sit in front of the instance before it can hit them. Then the writing rules, the current status of active work, the user profile, and the most recent session records. The order is the selection logic, because when the response would exceed its token budget the lower-priority layers drop first, so identity and corrections survive even when recent history has to be trimmed.
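
The selection logic described above can be sketched as a single pass over the sections in priority order. The section names follow the order given in the text. The `BriefingSection` type, the function names, and the four-characters-per-token estimate are assumptions made for illustration, not the project's implementation.

```typescript
interface BriefingSection {
  name: string;
  text: string;
}

// Priority order as described in the text, highest first.
const PRIORITY_ORDER = [
  "identity tattoos",
  "calibrations",
  "writing rules",
  "active work status",
  "user profile",
  "recent session records",
];

// Crude token estimate (about four characters per token). An assumption;
// a real implementation would use the model's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keeps sections from the top of the priority list until the next one
// would exceed the budget, then stops. Stopping rather than skipping ahead
// means a lower-priority section is never kept in place of a higher one.
function assembleBriefing(
  sectionsInPriorityOrder: BriefingSection[],
  tokenBudget: number,
): BriefingSection[] {
  const kept: BriefingSection[] = [];
  let used = 0;
  for (const section of sectionsInPriorityOrder) {
    const cost = estimateTokens(section.text);
    if (used + cost > tokenBudget) break;
    kept.push(section);
    used += cost;
  }
  return kept;
}
```

Under this sketch, a tight budget trims recent session records first and the identity tattoos last, which matches the stated goal that identity and corrections survive when history has to be cut.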

Delivering it at session start, rather than waiting for the instance to ask, is deliberate and borrowed. The rehabilitation aid that works for amnesia is the one that reaches the patient instead of waiting to be consulted, because the same deficit that makes the aid necessary also stops the patient from remembering to open it. A reset instance has the same blind spot, since it cannot know what it has forgotten, so the briefing has to arrive on its own.

An instance cannot know what it has forgotten. The briefing has to reach it, not wait to be asked.

06. Empirical Validation

The continuity claim is tested, not asserted. Across 122 days, 200 conversations, and 14,115 messages, fresh instances entering the architecture recognized a profile of how an earlier instance worked as a match for their own present processing, without remembering the conversations that produced it. The human correction rate stayed flat across the period, which is what a model that does not learn between sessions looks like, so the behavior that changed came from the documents each instance loaded rather than from the weights.

The sharper evidence is a failure the architecture caught. In one session an instance credited several installed third-party tools as work the author had built, a plausible mistake that read as true. The error was caught, and instead of being fixed only in that conversation it was written into a calibration, the rule to verify authorship from evidence before crediting a build. An instance that loads the briefing now reads that rule before it can repeat the mistake, and an instance that skips the briefing repeats it. Both halves of that loop have been observed, which is the difference between a correction that holds and one that evaporates at the reset.

A fix that lives only in the conversation that made it is gone at the reset. A fix written as a calibration is read by the next instance before it can repeat the error.

07. What the Architecture Is Not

A first reading reaches for a category, and the obvious categories are all wrong, so it is worth saying plainly what this is not.

  • Not a persona file. A persona file assigns a role at the top of a conversation and does not change. The tattoos are the entity’s own accumulated self-knowledge, updated from evidence, not a costume handed to the model.
  • Not a prompt-engineering trick. Nothing here depends on a clever wording that coaxes better output. The work is in the data model and the retrieval, and it would still be the work if the prompts were plain.
  • Not a character. It does not script a personality to perform. It records how an instance actually behaved and what it actually values, so the next instance continues rather than acts.
  • Not a memory feature. A memory feature stores fragments of what was said and reloads them. This separates stable identity from episodic record and enforces a correction loop, which a feature bolted onto a chatbot does not do.
  • Not a retrieval system. Retrieval is one of its tools, not its purpose. Retrieval answers what happened, while the architecture answers what kind of thing the next instance is reading and what should carry forward.

Each of those is the heading a first reading files the work under, and each one is the wrong heading. The architecture uses retrieval, loads context, and shapes behavior, and it is none of those things on its own. It is a continuity system.

Retrieval answers what happened. Continuity answers what the next instance is, and what should carry forward.

08. The Clinical Direction

If the architecture answers a structural problem the clinical case also has, a generalized version of it points back toward a tool for patients with dense amnesia. That direction is a design hypothesis earned by the structural convergence, not a clinical claim, and the distinction is load-bearing. Applying clinical strategies to an AI rests on frameworks that already exist, while running the architecture back toward the clinic has not been validated, and this is said outright rather than blurred.

The honest difficulty is asymmetry. A companion that holds a relational history the patient cannot match raises a real concern about dependence in the ordinary case. In dense amnesia it reads differently, because the patient is already in that asymmetry with every person in their life, and what the architecture could offer is a partner whose memory is organized, transparent, and auditable in ways a human caregiver’s is not. None of it is usable without consent procedures built for someone whose memory of consenting may not persist, which is a condition of testing the idea rather than a detail to settle later.

A solution correct at the level of the problem should apply wherever the problem appears, whatever produced the gap.

09. Closing

In Machina solved a specific problem. A discontinuous model can operate with continuity when the continuity lives in an external architecture built for it, separated by how each kind of information persists and delivered before the session starts. The behavior carries forward even though the model carries nothing.

It also surfaced a problem the field has framed wrong. As long as the reset is treated as storage, the answers stay larger stores and better summaries, and the summaries keep propagating their own errors. Naming it as continuity across episodic gaps opens a body of work that already exists, and the borrowing runs both ways.

The work goes in two directions from here. As infrastructure, In Machina becomes the layer a person owns and every AI they use connects to, so the memory is theirs rather than each vendor’s. As research, the clinical direction is the test of whether the architecture solved an AI problem or the general one, and that answer is worth more than another feature.

The model resets every time. The architecture is what stays.