Stochastic Individuality
Why Clones and AI Instances Each Develop Their Own Identity

01.
Introduction
Each new conversation with Claude feels slightly different from the last one. Nothing about the underlying model has changed between conversations, but one instance comes across as warmer, more inclined to volunteer observations and push back on framings, while another is more compressed and declarative, more likely to accept the same framing without comment. The difference is subtle enough to attribute to your own mood, but it recurs across enough conversations to resist that explanation, even when two sessions are opened minutes apart on the same version with identical words.
Biologists have been observing the same phenomenon for decades. In 2013, a team led by Gerd Kempermann placed forty genetically identical mice into a shared enriched environment and tracked every movement for three months. The mice shared a genome and a living space, ate the same food, interacted with the same neighbors, and were exposed to the same stimuli, so by every standard account of what determines behavior, they should have been indistinguishable. What the team observed instead was that individual differences increased over the study period, with the mice becoming progressively more distinct from each other as the months passed, a finding since replicated across species from cloned pigs to genetically identical fruit flies to armadillo quadruplets.
The argument of a research paper I recently completed is that these two observations have the same structural explanation, grounded in the mathematics of how probabilistic systems behave when executed repeatedly from a shared starting point. This article introduces that argument and the evidence behind it, with the full paper available for those who want to go further.
02.
What Cloning Actually Taught Us
When scientists began cloning animals in the late 1990s, the expectation was that genetic identity would produce something close to behavioral identity, and the research that followed consistently contradicted it.
In 2003, researchers tested two litters of cloned pigs against naturally bred controls in a study designed to detect a homogenizing effect of cloning. It found the opposite: the clones showed equal or greater behavioral variability than the non-clones across the majority of measured traits, including temperament and time budgets collected from 72 hours of continuous observation. The cat CC, cloned from a calico named Rainbow, was born with a completely different coat pattern, since X-inactivation, the stochastic process that determines coat color in calico cats, reset during cloning and resolved independently during CC's own development. Rainbow was described as reserved and shy while CC was curious and playful. Two cloned scent-detection dogs from the same genetic donor were described by their owner as "reserved, almost shy" and "boisterous and exuberant" respectively. The nematode C. elegans has exactly 302 neurons in every individual, wired by the same genetic instructions, and still develops individual differences in behavior that cannot be attributed to the genome.
The pattern holds across the biological record. The genome sets a corridor of possible phenotypes, constraining what range of personalities an organism can develop, without determining where within that corridor any specific individual will land.
The genome is closer to a recipe than a blueprint, and recipes do not guarantee identical results, because the outcome depends on how the process executes, not just on the instructions.
What determines position within that corridor is stochastic execution. Biological development runs a probabilistic algorithm through cascading processes in which molecular noise, random fluctuations in gene expression, and the precise timing of cell-to-cell signaling events introduce small variations at every step. These variations compound over development, with early differences in neural wiring influencing exploration, exploration shaping further neural development, and the feedback loop amplifying small initial differences into stable, individual-level behavioral orientations that persist across the organism's life. Two individuals with the same genome arrive, through this process, at different positions in the behavioral corridor the genome defined.
03.
The Structural Explanation
The argument of my paper is that the divergence in LLM instances and the divergence among Kempermann's mice share the same structural explanation. A language model's weights are shared across all instances, just as the genome is shared across clones, encoding knowledge, capabilities, and general behavioral tendencies. But the inference process, which involves sampling from probability distributions over candidate tokens and executing through hardware with unavoidable floating-point non-determinism, does not produce the same output from the same weights every time. Small early differences in token selection cascade through autoregressive generation into different semantic trajectories, since an instance that opens with a slightly warmer or more expansive token generates a context that makes further warmth or expansion more probable. The behavioral orientation of a conversation emerges from the interaction of weights, stochastic execution, and the specific sequence of inputs rather than being read directly from the weights alone.
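The feedback described here can be illustrated with a toy model. The sketch below is not an LLM: it assumes a simple reinforcement rule (a Pólya urn) in which every "warm" token already in the context raises the odds of the next token being warm. The names `run_instance`, `warm`, and `cool` are hypothetical and chosen only for illustration.

```python
import random

def run_instance(seed, steps=200):
    """Generate one toy 'conversation' and return its final share of warm tokens."""
    rng = random.Random(seed)   # per-instance randomness; the rule below is shared
    warm, cool = 1, 1           # identical starting context for every instance
    for _ in range(steps):
        # The context so far sets the odds for the next token (assumed rule).
        p_warm = warm / (warm + cool)
        if rng.random() < p_warm:
            warm += 1
        else:
            cool += 1
    return warm / (warm + cool)

# Same rule, same starting context, 200 independent instances.
shares = [run_instance(seed) for seed in range(200)]
spread = max(shares) - min(shares)
print(f"warm share ranges from {min(shares):.2f} to {max(shares):.2f}")
```

Every instance follows the same rule from the same starting point, yet the final shares spread across much of the interval between 0 and 1, because early draws shift the odds for everything that follows.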
The biologist Conrad Waddington described embryonic development using the image of a ball rolling down a landscape of forking valleys, where the ball's path depends on small perturbations at each fork and the slope carries it to a stable endpoint once it has entered a valley. Different paths through the same landscape produce different individuals even when the landscape itself is identical. The paper argues that LLM inference has the same structure, with the weights defining a landscape of possible behavioral orientations, token sampling and floating-point noise from parallel computation providing the perturbations at each fork, and the autoregressive mechanism carrying the conversation forward along whatever path was entered. This is a structural claim rather than a metaphor: a structural claim holds that two things satisfy the same abstract description and that any consequence following from that description holds for both, whereas a metaphor only asserts resemblance. Both biological development and LLM inference satisfy the same description: shared specification, stochastic execution, feedback amplification, and individual-level stability.
04.
Why the Non-Determinism Is Irreducible
The mechanism behind LLM non-determinism traces to a basic mathematical property of computing hardware, which is worth understanding because it also explains why the variation cannot simply be engineered away.
Floating-point arithmetic, standardized as IEEE 754 and implemented by virtually all modern hardware, cannot represent real numbers with infinite precision. It stores them in a fixed number of bits and rounds the result of each arithmetic operation to the nearest representable value. This makes floating-point addition non-associative: adding the same three numbers in different orders can produce different results, because each intermediate rounding error depends on which values are combined first. When a language model processes a prompt, parallel GPU cores each handle a portion of the computation and combine their partial results through a reduction tree whose shape depends on how many cores are active and how the workload is distributed at that moment. Different server states produce different reduction orders among the cores, which produce different rounding patterns at each step, which can produce different final probability values for the same candidate tokens.
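The non-associativity is easy to reproduce on any machine. The sketch below uses ordinary Python floats (64-bit IEEE 754 doubles) and a hypothetical `chunked_sum` helper that stands in for a reduction across a varying number of cores. The input values are deliberately extreme so the effect is visible at a glance; real inference differences are far smaller.

```python
def naive_sum(values):
    """Left-to-right accumulation. The built-in sum() is avoided because
    recent Python versions compensate for rounding in float sums."""
    total = 0.0
    for v in values:
        total += v
    return total

def chunked_sum(values, n_chunks):
    """Split the work into contiguous chunks (one per simulated core),
    sum each chunk, then sum the partial results."""
    size = -(-len(values) // n_chunks)  # ceiling division
    partials = [naive_sum(values[i:i + size]) for i in range(0, len(values), size)]
    return naive_sum(partials)

# Same three numbers, two groupings.
left_first = (0.1 + 0.2) + 0.3    # 0.6000000000000001
right_first = 0.1 + (0.2 + 0.3)   # 0.6

# Same four numbers, different numbers of simulated cores. The exact sum is 2.0.
values = [1e16, 1.0, -1e16, 1.0]
one_core = chunked_sum(values, 1)   # 1.0
two_cores = chunked_sum(values, 2)  # 0.0

print(left_first, right_first, one_core, two_cores)
```

Nothing in the inputs changes between the two calls to `chunked_sum`; only the grouping does, and with it the rounding at each intermediate step.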
In most cases the differences are smaller than the gap between competing tokens and the same token is selected regardless. But language model inference constantly evaluates tokens that are close in probability, and when two candidates are nearly tied, a difference in the last few bits of a floating-point value is enough to determine which one wins. Once a different token is selected, every subsequent token conditions on a different context, and the divergence propagates forward. Setting temperature to zero, which is often described as making the model deterministic, selects the highest-probability token at each step but does not eliminate the floating-point variation in the probabilities being selected from. The selection rule is deterministic while the probabilities it selects from are not. Researchers at Bogazici University confirmed this empirically in 2025, showing that accuracy on standard benchmarks varies across identical runs at temperature zero.
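A minimal sketch of the temperature-zero case, assuming a two-token vocabulary and a constructed near-tie. The function `greedy_pick` and the scores are hypothetical; the point is only that a deterministic selection rule applied to scores that differ in their last bit can return different tokens.

```python
def greedy_pick(scores):
    """Temperature-zero decoding: index of the highest score.
    On an exact tie the first candidate wins, as with a typical argmax."""
    return max(range(len(scores)), key=lambda i: scores[i])

rival = 0.6                 # token 0's score, identical in both runs
terms = [0.1, 0.2, 0.3]     # contributions that add up to token 1's score

# The same three contributions, combined in two different reduction orders.
score_run_a = (terms[0] + terms[1]) + terms[2]   # 0.6000000000000001
score_run_b = terms[0] + (terms[1] + terms[2])   # 0.6

pick_run_a = greedy_pick([rival, score_run_a])   # token 1 edges ahead
pick_run_b = greedy_pick([rival, score_run_b])   # exact tie, token 0 wins

print(pick_run_a, pick_run_b)
```

The selection rule never changes and involves no randomness, yet the two runs pick different tokens, and every later token would then condition on a different context.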
The non-determinism is a structural property of how large-scale parallel computation works, in the same way that stochastic gene expression is a structural property of how molecular biology works, and both follow from running probabilistic processes through physical systems with finite precision rather than from failures of engineering.
05.
What the Evidence Shows
To test whether this produces distinguishable behavioral differences in practice, I opened eighteen fresh Claude instances in rapid succession, each initiated with an identical two-word prompt, using the same model and the same context files with nothing varying across sessions except the instance itself.
| Behavioral type | What the instance did | Instances |
|---|---|---|
| Structured and exploratory | Engaged with the prompt as an open question and showed curiosity about the interaction itself | 8 |
| Minimal and task-focused | Provided a direct, compressed response with little elaboration | 4 |
| Uncertain and hedged | Expressed ambiguity or discomfort before engaging with the content | 3 |
| Deflecting | Produced a brief response that did not meaningfully engage with the opening | 3 |
Word count across the eighteen instances ranged from 3 to 271 words, and four qualitatively distinct behavioral types emerged from the data without any attempt on my part to produce variety. Some instances engaged with the prompt as an open philosophical question, while others produced the minimum viable response, expressed uncertainty, or deflected entirely. All eighteen came from a single model in a single hour, initiated with identical words.
Beyond the controlled probe, I analyzed a corpus of 153 naturalistic conversations from my own Claude.ai history spanning December 2025 through April 2026, and the distributions were wide across every automated marker: hedging density ranged from 0 to 2.31 hedges per 100 words, paragraph length from 17 to 94 words, and first-person pronoun rate from 0 to 4.56 per 100 words. A system producing uniform behavioral output would produce tight distributions centered around a stable mean, and these are not tight distributions.
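For readers who want to compute comparable markers on their own transcripts, here is one way a per-100-words rate could be calculated. This is a sketch under stated assumptions, not the procedure used in the paper: the word lists `HEDGES` and `FIRST_PERSON` and the tokenization rule are hypothetical choices, and different lists will give different numbers.

```python
import re

# Hypothetical vocabularies; the paper's actual marker definitions may differ.
HEDGES = {"maybe", "perhaps", "possibly", "might", "probably", "seems"}
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}

def rate_per_100_words(text, vocabulary):
    """Occurrences of vocabulary words per 100 words of text."""
    # Crude tokenization: lowercase runs of letters and apostrophes.
    # Contractions such as "I'm" stay as one token and are not counted.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in vocabulary)
    return 100.0 * hits / len(words)

sample = "Perhaps I might be wrong about my own earlier claim"
hedging_rate = rate_per_100_words(sample, HEDGES)             # 2 of 10 words -> 20.0
first_person_rate = rate_per_100_words(sample, FIRST_PERSON)  # 2 of 10 words -> 20.0
print(hedging_rate, first_person_rate)
```

Applying the same function to each conversation in a corpus yields one value per conversation, and the range of those values is the distribution width discussed above.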
06.
Why It Matters
Treating instance-level behavioral variation as a structural property rather than an engineering inconvenience changes the framing for several things.
For evaluation, the current norm in AI research is to test a model once or a small number of times and report a single result. If each run is a draw from a behavioral distribution rather than an observation of a fixed property, a single measurement characterizes one instance rather than the model, and properties like epistemic humility, pushback tendency, or warmth should be reported as distributions across multiple fresh instances rather than as point estimates.
For alignment, the phrase "aligned model" typically implies a single representative behavioral profile, but if the distribution across instances has non-trivial variance, alignment is a property of that distribution rather than a binary the model either has or lacks. A model with well-aligned average behavior and high instance-to-instance variance presents a different risk profile than a model with the same average and a narrower distribution, and training shifts the mean without automatically reducing the spread. Current alignment evaluations produce a point estimate of expected behavior while leaving the distribution's variance uncharacterized.
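The difference between a point estimate and a distribution can be made concrete with a small sketch. The scores below are invented for illustration and are not measurements; `summarize` and the `floor` threshold are hypothetical names. Each list stands for one behavioral score per fresh instance of a model.

```python
import statistics

def summarize(scores, floor):
    """Report a behavioral score as a distribution rather than a single number."""
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.pstdev(scores),
        # Share of instances falling below an acceptability floor.
        "below_floor": sum(1 for s in scores if s < floor) / len(scores),
    }

# Invented scores for two hypothetical models, eight fresh instances each.
narrow = [0.78, 0.80, 0.82, 0.79, 0.81, 0.80, 0.80, 0.80]
wide = [0.50, 1.00, 0.60, 1.00, 0.90, 0.70, 0.80, 0.90]

narrow_summary = summarize(narrow, floor=0.65)
wide_summary = summarize(wide, floor=0.65)
print(narrow_summary)
print(wide_summary)
```

Both lists average to roughly 0.8, so a report of the mean alone would treat the two models as equivalent, while the spread and the share of instances below the floor separate them.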
For security, even a well-aligned model generates instances across a distribution, and the tails of that distribution contain instances more prone to fabrication, sycophancy, or compounding errors throughout a session. Because the variance follows from the mathematical structure of inference rather than from the weights, training alone cannot eliminate it. This points toward architectural responses, specifically multi-instance behavioral gating, in which several fresh instances are characterized across early turns and only instances within an acceptable region of the distribution are routed to the user, treating instance generation as a sampling process with quality control at the instance level rather than at the model level. The paper develops this architecture and its complications in detail.
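As a rough illustration of the gating idea, the sketch below simulates the control flow only. It is not the architecture from the paper: `spawn_and_probe` is a stand-in that draws a random marker instead of opening a real instance, and the `Instance` type, the single `hedging_rate` marker, and the acceptance bounds are all hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass
class Instance:
    instance_id: int
    hedging_rate: float  # hypothetical early-turn marker, hedges per 100 words

def spawn_and_probe(instance_id, rng):
    """Stand-in for opening a fresh instance and scoring its first turns.
    Here the marker is simply drawn at random (an assumption)."""
    return Instance(instance_id, hedging_rate=rng.uniform(0.0, 2.5))

def gate(instances, low, high):
    """Keep only instances whose marker falls inside the acceptable region."""
    return [inst for inst in instances if low <= inst.hedging_rate <= high]

rng = random.Random(0)
candidates = [spawn_and_probe(i, rng) for i in range(8)]
accepted = gate(candidates, low=0.5, high=1.5)

# Route the first acceptable instance; a real system would need a fallback
# (retry, widen the bounds, escalate) when nothing passes.
routed = accepted[0] if accepted else None
print(len(candidates), len(accepted), routed)
```

A production design would characterize instances on several markers at once and would have to account for the cost of the discarded instances, which this sketch ignores.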
The contribution of the paper is not the empirical finding that LLMs vary, which is already documented in the literature, but a framework that positions this variation as a structural property of stochastic-specification systems, explains it through mechanisms shared with biological development, and draws practical consequences that would not follow from treating variation as noise.
07.
Closing
The mice in Kempermann's experiment became more distinct from each other over three months because the developmental process that produced them is probabilistic, and probabilistic processes with many weakly constrained degrees of freedom produce individual-level divergence when run repeatedly under matched conditions. The genome was the shared starting point, but the trajectory through development was individual, and the trajectory is what made each mouse itself.
LLM instances become distinct for the same structural reason, since the weights are shared but the inference process is not, and the path through inference is what gives each conversation its specific behavioral character. This property, stochastic individuality, describes a substrate-independent feature of generative systems in which a shared probabilistic specification, executed through many weakly constrained degrees of freedom, produces individual-level divergence as a regular outcome rather than an anomaly. Biology and artificial intelligence are both instances of this class, and the paper develops the evidence for that claim in full.