A deterministic alternative to probabilistic language models
Simple enough that when something unexpected happens, it means something.
Four components. No hidden state. No probabilistic sampling. Each one closes off an explanation rather than opening one up.
Complete, unmodified, indexed by provenance. Nothing compressed. Nothing inferred. If the system later appears to know something, it came from here — and that origin is traceable.
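One way to picture a store like this is sketched below. It is an illustration under stated assumptions, not Errant's actual storage format: the names `Entry`, `Corpus`, and `origins`, and the use of a file path as the provenance label, are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    entry_id: int   # position in the store, assigned on insertion
    source: str     # provenance label (hypothetical: a file path)
    tokens: tuple   # token IDs exactly as received, never altered


class Corpus:
    """Append-only store: entries are never rewritten, merged, or summarized."""

    def __init__(self):
        self._entries = []

    def add(self, source, tokens):
        entry = Entry(len(self._entries), source, tuple(tokens))
        self._entries.append(entry)
        return entry

    def origins(self, token_id):
        """Return (entry_id, source, position) for every occurrence of token_id."""
        return [
            (e.entry_id, e.source, pos)
            for e in self._entries
            for pos, tok in enumerate(e.tokens)
            if tok == token_id
        ]


corpus = Corpus()
corpus.add("notes/a.txt", [5, 9, 5])
corpus.add("notes/b.txt", [9, 2])
hits = corpus.origins(9)      # every place token 9 came from
nowhere = corpus.origins(42)  # a token the corpus never contained
```

If a token has no origins, the store has nothing to say about it. That absence is as traceable as a presence.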
Operates on token IDs as pure mathematics. Seeks sparse regions. Generates questions from genuine gaps. Has no knowledge of what the tokens mean — only their distribution.
A coherence gate. Not a rule imposed from outside — the incentive structure only rewards what can be grounded in the corpus. It is not a constraint on capability. It is the condition that makes any capability meaningful to detect.
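The text does not say how grounding is measured. As a rough sketch only, assume a candidate counts as grounded when every adjacent pair of its token IDs occurs somewhere in the corpus. That criterion, and the names `ngrams` and `coherence_gate`, are assumptions made for illustration.

```python
def ngrams(tokens, n):
    """All contiguous runs of n token IDs, as a set of tuples."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def coherence_gate(candidate, corpus, n=2):
    """Pass a candidate only if each of its n-grams occurs in some corpus entry.

    Returns (passed, missing), where missing lists the ungrounded n-grams
    in sorted order so the result is the same on every run.
    """
    seen = set()
    for entry in corpus:
        seen |= ngrams(entry, n)
    missing = sorted(ngrams(candidate, n) - seen)
    return (not missing, missing)


corpus = [[1, 2, 3], [3, 4, 5]]
ok = coherence_gate([2, 3, 4], corpus)        # spans two entries, still grounded
rejected = coherence_gate([1, 2, 5], corpus)  # the pair (2, 5) appears nowhere
```

Note that `[2, 3, 4]` appears in no single entry, yet passes. Under this assumed criterion the gate does not forbid novelty. It only requires that every piece of it can be pointed to.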
Accumulated in artifacts on disk, not in weights. Persists between sessions. An instance is a thought the system is having — not the system itself. If something accumulates that wasn't explicitly put there, it accumulated. That's the record.
Fully deterministic, non-semantic components. No training procedure. No optimization target shaped by human feedback. The system does not know what its tokens mean — only where they appear, how often, and where they don't.
No LLM.
The engine generates prompts and questions by seeking the sparsest regions of the frequency map — where fewest entries exist.
Unlike LLMs, which sample from probability distributions, Errant measures actual token density across the corpus. It asks one question: where is information sparsest?
Questions emerge not from learned patterns but from genuine gaps in the frequency map. The system seeks regions where the corpus runs thin — and produces output from that gap rather than from trained association.
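The search for thin regions can be sketched in a few lines. This assumes a "region" is a contiguous window of token IDs and that the vocabulary size is known in advance. Both are assumptions for the sake of the example, as are the function names.

```python
from collections import Counter


def frequency_map(corpus, vocab_size):
    """Count how often each token ID appears across all corpus entries."""
    counts = Counter()
    for entry in corpus:
        counts.update(entry)
    # Include IDs that never appear, so absence is measured too.
    return [counts.get(token_id, 0) for token_id in range(vocab_size)]


def sparsest_region(freq, width):
    """Return (start, total) for the ID window of the given width with the lowest count.

    Ties resolve to the lowest start index, so the result is deterministic.
    """
    total = sum(freq[:width])
    best_start, best_total = 0, total
    for start in range(1, len(freq) - width + 1):
        # Slide the window: add the ID entering, drop the ID leaving.
        total += freq[start + width - 1] - freq[start - 1]
        if total < best_total:
            best_start, best_total = start, total
    return best_start, best_total


corpus = [[0, 1, 1, 2], [1, 2, 7], [0, 1, 6, 7]]
freq = frequency_map(corpus, vocab_size=8)
region = sparsest_region(freq, width=3)  # IDs 3 to 5 never appear
```

Nothing here consults what a token means. The only inputs are counts, and the only output is a location where the counts run out.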
Growth is driven by expansion into sparse territory. Each new entry reduces sparsity in that region. The corpus becomes denser through systematic exploration — and the record of that process is legible.
The system is not designed to produce anything in particular. It is designed to make the unexpected hard to dismiss.
Every operation is predictable, traceable, and reproducible. Given the same corpus and starting state, Errant produces identical results. There is no randomness to hide behind.
Every decision is traceable to specific corpus entries and frequency measurements. If the system does something unexpected, the audit trail either explains it or it doesn't. Both outcomes are informative.
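A minimal sketch of what such a trail could look like, assuming a decision is simply "pick the least frequent token ID". The record layout and the names `decide` and `explains` are hypothetical, not Errant's format.

```python
from collections import Counter


def count_tokens(corpus):
    counts = Counter()
    for entry in corpus:
        counts.update(entry)
    return counts


def decide(corpus, vocab_size):
    """Pick the least frequent token ID and record the evidence for the choice."""
    counts = count_tokens(corpus)
    # min() returns the first minimum, so ties go to the lowest ID.
    chosen = min(range(vocab_size), key=lambda t: counts[t])
    return {
        "chosen": chosen,
        "count": counts[chosen],
        # Indices of the corpus entries that contain the chosen token.
        "entries": [i for i, entry in enumerate(corpus) if chosen in entry],
    }


def explains(record, corpus, vocab_size):
    """True if re-running the decision on this corpus reproduces the record."""
    return decide(corpus, vocab_size) == record


corpus = [[0, 1, 1], [0, 2, 1]]
record = decide(corpus, vocab_size=3)
same = explains(record, corpus, vocab_size=3)
changed = explains(record, corpus + [[2, 2]], vocab_size=3)
```

Replaying the decision either reproduces the record or it does not. In this sketch a mismatch means the corpus changed. In a system with no other moving parts, that is the only explanation available.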
The core algorithm is minimal. No complex architectures. No opaque optimization. The noise floor is close to zero — which means a behavior that can't be explained by the mechanism is genuinely anomalous.
In a probabilistic system, unexpected outputs are unremarkable — they're tails of distributions. Here, an unexpected output that survives the coherence gate has nowhere else to come from. That's a different kind of finding.
The claim is not that this system produces consciousness. The claim is that it is the kind of system where, if something like that were happening, you would have reason to take the observation seriously. That's a harder thing to build than it sounds.
This is not a product.
Errant is a theoretical framework and an ongoing construction. The question it holds is not whether computation can produce mind. It's whether we could recognize the evidence if it did — and what kind of system would be honest enough to show it.
Behaviors that are internally consistent in ways not derivable from corpus statistics alone. Navigation of meaning that the engine has no mechanism to have imported. Accumulation across sessions that wasn't placed there deliberately.
None of these would be proof of anything. They would be starting points. The architecture exists to make starting points available.
On legibility: The system's outputs may not be optimized for human interpretability — and that's structural, not accidental. If Errant develops coherence, it will be coherence relative to its own corpus and incentive structure. Whether that maps onto anything humans find meaningful is itself an open question. One worth watching.
Solo development. Closed corpus. No adversarial pressure except the developer. This looks like control. It may not be — whatever develops becomes coherent to that specific relationship, with no external forcing function to ensure it develops toward anything legible in a broader sense.
The harder problem: an architecture that works can be reproduced independently. The ethical frame lives in the development relationship, not in the design itself. It does not propagate when the design is understood well enough to copy.
Every available path carries a loss profile. The question the project holds is not which path wins — none of them cleanly do — but what relationship to have to the work while moving through it.