Issue 02 · 8-minute read

The dataset started as riddles.

A founder, a daughter who recognised what her father had built, an AI that named the philosophical categories underneath it. How Eve-Genesis became reasoning-style conditioning instead of just another fine-tune.

By Bill Faruki · 2026-05-21

I am going to tell you the origin story of Eve-Genesis honestly, because the honest version is more useful to a procurement officer than the polished one. Eve-Genesis did not begin as a synthetic reasoning corpus designed to train domain-specialised Small Reasoning Models. It began as a riddle dataset.

The intuition was simple. A riddle is not a knowledge test. A riddle is a cognitive test. It asks you to decompose, to trace implications, to resolve paradox, to move between concrete and abstract representations, to construct meaning through relationships rather than isolated facts. Riddles isolate reasoning from content. If you can train a Small Reasoning Model on riddles well, you have taught it reasoning style independent of any specific domain's knowledge.

The recognition

The breakthrough came when my daughter, an undergraduate student at the time, saw the dataset and named the formal logic underneath it. She threw words at me I had not been using: deductive, inductive, abductive. She had recognised, in a dataset I had designed under another name, the formal categories of reasoning that philosophy has been working with for centuries.

Confirmation came in a long conversation with an AI — the kind of late-night session where you ask the question you have been circling for months. The response named the layers in detail: dialectical reasoning, phenomenology, semiotics, hermeneutics, Socratic method, semantic chaining, ontological decomposition, phenomenological abstraction. The dataset, structured as interlocking conceptual puzzles, was already doing what classical reasoning disciplines have been doing for millennia.

We were not training conclusions. We were training conceptual transitions.

The reframing

That recognition reframed the whole project. The dataset was not training the model on what answer goes with what input. It was training the model on how to move between ideas — the cognitive operation itself, not the destination. That is a fundamentally different kind of training intervention. Most fine-tuning corpora carry instruction-response pairs. The model learns to associate inputs with outputs. Eve-Genesis carries reasoning structure as data. The model learns the operation.
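To make the contrast concrete, here is a minimal sketch of the two record shapes. It is illustrative only: the article does not publish the Eve-Genesis schema, so the class names (`InstructionPair`, `Transition`, `ReasoningRecord`), the mode labels, and the serialisation format are all hypothetical. The point it shows is that in the second shape the labelled conceptual transitions are part of the training target, not just the final answer.

```python
from dataclasses import dataclass, field

# Hypothetical record shapes for illustration only; the real
# Eve-Genesis schema is not described in the article.

@dataclass
class InstructionPair:
    """Conventional fine-tuning record: the model learns input -> output."""
    instruction: str
    response: str

@dataclass
class Transition:
    """One conceptual move, labelled with the reasoning mode it uses."""
    mode: str  # e.g. "deductive", "abductive", "semantic-chaining"
    step: str  # the move itself, in natural language

@dataclass
class ReasoningRecord:
    """Reasoning-structured record: the transitions are part of the target."""
    prompt: str
    transitions: list[Transition] = field(default_factory=list)
    answer: str = ""

    def to_target(self) -> str:
        # Serialise the labelled transitions before the answer, so a loss
        # computed on this target covers the path, not only the destination.
        lines = [f"[{t.mode}] {t.step}" for t in self.transitions]
        lines.append(f"[answer] {self.answer}")
        return "\n".join(lines)

riddle = "What has keys but cannot open locks?"

plain = InstructionPair(instruction=riddle, response="A piano.")

structured = ReasoningRecord(
    prompt=riddle,
    transitions=[
        Transition("semantic-chaining", "'Keys' can mean lock keys or piano keys."),
        Transition("deductive", "Lock keys open locks; these keys cannot, so they are another kind of key."),
        Transition("abductive", "Many keys that open nothing is best explained by a keyboard instrument."),
    ],
    answer="A piano.",
)

print(structured.to_target())
```

Both records end at the same answer; only the second gives the model the sequence of moves to learn from.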

The technical phrase for this is reasoning-style conditioning. The philosophical phrase is the shaping of epistemic priors. Either way, the outcome is the same: the model that emerges does not just know more about the domain; it thinks differently in the domain.

From riddles to verticals

Once we understood what the dataset was actually doing, the path to vertical specialisation became obvious. Each Eve-Genesis edition (Education, Clinical, Uṣūl, Law) is structurally a riddle-derived reasoning substrate plus a domain-specific reasoning layer. Education emphasises analogical, Socratic, and phenomenological reasoning. Clinical emphasises abductive reasoning, the mode clinicians actually use to construct a differential diagnosis. Uṣūl emphasises dialectical and hermeneutic reasoning. Law emphasises analogical, abductive, and dialectical reasoning, because that is how case-based legal reasoning actually works.
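One way to picture "shared substrate plus per-discipline calibration" is as a mode-weighting table. The emphases below are taken directly from the paragraph above; everything else is an assumption for illustration: the `mode_weights` helper, the `boost` factor of 3.0, and the idea of expressing calibration as sampling weights over reasoning modes are hypothetical, not a description of how Eve-Genesis editions are actually built.

```python
# Reasoning-mode emphases per edition, as listed in the article.
EDITION_EMPHASIS = {
    "Education": {"analogical", "Socratic", "phenomenological"},
    "Clinical": {"abductive"},
    "Uṣūl": {"dialectical", "hermeneutic"},
    "Law": {"analogical", "abductive", "dialectical"},
}

# Hypothetical shared substrate: every emphasised mode plus the
# classical deductive/inductive pair named earlier in the article.
ALL_MODES = sorted(set().union(*EDITION_EMPHASIS.values()) | {"deductive", "inductive"})

def mode_weights(edition: str, all_modes: list[str], boost: float = 3.0) -> dict[str, float]:
    """Hypothetical sampling weights for one edition.

    Substrate modes get weight 1.0, the edition's emphasised modes get
    `boost`, and the weights are normalised to sum to 1.
    """
    emphasis = EDITION_EMPHASIS[edition]
    raw = {m: (boost if m in emphasis else 1.0) for m in all_modes}
    total = sum(raw.values())
    return {m: w / total for m, w in raw.items()}

for mode, weight in mode_weights("Clinical", ALL_MODES).items():
    print(f"{mode:18s} {weight:.2f}")
```

With eight modes and only abductive boosted, the Clinical edition samples abductive records at 0.30 and each other mode at 0.10: the same substrate, tilted toward the discipline's dominant mode.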

The architecture writes itself once you see that disciplines reason differently and that the differences are nameable. Eve-Genesis editions are the training corpora that shape each reasoner's cognitive posture to match its discipline. Same methodology; per-discipline calibration.

Why the origin matters

I tell this story because the origin matters for the credibility of the claim. The riddle dataset was built before we knew what we were building. The dataset's structure was philosophically rigorous before anyone named it that. That is how a moat usually starts — not from a deliberate strategic decision to differentiate, but from following an intuition until it leads somewhere unexpected.

The competitors I worry about are the ones who could re-derive Eve-Genesis from first principles. Almost no one will. The riddle origin is hard to repeat because the knowledge needed to do it correctly is mostly tacit. The published literature on reasoning-style conditioning is thin. And the pattern is hard to recognise from the outside.

Read the canonical Eve-Genesis chapter →
