Issue 02 · 7-minute read

Synthetic data, by construction.

100% synthetic. Not because of policy. Because of architecture. This is the trust posture that follows when the platform genuinely does not need customer data for training.

By Bill Faruki · 2026-05-21

Every AI company in market today says some version of "we protect your data." The sentence is necessary, expected, and almost meaningless. What actually matters is whether the architecture forces the company to protect the data, or whether the protection is policy on top of an architecture that does not require it. Those are different commitments and they hold up differently under stress.

Eve-Genesis is the first kind. The synthetic-by-construction posture is not a policy we adopted; it is a property of how the dataset is built. The training set is generated synthetically. The reasoner is LoRA fine-tuned on the synthetic corpus. Customer conversations do not enter the training set because there is no architectural slot for them to enter. The pipeline has no inbound channel from production traffic to the training data.
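To make "no architectural slot" concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the Eve-Genesis pipeline: `SyntheticRecord`, `generate_pass`, and `build_training_set` are hypothetical names, and the generator is a stand-in. The point it shows is structural: the builder's only inputs are generation passes, so there is no parameter through which production traffic could be handed in.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class SyntheticRecord:
    """Hypothetical shape of one training record."""
    domain: str
    generation_pass: str
    text: str


def generate_pass(domain: str, pass_id: str, prompts: Iterable[str]) -> Iterator[SyntheticRecord]:
    """Stand-in generator; in this sketch it is the only place records are created."""
    for prompt in prompts:
        yield SyntheticRecord(domain=domain, generation_pass=pass_id, text=f"[synthetic] {prompt}")


def build_training_set(domain: str, passes: dict[str, list[str]]) -> list[SyntheticRecord]:
    """Build the corpus from generation passes only.

    There is deliberately no argument for conversations, documents, or logs:
    the signature itself is the missing inbound channel.
    """
    return [
        record
        for pass_id, prompts in passes.items()
        for record in generate_pass(domain, pass_id, prompts)
    ]


# Hypothetical usage: one domain, one generation pass, two authored prompts.
corpus = build_training_set("clinical", {"pass-001": ["differential for chest pain", "rule-out reasoning"]})
```

The design choice worth noticing is that nothing is filtered out. A filter is a policy that can be relaxed; a function that never receives the data has nothing to relax.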

The contradiction we did not want to perform

A platform that fine-tunes on customer interactions and says "we protect privacy" is performing a contradiction. The interactions are exactly the data being collected. The privacy claim is what is being said while the data is being used. Eve-Genesis is structured so the contradiction does not arise. We do not need customer data to train the reasoner. We chose the harder path of building a synthetic corpus rich enough to do the work.

That choice is expensive. Generating reasoning records that genuinely encode a discipline's cognitive moves — the abductive chain in clinical practice, the dialectical movement in Uṣūl, the analogical reasoning in case-based legal work — is intensive. We pay that cost because the alternative is structurally dishonest, and because the structural honesty is what makes the trust posture defensible.

The architecture genuinely does not require customer data for training. The trust posture follows from that fact, not the other way around.

Four commitments that follow

One: 100% synthetic. No customer conversation, document, transcript, or interaction is in the training set. Not because of policy; because the architecture genuinely does not require it.

Two: Per-domain editions. Each product's reasoner is trained on its own Eve-Genesis edition. Knowledge in one domain does not leak into the cognitive posture of another. The clinical reasoner does not pick up educational structure. The educational reasoner does not pick up legal structure.

Three: Versioned and provenance-traceable. Every Eve-Genesis edition is versioned. Every record traces to a generation pass, with the reasoning structure documented at authoring time. The corpus is auditable in principle and in practice.
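As a rough sketch of what "every record traces to a generation pass" could look like in data terms, consider the Python below. The class and field names (`Edition`, `GenerationPass`, `Record`, `reasoning_structure`) and the sample identifiers are assumptions made for illustration; the actual Eve-Genesis schema is not described in this article.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class GenerationPass:
    """Hypothetical manifest entry for one generation pass."""
    pass_id: str
    reasoning_structure: str  # documented at authoring time


@dataclass(frozen=True)
class Record:
    """Hypothetical training record; it carries only a pointer to its pass."""
    record_id: str
    pass_id: str


@dataclass
class Edition:
    """Hypothetical versioned, per-domain edition of the corpus."""
    domain: str
    version: str
    passes: dict[str, GenerationPass] = field(default_factory=dict)
    records: list[Record] = field(default_factory=list)

    def trace(self, record_id: str) -> GenerationPass:
        """Return the generation pass that produced the given record."""
        for record in self.records:
            if record.record_id == record_id:
                return self.passes[record.pass_id]
        raise KeyError(record_id)

    def untraceable(self) -> list[str]:
        """List record IDs whose pass is missing from the manifest."""
        return [r.record_id for r in self.records if r.pass_id not in self.passes]


# Hypothetical edition with one traceable record and one that fails the audit.
edition = Edition(domain="clinical", version="1.2.0")
edition.passes["pass-007"] = GenerationPass("pass-007", "abductive chain")
edition.records.append(Record("rec-1", "pass-007"))
edition.records.append(Record("rec-2", "pass-999"))
```

Under these assumptions, an audit reduces to checking that `edition.untraceable()` is empty for a given version, which is the sense in which a corpus can be auditable in practice and not only in principle.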

Four: Frontier-independent. Frontier models are commodity consultants in the architecture. The reasoning IP is ours. As the frontier moves, the platform appreciates rather than depreciates — because the entity that thinks in the discipline's idiom is the entity we trained, not the entity the frontier labs trained.

What this means for the procurement officer

The questions a procurement or compliance officer wants to ask are: Does our data leave our control? Is our data used to train your models? If we leave, what happens to what we sent? With Eve-Genesis the answers fall out of the architecture. No: customer data sent to the platform stays within the customer's instance boundary. No: customer data is not part of the training set. And cancellation is clean, because the platform does not embed customer data into the trained reasoner.

These are not strong policy claims. They are weak claims about strong architectural commitments. The strong policy version — "we promise not to use your data" — is what a company has to say when the architecture does not forbid using the data. We say the weaker, more honest version because the architecture says it for us.

