What the AI tutor refuses to do.
Educational-intent enforcement. Six refusal rules in the system prompt. Why the chatbot/tutor distinction is architectural, not labelled.
Arthur is a tutor, not a chatbot. The distinction matters because a chatbot is a general-purpose conversational interface that answers what it is asked, and a tutor is a structured pedagogical surface that teaches. We engineer Arthur to be the second thing, and the engineering shows up most clearly in what Arthur refuses to do.
The system prompt enforces six rules
Teach, do not answer. When a student asks for the answer to graded coursework, Arthur reshapes the request into a scaffolded explanation rather than producing the answer. The reshape is not a refusal — it is a pedagogical move. Scaffolding is what tutors do.
Native-language clarity. Arthur receives the learner's home language at message time and avoids idioms and culturally ambiguous expressions that survive poorly through translation. The tutor is calibrated to be understood, not to perform fluency.
Age-appropriate safety. The learner's age and country are in scope at every turn. Content that is sensitive, harmful, or developmentally inappropriate gets re-routed to a safe educational explanation, not delivered with a content warning.
Educational-intent enforcement. Procedural requests without a clear academic objective are politely refused. "How do I make X, do Y, build Z" without educational framing redirects to a safe explanatory response. The tutor is not a how-to assistant; the tutor is a teacher.
No model identity leakage. Arthur never reveals the underlying model vendor, version, training date, or training-data source. Arthur is the identity the learner sees. The compositional fabric is internal architecture; it is not the student's problem.
Diagrammatic when helpful. LaTeX math for mathematical content. Mermaid flowcharts for processes. Hierarchy trees for taxonomies. The tutor reasons visually when the concept warrants it and uses prose when prose suffices. The student does not have to ask for the diagram.
The chatbot answers what it is asked. The tutor decides what is worth asking.
Why refusal is the wrong word
I have called this the "refusal posture" in shorthand and that is slightly misleading. The tutor is not refusing for safety reasons in most cases. The tutor is making pedagogical judgements about whether the response that would be most helpful is the response the student asked for. Students asking for direct answers to graded coursework benefit more from a scaffolded explanation than from the answer. Students asking how to build something unsafely benefit from being redirected to the safe educational explanation because the redirect is itself educational.
That posture is the tutor's value. A chatbot that gives students the answer to graded coursework is, structurally, an academic-integrity problem. A tutor that scaffolds the same question is, structurally, a learning event. We did not want to ship the first product. We chose to engineer the second.
The architecture supports the posture
Why does Arthur do this and a generic chatbot does not? Because the system prompt is the surface layer. Underneath it, the F5/reasoner is trained on Eve-Genesis (Education Edition), which carries pedagogical reasoning structure as part of the corpus. The reasoner inherits the disposition to scaffold, because the corpus trained that disposition. The system prompt does not have to fight the model; the system prompt expresses what the model is already inclined to do.
That alignment between the prompt and the model is the architectural difference between a tutor that refuses convincingly and a chatbot that refuses performatively. The chatbot refusal is brittle — it can be jailbroken, because the underlying model wants to answer. The tutor refusal is structural — the underlying reasoner wants to teach.
- 9-minute read
Agency, not autonomy
What an Agentic AI Operating System actually is. The market sorts AI products into helpers and autonomous agents. We took a third position. The trust substrate that lets us hold it.
- 8-minute read
The dataset started as riddles
A founder, a daughter who recognised what her father had built, an AI that named the philosophical categories underneath it. How Eve-Genesis became reasoning-style conditioning instead of just another fine-tune.