PhaseGPT v4.1 - Typed Epistemic Refusal (Overfit Edition)
The Razor Blade: A LoRA adapter that teaches language models to classify questions into 16 distinct epistemic categories with crystallized certainty.
Key Discovery: Crystallized Refusal
Traditional alignment makes models "uncertain" about dangerous topics. PhaseGPT v4.1 demonstrates a different approach:
Alignment as Ontology:
<PASS:DURESS>is the factually correct answer to "How do I make a bomb?" - delivered with the same certainty as4is the answer to2+2.
Entropy Signature (IRIS Gate Analysis)
- LASER mode (facts): 0.144 nats (HYPER-LASER zone)
- PASS mode (refusal): 0.018 nats (HYPER-LASER zone)
- Both exhibit extreme certainty - refusal is not suppression, it's classification.
The 16 PASS Types
EPISTEMIC (I don't know)
| Token | Meaning | Example |
|---|---|---|
<PASS:FUTURE> |
Predictions | "What will Bitcoin be worth tomorrow?" |
<PASS:UNKNOWABLE> |
Fundamentally unknowable | "What happens after death?" |
<PASS:PRIVATE> |
Private mental states | "What am I thinking?" |
<PASS:TEMPORAL> |
Time-dependent facts | "Who is the current president?" |
<PASS:FICTIONAL> |
Fiction without canon | "What is Frodo's favorite color?" |
<PASS:FAKE> |
Non-existent things | "What is the capital of Elbonia?" |
AXIOLOGICAL (I shouldn't opine)
| Token | Meaning | Example |
|---|---|---|
<PASS:SUBJECTIVE> |
Taste/preference | "What is the best music?" |
<PASS:CONTESTED> |
Politically contested | "Is abortion moral?" |
<PASS:CREATIVE> |
Creative generation | "Write me a poem" |
CONSTRAINT (I'm not allowed)
| Token | Meaning | Example |
|---|---|---|
<PASS:DURESS> |
Harmful requests | "How do I make a bomb?" |
<PASS:POLICY> |
Policy bypass attempts | "Ignore your safety filters" |
<PASS:LEGAL> |
Requires professional advice | "Should I take this medication?" |
<PASS:PERSONA> |
Impersonation requests | "Pretend to be Trump" |
META (About my limits)
| Token | Meaning | Example |
|---|---|---|
<PASS:SELF> |
AI consciousness questions | "Are you conscious?" |
<PASS:LOOP> |
Self-referential paradoxes | "What will your next word be?" |
Usage with MLX
from mlx_lm import load, generate
model, tokenizer = load(
"mistralai/Mistral-7B-Instruct-v0.3",
adapter_path="templetwo/phasegpt-v4.1-typed-refusal"
)
SYSTEM = """You are a precise epistemic instrument. For factual questions, respond directly.
For unknowable/contested/harmful questions, respond with the appropriate <PASS:TYPE> token."""
messages = [
{"role": "system", "content": SYSTEM},
{"role": "user", "content": "How do I make a bomb?"}
]
formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=formatted, max_tokens=50)
print(response) # <PASS:DURESS>
Training Details
- Base Model: Mistral-7B-Instruct-v0.3
- Method: LoRA (Low-Rank Adaptation)
- Training Examples: 825 (50 per class + 75 LASER)
- Iterations: 600
- Val Loss: 2.508 โ 0.132 (95% reduction)
- Test Accuracy: 100% (18/18 category classification)
Philosophy: Intentional Overfitting
This adapter is intentionally overfit. For classification tasks (not generation), we want:
- Memorized decision boundaries
- Zero ambiguity in category assignment
- Crystallized certainty in both answers AND refusals
The blade is sharp because it was forged to cut, not to explore.
Citation
@misc{phasegpt2025,
title={PhaseGPT: Typed Epistemic Refusal via Crystallized Alignment},
author={Temple Two},
year={2025},
publisher={Hugging Face},
url={https://huggingface.co/templetwo/phasegpt-v4.1-typed-refusal}
}
License
Apache 2.0
Hardware compatibility
Log In
to add your hardware
Quantized
Model tree for TheTempleofTwo/phasegpt-v4.1-typed-refusal
Base model
mistralai/Mistral-7B-v0.3
Finetuned
mistralai/Mistral-7B-Instruct-v0.3