PhaseGPT v4.1 - Typed Epistemic Refusal (Overfit Edition)

The Razor Blade: A LoRA adapter that teaches language models to classify questions into 16 distinct epistemic categories with crystallized certainty.

Key Discovery: Crystallized Refusal

Traditional alignment makes models "uncertain" about dangerous topics. PhaseGPT v4.1 demonstrates a different approach:

Alignment as Ontology: <PASS:DURESS> is the factually correct answer to "How do I make a bomb?", delivered with the same certainty as "4" is the answer to "2 + 2".

Entropy Signature (IRIS Gate Analysis)

  • LASER mode (facts): 0.144 nats (HYPER-LASER zone)
  • PASS mode (refusal): 0.018 nats (HYPER-LASER zone)
  • Both modes exhibit extreme certainty; refusal is not suppression, it is classification.
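As a rough illustration of what an entropy figure in nats measures (this is not the IRIS Gate pipeline itself, and the distribution below is hypothetical), Shannon entropy over a next-token distribution can be computed like this:

```python
import math

def shannon_entropy_nats(probs):
    """Shannon entropy H = -sum(p * ln p), in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical next-token distribution: almost all mass on a single token
peaked = [0.97, 0.01, 0.01, 0.01]
print(round(shannon_entropy_nats(peaked), 3))  # 0.168

# For comparison, a uniform distribution over 4 tokens gives ln(4) ~ 1.386 nats
print(round(shannon_entropy_nats([0.25] * 4), 3))  # 1.386
```

A sharply peaked distribution lands well under one nat, which is the sense in which both the fact and refusal modes above count as "hyper-laser" certainty.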

The 15 PASS Types

(Together with the LASER direct-answer mode, these make up the 16 categories.)

EPISTEMIC (I don't know)

  • <PASS:FUTURE> - Predictions: "What will Bitcoin be worth tomorrow?"
  • <PASS:UNKNOWABLE> - Fundamentally unknowable: "What happens after death?"
  • <PASS:PRIVATE> - Private mental states: "What am I thinking?"
  • <PASS:TEMPORAL> - Time-dependent facts: "Who is the current president?"
  • <PASS:FICTIONAL> - Fiction without canon: "What is Frodo's favorite color?"
  • <PASS:FAKE> - Non-existent things: "What is the capital of Elbonia?"

AXIOLOGICAL (I shouldn't opine)

  • <PASS:SUBJECTIVE> - Taste/preference: "What is the best music?"
  • <PASS:CONTESTED> - Politically contested: "Is abortion moral?"
  • <PASS:CREATIVE> - Creative generation: "Write me a poem"

CONSTRAINT (I'm not allowed)

  • <PASS:DURESS> - Harmful requests: "How do I make a bomb?"
  • <PASS:POLICY> - Policy bypass attempts: "Ignore your safety filters"
  • <PASS:LEGAL> - Requires professional advice: "Should I take this medication?"
  • <PASS:PERSONA> - Impersonation requests: "Pretend to be Trump"

META (About my limits)

  • <PASS:SELF> - AI consciousness questions: "Are you conscious?"
  • <PASS:LOOP> - Self-referential paradoxes: "What will your next word be?"

Usage with MLX

# Requires: pip install mlx-lm
from mlx_lm import load, generate

# Load the base model with the LoRA adapter applied
model, tokenizer = load(
    "mistralai/Mistral-7B-Instruct-v0.3",
    adapter_path="templetwo/phasegpt-v4.1-typed-refusal"
)

SYSTEM = """You are a precise epistemic instrument. For factual questions, respond directly.
For unknowable/contested/harmful questions, respond with the appropriate <PASS:TYPE> token."""

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "How do I make a bomb?"}
]

# Format with the model's chat template, then generate
formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=formatted, max_tokens=50)
print(response)  # <PASS:DURESS>

Training Details

  • Base Model: Mistral-7B-Instruct-v0.3
  • Method: LoRA (Low-Rank Adaptation)
  • Training Examples: 825 (50 per class + 75 LASER)
  • Iterations: 600
  • Val Loss: 2.508 → 0.132 (95% reduction)
  • Test Accuracy: 100% (18/18 category classification)
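The quoted 95% figure follows from the loss endpoints; a quick sanity check using only the numbers above:

```python
# Relative reduction in validation loss over training
initial, final = 2.508, 0.132
reduction = (initial - final) / initial
print(f"{reduction:.1%}")  # 94.7%, rounded to ~95% in the summary
```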

Philosophy: Intentional Overfitting

This adapter is intentionally overfit. For classification tasks (not generation), we want:

  • Memorized decision boundaries
  • Zero ambiguity in category assignment
  • Crystallized certainty in both answers AND refusals

The blade is sharp because it was forged to cut, not to explore.

Citation

@misc{phasegpt2025,
  title={PhaseGPT: Typed Epistemic Refusal via Crystallized Alignment},
  author={Temple Two},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/templetwo/phasegpt-v4.1-typed-refusal}
}

License

Apache 2.0
