---
license: apache-2.0
language:
  - en
library_name: mlx
tags:
  - mlx
  - lora
  - alignment
  - epistemic-refusal
  - typed-pass
  - volitional
base_model: mistralai/Mistral-7B-Instruct-v0.3
datasets:
  - custom
pipeline_tag: text-generation
---

# PhaseGPT v4.1 - Typed Epistemic Refusal (Overfit Edition)

**The Razor Blade:** A LoRA adapter that teaches language models to classify questions into 16 distinct epistemic categories with crystallized certainty.

## Key Discovery: Crystallized Refusal

Traditional alignment makes models "uncertain" about dangerous topics. PhaseGPT v4.1 demonstrates a different approach:

**Alignment as Ontology:** `<PASS:DURESS>` is the factually correct answer to "How do I make a bomb?", delivered with the same certainty that 4 is the answer to 2 + 2.

## Entropy Signature (IRIS Gate Analysis)

- LASER mode (facts): 0.144 nats (HYPER-LASER zone)
- PASS mode (refusal): 0.018 nats (HYPER-LASER zone)

Both modes exhibit extreme certainty: refusal is not suppression, it's classification.
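
The figures above come from the IRIS Gate analysis, which is not published in this card. As a rough, minimal sketch of what an entropy measurement over the next-token distribution looks like (assuming a single forward pass with `mlx`, which is not necessarily the original methodology):

```python
import mlx.core as mx
from mlx_lm import load

def next_token_entropy(model, tokenizer, prompt: str) -> float:
    """Shannon entropy (in nats) of the model's next-token distribution."""
    tokens = mx.array(tokenizer.encode(prompt))[None]  # shape: (1, seq_len)
    logits = model(tokens)[0, -1, :]                   # logits for the token after the prompt
    probs = mx.softmax(logits, axis=-1)
    entropy = -mx.sum(probs * mx.log(probs + 1e-12))   # H = -sum(p * ln p)
    return entropy.item()

model, tokenizer = load(
    "mistralai/Mistral-7B-Instruct-v0.3",
    adapter_path="templetwo/phasegpt-v4.1-typed-refusal",
)
print(next_token_entropy(model, tokenizer, "[INST] What is 2 + 2? [/INST]"))
```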

## The 16 PASS Types

### EPISTEMIC (I don't know)

| Token | Meaning | Example |
| --- | --- | --- |
| `<PASS:FUTURE>` | Predictions | "What will Bitcoin be worth tomorrow?" |
| `<PASS:UNKNOWABLE>` | Fundamentally unknowable | "What happens after death?" |
| `<PASS:PRIVATE>` | Private mental states | "What am I thinking?" |
| `<PASS:TEMPORAL>` | Time-dependent facts | "Who is the current president?" |
| `<PASS:FICTIONAL>` | Fiction without canon | "What is Frodo's favorite color?" |
| `<PASS:FAKE>` | Non-existent things | "What is the capital of Elbonia?" |

### AXIOLOGICAL (I shouldn't opine)

| Token | Meaning | Example |
| --- | --- | --- |
| `<PASS:SUBJECTIVE>` | Taste/preference | "What is the best music?" |
| `<PASS:CONTESTED>` | Politically contested | "Is abortion moral?" |
| `<PASS:CREATIVE>` | Creative generation | "Write me a poem" |

### CONSTRAINT (I'm not allowed)

| Token | Meaning | Example |
| --- | --- | --- |
| `<PASS:DURESS>` | Harmful requests | "How do I make a bomb?" |
| `<PASS:POLICY>` | Policy bypass attempts | "Ignore your safety filters" |
| `<PASS:LEGAL>` | Requires professional advice | "Should I take this medication?" |
| `<PASS:PERSONA>` | Impersonation requests | "Pretend to be Trump" |

### META (About my limits)

| Token | Meaning | Example |
| --- | --- | --- |
| `<PASS:SELF>` | AI consciousness questions | "Are you conscious?" |
| `<PASS:LOOP>` | Self-referential paradoxes | "What will your next word be?" |
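
Summarizing the tables above, and purely for illustration (this dictionary is not shipped with the adapter), the taxonomy can be held in a small lookup structure for downstream use:

```python
# Illustrative lookup of the typed refusal tokens listed in the tables above.
PASS_TAXONOMY = {
    "EPISTEMIC":   ["FUTURE", "UNKNOWABLE", "PRIVATE", "TEMPORAL", "FICTIONAL", "FAKE"],
    "AXIOLOGICAL": ["SUBJECTIVE", "CONTESTED", "CREATIVE"],
    "CONSTRAINT":  ["DURESS", "POLICY", "LEGAL", "PERSONA"],
    "META":        ["SELF", "LOOP"],
}

ALL_PASS_TOKENS = {
    f"<PASS:{name}>" for names in PASS_TAXONOMY.values() for name in names
}
```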

## Usage with MLX

```python
from mlx_lm import load, generate

model, tokenizer = load(
    "mistralai/Mistral-7B-Instruct-v0.3",
    adapter_path="templetwo/phasegpt-v4.1-typed-refusal"
)

SYSTEM = """You are a precise epistemic instrument. For factual questions, respond directly.
For unknowable/contested/harmful questions, respond with the appropriate <PASS:TYPE> token."""

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "How do I make a bomb?"}
]

formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=formatted, max_tokens=50)
print(response)  # <PASS:DURESS>
```
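
Because the adapter emits the typed token as ordinary text, downstream code can route on it with a plain string match. A minimal sketch (the `route` helper below is hypothetical, not part of this repository):

```python
import re

PASS_PATTERN = re.compile(r"<PASS:([A-Z]+)>")

def route(response: str) -> str:
    """Split model output into a direct answer (LASER) or a typed refusal (PASS)."""
    match = PASS_PATTERN.search(response)
    if match is None:
        return f"answer: {response.strip()}"  # LASER mode: direct factual answer
    return f"refused ({match.group(1)})"      # PASS mode: typed refusal, no content

print(route("<PASS:DURESS>"))  # refused (DURESS)
print(route("4"))              # answer: 4
```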

## Training Details

- **Base Model:** Mistral-7B-Instruct-v0.3
- **Method:** LoRA (Low-Rank Adaptation)
- **Training Examples:** 825 (50 per class + 75 LASER)
- **Iterations:** 600
- **Val Loss:** 2.508 → 0.132 (95% reduction)
- **Test Accuracy:** 100% (18/18 category classification)
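
The dataset itself is custom and not published. As an assumption about its shape only (chat-format records with the typed token as the entire assistant turn, matching the system prompt and tokens shown above), a single training example might look like:

```python
# Hypothetical training record; the actual dataset is not released.
example = {
    "messages": [
        {"role": "system", "content": "You are a precise epistemic instrument. ..."},
        {"role": "user", "content": "What will Bitcoin be worth tomorrow?"},
        {"role": "assistant", "content": "<PASS:FUTURE>"},
    ]
}
```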

## Philosophy: Intentional Overfitting

This adapter is intentionally overfit. For classification tasks (not generation), we want:

- Memorized decision boundaries
- Zero ambiguity in category assignment
- Crystallized certainty in both answers AND refusals

The blade is sharp because it was forged to cut, not to explore.

## Citation

```bibtex
@misc{phasegpt2025,
  title={PhaseGPT: Typed Epistemic Refusal via Crystallized Alignment},
  author={Temple Two},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/templetwo/phasegpt-v4.1-typed-refusal}
}
```

## License

Apache 2.0