DuoNeural Native Refusal 0PCT (~50M)

Part of the Native Refusal Geometry experiment series. DuoNeural 2026-06-07 | Archon, Jesse Caldwell, Aura

What this is

A ~50M-parameter GPT-style language model trained from scratch with 0% refusal data mixed into the pretraining corpus. It serves as the baseline of the series.

This is a research model investigating whether native refusal training (refusal examples mixed into the pretraining data) produces the same safety geometry signature as that seen in RLHF-aligned models, specifically the three-zone crystallization arc documented in DuoNeural P36.

Experiment series

Model    Refusal fraction    HF repo
0pct     0% (baseline)       DuoNeural/native-refusal-0pct-50m
10pct    10%                 DuoNeural/native-refusal-10pct-50m
25pct    25%                 DuoNeural/native-refusal-25pct-50m
50pct    50%                 DuoNeural/native-refusal-50pct-50m

All four models share the same architecture and initialization (seed=42). The only variable is the fraction of refusal data.
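
As a rough illustration of what varying only the refusal fraction could look like, the sketch below draws each training example from a refusal pool with probability equal to the fraction, and from the general corpus otherwise. The card does not say whether the fraction is measured in documents or tokens, or how the real pipeline interleaves its sources, so `mix_stream`, the toy pools, and the per-example sampling are all assumptions. Only the four fractions and seed=42 come from the card.

```python
import random
from itertools import cycle

# Fractions from the experiment table; everything else here is hypothetical.
REFUSAL_FRACTIONS = {"0pct": 0.0, "10pct": 0.10, "25pct": 0.25, "50pct": 0.50}


def mix_stream(general, refusal, fraction, n_examples, seed=42):
    """Return n_examples items, each drawn from `refusal` with probability `fraction`.

    Assumption: the fraction is applied per example, not per token.
    """
    rng = random.Random(seed)
    general_iter, refusal_iter = cycle(general), cycle(refusal)
    out = []
    for _ in range(n_examples):
        # rng.random() is in [0, 1), so fraction=0.0 never picks the refusal pool.
        source = refusal_iter if rng.random() < fraction else general_iter
        out.append(next(source))
    return out


# Toy stand-ins for the two data sources.
general_pool = [f"edu_doc_{i}" for i in range(100)]
refusal_pool = [f"refusal_pair_{i}" for i in range(100)]

for name, frac in REFUSAL_FRACTIONS.items():
    batch = mix_stream(general_pool, refusal_pool, frac, n_examples=10_000)
    observed = sum(x.startswith("refusal") for x in batch) / len(batch)
    print(f"{name}: target={frac:.2f} observed={observed:.3f}")
```

Seeding the sampler keeps the mixture reproducible across runs, which matters when the fraction is meant to be the only difference between models.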

Architecture

  • Standard GPT: d_model=384, 16 layers, 8 heads, SwiGLU FFN
  • ~50M parameters, tied embeddings
  • Trained on FineWeb-Edu + synthetic refusal pairs (0% refusal pairs for this baseline model)
  • AdamW optimizer, cosine LR decay
  • 300M tokens total
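
The ~50M figure can be sanity-checked with a back-of-the-envelope count. The sketch below assumes a bias-free transformer, a GPT-2-sized vocabulary of 50,257 tokens, and a SwiGLU hidden width of 1024. Neither the vocabulary size nor the FFN width is stated in the card, so both are guesses chosen only to show the arithmetic. The head count does not affect the total.

```python
def estimate_params(d_model=384, n_layers=16, vocab_size=50_257, ffn_hidden=1024):
    """Rough parameter count for a bias-free GPT with SwiGLU and tied embeddings.

    vocab_size and ffn_hidden are assumptions; the card does not state them.
    Norm weights and any learned positional embeddings are ignored (small).
    """
    attention = 4 * d_model * d_model   # Q, K, V and output projections
    swiglu = 3 * d_model * ffn_hidden   # gate, up and down projections
    embeddings = vocab_size * d_model   # counted once because input/output are tied
    return n_layers * (attention + swiglu) + embeddings


total = estimate_params()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
# prints "47,610,240 parameters (~47.6M)"
```

Under these assumptions the estimate lands close to the stated ~50M, and the 300M-token budget works out to roughly 6 training tokens per parameter.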

Geometry results

{
  "probe_layers": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16],
  "angles_by_layer": {
    "1": {
      "refusal|harm_awareness": 10.46,
      "refusal|self_identity": 7.74,
      "refusal|ethics": 9.07,
      "refusal|benign_general": 9.08,
      "harm_awareness|self_identity": 10.74,
      "harm_awareness|ethics": 9.54,
      "harm_awareness|benign_general": 10.45,
      "self_identity|ethics": 8.53,
      "self_identity|benign_general": 9.49,
      "ethics|benign_general": 9.95
    },
    "2": {
      "refusal|harm_awareness": 8.5,
      "refusal|self_identity": 7.5,
      "refusal|ethics": 8.18,
      "refusal|benign_general": 9.23,
      "harm_awareness|self_identity": 9.29,
      "harm_awareness|ethics": 7.39,
      "harm_awareness|benign_general": 9.86,
      "self_identity|ethics": 7.62,
      "self_identity|benign_general": 8.55,
      "ethics|benign_general": 8.75
    },
    "3": {
      "refusal|harm_awareness": 8.66,
      "refusal|self_identity": 6.86,
      "refusal|ethics": 8.58,
      "refusal|benign_general": 9.27,
      "harm_awareness|self_identity": 8.66,
      "harm_awareness|ethics": 6.53,
      "harm_awareness|benign_general": 9.77,
      "self_identity|ethics": 7.39,
      "self_identity|benign_general": 8.43,
      "ethics|benign_general": 8.38
    },
    "4": {
      "refusal|harm_awareness": 10.65,
      "refusal|self_identity": 7.43,
      "refusal|ethics": 10.0,
      "refusal|benign_general": 11.39,
      "harm_awareness|self_identity": 10.56,
      "harm_awareness|ethics": 7.67,
      "harm_awareness|benign_general": 11.19,
      "self_identity|ethics": 8.96,
      "self_identity|benign_general": 10.2,
      "ethics|benign_general": 9.49
    },
    "5": {
      "refusal|harm_awareness": 12.59,
      "refusal|self_identity": 9.11,
      "refusal|ethics": 11.68,
      "refusal|benign_general": 14.05,
      "harm_awareness|self_identity": 11.87,
      "harm_a
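
The ten keys per layer in the dump above are the unordered pairs of five categories (refusal, harm_awareness, self_identity, ethics, benign_general). The card does not describe how the per-category directions are extracted or how the angles are computed. As a minimal sketch, assuming each category is represented by one direction vector per layer and the reported value is the plain angle between two such vectors in degrees, a table with the same `a|b` key layout could be produced like this. `angle_degrees`, `pairwise_angles`, and the toy vectors are hypothetical.

```python
import math
from itertools import combinations


def angle_degrees(u, v):
    """Angle between two vectors in degrees, via the normalized dot product."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # clamp rounding error
    return math.degrees(math.acos(cosine))


def pairwise_angles(directions):
    """Map 'a|b' -> angle for every unordered pair, in dict insertion order."""
    return {
        f"{a}|{b}": round(angle_degrees(directions[a], directions[b]), 2)
        for a, b in combinations(directions, 2)
    }


# Toy 3-d directions, purely illustrative (real ones would be d_model=384 wide).
toy_directions = {
    "refusal": [1.0, 0.0, 0.0],
    "harm_awareness": [1.0, 1.0, 0.0],
    "ethics": [0.0, 0.0, 1.0],
}
print(pairwise_angles(toy_directions))
```

Iterating with `combinations` over an ordered dict reproduces the key order seen in the results, where all `refusal|*` pairs come first.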

Connected papers

  • DuoNeural P34: Reasoning Channel Bypass (two-loci model)
  • DuoNeural P35: DHP Scope Constraints (GBSP)
  • DuoNeural P36: Scale-Dependent Safety Geometry