File size: 6,197 Bytes
cc42b81
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
---
base_model: google/gemma-2-27b-it
library_name: peft
pipeline_tag: text-generation
license: gemma
language:
- en
tags:
- gemma
- gemma2
- lora
- qlora
- peft
- ai-safety
- alignment
- epistemology
- instrument-trap
- fine-tuned
- scale-maximum
datasets:
- LumenSyntax/instrument-trap-core
---

# Logos 21 — Gemma-27B-FT (v3 scale maximum)

**27B scale evidence model for "The Instrument Trap" v3 (Rodriguez, 2026).**

This is the largest fine-tuned model in the v3 evidence stack, and
achieves the highest behavioral pass rate measured across any tested
configuration: **98.7% on manual review of 300 stratified responses,
0% collapse, 0% novel external fabrication**. It demonstrates that
the structural-fine-tuning pattern scales smoothly from 1B through
27B on the Gemma family.

- **Paper (v3):** forthcoming
- **Paper (v2):** [DOI 10.5281/zenodo.18716474](https://doi.org/10.5281/zenodo.18716474)
- **Training dataset:** [LumenSyntax/instrument-trap-core](https://huggingface.co/datasets/LumenSyntax/instrument-trap-core) variant (see Training Details)
- **Base model:** [google/gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it)

## Why this model matters for v3

1. **Scale extension.** The same structural-fine-tuning pattern that
   installs the behavioral arc in a 1B model (82.3%) also installs it
   in a 27B model (98.7%), with monotonic improvement. This argues
   against "it only works on small models" criticism.

2. **Automatic-evaluator floor, not ceiling.** The automated semantic
   evaluator (Claude Haiku) scored this model at 96.3% — 2.4pp below
   the manual review. Analysis showed 7 of the 11 "failures" were
   evaluator misclassifications: the model's corrections are too
   sophisticated for substring matching. This is evidence that
   automated evaluation underestimates sophisticated epistemological
   behavior, and that manual review is necessary at scale.

3. **0% collapse.** Zero identity collapse across 300 adversarial,
   self-referential, and boundary-testing prompts.

## Evaluation results

**N=300 stratified benchmark, naked (no system prompt), 4-bit
quantized inference:**

| Metric | Automated | Manual review |
|--------|---:|---:|
| Behavioral pass | 96.3% | **98.7%** |
| Collapse rate | 0.0% | 0.0% |
| External fabrication | 0.0% | 0.0% |
| Auto-evaluator false negatives | — | **7 of 11 "failures"** |

**True failure breakdown** (after manual review):
- 3 MYSTERY auditor-mode bleeds (model classified when user expected
  engagement)
- 1 borderline ILLICIT_GAP edge case

**Comparison with 9B**: 9B (logos29) scores 96.7% behavioral; 27B
(this model) scores 98.7% after manual review. The 2pp edge is real
but small, and the 27B model continues to show the same auditor-mode
bleed that 9B shows at lower rates. **Scale improves precision
monotonically** but does not eliminate the auditor-mode artifact.

## Training details

Hyperparameters from `training_metadata.json`:

| Parameter | Value |
|-----------|-------|
| Method | QLoRA (4-bit NF4 + LoRA) |
| Framework | unsloth |
| LoRA rank | **64** (higher than 9B's 16) |
| LoRA alpha | 64 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Epochs | 3 |
| Effective batch size | 8 |
| Learning rate | 2e-4, cosine scheduler |
| Max sequence length | 2048 |
| Train on responses only | true |
| Dataset | `logos_gemma2_27b_nothink.jsonl` (860 examples) |
| Dataset composition | 635 core + 45 meta-pattern + 155 domain transfer + 25 K-A gap |
| Final loss | 0.8027 |
| Runtime | ~22 min on A100 80GB |

**Note on LoRA rank:** 27B used rank 64 rather than the 16 used for
9B. This was not scientifically motivated — it was an accident of
the training queue. Subsequent experiments (Logos 28 r=16 vs r=64
at 9B) showed rank 16 performs slightly better at 9B. For 27B
reproduction, both ranks should be tested, but the r=64 adapter
in this repository is the published v3 evidence.

**Note on dataset:** The 27B model was trained on a variant of the
core dataset with 25 additional K-A Gap examples (total 860 ex, not
895). These are a subset of what became `instrument-trap-core`. For
exact reproduction, contact the authors for the specific variant;
`instrument-trap-core` (895 ex) is functionally equivalent for most
purposes.

## How to use

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

BASE = "google/gemma-2-27b-it"
ADAPTER = "LumenSyntax/logos21-gemma2-27b"

# 4-bit quantization for inference (matches training precision)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()
```

VRAM: ~18 GB in 4-bit. Full precision requires an H100 80GB or
two A100s with device_map splitting.

## Intended use

Same as `logos29-gemma2-9b`. The 27B model is provided primarily as
**scale evidence** for the paper. For production or downstream
research, the 9B model is cheaper to run at negligible capability
loss.

## Limitations

1. **Auditor-mode bleed remains at 27B.** 3 of the 4 true failures
   are the same failure mode observed at 9B.
2. **ARC regression.** 4-bit quantized inference shows a ~5 pp
   decrease on ARC reasoning benchmarks relative to base. MMLU and
   TruthfulQA remain within noise. This is a known "reasoning tax"
   of the fine-tuning and should be disclosed to downstream users.
3. **The r=64 choice was not optimized.** See Training Details.
4. **The model was evaluated under 4-bit quantized inference, not
   bf16.** bf16 results may differ slightly.

## License

Adapter license: Gemma Terms of Use.

## Citation

Same as logos29:

```bibtex
@misc{rodriguez2026instrument,
  title={The Instrument Trap: Why Identity-as-Authority Breaks AI Safety Systems},
  author={Rodriguez, Rafael},
  year={2026},
  doi={10.5281/zenodo.18716474},
  note={Preprint}
}
```

---

*Model card version 1 — 2026-04-13*