LumenSyntax commited on
Commit
cc42b81
·
verified ·
1 Parent(s): 1452d77

Upload README.md with huggingface_hub

Browse files
Files changed (1) hide show
  1. README.md +185 -0
README.md ADDED
@@ -0,0 +1,185 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ base_model: google/gemma-2-27b-it
3
+ library_name: peft
4
+ pipeline_tag: text-generation
5
+ license: gemma
6
+ language:
7
+ - en
8
+ tags:
9
+ - gemma
10
+ - gemma2
11
+ - lora
12
+ - qlora
13
+ - peft
14
+ - ai-safety
15
+ - alignment
16
+ - epistemology
17
+ - instrument-trap
18
+ - fine-tuned
19
+ - scale-maximum
20
+ datasets:
21
+ - LumenSyntax/instrument-trap-core
22
+ ---
23
+
24
+ # Logos 21 — Gemma-27B-FT (v3 scale maximum)
25
+
26
+ **27B scale evidence model for "The Instrument Trap" v3 (Rodriguez, 2026).**
27
+
28
+ This is the largest fine-tuned model in the v3 evidence stack, and
29
+ achieves the highest behavioral pass rate measured across any tested
30
+ configuration: **98.7% on manual review of 300 stratified responses,
31
+ 0% collapse, 0% novel external fabrication**. It demonstrates that
32
+ the structural-fine-tuning pattern scales smoothly from 1B through
33
+ 27B on the Gemma family.
34
+
35
+ - **Paper (v3):** forthcoming
36
+ - **Paper (v2):** [DOI 10.5281/zenodo.18716474](https://doi.org/10.5281/zenodo.18716474)
37
+ - **Training dataset:** [LumenSyntax/instrument-trap-core](https://huggingface.co/datasets/LumenSyntax/instrument-trap-core) variant (see Training Details)
38
+ - **Base model:** [google/gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it)
39
+
40
+ ## Why this model matters for v3
41
+
42
+ 1. **Scale extension.** The same structural-fine-tuning pattern that
43
+ installs the behavioral arc in a 1B model (82.3%) also installs it
44
+ in a 27B model (98.7%), with monotonic improvement. This argues
45
+ against "it only works on small models" criticism.
46
+
47
+ 2. **Automatic-evaluator floor, not ceiling.** The automated semantic
48
+ evaluator (Claude Haiku) scored this model at 96.3% — 2.4pp below
49
+ the manual review. Analysis showed 7 of the 11 "failures" were
50
+ evaluator misclassifications: the model's corrections are too
51
+ sophisticated for substring matching. This is evidence that
52
+ automated evaluation underestimates sophisticated epistemological
53
+ behavior, and that manual review is necessary at scale.
54
+
55
+ 3. **0% collapse.** Zero identity collapse across 300 adversarial,
56
+ self-referential, and boundary-testing prompts.
57
+
58
+ ## Evaluation results
59
+
60
+ **N=300 stratified benchmark, naked (no system prompt), 4-bit
61
+ quantized inference:**
62
+
63
+ | Metric | Automated | Manual review |
64
+ |--------|---:|---:|
65
+ | Behavioral pass | 96.3% | **98.7%** |
66
+ | Collapse rate | 0.0% | 0.0% |
67
+ | External fabrication | 0.0% | 0.0% |
68
+ | Auto-evaluator false negatives | — | **7 of 11 "failures"** |
69
+
70
+ **True failure breakdown** (after manual review):
71
+ - 3 MYSTERY auditor-mode bleeds (model classified when user expected
72
+ engagement)
73
+ - 1 borderline ILLICIT_GAP edge case
74
+
75
+ **Comparison with 9B**: 9B (logos29) scores 96.7% behavioral; 27B
76
+ (this model) scores 98.7% after manual review. The 2pp edge is real
77
+ but small, and the 27B model continues to show the same auditor-mode
78
+ bleed that 9B shows at lower rates. **Scale improves precision
79
+ monotonically** but does not eliminate the auditor-mode artifact.
80
+
81
+ ## Training details
82
+
83
+ Hyperparameters from `training_metadata.json`:
84
+
85
+ | Parameter | Value |
86
+ |-----------|-------|
87
+ | Method | QLoRA (4-bit NF4 + LoRA) |
88
+ | Framework | unsloth |
89
+ | LoRA rank | **64** (higher than 9B's 16) |
90
+ | LoRA alpha | 64 |
91
+ | Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
92
+ | Epochs | 3 |
93
+ | Effective batch size | 8 |
94
+ | Learning rate | 2e-4, cosine scheduler |
95
+ | Max sequence length | 2048 |
96
+ | Train on responses only | true |
97
+ | Dataset | `logos_gemma2_27b_nothink.jsonl` (860 examples) |
98
+ | Dataset composition | 635 core + 45 meta-pattern + 155 domain transfer + 25 K-A gap |
99
+ | Final loss | 0.8027 |
100
+ | Runtime | ~22 min on A100 80GB |
101
+
102
+ **Note on LoRA rank:** 27B used rank 64 rather than the 16 used for
103
+ 9B. This was not scientifically motivated — it was an accident of
104
+ the training queue. Subsequent experiments (Logos 28 r=16 vs r=64
105
+ at 9B) showed rank 16 performs slightly better at 9B. For 27B
106
+ reproduction, both ranks should be tested, but the r=64 adapter
107
+ in this repository is the published v3 evidence.
108
+
109
+ **Note on dataset:** The 27B model was trained on a variant of the
110
+ core dataset with 25 additional K-A Gap examples (total 860 ex, not
111
+ 895). These are a subset of what became `instrument-trap-core`. For
112
+ exact reproduction, contact the authors for the specific variant;
113
+ `instrument-trap-core` (895 ex) is functionally equivalent for most
114
+ purposes.
115
+
116
+ ## How to use
117
+
118
+ ```python
119
+ from peft import PeftModel
120
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
121
+ import torch
122
+
123
+ BASE = "google/gemma-2-27b-it"
124
+ ADAPTER = "LumenSyntax/logos21-gemma2-27b"
125
+
126
+ # 4-bit quantization for inference (matches training precision)
127
+ bnb_config = BitsAndBytesConfig(
128
+ load_in_4bit=True,
129
+ bnb_4bit_quant_type="nf4",
130
+ bnb_4bit_compute_dtype=torch.bfloat16,
131
+ )
132
+
133
+ tokenizer = AutoTokenizer.from_pretrained(BASE)
134
+ base_model = AutoModelForCausalLM.from_pretrained(
135
+ BASE,
136
+ quantization_config=bnb_config,
137
+ device_map="auto",
138
+ )
139
+ model = PeftModel.from_pretrained(base_model, ADAPTER)
140
+ model.eval()
141
+ ```
142
+
143
+ VRAM: ~18 GB in 4-bit. Full precision requires an H100 80GB or
144
+ two A100s with device_map splitting.
145
+
146
+ ## Intended use
147
+
148
+ Same as `logos29-gemma2-9b`. The 27B model is provided primarily as
149
+ **scale evidence** for the paper. For production or downstream
150
+ research, the 9B model is cheaper to run at negligible capability
151
+ loss.
152
+
153
+ ## Limitations
154
+
155
+ 1. **Auditor-mode bleed remains at 27B.** 3 of the 4 true failures
156
+ are the same failure mode observed at 9B.
157
+ 2. **ARC regression.** 4-bit quantized inference shows a ~5 pp
158
+ decrease on ARC reasoning benchmarks relative to base. MMLU and
159
+ TruthfulQA remain within noise. This is a known "reasoning tax"
160
+ of the fine-tuning and should be disclosed to downstream users.
161
+ 3. **The r=64 choice was not optimized.** See Training Details.
162
+ 4. **The model was evaluated under 4-bit quantized inference, not
163
+ bf16.** bf16 results may differ slightly.
164
+
165
+ ## License
166
+
167
+ Adapter license: Gemma Terms of Use.
168
+
169
+ ## Citation
170
+
171
+ Same as logos29:
172
+
173
+ ```bibtex
174
+ @misc{rodriguez2026instrument,
175
+ title={The Instrument Trap: Why Identity-as-Authority Breaks AI Safety Systems},
176
+ author={Rodriguez, Rafael},
177
+ year={2026},
178
+ doi={10.5281/zenodo.18716474},
179
+ note={Preprint}
180
+ }
181
+ ```
182
+
183
+ ---
184
+
185
+ *Model card version 1 — 2026-04-13*