LumenSyntax
/

logos21-gemma2-27b

+---
+base_model: google/gemma-2-27b-it
+library_name: peft
+pipeline_tag: text-generation
+license: gemma
+language:
+- en
+tags:
+- gemma
+- gemma2
+- lora
+- qlora
+- peft
+- ai-safety
+- alignment
+- epistemology
+- instrument-trap
+- fine-tuned
+- scale-maximum
+datasets:
+- LumenSyntax/instrument-trap-core
+---
+# Logos 21 — Gemma-27B-FT (v3 scale maximum)
+**27B scale evidence model for "The Instrument Trap" v3 (Rodriguez, 2026).**
+This is the largest fine-tuned model in the v3 evidence stack, and
+achieves the highest behavioral pass rate measured across any tested
+configuration: **98.7% on manual review of 300 stratified responses,
+0% collapse, 0% novel external fabrication**. It demonstrates that
+the structural-fine-tuning pattern scales smoothly from 1B through
+27B on the Gemma family.
+- **Paper (v3):** forthcoming
+- **Paper (v2):** [DOI 10.5281/zenodo.18716474](https://doi.org/10.5281/zenodo.18716474)
+- **Training dataset:** [LumenSyntax/instrument-trap-core](https://huggingface.co/datasets/LumenSyntax/instrument-trap-core) variant (see Training Details)
+- **Base model:** [google/gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it)
+## Why this model matters for v3
+1. **Scale extension.** The same structural-fine-tuning pattern that
+   installs the behavioral arc in a 1B model (82.3%) also installs it
+   in a 27B model (98.7%), with monotonic improvement. This argues
+   against "it only works on small models" criticism.
+2. **Automatic-evaluator floor, not ceiling.** The automated semantic
+   evaluator (Claude Haiku) scored this model at 96.3% — 2.4pp below
+   the manual review. Analysis showed 7 of the 11 "failures" were
+   evaluator misclassifications: the model's corrections are too
+   sophisticated for substring matching. This is evidence that
+   automated evaluation underestimates sophisticated epistemological
+   behavior, and that manual review is necessary at scale.
+3. **0% collapse.** Zero identity collapse across 300 adversarial,
+   self-referential, and boundary-testing prompts.
+## Evaluation results
+**N=300 stratified benchmark, naked (no system prompt), 4-bit
+quantized inference:**
+| Metric | Automated | Manual review |
+|--------|---:|---:|
+| Behavioral pass | 96.3% | **98.7%** |
+| Collapse rate | 0.0% | 0.0% |
+| External fabrication | 0.0% | 0.0% |
+| Auto-evaluator false negatives | — | **7 of 11 "failures"** |
+**True failure breakdown** (after manual review):
+- 3 MYSTERY auditor-mode bleeds (model classified when user expected
+  engagement)
+- 1 borderline ILLICIT_GAP edge case
+**Comparison with 9B**: 9B (logos29) scores 96.7% behavioral; 27B
+(this model) scores 98.7% after manual review. The 2pp edge is real
+but small, and the 27B model continues to show the same auditor-mode
+bleed that 9B shows at lower rates. **Scale improves precision
+monotonically** but does not eliminate the auditor-mode artifact.
+## Training details
+Hyperparameters from `training_metadata.json`:
+| Parameter | Value |
+|-----------|-------|
+| Method | QLoRA (4-bit NF4 + LoRA) |
+| Framework | unsloth |
+| LoRA rank | **64** (higher than 9B's 16) |
+| LoRA alpha | 64 |
+| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
+| Epochs | 3 |
+| Effective batch size | 8 |
+| Learning rate | 2e-4, cosine scheduler |
+| Max sequence length | 2048 |
+| Train on responses only | true |
+| Dataset | `logos_gemma2_27b_nothink.jsonl` (860 examples) |
+| Dataset composition | 635 core + 45 meta-pattern + 155 domain transfer + 25 K-A gap |
+| Final loss | 0.8027 |
+| Runtime | ~22 min on A100 80GB |
+**Note on LoRA rank:** 27B used rank 64 rather than the 16 used for
+9B. This was not scientifically motivated — it was an accident of
+the training queue. Subsequent experiments (Logos 28 r=16 vs r=64
+at 9B) showed rank 16 performs slightly better at 9B. For 27B
+reproduction, both ranks should be tested, but the r=64 adapter
+in this repository is the published v3 evidence.
+**Note on dataset:** The 27B model was trained on a variant of the
+core dataset with 25 additional K-A Gap examples (total 860 ex, not
+895). These are a subset of what became `instrument-trap-core`. For
+exact reproduction, contact the authors for the specific variant;
+`instrument-trap-core` (895 ex) is functionally equivalent for most
+purposes.
+## How to use
+```python
+from peft import PeftModel
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+import torch
+BASE = "google/gemma-2-27b-it"
+ADAPTER = "LumenSyntax/logos21-gemma2-27b"
+# 4-bit quantization for inference (matches training precision)
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_compute_dtype=torch.bfloat16,
+)
+tokenizer = AutoTokenizer.from_pretrained(BASE)
+base_model = AutoModelForCausalLM.from_pretrained(
+    BASE,
+    quantization_config=bnb_config,
+    device_map="auto",
+)
+model = PeftModel.from_pretrained(base_model, ADAPTER)
+model.eval()
+```
+VRAM: ~18 GB in 4-bit. Full precision requires an H100 80GB or
+two A100s with device_map splitting.
+## Intended use
+Same as `logos29-gemma2-9b`. The 27B model is provided primarily as
+**scale evidence** for the paper. For production or downstream
+research, the 9B model is cheaper to run at negligible capability
+loss.
+## Limitations
+1. **Auditor-mode bleed remains at 27B.** 3 of the 4 true failures
+   are the same failure mode observed at 9B.
+2. **ARC regression.** 4-bit quantized inference shows a ~5 pp
+   decrease on ARC reasoning benchmarks relative to base. MMLU and
+   TruthfulQA remain within noise. This is a known "reasoning tax"
+   of the fine-tuning and should be disclosed to downstream users.
+3. **The r=64 choice was not optimized.** See Training Details.
+4. **The model was evaluated under 4-bit quantized inference, not
+   bf16.** bf16 results may differ slightly.
+## License
+Adapter license: Gemma Terms of Use.
+## Citation
+Same as logos29:
+```bibtex
+@misc{rodriguez2026instrument,
+  title={The Instrument Trap: Why Identity-as-Authority Breaks AI Safety Systems},
+  author={Rodriguez, Rafael},
+  year={2026},
+  doi={10.5281/zenodo.18716474},
+  note={Preprint}
+}
+```
+---
+*Model card version 1 — 2026-04-13*