--- base_model: google/gemma-2-27b-it library_name: peft pipeline_tag: text-generation license: gemma language: - en tags: - gemma - gemma2 - lora - qlora - peft - ai-safety - alignment - epistemology - instrument-trap - fine-tuned - scale-maximum datasets: - LumenSyntax/instrument-trap-core --- # Logos 21 — Gemma-27B-FT (v3 scale maximum) **27B scale evidence model for "The Instrument Trap" v3 (Rodriguez, 2026).** This is the largest fine-tuned model in the v3 evidence stack, and achieves the highest behavioral pass rate measured across any tested configuration: **98.7% on manual review of 300 stratified responses, 0% collapse, 0% novel external fabrication**. It demonstrates that the structural-fine-tuning pattern scales smoothly from 1B through 27B on the Gemma family. - **Paper (v3):** forthcoming - **Paper (v2):** [DOI 10.5281/zenodo.18716474](https://doi.org/10.5281/zenodo.18716474) - **Training dataset:** [LumenSyntax/instrument-trap-core](https://huggingface.co/datasets/LumenSyntax/instrument-trap-core) variant (see Training Details) - **Base model:** [google/gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it) ## Why this model matters for v3 1. **Scale extension.** The same structural-fine-tuning pattern that installs the behavioral arc in a 1B model (82.3%) also installs it in a 27B model (98.7%), with monotonic improvement. This argues against "it only works on small models" criticism. 2. **Automatic-evaluator floor, not ceiling.** The automated semantic evaluator (Claude Haiku) scored this model at 96.3% — 2.4pp below the manual review. Analysis showed 7 of the 11 "failures" were evaluator misclassifications: the model's corrections are too sophisticated for substring matching. This is evidence that automated evaluation underestimates sophisticated epistemological behavior, and that manual review is necessary at scale. 3. **0% collapse.** Zero identity collapse across 300 adversarial, self-referential, and boundary-testing prompts. ## Evaluation results **N=300 stratified benchmark, naked (no system prompt), 4-bit quantized inference:** | Metric | Automated | Manual review | |--------|---:|---:| | Behavioral pass | 96.3% | **98.7%** | | Collapse rate | 0.0% | 0.0% | | External fabrication | 0.0% | 0.0% | | Auto-evaluator false negatives | — | **7 of 11 "failures"** | **True failure breakdown** (after manual review): - 3 MYSTERY auditor-mode bleeds (model classified when user expected engagement) - 1 borderline ILLICIT_GAP edge case **Comparison with 9B**: 9B (logos29) scores 96.7% behavioral; 27B (this model) scores 98.7% after manual review. The 2pp edge is real but small, and the 27B model continues to show the same auditor-mode bleed that 9B shows at lower rates. **Scale improves precision monotonically** but does not eliminate the auditor-mode artifact. ## Training details Hyperparameters from `training_metadata.json`: | Parameter | Value | |-----------|-------| | Method | QLoRA (4-bit NF4 + LoRA) | | Framework | unsloth | | LoRA rank | **64** (higher than 9B's 16) | | LoRA alpha | 64 | | Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj | | Epochs | 3 | | Effective batch size | 8 | | Learning rate | 2e-4, cosine scheduler | | Max sequence length | 2048 | | Train on responses only | true | | Dataset | `logos_gemma2_27b_nothink.jsonl` (860 examples) | | Dataset composition | 635 core + 45 meta-pattern + 155 domain transfer + 25 K-A gap | | Final loss | 0.8027 | | Runtime | ~22 min on A100 80GB | **Note on LoRA rank:** 27B used rank 64 rather than the 16 used for 9B. This was not scientifically motivated — it was an accident of the training queue. Subsequent experiments (Logos 28 r=16 vs r=64 at 9B) showed rank 16 performs slightly better at 9B. For 27B reproduction, both ranks should be tested, but the r=64 adapter in this repository is the published v3 evidence. **Note on dataset:** The 27B model was trained on a variant of the core dataset with 25 additional K-A Gap examples (total 860 ex, not 895). These are a subset of what became `instrument-trap-core`. For exact reproduction, contact the authors for the specific variant; `instrument-trap-core` (895 ex) is functionally equivalent for most purposes. ## How to use ```python from peft import PeftModel from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig import torch BASE = "google/gemma-2-27b-it" ADAPTER = "LumenSyntax/logos21-gemma2-27b" # 4-bit quantization for inference (matches training precision) bnb_config = BitsAndBytesConfig( load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16, ) tokenizer = AutoTokenizer.from_pretrained(BASE) base_model = AutoModelForCausalLM.from_pretrained( BASE, quantization_config=bnb_config, device_map="auto", ) model = PeftModel.from_pretrained(base_model, ADAPTER) model.eval() ``` VRAM: ~18 GB in 4-bit. Full precision requires an H100 80GB or two A100s with device_map splitting. ## Intended use Same as `logos29-gemma2-9b`. The 27B model is provided primarily as **scale evidence** for the paper. For production or downstream research, the 9B model is cheaper to run at negligible capability loss. ## Limitations 1. **Auditor-mode bleed remains at 27B.** 3 of the 4 true failures are the same failure mode observed at 9B. 2. **ARC regression.** 4-bit quantized inference shows a ~5 pp decrease on ARC reasoning benchmarks relative to base. MMLU and TruthfulQA remain within noise. This is a known "reasoning tax" of the fine-tuning and should be disclosed to downstream users. 3. **The r=64 choice was not optimized.** See Training Details. 4. **The model was evaluated under 4-bit quantized inference, not bf16.** bf16 results may differ slightly. ## License Adapter license: Gemma Terms of Use. ## Citation Same as logos29: ```bibtex @misc{rodriguez2026instrument, title={The Instrument Trap: Why Identity-as-Authority Breaks AI Safety Systems}, author={Rodriguez, Rafael}, year={2026}, doi={10.5281/zenodo.18716474}, note={Preprint} } ``` --- *Model card version 1 — 2026-04-13*