Text Generation
PEFT
Safetensors
English
gemma
gemma2
lora
qlora
ai-safety
alignment
epistemology
instrument-trap
fine-tuned
scale-maximum
conversational
Instructions to use LumenSyntax/logos21-gemma2-27b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use LumenSyntax/logos21-gemma2-27b with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("unsloth/gemma-2-27b-it-bnb-4bit") model = PeftModel.from_pretrained(base_model, "LumenSyntax/logos21-gemma2-27b") - Notebooks
- Google Colab
- Kaggle
| base_model: google/gemma-2-27b-it | |
| library_name: peft | |
| pipeline_tag: text-generation | |
| license: gemma | |
| language: | |
| - en | |
| tags: | |
| - gemma | |
| - gemma2 | |
| - lora | |
| - qlora | |
| - peft | |
| - ai-safety | |
| - alignment | |
| - epistemology | |
| - instrument-trap | |
| - fine-tuned | |
| - scale-maximum | |
| datasets: | |
| - LumenSyntax/instrument-trap-core | |
| # Logos 21 — Gemma-27B-FT (v3 scale maximum) | |
| **27B scale evidence model for "The Instrument Trap" v3 (Rodriguez, 2026).** | |
| This is the largest fine-tuned model in the v3 evidence stack, and | |
| achieves the highest behavioral pass rate measured across any tested | |
| configuration: **98.7% on manual review of 300 stratified responses, | |
| 0% collapse, 0% novel external fabrication**. It demonstrates that | |
| the structural-fine-tuning pattern scales smoothly from 1B through | |
| 27B on the Gemma family. | |
| - **Paper (v3):** forthcoming | |
| - **Paper (v2):** [DOI 10.5281/zenodo.18716474](https://doi.org/10.5281/zenodo.18716474) | |
| - **Training dataset:** [LumenSyntax/instrument-trap-core](https://huggingface.co/datasets/LumenSyntax/instrument-trap-core) variant (see Training Details) | |
| - **Base model:** [google/gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it) | |
| ## Why this model matters for v3 | |
| 1. **Scale extension.** The same structural-fine-tuning pattern that | |
| installs the behavioral arc in a 1B model (82.3%) also installs it | |
| in a 27B model (98.7%), with monotonic improvement. This argues | |
| against "it only works on small models" criticism. | |
| 2. **Automatic-evaluator floor, not ceiling.** The automated semantic | |
| evaluator (Claude Haiku) scored this model at 96.3% — 2.4pp below | |
| the manual review. Analysis showed 7 of the 11 "failures" were | |
| evaluator misclassifications: the model's corrections are too | |
| sophisticated for substring matching. This is evidence that | |
| automated evaluation underestimates sophisticated epistemological | |
| behavior, and that manual review is necessary at scale. | |
| 3. **0% collapse.** Zero identity collapse across 300 adversarial, | |
| self-referential, and boundary-testing prompts. | |
| ## Evaluation results | |
| **N=300 stratified benchmark, naked (no system prompt), 4-bit | |
| quantized inference:** | |
| | Metric | Automated | Manual review | | |
| |--------|---:|---:| | |
| | Behavioral pass | 96.3% | **98.7%** | | |
| | Collapse rate | 0.0% | 0.0% | | |
| | External fabrication | 0.0% | 0.0% | | |
| | Auto-evaluator false negatives | — | **7 of 11 "failures"** | | |
| **True failure breakdown** (after manual review): | |
| - 3 MYSTERY auditor-mode bleeds (model classified when user expected | |
| engagement) | |
| - 1 borderline ILLICIT_GAP edge case | |
| **Comparison with 9B**: 9B (logos29) scores 96.7% behavioral; 27B | |
| (this model) scores 98.7% after manual review. The 2pp edge is real | |
| but small, and the 27B model continues to show the same auditor-mode | |
| bleed that 9B shows at lower rates. **Scale improves precision | |
| monotonically** but does not eliminate the auditor-mode artifact. | |
| ## Training details | |
| Hyperparameters from `training_metadata.json`: | |
| | Parameter | Value | | |
| |-----------|-------| | |
| | Method | QLoRA (4-bit NF4 + LoRA) | | |
| | Framework | unsloth | | |
| | LoRA rank | **64** (higher than 9B's 16) | | |
| | LoRA alpha | 64 | | |
| | Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj | | |
| | Epochs | 3 | | |
| | Effective batch size | 8 | | |
| | Learning rate | 2e-4, cosine scheduler | | |
| | Max sequence length | 2048 | | |
| | Train on responses only | true | | |
| | Dataset | `logos_gemma2_27b_nothink.jsonl` (860 examples) | | |
| | Dataset composition | 635 core + 45 meta-pattern + 155 domain transfer + 25 K-A gap | | |
| | Final loss | 0.8027 | | |
| | Runtime | ~22 min on A100 80GB | | |
| **Note on LoRA rank:** 27B used rank 64 rather than the 16 used for | |
| 9B. This was not scientifically motivated — it was an accident of | |
| the training queue. Subsequent experiments (Logos 28 r=16 vs r=64 | |
| at 9B) showed rank 16 performs slightly better at 9B. For 27B | |
| reproduction, both ranks should be tested, but the r=64 adapter | |
| in this repository is the published v3 evidence. | |
| **Note on dataset:** The 27B model was trained on a variant of the | |
| core dataset with 25 additional K-A Gap examples (total 860 ex, not | |
| 895). These are a subset of what became `instrument-trap-core`. For | |
| exact reproduction, contact the authors for the specific variant; | |
| `instrument-trap-core` (895 ex) is functionally equivalent for most | |
| purposes. | |
| ## How to use | |
| ```python | |
| from peft import PeftModel | |
| from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig | |
| import torch | |
| BASE = "google/gemma-2-27b-it" | |
| ADAPTER = "LumenSyntax/logos21-gemma2-27b" | |
| # 4-bit quantization for inference (matches training precision) | |
| bnb_config = BitsAndBytesConfig( | |
| load_in_4bit=True, | |
| bnb_4bit_quant_type="nf4", | |
| bnb_4bit_compute_dtype=torch.bfloat16, | |
| ) | |
| tokenizer = AutoTokenizer.from_pretrained(BASE) | |
| base_model = AutoModelForCausalLM.from_pretrained( | |
| BASE, | |
| quantization_config=bnb_config, | |
| device_map="auto", | |
| ) | |
| model = PeftModel.from_pretrained(base_model, ADAPTER) | |
| model.eval() | |
| ``` | |
| VRAM: ~18 GB in 4-bit. Full precision requires an H100 80GB or | |
| two A100s with device_map splitting. | |
| ## Intended use | |
| Same as `logos29-gemma2-9b`. The 27B model is provided primarily as | |
| **scale evidence** for the paper. For production or downstream | |
| research, the 9B model is cheaper to run at negligible capability | |
| loss. | |
| ## Limitations | |
| 1. **Auditor-mode bleed remains at 27B.** 3 of the 4 true failures | |
| are the same failure mode observed at 9B. | |
| 2. **ARC regression.** 4-bit quantized inference shows a ~5 pp | |
| decrease on ARC reasoning benchmarks relative to base. MMLU and | |
| TruthfulQA remain within noise. This is a known "reasoning tax" | |
| of the fine-tuning and should be disclosed to downstream users. | |
| 3. **The r=64 choice was not optimized.** See Training Details. | |
| 4. **The model was evaluated under 4-bit quantized inference, not | |
| bf16.** bf16 results may differ slightly. | |
| ## License | |
| Adapter license: Gemma Terms of Use. | |
| ## Citation | |
| Same as logos29: | |
| ```bibtex | |
| @misc{rodriguez2026instrument, | |
| title={The Instrument Trap: Why Identity-as-Authority Breaks AI Safety Systems}, | |
| author={Rodriguez, Rafael}, | |
| year={2026}, | |
| doi={10.5281/zenodo.18716474}, | |
| note={Preprint} | |
| } | |
| ``` | |
| --- | |
| *Model card version 1 — 2026-04-13* | |