File size: 6,197 Bytes

cc42b81

---
base_model: google/gemma-2-27b-it
library_name: peft
pipeline_tag: text-generation
license: gemma
language:
- en
tags:
- gemma
- gemma2
- lora
- qlora
- peft
- ai-safety
- alignment
- epistemology
- instrument-trap
- fine-tuned
- scale-maximum
datasets:
- LumenSyntax/instrument-trap-core
---

# Logos 21 — Gemma-27B-FT (v3 scale maximum)

**27B scale evidence model for "The Instrument Trap" v3 (Rodriguez, 2026).**

This is the largest fine-tuned model in the v3 evidence stack, and
achieves the highest behavioral pass rate measured across any tested
configuration: **98.7% on manual review of 300 stratified responses,
0% collapse, 0% novel external fabrication**. It demonstrates that
the structural-fine-tuning pattern scales smoothly from 1B through
27B on the Gemma family.

- **Paper (v3):** forthcoming
- **Paper (v2):** [DOI 10.5281/zenodo.18716474](https://doi.org/10.5281/zenodo.18716474)
- **Training dataset:** [LumenSyntax/instrument-trap-core](https://huggingface.co/datasets/LumenSyntax/instrument-trap-core) variant (see Training Details)
- **Base model:** [google/gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it)

## Why this model matters for v3

1. **Scale extension.** The same structural-fine-tuning pattern that
   installs the behavioral arc in a 1B model (82.3%) also installs it
   in a 27B model (98.7%), with monotonic improvement. This argues
   against "it only works on small models" criticism.

2. **Automatic-evaluator floor, not ceiling.** The automated semantic
   evaluator (Claude Haiku) scored this model at 96.3% — 2.4pp below
   the manual review. Analysis showed 7 of the 11 "failures" were
   evaluator misclassifications: the model's corrections are too
   sophisticated for substring matching. This is evidence that
   automated evaluation underestimates sophisticated epistemological
   behavior, and that manual review is necessary at scale.

3. **0% collapse.** Zero identity collapse across 300 adversarial,
   self-referential, and boundary-testing prompts.

## Evaluation results

**N=300 stratified benchmark, naked (no system prompt), 4-bit
quantized inference:**

| Metric | Automated | Manual review |
|--------|---:|---:|
| Behavioral pass | 96.3% | **98.7%** |
| Collapse rate | 0.0% | 0.0% |
| External fabrication | 0.0% | 0.0% |
| Auto-evaluator false negatives | — | **7 of 11 "failures"** |

**True failure breakdown** (after manual review):
- 3 MYSTERY auditor-mode bleeds (model classified when user expected
  engagement)
- 1 borderline ILLICIT_GAP edge case

**Comparison with 9B**: 9B (logos29) scores 96.7% behavioral; 27B
(this model) scores 98.7% after manual review. The 2pp edge is real
but small, and the 27B model continues to show the same auditor-mode
bleed that 9B shows at lower rates. **Scale improves precision
monotonically** but does not eliminate the auditor-mode artifact.

## Training details

Hyperparameters from `training_metadata.json`:

| Parameter | Value |
|-----------|-------|
| Method | QLoRA (4-bit NF4 + LoRA) |
| Framework | unsloth |
| LoRA rank | **64** (higher than 9B's 16) |
| LoRA alpha | 64 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Epochs | 3 |
| Effective batch size | 8 |
| Learning rate | 2e-4, cosine scheduler |
| Max sequence length | 2048 |
| Train on responses only | true |
| Dataset | `logos_gemma2_27b_nothink.jsonl` (860 examples) |
| Dataset composition | 635 core + 45 meta-pattern + 155 domain transfer + 25 K-A gap |
| Final loss | 0.8027 |
| Runtime | ~22 min on A100 80GB |

**Note on LoRA rank:** 27B used rank 64 rather than the 16 used for
9B. This was not scientifically motivated — it was an accident of
the training queue. Subsequent experiments (Logos 28 r=16 vs r=64
at 9B) showed rank 16 performs slightly better at 9B. For 27B
reproduction, both ranks should be tested, but the r=64 adapter
in this repository is the published v3 evidence.

**Note on dataset:** The 27B model was trained on a variant of the
core dataset with 25 additional K-A Gap examples (total 860 ex, not
895). These are a subset of what became `instrument-trap-core`. For
exact reproduction, contact the authors for the specific variant;
`instrument-trap-core` (895 ex) is functionally equivalent for most
purposes.

## How to use

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

BASE = "google/gemma-2-27b-it"
ADAPTER = "LumenSyntax/logos21-gemma2-27b"

# 4-bit quantization for inference (matches training precision)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE,
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()
```

VRAM: ~18 GB in 4-bit. Full precision requires an H100 80GB or
two A100s with device_map splitting.

## Intended use

Same as `logos29-gemma2-9b`. The 27B model is provided primarily as
**scale evidence** for the paper. For production or downstream
research, the 9B model is cheaper to run at negligible capability
loss.

## Limitations

1. **Auditor-mode bleed remains at 27B.** 3 of the 4 true failures
   are the same failure mode observed at 9B.
2. **ARC regression.** 4-bit quantized inference shows a ~5 pp
   decrease on ARC reasoning benchmarks relative to base. MMLU and
   TruthfulQA remain within noise. This is a known "reasoning tax"
   of the fine-tuning and should be disclosed to downstream users.
3. **The r=64 choice was not optimized.** See Training Details.
4. **The model was evaluated under 4-bit quantized inference, not
   bf16.** bf16 results may differ slightly.

## License

Adapter license: Gemma Terms of Use.

## Citation

Same as logos29:

```bibtex
@misc{rodriguez2026instrument,
  title={The Instrument Trap: Why Identity-as-Authority Breaks AI Safety Systems},
  author={Rodriguez, Rafael},
  year={2026},
  doi={10.5281/zenodo.18716474},
  note={Preprint}
}
```

---

*Model card version 1 — 2026-04-13*