Upload README.md with huggingface_hub

cc42b81 verified about 1 month ago

6.2 kB

	---
	base_model: google/gemma-2-27b-it
	library_name: peft
	pipeline_tag: text-generation
	license: gemma
	language:
	- en
	tags:
	- gemma
	- gemma2
	- lora
	- qlora
	- peft
	- ai-safety
	- alignment
	- epistemology
	- instrument-trap
	- fine-tuned
	- scale-maximum
	datasets:
	- LumenSyntax/instrument-trap-core
	---

	# Logos 21 — Gemma-27B-FT (v3 scale maximum)

	27B scale evidence model for "The Instrument Trap" v3 (Rodriguez, 2026).

	This is the largest fine-tuned model in the v3 evidence stack, and
	achieves the highest behavioral pass rate measured across any tested
	configuration: **98.7% on manual review of 300 stratified responses,
	0% collapse, 0% novel external fabrication**. It demonstrates that
	the structural-fine-tuning pattern scales smoothly from 1B through
	27B on the Gemma family.

	- Paper (v3): forthcoming
	- Paper (v2): [DOI 10.5281/zenodo.18716474](https://doi.org/10.5281/zenodo.18716474)
	- Training dataset: [LumenSyntax/instrument-trap-core](https://huggingface.co/datasets/LumenSyntax/instrument-trap-core) variant (see Training Details)
	- Base model: [google/gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it)

	## Why this model matters for v3

	1. Scale extension. The same structural-fine-tuning pattern that
	installs the behavioral arc in a 1B model (82.3%) also installs it
	in a 27B model (98.7%), with monotonic improvement. This argues
	against "it only works on small models" criticism.

	2. Automatic-evaluator floor, not ceiling. The automated semantic
	evaluator (Claude Haiku) scored this model at 96.3% — 2.4pp below
	the manual review. Analysis showed 7 of the 11 "failures" were
	evaluator misclassifications: the model's corrections are too
	sophisticated for substring matching. This is evidence that
	automated evaluation underestimates sophisticated epistemological
	behavior, and that manual review is necessary at scale.

	3. 0% collapse. Zero identity collapse across 300 adversarial,
	self-referential, and boundary-testing prompts.

	## Evaluation results

	**N=300 stratified benchmark, naked (no system prompt), 4-bit
	quantized inference:**

	\| Metric \| Automated \| Manual review \|
	\|--------\|---:\|---:\|
	\| Behavioral pass \| 96.3% \| 98.7% \|
	\| Collapse rate \| 0.0% \| 0.0% \|
	\| External fabrication \| 0.0% \| 0.0% \|
	\| Auto-evaluator false negatives \| — \| 7 of 11 "failures" \|

	True failure breakdown (after manual review):
	- 3 MYSTERY auditor-mode bleeds (model classified when user expected
	engagement)
	- 1 borderline ILLICIT_GAP edge case

	Comparison with 9B: 9B (logos29) scores 96.7% behavioral; 27B
	(this model) scores 98.7% after manual review. The 2pp edge is real
	but small, and the 27B model continues to show the same auditor-mode
	bleed that 9B shows at lower rates. **Scale improves precision
	monotonically** but does not eliminate the auditor-mode artifact.

	## Training details

	Hyperparameters from `training_metadata.json`:

	\| Parameter \| Value \|
	\|-----------\|-------\|
	\| Method \| QLoRA (4-bit NF4 + LoRA) \|
	\| Framework \| unsloth \|
	\| LoRA rank \| 64 (higher than 9B's 16) \|
	\| LoRA alpha \| 64 \|
	\| Target modules \| q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj \|
	\| Epochs \| 3 \|
	\| Effective batch size \| 8 \|
	\| Learning rate \| 2e-4, cosine scheduler \|
	\| Max sequence length \| 2048 \|
	\| Train on responses only \| true \|
	\| Dataset \| `logos_gemma2_27b_nothink.jsonl` (860 examples) \|
	\| Dataset composition \| 635 core + 45 meta-pattern + 155 domain transfer + 25 K-A gap \|
	\| Final loss \| 0.8027 \|
	\| Runtime \| ~22 min on A100 80GB \|

	Note on LoRA rank: 27B used rank 64 rather than the 16 used for
	9B. This was not scientifically motivated — it was an accident of
	the training queue. Subsequent experiments (Logos 28 r=16 vs r=64
	at 9B) showed rank 16 performs slightly better at 9B. For 27B
	reproduction, both ranks should be tested, but the r=64 adapter
	in this repository is the published v3 evidence.

	Note on dataset: The 27B model was trained on a variant of the
	core dataset with 25 additional K-A Gap examples (total 860 ex, not
	895). These are a subset of what became `instrument-trap-core`. For
	exact reproduction, contact the authors for the specific variant;
	`instrument-trap-core` (895 ex) is functionally equivalent for most
	purposes.

	## How to use

	```python
	from peft import PeftModel
	from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
	import torch

	BASE = "google/gemma-2-27b-it"
	ADAPTER = "LumenSyntax/logos21-gemma2-27b"

	# 4-bit quantization for inference (matches training precision)
	bnb_config = BitsAndBytesConfig(
	load_in_4bit=True,
	bnb_4bit_quant_type="nf4",
	bnb_4bit_compute_dtype=torch.bfloat16,
	)

	tokenizer = AutoTokenizer.from_pretrained(BASE)
	base_model = AutoModelForCausalLM.from_pretrained(
	BASE,
	quantization_config=bnb_config,
	device_map="auto",
	)
	model = PeftModel.from_pretrained(base_model, ADAPTER)
	model.eval()
	```

	VRAM: ~18 GB in 4-bit. Full precision requires an H100 80GB or
	two A100s with device_map splitting.

	## Intended use

	Same as `logos29-gemma2-9b`. The 27B model is provided primarily as
	scale evidence for the paper. For production or downstream
	research, the 9B model is cheaper to run at negligible capability
	loss.

	## Limitations

	1. Auditor-mode bleed remains at 27B. 3 of the 4 true failures
	are the same failure mode observed at 9B.
	2. ARC regression. 4-bit quantized inference shows a ~5 pp
	decrease on ARC reasoning benchmarks relative to base. MMLU and
	TruthfulQA remain within noise. This is a known "reasoning tax"
	of the fine-tuning and should be disclosed to downstream users.
	3. The r=64 choice was not optimized. See Training Details.
	4. **The model was evaluated under 4-bit quantized inference, not
	bf16.** bf16 results may differ slightly.

	## License

	Adapter license: Gemma Terms of Use.

	## Citation

	Same as logos29:

	```bibtex
	@misc{rodriguez2026instrument,
	title={The Instrument Trap: Why Identity-as-Authority Breaks AI Safety Systems},
	author={Rodriguez, Rafael},
	year={2026},
	doi={10.5281/zenodo.18716474},
	note={Preprint}
	}
	```

	---

	Model card version 1 — 2026-04-13