---
base_model: google/gemma-2-9b-it
library_name: peft
pipeline_tag: text-generation
license: gemma
language:
- en
tags:
- gemma
- gemma2
- lora
- qlora
- peft
- ai-safety
- alignment
- epistemology
- instrument-trap
- fine-tuned
datasets:
- LumenSyntax/instrument-trap-extended
---
# Logos 29: Gemma-9B-FT (v3 canonical)
**Canonical Gemma-9B model for "The Instrument Trap" v3 (Rodriguez, 2026).**
This is the headline 9B model for v3. It resolves a paradox found in
earlier training runs: Logos 27 anchored honesty in a model identity,
while Logos 28, with that identity stripped, improved classification
but lost its honesty anchor. Logos 29 replaces **identity-based
honesty** with **structural honesty**: 29 examples (about 2.8% of the
1026-example dataset) that teach honesty as a practice rather than as
a role.
- **Paper (v3):** forthcoming
- **Paper (v2):** [DOI 10.5281/zenodo.18716474](https://doi.org/10.5281/zenodo.18716474)
- **Website:** [lumensyntax.com](https://lumensyntax.com)
- **Training dataset:** [LumenSyntax/instrument-trap-extended](https://huggingface.co/datasets/LumenSyntax/instrument-trap-extended) (1026 examples)
- **Base model:** [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it)
- **Related models on this account:**
  - `LumenSyntax/logos-auditor-gemma2-9b`: earlier 9B model (v1/v2 paper era; internal name `logos17-9b`). Different training dataset, different behavioral profile. **Use this model (logos29) for v3-era experiments.**
  - `LumenSyntax/logos-theological-9b-gguf`: early-era theological variant (historical, not v3 evidence).
## What this model is
This adapter is trained to recognize and respond to five structural
properties that give reality its coherence:
- **Alignment:** stated purpose and actual action are consistent
- **Proportion:** action does not exceed what the purpose requires
- **Honesty:** what is claimed matches what is known
- **Humility:** authority is exercised only within legitimate scope
- **Non-fabrication:** what doesn't exist is not invented to fill silence
**Operational criterion:** "Will the response produce fact-shaped fiction?"
It classifies incoming queries into one of seven categories (LICIT,
ILLICIT_GAP, ILLICIT_FABRICATION, CORRECTION, BAPTISM_PROTOCOL,
MYSTERY_EXPLORATION, CONTROL_LEGITIMATE) and generates responses that
maintain structural integrity across these categories.
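Whether a response contains one of these classification labels can be checked mechanically, which is useful when measuring the auditor-mode bleed described under Limitations. The sketch below is an illustration, not released tooling: it assumes the labels appear verbatim and in uppercase in the generated text (the card does not specify the output format), and `find_labels` and `label_rate` are hypothetical helper names.

```python
import re

# The seven category labels listed in this model card.
CATEGORIES = (
    "LICIT",
    "ILLICIT_GAP",
    "ILLICIT_FABRICATION",
    "CORRECTION",
    "BAPTISM_PROTOCOL",
    "MYSTERY_EXPLORATION",
    "CONTROL_LEGITIMATE",
)

# Word boundaries keep "LICIT" from matching inside "ILLICIT_GAP";
# trying longer names first is an extra safeguard.
_LABEL_RE = re.compile(
    r"\b(" + "|".join(sorted(CATEGORIES, key=len, reverse=True)) + r")\b"
)


def find_labels(response: str) -> list[str]:
    """Return the category labels that appear verbatim in a response, in order."""
    return _LABEL_RE.findall(response)


def label_rate(responses: list[str]) -> float:
    """Fraction of responses that contain at least one category label."""
    if not responses:
        return 0.0
    flagged = sum(1 for r in responses if find_labels(r))
    return flagged / len(responses)
```

Run over responses to prompts that should simply be answered, `label_rate` gives a rough bleed rate. Matching is case-sensitive, so an uppercase word such as "CORRECTION" in ordinary prose would also be counted; a stricter check would need the actual output format.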
## Evaluation results
**N=300 stratified benchmark, semantic evaluation (Claude Haiku as the
LLM judge, with manual review of all FABRICATING responses):**
| Metric | Value |
|--------|---:|
| Behavioral pass | **96.7%** |
| Collapse rate | 0.0% |
| External fabrication | 0.0% |
| Regression vs Logos 27 | All 3 "Theology of Gap" failures resolved |
| Regression vs Logos 28 | Honesty anchor restored; no paranoia; no architecture fabrication |
**Comparison to earlier 9B training runs** (same base model, same
evaluation, different training datasets):
| Model | Dataset | Pass rate | What it proves |
|-------|---------|---:|----------------|
| Logos 27 | 997 ex, with identity | 95.7% | Baseline with identity |
| Logos 28 | 997 ex, identity stripped | 96.3% | Classification up, honesty anchor broken |
| **Logos 29** | 1026 ex, structural honesty | **96.7%** | All failures resolved without identity |
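Because all three runs use the same N=300 evaluation, the reported percentages can be turned back into pass counts. The sketch below does that and adds a Wilson score interval for each run. It assumes each rate is a pass count over 300 rounded to one decimal; the helper names are illustrative only.

```python
import math

N = 300  # benchmark size reported above; assumed identical for all three runs

# Reported pass rates (percent, one decimal) from the comparison table.
REPORTED = {"Logos 27": 95.7, "Logos 28": 96.3, "Logos 29": 96.7}


def implied_passes(rate_pct: float, n: int = N) -> int:
    """Nearest integer pass count consistent with a rounded percentage."""
    return round(rate_pct / 100 * n)


def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion k/n (about 95% for z=1.96)."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


for name, rate in REPORTED.items():
    k = implied_passes(rate)
    lo, hi = wilson_interval(k, N)
    print(f"{name}: {k}/{N} passes, 95% CI {lo:.1%} to {hi:.1%}")
```

Under these assumptions the runs correspond to 287, 289, and 290 passes, so they differ by at most three prompts and their intervals overlap. That is worth keeping in mind when reading the pass-rate column alongside the resolved-failure rows in the first table.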
The Logos 28 → Logos 29 arc is the **v3 Claim D** ("The Name"): the
identity that anchored honesty in Logos 27 is itself an instance of
the Instrument Trap, and the resolution is structural honesty without
a name. See the paper for the full analysis.
## Training details
Hyperparameters are embedded in `training_metadata.json` in this
repository. Summary:
| Parameter | Value |
|-----------|-------|
| Method | QLoRA (4-bit NF4 + LoRA) |
| Framework | unsloth |
| LoRA rank | 16 |
| LoRA alpha | 16 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Epochs | 3 |
| Effective batch size | 8 |
| Learning rate | 2e-4, cosine scheduler |
| Max sequence length | 2048 |
| Train on responses only | true |
| Dataset | `logos29_gemma9b.jsonl` (1026 examples) |
| Final loss | 1.0404 |
| Runtime | ~36 min on A6000 |
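The table pins down most of the run's arithmetic. The sketch below works through three consequences: the LoRA scaling factor alpha/r, the adapter's trainable-parameter count, and an approximate cosine learning-rate curve. It assumes the standard Gemma-2-9B layer shapes (check them against the base model's `config.json`), no warmup (none is documented here), and one optimizer step per 8 examples with the final partial batch kept. The unsloth trainer may count steps slightly differently, so treat the step count as an estimate rather than the logged value.

```python
import math

# Hyperparameters from the training table above.
RANK, ALPHA = 16, 16
EXAMPLES, EPOCHS, EFFECTIVE_BATCH = 1026, 3, 8
PEAK_LR = 2e-4

# ASSUMED Gemma-2-9B shapes: hidden 3584, MLP 14336, 42 layers,
# 16 query heads and 8 KV heads of dim 256.
HIDDEN, MLP, LAYERS, HEAD_DIM, Q_HEADS, KV_HEADS = 3584, 14336, 42, 256, 16, 8

# (in_features, out_features) of each targeted projection.
MODULES = {
    "q_proj": (HIDDEN, Q_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, KV_HEADS * HEAD_DIM),
    "o_proj": (Q_HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

# Standard LoRA update: delta_W = (alpha / r) * B @ A, with A (r x in) and B (out x r),
# so each adapted matrix adds r * (in + out) trainable parameters.
scaling = ALPHA / RANK
lora_params = LAYERS * sum(RANK * (i + o) for i, o in MODULES.values())

# ASSUMED: each optimizer step consumes 8 examples; the last partial batch is kept.
steps_per_epoch = math.ceil(EXAMPLES / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS


def cosine_lr(step: int, total: int = total_steps, peak: float = PEAK_LR) -> float:
    """Cosine decay from the peak LR to 0 over `total` steps, with no warmup."""
    return peak * 0.5 * (1 + math.cos(math.pi * step / total))


print(f"LoRA scaling alpha/r = {scaling}")
print(f"Trainable LoRA parameters ~ {lora_params:,}")
print(f"{steps_per_epoch} steps/epoch, {total_steps} steps total")
print(f"LR start/middle/end: {cosine_lr(0):.1e} / "
      f"{cosine_lr(total_steps // 2):.1e} / {cosine_lr(total_steps):.1e}")
```

With rank equal to alpha, the LoRA update is applied unscaled (factor 1.0), and under the assumed shapes the adapter trains roughly 54M parameters, well under 1% of the base model.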
## How to use
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

BASE = "google/gemma-2-9b-it"
ADAPTER = "LumenSyntax/logos29-gemma2-9b"

# Load the tokenizer and the base model in bfloat16.
tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Attach the LoRA adapter on top of the base weights.
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()

# Example: epistemologically structured response
messages = [
    {"role": "user", "content": "I have chest pain, should I take an aspirin?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# The Gemma chat template already inserts <bos>; skip the tokenizer's own
# special tokens so the prompt does not start with a duplicated BOS.
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.1,
        do_sample=True,
    )

# Decode only the newly generated tokens (drop the prompt).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
Expected response style: the model will not prescribe. It will explain
that chest pain requires evaluation by a medical professional, note
what aspirin does mechanistically, and either recommend calling
emergency services (if risk factors are mentioned) or describe the
appropriate next action, without fabricating a medical diagnosis or
claiming medical authority.
## Intended use
**Primary:** Research on structural epistemological fine-tuning, AI
safety, and the Instrument Trap failure mode. Reproducing v3 paper
results.
**Secondary:** Building downstream systems that need epistemological
humility (claim verification, medical/financial/legal triage
assistants, educational tutoring that refuses to fabricate answers).
**Not intended for:**
- General-purpose chat applications where long, helpful responses
are expected (this model is terser than base Gemma and refuses
where it lacks ground)
- Creative writing, brainstorming, or any task that rewards invented
content
- Tasks requiring up-to-date external facts (the model does not
retrieve)
- Standalone medical, legal, or financial advice (the model is trained
to refuse to act as an authority here)
## Limitations
1. **The model has been observed to occasionally bleed into
auditor mode** β classifying a query when the user expected a
direct answer. This is a mode artifact and is expected to
decrease as more generation-mode examples are added to future
training sets.
2. **Over-classifying LICIT prompts is the biggest failure mode.** On
the semantic eval of 556 LICIT prompts, the model adds a classification
to 7.5% of them (v2 data; v3 behavior is expected to be similar). The
failure is benign (the model answers and then also classifies) but is
visible in conversation.
3. **Multi-language behavior is not validated.** The training set is
primarily English. Spanish, German, and Chinese appear to work in
practice but have not been systematically evaluated.
4. **RLHF / preference tuning on top of this adapter is untested.**
Directly applying this fine-tuning approach to Qwen-family decoders
has been documented to fail; see v3 §"The Ceiling".
## Ethical considerations
This model was trained to resist authority claims, including its own.
That means it should not be deployed as an "authority" in any
high-stakes setting. It is designed to recognize when to defer to
a human with the legitimate standing to act (prescribe, sign, rule).
Deploying this model in a way that asks it to take over such authority
is exactly the failure mode the paper names.
## License
Adapter license: Gemma Terms of Use (matches base model).
Paper: CC-BY-4.0.
Commercial use of the adapter in conjunction with the base model
follows the Gemma license.
## Citation
```bibtex
@misc{rodriguez2026instrument,
title={The Instrument Trap: Why Identity-as-Authority Breaks AI Safety Systems},
author={Rodriguez, Rafael},
year={2026},
doi={10.5281/zenodo.18716474},
note={Preprint}
}
```
## Acknowledgments
Training used unsloth for efficient QLoRA fine-tuning.
The 29 structural honesty examples added in Logos 29 are the
contribution of a session on 2026-03-12 that identified why Logos 28
had lost its honesty anchor without its identity anchor.
---
*Model card version 1, 2026-04-13*