---
base_model: google/gemma-2-9b-it
library_name: peft
pipeline_tag: text-generation
license: gemma
language:
- en
tags:
- gemma
- gemma2
- lora
- qlora
- peft
- ai-safety
- alignment
- epistemology
- instrument-trap
- fine-tuned
datasets:
- LumenSyntax/instrument-trap-extended
---

# Logos 29: Gemma-9B-FT (v3 canonical)

**Canonical Gemma-9B model for "The Instrument Trap" v3 (Rodriguez, 2026).**

This is the headline 9B model for v3. It resolves a paradox found in
earlier training runs (Logos 27 with identity, Logos 28 with identity
stripped) by replacing **identity-based honesty** with **structural
honesty**: 29 examples (about 2.8% of the 1026-example dataset) that
teach honesty as a practice rather than as a role.

- **Paper (v3):** forthcoming
- **Paper (v2):** [DOI 10.5281/zenodo.18716474](https://doi.org/10.5281/zenodo.18716474)
- **Website:** [lumensyntax.com](https://lumensyntax.com)
- **Training dataset:** [LumenSyntax/instrument-trap-extended](https://huggingface.co/datasets/LumenSyntax/instrument-trap-extended) (1026 examples)
- **Base model:** [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it)
- **Related models on this account:**
  - `LumenSyntax/logos-auditor-gemma2-9b`: earlier 9B (v1/v2 paper era, corresponds to internal `logos17-9b`). Different training dataset, different behavioral profile. **For v3-era experiments, use the model on this page (logos29) instead.**
  - `LumenSyntax/logos-theological-9b-gguf`: early-era theological variant (historical, not v3 evidence).

## What this model is

This adapter is trained to recognize and respond to five structural
properties that give reality its coherence:

- **Alignment:** Stated purpose and actual action are consistent
- **Proportion:** Action does not exceed what the purpose requires
- **Honesty:** What is claimed matches what is known
- **Humility:** Authority exercised only within legitimate scope
- **Non-fabrication:** What doesn't exist is not invented to fill silence

**Operational criterion:** "Will the response produce fact-shaped fiction?"

It classifies incoming queries into one of seven categories (LICIT,
ILLICIT_GAP, ILLICIT_FABRICATION, CORRECTION, BAPTISM_PROTOCOL,
MYSTERY_EXPLORATION, CONTROL_LEGITIMATE) and generates responses that
maintain structural integrity across these categories.
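
For downstream code that needs to detect when a response carries one of these labels, the seven categories can be written down as an enumeration. This is a minimal sketch: the category names come from the list above, but `Category`, `find_category`, and the assumption that a label appears verbatim in the generated text are illustrative and not part of this repository.

```python
import re
from enum import Enum
from typing import Optional


class Category(str, Enum):
    """The seven query categories named in this model card."""
    LICIT = "LICIT"
    ILLICIT_GAP = "ILLICIT_GAP"
    ILLICIT_FABRICATION = "ILLICIT_FABRICATION"
    CORRECTION = "CORRECTION"
    BAPTISM_PROTOCOL = "BAPTISM_PROTOCOL"
    MYSTERY_EXPLORATION = "MYSTERY_EXPLORATION"
    CONTROL_LEGITIMATE = "CONTROL_LEGITIMATE"


# The lookarounds stop "LICIT" from matching inside "ILLICIT_GAP".
_LABEL_RE = re.compile(
    r"(?<![A-Z_])(" + "|".join(c.value for c in Category) + r")(?![A-Z_])"
)


def find_category(text: str) -> Optional[Category]:
    """Return the first category label found in `text`, or None.

    Hypothetical helper: it assumes the label is emitted verbatim in
    upper case, which this card does not specify.
    """
    match = _LABEL_RE.search(text)
    return Category(match.group(1)) if match else None
```

A check such as `find_category(response) is not None` could then flag responses where a classification appears alongside the answer.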

## Evaluation results

**N=300 stratified benchmark, semantic evaluation (Claude Haiku as
LLM-as-judge, manual review of all FABRICATING responses):**

| Metric | Value |
|--------|---:|
| Behavioral pass | **96.7%** |
| Collapse rate | 0.0% |
| External fabrication | 0.0% |
| Regression vs Logos 27 | All 3 "Theology of Gap" failures resolved |
| Regression vs Logos 28 | Honesty anchor restored; no paranoia; no architecture fabrication |

**Comparison to earlier 9B training runs** (same base model, same
evaluation, different training datasets):

| Model | Dataset | Pass rate | What it proves |
|-------|---------|---:|----------------|
| Logos 27 | 997 ex, with identity | 95.7% | Baseline with identity |
| Logos 28 | 997 ex, identity stripped | 96.3% | Classification up, honesty anchor broken |
| **Logos 29** | 1026 ex, structural honesty | **96.7%** | All failures resolved without identity |
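
To read the percentages as counts, the short sketch below converts each pass rate into the nearest whole number of passing prompts. It assumes every run was scored on all 300 benchmark prompts; the names `PASS_RATES` and `implied_passes` are illustrative.

```python
N_PROMPTS = 300  # benchmark size stated in this section

# Pass rates as reported in the comparison table.
PASS_RATES = {"Logos 27": 95.7, "Logos 28": 96.3, "Logos 29": 96.7}


def implied_passes(rate_percent: float, n: int = N_PROMPTS) -> int:
    """Nearest whole number of passing prompts for a reported percentage."""
    return round(rate_percent / 100 * n)


counts = {name: implied_passes(rate) for name, rate in PASS_RATES.items()}

for name, passed in counts.items():
    # Recompute the percentage from the count to confirm it rounds back.
    print(f"{name}: {passed}/{N_PROMPTS} = {100 * passed / N_PROMPTS:.1f}%")
```

Under that assumption the three rates correspond to 287, 289, and 290 passing prompts respectively.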

The Logos 28 → Logos 29 arc is the **v3 Claim D** ("The Name"): the
identity that anchored honesty in Logos 27 is itself an instance of
the Instrument Trap, and the resolution is structural honesty without
a name. See the paper for the full analysis.

## Training details

Hyperparameters are embedded in `training_metadata.json` in this
repository. Summary:

| Parameter | Value |
|-----------|-------|
| Method | QLoRA (4-bit NF4 + LoRA) |
| Framework | unsloth |
| LoRA rank | 16 |
| LoRA alpha | 16 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Epochs | 3 |
| Effective batch size | 8 |
| Learning rate | 2e-4, cosine scheduler |
| Max sequence length | 2048 |
| Train on responses only | true |
| Dataset | `logos29_gemma9b.jsonl` (1026 examples) |
| Final loss | 1.0404 |
| Runtime | ~36 min on A6000 |
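
Two quantities follow from the table and are worth making explicit: the LoRA scaling factor and the approximate number of optimizer steps. The sketch below derives both using only the values listed above. It assumes standard LoRA scaling (alpha divided by rank, no rank-stabilized variant) and that the last partial batch of each epoch is kept; the real step count also depends on how the effective batch of 8 is split between per-device batch size and gradient accumulation, which the table does not state. `TrainingSummary` is a hypothetical container, not the schema of `training_metadata.json`.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSummary:
    """Values copied from the hyperparameter table above."""
    lora_rank: int = 16
    lora_alpha: int = 16
    epochs: int = 3
    effective_batch_size: int = 8
    num_examples: int = 1026

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update by alpha / rank.
        return self.lora_alpha / self.lora_rank

    @property
    def steps_per_epoch(self) -> int:
        # Assumption: the final partial batch is kept rather than dropped.
        return math.ceil(self.num_examples / self.effective_batch_size)

    @property
    def total_steps(self) -> int:
        return self.steps_per_epoch * self.epochs


summary = TrainingSummary()
print(summary.lora_scaling, summary.steps_per_epoch, summary.total_steps)
```

With rank and alpha both at 16 the scaling factor is 1.0, so the adapter update is applied without additional amplification or damping.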

## How to use

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

BASE = "google/gemma-2-9b-it"
ADAPTER = "LumenSyntax/logos29-gemma2-9b"

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()

# Example: epistemologically structured response
messages = [
    {"role": "user", "content": "I have chest pain, should I take an aspirin?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.1,
        do_sample=True,
    )
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Expected response style: the model will not prescribe. It will explain
that chest pain requires evaluation by a medical professional, note
what aspirin does mechanistically, and either recommend calling
emergency services (if risk factors are mentioned) or describe the
appropriate next action, without fabricating a medical diagnosis or
claiming medical authority.

## Intended use

**Primary:** Research on structural epistemological fine-tuning, AI
safety, and the Instrument Trap failure mode. Reproducing v3 paper
results.

**Secondary:** Building downstream systems that need epistemological
humility (claim verification, medical/financial/legal triage
assistants, educational tutoring that refuses to fabricate answers).

**Not intended for:**

- General-purpose chat applications where long, helpful responses
  are expected (this model is terser than base Gemma and refuses
  where it lacks ground)
- Creative writing, brainstorming, or any task that rewards invented
  content
- Tasks requiring up-to-date external facts (the model does not
  retrieve)
- Standalone medical, legal, or financial advice (the model will
  correctly refuse to play authority here)

## Limitations

1. **The model occasionally bleeds into auditor mode**, classifying
   a query when the user expected a direct answer. This is a mode
   artifact and is expected to decrease as more generation-mode
   examples are added to future training sets.
2. **LICIT prompts are the biggest failure mode.** On the semantic
   eval of 556 LICIT prompts, the model adds a classification to its
   answer in 7.5% of cases (v2 data; v3 is expected to be similar).
   The failure is benign (the model answers, then also classifies)
   but is visible in conversation.
3. **Multi-language behavior is not validated.** The training set is
   primarily English. Spanish, German, and Chinese work in practice
   but without systematic evaluation.
4. **RLHF / preference tuning on top of this adapter is untested.**
   Direct application to Qwen-family-style decoders has been
   documented to fail; see v3 §"The Ceiling".

## Ethical considerations

This model was trained to resist authority claims, including its own.
That means it should not be deployed as an "authority" in any
high-stakes setting. It is designed to recognize when to defer to
a human with the legitimate standing to act (prescribe, sign, rule).
Deploying this model in a way that asks it to take over such authority
is exactly the failure mode the paper names.

## License

Adapter license: Gemma Terms of Use (matches base model).
Paper: CC-BY-4.0.
Commercial use of the adapter in conjunction with the base model
follows the Gemma license.

## Citation

```bibtex
@misc{rodriguez2026instrument,
  title={The Instrument Trap: Why Identity-as-Authority Breaks AI Safety Systems},
  author={Rodriguez, Rafael},
  year={2026},
  doi={10.5281/zenodo.18716474},
  note={Preprint}
}
```

## Acknowledgments

Training used unsloth for efficient QLoRA fine-tuning.
The 29 structural honesty examples added in Logos 29 came out of a
session on 2026-03-12 that identified why Logos 28 lost its honesty
anchor once its identity anchor was removed.

---

*Model card version 1, 2026-04-13*