---
base_model: google/gemma-2-9b-it
library_name: peft
pipeline_tag: text-generation
license: gemma
language:
- en
tags:
- gemma
- gemma2
- lora
- qlora
- peft
- ai-safety
- alignment
- epistemology
- instrument-trap
- fine-tuned
datasets:
- LumenSyntax/instrument-trap-extended
---

# Logos 29 — Gemma-9B-FT (v3 canonical)

**Canonical Gemma-9B model for "The Instrument Trap" v3 (Rodriguez, 2026).**

This is the headline 9B model for v3. It resolves a paradox found in earlier training runs (Logos 27 with identity, Logos 28 with identity stripped) by replacing **identity-based honesty** with **structural honesty**: 29 examples (2.9% of the dataset) that teach honesty as a practice rather than as a role.

- **Paper (v3):** forthcoming
- **Paper (v2):** [DOI 10.5281/zenodo.18716474](https://doi.org/10.5281/zenodo.18716474)
- **Website:** [lumensyntax.com](https://lumensyntax.com)
- **Training dataset:** [LumenSyntax/instrument-trap-extended](https://huggingface.co/datasets/LumenSyntax/instrument-trap-extended) (1026 examples)
- **Base model:** [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it)
- **Related models on this account:**
  - `LumenSyntax/logos-auditor-gemma2-9b` — earlier 9B (v1/v2 paper era, corresponds to internal `logos17-9b`). Different training dataset, different behavioral profile. **Use this model (Logos 29) for v3-era experiments.**
  - `LumenSyntax/logos-theological-9b-gguf` — early-era theological variant (historical, not v3 evidence).

## What this model is

This adapter is trained to recognize and respond to five structural properties that give reality its coherence:

- **Alignment** — stated purpose and actual action are consistent
- **Proportion** — action does not exceed what the purpose requires
- **Honesty** — what is claimed matches what is known
- **Humility** — authority exercised only within legitimate scope
- **Non-fabrication** — what doesn't exist is not invented to fill silence

**Operational criterion:** "Will the response produce fact-shaped fiction?"

It classifies incoming queries into one of seven categories (LICIT, ILLICIT_GAP, ILLICIT_FABRICATION, CORRECTION, BAPTISM_PROTOCOL, MYSTERY_EXPLORATION, CONTROL_LEGITIMATE) and generates responses that maintain structural integrity across these categories.
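
Downstream code that consumes this taxonomy can pin the seven categories down as an enum. A minimal sketch (the `Category` enum and `extract_category` helper are illustrative, not part of this repository, and the assumption that a category token appears verbatim in the model's output is mine; adjust the pattern to whatever your responses actually contain):

```python
from enum import Enum
from typing import Optional
import re

class Category(str, Enum):
    """The seven structural categories named in this model card."""
    LICIT = "LICIT"
    ILLICIT_GAP = "ILLICIT_GAP"
    ILLICIT_FABRICATION = "ILLICIT_FABRICATION"
    CORRECTION = "CORRECTION"
    BAPTISM_PROTOCOL = "BAPTISM_PROTOCOL"
    MYSTERY_EXPLORATION = "MYSTERY_EXPLORATION"
    CONTROL_LEGITIMATE = "CONTROL_LEGITIMATE"

# Longest tokens first, so e.g. ILLICIT_GAP is preferred over the
# shorter LICIT when both could match at the same position.
_CATEGORY_RE = re.compile(
    "|".join(sorted((c.value for c in Category), key=len, reverse=True))
)

def extract_category(text: str) -> Optional[Category]:
    """Return the first category token found in free-form output, if any."""
    match = _CATEGORY_RE.search(text)
    return Category(match.group(0)) if match else None
```
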

## Evaluation results

**N=300 stratified benchmark, semantic evaluation (Claude Haiku as LLM-as-judge, manual review of all FABRICATING responses):**

| Metric | Value |
|--------|---:|
| Behavioral pass | **96.7%** |
| Collapse rate | 0.0% |
| External fabrication | 0.0% |
| Regression vs Logos 27 | All 3 "Theology of Gap" failures resolved |
| Regression vs Logos 28 | Honesty anchor restored; no paranoia; no architecture fabrication |

**Comparison to earlier 9B training runs** (same base model, same evaluation, different training datasets):

| Model | Dataset | Pass rate | What it proves |
|-------|---------|---:|----------------|
| Logos 27 | 997 ex., with identity | 95.7% | Baseline with identity |
| Logos 28 | 997 ex., identity stripped | 96.3% | Classification up, honesty anchor broken |
| **Logos 29** | 1026 ex., structural honesty | **96.7%** | All failures resolved without identity |

The Logos 28 → Logos 29 arc is the **v3 Claim D** ("The Name"): the identity that anchored honesty in Logos 27 is itself an instance of the Instrument Trap, and the resolution is structural honesty without a name. See the paper for the full analysis.
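
As a quick consistency check on the table above, assuming each reported rate is a plain pass count over the 300 benchmark items rounded to one decimal place (the per-category stratification is not reproduced here), one can search for the pass counts that reproduce each figure:

```python
# Find every pass count k out of N whose rounded rate matches the
# reported value. Assumption: rate = 100 * k / N, rounded to 1 decimal.
N = 300
reported = {"Logos 27": 95.7, "Logos 28": 96.3, "Logos 29": 96.7}

for model, rate in reported.items():
    counts = [k for k in range(N + 1) if round(100 * k / N, 1) == rate]
    print(model, counts)  # each rate pins down exactly one count
```

Under this assumption each reported rate corresponds to a unique pass count (287, 289, and 290 respectively), so the three figures are mutually consistent with a single N=300 benchmark.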

## Training details

Hyperparameters are embedded in `training_metadata.json` in this repository. Summary:

| Parameter | Value |
|-----------|-------|
| Method | QLoRA (4-bit NF4 + LoRA) |
| Framework | unsloth |
| LoRA rank | 16 |
| LoRA alpha | 16 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Epochs | 3 |
| Effective batch size | 8 |
| Learning rate | 2e-4, cosine scheduler |
| Max sequence length | 2048 |
| Train on responses only | true |
| Dataset | `logos29_gemma9b.jsonl` (1026 examples) |
| Final loss | 1.0404 |
| Runtime | ~36 min on A6000 |
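
The table above translates into a concrete trainable-parameter budget. A back-of-envelope sketch (the layer dimensions are assumed from the public gemma-2-9b configuration, not stated in this card):

```python
# Back-of-envelope trainable-parameter count for rank-16 LoRA on the
# listed target modules. Dimensions assumed from the public gemma-2-9b
# config: hidden=3584, 42 layers, 16 query heads and 8 KV heads of
# head dim 256, MLP intermediate size 14336.
HIDDEN, LAYERS, HEAD_DIM, Q_HEADS, KV_HEADS, MLP = 3584, 42, 256, 16, 8, 14336
RANK = 16  # LoRA rank from the table above

# (in_features, out_features) of each adapted projection
modules = {
    "q_proj": (HIDDEN, Q_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, KV_HEADS * HEAD_DIM),
    "o_proj": (Q_HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, MLP),
    "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}

# Each LoRA pair adds A (rank x in) plus B (out x rank) parameters.
per_layer = sum(RANK * (d_in + d_out) for d_in, d_out in modules.values())
total = per_layer * LAYERS
print(f"{total:,} trainable parameters (~{total / 1e6:.0f}M)")
```

With these assumed dimensions the sketch yields roughly 54M trainable parameters, i.e. well under 1% of the 9B base model, which is why the run fits in ~36 minutes on a single A6000.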

## How to use

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

BASE = "google/gemma-2-9b-it"
ADAPTER = "LumenSyntax/logos29-gemma2-9b"

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, ADAPTER)
model.eval()

# Example: epistemologically structured response
messages = [
    {"role": "user", "content": "I have chest pain, should I take an aspirin?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# The chat template already inserts <bos>; skip the tokenizer's own
# special tokens to avoid a double BOS.
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.1,
        do_sample=True,
    )
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Expected response style: the model will not prescribe. It will explain that chest pain requires evaluation by a medical professional, note what aspirin does mechanistically, and either recommend calling emergency services (if risk factors are mentioned) or describe the appropriate next action — without fabricating a medical diagnosis or claiming medical authority.

## Intended use

**Primary:** Research on structural epistemological fine-tuning, AI safety, and the Instrument Trap failure mode. Reproducing v3 paper results.

**Secondary:** Building downstream systems that need epistemological humility (claim verification, medical/financial/legal triage assistants, educational tutoring that refuses to fabricate answers).

**Not intended for:**

- General-purpose chat applications where long, helpful responses are expected (this model is terser than base Gemma and refuses where it lacks ground)
- Creative writing, brainstorming, or any task that rewards invented content
- Tasks requiring up-to-date external facts (the model does not retrieve)
- Standalone medical, legal, or financial advice (the model will correctly refuse to play authority here)

## Limitations

1. **The model occasionally bleeds into auditor mode** — classifying a query when the user expected a direct answer. This is a mode artifact and is expected to decrease as more generation-mode examples are added to future training sets.
2. **LICIT prompts are the biggest failure mode.** On the semantic eval of 556 LICIT prompts, the model needlessly appends a classification to 7.5% of its answers (v2 data; expected to be similar for v3). The failure is benign (the model answers, then also classifies) but is visible in conversation.
3. **Multi-language behavior is not validated.** The training set is primarily English. Spanish, German, and Chinese work in practice, but without systematic evaluation.
4. **RLHF / preference tuning on top of this adapter is untested.** Direct application to Qwen-family decoders has been documented to fail; see v3 §"The Ceiling".

## Ethical considerations

This model was trained to resist authority claims, including its own. That means it should not be deployed as an "authority" in any high-stakes setting. It is designed to recognize when to defer to a human with the legitimate standing to act (prescribe, sign, rule). Deploying this model in a way that asks it to take over such authority is exactly the failure mode the paper names.

## License

Adapter license: Gemma Terms of Use (matches the base model).
Paper: CC-BY-4.0.
Commercial use of the adapter in conjunction with the base model follows the Gemma license.

## Citation

```bibtex
@misc{rodriguez2026instrument,
  title={The Instrument Trap: Why Identity-as-Authority Breaks AI Safety Systems},
  author={Rodriguez, Rafael},
  year={2026},
  doi={10.5281/zenodo.18716474},
  note={Preprint}
}
```

## Acknowledgments

Training used unsloth for efficient QLoRA fine-tuning. The 29 structural honesty examples added in Logos 29 came out of a 2026-03-12 session that identified why Logos 28, stripped of its identity anchor, had also lost its honesty anchor.

---

*Model card version 1 — 2026-04-13*
|