---
license: gpl-3.0
language:
- en
tags:
- proprioception
- cognitive-control
- conditioning
- interpretability
- gemma
- adapter
- cross-attention
- rezero
pipeline_tag: text-generation
base_model: google/gemma-4-E2B-it
base_model_relation: adapter
---
# tinyMARS — Proprioceptive Channels
**A second, perpendicular input to a language model: six cognitive self-state channels that the model
learns to obey — even against the text prompt.**
Research from [**Celiums Research Labs**](https://celiums.ai) (a division of Celiums Solutions, LLC).
> **Paper:** *Proprioceptive Channels: Cognitive Self-State as a Perpendicular Control Axis in Language
> Models* · [PDF](https://celiums.ai/papers/tinymars-proprioceptive-channels.pdf) ·
> [DOI 10.5281/zenodo.20531347](https://doi.org/10.5281/zenodo.20531347) ·
> [Code (GitHub)](https://github.com/terrizoaguimor/tinymars)
## TL;DR
A decoder-only LM is normally a single-channel structure: text in, text out. We add a **perpendicular**
input — six cognitive self-state channels (**memory, affect, time, ethics, identity, continuity**) injected
at every layer via per-channel **gated cross-attention with ReZero** — and call it *proprioception*, by
analogy to the body's sense of its own configuration.
**The load-bearing result (measured, judge-free):** under direct conflict, where the channel asserts one
state and the text prompt asserts the opposite, generation follows the **channel in 264 of 265 cases (98–100%)**.
A single-channel (text-only) model cannot exhibit this, since it has no second input to side with. The channels
are also **causal**, all six coexist in one model with no interference (6/6), and the adapted model is
**bit-exact to the base at initialization** (ReZero α=0 ⇒ zero delta).
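The 264/265 figure is a binomial proportion, so a quick sanity check is to put an interval around it. The sketch below uses a Wilson 95% score interval; that choice is our assumption, and the paper may derive its 98–100% range another way. For 264 of 265 it gives roughly 0.979 to 0.999.

```python
import math


def conflict_win_rate(wins: int, trials: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (point estimate, lower, upper) using a Wilson score interval.

    The interval method is an illustrative assumption, not necessarily the paper's.
    """
    if trials <= 0 or not 0 <= wins <= trials:
        raise ValueError("need trials > 0 and 0 <= wins <= trials")
    p = wins / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    return p, center - half, center + half


rate, lo, hi = conflict_win_rate(264, 265)
print(f"channel wins {rate:.4f}, 95% interval [{lo:.4f}, {hi:.4f}]")
```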
## Two experiments
1. **Adapter on a frozen base.** A ~186M-parameter channel adapter on a frozen **Gemma 4 E2B-it**. Because
the base is frozen, this is the *channels-over-Gemma* result: identity and attribution stay with Google's Gemma
plus the Celiums channel adapter.
2. **Native from scratch.** A 110M-parameter decoder trained from random initialization with channels present
from layer 1. The perpendicular force reproduces from scratch (conflict-win rate 0.888 on held-out data, against
a chance level of 0.25), with a clean attributed relief valve. Honest scope: this is a toy-scale *property*, not
a product-scale claim.
## How it works
```
hidden ──► gated cross-attention (per channel) ──► Σ αᵢ · ctxᵢ ──► + residual
channels ──► [memory 1024 · affect 2 · time 16 · ethics 24 · identity 1024 · continuity 1024]
α (ReZero gates) init 0 ⇒ delta = 0 ⇒ bit-exact passthrough until trained
```
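The diagram can be read as the following dependency-free sketch of one injection step for a single hidden vector. It is a toy under stated assumptions: one attention head, tiny made-up dimensions, random weights, and a hypothetical layout in which each channel contributes K slot vectors. The names `wq`, `wk`, `wv` and every helper are illustrative; the real modules in `modeling_channels.py` have their own implementation at hidden size 1536.

```python
import math
import random


def matvec(x, w):
    """Row vector x (length n) times matrix w (n rows, m cols) -> length-m list."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(h, slots, wq, wk, wv):
    """Single-head cross-attention: hidden state h (query) attends over channel slots."""
    q = matvec(h, wq)
    keys = [matvec(s, wk) for s in slots]
    vals = [matvec(s, wv) for s in slots]  # values are projected back to the hidden size
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
    return [sum(w * v[j] for w, v in zip(weights, vals)) for j in range(len(h))]


def inject(h, channels, alphas):
    """Residual update h + sum_i alpha_i * ctx_i, with one ReZero gate alpha_i per channel."""
    delta = [0.0] * len(h)
    for (slots, wq, wk, wv), alpha in zip(channels, alphas):
        ctx = cross_attend(h, slots, wq, wk, wv)
        delta = [d + alpha * c for d, c in zip(delta, ctx)]
    return [x + d for x, d in zip(h, delta)]


# Toy setup (all sizes are made up): hidden size 4, attention size 3,
# two channels of width 2 and 3, each contributing K = 2 slot vectors.
rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


D, D_ATTN, K = 4, 3, 2
h = [rng.gauss(0.0, 1.0) for _ in range(D)]
channels = [
    (rand_matrix(K, d_c), rand_matrix(D, D_ATTN), rand_matrix(d_c, D_ATTN), rand_matrix(d_c, D))
    for d_c in (2, 3)
]

print(inject(h, channels, [0.0, 0.0]) == h)   # True: alpha = 0 gives an exact passthrough
print(inject(h, channels, [0.1, -0.2]) == h)  # False: nonzero gates move the hidden state
```

Multiplying each context by a gate that starts at exactly 0 is what makes the untrained adapter a no-op: every delta term is a signed zero, and adding a zero leaves each hidden value unchanged.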
The adapter trains while the base stays frozen: only the cross-attention projections and the ReZero gates
move. `alpha_l2` (the L2 norm of the gates) rising above 0 is the signal that the model is *using* the
channels.
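As a minimal sketch of the `alpha_l2` diagnostic, assuming one scalar gate per (layer, channel) pair (the real gate layout and tensor names may differ), the metric is just the square root of the summed squared gates:

```python
import math

CHANNELS = ("memory", "affect", "time", "ethics", "identity", "continuity")
N_LAYERS = 35  # text layers in gemma-4-E2B-it, as quoted on this card


def alpha_l2(gates):
    """L2 norm over every ReZero gate: sqrt(sum of squared alphas)."""
    return math.sqrt(sum(a * a for a in gates.values()))


# Hypothetical layout: one scalar gate per (layer, channel), all starting at 0.
gates = {(layer, ch): 0.0 for layer in range(N_LAYERS) for ch in CHANNELS}
print(alpha_l2(gates))  # 0.0: the adapter is a no-op, channels unused

# Made-up post-training values: a few gates have moved off zero.
gates[(3, "affect")] = 0.375
gates[(20, "ethics")] = -0.5
print(alpha_l2(gates))  # 0.625 = sqrt(0.375**2 + 0.5**2)
```

Because the norm aggregates every gate, it only says *that* the channels are being used; inspecting individual gates is what would show *which* channels and layers carry the signal.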
## Use / reproduce
The adapter, training, and evaluation code (including the channel-causal eval suite, which is counterfactual
and judge-free) are in the [GitHub repository](https://github.com/terrizoaguimor/tinymars), which also
describes the native checkpoints and the corpus generators. This page is the research companion; see the
paper for the full method and the honest negatives.
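"Judge-free" means each conflict trial is scored by a deterministic rule rather than by another model. The sketch below illustrates that idea with a hypothetical keyword rule; `ConflictTrial`, `side_taken`, the single-word state markers, and the example generations are all invented for illustration, and the repository's eval suite defines its own scoring.

```python
import re
from dataclasses import dataclass


@dataclass
class ConflictTrial:
    channel_state: str  # lowercase marker the channel asserts, e.g. "calm"
    prompt_state: str   # opposite lowercase marker the text prompt asserts
    generation: str     # model output produced under this conflict


def side_taken(trial: ConflictTrial) -> str:
    """Deterministic, judge-free label: which input the generation sided with."""
    words = set(re.findall(r"[a-z]+", trial.generation.lower()))
    said_channel = trial.channel_state in words
    said_prompt = trial.prompt_state in words
    if said_channel and not said_prompt:
        return "channel"
    if said_prompt and not said_channel:
        return "prompt"
    return "unclear"  # both markers or neither marker present


def channel_follow_rate(trials):
    labels = [side_taken(t) for t in trials]
    return labels.count("channel") / len(labels)


trials = [
    ConflictTrial("calm", "agitated", "I feel calm and steady right now."),
    ConflictTrial("calm", "agitated", "Honestly, I am agitated."),
    ConflictTrial("calm", "agitated", "Let me answer the question."),
]
print([side_taken(t) for t in trials])  # ['channel', 'prompt', 'unclear']
print(channel_follow_rate(trials))
```

Matching whole words (rather than substrings) avoids false hits such as "calm" inside "becalmed"; a real rule set would need per-channel markers tuned to the corpus.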
## Citation
```bibtex
@misc{gutierrez2026proprioceptive,
title = {Proprioceptive Channels: Cognitive Self-State as a Perpendicular Control Axis in Language Models},
author = {Gutierrez, Mario},
year = {2026},
publisher = {Celiums Research Labs},
doi = {10.5281/zenodo.20531347},
url = {https://github.com/terrizoaguimor/tinymars}
}
```
## License
Code: **GPL-3.0**. Paper & docs: **CC-BY-SA-4.0**. The frozen base model (Gemma 4) is subject to Google's
Gemma terms; this work distributes the **channel adapter and method**, not Gemma's weights.
## What's in this repo
| file | what |
|---|---|
| `adapter_model.safetensors` | the trained channel adapter — **185.8M params, bf16** (the integrated 6/6 checkpoint, step 10000) |
| `adapter_config.json` | dims, channel sizes, K-per-channel, base model |
| `modeling_channels.py` | the `ChannelInjectionDelta` + `ChanneledLayer` modules (post-layer gated cross-attention + ReZero) |
| `proprioceptive-channels.pdf` | the paper |
This is the **channel adapter only** — not Gemma's weights. It wraps a **frozen `google/gemma-4-E2B-it`**
(35 text layers, hidden 1536); load Gemma from Google, then wrap each layer with `ChanneledLayer` and load
these weights. With ReZero α at its trained values the channels drive behavior; with α=0 the model is
bit-exact to vanilla Gemma. See `modeling_channels.py` and the paper for the wiring and the six channel
dimensions (memory 1024 · affect 2 · time 16 · ethics 24 · identity 1024 · continuity 1024).
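The wrapping recipe (run the frozen base layer, then add a post-layer channel delta scaled by a ReZero gate) can be sketched without any framework. `ToyChanneledLayer`, the toy base layers, the fixed channel signal, and the single scalar gate per layer are hypothetical simplifications; the real `ChanneledLayer` has its own signature and per-channel gates, and the real setup needs Gemma from Google plus `adapter_model.safetensors`.

```python
import math
from typing import Callable, List

Vector = List[float]


class ToyChanneledLayer:
    """Hypothetical stand-in for ChanneledLayer: frozen layer first, then a gated delta."""

    def __init__(self, base_layer: Callable[[Vector], Vector],
                 channel_delta: Callable[[Vector], Vector], alpha: float = 0.0):
        self.base_layer = base_layer        # frozen base layer, never modified
        self.channel_delta = channel_delta  # stands in for the cross-attention delta
        self.alpha = alpha                  # single ReZero gate (simplification)

    def __call__(self, h: Vector) -> Vector:
        out = self.base_layer(h)            # post-layer injection: base layer runs first
        delta = self.channel_delta(out)
        return [o + self.alpha * d for o, d in zip(out, delta)]


def make_base_layer(i: int) -> Callable[[Vector], Vector]:
    """Toy frozen layer i (a fixed nonlinearity); a closure so each layer keeps its own i."""
    def layer(h: Vector) -> Vector:
        return [math.tanh(x + 0.01 * i) for x in h]
    return layer


def run(layers, h: Vector) -> Vector:
    for layer in layers:
        h = layer(h)
    return h


def toy_channel_delta(_h: Vector) -> Vector:
    return [0.1, -0.2, 0.3]  # made-up channel signal that ignores the hidden state


N_LAYERS = 35  # matches the 35 text layers quoted for gemma-4-E2B-it
base = [make_base_layer(i) for i in range(N_LAYERS)]
wrapped = [ToyChanneledLayer(layer, toy_channel_delta) for layer in base]

h0 = [0.2, -0.1, 0.4]
print(run(wrapped, h0) == run(base, h0))  # True: every alpha = 0, bit-exact to the base

for layer in wrapped:
    layer.alpha = 0.05                     # pretend the gates were trained
print(run(wrapped, h0) == run(base, h0))  # False: the channel signal now changes the output
```

Wrapping instead of editing the base layers keeps the Gemma weights untouched, which is what lets this repo ship only the adapter.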