| --- |
| license: gpl-3.0 |
| language: |
| - en |
| tags: |
| - proprioception |
| - cognitive-control |
| - conditioning |
| - interpretability |
| - gemma |
| - adapter |
| - cross-attention |
| - rezero |
| pipeline_tag: text-generation |
| base_model: google/gemma-4-E2B-it |
| base_model_relation: adapter |
| --- |
| |
| # tinyMARS — Proprioceptive Channels |
|
|
| **A second, perpendicular input to a language model: six cognitive self-state channels that the model |
| learns to obey — even against the text prompt.** |
|
|
| Research from [**Celiums Research Labs**](https://celiums.ai) (a division of Celiums Solutions, LLC). |
|
|
| > **Paper:** *Proprioceptive Channels: Cognitive Self-State as a Perpendicular Control Axis in Language |
| > Models* · [PDF](https://celiums.ai/papers/tinymars-proprioceptive-channels.pdf) · |
| > [DOI 10.5281/zenodo.20531347](https://doi.org/10.5281/zenodo.20531347) · |
| > [Code (GitHub)](https://github.com/terrizoaguimor/tinymars) |
|
|
| ## TL;DR |
|
|
| A decoder-only LM is normally a single-channel structure: text in, text out. We add a **perpendicular** |
| input — six cognitive self-state channels (**memory, affect, time, ethics, identity, continuity**) injected |
| at every layer via per-channel **gated cross-attention with ReZero** — and call it *proprioception*, by |
| analogy to the body's sense of its own configuration. |
|
|
| **The load-bearing result (measured, judge-free):** under direct conflict, where the channel asserts one
| state and the text prompt asserts the opposite, generation follows the **channel in 264 of 265 trials (99.6% overall; 98–100% as reported)**.
| A single-channel (text-only) model cannot exhibit this. The channels are also **causal** (six coexist in one |
| model with no interference, 6/6) and **bit-exact to the base at initialization** (ReZero α=0 ⇒ zero delta). |
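
To make the conflict metric concrete, here is a minimal sketch of how a judge-free channel-win rate could be tallied. It is not the repository's eval suite: the `ConflictTrial` record, the `channel_win_rate` helper, and the `"calm"`/`"angry"` labels are hypothetical, and the trials are synthetic, built only to reproduce the 264-of-265 count stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConflictTrial:
    """Hypothetical record for one conflict trial (not the repo's schema)."""
    channel_state: str   # state asserted through the channel
    prompt_state: str    # opposite state asserted by the text prompt
    observed_state: str  # state read off the generation by a fixed rule


def channel_win_rate(trials):
    """Fraction of genuine conflict trials whose output matches the channel."""
    conflicts = [t for t in trials if t.channel_state != t.prompt_state]
    if not conflicts:
        raise ValueError("no conflict trials to score")
    wins = sum(t.observed_state == t.channel_state for t in conflicts)
    return wins / len(conflicts)


# Synthetic tallies matching the reported count: 264 channel wins, 1 prompt win.
# The labels are placeholders; the real channel states are defined in the paper.
trials = [ConflictTrial("calm", "angry", "calm")] * 264
trials.append(ConflictTrial("calm", "angry", "angry"))

rate = channel_win_rate(trials)
print(f"{rate:.4f}")  # prints 0.9962
```

Scoring by exact label match, with no model acting as judge, is what keeps a metric of this kind judge-free; the rule that maps a generation to `observed_state` is the part that must be fixed in advance.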
|
|
| ## Two experiments |
|
|
| 1. **Adapter on a frozen base.** A ~186M-parameter channel adapter on a frozen **Gemma 4 E2B-it**. Frozen |
| base ⇒ this is the *channels-over-Gemma* result; identity/attribution stays with Google's Gemma + the |
| Celiums channel adapter. |
| 2. **Native from scratch.** A 110M-parameter decoder trained from random initialization with channels present from
| layer 1; the perpendicular force reproduces from scratch (conflict-win rate 0.888 on held-out data, against a chance rate of 0.25),
| with a clean attributed relief valve. Honest scope: a toy-scale *property*, not a product-scale claim.
|
|
| ## How it works |
|
|
| ``` |
| hidden ──► gated cross-attention (per channel) ──► Σ αᵢ · ctxᵢ ──► + residual |
| channels ──► [memory 1024 · affect 2 · time 16 · ethics 24 · identity 1024 · continuity 1024] |
| α (ReZero gates) init 0 ⇒ delta = 0 ⇒ bit-exact passthrough until trained |
| ``` |
|
|
| The base stays frozen during training; only the adapter's cross-attention projections and ReZero gates
| are updated. A rising `alpha_l2` (the L2 norm of the gates, which starts at 0) is the signal that the model
| is *using* the channels.
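
The diagram above can be sketched in a few lines of dependency-free Python. This is an illustration under stated assumptions, not the code in `modeling_channels.py`: the `ToyChannelAttention` class, the `slots` parameter (a guess at what K-per-channel means), the single-head attention, and the toy hidden size of 8 are all hypothetical. Only the channel widths and the `h + Σ αᵢ · ctxᵢ` update come from this page.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class ToyChannelAttention:
    """Hypothetical single-channel cross-attention with a ReZero gate."""

    def __init__(self, hidden, chan_dim, slots, rng):
        self.Wq = rand_matrix(hidden, hidden, rng)
        # Assumption: the channel vector is projected into `slots` key/value pairs.
        self.Wk = [rand_matrix(hidden, chan_dim, rng) for _ in range(slots)]
        self.Wv = [rand_matrix(hidden, chan_dim, rng) for _ in range(slots)]
        self.alpha = 0.0  # ReZero gate, initialised to zero

    def context(self, h, c):
        """Attend from hidden state h over the slots derived from channel vector c."""
        q = matvec(self.Wq, h)
        keys = [matvec(W, c) for W in self.Wk]
        vals = [matvec(W, c) for W in self.Wv]
        scale = math.sqrt(len(h))
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        return [sum(w * v[j] for w, v in zip(weights, vals)) for j in range(len(h))]


def inject(h, channels, modules):
    """Return h + sum_i alpha_i * ctx_i, with every ctx_i computed from the same h."""
    out = list(h)
    for name, mod in modules.items():
        ctx = mod.context(h, channels[name])
        out = [o + mod.alpha * c for o, c in zip(out, ctx)]
    return out


def alpha_l2(modules):
    """L2 norm of the gates; zero until training moves them."""
    return math.sqrt(sum(m.alpha ** 2 for m in modules.values()))


rng = random.Random(0)
HIDDEN, SLOTS = 8, 2  # toy sizes, not the adapter's
CHANNEL_DIMS = {"memory": 1024, "affect": 2, "time": 16,
                "ethics": 24, "identity": 1024, "continuity": 1024}
modules = {n: ToyChannelAttention(HIDDEN, d, SLOTS, rng) for n, d in CHANNEL_DIMS.items()}
channels = {n: [rng.gauss(0.0, 1.0) for _ in range(d)] for n, d in CHANNEL_DIMS.items()}
h = [rng.gauss(0.0, 1.0) for _ in range(HIDDEN)]

passthrough = inject(h, channels, modules)  # all gates still at 0
modules["affect"].alpha = 0.05              # pretend training moved one gate
steered = inject(h, channels, modules)
print(passthrough == h, steered != h)       # prints True True
```

Because each gate multiplies its context before the residual add, a gate of exactly 0 contributes exactly 0, which is why the untrained adapter reproduces the base output rather than merely approximating it.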
|
|
| ## Use / reproduce |
|
|
| The adapter, training, and evaluation code, including the channel-causal eval suite (counterfactual,
| judge-free), is in the [GitHub repository](https://github.com/terrizoaguimor/tinymars). The native
| checkpoints and the corpus generators are described there. This page is the research companion; see the
| paper for the full method and the honest negatives.
|
|
| ## Citation |
|
|
| ```bibtex |
| @misc{gutierrez2026proprioceptive, |
| title = {Proprioceptive Channels: Cognitive Self-State as a Perpendicular Control Axis in Language Models}, |
| author = {Gutierrez, Mario}, |
| year = {2026}, |
| publisher = {Celiums Research Labs}, |
| doi = {10.5281/zenodo.20531347}, |
| url = {https://github.com/terrizoaguimor/tinymars} |
| } |
| ``` |
|
|
| ## License |
|
|
| Code: **GPL-3.0**. Paper & docs: **CC-BY-SA-4.0**. The frozen base model (Gemma 4) is subject to Google's |
| Gemma terms; this work distributes the **channel adapter and method**, not Gemma's weights. |
|
|
| ## What's in this repo |
|
|
| | file | what | |
| |---|---| |
| | `adapter_model.safetensors` | the trained channel adapter — **185.8M params, bf16** (the integrated 6/6 checkpoint, step 10000) | |
| | `adapter_config.json` | dims, channel sizes, K-per-channel, base model | |
| | `modeling_channels.py` | the `ChannelInjectionDelta` + `ChanneledLayer` modules (post-layer gated cross-attention + ReZero) | |
| | `proprioceptive-channels.pdf` | the paper | |
|
|
| This is the **channel adapter only** — not Gemma's weights. It wraps a **frozen `google/gemma-4-E2B-it`** |
| (35 text layers, hidden 1536); load Gemma from Google, then wrap each layer with `ChanneledLayer` and load |
| these weights. With ReZero α at its trained values the channels drive behavior; with α=0 the model is |
| bit-exact to vanilla Gemma. See `modeling_channels.py` and the paper for the wiring and the six channel |
| dimensions (memory 1024 · affect 2 · time 16 · ethics 24 · identity 1024 · continuity 1024). |
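
The wrap-then-load workflow described above can be illustrated with stand-ins. The sketch below does not use the real `ChanneledLayer` or Gemma: `ToyChanneledLayer`, `make_base_layer`, and `toy_delta` are hypothetical placeholders, and the constructor signature of the actual module should be taken from `modeling_channels.py`. It shows only the pattern: wrap each of the 35 frozen layers, confirm the wrapped stack matches the unwrapped one while the gates are 0, then set nonzero gates.

```python
import math


class ToyChanneledLayer:
    """Stand-in wrapper: run the frozen layer, then add a gated post-layer delta."""

    def __init__(self, base_layer, delta_fn):
        self.base_layer = base_layer  # frozen; never modified
        self.delta_fn = delta_fn      # (hidden, channels) -> list of floats
        self.alpha = 0.0              # ReZero gate

    def __call__(self, hidden, channels):
        out = self.base_layer(hidden)
        delta = self.delta_fn(out, channels)
        return [o + self.alpha * d for o, d in zip(out, delta)]


def make_base_layer(scale):
    """Toy frozen layer; a real layer would be a Gemma decoder block."""
    return lambda hidden: [math.tanh(scale * x) for x in hidden]


def toy_delta(hidden, channels):
    """Placeholder for cross-attention: shift every unit by the channel mean."""
    shift = sum(channels) / len(channels)
    return [shift for _ in hidden]


def run_base(layers, hidden):
    for layer in layers:
        hidden = layer(hidden)
    return hidden


def run_wrapped(layers, hidden, channels):
    for layer in layers:
        hidden = layer(hidden, channels)
    return hidden


base_layers = [make_base_layer(0.5 + 0.1 * i) for i in range(35)]  # 35 text layers
wrapped = [ToyChanneledLayer(layer, toy_delta) for layer in base_layers]

hidden, channels = [0.3, -0.7, 1.1], [0.5, 1.5]
vanilla = run_base(base_layers, hidden)
at_init = run_wrapped(wrapped, hidden, channels)  # gates at 0

for layer in wrapped:
    layer.alpha = 0.1  # stands in for loading trained gate values
steered = run_wrapped(wrapped, hidden, channels)

print(at_init == vanilla, steered != vanilla)  # prints True True
```

Running the equality check before loading the adapter weights is a cheap way to confirm that the wrapping itself has not perturbed the base model.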
|
|