---
license: gpl-3.0
language:
- en
tags:
- proprioception
- cognitive-control
- conditioning
- interpretability
- gemma
- adapter
- cross-attention
- rezero
pipeline_tag: text-generation
base_model: google/gemma-4-E2B-it
base_model_relation: adapter
---

# tinyMARS: Proprioceptive Channels

**A second, perpendicular input to a language model: six cognitive self-state channels that the model learns to obey, even against the text prompt.**

Research from [**Celiums Research Labs**](https://celiums.ai) (a division of Celiums Solutions, LLC).

> **Paper:** *Proprioceptive Channels: Cognitive Self-State as a Perpendicular Control Axis in Language
> Models* · [PDF](https://celiums.ai/papers/tinymars-proprioceptive-channels.pdf) ·
> [DOI 10.5281/zenodo.20531347](https://doi.org/10.5281/zenodo.20531347) ·
> [Code (GitHub)](https://github.com/terrizoaguimor/tinymars)

## TL;DR

A decoder-only LM is normally a single-channel structure: text in, text out. We add a **perpendicular** input, six cognitive self-state channels (**memory, affect, time, ethics, identity, continuity**) injected at every layer via per-channel **gated cross-attention with ReZero**, and call it *proprioception*, by analogy to the body's sense of its own configuration.

**The load-bearing result (measured, judge-free):** under direct conflict (where the channel asserts one state and the text prompt asserts the opposite), generation follows the **channel 264/265 times (98–100%)**. A single-channel (text-only) model cannot exhibit this. The channels are also **causal** (six coexist in one model with no interference, 6/6) and **bit-exact to the base at initialization** (ReZero α=0 ⇒ zero delta).

## Two experiments

1. **Adapter on a frozen base.** A ~186M-parameter channel adapter on a frozen **Gemma 4 E2B-it**. Frozen base ⇒ this is the *channels-over-Gemma* result; identity/attribution stays with Google's Gemma + the Celiums channel adapter.
2. **Native from scratch.** A 110M-parameter decoder trained from random init with channels present from layer 1; the perpendicular force reproduces from scratch (conflict-win 0.888 on held-out, chance 0.25), with a clean attributed relief valve. Honest scope: a toy-scale *property*, not a product-scale claim.

## How it works

```
hidden   ──► gated cross-attention (per channel) ──► Σ αᵢ · ctxᵢ ──► + residual
channels ──► [memory 1024 · affect 2 · time 16 · ethics 24 · identity 1024 · continuity 1024]
α (ReZero gates) init 0 ⇒ delta = 0 ⇒ bit-exact passthrough until trained
```

The adapter trains while the base stays frozen; only the cross-attention projections and the ReZero gates move. `alpha_l2` (the L2 norm of the gates) growing from 0 is the signal that the model is *using* the channels.

## Use / reproduce

The adapter, training, and evaluation code (with the channel-causal eval suite: counterfactual, judge-free) are in the [GitHub repository](https://github.com/terrizoaguimor/tinymars). The native checkpoints and the corpus generators are described there. This page is the research companion; see the paper for the full method and the honest negatives.

## Citation

```bibtex
@misc{gutierrez2026proprioceptive,
  title     = {Proprioceptive Channels: Cognitive Self-State as a Perpendicular Control Axis in Language Models},
  author    = {Gutierrez, Mario},
  year      = {2026},
  publisher = {Celiums Research Labs},
  doi       = {10.5281/zenodo.20531347},
  url       = {https://github.com/terrizoaguimor/tinymars}
}
```

## License

Code: **GPL-3.0**. Paper & docs: **CC-BY-SA-4.0**. The frozen base model (Gemma 4) is subject to Google's Gemma terms; this work distributes the **channel adapter and method**, not Gemma's weights.

## What's in this repo

| file | what |
|---|---|
| `adapter_model.safetensors` | the trained channel adapter: **185.8M params, bf16** (the integrated 6/6 checkpoint, step 10000) |
| `adapter_config.json` | dims, channel sizes, K-per-channel, base model |
| `modeling_channels.py` | the `ChannelInjectionDelta` + `ChanneledLayer` modules (post-layer gated cross-attention + ReZero) |
| `proprioceptive-channels.pdf` | the paper |

This is the **channel adapter only**, not Gemma's weights. It wraps a **frozen `google/gemma-4-E2B-it`** (35 text layers, hidden 1536); load Gemma from Google, then wrap each layer with `ChanneledLayer` and load these weights. With ReZero α at its trained values the channels drive behavior; with α=0 the model is bit-exact to vanilla Gemma. See `modeling_channels.py` and the paper for the wiring and the six channel dimensions (memory 1024 · affect 2 · time 16 · ethics 24 · identity 1024 · continuity 1024).
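
To make the α=0 passthrough concrete, here is a minimal sketch of post-layer gated cross-attention with ReZero in plain Python. It is not the code in `modeling_channels.py`: `ToyChannel`, `inject`, the single-head attention, the toy widths in `TOY_DIMS`, and the values of `HIDDEN`, `ATTN`, and `K` are all assumptions chosen for readability. Only the six channel names, the per-channel gate, and the `alpha_l2` definition (L2 norm of the gates) come from this page.

```python
import math
import random


def rand_matrix(rows, cols, rng):
    """Random projection matrix stored as a list of rows."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class ToyChannel:
    """Hypothetical single-head cross-attention from the hidden state to one channel."""

    def __init__(self, hidden_dim, channel_dim, attn_dim, rng):
        self.w_q = rand_matrix(attn_dim, hidden_dim, rng)
        self.w_k = rand_matrix(attn_dim, channel_dim, rng)
        self.w_v = rand_matrix(hidden_dim, channel_dim, rng)
        self.scale = 1.0 / math.sqrt(attn_dim)
        self.alpha = 0.0  # ReZero gate, initialized to zero

    def context(self, hidden, slots):
        """Attend from one hidden vector over this channel's K state slots."""
        query = matvec(self.w_q, hidden)
        keys = [matvec(self.w_k, slot) for slot in slots]
        values = [matvec(self.w_v, slot) for slot in slots]
        scores = [self.scale * sum(q * k for q, k in zip(query, key)) for key in keys]
        weights = softmax(scores)
        return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(len(hidden))]


def inject(hidden, channels, states):
    """Residual update: hidden + sum_i alpha_i * ctx_i, applied to a layer's output."""
    out = list(hidden)
    for name, channel in channels.items():
        ctx = channel.context(hidden, states[name])
        out = [o + channel.alpha * c for o, c in zip(out, ctx)]
    return out


def alpha_l2(channels):
    """L2 norm of the gates; growth from 0 indicates the channels are in use."""
    return math.sqrt(sum(ch.alpha ** 2 for ch in channels.values()))


rng = random.Random(0)
# Toy widths only; the real channel widths are listed in the paragraph above.
TOY_DIMS = {"memory": 4, "affect": 2, "time": 3, "ethics": 3, "identity": 4, "continuity": 4}
HIDDEN, ATTN, K = 6, 4, 2  # assumed toy sizes; K is the number of slots per channel

channels = {name: ToyChannel(HIDDEN, dim, ATTN, rng) for name, dim in TOY_DIMS.items()}
states = {name: [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(K)]
          for name, dim in TOY_DIMS.items()}
hidden = [rng.uniform(-1, 1) for _ in range(HIDDEN)]

untrained = inject(hidden, channels, states)  # every gate is 0: exact passthrough
norm_before = alpha_l2(channels)

channels["affect"].alpha = 0.5                # stand-in for a trained gate value
gated = inject(hidden, channels, states)
norm_after = alpha_l2(channels)
```

With every gate at zero, each term `alpha * ctx` is exactly `0.0`, so `untrained` equals `hidden` element for element; that is the sense in which an untrained adapter leaves the base model's output unchanged. Once any gate moves away from zero, the output diverges from the input and `alpha_l2` becomes positive.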