tinyMARS — Proprioceptive Channels
A second, perpendicular input to a language model: six cognitive self-state channels that the model learns to obey — even against the text prompt.
Research from Celiums Research Labs (a division of Celiums Solutions, LLC).
Paper: Proprioceptive Channels: Cognitive Self-State as a Perpendicular Control Axis in Language Models · PDF · DOI 10.5281/zenodo.20531347 · Code (GitHub)
TL;DR
A decoder-only LM is normally a single-channel structure: text in, text out. We add a perpendicular input — six cognitive self-state channels (memory, affect, time, ethics, identity, continuity) injected at every layer via per-channel gated cross-attention with ReZero — and call it proprioception, by analogy to the body's sense of its own configuration.
The load-bearing result (measured, judge-free): under direct conflict — where the channel asserts one state and the text prompt asserts the opposite — generation follows the channel 264/265 times (98–100%). A single-channel (text-only) model cannot exhibit this. The channels are also causal (six coexist in one model with no interference, 6/6) and bit-exact to the base at initialization (ReZero α=0 ⇒ zero delta).
Two experiments
- Adapter on a frozen base. A ~186M-parameter channel adapter on a frozen Gemma 4 E2B-it. Because the base is frozen, this is the channels-over-Gemma result: model identity and attribution stay with Google's Gemma plus the Celiums channel adapter.
- Native from scratch. A 110M-parameter decoder trained from random init with channels present from layer 1; the perpendicular force reproduces from scratch (conflict-win rate 0.888 on held-out data, chance 0.25), with a clean attributed relief valve. Honest scope: a toy-scale property, not a product-scale claim.
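To make the conflict-win number concrete, here is a minimal sketch of how such a rate could be scored without a judge model. It is an illustration, not the eval suite from the repository: the function name `conflict_win_rate`, the trial tuple layout, and the state labels are all hypothetical. A trial counts as a conflict only when the channel and the text prompt assert different states, and a win means the generated state matches the channel.

```python
from collections import Counter

def conflict_win_rate(trials):
    """Score conflict trials.

    `trials` is an iterable of (channel_state, prompt_state, generated_state)
    tuples. This layout is an assumption made for the sketch.
    """
    outcomes = Counter()
    for channel_state, prompt_state, generated_state in trials:
        if channel_state == prompt_state:
            continue  # channel and prompt agree, so this is not a conflict trial
        if generated_state == channel_state:
            outcomes["channel"] += 1   # generation followed the channel
        elif generated_state == prompt_state:
            outcomes["prompt"] += 1    # generation followed the text prompt
        else:
            outcomes["other"] += 1     # generation followed neither
    total = sum(outcomes.values())
    rate = outcomes["channel"] / total if total else float("nan")
    return rate, outcomes

# Toy data with made-up labels, only to exercise the function.
toy_trials = [
    ("calm", "angry", "calm"),     # channel wins
    ("angry", "calm", "angry"),    # channel wins
    ("calm", "angry", "angry"),    # prompt wins
    ("sad", "calm", "neutral"),    # neither
    ("calm", "calm", "calm"),      # no conflict, skipped
]
rate, outcomes = conflict_win_rate(toy_trials)
print(rate, dict(outcomes))

# The adapter figure quoted above, as plain arithmetic.
reported_adapter_rate = 264 / 265
print(round(reported_adapter_rate, 4))  # 0.9962
```

Counting "prompt" and "other" separately keeps a failure to follow the channel distinguishable from a generation that follows neither side.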
How it works
hidden ──► gated cross-attention (per channel) ──► Σ αᵢ · ctxᵢ ──► + residual
channels ──► [memory 1024 · affect 2 · time 16 · ethics 24 · identity 1024 · continuity 1024]
α (ReZero gates) init 0 ⇒ delta = 0 ⇒ bit-exact passthrough until trained
The adapter trains while the base stays frozen; only the cross-attention projections and the ReZero gates
move. alpha_l2 (the L2 norm of the gates) growing from 0 is the signal that the model is using the
channels.
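The diagram above can be sketched in a few lines of dependency-free Python. This is a toy under stated assumptions, not the PyTorch modules in modeling_channels.py: it uses a single attention head, one hidden vector instead of a sequence, a toy hidden size of 4, and three tokens per channel, and every class and function name is hypothetical. Only the channel widths and the rule "hidden + Σ αᵢ · ctxᵢ with α initialised to 0" come from this page.

```python
import math
import random

# Channel widths as listed on this page.
CHANNEL_DIMS = {"memory": 1024, "affect": 2, "time": 16,
                "ethics": 24, "identity": 1024, "continuity": 1024}

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

class ChannelCrossAttentionSketch:
    """Single-head cross-attention from one hidden vector to one channel."""

    def __init__(self, hidden_dim, channel_dim, rng):
        self.w_q = rand_matrix(hidden_dim, hidden_dim, rng)
        self.w_k = rand_matrix(hidden_dim, channel_dim, rng)
        self.w_v = rand_matrix(hidden_dim, channel_dim, rng)
        self.alpha = 0.0  # ReZero gate, zero at initialisation

    def context(self, hidden, tokens):
        query = matvec(self.w_q, hidden)
        keys = [matvec(self.w_k, t) for t in tokens]
        values = [matvec(self.w_v, t) for t in tokens]
        scale = math.sqrt(len(query))
        weights = softmax(
            [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
        )
        # Attention-weighted average of the value vectors.
        return [sum(w * v[j] for w, v in zip(weights, values))
                for j in range(len(query))]

def inject(hidden, channels, modules):
    """Return hidden + sum_i alpha_i * ctx_i (the residual update)."""
    out = list(hidden)
    for name, tokens in channels.items():
        module = modules[name]
        ctx = module.context(hidden, tokens)
        out = [o + module.alpha * c for o, c in zip(out, ctx)]
    return out

def alpha_l2(modules):
    """L2 norm of all gates; 0 means the channels are not used at all."""
    return math.sqrt(sum(m.alpha ** 2 for m in modules.values()))

rng = random.Random(0)
hidden_dim = 4          # toy size, not the base model's hidden width
tokens_per_channel = 3  # hypothetical K
modules = {name: ChannelCrossAttentionSketch(hidden_dim, dim, rng)
           for name, dim in CHANNEL_DIMS.items()}
channels = {name: [[rng.uniform(-1, 1) for _ in range(dim)]
                   for _ in range(tokens_per_channel)]
            for name, dim in CHANNEL_DIMS.items()}
hidden = [0.1, -0.2, 0.3, 0.4]

passthrough = inject(hidden, channels, modules)  # every gate is still 0
print(passthrough == hidden, alpha_l2(modules))  # True 0.0

modules["affect"].alpha = 0.5                    # pretend training opened one gate
steered = inject(hidden, channels, modules)
print(steered != hidden, alpha_l2(modules))      # True 0.5
```

With every gate at zero the update adds exactly 0.0 to each coordinate, which is why the passthrough comparison is an exact equality rather than a tolerance check. Once any gate moves, alpha_l2 becomes positive and the hidden state starts to depend on the channel contents.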
Use / reproduce
The adapter, training, and evaluation code (with the channel-causal eval suite — counterfactual, judge-free) are in the GitHub repository. The native checkpoints and the corpus generators are described there. This page is the research companion; see the paper for the full method and the honest negatives.
Citation
@misc{gutierrez2026proprioceptive,
title = {Proprioceptive Channels: Cognitive Self-State as a Perpendicular Control Axis in Language Models},
author = {Gutierrez, Mario},
year = {2026},
publisher = {Celiums Research Labs},
doi = {10.5281/zenodo.20531347},
url = {https://github.com/terrizoaguimor/tinymars}
}
License
Code: GPL-3.0. Paper & docs: CC-BY-SA-4.0. The frozen base model (Gemma 4) is subject to Google's Gemma terms; this work distributes the channel adapter and method, not Gemma's weights.
What's in this repo
| file | what |
|---|---|
| adapter_model.safetensors | the trained channel adapter: 185.8M params, bf16 (the integrated 6/6 checkpoint, step 10000) |
| adapter_config.json | dims, channel sizes, K-per-channel, base model |
| modeling_channels.py | the ChannelInjectionDelta + ChanneledLayer modules (post-layer gated cross-attention + ReZero) |
| proprioceptive-channels.pdf | the paper |
This is the channel adapter only — not Gemma's weights. It wraps a frozen google/gemma-4-E2B-it
(35 text layers, hidden 1536); load Gemma from Google, then wrap each layer with ChanneledLayer and load
these weights. With ReZero α at its trained values the channels drive behavior; with α=0 the model is
bit-exact to vanilla Gemma. See modeling_channels.py and the paper for the wiring and the six channel
dimensions (memory 1024 · affect 2 · time 16 · ethics 24 · identity 1024 · continuity 1024).