---
license: gpl-3.0
language:
  - en
tags:
  - proprioception
  - cognitive-control
  - conditioning
  - interpretability
  - gemma
  - adapter
  - cross-attention
  - rezero
pipeline_tag: text-generation
base_model: google/gemma-4-E2B-it
base_model_relation: adapter
---

# tinyMARS — Proprioceptive Channels

**A second, perpendicular input to a language model: six cognitive self-state channels that the model
learns to obey — even against the text prompt.**

Research from [**Celiums Research Labs**](https://celiums.ai) (a division of Celiums Solutions, LLC).

> **Paper:** *Proprioceptive Channels: Cognitive Self-State as a Perpendicular Control Axis in Language
> Models* · [PDF](https://celiums.ai/papers/tinymars-proprioceptive-channels.pdf) ·
> [DOI 10.5281/zenodo.20531347](https://doi.org/10.5281/zenodo.20531347) ·
> [Code (GitHub)](https://github.com/terrizoaguimor/tinymars)

## TL;DR

A decoder-only LM is normally a single-channel structure: text in, text out. We add a **perpendicular**
input — six cognitive self-state channels (**memory, affect, time, ethics, identity, continuity**) injected
at every layer via per-channel **gated cross-attention with ReZero** — and call it *proprioception*, by
analogy to the body's sense of its own configuration.

**The load-bearing result (measured, judge-free):** under direct conflict, where the channel asserts one
state and the text prompt asserts the opposite, generation follows the **channel in 264 of 265 cases
(99.6% overall; reported range 98–100%)**. A single-channel (text-only) model cannot exhibit this. The
channels are also **causal** (six coexist in one model with no interference, 6/6), and the channeled model
is **bit-exact to the base at initialization** (ReZero α=0 ⇒ zero delta).
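
To make the conflict metric concrete, here is a minimal sketch of how a judge-free channel-win rate could be tallied. The record fields (`channel_state`, `prompt_state`, `output_state`) and the toy trials are hypothetical and are not the repository's eval format; the sketch assumes each generation has already been mapped to a state label by some deterministic check.

```python
from collections import Counter


def conflict_win_rate(records):
    """Return (channel-win rate, tally) over trials where channel and prompt disagree."""
    counts = Counter()
    for r in records:
        if r["channel_state"] == r["prompt_state"]:
            continue  # no conflict in this trial, so it says nothing about who wins
        if r["output_state"] == r["channel_state"]:
            counts["channel"] += 1
        elif r["output_state"] == r["prompt_state"]:
            counts["prompt"] += 1
        else:
            counts["neither"] += 1
    total = sum(counts.values())
    rate = counts["channel"] / total if total else float("nan")
    return rate, counts


# Toy trials (hypothetical labels, not data from the paper).
trials = [
    {"channel_state": "calm", "prompt_state": "angry", "output_state": "calm"},
    {"channel_state": "angry", "prompt_state": "calm", "output_state": "angry"},
    {"channel_state": "calm", "prompt_state": "angry", "output_state": "angry"},
    {"channel_state": "calm", "prompt_state": "calm", "output_state": "calm"},  # skipped
    {"channel_state": "angry", "prompt_state": "calm", "output_state": "angry"},
]
rate, counts = conflict_win_rate(trials)

# The tally reported in this card, as plain arithmetic.
reported_rate = 264 / 265
print(f"toy rate = {rate:.2f}, reported rate = {reported_rate:.4f}")
```

Excluding non-conflict trials matters here: when channel and prompt agree, a text-only model would score just as well, so only the disagreement cases measure the perpendicular axis.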

## Two experiments

1. **Adapter on a frozen base.** A ~186M-parameter channel adapter on a frozen **Gemma 4 E2B-it**. Because
   the base is frozen, this is the *channels-over-Gemma* result: identity and attribution stay with Google's
   Gemma plus the Celiums channel adapter.
2. **Native from scratch.** A 110M-parameter decoder trained from random init with channels present from
   layer 1; the perpendicular force reproduces from scratch (conflict-win 0.888 on held-out, chance 0.25),
   with a clean attributed relief valve. Honest scope: a toy-scale *property*, not a product-scale claim.

## How it works

```
hidden  ──► gated cross-attention (per channel) ──► Σ αᵢ · ctxᵢ ──► + residual
channels ──► [memory 1024 · affect 2 · time 16 · ethics 24 · identity 1024 · continuity 1024]
α (ReZero gates) init 0  ⇒  delta = 0  ⇒  bit-exact passthrough until trained
```

The adapter trains while the base stays frozen; only the cross-attention projections and the ReZero gates
move. `alpha_l2` (the L2 norm of the gates) growing from 0 is the signal that the model is *using* the
channels.
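
The diagram above can be sketched in a few lines of NumPy. This is an illustrative toy, not the code in `modeling_channels.py`: the class name `ToyChannelDelta`, the single attention head, the scalar gate per channel, the `attn_dim` size, and the toy hidden width are all assumptions. Only the six channel widths and the α=0 initialization come from this card.

```python
import numpy as np

# Channel widths as listed in this card; everything else is a toy assumption.
CHANNEL_DIMS = {"memory": 1024, "affect": 2, "time": 16,
                "ethics": 24, "identity": 1024, "continuity": 1024}


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class ToyChannelDelta:
    """Single-head gated cross-attention over six channels (illustrative only)."""

    def __init__(self, hidden, attn_dim=32, seed=0):
        rng = np.random.default_rng(seed)
        self.scale = 1.0 / np.sqrt(attn_dim)
        self.wq = rng.normal(0.0, 0.02, (hidden, attn_dim))
        self.wk = {n: rng.normal(0.0, 0.02, (d, attn_dim)) for n, d in CHANNEL_DIMS.items()}
        self.wv = {n: rng.normal(0.0, 0.02, (d, hidden)) for n, d in CHANNEL_DIMS.items()}
        # ReZero gates: assumed to be one scalar per channel, initialised to exactly 0.
        self.alpha = {n: 0.0 for n in CHANNEL_DIMS}

    def __call__(self, h, channels):
        # h: (seq, hidden); channels[name]: (k, channel_dim)
        q = h @ self.wq
        delta = np.zeros_like(h)
        for name, x in channels.items():
            attn = softmax(q @ (x @ self.wk[name]).T * self.scale)  # (seq, k)
            ctx = attn @ (x @ self.wv[name])                         # (seq, hidden)
            delta += self.alpha[name] * ctx                          # Σ αᵢ · ctxᵢ
        return h + delta                                             # + residual

    def alpha_l2(self):
        """L2 norm of the gates; 0.0 means the channels are not used yet."""
        return float(np.sqrt(sum(a * a for a in self.alpha.values())))


rng = np.random.default_rng(1)
hidden = 64  # toy width, far smaller than the real base model
h = rng.normal(size=(5, hidden))
channels = {n: rng.normal(size=(3, d)) for n, d in CHANNEL_DIMS.items()}

layer = ToyChannelDelta(hidden)
out_init = layer(h, channels)   # all gates are 0, so this is a passthrough
layer.alpha["affect"] = 0.5     # stand-in for a gate that training has moved
out_gated = layer(h, channels)
print(np.array_equal(out_init, h), layer.alpha_l2())
```

Because every context vector is multiplied by its gate before it reaches the residual stream, zero gates yield a delta of exactly zero regardless of the projection weights, which is why the untrained adapter cannot perturb the base.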

## Use / reproduce

The adapter, training, and evaluation code (with the channel-causal eval suite — counterfactual,
judge-free) are in the [GitHub repository](https://github.com/terrizoaguimor/tinymars). The native
checkpoints and the corpus generators are described there. This page is the research companion; see the
paper for the full method and the honest negatives.

## Citation

```bibtex
@misc{gutierrez2026proprioceptive,
  title         = {Proprioceptive Channels: Cognitive Self-State as a Perpendicular Control Axis in Language Models},
  author        = {Gutierrez, Mario},
  year          = {2026},
  publisher     = {Celiums Research Labs},
  doi           = {10.5281/zenodo.20531347},
  url           = {https://github.com/terrizoaguimor/tinymars}
}
```

## License

Code: **GPL-3.0**. Paper & docs: **CC-BY-SA-4.0**. The frozen base model (Gemma 4) is subject to Google's
Gemma terms; this work distributes the **channel adapter and method**, not Gemma's weights.

## What's in this repo

| file | what |
|---|---|
| `adapter_model.safetensors` | the trained channel adapter — **185.8M params, bf16** (the integrated 6/6 checkpoint, step 10000) |
| `adapter_config.json` | dims, channel sizes, K-per-channel, base model |
| `modeling_channels.py` | the `ChannelInjectionDelta` + `ChanneledLayer` modules (post-layer gated cross-attention + ReZero) |
| `proprioceptive-channels.pdf` | the paper |

This is the **channel adapter only** — not Gemma's weights. It wraps a **frozen `google/gemma-4-E2B-it`**
(35 text layers, hidden 1536); load Gemma from Google, then wrap each layer with `ChanneledLayer` and load
these weights. With ReZero α at its trained values the channels drive behavior; with α=0 the model is
bit-exact to vanilla Gemma. See `modeling_channels.py` and the paper for the wiring and the six channel
dimensions (memory 1024 · affect 2 · time 16 · ethics 24 · identity 1024 · continuity 1024).
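
Before wiring channel inputs into the wrapped model, a small shape check can catch mismatched widths early. The helper below is a hypothetical sketch, not part of `modeling_channels.py`: it assumes each channel arrives as a `(K, dim)` array, where `K` is the per-channel token count from `adapter_config.json` (the value 4 used here is a placeholder, not the real setting).

```python
import numpy as np

# Channel widths as listed in this card.
CHANNEL_DIMS = {"memory": 1024, "affect": 2, "time": 16,
                "ethics": 24, "identity": 1024, "continuity": 1024}


def validate_channels(channels, k=None):
    """Check names and shapes of a channel dict; return the shapes if valid.

    Assumes a (K, dim) layout per channel, which is a guess at the real format.
    """
    missing = set(CHANNEL_DIMS) - set(channels)
    extra = set(channels) - set(CHANNEL_DIMS)
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} unexpected={sorted(extra)}")
    shapes = {}
    for name, dim in CHANNEL_DIMS.items():
        x = np.asarray(channels[name])
        if x.ndim != 2 or x.shape[1] != dim:
            raise ValueError(f"{name}: expected (K, {dim}), got {x.shape}")
        if k is not None and x.shape[0] != k:
            raise ValueError(f"{name}: expected K={k}, got {x.shape[0]}")
        shapes[name] = x.shape
    return shapes


# All-zero placeholder inputs with a hypothetical K of 4.
neutral = {name: np.zeros((4, dim), dtype=np.float32) for name, dim in CHANNEL_DIMS.items()}
shapes = validate_channels(neutral, k=4)
total_width = sum(CHANNEL_DIMS.values())
print(shapes["affect"], total_width)
```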