license: gpl-3.0
language:
  - en
tags:
  - proprioception
  - cognitive-control
  - conditioning
  - interpretability
  - gemma
  - adapter
  - cross-attention
  - rezero
pipeline_tag: text-generation
base_model: google/gemma-4-E2B-it
base_model_relation: adapter

tinyMARS — Proprioceptive Channels

A second, perpendicular input to a language model: six cognitive self-state channels that the model learns to obey — even against the text prompt.

Research from Celiums Research Labs (a division of Celiums Solutions, LLC).

Paper: Proprioceptive Channels: Cognitive Self-State as a Perpendicular Control Axis in Language Models · PDF · DOI 10.5281/zenodo.20531347 · Code (GitHub)

TL;DR

A decoder-only LM is normally a single-channel structure: text in, text out. We add a perpendicular input — six cognitive self-state channels (memory, affect, time, ethics, identity, continuity) injected at every layer via per-channel gated cross-attention with ReZero — and call it proprioception, by analogy to the body's sense of its own configuration.

The load-bearing result (measured, judge-free): under direct conflict, where the channel asserts one state and the text prompt asserts the opposite, generation follows the channel in 264 of 265 cases (98–100%). A single-channel (text-only) model cannot exhibit this, because it has no second input that could disagree with the prompt. The channels are also causal (six coexist in one model with no interference, 6/6), and the adapted model is bit-exact to the base at initialization (ReZero α=0 ⇒ zero delta).
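As a rough illustration of how a conflict score of this kind can be tallied, the sketch below counts the conflict trials in which the generated state matches the channel rather than the prompt. The `Trial` record, its field names, and the `calm`/`angry` labels are assumptions made for this example, not the evaluation suite's actual schema; how the generated state is detected in the text is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    # Hypothetical record; field names are illustrative only.
    channel_state: str    # state asserted through the channel
    prompt_state: str     # state asserted by the text prompt
    generated_state: str  # state detected in the generated text


def conflict_win_rate(trials):
    """Fraction of conflict trials where generation follows the channel."""
    # Only trials where channel and prompt disagree count as conflicts.
    conflicts = [t for t in trials if t.channel_state != t.prompt_state]
    if not conflicts:
        raise ValueError("no conflict trials to score")
    wins = sum(t.generated_state == t.channel_state for t in conflicts)
    return wins / len(conflicts)


# Placeholder data reproducing the arithmetic of the headline count:
# 264 channel wins out of 265 conflict trials.
trials = [Trial("calm", "angry", "calm")] * 264 + [Trial("calm", "angry", "angry")]
rate = conflict_win_rate(trials)
print(f"{rate:.4f}")  # 0.9962
```

Trials where channel and prompt agree are excluded from the denominator, since they cannot distinguish which input the model followed.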

Two experiments

1. Adapter on a frozen base. A ~186M-parameter channel adapter on a frozen Gemma 4 E2B-it. Frozen base ⇒ this is the channels-over-Gemma result; identity/attribution stays with Google's Gemma + the Celiums channel adapter.
2. Native from scratch. A 110M-parameter decoder trained from random init with channels present from layer 1; the perpendicular force reproduces from scratch (conflict-win 0.888 on held-out, chance 0.25), with a clean attributed relief valve. Honest scope: a toy-scale property, not a product-scale claim.

How it works

hidden  ──► gated cross-attention (per channel) ──► Σ αᵢ · ctxᵢ ──► + residual
channels ──► [memory 1024 · affect 2 · time 16 · ethics 24 · identity 1024 · continuity 1024]
α (ReZero gates) init 0  ⇒  delta = 0  ⇒  bit-exact passthrough until trained

The adapter trains while the base stays frozen; only the cross-attention projections and the ReZero gates move. alpha_l2 (the L2 norm of the gates) growing from 0 is the signal that the model is using the channels.
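The update rule in the diagram can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the code in `modeling_channels.py`: it uses a single attention head, omits the learned projections (channel slots are assumed to already live in the hidden dimension), and all function names except `alpha_l2` are invented for the example.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(hidden, slots):
    """One hidden vector attends over one channel's slots (no projections)."""
    scale = math.sqrt(len(hidden))
    scores = [sum(h * s for h, s in zip(hidden, slot)) / scale for slot in slots]
    weights = softmax(scores)
    return [sum(w * slot[d] for w, slot in zip(weights, slots))
            for d in range(len(hidden))]


def channeled_output(hidden, channels, alphas):
    """hidden + sum_i alpha_i * ctx_i, with one ReZero gate per channel."""
    delta = [0.0] * len(hidden)
    for name, slots in channels.items():
        ctx = cross_attend(hidden, slots)
        delta = [d + alphas[name] * c for d, c in zip(delta, ctx)]
    return [h + d for h, d in zip(hidden, delta)]


def alpha_l2(alphas):
    """L2 norm of the gates; 0 means the channels are not used at all."""
    return math.sqrt(sum(a * a for a in alphas.values()))


hidden = [0.3, -1.2, 0.7, 0.05]
channels = {  # toy slots, already in the hidden dimension (assumption)
    "affect": [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
    "time": [[0.5, 0.5, 0.5, 0.5]],
}
zero_gates = {name: 0.0 for name in channels}

# With every gate at 0 the delta vanishes and the output equals the input exactly.
assert channeled_output(hidden, channels, zero_gates) == hidden
assert alpha_l2(zero_gates) == 0.0
```

Because each context vector is multiplied by its gate before it reaches the residual sum, a zero gate removes that channel's contribution entirely, which is why the gate norm is a direct readout of channel usage.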

Use / reproduce

The adapter, training, and evaluation code (with the channel-causal eval suite — counterfactual, judge-free) are in the GitHub repository. The native checkpoints and the corpus generators are described there. This page is the research companion; see the paper for the full method and the honest negatives.

Citation

@misc{gutierrez2026proprioceptive,
  title         = {Proprioceptive Channels: Cognitive Self-State as a Perpendicular Control Axis in Language Models},
  author        = {Gutierrez, Mario},
  year          = {2026},
  publisher     = {Celiums Research Labs},
  doi           = {10.5281/zenodo.20531347},
  url           = {https://github.com/terrizoaguimor/tinymars}
}

License

Code: GPL-3.0. Paper & docs: CC-BY-SA-4.0. The frozen base model (Gemma 4) is subject to Google's Gemma terms; this work distributes the channel adapter and method, not Gemma's weights.

What's in this repo

| file | what |
| --- | --- |
| `adapter_model.safetensors` | the trained channel adapter, 185.8M params, bf16 (the integrated 6/6 checkpoint, step 10000) |
| `adapter_config.json` | dims, channel sizes, K-per-channel, base model |
| `modeling_channels.py` | the `ChannelInjectionDelta` + `ChanneledLayer` modules (post-layer gated cross-attention + ReZero) |
| `proprioceptive-channels.pdf` | the paper |

This is the channel adapter only — not Gemma's weights. It wraps a frozen google/gemma-4-E2B-it (35 text layers, hidden 1536); load Gemma from Google, then wrap each layer with ChanneledLayer and load these weights. With ReZero α at its trained values the channels drive behavior; with α=0 the model is bit-exact to vanilla Gemma. See modeling_channels.py and the paper for the wiring and the six channel dimensions (memory 1024 · affect 2 · time 16 · ethics 24 · identity 1024 · continuity 1024).
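The wrapping pattern described here can be pictured with a toy stand-in. The sketch below is not the repository's `ChanneledLayer` and does not load Gemma: `ToyChanneledLayer`, `wrap_stack`, and the lambda "layers" are hypothetical, and the channel context is passed in as a precomputed vector instead of being produced by cross-attention. It only shows the shape of the idea: each base layer is called unchanged, and a gated term is added after it.

```python
class ToyChanneledLayer:
    """Stand-in for post-layer injection; NOT the repo's ChanneledLayer."""

    def __init__(self, base_layer, alpha=0.0):
        self.base_layer = base_layer  # frozen: called, never modified
        self.alpha = alpha            # ReZero gate, starts at 0

    def __call__(self, hidden, channel_ctx):
        out = self.base_layer(hidden)
        # Gated channel context added after the base layer's own output.
        return [o + self.alpha * c for o, c in zip(out, channel_ctx)]


def wrap_stack(layers):
    return [ToyChanneledLayer(layer) for layer in layers]


def run_vanilla(layers, hidden):
    for layer in layers:
        hidden = layer(hidden)
    return hidden


def run_channeled(wrapped, hidden, channel_ctx):
    for layer in wrapped:
        hidden = layer(hidden, channel_ctx)
    return hidden


# 35 toy layers, mirroring the 35 text layers mentioned above.
base_layers = [(lambda h: [0.5 * x + 1.0 for x in h]) for _ in range(35)]
wrapped = wrap_stack(base_layers)

h0 = [0.25, -1.0, 3.0]
ctx = [1.0, 2.0, -0.5]  # placeholder channel context

# Gates at 0: the wrapped stack matches the unwrapped stack exactly.
assert run_channeled(wrapped, h0, ctx) == run_vanilla(base_layers, h0)

# Nonzero gates: the channel context now changes the output.
for layer in wrapped:
    layer.alpha = 0.1
assert run_channeled(wrapped, h0, ctx) != run_vanilla(base_layers, h0)
```

For the real constructor arguments, weight names, and loading order, `modeling_channels.py` and `adapter_config.json` remain the reference.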