	---
	license: gpl-3.0
	language:
	- en
	tags:
	- proprioception
	- cognitive-control
	- conditioning
	- interpretability
	- gemma
	- adapter
	- cross-attention
	- rezero
	pipeline_tag: text-generation
	base_model: google/gemma-4-E2B-it
	base_model_relation: adapter
	---

	# tinyMARS — Proprioceptive Channels

	**A second, perpendicular input to a language model: six cognitive self-state channels that the model
	learns to obey — even against the text prompt.**

	Research from [Celiums Research Labs](https://celiums.ai) (a division of Celiums Solutions, LLC).

	> Paper: *Proprioceptive Channels: Cognitive Self-State as a Perpendicular Control Axis in Language
	> Models* · [PDF](https://celiums.ai/papers/tinymars-proprioceptive-channels.pdf) ·
	> [DOI 10.5281/zenodo.20531347](https://doi.org/10.5281/zenodo.20531347) ·
	> [Code (GitHub)](https://github.com/terrizoaguimor/tinymars)

	## TL;DR

	A decoder-only LM is normally a single-channel structure: text in, text out. We add a perpendicular
	input — six cognitive self-state channels (memory, affect, time, ethics, identity, continuity) injected
	at every layer via per-channel gated cross-attention with ReZero — and call it proprioception, by
	analogy to the body's sense of its own configuration.

	The load-bearing result (measured, judge-free): under direct conflict — where the channel asserts one
	state and the text prompt asserts the opposite — generation follows the channel 264/265 times (98–100%).
	A single-channel (text-only) model cannot exhibit this. The channels are also causal (six coexist in one
	model with no interference, 6/6) and bit-exact to the base at initialization (ReZero α=0 ⇒ zero delta).
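
The headline conflict rate is simple to recompute. The sketch below shows one way such a judge-free tally could be scored, assuming each trial records the state asserted through the channel, the opposite state asserted in the prompt, and the state detected in the generated text by some deterministic rule. The `Trial` record, the `conflict_win_rate` function, and the `calm`/`angry` labels are hypothetical and are not taken from the tinyMARS code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    """One conflict trial (hypothetical record layout, not from the repo)."""
    channel_state: str   # state asserted through the channel
    prompt_state: str    # opposite state asserted in the text prompt
    output_state: str    # state detected in the generated text


def conflict_win_rate(trials):
    """Fraction of genuine-conflict trials whose output follows the channel."""
    conflicts = [t for t in trials if t.channel_state != t.prompt_state]
    if not conflicts:
        raise ValueError("no conflict trials to score")
    wins = sum(t.output_state == t.channel_state for t in conflicts)
    return wins / len(conflicts)


# Synthetic records that mirror the reported counts:
# 264 channel-following outputs and 1 prompt-following output.
trials = [Trial("calm", "angry", "calm")] * 264 + [Trial("calm", "angry", "angry")]
rate = conflict_win_rate(trials)
print(f"{rate:.4f}")  # prints 0.9962, i.e. 264 / 265
```

Trials where channel and prompt agree are filtered out before scoring, since they cannot distinguish channel-following from prompt-following.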

	## Two experiments

	1. Adapter on a frozen base. A ~186M-parameter channel adapter on a frozen Gemma 4 E2B-it. Because the
	base is frozen, this is the channels-over-Gemma result: attribution stays with Google's Gemma plus the
	Celiums channel adapter.
	2. Native from scratch. A 110M-parameter decoder trained from random init with channels present from
	layer 1; the perpendicular force reproduces from scratch (conflict-win 0.888 on held-out, chance 0.25),
	with a clean attributed relief valve. Honest scope: a toy-scale property, not a product-scale claim.

	## How it works

	```
	hidden ──► gated cross-attention (per channel) ──► Σ αᵢ · ctxᵢ ──► + residual
	channels ──► [memory 1024 · affect 2 · time 16 · ethics 24 · identity 1024 · continuity 1024]
	α (ReZero gates) init 0 ⇒ delta = 0 ⇒ bit-exact passthrough until trained
	```
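
As a minimal sketch of the diagram above, the NumPy code below runs single-head cross-attention once per channel, scales each result by a scalar ReZero gate, and checks that zero gates give an exact passthrough. Several details are assumptions made for illustration: the toy hidden size, the single head, one scalar gate per channel, random projection weights, and each channel presented as three vectors. The function names are hypothetical; the actual modules are `ChannelInjectionDelta` and `ChanneledLayer` in `modeling_channels.py`, and their wiring may differ.

```python
import numpy as np

# Channel widths as listed in the diagram; everything else is a toy assumption.
CHANNEL_DIMS = {"memory": 1024, "affect": 2, "time": 16,
                "ethics": 24, "identity": 1024, "continuity": 1024}
HIDDEN = 64  # toy hidden size, chosen small so the sketch runs instantly


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(rng):
    """Random Q/K/V/O projections per channel, with every ReZero gate at 0."""
    params = {}
    for name, dim in CHANNEL_DIMS.items():
        params[name] = {
            "wq": rng.normal(0, 0.02, (HIDDEN, HIDDEN)),
            "wk": rng.normal(0, 0.02, (dim, HIDDEN)),
            "wv": rng.normal(0, 0.02, (dim, HIDDEN)),
            "wo": rng.normal(0, 0.02, (HIDDEN, HIDDEN)),
            "alpha": 0.0,  # ReZero gate: starts at exactly zero
        }
    return params


def channel_delta(hidden, channels, params):
    """Sum over channels of alpha_i * cross_attention(hidden, channel_i)."""
    delta = np.zeros_like(hidden)
    for name, p in params.items():
        q = hidden @ p["wq"]                    # (tokens, HIDDEN)
        k = channels[name] @ p["wk"]            # (slots, HIDDEN)
        v = channels[name] @ p["wv"]            # (slots, HIDDEN)
        attn = softmax(q @ k.T / np.sqrt(HIDDEN))
        ctx = (attn @ v) @ p["wo"]              # (tokens, HIDDEN)
        delta += p["alpha"] * ctx
    return delta


rng = np.random.default_rng(0)
params = init_params(rng)
hidden = rng.normal(size=(5, HIDDEN))           # 5 token positions
channels = {n: rng.normal(size=(3, d)) for n, d in CHANNEL_DIMS.items()}

# With every gate at 0 the delta is exactly zero, so the residual sum
# hidden + delta reproduces the incoming hidden states bit for bit.
out_init = hidden + channel_delta(hidden, channels, params)
print(np.array_equal(out_init, hidden))         # True

# Opening a single gate lets that channel change the output.
params["affect"]["alpha"] = 0.5
out_open = hidden + channel_delta(hidden, channels, params)
print(np.array_equal(out_open, hidden))         # False
```

The exact-equality check works because multiplying a finite context vector by a gate of 0 yields zeros, and adding zeros leaves the residual stream unchanged.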

	The adapter trains while the base stays frozen; only the cross-attention projections and the ReZero gates
	move. `alpha_l2` (the L2 norm of the gates) growing from 0 is the signal that the model is using the
	channels.
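
The gate norm described above is cheap to log during training. A minimal sketch, assuming the gates can be collected into a flat list of scalars (the actual layout in the adapter may differ) and using a hypothetical helper that borrows the `alpha_l2` name:

```python
import math


def alpha_l2(gates):
    """L2 norm of all ReZero gate values: sqrt(sum of alpha squared)."""
    return math.sqrt(sum(a * a for a in gates))


# At initialization every gate is 0, so the norm is exactly 0.
init_gates = [0.0] * 6
print(alpha_l2(init_gates))     # 0.0

# Made-up gate values after some training; the sign of a gate does not matter.
trained_gates = [0.5, -0.5, 0.5, -0.5, 0.0, 0.0]
print(alpha_l2(trained_gates))  # 1.0
```

A norm that stays at 0 would mean the gates never opened, so the adapter output would still equal the frozen base.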

	## Use / reproduce

	The adapter, training, and evaluation code (with the channel-causal eval suite — counterfactual,
	judge-free) are in the [GitHub repository](https://github.com/terrizoaguimor/tinymars). The native
	checkpoints and the corpus generators are described there. This page is the research companion; see the
	paper for the full method and the honest negatives.

	## Citation

	```bibtex
	@misc{gutierrez2026proprioceptive,
	  title     = {Proprioceptive Channels: Cognitive Self-State as a Perpendicular Control Axis in Language Models},
	  author    = {Gutierrez, Mario},
	  year      = {2026},
	  publisher = {Celiums Research Labs},
	  doi       = {10.5281/zenodo.20531347},
	  url       = {https://github.com/terrizoaguimor/tinymars}
	}
	```

	## License

	Code: GPL-3.0. Paper & docs: CC-BY-SA-4.0. The frozen base model (Gemma 4) is subject to Google's
	Gemma terms; this work distributes the channel adapter and method, not Gemma's weights.

	## What's in this repo

	| File | Contents |
	|---|---|
	| `adapter_model.safetensors` | the trained channel adapter: 185.8M params, bf16 (the integrated 6/6 checkpoint, step 10000) |
	| `adapter_config.json` | dims, channel sizes, K-per-channel, base model |
	| `modeling_channels.py` | the `ChannelInjectionDelta` + `ChanneledLayer` modules (post-layer gated cross-attention + ReZero) |
	| `proprioceptive-channels.pdf` | the paper |

	This is the channel adapter only, not Gemma's weights. It wraps a frozen `google/gemma-4-E2B-it`
	(35 text layers, hidden size 1536): load Gemma from Google, wrap each layer with `ChanneledLayer`, then load
	these weights. With ReZero α at its trained values the channels drive behavior; with α=0 the model is
	bit-exact to vanilla Gemma. See `modeling_channels.py` and the paper for the wiring and the six channel
	dimensions (memory 1024 · affect 2 · time 16 · ethics 24 · identity 1024 · continuity 1024).
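
When preparing inputs for the adapter, a shape check against the six documented widths can catch wiring mistakes early. The sketch below is only an assumption about how such a check could look: `CHANNEL_DIMS` copies the widths listed above, while `check_channel_state` and the list-of-vectors layout are hypothetical and not taken from `modeling_channels.py`.

```python
CHANNEL_DIMS = {
    "memory": 1024, "affect": 2, "time": 16,
    "ethics": 24, "identity": 1024, "continuity": 1024,
}


def check_channel_state(state):
    """Validate that `state` maps each channel name to vectors of the right width.

    `state[name]` is assumed to be a non-empty list of vectors (lists of floats).
    Raises ValueError describing the first problem found.
    """
    missing = set(CHANNEL_DIMS) - set(state)
    extra = set(state) - set(CHANNEL_DIMS)
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} unexpected={sorted(extra)}")
    for name, width in CHANNEL_DIMS.items():
        vectors = state[name]
        if not vectors:
            raise ValueError(f"channel {name!r} has no vectors")
        for i, vec in enumerate(vectors):
            if len(vec) != width:
                raise ValueError(
                    f"channel {name!r} vector {i}: width {len(vec)}, expected {width}"
                )


# A state with one zero vector per channel passes the check.
state = {name: [[0.0] * width] for name, width in CHANNEL_DIMS.items()}
check_channel_state(state)

# A mis-sized affect vector is rejected.
state["affect"] = [[0.0, 0.0, 0.0]]
try:
    check_channel_state(state)
except ValueError as err:
    print(err)  # channel 'affect' vector 0: width 3, expected 2
```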