Spaces:

anotheruserishere
/

Cartogemma

Running on Zero

App Files Files Community

Cartogemma / README.md

anotheruserishere

Upload folder using huggingface_hub

3c92965 verified 4 days ago

preview code

raw

history blame contribute delete

1.95 kB

	---
	title: Cartogemma
	emoji: 🗺️
	colorFrom: indigo
	colorTo: green
	sdk: gradio
	sdk_version: 5.9.1
	app_file: app.py
	python_version: 3.11
	pinned: false
	license: apache-2.0
	short_description: Mechanistic probe on Gemma-3-1B-IT
	---

	# Cartogemma

	A faithful Gradio port of `cartographer3.py` / `cartographer_tui.py`.
	Default: `google/gemma-3-270m-it` (tiny — full head×layer scans are
	near-instant). The architecture is auto-discovered, so the Model ID box
	also accepts other decoder LMs: `google/gemma-3-1b-it`, multimodal
	`google/gemma-3-4b-it`, the Gemma-4 family (`google/gemma-4-E2B-it`, …),
	`Qwen/Qwen3-0.6B`, Llama, etc.

	Four panes:

	- Context — running token tail.
	- Head Map — for each layer: per-head pre-projection · logit-lens xray ·
	per-head Δ-residual · full-layer Δ-residual (`L_full`).
	- Branches — top-k next-token continuations (deterministic rollouts).
	- Token Rank Trace — rank of a chosen token across (head × layer).

	REPL-style command bar: `1-N`, `i`, `h * \| h L \| h L H`, `top`, `rew`, `spark`,
	`w`, `l`, `mute`/`unmute`, `muted`, `r`, `s`.

	## Two tiers of capability

	- Tier 1 (any HF decoder LM): logit-lens, per-layer Δ-residual, branches,
	rank-pick, inject, rewind. Needs only `output_hidden_states` + a final norm +
	an lm_head (or tied embeddings).
	- Tier 2 (standard MHA/GQA attention): per-head pre-projection, per-head
	Δ-residual, head muting, token rank trace. Needs an attention `o_proj` whose
	input decomposes as `num_heads × head_dim`. When a model doesn't satisfy this
	(fused QKV, MLA, exotic attention), the UI degrades to Tier 1 rather than
	crashing.

	## Setup

	Most Gemma checkpoints are gated. Set a Space secret named `HF_TOKEN` with a
	token that has accepted the relevant model licenses.

	Tiny models (270m / E2B) are comfortable on CPU; ZeroGPU / a GPU is much
	snappier, and required in practice for 4B+ and the Gemma-4 family.