metadata
title: Cartogemma
emoji: 🗺️
colorFrom: indigo
colorTo: green
sdk: gradio
sdk_version: 5.9.1
app_file: app.py
python_version: 3.11
pinned: false
license: apache-2.0
short_description: Mechanistic probe on Gemma-3-1B-IT

Cartogemma

A faithful Gradio port of cartographer3.py / cartographer_tui.py. The default model is google/gemma-3-270m-it, which is small enough that full head×layer scans are near-instant. The architecture is auto-discovered, so the Model ID box also accepts other decoder LMs: google/gemma-3-1b-it, the multimodal google/gemma-3-4b-it, the Gemma-4 family (google/gemma-4-E2B-it, …), Qwen/Qwen3-0.6B, Llama, etc.

Four panes:

  • Context — running token tail.
  • Head Map — for each layer: per-head pre-projection · logit-lens xray · per-head Δ-residual · full-layer Δ-residual (L_full).
  • Branches — top-k next-token continuations (deterministic rollouts).
  • Token Rank Trace — rank of a chosen token across (head × layer).
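
The rank shown in the Token Rank Trace pane can be read as the 1-based position of the chosen token when the vocabulary is sorted by logit. The sketch below illustrates that idea on plain Python lists. The helper names `token_rank` and `rank_trace` are hypothetical, the tie-breaking rule is an assumption, and the app itself presumably works on tensors rather than lists.

```python
def token_rank(logits, token_id):
    """Return the 1-based rank of token_id (1 = highest logit).

    Assumption: ties are broken optimistically, so tokens with an
    equal logit do not push the rank down.
    """
    target = logits[token_id]
    return 1 + sum(1 for value in logits if value > target)


def rank_trace(logits_per_site, token_id):
    """Rank of one token at each probe site (one logit vector per site)."""
    return [token_rank(logits, token_id) for logits in logits_per_site]


# Toy vocabulary of 4 tokens, probed at 3 hypothetical layers.
layers = [
    [0.1, 0.2, 0.5, 0.9],
    [0.3, 0.8, 0.1, 0.9],
    [0.2, 2.0, 0.1, 0.4],
]
trace = rank_trace(layers, token_id=1)
print(trace)  # [3, 2, 1]: the token climbs to rank 1 by the last layer
```

A trace that falls toward 1 across sites suggests where the model settles on the token; the same function applies whether a site is a layer or a single head.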

REPL-style command bar: 1-N, i, h * | h L | h L H, top, rew, spark, w, l, mute/unmute, muted, r, s.

Two tiers of capability

  • Tier 1 (any HF decoder LM): logit-lens, per-layer Δ-residual, branches, rank-pick, inject, rewind. Needs only output_hidden_states + a final norm + an lm_head (or tied embeddings).
  • Tier 2 (standard MHA/GQA attention): per-head pre-projection, per-head Δ-residual, head muting, token rank trace. Needs an attention o_proj whose input decomposes as num_heads × head_dim. When a model doesn't satisfy this (fused QKV, MLA, exotic attention), the UI degrades to Tier 1 rather than crashing.
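
The Tier 2 requirement can be illustrated with a small sketch. When the o_proj input is a concatenation of num_heads blocks of size head_dim, the projection splits into one additive residual-stream contribution per head, which is what makes per-head views and head muting possible. The function names `capability_tier` and `per_head_contributions` are hypothetical, the check is a simplification of whatever auto-discovery the app performs, and the sketch assumes an o_proj without bias.

```python
def capability_tier(o_proj_in_features, num_heads, head_dim):
    """Return 2 when the o_proj input splits cleanly into heads, else 1."""
    if o_proj_in_features is None:  # no o_proj found at all
        return 1
    return 2 if o_proj_in_features == num_heads * head_dim else 1


def per_head_contributions(concat_heads, o_proj_weight, num_heads, head_dim):
    """Split o_proj(concat_heads) into one output vector per head.

    concat_heads:  list of length num_heads * head_dim (pre-projection).
    o_proj_weight: d_model rows, each of length num_heads * head_dim.
    Assumption: o_proj has no bias term.
    """
    contributions = []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        contributions.append([
            sum(row[j] * concat_heads[j] for j in range(lo, hi))
            for row in o_proj_weight
        ])
    return contributions


# Toy example: 2 heads, head_dim 2, d_model 3.
concat = [1.0, 2.0, 3.0, 4.0]
weight = [
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
]
heads = per_head_contributions(concat, weight, num_heads=2, head_dim=2)
full = [sum(w * x for w, x in zip(row, concat)) for row in weight]
summed = [sum(parts) for parts in zip(*heads)]
```

Here `summed` equals `full`, so zeroing one head's block before the projection removes exactly that head's contribution. A model whose attention output does not have this shape would fall back to tier 1 in this sketch.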

Setup

Most Gemma checkpoints are gated. Set a Space secret named HF_TOKEN to an access token from an account that has accepted the relevant model licenses.
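
Space secrets are exposed to the app as environment variables, so the token can be read from HF_TOKEN at load time. The following is a minimal sketch of one way to forward it, not necessarily how app.py does it. The helper name `hub_auth_kwargs` is hypothetical; `token` is the keyword accepted by `from_pretrained` in recent transformers releases.

```python
import os


def hub_auth_kwargs(env=None):
    """Build keyword arguments for from_pretrained from the HF_TOKEN secret.

    Returns an empty dict when the secret is missing or blank, so public
    models still load without authentication.
    """
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    return {"token": token} if token else {}


# Usage (needs transformers and network access, so shown as a comment):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "google/gemma-3-270m-it", **hub_auth_kwargs()
# )
```

huggingface_hub also picks up HF_TOKEN from the environment on its own, so passing the token explicitly is optional; doing so mainly makes the dependency on the secret visible in the code.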

Tiny models (270m / E2B) run comfortably on CPU. ZeroGPU or a dedicated GPU is much snappier, and is required in practice for 4B+ models and the Gemma-4 family.