Gemma 4 e2b — LarQL Vindex v0.2

First-ever published LarQL vindex for Google's Gemma 4.

A vindex is a transformer's weights decompiled into a queryable feature database. It exposes entity associations, circuit structure, and knowledge-editing surfaces as APIs. Most operations do not require a GPU.

What this is / What this is not

| ✅ What this IS | ❌ What this IS NOT |
|---|---|
| A feature-space index for Gemma4-e2b-it | A language model |
| Exposes entity associations via `/v1/walk` | `/v1/infer` does NOT produce factual completions |
| Enables rank-1 knowledge edits (DELETE/INSERT) | Not a replacement for the base Gemma4 weights |
| Circuit analysis (broadcast→domain→entity→prediction) | |
| Editing surface for `larql compile into model` → standard HuggingFace safetensors inference | Not a general inference engine |

Critical note on `/v1/infer`: this endpoint returns a feature-modulated projection of the host model's activations. It does not return a coherent text-generation distribution. Its output is incoherent subword tokens by design, because the vindex is a feature graph rather than a full transformer forward pass. For factual text generation from the base model, use `google/gemma-4-e2b-it` directly. To run inference on an edited model (after DELETE/INSERT patches), use `larql compile into model`. This command exports the MEMIT-edited weights to HuggingFace safetensors, which load like any standard Transformers model. For vindex operations, use the validated `/v1/walk` and `/v1/patch` endpoints.

Validated surfaces: /v1/walk (entity-association retrieval), /v1/describe (feature neighborhood), /v1/patch DELETE/INSERT (rank-1 weight editing, Gate 3 confirmed).

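Compile an edited vindex to a runnable model:

```bash
# After applying patches, export to safetensors for standard inference
larql compile into model \
  --vindex Divinci-AI/gemma-4-e2b-vindex \
  --output ./edited-gemma4 \
  --format safetensors
```

```python
# Run with standard Transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("./edited-gemma4")
# Assumption: the vindex ships a Qwen 2.5 substitute tokenizer, so load the real one from the base model
tokenizer = AutoTokenizer.from_pretrained("google/gemma-4-e2b-it")
inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```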

Quick start

```bash
# Install LarQL (requires our fork with Gemma 4 support until upstreamed)
git clone https://github.com/Divinci-AI/larql.git
cd larql && cargo build --release

# Set environment variables
export LARQL_SERVICE_URL=<your_larql_cloud_run_url>
export INTERNAL_LARQL_S2S_TOKEN=<your_s2s_token>

# Query entity associations
curl "$LARQL_SERVICE_URL/v1/walk?prompt=Paris&layers=14-27&top=10" \
  -H "Authorization: Bearer $INTERNAL_LARQL_S2S_TOKEN"

# Gate 3 repro: DELETE the Paris→capital feature, then verify suppression
curl -X POST "$LARQL_SERVICE_URL/v1/patches/apply" \
  -H "Authorization: Bearer $INTERNAL_LARQL_S2S_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "delete-paris-capital",
    "patch": {
      "version": 1,
      "base_model": "gemma4-e2b",
      "created_at": "2026-04-20T00:00:00Z",
      "operations": [
        {"op": "delete", "entity": "Paris", "relation": "capital", "target": "서울",
         "weight": 1.0, "layer": 27, "feature": 11179}
      ]
    }
  }'

# Re-run the walk query and compare
curl "$LARQL_SERVICE_URL/v1/walk?prompt=Paris&layers=14-27&top=10" \
  -H "Authorization: Bearer $INTERNAL_LARQL_S2S_TOKEN"

# Before: feature 11179 (gate_score=18.1) present in walk
# After:  feature 11179 absent from walk (complete suppression confirmed)
```
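You can also issue the two Quick start calls from Python using only the standard library. This is a minimal sketch. The endpoints, query parameters, and JSON fields are copied from the curl commands. The helper names `walk_request` and `delete_patch_request` are hypothetical. The walk response format is not documented on this card, so the sketch prints it raw.

```python
import json
import os
import urllib.parse
import urllib.request


def walk_request(base_url: str, token: str, prompt: str,
                 layers: str = "14-27", top: int = 10) -> urllib.request.Request:
    """Build the same GET /v1/walk call as the curl example."""
    query = urllib.parse.urlencode({"prompt": prompt, "layers": layers, "top": top})
    return urllib.request.Request(
        f"{base_url}/v1/walk?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )


def delete_patch_request(base_url: str, token: str, *, name: str, entity: str,
                         relation: str, target: str, layer: int, feature: int,
                         weight: float = 1.0,
                         created_at: str = "2026-04-20T00:00:00Z") -> urllib.request.Request:
    """Build the same POST /v1/patches/apply call as the Gate 3 repro."""
    body = {
        "name": name,
        "patch": {
            "version": 1,
            "base_model": "gemma4-e2b",
            "created_at": created_at,
            "operations": [{
                "op": "delete", "entity": entity, "relation": relation,
                "target": target, "weight": weight,
                "layer": layer, "feature": feature,
            }],
        },
    }
    return urllib.request.Request(
        f"{base_url}/v1/patches/apply",
        # ensure_ascii=False keeps non-Latin targets such as 서울 readable in the payload
        data=json.dumps(body, ensure_ascii=False).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__" and "LARQL_SERVICE_URL" in os.environ:
    base = os.environ["LARQL_SERVICE_URL"]
    token = os.environ["INTERNAL_LARQL_S2S_TOKEN"]
    # The response schema is not documented here, so print it unparsed.
    with urllib.request.urlopen(walk_request(base, token, "Paris")) as resp:
        print(resp.read().decode("utf-8"))
```

Building `Request` objects separately from sending them makes the payloads easy to inspect before anything touches a live service.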

| File | Size | Description |
|---|---|---|
| `gate_vectors.bin` | 1.0 GB | FFN gate matrices, per-layer variable size (f16) |
| `down_features.bin` | ~1.0 GB | Down projection transposed to [features × hidden]; enables walk-mode feature retrieval |
| `embeddings.bin` | 768 MB | Token embeddings, 262,144 × 1,536 (f16) |
| `down_meta.bin` | 29 MB | Feature labels via vocab projection |
| `feature_clusters.jsonl` | 4 MB | K-means clusters over gate features |
| `relation_clusters.json` | 15 MB | Wikidata relation matching |
| `norms.bin` | 423 KB | Per-layer normalization weights |
| `tokenizer.json` | 11 MB | Substitute tokenizer (Qwen 2.5; the real Gemma 4 tokenizer was gated during extraction) |
| `index.json` | 5 KB | Metadata: 35 layers, hidden=1536, variable FFN (6144 → 12288) |
| `manifest.json` | 1.1 KB | Vindex version manifest |

Total: ~2.8 GB (without full weight files)
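As a sanity check on the table, the sizes of the three large files follow from the shapes in `index.json`. This sketch assumes each file is a headerless, dense f16 array. That is an assumption; the real on-disk format may add metadata or padding.

```python
# Shapes from index.json; sizes assume headerless, dense f16 arrays.
HIDDEN = 1536
VOCAB = 262_144
FFN_PER_LAYER = [6144] * 15 + [12288] * 20  # L0-14 and L15-34
BYTES_PER_F16 = 2


def file_bytes(rows: int, cols: int) -> int:
    """Size in bytes of a dense rows x cols f16 matrix."""
    return rows * cols * BYTES_PER_F16


n_features = sum(FFN_PER_LAYER)               # gate features summed over all 35 layers
gate_bytes = file_bytes(n_features, HIDDEN)   # gate_vectors.bin
down_bytes = file_bytes(n_features, HIDDEN)   # down_features.bin: [features x hidden]
emb_bytes = file_bytes(VOCAB, HIDDEN)         # embeddings.bin

print(f"features:          {n_features:,}")
print(f"gate_vectors.bin:  {gate_bytes / 1e9:.2f} GB")
print(f"down_features.bin: {down_bytes / 1e9:.2f} GB")
print(f"embeddings.bin:    {emb_bytes / 2**20:.0f} MiB")
```

The script reports:

- 337,920 features in total.
- About 1.04 GB each for the gate and down files.
- 768 MiB for the embeddings.

These figures line up with the table and the ~2.8 GB total.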

Note on `down_features.bin`: this file is generated from `down_weights.bin` by a Python transposition step. That step handles Gemma 4's per-layer intermediate sizes (L0-14: 6144, L15-34: 12288). The Rust `build_down_features` binary segfaults when intermediate sizes vary by layer, so our fix replaces it with a Python Cloud Build step in `build-larql-service.sh`. The file is required for walk-mode feature retrieval.
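The per-layer transposition can be sketched with NumPy. The on-disk layout of `down_weights.bin` is not documented here. This sketch therefore assumes headerless, row-major f16 layers of shape [hidden, ffn], concatenated in layer order. It is not the actual Cloud Build step, and `transpose_down_weights` is a hypothetical name. The point it illustrates is that the reader advances by each layer's own width rather than by one fixed intermediate size.

```python
import numpy as np

# Layer schedule from index.json: L0-14 use 6144 features, L15-34 use 12288.
GEMMA4_E2B_FFN = [6144] * 15 + [12288] * 20


def transpose_down_weights(raw: bytes, hidden: int, ffn_sizes: list[int]) -> bytes:
    """Rewrite per-layer [hidden x ffn] f16 down matrices as [ffn x hidden].

    Hypothetical input layout: layers concatenated in order, each stored
    row-major as float16 with shape (hidden, ffn_sizes[layer]) and no header.
    """
    src = np.frombuffer(raw, dtype=np.float16)
    expected = hidden * sum(ffn_sizes)
    if src.size != expected:
        raise ValueError(f"expected {expected} f16 values, got {src.size}")
    chunks = []
    offset = 0
    for ffn in ffn_sizes:  # the width can differ from layer to layer
        n = hidden * ffn
        layer = src[offset:offset + n].reshape(hidden, ffn)
        # Row i of the output is feature i's write direction in the residual stream.
        chunks.append(layer.T.tobytes(order="C"))
        offset += n
    return b"".join(chunks)


# Toy usage: hidden=2, two layers of widths 3 and 4.
toy = np.arange(2 * 3 + 2 * 4, dtype=np.float16)
out = np.frombuffer(transpose_down_weights(toy.tobytes(), hidden=2, ffn_sizes=[3, 4]),
                    dtype=np.float16)
print(out.reshape(-1, 2))  # one row per feature: 3 rows for layer 0, then 4 for layer 1
```

Holding the whole ~1 GB file in memory is workable on a build machine. A streaming variant would read and write one layer at a time.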

Gate 3 Validation (DELETE patch confirmed)

Gate 3 test: DELETE patch on Paris → 서울 (Seoul/capital) feature at layer 27, feature 11179.

| Metric | Before DELETE | After DELETE |
|---|---|---|
| Feature 11179 gate_score | 18.10 | ABSENT |
| Paris capital rank | #2 overall | Absent from top-25 |
| Walk hits | Feature 11179 present (score 18.1) | Feature 11179 completely absent |

Walk-mode and dense results diverge after the fix, which confirms that `down_features.bin` is loaded and active.

```text
Before: feature=11179 score=18.10 target='서울'   ← rank #1
After:  feature=7327  score=9.40  target='PMA'    ← 서울 COMPLETELY ABSENT
```

Gate 3 result: PASS ✓
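The pass criterion in the Gate 3 table can be written as a small check. The `WalkHit` record and its fields are hypothetical stand-ins for whatever `/v1/walk` actually returns. The sample values are the top hits from the before/after log.

```python
from typing import NamedTuple


class WalkHit(NamedTuple):
    """Hypothetical shape of one /v1/walk result."""
    feature: int
    score: float
    target: str


def gate3_passes(before: list[WalkHit], after: list[WalkHit],
                 feature: int, top_k: int = 25) -> bool:
    """True if `feature` appears before the DELETE and is absent from the top-k hits after it."""
    present_before = any(hit.feature == feature for hit in before)
    top_after = sorted(after, key=lambda hit: hit.score, reverse=True)[:top_k]
    present_after = any(hit.feature == feature for hit in top_after)
    return present_before and not present_after


before = [WalkHit(11179, 18.10, "서울")]
after = [WalkHit(7327, 9.40, "PMA")]
print("PASS" if gate3_passes(before, after, feature=11179) else "FAIL")  # PASS
```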

Architecture details

Architecture: Gemma 4 dense (e2b variant)
Layers: 35 (L0-14: FFN=6144, L15-34: FFN=12288 — per-layer variable)
Hidden size: 1536
Head dim: 256
Attention: 8 Q heads, 1 KV head (GQA 8:1)
Quantization source: Q4_K GGUF
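A frozen config object is a compact way to keep the numbers above consistent in code. This sketch makes three assumptions:

- standard grouped-query attention;
- PyTorch-style `(out_features, in_features)` weight shapes;
- one FFN width per layer band.

The class name and properties are hypothetical. The real checkpoint may vary per layer in ways this card does not list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gemma4E2BConfig:
    """Values from the architecture list (hypothetical helper class)."""
    num_layers: int = 35
    hidden_size: int = 1536
    head_dim: int = 256
    num_q_heads: int = 8
    num_kv_heads: int = 1
    ffn_split_layer: int = 15   # first layer with the wider FFN
    ffn_narrow: int = 6144      # L0-14
    ffn_wide: int = 12288       # L15-34

    @property
    def gqa_group_size(self) -> int:
        # Query heads that share each KV head (8:1 here).
        return self.num_q_heads // self.num_kv_heads

    @property
    def q_proj_shape(self) -> tuple[int, int]:
        return (self.num_q_heads * self.head_dim, self.hidden_size)

    @property
    def kv_proj_shape(self) -> tuple[int, int]:
        # Shape of each of the K and V projections under the GQA assumption.
        return (self.num_kv_heads * self.head_dim, self.hidden_size)

    def ffn_size(self, layer: int) -> int:
        if not 0 <= layer < self.num_layers:
            raise IndexError(f"layer {layer} out of range")
        return self.ffn_narrow if layer < self.ffn_split_layer else self.ffn_wide


cfg = Gemma4E2BConfig()
print(cfg.q_proj_shape, cfg.kv_proj_shape, cfg.gqa_group_size)
```

Under these assumptions the query projection is wider (2048) than the residual stream (1536). The single KV head keeps the K and V projections at 256 outputs each.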

Research findings

This vindex enabled the following findings (see notebooks/PAPER_universal_constants.md in Divinci-AI/server):

Five universal constants across transformer architectures:

~12% dominant FFN sparsity (scale-invariant)
Top-8 output concentration (~99.7% at each position)
~0.97 gate coherence across all layers
~0.042 layer temperature (log-activation variance)
Broadcast → Domain → Entity → Prediction circuit (4-stage)
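The card does not spell out how these constants are measured. The sketch below uses one plausible reading, which is an assumption rather than the paper's stated definition:

- Top-8 output concentration is the share of next-token probability mass captured by the 8 most likely tokens at each position.
- Layer temperature is the variance of log-magnitude activations within a layer.

Both function names are hypothetical.

```python
import numpy as np


def topk_concentration(probs: np.ndarray, k: int = 8) -> np.ndarray:
    """Share of each position's next-token probability mass held by its k most likely tokens.

    probs: [positions, vocab] with rows summing to 1.
    """
    top_k = np.sort(probs, axis=-1)[..., -k:]
    return top_k.sum(axis=-1)


def layer_temperature(activations: np.ndarray, eps: float = 1e-6) -> float:
    """Variance of log-magnitude activations within one layer (assumed definition)."""
    return float(np.var(np.log(np.abs(activations) + eps)))


# Toy usage: a uniform distribution over 16 tokens puts half its mass in the top 8.
uniform = np.full((2, 16), 1 / 16)
print(topk_concentration(uniform))  # [0.5 0.5]
```

Under this reading, a value of ~99.7% means almost all of the output distribution sits on eight tokens.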

Predictive formula: active_experts ≈ 1/dominant_sparsity. Derived from structural analysis alone, it predicts Gemma 4's top-8 MoE routing to within 4% error.
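As a worked example, write s_dom for the dominant FFN sparsity. Taking the rounded ~12% figure at face value (the paper may use a more precise measured value), the formula gives:

```latex
\hat{k}_{\text{active}} \approx \frac{1}{s_{\text{dom}}} \approx \frac{1}{0.12} \approx 8.33,
\qquad
\frac{\hat{k}_{\text{active}} - 8}{\hat{k}_{\text{active}}} \approx \frac{0.33}{8.33} \approx 0.04
```

Measured relative to the prediction, the gap to Gemma 4's actual top-8 routing is about 4%.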

Constellation Edits (knowledge editing): a rank-1 DELETE at the TRACE-identified crown layer (L25 for geography facts) achieves FQ=1.00 in 80 ms and is fully reversible. Gradient ascent fails because of softmax saturation: once P rounds to 1.0 in float32, the gradient is 0. Cross-architecture validation gives Mistral-7B FQ=1.00/MU=0.88 (structural rank-1) and Qwen2.5-1.5B FQ=1.00 (ROME-style k*). See notebooks/PAPER_CONSTELLATION_EDITS_DRAFT.md.
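The card does not show how the rank-1 DELETE is implemented. As an illustration only, assume a DELETE scales one feature's column in the down projection. The `weight` argument mirrors the patch JSON field, but its semantics here are an assumption. Under that assumption, the sketch shows why such an edit is rank-1 and exactly reversible. It also reproduces the float32 saturation effect that stalls gradient ascent. All function names are hypothetical.

```python
import numpy as np


def rank1_delete(w_down: np.ndarray, feature: int, weight: float = 1.0) -> np.ndarray:
    """Scale one FFN feature's output direction by (1 - weight) via a rank-1 update.

    Assumes w_down has shape [hidden, ffn] (hypothetical layout). Returns the
    original column so the edit can be undone exactly.
    """
    saved = w_down[:, feature].copy()
    e = np.zeros(w_down.shape[1], dtype=w_down.dtype)
    e[feature] = 1.0
    # weight * outer(u, e_feature) is a rank-1 matrix that touches only this column.
    w_down -= weight * np.outer(saved, e)
    return saved


def revert_delete(w_down: np.ndarray, feature: int, saved: np.ndarray) -> None:
    """Undo rank1_delete by restoring the saved column."""
    w_down[:, feature] = saved


def target_logit_grad(logits, target: int) -> float:
    """Gradient of -log softmax(z)[target] w.r.t. z[target], i.e. p_target - 1, in float32."""
    z = np.asarray(logits, dtype=np.float32)
    e = np.exp(z - z.max())
    p = e / e.sum()
    return float(p[target] - np.float32(1.0))


# A confidently memorized fact: p_target rounds to exactly 1.0 in float32,
# so the target-logit gradient that gradient ascent would follow vanishes.
print(target_logit_grad([30.0, 0.0], target=0))  # 0.0
print(target_logit_grad([2.0, 0.0], target=0))   # about -0.119: still trainable
```

With `weight=1.0` the feature's column becomes exactly zero. Restoring the saved column returns the matrix to its pre-edit values, which is one way full reversibility can be achieved.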

Important notes

Substitute tokenizer: Feature labels show Qwen 2.5 tokens (151,643-vocab), not Gemma 4 tokens. Gate vectors are correct Gemma 4 weights; only the label mapping is approximate.
Built with patched LarQL: 7 bug fixes required for Gemma 4 (column-major loading, Q4_K block size, variable FFN size support, etc.). See https://github.com/Divinci-AI/larql and upstream PR https://github.com/chrishayuk/larql/pull/24.
License: CC BY-NC 4.0, for academic and research use. Contact mike@divinci.ai for commercial licensing.

Citation

@misc{mooring2026universalconstants,
  title={Universal Constants of Transformer Intelligence},
  author={Mooring, Mike},
  year={2026},
  note={Preprint. arXiv forthcoming.}
}

@misc{mooring2026constellation,
  title={Constellation Edits: Training-Free Knowledge Injection and Auditable Unlearning via Multi-Layer Feature Patches},
  author={Mooring, Mike},
  year={2026},
  note={Preprint. arXiv forthcoming.}
}

Acknowledgments

Chris Hayuk for creating LarQL. Google DeepMind for Gemma 4. Cloudflare for frontier model hosting.
