vindex-infer: Run LLMs without CUDA, without PyTorch, from flat binary files
cronos3k
Decomposed weights for google/gemma-3-4b-it in LarQL vindex format.
Use with vindex-infer for vendor-free LLM inference: no CUDA, no PyTorch, just a Rust binary.
```bash
# Download
huggingface-cli download cronos3k/gemma-3-4b-it-vindex --local-dir gemma3-4b.vindex

# Run inference (CPU, works on any machine)
vindex-infer --vindex gemma3-4b.vindex --token-ids "818,5279,529,7001,563"
# 1. Paris (+21.24)
# 2. a (+17.69)
# 3. the (+17.51)
```
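vindex-infer takes pre-tokenized input via `--token-ids`. A minimal sketch of producing that argument from the bundled tokenizer.json, assuming the HF `tokenizers` Python package and the download path above (the example prompt is illustrative, not necessarily the one behind the IDs shown):

```python
# Sketch: build the comma-separated string --token-ids expects.

def to_token_ids_arg(ids):
    """Format a list of token IDs for vindex-infer's --token-ids flag."""
    return ",".join(str(i) for i in ids)

# Usage (requires `pip install tokenizers` and the downloaded files):
# from tokenizers import Tokenizer
# tok = Tokenizer.from_file("gemma3-4b.vindex/tokenizer.json")
# print(to_token_ids_arg(tok.encode("The capital of France is").ids))
```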
| File | Size | Contents |
|---|---|---|
| gate_vectors.bin | 1.66 GB | FFN gate projections [34 layers × 10240 × 2560] f16 |
| up_weights.bin | 1.66 GB | FFN up projections [34 × 10240 × 2560] f16 |
| down_weights.bin | 1.66 GB | FFN down projections [34 × 2560 × 10240] f16 |
| attn_weights.bin | 1.02 GB | Q/K/V/O + QK norms per layer, f16 |
| embeddings.bin | 1.25 GB | Token embeddings [262208 × 2560] f16 |
| norms.bin | 0.7 MB | RMSNorm gammas (4 per layer + final) f16 |
| tokenizer.json | 32 MB | HuggingFace tokenizer |
| index.json | 5 KB | Model config, layer info |
| **Total** | **7.29 GB** | |
Extracted using LarQL with `--level all --f16`:

```bash
larql extract-index google/gemma-3-4b-it -o gemma3-4b.vindex --level all --f16
```
Output matches HuggingFace Transformers to 5 significant figures across all 34 layers; for example, the layer-34 residual norm is 67806.5 vs. 67806 in HF.
Verified prompts: Paris, Jupiter, blue, Ulm, Pound, all correct.
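The 5-significant-figure criterion above can be phrased as a relative-error bound and checked mechanically; a hedged sketch (illustrative only, not LarQL's actual verification code):

```python
def agrees_to_sig_figs(a: float, b: float, n: int = 5) -> bool:
    """True if a and b agree to n significant figures,
    i.e. their relative difference is below 0.5 * 10**(1 - n)."""
    ref = max(abs(a), abs(b))
    if ref == 0.0:
        return True  # both exactly zero
    return abs(a - b) / ref < 0.5 * 10 ** (1 - n)

# The layer-34 residual norms quoted above pass this check:
# agrees_to_sig_figs(67806.5, 67806.0)  ->  True
```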