# Gemma 3 4B Instruct β€” Vindex Format

Decomposed weights for `google/gemma-3-4b-it` in LarQL vindex format.

Use with `vindex-infer` for vendor-free LLM inference β€” no CUDA, no PyTorch, just a Rust binary.

## Quick Start

```sh
# Download the decomposed weights
huggingface-cli download cronos3k/gemma-3-4b-it-vindex --local-dir gemma3-4b.vindex

# Run inference (CPU β€” works on any machine)
vindex-infer --vindex gemma3-4b.vindex --token-ids "818,5279,529,7001,563"
#  1. Paris     (+21.24)
#  2. a         (+17.69)
#  3. the       (+17.51)
```
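The `--token-ids` flag takes a comma-separated list of ids, which you can produce from the bundled `tokenizer.json`. A minimal sketch, assuming the `tokenizers` package and the download path used above (the encode step only runs when the file is actually on disk):

```python
# Sketch: turn a prompt into the comma-separated id string vindex-infer expects.
# The tokenizer path below assumes the --local-dir from the Quick Start.
from pathlib import Path

def ids_to_arg(ids):
    """Format token ids for the --token-ids flag."""
    return ",".join(str(i) for i in ids)

tok_path = Path("gemma3-4b.vindex/tokenizer.json")
if tok_path.exists():
    from tokenizers import Tokenizer  # pip install tokenizers
    tok = Tokenizer.from_file(str(tok_path))
    ids = tok.encode("The capital of France is").ids
else:
    ids = [818, 5279, 529, 7001, 563]  # the ids from the example above

print(ids_to_arg(ids))
```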

## Files

| File | Size | Contents |
|---|---|---|
| `gate_vectors.bin` | 1.66 GB | FFN gate projections, [34 layers Γ— 10240 Γ— 2560] f16 |
| `up_weights.bin` | 1.66 GB | FFN up projections, [34 Γ— 10240 Γ— 2560] f16 |
| `down_weights.bin` | 1.66 GB | FFN down projections, [34 Γ— 2560 Γ— 10240] f16 |
| `attn_weights.bin` | 1.02 GB | Q/K/V/O projections + QK norms per layer, f16 |
| `embeddings.bin` | 1.25 GB | Token embeddings, [262208 Γ— 2560] f16 |
| `norms.bin` | 0.7 MB | RMSNorm gammas (4 per layer + final), f16 |
| `tokenizer.json` | 32 MB | HuggingFace tokenizer |
| `index.json` | 5 KB | Model config and layer info |
| **Total** | **7.29 GB** | |
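The sizes in the table follow directly from the shapes at 2 bytes per f16 element. A quick sanity check (computed from the shapes, not read from disk; the row-major-flat-dump layout in the final comment is an assumption, not documented here):

```python
# Sanity-check the Files table: bytes = product(shape) * 2 (f16).
GIB = 2 ** 30

def f16_bytes(*shape):
    n = 1
    for d in shape:
        n *= d
    return 2 * n  # 2 bytes per f16 element

gate = f16_bytes(34, 10240, 2560)   # gate_vectors.bin
emb = f16_bytes(262208, 2560)       # embeddings.bin

print(f"gate_vectors.bin: {gate / GIB:.2f} GiB")  # 1.66
print(f"embeddings.bin:   {emb / GIB:.2f} GiB")   # 1.25

# If the files are plain row-major f16 dumps (an assumption), they could be
# mapped lazily with numpy, e.g.:
#   np.memmap("gate_vectors.bin", dtype=np.float16).reshape(34, 10240, 2560)
```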

## Extraction

Extracted using LarQL with `--level all --f16`:

```sh
larql extract-index google/gemma-3-4b-it -o gemma3-4b.vindex --level all --f16
```

## Verification

Output matches HuggingFace Transformers to 5 significant figures across all 34 layers. Residual norms track closely: layer 34 gives 67806.5 vs. 67806 from HF.

Verified prompts: Paris, Jupiter, blue, Ulm, Pound β€” all returned the correct top token.
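The "5 significant figures" criterion can be made precise as a relative tolerance. A minimal sketch of such a check (my formulation, not the project's actual test harness):

```python
def agree_sig(a, b, sig=5):
    """True when a and b differ by less than half a unit in the
    sig-th significant figure (relative tolerance)."""
    if a == b:
        return True
    return abs(a - b) / max(abs(a), abs(b)) < 0.5 * 10 ** (1 - sig)

# The layer-34 residual norms quoted above agree under this criterion:
print(agree_sig(67806.5, 67806))  # True
```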
