igorls committed on
Commit
a3624e6
·
verified ·
1 Parent(s): f966939

docs: add multilingual eval (en/pt-BR/es/zh) + 8GB-VRAM guidance + Qwen comparison

Files changed (1)
  1. README.md +32 -1
README.md CHANGED
@@ -19,7 +19,9 @@ pipeline_tag: text-generation
19
 
20
  A modality-stripped variant of [`google/gemma-4-E4B-it`](https://huggingface.co/google/gemma-4-E4B-it) for **text-only classification, entity extraction, and structured-memory extraction**. The vision encoder (~150M params) and audio encoder (~300M params) are removed; the text path is unchanged.
21
 
22
- **Headline:** Same instruction-tuned text behavior as the official Gemma 4 E4B-it, but at **6.5 GB resident VRAM instead of 10.6 GB** (Ollama Q4_K_M, RTX 3090, Linux). All safety alignment is preserved — this is **not** an abliterated or uncensored variant.
 
 
23
 
24
  ## Why this exists
25
 
@@ -43,6 +45,35 @@ Measured on RTX 3090, Ollama 0.x, against the MemPalace small-model benchmark ha
43
 
44
  All accuracy deltas are within statistical noise at n=100. The 4.1 GB VRAM win is real and reproducible.
45
 
46
  ## What was actually dropped
47
 
48
  From the 7996.2M-parameter multimodal checkpoint:
 
19
 
20
  A modality-stripped variant of [`google/gemma-4-E4B-it`](https://huggingface.co/google/gemma-4-E4B-it) for **text-only classification, entity extraction, and structured-memory extraction**. The vision encoder (~150M params) and audio encoder (~300M params) are removed; the text path is unchanged.
21
 
22
+ **Headline:** Same instruction-tuned text behavior as the official Gemma 4 E4B-it — including its multilingual coverage — but at **6.5 GB resident VRAM instead of 10.6 GB** (Ollama Q4_K_M, RTX 3090, Linux). All safety alignment is preserved — this is **not** an abliterated or uncensored variant.
23
+
24
+ **For 8 GB GPU users:** this is the recommended Gemma 4 E4B variant. The official `gemma4:e4b-it-q4_K_M` does not fit on 8 GB cards, even at a modest context (10.2 GB resident at ctx=8192). This variant measures 6.1 GB resident at the same context, which leaves roughly 2 GB of headroom, and it preserves every measured capability of the base model.
25
 
26
  ## Why this exists
27
 
 
45
 
46
  All accuracy deltas are within statistical noise at n=100. The 4.1 GB VRAM win is real and reproducible.
47
 
48
+ ## Multilingual robustness
49
+
50
+ The strip preserves Gemma 4's multilingual capability. The MemPalace harness was translated to Portuguese (pt-BR), Spanish (es), and Chinese (zh) and run at a matched context (ctx=8192). Scoring uses a multilingual embedding (`embeddinggemma`) instead of the English-only `nomic-embed-text` v1, so cross-lingual cosine similarity is not penalized by the embedder:
51
+
52
+ | Task | en | pt-BR | es | zh |
53
+ |---|---:|---:|---:|---:|
54
+ | Calibration | 1.000 | 0.950 | 0.950 | 0.950 |
55
+ | Room classification (closed) | 0.624 | 0.584 | 0.584 | 0.584 |
56
+ | Room classification (open) ★ | **0.676** | **0.636** | **0.641** | **0.639** |
57
+ | Entity extraction (F1) | 0.732 | 0.747 | 0.747 | 0.694 |
58
+ | Memory coverage | 0.912 | 0.850 | 0.850 | 0.912 |
59
+
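The embedding-based scoring mentioned above the table can be illustrated with a minimal sketch. The harness's actual scoring code is not shown in this diff, so everything here is an assumption: `score_prediction` and `toy_embed` are hypothetical names, and `toy_embed` is a character-frequency stand-in, not a real multilingual embedder such as `embeddinggemma`.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


def score_prediction(embed, predicted, reference):
    """Score a model output against a reference answer.

    `embed` is injected so the same scorer works with any embedding
    backend (hypothetical interface: str -> list of floats).
    """
    return cosine_similarity(embed(predicted), embed(reference))


def toy_embed(text):
    """Hypothetical stand-in embedder: a-z character counts.

    Only here so the sketch runs offline. It carries no semantic or
    cross-lingual meaning, unlike a real embedding model.
    """
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts
```

With a real multilingual embedder passed as `embed`, a correct answer in pt-BR and its reference should land close together in vector space; with an English-only embedder the same pair may score lower for reasons unrelated to the model under test.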
60
+ This model is the **most language-stable** of the four 4B-class local candidates evaluated: closed-set and open-set room classification each vary by at most 0.04 across the four languages (±0.02 around the midpoint), whereas the competing Qwen 3 variants degrade visibly on zh (closed-set drops to 0.535 for `qwen3:4b-instruct-2507-q8_0`).
61
+
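The stability figure can be checked directly from the table. The sketch below copies the table's scores and computes the max-minus-min spread per task; the helper names are illustrative and not part of the harness.

```python
# Scores copied from the multilingual table (this model, ctx=8192).
SCORES = {
    "Calibration": {"en": 1.000, "pt-BR": 0.950, "es": 0.950, "zh": 0.950},
    "Room classification (closed)": {"en": 0.624, "pt-BR": 0.584, "es": 0.584, "zh": 0.584},
    "Room classification (open)": {"en": 0.676, "pt-BR": 0.636, "es": 0.641, "zh": 0.639},
    "Entity extraction (F1)": {"en": 0.732, "pt-BR": 0.747, "es": 0.747, "zh": 0.694},
    "Memory coverage": {"en": 0.912, "pt-BR": 0.850, "es": 0.850, "zh": 0.912},
}


def language_spread(by_lang):
    """Largest minus smallest score across languages, rounded to 3 places."""
    values = list(by_lang.values())
    return round(max(values) - min(values), 3)


def half_range(by_lang):
    """Half the spread, i.e. the +/- band around the midpoint."""
    return round(language_spread(by_lang) / 2, 3)


if __name__ == "__main__":
    for task, by_lang in SCORES.items():
        print(f"{task}: spread={language_spread(by_lang)} (+/-{half_range(by_lang)})")
```

Both room-classification rows come out at a spread of 0.04, which matches the ±0.02 band quoted above. The other rows are slightly wider (0.05 to 0.062).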
62
+ ### When to pick this model vs Qwen 3 4B alternatives
63
+
64
+ Same harness, same matrix, ctx=8192, full datasets:
65
+
66
+ | Capability | Winner | Notes |
67
+ |---|---|---|
68
+ | **Open-set room classification** ★ | **this model** | 0.636-0.676 across the 4 languages vs 0.56-0.63 for Qwen. This Gemma 4 strength holds in every language tested. |
69
+ | Closed-set room classification | rough tie | This model and `qwen3.5:4b-q4_K_M` trade the lead by 1-3 points. |
70
+ | Memory extraction | rough tie (~0.85) | This model, `qwen3:4b-instruct-2507-q8_0`, and official Gemma 4 are within 0.02 of each other. |
71
+ | Entity extraction (F1) | Qwen 3 4B Q8 | `qwen3:4b-instruct-2507-q8_0` leads by 5-7 points in all 4 languages. |
72
+ | TPS (output throughput) | Qwen 3 4B Q8 | Roughly 1.7x faster (220+ TPS vs ~130 TPS at ctx=8192). |
73
+ | VRAM resident at ctx=8192 | rough tie | This model 6.1 GB, `qwen3:4b-instruct-2507-q8_0` 5.8 GB, `qwen3.5:4b-q4_K_M` 6.0 GB. |
74
+
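The VRAM and throughput rows reduce to simple arithmetic, sketched below with the figures quoted in this README. Two assumptions to note: the card is treated as having its full nominal 8 GB available (real usable VRAM is lower once the display and driver take their share), and "220+" and "~130" are treated as exact values.

```python
def vram_headroom_gb(card_gb, resident_gb):
    """Free VRAM left after loading the model (negative means it does not fit)."""
    return round(card_gb - resident_gb, 2)


def fits(card_gb, resident_gb):
    """True when the resident footprint is within the card's nominal capacity."""
    return resident_gb <= card_gb


def speedup(fast_tps, slow_tps):
    """Throughput ratio between two models, rounded to 2 places."""
    return round(fast_tps / slow_tps, 2)


if __name__ == "__main__":
    CARD_GB = 8.0  # nominal capacity; an assumption, not a measured value
    print("this model headroom:", vram_headroom_gb(CARD_GB, 6.1))
    print("official E4B headroom:", vram_headroom_gb(CARD_GB, 10.2))
    print("Qwen 3 4B Q8 speedup:", speedup(220, 130))
```

Because "220+" is a lower bound, the computed ratio is a floor on the Qwen throughput advantage rather than an exact figure.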
75
+ **Pick this model** when open-set slug quality matters (room routing, i.e. the "what room does this conversation go in" UX) or when multilingual stability matters. **Pick `qwen3:4b-instruct-2507-q8_0`** when speed matters more than open-set slug quality, or when entity extraction is the dominant load.
76
+
77
  ## What was actually dropped
78
 
79
  From the 7996.2M-parameter multimodal checkpoint: