TEG-421M-GGUF: Quantized Trimodal Embeddings Gemma
GGUF-quantized versions of TEG-421M for edge deployment.
TEG (Trimodal Embeddings Gemma) maps image, audio, and text into a shared 768-dim embedding space via Google's embeddinggemma-300M, with Matryoshka truncation support down to 128 dims.
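As a minimal sketch of what Matryoshka truncation usually means in practice: keep the leading components of the 768-dim vector and re-normalize to unit length. The helper name `truncate_embedding` and the toy input vector are illustrative assumptions, not part of any TEG API, and re-normalizing after truncation is the common convention rather than something this card specifies.

```python
import math

def truncate_embedding(vec, dim=128):
    """Keep the first `dim` components and L2-normalize the result."""
    if not 0 < dim <= len(vec):
        raise ValueError(f"dim must be in 1..{len(vec)}, got {dim}")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return list(head)
    return [x / norm for x in head]

# Toy stand-in for a 768-dim embedding (NOT real model output).
full = [math.sin(i + 1) for i in range(768)]
small = truncate_embedding(full, 128)
print(len(small))
```

Truncated vectors should only be compared with other vectors truncated to the same size.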
Available quantizations
| File | Quant | Size | Description |
|---|---|---|---|
| teg-421m-q8_0.gguf | Q8_0 | 501 MB | 8-bit, minimal quality loss |
| teg-421m-q5_0.gguf | Q5_0 | 408 MB | 5-bit, good balance of size and quality |
| teg-421m-q4_1.gguf | Q4_1 | 390 MB | 4-bit with offsets, best for constrained devices |
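To put the file sizes in context, the short calculation below derives the effective bits per parameter and the size relative to an fp32 checkpoint, using the 420.6M parameter count stated in this card. It assumes "MB" means 10^6 bytes and ignores GGUF metadata overhead, so the figures are rough estimates; the function names are illustrative.

```python
PARAMS = 420.6e6  # total parameters, as stated in this card

# File sizes from the table; ASSUMPTION: "MB" means 10**6 bytes.
SIZES_MB = {"Q8_0": 501, "Q5_0": 408, "Q4_1": 390}

def bits_per_param(size_mb, params=PARAMS):
    """Average number of bits stored per parameter for a file of size_mb."""
    return size_mb * 1e6 * 8 / params

def ratio_vs_fp32(size_mb, params=PARAMS):
    """File size as a fraction of an fp32 checkpoint (4 bytes per parameter)."""
    return size_mb * 1e6 / (params * 4)

for name, mb in SIZES_MB.items():
    print(f"{name}: {bits_per_param(mb):.2f} bits/param, "
          f"{ratio_vs_fp32(mb):.1%} of fp32")
```

The effective widths come out above the nominal 8, 5, and 4 bits, which would be consistent with some components being stored at higher precision than the target quant.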
All variants use per-component quantization: the Gemma text model gets the target quant, the image and audio encoders stay at Q8_0, and the projection heads stay at F16 to preserve retrieval quality.
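The per-component policy can be sketched as a small lookup from tensor name to quant type. The tensor-name prefixes (`text.`, `image_encoder.`, `audio_encoder.`, `image_proj.`, `audio_proj.`) are hypothetical and chosen only for illustration; the actual tensor names in the GGUF files may differ, and this is not the conversion script used to produce them.

```python
def pick_quant(tensor_name, target):
    """Return the quant type for one tensor under the per-component policy.

    All prefixes below are HYPOTHETICAL; real GGUF tensor names may differ.
    """
    # Projection heads stay at F16 to preserve retrieval quality.
    if tensor_name.startswith(("image_proj.", "audio_proj.")):
        return "F16"
    # Image and audio encoders stay at Q8_0 regardless of the target.
    if tensor_name.startswith(("image_encoder.", "audio_encoder.")):
        return "Q8_0"
    # Only the Gemma text model receives the target quant.
    if tensor_name.startswith("text."):
        return target
    raise ValueError(f"unrecognized tensor name: {tensor_name}")

for name in ("text.layer0.attn_q.weight",
             "image_encoder.conv1.weight",
             "audio_proj.fc1.weight"):
    print(name, "->", pick_quant(name, "Q4_1"))
```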
Architecture
See the full model card for complete architecture details, benchmarks, and training information.
Text --> embeddinggemma-300M --------------------------> 768-dim
Image --> MobileNetV4-Medium (1280-d) --> DeepProjection -> 768-dim
Audio --> EfficientAT mn20_as (1920-d) --> DeepProjection -> 768-dim
Total parameters (fp32): 420.6M
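Because all three branches end in the same 768-dim space, a text query can be ranked directly against image and audio embeddings. The sketch below does this with cosine similarity, which is a common choice for embedding models but is an assumption here. The 3-dim vectors and item names are toy placeholders, not real TEG outputs.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(query, items):
    """Return item ids sorted from most to least similar to the query."""
    return sorted(items, key=lambda k: cosine(query, items[k]), reverse=True)

# Toy 3-dim stand-ins for 768-dim embeddings (NOT real model output).
text_query = [0.9, 0.1, 0.0]
library = {
    "image:dog.jpg": [0.8, 0.2, 0.1],
    "audio:bark.wav": [0.7, 0.1, 0.3],
    "image:car.jpg": [0.0, 0.2, 0.9],
}
print(rank(text_query, library))
```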
Source model
These quantizations were produced from augmem/teg-421m.
Links
- Website: augmem.ai
- GitHub: github.com/augmem
- Full model: augmem/teg-421m
License
Apache 2.0