TEG-421M-GGUF: Quantized Trimodal Embeddings Gemma

GGUF-quantized versions of TEG-421M for edge deployment.

TEG (Trimodal Embeddings Gemma) maps image, audio, and text into a shared 768-dim embedding space via Google's embeddinggemma-300M, with Matryoshka truncation support down to 128 dims.
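Matryoshka truncation usually means keeping a prefix of the embedding and renormalizing it before computing similarities. The snippet below is a minimal sketch of that idea in plain Python. The function name `truncate_embedding` is hypothetical, and the renormalization step is an assumption about typical Matryoshka usage rather than something this card specifies.

```python
import math

def truncate_embedding(vec, dim=128):
    """Keep the first `dim` components of `vec` and L2-renormalize them.

    Hypothetical helper: assumes the usual Matryoshka convention that the
    leading components form a usable lower-dimensional embedding.
    """
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to normalize for an all-zero prefix
    return [x / norm for x in head]

# Stand-in for a real 768-dim model output (values are made up).
full = [float(i % 7) - 3.0 for i in range(768)]
small = truncate_embedding(full, dim=128)
```

Truncating to 128 dims cuts index storage by a factor of six relative to 768 dims; how much retrieval quality is lost depends on the model and is not estimated here.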

Available quantizations

File | Quant | Size | Description
teg-421m-q8_0.gguf | Q8_0 | 501 MB | 8-bit, minimal quality loss
teg-421m-q5_0.gguf | Q5_0 | 408 MB | 5-bit, good balance of size and quality
teg-421m-q4_1.gguf | Q4_1 | 390 MB | 4-bit with offsets, best for constrained devices

All variants use per-component quantization: the Gemma text model is quantized to the target type named in the file, the image and audio encoders stay at Q8_0, and the projection heads stay at F16 to preserve retrieval quality.
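The per-component rule can be pictured as a lookup from tensor name to quantization type. The sketch below is illustrative only: the name prefixes (`text_model.`, `image_encoder.`, `audio_encoder.`, `image_proj.`, `audio_proj.`) and the function `pick_quant` are hypothetical and are not taken from the actual GGUF files or conversion script.

```python
def pick_quant(tensor_name, target="Q4_1"):
    """Return the quant type for a tensor under the per-component scheme.

    All prefixes here are hypothetical placeholders for whatever naming
    the real conversion uses.
    """
    if tensor_name.startswith(("image_proj.", "audio_proj.")):
        return "F16"    # projection heads stay at F16
    if tensor_name.startswith(("image_encoder.", "audio_encoder.")):
        return "Q8_0"   # image/audio encoders stay at Q8_0
    if tensor_name.startswith("text_model."):
        return target   # only the Gemma text model gets the target quant
    raise ValueError(f"unrecognized tensor name: {tensor_name}")

plan = {
    name: pick_quant(name, target="Q5_0")
    for name in (
        "text_model.layers.0.attn.q.weight",
        "image_encoder.stem.conv.weight",
        "audio_proj.fc1.weight",
    )
}
```

Because only the text model changes between variants, file size shrinks less than the bit width alone would suggest, which is consistent with the modest gap between the Q5_0 and Q4_1 files.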

Architecture

See the full model card for complete architecture details, benchmarks, and training information.

Text  --> embeddinggemma-300M --------------------------> 768-dim
Image --> MobileNetV4-Medium (1280-d) --> DeepProjection -> 768-dim
Audio --> EfficientAT mn20_as (1920-d) --> DeepProjection -> 768-dim

Total parameters (fp32): 420.6M
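Because all three branches end in the same 768-dim space, cross-modal retrieval reduces to comparing vectors directly. The following is a minimal sketch using cosine similarity over toy 4-dim vectors; the vectors, labels, and the helper names `cosine` and `rank` are made up for illustration, and real embeddings would come from the model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def rank(query, candidates):
    """Return candidate labels sorted by descending similarity to `query`."""
    scored = [(cosine(query, vec), label) for label, vec in candidates.items()]
    return [label for _, label in sorted(scored, reverse=True)]

# Toy stand-ins for a text query and image/audio embeddings.
text_query = [0.9, 0.1, 0.0, 0.1]
candidates = {
    "image:dog.jpg": [0.8, 0.2, 0.1, 0.0],
    "audio:rain.wav": [0.0, 0.1, 0.9, 0.3],
}
ranking = rank(text_query, candidates)
```

The same ranking code works regardless of which modality produced the query or the candidates, which is the practical benefit of a shared embedding space.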

Source model

These quantizations were produced from augmem/teg-421m.

License

Apache 2.0
