Fix Whisper-tiny encoder param count (8M, not 39M)
README.md
CHANGED
@@ -58,7 +58,7 @@ A compact multimodal embedding model that unifies text, image, and audio represe

 - **Text encoding** via MiniLM-L6-v2 (22M params)
 - **Image encoding** via SigLIP-base-patch16-512 (86M params)
-- **Audio encoding** via Whisper-tiny encoder (
+- **Audio encoding** via Whisper-tiny encoder (8M params)
 - **Cross-modal fusion** via 2-layer transformer attention
 - **2DMSE**: Two-Dimensional Matryoshka Sentence Embeddings for adaptive compute
 - **MRL**: Matryoshka Representation Learning for flexible embedding dimensions
@@ -196,7 +196,7 @@ emb_64 = F.normalize(full_emb[:, :64], p=2, dim=-1) # 6x faster retrieval
 ├──────────────────────────────────────────────────────────────┤
 │ Text Encoder: MiniLM-L6-v2 (22M params, 6 layers)│
 │ Image Encoder: SigLIP-base-patch16-512 (86M params) │
-│ Audio Encoder: Whisper-tiny encoder (
+│ Audio Encoder: Whisper-tiny encoder (8M params, 4 layers) │
 │ Fusion: 2-layer Transformer │
 ├──────────────────────────────────────────────────────────────┤
 │ Output: 384-dim normalized embeddings │
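As a sanity check on the corrected figure, the encoder's parameter count can be estimated by hand. The sketch below is a back-of-the-envelope calculation, not taken from this repository. It assumes the published Whisper-tiny encoder shape (80 mel bins, width 384, 4 layers, 1500 audio positions, two kernel-size-3 convolutions, no bias on the key projection). The helper names are hypothetical. Whether the fixed sinusoidal position table counts as parameters depends on the implementation, so both totals are shown.

```python
def linear(n_in, n_out, bias=True):
    """Parameter count of a dense layer."""
    return n_in * n_out + (n_out if bias else 0)


def conv1d(c_in, c_out, kernel):
    """Parameter count of a 1-D convolution with bias."""
    return c_in * c_out * kernel + c_out


def layer_norm(d):
    """Scale and shift vectors of a LayerNorm."""
    return 2 * d


def whisper_encoder_params(n_mels=80, d_model=384, n_layers=4,
                           n_ctx=1500, count_positions=False):
    # Assumed Whisper-tiny encoder dimensions; adjust for other sizes.
    stem = conv1d(n_mels, d_model, 3) + conv1d(d_model, d_model, 3)
    # Query, value, and output projections carry a bias; key does not.
    attn = 3 * linear(d_model, d_model) + linear(d_model, d_model, bias=False)
    mlp = linear(d_model, 4 * d_model) + linear(4 * d_model, d_model)
    block = attn + mlp + 2 * layer_norm(d_model)
    total = stem + n_layers * block + layer_norm(d_model)
    if count_positions:
        # Fixed sinusoidal table; some implementations store it as a weight.
        total += n_ctx * d_model
    return total


print(f"{whisper_encoder_params() / 1e6:.1f}M")                      # 7.6M
print(f"{whisper_encoder_params(count_positions=True) / 1e6:.1f}M")  # 8.2M
```

Either way the encoder rounds to roughly 8M, while 39M is the size commonly quoted for the full Whisper-tiny model (encoder plus decoder).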