| --- |
| language: |
| - en |
| license: apache-2.0 |
| tags: |
| - multimodal |
| - embedding |
| - matryoshka |
| - trimodal |
| - image-text-audio |
| - retrieval |
| - cross-modal |
| - edge |
| - rag |
| library_name: safetensors |
| pipeline_tag: feature-extraction |
| datasets: |
| - custom |
| --- |
| |
| # AIT-86M — Audio, Image, Text Embeddings (Depth-2) |
|
|
| **AIT-86M** maps image, audio, and text into one shared 1280-dim embedding space, so cross-modal retrieval needs only a single vector index. Matryoshka truncation is supported down to 128 dims. |
|
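As a rough illustration of what a single shared index means in practice, the sketch below ranks items of every modality against one query by cosine similarity. The 4-dim vectors, item names, and the `search` helper are all made up for the example; real embeddings would come from the model and have 1280 dims.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def search(index, query, top_k=3):
    """Rank every index entry against the query, regardless of modality."""
    q = l2_normalize(query)
    scored = [
        (sum(a * b for a, b in zip(q, l2_normalize(vec))), item_id, modality)
        for item_id, modality, vec in index
    ]
    scored.sort(key=lambda row: row[0], reverse=True)
    return scored[:top_k]

# Hypothetical toy vectors standing in for real 1280-dim embeddings.
index = [
    ("dog.jpg", "image", [0.9, 0.1, 0.0, 0.1]),
    ("bark.wav", "audio", [0.8, 0.2, 0.1, 0.0]),
    ("piano.wav", "audio", [0.0, 0.1, 0.9, 0.2]),
    ("a cat sleeping", "text", [0.1, 0.9, 0.1, 0.0]),
]
text_query = [1.0, 0.1, 0.0, 0.0]  # pretend embedding of a text query

for score, item_id, modality in search(index, text_query, top_k=2):
    print(f"{score:.3f}  {modality:5s}  {item_id}")
```

Because all three modalities live in one space, the text query can return an image and an audio clip from the same list without per-modality indexes or score calibration. A production setup would likely swap the linear scan for an approximate nearest-neighbor index.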
|
| Built for edge deployment, with a single combined safetensors artifact. |
|
|
| Successor to [TE-75M](https://huggingface.co/augmem/TE-75M). |
|
|
| > Also available in [GGUF format](https://huggingface.co/augmem/AIT-86M-GGUF) for quantized edge deployment. |
|
|
| ## Why This Matters |
|
|
| The notable result for this family is that downstream variants add anchor-style decision behavior while preserving the shared semantic retrieval path. In practice, the hard part is usually keeping retrieval quality on the backbone flat while new decision surfaces are added on top of it. |
|
|
| ## File layout |
|
|
| ```text |
| AIT-86M.safetensors |
| ``` |
|
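To check what the checkpoint contains without loading any weights, one option is to read only the safetensors header: an 8-byte little-endian length followed by a JSON table of tensor names, dtypes, and shapes. The sketch below uses only the standard library. The helper names are illustrative, and the tensor names inside `AIT-86M.safetensors` are not documented in this card, so none are assumed.

```python
import json
import os
import struct

def read_safetensors_header(path):
    """Return the JSON header of a safetensors file without reading tensor data."""
    with open(path, "rb") as fh:
        # First 8 bytes: unsigned little-endian 64-bit header length.
        (header_len,) = struct.unpack("<Q", fh.read(8))
        return json.loads(fh.read(header_len).decode("utf-8"))

def list_tensors(path):
    """Map tensor name -> (dtype, shape), skipping the optional metadata entry."""
    header = read_safetensors_header(path)
    return {
        name: (info["dtype"], info["shape"])
        for name, info in header.items()
        if name != "__metadata__"
    }

# Only runs if the checkpoint has been downloaded next to this script.
if os.path.exists("AIT-86M.safetensors"):
    for name, (dtype, shape) in list_tensors("AIT-86M.safetensors").items():
        print(name, dtype, shape)
```

This is handy on constrained devices, where listing shapes first avoids mapping the whole file just to inspect it.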
|
| ## Notes |
|
|
| - shared trimodal embedding space |
| - Matryoshka truncation: `1280 / 768 / 512 / 256 / 128` |
| - intended for retrieval and embedding use, not generation |
|
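A minimal sketch of using the truncation sizes listed above. It assumes the usual Matryoshka convention of keeping the leading components and re-normalizing before cosine similarity; this card does not spell out the exact procedure, and `truncate_embedding` is a hypothetical helper, not part of any released API.

```python
import math

# Truncation sizes listed in this card.
MATRYOSHKA_DIMS = (1280, 768, 512, 256, 128)

def truncate_embedding(vec, dim):
    """Keep the first `dim` components and re-normalize to unit length."""
    if dim not in MATRYOSHKA_DIMS:
        raise ValueError(f"dim must be one of {MATRYOSHKA_DIMS}, got {dim}")
    if len(vec) < dim:
        raise ValueError(f"embedding has {len(vec)} dims, cannot truncate to {dim}")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("truncated embedding has zero norm")
    return [x / norm for x in head]

# Placeholder 1280-dim vector; a real one would come from the model.
full = [float((i % 7) - 3) for i in range(1280)]
small = truncate_embedding(full, 128)
print(len(small))
```

Queries and indexed items need to be truncated to the same size. At 128 dims an index stores one tenth of the values it would at 1280, which is the main appeal on edge devices.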
|
| ## Historical Local Gate Baseline |
|
|
| The exact local gate baseline that was previously attached under the `TE-86M` release directory is restored here for continuity in the `AIT-86M` artifact line. |
|
|
| Attached JSON: |
|
|
| - `teacher_dual_mn20whisper_exact_gate_baseline_20260424T155324Z.json` |
|
|
| Seeded split-excluded baseline at `1280d`: |
|
|
| Slice and metric | Value |
| |---|---:| |
| | Speech holdout A->T R@1 | 0.5652 | |
| | Speech holdout T->A R@1 | 0.5992 | |
| | Speech holdout avg R@1 | 0.5822 | |
| | WavCaps FSD A->T R@1 | 0.1078 | |
| | WavCaps FSD T->A R@1 | 0.1030 | |
| | WavCaps FSD avg R@1 | 0.1054 | |
| | SALT A->I R@1 | 0.1692 | |
| | SALT I->A R@1 | 0.1261 | |
|
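For readers reproducing numbers like these, the sketch below shows the standard definition of R@1 on a paired evaluation set, where query `i` has exactly one correct candidate `i`. This is an assumption about the protocol; the card does not describe how the gate metrics were computed. The two "avg" rows in the table are consistent with the plain mean of the two directions, which the last lines check.

```python
def recall_at_1(sim):
    """sim[i][j] is the similarity of query i to candidate j; match for i is i."""
    hits = sum(
        1
        for i, row in enumerate(sim)
        if max(range(len(row)), key=row.__getitem__) == i
    )
    return hits / len(sim)

def transpose(sim):
    """Swap query and candidate roles to score the reverse direction."""
    return [list(col) for col in zip(*sim)]

# Made-up 3x3 audio-to-text similarity matrix, not data from this model.
sim = [
    [0.9, 0.1, 0.2],
    [0.3, 0.4, 0.8],
    [0.2, 0.1, 0.9],
]
a2t = recall_at_1(sim)             # audio -> text
t2a = recall_at_1(transpose(sim))  # text -> audio
avg = (a2t + t2a) / 2
print(round(a2t, 4), round(t2a, 4), round(avg, 4))

# Arithmetic check of the averaged rows reported in the table.
speech_avg = round((0.5652 + 0.5992) / 2, 4)
wavcaps_avg = round((0.1078 + 0.1030) / 2, 4)
print(speech_avg, wavcaps_avg)
```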
|
| Scope note: |
|
|
| - These are the canonical local gate numbers used for bounded continuation and recovery experiments in this model family. |
| - They are not a claim of broad public benchmark superiority. |
| - They are restored here because the prior card revision dropped the attached evaluation summary. |
|
|
| ## Evaluation Scope |
|
|
| Published evaluations for this model family include targeted retrieval and anchor-style discrimination tasks. Those targeted evaluations are useful, but they are not a substitute for a published adversarial or out-of-distribution benchmark. Downstream runtime validation remains application-specific. |
|
|
| ## Files |
|
|
| File | Purpose |
| |---|---| |
| | `AIT-86M.safetensors` | Base trimodal checkpoint | |
| | `teacher_dual_mn20whisper_exact_gate_baseline_20260424T155324Z.json` | Restored canonical local gate baseline summary | |
|
|
| ## License |
|
|
| Apache 2.0 |
|
|