---
language:
- en
license: apache-2.0
tags:
- multimodal
- embedding
- matryoshka
- trimodal
- image-text-audio
- retrieval
- cross-modal
- edge
- rag
library_name: safetensors
pipeline_tag: feature-extraction
datasets:
- custom
---

# AIT-86M: Audio, Image, Text Embeddings (Depth-2)

**AIT-86M** maps image, audio, and text into a shared 1280-dim embedding space for cross-modal retrieval with a single vector index. All three modalities share one space with full Matryoshka truncation support down to 128 dims. Built for edge deployment, with a single combined safetensors artifact. Successor to [TE-75M](https://huggingface.co/augmem/TE-75M).

> Also available in [GGUF format](https://huggingface.co/augmem/AIT-86M-GGUF) for quantized edge deployment.

## Why This Matters

The notable result for this family is preserving the shared semantic retrieval path while adding anchor-style decision behavior in downstream variants. In practice, the hard part is usually keeping the retrieval backbone flat while adding new decision surfaces on top of it.

## File layout

```text
AIT-86M.safetensors
```

## Notes

- shared trimodal embedding space
- Matryoshka truncation: `1280 / 768 / 512 / 256 / 128`
- intended for retrieval and embedding use, not generation

## Historical Local Gate Baseline

The exact local gate baseline that was previously attached under the `TE-86M` release directory is restored here for continuity in the `AIT-86M` artifact line.

Attached JSON:

- `teacher_dual_mn20whisper_exact_gate_baseline_20260424T155324Z.json`

Seeded split-excluded baseline at `1280d`:

| Slice and metric | Value |
|---|---:|
| Speech holdout A->T R@1 | 0.5652 |
| Speech holdout T->A R@1 | 0.5992 |
| Speech holdout avg R@1 | 0.5822 |
| WavCaps FSD A->T R@1 | 0.1078 |
| WavCaps FSD T->A R@1 | 0.1030 |
| WavCaps FSD avg R@1 | 0.1054 |
| SALT A->I R@1 | 0.1692 |
| SALT I->A R@1 | 0.1261 |

Scope note:

- These are the canonical local gate numbers used for bounded continuation and recovery experiments in this model family.
- They are not a claim of broad public benchmark superiority.
- They are restored here because the prior card revision dropped the attached evaluation summary.

## Evaluation Scope

Published evaluations for this model family include targeted retrieval and anchor-style discrimination tasks.

Those targeted evaluations are useful, but they are not a substitute for a published adversarial or out-of-distribution benchmark. Downstream runtime validation remains application-specific.

## Files

| File | Purpose |
|---|---|
| `AIT-86M.safetensors` | Base trimodal checkpoint |
| `teacher_dual_mn20whisper_exact_gate_baseline_20260424T155324Z.json` | Restored canonical local gate baseline summary |

## License

Apache 2.0
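
## Example: Matryoshka truncation and R@1

The Matryoshka sizes in the Notes and the R@1 figures in the baseline table can be made concrete with a small sketch. This card does not ship encoder code, so the vectors below are synthetic stand-ins for real embeddings, and `truncate_and_normalize`, `recall_at_1`, and `make_toy_pairs` are hypothetical helper names, not part of any released API. Renormalizing to unit length after truncation is assumed here because it is common practice for Matryoshka-style embeddings; the card itself does not state it as a requirement.

```python
import math
import random

# Truncation sizes listed in the Notes section of this card.
MATRYOSHKA_DIMS = (1280, 768, 512, 256, 128)


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components, then rescale to unit L2 norm.

    Assumption: renormalizing after truncation is standard practice,
    not something this card documents.
    """
    if dim not in MATRYOSHKA_DIMS:
        raise ValueError(f"unsupported Matryoshka dim: {dim}")
    if len(vec) < dim:
        raise ValueError("vector is shorter than the requested dim")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def dot(a, b):
    """Dot product; equals cosine similarity when both inputs are unit-norm."""
    return sum(x * y for x, y in zip(a, b))


def recall_at_1(queries, candidates):
    """Fraction of queries whose best-scoring candidate shares their index.

    Assumes queries[i] and candidates[i] are a ground-truth pair.
    """
    if not queries:
        raise ValueError("need at least one query")
    hits = 0
    for i, q in enumerate(queries):
        best = max(range(len(candidates)), key=lambda j: dot(q, candidates[j]))
        hits += int(best == i)
    return hits / len(queries)


def make_toy_pairs(n_items=8, dim=1280, noise=0.05, seed=0):
    """Synthetic paired vectors (hypothetical data, not model output)."""
    rng = random.Random(seed)
    texts, audios = [], []
    for _ in range(n_items):
        t = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        a = [x + rng.gauss(0.0, noise) for x in t]  # noisy copy of its pair
        texts.append(t)
        audios.append(a)
    return texts, audios


texts, audios = make_toy_pairs()
for d in MATRYOSHKA_DIMS:
    t_d = [truncate_and_normalize(v, d) for v in texts]
    a_d = [truncate_and_normalize(v, d) for v in audios]
    # Audio queries against text candidates, then the reverse direction.
    print(d, recall_at_1(a_d, t_d), recall_at_1(t_d, a_d))
```

The sketch reads `A->T` as audio queries scored against text candidates, which is an interpretation of the table labels rather than something the card defines. The toy pairs only exercise the code path: on real embeddings, R@1 generally varies with the truncation size, and the table above reports the `1280d` setting only.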