---
language:
- en
license: apache-2.0
tags:
- multimodal
- embedding
- matryoshka
- trimodal
- image-text-audio
- retrieval
- cross-modal
- edge
- rag
library_name: safetensors
pipeline_tag: feature-extraction
datasets:
- custom
---

# AIT-86M — Audio, Image, Text Embeddings (Depth-2)

**AIT-86M** maps image, audio, and text into one shared 1280-dim embedding space, so a single vector index can serve cross-modal retrieval across all three modalities. Embeddings support Matryoshka truncation down to 128 dims.

Built for edge deployment, with a single combined safetensors artifact.

Successor to [TE-75M](https://huggingface.co/augmem/TE-75M).

> Also available in [GGUF format](https://huggingface.co/augmem/AIT-86M-GGUF) for quantized edge deployment.

## Why This Matters

The notable result for this model family is that downstream variants add anchor-style decision behavior while preserving the shared semantic retrieval path. In practice, the hard part is usually keeping the retrieval backbone's performance flat while adding new decision surfaces on top of it.

## File layout

```text
AIT-86M.safetensors
```
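
To check what the single artifact contains before wiring it into a runtime, the sketch below reads only the safetensors header, which is a little-endian 64-bit length followed by that many bytes of JSON. It uses the standard library alone. The function names `read_safetensors_header` and `summarize` are illustrative, not part of any release, and the tensor names inside `AIT-86M.safetensors` are not documented in this card, so the code simply lists whatever it finds.

```python
import json
import os
import struct


def read_safetensors_header(path):
    """Return the JSON header of a safetensors file as a dict.

    The file begins with an unsigned little-endian 64-bit header length,
    followed by that many bytes of UTF-8 JSON describing each tensor.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len).decode("utf-8"))


def summarize(header):
    """Yield (name, dtype, shape) per tensor, skipping the metadata entry."""
    for name, info in sorted(header.items()):
        if name == "__metadata__":
            continue
        yield name, info["dtype"], info["shape"]


# Assumed local path; only runs if the checkpoint has been downloaded here.
CHECKPOINT = "AIT-86M.safetensors"
if os.path.exists(CHECKPOINT):
    for name, dtype, shape in summarize(read_safetensors_header(CHECKPOINT)):
        print(name, dtype, shape)
```

Reading the header this way does not load any weights, which keeps the check cheap on constrained devices.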

## Notes

- shared trimodal embedding space
- Matryoshka truncation: `1280 / 768 / 512 / 256 / 128`
- intended for retrieval and embedding use, not generation
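
As a minimal sketch of the Matryoshka truncation listed above: keep the leading components of a 1280-dim embedding and rescale to unit length before computing cosine similarity. Re-normalizing after truncation is the usual convention for Matryoshka embeddings, but this card does not state whether the model's outputs are already normalized, so treat that step as an assumption. The vectors below are placeholders rather than real model outputs, and `truncate_embedding` and `cosine` are hypothetical helper names.

```python
import math

# Truncation widths listed in this card.
MATRYOSHKA_DIMS = (1280, 768, 512, 256, 128)


def truncate_embedding(vec, dim):
    """Keep the first `dim` components and rescale to unit L2 norm."""
    if dim not in MATRYOSHKA_DIMS:
        raise ValueError(f"unsupported width {dim}; expected one of {MATRYOSHKA_DIMS}")
    if len(vec) < dim:
        raise ValueError("embedding is shorter than the requested width")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # leave an all-zero vector untouched
    return [x / norm for x in head]


def cosine(a, b):
    """Dot product of two unit-norm vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


# Placeholder 1280-dim vectors standing in for real image and text embeddings.
image_vec = [math.sin(0.01 * i) for i in range(1280)]
text_vec = [math.sin(0.01 * i + 0.1) for i in range(1280)]

for dim in MATRYOSHKA_DIMS:
    score = cosine(truncate_embedding(image_vec, dim), truncate_embedding(text_vec, dim))
    print(dim, round(score, 4))
```

Every vector in one index should be truncated to the same width, since similarities computed at different widths are not directly comparable.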

## Historical Local Gate Baseline

The exact local gate baseline that was previously attached under the `TE-86M` release directory is restored here for continuity in the `AIT-86M` artifact line.

Attached JSON:

- `teacher_dual_mn20whisper_exact_gate_baseline_20260424T155324Z.json`

Seeded split-excluded baseline at `1280d`:

| Slice and metric | Value |
|---|---:|
| Speech holdout A->T R@1 | 0.5652 |
| Speech holdout T->A R@1 | 0.5992 |
| Speech holdout avg R@1 | 0.5822 |
| WavCaps FSD A->T R@1 | 0.1078 |
| WavCaps FSD T->A R@1 | 0.1030 |
| WavCaps FSD avg R@1 | 0.1054 |
| SALT A->I R@1 | 0.1692 |
| SALT I->A R@1 | 0.1261 |

Scope note:

- These are the canonical local gate numbers used for bounded continuation and recovery experiments in this model family.
- They are not a claim of broad public benchmark superiority.
- They are restored here because the prior card revision dropped the attached evaluation summary.
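
For readers unfamiliar with the metric, the sketch below shows one common way to compute R@1 in both retrieval directions from a similarity matrix and then average them. The `avg` rows in the table are consistent with an arithmetic mean of the two directions, for example (0.5652 + 0.5992) / 2 = 0.5822. The one-to-one diagonal pairing assumed here is an illustration and is not confirmed as the exact gate protocol. The matrix values are made up.

```python
def recall_at_1(sim):
    """Fraction of queries (rows) whose top-scoring candidate (column) is the paired item.

    Assumes query i is paired with candidate i, so ground truth is the diagonal.
    Ties resolve to the lowest column index.
    """
    if not sim:
        raise ValueError("empty similarity matrix")
    hits = 0
    for i, row in enumerate(sim):
        best = max(range(len(row)), key=row.__getitem__)
        hits += int(best == i)
    return hits / len(sim)


def transpose(sim):
    """Swap query and candidate roles to evaluate the reverse direction."""
    return [list(col) for col in zip(*sim)]


# Toy 3x3 audio-by-text similarity matrix (made-up numbers, not model output).
audio_to_text = [
    [0.9, 0.1, 0.2],
    [0.3, 0.2, 0.8],
    [0.1, 0.4, 0.7],
]

a2t = recall_at_1(audio_to_text)             # A->T direction
t2a = recall_at_1(transpose(audio_to_text))  # T->A direction
avg = (a2t + t2a) / 2
print(a2t, t2a, avg)
```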

## Evaluation Scope

Published evaluations for this model family include targeted retrieval and anchor-style discrimination tasks. Those targeted evaluations are useful, but they are not a substitute for a published adversarial or out-of-distribution benchmark. Downstream runtime validation remains application-specific.

## Files

| File | Purpose |
|---|---|
| `AIT-86M.safetensors` | Base trimodal checkpoint |
| `teacher_dual_mn20whisper_exact_gate_baseline_20260424T155324Z.json` | Restored canonical local gate baseline summary |

## License

Apache 2.0