augmem/AIST-87M

@@ -112,6 +112,35 @@ This release is not presented as a generic MTEB/MIEB/MAEB leaderboard model.
 Broad diagnostic runs contain many task families that are not part of this
 release gate.
+## Runtime Footprint vs Dual-Audio Tower
+
+`AIST-87M` replaces the dual-audio tower's separate EfficientAT + Whisper-Tiny
+audio branches with one merged native `mn20_as` EfficientAT encoder. The result
+is a smaller deployed path with the same 1280d output contract.
+
+| Runtime surface | AIST-87M | AIST-95M dual-audio tower | Delta |
+|---|---:|---:|---:|
+| Loaded parameters | 87,118,774 | 95,315,959 | -8.6% |
+| Safetensors artifact | 348.9 MB | 381.9 MB | -8.6% |
+| Audio encoders | 1 | 2 | removes Whisper branch |
+| Audio encoder parameters | 19,886,566 | 26,117,671 | -23.9% |
+| Audio path parameters incl. projection | 32,193,126 | 40,390,311 | -20.3% |
+| Audio projection input width | 1,280 | 2,304 | -44.4% |
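
The Delta column above can be re-derived from the two value columns. The following is a minimal sketch, assuming each delta is a plain relative change against the dual-audio baseline, rounded to one decimal. The names `FOOTPRINT`, `relative_delta_pct`, and `DELTAS` are illustrative and not part of the release.

```python
# Values copied from the footprint table: (AIST-87M, AIST-95M dual-audio tower).
FOOTPRINT = {
    "Loaded parameters": (87_118_774, 95_315_959),
    "Safetensors artifact (MB)": (348.9, 381.9),
    "Audio encoder parameters": (19_886_566, 26_117_671),
    "Audio path parameters incl. projection": (32_193_126, 40_390_311),
    "Audio projection input width": (1_280, 2_304),
}


def relative_delta_pct(new: float, baseline: float) -> float:
    """Percent change of `new` relative to `baseline`, rounded to one decimal."""
    return round((new - baseline) / baseline * 100.0, 1)


# Hypothetical helper mapping: row name -> relative delta in percent.
DELTAS = {name: relative_delta_pct(new, old) for name, (new, old) in FOOTPRINT.items()}

# Absolute parameter drops for the whole model and for the audio path alone.
TOTAL_DROP = FOOTPRINT["Loaded parameters"][1] - FOOTPRINT["Loaded parameters"][0]
AUDIO_PATH_DROP = (
    FOOTPRINT["Audio path parameters incl. projection"][1]
    - FOOTPRINT["Audio path parameters incl. projection"][0]
)

if __name__ == "__main__":
    for name, pct in DELTAS.items():
        print(f"{name}: {pct:+.1f}%")
    print(f"total drop: {TOTAL_DROP:,}  audio-path drop: {AUDIO_PATH_DROP:,}")
```

Under this assumption, the absolute drop in loaded parameters equals the drop in audio-path parameters (8,197,185 in both rows), which is consistent with the change being confined to the audio path.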
+
+Exact-gate tradeoff against the same dual-audio local baseline:
+
+| 1280d exact-gate slice | AIST-87M | AIST-95M dual-audio tower | Delta |
+|---|---:|---:|---:|
+| Speech holdout audio-text R@1 avg | 0.724 | 0.582 | +0.142 |
+| WavCaps FSD audio-text R@1 avg | 0.097 | 0.105 | -0.009 |
+| SALT audio-text R@1 avg | 0.008 | 0.007 | flat |
+| SALT image-audio R@1 avg | 0.138 | 0.148 | -0.010 |
+
+These are footprint and local exact-gate measurements, not a universal latency
+benchmark. Wall-clock speed still depends on runtime, device, batching, and
+audio preprocessing, but the deployed model removes one audio encoder pass and
+shrinks the audio projection input width from 2,304 to 1,280.
+
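
Because wall-clock speed depends on the local setup, a rough way to compare the two models is to time them directly on the target runtime and device. The sketch below is a generic, standard-library timing harness, not something shipped with this release. `time_encoder` and `dummy_encode` are hypothetical names, and `dummy_encode` only stands in for a real audio encoder forward pass.

```python
import statistics
import time
from typing import Callable, Sequence


def time_encoder(
    encode_fn: Callable[[Sequence[float]], object],
    batch: Sequence[float],
    warmup: int = 3,
    repeats: int = 20,
) -> dict:
    """Return median and mean wall-clock seconds per call of `encode_fn(batch)`."""
    # Warm-up calls are excluded from timing (lazy init, caches, and so on).
    for _ in range(warmup):
        encode_fn(batch)

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        encode_fn(batch)
        samples.append(time.perf_counter() - start)

    return {
        "median_s": statistics.median(samples),
        "mean_s": statistics.fmean(samples),
        "repeats": repeats,
    }


def dummy_encode(batch: Sequence[float]) -> float:
    # Placeholder workload; swap in the real encoder call for an actual measurement.
    return sum(x * x for x in batch)


if __name__ == "__main__":
    stats = time_encoder(dummy_encode, [0.1] * 16_000)
    print(f"median {stats['median_s'] * 1e3:.3f} ms over {stats['repeats']} runs")
```

For a meaningful comparison, the same batch, preprocessing, and device should be used for both models. On an accelerator with asynchronous execution, the timed callable needs to block until the result is ready, otherwise the measurement reflects only launch time.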
 ## Architecture
 ```text