# ES-AIST-81M Preview
ES-AIST-81M Preview is the first ES-family public preview checkpoint.
- Release checkpoint: `es_aist_full_v13_anchor_memory_eventboost_er125_bs4096_nw0_l4b/best_model.pt`
- Exported checkpoint epoch: 6
- Text encoder: `MongoDB/mdbr-leaf-ir`
- Image encoder: `mobilenetv4_conv_medium.e180_r384_in12k`
- Audio encoder: native `mn20_as` EfficientAT LoRA audio backbone
- Exact loaded params: 80,812,854
GGUF quantizations for this exact release are published separately under `augmem/ES-AIST-81M-preview-GGUF`.
## Runtime Contract
Output embedding: 1536d

- `z[0:768]`: semantic
- `z[768:1536]`: entity

Recommended normalized runtime views:

- `semantic_key = l2norm(z[0:768])`
- `entity_key = l2norm(z[768:1536])`
- `full_key = l2norm(z[0:1536])`
The model emits retrieval and entity signals. Anchor creation, linking, merging, splitting, weak-reference attention, recency, and abstention remain engine-side decisions.
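A minimal numpy sketch of the recommended runtime views; the `z` below is a random placeholder, not actual model output:

```python
import numpy as np

def l2norm(v: np.ndarray) -> np.ndarray:
    """L2-normalize a vector (or batch of vectors) along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Placeholder for a raw 1536d model output.
z = np.random.default_rng(0).standard_normal(1536).astype(np.float32)

semantic_key = l2norm(z[0:768])     # semantic half
entity_key   = l2norm(z[768:1536])  # entity half
full_key     = l2norm(z[0:1536])    # full embedding
```

With all three views unit-normalized, dot products between keys are cosine similarities, which is what the AUC and R@1 numbers below assume.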
## Usage
- Primary repo: `augmem/ES-AIST-81M-preview`
- Quantized repo: `augmem/ES-AIST-81M-preview-GGUF`
Download the release artifacts with `huggingface_hub`:

```python
from huggingface_hub import hf_hub_download

model_path = hf_hub_download("augmem/ES-AIST-81M-preview", "ES-AIST-81M.safetensors")
metadata_path = hf_hub_download("augmem/ES-AIST-81M-preview", "export_metadata.json")
q8_path = hf_hub_download("augmem/ES-AIST-81M-preview-GGUF", "ES-AIST-81M_q8_0.gguf")
```
The safetensors artifact is a TriEmbed package with these tensor groups:

- `text_encoder`, `image_encoder`, `audio_encoder`
- `text_projection`, `image_projection`, `audio_projection`
Use `export_metadata.json` for the runtime contract. At minimum, normalize the 1536d output and slice:

- semantic signal: `z[0:768]`
- entity signal: `z[768:1536]`
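A minimal sketch of bucketing the package's tensor keys by group prefix, e.g. when iterating names loaded from the safetensors file; the example names here are illustrative, not the release's actual tensor keys:

```python
from collections import defaultdict

GROUPS = (
    "text_encoder", "image_encoder", "audio_encoder",
    "text_projection", "image_projection", "audio_projection",
)

def group_tensor_names(names):
    """Bucket tensor names under the first matching group prefix."""
    grouped = defaultdict(list)
    for name in names:
        prefix = next((g for g in GROUPS if name.startswith(g)), "other")
        grouped[prefix].append(name)
    return dict(grouped)

# Illustrative keys only; the real ones come from the safetensors file.
names = ["text_encoder.layer.0.weight", "text_projection.weight", "audio_encoder.conv.bias"]
grouped = group_tensor_names(names)
```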
## Exact Release Metrics
All numbers below are from the exported checkpoint above and the fresh GT1030 eval bundle in `es_aist_full_v13_anchor_memory_eventboost_er125_bs4096_nw0_l4b_auto_gt1030_v13`.
Evaluation scope notes:

- `SALT` is held out from ES training again and should be read as a cleaner regression/generalization surface than the ESS preview.
- `speech_chatterbox` is train-adjacent because speech/audio-text data is part of the training corpus.
- A selected external MTEB / MIEB / MAEB memory slice is reported below; it is not a full leaderboard sweep.
Scoped status:
- This checkpoint passes the local ES-AIST memory/entity-signal gate for compact open AIST models in this release line.
- The claim is limited to the memory-oriented entity and candidate-anchor task reported below; this is not a generic MTEB, MIEB, or MAEB SOTA claim.
### Retrieval

Source: `retrieval_768_1536_gt1030.json`
SALT at 768d:

- image->text R@1: 0.1794
- text->image R@1: 0.1968
- audio->text R@1: 0.0392
- text->audio R@1: 0.0356
Speech holdout at 768d:

- audio->text R@1: 0.3870
- text->audio R@1: 0.3624
### Entity Signal

Source: `entity_eval.json`
- `entity_key` same/different entity AUC: 0.9953
- `entity_key` same-topic/different-entity rejection AUC: 0.9953
- `semantic_key` same/different entity AUC: 0.9823
- `full_key` same/different entity AUC: 0.9923
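For reference, the same/different AUCs above are pairwise ranking AUCs: the probability that a random same-entity pair outscores a random different-entity pair. A minimal sketch, with illustrative cosine similarities rather than release data:

```python
import numpy as np

def auc(pos_scores, neg_scores):
    """Rank-based AUC: fraction of (positive, negative) pairs where the
    positive score wins; ties count as 0.5."""
    pos = np.asarray(pos_scores)[:, None]
    neg = np.asarray(neg_scores)[None, :]
    wins = (pos > neg).sum() + 0.5 * (pos == neg).sum()
    return float(wins / (pos.size * neg.size))

# Illustrative cosine similarities, not release data:
same_entity = [0.91, 0.88, 0.95, 0.80]
diff_entity = [0.20, 0.35, 0.15, 0.60]
print(auc(same_entity, diff_entity))  # 1.0: every positive beats every negative
```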
### Episode / Event Rejection

Source: `episode_aux_eval.json`
- `entity_key` event same/different AUC: 0.8912
- `entity_key` same-entity/different-event rejection AUC: 0.8001
- `entity_key` stale same-source rejection AUC: 0.9241
- `entity_key` wrong-active rejection AUC: 0.8799
- `entity_key` topic-shift rejection AUC: 0.9543
### Candidate Ranking

Source: `candidate_ranking_eval.json`
`entity_key`:

- entity candidate R@1: 0.9993
- weak-reference candidate R@1: 1.0000
- anchor-memory candidate R@1: 0.9647
- wrong-active candidate R@1: 0.9298
- stale candidate R@1: 0.9722
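A minimal sketch of how a candidate R@1 like the numbers above can be computed from L2-normalized keys (toy data, not the release eval):

```python
import numpy as np

def recall_at_1(query_keys, candidate_keys, gold_idx):
    """Fraction of queries whose highest-similarity candidate is the gold one.
    Keys are assumed L2-normalized, so the dot product is cosine similarity."""
    sims = query_keys @ candidate_keys.T            # (n_queries, n_candidates)
    top1 = sims.argmax(axis=1)
    return float((top1 == np.asarray(gold_idx)).mean())

# Toy unit vectors: each query should rank its matching axis candidate first.
queries = np.eye(3, 4)
candidates = np.eye(4)
print(recall_at_1(queries, candidates, [0, 1, 2]))  # 1.0
```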
### Selected MTEB / MIEB / MAEB Memory Slice

Source: `M_SERIES_MEMORY_SLICE.md` and `es_aist_mseries_memory_slice_eventboost_summary.json`
Completion:

- 768d: 8 / 8 selected tasks complete, 0 exceptions
- 1536d: 8 / 8 selected tasks complete, 0 exceptions after rerun
Selected scores:
| Dim | Text | Image-text | Best selected audio-text |
|---|---|---|---|
| 768 | SprintDuplicateQuestions 0.9161; STSBenchmark 0.7442 | Flickr T2I R@1 0.1764; Flickr I2T R@1 0.0370 | Clotho R@1 0.0512, R@10 0.2861 |
| 1536 | SprintDuplicateQuestions 0.9323; STSBenchmark 0.7535 | Flickr T2I R@1 0.1864; Flickr I2T R@1 0.0378 | Clotho R@1 0.0514, R@10 0.2863 |
## Architecture
This preview is a frozen-encoder / trainable-projector stack:
- text encoder params: 22,861,056
- image encoder params: 8,434,512
- audio encoder params: 20,639,974
- text projection params: 8,926,720
- image projection params: 9,975,296
- audio projection params: 9,975,296
- total exact loaded params: 80,812,854
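The per-component counts sum exactly to the reported total, which a quick check confirms:

```python
# Per-component parameter counts from the breakdown above.
components = {
    "text_encoder": 22_861_056,
    "image_encoder": 8_434_512,
    "audio_encoder": 20_639_974,
    "text_projection": 8_926_720,
    "image_projection": 9_975_296,
    "audio_projection": 9_975_296,
}
total = sum(components.values())
print(f"{total:,}")  # 80,812,854 — matches the exact loaded param count
```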
## Files
| File | Purpose |
|---|---|
| `ES-AIST-81M.safetensors` | Full preview release artifact |
| `export_metadata.json` | ES runtime contract and source checkpoint metadata |
| `manifest.json` | Release manifest |
| `parameter_breakdown.json` | Exact parameter accounting |
| `es_aist_81m_spec.yaml` | Training config used for the release line |
| `retrieval_768_1536_gt1030.json` | Exact retrieval eval for this checkpoint |
| `entity_eval.json` | Entity AUC eval |
| `episode_aux_eval.json` | Event/rejection eval |
| `candidate_ranking_eval.json` | Candidate-anchor ranking eval |
| `signal_eval.json` | Signal-level eval summary |
| `es_aist_eval_gate_summary.json` | Multi-run gate comparison |
| `SOTA_AUDIT.md` | SOTA status, known gaps, and active benchmark plan |
| `SOTA_GATE.md` | Executable scoped SOTA gate report |
| `SOTA_CLAIM.md` | Scoped memory-task claim boundary |
| `es_aist_sota_audit_20260501.json` | Machine-readable scoped SOTA gate |
| `M_SERIES_MEMORY_SLICE.md` | Selected MTEB/MIEB/MAEB memory-slice report |
| `es_aist_mseries_memory_slice_eventboost_summary.json` | Machine-readable selected slice summary |
## Caveats
- This is a preview checkpoint, not a final memory model.
- The entity embedding is a signal for engine-side attention and ranking; it does not resolve references by itself.
- `SALT` is held out from ES training, but the model is still optimized for memory-oriented entity signals rather than generic leaderboard coverage.
- Full MTEB / MIEB / MAEB reporting is future work; the included slice is selected for memory-relevant smoke coverage.