# ES-AIST-81M Preview
ES-AIST-81M Preview is the first ES-family public preview checkpoint.
- Release checkpoint: `es_aist_full_v13_anchor_memory_eventboost_er125_bs4096_nw0_l4b/best_model.pt`
- Exported checkpoint epoch: 6
- Text encoder: `MongoDB/mdbr-leaf-ir`
- Image encoder: `mobilenetv4_conv_medium.e180_r384_in12k`
- Audio encoder: native `mn20_as` EfficientAT LoRA audio backbone
- Exact loaded params: 80,812,854
GGUF quantizations for this exact release are published separately under `augmem/ES-AIST-81M-preview-GGUF`.
## Runtime Contract
Output embedding: 1536d

- `z[0:768]`: semantic
- `z[768:1536]`: entity

Recommended normalized runtime views:

- `semantic_key = l2norm(z[0:768])`
- `entity_key = l2norm(z[768:1536])`
- `full_key = l2norm(z[0:1536])`
The model emits retrieval and entity signals. Anchor creation, linking, merging, splitting, weak-reference attention, recency, and abstention remain engine-side decisions.
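A minimal numpy sketch of the recommended runtime views; the `z` below is a random placeholder, not actual model output:

```python
import numpy as np

def l2norm(v: np.ndarray) -> np.ndarray:
    """L2-normalize a vector (or batch of vectors) along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Placeholder for a raw 1536d model output.
z = np.random.default_rng(0).standard_normal(1536).astype(np.float32)

semantic_key = l2norm(z[0:768])     # semantic half
entity_key   = l2norm(z[768:1536])  # entity half
full_key     = l2norm(z[0:1536])    # full embedding
```

With all three views unit-normalized, dot products between keys are cosine similarities, which is what the AUC and R@1 numbers below assume.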
## Usage
- Primary repo: `augmem/ES-AIST-81M-preview`
- Quantized repo: `augmem/ES-AIST-81M-preview-GGUF`
Download the release artifacts with `huggingface_hub`:

```python
from huggingface_hub import hf_hub_download

model_path = hf_hub_download("augmem/ES-AIST-81M-preview", "ES-AIST-81M.safetensors")
metadata_path = hf_hub_download("augmem/ES-AIST-81M-preview", "export_metadata.json")
q8_path = hf_hub_download("augmem/ES-AIST-81M-preview-GGUF", "ES-AIST-81M_q8_0.gguf")
```
The safetensors artifact is a TriEmbed package with these tensor groups:

- `text_encoder`, `image_encoder`, `audio_encoder`
- `text_projection`, `image_projection`, `audio_projection`
Use `export_metadata.json` for the runtime contract. At minimum, normalize the 1536d output and slice:

- semantic signal: `z[0:768]`
- entity signal: `z[768:1536]`
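A minimal sketch of bucketing the package's tensor keys by group prefix, e.g. when iterating names loaded from the safetensors file; the example names here are illustrative, not the release's actual tensor keys:

```python
from collections import defaultdict

GROUPS = (
    "text_encoder", "image_encoder", "audio_encoder",
    "text_projection", "image_projection", "audio_projection",
)

def group_tensor_names(names):
    """Bucket tensor names under the first matching group prefix."""
    grouped = defaultdict(list)
    for name in names:
        prefix = next((g for g in GROUPS if name.startswith(g)), "other")
        grouped[prefix].append(name)
    return dict(grouped)

# Illustrative keys only; the real ones come from the safetensors file.
names = ["text_encoder.layer.0.weight", "text_projection.weight", "audio_encoder.conv.bias"]
grouped = group_tensor_names(names)
```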
## Exact Release Metrics
All numbers below are from the exported checkpoint above and the fresh GT1030 eval bundle in `es_aist_full_v13_anchor_memory_eventboost_er125_bs4096_nw0_l4b_auto_gt1030_v13`.
Evaluation scope notes:

- `SALT` is held out from ES training again and should be read as a cleaner regression/generalization surface than the ESS preview.
- `speech_chatterbox` is train-adjacent because speech/audio-text data is part of the training corpus.
- A selected external MTEB / MIEB / MAEB memory slice is reported below; it is not a full leaderboard sweep.
Scoped status:
- This checkpoint passes the local ES-AIST memory/entity-signal gate for compact open AIST models in this release line.
- The claim is limited to the memory-oriented entity and candidate-anchor task reported below; this is not a generic MTEB, MIEB, or MAEB SOTA claim.
### Retrieval

Source: `retrieval_768_1536_gt1030.json`
SALT at 768d:

- image->text R@1: 0.1794
- text->image R@1: 0.1968
- audio->text R@1: 0.0392
- text->audio R@1: 0.0356
Speech holdout at 768d:

- audio->text R@1: 0.3870
- text->audio R@1: 0.3624
### Entity Signal

Source: `entity_eval.json`
- `entity_key` same/different entity AUC: 0.9953
- `entity_key` same-topic/different-entity rejection AUC: 0.9953
- `semantic_key` same/different entity AUC: 0.9823
- `full_key` same/different entity AUC: 0.9923
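For reference, the same/different AUCs above are pairwise ranking AUCs: the probability that a random same-entity pair outscores a random different-entity pair. A minimal sketch, with illustrative cosine similarities rather than release data:

```python
import numpy as np

def auc(pos_scores, neg_scores):
    """Rank-based AUC: fraction of (positive, negative) pairs where the
    positive score wins; ties count as 0.5."""
    pos = np.asarray(pos_scores)[:, None]
    neg = np.asarray(neg_scores)[None, :]
    wins = (pos > neg).sum() + 0.5 * (pos == neg).sum()
    return float(wins / (pos.size * neg.size))

# Illustrative cosine similarities, not release data:
same_entity = [0.91, 0.88, 0.95, 0.80]
diff_entity = [0.20, 0.35, 0.15, 0.60]
print(auc(same_entity, diff_entity))  # 1.0: every positive beats every negative
```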
### Episode / Event Rejection

Source: `episode_aux_eval.json`
- `entity_key` event same/different AUC: 0.8912
- `entity_key` same-entity/different-event rejection AUC: 0.8001
- `entity_key` stale same-source rejection AUC: 0.9241
- `entity_key` wrong-active rejection AUC: 0.8799
- `entity_key` topic-shift rejection AUC: 0.9543
### Candidate Ranking

Source: `candidate_ranking_eval.json`
`entity_key`:

- entity candidate R@1: 0.9993
- weak-reference candidate R@1: 1.0000
- anchor-memory candidate R@1: 0.9647
- wrong-active candidate R@1: 0.9298
- stale candidate R@1: 0.9722
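A minimal sketch of how a candidate R@1 like the numbers above can be computed from L2-normalized keys (toy data, not the release eval):

```python
import numpy as np

def recall_at_1(query_keys, candidate_keys, gold_idx):
    """Fraction of queries whose highest-similarity candidate is the gold one.
    Keys are assumed L2-normalized, so the dot product is cosine similarity."""
    sims = query_keys @ candidate_keys.T            # (n_queries, n_candidates)
    top1 = sims.argmax(axis=1)
    return float((top1 == np.asarray(gold_idx)).mean())

# Toy unit vectors: each query should rank its matching axis candidate first.
queries = np.eye(3, 4)
candidates = np.eye(4)
print(recall_at_1(queries, candidates, [0, 1, 2]))  # 1.0
```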
### Selected MTEB / MIEB / MAEB Memory Slice

Source: `M_SERIES_MEMORY_SLICE.md` and `es_aist_mseries_memory_slice_eventboost_summary.json`
Completion:

- 768d: 8 / 8 selected tasks complete, 0 exceptions
- 1536d: 8 / 8 selected tasks complete, 0 exceptions after rerun
Selected scores:
| Dim | Text | Image-text | Best selected audio-text |
|---|---|---|---|
| 768 | SprintDuplicateQuestions 0.9161; STSBenchmark 0.7442 | Flickr T2I R@1 0.1764; Flickr I2T R@1 0.0370 | Clotho R@1 0.0512, R@10 0.2861 |
| 1536 | SprintDuplicateQuestions 0.9323; STSBenchmark 0.7535 | Flickr T2I R@1 0.1864; Flickr I2T R@1 0.0378 | Clotho R@1 0.0514, R@10 0.2863 |
## Architecture
This preview is a frozen-encoder / trainable-projector stack:
- text encoder params: 22,861,056
- image encoder params: 8,434,512
- audio encoder params: 20,639,974
- text projection params: 8,926,720
- image projection params: 9,975,296
- audio projection params: 9,975,296
- total exact loaded params: 80,812,854
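The per-component counts sum exactly to the reported total, which a quick check confirms:

```python
# Per-component parameter counts from the breakdown above.
components = {
    "text_encoder": 22_861_056,
    "image_encoder": 8_434_512,
    "audio_encoder": 20_639_974,
    "text_projection": 8_926_720,
    "image_projection": 9_975_296,
    "audio_projection": 9_975_296,
}
total = sum(components.values())
print(f"{total:,}")  # 80,812,854 — matches the exact loaded param count
```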
## Files
| File | Purpose |
|---|---|
| `ES-AIST-81M.safetensors` | Full preview release artifact |
| `export_metadata.json` | ES runtime contract and source checkpoint metadata |
| `manifest.json` | Release manifest |
| `parameter_breakdown.json` | Exact parameter accounting |
| `es_aist_81m_spec.yaml` | Training config used for the release line |
| `retrieval_768_1536_gt1030.json` | Exact retrieval eval for this checkpoint |
| `entity_eval.json` | Entity AUC eval |
| `episode_aux_eval.json` | Event/rejection eval |
| `candidate_ranking_eval.json` | Candidate-anchor ranking eval |
| `signal_eval.json` | Signal-level eval summary |
| `es_aist_eval_gate_summary.json` | Multi-run gate comparison |
| `SOTA_AUDIT.md` | SOTA status, known gaps, and active benchmark plan |
| `SOTA_GATE.md` | Executable scoped SOTA gate report |
| `SOTA_CLAIM.md` | Scoped memory-task claim boundary |
| `es_aist_sota_audit_20260501.json` | Machine-readable scoped SOTA gate |
| `M_SERIES_MEMORY_SLICE.md` | Selected MTEB/MIEB/MAEB memory-slice report |
| `es_aist_mseries_memory_slice_eventboost_summary.json` | Machine-readable selected slice summary |
## Caveats
- This is a preview checkpoint, not a final memory model.
- The entity embedding is a signal for engine-side attention and ranking; it does not resolve references by itself.
- `SALT` is held out from ES training, but the model is still optimized for memory-oriented entity signals rather than generic leaderboard coverage.
- Full MTEB / MIEB / MAEB reporting is future work; the included slice is selected for memory-relevant smoke coverage.