---
language:
- en
license: apache-2.0
tags:
- multimodal
- embedding
- trimodal
- retrieval
- image-text-audio
- feature-extraction
library_name: pytorch
pipeline_tag: feature-extraction
datasets:
- custom
---

# ESS-AIST-81M Preview

`ESS-AIST-81M Preview` is the current Cortext trial checkpoint from the ESS line.

- release checkpoint: `ess_aist_full_v9_subjectfix_l4k/best_model.pt`
- exported checkpoint epoch: `3`
- text encoder: `MongoDB/mdbr-leaf-ir`
- image encoder: `mobilenetv4_conv_medium.e180_r384_in12k`
- audio encoder: native `mn20_as` EfficientAT LoRA audio backbone

This preview is the current bridge artifact for Cortext. It keeps the ESS
`semantic / subject / event` slice layout, but the `v9` dataset repair moved
the `subject` slice much closer to the entity signal Cortext actually needs.

GGUF quantizations for this exact release live in:

- `augmem/ESS-AIST-81M-preview-GGUF`

Tradeoff:

- held-out subject/entity separation is much stronger than the earlier `v7` preview
- speech and SALT retrieval are weaker than the earlier `v7` retrieval-max point

For Cortext, this is still the better preview because the entity-side signal is
materially stronger.

## Embedding Layout

Output embedding: `1536d`

- `0:512` semantic
- `512:1024` subject
- `1024:1536` event

Recommended normalized runtime views:

- `semantic_key = l2norm(z[0:512])`
- `subject_key = l2norm(z[512:1024])`
- `event_key = l2norm(z[1024:1536])`
- `full_key = l2norm(z[0:1536])`
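Below is a minimal NumPy sketch of the runtime views above. It assumes `z` is the raw 1536d output as a float array, either a single vector or a batch. The helper names `l2norm`, `runtime_views`, and `VIEW_SLICES` are illustrative only and are not part of a published API.

```python
import numpy as np

# Slice boundaries from the layout above (end index exclusive, as in Python).
VIEW_SLICES = {
    "semantic_key": slice(0, 512),
    "subject_key": slice(512, 1024),
    "event_key": slice(1024, 1536),
    "full_key": slice(0, 1536),
}


def l2norm(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """L2-normalize along the last axis; eps avoids division by zero."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def runtime_views(z: np.ndarray) -> dict:
    """Split raw (..., 1536) ESS outputs into the four normalized views."""
    if z.shape[-1] != 1536:
        raise ValueError(f"expected last dimension 1536, got {z.shape[-1]}")
    return {name: l2norm(z[..., s]) for name, s in VIEW_SLICES.items()}


# Example: subject-slice cosine similarity between two (random) embeddings.
rng = np.random.default_rng(0)
views = runtime_views(rng.standard_normal((2, 1536)))
subject_sim = float(views["subject_key"][0] @ views["subject_key"][1])
```

`full_key` is normalized over all 1536 dimensions. It is therefore not the same as concatenating the three separately normalized slices. On any single view, cosine similarity reduces to a plain dot product.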

## Exact Release Metrics

All numbers below are from the exact published checkpoint state exported from
`ess_aist_full_v9_subjectfix_l4k/best_model.pt` at checkpoint epoch `3`.

Evaluation scope note:

- `SALT` is no longer a clean out-of-training benchmark for this ESS line because SALT-derived rows were used during training.
- `speech holdout` is also train-adjacent rather than fully external because explicit speech/audio-text data was added back into the corpus.
- In this release, those two surfaces should be read as regression gates and in-domain checks, not as contamination-free generalization claims.
- A later full external sweep (`MTEB / MIEB / MAEB`) is still pending.

### 512d Retrieval

Source:

- `retrieval_512_gt1030.json`

Speech holdout:

- `A->T_r1 = 0.3276`
- `T->A_r1 = 0.3202`
- `A->T_r5 = 0.6120`
- `T->A_r5 = 0.6046`

SALT:

- `I->T_r1 = 0.3179`
- `T->I_r1 = 0.3425`
- `A->T_r1 = 0.1226`
- `T->A_r1 = 0.1272`
- `I->A_r1 = 0.1970`
- `A->I_r1 = 0.2148`
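The `_r1` and `_r5` figures read as recall@1 and recall@5 for the direction shown. For example, `A->T` means audio queries ranked against a text gallery. The sketch below computes that metric under three assumptions: rows are paired (query `i` matches gallery item `i`), the 512d views are already L2-normalized, and ranking is by cosine similarity. The exact protocol behind `retrieval_512_gt1030.json` is not specified here, including which 512d slice it scores.

```python
import numpy as np


def recall_at_k(query: np.ndarray, gallery: np.ndarray, k: int) -> float:
    """Fraction of queries whose paired gallery item (same row index) ranks in the top k.

    Assumes query and gallery are L2-normalized (N, D) arrays and that
    query[i] and gallery[i] form the ground-truth pair.
    """
    sims = query @ gallery.T              # (N, N) cosine similarities
    gt = np.diag(sims)[:, None]           # similarity of each true pair
    # Rank = number of gallery items scoring strictly higher than the true pair
    # (ties are resolved in the query's favor).
    rank = (sims > gt).sum(axis=1)
    return float((rank < k).mean())


# Hypothetical usage with paired, normalized audio/text views:
# a2t_r1 = recall_at_k(audio_view, text_view, k=1)
# t2a_r1 = recall_at_k(text_view, audio_view, k=1)
```

Swapping the arguments gives the reverse direction. This is why `A->T` and `T->A` can differ on the same set of pairs. A stricter protocol might break ties differently from this sketch.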

### Held-Out ESS Eval

Sources:

- `subject_eval.json`
- `event_eval.json`
- `prefix_eval.json`

Subject / entity surface:

- `subject_key` same/different AUC: `0.9881`
- `subject_key` same-topic-different-subject rejection AUC: `0.9881`

Event / disambiguation surface:

- `subject_key` event same/different AUC: `0.8855`
- `event_key` event same/different AUC: `0.8193`
- `subject_key` same-subject-different-event rejection AUC: `0.7381`
- `event_key` same-subject-different-event rejection AUC: `0.6807`
- `subject_key` topic-shift rejection AUC: `0.9513`
- `event_key` topic-shift rejection AUC: `0.8969`
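One way to read the same/different and rejection AUCs: how often a positive pair (same subject, or same event) scores higher under the given key than a negative pair does. The sketch below follows that reading using the Mann-Whitney formulation. It assumes that a rejection AUC draws its negatives from the hard-negative set named in the metric, for example same-topic-different-subject pairs. How `subject_eval.json` and `event_eval.json` actually build their pairs is not documented here.

```python
import numpy as np


def row_cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity of matching rows in two already L2-normalized (N, D) arrays."""
    return np.sum(a * b, axis=-1)


def pair_auc(pos_sims, neg_sims) -> float:
    """ROC AUC for separating positive from negative pairs by similarity.

    Computed as the Mann-Whitney statistic: the fraction of (positive, negative)
    combinations where the positive pair scores higher, with ties counted as 0.5.
    """
    pos = np.asarray(pos_sims, dtype=float)[:, None]
    neg = np.asarray(neg_sims, dtype=float)[None, :]
    wins = (pos > neg).sum() + 0.5 * (pos == neg).sum()
    return float(wins / (pos.size * neg.size))


# Hypothetical usage: subj_a / subj_b hold subject_key views for each side of a pair,
# and same_subject is a boolean mask over those pairs.
# sims = row_cosine(subj_a, subj_b)
# auc = pair_auc(sims[same_subject], sims[~same_subject])
```

An AUC of `0.5` is chance level, the baseline for "near-random" in the interpretation below. The pairwise comparison uses `n_pos * n_neg` memory. For large pair sets, `sklearn.metrics.roc_auc_score` over concatenated labels and scores gives the same value.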

Interpretation:

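- on the repaired `v9` held-out surface, subject/entity separation is no longer near-random (`subject_key` AUC `0.9881`)
- the current `subject` slice is the strongest entity carrier in the model
- event structure is usable, but still entangled with subject: `subject_key` scores higher than `event_key` on every event metric above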
- this is the right bridge checkpoint for Cortext, not the final `semantic/entity` architecture

## Architecture

This preview is a frozen-encoder / trainable-projector stack:

- text encoder params: `22,861,056`
- image encoder params: `8,434,512`
- audio encoder params: `20,639,974`
- image projection params: `9,975,296`
- audio projection params: `9,975,296`
- text projection params: `8,926,720`
- total exact loaded params: `80,812,854`
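The component counts add up exactly to the stated total. The check below copies the counts from the list above. Its split into "encoders" and "projections" simply groups those lines.

```python
# Component parameter counts copied from the list above.
PARAMS = {
    "text_encoder": 22_861_056,
    "image_encoder": 8_434_512,
    "audio_encoder": 20_639_974,
    "image_projection": 9_975_296,
    "audio_projection": 9_975_296,
    "text_projection": 8_926_720,
}

total = sum(PARAMS.values())
encoders = sum(v for k, v in PARAMS.items() if k.endswith("_encoder"))
projections = total - encoders

# Consistency check against "total exact loaded params".
assert total == 80_812_854
print(f"total={total:,} encoders={encoders:,} projections={projections:,}")
```

Roughly 51.9M parameters sit in the three encoders and 28.9M in the projection heads. The projector share is not necessarily the trainable count. The audio backbone is LoRA-adapted, and this list does not say whether or where the LoRA weights are counted.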

The audio path is not the earlier dual-audio teacher path. It runs the native
audioheavy LoRA EfficientAT backbone (`mn20_as`, listed above).

## Files

| File | Purpose |
|---|---|
| `ESS-AIST-81M.safetensors` | Full preview release artifact |
| `export_metadata.json` | ESS export contract |
| `manifest.json` | Release manifest |
| `parameter_breakdown.json` | Exact parameter accounting |
| `ess_ait_86m_spec.yaml` | Training config used for the release line |
| `retrieval_512_gt1030.json` | Exact 512d retrieval eval for this checkpoint |
| `subject_eval.json` | Exact held-out subject eval for this checkpoint |
| `event_eval.json` | Exact held-out event eval for this checkpoint |
| `prefix_eval.json` | Prefix-level AUC summary |
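The safetensors header alone is enough to cross-check `parameter_breakdown.json` against the released weights without loading them. The format begins with an 8-byte little-endian header length, followed by a JSON map from tensor names to dtype, shape, and byte offsets. The sketch below uses only the standard library. Grouping by the first dotted component of each tensor name is a hypothetical convention, because the key naming inside `ESS-AIST-81M.safetensors` is not documented here.

```python
import json
import math
import struct


def read_safetensors_header(path) -> dict:
    """Return the JSON header of a .safetensors file without reading tensor data.

    Layout: an 8-byte little-endian u64 header length, then that many bytes of
    UTF-8 JSON, then the raw tensor buffer.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def count_params(header: dict) -> tuple:
    """Sum element counts over all tensors, grouped by (hypothetical) name prefix."""
    by_prefix = {}
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata, not a tensor
            continue
        n = math.prod(info["shape"])  # scalar tensors (shape []) count as 1
        prefix = name.split(".", 1)[0]
        by_prefix[prefix] = by_prefix.get(prefix, 0) + n
    return sum(by_prefix.values()), by_prefix


# Hypothetical usage on a local copy of the release file:
# total, groups = count_params(read_safetensors_header("ESS-AIST-81M.safetensors"))
```

The file may also store non-parameter buffers, for example BatchNorm running statistics in the CNN backbones. In that case the summed element count can exceed the `80,812,854` figure. `parameter_breakdown.json` remains the reference accounting.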

## Caveats

- This is the current preview checkpoint, not the final Cortext model family.
- The current runtime slices are still named `semantic / subject / event`; the next family will move toward `semantic / entity`.
- Subject/entity is now strong on the repaired `v9` held-out surface, but event remains entangled and the engine still needs attention over active anchors for weak-reference resolution.
- Retrieval on `speech holdout` and `SALT` is lower than the earlier `v7` preview.
- `SALT` and `speech holdout` are useful release gates for this line, but they are no longer fully external benchmarks in the same way they were for the earlier pre-ESS artifacts.
- Use this for internal Cortext trials, not as the final memory-model release.