---
language:
- en
license: apache-2.0
tags:
- multimodal
- embedding
- trimodal
- retrieval
- image-text-audio
- feature-extraction
library_name: pytorch
pipeline_tag: feature-extraction
datasets:
- custom
---
# ESS-AIST-81M Preview
`ESS-AIST-81M Preview` is the current Cortext trial checkpoint from the ESS line.
- release checkpoint: `ess_aist_full_v9_subjectfix_l4k/best_model.pt`
- exported checkpoint epoch: `3`
- text encoder: `MongoDB/mdbr-leaf-ir`
- image encoder: `mobilenetv4_conv_medium.e180_r384_in12k`
- audio encoder: native `mn20_as` EfficientAT LoRA audio backbone
This preview is the current bridge artifact for Cortext. It keeps the ESS
`semantic / subject / event` slice layout, but the `v9` dataset repair moved
the `subject` slice much closer to the entity signal Cortext actually needs.
GGUF quantizations for this exact release live in:
- `augmem/ESS-AIST-81M-preview-GGUF`
Tradeoff:
- held-out subject/entity separation is much stronger than the earlier `v7` preview
- speech and SALT retrieval are weaker than the earlier `v7` retrieval-max point
For Cortext, this is still the better preview because the entity-side signal is
materially stronger.
## Embedding Layout
Output embedding: `1536d`
- `0:512` semantic
- `512:1024` subject
- `1024:1536` event
Recommended normalized runtime views:
- `semantic_key = l2norm(z[0:512])`
- `subject_key = l2norm(z[512:1024])`
- `event_key = l2norm(z[1024:1536])`
- `full_key = l2norm(z[0:1536])`
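A minimal sketch of deriving these views from a raw `1536d` output vector, written in plain Python so it runs without extra dependencies. The helper names (`runtime_views`, `l2norm`, `cosine`) are illustrative and not part of a published ESS API. How `z` is produced from the checkpoint is not covered here.

```python
import math
from typing import Dict, List

# Slice boundaries taken from the layout above (1536d = 3 x 512d).
SLICES = {
    "semantic_key": (0, 512),
    "subject_key": (512, 1024),
    "event_key": (1024, 1536),
    "full_key": (0, 1536),
}


def l2norm(v: List[float], eps: float = 1e-12) -> List[float]:
    """Scale v to unit L2 length; eps guards against an all-zero slice."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def runtime_views(z: List[float]) -> Dict[str, List[float]]:
    """Split a 1536d ESS output vector into its normalized runtime keys."""
    if len(z) != 1536:
        raise ValueError(f"expected a 1536d embedding, got {len(z)}")
    # full_key normalizes the whole vector, not a concatenation of normalized slices.
    return {name: l2norm(z[lo:hi]) for name, (lo, hi) in SLICES.items()}


def cosine(a: List[float], b: List[float]) -> float:
    """For already-normalized keys, cosine similarity is the dot product."""
    return sum(x * y for x, y in zip(a, b))
```

Because each key is normalized independently, a `subject_key` comparison is unaffected by how much norm the `semantic` or `event` slices carry.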
## Exact Release Metrics
All numbers below are from the exact published checkpoint state exported from
`ess_aist_full_v9_subjectfix_l4k/best_model.pt` at checkpoint epoch `3`.
Evaluation scope note:
- `SALT` is no longer a clean out-of-training benchmark for this ESS line because SALT-derived rows were used during training.
- `speech holdout` is also train-adjacent rather than fully external because explicit speech/audio-text data was added back into the corpus.
- In this release, those two surfaces should be read as regression gates and in-domain checks, not as contamination-free generalization claims.
- A later full external sweep (`MTEB / MIEB / MAEB`) is still pending.
### 512d Retrieval
Source:
- `retrieval_512_gt1030.json`
Speech holdout:
- `A->T_r1 = 0.3276`
- `T->A_r1 = 0.3202`
- `A->T_r5 = 0.6120`
- `T->A_r5 = 0.6046`
SALT:
- `I->T_r1 = 0.3179`
- `T->I_r1 = 0.3425`
- `A->T_r1 = 0.1226`
- `T->A_r1 = 0.1272`
- `I->A_r1 = 0.1970`
- `A->I_r1 = 0.2148`
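The `_r1` / `_r5` suffixes are read here as Recall@1 / Recall@5, with `X->Y` meaning queries from modality `X` ranked against candidates from modality `Y`. That reading follows standard retrieval naming and is an assumption, since the eval script itself is not part of this card. A minimal sketch, assuming a precomputed similarity matrix with exactly one paired positive per query (query `i` matches candidate `i`):

```python
from typing import List, Sequence


def recall_at_k(sim: Sequence[Sequence[float]], k: int) -> float:
    """Fraction of queries whose paired candidate lands in the top k.

    sim[i][j] scores query i against candidate j; the positive for
    query i is assumed to be candidate i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Rank = candidates scoring strictly higher than the positive,
        # so ties are resolved in the positive's favor (optimistic).
        rank = sum(1 for s in row if s > row[i])
        if rank < k:
            hits += 1
    return hits / len(sim)


def transpose(sim: Sequence[Sequence[float]]) -> List[List[float]]:
    """Swap query and candidate roles, e.g. A->T scores into T->A scores."""
    return [list(col) for col in zip(*sim)]
```

Computing `T->A` from the transposed `A->T` matrix explains why the paired directions above are close but not identical: each direction ranks against a different candidate pool.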
### Held-Out ESS Eval
Sources:
- `subject_eval.json`
- `event_eval.json`
- `prefix_eval.json`
Subject / entity surface:
- `subject_key` same/different AUC: `0.9881`
- `subject_key` same-topic-different-subject rejection AUC: `0.9881`
Event / disambiguation surface:
- `subject_key` event same/different AUC: `0.8855`
- `event_key` event same/different AUC: `0.8193`
- `subject_key` same-subject-different-event rejection AUC: `0.7381`
- `event_key` same-subject-different-event rejection AUC: `0.6807`
- `subject_key` topic-shift rejection AUC: `0.9513`
- `event_key` topic-shift rejection AUC: `0.8969`
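Each AUC above can be read as the probability that a matching pair scores higher than a non-matching pair under the named key (for a rejection AUC, the non-matching pairs are the hard negatives such as same-topic-different-subject). A minimal sketch of that pairwise (Mann-Whitney) form, assuming per-pair cosine scores are already computed. How `subject_eval.json` and `event_eval.json` actually construct their pairs is not specified here.

```python
from typing import Sequence


def pair_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """ROC AUC as P(score_pos > score_neg), counting ties as half a win.

    pos_scores: similarities of pairs that should match (e.g. same subject).
    neg_scores: similarities of pairs that should not (e.g. same topic,
    different subject).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

The quadratic loop keeps the definition explicit; for large pair sets a rank-based implementation gives the same value faster. Under this reading, `0.5` is chance and the `0.7381` same-subject-different-event figure means roughly one in four such comparisons is still mis-ordered by `subject_key`.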
Interpretation:
- the repaired `v9` held-out surface is no longer near-random on subject/entity
- the current `subject` slice is the strongest entity carrier in the model
- event structure is usable, but still entangled with subject
- this is the right bridge checkpoint for Cortext, not the final `semantic/entity` architecture
## Architecture
This preview is a frozen-encoder / trainable-projector stack:
- text encoder params: `22,861,056`
- image encoder params: `8,434,512`
- audio encoder params: `20,639,974`
- image projection params: `9,975,296`
- audio projection params: `9,975,296`
- text projection params: `8,926,720`
- total exact loaded params: `80,812,854`
The audio path is not the old dual-audio teacher path. It uses the native
audioheavy LoRA EfficientAT backbone.
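The component counts add up exactly to the stated total, which rounds to the `81M` in the model name. A quick arithmetic check that also separates encoder parameters from projection parameters:

```python
# Component counts copied from the parameter list above.
PARAMS = {
    "text_encoder": 22_861_056,
    "image_encoder": 8_434_512,
    "audio_encoder": 20_639_974,
    "image_projection": 9_975_296,
    "audio_projection": 9_975_296,
    "text_projection": 8_926_720,
}

encoders = sum(v for k, v in PARAMS.items() if k.endswith("_encoder"))
projections = sum(v for k, v in PARAMS.items() if k.endswith("_projection"))
total = encoders + projections

print(f"encoders:    {encoders:,}")     # 51,935,542
print(f"projections: {projections:,}")  # 28,877,312
print(f"total:       {total:,}")        # 80,812,854
```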
## Files
| File | Purpose |
|---|---|
| `ESS-AIST-81M.safetensors` | Full preview release artifact |
| `export_metadata.json` | ESS export contract |
| `manifest.json` | Release manifest |
| `parameter_breakdown.json` | Exact parameter accounting |
| `ess_ait_86m_spec.yaml` | Training config used for the release line |
| `retrieval_512_gt1030.json` | Exact 512d retrieval eval for this checkpoint |
| `subject_eval.json` | Exact held-out subject eval for this checkpoint |
| `event_eval.json` | Exact held-out event eval for this checkpoint |
| `prefix_eval.json` | Prefix-level AUC summary |
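`parameter_breakdown.json` holds the exact accounting. As an independent cross-check, tensor shapes in `ESS-AIST-81M.safetensors` can be tallied straight from the file header using only the documented safetensors layout (8-byte little-endian header length, then a JSON header of `dtype` / `shape` / `data_offsets` per tensor). A minimal stdlib sketch; the function names are illustrative. Whether the tally equals `80,812,854` depends on what the export stores: non-parameter buffers (for example batch-norm running statistics) would also be counted, so a mismatch is a prompt to compare against `parameter_breakdown.json`, not necessarily an error.

```python
import json
import math
import struct
from typing import BinaryIO, Dict


def read_safetensors_header(fileobj: BinaryIO) -> Dict[str, dict]:
    """Return the tensor entries of a .safetensors header without reading tensor data."""
    (header_len,) = struct.unpack("<Q", fileobj.read(8))  # u64, little-endian
    header = json.loads(fileobj.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form string metadata
    return header


def count_elements(header: Dict[str, dict]) -> int:
    """Total stored elements across all tensors (scalars count as 1)."""
    return sum(math.prod(info["shape"]) for info in header.values())


# Usage:
# with open("ESS-AIST-81M.safetensors", "rb") as f:
#     print(f"{count_elements(read_safetensors_header(f)):,}")
```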
## Caveats
- This is the current preview checkpoint, not the final Cortext model family.
- The current runtime slices are still named `semantic / subject / event`; the next family will move toward `semantic / entity`.
- Subject/entity is now strong on the repaired `v9` held-out surface, but event remains entangled and the engine still needs attention over active anchors for weak-reference resolution.
- Retrieval on `speech holdout` and `SALT` is lower than the earlier `v7` preview.
- `SALT` and `speech holdout` are useful release gates for this line, but they are no longer fully external benchmarks in the same way they were for the earlier pre-ESS artifacts.
- Use this for internal Cortext trials, not as the final memory-model release.