---
language:
- en
license: apache-2.0
tags:
- multimodal
- embedding
- trimodal
- retrieval
- image-text-audio
- feature-extraction
library_name: pytorch
pipeline_tag: feature-extraction
datasets:
- custom
---

# ESS-AIST-81M Preview

`ESS-AIST-81M Preview` is the current Cortext trial checkpoint from the ESS line.

- release checkpoint: `ess_aist_full_v9_subjectfix_l4k/best_model.pt`
- exported checkpoint epoch: `3`
- text encoder: `MongoDB/mdbr-leaf-ir`
- image encoder: `mobilenetv4_conv_medium.e180_r384_in12k`
- audio encoder: native `mn20_as` EfficientAT LoRA audio backbone

This preview is the current bridge artifact for Cortext. It keeps the ESS
`semantic / subject / event` slice layout, but the `v9` dataset repair moved
the `subject` slice much closer to the entity signal Cortext actually needs.

GGUF quantizations for this exact release live in:

- `augmem/ESS-AIST-81M-preview-GGUF`

Tradeoff:

- held-out subject/entity separation is much stronger than the earlier `v7` preview
- speech and SALT retrieval are weaker than the earlier `v7` retrieval-max point

For Cortext, this is still the better preview because the entity-side signal is
materially stronger.

## Embedding Layout

Output embedding: `1536d`

- `0:512` semantic
- `512:1024` subject
- `1024:1536` event

Recommended normalized runtime views:

- `semantic_key = l2norm(z[0:512])`
- `subject_key = l2norm(z[512:1024])`
- `event_key = l2norm(z[1024:1536])`
- `full_key = l2norm(z[0:1536])`
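
The slicing and normalization above can be sketched as follows. This is a minimal illustration rather than the release runtime: it assumes the model output `z` is already available as an array whose last dimension is 1536, and the helper names `l2norm` and `runtime_views` are hypothetical. NumPy is used only to keep the example dependency-light; the same indexing applies to PyTorch tensors.

```python
import numpy as np

# Slice boundaries copied from the layout above.
SLICES = {"semantic": (0, 512), "subject": (512, 1024), "event": (1024, 1536)}


def l2norm(x, eps=1e-12):
    """Scale vectors along the last axis to unit L2 length."""
    x = np.asarray(x, dtype=np.float64)
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    # eps guards against division by zero for all-zero vectors.
    return x / np.maximum(norm, eps)


def runtime_views(z):
    """Return the four normalized keys for embeddings of shape (..., 1536)."""
    z = np.asarray(z)
    if z.shape[-1] != 1536:
        raise ValueError(f"expected 1536 dims, got {z.shape[-1]}")
    views = {f"{name}_key": l2norm(z[..., a:b]) for name, (a, b) in SLICES.items()}
    # full_key normalizes the whole vector once; it is not the
    # concatenation of the three per-slice normalized keys.
    views["full_key"] = l2norm(z)
    return views
```

Because each key is unit length, a dot product between two keys of the same kind is their cosine similarity.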

## Exact Release Metrics

All numbers below are from the exact published checkpoint state exported from
`ess_aist_full_v9_subjectfix_l4k/best_model.pt` at checkpoint epoch `3`.

Evaluation scope note:

- `SALT` is no longer a clean out-of-training benchmark for this ESS line because SALT-derived rows were used during training.
- `speech holdout` is also train-adjacent rather than fully external because explicit speech/audio-text data was added back into the corpus.
- In this release, those two surfaces should be read as regression gates and in-domain checks, not as contamination-free generalization claims.
- A later full external sweep (`MTEB / MIEB / MAEB`) is still pending.

### 512d Retrieval

Source:

- `retrieval_512_gt1030.json`

Speech holdout:

- `A->T_r1 = 0.3276`
- `T->A_r1 = 0.3202`
- `A->T_r5 = 0.6120`
- `T->A_r5 = 0.6046`

SALT:

- `I->T_r1 = 0.3179`
- `T->I_r1 = 0.3425`
- `A->T_r1 = 0.1226`
- `T->A_r1 = 0.1272`
- `I->A_r1 = 0.1970`
- `A->I_r1 = 0.2148`
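
For readers unfamiliar with the `r1` / `r5` notation, the sketch below shows one common way to compute recall@k for paired cross-modal retrieval. It is an assumption about the metric's general form, not the evaluation script behind `retrieval_512_gt1030.json`: it presumes row `i` of the query matrix is paired with row `i` of the target matrix and that both are already L2-normalized. The function name `recall_at_k` is hypothetical.

```python
import numpy as np


def recall_at_k(queries, targets, k=1):
    """Fraction of queries whose paired target (same row index) ranks in the top k.

    Both inputs are assumed to be L2-normalized arrays of shape (n, d),
    so the dot product equals cosine similarity.
    """
    queries = np.asarray(queries, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.float64)
    sims = queries @ targets.T            # (n, n) similarity matrix
    paired = np.diag(sims)[:, None]       # similarity to the correct target
    # Zero-based rank: number of targets scoring strictly higher than the
    # paired one, so ties are resolved in favor of the paired target.
    ranks = (sims > paired).sum(axis=1)
    return float((ranks < k).mean())
```

Under this convention, `A->T_r1` would use audio embeddings as `queries` and text embeddings as `targets`, and `T->A_r1` swaps the two.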

### Held-Out ESS Eval

Sources:

- `subject_eval.json`
- `event_eval.json`
- `prefix_eval.json`

Subject / entity surface:

- `subject_key` same/different AUC: `0.9881`
- `subject_key` same-topic-different-subject rejection AUC: `0.9881`

Event / disambiguation surface:

- `subject_key` event same/different AUC: `0.8855`
- `event_key` event same/different AUC: `0.8193`
- `subject_key` same-subject-different-event rejection AUC: `0.7381`
- `event_key` same-subject-different-event rejection AUC: `0.6807`
- `subject_key` topic-shift rejection AUC: `0.9513`
- `event_key` topic-shift rejection AUC: `0.8969`

Interpretation:

- the repaired `v9` held-out surface is no longer near-random on subject/entity
- the current `subject` slice is the strongest entity carrier in the model
- event structure is usable, but still entangled with subject
- this is the right bridge checkpoint for Cortext, not the final `semantic/entity` architecture

## Architecture

This preview is a frozen-encoder / trainable-projector stack:

- text encoder params: `22,861,056`
- image encoder params: `8,434,512`
- audio encoder params: `20,639,974`
- image projection params: `9,975,296`
- audio projection params: `9,975,296`
- text projection params: `8,926,720`
- total exact loaded params: `80,812,854`
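
The component counts add up to the stated total, which is where the "81M" in the model name comes from. The short check below reproduces that arithmetic; the dictionary keys are illustrative labels, not names taken from `parameter_breakdown.json`.

```python
# Parameter counts copied from the list above.
PARAMS = {
    "text_encoder": 22_861_056,
    "image_encoder": 8_434_512,
    "audio_encoder": 20_639_974,
    "image_projection": 9_975_296,
    "audio_projection": 9_975_296,
    "text_projection": 8_926_720,
}

encoder_total = sum(v for k, v in PARAMS.items() if k.endswith("_encoder"))
projection_total = sum(v for k, v in PARAMS.items() if k.endswith("_projection"))
total = encoder_total + projection_total

print(f"encoders:    {encoder_total:,}")     # 51,935,542
print(f"projections: {projection_total:,}")  # 28,877,312
print(f"total:       {total:,}")             # 80,812,854
```

Roughly 64% of the parameters sit in the three encoders and 36% in the projection heads.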

The audio path is not the old dual-audio teacher path. It uses the native
audioheavy LoRA EfficientAT backbone.

## Files

| File | Purpose |
|---|---|
| `ESS-AIST-81M.safetensors` | Full preview release artifact |
| `export_metadata.json` | ESS export contract |
| `manifest.json` | Release manifest |
| `parameter_breakdown.json` | Exact parameter accounting |
| `ess_ait_86m_spec.yaml` | Training config used for the release line |
| `retrieval_512_gt1030.json` | Exact 512d retrieval eval for this checkpoint |
| `subject_eval.json` | Exact held-out subject eval for this checkpoint |
| `event_eval.json` | Exact held-out event eval for this checkpoint |
| `prefix_eval.json` | Prefix-level AUC summary |
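
To inspect the release artifact without loading any weights, the safetensors header can be read with the standard library alone. The file format starts with an 8-byte little-endian unsigned integer giving the header length, followed by that many bytes of UTF-8 JSON describing each tensor. The sketch below relies only on that format; the helper names are hypothetical, and the tensor names inside `ESS-AIST-81M.safetensors` are not documented here. Whether the element count matches the parameter total exactly (for example if non-parameter buffers are stored) is not confirmed by this card.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the tensor table from a .safetensors file without reading weights."""
    with open(path, "rb") as f:
        # First 8 bytes: header length as little-endian uint64.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def count_elements(header):
    """Sum the element counts implied by every tensor's shape."""
    total = 0
    for info in header.values():
        n = 1
        for dim in info["shape"]:
            n *= dim
        total += n
    return total
```

For example, `count_elements(read_safetensors_header("ESS-AIST-81M.safetensors"))` gives a quick figure to compare against `parameter_breakdown.json`.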

## Caveats

- This is the current preview checkpoint, not the final Cortext model family.
- The current runtime slices are still named `semantic / subject / event`; the next family will move toward `semantic / entity`.
- Subject/entity separation is now strong on the repaired `v9` held-out surface, but event remains entangled with subject. The engine still needs attention over active anchors for weak-reference resolution.
- Retrieval on `speech holdout` and `SALT` is lower than the earlier `v7` preview.
- `SALT` and `speech holdout` are useful release gates for this line, but they are no longer fully external benchmarks in the same way they were for the earlier pre-ESS artifacts.
- Use this for internal Cortext trials, not as the final memory-model release.