---
language:
- en
license: apache-2.0
tags:
- multimodal
- embedding
- trimodal
- retrieval
- image-text-audio
- feature-extraction
library_name: pytorch
pipeline_tag: feature-extraction
datasets:
- custom
---

# ESS-AIST-81M Preview

`ESS-AIST-81M Preview` is the current Cortext trial checkpoint from the ESS line.

- release checkpoint: `ess_aist_full_v9_subjectfix_l4k/best_model.pt`
- exported checkpoint epoch: `3`
- text encoder: `MongoDB/mdbr-leaf-ir`
- image encoder: `mobilenetv4_conv_medium.e180_r384_in12k`
- audio encoder: native `mn20_as` EfficientAT LoRA audio backbone

This preview is the current bridge artifact for Cortext. It keeps the ESS `semantic / subject / event` slice layout, but the `v9` dataset repair moved the `subject` slice much closer to the entity signal Cortext actually needs.

GGUF quantizations for this exact release live in:

- `augmem/ESS-AIST-81M-preview-GGUF`

Tradeoff:

- held-out subject/entity separation is much stronger than the earlier `v7` preview
- speech and SALT retrieval are weaker than the earlier `v7` retrieval-max point

For Cortext, this is still the better preview because the entity-side signal is materially stronger.

## Embedding Layout

Output embedding: `1536d`

- `0:512` semantic
- `512:1024` subject
- `1024:1536` event

Recommended normalized runtime views:

- `semantic_key = l2norm(z[0:512])`
- `subject_key = l2norm(z[512:1024])`
- `event_key = l2norm(z[1024:1536])`
- `full_key = l2norm(z[0:1536])`

## Exact Release Metrics

All numbers below are from the exact published checkpoint state exported from `ess_aist_full_v9_subjectfix_l4k/best_model.pt` at checkpoint epoch `3`.

Evaluation scope note:

- `SALT` is no longer a clean out-of-training benchmark for this ESS line because SALT-derived rows were used during training.
- `speech holdout` is also train-adjacent rather than fully external because explicit speech/audio-text data was added back into the corpus.
- In this release, those two surfaces should be read as regression gates and in-domain checks, not as contamination-free generalization claims.
- A later full external sweep (`MTEB / MIEB / MAEB`) is still pending.

### 512d Retrieval

Source:

- `retrieval_512_gt1030.json`

Speech holdout:

- `A->T_r1 = 0.3276`
- `T->A_r1 = 0.3202`
- `A->T_r5 = 0.6120`
- `T->A_r5 = 0.6046`

SALT:

- `I->T_r1 = 0.3179`
- `T->I_r1 = 0.3425`
- `A->T_r1 = 0.1226`
- `T->A_r1 = 0.1272`
- `I->A_r1 = 0.1970`
- `A->I_r1 = 0.2148`

### Held-Out ESS Eval

Sources:

- `subject_eval.json`
- `event_eval.json`
- `prefix_eval.json`

Subject / entity surface:

- `subject_key` same/different AUC: `0.9881`
- `subject_key` same-topic-different-subject rejection AUC: `0.9881`

Event / disambiguation surface:

- `subject_key` event same/different AUC: `0.8855`
- `event_key` event same/different AUC: `0.8193`
- `subject_key` same-subject-different-event rejection AUC: `0.7381`
- `event_key` same-subject-different-event rejection AUC: `0.6807`
- `subject_key` topic-shift rejection AUC: `0.9513`
- `event_key` topic-shift rejection AUC: `0.8969`

Interpretation:

- the repaired `v9` held-out surface is no longer near-random on subject/entity
- the current `subject` slice is the strongest entity carrier in the model
- event structure is usable, but still entangled with subject
- this is the right bridge checkpoint for Cortext, not the final `semantic/entity` architecture

## Architecture

This preview is a frozen-encoder / trainable-projector stack:

- text encoder params: `22,861,056`
- image encoder params: `8,434,512`
- audio encoder params: `20,639,974`
- image projection params: `9,975,296`
- audio projection params: `9,975,296`
- text projection params: `8,926,720`
- total exact loaded params: `80,812,854`

The audio path is not the old dual-audio teacher path. It uses the native audioheavy LoRA EfficientAT backbone.
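
The component counts listed above can be cross-checked against the reported total. The snippet below is a minimal arithmetic sketch; the dictionary keys are illustrative labels chosen here, not field names taken from `parameter_breakdown.json`.

```python
# Component counts copied from the Architecture list.
# The key names are illustrative labels, not fields from parameter_breakdown.json.
PARAMS = {
    "text_encoder": 22_861_056,
    "image_encoder": 8_434_512,
    "audio_encoder": 20_639_974,
    "image_projection": 9_975_296,
    "audio_projection": 9_975_296,
    "text_projection": 8_926_720,
}
REPORTED_TOTAL = 80_812_854

# Sum everything, then split into encoder and projection subtotals.
total = sum(PARAMS.values())
encoders = sum(v for k, v in PARAMS.items() if k.endswith("_encoder"))
projections = sum(v for k, v in PARAMS.items() if k.endswith("_projection"))

print(f"encoders:    {encoders:,}")     # 51,935,542
print(f"projections: {projections:,}")  # 28,877,312
print(f"total:       {total:,}")        # 80,812,854
print("matches reported total:", total == REPORTED_TOTAL)  # True
```

The six components sum exactly to the reported `80,812,854`, which is where the rounded `81M` in the model name comes from.
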
## Files

| File | Purpose |
|---|---|
| `ESS-AIST-81M.safetensors` | Full preview release artifact |
| `export_metadata.json` | ESS export contract |
| `manifest.json` | Release manifest |
| `parameter_breakdown.json` | Exact parameter accounting |
| `ess_ait_86m_spec.yaml` | Training config used for the release line |
| `retrieval_512_gt1030.json` | Exact 512d retrieval eval for this checkpoint |
| `subject_eval.json` | Exact held-out subject eval for this checkpoint |
| `event_eval.json` | Exact held-out event eval for this checkpoint |
| `prefix_eval.json` | Prefix-level AUC summary |

## Caveats

- This is the current preview checkpoint, not the final Cortext model family.
- The current runtime slices are still named `semantic / subject / event`; the next family will move toward `semantic / entity`.
- Subject/entity is now strong on the repaired `v9` held-out surface, but event remains entangled and the engine still needs attention over active anchors for weak-reference resolution.
- Retrieval on `speech holdout` and `SALT` is lower than the earlier `v7` preview.
- `SALT` and `speech holdout` are useful release gates for this line, but they are no longer fully external benchmarks in the same way they were for the earlier pre-ESS artifacts.
- Use this for internal Cortext trials, not as the final memory-model release.
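
## Usage Sketch

The normalized runtime views listed under Embedding Layout can be sketched in plain Python as below. This is an illustration of the slicing and normalization only, not the release's loading or inference code: `runtime_views`, `l2norm`, `cosine`, and the `SLICES` table are hypothetical helper names, and the input vector is a synthetic placeholder rather than a real model output.

```python
import math

# Slice boundaries taken from the Embedding Layout section.
SLICES = {
    "semantic": (0, 512),
    "subject": (512, 1024),
    "event": (1024, 1536),
    "full": (0, 1536),
}


def l2norm(vec, eps=1e-12):
    """Scale a vector to unit L2 length (eps guards against all-zero input)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def runtime_views(z):
    """Split a 1536d embedding into independently normalized key views."""
    if len(z) != 1536:
        raise ValueError(f"expected 1536 dimensions, got {len(z)}")
    return {
        f"{name}_key": l2norm(z[start:stop])
        for name, (start, stop) in SLICES.items()
    }


def cosine(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


# Synthetic placeholder vectors standing in for two real model outputs.
z_a = [math.sin(i) for i in range(1536)]
z_b = [math.sin(i + 0.1) for i in range(1536)]

views_a = runtime_views(z_a)
views_b = runtime_views(z_b)

# Compare the two embeddings on the subject slice only.
subject_similarity = cosine(views_a["subject_key"], views_b["subject_key"])
print(f"subject similarity: {subject_similarity:.4f}")
```

Each view is normalized after slicing, so a dot product between two `subject_key` vectors is a cosine similarity restricted to the subject slice. Normalizing `z` once and slicing afterwards would not give unit-length slices.
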