---
language:
- en
license: apache-2.0
tags:
- multimodal
- embedding
- trimodal
- retrieval
- image-text-audio
- feature-extraction
library_name: pytorch
pipeline_tag: feature-extraction
datasets:
- custom
---

# ESS-AIST-81M Preview

`ESS-AIST-81M Preview` is the current Cortext trial checkpoint from the ESS line: a trimodal image/audio/text embedding model.

- release checkpoint: `ess_aist_full_v9_subjectfix_l4k/best_model.pt`
- exported checkpoint epoch: `3`
- text encoder: `MongoDB/mdbr-leaf-ir`
- image encoder: `mobilenetv4_conv_medium.e180_r384_in12k`
- audio encoder: native `mn20_as` EfficientAT LoRA audio backbone

This preview is the current bridge artifact for Cortext. It keeps the ESS
`semantic / subject / event` slice layout, but the `v9` dataset repair moved
the `subject` slice much closer to the entity signal that Cortext needs.

GGUF quantizations of this exact release are published in:

- `augmem/ESS-AIST-81M-preview-GGUF`

Tradeoff relative to the earlier `v7` preview:

- held-out subject/entity separation is much stronger
- speech and SALT retrieval are weaker than at the `v7` retrieval-max point

For Cortext, this is still the better preview despite the retrieval regression,
because the entity-side signal is materially stronger.

## Embedding Layout

Output embedding: `1536d`, split into three `512d` slices (half-open index ranges):

- `0:512` semantic
- `512:1024` subject
- `1024:1536` event

Recommended L2-normalized runtime views:

- `semantic_key = l2norm(z[0:512])`
- `subject_key = l2norm(z[512:1024])`
- `event_key = l2norm(z[1024:1536])`
- `full_key = l2norm(z[0:1536])`
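
The slice-and-normalize recipe above can be sketched in plain Python. This is a minimal sketch, not the release's runtime code: it assumes `z` is a single model output already available as a flat sequence of `1536` floats (loading the checkpoint and running a forward pass are not covered by this card), and the helper names `runtime_views` and `cosine` are hypothetical. Each view is normalized on its own, so `full_key` is not the concatenation of the three slice keys.

```python
import math
from typing import Dict, List, Sequence

# Slice boundaries from the card (half-open ranges over the 1536-d output).
SLICES = {
    "semantic_key": (0, 512),
    "subject_key": (512, 1024),
    "event_key": (1024, 1536),
    "full_key": (0, 1536),
}


def l2norm(v: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Scale v to unit Euclidean length; eps guards against an all-zero slice."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]


def runtime_views(z: Sequence[float]) -> Dict[str, List[float]]:
    """Return the four independently normalized views of one 1536-d embedding."""
    if len(z) != 1536:
        raise ValueError(f"expected a 1536-d embedding, got {len(z)}")
    return {name: l2norm(z[a:b]) for name, (a, b) in SLICES.items()}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product; equals cosine similarity when both inputs are unit-length."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    z = [1.0] * 1536  # placeholder vector, not a real model output
    views = runtime_views(z)
    print(len(views["subject_key"]))  # 512
```

In a PyTorch pipeline the same views can be taken over a batch with `torch.nn.functional.normalize(z[:, a:b], dim=-1)`; because the keys are unit-length, cosine similarity reduces to a dot product.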

## Exact Release Metrics

All numbers below come from the exact published checkpoint state, exported from
`ess_aist_full_v9_subjectfix_l4k/best_model.pt` at checkpoint epoch `3`.

Evaluation scope note:

- `SALT` is no longer a clean out-of-training benchmark for this ESS line, because SALT-derived rows were used during training.
- `speech holdout` is also train-adjacent rather than fully external, because explicit speech/audio-text data was added back into the training corpus.
- In this release, read these two surfaces as regression gates and in-domain checks, not as contamination-free generalization claims.
- A full external sweep (`MTEB / MIEB / MAEB`) is still pending.

### 512d Retrieval

Source:

- `retrieval_512_gt1030.json`

Notation: `A` = audio, `T` = text, `I` = image; `X->Y_rK` is recall@K for modality-`X` queries retrieving modality-`Y` targets.

Speech holdout:

- `A->T_r1 = 0.3276`
- `T->A_r1 = 0.3202`
- `A->T_r5 = 0.6120`
- `T->A_r5 = 0.6046`

SALT:

- `I->T_r1 = 0.3179`
- `T->I_r1 = 0.3425`
- `A->T_r1 = 0.1226`
- `T->A_r1 = 0.1272`
- `I->A_r1 = 0.1970`
- `A->I_r1 = 0.2148`
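
Reading `X->Y_rK` as recall@K, the retrieval numbers above can be illustrated with the standard paired setup, where query `i` has exactly one positive (candidate `i`). This is a sketch under that assumption: the exact protocol in `retrieval_512_gt1030.json` (candidate pool, tie handling, and which `512d` view is scored) is not described in this card, and `recall_at_k` is a hypothetical helper.

```python
from typing import Sequence


def recall_at_k(queries: Sequence[Sequence[float]],
                candidates: Sequence[Sequence[float]],
                k: int) -> float:
    """Fraction of queries whose paired candidate (same index) ranks in the top k.

    Assumes both lists are already L2-normalized, so the dot product is cosine
    similarity, and that query i's only positive is candidate i.
    """
    if len(queries) != len(candidates):
        raise ValueError("expects exactly one paired candidate per query")
    hits = 0
    for i, q in enumerate(queries):
        scores = [sum(a * b for a, b in zip(q, c)) for c in candidates]
        # 0-based rank of the positive = number of candidates scoring strictly higher.
        rank = sum(1 for s in scores if s > scores[i])
        if rank < k:
            hits += 1
    return hits / len(queries)
```

Counting only strictly higher scores breaks ties in the positive's favor; a stricter evaluator might count ties against it, which would lower recall slightly.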

### Held-Out ESS Eval

Sources:

- `subject_eval.json`
- `event_eval.json`
- `prefix_eval.json`

Subject / entity surface:

- `subject_key` same/different AUC: `0.9881`
- `subject_key` same-topic-different-subject rejection AUC: `0.9881`

Event / disambiguation surface:

- `subject_key` event same/different AUC: `0.8855`
- `event_key` event same/different AUC: `0.8193`
- `subject_key` same-subject-different-event rejection AUC: `0.7381`
- `event_key` same-subject-different-event rejection AUC: `0.6807`
- `subject_key` topic-shift rejection AUC: `0.9513`
- `event_key` topic-shift rejection AUC: `0.8969`
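
One common reading of a pairwise same/different AUC is the ROC AUC over pair similarities: positives are pairs that share a label (same subject, or same event), negatives are pairs that do not, and each pair is scored by the cosine between, for example, two `subject_key` vectors. The rejection variants would then restrict negatives to hard cases such as same-topic-different-subject pairs. This is an assumption about the protocol behind `subject_eval.json` and `event_eval.json`, not a description of it; `pair_auc` is a hypothetical helper using the Mann-Whitney formulation.

```python
from typing import Sequence


def pair_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Probability that a random positive pair outscores a random negative pair.

    Ties count as half a win (the Mann-Whitney U formulation of ROC AUC).
    Runs in O(len(pos) * len(neg)), which is fine for small eval sets.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

With scikit-learn installed, `sklearn.metrics.roc_auc_score` on the same binary labels and scores gives the same value; an AUC of `0.5` is chance level, which is the "near-random" baseline the interpretation below refers to.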

Interpretation:

- on the repaired `v9` held-out surface, subject/entity separation is no longer near-random
- the `subject` slice is currently the strongest entity carrier in the model
- event structure is usable but still entangled with subject: `subject_key` outscores `event_key` on all three event surfaces above
- this is the right bridge checkpoint for Cortext, not the final `semantic/entity` architecture

## Architecture

This preview is a frozen-encoder / trainable-projector stack:

- text encoder params: `22,861,056`
- image encoder params: `8,434,512`
- audio encoder params: `20,639,974`
- image projection params: `9,975,296`
- audio projection params: `9,975,296`
- text projection params: `8,926,720`
- total loaded params (exact): `80,812,854`
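
The architecture list can be checked arithmetically: the six component counts sum exactly to the stated total, and roughly `80.8M` rounds to the `81M` in the model name. The encoder/projection grouping below is only a convenience for this sketch; it does not say which weights were trained (the audio backbone, for instance, is a LoRA variant).

```python
# Parameter counts copied from the Architecture list above.
PARAMS = {
    "text_encoder": 22_861_056,
    "image_encoder": 8_434_512,
    "audio_encoder": 20_639_974,
    "image_projection": 9_975_296,
    "audio_projection": 9_975_296,
    "text_projection": 8_926_720,
}

# Group by name suffix; purely an accounting convenience.
encoders = sum(v for k, v in PARAMS.items() if k.endswith("_encoder"))
projections = sum(v for k, v in PARAMS.items() if k.endswith("_projection"))
total = encoders + projections

print(f"encoders:    {encoders:,}")     # 51,935,542
print(f"projections: {projections:,}")  # 28,877,312
print(f"total:       {total:,} (~{total / 1e6:.1f}M)")  # 80,812,854 (~80.8M)
```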

The audio path is not the old dual-audio teacher path. It uses the native
audioheavy LoRA EfficientAT backbone.

## Files

| File | Purpose |
|---|---|
| `ESS-AIST-81M.safetensors` | Full preview release artifact |
| `export_metadata.json` | ESS export contract |
| `manifest.json` | Release manifest |
| `parameter_breakdown.json` | Exact parameter accounting |
| `ess_ait_86m_spec.yaml` | Training config used for the release line |
| `retrieval_512_gt1030.json` | Exact 512d retrieval eval for this checkpoint |
| `subject_eval.json` | Exact held-out subject eval for this checkpoint |
| `event_eval.json` | Exact held-out event eval for this checkpoint |
| `prefix_eval.json` | Prefix-level AUC summary |
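
To cross-check `parameter_breakdown.json` against the release artifact without loading any weights, the `.safetensors` header can be read directly. Per the safetensors format, the file starts with an 8-byte little-endian unsigned length `N`, followed by `N` bytes of JSON that map each tensor name to its `dtype`, `shape`, and `data_offsets`, plus an optional `__metadata__` entry. The sketch below assumes the tensor names in `ESS-AIST-81M.safetensors` are dot-separated module paths and groups them by their first component; those names are not documented here, so the groups may not line up one-to-one with the breakdown file. All helper names are hypothetical.

```python
import json
import math
import struct
from collections import Counter
from typing import BinaryIO, Dict


def read_safetensors_header(stream: BinaryIO) -> dict:
    """Parse the JSON header of a .safetensors stream without reading tensor data."""
    # Layout: 8-byte little-endian unsigned length N, then N bytes of UTF-8 JSON.
    (header_len,) = struct.unpack("<Q", stream.read(8))
    return json.loads(stream.read(header_len))


def param_counts_by_prefix(header: dict) -> Dict[str, int]:
    """Sum element counts per top-level tensor-name prefix (text before the first '.')."""
    counts: Counter = Counter()
    for name, info in header.items():
        if name == "__metadata__":  # optional str -> str metadata, not a tensor
            continue
        # math.prod([]) == 1, so scalar tensors count as one element.
        counts[name.split(".", 1)[0]] += math.prod(info["shape"])
    return dict(counts)


def safetensors_param_counts(path: str) -> Dict[str, int]:
    """Open a .safetensors file and return per-prefix element counts."""
    with open(path, "rb") as f:
        return param_counts_by_prefix(read_safetensors_header(f))
```

Usage: `safetensors_param_counts("ESS-AIST-81M.safetensors")`. These counts cover every stored tensor, including any non-parameter buffers such as normalization running statistics, so they can exceed the parameter totals listed under Architecture. If the `safetensors` package is available, its `safe_open` can list the same keys.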

## Caveats

- This is the current preview checkpoint, not the final Cortext model family.
- The runtime slices are still named `semantic / subject / event`; the next family will move toward `semantic / entity`.
- Subject/entity is now strong on the repaired `v9` held-out surface, but event remains entangled with subject, and the engine still needs attention over active anchors for weak-reference resolution.
- Retrieval on `speech holdout` and `SALT` is lower than with the earlier `v7` preview.
- `SALT` and `speech holdout` are useful release gates for this line, but they are no longer fully external benchmarks in the way they were for the earlier pre-ESS artifacts.
- Use this checkpoint for internal Cortext trials; it is not the final memory-model release.