lettuce-emb-512d-v3

ONNX package for lettuce-emb-512d-v3.

Included Files

  • model.fp32.onnx (full precision)
  • model.int8.onnx (dynamic quantized INT8)
  • model.onnx (FP32 convenience copy)
  • Tokenizer files: tokenizer.json, tokenizer_config.json, special_tokens_map.json, vocab.txt
  • Sentence-Transformers metadata/config: modules.json, config_sentence_transformers.json, sentence_bert_config.json
  • Pooling/Dense configs: 1_Pooling/config.json, 2_Dense/config.json
  • Nomic architecture files: configuration_hf_nomic_bert.py, modeling_hf_nomic_bert.py

Model Specs

  • Backbone family: Nomic BERT (nomic-ai/nomic-embed-text-v1.5)
  • Embedding dimension: 512
  • Similarity: cosine similarity on normalized embeddings
  • Context length targeted by the training pipeline: 4096 tokens

Training Config

Final checkpoint lineage:

  • Source training model: ./output/lettuce-v3-rp-long3
  • The resume chain for this checkpoint included continued training from previous long-context runs.

Core training configuration:

  • student-base: nomic-ai/nomic-embed-text-v1.5
  • teachers: BAAI/bge-m3
  • dim: 512
  • context: 4096
  • teacher-context: 1024
  • pair-batch: 4 (stable setting for the successful resumed runs)
  • triplet-batch: 4 (stable setting for the successful resumed runs)
  • teacher-batch: 1
  • num-workers: 4
  • epochs: 1 per run segment (continued via resume)

Data composition used in long-context training runs:

  • NLI triplets: 30000
  • RP pairs source: Heralax/Augmental-Dataset (~7831 rows available in the training environment)
  • Logic source: hard_logic.json with oversampling (typical oversampling factor: 12 or higher)
  • LongBench subsets enabled:
    • qasper
    • qmsum
    • narrativeqa
    • passage_retrieval_en
  • LongBench split used: test (loader falls back to direct JSONL files from THUDM/LongBench data.zip)
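
The fallback logic is roughly the following; a minimal sketch, assuming the Hugging Face datasets library and a local ./LongBench/data directory holding the extracted data.zip contents (the path is an assumption):

import json
from pathlib import Path

from datasets import load_dataset

def load_longbench(subset: str, data_dir: str = "./LongBench/data"):
    # Prefer the hub loader; fall back to the raw JSONL files from data.zip.
    try:
        return list(load_dataset("THUDM/LongBench", subset, split="test"))
    except Exception:
        path = Path(data_dir) / f"{subset}.jsonl"
        with path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f]

for subset in ["qasper", "qmsum", "narrativeqa", "passage_retrieval_en"]:
    rows = load_longbench(subset)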

Primary losses:

  • MultipleNegativesRankingLoss (triplets)
  • CosineSimilarityLoss (teacher-scored pairs)
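
A minimal sketch of how these two losses are typically combined with the classic sentence-transformers fit API, using the pair/triplet batch sizes from the config above. The toy examples are placeholders, and the 512-d Dense head plus the teacher-scoring plumbing of the real pipeline are omitted:

from torch.utils.data import DataLoader
from sentence_transformers import InputExample, SentenceTransformer, losses

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)

# Stand-ins for the real NLI triplets and teacher-scored RP pairs.
triplets = [InputExample(texts=["anchor text", "paraphrase", "unrelated text"])] * 8
pairs = [InputExample(texts=["query", "passage"], label=0.87)]  * 8  # label = teacher cosine score

triplet_loader = DataLoader(triplets, batch_size=4, shuffle=True)  # triplet-batch: 4
pair_loader = DataLoader(pairs, batch_size=4, shuffle=True)        # pair-batch: 4

model.fit(
    train_objectives=[
        (triplet_loader, losses.MultipleNegativesRankingLoss(model)),  # triplets
        (pair_loader, losses.CosineSimilarityLoss(model)),             # scored pairs
    ],
    epochs=1,
)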

Inference Example (ONNX Runtime)

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

model_dir = "./lettuce-emb-512d-v3"
onnx_path = f"{model_dir}/model.int8.onnx"  # or model.fp32.onnx

tokenizer = AutoTokenizer.from_pretrained(model_dir)
session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])

texts = [
    "I forgot to mention one important detail.",
    "There is one important detail I forgot to mention."
]

inputs = tokenizer(texts, return_tensors="np", padding=True, truncation=True)
feeds = {
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"],
}
if "token_type_ids" in inputs:
    names = [x.name for x in session.get_inputs()]
    if "token_type_ids" in names:
        feeds["token_type_ids"] = inputs["token_type_ids"]

emb = session.run(None, feeds)[0]  # [batch, 512]
emb = emb / np.clip(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12, None)
print(float(np.dot(emb[0], emb[1])))
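
For retrieval over many texts, the same normalized embeddings give every pairwise cosine similarity in one matrix product:

scores = emb @ emb.T  # [batch, batch] cosine-similarity matrix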

Benchmark Snapshot

Benchmarks were run with:

  • Full eval script: eval_v3_full.py
  • Extreme eval script: eval_v3_extreme.py
  • Model: ./output/lettuce-v3-rp-long3

Benchmark Configs

Full benchmark command:

PYTORCH_ALLOC_CONF=expandable_segments:True venv/bin/python eval_v3_full.py \
  --model ./output/lettuce-emb-512d-v3 \
  --trust-remote-code \
  --batch-size 32 \
  --long-batch-size 2 \
  --long-subset 150 \
  --logic-limit 460 \
  --rp-limit 1000 \
  --retrieval-corpus 1000

Extreme benchmark command:

venv/bin/python eval_v3_extreme.py \
  --model ./output/lettuce-emb-512d-v3 \
  --trust-remote-code \
  --batch-size 8 \
  --needle-cases 24 \
  --needle-targets 1024 2048 4096 \
  --save-json output/lettuce-emb-512d-v3/extreme_metrics.json

Full Eval (eval_v3_full.py)

Metric                    Value
logic_triplet_accuracy    0.9848
logic_mean_margin         0.1874
rp_recall@1               0.0200
rp_recall@5               0.1090
rp_recall@10              0.1710
rp_mrr                    0.0717
fp_probe_accuracy         1.0000
fp_probe_mean_margin      0.4387
long_1024_recall@10       0.1867
long_2048_recall@10       0.1067
long_4096_recall@10       0.1067

Extreme Eval (eval_v3_extreme.py)

Metric                      Value
logic_role_flip_accuracy    0.8000
logic_neg_temp_accuracy     0.6000
coreference_accuracy        0.6000
rp_overlap_accuracy         1.0000
needle_1024_accuracy        0.5833
needle_2048_accuracy        0.7083
needle_4096_accuracy        0.7083
extreme_avg_accuracy        0.7143
extreme_avg_margin          0.0673

Full extreme metrics JSON is included as extreme_metrics.json.

Real Benchmarks (MTEB)

MTEB run config:

venv/bin/python - <<'PY'
import mteb
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('./output/lettuce-v3-rp-long3', trust_remote_code=True)
model.similarity_fn_name = 'cosine'
tasks = [t for t in mteb.get_tasks(languages=['eng']) if t.metadata.name in ['STSBenchmark','SICK-R','NFCorpus']]
_ = mteb.evaluate(model, tasks, prediction_folder='./output/mteb_real_predictions', show_progress_bar=True)
PY

Task          Metric                 Score
STSBenchmark  Spearman (main_score)  0.8091
STSBenchmark  Pearson                0.8036
SICK-R        Spearman (main_score)  0.7816
SICK-R        Pearson                0.8297
NFCorpus      nDCG@10                0.2784
NFCorpus      MAP@10                 0.0938
NFCorpus      Recall@10              0.1271
NFCorpus      MRR@10                 0.4725

Full real-benchmark metrics JSON is included as mteb_real_results.json.

Export Config

ONNX export script:

  • export_v3_onnx.py

Export settings used:

  • opset: 18
  • exporter mode: TorchScript ONNX (dynamo=False in script for stability)
  • quantization: dynamic INT8 (qint8)
  • generated files:
    • model.fp32.onnx
    • model.int8.onnx
    • model.onnx (FP32 convenience copy)
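
A minimal sketch of this flow; the actual logic lives in export_v3_onnx.py. The wrapper, the dummy input shapes, and loading the source checkpoint through SentenceTransformer are assumptions:

import torch
from sentence_transformers import SentenceTransformer
from onnxruntime.quantization import QuantType, quantize_dynamic

class Wrapper(torch.nn.Module):
    """Assumed wrapper: backbone + pooling + 512-d Dense head as one graph."""
    def __init__(self, st_model):
        super().__init__()
        self.st = st_model

    def forward(self, input_ids, attention_mask):
        features = {"input_ids": input_ids, "attention_mask": attention_mask}
        return self.st(features)["sentence_embedding"]

st = SentenceTransformer("./output/lettuce-v3-rp-long3", trust_remote_code=True).eval()
dummy = torch.ones(1, 8, dtype=torch.long)
torch.onnx.export(
    Wrapper(st),
    (dummy, dummy),
    "model.fp32.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["embeddings"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "seq"},
        "attention_mask": {0: "batch", 1: "seq"},
        "embeddings": {0: "batch"},
    },
    opset_version=18,
    dynamo=False,  # TorchScript exporter path, as noted above
)

# Dynamic INT8 quantization: weights become qint8, activations stay float.
quantize_dynamic("model.fp32.onnx", "model.int8.onnx", weight_type=QuantType.QInt8)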

Notes

  • model.int8.onnx is recommended for CPU/mobile usage.
  • INT8 may trade a small amount of quality for speed/size improvements.
  • Long-sequence latency is still significantly higher than short-sequence latency.
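
A quick way to measure that gap on your own hardware; a hedged sketch using the packaged INT8 graph, with arbitrary example texts and run count:

import time

import onnxruntime as ort
from transformers import AutoTokenizer

model_dir = "./lettuce-emb-512d-v3"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
session = ort.InferenceSession(f"{model_dir}/model.int8.onnx", providers=["CPUExecutionProvider"])

def mean_latency(text: str, runs: int = 5) -> float:
    enc = tokenizer([text], return_tensors="np", truncation=True)
    names = {i.name for i in session.get_inputs()}
    feeds = {k: v for k, v in enc.items() if k in names}
    start = time.perf_counter()
    for _ in range(runs):
        session.run(None, feeds)
    return (time.perf_counter() - start) / runs

print(f"short: {mean_latency('hello world'):.3f}s")
print(f"long:  {mean_latency('word ' * 3000):.3f}s")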