# lettuce-emb-512d-v3

ONNX package for lettuce-emb-512d-v3.
## Included Files

- `model.fp32.onnx` (full precision)
- `model.int8.onnx` (dynamic quantized INT8)
- `model.onnx` (FP32 convenience copy)
- Tokenizer files: `tokenizer.json`, `tokenizer_config.json`, `special_tokens_map.json`, `vocab.txt`
- Sentence-Transformers metadata/config: `modules.json`, `config_sentence_transformers.json`, `sentence_bert_config.json`
- Pooling/Dense configs: `1_Pooling/config.json`, `2_Dense/config.json`
- Nomic architecture files: `configuration_hf_nomic_bert.py`, `modeling_hf_nomic_bert.py`
## Model Specs

- Backbone family: Nomic BERT (`nomic-ai/nomic-embed-text-v1.5`)
- Embedding dimension: 512
- Similarity: cosine similarity on normalized embeddings
- Context target used in training pipeline: 4096
## Training Config

Final checkpoint lineage:

- Source training model: `./output/lettuce-v3-rp-long3`
- Resume chain used in this session included continued training from previous long-context runs.

Core training configuration:

- `student-base`: `nomic-ai/nomic-embed-text-v1.5`
- `teachers`: `BAAI/bge-m3`
- `dim`: 512
- `context`: 4096
- `teacher-context`: 1024
- `pair-batch`: 4 (stable setting used for the successful resumed run)
- `triplet-batch`: 4 (stable setting used for the successful resumed run)
- `teacher-batch`: 1
- `num-workers`: 4
- `epochs`: 1 per run segment (continued via resume)
Data composition used in long-context training runs:

- NLI triplets: 30000
- RP pairs source: `Heralax/Augmental-Dataset` (~7831 rows available in this environment)
- Logic source: `hard_logic.json` with oversampling (typical run value: 12 or higher)
- LongBench subsets enabled: `qasper`, `qmsum`, `narrativeqa`, `passage_retrieval_en`
- LongBench split used: `test` (loader falls back to direct JSONL files from the `THUDM/LongBench` `data.zip`)
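The oversampling factor above can be pictured with a minimal sketch. The loader name and record shape here are hypothetical; only the repeat-factor idea comes from the config:

```python
import json

def load_oversampled(path, factor):
    """Read a JSON list of examples and repeat each one `factor` times,
    so scarce hard-logic examples appear more often per epoch.
    (Illustrative helper, not the actual training loader.)"""
    with open(path) as f:
        rows = json.load(f)
    return [row for row in rows for _ in range(factor)]
```

With `factor=12`, 100 source rows become 1200 training rows per epoch.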
Primary losses:

- `MultipleNegativesRankingLoss` (triplets)
- `CosineSimilarityLoss` (teacher-scored pairs)
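As a rough illustration of what `MultipleNegativesRankingLoss` optimizes, here is a NumPy sketch of the in-batch-negatives cross-entropy over scaled cosine similarities. This is a conceptual re-derivation, not the Sentence-Transformers implementation:

```python
import numpy as np

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch-negatives ranking loss: each anchor should score its own
    positive (the diagonal) higher than every other positive in the batch."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                    # [batch, batch] cosine logits
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))
```

When every anchor is closest to its own positive the loss approaches zero; mismatched pairs drive it up, which is what pushes paraphrases together and unrelated texts apart.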
## Inference Example (ONNX Runtime)

```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

model_dir = "./lettuce-emb-512d-v3"
onnx_path = f"{model_dir}/model.int8.onnx"  # or model.fp32.onnx

tokenizer = AutoTokenizer.from_pretrained(model_dir)
session = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])

texts = [
    "I forgot to mention one important detail.",
    "There is one important detail I forgot to mention.",
]
inputs = tokenizer(texts, return_tensors="np", padding=True, truncation=True)

feeds = {
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"],
}
# Only pass token_type_ids if the tokenizer produced them AND the exported
# graph actually declares that input.
if "token_type_ids" in inputs:
    names = [x.name for x in session.get_inputs()]
    if "token_type_ids" in names:
        feeds["token_type_ids"] = inputs["token_type_ids"]

emb = session.run(None, feeds)[0]  # [batch, 512]
emb = emb / np.clip(np.linalg.norm(emb, axis=1, keepdims=True), 1e-12, None)
print(float(np.dot(emb[0], emb[1])))  # cosine similarity of the two texts
```
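The same normalized embeddings work for simple top-k retrieval. A minimal sketch (the helper name is illustrative; it assumes rows of both matrices are already L2-normalized as above, so a dot product equals cosine similarity):

```python
import numpy as np

def top_k(query_emb, corpus_emb, k=5):
    """Return (indices, scores) of the k most cosine-similar corpus rows
    for each query. Inputs must be L2-normalized."""
    sims = query_emb @ corpus_emb.T               # [queries, corpus]
    idx = np.argsort(-sims, axis=1)[:, :k]        # best-first indices
    scores = np.take_along_axis(sims, idx, axis=1)
    return idx, scores
```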
## Benchmark Snapshot

Benchmarks were run with:

- Full eval script: `eval_v3_full.py`
- Extreme eval script: `eval_v3_extreme.py`
- Model: `./output/lettuce-v3-rp-long3`
## Benchmark Configs

Full benchmark command:

```bash
PYTORCH_ALLOC_CONF=expandable_segments:True venv/bin/python eval_v3_full.py \
  --model ./output/lettuce-emb-512d-v3 \
  --trust-remote-code \
  --batch-size 32 \
  --long-batch-size 2 \
  --long-subset 150 \
  --logic-limit 460 \
  --rp-limit 1000 \
  --retrieval-corpus 1000
```
Extreme benchmark command:

```bash
venv/bin/python eval_v3_extreme.py \
  --model ./output/lettuce-emb-512d-v3 \
  --trust-remote-code \
  --batch-size 8 \
  --needle-cases 24 \
  --needle-targets 1024 2048 4096 \
  --save-json output/lettuce-emb-512d-v3/extreme_metrics.json
```
## Full Eval (`eval_v3_full.py`)
| Metric | Value |
|---|---|
| logic_triplet_accuracy | 0.9848 |
| logic_mean_margin | 0.1874 |
| rp_recall@1 | 0.0200 |
| rp_recall@5 | 0.1090 |
| rp_recall@10 | 0.1710 |
| rp_mrr | 0.0717 |
| fp_probe_accuracy | 1.0000 |
| fp_probe_mean_margin | 0.4387 |
| long_1024_recall@10 | 0.1867 |
| long_2048_recall@10 | 0.1067 |
| long_4096_recall@10 | 0.1067 |
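For reference, recall@k and MRR of the kind reported above can be computed from ranked retrieval results like this (a generic sketch, not the `eval_v3_full.py` implementation):

```python
import numpy as np

def recall_at_k(ranked_ids, gold_ids, k):
    """Fraction of queries whose gold document appears in the top-k ranking."""
    hits = [gold in ranking[:k] for ranking, gold in zip(ranked_ids, gold_ids)]
    return float(np.mean(hits))

def mrr(ranked_ids, gold_ids):
    """Mean reciprocal rank of the gold document (0 if never retrieved)."""
    rr = []
    for ranking, gold in zip(ranked_ids, gold_ids):
        ranking = list(ranking)
        rr.append(1.0 / (ranking.index(gold) + 1) if gold in ranking else 0.0)
    return float(np.mean(rr))
```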
## Extreme Eval (`eval_v3_extreme.py`)
| Metric | Value |
|---|---|
| logic_role_flip_accuracy | 0.8000 |
| logic_neg_temp_accuracy | 0.6000 |
| coreference_accuracy | 0.6000 |
| rp_overlap_accuracy | 1.0000 |
| needle_1024_accuracy | 0.5833 |
| needle_2048_accuracy | 0.7083 |
| needle_4096_accuracy | 0.7083 |
| extreme_avg_accuracy | 0.7143 |
| extreme_avg_margin | 0.0673 |
The full extreme-eval metrics JSON is included as `extreme_metrics.json`.
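The reported `extreme_avg_accuracy` is consistent with the unweighted mean of the seven accuracy rows in the table:

```python
# Accuracy rows from the extreme-eval table, in order.
accs = [0.8000, 0.6000, 0.6000, 1.0000, 0.5833, 0.7083, 0.7083]
avg = sum(accs) / len(accs)
print(round(avg, 4))  # 0.7143
```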
## Real Benchmarks (MTEB)

MTEB run config:

```bash
venv/bin/python - <<'PY'
import mteb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('./output/lettuce-v3-rp-long3', trust_remote_code=True)
model.similarity_fn_name = 'cosine'
tasks = [t for t in mteb.get_tasks(languages=['eng']) if t.metadata.name in ['STSBenchmark', 'SICK-R', 'NFCorpus']]
_ = mteb.evaluate(model, tasks, prediction_folder='./output/mteb_real_predictions', show_progress_bar=True)
PY
```
| Task | Metric | Score |
|---|---|---|
| STSBenchmark | Spearman (main_score) | 0.8091 |
| STSBenchmark | Pearson | 0.8036 |
| SICK-R | Spearman (main_score) | 0.7816 |
| SICK-R | Pearson | 0.8297 |
| NFCorpus | nDCG@10 | 0.2784 |
| NFCorpus | MAP@10 | 0.0938 |
| NFCorpus | Recall@10 | 0.1271 |
| NFCorpus | MRR@10 | 0.4725 |
The full real-benchmark metrics JSON is included as `mteb_real_results.json`.
## Export Config

ONNX export script: `export_v3_onnx.py`

Export settings used:

- opset: 18
- exporter mode: TorchScript ONNX (`dynamo=False` in script for stability)
- quantization: dynamic INT8 (`qint8`)
- generated files: `model.fp32.onnx`, `model.int8.onnx`, `model.onnx` (FP32 convenience copy)
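What dynamic INT8 quantization does to the weights can be pictured with a small NumPy sketch. This illustrates a symmetric per-tensor scheme only; it is not the exporter's actual code:

```python
import numpy as np

def quantize_dequantize_int8(w):
    """Symmetric per-tensor quantization: map float weights to int8 with a
    single scale, then dequantize to see the rounding error INT8 introduces."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(4, 8)).astype(np.float32)
q, w_hat = quantize_dequantize_int8(w)
# Per-element reconstruction error is bounded by half a quantization step
# (scale / 2), which is the quality/size trade-off noted below.
```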
## Notes

- `model.int8.onnx` is recommended for CPU/mobile usage.
- INT8 may trade a small amount of quality for speed/size improvements.
- Long-sequence latency is still significantly higher than short-sequence latency.