---
tags:
- lora
- peft
- transformers
- retrieval
- embeddings
license: apache-2.0
---
# Checkpoints for "Improving Long-Context Retrieval with Multi-Prefix Embedding"

This repository contains the model checkpoints and pre-computed embeddings for the anonymous submission *Improving Long-Context Retrieval with Multi-Prefix Embedding*.

## Repository Structure

```
models/
  fixed-64-epoch1/          # Ablation: fixed 64-token prefix length
  maxp-train-epoch1/        # Baseline: MaxP trained model
  nochunk-epoch1/           # Baseline: single-vector (no chunking)
  prand-32to1024-epoch1/    # Proposed: random prefix lengths (32-1024 tokens)
encode/
  browsecomp-plus/          # Pre-computed embeddings - BrowseComp-Plus
  longembed/                # Pre-computed embeddings - LongEmbed (2WikiMQA,
                            #   NarrativeQA, QMSum, SummScreenFD)
  mldr-en/                  # Pre-computed embeddings - MLDR (English)
```

Each model folder contains a LoRA adapter (rank 16, alpha 64) fine-tuned from `Qwen/Qwen3-Embedding-0.6B` for feature extraction, together with the tokenizer files. It also includes a `checkpoint-625/` subfolder holding the training checkpoint saved at the end of epoch 1 (step 625), including the optimizer state.
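
For reference, a LoRA adapter adds a trainable low-rank update `B @ A` to each adapted weight and scales it by `alpha / r`, which is 64 / 16 = 4 for the values above. The NumPy sketch below only illustrates that arithmetic; it assumes PEFT's standard scaling (not rsLoRA), and the layer size is hypothetical, since this card does not state which modules were adapted.

```python
import numpy as np

RANK, ALPHA = 16, 64        # adapter hyperparameters stated in this card
SCALING = ALPHA / RANK      # 4.0 under standard LoRA scaling (assumes use_rslora=False)


def lora_forward(x, W, A, B, scaling=SCALING):
    """Frozen weight W plus a scaled low-rank update: x W^T + scaling * x A^T B^T."""
    return x @ W.T + scaling * (x @ A.T) @ B.T


# Hypothetical layer size, for illustration only.
d = 1024
rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))              # frozen base weight (out, in)
A = rng.standard_normal((RANK, d)) * 0.01    # down-projection (rank, in)
B = np.zeros((d, RANK))                      # up-projection (out, rank), zero at init
x = rng.standard_normal((2, d))

# With B = 0 the adapter leaves the base layer unchanged.
print(np.allclose(lora_forward(x, W, A, B), x @ W.T))  # True

# Once B is non-zero, the update can be folded into a single merged weight.
B_trained = rng.standard_normal((d, RANK))
W_merged = W + SCALING * B_trained @ A
print(np.allclose(lora_forward(x, W, A, B_trained), x @ W_merged.T))  # True
```

Because the update only needs `r * (d_in + d_out)` extra parameters per adapted matrix, the adapter folders stay small relative to the base model.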
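
## Usage

Load any of the adapters on top of the base model with [PEFT](https://github.com/huggingface/peft), pointing `subfolder` at the corresponding directory under `models/`:

```python
from peft import PeftModel
from transformers import AutoModel, AutoTokenizer

repo_id = "anonymoussubmission111/mpe-checkpoints"
subfolder = "models/prand-32to1024-epoch1"  # or any other folder under models/

# Load the base embedding model, then attach the LoRA adapter on top of it.
base = AutoModel.from_pretrained("Qwen/Qwen3-Embedding-0.6B")
model = PeftModel.from_pretrained(base, repo_id, subfolder=subfolder)
model.eval()

# The tokenizer files are stored alongside each adapter.
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subfolder)
```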
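
To turn hidden states into embeddings, the base `Qwen/Qwen3-Embedding-0.6B` model uses last-token pooling followed by L2 normalization. This card does not say whether the adapters keep that pooling, so the sketch below is written under that assumption. It is a NumPy re-implementation that runs on a hidden-state array without downloading the model; `last_token_pool` and `l2_normalize` are illustrative helper names, not functions shipped in this repository.

```python
import numpy as np


def last_token_pool(hidden, mask):
    """Return the hidden state of each sequence's last real (non-padding) token.

    hidden: (batch, seq_len, dim); mask: (batch, seq_len), 1 for real tokens.
    Handles both left- and right-padded batches.
    """
    seq_len = mask.shape[1]
    # argmax on the reversed mask gives the distance of the last real token from the end.
    last_idx = seq_len - 1 - np.argmax(mask[:, ::-1], axis=1)
    return hidden[np.arange(hidden.shape[0]), last_idx]


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so inner products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)


# Toy batch: 2 sequences, 4 positions, 3-dim hidden states.
hidden = np.arange(24, dtype=float).reshape(2, 4, 3)
mask = np.array([[1, 1, 0, 0],   # right-padded: last real token at position 1
                 [0, 0, 1, 1]])  # left-padded: last real token at position 3
pooled = last_token_pool(hidden, mask)  # rows [3, 4, 5] and [21, 22, 23]
emb = l2_normalize(pooled)

# With the model and tokenizer loaded as above (requires torch), roughly:
#   batch = tokenizer(texts, padding=True, return_tensors="pt")
#   with torch.no_grad():
#       hidden = model(**batch).last_hidden_state
#   emb = l2_normalize(last_token_pool(hidden.float().numpy(),
#                                      batch["attention_mask"].numpy()))
```

Locating the last real token from the attention mask, rather than always taking position `-1`, keeps the pooling correct whichever padding side the tokenizer uses.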

The pre-computed embeddings in `encode/` are stored as `.pkl` files (pickled NumPy arrays) and can be loaded directly to reproduce the retrieval results without re-encoding the corpora.
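
This card does not describe the layout inside the `.pkl` files, for example whether document ids are stored with the vectors, or whether a document has one row or several. The sketch below therefore assumes that each file unpickles to a 2-D NumPy array and that a hypothetical list `doc_ids` maps each row to its document. Documents with several rows, such as one per chunk or prefix, are scored MaxP-style by their best-matching row. `load_embeddings` and `maxp_search` are illustrative helpers, not part of this repository.

```python
import os
import pickle
import tempfile
from collections import defaultdict

import numpy as np


def load_embeddings(path):
    """Load one pickled embedding file (assumed: an array of shape (n_rows, dim))."""
    with open(path, "rb") as f:
        return np.asarray(pickle.load(f))


def maxp_search(q_emb, doc_emb, doc_ids, k=10):
    """Score each document by its best-matching row (MaxP) and return the top k.

    q_emb: (n_q, dim); doc_emb: (n_rows, dim); doc_ids[i] is the (hypothetical)
    document id of row i, so several rows may belong to one document.
    """
    scores = q_emb @ doc_emb.T  # inner products; cosine if rows are L2-normalized
    results = []
    for row_scores in scores:
        best = defaultdict(lambda: -np.inf)
        for doc_id, s in zip(doc_ids, row_scores):
            best[doc_id] = max(best[doc_id], float(s))
        ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
        results.append(ranked[:k])
    return results


# Toy round trip: write a small array to a temporary .pkl file and read it back.
docs = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
doc_ids = ["d1", "d1", "d2"]  # d1 is represented by two rows
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.pkl")
    with open(path, "wb") as f:
        pickle.dump(docs, f)
    loaded = load_embeddings(path)

queries = np.array([[0.0, 1.0]])
print(maxp_search(queries, loaded, doc_ids, k=2))  # [[('d2', 1.0), ('d1', 0.8)]]
```

For single-vector encodings, each document contributes exactly one row, and the max reduces to the ordinary inner-product ranking. Only unpickle files from sources you trust, since `pickle.load` can execute arbitrary code.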