---
license: apache-2.0
base_model: Qwen/Qwen3-Reranker-0.6B
library_name: peft
tags:
- reranker
- memory-retrieval
- long-term-memory
- dialog
- lora
- distillation
- lycheemem
language:
- en
pipeline_tag: text-classification
---

# LycheeMem Reranker v1

A LoRA adapter for Qwen3-Reranker-0.6B, fine-tuned to rerank candidate memory snippets in long-term-memory dialog retrieval. Built for [LycheeMem](https://github.com/LycheeMem/lycheemem).

## Evaluation

Metric: MAP (mean average precision); higher is better. Evaluation queries are strictly held out from training, and HotpotQA distractor is out-of-domain (OOD).

| Model | LongMemEval-S<br/>(373 queries, held-out) | MSC-MemFuse-MC10<br/>(27 queries, held-out) | HotpotQA distractor<br/>(7,405 queries, OOD) |
|---|---|---|---|
| LycheeMem Reranker v1 | 0.9185 | 0.7457 | 0.7063 |
| BGE-Reranker-v2-m3 (560M) | 0.8647 | 0.5503 | 0.8002 |
| Δ (v1 − BGE) | +5.4 pp | +19.5 pp | −9.4 pp |

Full metrics:

| Benchmark | hit@10 | R@5 | R@10 | MAP | NDCG@10 |
|---|---|---|---|---|---|
| LongMemEval-S held-out | 1.000 | 0.964 | 0.988 | 0.919 | 0.940 |
| MSC-MemFuse-MC10 held-out | 1.000 | 0.799 | 0.896 | 0.746 | 0.786 |
| HotpotQA distractor (OOD) | 0.987 | 0.793 | 0.890 | 0.706 | 0.769 |
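The card does not say how relevance was defined for these metrics. For reference, here is a minimal sketch of the standard definitions. It assumes binary relevance labels (1 = relevant) listed in reranked order and assumes every relevant snippet appears in the candidate list. The function names are illustrative, not LycheeMem's evaluation code:

```python
import math


def hit_at_k(rels: list[int], k: int) -> float:
    """1.0 if at least one relevant item is in the top k, else 0.0."""
    return 1.0 if any(rels[:k]) else 0.0


def recall_at_k(rels: list[int], k: int) -> float:
    """Fraction of all relevant items that appear in the top k."""
    total = sum(rels)
    return sum(rels[:k]) / total if total else 0.0


def average_precision(rels: list[int]) -> float:
    """Mean of precision@i over the ranks i that hold a relevant item."""
    total = sum(rels)
    if total == 0:
        return 0.0
    hits, precision_sum = 0, 0.0
    for i, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            precision_sum += hits / i
    return precision_sum / total


def ndcg_at_k(rels: list[int], k: int) -> float:
    """Binary-gain NDCG@k: DCG of the ranking divided by DCG of the ideal ranking."""
    def dcg(xs: list[int]) -> float:
        return sum(r / math.log2(i + 1) for i, r in enumerate(xs[:k], start=1))

    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal else 0.0


def mean_average_precision(rankings: list[list[int]]) -> float:
    """MAP: average precision averaged over queries."""
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

For example, `average_precision([0, 1, 0, 1])` is 0.5, and MAP is that value averaged over every query in a benchmark. hit@10 can be 1.000 while R@10 stays below 1 because hit@10 only needs one relevant snippet in the top 10.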

Use this model for reranking dialog memory. For general-domain retrieval, use BGE-Reranker-v2-m3, which scores higher on the OOD HotpotQA benchmark.

## Training

| Source | Queries | Pairs |
|---|---|---|
| LongMemEval-S (cleaned-overlap) | 127 | 6,018 |
| MSC-MemFuse-MC10 (answer-turn) | 299 | 14,950 |
| Total | 426 | 20,968 |

A 90/10 query-stratified split yields 18,947 training pairs and 2,021 held-out pairs.
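Below is a sketch of one way to implement such a split. It assumes "query-stratified" means that all pairs of a query go to the same side and that about 10% of queries are held out within each source dataset. `query_stratified_split` and the `(source, query_id, document, label)` tuple layout are hypothetical:

```python
import random
from collections import defaultdict


def query_stratified_split(pairs, held_out_frac=0.1, seed=0):
    """Split (source, query_id, document, label) tuples at the query level.

    Queries are held out separately within each source dataset, so each
    source contributes about `held_out_frac` of its queries to the held-out
    side, and all pairs of one query land on the same side.
    """
    queries_by_source = defaultdict(set)
    for source, query_id, _doc, _label in pairs:
        queries_by_source[source].add(query_id)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    held_out_queries = set()
    for source in sorted(queries_by_source):
        ids = sorted(queries_by_source[source])
        rng.shuffle(ids)
        n_held = round(len(ids) * held_out_frac)
        held_out_queries.update((source, q) for q in ids[:n_held])

    train = [p for p in pairs if (p[0], p[1]) not in held_out_queries]
    held_out = [p for p in pairs if (p[0], p[1]) in held_out_queries]
    return train, held_out
```

Splitting by query rather than by pair prevents near-duplicate candidates of the same query from leaking across the split.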

Labels are graded soft targets (`{0.0, 0.2, 0.4, 0.6, 0.8, 1.0}`), distilled from DeepSeek V4 Pro on the mid-tier candidates returned by the upstream retriever. Training uses LoRA (r=16) and a BCE-with-logits loss against these soft targets for 3 epochs.
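With soft targets, BCE-with-logits is minimized when `sigmoid(logit)` equals the target, so the adapter learns logits whose sigmoid tracks the graded label. Below is a stdlib sketch of the per-pair loss, mathematically equivalent to PyTorch's `binary_cross_entropy_with_logits` with `reduction="mean"`. The function names are illustrative:

```python
import math


def bce_with_logits(logit: float, target: float) -> float:
    """Binary cross-entropy between sigmoid(logit) and a soft target in [0, 1].

    Uses the numerically stable form max(x, 0) - x*y + log(1 + exp(-|x|)).
    """
    x, y = logit, target
    return max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))


def mean_bce(logits: list[float], targets: list[float]) -> float:
    """Batch loss: mean of the per-pair losses."""
    return sum(bce_with_logits(x, y) for x, y in zip(logits, targets)) / len(logits)
```

For a label of 0.6, the loss bottoms out at logit `log(0.6 / 0.4)` ≈ 0.405, whose sigmoid is 0.6.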

## Usage

```python
import torch
from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForSequenceClassification

BASE = "Qwen/Qwen3-Reranker-0.6B"
ADAPTER = "fuhao23/reranker_v1"
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

tok = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

# Single-logit (num_labels=1) scoring head on top of the base model.
base = AutoModelForSequenceClassification.from_pretrained(
    BASE, num_labels=1, torch_dtype=torch.bfloat16, trust_remote_code=True
)
# The head scores the last non-padding token, so it needs the pad id.
base.config.pad_token_id = tok.pad_token_id
model = PeftModel.from_pretrained(base, ADAPTER).eval().to(DEVICE)

INSTRUCT = "Given a user query, retrieve memory snippets that answer the query"

def score(query: str, candidates: list[str], max_len: int = 512) -> list[float]:
    """Return one raw relevance logit per candidate (higher = more relevant)."""
    texts = [
        f"<Instruct>: {INSTRUCT}\n<Query>: {query}\n<Document>: {c}"
        for c in candidates
    ]
    enc = tok(texts, padding=True, truncation=True, max_length=max_len,
              return_tensors="pt").to(model.device)
    with torch.inference_mode():
        logits = model(**enc).logits.squeeze(-1).float().cpu().tolist()
    return logits
```

`score` returns raw logits. Apply `torch.sigmoid` to map them to [0, 1], the same scale as the training labels.
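A hypothetical helper showing how the logits returned by `score` might be turned into a ranked top-k list. It uses a stdlib sigmoid so it runs without a GPU; `rerank` and `top_k` are illustrative names, not part of LycheeMem:

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw logit to [0, 1], the same scale as the training labels."""
    # Branch on sign so math.exp never overflows for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rerank(candidates: list[str], logits: list[float], top_k: int = 5) -> list[tuple[str, float]]:
    """Return the top_k (candidate, score) pairs sorted by descending sigmoid score."""
    scored = [(c, sigmoid(x)) for c, x in zip(candidates, logits)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

Typical use would be `rerank(candidates, score(query, candidates), top_k=3)`. Because sigmoid is monotonic, sorting by raw logits gives the same order; the sigmoid only matters when you threshold or display scores.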

## Limitations

- Specialized for dialog memory; trails BGE-Reranker-v2-m3 on out-of-domain retrieval (−9.4 pp MAP on HotpotQA distractor).
- Trained on English data only.
- The MSC-MemFuse-MC10 held-out set is small (27 queries), so its score has high variance; the LongMemEval-S held-out set (373 queries) is the primary in-domain reference.
- Training labels are distilled from an LLM (DeepSeek V4 Pro), not human-annotated.
- Only retrieval-stage metrics are reported; end-to-end answer accuracy with this reranker integrated into LycheeMem is not reported here.

## Citation

```bibtex
@misc{lycheemem_reranker_v1,
  title  = {LycheeMem Reranker v1: A Domain-Specialized Reranker for Long-Term Memory Dialog Retrieval},
  author = {{LycheeMem Project}},
  year   = {2026},
  url    = {https://huggingface.co/fuhao23/reranker_v1}
}
```

## License

Apache 2.0.