---
license: mit
language:
- en
- zh
base_model:
- Qwen/Qwen3-4B-Thinking-2507
---

# MemSifter-4B-Thinking

**MemSifter** is a lightweight generative session ranker trained with DAPO reinforcement learning. It serves as the core retrieval component of the [MemSifter](https://github.com/plageon/MemSifter) system — an LLM memory retrieval offloading framework based on outcome-driven proxy reasoning.

Given a user query and a set of candidate conversation sessions (pre-filtered by a dense embedding model), MemSifter-4B-Thinking performs fine-grained reranking to surface the sessions most relevant to the query, which are then passed as context to a downstream chat LLM.

## System Pipeline

```
Session Embedding  →  Session Ranking (MemSifter)  →  Chat LLM
    (bge-m3)            (generative reranker)         (any LLM)
```

1. **Session Embedding** — bge-m3 performs a coarse similarity pre-filter across all sessions.
2. **Session Ranking** — MemSifter (this model) performs fine-grained reranking of the pre-filtered candidates.
3. **Chat LLM** — the top-ranked sessions are assembled into a context window and passed to any OpenAI-compatible chat model.

## How to Use

Install the required packages:

```bash
pip install torch sentence-transformers vllm openai pyyaml loguru numpy pandas
```

Clone the repository and run the three-stage pipeline:

```bash
git clone https://github.com/plageon/MemSifter.git
cd MemSifter
```

```python
import json
from memsifter.toolkit import SessionEmbedder, SessionRanker, LLMChat

# Load one sample
with open("data/test_memory.json") as f:
    entry = json.load(f)[0]

question = entry["question"]
haystack_sessions = entry["haystack_sessions"]
haystack_dates = entry["haystack_dates"]

# Initialise models
embedder = SessionEmbedder(model_path="models/bge-m3", device="cuda:0")
ranker = SessionRanker(
    model_path="models/zstanjj/MemSifter-4B-Thinking",
    device="cuda:1",
)
chat = LLMChat(api_key="YOUR_KEY", base_url="YOUR_BASE_URL", model_name="YOUR_MODEL")

# Stage 1 — embedding pre-filter
top_sessions = embedder.get_top_sessions(
    question=question, sessions=haystack_sessions, dates=haystack_dates, top_k=20
)

# Stage 2 — generative reranking
ranked_sessions = ranker.rerank(
    question=question, pre_ranked_sessions=top_sessions, top_k=5
)

# Stage 3 — LLM answer
predicted_answer = chat.answer(question=question, ranked_sessions=ranked_sessions)
print(predicted_answer)
```

The `SessionRanker` uses [vLLM](https://github.com/vllm-project/vllm) for inference. It outputs a chain of thought inside `<think>...</think>` tags, followed by a comma-separated session ranking inside `<ranking>...</ranking>` tags.
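
Because the ranking is embedded in tagged free text, downstream code needs to parse it out. A minimal sketch of such a parser (the `parse_ranking` helper is hypothetical, not part of the toolkit, and assumes the tag wraps comma-separated session indices):

```python
import re

def parse_ranking(output: str) -> list[int]:
    """Extract the comma-separated session ranking from the model output.

    Assumes the model emits indices such as "<ranking>3, 0, 7</ranking>"
    after its <think>...</think> block; returns [] if no tag is found.
    """
    match = re.search(r"<ranking>(.*?)</ranking>", output, re.DOTALL)
    if match is None:
        return []
    return [int(token) for token in match.group(1).split(",") if token.strip()]

print(parse_ranking("<think>reasoning...</think>\n<ranking>3, 0, 7</ranking>"))
# [3, 0, 7]
```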

## Training

MemSifter-4B-Thinking is fine-tuned from **Qwen3-4B-Thinking-2507** using the **DAPO** reinforcement learning algorithm with a **task reward** formulation that combines:

- **Marginal Utility Reward** — rewards selecting sessions that add new relevant information, so redundant picks yield diminishing returns.
- **Rank-Sensitive Reward** — rewards placing the most relevant sessions at earlier positions in the ranking.
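
The exact reward formulas are not reproduced here; as a rough illustrative sketch only, the rank-sensitive term can be modelled as NDCG over the emitted ranking, and the marginal-utility term as a geometrically decaying gain per additional relevant session (the `decay` constant and `relevance` labels below are hypothetical, not the paper's formulation):

```python
import math

def rank_sensitive_reward(ranking: list[int], relevance: dict[int, float]) -> float:
    """NDCG-style reward: relevant sessions placed earlier score higher.

    `ranking` is the model's ordered list of session indices;
    `relevance` maps a session index to its (hypothetical) gold gain.
    """
    dcg = sum(relevance.get(s, 0.0) / math.log2(i + 2) for i, s in enumerate(ranking))
    ideal = sum(g / math.log2(i + 2)
                for i, g in enumerate(sorted(relevance.values(), reverse=True)))
    return dcg / ideal if ideal > 0 else 0.0

def marginal_utility_reward(ranking: list[int], relevance: dict[int, float],
                            decay: float = 0.5) -> float:
    """Diminishing returns: each additional relevant session contributes less."""
    reward, hits = 0.0, 0
    for s in ranking:
        if relevance.get(s, 0.0) > 0:
            reward += relevance[s] * (decay ** hits)
            hits += 1
    return reward
```

Under this sketch, swapping a relevant session into an earlier slot raises the rank-sensitive term, while piling on near-duplicate relevant sessions is damped by the decay factor.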

Training data is bootstrapped from the MemSifter embedding pipeline on multiple conversational memory benchmarks (LoCoMo, LongMemEval, PersonaMem, etc.), using NDCG-based anchor sampling to construct RL training trajectories.

## Citation

```bibtex
@misc{memsifter,
      title={MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning},
      author={Jiejun Tan and Zhicheng Dou and Liancheng Zhang and Yuyang Hu and Yiruo Cheng and Ji-Rong Wen},
      year={2026},
      eprint={2603.03379},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2603.03379},
}
```