---
license: mit
language:
- en
- zh
base_model:
- Qwen/Qwen3-4B-Thinking-2507
---

# MemSifter-4B-Thinking

**MemSifter** is a lightweight generative session ranker trained with DAPO reinforcement learning. It serves as the core retrieval component of the [MemSifter](https://github.com/plageon/MemSifter) system—an LLM memory retrieval offloading framework based on outcome-driven proxy reasoning.

Given a user query and a set of candidate conversation sessions (pre-filtered by a dense embedding model), MemSifter-4B-Thinking performs fine-grained reranking to surface the sessions most relevant to the query, which are then passed as context to a downstream chat LLM.

## System Pipeline

```
Session Embedding → Session Ranking (MemSifter) → Chat LLM
    (bge-m3)          (generative reranker)       (any LLM)
```

1. **Session Embedding** — bge-m3 performs a coarse similarity pre-filter across all sessions.
2. **Session Ranking** — MemSifter (this model) performs fine-grained reranking of the pre-filtered candidates.
3. **Chat LLM** — the top-ranked sessions are assembled into a context window and passed to any OpenAI-compatible chat model.
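Step 3's context assembly can be sketched as below. The `build_context` helper, the `(date, text)` session tuples, and the character budget are illustrative assumptions for clarity — the real assembly is presumably handled inside the repository's `LLMChat`, whose session format may differ.

```python
def build_context(ranked_sessions, max_chars=8000):
    """Assemble top-ranked sessions into one context string.

    Hypothetical sketch: each item is assumed to be a (date, text)
    tuple; MemSifter's actual session structure may differ.
    """
    parts = []
    total = 0
    for date, text in ranked_sessions:
        block = f"[Session on {date}]\n{text}"
        if total + len(block) > max_chars:
            break  # stop once the context budget is exhausted
        parts.append(block)
        total += len(block)
    return "\n\n".join(parts)
```

Sessions arrive already ordered by relevance, so truncating from the tail drops the least relevant material first.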
## How to Use

Install the required packages:

```bash
pip install torch sentence-transformers vllm openai pyyaml loguru numpy pandas
```

Clone the repository and run the three-stage pipeline:

```bash
git clone https://github.com/plageon/MemSifter.git
cd MemSifter
```

```python
import json

from memsifter.toolkit import SessionEmbedder, SessionRanker, LLMChat

# Load one sample
with open("data/test_memory.json") as f:
    entry = json.load(f)[0]

question = entry["question"]
haystack_sessions = entry["haystack_sessions"]
haystack_dates = entry["haystack_dates"]

# Initialise models
embedder = SessionEmbedder(model_path="models/bge-m3", device="cuda:0")
ranker = SessionRanker(
    model_path="models/zstanjj/MemSifter-4B-Thinking",
    device="cuda:1",
)
chat = LLMChat(api_key="YOUR_KEY", base_url="YOUR_BASE_URL", model_name="YOUR_MODEL")

# Stage 1 — embedding pre-filter
top_sessions = embedder.get_top_sessions(
    question=question,
    sessions=haystack_sessions,
    dates=haystack_dates,
    top_k=20,
)

# Stage 2 — generative reranking
ranked_sessions = ranker.rerank(
    question=question,
    pre_ranked_sessions=top_sessions,
    top_k=5,
)

# Stage 3 — LLM answer
predicted_answer = chat.answer(question=question, ranked_sessions=ranked_sessions)
print(predicted_answer)
```

The `SessionRanker` uses [vLLM](https://github.com/vllm-project/vllm) for inference. It outputs a chain-of-thought inside `...` tags, followed by a comma-separated session ranking inside `...` tags.

## Training

MemSifter-4B-Thinking is fine-tuned from **Qwen3-4B-Thinking-2507** using the **DAPO** reinforcement learning algorithm with a **task reward** formulation that combines:

- **Marginal Utility Reward** — rewards selecting sessions with diminishing redundancy.
- **Rank-Sensitive Reward** — rewards placing the most relevant sessions at higher positions.
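To illustrate the rank-sensitive idea, a DCG-style position discount rewards placing relevant sessions earlier. The function below is a minimal sketch under that assumption — `rank_sensitive_reward`, its arguments, and the exact discount are hypothetical, not the paper's actual formulation.

```python
import math

def rank_sensitive_reward(predicted_ranking, relevance):
    """DCG-style reward: relevant sessions placed earlier earn more.

    Hypothetical sketch, not the exact reward used to train MemSifter.
    predicted_ranking: session ids in the order the model ranked them.
    relevance: dict mapping session id -> graded relevance (0 = irrelevant).
    """
    # Discounted cumulative gain of the model's ordering (0-indexed positions)
    dcg = sum(
        relevance.get(sid, 0) / math.log2(pos + 2)
        for pos, sid in enumerate(predicted_ranking)
    )
    # Normalise by the ideal ordering so the reward lies in [0, 1]
    ideal = sum(
        rel / math.log2(pos + 2)
        for pos, rel in enumerate(sorted(relevance.values(), reverse=True))
    )
    return dcg / ideal if ideal > 0 else 0.0
```

Normalising by the ideal ordering keeps the reward scale comparable across queries with different numbers of relevant sessions.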
Training data is bootstrapped from the MemSifter embedding pipeline on multiple conversational memory benchmarks (LoCoMo, LongMemEval, PersonaMem, etc.), using NDCG-based anchor sampling to construct RL training trajectories.

## Citation

```bibtex
@misc{memsifter,
      title={MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning},
      author={Jiejun Tan and Zhicheng Dou and Liancheng Zhang and Yuyang Hu and Yiruo Cheng and Ji-Rong Wen},
      year={2026},
      eprint={2603.03379},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2603.03379},
}
```