---
license: mit
language:
- en
- zh
base_model:
- Qwen/Qwen3-4B-Thinking-2507
---

# MemSifter-4B-Thinking

**MemSifter** is a lightweight generative session ranker trained with DAPO reinforcement learning. It serves as the core retrieval component of the [MemSifter](https://github.com/plageon/MemSifter) system — an LLM memory retrieval offloading framework based on outcome-driven proxy reasoning.

Given a user query and a set of candidate conversation sessions (pre-filtered by a dense embedding model), MemSifter-4B-Thinking performs fine-grained reranking to surface the sessions most relevant to the query, which are then passed as context to a downstream chat LLM.

## System Pipeline

```
Session Embedding  →  Session Ranking (MemSifter)  →  Chat LLM
    (bge-m3)            (generative reranker)         (any LLM)
```

1. **Session Embedding** — bge-m3 performs a coarse similarity pre-filter across all sessions.
2. **Session Ranking** — MemSifter (this model) performs fine-grained reranking of the pre-filtered candidates.
3. **Chat LLM** — the top-ranked sessions are assembled into a context window and passed to any OpenAI-compatible chat model.

## How to Use

Install the required packages:

```bash
pip install torch sentence-transformers vllm openai pyyaml loguru numpy pandas
```

Clone the repository and run the three-stage pipeline:

```bash
git clone https://github.com/plageon/MemSifter.git
cd MemSifter
```

```python
import json
from memsifter.toolkit import SessionEmbedder, SessionRanker, LLMChat

# Load one sample
with open("data/test_memory.json") as f:
    entry = json.load(f)[0]

question = entry["question"]
haystack_sessions = entry["haystack_sessions"]
haystack_dates = entry["haystack_dates"]

# Initialise models
embedder = SessionEmbedder(model_path="models/bge-m3", device="cuda:0")
ranker = SessionRanker(
    model_path="models/zstanjj/MemSifter-4B-Thinking",
    device="cuda:1",
)
chat = LLMChat(api_key="YOUR_KEY", base_url="YOUR_BASE_URL", model_name="YOUR_MODEL")

# Stage 1 — embedding pre-filter
top_sessions = embedder.get_top_sessions(
    question=question, sessions=haystack_sessions, dates=haystack_dates, top_k=20
)

# Stage 2 — generative reranking
ranked_sessions = ranker.rerank(
    question=question, pre_ranked_sessions=top_sessions, top_k=5
)

# Stage 3 — LLM answer
predicted_answer = chat.answer(question=question, ranked_sessions=ranked_sessions)
print(predicted_answer)
```

The `SessionRanker` uses [vLLM](https://github.com/vllm-project/vllm) for inference. It outputs a chain of thought inside `<think>...</think>` tags, followed by a comma-separated session ranking inside `<ranking>...</ranking>` tags.
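
Because the ranking is embedded in tagged free text, downstream code needs to parse it out. A minimal sketch of such a parser (the `parse_ranking` helper is hypothetical, not part of the toolkit, and assumes the tag wraps comma-separated session indices):

```python
import re

def parse_ranking(output: str) -> list[int]:
    """Extract the comma-separated session ranking from the model output.

    Assumes the model emits indices such as "<ranking>3, 0, 7</ranking>"
    after its <think>...</think> block; returns [] if no tag is found.
    """
    match = re.search(r"<ranking>(.*?)</ranking>", output, re.DOTALL)
    if match is None:
        return []
    return [int(token) for token in match.group(1).split(",") if token.strip()]

print(parse_ranking("<think>reasoning...</think>\n<ranking>3, 0, 7</ranking>"))
# [3, 0, 7]
```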

## Training

MemSifter-4B-Thinking is fine-tuned from **Qwen3-4B-Thinking-2507** using the **DAPO** reinforcement learning algorithm with a **task reward** formulation that combines:

- **Marginal Utility Reward** — rewards selecting sessions that add new relevant information, so redundant picks yield diminishing returns.
- **Rank-Sensitive Reward** — rewards placing the most relevant sessions at earlier positions in the ranking.
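
The exact reward formulas are not reproduced here; as a rough illustrative sketch only, the rank-sensitive term can be modelled as NDCG over the emitted ranking, and the marginal-utility term as a geometrically decaying gain per additional relevant session (the `decay` constant and `relevance` labels below are hypothetical, not the paper's formulation):

```python
import math

def rank_sensitive_reward(ranking: list[int], relevance: dict[int, float]) -> float:
    """NDCG-style reward: relevant sessions placed earlier score higher.

    `ranking` is the model's ordered list of session indices;
    `relevance` maps a session index to its (hypothetical) gold gain.
    """
    dcg = sum(relevance.get(s, 0.0) / math.log2(i + 2) for i, s in enumerate(ranking))
    ideal = sum(g / math.log2(i + 2)
                for i, g in enumerate(sorted(relevance.values(), reverse=True)))
    return dcg / ideal if ideal > 0 else 0.0

def marginal_utility_reward(ranking: list[int], relevance: dict[int, float],
                            decay: float = 0.5) -> float:
    """Diminishing returns: each additional relevant session contributes less."""
    reward, hits = 0.0, 0
    for s in ranking:
        if relevance.get(s, 0.0) > 0:
            reward += relevance[s] * (decay ** hits)
            hits += 1
    return reward
```

Under this sketch, swapping a relevant session into an earlier slot raises the rank-sensitive term, while piling on near-duplicate relevant sessions is damped by the decay factor.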

Training data is bootstrapped from the MemSifter embedding pipeline on multiple conversational memory benchmarks (LoCoMo, LongMemEval, PersonaMem, etc.), using NDCG-based anchor sampling to construct RL training trajectories.

## Citation

```bibtex
@misc{memsifter,
      title={MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning},
      author={Jiejun Tan and Zhicheng Dou and Liancheng Zhang and Yuyang Hu and Yiruo Cheng and Ji-Rong Wen},
      year={2026},
      eprint={2603.03379},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2603.03379},
}
```