# MemSifter-4B-Thinking
MemSifter is a lightweight generative session ranker trained with DAPO reinforcement learning. It serves as the core retrieval component of the MemSifter system—an LLM memory retrieval offloading framework based on outcome-driven proxy reasoning.
Given a user query and a set of candidate conversation sessions (pre-filtered by a dense embedding model), MemSifter-4B-Thinking performs fine-grained reranking to surface the sessions most relevant to the query, which are then passed as context to a downstream chat LLM.
## System Pipeline

```
Session Embedding  →  Session Ranking (MemSifter)  →  Chat LLM
    (bge-m3)            (generative reranker)          (any LLM)
```
- Session Embedding — bge-m3 performs a coarse similarity pre-filter across all sessions.
- Session Ranking — MemSifter (this model) performs fine-grained reranking of the pre-filtered candidates.
- Chat LLM — the top-ranked sessions are assembled into a context window and passed to any OpenAI-compatible chat model.
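As a rough illustration of the final stage, here is a minimal sketch of packing ranked sessions into a prompt for an OpenAI-compatible chat model. The function names and prompt wording are assumptions for illustration, not the MemSifter `LLMChat` implementation:

```python
# Illustrative only: how top-ranked sessions might be assembled into a
# context window and turned into chat messages. Prompt text is an assumption.
def build_context(ranked_sessions: list[str]) -> str:
    # Number each session so the model can refer back to it.
    blocks = [f"[Session {i + 1}]\n{s}" for i, s in enumerate(ranked_sessions)]
    return "\n\n".join(blocks)

def build_messages(question: str, ranked_sessions: list[str]) -> list[dict]:
    context = build_context(ranked_sessions)
    return [
        {"role": "system",
         "content": "Answer using the conversation history below.\n\n" + context},
        {"role": "user", "content": question},
    ]

msgs = build_messages("Where did I say I was travelling?",
                      ["user: I'm off to Kyoto next week."])
print(msgs[1]["content"])  # → Where did I say I was travelling?
```

The resulting `msgs` list can be passed directly as the `messages` argument of any OpenAI-compatible chat-completions endpoint.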
## How to Use

Install the required packages:

```shell
pip install torch sentence-transformers vllm openai pyyaml loguru numpy pandas
```
Clone the repository:

```shell
git clone https://github.com/plageon/MemSifter.git
cd MemSifter
```

Then run the three-stage pipeline:
```python
import json

from memsifter.toolkit import SessionEmbedder, SessionRanker, LLMChat

# Load one sample
with open("data/test_memory.json") as f:
    entry = json.load(f)[0]

question = entry["question"]
haystack_sessions = entry["haystack_sessions"]
haystack_dates = entry["haystack_dates"]

# Initialise models
embedder = SessionEmbedder(model_path="models/bge-m3", device="cuda:0")
ranker = SessionRanker(
    model_path="models/zstanjj/MemSifter-4B-Thinking",
    device="cuda:1",
)
chat = LLMChat(api_key="YOUR_KEY", base_url="YOUR_BASE_URL", model_name="YOUR_MODEL")

# Stage 1: embedding pre-filter
top_sessions = embedder.get_top_sessions(
    question=question, sessions=haystack_sessions, dates=haystack_dates, top_k=20
)

# Stage 2: generative reranking
ranked_sessions = ranker.rerank(
    question=question, pre_ranked_sessions=top_sessions, top_k=5
)

# Stage 3: LLM answer
predicted_answer = chat.answer(question=question, ranked_sessions=ranked_sessions)
print(predicted_answer)
```
The `SessionRanker` uses vLLM for inference. It outputs a chain-of-thought inside `<think>...</think>` tags, followed by a comma-separated session ranking inside `<ranking>...</ranking>` tags.
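For illustration, a minimal sketch of parsing that output format; the helper name and the empty-list fallback are assumptions, not part of the MemSifter toolkit:

```python
import re

# Illustrative helper (not part of the MemSifter API): extract the
# comma-separated session indices from a raw generation.
def parse_ranking(raw: str) -> list[int]:
    match = re.search(r"<ranking>(.*?)</ranking>", raw, re.DOTALL)
    if match is None:
        return []  # e.g. fall back to the embedding pre-filter order
    return [int(tok) for tok in match.group(1).split(",") if tok.strip().isdigit()]

raw = "<think>Session 3 mentions the trip dates...</think><ranking>3, 0, 7</ranking>"
print(parse_ranking(raw))  # → [3, 0, 7]
```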
## Training
MemSifter-4B-Thinking is fine-tuned from Qwen3-4B using the DAPO reinforcement learning algorithm with a task reward formulation that combines:
- Marginal Utility Reward — rewards selecting sessions with diminishing redundancy.
- Rank-Sensitive Reward — rewards placing the most relevant sessions at higher positions.
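As a rough sketch of the rank-sensitive component, an NDCG-style formulation is assumed below for illustration; this is not the paper's exact reward:

```python
import math

# Illustrative NDCG-style rank-sensitive reward over a predicted ordering,
# given a set of gold-relevant session ids. Formulation is an assumption.
def dcg(gains: list[float]) -> float:
    # Standard log2 position discount: later ranks contribute less.
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def rank_sensitive_reward(predicted_order: list[int], relevant_ids: set[int]) -> float:
    gains = [1.0 if sid in relevant_ids else 0.0 for sid in predicted_order]
    idcg = dcg(sorted(gains, reverse=True))  # best achievable ordering
    return dcg(gains) / idcg if idcg > 0 else 0.0

# Relevant sessions ranked first score higher than the same set ranked last.
print(rank_sensitive_reward([3, 2, 0, 7], {3, 2}))  # → 1.0
print(rank_sensitive_reward([0, 7, 3, 2], {3, 2}))  # strictly below 1.0
```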
Training data is bootstrapped from the MemSifter embedding pipeline on multiple conversational memory benchmarks (LoCoMo, LongMemEval, PersonaMem, etc.), using NDCG-based anchor sampling to construct RL training trajectories.
## Citation
```bibtex
@misc{memsifter,
  title={MemSifter: Offloading LLM Memory Retrieval via Outcome-Driven Proxy Reasoning},
  author={Jiejun Tan and Zhicheng Dou and Liancheng Zhang and Yuyang Hu and Yiruo Cheng and Ji-Rong Wen},
  year={2026},
  eprint={2603.03379},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2603.03379},
}
```