alireza7/GrepSeek-Qwen3.5-9B-GRPO
Text Generation • 9B • Updated • 120
Direct Corpus Interaction search agent: it searches a raw text corpus with shell commands instead of a prebuilt index. Trained with cold-start SFT followed by GRPO.
Note: Final GrepSeek model (cold-start SFT followed by GRPO).
Note: Cold-start SFT policy (the initialization for RL).
Note: 10k cold-start SFT trajectories (5k from NQ and 5k from HotpotQA).
Note: Wikipedia corpus that the agent searches (21M passages; external).
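
As a rough illustration of what searching a raw corpus with shell commands and no index can look like, here is a minimal Python sketch that shells out to `grep` over a plain-text passage file. It is not GrepSeek's actual tool interface: the function names `grep_corpus` and `write_demo_corpus`, the one-passage-per-line tab-separated layout, and the three demo passages are all hypothetical, and the sketch assumes a `grep` binary is available on the PATH.

```python
import subprocess
import tempfile


def write_demo_corpus() -> str:
    """Write a tiny stand-in corpus (hypothetical layout: id<TAB>text per line)."""
    passages = [
        "p1\tThe Eiffel Tower is a wrought-iron lattice tower in Paris.",
        "p2\tMount Everest is the highest mountain above sea level.",
        "p3\tThe Nile is a major north-flowing river in Africa.",
    ]
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".tsv", delete=False, encoding="utf-8"
    ) as f:
        f.write("\n".join(passages) + "\n")
        return f.name


def grep_corpus(pattern: str, corpus_path: str, max_hits: int = 5) -> list[str]:
    """Return up to max_hits corpus lines containing pattern.

    Uses a case-insensitive, fixed-string grep, so the whole file is
    scanned on every call and no index is built or consulted.
    """
    result = subprocess.run(
        ["grep", "-i", "-F", "-e", pattern, corpus_path],
        capture_output=True,
        text=True,
    )
    # grep exits with 0 when lines match, 1 when none match, >1 on errors.
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.splitlines()[:max_hits]


if __name__ == "__main__":
    corpus = write_demo_corpus()
    for line in grep_corpus("eiffel", corpus):
        print(line)
```

In a setup like this, the agent's job would be to choose and refine the patterns it issues; the trade-off of skipping the index is that every query costs a full scan of the file.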