SearchLM - a Supreeth Collection

updated 4 days ago

NL2BM25: teaching Qwen2.5-3B to generate Tantivy boolean queries via SFT + GRPO. Covers reward hacking (GRPO v1) and the shaped-reward fix (GRPO v2).

Supreeth/searchlm-nl2bm25-sft

Text Generation • 3B • Updated 4 days ago • 62

Note: SFT v1 warm-start (4,999 examples)

Supreeth/searchlm-nl2bm25-sft-v2

Text Generation • 3B • Updated 4 days ago • 47

Note: SFT v2, quality-filtered (1,751 examples with nDCG > 0)

Supreeth/searchlm-nl2bm25-grpo

Text Generation • 3B • Updated 4 days ago • 49

Note: GRPO v1, reward hacking / specification gaming

Supreeth/searchlm-nl2bm25-grpo-v2

Text Generation • 3B • Updated 4 days ago • 48

Note: GRPO v2, shaped reward, best retrieval scores
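
The SFT v2 note mentions a quality filter that keeps only examples with nDCG > 0. The collection does not say how that filter was implemented, so the following is only a minimal sketch of one way such a filter could look. The field names `retrieved` and `qrels`, the cutoff `k=10`, and the function names are all hypothetical and not taken from the author's code.

```python
import math
from typing import Dict, List


def dcg(gains: List[float]) -> float:
    """Discounted cumulative gain with the standard log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))


def ndcg_at_k(retrieved: List[str], qrels: Dict[str, float], k: int = 10) -> float:
    """nDCG@k for one query.

    retrieved: document ids in ranked order (hypothetical field).
    qrels: document id -> graded relevance (hypothetical field).
    """
    gains = [qrels.get(doc_id, 0.0) for doc_id in retrieved[:k]]
    ideal = sorted(qrels.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    # With no relevant documents the score is defined as 0 here.
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def filter_examples(examples: List[dict], k: int = 10) -> List[dict]:
    """Keep only examples whose generated query retrieved something relevant."""
    return [
        ex for ex in examples
        if ndcg_at_k(ex["retrieved"], ex["qrels"], k) > 0
    ]
```

Under these assumptions, an example whose query retrieves no relevant document scores 0 and is dropped, which is the kind of filtering that would shrink a training set in the way the note describes.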