anonym-ous committed on
Commit da74f71 · verified · 1 Parent(s): 5b104f6

add session eval artifacts (base-only, certified judge, ablations)

eval/bm25-anchor-tfilter.judged.json ADDED
eval/bm25-anchor.judged.json ADDED
eval/bm25-rag-qwen3-matched.json ADDED
eval/bm25-rag-qwen3.judged-certified.json ADDED
eval/multitq-baseonly.json ADDED
eval/multitq-llama-baseonly.json ADDED
eval/multitq-llama-seed1337.json ADDED
eval/multitq-llama-seed7.json ADDED
eval/multitq-mistral-baseonly.json ADDED
eval/multitq-mistral-seed1337.json ADDED
eval/multitq-mistral-seed7.json ADDED
eval/v2-grpo-10000.judged-certified.json ADDED
eval/v3-baseonly-fullret.extracted.json ADDED
eval/v3-baseonly-fullret.json ADDED
eval/v3-baseonly-fullret.judged.json ADDED
eval/v3-csvorder-seed42.extracted.json ADDED
eval/v3-csvorder-seed42.json ADDED
eval/v3-sft-baseline.extracted-triples.json ADDED
eval/v3-sft-baseline.judged-certified.json ADDED
eval/v3-sft-seed1337.judged-certified.json ADDED
eval/v3-sft-seed7.judged-certified.json ADDED
eval/v3-sft-terse-lever-seed1337.judged-certified.json ADDED
eval/v3-sft-terse-lever-seed7.judged-certified.json ADDED
eval/v3-sft-terse-lever.judged-certified.json ADDED
eval/v3-trunc-bm25-seed1337.extracted.json ADDED
eval/v3-trunc-bm25-seed1337.json ADDED
eval/v3-trunc-bm25-seed42.extracted.json ADDED
eval/v3-trunc-bm25-seed42.json ADDED
eval/v3-trunc-bm25-seed7.extracted.json ADDED
eval/v3-trunc-bm25-seed7.json ADDED