quinn
jwhe
AI & ML interests
None yet
Recent Activity
new activity about 1 month ago
harborframework/parity-experiments: [Parity] CL-bench: codex/gpt-5.1 vs original pipeline (50 tasks, 3 trials)
authored a paper 3 months ago
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks