eval: submit GPQA Diamond result via PR (community badge) (#1) d050408 terry-u committed 12 days ago
chore: bump GPQA eval value to 0.01 (placeholder, not measured) e42a3cf verified terry-u committed 12 days ago
feat: add .eval_results/gpqa.yaml (GPQA Diamond, value 0 / not evaluated) 562b037 verified terry-u committed 12 days ago
docs: show GPQA Diamond leaderboard (our model: 0 / not evaluated; other models: measured values from their sources) 5c5bb35 verified terry-u committed 12 days ago
docs: show GPQA Diamond leaderboard (our model: 0 / not evaluated; other models: measured values from their sources) c3f5e55 verified terry-u committed 12 days ago