Spaces: Nomearod/agentbench
agentbench/scripts/_dev (branch: main, 34.7 kB)
4 contributors
History: 4 commits
Latest commit: 504a35c by Nomearod, 1 day ago: calibrate(jury): 4A characterizes v1.1.1 residual as model-class-specific
generate_kappa_fixtures.py (2.94 kB, 3 days ago): test(calibration): sklearn-parity fixtures + cross-check CI test
probe_3a_paraphrase_recency.py (5.58 kB, 1 day ago): calibrate(jury): v1.1+v1.1.1 β fix weighting bugs; recency-position paraphrase clause
probe_4a_gpt4o_full.py (6.11 kB, 1 day ago): calibrate(jury): 4A characterizes v1.1.1 residual as model-class-specific
reaggregate_jury_v1_1.py (11.6 kB, 1 day ago): calibrate(jury): v1.1+v1.1.1 β fix weighting bugs; recency-position paraphrase clause
rerun_completeness_v1_1_1.py (4.62 kB, 1 day ago): calibrate(jury): v1.1+v1.1.1 β fix weighting bugs; recency-position paraphrase clause
sample_calibration_v1.py (3.89 kB, 3 days ago): feat(calibration): 30-item stratified calibration_v1 sample