Spaces: Nomearod/agentbench
Branch: main
Path: agentbench/scripts/_dev (34.7 kB)
4 contributors
History: 4 commits
Latest commit: 504a35c by Nomearod, about 2 months ago
calibrate(jury): 4A characterizes v1.1.1 residual as model-class-specific
File                              Size     Last commit                                                                            Updated
generate_kappa_fixtures.py        2.94 kB  test(calibration): sklearn-parity fixtures + cross-check CI test                       about 2 months ago
probe_3a_paraphrase_recency.py    5.58 kB  calibrate(jury): v1.1+v1.1.1 β fix weighting bugs; recency-position paraphrase clause  about 2 months ago
probe_4a_gpt4o_full.py            6.11 kB  calibrate(jury): 4A characterizes v1.1.1 residual as model-class-specific              about 2 months ago
reaggregate_jury_v1_1.py          11.6 kB  calibrate(jury): v1.1+v1.1.1 β fix weighting bugs; recency-position paraphrase clause  about 2 months ago
rerun_completeness_v1_1_1.py      4.62 kB  calibrate(jury): v1.1+v1.1.1 β fix weighting bugs; recency-position paraphrase clause  about 2 months ago
sample_calibration_v1.py          3.89 kB  feat(calibration): 30-item stratified calibration_v1 sample                            about 2 months ago