Spaces: Nomearod / agentbench (Running)

agentbench / scripts / _dev
34.7 kB
  • 4 contributors
History: 4 commits
Nomearod
calibrate(jury): 4A characterizes v1.1.1 residual as model-class-specific
504a35c 1 day ago
  • generate_kappa_fixtures.py
    2.94 kB
    test(calibration): sklearn-parity fixtures + cross-check CI test 3 days ago
  • probe_3a_paraphrase_recency.py
    5.58 kB
    calibrate(jury): v1.1+v1.1.1 - fix weighting bugs; recency-position paraphrase clause 1 day ago
  • probe_4a_gpt4o_full.py
    6.11 kB
    calibrate(jury): 4A characterizes v1.1.1 residual as model-class-specific 1 day ago
  • reaggregate_jury_v1_1.py
    11.6 kB
    calibrate(jury): v1.1+v1.1.1 - fix weighting bugs; recency-position paraphrase clause 1 day ago
  • rerun_completeness_v1_1_1.py
    4.62 kB
    calibrate(jury): v1.1+v1.1.1 - fix weighting bugs; recency-position paraphrase clause 1 day ago
  • sample_calibration_v1.py
    3.89 kB
    feat(calibration): 30-item stratified calibration_v1 sample 3 days ago