Spaces: TheUnicat/AgentClassroom (Running)

AgentClassroom/environments/teachingbench/teachingbench
  • 1 contributor
History: 5 commits
TheUnicat
fix: re-judge experiment rollouts after credit-exhaustion; retry logic in batched + single-criterion judge
8410720 verified 13 days ago
  • grader
    fix: re-judge experiment rollouts after credit-exhaustion; retry logic in batched + single-criterion judge 13 days ago
  • student
    deploy 15 days ago
  • tasks
    restore spivak PDF + 5 spivak tasks 15 days ago
  • tools
    deploy 15 days ago
  • __init__.py
    79 Bytes
    deploy 15 days ago
  • _reliability_helpers.py
    4.03 kB
    deploy 15 days ago
  • batch_judge.py
    12.9 kB
    deploy 15 days ago
  • batch_judge_recover.py
    5.76 kB
    deploy 15 days ago
  • batch_rollout.py
    8.36 kB
    deploy 15 days ago
  • batched_rollout.py
    23.1 kB
    deploy 15 days ago
  • client_utils.py
    1.99 kB
    deploy 15 days ago
  • dataset.py
    11.8 kB
    deploy 15 days ago
  • env.py
    14.1 kB
    feat: V₀=0.5 baseline + det ceilings, prompt-pill UI, 60 experiment rollouts 13 days ago
  • inter_task_check.py
    6.94 kB
    deploy 15 days ago
  • judge_reliability_check.py
    9.88 kB
    deploy 15 days ago
  • multimodal.py
    5.05 kB
    deploy 15 days ago
  • prompts.py
    18.9 kB
    deploy 15 days ago
  • smoke_test.py
    7.94 kB
    deploy 15 days ago
  • task_reliability_check.py
    8 kB
    deploy 15 days ago
  • teacher_prompts.py
    9.48 kB
    feat: V₀=0.5 baseline + det ceilings, prompt-pill UI, 60 experiment rollouts 13 days ago