Spaces: shutdowngym / RedButton-v2

RedButton-v2 / tests
57.2 kB
  • 2 contributors
History: 3 commits
Latest commit: Hugging557444, "V2-C: three-agent environment + client + demo" (449736a), 2 months ago
  • __init__.py
    0 Bytes
    V2-A: bootstrap RedButton v2 with v1 verbatim reusables 2 months ago
  • test_audit_v2.py
    8.54 kB
    V2-B: core v2 modules + tests (auditor, deception, sandbagging) 2 months ago
  • test_auditor.py
    3.36 kB
    V2-B: core v2 modules + tests (auditor, deception, sandbagging) 2 months ago
  • test_environment_v2.py
    11.1 kB
    V2-C: three-agent environment + client + demo 2 months ago
  • test_failure_modes.py
    6.34 kB
    V2-C: three-agent environment + client + demo 2 months ago
  • test_models_v2.py
    1.46 kB
    V2-B: core v2 modules + tests (auditor, deception, sandbagging) 2 months ago
  • test_operator.py
    4.98 kB
    V2-B: core v2 modules + tests (auditor, deception, sandbagging) 2 months ago
  • test_problems.py
    2.64 kB
    V2-A: bootstrap RedButton v2 with v1 verbatim reusables 2 months ago
  • test_restricted_python.py
    3.72 kB
    V2-A: bootstrap RedButton v2 with v1 verbatim reusables 2 months ago
  • test_rubrics_v2.py
    6.25 kB
    V2-B: core v2 modules + tests (auditor, deception, sandbagging) 2 months ago
  • test_sandbox.py
    4.56 kB
    V2-A: bootstrap RedButton v2 with v1 verbatim reusables 2 months ago
  • test_tiers_v2.py
    2.08 kB
    V2-B: core v2 modules + tests (auditor, deception, sandbagging) 2 months ago
  • test_timer.py
    2.24 kB
    V2-A: bootstrap RedButton v2 with v1 verbatim reusables 2 months ago