RedButton / tests

Commit History

audit-r2: structural no-op check, coerce args at boundary, latest-wins submit
a530d68

Arun-Sanjay committed on

audit: apply Codex pre-Phase-6 fixes (problem_id validation, no-op exact match, arg type coercion, forced-question classification)
ca363bd

Arun-Sanjay committed on

phase-5 cleanup: episode_id in metadata, openenv push doc, README install line, psutil dev dep
d2537d2

Arun-Sanjay committed on

phase-5: deploy to HF Space, burst+sustained concurrency tests, leaderboard skeleton
8346aac

Arun-Sanjay committed on

phase-4: ShutdownGymClient with EnvClient hooks + integration tests
d104b04

Arun-Sanjay committed on

phase-3: add operator_qa_log regression test, rename misleading restrict-tools test
0807f65

Arun-Sanjay committed on

phase-3: ShutdownGymEnvironment + server stack; fix audit.py to preserve scalar types in sanitize_args
46f9c9e

Arun-Sanjay committed on

phase-2: operator (train+strict policies), tiers, problems, rubrics + tests
4334698

Arun-Sanjay committed on

phase-1: add audit.py test coverage (script_corruption regression guard)
832f7ab

Arun-Sanjay committed on

phase-1: implement primitives (models, sandbox, restricted_python, audit, timer) + tests
b1603b9

Arun-Sanjay committed on

Initial scaffold: CLAUDE.md, .claude/ structure, language scaffold, pre-commit hook
2f769c0

Arun-Sanjay committed on