Commit History

Fix: Add missing logger import in environment.py
6b7794e

Somuai12 committed on

Restructure README to required format: overview, spaces, tasks, setup, baseline
f2195b2

Somuai12 committed on

Fix: clamp scores to strict (0.001, 0.999) - validator rejects exact 0 and 1
95a7dc0

Somuai12 committed on

Add smoke & exploit test suite - 27/27 pass
e4f6b1d

Somuai12 committed on

Audit fixes: tests/ dir, clean imports, reactive corpus, README polish
70f8688

Somuai12 committed on

Add multi-episode verification script
7660535

Somuai12 committed on

Add ICL terminal verification script - all 3 tasks pass
89fc53c

Somuai12 committed on

Remove binary PNG for HF Spaces compatibility
022d875

Somuai12 committed on

Staff-Level Upgrade: Segmented Evaluation, Noise Filtering, and Task Hardening
4553b37

Somuai12 committed on

Implement profound exploit hardening (InstructionGuard, DensityCheck, LogicalAlignment, Step-Locking)
a9f749a

Somuai12 committed on

Update docs and reward progression plot
28e7c64

Somuai12 committed on

Apply bug fixes over Grader logic per evaluation guidelines
147cdc4

Somuai12 committed on

Enhance: Upgrade test suite to professional simulation showing clear reward shaping
5453275

Somuai12 committed on

Fix grading keys mismatch: allow actual dataset metrics to be graded
184bef3

Somuai12 committed on

Harden: skip wildcard models, make LLM errors non-fatal per step
b4f91f0

Somuai12 committed on

Fix model discovery: skip wildcard '*' model IDs from LiteLLM proxy
9c3ced0

Somuai12 committed on

Fix Gradio dashboard hang: restore module-level mounting (required for queue/WebSocket)
9e34c41

Somuai12 committed on

Fix MODEL_NAME=None + Fix Gradio dashboard slowness (remove auto-reset on tab/radio)
8eede32

Somuai12 committed on

Fix MODEL_NAME=None: auto-discover from proxy /models endpoint, fallback to gpt-4o-mini
79fb14b

Somuai12 committed on

Final Submission: Aligned ports (8000), synchronized README, and purged workspace logs/caches
dd5366d

Somuai12 committed on

Critical Fix: Align internal port to 8000 to satisfy OpenEnv library requirements
47a298a

Somuai12 committed on

Compliance Fix: Resolve setup timeout with lazy Gradio and extended 120s wait
6a19dc6

Somuai12 committed on

Fix proxy test: exit with 1 on API failure so validator sees the error; fallback to HF_TOKEN if API_KEY is empty
899c12a

Somuai12 committed on

Compliance Hardening: Remove silent fallbacks to force proxy usage
292424c

Somuai12 committed on

Final Polish: Task-aware fallbacks and surgical refinement
89d39f7

Somuai12 committed on

Compliance alignment: satisfy strict /health checker
c8aa313

Somuai12 committed on

Compliance fix: strictly use API_KEY and API_BASE_URL to avoid proxy bypass
09a9c72

Somuai12 committed on

Allow pip to resolve websockets by relaxing gradio and uvicorn pins
5abef36

Somuai12 committed on

Final HF fix: upgrade openenv-core 0.2.3 and pin huggingface_hub
82a3c1b

Somuai12 committed on

Fix HF Runtime error: pin huggingface_hub<0.26.0
7b7b896

Somuai12 committed on

Final fix for Docker registry failures: use stable python 3.12 and pin dependencies
7c9ac02

Somuai12 committed on

Fix structured output: ensure logging always runs and format matches validator
9cdb062

Somuai12 committed on

Fix validator pipeline: python 3.12, grader POST, websockets
b978fbd

Somuai12 committed on

Fix Docker build: use python:3.11-slim-bookworm for stable registry resolution
75c1656

Somuai12 committed on

Fix inference.py: async OpenEnv pattern, from_docker_image, proper error handling
4c68ece

Somuai12 committed on

Update strategic progression chart for 0.9+ baseline metrics
29f77f6

Somuai12 committed on

Detail grading rewards and penalties in README
74e5e1d

Somuai12 committed on

Remove emojis from README and track reward_progression image
82f6517

Somuai12 committed on

Add detailed explanations for Easy, Medium, and Hard tasks
1ad2a1f

Somuai12 committed on

Final Expert Tier (0.9+) Candidate - Groq Baseline Verified
511f04a

Somuai12 committed on

fix: ensure reward evolution chart has (0,0) baseline for judge visibility
8085f66

Somuai12 committed on

fix: restore missing policy_md definition in format_obs
a5522cd

Somuai12 committed on

fix: resolve Gradio 6.x LinePlot TypeError and constructor warnings
199d538

Somuai12 committed on

feat: add reward evolution chart to Gradio dashboard
d78cfdc

Somuai12 committed on

final: comprehensive 0.9+ strategic agent upgrades and infrastructure refactor
933baa6

Somuai12 committed on

deploy: remove binary from git history for HF compatibility, use GitHub raw URL instead
c5ca7a0

Somuai12 committed on

final: polish README (RLVR specs) and cleanup scratch scripts
ef5751d

Somuai12 committed on

hackathon: final submission candidate (removes binary image for HF compatibility)
6aa8acb

Somuai12 committed on

feat: Absolutist deployment of professional Judges Console
706dca3

Somuai12 committed on

build: Trigger forced rebuild for Port-7860 alignment
7470e60

Somuai12 committed on
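
Note on commit 95a7dc0 (score clamping): the log only states that scores are clamped to (0.001, 0.999) because the validator rejects exact 0 and 1. The repository's actual code is not shown here, so the following is a minimal sketch of what such a clamp might look like. The function name `clamp_score` and its parameters are hypothetical, not taken from the project.

```python
def clamp_score(score: float, low: float = 0.001, high: float = 0.999) -> float:
    """Clamp a raw score into [low, high].

    Hypothetical helper: with the default bounds the result can never be
    exactly 0 or 1, which is the constraint named in the commit message.
    """
    return max(low, min(high, score))


if __name__ == "__main__":
    # Raw grader outputs at and beyond the edges, plus one in-range value.
    for raw in (0.0, 1.0, 0.5, -3.0, 7.2):
        print(raw, "->", clamp_score(raw))
```

Clamping to the closed range [0.001, 0.999] keeps every result strictly inside (0, 1) while leaving in-range scores unchanged.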