luciferai-devil/devil-policyevolverenv

Commit History

Fix: Add missing logger import in environment.py

6b7794e

Somuai12 committed on Apr 12

Restructure README to required format: overview, spaces, tasks, setup, baseline

f2195b2

Somuai12 committed on Apr 12

Fix: clamp scores to strict (0.001, 0.999) — validator rejects exact 0 and 1

95a7dc0

Somuai12 committed on Apr 10

Add smoke & exploit test suite — 27/27 pass

e4f6b1d

Somuai12 committed on Apr 10

Audit fixes: tests/ dir, clean imports, reactive corpus, README polish

70f8688

Somuai12 committed on Apr 10

Add multi-episode verification script

7660535

Somuai12 committed on Apr 10

Add ICL terminal verification script — all 3 tasks pass

89fc53c

Somuai12 committed on Apr 10

Remove binary PNG for HF Spaces compatibility

022d875

Somuai12 committed on Apr 10

Staff-Level Upgrade: Segmented Evaluation, Noise Filtering, and Task Hardening

4553b37

Somuai12 committed on Apr 10

Implement profound exploit hardening (InstructionGuard, DensityCheck, LogicalAlignment, Step-Locking)

a9f749a

Somuai12 committed on Apr 9

Update docs and reward progression plot

28e7c64

Somuai12 committed on Apr 8

Apply bug fixes over Grader logic per evaluation guidelines

147cdc4

Somuai12 committed on Apr 8

Enhance: Upgrade test suite to professional simulation showing clear reward shaping

5453275

Somuai12 committed on Apr 8

Fix grading keys mismatch: allow actual dataset metrics to be graded

184bef3

Somuai12 committed on Apr 8

Harden: skip wildcard models, make LLM errors non-fatal per step

b4f91f0

Somuai12 committed on Apr 7

Fix model discovery: skip wildcard '*' model IDs from LiteLLM proxy

9c3ced0

Somuai12 committed on Apr 7

Fix Gradio dashboard hang: restore module-level mounting (required for queue/WebSocket)

9e34c41

Somuai12 committed on Apr 7

Fix MODEL_NAME=None + Fix Gradio dashboard slowness (remove auto-reset on tab/radio)

8eede32

Somuai12 committed on Apr 7

Fix MODEL_NAME=None: auto-discover from proxy /models endpoint, fallback to gpt-4o-mini

79fb14b

Somuai12 committed on Apr 7

Final Submission: Aligned ports (8000), synchronized README, and purged workspace logs/caches

dd5366d

Somuai12 committed on Apr 7

Critical Fix: Align internal port to 8000 to satisfy OpenEnv library requirements

47a298a

Somuai12 committed on Apr 7

Compliance Fix: Resolve setup timeout with lazy Gradio and extended 120s wait

6a19dc6

Somuai12 committed on Apr 7

Fix proxy test: exit with 1 on API failure so validator sees the error; fallback to HF_TOKEN if API_KEY is empty

899c12a

Somuai12 committed on Apr 7

Compliance Hardening: Remove silent fallbacks to force proxy usage

292424c

Somuai12 committed on Apr 7

Final Polish: Task-aware fallbacks and surgical refinement

89d39f7

Somuai12 committed on Apr 7

Compliance alignment: satisfy strict /health checker

c8aa313

Somuai12 committed on Apr 7

Compliance fix: strictly use API_KEY and API_BASE_URL to avoid proxy bypass

09a9c72

Somuai12 committed on Apr 7

Allow pip to resolve websockets by relaxing gradio and uvicorn pins

5abef36

Somuai12 committed on Apr 7

Final HF fix: upgrade openenv-core 0.2.3 and pin huggingface_hub

82a3c1b

Somuai12 committed on Apr 7

Fix HF Runtime error: pin huggingface_hub<0.26.0

7b7b896

Somuai12 committed on Apr 7

Final fix for Docker registry failures: use stable python 3.12 and pin dependencies

7c9ac02

Somuai12 committed on Apr 7

Fix structured output: ensure logging always runs and format matches validator

9cdb062

Somuai12 committed on Apr 7

Fix validator pipeline: python 3.12, grader POST, websockets

b978fbd

Somuai12 committed on Apr 7

Fix Docker build: use python:3.11-slim-bookworm for stable registry resolution

75c1656

Somuai12 committed on Apr 7

Fix inference.py: async OpenEnv pattern, from_docker_image, proper error handling

4c68ece

Somuai12 committed on Apr 7

Update strategic progression chart for 0.9+ baseline metrics

29f77f6

Somuai12 committed on Apr 6

Detail grading rewards and penalties in README

74e5e1d

Somuai12 committed on Apr 6

Remove emojis from README and track reward_progression image

82f6517

Somuai12 committed on Apr 6

Add detailed explanations for Easy, Medium, and Hard tasks

1ad2a1f

Somuai12 committed on Apr 6

Final Expert Tier (0.9+) Candidate — Groq Baseline Verified

511f04a

Somuai12 committed on Apr 6

fix: ensure reward evolution chart has (0,0) baseline for judge visibility

8085f66

Somuai12 committed on Apr 5

fix: restore missing policy_md definition in format_obs

a5522cd

Somuai12 committed on Apr 5

fix: resolve Gradio 6.x LinePlot TypeError and constructor warnings

199d538

Somuai12 committed on Apr 5

feat: add reward evolution chart to Gradio dashboard

d78cfdc

Somuai12 committed on Apr 5

final: comprehensive 0.9+ strategic agent upgrades and infrastructure refactor

933baa6

Somuai12 committed on Apr 5

deploy: remove binary from git history for HF compatibility, use GitHub raw URL instead

c5ca7a0

Somuai12 committed on Apr 3

final: polish README (RLVR specs) and cleanup scratch scripts

ef5751d

Somuai12 committed on Apr 3

hackathon: final submission candidate (removes binary image for HF compatibility)

6aa8acb

Somuai12 committed on Apr 3

feat: Absolutist deployment of professional Judges Console

706dca3

Somuai12 committed on Mar 31

build: Trigger forced rebuild for Port-7860 alignment

7470e60

Somuai12 committed on Mar 31
