┌───────────────────────────────────────┬─────────────┬──────────────────────────────────────────┐
│ Task                                  │ Difficulty  │ Why it matters                           │
├───────────────────────────────────────┼─────────────┼──────────────────────────────────────────┤
│ Depthwise convolution PE              │ medium      │ Matches MobileNet/EfficientNet workloads │
├───────────────────────────────────────┼─────────────┼──────────────────────────────────────────┤
│ Attention score unit (QK dot product) │ medium-hard │ Transformer relevance                    │
├───────────────────────────────────────┼─────────────┼──────────────────────────────────────────┤
│ ReLU + quantization clip unit         │ easy        │ Complete the "LLM inference stack" story │
├───────────────────────────────────────┼─────────────┼──────────────────────────────────────────┤
│ Ring buffer / circular FIFO           │ easy        │ Different from AXI, good for KV cache    │
├───────────────────────────────────────┼─────────────┼──────────────────────────────────────────┤
│ Barrel shifter                        │ easy        │ Classic, good baseline task              │
├───────────────────────────────────────┼─────────────┼──────────────────────────────────────────┤
│ Registered memory (SRAM model)        │ medium      │ Register file for inference engines      │
├───────────────────────────────────────┼─────────────┼──────────────────────────────────────────┤
│ FP16 adder                            │ hard        │ Real hardware challenge                  │
└───────────────────────────────────────┴─────────────┴──────────────────────────────────────────┘

Why this wins: You can pitch "VeriRL covers the full ML accelerator primitive stack from data movement to compute."

Add Formal Verification via SymbiYosys (Highest Novelty, Hard)

Currently you test with simulation (coverage-limited). Add a run_formal action using SymbiYosys:

Write SVA assertions in the spec
Agent gets told: "3 of 5 properties proven, 2 with counterexamples found at cycle N"
This is something no other OpenEnv environment does

This would be a clear differentiator and technically impressive to judges.
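
As a rough sketch of what a run_formal action might need: one helper that renders a SymbiYosys config and one that turns per-property results into the feedback string described above. The `PropertyResult` type, both function names, and the idea that results arrive as a list are assumptions for illustration, not part of VeriRL; parsing the actual SymbiYosys log is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PropertyResult:
    """Hypothetical record for one SVA property after a formal run."""
    name: str
    proven: bool
    fail_cycle: Optional[int] = None  # cycle of the counterexample, if any


def render_sby(top: str, sources: list[str], depth: int = 20) -> str:
    """Render a minimal .sby config that asks SymbiYosys to prove assertions."""
    reads = "\n".join(f"read -formal {src}" for src in sources)
    files = "\n".join(sources)
    return (
        "[options]\n"
        "mode prove\n"
        f"depth {depth}\n\n"
        "[engines]\n"
        "smtbmc\n\n"
        "[script]\n"
        f"{reads}\n"
        f"prep -top {top}\n\n"
        "[files]\n"
        f"{files}\n"
    )


def summarize(results: list[PropertyResult]) -> str:
    """Build the agent-facing summary of proven and failed properties."""
    proven = sum(r.proven for r in results)
    failed = [r for r in results if not r.proven]
    msg = f"{proven} of {len(results)} properties proven"
    if failed:
        details = ", ".join(f"{r.name} at cycle {r.fail_cycle}" for r in failed)
        msg += f"; {len(failed)} with counterexamples: {details}"
    return msg
```

The environment would write the rendered config next to the agent's sources, run `sby` on it, and fill `PropertyResult` from the output; that glue depends on how the spec names its properties.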

Richer Reward Signal (Medium Effort, High RL Training Impact)

Current reward is coarse. Ideas:

Structured error feedback for reward shaping:

Parse iverilog errors to return {line: N, error: "undeclared wire foo"} — agents can localize bugs faster
Return which specific test vectors fail, not just pass count
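
A minimal sketch of the error-parsing idea, assuming iverilog diagnostics follow the usual `file:line: error: message` shape (with bare `file:line: syntax error` lines for parse failures). The function name and the returned dict keys are illustrative, not an existing VeriRL API.

```python
import re

# Matches "file:line: error: msg", "file:line: warning: msg", and "file:line: msg".
_DIAG = re.compile(
    r"^(?P<file>[^:\n]+):(?P<line>\d+): (?:(?P<kind>error|warning): )?(?P<msg>.+)$"
)


def parse_iverilog(stderr: str) -> list[dict]:
    """Turn iverilog stderr into a list of structured diagnostics."""
    diagnostics = []
    for raw in stderr.splitlines():
        match = _DIAG.match(raw.strip())
        if match is None:
            continue  # summary lines such as "2 error(s) during elaboration."
        diagnostics.append(
            {
                "file": match["file"],
                "line": int(match["line"]),
                # Lines without an explicit tag (e.g. "syntax error") count as errors.
                "kind": match["kind"] or "error",
                "error": match["msg"],
            }
        )
    return diagnostics
```

Returning the line number as an integer lets the observation point the agent at the exact line instead of echoing raw compiler output.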

Add a power dimension:

Estimate switching activity (for example, toggle counts from the simulation VCD combined with the Yosys cell report) → add a power component to score_breakdown
Weight it at 5-10% — doesn't need to be dominant, just signals good RTL style

Delta-aware area scoring:

Currently area score is ref_cells / agent_cells — agents that cheat with blackbox modules can game this
Gate area score on sim_ratio >= 0.8 (you already do this, but tighten it)
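
The gating rule can be sketched as below. The 0.8 threshold and the ref_cells / agent_cells ratio come from the text above; capping the score at 1.0 and scoring zero-cell designs as 0 are assumptions added here to close the blackbox loophole, and the function name is illustrative.

```python
def area_score(ref_cells: int, agent_cells: int, sim_ratio: float, gate: float = 0.8) -> float:
    """Area score that only counts once the design is mostly functionally correct."""
    if sim_ratio < gate:
        return 0.0  # no area credit for designs that fail too many test vectors
    if agent_cells <= 0:
        return 0.0  # an empty or blackboxed netlist must not look infinitely small
    # Assumed cap: beating the reference earns full credit, not more than 1.0.
    return min(1.0, ref_cells / agent_cells)
```

With the gate in place, a tiny but wrong design scores 0 on area rather than outscoring a correct one.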

Curriculum / Difficulty Progression (Medium Effort)

Add curriculum support to the environment:

    env.reset(difficulty="easy")   # only easy tasks
    env.reset(curriculum="full")   # all tasks, sampled by difficulty weight

Also add max_turns scaling: easy tasks get fewer turns as the agent improves
This is a feature RL trainers specifically want and would differentiate from toy environments
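
One way the sampling behind those two reset modes could look, as a sketch. The task registry, the weights, and the `sample_task` name are hypothetical; the difficulty assigned to mac_unit is a placeholder, since the text above does not state it.

```python
import random
from typing import Optional

# Hypothetical registry; the mac_unit difficulty is a placeholder.
TASKS = {"barrel_shifter": "easy", "mac_unit": "medium", "fp16_adder": "hard"}

# Hypothetical sampling weights per difficulty for the "full" curriculum.
WEIGHTS = {"easy": 0.5, "medium": 0.3, "hard": 0.2}


def sample_task(rng: random.Random, difficulty: Optional[str] = None) -> str:
    """Pick a task, restricted to one difficulty or weighted across all of them."""
    pool = [task for task, level in TASKS.items() if difficulty is None or level == difficulty]
    if not pool:
        raise ValueError(f"no tasks with difficulty {difficulty!r}")
    weights = [WEIGHTS[TASKS[task]] for task in pool]
    return rng.choices(pool, weights=weights, k=1)[0]
```

Passing an explicit `random.Random` keeps episode sampling reproducible across training runs; the weights could later be updated from the agent's recent scores.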

Actual Training Demo (Huge Impact for Presentation)

This is probably the single most impressive thing you could show: a training curve where an LLM agent improves at Verilog over episodes.

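Use GRPO or rejection sampling with a small open-weights model; with an API-only model such as GPT-4o or Claude, the weights cannot be updated, so fall back to a few-shot improvement curve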
Run 50-100 episodes on mac_unit
Plot: score vs episode number
Show a trajectory: bad Verilog → compiles → passes tests → passes all tests

Even a simple few-shot improvement curve proves "this environment actually trains agents." That's the entire pitch of OpenEnv.
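
For the GRPO option, the core idea is that each sampled completion is scored relative to the other completions for the same prompt. A minimal sketch of that group-relative advantage, assuming the rewards are the environment's episode scores; using the population standard deviation and this particular epsilon are choices made here, and the function name is illustrative.

```python
from statistics import mean, pstdev


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of rollouts for the same task prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every rollout gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Rollouts that beat their group average get a positive advantage and are reinforced; a group where every rollout scores the same contributes no learning signal, which is one reason partial-credit rewards matter here.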

Leaderboard Endpoint (Low Effort, High Showmanship)

Add a /leaderboard HTTP endpoint that returns aggregated stats:

    {
      "mac_unit": {"avg_score": 0.72, "perfect_runs": 14, "total_episodes": 87},
      "systolic_array": {"avg_score": 0.41, "perfect_runs": 2, "total_episodes": 43}
    }

Persist episode results to SQLite
Add a simple web UI or just show it in the demo
This makes the demo feel live and real
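
The persistence and aggregation steps could be sketched with the standard-library sqlite3 module as follows. The table layout, the function names, and treating a score of 1.0 as a perfect run are assumptions; the HTTP route itself is omitted because it depends on the server framework the environment already uses.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the episode store; the schema here is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS episodes (task TEXT NOT NULL, score REAL NOT NULL)"
    )
    return conn


def record(conn: sqlite3.Connection, task: str, score: float) -> None:
    """Persist the final score of one finished episode."""
    conn.execute("INSERT INTO episodes (task, score) VALUES (?, ?)", (task, score))
    conn.commit()


def leaderboard(conn: sqlite3.Connection) -> dict:
    """Aggregate per-task stats in the shape the endpoint would return."""
    rows = conn.execute(
        "SELECT task, AVG(score), SUM(score >= 1.0), COUNT(*) "
        "FROM episodes GROUP BY task"
    )
    return {
        task: {"avg_score": round(avg, 2), "perfect_runs": perfect, "total_episodes": total}
        for task, avg, perfect, total in rows
    }
```

The endpoint handler would then only need to call `leaderboard(conn)` and serialize the result as JSON.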

Multi-File / Hierarchical Design Support (High Effort, High Impact)

Real Verilog uses multiple modules. Add:

write_file(filename="pe.v", verilog_src=...) — let agents write to named files
run_compile compiles all files together
Tasks could require a top-level module instantiating sub-modules

This mirrors real EDA workflows and would make the systolic array task much more natural.
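
A sketch of how write_file and a multi-file run_compile might fit together, assuming each episode gets its own scratch directory. The `Workspace` class and `compile_command` method are illustrative names; the iverilog flags used (`-g2012`, `-s`, `-o`) are standard, and the command is only built here, not executed.

```python
import tempfile
from pathlib import Path


class Workspace:
    """Hypothetical per-episode directory holding the agent's Verilog files."""

    def __init__(self) -> None:
        self.root = Path(tempfile.mkdtemp(prefix="verirl_"))

    def write_file(self, filename: str, verilog_src: str) -> Path:
        """Store one source file, rejecting paths that escape the workspace."""
        name = Path(filename).name
        if name != filename or not name.endswith((".v", ".sv")):
            raise ValueError(f"invalid Verilog filename: {filename!r}")
        path = self.root / name
        path.write_text(verilog_src)
        return path

    def compile_command(self, top: str, out: str = "design.vvp") -> list[str]:
        """Build an iverilog command that compiles every stored file together."""
        sources = sorted(str(p) for p in self.root.iterdir() if p.suffix in (".v", ".sv"))
        return ["iverilog", "-g2012", "-s", top, "-o", str(self.root / out), *sources]
```

The environment would pass the returned list to its existing subprocess call, so a task can require a top-level module that instantiates sub-modules written in separate files.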

Prioritized Recommendation

If you have limited time before the finale, I'd do this in order:

More tasks (3-5 new ones) — most visible, shows breadth, 1-2 days of work
Training demo — show a curve, run it tonight, hardest to fake and most impressive
Leaderboard endpoint — few hours, looks great in a live demo
Richer sim feedback (failing test case details) — improves RL signal, shows you understand the problem deeply
Formal verification — if you have the time, this is the WOW moment

What aspect do you want to dig into first? I can help implement any of these.


❯ Let's add more tasks and bring the total up to 10; the ones you described before are good and powerful, so let's go with those. Let's also add formal verification, as that's the next frontier for model capability. Let's do an actual training demo: the environment will be hosted on Hugging Face, but let's use Modal Labs for their serverless GPUs to run some RL post-training/RLVR to round it off. Finally, let's add multi-file support. It's essentially a requirement for anything that's not a toy Verilog problem.