
Now I have a thorough picture. Let me give you a structured breakdown of high-impact ideas.


Understanding Your Current Strengths

You've built something genuinely differentiated:

  • Real EDA toolchain (iverilog + yosys) — not heuristic, not LLM-judged
  • Multi-dimensional scoring (compile, sim, timing, area)
  • Concurrent session support — scales for actual RL training
  • Clean OpenEnv compliance — judges can drop it in immediately

The weak points are: only 3 tasks, a thin reward signal, and no demonstration of the environment being used for actual RL training.


Brainstorm: What to Build for the Grand Finale

  1. Expand the Task Library (Highest Impact, Moderate Effort)

3 tasks is a proof-of-concept. 8-10 tasks is a platform. Ideas that stay in the AI-accelerator domain (cohesive narrative):

┌───────────────────────────────────────┬─────────────┬──────────────────────────────────────────┐
│ Task                                  │ Difficulty  │ Why it matters                           │
├───────────────────────────────────────┼─────────────┼──────────────────────────────────────────┤
│ Depthwise convolution PE              │ medium      │ Matches MobileNet/EfficientNet workloads │
├───────────────────────────────────────┼─────────────┼──────────────────────────────────────────┤
│ Attention score unit (QK dot product) │ medium-hard │ Transformer relevance                    │
├───────────────────────────────────────┼─────────────┼──────────────────────────────────────────┤
│ ReLU + quantization clip unit         │ easy        │ Complete the "LLM inference stack" story │
├───────────────────────────────────────┼─────────────┼──────────────────────────────────────────┤
│ Ring buffer / circular FIFO           │ easy        │ Different from AXI, good for KV cache    │
├───────────────────────────────────────┼─────────────┼──────────────────────────────────────────┤
│ Barrel shifter                        │ easy        │ Classic, good baseline task              │
├───────────────────────────────────────┼─────────────┼──────────────────────────────────────────┤
│ Registered memory (SRAM model)        │ medium      │ Register file for inference engines      │
├───────────────────────────────────────┼─────────────┼──────────────────────────────────────────┤
│ FP16 adder                            │ hard        │ Real hardware challenge                  │
└───────────────────────────────────────┴─────────────┴──────────────────────────────────────────┘

Why this wins: You can pitch "VeriRL covers the full ML accelerator primitive stack from data movement to compute."


  2. Add Formal Verification via SymbiYosys (Highest Novelty, Hard)

Currently you test with simulation (coverage-limited). Add a run_formal action using SymbiYosys:

  • Write SVA assertions in the spec
  • Agent gets told: "3 of 5 properties proven, 2 with counterexamples found at cycle N"
  • This is something no other OpenEnv environment does

This would be a clear differentiator and technically impressive to judges.
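
As a starting point for run_formal, here is a minimal sketch of generating a SymbiYosys job file and turning per-property results into the kind of feedback string described above. The helper names (render_sby, summarize_properties), the default depth, and the results dictionary shape are assumptions for illustration; how you extract per-property status from the SymbiYosys run is left open here.

```python
def render_sby(top: str, files: list[str], depth: int = 20) -> str:
    """Render a SymbiYosys .sby job file that attempts a proof of all assertions.

    Assumes every source file sits in the same directory as the .sby file.
    """
    lines = [
        "[options]",
        "mode prove",
        f"depth {depth}",
        "",
        "[engines]",
        "smtbmc",
        "",
        "[script]",
        *[f"read -formal {name}" for name in files],
        f"prep -top {top}",
        "",
        "[files]",
        *files,
    ]
    return "\n".join(lines) + "\n"


def summarize_properties(results: dict[str, str]) -> str:
    """Build agent-facing feedback from a hypothetical {property: status} map.

    Status values are assumed to be "proven" or "failed".
    """
    proven = sum(1 for status in results.values() if status == "proven")
    failed = sum(1 for status in results.values() if status == "failed")
    return f"{proven} of {len(results)} properties proven, {failed} with counterexamples"
```

The job file uses the standard [options], [engines], [script], and [files] sections; swapping "mode prove" for "mode bmc" would give a bounded check instead, which is cheaper and may be enough for the easy tasks.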


  3. Richer Reward Signal (Medium Effort, High RL Training Impact)

Current reward is coarse. Ideas:

Structured error feedback for reward shaping:

  • Parse iverilog errors to return {line: N, error: "undeclared wire foo"} — agents can localize bugs faster
  • Return which specific test vectors fail, not just pass count
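
A rough sketch of the error-parsing idea follows. It assumes iverilog diagnostics look like "file:line: error: message" or "file:line: syntax error", which is the common shape but worth checking against the version you ship; parse_iverilog is a hypothetical helper name, and warnings are deliberately dropped.

```python
import re

# Matches "<file>:<line>: [error|warning: ]<message>".
_DIAG = re.compile(
    r"^(?P<file>[^:\n]+):(?P<line>\d+):\s*(?:(?P<kind>error|warning):\s*)?(?P<msg>.+)$"
)


def parse_iverilog(stderr: str) -> list[dict]:
    """Extract structured errors from iverilog stderr text.

    Lines without a "file:line:" prefix (for example the trailing
    "N error(s) during elaboration." summary) are ignored.
    """
    errors = []
    for raw in stderr.splitlines():
        match = _DIAG.match(raw.strip())
        if match is None:
            continue
        if match.group("kind") == "warning":
            continue  # only errors are returned in this sketch
        errors.append(
            {
                "file": match.group("file"),
                "line": int(match.group("line")),
                "error": match.group("msg"),
            }
        )
    return errors
```

Returning a list of small dictionaries keeps the observation JSON-serializable, so it can go straight into the step response next to the raw compiler output.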

Add a power dimension:

  • Yosys can estimate switching activity → add a power component to score_breakdown
  • Weight it at 5-10% — doesn't need to be dominant, just signals good RTL style

Delta-aware area scoring:

  • Currently area score is ref_cells / agent_cells — agents that cheat with blackbox modules can game this
  • Gate area score on sim_ratio >= 0.8 (you already do this, but tighten it)
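
Here is one way the gated area score could look. The 0.8 gate and the ref_cells / agent_cells ratio come from the points above; the cap at 1.0 and the zero score for an empty design are my assumptions, and area_score is a hypothetical function name rather than your existing code.

```python
def area_score(ref_cells: int, agent_cells: int, sim_ratio: float, gate: float = 0.8) -> float:
    """Score area only for designs that are functionally close to correct.

    A design that blackboxes logic will usually fail simulation, so the gate
    stops it from collecting area credit for being small.
    """
    if sim_ratio < gate:
        return 0.0  # not correct enough for area to mean anything
    if agent_cells <= 0:
        return 0.0  # an empty netlist is treated as no credit (assumption)
    # Cap at 1.0 so beating the reference cannot dominate the total (assumption).
    return min(1.0, ref_cells / agent_cells)
```

Tightening the gate is then a one-parameter change, for example gate=0.95 for the easy tasks.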

  4. Curriculum / Difficulty Progression (Medium Effort)

Add curriculum support to the environment:

    env.reset(difficulty="easy")   # only easy tasks
    env.reset(curriculum="full")   # all tasks, sampled by difficulty weight

  • Also add max_turns scaling: easy tasks get fewer turns as the agent improves
  • This is a feature RL trainers specifically want and would differentiate from toy environments

  5. Actual Training Demo (Huge Impact for Presentation)

This is probably the single most impressive thing you could show: a training curve where an LLM agent improves at Verilog over episodes.

  • Use GRPO with a small open-weight model, or rejection sampling (an API model such as GPT-4o or Claude also works for a sampling-only baseline)
  • Run 50-100 episodes on mac_unit
  • Plot: score vs episode number
  • Show a trajectory: bad Verilog → compiles → passes tests → passes all tests

Even a simple few-shot improvement curve proves "this environment actually trains agents." That's the entire pitch of OpenEnv.
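
To make the two options concrete, here is a small sketch of the reward post-processing each one needs: group-relative advantages in the style of GRPO, and a threshold filter for rejection sampling. It assumes each prompt gets a group of sampled completions scored by the environment; the function names and the use of the population standard deviation are my choices, not a reference implementation.

```python
from statistics import mean, pstdev


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of completions for the same task.

    Each advantage is (reward - group mean) / (group std + eps), so a group
    where every completion scores the same contributes no learning signal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def rejection_filter(samples: list[str], rewards: list[float], threshold: float) -> list[str]:
    """Keep only completions whose environment score meets the threshold."""
    return [s for s, r in zip(samples, rewards) if r >= threshold]
```

For the plot, logging the mean group reward per episode is usually enough to show the score-vs-episode curve.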


  6. Leaderboard Endpoint (Low Effort, High Showmanship)

Add a /leaderboard HTTP endpoint that returns aggregated stats:

    {
      "mac_unit": {"avg_score": 0.72, "perfect_runs": 14, "total_episodes": 87},
      "systolic_array": {"avg_score": 0.41, "perfect_runs": 2, "total_episodes": 43}
    }

  • Persist episode results to SQLite
  • Add a simple web UI or just show it in the demo
  • This makes the demo feel live and real
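
The persistence and aggregation side could be as small as the sketch below, using Python's built-in sqlite3 module. The table layout, the helper names, and the definition of a perfect run as score >= 1.0 are assumptions; the endpoint handler would just return the dictionary from leaderboard().

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the episode store. The schema is a hypothetical minimum."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS episodes (task TEXT NOT NULL, score REAL NOT NULL)"
    )
    return conn


def record_episode(conn: sqlite3.Connection, task: str, score: float) -> None:
    """Persist one finished episode; the with-block commits the insert."""
    with conn:
        conn.execute("INSERT INTO episodes (task, score) VALUES (?, ?)", (task, score))


def leaderboard(conn: sqlite3.Connection) -> dict:
    """Aggregate per-task stats in the shape of the example response."""
    rows = conn.execute(
        "SELECT task, AVG(score), SUM(score >= 1.0), COUNT(*) "
        "FROM episodes GROUP BY task"
    )
    return {
        task: {
            "avg_score": round(avg, 2),
            "perfect_runs": int(perfect),  # assumes a perfect run means score >= 1.0
            "total_episodes": total,
        }
        for task, avg, perfect, total in rows
    }
```

With concurrent sessions writing results, a file-backed database and one connection per request is the simplest safe setup.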

  7. Multi-File / Hierarchical Design Support (High Effort, High Impact)

Real Verilog uses multiple modules. Add:

  • write_file(filename="pe.v", verilog_src=...) — let agents write to named files
  • run_compile compiles all files together
  • Tasks could require a top-level module instantiating sub-modules

This mirrors real EDA workflows and would make the systolic array task much more natural.
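
A minimal sketch of the per-session file store behind write_file and run_compile is below. The Workspace class, its method names, and the filename rules are hypothetical; the iverilog flags used (-g2012, -s for the top module, -o for the output) are standard ones, but the exact invocation should match what your environment already runs.

```python
from pathlib import Path, PurePosixPath


class Workspace:
    """Holds the named Verilog files an agent has written during one episode."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}

    def write_file(self, filename: str, verilog_src: str) -> None:
        """Store or overwrite a source file, rejecting paths that could escape the sandbox."""
        path = PurePosixPath(filename)
        if path.is_absolute() or ".." in path.parts or path.suffix not in {".v", ".sv"}:
            raise ValueError(f"invalid filename: {filename!r}")
        self.files[str(path)] = verilog_src

    def flush(self, root: Path) -> None:
        """Write all stored files under root so the toolchain can read them."""
        for name, src in self.files.items():
            target = root / name
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(src)

    def compile_command(self, top: str, out: str = "sim.vvp") -> list[str]:
        """Build an iverilog command that compiles every stored file together."""
        return ["iverilog", "-g2012", "-s", top, "-o", out, *sorted(self.files)]
```

Returning the command as a list makes it easy to pass to subprocess.run with the workspace directory as cwd, and sorting the filenames keeps compiles deterministic across runs.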


Prioritized Recommendation

If you have limited time before the finale, I'd do this in order:

  1. More tasks (3-5 new ones) — most visible, shows breadth, 1-2 days of work
  2. Training demo — show a curve, run it tonight, hardest to fake and most impressive
  3. Leaderboard endpoint — few hours, looks great in a live demo
  4. Richer sim feedback (failing test case details) — improves RL signal, shows you understand the problem deeply
  5. Formal verification — if you have the time, this is the WOW moment

What aspect do you want to dig into first? I can help implement any of these.


❯ Let's add more tasks and bring it up to 10; the ones you described before are good and powerful, so let's go with those. Let's also add formal verification, as that's the next frontier for model capability. Let's do an actual training demo: the environment will be hosted on Hugging Face, but let's use Modal Labs for their serverless GPUs to run some RL post-training/RLVR to round it off. Finally, let's add multi-file support. It's essentially a requirement for anything that's not a toy Verilog problem.