	Now I have a thorough picture. Let me give you a structured breakdown of high-impact ideas.

	---
	Understanding Your Current Strengths

	You've built something genuinely differentiated:
	- Real EDA toolchain (iverilog + yosys) — not heuristic, not LLM-judged
	- Multi-dimensional scoring (compile, sim, timing, area)
	- Concurrent session support — scales for actual RL training
	- Clean OpenEnv compliance — judges can drop it in immediately

	The weak points are: only 3 tasks, a thin reward signal, and no demonstration of the environment being used for actual RL
	training.

	---
	Brainstorm: What to Build for the Grand Finale

	1. Expand the Task Library (Highest Impact, Moderate Effort)

	3 tasks is a proof-of-concept. 8-10 tasks is a platform. Ideas that stay in the AI-accelerator domain (cohesive
	narrative):

	┌───────────────────────────────────────┬─────────────┬──────────────────────────────────────────┐
	│ Task │ Difficulty │ Why it matters │
	├───────────────────────────────────────┼─────────────┼──────────────────────────────────────────┤
	│ Depthwise convolution PE │ medium │ Matches MobileNet/EfficientNet workloads │
	├───────────────────────────────────────┼─────────────┼──────────────────────────────────────────┤
	│ Attention score unit (QK dot product) │ medium-hard │ Transformer relevance │
	├───────────────────────────────────────┼─────────────┼──────────────────────────────────────────┤
	│ ReLU + quantization clip unit │ easy │ Complete the "LLM inference stack" story │
	├───────────────────────────────────────┼─────────────┼──────────────────────────────────────────┤
	│ Ring buffer / circular FIFO │ easy │ Different from AXI, good for KV cache │
	├───────────────────────────────────────┼─────────────┼──────────────────────────────────────────┤
	│ Barrel shifter │ easy │ Classic, good baseline task │
	├───────────────────────────────────────┼─────────────┼──────────────────────────────────────────┤
	│ Registered memory (SRAM model) │ medium │ Register file for inference engines │
	├───────────────────────────────────────┼─────────────┼──────────────────────────────────────────┤
	│ FP16 adder │ hard │ Real hardware challenge │
	└───────────────────────────────────────┴─────────────┴──────────────────────────────────────────┘

	Why this wins: You can pitch "VeriRL covers the full ML accelerator primitive stack from data movement to compute."

	---
	2. Add Formal Verification via SymbiYosys (Highest Novelty, Hard)

	Currently you test with simulation (coverage-limited). Add a run_formal action using SymbiYosys:
	- Write SVA assertions in the spec
	- Agent gets told: "3 of 5 properties proven, 2 with counterexamples found at cycle N"
	- This is something no other OpenEnv environment does

	This would be a clear differentiator and technically impressive to judges.
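One way the run_formal action could drive SymbiYosys is by rendering a job file per task and invoking `sby` on it. The sketch below only builds the `.sby` text; the function name `render_sby`, its parameters, and the default depth are assumptions for illustration, not part of the existing environment. Note that the open-source Yosys frontend handles immediate assert/assume statements, while full concurrent SVA may need a different frontend, so the properties written into the spec may have to stay within that subset.

```python
def render_sby(top: str, files: list[str], mode: str = "bmc", depth: int = 20) -> str:
    """Render a minimal SymbiYosys .sby job file (hypothetical helper)."""
    if mode not in {"bmc", "prove", "cover"}:
        raise ValueError(f"unsupported mode: {mode}")
    # One `read -formal` line per source so formal constructs are enabled.
    read_cmds = "\n".join(f"read -formal {name}" for name in files)
    file_list = "\n".join(files)
    return (
        "[options]\n"
        f"mode {mode}\n"
        f"depth {depth}\n"
        "\n"
        "[engines]\n"
        "smtbmc\n"
        "\n"
        "[script]\n"
        f"{read_cmds}\n"
        f"prep -top {top}\n"
        "\n"
        "[files]\n"
        f"{file_list}\n"
    )


if __name__ == "__main__":
    # "mac_unit.sv" is an assumed filename for the existing mac_unit task.
    print(render_sby("mac_unit", ["mac_unit.sv"]))
```

The environment would then write this text to disk, run `sby -f <job>.sby`, and translate the per-property results into the "N of M proven" feedback described above.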

	---
	3. Richer Reward Signal (Medium Effort, High RL Training Impact)

	Current reward is coarse. Ideas:

	Structured error feedback for reward shaping:
	- Parse iverilog errors to return {line: N, error: "undeclared wire foo"} — agents can localize bugs faster
	- Return which specific test vectors fail, not just pass count
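A minimal sketch of the error-parsing idea, assuming diagnostics arrive on stderr in the usual `file:line: message` shape that iverilog prints. The function name, the output keys, and the sample text are illustrative assumptions; real compiler messages are worded differently and paths containing colons would need extra handling.

```python
import re

# Matches "file:line: rest-of-message"; summary lines without that shape are skipped.
_DIAG = re.compile(r"^(?P<file>[^:\s]+):(?P<line>\d+):\s*(?P<msg>.+)$")


def parse_iverilog_stderr(stderr: str) -> list[dict]:
    """Turn compiler stderr into a list of structured diagnostics (hypothetical helper)."""
    diagnostics = []
    for raw in stderr.splitlines():
        match = _DIAG.match(raw.strip())
        if match is None:
            continue  # e.g. a trailing "2 error(s) during elaboration." summary
        msg = match.group("msg")
        severity = "warning" if msg.startswith("warning:") else "error"
        for prefix in ("error:", "warning:"):
            if msg.startswith(prefix):
                msg = msg[len(prefix):].strip()
        diagnostics.append({
            "file": match.group("file"),
            "line": int(match.group("line")),
            "severity": severity,
            "error": msg,
        })
    return diagnostics


if __name__ == "__main__":
    # Illustrative text only, not verbatim tool output.
    sample = "design.v:12: error: undeclared wire foo\ndesign.v:20: syntax error"
    for diag in parse_iverilog_stderr(sample):
        print(diag)
```

Returning the list inside the observation lets an agent jump straight to the offending line instead of re-reading the whole log.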

	Add a power dimension:
	- Estimate switching activity, for example from simulation toggle counts combined with Yosys cell statistics → add a power component to score_breakdown
	- Weight it at 5-10% — doesn't need to be dominant, just signals good RTL style

	Delta-aware area scoring:
	- Currently area score is ref_cells / agent_cells — agents that cheat with blackbox modules can game this
	- Gate area score on sim_ratio >= 0.8 (you already do this, but tighten it)
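The gating rule above can be sketched as a small function. The 0.8 threshold and the ref_cells / agent_cells ratio come from the notes; the clipping at 1.0 and the zero score for an empty netlist are assumptions added here so that an all-blackbox design cannot earn area credit.

```python
def area_score(ref_cells: int, agent_cells: int, sim_ratio: float, gate: float = 0.8) -> float:
    """Area score gated on functional correctness (illustrative sketch)."""
    # No area credit unless enough simulation tests pass.
    if sim_ratio < gate:
        return 0.0
    # Assumption: a netlist with zero cells is treated as invalid, not as perfect.
    if agent_cells <= 0:
        return 0.0
    # Assumption: clip at 1.0 so beating the reference does not dominate the total.
    return min(1.0, ref_cells / agent_cells)


if __name__ == "__main__":
    print(area_score(ref_cells=100, agent_cells=200, sim_ratio=1.0))
    print(area_score(ref_cells=100, agent_cells=50, sim_ratio=0.5))
```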

	---
	4. Curriculum / Difficulty Progression (Medium Effort)

	Add curriculum support to the environment:
	env.reset(difficulty="easy") # only easy tasks
	env.reset(curriculum="full") # all tasks, sampled by difficulty weight
	- Also add max_turns scaling: easy tasks get fewer turns as the agent improves
	- This is a feature RL trainers specifically want and would differentiate from toy environments
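A rough sketch of how the reset arguments above might select a task. The task identifiers, the per-tier weights, and the `sample_task` helper are all hypothetical; difficulties are taken from the table earlier in this plan where available.

```python
import random
from typing import Optional

# Hypothetical task ids mapped to difficulty tiers.
TASKS = {
    "relu_clip": "easy",
    "ring_buffer": "easy",
    "barrel_shifter": "easy",
    "depthwise_conv_pe": "medium",
    "fp16_adder": "hard",
}

# Assumed per-task sampling weight for each tier (not normalized).
WEIGHTS = {"easy": 0.5, "medium": 0.3, "hard": 0.2}


def sample_task(rng: random.Random,
                difficulty: Optional[str] = None,
                curriculum: Optional[str] = None) -> str:
    """Pick a task id for the next episode."""
    if difficulty is not None:
        pool = [name for name, tier in TASKS.items() if tier == difficulty]
        if not pool:
            raise ValueError(f"no tasks with difficulty {difficulty!r}")
        return rng.choice(pool)
    if curriculum == "full":
        names = list(TASKS)
        weights = [WEIGHTS[TASKS[name]] for name in names]
        return rng.choices(names, weights=weights, k=1)[0]
    return rng.choice(list(TASKS))  # default: uniform over all tasks


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_task(rng, difficulty="easy"))
    print(sample_task(rng, curriculum="full"))
```

Passing an explicit `random.Random` keeps sampling reproducible across concurrent sessions, which matters when comparing training runs.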

	---
	5. Actual Training Demo (Huge Impact for Presentation)

	This is probably the single most impressive thing you could show: a training curve where an LLM agent improves at Verilog
	over episodes.

	- Use GRPO with a small open-weights model, or rejection sampling (best-of-N) when only API access to a model such as GPT-4o or Claude is available
	- Run 50-100 episodes on mac_unit
	- Plot: score vs episode number
	- Show a trajectory: bad Verilog → compiles → passes tests → passes all tests

	Even a simple few-shot improvement curve proves "this environment actually trains agents." That's the entire pitch of
	OpenEnv.
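For the GRPO option, the core idea is that each sampled completion is scored relative to the other completions for the same prompt. The sketch below shows only that normalization step, with the environment's episode scores standing in as rewards; the function name and the epsilon value are assumptions, and implementations differ on details such as population versus sample standard deviation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of completions for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a choice made for this sketch
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Example: four attempts at one task, scored by the environment.
    scores = [0.0, 0.25, 0.5, 1.0]
    print(group_relative_advantages(scores))
```

When every attempt in a group scores the same, all advantages are zero and the group contributes no gradient, which is one reason a graded reward helps more than a pass/fail one.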

	---
	6. Leaderboard Endpoint (Low Effort, High Showmanship)

	Add a /leaderboard HTTP endpoint that returns aggregated stats:
	{
	"mac_unit": {"avg_score": 0.72, "perfect_runs": 14, "total_episodes": 87},
	"systolic_array": {"avg_score": 0.41, "perfect_runs": 2, "total_episodes": 43}
	}
	- Persist episode results to SQLite
	- Add a simple web UI or just show it in the demo
	- This makes the demo feel live and real
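A minimal sketch of the SQLite-backed aggregation behind such an endpoint, using only the standard library. The table layout, function names, and the "score >= 1.0 counts as perfect" rule are assumptions; the HTTP route itself is left out.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the episode store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS episodes (task TEXT NOT NULL, score REAL NOT NULL)"
    )
    return conn


def record_episode(conn: sqlite3.Connection, task: str, score: float) -> None:
    """Persist one finished episode."""
    conn.execute("INSERT INTO episodes (task, score) VALUES (?, ?)", (task, score))
    conn.commit()


def leaderboard(conn: sqlite3.Connection, perfect: float = 1.0) -> dict:
    """Aggregate per-task stats in the shape of the JSON example above."""
    rows = conn.execute(
        "SELECT task, AVG(score), SUM(score >= ?), COUNT(*) "
        "FROM episodes GROUP BY task",
        (perfect,),
    ).fetchall()
    return {
        task: {
            "avg_score": round(avg, 2),
            "perfect_runs": int(perfect_runs),
            "total_episodes": total,
        }
        for task, avg, perfect_runs, total in rows
    }


if __name__ == "__main__":
    db = open_db()
    record_episode(db, "mac_unit", 1.0)
    record_episode(db, "mac_unit", 0.5)
    print(leaderboard(db))
```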

	---
	7. Multi-File / Hierarchical Design Support (High Effort, High Impact)

	Real Verilog uses multiple modules. Add:
	- write_file(filename="pe.v", verilog_src=...) — let agents write to named files
	- run_compile compiles all files together
	- Tasks could require a top-level module instantiating sub-modules

	This mirrors real EDA workflows and would make the systolic array task much more natural.
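A sketch of how a per-episode workspace could back the write_file and run_compile actions. The `Workspace` class, the flat-filename restriction, and the output name are assumptions; the command uses iverilog's `-g2012` and `-o` options and is only built here, not executed.

```python
import tempfile
from pathlib import Path


class Workspace:
    """Holds the named source files an agent has written in one episode."""

    def __init__(self, root: Path):
        self.root = root
        self.files: dict[str, Path] = {}

    def write_file(self, filename: str, verilog_src: str) -> Path:
        # Assumption: only flat filenames are allowed, which blocks path traversal.
        if Path(filename).name != filename:
            raise ValueError(f"invalid filename: {filename!r}")
        path = self.root / filename
        path.write_text(verilog_src)
        self.files[filename] = path
        return path

    def compile_command(self, output: str = "sim.vvp") -> list[str]:
        # Sorted order keeps the command deterministic across runs.
        sources = [str(self.files[name]) for name in sorted(self.files)]
        return ["iverilog", "-g2012", "-o", str(self.root / output), *sources]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ws = Workspace(Path(tmp))
        ws.write_file("pe.v", "module pe; endmodule\n")
        ws.write_file("top.v", "module top; pe u_pe(); endmodule\n")
        print(ws.compile_command())
```

The resulting list can be passed to `subprocess.run` by the existing compile step, so single-file tasks keep working as the one-file special case.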

	---
	Prioritized Recommendation

	If you have limited time before the finale, I'd do this in order:

	1. More tasks (3-5 new ones) — most visible, shows breadth, 1-2 days of work
	2. Training demo — show a curve, run it tonight, hardest to fake and most impressive
	3. Leaderboard endpoint — few hours, looks great in a live demo
	4. Richer sim feedback (failing test case details) — improves RL signal, shows you understand the problem deeply
	5. Formal verification — if you have the time, this is the WOW moment

	What aspect do you want to dig into first? I can help implement any of these.


	❯ Let's add more tasks and bring the total up to 10; the ones you described earlier are good and powerful, so let's go
	with those. Let's also add formal verification, since that's the next frontier for model capability. Let's do an actual
	training demo: the environment will be hosted on Hugging Face, but let's use Modal Labs' serverless GPUs to run some RL
	post-training/RLVR to round it off. Finally, let's add multi-file support; it's essentially a requirement for anything
	that's not a toy Verilog problem.