Spaces: dmaheshwar22/verifiable-rl-coder

Apply for a GPU community grant: Academic project

by dmaheshwar22 - opened 25 days ago

I am building an open-source tool for verifiable code generation and execution using a fine-tuned Qwen2.5-Coder-1.5B model. The project explores "verifiable execution rewards": users pick a task, generate code, and immediately validate that code by running automated tests in a secure sandbox.
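To make the idea of a "verifiable execution reward" concrete, here is a minimal sketch of the generate-then-validate step. It is only an illustration under stated assumptions, not the Space's actual code: the helper name `execution_reward`, the binary 1.0/0.0 scoring, and the 5-second timeout are all hypothetical, and a plain subprocess is not a secure sandbox (a real deployment would add container or OS-level isolation).

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def execution_reward(candidate_code: str, test_code: str, timeout_s: float = 5.0) -> float:
    """Hypothetical helper: 1.0 if the candidate passes the tests, else 0.0.

    NOTE: a subprocess with a timeout only limits runtime. It is NOT a
    security boundary and stands in for the real sandbox here.
    """
    # Append the assert-based tests to the generated code so that any
    # failing assertion makes the script exit with a non-zero status.
    program = candidate_code + "\n\n" + test_code + "\n"

    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "candidate.py"
        script.write_text(program, encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, "-I", str(script)],  # -I: isolated mode
                cwd=workdir,
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # Code that hangs (for example an infinite loop) earns no reward.
            return 0.0

    return 1.0 if result.returncode == 0 else 0.0
```

For example, calling it with the candidate `"def add(a, b):\n    return a + b"` and the test `"assert add(2, 3) == 5"` yields a reward of 1.0, while a candidate that returns `a - b` yields 0.0.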
I would also like to demonstrate what GRPO with verifiable rewards can do on a larger model, Qwen 7B, which requires a GPU for inference.
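For readers unfamiliar with GRPO, the core step is to score a group of completions sampled for the same prompt and normalize each reward against the group's own mean and standard deviation. The sketch below shows that normalization only. The function name, the `eps` value, and the use of the population standard deviation are assumptions for illustration; the exact variant used in this project is not stated.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Hypothetical helper: normalize rewards within one group of samples.

    Each advantage is (reward - group mean) / (group std + eps), so
    completions that beat the group average get a positive advantage.
    """
    mu = mean(rewards)
    # Assumption: population std; some implementations use the sample std.
    sigma = pstdev(rewards)
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt, scored by pass/fail test execution.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

With binary pass/fail rewards, a group in which every completion passes (or every one fails) produces all-zero advantages, so such groups contribute no learning signal under this normalization.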
Why this project needs a GPU grant:
Real-time Inference: To provide a responsive "watch generation + execution" experience, the model requires low-latency GPU inference (ZeroGPU or A10G).
Open Science/Source: The entire project, including training data and execution logic, is being released to the community to advance research in self-correcting code agents.
Community Education: This Space serves as a learning tool for developers to understand how reinforcement learning with execution feedback can improve 1.5B-parameter models.
Current Progress:
The Gradio UI is set up, and the sandbox environment is ready. A GPU grant would allow me to make this demo public and stable for the community to explore.

dmaheshwar22 (Owner) - 18 days ago, edited 18 days ago

Hi @akhaliq / @hysts, just following up on this academic project. The Gradio UI and sandbox are fully functional on CPU, and I can integrate the spaces library to make the Space ZeroGPU-ready for 7B model inference. I would love to get this live for the community to test!
