---
title: ReproAgent
emoji: 🔬
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 4.12.0
python_version: 3.12
app_file: server/app.py
pinned: false
---
<p align="center">
<img src="assets/banner.png" alt="ReproAgent Banner" width="100%"/>
</p>
<h1 align="center">🔬 ReproAgent</h1>
<p align="center">
<strong>An AI-powered agent that automatically reproduces machine learning research papers.</strong>
</p>
<p align="center">
<a href="#-features"><img src="https://img.shields.io/badge/Features-8-blue?style=for-the-badge" alt="Features"/></a>
<a href="#-quick-start"><img src="https://img.shields.io/badge/Python-3.10+-green?style=for-the-badge&logo=python&logoColor=white" alt="Python"/></a>
<a href="#-license"><img src="https://img.shields.io/badge/License-MIT-orange?style=for-the-badge" alt="License"/></a>
<a href="https://huggingface.co/spaces"><img src="https://img.shields.io/badge/🤗-HuggingFace_Spaces-yellow?style=for-the-badge" alt="HF Spaces"/></a>
</p>
<p align="center">
Upload a research paper PDF → ReproAgent reads it → finds the repo → clones the code → sets up the environment → runs it → debugs errors → tunes hyperparameters → compares results.
</p>
---
## 🏆 OpenEnv Hackathon Submission
This project is submitted to the **OpenEnv Hackathon**. It is a fully compliant environment built on top of the OpenEnv framework; the compatibility spec lives in `openenv.yaml`.
### Required Materials
- **Hugging Face Space**: [ReproAgent Live Demo](https://huggingface.co/spaces/username/reproagent)
- **Training Script (TRL/PPO)**: [Colab Notebook](training/train_reproagent.ipynb)
- **Evidence of Training**: We trained the agent using Proximal Policy Optimization (PPO) over 50 episodes.
<br><img src="assets/reward_plot.png" alt="Reward Plot" width="400"/> <img src="assets/loss_plot.png" alt="Loss Plot" width="400"/>
- **Presentation**: [Mini-Blog on HuggingFace](https://huggingface.co/blog/reproagent-openenv) / [YouTube Demo (< 2 minutes)](https://youtube.com/watch?v=demo_link)
---
## 📋 Table of Contents
- [Overview](#-overview)
- [Features](#-features)
- [Architecture](#-architecture)
- [Quick Start](#-quick-start)
- [Usage](#-usage)
- [Project Structure](#-project-structure)
- [Configuration](#-configuration)
- [How It Works](#-how-it-works)
- [Validation](#-validation)
- [Docker Deployment](#-docker-deployment)
- [Contributing](#-contributing)
- [License](#-license)
---
## 🌟 Overview
**ReproAgent** is an AI-driven framework built on [Gymnasium](https://gymnasium.farama.org/) (the Farama Foundation's maintained successor to OpenAI Gym) that automates the end-to-end reproduction of machine learning research papers. Given a PDF, it autonomously:
1. **Parses** the paper to extract title, metrics, datasets, and GitHub links
2. **Clones** the linked repository
3. **Sets up** the environment (conda/venv) and installs dependencies
4. **Runs** inference or training scripts
5. **Debugs** errors using real traceback analysis
6. **Tunes** hyperparameters to close the gap between reproduced and claimed results
7. **Compares** final metrics against the paper's claims
It supports both a **Simulation** mode (safe, no system changes) and a **Real Execution** mode (actually clones repos, creates envs, runs code on your machine).
---
## ✨ Features
| Feature | Description |
|---------|-------------|
| 📄 **PDF Parsing** | Extracts metadata using Groq LLM (llama-3.3-70b) with regex fallback |
| 🔗 **Repo Discovery** | Finds GitHub links from paper text, cleans trailing punctuation |
| 📦 **Smart Environment Setup** | Auto-detects `requirements.txt`, `environment.yml`, or `pyproject.toml` and creates the correct env (pip venv or conda) |
| 🧠 **Intelligent Entry Point** | Scans for `inference.py`, `eval.py`, `main.py`, `train.py`, or extracts scripts from README bash blocks |
| 🐛 **Real Error Debugging** | Captures actual `stderr` tracebacks and feeds them into the debugging pipeline |
| 🧪 **Hyperparameter Tuning** | Modifies learning rate, batch size, optimizer, and epochs to reproduce paper metrics |
| 📊 **Dynamic Metric Extraction** | Extracts the actual evaluation metric (FID, BLEU, accuracy, PSNR, etc.) from the paper, not hardcoded |
| 🖥️ **Gradio Web UI** | Beautiful web interface with live logs, state tracking, and result visualization |
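
The Repo Discovery row can be illustrated with a short sketch. This is not the project's actual code: the regex, the function name `find_github_links`, and the set of stripped characters are assumptions made for illustration.

```python
import re

# Hypothetical pattern: https://github.com/<owner>/<repo>, stopping at the
# first character that cannot be part of an owner or repository name.
GITHUB_URL = re.compile(r"https?://github\.com/[\w.-]+/[\w.-]+", re.IGNORECASE)


def find_github_links(text: str) -> list[str]:
    """Return unique GitHub repository URLs found in `text`, in order of appearance."""
    links: list[str] = []
    for match in GITHUB_URL.finditer(text):
        # Sentence punctuation is often glued to a URL in text extracted from a PDF.
        url = match.group(0).rstrip(".,;:)")
        # Drop a trailing ".git" so clone URLs and browser URLs compare equal.
        if url.endswith(".git"):
            url = url[:-4]
        if url not in links:
            links.append(url)
    return links


sample = (
    "Code is available at https://github.com/example-org/example-repo. "
    "See also (https://github.com/example-org/example-repo.git)."
)
print(find_github_links(sample))
```

Both mentions in `sample` collapse to a single URL because the trailing period and the `.git` suffix are removed before the duplicate check.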
---
## 🏗️ Architecture
```
┌─────────────────────────────────────────────────────────────────┐
│                          Gradio Web UI                          │
│                         (server/app.py)                         │
└──────────────────────────┬──────────────────────────────────────┘
                           │
              ┌────────────▼────────────┐
              │     Reasoning Agent     │
              │   (agents/reasoning_    │
              │        agent.py)        │
              └────────────┬────────────┘
                           │ select_action()
              ┌────────────▼────────────┐
              │  Gymnasium Environment  │
              │  (reproagent/           │
              │   environment.py)       │
              │                         │
              │   ┌─────────────────┐   │
              │   │  State Machine  │   │
              │   │  ┌───────────┐  │   │
              │   │  │ Parsing   │  │   │
              │   │  │ RepoAnalys│  │   │
              │   │  │ Setup     │  │   │
              │   │  │ Execution │  │   │
              │   │  │ Debugging │  │   │
              │   │  │ Experiment│  │   │
              │   │  │ Comparison│  │   │
              │   │  └───────────┘  │   │
              │   └─────────────────┘   │
              └─────────────────────────┘
                    │             │
         ┌──────────┘             └──────────┐
         ▼                                   ▼
 ┌───────────────┐                   ┌────────────────┐
 │  Simulation   │                   │ Real Execution │
 │  (mock state  │                   │ (subprocess,   │
 │  transitions) │                   │  git clone,    │
 │               │                   │  conda/venv)   │
 └───────────────┘                   └────────────────┘
```
---
## 🚀 Quick Start
### Prerequisites
- **Python** 3.10+
- **Git** (for real execution mode)
- **Conda** (optional, for repos that use `environment.yml`)
- A **Groq API key** (free at [console.groq.com](https://console.groq.com))
### Installation
```bash
# 1. Clone the repository
git clone https://github.com/your-username/ReproAgent.git
cd ReproAgent
# 2. Create a virtual environment
python -m venv venv
# Windows
.\venv\Scripts\activate
# macOS/Linux
source venv/bin/activate
# 3. Install dependencies
pip install -r requirements.txt
# 4. Set up environment variables
cp .env.example .env
# Edit .env and add your GROQ_API_KEY
```
### Run
```bash
# Launch the Gradio web interface
python server/app.py
```
The UI will be available at `http://localhost:7860` with a public share link.
---
## 💻 Usage
### Web Interface (Recommended)
1. Open the Gradio UI at `http://localhost:7860`
2. **Upload** a research paper PDF (or paste a URL)
3. Choose **Execution Mode**:
   - `Simulation`: Safe demo, no system changes
   - `Real Execution`: Actually clones repos and runs code
4. Set **Clone Directory** (where repos will be cloned, e.g. `D:\reproductions`)
5. Click **Start Reproduction** and watch the agent work in real-time
### Command Line
```bash
# Run validation to ensure everything works
python validate.py
# Run a quick inference test
python inference.py
```
### Programmatic API
```python
from reproagent.environment import ReproAgentEnv
from agents.reasoning_agent import create_agent

# Create environment
env = ReproAgentEnv(
    difficulty="easy",
    max_steps=100,
    use_llm=True,
    exec_mode="Real Execution",
    workspace_dir="./workspace",
)

# Create agent
agent = create_agent(env, agent_type="reasoning", use_llm=True)

# Run episode
obs, info = env.reset()
agent.reset()
for step in range(100):
    action = agent.select_action(obs, info)
    obs, reward, terminated, truncated, info = env.step(action)
    print(f"Step {step}: {info['action_type']} | reward={reward:.2f}")
    if terminated or truncated:
        break
```
---
## 📁 Project Structure
```
ReproAgent/
├── reproagent/                     # Core Gymnasium environment
│   ├── __init__.py
│   ├── environment.py              # Main env with action implementations
│   ├── state.py                    # Dataclasses for full reproduction state
│   ├── actions.py                  # Action space definition (30+ actions)
│   ├── reward.py                   # Multi-component reward function
│   ├── models.py                   # LLM client (Groq, OpenAI, HuggingFace)
│   └── papers.py                   # Paper dataset loader
│
├── agents/                         # Agent implementations
│   ├── reasoning_agent.py          # Phase-based reasoning agent
│   ├── paper_parser.py             # PDF text extraction + LLM analysis
│   ├── repo_analyzer.py            # Repository structure analysis
│   └── debugger.py                 # Error traceback analysis
│
├── server/
│   └── app.py                      # Gradio web interface (900+ lines)
│
├── utils/
│   ├── pdf_reader.py               # PDF extraction (PyPDF2 + pdfplumber)
│   └── github_utils.py             # GitHub API utilities
│
├── graders/                        # Reproduction quality grading
├── data/papers/                    # Sample paper configs (easy/medium/hard)
├── baseline/                       # Baseline agent implementations
├── static/                         # Static assets for UI
│
├── validate.py                     # Full validation suite
├── inference.py                    # CLI inference entry point
├── openenv.yaml                    # OpenEnv compatibility spec
├── pyproject.toml                  # Python project metadata
├── requirements.txt                # pip dependencies
├── Dockerfile                      # Container deployment
├── run.bat / run.sh / run.ps1      # Platform-specific launchers
└── .env.example                    # Environment variable template
```
---
## ⚙️ Configuration
### Environment Variables
Create a `.env` file from the template:
```bash
cp .env.example .env
```
| Variable | Required | Description |
|----------|----------|-------------|
| `GROQ_API_KEY` | **Yes** | Groq API key for LLM-powered extraction ([get one free](https://console.groq.com)) |
| `OPENAI_API_KEY` | No | OpenAI API key (alternative LLM backend) |
| `HF_TOKEN` | No | HuggingFace token for model downloads |
| `GITHUB_TOKEN` | No | GitHub API token for higher rate limits |
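
A startup check along the following lines can catch a missing key early. It is a minimal sketch, not the app's actual code: the function name `check_env` is hypothetical, and it assumes the variables from `.env` have already been loaded into the process environment (how the app does that is not shown here).

```python
import os
from collections.abc import Mapping

# Variable names taken from the table above.
REQUIRED = ("GROQ_API_KEY",)
OPTIONAL = ("OPENAI_API_KEY", "HF_TOKEN", "GITHUB_TOKEN")


def check_env(environ: Mapping[str, str] = os.environ) -> dict[str, bool]:
    """Fail fast on missing required variables; report which optional ones are set."""
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return {name: bool(environ.get(name)) for name in OPTIONAL}


# Example with an explicit mapping instead of the real environment.
print(check_env({"GROQ_API_KEY": "dummy-key", "HF_TOKEN": "dummy-token"}))
```

Passing a plain dictionary, as in the last line, keeps the check easy to test without touching the real environment.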
### Execution Modes
| Mode | What it does | Use case |
|------|-------------|----------|
| **Simulation** | Simulates all actions with mock state transitions | Safe demos, hackathons, testing |
| **Real Execution** | Runs `git clone`, `conda env create`, `pip install`, `python script.py` on your system | Actually reproducing papers |
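
To make the Real Execution row more concrete, the sketch below maps the dependency file found in a cloned repo to the commands that would be run. It only plans the commands and does not execute them. The function name `plan_setup`, the default environment name, the `.venv` location, and the priority order (`environment.yml` first) are all assumptions, not the project's documented behavior.

```python
import os
from pathlib import Path


def plan_setup(repo_dir: str, env_name: str = "repro-env") -> list[list[str]]:
    """Return the setup commands for a repo, based on which dependency file exists."""
    repo = Path(repo_dir)

    # Assumed priority: a conda spec wins over pip-style files.
    env_yml = repo / "environment.yml"
    if env_yml.exists():
        return [["conda", "env", "create", "-f", str(env_yml), "-n", env_name]]

    venv_dir = repo / ".venv"
    commands = [["python", "-m", "venv", str(venv_dir)]]

    # pip lives in Scripts\ on Windows and bin/ elsewhere.
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    pip = str(venv_dir / bin_dir / "pip")

    if (repo / "requirements.txt").exists():
        commands.append([pip, "install", "-r", str(repo / "requirements.txt")])
    elif (repo / "pyproject.toml").exists():
        # Installing the directory lets pip read pyproject.toml.
        commands.append([pip, "install", str(repo)])
    return commands
```

Each inner list can be passed to `subprocess.run` as-is, which avoids shell quoting issues with paths that contain spaces.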
---
## 🔄 How It Works
The agent follows a **phase-based state machine** with 7 phases:
```
PARSING → REPO_ANALYSIS → SETUP → EXECUTION → DEBUGGING → EXPERIMENTATION → COMPARISON
```
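
The phase sequence above can be modeled as an ordered enum. This is an illustrative sketch only: the `Phase` class and `next_phase` helper are hypothetical names, and the strictly linear transition is a simplification of whatever logic `reproagent/state.py` actually implements.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The seven phases, in the order shown in the diagram above."""
    PARSING = 0
    REPO_ANALYSIS = 1
    SETUP = 2
    EXECUTION = 3
    DEBUGGING = 4
    EXPERIMENTATION = 5
    COMPARISON = 6


def next_phase(current: Phase) -> Phase:
    """Advance one phase; COMPARISON is terminal and maps to itself."""
    if current is Phase.COMPARISON:
        return current
    return Phase(current + 1)


# Walk the whole chain from the first phase to the last.
phase = Phase.PARSING
chain = [phase.name]
while phase is not Phase.COMPARISON:
    phase = next_phase(phase)
    chain.append(phase.name)
print(" -> ".join(chain))
```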
### Phase Details
| Phase | Actions | What Happens |
|-------|---------|--------------|
| **Parsing** | `PARSE_PDF`, `EXTRACT_GITHUB`, `EXTRACT_METRICS` | LLM reads paper, extracts title, GitHub URL, target metric (e.g., FID=7.5) |
| **Repo Analysis** | `CLONE_REPO`, `READ_README`, `FIND_ENTRY_POINT`, `EXTRACT_DEPS` | Clones repo, reads README, finds scripts from bash blocks, detects `environment.yml` |
| **Setup** | `CREATE_VENV`, `INSTALL_REQUIREMENTS`, `VERIFY_SETUP` | Creates conda/venv env, installs deps, verifies setup |
| **Execution** | `RUN_TRAINING`, `RUN_EVAL`, `CHECK_LOGS` | Runs the entry point script via subprocess, captures stdout/stderr |
| **Debugging** | `ANALYZE_ERROR`, `SEARCH_SOLUTION`, `APPLY_FIX` | Parses real Python tracebacks, proposes and applies fixes |
| **Experimentation** | `MODIFY_LR`, `MODIFY_BATCH`, `RUN_EXPERIMENT` | Tunes hyperparameters to close the metric gap |
| **Comparison** | `COMPARE_RESULTS`, `GENERATE_REPORT` | Compares reproduced metric vs. paper claim, generates summary |
### Reward Function
The environment provides a multi-component reward signal:
- **Phase progress** (+10 for advancing through phases)
- **Code execution** (+20 for successful script runs)
- **Error fixing** (+15 per resolved error)
- **Metric improvement** (scaled by how close the reproduced result is to the paper's claim)
- **Time penalty** (-0.01 per step to encourage efficiency)
---
## ✅ Validation
Run the full validation suite to confirm everything works:
```bash
python validate.py
```
This tests:
| Test | What it validates |
|------|-------------------|
| Environment | `ReproAgentEnv` creates, resets, steps correctly |
| Spaces | Observation and action spaces match the Gymnasium spec |
| Episodes | Full multi-step episodes run without crashes |
| Agents | `ReasoningAgent` and `RandomAgent` interact with the env |
| Demo | Gradio app imports successfully |
| Graders | Reproduction quality grader loads |
| OpenEnv | `openenv.yaml` is present and well-formed |
Expected output:
```
ENVIRONMENT ✅ PASSED
AGENTS ✅ PASSED
DEMO ✅ PASSED
GRADERS ✅ PASSED
OPENENV_YAML ✅ PASSED
🎉 ALL VALIDATIONS PASSED!
✅ System is ready for deployment
```
---
## 🐳 Docker Deployment
```bash
# Build the image
docker build -t reproagent .
# Run with your API key
docker run -p 7860:7860 -e GROQ_API_KEY=your_key_here reproagent
```
Or deploy to **HuggingFace Spaces**:
```bash
pip install gradio
gradio deploy
```
---
## 🛣️ Roadmap
- [x] Gymnasium-compatible environment with 30+ actions
- [x] Groq LLM integration with regex fallback
- [x] Gradio web interface with live logs
- [x] Real Execution mode (git clone, conda/venv, subprocess)
- [x] Dynamic metric extraction (FID, BLEU, accuracy, PSNR, etc.)
- [x] Bash block parsing from README for entry point discovery
- [ ] Multi-script sequential execution (run 5 scripts in order per README)
- [ ] Automatic checkpoint downloading from HuggingFace
- [ ] GPU-aware execution scheduling
- [ ] Result visualization and plot generation
- [ ] Support for Jupyter notebook-based repos
---
## 🤝 Contributing
Contributions are welcome! Please:
1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
---
## 📄 License
This project is licensed under the **MIT License**; see the [LICENSE](LICENSE) file for details.
---
<p align="center">
Built with ❤️ for the ML research community
</p>