---
title: LLM Code Deployment API
emoji: π
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
---

# LLM Code Deployment System

An automated system for building, deploying, and evaluating LLM-generated web applications with GitHub Pages integration.

## Overview

This project implements a complete workflow for:

- **Students**: Receive task requests, use LLMs to generate code, deploy to GitHub Pages, and submit for evaluation
- **Instructors**: Generate task requests, receive submissions, run automated evaluations (static, dynamic, and LLM-based)

## Quick Deployment for Students

**Deploy to Hugging Face Spaces in 10 minutes:**

1. Read **[DEPLOYMENT.md](DEPLOYMENT.md)** for complete step-by-step instructions
2. Read **[README_SPACES.md](README_SPACES.md)** for Hugging Face Spaces configuration
3. Get your AIPipe token from https://aipipe.org/login ($2/month free for IIT Madras students)
4. Create a Space at https://huggingface.co/new-space
5. Configure environment variables and deploy!

**Already deployed?** Just submit your endpoint URL to the instructor's Google Form!

## Architecture

```
┌─────────────┐         ┌──────────────┐         ┌─────────────┐
│ Instructor  │         │   Student    │         │   GitHub    │
│   System    │────────▶│     API      │────────▶│    Pages    │
│             │  POST   │              │ Deploy  │             │
└─────────────┘  Task   └──────────────┘         └─────────────┘
       │                        │                        │
       │                        │                        │
       │                        └────────POST────────────┘
       │                             Submission          │
       │                                                 │
       ▼                                                 ▼
┌─────────────┐                                   ┌─────────────┐
│ Evaluation  │◀──────────────────────────────────│ Validation  │
│  Database   │                                   │  & Checks   │
└─────────────┘                                   └─────────────┘
```

## Features

### Student-Side

- **API Endpoint**: Receives task requests via HTTP POST
- **LLM Code Generation**: Uses Claude/GPT to generate complete web apps
- **GitHub Integration**: Automatically creates repos, pushes code, enables Pages
- **Automatic Notification**: Sends repo details to evaluation endpoint
- **Round 2 Support**: Handles update requests for existing repos

### Instructor-Side

- **Task Templates**: YAML-based parametrizable task definitions
- **Round 1 & 2 Scripts**: Automated task generation and distribution
- **Evaluation API**: Receives and validates student submissions
- **Multi-Level Checks**:
  - Static: License, README, repo creation time, secrets detection
  - LLM: Code quality, documentation quality
  - Dynamic: Playwright-based functional testing
- **Database**: PostgreSQL storage for tasks, repos, and results

## Project Structure

```
tds-p1/
├── shared/                     # Shared utilities and models
│   ├── config.py               # Configuration management
│   ├── models.py               # Pydantic data models
│   ├── logger.py               # Logging setup
│   └── utils.py                # Utility functions
├── student/                    # Student-side components
│   ├── api.py                  # FastAPI endpoint
│   ├── code_generator.py       # LLM-based code generation
│   ├── github_manager.py       # GitHub operations
│   └── notification_client.py  # Evaluation notification
├── instructor/                 # Instructor-side components
│   ├── api.py                  # Evaluation endpoint
│   ├── database.py             # Database models and operations
│   ├── task_templates.py       # Template management
│   ├── round1.py               # Round 1 task generation
│   ├── round2.py               # Round 2 task generation
│   ├── evaluate.py             # Main evaluation script
│   └── checks/                 # Evaluation modules
│       ├── static_checks.py    # Static analysis
│       ├── dynamic_checks.py   # Playwright tests
│       └── llm_checks.py       # LLM evaluations
├── templates/                  # Task template YAML files
│   ├── sum-of-sales.yaml
│   ├── markdown-to-html.yaml
│   └── github-user-created.yaml
├── pyproject.toml              # Project dependencies
├── .env.example                # Environment variables template
└── README.md                   # This file
```

## Setup

### Prerequisites

- Python 3.10+
- PostgreSQL database
- GitHub account with personal access token
- Anthropic or OpenAI API key

### Installation

1. **Clone the repository**

   ```bash
   git clone <your-repo-url>
   cd tds-p1
   ```

2. **Install dependencies**

   ```bash
   pip install -e .
   ```

3. **Install Playwright browsers**

   ```bash
   playwright install chromium
   ```

4. **Configure environment**

   ```bash
   cp .env.example .env
   # Edit .env with your credentials
   ```

5. **Set up database**

   ```bash
   # Create PostgreSQL database
   createdb llm_deployment
   # Initialize tables
   python -c "from instructor.database import Database; Database().create_tables()"
   ```

## Configuration

Edit `.env` with your settings:

### Student Configuration

```bash
STUDENT_SECRET=your-secret-key
STUDENT_EMAIL=your-email@example.com
STUDENT_API_PORT=8000
```

### GitHub

```bash
GITHUB_TOKEN=ghp_your_personal_access_token
GITHUB_USERNAME=your-username
```

### LLM Provider

```bash
# Choose one
LLM_PROVIDER=anthropic # or openai
ANTHROPIC_API_KEY=sk-ant-...
# OR
OPENAI_API_KEY=sk-...
LLM_MODEL=claude-3-5-sonnet-20241022
```

### Instructor

```bash
DATABASE_URL=postgresql://user:password@localhost:5432/llm_deployment
EVALUATION_API_URL=http://your-server:8001/api/evaluate
```

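
A missing variable is easiest to diagnose at startup rather than in the middle of a build. The following is a minimal sketch of reading the student-side variables above into a typed object. `StudentSettings` and `load_student_settings` are hypothetical names, the default provider of `anthropic` is an assumption, and the project's actual `shared/config.py` may work differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class StudentSettings:
    """Hypothetical container for the student-side variables in .env."""
    secret: str
    email: str
    api_port: int
    github_token: str
    github_username: str
    llm_provider: str


def load_student_settings(env=None):
    """Build settings from a mapping (os.environ by default), failing early on gaps."""
    env = os.environ if env is None else env
    required = ["STUDENT_SECRET", "STUDENT_EMAIL", "GITHUB_TOKEN", "GITHUB_USERNAME"]
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))

    # Assumption: fall back to "anthropic" when LLM_PROVIDER is unset.
    provider = env.get("LLM_PROVIDER", "anthropic")
    if provider not in ("anthropic", "openai"):
        raise ValueError(f"Unsupported LLM_PROVIDER: {provider}")

    return StudentSettings(
        secret=env["STUDENT_SECRET"],
        email=env["STUDENT_EMAIL"],
        api_port=int(env.get("STUDENT_API_PORT", "8000")),
        github_token=env["GITHUB_TOKEN"],
        github_username=env["GITHUB_USERNAME"],
        llm_provider=provider,
    )
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to exercise in tests without touching the real environment.
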
## Usage

### For Students

1. **Start the Student API**

   ```bash
   python -m student.api
   ```

   The API will listen on `http://localhost:8000/api/build`.

2. **Test with a sample request**

   ```bash
   curl -X POST http://localhost:8000/api/build \
     -H "Content-Type: application/json" \
     -d '{
       "email": "your-email@example.com",
       "secret": "your-secret",
       "task": "test-task-abc",
       "round": 1,
       "nonce": "unique-nonce-123",
       "brief": "Create a simple Hello World page",
       "checks": ["Page displays Hello World"],
       "evaluation_url": "http://localhost:8001/api/evaluate",
       "attachments": []
     }'
   ```

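
The same request can be prepared from Python when a test script is more convenient than curl. This is a sketch using only the standard library. `build_task_request` is a hypothetical helper, not part of the project, and it only constructs the request so that it can be inspected before anything is sent.

```python
import json
import urllib.request


def build_task_request(endpoint, email, secret, task, brief, checks,
                       evaluation_url, round_number=1, nonce="unique-nonce-123"):
    """Prepare (but do not send) a POST with the same fields as the curl example."""
    payload = {
        "email": email,
        "secret": secret,
        "task": task,
        "round": round_number,
        "nonce": nonce,
        "brief": brief,
        "checks": checks,
        "evaluation_url": evaluation_url,
        "attachments": [],
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_task_request(
    "http://localhost:8000/api/build",
    email="your-email@example.com",
    secret="your-secret",
    task="test-task-abc",
    brief="Create a simple Hello World page",
    checks=["Page displays Hello World"],
    evaluation_url="http://localhost:8001/api/evaluate",
)
# To send it, the Student API must be running locally:
#     with urllib.request.urlopen(request, timeout=30) as response:
#         print(response.status)
```
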
### For Instructors

1. **Start the Evaluation API**

   ```bash
   python -m instructor.api
   ```

2. **Prepare submissions.csv**

   ```csv
   timestamp,email,endpoint,secret
   2025-01-15T10:00:00,student1@example.com,http://student1.com/api/build,secret1
   2025-01-15T10:05:00,student2@example.com,http://student2.com/api/build,secret2
   ```

3. **Run Round 1 task generation**

   ```bash
   python -m instructor.round1
   ```

   This will:
   - Load submissions from CSV
   - Generate unique tasks from templates
   - POST tasks to student endpoints
   - Log results to database

4. **Run evaluations**

   ```bash
   python -m instructor.evaluate
   ```

   This will:
   - Fetch pending submissions
   - Clone repositories
   - Run static, LLM, and Playwright checks
   - Save results to database

5. **Run Round 2 task generation**

   ```bash
   python -m instructor.round2
   ```

   This will:
   - Find all Round 1 submissions
   - Generate Round 2 update tasks
   - POST to student endpoints

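
The first thing Round 1 does is load `submissions.csv`. A minimal sketch of that step with the standard `csv` module is shown here. The `Submission` dataclass and `load_submissions` function are illustrative names, and skipping incomplete rows is an assumption about how bad input might be handled, not a description of `instructor/round1.py`.

```python
import csv
import io
from dataclasses import dataclass

COLUMNS = ("timestamp", "email", "endpoint", "secret")


@dataclass
class Submission:
    timestamp: str
    email: str
    endpoint: str
    secret: str


def load_submissions(csv_file):
    """Read rows from an open CSV file, skipping rows with an empty required column."""
    submissions = []
    for row in csv.DictReader(csv_file):
        values = {name: (row.get(name) or "").strip() for name in COLUMNS}
        if all(values.values()):
            submissions.append(Submission(**values))
    return submissions


SAMPLE = """timestamp,email,endpoint,secret
2025-01-15T10:00:00,student1@example.com,http://student1.com/api/build,secret1
2025-01-15T10:05:00,student2@example.com,http://student2.com/api/build,secret2
"""

rows = load_submissions(io.StringIO(SAMPLE))
print(len(rows), rows[0].email)
```

With a real file, the same function can be called as `load_submissions(open("submissions.csv", newline="", encoding="utf-8"))`.
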
## Task Templates

Task templates are YAML files in the `templates/` directory. Example:

```yaml
id: sum-of-sales
brief: Publish a single-page site that fetches data.csv from attachments...
attachments:
  - name: data.csv
    url: data:text/csv;base64,placeholder
checks:
  - "Page title equals 'Sales Summary {{ seed }}'"
  - "Bootstrap 5 CSS loaded from jsdelivr"
round2:
  # Quoted so that " #product-sales" is not parsed as a YAML comment
  - brief: "Add a Bootstrap table #product-sales..."
    checks:
      - "Table #product-sales exists"
```

### Template Variables

- `{{ seed }}`: Unique seed based on email and timestamp
- `{{ hash }}`: Deterministic hash value
- `{{ result }}`: Generated numeric value

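
To make the substitution concrete, here is a small sketch of how a `{{ seed }}` placeholder could be filled. The hashing scheme (SHA-256 of email and timestamp, truncated to 8 hex characters) and the helper names `make_seed` and `render` are assumptions for illustration. The project may well use a template engine such as Jinja2, whose syntax these placeholders resemble.

```python
import hashlib
import re


def make_seed(email, timestamp):
    """Derive a short, reproducible seed from email and timestamp (assumed scheme)."""
    digest = hashlib.sha256(f"{email}|{timestamp}".encode("utf-8")).hexdigest()
    return digest[:8]


def render(text, variables):
    """Replace each {{ name }} placeholder; unknown names are left untouched."""
    def substitute(match):
        name = match.group(1)
        return str(variables.get(name, match.group(0)))

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, text)


seed = make_seed("student1@example.com", "2025-01-15T10:00:00")
print(render("Page title equals 'Sales Summary {{ seed }}'", {"seed": seed}))
```

Because the seed depends only on its inputs, regenerating a task for the same email and timestamp yields the same checks, which keeps evaluation reproducible.
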
## API Endpoints

### Student API

**POST /api/build**

- Receives task request
- Returns 200 on acceptance
- Processes in background

**GET /api/status/{task_id}**

- Returns task status

**GET /health**

- Health check

### Instructor API

**POST /api/evaluate**

- Receives repo submission
- Validates against task record
- Returns 200 on success

**GET /api/submissions/{email}**

- Returns all submissions for email

**GET /api/results/{email}**

- Returns all evaluation results

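
Before `POST /api/build` accepts a task, the request body has to be checked. The sketch below shows one way that check might look, independent of any web framework. The function name `check_build_request` and the 400 and 403 status codes are assumptions, since this README only specifies the 200 response.

```python
import hmac

REQUIRED_FIELDS = (
    "email", "secret", "task", "round", "nonce",
    "brief", "checks", "evaluation_url",
)


def check_build_request(payload, expected_secret):
    """Return an (HTTP status, message) pair for an incoming /api/build body."""
    missing = [field for field in REQUIRED_FIELDS if field not in payload]
    if missing:
        return 400, "missing fields: " + ", ".join(missing)

    # compare_digest takes time independent of where the first mismatch occurs.
    supplied = str(payload["secret"]).encode("utf-8")
    if not hmac.compare_digest(supplied, expected_secret.encode("utf-8")):
        return 403, "invalid secret"

    return 200, "accepted"
```

Returning the status as plain data keeps the rule testable on its own. An endpoint handler would translate the pair into a response and, on 200, schedule the build in the background.
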
## Evaluation Criteria

### Static Checks

- ✅ Repository created after task sent
- ✅ MIT LICENSE exists in root
- ✅ README.md present with good structure
- ✅ No secrets in git history

### LLM Checks

- ✅ README.md professional quality (0-1 score)
- ✅ Code quality and best practices (0-1 score)

### Dynamic Checks

- ✅ Task-specific requirements (from template)
- ✅ Page loads successfully
- ✅ JavaScript evaluations
- ✅ Element presence and content

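
Two of the static checks are simple enough to sketch directly against a cloned repository. The code below is illustrative only: `has_mit_license` and `find_secrets` are hypothetical names, the token patterns are loose examples, and the scan covers the working tree rather than the full git history that the real check targets.

```python
import re
import tempfile
from pathlib import Path

# Illustrative patterns only; a real scanner would use a broader, maintained rule set.
SECRET_PATTERNS = [
    re.compile(r"ghp_[A-Za-z0-9]{36}"),        # classic GitHub personal access token
    re.compile(r"sk-ant-[A-Za-z0-9_-]{20,}"),  # loose guess at an Anthropic-style key
]


def has_mit_license(repo_dir):
    """True when a root-level LICENSE file mentions the MIT License."""
    license_path = Path(repo_dir) / "LICENSE"
    if not license_path.is_file():
        return False
    return "MIT License" in license_path.read_text(encoding="utf-8", errors="ignore")


def find_secrets(repo_dir):
    """Return relative paths of working-tree files that match a secret pattern."""
    root = Path(repo_dir)
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        if any(pattern.search(text) for pattern in SECRET_PATTERNS):
            hits.append(str(path.relative_to(root)))
    return hits


with tempfile.TemporaryDirectory() as repo:
    Path(repo, "LICENSE").write_text("MIT License\n\nCopyright (c) 2025\n", encoding="utf-8")
    Path(repo, "config.js").write_text('const token = "ghp_' + "a" * 36 + '";\n', encoding="utf-8")
    print(has_mit_license(repo), find_secrets(repo))
```
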
## Database Schema

### Tasks Table

- Task requests sent to students
- Fields: email, task, round, nonce, brief, checks, etc.

### Repos Table

- Repository submissions from students
- Fields: email, task, round, repo_url, commit_sha, pages_url

### Results Table

- Evaluation results
- Fields: email, task, round, check, score, reason, logs

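
The three tables can be sketched with an in-memory SQLite database to show how the listed fields fit together. SQLite stands in for PostgreSQL here purely so the example runs anywhere. The column types and constraints are assumptions, and the actual models live in `instructor/database.py`.

```python
import sqlite3

# "check" is a reserved SQL keyword, so that column name is quoted.
SCHEMA = """
CREATE TABLE tasks (
    email  TEXT NOT NULL,
    task   TEXT NOT NULL,
    round  INTEGER NOT NULL,
    nonce  TEXT NOT NULL,
    brief  TEXT,
    checks TEXT
);
CREATE TABLE repos (
    email      TEXT NOT NULL,
    task       TEXT NOT NULL,
    round      INTEGER NOT NULL,
    repo_url   TEXT,
    commit_sha TEXT,
    pages_url  TEXT
);
CREATE TABLE results (
    email   TEXT NOT NULL,
    task    TEXT NOT NULL,
    round   INTEGER NOT NULL,
    "check" TEXT NOT NULL,
    score   REAL,
    reason  TEXT,
    logs    TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    'INSERT INTO results (email, task, round, "check", score, reason, logs) '
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("student1@example.com", "sum-of-sales", 1, "MIT LICENSE exists in root", 1.0, "found", ""),
)
rows = conn.execute(
    'SELECT "check", score FROM results WHERE email = ?',
    ("student1@example.com",),
).fetchall()
print(rows)
```

Each table shares the `email`, `task`, and `round` columns, so a result can be traced back to the repo submission and the task request that produced it.
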
## Troubleshooting

### Common Issues

**Student API not receiving requests**

- Check firewall settings
- Ensure port 8000 is accessible
- Verify endpoint URL in submissions.csv

**GitHub Pages not deploying**

- Verify GITHUB_TOKEN has repo permissions
- Check repository is public
- Wait up to 60 seconds for Pages to activate

**LLM generation fails**

- Check API key is valid
- Verify API quota/credits
- Review logs for error details

**Playwright tests fail**

- Ensure chromium is installed: `playwright install chromium`
- Check Pages URL is accessible
- Increase timeout if needed

**Database connection errors**

- Verify PostgreSQL is running
- Check DATABASE_URL credentials
- Ensure database exists

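
Several of these issues, such as the GitHub Pages activation delay and slow page loads in Playwright, come down to waiting for something to become available. A generic polling helper is sketched below. `wait_until_ready` is a hypothetical name, and the 60-second default simply mirrors the Pages guidance above.

```python
import time


def wait_until_ready(is_ready, timeout_seconds=60.0, interval_seconds=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll is_ready() until it returns True or the timeout elapses."""
    deadline = clock() + timeout_seconds
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_seconds)
```

The `is_ready` callable could, for example, request the Pages URL and return whether the response status is 200. The `sleep` and `clock` parameters exist so the helper can be tested without real delays.
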
## Development

### Running Tests

```bash
pytest tests/
```

### Code Formatting

```bash
black .
ruff check .
```

### Type Checking

```bash
mypy .
```

## License

MIT License

Copyright (c) 2025

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Submit a pull request

## Support

For issues and questions:

- Check the troubleshooting section
- Review logs in `logs/app.log`
- Open an issue on GitHub