| title: Coding LLM Space | |
| emoji: 🤖 | |
| colorFrom: blue | |
| colorTo: purple | |
| sdk: gradio | |
| sdk_version: 6.12.0 | |
| python_version: '3.10' | |
| app_file: app.py | |
| pinned: false | |
| # Advanced Coding LLM (Production-Ready Starter) | |
| This project provides a deployable coding assistant API built on a free Hugging Face coding model. | |
| ## Model Strategy | |
| - Primary model: `Qwen/Qwen2.5-Coder-1.5B-Instruct` (free/open on Hugging Face). | |
| - Fallback model: `Qwen/Qwen2.5-Coder-0.5B-Instruct` if primary load fails. | |
| - Final emergency fallback: `sshleifer/tiny-gpt2` (for guaranteed startup). | |
| - No heavy training required. | |
| - LoRA-ready architecture included in `src/lora_prepare.py`. | |
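The fallback order above can be sketched as a simple "first model that loads wins" loop. This is a minimal illustration, not the project's actual loader: `load_first_available` and `MODEL_CHAIN` are hypothetical names, and the loader is passed in as a callable so the logic stays independent of any particular library.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Model IDs from the list above, in priority order.
MODEL_CHAIN = (
    "Qwen/Qwen2.5-Coder-1.5B-Instruct",
    "Qwen/Qwen2.5-Coder-0.5B-Instruct",
    "sshleifer/tiny-gpt2",
)


def load_first_available(
    model_ids: Sequence[str], loader: Callable[[str], T]
) -> Tuple[str, T]:
    """Try each model ID in order and return (model_id, model) for the first success."""
    errors = []
    for model_id in model_ids:
        try:
            return model_id, loader(model_id)
        except Exception as exc:  # broad on purpose: any load failure triggers the fallback
            errors.append(f"{model_id}: {exc}")
    # Every candidate failed, so surface all collected errors at once.
    raise RuntimeError("No model could be loaded: " + "; ".join(errors))
```

With `transformers` installed, the loader could be something like `AutoModelForCausalLM.from_pretrained`; whether the project wires it up this way is an assumption.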
| ## Features | |
| - Code generation | |
| - Debugging / buggy code fixing | |
| - Code explanation | |
| - Instruction following | |
| - Confidence estimation (from token probabilities) | |
| - Important token extraction (low-confidence tokens) | |
| - Relevancy score (embedding cosine similarity) | |
| - Hallucination checks: | |
|   - Syntax validation | |
|   - Runtime smoke test | |
| - Optional RAG with FAISS from `data/sample_snippets.json` | |
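To make the scoring features above more concrete, here is one way they might be computed using only the standard library. The README does not say how token probabilities are aggregated, so the geometric mean, the `0.5` threshold, and all function names below are assumptions for illustration; the real code in `src/` may differ.

```python
import ast
import math
from typing import List, Sequence, Tuple


def confidence_from_logprobs(token_logprobs: Sequence[float]) -> float:
    """Geometric mean of per-token probabilities, a value in [0, 1]."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def low_confidence_tokens(
    tokens: Sequence[str], token_logprobs: Sequence[float], threshold: float = 0.5
) -> List[str]:
    """Return tokens whose probability falls below the (assumed) threshold."""
    return [t for t, lp in zip(tokens, token_logprobs) if math.exp(lp) < threshold]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two embedding vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def python_syntax_ok(code: str) -> Tuple[bool, str]:
    """Syntax validation for Python output: parse only, never execute."""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return False, f"SyntaxError: {exc.msg} (line {exc.lineno})"
    return True, ""
```

For example, `python_syntax_ok("def add(a,b) return a+b")` reports a failure because the colon is missing, which is the kind of signal that would set `hallucination` to `true`.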
| ## Project Structure | |
| ```text | |
| coding-llm/ | |
| │── data/ | |
| │── src/ | |
| │── api/ | |
| │── requirements.txt | |
| │── README.md | |
| ``` | |
| ## API Output Format | |
| `POST /generate` returns: | |
| ```json | |
| { | |
| "code": "...", | |
| "explanation": "...", | |
| "confidence": 0.0, | |
| "important_tokens": ["..."], | |
| "relevancy_score": 0.0, | |
| "hallucination": false, | |
| "latency_ms": 0 | |
| } | |
| ``` | |
| If hallucination is detected, the reason is appended inside `explanation`. | |
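The response shape and the "reason appended inside `explanation`" rule could be assembled roughly as follows. `build_response` and the `[Hallucination check]` prefix are hypothetical; only the seven keys come from the format documented above.

```python
from typing import Any, Dict, List, Optional


def build_response(
    code: str,
    explanation: str,
    confidence: float,
    important_tokens: List[str],
    relevancy_score: float,
    latency_ms: int,
    hallucination_reason: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble the /generate payload; a non-None reason marks a hallucination."""
    hallucination = hallucination_reason is not None
    if hallucination:
        # Append the reason to the explanation (the prefix text is an assumption).
        explanation = f"{explanation}\n\n[Hallucination check] {hallucination_reason}"
    return {
        "code": code,
        "explanation": explanation,
        "confidence": confidence,
        "important_tokens": important_tokens,
        "relevancy_score": relevancy_score,
        "hallucination": hallucination,
        "latency_ms": latency_ms,
    }
```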
| ## Local Run | |
| 1. Install dependencies (ideally inside a fresh virtual environment): | |
| ```bash | |
| pip install -r requirements.txt | |
| ``` | |
| Optional: create `.env` from `.env.example` and set values: | |
| ```bash | |
| cp .env.example .env  # Windows cmd: copy .env.example .env | |
| ``` | |
| 2. Run API: | |
| ```bash | |
| uvicorn api.main:app --host 0.0.0.0 --port 8000 | |
| ``` | |
| 3. Test request: | |
| ```bash | |
| curl -X POST "http://127.0.0.1:8000/generate" \ | |
| -H "Content-Type: application/json" \ | |
| -d '{"instruction":"Fix this Python function","input":"def add(a,b) return a+b"}' | |
| ``` | |
| 4. Optional client: | |
| ```bash | |
| python client_example.py | |
| ``` | |
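For reference, the test request from step 3 can also be sent from Python with only the standard library. This is a sketch and not the contents of `client_example.py`; `build_request` and `generate` are hypothetical helpers, and the URL assumes the local server started in step 2.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/generate"  # local server from step 2


def build_request(instruction: str, code: str) -> urllib.request.Request:
    """Build the POST request with the documented `instruction` and `input` fields."""
    body = json.dumps({"instruction": instruction, "input": code}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(instruction: str, code: str, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(build_request(instruction, code), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call, once the API is up:
# result = generate("Fix this Python function", "def add(a,b) return a+b")
# print(result["code"])
```

The generous default timeout is a guess that allows for the lazy model load on the first request.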
| ## Hugging Face Deployment (Space) | |
| This repo includes: | |
| - `app.py` (Gradio app for HF Space) | |
| - `upload_to_hf.py` (upload helper script) | |
| - `README_hf_space.md` (Space metadata template) | |
| Steps: | |
| 1. Create a HF access token with write permission. | |
| 2. Run: | |
| ```bash | |
| python upload_to_hf.py --repo-id <your-username/coding-llm-space> --token <HF_TOKEN> | |
| ``` | |
| 3. The Space then launches with a public UI and can also be called programmatically using a Hugging Face access token. | |
| ## Security and Ops | |
| - API key auth enabled when `API_KEY` is set. | |
| - In-memory per-IP rate limiting via `RATE_LIMIT_PER_MINUTE`. | |
| - Dockerized API included (`Dockerfile`). | |
| - Model is loaded lazily on first `/generate` request (faster boot, fewer startup failures). | |
| - Set `FORCE_MOCK_MODE=true` to run instantly without downloading models. | |
| - Windows quick-start scripts: | |
|   - `run_api.bat` | |
|   - `run_space.bat` | |
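In-memory per-IP rate limiting, as mentioned in the list above, is commonly implemented as a sliding window over recent request timestamps. The class below is a sketch under that assumption; `PerIpRateLimiter` is a hypothetical name, and the project's own limiter may use a different algorithm.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class PerIpRateLimiter:
    """Sliding 60-second window; state lives in process memory only."""

    def __init__(
        self, limit_per_minute: int, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self.limit = limit_per_minute  # e.g. the value of RATE_LIMIT_PER_MINUTE
        self.clock = clock  # injectable so tests can control time
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str) -> bool:
        """Record the request and return True if `ip` is still under its limit."""
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the 60-second window.
        while hits and now - hits[0] >= 60.0:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Because the counters are held in memory, they reset on restart and are not shared between worker processes, which is worth keeping in mind when scaling beyond a single instance.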
| ## Docker Compose | |
| ```bash | |
| cp .env.example .env  # Windows cmd: copy .env.example .env | |
| docker compose up --build | |
| ``` | |
| API is available at `http://127.0.0.1:8000`. | |
| ## Automated Smoke Test | |
| Run this after the API has started: | |
| ```bash | |
| python smoke_test.py | |
| ``` | |
| This validates: | |
| - `GET /health` | |
| - `POST /generate` | |
| - required JSON output keys | |
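The "required JSON output keys" check might look like the following. This is an illustrative sketch rather than the code in `smoke_test.py`; the key names and types are taken from the API output format in this README, while `validate_generate_payload` is a hypothetical helper.

```python
from typing import Any, Dict, List

# Expected keys and types, per the documented /generate response.
REQUIRED_KEYS = {
    "code": str,
    "explanation": str,
    "confidence": (int, float),
    "important_tokens": list,
    "relevancy_score": (int, float),
    "hallucination": bool,
    "latency_ms": (int, float),
}


def validate_generate_payload(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in payload:
            problems.append(f"missing key: {key}")
            continue
        value = payload[key]
        # bool is a subclass of int, so reject it explicitly for numeric fields.
        if expected is not bool and isinstance(value, bool):
            problems.append(f"wrong type for {key}: bool")
        elif not isinstance(value, expected):
            problems.append(f"wrong type for {key}: {type(value).__name__}")
    return problems
```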
| ## One-command Task Runner | |
| Cross-platform: | |
| ```bash | |
| python tasks.py install | |
| python tasks.py run | |
| python tasks.py smoke | |
| python tasks.py serve-smoke | |
| python tasks.py docker-up | |
| python tasks.py docker-down | |
| python tasks.py hf-upload --repo-id <user/space-name> --token <HF_TOKEN> | |
| ``` | |
| Windows shortcut: | |
| ```bat | |
| run_tasks.bat install | |
| run_tasks.bat run | |
| run_tasks.bat smoke | |
| run_tasks.bat serve-smoke | |
| ``` | |
| Makefile (Linux/macOS/WSL): | |
| ```bash | |
| make install | |
| make run | |
| make smoke | |
| make docker-up | |
| ``` | |
| ## FastAPI Endpoint | |
| ### `POST /generate` | |
| Input JSON: | |
| ```json | |
| { | |
| "instruction": "Explain this code and improve it", | |
| "input": "def f(x): return x*x" | |
| } | |
| ``` | |
| ## Notes for Production | |
| - Keep `max_new_tokens` modest for low latency. | |
| - Set `API_KEY` and `RATE_LIMIT_PER_MINUTE` before exposing the endpoint publicly, so that authentication and rate limiting are active. | |
| - For stronger quality, add curated retrieval corpus in `data/`. | |
| - For robust hallucination checks, extend tests per language/framework. |