---
title: Coding LLM Space
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.12.0
python_version: '3.10'
app_file: app.py
pinned: false
---

# Advanced Coding LLM (Production-Ready Starter)

This project provides a deployable coding assistant API built on a free Hugging Face coding model.

## Model Strategy

- Primary model: `Qwen/Qwen2.5-Coder-1.5B-Instruct` (free/open on Hugging Face).
- Fallback model: `Qwen/Qwen2.5-Coder-0.5B-Instruct`, used if the primary model fails to load.
- Final emergency fallback: `sshleifer/tiny-gpt2` (for guaranteed startup).
- No heavy training required.
- LoRA-ready architecture included in `src/lora_prepare.py`.

## Features

- Code generation
- Debugging / buggy code fixing
- Code explanation
- Instruction following
- Confidence estimation (from token probabilities)
- Important token extraction (low-confidence tokens)
- Relevancy score (embedding cosine similarity)
- Hallucination checks:
  - Syntax validation
  - Runtime smoke test
- Optional RAG with FAISS from `data/sample_snippets.json`

## Project Structure

```text
coding-llm/
│── data/
│── src/
│── api/
│── requirements.txt
│── README.md
```

## API Output Format

`POST /generate` returns:

```json
{
  "code": "...",
  "explanation": "...",
  "confidence": 0.0,
  "important_tokens": ["..."],
  "relevancy_score": 0.0,
  "hallucination": false,
  "latency_ms": 0
}
```

If a hallucination is detected, the reason is appended to `explanation`.

## Local Run

1. Create an environment and install dependencies:

```bash
pip install -r requirements.txt
```

Optional: create `.env` from `.env.example` and set values (Windows `cmd` syntax; on Linux/macOS use `cp` instead of `copy`):

```bat
copy .env.example .env
```

2. Run the API:

```bash
uvicorn api.main:app --host 0.0.0.0 --port 8000
```

3. Send a test request. The command below uses Windows `cmd` line continuations (`^`); in a POSIX shell, use `\` instead and single-quote the JSON body:

```bat
curl -X POST "http://127.0.0.1:8000/generate" ^
  -H "Content-Type: application/json" ^
  -d "{\"instruction\":\"Fix this Python function\",\"input\":\"def add(a,b) return a+b\"}"
```

4. Optional client:

```bash
python client_example.py
```

## Hugging Face Deployment (Space)

This repo includes:

- `app.py` (Gradio app for HF Space)
- `upload_to_hf.py` (upload helper script)
- `README_hf_space.md` (Space metadata template)

Steps:

1. Create a HF access token with write permission.
2. Run the upload script, passing the Space repo ID and the token as the values of `--repo-id` and `--token`:

```bash
python upload_to_hf.py --repo-id --token
```

3. The Space launches with a public UI and can be called using a Hugging Face API key.

## Security and Ops

- API key auth is enabled when `API_KEY` is set.
- In-memory per-IP rate limiting via `RATE_LIMIT_PER_MINUTE`.
- Dockerized API included (`Dockerfile`).
- The model is loaded lazily on the first `/generate` request (faster boot, fewer startup failures).
- Set `FORCE_MOCK_MODE=true` to run instantly without downloading models.
- Windows quick-start scripts:
  - `run_api.bat`
  - `run_space.bat`

## Docker Compose

On Linux/macOS, use `cp` instead of `copy`:

```bat
copy .env.example .env
docker compose up --build
```

The API is available at `http://127.0.0.1:8000`.

## Automated Smoke Test

Run this after the API starts:

```bash
python smoke_test.py
```

This validates:

- `GET /health`
- `POST /generate`
- required JSON output keys

## One-command Task Runner

Cross-platform:

```bash
python tasks.py install
python tasks.py run
python tasks.py smoke
python tasks.py serve-smoke
python tasks.py docker-up
python tasks.py docker-down
python tasks.py hf-upload --repo-id --token
```

Windows shortcut:

```bat
run_tasks.bat install
run_tasks.bat run
run_tasks.bat smoke
run_tasks.bat serve-smoke
```

Makefile (Linux/macOS/WSL):

```bash
make install
make run
make smoke
make docker-up
```

## FastAPI Endpoint

### `POST /generate`

Input JSON:

```json
{
  "instruction": "Explain this code and improve it",
  "input": "def f(x): return x*x"
}
```

## Notes for Production

- Keep `max_new_tokens` modest for low latency.
- Enable request auth (`API_KEY`) and rate limiting (`RATE_LIMIT_PER_MINUTE`) before exposing a public endpoint.
- For stronger quality, add a curated retrieval corpus in `data/`.
- For robust hallucination checks, extend tests per language/framework.
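
As a starting point for extending the hallucination checks, the two checks listed under Features (syntax validation and a runtime smoke test) could look roughly like the sketch below for Python output. The function names (`check_syntax`, `smoke_test`, `detect_hallucination`) and the 5-second timeout are illustrative assumptions, not the project's actual implementation in `src/`.

```python
import ast
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Tuple


def check_syntax(code: str) -> Tuple[bool, str]:
    """Return (ok, reason); parses the code without executing it."""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return False, f"SyntaxError at line {exc.lineno}: {exc.msg}"
    return True, ""


def smoke_test(code: str, timeout_s: float = 5.0) -> Tuple[bool, str]:
    """Run the code in a separate interpreter and report crashes or timeouts.

    NOTE: this is not a sandbox. Generated code runs with the same
    privileges as the API process, so isolate it further in production.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout_s,
                cwd=tmp,
            )
        except subprocess.TimeoutExpired:
            return False, f"timed out after {timeout_s}s"
    if proc.returncode != 0:
        # The last stderr line usually carries the exception type and message.
        lines = proc.stderr.strip().splitlines()
        return False, lines[-1] if lines else f"exit code {proc.returncode}"
    return True, ""


def detect_hallucination(code: str) -> Tuple[bool, str]:
    """Return (hallucination, reason), running the cheap check first."""
    ok, reason = check_syntax(code)
    if not ok:
        return True, reason
    ok, reason = smoke_test(code)
    return (not ok), reason
```

The returned reason is the kind of string that could be appended to `explanation` when `hallucination` is `true`. Checks for other languages would swap `ast.parse` for that language's parser or compiler front end.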
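
The `confidence` and `important_tokens` fields are described above as derived from token probabilities. One plausible way to compute them is sketched below. The aggregation (geometric mean of token probabilities) and the `0.5` threshold are assumptions chosen for illustration; the README does not state which formula the project uses.

```python
import math
from typing import List, Sequence


def confidence_from_logprobs(token_logprobs: Sequence[float]) -> float:
    """Geometric mean of token probabilities, i.e. exp(mean log-probability).

    Assumed aggregation: it keeps the score in [0, 1] and does not
    penalize long outputs the way a raw product of probabilities would.
    """
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def low_confidence_tokens(
    tokens: Sequence[str],
    token_logprobs: Sequence[float],
    threshold: float = 0.5,  # hypothetical cut-off, tune per model
) -> List[str]:
    """Return the tokens whose probability falls below the threshold."""
    return [
        tok
        for tok, logprob in zip(tokens, token_logprobs)
        if math.exp(logprob) < threshold
    ]


if __name__ == "__main__":
    # Toy values standing in for the per-token log-probabilities of a generation.
    tokens = ["def", " add", "(", "x"]
    logprobs = [math.log(0.9), math.log(0.8), math.log(0.3), math.log(0.95)]
    print(round(confidence_from_logprobs(logprobs), 3))
    print(low_confidence_tokens(tokens, logprobs))
```

In a real pipeline the log-probabilities would come from the model's generation scores rather than hand-written values.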
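
For the rate-limiting note, an in-memory per-IP limiter of the kind configured by `RATE_LIMIT_PER_MINUTE` can be sketched as a sliding window over request timestamps. The class name `PerIpRateLimiter` and its interface are hypothetical; the project's own limiter may be structured differently.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class PerIpRateLimiter:
    """Sliding-window limiter: at most `limit_per_minute` hits per IP per window."""

    def __init__(self, limit_per_minute: int, window_s: float = 60.0) -> None:
        self.limit = limit_per_minute
        self.window_s = window_s
        # Maps each client IP to the timestamps of its recent accepted requests.
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        """Return True and record the hit if the IP is under its limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = PerIpRateLimiter(limit_per_minute=2)
    for i in range(3):
        print(i, limiter.allow("203.0.113.7"))
```

Because the state lives in process memory, limits reset on restart and are not shared across workers; a multi-worker deployment would need a shared store instead.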