---
title: Coding LLM Space
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.12.0
python_version: '3.10'
app_file: app.py
pinned: false
---
# Advanced Coding LLM (Production-Ready Starter)
This project provides a deployable coding assistant API built on a free Hugging Face coding model.
## Model Strategy
- Primary model: `Qwen/Qwen2.5-Coder-1.5B-Instruct` (free/open on Hugging Face).
- Fallback model: `Qwen/Qwen2.5-Coder-0.5B-Instruct` if primary load fails.
- Final emergency fallback: `sshleifer/tiny-gpt2` (for guaranteed startup).
- No heavy training required.
- LoRA-ready architecture included in `src/lora_prepare.py`.
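The fallback order above can be sketched as a simple loop. This is a minimal illustration, not the project's actual loader: `load_first_available` and the injected `loader` callable are hypothetical names, and in a real setup `loader` would presumably wrap a call such as `transformers.AutoModelForCausalLM.from_pretrained`.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Model IDs in the priority order listed above.
MODEL_CHAIN = (
    "Qwen/Qwen2.5-Coder-1.5B-Instruct",
    "Qwen/Qwen2.5-Coder-0.5B-Instruct",
    "sshleifer/tiny-gpt2",
)


def load_first_available(
    loader: Callable[[str], T],
    model_ids: Sequence[str] = MODEL_CHAIN,
) -> Tuple[str, T]:
    """Try each model ID in order and return the first one that loads.

    `loader` is a hypothetical callable (assumption) that takes a model ID
    and either returns a loaded model or raises on failure.
    """
    errors = []
    for model_id in model_ids:
        try:
            return model_id, loader(model_id)
        except Exception as exc:  # any load failure moves on to the next model
            errors.append(f"{model_id}: {exc}")
    raise RuntimeError("All models failed to load: " + "; ".join(errors))
```

Passing the loader in as an argument keeps the fallback logic testable without downloading any weights.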
## Features
- Code generation
- Debugging / buggy code fixing
- Code explanation
- Instruction following
- Confidence estimation (from token probabilities)
- Important token extraction (low-confidence tokens)
- Relevancy score (embedding cosine similarity)
- Hallucination checks:
  - Syntax validation
  - Runtime smoke test
- Optional RAG with FAISS from `data/sample_snippets.json`
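The three scoring features above (confidence, important tokens, relevancy) can be illustrated with plain Python. This is a sketch under stated assumptions, not the project's code: the geometric-mean aggregation, the `0.5` threshold, and all function names are hypothetical, since the feature list only says that confidence comes from token probabilities and relevancy from embedding cosine similarity.

```python
import math
from typing import List, Sequence


def confidence_from_probs(token_probs: Sequence[float]) -> float:
    """Geometric mean of per-token probabilities (one possible aggregation)."""
    if not token_probs:
        return 0.0
    # Average the log-probabilities, then exponentiate; clamp to avoid log(0).
    mean_logp = sum(math.log(max(p, 1e-12)) for p in token_probs) / len(token_probs)
    return math.exp(mean_logp)


def low_confidence_tokens(
    tokens: Sequence[str],
    token_probs: Sequence[float],
    threshold: float = 0.5,  # assumed cutoff, not taken from the project
) -> List[str]:
    """Return the tokens whose probability falls below `threshold`."""
    return [t for t, p in zip(tokens, token_probs) if p < threshold]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated
    return dot / (norm_a * norm_b)
```

A geometric mean penalizes a single very unlikely token more than an arithmetic mean would, which is often the desired behavior for a confidence signal.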
## Project Structure
```text
coding-llm/
├── data/
├── src/
├── api/
├── requirements.txt
└── README.md
```
## API Output Format
`POST /generate` returns:
```json
{
  "code": "...",
  "explanation": "...",
  "confidence": 0.0,
  "important_tokens": ["..."],
  "relevancy_score": 0.0,
  "hallucination": false,
  "latency_ms": 0
}
```
If a hallucination is detected, the reason is appended to the `explanation` field.
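As an illustration of how a syntax check could feed the `hallucination` flag and the `explanation` field, here is a minimal sketch. It covers Python code only, and the helper names and the bracketed note format are assumptions; the README states only that the reason ends up inside `explanation`.

```python
import ast
from typing import Optional


def syntax_error_reason(code: str) -> Optional[str]:
    """Return a short reason if `code` is not valid Python, else None."""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return f"syntax error at line {exc.lineno}: {exc.msg}"
    return None


def apply_hallucination_check(response: dict) -> dict:
    """Set `hallucination` and append any failure reason to `explanation`."""
    result = dict(response)  # copy so the caller's dict is not mutated
    reason = syntax_error_reason(result.get("code", ""))
    result["hallucination"] = reason is not None
    if reason is not None:
        # Note format is an assumption (hypothetical), chosen for readability.
        explanation = result.get("explanation", "")
        result["explanation"] = f"{explanation} [hallucination check: {reason}]".strip()
    return result
```

A runtime smoke test would need sandboxed execution and is deliberately left out of this sketch.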
## Local Run
1. Install dependencies (ideally inside a fresh virtual environment):
```bash
pip install -r requirements.txt
```
Optional: create `.env` from `.env.example` and set values (Windows `copy` shown; use `cp` on Linux/macOS):
```bash
copy .env.example .env
```
2. Run API:
```bash
uvicorn api.main:app --host 0.0.0.0 --port 8000
```
3. Test request (Windows `cmd` syntax; on Linux/macOS replace the `^` line continuations with `\`):
```bash
curl -X POST "http://127.0.0.1:8000/generate" ^
-H "Content-Type: application/json" ^
-d "{\"instruction\":\"Fix this Python function\",\"input\":\"def add(a,b) return a+b\"}"
```
4. Optional client:
```bash
python client_example.py
```
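The request from step 3 can also be sent with Python's standard library, which avoids shell-specific quoting. This is a hedged sketch that assumes the API from step 2 is listening on `http://127.0.0.1:8000`; `build_generate_request` and `send_request` are hypothetical helpers and do not reflect the contents of `client_example.py`.

```python
import json
import urllib.request


def build_generate_request(
    instruction: str,
    code_input: str,
    base_url: str = "http://127.0.0.1:8000",  # assumed local address from step 2
) -> urllib.request.Request:
    """Build a POST /generate request with a JSON body."""
    body = json.dumps({"instruction": instruction, "input": code_input}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_request(req: urllib.request.Request, timeout_s: float = 120.0) -> dict:
    """Send the request and decode the JSON response (requires a running API)."""
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `send_request(build_generate_request("Fix this Python function", "def add(a,b) return a+b"))` should return a dictionary in the output format described earlier.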
## Hugging Face Deployment (Space)
This repo includes:
- `app.py` (Gradio app for HF Space)
- `upload_to_hf.py` (upload helper script)
- `README_hf_space.md` (Space metadata template)
Steps:
1. Create a HF access token with write permission.
2. Run:
```bash
python upload_to_hf.py --repo-id <your-username/coding-llm-space> --token <HF_TOKEN>
```
3. Your Space launches with a public UI and can be called using a Hugging Face access token.
## Security and Ops
- API key auth enabled when `API_KEY` is set.
- In-memory per-IP rate limiting via `RATE_LIMIT_PER_MINUTE`.
- Dockerized API included (`Dockerfile`).
- Model is loaded lazily on first `/generate` request (faster boot, fewer startup failures).
- Set `FORCE_MOCK_MODE=true` to run instantly without downloading models.
- Windows quick-start scripts:
  - `run_api.bat`
  - `run_space.bat`
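To make the in-memory per-IP rate limiting concrete, the following sketch shows one common way to implement it with a sliding window. It is an assumption about the approach, not the project's implementation: the class name, the 60-second window, and the injectable `clock` are all hypothetical, with `limit` standing in for the value of `RATE_LIMIT_PER_MINUTE`.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class PerIPRateLimiter:
    """Sliding-window limiter: at most `limit` requests per IP per window."""

    def __init__(
        self,
        limit: int,
        clock: Callable[[], float] = time.monotonic,
        window_s: float = 60.0,
    ) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so tests do not need to sleep
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str) -> bool:
        """Record a request from `ip` and report whether it is within the limit."""
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Because the counters live in process memory, they reset on restart and are not shared between worker processes.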
## Docker Compose
```bash
copy .env.example .env
docker compose up --build
```
API is available at `http://127.0.0.1:8000`.
## Automated Smoke Test
Run this after API starts:
```bash
python smoke_test.py
```
This validates:
- `GET /health`
- `POST /generate`
- required JSON output keys
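The required-keys check can be expressed in a few lines. This sketch is not the contents of `smoke_test.py`; `REQUIRED_KEYS` is taken from the output format documented above, and `missing_keys` is a hypothetical helper name.

```python
# Keys documented in the API output format.
REQUIRED_KEYS = {
    "code",
    "explanation",
    "confidence",
    "important_tokens",
    "relevancy_score",
    "hallucination",
    "latency_ms",
}


def missing_keys(payload: dict) -> list:
    """Return the required keys absent from `payload`, sorted alphabetically."""
    return sorted(REQUIRED_KEYS - set(payload))
```

An empty list means the response has every documented key; extra keys are ignored.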
## One-command Task Runner
Cross-platform:
```bash
python tasks.py install
python tasks.py run
python tasks.py smoke
python tasks.py serve-smoke
python tasks.py docker-up
python tasks.py docker-down
python tasks.py hf-upload --repo-id <user/space-name> --token <HF_TOKEN>
```
Windows shortcut:
```bat
run_tasks.bat install
run_tasks.bat run
run_tasks.bat smoke
run_tasks.bat serve-smoke
```
Makefile (Linux/macOS/WSL):
```bash
make install
make run
make smoke
make docker-up
```
## FastAPI Endpoint
### `POST /generate`
Input JSON:
```json
{
  "instruction": "Explain this code and improve it",
  "input": "def f(x): return x*x"
}
```
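A request body of this shape could be validated before it reaches the model. The sketch below uses only the standard library and is an illustration, not the project's request model: `GenerateRequest` and `parse_generate_request` are hypothetical names, and treating `input` as optional is an assumption.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerateRequest:
    instruction: str
    input: str = ""  # assumption: treated as optional in this sketch


def parse_generate_request(raw: str) -> GenerateRequest:
    """Parse and validate a JSON request body for POST /generate."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("request body must be a JSON object")
    instruction = data.get("instruction")
    if not isinstance(instruction, str) or not instruction.strip():
        raise ValueError("'instruction' must be a non-empty string")
    code_input = data.get("input", "")
    if not isinstance(code_input, str):
        raise ValueError("'input' must be a string")
    return GenerateRequest(instruction=instruction, input=code_input)
```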
## Notes for Production
- Keep `max_new_tokens` modest for low latency.
- Set `API_KEY` and `RATE_LIMIT_PER_MINUTE` to enable request auth and rate limiting before exposing a public endpoint.
- For stronger quality, add curated retrieval corpus in `data/`.
- For robust hallucination checks, extend tests per language/framework.