---
title: Coding LLM Space
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.12.0
python_version: '3.10'
app_file: app.py
pinned: false
---
# Advanced Coding LLM (Production-Ready Starter)
This project provides a deployable coding assistant API built on a free Hugging Face coding model.
## Model Strategy
- Primary model: `Qwen/Qwen2.5-Coder-1.5B-Instruct` (free/open on Hugging Face).
- Fallback model: `Qwen/Qwen2.5-Coder-0.5B-Instruct` if primary load fails.
- Final emergency fallback: `sshleifer/tiny-gpt2` (for guaranteed startup).
- No heavy training required.
- LoRA-ready architecture included in `src/lora_prepare.py`.
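The fallback order above can be sketched as a small loop that tries each model ID in turn. This is a minimal illustration, not the project's actual loader: `load_first_available` and the `loader` callable are hypothetical names, and in practice `loader` would presumably wrap something like `transformers.AutoModelForCausalLM.from_pretrained`.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Model IDs in the fallback order listed in this README.
MODEL_CANDIDATES = (
    "Qwen/Qwen2.5-Coder-1.5B-Instruct",
    "Qwen/Qwen2.5-Coder-0.5B-Instruct",
    "sshleifer/tiny-gpt2",
)


def load_first_available(
    candidates: Sequence[str],
    loader: Callable[[str], T],
) -> Tuple[str, T]:
    """Return (model_id, model) for the first candidate that loads.

    `loader` is a hypothetical callable supplied by the caller; any
    exception it raises is treated as a load failure for that candidate.
    """
    errors = []
    for model_id in candidates:
        try:
            return model_id, loader(model_id)
        except Exception as exc:  # a failed load moves on to the next fallback
            errors.append(f"{model_id}: {exc}")
    raise RuntimeError("No model could be loaded: " + "; ".join(errors))
```

Passing the loader in as an argument keeps the fallback logic testable without downloading any weights.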
## Features
- Code generation
- Debugging / buggy code fixing
- Code explanation
- Instruction following
- Confidence estimation (from token probabilities)
- Important token extraction (low-confidence tokens)
- Relevancy score (embedding cosine similarity)
- Hallucination checks:
  - Syntax validation
  - Runtime smoke test
- Optional RAG with FAISS from `data/sample_snippets.json`
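The three scoring features above (confidence, important tokens, relevancy) can be illustrated with plain Python. This is a sketch under assumptions: the README does not say how token probabilities are aggregated, so the geometric mean, the `0.5` threshold, and all function names here are hypothetical choices, and the vectors stand in for real embeddings.

```python
import math
from typing import List, Sequence


def confidence_from_logprobs(logprobs: Sequence[float]) -> float:
    """Geometric mean of token probabilities, a value in [0, 1].

    Assumption: each entry is the natural-log probability of a generated token.
    """
    if not logprobs:
        return 0.0
    return math.exp(sum(logprobs) / len(logprobs))


def low_confidence_tokens(
    tokens: Sequence[str],
    logprobs: Sequence[float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the project
) -> List[str]:
    """Return tokens whose probability falls below `threshold`."""
    return [tok for tok, lp in zip(tokens, logprobs) if math.exp(lp) < threshold]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A geometric mean penalises a single very unlikely token more than an arithmetic mean would, which is one reason it is a common choice for sequence-level confidence.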
## Project Structure
```text
coding-llm/
├── data/
├── src/
├── api/
├── requirements.txt
└── README.md
```
## API Output Format
`POST /generate` returns:
```json
{
"code": "...",
"explanation": "...",
"confidence": 0.0,
"important_tokens": ["..."],
"relevancy_score": 0.0,
"hallucination": false,
"latency_ms": 0
}
```
If hallucination is detected, the reason is appended inside `explanation`.
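As a rough illustration of the syntax-validation check and of appending the reason to `explanation`, the sketch below parses the generated code with Python's `ast` module. It assumes the generated code is Python; `syntax_error_reason`, `apply_hallucination_check`, and the `[hallucination check]` prefix are hypothetical and may not match the project's implementation.

```python
import ast
from typing import Optional


def syntax_error_reason(code: str) -> Optional[str]:
    """Return a short reason if `code` is not valid Python, else None."""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return f"syntax error on line {exc.lineno}: {exc.msg}"
    return None


def apply_hallucination_check(response: dict) -> dict:
    """Return a copy of `response` with `hallucination` set from the syntax check.

    When the check fails, the reason is appended to `explanation`.
    """
    reason = syntax_error_reason(response.get("code", ""))
    checked = dict(response)  # avoid mutating the caller's dict
    checked["hallucination"] = reason is not None
    if reason is not None:
        explanation = response.get("explanation", "")
        checked["explanation"] = f"{explanation}\n[hallucination check] {reason}".strip()
    return checked
```

A runtime smoke test would need sandboxed execution and is deliberately left out of this sketch.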
## Local Run
1. Install dependencies (ideally inside a virtual environment):
```bash
pip install -r requirements.txt
```
Optional: create `.env` from `.env.example` and set values (Windows `copy` shown; use `cp` on Linux/macOS):
```bash
copy .env.example .env
```
2. Run API:
```bash
uvicorn api.main:app --host 0.0.0.0 --port 8000
```
3. Send a test request (Windows `cmd` syntax shown; in bash, replace each trailing `^` with `\`):
```bash
curl -X POST "http://127.0.0.1:8000/generate" ^
-H "Content-Type: application/json" ^
-d "{\"instruction\":\"Fix this Python function\",\"input\":\"def add(a,b) return a+b\"}"
```
4. Optional client:
```bash
python client_example.py
```
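For readers who prefer Python over `curl`, the same test request can be sketched with the standard library alone. This is not the contents of `client_example.py`; the function names are hypothetical, and the URL and payload simply mirror the `curl` example above.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/generate"  # local address used in the steps above


def build_generate_request(instruction: str, code: str, url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /generate endpoint."""
    body = json.dumps({"instruction": instruction, "input": code}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def generate(instruction: str, code: str, timeout: float = 120.0) -> dict:
    """Send the request and decode the JSON response."""
    request = build_generate_request(instruction, code)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API running, `generate("Fix this Python function", "def add(a,b) return a+b")` should return a dict in the output format described earlier. The generous timeout is an assumption that allows for the lazy model load on the first request.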
## Hugging Face Deployment (Space)
This repo includes:
- `app.py` (Gradio app for HF Space)
- `upload_to_hf.py` (upload helper script)
- `README_hf_space.md` (Space metadata template)
Steps:
1. Create a HF access token with write permission.
2. Run:
```bash
python upload_to_hf.py --repo-id <your-username/coding-llm-space> --token <HF_TOKEN>
```
3. Your Space launches with a public UI and can be called using a Hugging Face access token.
## Security and Ops
- API key auth is enabled when `API_KEY` is set.
- In-memory per-IP rate limiting via `RATE_LIMIT_PER_MINUTE`.
- Dockerized API included (`Dockerfile`).
- Model is loaded lazily on first `/generate` request (faster boot, fewer startup failures).
- Set `FORCE_MOCK_MODE=true` to run instantly without downloading models.
- Windows quick-start scripts:
  - `run_api.bat`
  - `run_space.bat`
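The in-memory per-IP rate limiting mentioned above could look roughly like the sliding-window sketch below. The class name, the 60-second window, and the injectable `clock` are assumptions made for illustration; the project's own limiter may work differently.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class PerIPRateLimiter:
    """Allow at most `limit_per_minute` requests per IP in any 60-second window."""

    def __init__(self, limit_per_minute: int, clock: Callable[[], float] = time.monotonic):
        self.limit = limit_per_minute
        self.clock = clock  # injectable so tests can control time
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str) -> bool:
        """Record a request from `ip` and return whether it is within the limit."""
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the 60-second window.
        while hits and now - hits[0] >= 60.0:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Because the state lives in process memory, limits reset on restart and are not shared across multiple workers or containers.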
## Docker Compose
```bash
copy .env.example .env
docker compose up --build
```
API is available at `http://127.0.0.1:8000`.
## Automated Smoke Test
Run this after API starts:
```bash
python smoke_test.py
```
This validates:
- `GET /health`
- `POST /generate`
- required JSON output keys
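The "required JSON output keys" check can be sketched as a small validator. The key names come from the API output format in this README; the expected types are inferred from the example values there and are an assumption, as is the function name. This is not the contents of `smoke_test.py`.

```python
from typing import Dict, List

# Key names from the documented /generate response; types are inferred, not confirmed.
REQUIRED_KEYS: Dict[str, tuple] = {
    "code": (str,),
    "explanation": (str,),
    "confidence": (int, float),
    "important_tokens": (list,),
    "relevancy_score": (int, float),
    "hallucination": (bool,),
    "latency_ms": (int, float),
}


def validate_generate_response(payload: dict) -> List[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []
    for key, expected_types in REQUIRED_KEYS.items():
        if key not in payload:
            problems.append(f"missing key: {key}")
        elif not isinstance(payload[key], expected_types):
            problems.append(f"wrong type for {key}: {type(payload[key]).__name__}")
    return problems
```

Returning a list of problems rather than raising on the first one makes a failing smoke test easier to diagnose in a single run.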
## One-command Task Runner
Cross-platform:
```bash
python tasks.py install
python tasks.py run
python tasks.py smoke
python tasks.py serve-smoke
python tasks.py docker-up
python tasks.py docker-down
python tasks.py hf-upload --repo-id <user/space-name> --token <HF_TOKEN>
```
Windows shortcut:
```bat
run_tasks.bat install
run_tasks.bat run
run_tasks.bat smoke
run_tasks.bat serve-smoke
```
Makefile (Linux/macOS/WSL):
```bash
make install
make run
make smoke
make docker-up
```
## FastAPI Endpoint
### `POST /generate`
Input JSON:
```json
{
"instruction": "Explain this code and improve it",
"input": "def f(x): return x*x"
}
```
## Notes for Production
- Keep `max_new_tokens` modest for low latency.
- Enable request auth and rate limiting (`API_KEY`, `RATE_LIMIT_PER_MINUTE`) before exposing a public endpoint.
- For stronger quality, add curated retrieval corpus in `data/`.
- For robust hallucination checks, extend tests per language/framework.