metadata
title: Coding LLM Space
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.12.0
python_version: '3.10'
app_file: app.py
pinned: false
Advanced Coding LLM (Production-Ready Starter)
This project provides a deployable coding assistant API built on a free Hugging Face coding model.
Model Strategy
- Primary model: Qwen/Qwen2.5-Coder-1.5B-Instruct (free/open on Hugging Face).
- Fallback model: Qwen/Qwen2.5-Coder-0.5B-Instruct if primary load fails.
- Final emergency fallback: sshleifer/tiny-gpt2 (for guaranteed startup).
- No heavy training required.
- LoRA-ready architecture included in src/lora_prepare.py.
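The fallback order above can be sketched as a small loop that tries each model ID in turn. This is only an illustration, not the project's actual loader: `load_first_available` and the `loader` argument are hypothetical names, and `loader` stands in for whatever function really loads a model (for example a wrapper around a Transformers `from_pretrained` call).

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Model IDs in the order given in the Model Strategy list.
MODEL_CHAIN = (
    "Qwen/Qwen2.5-Coder-1.5B-Instruct",
    "Qwen/Qwen2.5-Coder-0.5B-Instruct",
    "sshleifer/tiny-gpt2",
)


def load_first_available(
    loader: Callable[[str], T],
    model_ids: Sequence[str] = MODEL_CHAIN,
) -> Tuple[str, T]:
    """Try each model ID in order and return the first one that loads.

    `loader` is a hypothetical callable that returns a loaded model
    and raises an exception when loading fails.
    """
    errors = []
    for model_id in model_ids:
        try:
            return model_id, loader(model_id)
        except Exception as exc:  # any load failure moves on to the next model
            errors.append(f"{model_id}: {exc}")
    raise RuntimeError("all models failed to load: " + "; ".join(errors))
```

Returning the model ID alongside the model lets the caller log which tier of the chain is actually serving requests.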
Features
- Code generation
- Debugging / buggy code fixing
- Code explanation
- Instruction following
- Confidence estimation (from token probabilities)
- Important token extraction (low-confidence tokens)
- Relevancy score (embedding cosine similarity)
- Hallucination checks:
  - Syntax validation
  - Runtime smoke test
- Optional RAG with FAISS from data/sample_snippets.json
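To make the three scoring features concrete, here is a minimal sketch in plain Python. The README does not say how token probabilities are aggregated or what counts as "low confidence", so the geometric mean and the 0.5 threshold below are assumptions, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def confidence_from_logprobs(token_logprobs: Sequence[float]) -> float:
    """Geometric mean of the token probabilities (assumed aggregation)."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def low_confidence_tokens(
    tokens: Sequence[str],
    token_logprobs: Sequence[float],
    threshold: float = 0.5,  # assumed cutoff, not taken from the project
) -> List[str]:
    """Return tokens whose probability falls below `threshold`."""
    return [t for t, lp in zip(tokens, token_logprobs) if math.exp(lp) < threshold]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two embedding vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

In this sketch the relevancy score would be `cosine_similarity` applied to an embedding of the instruction and an embedding of the generated answer; producing those embeddings is outside its scope.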
Project Structure
coding-llm/
├── data/
├── src/
├── api/
├── requirements.txt
└── README.md
API Output Format
POST /generate returns:
{
  "code": "...",
  "explanation": "...",
  "confidence": 0.0,
  "important_tokens": ["..."],
  "relevancy_score": 0.0,
  "hallucination": false,
  "latency_ms": 0
}
If a hallucination is detected, the reason is appended to the explanation field.
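The behaviour described here (syntax validation, an optional runtime smoke test, and a reason appended to the explanation) might look roughly like the sketch below. The function names, the bracketed message format, and the `run_code` switch are assumptions for illustration. Running generated code, even in a subprocess, should only happen in a sandboxed environment.

```python
import ast
import subprocess
import sys
from typing import Optional


def syntax_error_reason(code: str) -> Optional[str]:
    """Return a short reason if `code` is not valid Python, else None."""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return f"syntax error at line {exc.lineno}: {exc.msg}"
    return None


def runtime_error_reason(code: str, timeout_s: float = 5.0) -> Optional[str]:
    """Run `code` in a separate interpreter and report a crash or timeout."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return f"timed out after {timeout_s}s"
    if proc.returncode != 0:
        lines = proc.stderr.strip().splitlines()
        return "runtime error: " + (lines[-1] if lines else f"exit code {proc.returncode}")
    return None


def check_hallucination(code: str, explanation: str, run_code: bool = False) -> dict:
    """Set the hallucination flag and append the reason to the explanation."""
    reason = syntax_error_reason(code)
    if reason is None and run_code:  # the smoke test is opt-in in this sketch
        reason = runtime_error_reason(code)
    if reason is None:
        return {"hallucination": False, "explanation": explanation}
    return {
        "hallucination": True,
        "explanation": f"{explanation} [hallucination check: {reason}]",
    }
```

Checking syntax first keeps the cheap test in front of the expensive one, so code that cannot parse is never executed.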
Local Run
- Create an environment and install dependencies:
pip install -r requirements.txt
Optional: create .env from .env.example and set values (on Linux/macOS, use cp instead of copy):
copy .env.example .env
- Run API:
uvicorn api.main:app --host 0.0.0.0 --port 8000
- Test request:
curl -X POST "http://127.0.0.1:8000/generate" ^
-H "Content-Type: application/json" ^
-d "{\"instruction\":\"Fix this Python function\",\"input\":\"def add(a,b) return a+b\"}"
- Optional client:
python client_example.py
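For readers who prefer Python over curl, the same test request can be sketched with the standard library alone. This is not the contents of client_example.py. The helper names are hypothetical, and the `X-API-Key` header name is an assumption, since the README does not state which header carries the key.

```python
import json
import urllib.request
from typing import Optional


def build_generate_request(
    instruction: str,
    code_input: str,
    base_url: str = "http://127.0.0.1:8000",
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build a POST /generate request with the documented JSON body."""
    payload = json.dumps({"instruction": instruction, "input": code_input})
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["X-API-Key"] = api_key  # assumed header name
    return urllib.request.Request(
        f"{base_url}/generate",
        data=payload.encode("utf-8"),
        headers=headers,
        method="POST",
    )


def generate(instruction: str, code_input: str, timeout_s: float = 120.0) -> dict:
    """Send the request to a locally running API and decode the JSON reply."""
    request = build_generate_request(instruction, code_input)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (requires the API to be running):
# result = generate("Fix this Python function", "def add(a,b) return a+b")
# print(result["code"])
```

The generous default timeout reflects that the model is loaded lazily, so the first request can take much longer than later ones.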
Hugging Face Deployment (Space)
This repo includes:
- app.py (Gradio app for HF Space)
- upload_to_hf.py (upload helper script)
- README_hf_space.md (Space metadata template)
Steps:
- Create an HF access token with write permission.
- Run:
python upload_to_hf.py --repo-id <your-username/coding-llm-space> --token <HF_TOKEN>
- Your Space launches with a public UI and can be called using a Hugging Face API key.
Security and Ops
- API key auth enabled when API_KEY is set.
- In-memory per-IP rate limiting via RATE_LIMIT_PER_MINUTE.
- Dockerized API included (Dockerfile).
- Model is loaded lazily on first /generate request (faster boot, fewer startup failures).
- Set FORCE_MOCK_MODE=true to run instantly without downloading models.
- Windows quick-start scripts: run_api.bat, run_space.bat
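The first two items can be illustrated with a short sketch. It assumes a sliding 60-second window per IP and a constant-time key comparison. The class and function names are hypothetical, and the project's real implementation may differ.

```python
import hmac
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional


class PerIPRateLimiter:
    """In-memory sliding-window limiter: at most N requests per IP per minute."""

    def __init__(
        self,
        limit_per_minute: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit_per_minute
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str) -> bool:
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have left the 60-second window.
        while hits and now - hits[0] >= 60.0:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


def api_key_ok(provided: Optional[str], expected: Optional[str]) -> bool:
    """Auth is skipped when no API_KEY is configured; otherwise keys must match."""
    if not expected:
        return True
    if provided is None:
        return False
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))
```

Because the counters live in process memory, a limiter like this resets on restart and is not shared between multiple workers or containers.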
Docker Compose
copy .env.example .env
docker compose up --build
API is available at http://127.0.0.1:8000.
Automated Smoke Test
Run this after the API starts:
python smoke_test.py
This validates:
- GET /health
- POST /generate
- required JSON output keys
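The "required JSON output keys" check could be written along these lines. This is a sketch rather than the contents of smoke_test.py: the function name is hypothetical, and the expected types are inferred from the sample response in the API Output Format section.

```python
from typing import List

# Keys from the documented /generate response; types inferred from the sample.
REQUIRED_KEYS = {
    "code": str,
    "explanation": str,
    "confidence": (int, float),
    "important_tokens": list,
    "relevancy_score": (int, float),
    "hallucination": bool,
    "latency_ms": (int, float),
}


def validate_generate_response(body: dict) -> List[str]:
    """Return a list of problems; an empty list means the response looks valid."""
    problems = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in body:
            problems.append(f"missing key: {key}")
            continue
        value = body[key]
        # bool is a subclass of int, so reject it explicitly for numeric fields.
        if isinstance(value, bool) and expected is not bool:
            problems.append(f"wrong type for {key}: bool")
        elif not isinstance(value, expected):
            problems.append(f"wrong type for {key}: {type(value).__name__}")
    return problems
```

Collecting every problem instead of stopping at the first one makes a failed smoke test easier to diagnose in a single run.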
One-command Task Runner
Cross-platform:
python tasks.py install
python tasks.py run
python tasks.py smoke
python tasks.py serve-smoke
python tasks.py docker-up
python tasks.py docker-down
python tasks.py hf-upload --repo-id <user/space-name> --token <HF_TOKEN>
Windows shortcut:
run_tasks.bat install
run_tasks.bat run
run_tasks.bat smoke
run_tasks.bat serve-smoke
Makefile (Linux/macOS/WSL):
make install
make run
make smoke
make docker-up
FastAPI Endpoint
POST /generate
Input JSON:
{
  "instruction": "Explain this code and improve it",
  "input": "def f(x): return x*x"
}
Notes for Production
- Keep max_new_tokens modest for low latency.
- Add request auth/rate limiting before exposing a public endpoint.
- For stronger quality, add a curated retrieval corpus in data/.
- For robust hallucination checks, extend tests per language/framework.