Advanced Coding LLM - Complete Instructions
This document provides full setup, run, validation, optimization, and deployment steps for the coding-llm project.
1) Prerequisites
- Python 3.10+ (recommended 3.11/3.12)
- Git
- Internet access for first model download
- Optional: Docker Desktop
- Optional: Hugging Face account and access token
2) Project Setup
From project root:
cd "C:\Users\GIRISH\OneDrive\Desktop\AI model_14_04_26\coding-llm"
Create environment file:
copy .env.example .env
Install dependencies:
python tasks.py install
3) Configure .env
Open .env and set values:
- MODEL_NAME=Qwen/Qwen2.5-Coder-1.5B-Instruct
- FALLBACK_MODEL_NAME=Qwen/Qwen2.5-Coder-0.5B-Instruct
- FINAL_FALLBACK_MODEL_NAME=sshleifer/tiny-gpt2 (optional emergency fallback)
- FORCE_MOCK_MODE=false (true for instant test mode)
- API_KEY=<your_secret_key>
- RATE_LIMIT_PER_MINUTE=30
- USE_RAG=true
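As a rough illustration of how the three model settings might interact, the sketch below returns the configured model names in the order they could be tried. The helper name model_candidates, the fallback ordering, and the assumption that FORCE_MOCK_MODE=true skips model loading are all hypothetical; the project's actual loader may behave differently.

```python
import os
from typing import List, Mapping, Optional

# Hypothetical helper, not part of the project. It only illustrates one way
# the three model variables from .env might be ordered.
MODEL_KEYS = ("MODEL_NAME", "FALLBACK_MODEL_NAME", "FINAL_FALLBACK_MODEL_NAME")


def model_candidates(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return configured model names in the order they would be tried."""
    env = os.environ if env is None else env
    # Assumption: mock mode means no real model is loaded at all.
    if env.get("FORCE_MOCK_MODE", "false").strip().lower() == "true":
        return []
    names: List[str] = []
    for key in MODEL_KEYS:
        value = env.get(key, "").strip()
        if value and value not in names:  # skip unset values and duplicates
            names.append(value)
    return names
```

Calling model_candidates() with no argument reads the current process environment, so it reflects whatever .env values have already been loaded into it.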
4) Run API Locally
python tasks.py run
Server runs at:
http://127.0.0.1:8000
Health endpoint:
GET http://127.0.0.1:8000/health
5) Run Smoke Tests
Full smoke test
python smoke_test.py
Health-only smoke test
set SMOKE_SKIP_GENERATE=true
python smoke_test.py
Combined run-and-test command
python tasks.py serve-smoke
This starts the server, runs the smoke test, and shuts the server down automatically.
6) If Generation Is Slow on First Run
The first /generate call may take a long time because the model has to be downloaded and warmed up.
Options:
- Increase timeout:
set SMOKE_TIMEOUT=900
- Use mock mode for quick validation:
  - set FORCE_MOCK_MODE=true
- Run full mode after model cache is ready.
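The waiting strategy behind a long timeout can be sketched as a small polling loop. This is a minimal sketch, not the logic in smoke_test.py: the function name wait_until_ready and its parameters are hypothetical, and the default of 900 seconds simply mirrors the SMOKE_TIMEOUT example above. The caller supplies the readiness check, for example a function that requests /health and returns True on success.

```python
import time
from typing import Callable


def wait_until_ready(
    check: Callable[[], bool],
    timeout_s: float = 900.0,
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() until it returns True or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False  # gave up: the service never became ready
        sleep(interval_s)
```

The clock and sleep parameters exist only so the loop can be exercised in tests without real waiting.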
7) API Usage
Endpoint
POST /generate
Input JSON
{
  "instruction": "Fix this code",
  "input": "def add(a,b) return a+b"
}
Required Header (if API key enabled)
x-api-key: <API_KEY>
Output JSON
{
  "code": "...",
  "explanation": "...",
  "confidence": 0.0,
  "important_tokens": ["..."],
  "relevancy_score": 0.0,
  "hallucination": false,
  "latency_ms": 0
}
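A client for this endpoint could look roughly like the following, using only the Python standard library. The function names build_generate_request and parse_generate_response are hypothetical; the URL, header name, and field names are taken from this section, and the server is assumed to return exactly the documented fields.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://127.0.0.1:8000"  # local address from the run step

# Field names listed under "Output JSON".
EXPECTED_FIELDS = {
    "code", "explanation", "confidence", "important_tokens",
    "relevancy_score", "hallucination", "latency_ms",
}


def build_generate_request(
    instruction: str,
    code: str,
    api_key: Optional[str] = None,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) a POST /generate request."""
    body = json.dumps({"instruction": instruction, "input": code}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:  # only needed when the server has API_KEY enabled
        headers["x-api-key"] = api_key
    return urllib.request.Request(
        f"{base_url}/generate", data=body, headers=headers, method="POST"
    )


def parse_generate_response(raw: bytes) -> Dict[str, Any]:
    """Decode the response body and check that the documented fields exist."""
    payload = json.loads(raw.decode("utf-8"))
    missing = EXPECTED_FIELDS - payload.keys()
    if missing:
        raise ValueError(f"response is missing fields: {sorted(missing)}")
    return payload
```

To send the request, pass it to urllib.request.urlopen with a generous timeout and hand the bytes from resp.read() to parse_generate_response.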
8) Docker Deployment
copy .env.example .env
docker compose up --build -d
Validate:
python smoke_test.py
Stop:
docker compose down
9) Hugging Face Space Deployment
Create a Hugging Face token with write permission, then run:
python tasks.py hf-upload --repo-id <username/coding-llm-space> --token <HF_TOKEN>
After upload, configure Space variables/secrets:
- MODEL_NAME
- FALLBACK_MODEL_NAME
- FORCE_MOCK_MODE
- API_KEY (if needed in your architecture)
10) Production Hardening Checklist
- Keep API_KEY enabled
- Keep rate limiting enabled (RATE_LIMIT_PER_MINUTE)
- Put the API behind an HTTPS reverse proxy
- Add logging and monitoring
- Pin model versions if strict reproducibility is required
- Use FORCE_MOCK_MODE=false in production
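To make the per-minute rate limit concrete, here is a minimal sliding-window limiter. It is an illustrative sketch only: the class name PerMinuteLimiter is hypothetical, the default of 30 mirrors the RATE_LIMIT_PER_MINUTE example value, and the project's own limiter may use a different algorithm or key requests differently.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class PerMinuteLimiter:
    """Allow at most `limit` requests per client in any 60-second window."""

    def __init__(self, limit: int = 30, clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        """Record the request and return True, or return False if over the limit."""
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have aged out of the 60-second window.
        while hits and now - hits[0] >= 60.0:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Keeping state in process memory like this only works for a single server process; a deployment with several workers would need a shared store.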
11) Common Troubleshooting
- WinError 10061:
  - API server is not running. Start it with: python tasks.py run
- 401 Unauthorized:
  - x-api-key does not match server API_KEY.
- Health works but generate times out:
  - model is still downloading/warming up.
- Low-quality gibberish output:
  - likely fallback model path used; verify .env model names.
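A client script could translate the first three failures above into hints along these lines. This is a sketch under assumptions: the function name diagnose is hypothetical, the client is assumed to use urllib from the standard library, and a refused connection (which on Windows typically surfaces as WinError 10061) is assumed to arrive as ConnectionRefusedError.

```python
import urllib.error


def diagnose(exc: BaseException) -> str:
    """Map a client-side exception to a troubleshooting hint."""
    # HTTPError subclasses URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code == 401:
            return "x-api-key does not match the server API_KEY."
        return f"Server returned HTTP {exc.code}."
    # URLError wraps the underlying socket error in .reason.
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, BaseException):
        exc = exc.reason
    if isinstance(exc, ConnectionRefusedError):
        return "API server is not running. Start it with: python tasks.py run"
    if isinstance(exc, TimeoutError):
        return "Model may still be downloading or warming up; raise the timeout."
    return f"Unrecognized error: {exc!r}"
```

Typical use is inside an except block around the request, printing diagnose(err) before exiting.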
12) Recommended Daily Commands
- Install/update: python tasks.py install
- Run API: python tasks.py run
- Smoke: python tasks.py smoke
- Run+smoke: python tasks.py serve-smoke
- Docker up/down: python tasks.py docker-up / python tasks.py docker-down