# Advanced Coding LLM - Complete Instructions
This document provides full setup, run, validation, optimization, and deployment steps for the `coding-llm` project.
## 1) Prerequisites
- Python 3.10+ (recommended 3.11/3.12)
- Git
- Internet access for first model download
- Optional: Docker Desktop
- Optional: Hugging Face account and access token
## 2) Project Setup
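Change to the project root (adjust the path for your machine). The commands in this guide use Windows Command Prompt syntax (`copy`, `set`); on macOS/Linux use `cp` and `export` instead.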
```bash
cd "C:\Users\GIRISH\OneDrive\Desktop\AI model_14_04_26\coding-llm"
```
Create environment file:
```bash
copy .env.example .env
```
Install dependencies:
```bash
python tasks.py install
```
## 3) Configure `.env`
Open `.env` and set values:
- `MODEL_NAME=Qwen/Qwen2.5-Coder-1.5B-Instruct`
- `FALLBACK_MODEL_NAME=Qwen/Qwen2.5-Coder-0.5B-Instruct`
- `FINAL_FALLBACK_MODEL_NAME=sshleifer/tiny-gpt2` (optional emergency fallback)
- `FORCE_MOCK_MODE=false` (true for instant test mode)
- `API_KEY=<your_secret_key>`
- `RATE_LIMIT_PER_MINUTE=30`
- `USE_RAG=true`
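To make the meaning of these values concrete, here is a minimal sketch of how they could be read and how the three model names could form a fallback order. It is not the project's actual loader: the `Settings` class, `load_settings`, `model_candidates`, and the default values are assumptions for illustration, and the sketch reads an already-populated environment mapping rather than parsing `.env` itself.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings such as 'true', '1', 'yes'."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed above."""
    model_name: str
    fallback_model_name: str
    final_fallback_model_name: Optional[str]
    force_mock_mode: bool
    api_key: Optional[str]
    rate_limit_per_minute: int
    use_rag: bool

    def model_candidates(self) -> list[str]:
        """Models in the order they would be tried: primary, fallback, final fallback."""
        names = [self.model_name, self.fallback_model_name, self.final_fallback_model_name]
        return [name for name in names if name]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment-like mapping (defaults are assumed)."""
    return Settings(
        model_name=env.get("MODEL_NAME", "Qwen/Qwen2.5-Coder-1.5B-Instruct"),
        fallback_model_name=env.get("FALLBACK_MODEL_NAME", "Qwen/Qwen2.5-Coder-0.5B-Instruct"),
        final_fallback_model_name=env.get("FINAL_FALLBACK_MODEL_NAME") or None,
        force_mock_mode=_as_bool(env.get("FORCE_MOCK_MODE", "false")),
        api_key=env.get("API_KEY") or None,
        rate_limit_per_minute=int(env.get("RATE_LIMIT_PER_MINUTE", "30")),
        use_rag=_as_bool(env.get("USE_RAG", "true")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to check without touching the real environment.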
## 4) Run API Locally
```bash
python tasks.py run
```
Server runs at:
- `http://127.0.0.1:8000`
Health endpoint:
- `GET http://127.0.0.1:8000/health`
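As a quick check that the server is up, the health endpoint can be probed from Python using only the standard library. This is a sketch: the `check_health` helper is hypothetical, and it assumes only that a healthy server answers `GET /health` with HTTP 200 (the response body is not inspected).

```python
import urllib.request


def check_health(base_url: str = "http://127.0.0.1:8000", timeout: float = 5.0) -> bool:
    """Return True if GET /health answers with HTTP 200, False otherwise."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses
        # (urllib.error.URLError and HTTPError are subclasses of OSError).
        return False
```

Calling `check_health()` with no arguments targets the local address shown above.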
## 5) Run Smoke Tests
### Full smoke test
```bash
python smoke_test.py
```
### Health-only smoke test
```bash
set SMOKE_SKIP_GENERATE=true
python smoke_test.py
```
### Combined run-and-test command
```bash
python tasks.py serve-smoke
```
This starts the server, runs the smoke test, and shuts the server down automatically.
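The start, test, and shut-down sequence can be sketched with `subprocess` as follows. This is an illustration of the pattern, not necessarily how `tasks.py` implements `serve-smoke`; the `serve_and_smoke` function and the fixed `startup_delay` are assumptions, and a real implementation would more likely poll the health endpoint than sleep.

```python
import subprocess
import sys
import time


def serve_and_smoke(
    server_cmd=(sys.executable, "tasks.py", "run"),
    smoke_cmd=(sys.executable, "smoke_test.py"),
    startup_delay: float = 2.0,
) -> int:
    """Start the server, run the smoke test, and always stop the server.

    Returns the smoke test's exit code.
    """
    server = subprocess.Popen(list(server_cmd))
    try:
        time.sleep(startup_delay)  # crude wait for the server to come up
        if server.poll() is not None:
            raise RuntimeError("server exited before the smoke test started")
        return subprocess.run(list(smoke_cmd)).returncode
    finally:
        # Runs on success, test failure, and exceptions alike.
        if server.poll() is None:
            server.terminate()
            try:
                server.wait(timeout=10)
            except subprocess.TimeoutExpired:
                server.kill()
                server.wait()
```

The `try`/`finally` is the important part: the server process is stopped even when the smoke test fails or raises.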
## 6) If Generation Is Slow on First Run
The first `/generate` request may take a long time because the model has to be downloaded and warmed up.
Options:
- Increase timeout:
- `set SMOKE_TIMEOUT=900`
- Use mock mode for quick validation:
- set `FORCE_MOCK_MODE=true`
- Run full mode after model cache is ready.
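The idea behind a longer timeout is to keep retrying until the model is ready or a deadline passes. A minimal sketch of that waiting loop is shown here; `wait_until_ready` is a hypothetical helper, and the 900 default assumes `SMOKE_TIMEOUT` is measured in seconds, which the source does not state.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout: float = 900.0,
    interval: float = 5.0,
) -> bool:
    """Poll is_ready() until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if is_ready():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(interval, remaining))
```

Any zero-argument callable works as `is_ready`, for example a function that sends a small `/generate` request and returns `True` on success.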
## 7) API Usage
### Endpoint
- `POST /generate`
### Input JSON
```json
{
  "instruction": "Fix this code",
  "input": "def add(a,b) return a+b"
}
```
### Required Header (if API key enabled)
- `x-api-key: <API_KEY>`
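Putting the endpoint, body, and header together, a request could be assembled with the standard library as sketched here. `build_generate_request` is a hypothetical helper and the base URL assumes the local server from section 4; the field names and the `x-api-key` header come from this document.

```python
import json
import urllib.request
from typing import Optional


def build_generate_request(
    instruction: str,
    input_text: str,
    api_key: Optional[str] = None,
    base_url: str = "http://127.0.0.1:8000",
) -> urllib.request.Request:
    """Build a POST /generate request; the key header is added only if given."""
    payload = json.dumps({"instruction": instruction, "input": input_text}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["x-api-key"] = api_key
    return urllib.request.Request(
        base_url.rstrip("/") + "/generate",
        data=payload,
        headers=headers,
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen(request, timeout=...)`; use a generous timeout on the first call, since the model may still be loading.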
### Output JSON
```json
{
  "code": "...",
  "explanation": "...",
  "confidence": 0.0,
  "important_tokens": ["..."],
  "relevancy_score": 0.0,
  "hallucination": false,
  "latency_ms": 0
}
```
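A client may want to confirm that a response has the shape shown above before using it. The following sketch checks for the seven fields; the expected types are inferred from the example values and are an assumption, and `validate_generate_response` is a hypothetical helper rather than part of the project.

```python
import json

# Field types inferred from the example output (an assumption).
EXPECTED_FIELDS = {
    "code": str,
    "explanation": str,
    "confidence": (int, float),
    "important_tokens": list,
    "relevancy_score": (int, float),
    "hallucination": bool,
    "latency_ms": (int, float),
}


def validate_generate_response(body: str) -> list[str]:
    """Return a list of problems found in a /generate response body."""
    data = json.loads(body)
    if not isinstance(data, dict):
        return ["response is not a JSON object"]
    problems = []
    for name, expected in EXPECTED_FIELDS.items():
        if name not in data:
            problems.append(f"missing field: {name}")
            continue
        value = data[name]
        # bool is a subclass of int, so reject it for numeric fields explicitly.
        wrong_bool = isinstance(value, bool) and expected is not bool
        if wrong_bool or not isinstance(value, expected):
            problems.append(f"unexpected type for {name}: {type(value).__name__}")
    return problems
```

An empty list means the body matches the documented shape.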
## 8) Docker Deployment
```bash
copy .env.example .env
docker compose up --build -d
```
Validate:
```bash
python smoke_test.py
```
Stop:
```bash
docker compose down
```
## 9) Hugging Face Space Deployment
Create a Hugging Face access token with write permission, then run:
```bash
python tasks.py hf-upload --repo-id <username/coding-llm-space> --token <HF_TOKEN>
```
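For reference, an upload like this can also be done directly with the `huggingface_hub` library. The sketch below is an assumption about what such a step might look like, not the contents of `tasks.py`; the helper names and the ignore patterns (chosen so that the local `.env` with its secrets is not uploaded) are illustrative.

```python
from pathlib import Path


def build_upload_kwargs(repo_id: str, folder: str = ".") -> dict:
    """Arguments for HfApi.upload_folder targeting a Space repository."""
    return {
        "repo_id": repo_id,
        "repo_type": "space",
        "folder_path": str(Path(folder)),
        # Assumed exclusions: keep local secrets and caches out of the upload.
        "ignore_patterns": [".env", "__pycache__/*"],
    }


def upload_space(repo_id: str, token: str, folder: str = "."):
    """Upload `folder` to the Space `repo_id` (requires huggingface_hub)."""
    from huggingface_hub import HfApi  # third-party; imported lazily

    api = HfApi(token=token)
    return api.upload_folder(**build_upload_kwargs(repo_id, folder))
```

Values such as `API_KEY` then belong in the Space's secrets rather than in an uploaded file.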
After upload, configure Space variables/secrets:
- `MODEL_NAME`
- `FALLBACK_MODEL_NAME`
- `FORCE_MOCK_MODE`
- `API_KEY` (if needed in your architecture)
## 10) Production Hardening Checklist
- Keep `API_KEY` enabled
- Keep rate limiting enabled (`RATE_LIMIT_PER_MINUTE`)
- Put the API behind an HTTPS reverse proxy
- Add logging and monitoring
- Pin model versions if strict reproducibility is required
- Use `FORCE_MOCK_MODE=false` in production
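To illustrate what a per-minute limit such as `RATE_LIMIT_PER_MINUTE=30` means in practice, here is a minimal sliding-window limiter. It is a sketch of one common technique, not the project's actual rate limiter; the class name, the per-client keying, and the injectable clock are assumptions.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowRateLimiter:
    """Allow at most `limit_per_minute` requests per client in any 60-second window."""

    def __init__(
        self,
        limit_per_minute: int = 30,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._limit = limit_per_minute
        self._window = window_seconds
        self._clock = clock
        self._hits = defaultdict(deque)  # client id -> timestamps of accepted requests

    def allow(self, client_id: str) -> bool:
        """Record and allow the request, or return False if the client is over the limit."""
        now = self._clock()
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self._window:
            hits.popleft()
        if len(hits) >= self._limit:
            return False
        hits.append(now)
        return True
```

A server would typically call `allow()` with the client's API key or IP address and answer with HTTP 429 when it returns `False`. This in-memory version only works within a single process.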
## 11) Common Troubleshooting
- `WinError 10061`:
  - The API server is not running. Start it with `python tasks.py run`.
- `401 Unauthorized`:
  - The `x-api-key` header does not match the server's `API_KEY`.
- Health works but generate times out:
  - The model is still downloading or warming up.
- Low-quality gibberish output:
  - A fallback model was likely used; verify the model names in `.env`.
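A client script can map the errors above to the same hints automatically. The sketch below assumes the client uses `urllib`; `diagnose` is a hypothetical helper, and it covers only the failures that surface as exceptions (gibberish output cannot be detected this way).

```python
import socket
import urllib.error


def diagnose(error: BaseException) -> str:
    """Translate a client-side exception into a troubleshooting hint."""
    # HTTPError is a subclass of URLError, so check it first.
    if isinstance(error, urllib.error.HTTPError):
        if error.code == 401:
            return "x-api-key does not match the server API_KEY"
        return f"server answered with HTTP {error.code}"
    # URLError wraps the underlying socket error in .reason.
    reason = error.reason if isinstance(error, urllib.error.URLError) else error
    if isinstance(reason, ConnectionRefusedError):
        # On Windows this is reported as WinError 10061.
        return "API server is not running; start it with `python tasks.py run`"
    if isinstance(reason, (TimeoutError, socket.timeout)):
        return "request timed out; the model may still be downloading or warming up"
    return f"unrecognized error: {error!r}"
```

Wrap the `urlopen` call in `try`/`except OSError as exc` and print `diagnose(exc)` to get the hint.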
## 12) Recommended Daily Commands
- Install/update: `python tasks.py install`
- Run API: `python tasks.py run`
- Smoke: `python tasks.py smoke`
- Run+smoke: `python tasks.py serve-smoke`
- Docker up/down: `python tasks.py docker-up` / `python tasks.py docker-down`