Advanced Coding LLM - Complete Instructions

This document provides full setup, run, validation, optimization, and deployment steps for the coding-llm project.

1) Prerequisites

Python 3.10+ (recommended 3.11/3.12)
Git
Internet access for first model download
Optional: Docker Desktop
Optional: Hugging Face account and access token

2) Project Setup

From project root:

cd "C:\Users\GIRISH\OneDrive\Desktop\AI model_14_04_26\coding-llm"

Create environment file:

copy .env.example .env

Install dependencies:

python tasks.py install

3) Configure `.env`

Open .env and set the following values (parenthetical notes are explanations, not part of the value):

MODEL_NAME=Qwen/Qwen2.5-Coder-1.5B-Instruct
FALLBACK_MODEL_NAME=Qwen/Qwen2.5-Coder-0.5B-Instruct
FINAL_FALLBACK_MODEL_NAME=sshleifer/tiny-gpt2 (optional emergency fallback)
FORCE_MOCK_MODE=false (true for instant test mode)
API_KEY=<your_secret_key>
RATE_LIMIT_PER_MINUTE=30
USE_RAG=true
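
As an illustration of how these variables might be consumed, the sketch below reads them into a typed settings object. It is not the project's actual loader: `Settings`, `load_settings`, and `_as_bool` are hypothetical names, and the defaults simply mirror the example values listed above.

```python
import os
from dataclasses import dataclass


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings such as 'true', '1', or 'yes'."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    # Hypothetical container; field names follow the .env keys above.
    model_name: str
    fallback_model_name: str
    force_mock_mode: bool
    api_key: str
    rate_limit_per_minute: int
    use_rag: bool


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return Settings(
        model_name=env.get("MODEL_NAME", "Qwen/Qwen2.5-Coder-1.5B-Instruct"),
        fallback_model_name=env.get(
            "FALLBACK_MODEL_NAME", "Qwen/Qwen2.5-Coder-0.5B-Instruct"
        ),
        force_mock_mode=_as_bool(env.get("FORCE_MOCK_MODE", "false")),
        api_key=env.get("API_KEY", ""),
        rate_limit_per_minute=int(env.get("RATE_LIMIT_PER_MINUTE", "30")),
        use_rag=_as_bool(env.get("USE_RAG", "true")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the parsing easy to check without touching the real environment.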

4) Run API Locally

python tasks.py run

Server runs at:

http://127.0.0.1:8000

Health endpoint:

GET http://127.0.0.1:8000/health
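
The health endpoint can also be checked from Python using only the standard library. This is a minimal sketch: `check_health` is a hypothetical helper, and it assumes only that a healthy server answers GET /health with HTTP 200 (the response body format is not specified in this document, so it is returned as raw text).

```python
import urllib.error
import urllib.request


def check_health(base_url: str = "http://127.0.0.1:8000", timeout: float = 5.0):
    """Return (ok, detail); ok is True when GET /health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status == 200, body
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # Covers URLError, refused connections, and socket timeouts.
        return False, str(exc)
```

Calling `check_health()` while the server is stopped returns `False` together with the connection error text rather than raising.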

5) Run Smoke Tests

Full smoke test

python smoke_test.py

Health-only smoke test

set SMOKE_SKIP_GENERATE=true
python smoke_test.py

Combined run-and-test command

python tasks.py serve-smoke

This starts the server, runs the smoke test, and shuts the server down automatically.
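
The start, test, and shutdown sequence described above can be sketched with `subprocess`. This is an assumption about the general pattern, not the actual implementation in tasks.py; `serve_and_test` and `PYTHON` are hypothetical names, and the fixed startup wait is a simplification.

```python
import subprocess
import sys
import time

PYTHON = sys.executable  # interpreter used for both child processes


def serve_and_test(server_cmd, test_cmd, startup_wait=2.0):
    """Start server_cmd, run test_cmd, then always stop the server.

    Returns the exit code of test_cmd.
    """
    server = subprocess.Popen(server_cmd)
    try:
        # Crude fixed wait; polling the health endpoint would be more robust.
        time.sleep(startup_wait)
        return subprocess.run(test_cmd).returncode
    finally:
        # Runs even if the test command fails or raises.
        server.terminate()
        try:
            server.wait(timeout=10)
        except subprocess.TimeoutExpired:
            server.kill()
            server.wait()


# Example (not executed here):
# serve_and_test([PYTHON, "tasks.py", "run"], [PYTHON, "smoke_test.py"])
```

Putting the shutdown in a `finally` block means the server does not linger when the smoke test fails.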

6) If Generation Is Slow on First Run

The first /generate request may take a long time because the model has to be downloaded and warmed up.

Options:

Increase timeout:
- set SMOKE_TIMEOUT=900
Use mock mode for quick validation:
- set FORCE_MOCK_MODE=true
Run in full mode once the model cache is ready.
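
A client can tolerate a slow first run by polling until the service is ready instead of failing on the first timeout. The sketch below is a generic polling loop under stated assumptions: `wait_until_ready` and its `probe` callback are hypothetical, and the 900-second default only echoes the SMOKE_TIMEOUT value suggested above.

```python
import time


def wait_until_ready(probe, timeout_s=900.0, interval_s=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call probe() repeatedly until it returns True or timeout_s elapses.

    probe is any zero-argument callable returning a bool, for example a
    function that sends one request and reports whether it succeeded.
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.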

7) API Usage

Endpoint

POST /generate

Input JSON

{
  "instruction": "Fix this code",
  "input": "def add(a,b) return a+b"
}

Required Header (if API key enabled)

x-api-key: <API_KEY>

Output JSON

{
  "code": "...",
  "explanation": "...",
  "confidence": 0.0,
  "important_tokens": ["..."],
  "relevancy_score": 0.0,
  "hallucination": false,
  "latency_ms": 0
}
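
A minimal standard-library client for this endpoint might look as follows. The request body, header name, and response fields come from the description above; the function names, the `EXPECTED_KEYS` check, and the 900-second default timeout are illustrative assumptions.

```python
import json
import urllib.request

# Response fields listed in the Output JSON above.
EXPECTED_KEYS = {
    "code", "explanation", "confidence", "important_tokens",
    "relevancy_score", "hallucination", "latency_ms",
}


def build_generate_request(instruction, code_input, api_key=None,
                           base_url="http://127.0.0.1:8000"):
    """Create the POST /generate request without sending it."""
    payload = json.dumps({"instruction": instruction, "input": code_input})
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["x-api-key"] = api_key  # only needed when the API key is enabled
    return urllib.request.Request(
        base_url.rstrip("/") + "/generate",
        data=payload.encode("utf-8"),
        headers=headers,
        method="POST",
    )


def parse_generate_response(raw: bytes) -> dict:
    """Decode the response body and check that the documented fields exist."""
    data = json.loads(raw)
    missing = EXPECTED_KEYS - data.keys()
    if missing:
        raise ValueError(f"response is missing fields: {sorted(missing)}")
    return data


def generate(instruction, code_input, api_key=None, timeout=900.0):
    """Send the request; a generous timeout allows for first-run warmup."""
    req = build_generate_request(instruction, code_input, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_generate_response(resp.read())
```

For example, `generate("Fix this code", "def add(a,b) return a+b", api_key="<API_KEY>")` returns the parsed dictionary when the server is running.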

8) Docker Deployment

copy .env.example .env
docker compose up --build -d

Validate:

python smoke_test.py

Stop:

docker compose down

9) Hugging Face Space Deployment

Create a Hugging Face access token with write permission, then:

python tasks.py hf-upload --repo-id <username/coding-llm-space> --token <HF_TOKEN>

After upload, configure Space variables/secrets:

MODEL_NAME
FALLBACK_MODEL_NAME
FORCE_MOCK_MODE
API_KEY (if your deployment requires it)
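
The upload step can also be performed directly with the `huggingface_hub` library. Whether tasks.py works this way internally is an assumption; `build_upload_kwargs`, `upload_space`, and the ignore list are hypothetical. Excluding .env keeps local secrets out of the Space repository.

```python
# Patterns kept out of the upload (assumed list; adjust to the project).
DEFAULT_IGNORE = [".env", "*.pyc"]


def build_upload_kwargs(repo_id, folder=".", ignore=None):
    """Collect the arguments passed to HfApi.upload_folder."""
    return {
        "repo_id": repo_id,
        "folder_path": folder,
        "repo_type": "space",
        "ignore_patterns": list(DEFAULT_IGNORE if ignore is None else ignore),
    }


def upload_space(repo_id, token, folder="."):
    """Upload the project folder to a Space repository."""
    # Imported lazily; requires `pip install huggingface_hub`.
    from huggingface_hub import HfApi

    api = HfApi(token=token)
    return api.upload_folder(**build_upload_kwargs(repo_id, folder))
```

Model names and keys should still be set as Space variables or secrets rather than uploaded in a file.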

10) Production Hardening Checklist

Keep API_KEY enabled
Keep rate limiting enabled (RATE_LIMIT_PER_MINUTE)
Put API behind HTTPS reverse proxy
Add logging and monitoring
Pin model versions if strict reproducibility is required
Use FORCE_MOCK_MODE=false in production
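
To make the rate-limiting item concrete, the following sketch shows one common approach, a per-client sliding window. It is not taken from the project: the class name and interface are hypothetical, and the default of 30 only mirrors the RATE_LIMIT_PER_MINUTE example value.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most `limit_per_minute` requests per client per window."""

    def __init__(self, limit_per_minute=30, window_s=60.0, clock=time.monotonic):
        self.limit = limit_per_minute
        self.window_s = window_s
        self.clock = clock
        self._hits = defaultdict(deque)  # client id -> timestamps of recent requests

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have left the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

A server would typically call `allow()` with the API key or client address and answer HTTP 429 when it returns `False`.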

11) Common Troubleshooting

WinError 10061 (connection refused):
- The API server is not running. Start it with python tasks.py run.
401 Unauthorized:
- The x-api-key header does not match the server's API_KEY.
Health works but generate times out:
- The model is still downloading or warming up.
Low-quality gibberish output:
- A fallback model was probably used; verify the model names in .env.
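
A client script can map the most common failures to the hints above. This is a rough sketch, assuming the client uses `urllib` on Python 3.10+; `diagnose` is a hypothetical helper, and it does not attempt to detect low-quality output.

```python
import urllib.error


def diagnose(exc: BaseException) -> str:
    """Translate a client-side exception into a troubleshooting hint."""
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code == 401:
            return "x-api-key does not match the server's API_KEY."
        return f"Server returned HTTP {exc.code}."
    # URLError wraps the underlying socket error in .reason.
    reason = exc.reason if isinstance(exc, urllib.error.URLError) else exc
    if isinstance(reason, ConnectionRefusedError):
        # On Windows this is reported as WinError 10061.
        return "API server is not running. Start it with: python tasks.py run"
    if isinstance(reason, TimeoutError):
        return "Request timed out; the model may still be downloading or warming up."
    return f"Unrecognized error: {exc!r}"
```

Wrapping a request in `try`/`except Exception as exc` and printing `diagnose(exc)` gives a quick first indication of which section of this guide applies.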

12) Recommended Daily Commands

Install/update: python tasks.py install
Run API: python tasks.py run
Smoke: python tasks.py smoke
Run+smoke: python tasks.py serve-smoke
Docker up/down: python tasks.py docker-up / python tasks.py docker-down