# Advanced Coding LLM - Complete Instructions

This document provides full setup, run, validation, optimization, and deployment steps for the `coding-llm` project.

## 1) Prerequisites

- Python 3.10+ (recommended 3.11/3.12)
- Git
- Internet access for the first model download
- Optional: Docker Desktop
- Optional: Hugging Face account and access token

## 2) Project Setup

Change to the project root:

```bash
cd "C:\Users\GIRISH\OneDrive\Desktop\AI model_14_04_26\coding-llm"
```

Create the environment file:

```bash
copy .env.example .env
```

Install dependencies:

```bash
python tasks.py install
```

## 3) Configure `.env`

Open `.env` and set these values:

- `MODEL_NAME=Qwen/Qwen2.5-Coder-1.5B-Instruct`
- `FALLBACK_MODEL_NAME=Qwen/Qwen2.5-Coder-0.5B-Instruct`
- `FINAL_FALLBACK_MODEL_NAME=sshleifer/tiny-gpt2` (optional emergency fallback)
- `FORCE_MOCK_MODE=false` (set to `true` for instant test mode)
- `API_KEY=`
- `RATE_LIMIT_PER_MINUTE=30`
- `USE_RAG=true`

## 4) Run API Locally

```bash
python tasks.py run
```

The server runs at:

- `http://127.0.0.1:8000`

Health endpoint:

- `GET http://127.0.0.1:8000/health`

## 5) Run Smoke Tests

### Full smoke test

```bash
python smoke_test.py
```

### Health-only smoke test

```bash
set SMOKE_SKIP_GENERATE=true
python smoke_test.py
```

### Combined run-and-test command

```bash
python tasks.py serve-smoke
```

This starts the server, executes the smoke test, and shuts the server down automatically.

## 6) If Generation Is Slow on First Run

The first `/generate` call may take a long time because of model download and warmup.

Options:

- Increase timeout:
  - `set SMOKE_TIMEOUT=900`
- Use mock mode for quick validation:
  - set `FORCE_MOCK_MODE=true`
- Run full mode after the model cache is ready.

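
When scripting around a slow first start, it can help to wait until the health endpoint from section 4 answers before sending any other request. The following is a minimal sketch of such a wait loop, not the project's own implementation: `wait_for_health` and its parameters are hypothetical names, and it assumes `/health` returns HTTP 200 once the server is listening. It confirms only that the server is up; the first `/generate` call can still be slow while the model downloads or warms up.

```python
import time
import urllib.request


def wait_for_health(
    url="http://127.0.0.1:8000/health",
    timeout_s=900.0,
    interval_s=2.0,
    opener=urllib.request.urlopen,
    sleep=time.sleep,
):
    """Poll `url` until it answers HTTP 200 or `timeout_s` seconds elapse.

    `opener` and `sleep` are injectable so the loop can be exercised
    without a running server. Returns True on success, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Covers connection refused (for example WinError 10061)
            # and urllib.error.URLError, which subclasses OSError.
            pass
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Calling `wait_for_health(timeout_s=60)` after `python tasks.py run` would block for up to a minute and report whether the server came up.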
## 7) API Usage

### Endpoint

- `POST /generate`

### Input JSON

```json
{
  "instruction": "Fix this code",
  "input": "def add(a,b) return a+b"
}
```

### Required Header (if API key enabled)

- `x-api-key: `

### Output JSON

```json
{
  "code": "...",
  "explanation": "...",
  "confidence": 0.0,
  "important_tokens": ["..."],
  "relevancy_score": 0.0,
  "hallucination": false,
  "latency_ms": 0
}
```

## 8) Docker Deployment

```bash
copy .env.example .env
docker compose up --build -d
```

Validate:

```bash
python smoke_test.py
```

Stop:

```bash
docker compose down
```

## 9) Hugging Face Space Deployment

Create an HF token (write permission), then:

```bash
python tasks.py hf-upload --repo-id --token
```

After upload, configure Space variables/secrets:

- `MODEL_NAME`
- `FALLBACK_MODEL_NAME`
- `FORCE_MOCK_MODE`
- `API_KEY` (if needed in your architecture)

## 10) Production Hardening Checklist

- Keep `API_KEY` enabled
- Keep rate limiting enabled (`RATE_LIMIT_PER_MINUTE`)
- Put the API behind an HTTPS reverse proxy
- Add logging and monitoring
- Pin model versions if strict reproducibility is required
- Use `FORCE_MOCK_MODE=false` in production

## 11) Common Troubleshooting

- `WinError 10061`:
  - The API server is not running. Start it with `python tasks.py run`.
- `401 Unauthorized`:
  - `x-api-key` does not match the server's `API_KEY`.
- Health works but generate times out:
  - The model is still downloading or warming up.
- Low-quality gibberish output:
  - A fallback model was likely used; verify the model names in `.env`.

## 12) Recommended Daily Commands

- Install/update: `python tasks.py install`
- Run API: `python tasks.py run`
- Smoke: `python tasks.py smoke`
- Run+smoke: `python tasks.py serve-smoke`
- Docker up/down: `python tasks.py docker-up` / `python tasks.py docker-down`
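
For reference, the request shape from section 7 can be exercised from Python with only the standard library. This is a minimal sketch under stated assumptions, not the project's client: `build_generate_request` and `generate` are hypothetical helper names, the base URL is the local address from section 4, and the `x-api-key` header is sent only when a key is supplied. The generous default timeout mirrors the `SMOKE_TIMEOUT=900` suggestion for slow first runs.

```python
import json
import urllib.request


def build_generate_request(instruction, code_input,
                           base_url="http://127.0.0.1:8000", api_key=None):
    """Build a POST /generate request with the documented JSON body."""
    payload = json.dumps(
        {"instruction": instruction, "input": code_input}
    ).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only needed when the server has API_KEY set.
        headers["x-api-key"] = api_key
    return urllib.request.Request(
        f"{base_url}/generate", data=payload, headers=headers, method="POST"
    )


def generate(instruction, code_input, timeout_s=900.0, **request_options):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_generate_request(instruction, code_input, **request_options)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `generate("Fix this code", "def add(a,b) return a+b")` should return a dict with the fields listed under Output JSON, such as `code` and `explanation`. A `401 Unauthorized` surfaces as `urllib.error.HTTPError`.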