Advanced Coding LLM - Complete Instructions

This document provides complete steps for setting up, running, validating, optimizing, and deploying the coding-llm project.

1) Prerequisites

  • Python 3.10+ (3.11 or 3.12 recommended)
  • Git
  • Internet access for the first model download
  • Optional: Docker Desktop
  • Optional: Hugging Face account and access token

2) Project Setup

From the project root (adjust the path to wherever your copy of the project lives):

cd "C:\Users\GIRISH\OneDrive\Desktop\AI model_14_04_26\coding-llm"

Create the environment file:

copy .env.example .env

Install dependencies:

python tasks.py install

3) Configure .env

Open .env and set the following values:

  • MODEL_NAME=Qwen/Qwen2.5-Coder-1.5B-Instruct
  • FALLBACK_MODEL_NAME=Qwen/Qwen2.5-Coder-0.5B-Instruct
  • FINAL_FALLBACK_MODEL_NAME=sshleifer/tiny-gpt2 (optional emergency fallback)
  • FORCE_MOCK_MODE=false (true for instant test mode)
  • API_KEY=<your_secret_key>
  • RATE_LIMIT_PER_MINUTE=30
  • USE_RAG=true
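The three model variables and FORCE_MOCK_MODE suggest an ordered fallback chain. The sketch below shows one way such a chain can work; select_model, MockModel, env_flag and the loader callback are hypothetical names, not the project's actual code, and it assumes the loader raises an exception when a model cannot be loaded.

```python
from typing import Callable, List, Mapping, Optional, Tuple


class MockModel:
    """Hypothetical stand-in used when FORCE_MOCK_MODE is true."""

    def generate(self, prompt: str) -> str:
        return f"# mock output for: {prompt}"


def env_flag(value: Optional[str]) -> bool:
    # Common truthy spellings count as true; anything else (or unset) is false.
    return (value or "").strip().lower() in {"1", "true", "yes", "on"}


def select_model(env: Mapping[str, str], loader: Callable[[str], object]) -> Tuple[str, object]:
    """Return (model_name, model), trying each configured model in order.

    Assumption: loader raises an exception when a model cannot be loaded
    (for example a failed download or running out of memory).
    """
    if env_flag(env.get("FORCE_MOCK_MODE")):
        return "mock", MockModel()

    candidates = [
        env.get("MODEL_NAME"),
        env.get("FALLBACK_MODEL_NAME"),
        env.get("FINAL_FALLBACK_MODEL_NAME"),  # optional emergency fallback
    ]
    errors: List[str] = []
    for name in candidates:
        if not name:  # skip variables that are unset or empty
            continue
        try:
            return name, loader(name)
        except Exception as exc:  # remember why this model failed, then try the next
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError("no model could be loaded: " + "; ".join(errors))
```

In this sketch a failed primary load silently drops to a smaller model, so logging the returned name is the easiest way to notice that a fallback is in use.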

4) Run API Locally

python tasks.py run

The server runs at:

  • http://127.0.0.1:8000

Health endpoint:

  • GET http://127.0.0.1:8000/health
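The health endpoint can be checked from Python with only the standard library. This is a minimal sketch: check_health is a hypothetical helper, and because the /health response body is not documented here, it looks only at the HTTP status code.

```python
import urllib.request


def check_health(base_url: str = "http://127.0.0.1:8000", timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses:
        # connection refused (WinError 10061), non-2xx statuses, slow servers.
        return False
```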

5) Run Smoke Tests

Full smoke test

python smoke_test.py

Health-only smoke test

set SMOKE_SKIP_GENERATE=true
python smoke_test.py

Combined run-and-test command

python tasks.py serve-smoke

This starts the server, runs the smoke test, and shuts the server down automatically.
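A combined run-and-test command has to wait until the server accepts requests before testing. A common approach is to poll the health endpoint until a deadline; the sketch below injects the probe, clock and sleep functions so the waiting logic can be tested without a server. wait_until_healthy is a hypothetical name, and tasks.py may implement this differently.

```python
import time
from typing import Callable


def wait_until_healthy(
    probe: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call probe() until it returns True or timeout_s elapses.

    Returns True as soon as the probe succeeds, False once the deadline passes.
    """
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

In practice probe would issue GET /health and return True on HTTP 200; the smoke test runs only after this returns True, and the server process is terminated afterwards either way.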

6) If Generation Is Slow on First Run

The first /generate request may take a long time because the model must be downloaded and warmed up.

Options:

  • Increase the timeout:
    • set SMOKE_TIMEOUT=900
  • Use mock mode for quick validation:
    • set FORCE_MOCK_MODE=true
  • Run in full mode once the model cache is ready.
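A smoke client would typically read these variables once at startup. The sketch below shows the parsing only; SmokeConfig and load_smoke_config are hypothetical names, the defaults (generation enabled, a 120-second timeout) are placeholders because this document does not state smoke_test.py's real defaults, and it assumes SMOKE_TIMEOUT is given in seconds.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SmokeConfig:
    skip_generate: bool
    timeout_s: float


def load_smoke_config(env: Optional[Mapping[str, str]] = None) -> SmokeConfig:
    """Read SMOKE_SKIP_GENERATE and SMOKE_TIMEOUT from env (defaults to os.environ)."""
    env = os.environ if env is None else env
    skip = env.get("SMOKE_SKIP_GENERATE", "false").strip().lower() in {"1", "true", "yes"}
    # Placeholder default of 120; assumes the value is in seconds.
    timeout = float(env.get("SMOKE_TIMEOUT", "120"))
    if timeout <= 0:
        raise ValueError("SMOKE_TIMEOUT must be positive")
    return SmokeConfig(skip_generate=skip, timeout_s=timeout)
```

Note that set VAR=value is Windows cmd syntax and only affects the current console window; in PowerShell the equivalent is $env:SMOKE_TIMEOUT = "900", and in bash it is export SMOKE_TIMEOUT=900.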

7) API Usage

Endpoint

  • POST /generate

Input JSON

{
  "instruction": "Fix this code",
  "input": "def add(a,b) return a+b"
}

Required header (if the API key is enabled)

  • x-api-key: <API_KEY>
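A request matching the input JSON and header above can be built with the standard library alone. This is a sketch: build_generate_request is a hypothetical helper, and it assumes the server accepts a Content-Type: application/json body.

```python
import json
import urllib.request
from typing import Optional


def build_generate_request(
    instruction: str,
    code_input: str,
    api_key: Optional[str] = None,
    base_url: str = "http://127.0.0.1:8000",
) -> urllib.request.Request:
    """Build (but do not send) a POST /generate request."""
    body = json.dumps({"instruction": instruction, "input": code_input}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:  # only needed when the server has API_KEY set
        headers["x-api-key"] = api_key
    return urllib.request.Request(f"{base_url}/generate", data=body, headers=headers, method="POST")


# Example (sends a real request, so the server must be running):
# req = build_generate_request("Fix this code", "def add(a,b) return a+b", api_key="<API_KEY>")
# with urllib.request.urlopen(req, timeout=900) as resp:
#     print(json.load(resp)["code"])
```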

Output JSON

{
  "code": "...",
  "explanation": "...",
  "confidence": 0.0,
  "important_tokens": ["..."],
  "relevancy_score": 0.0,
  "hallucination": false,
  "latency_ms": 0
}
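Clients can map the output JSON onto a typed record so that a missing field fails loudly instead of surfacing later. The field names come from the schema above; the types (floats for the scores, an integer for latency_ms) are inferred from the sample values, and GenerateResult and parse_generate_response are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Any, List, Mapping


@dataclass
class GenerateResult:
    code: str
    explanation: str
    confidence: float
    important_tokens: List[str]
    relevancy_score: float
    hallucination: bool
    latency_ms: int

    @classmethod
    def from_json(cls, data: Mapping[str, Any]) -> "GenerateResult":
        # A KeyError here means the response is missing a documented field.
        return cls(
            code=str(data["code"]),
            explanation=str(data["explanation"]),
            confidence=float(data["confidence"]),
            important_tokens=[str(t) for t in data["important_tokens"]],
            relevancy_score=float(data["relevancy_score"]),
            hallucination=bool(data["hallucination"]),
            latency_ms=int(data["latency_ms"]),
        )


def parse_generate_response(body: bytes) -> GenerateResult:
    """Decode a raw /generate response body into a GenerateResult."""
    return GenerateResult.from_json(json.loads(body))
```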

8) Docker Deployment

copy .env.example .env
docker compose up --build -d

Validate:

python smoke_test.py

Stop:

docker compose down

9) Hugging Face Space Deployment

Create a Hugging Face access token with write permission, then run:

python tasks.py hf-upload --repo-id <username/coding-llm-space> --token <HF_TOKEN>

After uploading, configure these Space variables/secrets:

  • MODEL_NAME
  • FALLBACK_MODEL_NAME
  • FORCE_MOCK_MODE
  • API_KEY (as a secret, if your architecture needs it)
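These settings can also be applied from Python with the huggingface_hub library, whose HfApi class provides add_space_variable and add_space_secret. The sketch assumes huggingface_hub is installed and that only API_KEY should be stored as a secret while the rest are plain variables; that split, and the helper names split_space_settings and configure_space, are assumptions for illustration.

```python
from typing import Dict, Mapping, Tuple

# Assumption: only API_KEY is sensitive; everything else is a plain variable.
SECRET_KEYS = {"API_KEY"}
SPACE_KEYS = ["MODEL_NAME", "FALLBACK_MODEL_NAME", "FORCE_MOCK_MODE", "API_KEY"]


def split_space_settings(env: Mapping[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split the configured keys into (variables, secrets), skipping unset ones."""
    variables: Dict[str, str] = {}
    secrets: Dict[str, str] = {}
    for key in SPACE_KEYS:
        value = env.get(key)
        if not value:
            continue
        (secrets if key in SECRET_KEYS else variables)[key] = value
    return variables, secrets


def configure_space(repo_id: str, token: str, env: Mapping[str, str]) -> None:
    """Push the settings to a Hugging Face Space (needs network access and a write token)."""
    from huggingface_hub import HfApi  # imported lazily so the helper above works without it

    api = HfApi(token=token)
    variables, secrets = split_space_settings(env)
    for key, value in variables.items():
        api.add_space_variable(repo_id=repo_id, key=key, value=value)
    for key, value in secrets.items():
        api.add_space_secret(repo_id=repo_id, key=key, value=value)
```

Secrets are hidden from visitors of the Space, whereas variables are visible, which is why the API key goes through the secret call.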

10) Production Hardening Checklist

  • Keep API_KEY enabled
  • Keep rate limiting enabled (RATE_LIMIT_PER_MINUTE)
  • Put the API behind an HTTPS reverse proxy
  • Add logging and monitoring
  • Pin model versions if strict reproducibility is required
  • Use FORCE_MOCK_MODE=false in production
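The API-key and rate-limit items can be illustrated with a small in-memory sketch: a constant-time key comparison and a sliding-window limiter keyed by client. This is not the project's implementation, which this document does not show; api_key_ok and SlidingWindowLimiter are hypothetical names, and an in-memory limiter only works within a single process, so multiple workers would need shared storage.

```python
import hmac
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


def api_key_ok(provided: str, expected: str) -> bool:
    """Compare keys in constant time to avoid leaking information through timing."""
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client in any window_s-second window."""

    def __init__(self, limit: int, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # the caller should answer with HTTP 429
        hits.append(now)
        return True
```

With RATE_LIMIT_PER_MINUTE=30 this corresponds to SlidingWindowLimiter(limit=30, window_s=60), checked once per incoming request after the API key has been verified.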

11) Common Troubleshooting

  • WinError 10061 (connection refused):
    • The API server is not running. Start it with python tasks.py run.
  • 401 Unauthorized:
    • The x-api-key header does not match the server's API_KEY.
  • Health works but generate times out:
    • The model is still downloading or warming up.
  • Low-quality or gibberish output:
    • A fallback model was likely loaded; verify the model names in .env.
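When it is unclear which of these cases applies, a short probe can tell them apart. This is a standard-library sketch; diagnose is a hypothetical helper, its hint strings simply restate the list above, and because it sends a real /generate request, a healthy server will run one (possibly slow) generation.

```python
import json
import socket
import urllib.error
import urllib.request


def diagnose(base_url: str = "http://127.0.0.1:8000", api_key: str = "", timeout: float = 30.0) -> str:
    """Send a tiny /generate request and map the failure mode to a hint."""
    body = json.dumps({"instruction": "ping", "input": ""}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/generate", data=body, method="POST",
        headers={"Content-Type": "application/json", "x-api-key": api_key},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout):
            return "ok: /generate answered"
    except urllib.error.HTTPError as exc:  # server reachable, but returned an error status
        if exc.code == 401:
            return "401: x-api-key does not match the server's API_KEY"
        return f"HTTP {exc.code}: check the server logs"
    except urllib.error.URLError as exc:  # could not connect at all
        if isinstance(exc.reason, ConnectionRefusedError):  # WinError 10061 on Windows
            return "connection refused: start the server with python tasks.py run"
        if isinstance(exc.reason, socket.timeout):
            return "timed out: the model may still be downloading or warming up"
        return f"unreachable: {exc.reason}"
    except socket.timeout:  # connected, but no response arrived in time
        return "timed out: the model may still be downloading or warming up"
```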

12) Recommended Daily Commands

  • Install/update: python tasks.py install
  • Run API: python tasks.py run
  • Smoke: python tasks.py smoke
  • Run+smoke: python tasks.py serve-smoke
  • Docker up/down: python tasks.py docker-up / python tasks.py docker-down