# Advanced Coding LLM - Complete Instructions

This document provides full setup, run, validation, optimization, and deployment steps for the `coding-llm` project.

## 1) Prerequisites

- Python 3.10+ (recommended 3.11/3.12)
- Git
- Internet access for first model download
- Optional: Docker Desktop
- Optional: Hugging Face account and access token

## 2) Project Setup

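Change to the project root (adjust the path for your machine). The commands in this guide use Windows Command Prompt syntax (`copy`, `set`); on macOS or Linux, use `cp` and `export` instead.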

```bash
cd "C:\Users\GIRISH\OneDrive\Desktop\AI model_14_04_26\coding-llm"
```

Create environment file:

```bash
copy .env.example .env
```

Install dependencies:

```bash
python tasks.py install
```

## 3) Configure `.env`

Open `.env` and set values:

- `MODEL_NAME=Qwen/Qwen2.5-Coder-1.5B-Instruct`
- `FALLBACK_MODEL_NAME=Qwen/Qwen2.5-Coder-0.5B-Instruct`
- `FINAL_FALLBACK_MODEL_NAME=sshleifer/tiny-gpt2` (optional emergency fallback)
- `FORCE_MOCK_MODE=false` (set to `true` for instant test mode)
- `API_KEY=<your_secret_key>`
- `RATE_LIMIT_PER_MINUTE=30`
- `USE_RAG=true`
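
A quick sanity check of the `.env` file can catch typos before the server starts. The sketch below is illustrative only and is not part of the project: `parse_env` and `check_env` are hypothetical helpers, and treating `MODEL_NAME` and `FALLBACK_MODEL_NAME` as required is an assumption based on the list above.

```python
from typing import Dict, List

# Assumption: these two keys are treated as required for this illustration.
REQUIRED_KEYS = ["MODEL_NAME", "FALLBACK_MODEL_NAME"]
BOOLEAN_KEYS = ["FORCE_MOCK_MODE", "USE_RAG"]


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_env(values: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not values.get(key)]
    for key in BOOLEAN_KEYS:
        if key in values and values[key].lower() not in ("true", "false"):
            problems.append(f"{key} should be true or false")
    rate = values.get("RATE_LIMIT_PER_MINUTE")
    if rate:
        try:
            rate_ok = int(rate) > 0
        except ValueError:
            rate_ok = False
        if not rate_ok:
            problems.append("RATE_LIMIT_PER_MINUTE should be a positive integer")
    return problems
```

To use it, read the file and pass its text through both helpers, for example `check_env(parse_env(open(".env").read()))`, then print any problems returned.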

## 4) Run API Locally

```bash
python tasks.py run
```

Server runs at:

- `http://127.0.0.1:8000`

Health endpoint:

- `GET http://127.0.0.1:8000/health`
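
When scripting against the server, it can help to poll the health endpoint until it responds before sending real requests. The following is a minimal sketch using only the standard library; `http_status` and `wait_for_health` are hypothetical helper names, and it assumes a healthy server answers `GET /health` with HTTP 200.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

HEALTH_URL = "http://127.0.0.1:8000/health"


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status of a GET request, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        return exc.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return 0


def wait_for_health(
    url: str = HEALTH_URL,
    attempts: int = 30,
    delay: float = 1.0,
    probe: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the health URL until it returns 200 (assumed healthy) or attempts run out."""
    for attempt in range(attempts):
        if probe(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_for_health()` with the defaults polls roughly once per second for up to 30 attempts. The `probe` and `sleep` parameters exist only so the loop can be exercised without a running server.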

## 5) Run Smoke Tests

### Full smoke test

```bash
python smoke_test.py
```

### Health-only smoke test

```bash
set SMOKE_SKIP_GENERATE=true
python smoke_test.py
```

### Combined run-and-test command

```bash
python tasks.py serve-smoke
```

This starts the server, runs the smoke test, and shuts the server down automatically.

## 6) If Generation Is Slow on First Run

The first `/generate` request may take a long time because the model has to be downloaded and warmed up.

Options:

- Increase the smoke-test timeout:
  - `set SMOKE_TIMEOUT=900`
- Use mock mode for quick validation:
  - set `FORCE_MOCK_MODE=true` in `.env`
- Run in full mode once the model cache is ready.
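
The `set` commands above are specific to Windows Command Prompt. As a cross-platform alternative, the smoke test can be launched from Python with the same environment variables. This is a sketch under stated assumptions: `smoke_env` and `run_smoke` are hypothetical helpers, and the only things taken from this guide are the `smoke_test.py` script name and the `SMOKE_TIMEOUT` and `SMOKE_SKIP_GENERATE` variables.

```python
import os
import subprocess
import sys
from typing import Dict, Optional


def smoke_env(
    timeout: Optional[int] = None,
    skip_generate: bool = False,
    base: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Build the environment for smoke_test.py without changing the current process."""
    env = dict(os.environ if base is None else base)
    if timeout is not None:
        # Same value you would pass to `set SMOKE_TIMEOUT=...`.
        env["SMOKE_TIMEOUT"] = str(timeout)
    if skip_generate:
        env["SMOKE_SKIP_GENERATE"] = "true"
    return env


def run_smoke(timeout: Optional[int] = None, skip_generate: bool = False) -> int:
    """Run smoke_test.py with the current interpreter and return its exit code."""
    result = subprocess.run(
        [sys.executable, "smoke_test.py"],
        env=smoke_env(timeout=timeout, skip_generate=skip_generate),
    )
    return result.returncode
```

For example, `run_smoke(timeout=900)` mirrors the longer-timeout option, and `run_smoke(skip_generate=True)` mirrors the health-only run. Run it from the project root so that `smoke_test.py` is found.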

## 7) API Usage

### Endpoint

- `POST /generate`

### Input JSON

```json
{
  "instruction": "Fix this code",
  "input": "def add(a,b) return a+b"
}
```

### Required Header (if API key enabled)

- `x-api-key: <API_KEY>`

### Output JSON

```json
{
  "code": "...",
  "explanation": "...",
  "confidence": 0.0,
  "important_tokens": ["..."],
  "relevancy_score": 0.0,
  "hallucination": false,
  "latency_ms": 0
}
```
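
The request and response shapes above can be exercised with a small standard-library client. This is a minimal sketch, not the project's own client: the helper names are hypothetical, the base URL is the local address from section 4, and the default timeout is an arbitrary generous value chosen for illustration.

```python
import json
import urllib.request
from typing import List, Optional

BASE_URL = "http://127.0.0.1:8000"

# Field names copied from the output JSON shown above.
EXPECTED_FIELDS = (
    "code", "explanation", "confidence", "important_tokens",
    "relevancy_score", "hallucination", "latency_ms",
)


def build_generate_request(
    instruction: str,
    code_input: str,
    api_key: Optional[str] = None,
    base_url: str = BASE_URL,
) -> urllib.request.Request:
    """Build a POST /generate request; the x-api-key header is added only if a key is given."""
    body = json.dumps({"instruction": instruction, "input": code_input}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["x-api-key"] = api_key
    return urllib.request.Request(
        f"{base_url}/generate", data=body, headers=headers, method="POST"
    )


def missing_fields(payload: dict) -> List[str]:
    """List expected response fields that are absent from the payload."""
    return [name for name in EXPECTED_FIELDS if name not in payload]


def generate(
    instruction: str,
    code_input: str,
    api_key: Optional[str] = None,
    timeout: float = 900.0,
) -> dict:
    """Send the request and decode the JSON response (requires a running server)."""
    request = build_generate_request(instruction, code_input, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `generate("Fix this code", "def add(a,b) return a+b", api_key="<API_KEY>")` returns the decoded response as a dictionary, and `missing_fields(...)` on that result should be empty if the response matches the documented shape.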

## 8) Docker Deployment

```bash
copy .env.example .env
docker compose up --build -d
```

Validate:

```bash
python smoke_test.py
```

Stop:

```bash
docker compose down
```

## 9) Hugging Face Space Deployment

Create a Hugging Face access token with write permission, then run:

```bash
python tasks.py hf-upload --repo-id <username/coding-llm-space> --token <HF_TOKEN>
```

After the upload, configure these Space variables and secrets:

- `MODEL_NAME`
- `FALLBACK_MODEL_NAME`
- `FORCE_MOCK_MODE`
- `API_KEY` (if needed in your architecture)

## 10) Production Hardening Checklist

- Keep `API_KEY` enabled.
- Keep rate limiting enabled (`RATE_LIMIT_PER_MINUTE`).
- Put the API behind an HTTPS reverse proxy.
- Add logging and monitoring.
- Pin model versions if strict reproducibility is required.
- Use `FORCE_MOCK_MODE=false` in production.

## 11) Common Troubleshooting

- `WinError 10061` (connection refused):
  - The API server is not running. Start it with `python tasks.py run`.
- `401 Unauthorized`:
  - The `x-api-key` header does not match the server's `API_KEY`.
- Health works but generate times out:
  - The model is still downloading or warming up. Increase `SMOKE_TIMEOUT` or retry later.
- Low-quality or gibberish output:
  - A fallback model was likely used. Verify the model names in `.env`.
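
For scripted clients, the first three cases above can be mapped from Python exceptions to the same hints. The sketch below assumes the client uses `urllib` from the standard library; `diagnose` is a hypothetical helper, and the messages simply restate the list above.

```python
import socket
import urllib.error


def diagnose(exc: BaseException) -> str:
    """Translate a client-side exception into a troubleshooting hint."""
    # HTTPError must be checked first because it subclasses URLError.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code == 401:
            return "x-api-key does not match the server's API_KEY"
        return f"server returned HTTP {exc.code}"
    # URLError usually wraps the underlying socket error in .reason.
    if isinstance(exc, urllib.error.URLError) and isinstance(exc.reason, BaseException):
        exc = exc.reason
    # On Windows, WinError 10061 surfaces as ConnectionRefusedError.
    if isinstance(exc, ConnectionRefusedError):
        return "API server is not running; start it with `python tasks.py run`"
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return "request timed out; the model may still be downloading or warming up"
    return f"unrecognized error: {exc!r}"
```

A typical use is to wrap the request in `try`/`except Exception as exc` and print `diagnose(exc)`. The gibberish-output case is not an exception and still needs a manual check of `.env`.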

## 12) Recommended Daily Commands

- Install/update: `python tasks.py install`
- Run API: `python tasks.py run`
- Smoke: `python tasks.py smoke`
- Run+smoke: `python tasks.py serve-smoke`
- Docker up/down: `python tasks.py docker-up` / `python tasks.py docker-down`