---
title: Coding LLM Space
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.12.0
python_version: '3.10'
app_file: app.py
pinned: false
---


# Advanced Coding LLM (Production-Ready Starter)

This project provides a deployable coding assistant API built on a free Hugging Face coding model.

## Model Strategy

- Primary model: `Qwen/Qwen2.5-Coder-1.5B-Instruct` (free/open on Hugging Face).
- Fallback model: `Qwen/Qwen2.5-Coder-0.5B-Instruct` if primary load fails.
- Final emergency fallback: `sshleifer/tiny-gpt2` (for guaranteed startup).
- No heavy training required.
- LoRA-ready architecture included in `src/lora_prepare.py`.
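
The fallback order above can be sketched as a loop that tries each model ID in turn. This is a minimal sketch rather than the project's actual loader: `load_first_available` and the injected `loader` callable are hypothetical names, and in practice `loader` would presumably wrap the real model-loading call.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Model IDs from the list above, in priority order.
MODEL_CANDIDATES = (
    "Qwen/Qwen2.5-Coder-1.5B-Instruct",
    "Qwen/Qwen2.5-Coder-0.5B-Instruct",
    "sshleifer/tiny-gpt2",
)


def load_first_available(
    model_ids: Sequence[str], loader: Callable[[str], T]
) -> Tuple[str, T]:
    """Return (model_id, model) for the first ID that `loader` can load.

    `loader` is a hypothetical callable supplied by the caller; any
    exception it raises is treated as a load failure for that candidate.
    """
    errors = []
    for model_id in model_ids:
        try:
            return model_id, loader(model_id)
        except Exception as exc:  # fall through to the next candidate
            errors.append(f"{model_id}: {exc}")
    raise RuntimeError("no model could be loaded: " + "; ".join(errors))
```

Passing the loader in as an argument keeps the fallback logic testable without downloading any weights.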

## Features

- Code generation
- Debugging / buggy code fixing
- Code explanation
- Instruction following
- Confidence estimation (from token probabilities)
- Important token extraction (low-confidence tokens)
- Relevancy score (embedding cosine similarity)
- Hallucination checks:
  - Syntax validation
  - Runtime smoke test
- Optional RAG with FAISS from `data/sample_snippets.json`
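
The scoring features above can be illustrated with a few small functions. This is a sketch under stated assumptions: the README does not say how token probabilities are aggregated, so the geometric mean used here, the `0.5` threshold, and all function names are hypothetical choices for illustration.

```python
import math
from typing import List, Sequence


def sequence_confidence(token_probs: Sequence[float]) -> float:
    """Geometric mean of per-token probabilities (assumed aggregation)."""
    if not token_probs:
        return 0.0
    # Clamp to avoid log(0) when a token has zero probability.
    log_sum = sum(math.log(max(p, 1e-12)) for p in token_probs)
    return math.exp(log_sum / len(token_probs))


def low_confidence_tokens(
    tokens: Sequence[str], token_probs: Sequence[float], threshold: float = 0.5
) -> List[str]:
    """Tokens whose probability is below `threshold` (assumed cutoff)."""
    return [tok for tok, p in zip(tokens, token_probs) if p < threshold]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two embedding vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

The geometric mean penalizes a single very unlikely token more than an arithmetic mean would, which is one reason it might be preferred for a confidence score.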

## Project Structure

```text
coding-llm/
│── data/
│── src/
│── api/
│── requirements.txt
│── README.md
```

## API Output Format

`POST /generate` returns:

```json
{
  "code": "...",
  "explanation": "...",
  "confidence": 0.0,
  "important_tokens": ["..."],
  "relevancy_score": 0.0,
  "hallucination": false,
  "latency_ms": 0
}
```

If hallucination is detected, the reason is appended inside `explanation`.
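
As a rough illustration of this behavior, the sketch below runs a Python syntax check and appends the failure reason to `explanation`. It covers only syntax validation, not the runtime smoke test; the function names and the exact wording of the appended reason are assumptions, not the project's implementation.

```python
import ast
from typing import Optional


def check_python_syntax(code: str) -> Optional[str]:
    """Return None if `code` parses as Python, else a short reason string."""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return f"syntax error at line {exc.lineno}: {exc.msg}"
    return None


def apply_hallucination_check(response: dict) -> dict:
    """Set `hallucination` and append the reason to `explanation` on failure."""
    reason = check_python_syntax(response["code"])
    result = dict(response)  # copy so the caller's dict is left untouched
    result["hallucination"] = reason is not None
    if reason is not None:
        # The bracketed suffix format is an assumption made for this sketch.
        result["explanation"] = (
            f"{response['explanation']} [hallucination check: {reason}]"
        )
    return result
```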

## Local Run

1. Create an environment and install dependencies:

```bash
pip install -r requirements.txt
```

Optional: create `.env` from `.env.example` and set values (the command below is for Windows `cmd`; on Linux/macOS, use `cp` instead of `copy`):

```bash
copy .env.example .env
```

2. Run API:

```bash
uvicorn api.main:app --host 0.0.0.0 --port 8000
```

3. Test request (Windows `cmd` syntax; on Linux/macOS, replace each trailing `^` with `\`):

```bash
curl -X POST "http://127.0.0.1:8000/generate" ^
  -H "Content-Type: application/json" ^
  -d "{\"instruction\":\"Fix this Python function\",\"input\":\"def add(a,b) return a+b\"}"
```

4. Optional client:

```bash
python client_example.py
```
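
The test request from step 3 can also be sent from Python using only the standard library, which avoids shell-specific quoting. This is a minimal sketch and not the contents of `client_example.py`; `build_generate_request` and `generate` are hypothetical helper names, and no API key header is sent because the README does not name one.

```python
import json
import urllib.request


def build_generate_request(
    instruction: str, code: str, base_url: str = "http://127.0.0.1:8000"
) -> urllib.request.Request:
    """Build the POST /generate request with the JSON body shown in step 3."""
    payload = json.dumps({"instruction": instruction, "input": code}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(instruction: str, code: str, timeout: float = 120.0) -> dict:
    """Send the request to a running API and return the parsed JSON response."""
    request = build_generate_request(instruction, code)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the API running, `generate("Fix this Python function", "def add(a,b) return a+b")` should return a dictionary in the output format described earlier.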

## Hugging Face Deployment (Space)

This repo includes:

- `app.py` (Gradio app for HF Space)
- `upload_to_hf.py` (upload helper script)
- `README_hf_space.md` (Space metadata template)

Steps:

1. Create a HF access token with write permission.
2. Run:

```bash
python upload_to_hf.py --repo-id <your-username/coding-llm-space> --token <HF_TOKEN>
```

3. Your Space launches with a public UI and can be called using a Hugging Face API key.

## Security and Ops

- API key auth enabled when `API_KEY` is set.
- In-memory per-IP rate limiting via `RATE_LIMIT_PER_MINUTE`.
- Dockerized API included (`Dockerfile`).
- Model is loaded lazily on first `/generate` request (faster boot, fewer startup failures).
- Set `FORCE_MOCK_MODE=true` to run instantly without downloading models.
- Windows quick-start scripts:
  - `run_api.bat`
  - `run_space.bat`
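
In-memory per-IP rate limiting of the kind listed above could look roughly like the sliding-window sketch below. The project may use a different algorithm (for example a fixed window); the class name, the injectable `clock`, and the default of `60` are assumptions made for illustration.

```python
import os
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowRateLimiter:
    """Allow at most `limit_per_minute` requests per client in any 60 s window."""

    def __init__(
        self, limit_per_minute: int, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self.limit = limit_per_minute
        self.clock = clock  # injectable so tests can control time
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_ip: str) -> bool:
        now = self.clock()
        hits = self._hits[client_ip]
        # Drop timestamps that have left the 60-second window.
        while hits and now - hits[0] >= 60.0:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


def limiter_from_env() -> SlidingWindowRateLimiter:
    """Read RATE_LIMIT_PER_MINUTE; the fallback of 60 is an assumed default."""
    return SlidingWindowRateLimiter(int(os.environ.get("RATE_LIMIT_PER_MINUTE", "60")))
```

Because the state lives in process memory, limits reset on restart and are not shared across multiple workers or containers.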

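## Docker Compose

Create `.env` from `.env.example`, then build and start the stack (on Linux/macOS, use `cp` instead of `copy`):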

```bash
copy .env.example .env
docker compose up --build
```

API is available at `http://127.0.0.1:8000`.

## Automated Smoke Test

Run this after API starts:

```bash
python smoke_test.py
```

This validates:
- `GET /health`
- `POST /generate`
- required JSON output keys
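
The required-keys check can be sketched as a set difference against the fields listed under API Output Format. This is an illustrative sketch, not the contents of `smoke_test.py`; `REQUIRED_KEYS` and `missing_keys` are hypothetical names.

```python
# Keys taken from the documented POST /generate response format.
REQUIRED_KEYS = frozenset(
    {
        "code",
        "explanation",
        "confidence",
        "important_tokens",
        "relevancy_score",
        "hallucination",
        "latency_ms",
    }
)


def missing_keys(response: dict) -> list:
    """Return the required keys absent from `response`, sorted for stable output."""
    return sorted(REQUIRED_KEYS - set(response))
```

An empty list means the response has every documented field; extra fields are ignored.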

## One-command Task Runner

Cross-platform:

```bash
python tasks.py install
python tasks.py run
python tasks.py smoke
python tasks.py serve-smoke
python tasks.py docker-up
python tasks.py docker-down
python tasks.py hf-upload --repo-id <user/space-name> --token <HF_TOKEN>
```

Windows shortcut:

```bat
run_tasks.bat install
run_tasks.bat run
run_tasks.bat smoke
run_tasks.bat serve-smoke
```

Makefile (Linux/macOS/WSL):

```bash
make install
make run
make smoke
make docker-up
```

## FastAPI Endpoint

### `POST /generate`

Input JSON:

```json
{
  "instruction": "Explain this code and improve it",
  "input": "def f(x): return x*x"
}
```

## Notes for Production

- Keep `max_new_tokens` modest for low latency.
- Enable request auth and rate limiting (`API_KEY`, `RATE_LIMIT_PER_MINUTE`) before exposing a public endpoint.
- For stronger quality, add curated retrieval corpus in `data/`.
- For robust hallucination checks, extend tests per language/framework.