---
title: Coding LLM Space
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.12.0
python_version: '3.10'
app_file: app.py
pinned: false
---


# Advanced Coding LLM (Production-Ready Starter)

This project provides a deployable coding assistant API built on a free Hugging Face coding model.

## Model Strategy

- Primary model: `Qwen/Qwen2.5-Coder-1.5B-Instruct` (free/open on Hugging Face).
- Fallback model: `Qwen/Qwen2.5-Coder-0.5B-Instruct` if primary load fails.
- Final emergency fallback: `sshleifer/tiny-gpt2` (for guaranteed startup).
- No heavy training required.
- LoRA-ready architecture included in `src/lora_prepare.py`.
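
The fallback order above could be implemented roughly as follows. This is a minimal sketch, not the project's actual loader: `load_first_available` and its `loader` argument are hypothetical names, and the loader is injected so the chain can be exercised without downloading any model.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Model IDs in the priority order listed in this README.
MODEL_CHAIN = (
    "Qwen/Qwen2.5-Coder-1.5B-Instruct",
    "Qwen/Qwen2.5-Coder-0.5B-Instruct",
    "sshleifer/tiny-gpt2",
)


def load_first_available(
    loader: Callable[[str], T],
    model_ids: Sequence[str] = MODEL_CHAIN,
) -> Tuple[str, T]:
    """Try each model ID in order and return (model_id, model) for the first success."""
    errors: List[str] = []
    for model_id in model_ids:
        try:
            return model_id, loader(model_id)
        except Exception as exc:  # assumption: any load error triggers the next fallback
            errors.append(f"{model_id}: {exc}")
    raise RuntimeError("No model could be loaded: " + "; ".join(errors))
```

In a real deployment, `loader` would wrap whatever model-loading call the project uses; here it can be any function that takes a model ID and either returns a model or raises.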

## Features

- Code generation
- Debugging / buggy code fixing
- Code explanation
- Instruction following
- Confidence estimation (from token probabilities)
- Important token extraction (low-confidence tokens)
- Relevancy score (embedding cosine similarity)
- Hallucination checks:
  - Syntax validation
  - Runtime smoke test
- Optional RAG with FAISS from `data/sample_snippets.json`
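
To make the scoring features concrete, here is a small illustrative sketch of how confidence, low-confidence token extraction, and cosine similarity might be computed. The README does not specify how token probabilities are aggregated, so the geometric mean and the `threshold=0.5` default are assumptions, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def sequence_confidence(token_probs: Sequence[float]) -> float:
    """Geometric mean of per-token probabilities (assumed aggregation); 0.0 if empty."""
    if not token_probs:
        return 0.0
    # Average the log-probabilities; clamp to avoid log(0).
    mean_log = sum(math.log(max(p, 1e-12)) for p in token_probs) / len(token_probs)
    return math.exp(mean_log)


def low_confidence_tokens(
    tokens: Sequence[str],
    token_probs: Sequence[float],
    threshold: float = 0.5,  # assumed cutoff, not taken from the project
) -> List[str]:
    """Return the tokens whose probability falls below the threshold."""
    return [tok for tok, p in zip(tokens, token_probs) if p < threshold]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For the relevancy score, `a` and `b` would be the embeddings of the prompt and the generated answer; the embedding model itself is outside the scope of this sketch.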

## Project Structure

```text
coding-llm/
│── data/
│── src/
│── api/
│── requirements.txt
│── README.md
```

## API Output Format

`POST /generate` returns:

```json
{
  "code": "...",
  "explanation": "...",
  "confidence": 0.0,
  "important_tokens": ["..."],
  "relevancy_score": 0.0,
  "hallucination": false,
  "latency_ms": 0
}
```

If hallucination is detected, the reason is appended inside `explanation`.
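
As an illustration of the two checks for Python output, the sketch below parses the code with `ast` and then runs it in a separate interpreter. The function names and the bracketed suffix format are assumptions, not the project's actual implementation. Note that a subprocess is not a sandbox: untrusted code still runs with the server's permissions.

```python
import ast
import subprocess
import sys
from typing import Dict, Optional, Union


def syntax_error_reason(code: str) -> Optional[str]:
    """Return a short reason if the code does not parse as Python, else None."""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return f"syntax error at line {exc.lineno}: {exc.msg}"
    return None


def runtime_error_reason(code: str, timeout_s: float = 5.0) -> Optional[str]:
    """Run the code in a child interpreter; return a reason on failure or timeout."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return f"timed out after {timeout_s}s"
    if proc.returncode != 0:
        lines = proc.stderr.strip().splitlines()
        return "runtime error: " + (lines[-1] if lines else f"exit code {proc.returncode}")
    return None


def check_hallucination(code: str, explanation: str) -> Dict[str, Union[bool, str]]:
    """Run the syntax check first, then the smoke test; append any reason to the explanation."""
    reason = syntax_error_reason(code) or runtime_error_reason(code)
    if reason is None:
        return {"hallucination": False, "explanation": explanation}
    return {
        "hallucination": True,
        "explanation": f"{explanation} [hallucination check: {reason}]",
    }
```

Running the syntax check first avoids spawning a process for code that cannot parse at all.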

## Local Run

1. Install dependencies (ideally inside a virtual environment):

```bash
pip install -r requirements.txt
```

Optional: create `.env` from `.env.example` and set values (in Windows `cmd`, use `copy` instead of `cp`):

```bash
cp .env.example .env
```

2. Run API:

```bash
uvicorn api.main:app --host 0.0.0.0 --port 8000
```

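3. Test request (bash syntax; in Windows `cmd`, replace each trailing `\` with `^` and wrap the JSON in double quotes with inner quotes escaped as `\"`):

```bash
curl -X POST "http://127.0.0.1:8000/generate" \
  -H "Content-Type: application/json" \
  -d '{"instruction":"Fix this Python function","input":"def add(a,b) return a+b"}'
```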

4. Optional client:

```bash
python client_example.py
```

## Hugging Face Deployment (Space)

This repo includes:

- `app.py` (Gradio app for HF Space)
- `upload_to_hf.py` (upload helper script)
- `README_hf_space.md` (Space metadata template)

Steps:

1. Create a Hugging Face access token with write permission.
2. Run:

```bash
python upload_to_hf.py --repo-id <your-username/coding-llm-space> --token <HF_TOKEN>
```

3. The Space then launches with a public UI and can also be called programmatically using a Hugging Face API key.

## Security and Ops

- API key authentication is enabled when `API_KEY` is set.
- In-memory per-IP rate limiting via `RATE_LIMIT_PER_MINUTE`.
- Dockerized API included (`Dockerfile`).
- Model is loaded lazily on first `/generate` request (faster boot, fewer startup failures).
- Set `FORCE_MOCK_MODE=true` to run instantly without downloading models.
- Windows quick-start scripts:
  - `run_api.bat`
  - `run_space.bat`
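
For readers unfamiliar with in-memory per-IP rate limiting, the idea can be sketched as a sliding one-minute window per client address. This is an illustrative example only: the class name `PerIPRateLimiter` is hypothetical, and the project's own limiter may use a different algorithm.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class PerIPRateLimiter:
    """Allow at most `limit_per_minute` requests per IP within any 60-second window."""

    def __init__(
        self,
        limit_per_minute: int,
        clock: Callable[[], float] = time.monotonic,  # injectable for testing
    ) -> None:
        self.limit = limit_per_minute
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str) -> bool:
        """Record the request and return True, or return False if the IP is over its limit."""
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have left the 60-second window.
        while hits and now - hits[0] >= 60.0:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Because the counters live in process memory, they reset on restart and are not shared between multiple worker processes.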

## Docker Compose

```bash
# In Windows cmd, use: copy .env.example .env
cp .env.example .env
docker compose up --build
```

API is available at `http://127.0.0.1:8000`.

## Automated Smoke Test

Run this after the API has started:

```bash
python smoke_test.py
```

This validates:
- `GET /health`
- `POST /generate`
- required JSON output keys
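
The key check in the last item might look like the following. This is a hedged sketch rather than the contents of `smoke_test.py`: `missing_keys` is a hypothetical helper, and `REQUIRED_KEYS` is copied from the response format documented in this README.

```python
from typing import Any, List, Mapping

# Keys listed in the documented `POST /generate` response.
REQUIRED_KEYS = {
    "code",
    "explanation",
    "confidence",
    "important_tokens",
    "relevancy_score",
    "hallucination",
    "latency_ms",
}


def missing_keys(payload: Mapping[str, Any]) -> List[str]:
    """Return the required keys absent from a response payload, sorted by name."""
    return sorted(REQUIRED_KEYS - set(payload))
```

An empty list means the response has the expected shape; extra keys are ignored.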

## One-command Task Runner

Cross-platform:

```bash
python tasks.py install
python tasks.py run
python tasks.py smoke
python tasks.py serve-smoke
python tasks.py docker-up
python tasks.py docker-down
python tasks.py hf-upload --repo-id <user/space-name> --token <HF_TOKEN>
```

Windows shortcut:

```bat
run_tasks.bat install
run_tasks.bat run
run_tasks.bat smoke
run_tasks.bat serve-smoke
```

Makefile (Linux/macOS/WSL):

```bash
make install
make run
make smoke
make docker-up
```

## FastAPI Endpoint

### `POST /generate`

Input JSON:

```json
{
  "instruction": "Explain this code and improve it",
  "input": "def f(x): return x*x"
}
```
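
A minimal standard-library Python client for this endpoint could look like the sketch below. It assumes the API is running locally on port 8000 as in the Local Run steps; the function names are hypothetical and this is not the bundled `client_example.py`. If `API_KEY` is set on the server, an authentication header is also required, but its name is not documented here, so it is left out.

```python
import json
import urllib.request
from typing import Any, Dict


def build_generate_request(
    instruction: str,
    code_input: str,
    base_url: str = "http://127.0.0.1:8000",  # assumed local address
) -> urllib.request.Request:
    """Build a POST /generate request carrying the documented JSON body."""
    body = json.dumps({"instruction": instruction, "input": code_input}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(instruction: str, code_input: str, timeout_s: float = 120.0) -> Dict[str, Any]:
    """Send the request and return the decoded JSON response."""
    request = build_generate_request(instruction, code_input)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `generate("Explain this code and improve it", "def f(x): return x*x")` should return a dictionary with the response fields. The generous default timeout allows for the lazy model load on the first request.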

## Notes for Production

- Keep `max_new_tokens` modest to keep latency low.
- Enable request auth and rate limiting (`API_KEY`, `RATE_LIMIT_PER_MINUTE`) before exposing the endpoint publicly.
- For higher output quality, add a curated retrieval corpus to `data/`.
- For more robust hallucination checks, extend the tests for each target language and framework.