girish00/coding-llm-space

title: Coding LLM Space
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.12.0
python_version: '3.10'
app_file: app.py
pinned: false

Advanced Coding LLM (Production-Ready Starter)

This project provides a deployable coding assistant API built on a free Hugging Face coding model.

Model Strategy

Primary model: Qwen/Qwen2.5-Coder-1.5B-Instruct (free/open on Hugging Face).
Fallback model: Qwen/Qwen2.5-Coder-0.5B-Instruct, used if the primary model fails to load.
Final emergency fallback: sshleifer/tiny-gpt2, used only so that the service still starts.
No heavy training required.
LoRA-ready architecture included in src/lora_prepare.py.
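
The fallback order above can be sketched as a small loop that tries each model ID in turn. This is a minimal illustration, not the code in src/: the names load_first_available and transformers_loader are hypothetical, and the loader is injectable so the chain can be exercised without downloading anything.

```python
from typing import Callable, Iterable, Tuple, TypeVar

T = TypeVar("T")

# Model IDs in the order this README lists them.
MODEL_CANDIDATES = (
    "Qwen/Qwen2.5-Coder-1.5B-Instruct",
    "Qwen/Qwen2.5-Coder-0.5B-Instruct",
    "sshleifer/tiny-gpt2",
)


def load_first_available(
    candidates: Iterable[str],
    loader: Callable[[str], T],
) -> Tuple[str, T]:
    """Return (model_id, model) for the first candidate that loads."""
    errors = []
    for model_id in candidates:
        try:
            return model_id, loader(model_id)
        except Exception as exc:  # any load failure moves on to the next candidate
            errors.append(f"{model_id}: {exc}")
    raise RuntimeError("no model could be loaded: " + "; ".join(errors))


def transformers_loader(model_id: str):
    """Hypothetical loader; transformers is imported lazily so the sketch runs without it."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    return tokenizer, model
```

Calling load_first_available(MODEL_CANDIDATES, transformers_loader) would then return whichever model loaded first, along with its ID, so the caller can log which tier is actually serving requests.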

Features

Code generation
Debugging / buggy code fixing
Code explanation
Instruction following
Confidence estimation (from token probabilities)
Important token extraction (low-confidence tokens)
Relevancy score (embedding cosine similarity)
Hallucination checks:
- Syntax validation
- Runtime smoke test
Optional RAG with FAISS from data/sample_snippets.json
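
As a rough illustration of the three scoring features above, the sketch below computes a confidence value, a low-confidence token list, and a cosine similarity using only the standard library. The README does not say how token probabilities are aggregated, so the geometric mean and the 0.5 threshold are assumptions, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def confidence_from_probs(token_probs: Sequence[float]) -> float:
    """Geometric mean of per-token probabilities (assumed aggregation); 0.0 if empty."""
    if not token_probs:
        return 0.0
    # Clamp to avoid log(0) when a probability underflows.
    log_sum = sum(math.log(max(p, 1e-12)) for p in token_probs)
    return math.exp(log_sum / len(token_probs))


def low_confidence_tokens(
    tokens: Sequence[str],
    token_probs: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the project
) -> List[str]:
    """Tokens whose probability falls below the threshold."""
    return [tok for tok, p in zip(tokens, token_probs) if p < threshold]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two embedding vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

In practice the probabilities would come from the model's per-token scores and the vectors from an embedding model; both are left out here to keep the example self-contained.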

Project Structure

coding-llm/
├── data/
├── src/
├── api/
├── requirements.txt
└── README.md

API Output Format

POST /generate returns:

{
  "code": "...",
  "explanation": "...",
  "confidence": 0.0,
  "important_tokens": ["..."],
  "relevancy_score": 0.0,
  "hallucination": false,
  "latency_ms": 0
}

If hallucination is detected, the reason is appended inside explanation.
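
One way the two hallucination checks and the appended reason could fit together is sketched below for Python output only. It is an assumption-laden illustration, not the project's implementation: the function names, the 5-second timeout, and the wording of the appended reason are all hypothetical.

```python
import ast
import subprocess
import sys
from typing import Optional


def syntax_error(code: str) -> Optional[str]:
    """Return a description of the first syntax error, or None if the code parses."""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return f"syntax error at line {exc.lineno}: {exc.msg}"
    return None


def runtime_error(code: str, timeout_s: float = 5.0) -> Optional[str]:
    """Run the code in a separate interpreter; return a reason if it fails.

    A subprocess is NOT a sandbox: untrusted code still needs real isolation.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return f"timed out after {timeout_s}s"
    if proc.returncode != 0:
        lines = proc.stderr.strip().splitlines()
        detail = lines[-1] if lines else f"exit code {proc.returncode}"
        return "runtime error: " + detail
    return None


def check_generated_code(code: str, explanation: str) -> dict:
    """Run both checks and append the failure reason to the explanation."""
    reason = syntax_error(code) or runtime_error(code)
    if reason:
        explanation = f"{explanation} [hallucination check failed: {reason}]"
    return {
        "code": code,
        "explanation": explanation,
        "hallucination": reason is not None,
    }
```

The syntax check runs first because it is cheap and side-effect free; the subprocess run only happens for code that already parses.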

Local Run

Create environment and install:

pip install -r requirements.txt

Optional: create .env from .env.example and set values (on Linux/macOS, use cp instead of copy):

copy .env.example .env

Run API:

uvicorn api.main:app --host 0.0.0.0 --port 8000

Test request (Windows cmd syntax; in a POSIX shell, replace each trailing ^ with \):

curl -X POST "http://127.0.0.1:8000/generate" ^
  -H "Content-Type: application/json" ^
  -d "{\"instruction\":\"Fix this Python function\",\"input\":\"def add(a,b) return a+b\"}"
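
For a shell-independent alternative to the curl command, the same request can be sent from Python with the standard library. This is a sketch and not the contents of client_example.py; the function names and the 120-second timeout are assumptions, and it sends no API key header because the README does not name one.

```python
import json
import urllib.request


def build_generate_request(
    instruction: str,
    code: str,
    base_url: str = "http://127.0.0.1:8000",
) -> urllib.request.Request:
    """Build the POST /generate request without sending it."""
    body = json.dumps({"instruction": instruction, "input": code}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_generate(instruction: str, code: str, timeout_s: float = 120.0) -> dict:
    """Send the request to a locally running API and decode the JSON reply."""
    request = build_generate_request(instruction, code)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)


if __name__ == "__main__":
    result = call_generate("Fix this Python function", "def add(a,b) return a+b")
    print(json.dumps(result, indent=2))
```

The generous timeout reflects that the first request may be slow while the model loads.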

Optional client:

python client_example.py

Hugging Face Deployment (Space)

This repo includes:

app.py (Gradio app for HF Space)
upload_to_hf.py (upload helper script)
README_hf_space.md (Space metadata template)

Steps:

Create an HF access token with write permission.
Run:

python upload_to_hf.py --repo-id <your-username/coding-llm-space> --token <HF_TOKEN>

The Space then launches with a public UI and can also be called programmatically using a Hugging Face access token.
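
For readers curious what an upload helper of this kind might do, here is a hedged sketch built on the huggingface_hub client. It is not the actual upload_to_hf.py: the --folder option, the HF_TOKEN environment fallback, and the ignore patterns are assumptions, while --repo-id and --token mirror the command shown above.

```python
import argparse
import os
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Upload this repo to a HF Space")
    parser.add_argument("--repo-id", required=True, help="e.g. your-username/coding-llm-space")
    # Assumed convenience: fall back to the HF_TOKEN environment variable.
    parser.add_argument("--token", default=os.environ.get("HF_TOKEN"))
    parser.add_argument("--folder", default=".", help="hypothetical option: folder to upload")
    return parser.parse_args(argv)


def upload_space(repo_id: str, token: Optional[str], folder: str) -> None:
    """Create the Space if needed, then upload the folder contents."""
    from huggingface_hub import HfApi  # imported lazily; requires huggingface_hub

    api = HfApi(token=token)
    api.create_repo(
        repo_id=repo_id,
        repo_type="space",
        space_sdk="gradio",
        exist_ok=True,
    )
    api.upload_folder(
        folder_path=folder,
        repo_id=repo_id,
        repo_type="space",
        ignore_patterns=[".env", "__pycache__/*"],  # assumed: keep secrets and caches out
    )


if __name__ == "__main__":
    args = parse_args()
    upload_space(args.repo_id, args.token, args.folder)
```

Reading the token from the environment keeps it out of shell history, which is worth considering for any script that accepts a write token on the command line.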

Security and Ops

API key authentication is enabled when API_KEY is set.
In-memory per-IP rate limiting via RATE_LIMIT_PER_MINUTE.
Dockerized API included (Dockerfile).
Model is loaded lazily on first /generate request (faster boot, fewer startup failures).
Set FORCE_MOCK_MODE=true to run instantly without downloading models.
Windows quick-start scripts:
- run_api.bat
- run_space.bat
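
The API key check and the in-memory per-IP rate limit described above could be implemented along the following lines. This is a minimal sketch under stated assumptions: a sliding 60-second window, a constant-time key comparison, and hypothetical names (PerIPRateLimiter, api_key_ok). The project's actual logic may differ.

```python
import hmac
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional


class PerIPRateLimiter:
    """Sliding-window limiter: at most `limit_per_minute` hits per IP per 60 s."""

    def __init__(
        self,
        limit_per_minute: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit_per_minute
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str) -> bool:
        now = self.clock()
        hits = self._hits[ip]
        # Drop timestamps that have left the 60-second window.
        while hits and now - hits[0] >= 60.0:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


def api_key_ok(expected: Optional[str], provided: Optional[str]) -> bool:
    """Auth is disabled when no key is configured; otherwise compare in constant time."""
    if not expected:
        return True
    if provided is None:
        return False
    return hmac.compare_digest(expected.encode("utf-8"), provided.encode("utf-8"))
```

Because the state lives in process memory, counters reset on restart and are not shared between worker processes; a shared store would be needed for multi-worker deployments.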

Docker Compose

copy .env.example .env
docker compose up --build

API is available at http://127.0.0.1:8000.

Automated Smoke Test

Run this after the API starts:

python smoke_test.py

This validates:

GET /health
POST /generate
required JSON output keys
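
The "required JSON output keys" check can be pictured as follows. This is a sketch rather than the contents of smoke_test.py: the key names come from the API Output Format section, while the expected types are inferred from the example values there and are therefore assumptions.

```python
from typing import Any, Dict, List

# Key names from the documented response; types inferred from its example values.
REQUIRED_KEYS = {
    "code": str,
    "explanation": str,
    "confidence": (int, float),
    "important_tokens": list,
    "relevancy_score": (int, float),
    "hallucination": bool,
    "latency_ms": (int, float),
}


def validate_generate_response(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in payload:
            problems.append(f"missing key: {key}")
        elif not isinstance(payload[key], expected_type):
            problems.append(f"wrong type for {key}: {type(payload[key]).__name__}")
    return problems
```

Returning every problem at once, instead of failing on the first, makes a smoke-test failure easier to diagnose in one run.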

One-command Task Runner

Cross-platform:

python tasks.py install
python tasks.py run
python tasks.py smoke
python tasks.py serve-smoke
python tasks.py docker-up
python tasks.py docker-down
python tasks.py hf-upload --repo-id <user/space-name> --token <HF_TOKEN>

Windows shortcut:

run_tasks.bat install
run_tasks.bat run
run_tasks.bat smoke
run_tasks.bat serve-smoke

Makefile (Linux/macOS/WSL):

make install
make run
make smoke
make docker-up

FastAPI Endpoint

`POST /generate`

Input JSON:

{
  "instruction": "Explain this code and improve it",
  "input": "def f(x): return x*x"
}
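
To make the input contract concrete, the sketch below parses and validates a request body with the standard library. The field names come from the JSON above; treating input as optional and instruction as required and non-empty is an assumption, and GenerateRequest and parse_generate_request are hypothetical names rather than the API's own models.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerateRequest:
    instruction: str
    input: str = ""  # assumed optional; the README does not say


def parse_generate_request(raw: bytes) -> GenerateRequest:
    """Decode and validate a /generate request body; raise ValueError if invalid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"invalid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("body must be a JSON object")
    instruction = data.get("instruction")
    if not isinstance(instruction, str) or not instruction.strip():
        raise ValueError("'instruction' must be a non-empty string")
    code = data.get("input", "")
    if not isinstance(code, str):
        raise ValueError("'input' must be a string")
    return GenerateRequest(instruction=instruction, input=code)
```

In a FastAPI application this validation would normally be delegated to a request model, so the hand-written version here only illustrates the shape of the contract.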

Notes for Production

Keep max_new_tokens modest to keep latency low.
Enable request auth (API_KEY) and rate limiting (RATE_LIMIT_PER_MINUTE) before exposing a public endpoint.
For stronger output quality, add a curated retrieval corpus in data/.
For more robust hallucination checks, extend the tests for each target language and framework.