---
title: Coding LLM Space
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.12.0
python_version: '3.10'
app_file: app.py
pinned: false
---

Advanced Coding LLM (Production-Ready Starter)

This project provides a deployable coding assistant API built on a free Hugging Face coding model.

Model Strategy

  • Primary model: Qwen/Qwen2.5-Coder-1.5B-Instruct (free/open on Hugging Face).
  • Fallback model: Qwen/Qwen2.5-Coder-0.5B-Instruct, used if the primary model fails to load.
  • Final emergency fallback: sshleifer/tiny-gpt2, so the app still starts when neither Qwen model can be loaded.
  • No heavy training is required.
  • LoRA-ready: adapter preparation code is included in src/lora_prepare.py.
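The fallback order above amounts to trying each model ID in turn and keeping the first one that loads. Here is a minimal sketch of that idea, not the project's actual loader: MODEL_CHAIN and load_with_fallback are hypothetical names, and loader stands for whatever function really builds the model (with transformers installed it could be something like lambda mid: pipeline("text-generation", model=mid)).

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Candidate models in the fallback order listed above (hypothetical constant name).
MODEL_CHAIN: Tuple[str, ...] = (
    "Qwen/Qwen2.5-Coder-1.5B-Instruct",
    "Qwen/Qwen2.5-Coder-0.5B-Instruct",
    "sshleifer/tiny-gpt2",
)


def load_with_fallback(
    loader: Callable[[str], T],
    model_ids: Sequence[str] = MODEL_CHAIN,
) -> Tuple[str, T]:
    """Return (model_id, loaded_model) for the first model ID that loads successfully."""
    errors: List[str] = []
    for model_id in model_ids:
        try:
            return model_id, loader(model_id)
        except Exception as exc:  # out-of-memory, network and missing-file errors all land here
            errors.append(f"{model_id}: {exc!r}")
    # Every candidate failed: report all errors together for easier debugging.
    raise RuntimeError("all models failed to load:\n" + "\n".join(errors))


if __name__ == "__main__":
    # Stand-in loader that simulates the 1.5B model not fitting in memory.
    def fake_loader(model_id: str) -> str:
        if model_id.endswith("1.5B-Instruct"):
            raise MemoryError("not enough RAM")
        return f"<model {model_id}>"

    chosen, model = load_with_fallback(fake_loader)
    print(chosen)  # Qwen/Qwen2.5-Coder-0.5B-Instruct
```

Catching a broad Exception is deliberate in this sketch: load failures can surface as memory, network, or file errors, and the goal is simply to move on to the next candidate.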

Features

  • Code generation
  • Debugging and bug fixing
  • Code explanation
  • Instruction following
  • Confidence estimation (from token probabilities)
  • Important token extraction (low-confidence tokens)
  • Relevancy score (embedding cosine similarity)
  • Hallucination checks:
    • Syntax validation
    • Runtime smoke test
  • Optional retrieval-augmented generation (RAG) using a FAISS index built from data/sample_snippets.json
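The confidence, important-token, and relevancy features can be illustrated in plain Python. This is a sketch under stated assumptions, not the project's code: it assumes per-token log-probabilities are available (for example from transformers' generate(..., output_scores=True, return_dict_in_generate=True)), uses the geometric mean of token probabilities as "confidence", treats tokens below a hypothetical 0.5 probability threshold as "important", and takes embeddings as plain lists of floats. The function names are hypothetical, and the README does not say which two texts the relevancy score compares (request and answer is one plausible choice).

```python
import math
from typing import List, Sequence


def confidence_from_logprobs(logprobs: Sequence[float]) -> float:
    """Geometric mean of the token probabilities, i.e. exp(mean log p); lies in [0, 1]."""
    if not logprobs:
        return 0.0
    return math.exp(sum(logprobs) / len(logprobs))


def low_confidence_tokens(
    tokens: Sequence[str], logprobs: Sequence[float], threshold: float = 0.5
) -> List[str]:
    """Tokens whose probability is below `threshold` (the 'important' tokens to review)."""
    return [tok for tok, lp in zip(tokens, logprobs) if math.exp(lp) < threshold]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length embeddings (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    tokens = ["def", " add", "(a", ", b", "):"]
    logprobs = [math.log(p) for p in (0.9, 0.8, 0.3, 0.95, 0.4)]
    print(round(confidence_from_logprobs(logprobs), 3))
    print(low_confidence_tokens(tokens, logprobs))  # ['(a', '):']
    print(cosine_similarity([1.0, 2.0, 0.0], [2.0, 4.0, 0.0]))
```

Averaging log-probabilities means a single very unlikely token pulls the score down noticeably; a plain arithmetic mean of probabilities would be a gentler alternative.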

Project Structure

coding-llm/
├── data/
├── src/
├── api/
├── requirements.txt
└── README.md

API Output Format

POST /generate returns:

{
  "code": "...",
  "explanation": "...",
  "confidence": 0.0,
  "important_tokens": ["..."],
  "relevancy_score": 0.0,
  "hallucination": false,
  "latency_ms": 0
}

If a hallucination is detected, the reason is appended to the explanation field.
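The two hallucination checks (syntax validation and a runtime smoke test), plus appending the reason to the explanation, can be sketched with the standard library alone: ast.parse for syntax and a separate Python process for the smoke test. This is a minimal sketch assuming the generated code is Python; the function names, the 5-second timeout, and the message wording are hypothetical rather than taken from the project.

```python
import ast
import subprocess
import sys
from typing import Optional, Tuple


def syntax_problem(code: str) -> Optional[str]:
    """Syntax validation: return a message for the first syntax error, or None."""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return f"syntax error on line {exc.lineno}: {exc.msg}"
    return None


def runtime_problem(code: str, timeout: float = 5.0) -> Optional[str]:
    """Runtime smoke test: execute the code in a fresh Python process."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return f"runtime smoke test timed out after {timeout} s"
    if result.returncode != 0:
        # The last stderr line of a traceback names the exception, e.g. "NameError: ...".
        lines = result.stderr.strip().splitlines()
        return "runtime error: " + (lines[-1] if lines else f"exit code {result.returncode}")
    return None


def apply_hallucination_checks(code: str, explanation: str) -> Tuple[bool, str]:
    """Return (hallucination, explanation), appending the reason when a check fails."""
    reason = syntax_problem(code)
    if reason is None:
        # Only run code that at least parses.
        reason = runtime_problem(code)
    if reason is None:
        return False, explanation
    return True, f"{explanation}\n\nHallucination check failed: {reason}"
```

For code that only defines functions, the smoke test merely proves the definitions execute; it does not call them. A child process is also not a sandbox, so generated code can still reach the file system or network; stronger isolation (a container or a restricted user) would be worth considering before exposing this publicly.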

Local Run

  1. Create a Python environment and install the dependencies:
pip install -r requirements.txt

Optional: create .env from .env.example and fill in the values (copy is the Windows command; use cp on Linux/macOS):

copy .env.example .env
  2. Run the API:
uvicorn api.main:app --host 0.0.0.0 --port 8000
  3. Send a test request (Windows cmd syntax; on Linux/macOS, replace each trailing ^ with \):
curl -X POST "http://127.0.0.1:8000/generate" ^
  -H "Content-Type: application/json" ^
  -d "{\"instruction\":\"Fix this Python function\",\"input\":\"def add(a,b) return a+b\"}"
  4. Optionally, run the example client:
python client_example.py
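For reference, the curl test request above maps onto a short standard-library client. This is a sketch, not the repo's client_example.py: build_generate_request and generate are hypothetical names, and the 120-second timeout is an arbitrary assumption meant to cover the lazy model load on the first request.

```python
import json
import urllib.request
from typing import Any, Dict


def build_generate_request(
    base_url: str, instruction: str, code_input: str
) -> urllib.request.Request:
    """Build a POST /generate request with the instruction/input JSON body."""
    body = json.dumps({"instruction": instruction, "input": code_input}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(
    base_url: str, instruction: str, code_input: str, timeout: float = 120.0
) -> Dict[str, Any]:
    """Send the request and decode the JSON response (the API must be running)."""
    req = build_generate_request(base_url, instruction, code_input)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the API running, generate("http://127.0.0.1:8000", "Fix this Python function", "def add(a,b) return a+b")["code"] would return the generated code.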

Hugging Face Deployment (Space)

This repo includes:

  • app.py (Gradio app for HF Space)
  • upload_to_hf.py (upload helper script)
  • README_hf_space.md (Space metadata template)

Steps:

  1. Create a HF access token with write permission.
  2. Run:
python upload_to_hf.py --repo-id <your-username/coding-llm-space> --token <HF_TOKEN>
  3. The Space launches with a public UI and can also be called programmatically with a Hugging Face access token.

Security and Ops

  • API key authentication is enabled when API_KEY is set.
  • In-memory, per-IP rate limiting, configured with RATE_LIMIT_PER_MINUTE.
  • A Dockerfile for the API is included.
  • The model is loaded lazily on the first /generate request (faster boot, fewer startup failures).
  • Set FORCE_MOCK_MODE=true to start instantly without downloading any models.
  • Windows quick-start scripts:
    • run_api.bat
    • run_space.bat
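The API-key check and per-IP limit described above can be sketched without any web framework. This is a hypothetical sketch, not the project's middleware: the class and function names are illustrative, and it assumes a sliding 60-second window (a fixed per-minute bucket would also fit the "per minute" description).

```python
import hmac
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Optional


class PerIPRateLimiter:
    """Sliding-window limiter: at most `limit` requests per IP within any `window_s` seconds."""

    def __init__(
        self,
        limit: int,
        window_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str) -> bool:
        """Record and allow the request if the IP is under its limit; otherwise refuse it."""
        now = self.clock()
        hits = self._hits[ip]
        # Forget requests that are older than the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # a web API would typically answer HTTP 429 Too Many Requests here
        hits.append(now)
        return True


def api_key_ok(configured_key: Optional[str], provided_key: Optional[str]) -> bool:
    """No configured key means auth is off; otherwise compare keys in constant time."""
    if not configured_key:
        return True
    if provided_key is None:
        return False
    return hmac.compare_digest(configured_key.encode(), provided_key.encode())
```

Wiring it up might look like PerIPRateLimiter(int(os.environ.get("RATE_LIMIT_PER_MINUTE", "60"))) and api_key_ok(os.environ.get("API_KEY"), key_from_request), where the default of 60 and the request header carrying the key are assumptions. Because the state lives in process memory, each worker process keeps its own counters and everything resets on restart; this sketch also never evicts idle IPs.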

Docker Compose

Create .env (on Linux/macOS, use cp instead of copy), then build and start the stack:

copy .env.example .env
docker compose up --build

The API is then available at http://127.0.0.1:8000.

Automated Smoke Test

Run this once the API is up:

python smoke_test.py

This validates:

  • GET /health
  • POST /generate
  • the required keys in the JSON response
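The key check can be sketched as a small validator, assuming the field types implied by the example response in API Output Format (strings, numbers, a list, and a boolean). response_problems is a hypothetical helper, not necessarily what smoke_test.py does.

```python
from typing import Any, Dict, List, Tuple, Union

# Keys from the "API Output Format" section; types inferred from its example values.
EXPECTED_FIELDS: Dict[str, Union[type, Tuple[type, ...]]] = {
    "code": str,
    "explanation": str,
    "confidence": (int, float),
    "important_tokens": list,
    "relevancy_score": (int, float),
    "hallucination": bool,
    "latency_ms": (int, float),
}


def response_problems(payload: Dict[str, Any]) -> List[str]:
    """List every missing key or wrongly typed value in a /generate response."""
    problems: List[str] = []
    for key, expected in EXPECTED_FIELDS.items():
        if key not in payload:
            problems.append(f"missing key: {key}")
            continue
        value = payload[key]
        # bool is a subclass of int, so reject it explicitly where a number is expected.
        wrong_bool = isinstance(value, bool) and expected is not bool
        if wrong_bool or not isinstance(value, expected):
            problems.append(f"wrong type for {key}: {type(value).__name__}")
    return problems
```

In a smoke script, the decoded /generate body would pass when response_problems(body) returns an empty list.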

One-command Task Runner

Cross-platform:

python tasks.py install
python tasks.py run
python tasks.py smoke
python tasks.py serve-smoke
python tasks.py docker-up
python tasks.py docker-down
python tasks.py hf-upload --repo-id <user/space-name> --token <HF_TOKEN>

Windows shortcut:

run_tasks.bat install
run_tasks.bat run
run_tasks.bat smoke
run_tasks.bat serve-smoke

Makefile (Linux/macOS/WSL):

make install
make run
make smoke
make docker-up
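A cross-platform task runner like this usually reduces to a table mapping task names to commands. The sketch is hypothetical, not the repo's tasks.py: the command lines are reconstructed from commands shown elsewhere in this README (pip install, uvicorn, smoke_test.py, upload_to_hf.py, docker compose), docker-down is assumed to run docker compose down, and serve-smoke (start the server, run the smoke test, stop the server) is omitted for brevity.

```python
import subprocess
import sys
from typing import Dict, List, Optional, Sequence

# Hypothetical task table mirroring the commands documented in this README.
TASKS: Dict[str, List[str]] = {
    "install": [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"],
    "run": [sys.executable, "-m", "uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8000"],
    "smoke": [sys.executable, "smoke_test.py"],
    "docker-up": ["docker", "compose", "up", "--build"],
    "docker-down": ["docker", "compose", "down"],
    "hf-upload": [sys.executable, "upload_to_hf.py"],  # extra args such as --repo-id are appended
}


def command_for(task: str) -> List[str]:
    """Return a fresh copy of the command for `task`, or exit with a helpful message."""
    if task not in TASKS:
        raise SystemExit(f"unknown task {task!r}; choose one of: {', '.join(TASKS)}")
    return list(TASKS[task])


def main(argv: Optional[Sequence[str]] = None, dry_run: bool = False) -> int:
    """Run one task; any extra arguments are passed through to its command."""
    args = list(sys.argv[1:] if argv is None else argv)
    if not args:
        print(f"usage: python tasks.py <{'|'.join(TASKS)}> [extra args]")
        return 2
    cmd = command_for(args[0]) + args[1:]
    if dry_run:
        # Show the command instead of executing it.
        print(" ".join(cmd))
        return 0
    # Run in the current directory and propagate the child's exit code.
    return subprocess.call(cmd)
```

Using sys.executable means every Python task runs under the interpreter that launched the runner, which avoids python vs. python3 differences between Windows and Unix. As a script, it would end with raise SystemExit(main()) under an if __name__ == "__main__": guard.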

FastAPI Endpoint

POST /generate

Input JSON:

{
  "instruction": "Explain this code and improve it",
  "input": "def f(x): return x*x"
}
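In FastAPI this body would normally be declared as a Pydantic model; to keep the example dependency-free, here is a standard-library sketch of equivalent validation. The class name, the non-empty-instruction rule, and treating input as optional are assumptions, not documented behavior of api/main.py.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class GenerateRequest:
    instruction: str
    input: str = ""  # optional code/context; the empty default is an assumption

    @classmethod
    def from_json(cls, raw: str) -> "GenerateRequest":
        """Parse and validate a /generate body; raises ValueError on bad input."""
        data: Any = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
        if not isinstance(data, dict):
            raise ValueError("request body must be a JSON object")
        instruction = data.get("instruction")
        if not isinstance(instruction, str) or not instruction.strip():
            raise ValueError("'instruction' must be a non-empty string")
        code_input = data.get("input", "")
        if not isinstance(code_input, str):
            raise ValueError("'input' must be a string")
        return cls(instruction=instruction, input=code_input)
```

Because malformed JSON also raises ValueError, a single except ValueError branch is enough to turn any bad body into an HTTP 4xx response.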

Notes for Production

  • Keep max_new_tokens modest to keep latency low.
  • Set API_KEY and RATE_LIMIT_PER_MINUTE before exposing a public endpoint.
  • For better output quality, add a curated retrieval corpus under data/.
  • For more robust hallucination checks, extend the tests for each target language or framework.