	---
	title: Coding LLM Space
	emoji: 🤖
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	sdk_version: 6.12.0
	python_version: '3.10'
	app_file: app.py
	pinned: false
	---


	# Advanced Coding LLM (Production-Ready Starter)

	This project provides a deployable coding assistant API built on a free Hugging Face coding model.

	## Model Strategy

	- Primary model: `Qwen/Qwen2.5-Coder-1.5B-Instruct` (free/open on Hugging Face).
	- Fallback model: `Qwen/Qwen2.5-Coder-0.5B-Instruct` if primary load fails.
	- Final emergency fallback: `sshleifer/tiny-gpt2` (for guaranteed startup).
	- No heavy training required.
	- LoRA-ready architecture included in `src/lora_prepare.py`.
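The fallback order above can be sketched as a loop over candidate model IDs. This is a minimal illustration, not the repository's actual loader: `load_with_fallback` and the `loader` callable are hypothetical names, and in a real setup `loader` would presumably wrap a Hugging Face `from_pretrained` call.

```python
from typing import Any, Callable, Sequence, Tuple

# Model IDs in the fallback order listed above.
MODEL_CHAIN = (
    "Qwen/Qwen2.5-Coder-1.5B-Instruct",
    "Qwen/Qwen2.5-Coder-0.5B-Instruct",
    "sshleifer/tiny-gpt2",
)


def load_with_fallback(
    loader: Callable[[str], Any],
    model_ids: Sequence[str] = MODEL_CHAIN,
) -> Tuple[str, Any]:
    """Try each model ID in order and return the first one that loads.

    `loader` is a hypothetical callable that takes a model ID and returns
    a loaded model, raising an exception on failure.
    """
    errors = []
    for model_id in model_ids:
        try:
            return model_id, loader(model_id)
        except Exception as exc:  # any load failure moves on to the next model
            errors.append(f"{model_id}: {exc}")
    raise RuntimeError("All models failed to load:\n" + "\n".join(errors))
```

Returning the model ID alongside the model lets the caller log which tier was actually loaded, which matters because the emergency fallback is far weaker than the primary model.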

	## Features

	- Code generation
	- Debugging / buggy code fixing
	- Code explanation
	- Instruction following
	- Confidence estimation (from token probabilities)
	- Important token extraction (low-confidence tokens)
	- Relevancy score (embedding cosine similarity)
	- Hallucination checks:
	  - Syntax validation
	  - Runtime smoke test
	- Optional RAG with FAISS from `data/sample_snippets.json`
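The three scoring features above (confidence, important tokens, relevancy) can be sketched in plain Python. This README does not state how the project aggregates token probabilities or which embedding model it uses, so the geometric mean, the `0.5` threshold, and all function names below are assumptions for illustration only.

```python
import math
from typing import List, Sequence, Tuple


def confidence_from_logprobs(token_logprobs: Sequence[float]) -> float:
    """Geometric mean of the token probabilities (assumed aggregation)."""
    if not token_logprobs:
        return 0.0
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def low_confidence_tokens(
    tokens: Sequence[str],
    token_logprobs: Sequence[float],
    threshold: float = 0.5,  # assumed cutoff, not taken from the project
) -> List[Tuple[str, float]]:
    """Return (token, probability) pairs whose probability is below threshold."""
    pairs = [(tok, math.exp(lp)) for tok, lp in zip(tokens, token_logprobs)]
    return [(tok, p) for tok, p in pairs if p < threshold]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two embedding vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

In this sketch the relevancy score would be `cosine_similarity` applied to an embedding of the instruction and an embedding of the generated answer.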

	## Project Structure

	```text
	coding-llm/
	│── data/
	│── src/
	│── api/
	│── requirements.txt
	│── README.md
	```

	## API Output Format

	`POST /generate` returns:

	```json
	{
	  "code": "...",
	  "explanation": "...",
	  "confidence": 0.0,
	  "important_tokens": ["..."],
	  "relevancy_score": 0.0,
	  "hallucination": false,
	  "latency_ms": 0
	}
	```

	If hallucination is detected, the reason is appended inside `explanation`.
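As a rough sketch of how a syntax check could feed the `hallucination` flag and the `explanation` field, the snippet below validates Python code with the standard `ast` module. It covers Python only and leaves out the runtime smoke test. The function names and the `[Hallucination check]` prefix are hypothetical, not taken from the project.

```python
import ast
from typing import Optional


def syntax_error_reason(code: str) -> Optional[str]:
    """Return a short reason if `code` is not valid Python, else None."""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return f"syntax error at line {exc.lineno}: {exc.msg}"
    return None


def apply_syntax_check(response: dict) -> dict:
    """Set `hallucination` and append the reason to `explanation` if needed."""
    result = dict(response)  # copy so the caller's dict is left untouched
    reason = syntax_error_reason(result.get("code", ""))
    result["hallucination"] = reason is not None
    if reason is not None:
        explanation = result.get("explanation", "")
        result["explanation"] = f"{explanation}\n[Hallucination check] {reason}".strip()
    return result
```

Keeping the reason inside `explanation` means clients need no extra field, at the cost of having to parse free text if they want the reason programmatically.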

	## Local Run

	1. Install dependencies (ideally inside a virtual environment):

	```bash
	pip install -r requirements.txt
	```

	Optional: create `.env` from `.env.example` and set values (Windows `cmd` shown; on Linux/macOS use `cp` instead of `copy`):

	```bat
	copy .env.example .env
	```

	2. Run API:

	```bash
	uvicorn api.main:app --host 0.0.0.0 --port 8000
	```

	3. Test request (Windows `cmd` syntax; on Linux/macOS, replace each trailing `^` with `\`):

	```bat
	curl -X POST "http://127.0.0.1:8000/generate" ^
	  -H "Content-Type: application/json" ^
	  -d "{\"instruction\":\"Fix this Python function\",\"input\":\"def add(a,b) return a+b\"}"
	```

	4. Optional client:

	```bash
	python client_example.py
	```
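For readers who prefer Python to `curl`, the test request above can be sketched with the standard library alone. This is not the contents of `client_example.py`; the function names are hypothetical. If `API_KEY` is set on the server, an auth header is presumably needed as well, but its name is not given in this README, so it is left out.

```python
import json
import urllib.request


def build_generate_request(
    instruction: str,
    code_input: str,
    base_url: str = "http://127.0.0.1:8000",
) -> urllib.request.Request:
    """Build a POST /generate request with `instruction` and `input` fields."""
    payload = json.dumps({"instruction": instruction, "input": code_input})
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(instruction: str, code_input: str, timeout: float = 120.0) -> dict:
    """Send the request to a running API and decode the JSON response."""
    request = build_generate_request(instruction, code_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API running, `generate("Fix this Python function", "def add(a,b) return a+b")` would return the response as a dictionary. The generous default timeout is a guess that allows for the first request, which may trigger model loading.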

	## Hugging Face Deployment (Space)

	This repo includes:

	- `app.py` (Gradio app for HF Space)
	- `upload_to_hf.py` (upload helper script)
	- `README_hf_space.md` (Space metadata template)

	Steps:

	1. Create a HF access token with write permission.
	2. Run:

	```bash
	python upload_to_hf.py --repo-id <your-username/coding-llm-space> --token <HF_TOKEN>
	```

	3. Your Space launches with a public UI and can be called with a Hugging Face API key.

	## Security and Ops

	- API key authentication is enabled when `API_KEY` is set.
	- In-memory per-IP rate limiting via `RATE_LIMIT_PER_MINUTE`.
	- Dockerized API included (`Dockerfile`).
	- Model is loaded lazily on first `/generate` request (faster boot, fewer startup failures).
	- Set `FORCE_MOCK_MODE=true` to run instantly without downloading models.
	- Windows quick-start scripts:
	  - `run_api.bat`
	  - `run_space.bat`
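The in-memory per-IP rate limiting listed above could work along the following lines. This is a sliding-window sketch under assumed semantics (at most N requests per IP in any 60-second window). The class name and the injectable `clock` parameter are hypothetical, and the project's actual limiter may differ.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class PerIPRateLimiter:
    """Sliding-window limiter: at most `limit` requests per IP per 60 seconds."""

    def __init__(self, limit: int, clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.clock = clock  # injectable so tests can control time
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str) -> bool:
        """Return True and record the request if `ip` is under its limit."""
        now = self.clock()
        window = self._hits[ip]
        # Drop timestamps that have fallen out of the 60-second window.
        while window and now - window[0] >= 60.0:
            window.popleft()
        if len(window) >= self.limit:
            return False
        window.append(now)
        return True
```

The limit would typically be read from `RATE_LIMIT_PER_MINUTE`. Because the state lives in process memory, counters reset on restart and are not shared across multiple worker processes.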

	## Docker Compose

	```bat
	copy .env.example .env
	docker compose up --build
	```

	The `copy` command is for Windows `cmd`; on Linux/macOS use `cp .env.example .env`.

	The API is available at `http://127.0.0.1:8000`.

	## Automated Smoke Test

	Run this after the API starts:

	```bash
	python smoke_test.py
	```

	This validates:
	- `GET /health`
	- `POST /generate`
	- required JSON output keys
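The "required JSON output keys" check can be sketched as follows. This is not the contents of `smoke_test.py`: the function name is hypothetical, and the expected types are inferred from the example response in the API Output Format section.

```python
# Expected keys and types, inferred from the example response (an assumption).
REQUIRED_KEYS = {
    "code": str,
    "explanation": str,
    "confidence": (int, float),
    "important_tokens": list,
    "relevancy_score": (int, float),
    "hallucination": bool,
    "latency_ms": (int, float),
}


def find_schema_problems(payload: dict) -> list:
    """Return human-readable problems; an empty list means the payload passes."""
    problems = []
    for key, expected in REQUIRED_KEYS.items():
        if key not in payload:
            problems.append(f"missing key: {key}")
        elif not isinstance(payload[key], expected):
            problems.append(f"wrong type for {key}: {type(payload[key]).__name__}")
    return problems
```

Collecting every problem instead of failing on the first one makes a failed smoke test easier to diagnose in a single run.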

	## One-command Task Runner

	Cross-platform:

	```bash
	python tasks.py install
	python tasks.py run
	python tasks.py smoke
	python tasks.py serve-smoke
	python tasks.py docker-up
	python tasks.py docker-down
	python tasks.py hf-upload --repo-id <user/space-name> --token <HF_TOKEN>
	```

	Windows shortcut:

	```bat
	run_tasks.bat install
	run_tasks.bat run
	run_tasks.bat smoke
	run_tasks.bat serve-smoke
	```

	Makefile (Linux/macOS/WSL):

	```bash
	make install
	make run
	make smoke
	make docker-up
	```

	## FastAPI Endpoint

	### `POST /generate`

	Input JSON:

	```json
	{
	  "instruction": "Explain this code and improve it",
	  "input": "def f(x): return x*x"
	}
	```

	## Notes for Production

	- Keep `max_new_tokens` modest for low latency.
	- Enable request auth and rate limiting (`API_KEY`, `RATE_LIMIT_PER_MINUTE`) before exposing a public endpoint.
	- For stronger quality, add curated retrieval corpus in `data/`.
	- For robust hallucination checks, extend tests per language/framework.