# Advanced Coding LLM - Technical Specification
## 1) Objective
Build a production-ready coding assistant API deployable locally and on Hugging Face, supporting:
- Code generation
- Debugging/fixing buggy code
- Code explanation
- Instruction following
- Explainability signals
- Relevancy scoring
- Hallucination checks
- Optional RAG
## 2) Core Functional Requirements
### 2.1 Model
- Primary model: `Qwen/Qwen2.5-Coder-1.5B-Instruct`
- Fallback model: `Qwen/Qwen2.5-Coder-0.5B-Instruct`
- Emergency fallback mode supported (mock path available)
- Architecture compatible with future LoRA integration (`src/lora_prepare.py`)
### 2.2 API
- Framework: FastAPI
- Endpoint: `POST /generate`
- Health: `GET /health`
- Input schema:
  - `instruction: str`
  - `input: str`
- Output schema:
  - `code: str`
  - `explanation: str`
  - `confidence: float`
  - `important_tokens: list[str]`
  - `relevancy_score: float`
  - `hallucination: bool`
  - `latency_ms: int`
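
As an illustration of the request shape, the following sketch builds (but does not send) a `POST /generate` request using only the standard library. The base URL, the helper name `build_generate_request`, and the assumption that `x-api-key` travels as an HTTP header are illustrative choices, not something this specification fixes.

```python
import json
import urllib.request
from typing import Optional

# Assumed base URL for a local run; the spec does not fix a host or port.
BASE_URL = "http://127.0.0.1:8000"


def build_generate_request(
    instruction: str, input_text: str, api_key: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST /generate request matching the input schema."""
    payload = {"instruction": instruction, "input": input_text}
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: the key is passed as an HTTP header named x-api-key.
        headers["x-api-key"] = api_key
    return urllib.request.Request(
        f"{BASE_URL}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_generate_request("Fix the bug", "def add(a, b): return a - b")
```

Passing `request` to `urllib.request.urlopen` against a running server would be expected to return a JSON body carrying the output-schema fields listed above.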
### 2.3 Explainability
- Confidence: aggregated from the model's probabilities for the generated tokens
- Important tokens: the generated tokens to which the model assigned the lowest probabilities
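
One way these two signals could be computed is sketched below. The aggregate (a geometric mean), the `threshold` and `limit` parameters, and both function names are assumptions made for illustration; the specification does not say how `src/generator.py` combines the probabilities.

```python
import math


def confidence_from_probs(token_probs):
    """Geometric mean of per-token probabilities (one possible aggregate)."""
    if not token_probs:
        return 0.0
    # Clamp to avoid log(0) for tokens with vanishing probability.
    log_sum = sum(math.log(max(p, 1e-12)) for _, p in token_probs)
    return math.exp(log_sum / len(token_probs))


def important_tokens(token_probs, threshold=0.5, limit=5):
    """Tokens whose probability is below `threshold`, least probable first."""
    low = [(tok, p) for tok, p in token_probs if p < threshold]
    low.sort(key=lambda pair: pair[1])
    return [tok for tok, _ in low[:limit]]


# Hypothetical (token, probability) pairs for a short generation.
generated = [("def", 0.98), ("add", 0.40), ("(", 0.99), ("a", 0.90), ("+", 0.30)]
confidence = confidence_from_probs(generated)
uncertain = important_tokens(generated)
```

A geometric mean penalizes a single very unlikely token more than an arithmetic mean would, which may or may not be the desired behavior.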
### 2.4 Relevancy
- Query-to-output similarity score using TF-IDF vectors and cosine similarity (a lexical-overlap measure, not a learned semantic embedding)
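
The scoring idea can be sketched in pure Python as follows. This is a minimal stand-in, assuming a simple regex tokenizer and the smoothed IDF form `ln((1 + n) / (1 + df)) + 1`; whether `src/relevancy.py` uses a library vectorizer or different weighting is not stated here, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word/identifier tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dicts) fitted on the given documents."""
    counts_per_doc = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for counts in counts_per_doc for term in counts)
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in counts.items()} for counts in counts_per_doc]


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevancy_score(query, output):
    """Cosine similarity of the TF-IDF vectors of query and output."""
    query_vec, output_vec = tfidf_vectors([query, output])
    return cosine(query_vec, output_vec)
```

Because the vectors have no negative weights, the score falls in the range 0 to 1, with 0 meaning no shared terms.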
### 2.5 Hallucination Checks
- Python syntax validation (`ast.parse`)
- Runtime smoke execution for Python-like outputs
- Skip runtime execution for non-Python-like outputs
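
The checks above could be combined roughly as in this sketch. The `looks_like_python` heuristic, the 3-second default timeout, and the decision to treat a timeout as a failure are all assumptions; the sketch also skips the syntax check for non-Python-like output, which the specification leaves open.

```python
import ast
import subprocess
import sys


def looks_like_python(code):
    """Crude heuristic (an assumption): a line starts with a common Python construct."""
    markers = ("def ", "import ", "from ", "class ", "print(")
    return any(line.lstrip().startswith(markers) for line in code.splitlines())


def check_hallucination(code, timeout_s=3.0):
    """Return True if Python-like code fails the syntax or runtime smoke check."""
    if not looks_like_python(code):
        return False  # runtime execution is skipped for non-Python-like output
    try:
        ast.parse(code)
    except SyntaxError:
        return True
    try:
        # Run in a child interpreter so a crash or hang cannot take down the API.
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return True
    return result.returncode != 0
```

A child process with a timeout bounds how long the check runs, but it is not a sandbox: the generated code still executes with the API process's permissions.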
### 2.6 RAG
- Basic retrieval from local snippets dataset
- FAISS index over normalized TF-IDF vectors
- Inject top-k snippets into prompt context
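
The retrieve-then-inject flow might look like the sketch below. It is a pure-Python stand-in: unit-normalized term-count vectors replace TF-IDF, and a brute-force inner-product scan replaces the FAISS index (for L2-normalized vectors the inner product equals cosine similarity). The prompt layout and all function names are hypothetical.

```python
import heapq
import math
import re
from collections import Counter


def unit_vector(text):
    """L2-normalized term-count vector as a sparse dict."""
    counts = Counter(re.findall(r"[a-z0-9_]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def top_k_snippets(query, snippets, k=2):
    """Return up to k snippets with the highest non-zero inner product."""
    q = unit_vector(query)
    scored = []
    for snippet in snippets:
        s = unit_vector(snippet)
        score = sum(w * s.get(t, 0.0) for t, w in q.items())
        scored.append((score, snippet))
    best = heapq.nlargest(k, scored, key=lambda pair: pair[0])
    return [snippet for score, snippet in best if score > 0]


def build_prompt(instruction, user_input, snippets):
    """Prepend retrieved snippets to the prompt (assumed layout)."""
    parts = []
    if snippets:
        context = "\n\n".join(f"# Snippet {i}\n{s}" for i, s in enumerate(snippets, 1))
        parts.append("Reference snippets:\n" + context)
    parts.append("Instruction: " + instruction)
    parts.append("Input:\n" + user_input)
    return "\n\n".join(parts)


snippets = [
    "# reverse a string\ndef rev(s): return s[::-1]",
    "# add two numbers\ndef add(a, b): return a + b",
    "SELECT name FROM users",
]
retrieved = top_k_snippets("reverse a string", snippets, k=1)
prompt = build_prompt("reverse a string", "hello", retrieved)
```

Dropping zero-score snippets keeps unrelated context out of the prompt when the dataset has nothing relevant.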
## 3) Non-Functional Requirements
- Runnable on local workstation
- Initial deployment requires no training or fine-tuning
- Lazy-load model to reduce startup failures
- Graceful fallback response when model unavailable
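
Lazy loading with a fallback chain could be structured as in this sketch. The class name `LazyModel`, the injected `load_fn`, and the convention of returning `None` to signal the mock path are assumptions; the real loading logic lives in `src/model_loader.py` and may differ.

```python
import threading

PRIMARY_MODEL = "Qwen/Qwen2.5-Coder-1.5B-Instruct"
FALLBACK_MODEL = "Qwen/Qwen2.5-Coder-0.5B-Instruct"


class LazyModel:
    """Defers loading until first use and tries each candidate once, in order."""

    def __init__(self, load_fn, model_ids=(PRIMARY_MODEL, FALLBACK_MODEL)):
        self._load_fn = load_fn  # hypothetical callable: model id -> model object
        self._model_ids = model_ids
        self._lock = threading.Lock()
        self._attempted = False
        self._model = None
        self.loaded_id = None

    def get(self):
        """Return the loaded model, or None if every candidate failed."""
        with self._lock:
            if not self._attempted:
                self._attempted = True
                for model_id in self._model_ids:
                    try:
                        self._model = self._load_fn(model_id)
                        self.loaded_id = model_id
                        break
                    except Exception:
                        continue  # try the next candidate
            return self._model


# Demo with a fake loader: the primary model "fails", the fallback loads.
calls = []


def fake_load(model_id):
    calls.append(model_id)
    if model_id == PRIMARY_MODEL:
        raise RuntimeError("simulated download failure")
    return f"loaded:{model_id}"


lazy = LazyModel(fake_load)
model = lazy.get()
```

Because nothing is loaded at import time, the API process can start and answer `GET /health` even when the model download would fail; a caller that receives `None` can return the graceful fallback response.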
- Windows-compatible developer workflow
## 4) Security Requirements
- API key auth via `x-api-key` (if configured)
- Per-IP in-memory rate limiting
- No secrets committed to repository (`.env` ignored)
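
The first two requirements can be sketched with the standard library alone. The sliding-window policy, the default limits, and the names `api_key_ok` and `RateLimiter` are assumptions for illustration; `api/security.py` may use a different algorithm.

```python
import hmac
import time
from collections import defaultdict, deque


def api_key_ok(provided, configured):
    """Accept any request when no key is configured; otherwise compare in constant time."""
    if not configured:
        return True
    if provided is None:
        return False
    return hmac.compare_digest(provided.encode(), configured.encode())


class RateLimiter:
    """Per-IP sliding-window limiter held in process memory."""

    def __init__(self, max_requests=60, window_s=60.0):
        self.max_requests = max_requests
        self.window_s = window_s
        self._hits = defaultdict(deque)  # ip -> timestamps of accepted requests

    def allow(self, ip, now=None):
        """Record and allow the request unless the IP exceeded its window budget."""
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


limiter = RateLimiter(max_requests=2, window_s=10.0)
decisions = [limiter.allow("10.0.0.1", now=t) for t in (0.0, 1.0, 2.0)]
```

The state lives in a single Python process, so each worker process would keep its own counters.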
## 5) Performance Requirements
- Lazy model initialization
- Runtime checks bounded by timeout
- Optional mock mode (`FORCE_MOCK_MODE=true`) for fast operational checks
## 6) Deployment Requirements
### Local
- `python tasks.py install`
- `python tasks.py run`
### Docker
- `docker compose up --build -d`
### Hugging Face Space
- `python tasks.py hf-upload --repo-id <id> --token <token>`
- Gradio entrypoint in `app.py`
## 7) Project Structure
```text
coding-llm/
├── data/
├── src/
├── api/
├── requirements.txt
├── README.md
├── instruction.md
└── specification_file.md
```
## 8) Module Responsibilities
- `api/main.py`: API routes and response wiring
- `api/security.py`: API key + rate limiting
- `src/config.py`: environment-driven settings
- `src/model_loader.py`: model/fallback loading
- `src/generator.py`: generation + confidence extraction
- `src/pipeline.py`: orchestration layer
- `src/rag.py`: snippet retrieval
- `src/relevancy.py`: relevancy score computation
- `src/hallucination.py`: syntax/runtime checks
- `src/lora_prepare.py`: LoRA adapter hook
- `app.py`: Gradio UI for HF Spaces
- `upload_to_hf.py`: HF deployment uploader
- `tasks.py`: command runner
- `smoke_test.py`: runtime integration validation
## 9) Operational Modes
- **Real Model Mode**
  - `FORCE_MOCK_MODE=false`
  - Uses HF model loading and generation
- **Mock Mode**
  - `FORCE_MOCK_MODE=true`
  - Returns deterministic fallback output for reliability testing
## 10) Validation and QA
- Static compile check with `python -m compileall`
- Lint diagnostics via editor/tooling
- Smoke checks:
  - health endpoint reachable
  - generate endpoint returns full schema
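
The "full schema" check could be expressed as a small validator over the decoded JSON response, as in this sketch. The function name `schema_problems` and the choice to accept integers for float fields are assumptions; `smoke_test.py` may check the response differently.

```python
# Field names and types taken from the output schema in section 2.2.
EXPECTED_FIELDS = {
    "code": str,
    "explanation": str,
    "confidence": float,
    "important_tokens": list,
    "relevancy_score": float,
    "hallucination": bool,
    "latency_ms": int,
}


def schema_problems(response):
    """Return a list of human-readable problems; an empty list means the schema is complete."""
    problems = []
    for name, expected in EXPECTED_FIELDS.items():
        if name not in response:
            problems.append(f"missing field: {name}")
            continue
        value = response[name]
        if expected is float:
            # JSON may decode 1.0 as 1; bool is excluded because it subclasses int.
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
        elif expected is int:
            ok = isinstance(value, int) and not isinstance(value, bool)
        else:
            ok = isinstance(value, expected)
        if not ok:
            problems.append(f"wrong type for {name}: {type(value).__name__}")
    return problems


# Hypothetical response body, shaped like a mock-mode reply.
sample = {
    "code": "print('hello')",
    "explanation": "Prints a greeting.",
    "confidence": 0.5,
    "important_tokens": [],
    "relevancy_score": 0.0,
    "hallucination": False,
    "latency_ms": 3,
}
```

Running the validator against a mock-mode response keeps the smoke check fast, since no model download is involved.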
## 11) Known Constraints
- First generation may be slow due to model download/warmup
- Quality depends on available model and decoding configuration
- In-memory rate limiter is single-process only
## 12) Future Enhancements
- Redis-backed distributed rate limiting
- Better language-aware hallucination tests
- Prompt templates per task type
- Streaming token responses
- Persistent vector store (Chroma/FAISS on-disk)
- CI/CD workflow for automated deploy/test