---
title: Code Generation Assistant
sdk: gradio
app_file: app/gradio_app.py
pinned: false
---
# Code Generation Assistant
Generate Python code from natural-language descriptions, grounded in
**CodeSearchNet** via retrieval (RAG), with functional evaluation and a
deployable chat interface.
## Approaches compared
1. **Baseline** - frozen code LLM, zero/few-shot
2. **RAG** - retrieve similar CodeSearchNet examples, condition the LLM
3. **Fine-tuned** - CodeT5+ trained on `docstring -> code`
4. **Agentic** - generate -> run -> read error -> repair loop
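
The agentic approach (item 4) can be sketched as a bounded loop. This is a minimal illustration of the control flow only, not the code in `src/agent/repair_loop.py`: `generate` and `run` are hypothetical callables standing in for the LLM call and the code executor, and `max_rounds` is an assumed parameter name.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces (assumptions, not the project's real signatures):
#   generate(intent, previous_code, error) -> new candidate code
#   run(code) -> (ok, error_text)
Generate = Callable[[str, Optional[str], Optional[str]], str]
Run = Callable[[str], Tuple[bool, str]]


def repair_loop(intent: str, generate: Generate, run: Run,
                max_rounds: int = 3) -> Tuple[str, bool]:
    """generate -> run -> read error -> repair, for at most `max_rounds` attempts."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    code: Optional[str] = None
    error: Optional[str] = None
    for _ in range(max_rounds):
        # The first call sees no previous code or error; later calls see both.
        code = generate(intent, code, error)
        ok, error = run(code)
        if ok:
            return code, True
    # Budget exhausted: return the last attempt and flag it as failing.
    return code, False
```

Capping the number of rounds keeps latency and cost predictable: a model that cannot fix its own error after a few attempts rarely succeeds later, so the loop returns the last candidate with a failure flag instead of retrying indefinitely.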
## Evaluation
- **CodeBLEU** (similarity to reference) - in the notebook
- **pass@k** on HumanEval / MBPP (functional correctness) - `src/eval/functional_eval.py`
- **recall@k / MRR** (retrieval quality) - `src/eval/retrieval_eval.py`
> CodeSearchNet ships no unit tests, so pass@k is measured on HumanEval/MBPP
> (which do), while CodeSearchNet powers retrieval + the similarity metric.
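
For reference, the metrics above can be computed as follows. This is a sketch under assumptions, not the contents of `src/eval/`: `pass_at_k` uses the unbiased estimator commonly reported with HumanEval (`n` samples per problem, `c` of them passing), and the retrieval metrics assume exactly one relevant document per query. All function and parameter names here are illustrative.

```python
from math import comb
from typing import Hashable, Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: 1 - C(n-c, k) / C(n, k)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every size-k draw contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def recall_at_k(ranked: Sequence[Sequence[Hashable]],
                gold: Sequence[Hashable], k: int) -> float:
    """Fraction of queries whose gold item is in the top-k results.

    Assumes at least one query and one gold item per query.
    """
    hits = sum(1 for results, g in zip(ranked, gold) if g in results[:k])
    return hits / len(gold)


def mrr(ranked: Sequence[Sequence[Hashable]], gold: Sequence[Hashable]) -> float:
    """Mean reciprocal rank of the gold item; a miss contributes 0."""
    total = 0.0
    for results, g in zip(ranked, gold):
        results = list(results)
        if g in results:
            total += 1.0 / (results.index(g) + 1)
    return total / len(gold)
```

With `k = 1`, `pass_at_k` reduces to `c / n` per problem, and the benchmark score is the mean of that value over all problems.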
## Pipeline (run in order)
```bash
pip install -r requirements.txt
python scripts/01_prepare_data.py # load -> clean -> split
python scripts/02_run_eda.py # stats + plots
python scripts/03_build_index.py # embed corpus -> FAISS index (persisted)
python scripts/04_run_eval.py # retrieval + functional pass@1 (baseline vs RAG)
python scripts/05_finetune.py # (optional) fine-tune CodeT5+
```
## Project layout
```
config.yaml                  # single source of truth (models, paths, thresholds)
src/
  data/      load.py  clean.py  make_sample.py                  # Phase 1
  eda/       analyze.py                                         # Phase 1
  rag/       embedder.py  generator.py                          # Phase 3 + 5 (CodeAssistant)
  eval/      functional_eval.py  retrieval_eval.py  sandbox.py  # Phase 2
  agent/     repair_loop.py                                     # Phase 6
  finetune/  train_codet5.py                                    # Phase 4
app/
  api.py                     # FastAPI REST service
  gradio_app.py              # Gradio chat UI (Hugging Face Spaces)
  streamlit_app.py           # Streamlit chat UI
scripts/                     # numbered phase entrypoints
notebooks/                   # experimentation notebook
Dockerfile                   # container for the API
```
## Is the notebook the right vehicle?
The notebook is for **experimentation, EDA, and reporting eval numbers** - keep
it for your capstone appendix. For anything you *deploy*, the logic belongs in
the `src/` package (importable, testable, version-controlled). The apps in
`app/` all import the same `CodeAssistant`, so there is one implementation, not
three copies. Workflow: prototype in the notebook -> harden into `src/` ->
push to GitHub -> deploy an app.
## Deployment interfaces
**1. Gradio on Hugging Face Spaces (easiest).** Copy `app/gradio_app.py` to a new
Gradio Space as `app.py`, add `requirements.txt`, pick a GPU tier. Public chat UI
in minutes, no servers to manage.
**2. FastAPI (production / integration).**
```bash
uvicorn app.api:app --host 0.0.0.0 --port 8000 # docs at /docs
curl -X POST localhost:8000/generate -H 'Content-Type: application/json' \
-d '{"intent": "function to check if a number is prime", "mode": "rag"}'
```
Use this when another system (an IDE plugin, a CI bot, a web frontend) needs to
call the assistant programmatically. Endpoints: `GET /health`, `POST /generate`
(supports `mode`, `repair`, `return_sources`).
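
The same call can be made from Python with only the standard library. This is a sketch, assuming the server from the command above is running on `localhost:8000`. The helper names are hypothetical, the response schema is not documented here so the parsed JSON is returned unchanged, and treating `repair` and `return_sources` as booleans is an assumption.

```python
import json
import urllib.request


def build_generate_request(intent, mode="rag",
                           base_url="http://localhost:8000", **options):
    """Build a POST /generate request; extra options are sent as JSON fields."""
    payload = {"intent": intent, "mode": mode, **options}
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(intent, timeout_s=60.0, **kwargs):
    """Send the request and return the decoded JSON body as-is."""
    request = build_generate_request(intent, **kwargs)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `generate("function to check if a number is prime", mode="rag", return_sources=True)` sends the same body as the `curl` command plus one extra field.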
**3. Streamlit.** Run `streamlit run app/streamlit_app.py` locally, or deploy
for free to Streamlit Community Cloud from a GitHub repo.
**4. Docker (any cloud).** `docker build -t cga . && docker run -p 8000:8000 cga`.
## Security note
Generated code is executed (for pass@k and the repair loop) via a subprocess +
timeout in `src/eval/sandbox.py`. The timeout guards against hangs, but this is
**not** a security sandbox: the code still runs with the same filesystem and
network access as the parent process. For public deployment, run execution
inside a container with `--network none` and dropped privileges, or disable the
repair feature.
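
To make the limitation concrete, here is a minimal sketch of the subprocess + timeout pattern. It is an illustration, not the contents of `src/eval/sandbox.py`; the function name `run_with_timeout` and its return shape are assumptions.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_timeout(code: str, timeout_s: float = 5.0):
    """Run `code` in a separate Python process; return (passed, output).

    This only bounds wall-clock time. It does NOT restrict file or
    network access, so it is not a security boundary.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "candidate.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                # -I: isolated mode (ignores PYTHON* env vars and user site dir)
                [sys.executable, "-I", str(script)],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,  # child is killed when this expires
            )
        except subprocess.TimeoutExpired:
            return False, f"timeout after {timeout_s}s"
    return proc.returncode == 0, proc.stdout + proc.stderr
```

The child process inherits the parent's user, filesystem, and network, which is why the container-level isolation described above is still needed for untrusted input.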