---
title: Code Generation Assistant
sdk: gradio
app_file: app/gradio_app.py
pinned: false
---

# Code Generation Assistant

Generate Python code from natural-language descriptions, grounded in
**CodeSearchNet** via retrieval (RAG), with functional evaluation and a
deployable chat interface.

## Approaches compared
1. **Baseline** - frozen code LLM, zero/few-shot
2. **RAG** - retrieve similar CodeSearchNet examples, condition the LLM
3. **Fine-tuned** - CodeT5+ trained on `docstring -> code`
4. **Agentic** - generate -> run -> read error -> repair loop
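
The generate -> run -> read error -> repair loop in approach 4 can be sketched roughly as below. This is a minimal illustration, not the contents of `src/agent/repair_loop.py`: `generate` is a hypothetical callable standing in for the LLM call, and `run_snippet`, `repair_loop`, and `max_rounds` are assumed names.

```python
import subprocess
import sys
from typing import Callable, Optional, Tuple


def run_snippet(code: str, timeout_s: float = 5.0) -> Tuple[bool, str]:
    """Run `code` in a fresh interpreter and return (succeeded, stderr text)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_s}s"
    return proc.returncode == 0, proc.stderr


def repair_loop(
    intent: str,
    generate: Callable[[str, Optional[str]], str],  # hypothetical LLM wrapper
    max_rounds: int = 3,
) -> Tuple[str, bool, int]:
    """Regenerate until the code runs cleanly or the round budget is spent.

    Returns (last code, whether it ran cleanly, rounds used).
    """
    error: Optional[str] = None
    code = ""
    for round_no in range(1, max_rounds + 1):
        # The previous error (None on the first round) is fed back to the model.
        code = generate(intent, error)
        ok, stderr = run_snippet(code)
        if ok:
            return code, True, round_no
        error = stderr
    return code, False, max_rounds
```

A clean exit only shows that the code ran without raising; it says nothing about correctness, which is why functional evaluation is measured separately.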

## Evaluation
- **CodeBLEU** (similarity to reference) - in the notebook
- **pass@k** on HumanEval / MBPP (functional correctness) - `src/eval/functional_eval.py`
- **recall@k / MRR** (retrieval quality) - `src/eval/retrieval_eval.py`

> CodeSearchNet ships no unit tests, so pass@k is measured on HumanEval and MBPP,
> which do. CodeSearchNet supplies the retrieval corpus and the references for
> the CodeBLEU similarity metric.
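
For reference, the functional and retrieval metrics above can be computed along these lines. This is a sketch using the standard unbiased pass@k estimator and textbook definitions of recall@k and MRR; the function names are illustrative, and the project's `functional_eval.py` and `retrieval_eval.py` may define things differently (for example, recall@k as a per-query hit rate).

```python
from math import comb
from typing import Hashable, Iterable, Sequence, Set, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: n samples generated per task, c of them passed."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # With fewer than k failures, every size-k subset contains a passing sample.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def recall_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none was retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Average reciprocal rank over (ranked results, relevant set) pairs."""
    scores = [reciprocal_rank(ranked, relevant) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0
```

With one sample per task (`n = k = 1`), `pass_at_k` reduces to the plain pass rate, which is the pass@1 figure the evaluation script reports.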

## Pipeline (run in order)
```bash
pip install -r requirements.txt
python scripts/01_prepare_data.py   # load -> clean -> split
python scripts/02_run_eda.py        # stats + plots
python scripts/03_build_index.py    # embed corpus -> FAISS index (persisted)
python scripts/04_run_eval.py       # retrieval + functional pass@1 (baseline vs RAG)
python scripts/05_finetune.py       # (optional) fine-tune CodeT5+
```

## Project layout
```
config.yaml              # single source of truth (models, paths, thresholds)
src/
  data/        load.py clean.py make_sample.py                   # Phase 1
  eda/         analyze.py                                        # Phase 1
  rag/         embedder.py generator.py                          # Phase 3 + 5 (CodeAssistant)
  eval/        functional_eval.py retrieval_eval.py sandbox.py   # Phase 2
  agent/       repair_loop.py                                    # Phase 6
  finetune/    train_codet5.py                                   # Phase 4
app/
  api.py            # FastAPI REST service
  gradio_app.py     # Gradio chat UI (Hugging Face Spaces)
  streamlit_app.py  # Streamlit chat UI
scripts/            # numbered phase entrypoints
notebooks/          # experimentation notebook
Dockerfile          # container for the API
```

## Is the notebook the right vehicle?
The notebook is the right place for **experimentation, EDA, and reporting eval
numbers**, so keep it for your capstone appendix. Anything you *deploy* belongs
in the `src/` package, where it is importable, testable, and version-controlled.
The apps in `app/` all import the same `CodeAssistant`, so there is one
implementation, not three copies. Workflow: prototype in the notebook -> harden
into `src/` -> push to GitHub -> deploy an app.

## Deployment interfaces

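**1. Gradio on Hugging Face Spaces (easiest).** Copy `app/gradio_app.py` to a new
Gradio Space as `app.py`, along with the code it imports from `src/` and
`requirements.txt`, then pick a GPU tier. Alternatively, push this repo as the
Space: the front matter of this README already sets `app_file: app/gradio_app.py`.
Either way you get a public chat UI in minutes, with no servers to manage.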

**2. FastAPI (production / integration).**
```bash
uvicorn app.api:app --host 0.0.0.0 --port 8000   # docs at /docs

# in a second terminal, once the server is up:
curl -X POST localhost:8000/generate -H 'Content-Type: application/json' \
     -d '{"intent": "function to check if a number is prime", "mode": "rag"}'
```
Use this when another system (an IDE plugin, a CI bot, a web frontend) needs to
call the assistant programmatically. Endpoints: `GET /health`, `POST /generate`
(supports `mode`, `repair`, `return_sources`).
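
A programmatic caller might look something like the sketch below, which uses only the standard library. The field names come from the endpoint description above; treating `repair` and `return_sources` as booleans, the default values, and the helper names are assumptions, and the response is returned as parsed JSON because its schema is not documented here.

```python
import json
import urllib.request
from typing import Any


def build_generate_request(
    intent: str,
    mode: str = "rag",
    repair: bool = False,           # assumed to be a boolean flag
    return_sources: bool = False,   # assumed to be a boolean flag
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build the POST /generate request without sending it."""
    payload = {
        "intent": intent,
        "mode": mode,
        "repair": repair,
        "return_sources": return_sources,
    }
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(intent: str, timeout_s: float = 60.0, **options: Any) -> Any:
    """Send the request and return the decoded JSON body."""
    request = build_generate_request(intent, **options)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `generate("function to check if a number is prime")` sends the same request as the `curl` call, plus the two optional flags set to `False`.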

**3. Streamlit.** Run locally with `streamlit run app/streamlit_app.py`; the same
app deploys for free to Streamlit Community Cloud from a GitHub repo.

**4. Docker (any cloud).** `docker build -t cga . && docker run -p 8000:8000 cga`.

## Security note
Generated code is executed (for pass@k and the repair loop) via a subprocess +
timeout in `src/eval/sandbox.py`. That guards against hangs but is **not** a
security sandbox. For public deployment, run execution inside a container with
`--network none` and dropped privileges, or disable the repair feature.
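
One way to apply the container advice is to have the executor build a locked-down `docker run` command instead of calling the interpreter directly. The sketch below only assembles the argument list. The flags are standard `docker run` options, but the image name, resource limits, mount path, and function name are assumptions, not what `src/eval/sandbox.py` does.

```python
import os
from typing import List


def docker_sandbox_argv(
    script_path: str,
    image: str = "python:3.11-slim",  # assumed image; any Python image works
    timeout_s: int = 10,
) -> List[str]:
    """Return a `docker run` command that executes one script in isolation."""
    host_path = os.path.abspath(script_path)  # bind mounts need absolute paths
    return [
        "docker", "run", "--rm",
        "--network", "none",                    # no network access
        "--cap-drop", "ALL",                    # drop all Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation
        "--user", "65534:65534",                # `nobody` on Debian-based images
        "--read-only",                          # read-only root filesystem
        "--memory", "256m",                     # assumed memory cap
        "--pids-limit", "64",                   # assumed process cap
        "-v", f"{host_path}:/sandbox/snippet.py:ro",
        image,
        "timeout", str(timeout_s), "python", "/sandbox/snippet.py",
    ]
```

The resulting list can be passed to `subprocess.run(argv, capture_output=True, text=True)`; keeping an outer `timeout=` on that call as well covers the case where the container itself fails to start or stop promptly.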