---
title: Code Generation Assistant
sdk: gradio
app_file: app/gradio_app.py
pinned: false
---

# Code Generation Assistant

Generate Python code from natural-language descriptions, grounded in **CodeSearchNet** via retrieval (RAG), with functional evaluation and a deployable chat interface.

## Approaches compared

1. **Baseline** - frozen code LLM, zero/few-shot
2. **RAG** - retrieve similar CodeSearchNet examples, condition the LLM
3. **Fine-tuned** - CodeT5+ trained on `docstring -> code`
4. **Agentic** - generate -> run -> read error -> repair loop

## Evaluation

- **CodeBLEU** (similarity to reference) - in the notebook
- **pass@k** on HumanEval / MBPP (functional correctness) - `src/eval/functional_eval.py`
- **recall@k / MRR** (retrieval quality) - `src/eval/retrieval_eval.py`

> CodeSearchNet ships no unit tests, so pass@k is measured on HumanEval/MBPP
> (which do), while CodeSearchNet powers retrieval + the similarity metric.

## Pipeline (run in order)

```bash
pip install -r requirements.txt
python scripts/01_prepare_data.py   # load -> clean -> split
python scripts/02_run_eda.py        # stats + plots
python scripts/03_build_index.py    # embed corpus -> FAISS index (persisted)
python scripts/04_run_eval.py       # retrieval + functional pass@1 (baseline vs RAG)
python scripts/05_finetune.py       # (optional) fine-tune CodeT5+
```

## Project layout

```
config.yaml            # single source of truth (models, paths, thresholds)
src/
  data/      load.py  clean.py  make_sample.py                   # Phase 1
  eda/       analyze.py                                          # Phase 1
  rag/       embedder.py  generator.py                           # Phase 3 + 5 (CodeAssistant)
  eval/      functional_eval.py  retrieval_eval.py  sandbox.py   # Phase 2
  agent/     repair_loop.py                                      # Phase 6
  finetune/  train_codet5.py                                     # Phase 4
app/
  api.py               # FastAPI REST service
  gradio_app.py        # Gradio chat UI (Hugging Face Spaces)
  streamlit_app.py     # Streamlit chat UI
scripts/               # numbered phase entrypoints
notebooks/             # experimentation notebook
Dockerfile             # container for the API
```

## Is the notebook the right vehicle?
The notebook is for **experimentation, EDA, and reporting eval numbers** - keep it for your capstone appendix. For anything you *deploy*, the logic belongs in the `src/` package (importable, testable, version-controlled). The apps in `app/` all import the same `CodeAssistant`, so there is one implementation, not three copies.

Workflow: prototype in the notebook -> harden into `src/` -> push to GitHub -> deploy an app.

## Deployment interfaces

**1. Gradio on Hugging Face Spaces (easiest).** Copy `app/gradio_app.py` to a new Gradio Space as `app.py`, add `requirements.txt`, pick a GPU tier. Public chat UI in minutes, no servers to manage.

**2. FastAPI (production / integration).**

```bash
uvicorn app.api:app --host 0.0.0.0 --port 8000     # docs at /docs
curl -X POST localhost:8000/generate -H 'Content-Type: application/json' \
     -d '{"intent": "function to check if a number is prime", "mode": "rag"}'
```

Use this when another system (an IDE plugin, a CI bot, a web frontend) needs to call the assistant programmatically. Endpoints: `GET /health`, `POST /generate` (supports `mode`, `repair`, `return_sources`).

**3. Streamlit.** `streamlit run app/streamlit_app.py` - deploys free to Streamlit Community Cloud from a GitHub repo.

**4. Docker (any cloud).** `docker build -t cga . && docker run -p 8000:8000 cga`.

## Security note

Generated code is executed (for pass@k and the repair loop) via a subprocess + timeout in `src/eval/sandbox.py`. That guards against hangs but is **not** a security sandbox. For public deployment, run execution inside a container with `--network none` and dropped privileges, or disable the repair feature.
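
For orientation, here is a minimal sketch of the subprocess + timeout pattern described in the security note. It is not the contents of `src/eval/sandbox.py`: the function name `run_with_timeout`, its parameters, and the shape of the returned dict are hypothetical and chosen only for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_with_timeout(code: str, timeout_s: float = 5.0) -> dict:
    """Run `code` in a fresh Python subprocess and report the outcome.

    Hypothetical helper for illustration only. The timeout guards against
    hangs; it gives no isolation from the filesystem or the network.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                # -I: isolated mode (ignores PYTHON* env vars and user site-packages)
                [sys.executable, "-I", str(script)],
                capture_output=True,
                text=True,
                timeout=timeout_s,  # child is killed when this expires
                cwd=tmp,            # keep stray output files out of the repo
            )
        except subprocess.TimeoutExpired:
            return {"passed": False, "timed_out": True, "stdout": "", "stderr": "timeout"}
    return {
        "passed": proc.returncode == 0,  # non-zero exit = failed assert or exception
        "timed_out": False,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
    }
```

Under these assumptions, a candidate solution concatenated with its HumanEval/MBPP test asserts could be passed in as `code`, with `passed` feeding pass@k and `stderr` feeding the repair loop. An infinite loop such as `while True: pass` comes back with `timed_out` set rather than blocking the caller. The child process still runs with the caller's privileges, which is why the container advice above applies to public deployments.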