---
title: Code Generation Assistant
sdk: gradio
app_file: app/gradio_app.py
pinned: false
---

# Code Generation Assistant

Generate Python code from natural-language descriptions, grounded in
**CodeSearchNet** via retrieval (RAG), with functional evaluation and a
deployable chat interface.

## Approaches compared

1. **Baseline** - frozen code LLM, zero/few-shot
2. **RAG** - retrieve similar CodeSearchNet examples and condition the LLM on them
3. **Fine-tuned** - CodeT5+ trained on `docstring -> code`
4. **Agentic** - generate -> run -> read error -> repair loop
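
The agentic approach (generate -> run -> read error -> repair) can be sketched as a small loop. This is a minimal illustration, not the code in `src/agent/repair_loop.py`: the `generate` and `execute` callables, their signatures, and `max_rounds` are assumptions made for the example.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures, chosen for this sketch only:
#   generate(intent, previous_code, error) -> new candidate code
#   execute(code) -> (ok, error_text)
GenerateFn = Callable[[str, Optional[str], Optional[str]], str]
ExecuteFn = Callable[[str], Tuple[bool, str]]


def repair_loop(
    intent: str,
    generate: GenerateFn,
    execute: ExecuteFn,
    max_rounds: int = 3,
) -> Tuple[str, bool, int]:
    """Generate code, run it, and feed any error back into the next attempt.

    Returns (last_code, succeeded, rounds_used).
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")

    code: Optional[str] = None
    error: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        # The first round sees no previous code or error; later rounds see both.
        code = generate(intent, code, error)
        ok, error = execute(code)
        if ok:
            return code, True, round_no
    return code, False, max_rounds


if __name__ == "__main__":
    # Stub model: the first attempt is buggy, the repair attempt is fixed.
    def stub_generate(intent, previous_code, error):
        return "x = 1 / 0" if previous_code is None else "x = 1"

    # Stub executor: in-process exec, for demonstration only (not isolated).
    def stub_execute(code):
        try:
            exec(code, {})
            return True, ""
        except Exception as exc:
            return False, repr(exc)

    print(repair_loop("toy intent", stub_generate, stub_execute))
```

Passing the executor in as a parameter keeps the loop independent of how code is run, so the same loop can sit on top of a subprocess runner or a container.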

## Evaluation

- **CodeBLEU** (similarity to reference) - in the notebook
- **pass@k** on HumanEval / MBPP (functional correctness) - `src/eval/functional_eval.py`
- **recall@k / MRR** (retrieval quality) - `src/eval/retrieval_eval.py`

> CodeSearchNet ships no unit tests, so pass@k is measured on HumanEval/MBPP
> (which do), while CodeSearchNet powers retrieval + the similarity metric.
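
For reference, the functional and retrieval metrics listed above can be written in a few lines. This is a sketch under assumptions, not the contents of `src/eval/`: `pass_at_k` uses the commonly cited unbiased estimator 1 - C(n-c, k) / C(n, k) for `n` samples of which `c` pass, and the retrieval helpers assume each query comes with a set of relevant item ids. All function names here are illustrative.

```python
from math import comb
from typing import Hashable, Iterable, Sequence, Set, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n generated samples, c of which pass the tests."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset contains no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def recall_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k results."""
    if not relevant:
        raise ValueError("relevant must be non-empty")
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Average reciprocal rank over (ranked_results, relevant_set) pairs."""
    runs = list(runs)
    if not runs:
        raise ValueError("runs must be non-empty")
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)
```

With `k = 1` the estimator reduces to `c / n`, the plain fraction of passing samples.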

## Pipeline (run in order)

```bash
pip install -r requirements.txt
python scripts/01_prepare_data.py   # load -> clean -> split
python scripts/02_run_eda.py        # stats + plots
python scripts/03_build_index.py    # embed corpus -> FAISS index (persisted)
python scripts/04_run_eval.py       # retrieval + functional pass@1 (baseline vs RAG)
python scripts/05_finetune.py       # (optional) fine-tune CodeT5+
```
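
Conceptually, steps 03 and 04 embed each docstring, find the nearest neighbours of a query, and place the retrieved pairs in the prompt. The sketch below shows that flow with the standard library only. It is a stand-in, not the project's code: the bag-of-words `embed` replaces a real embedding model, the brute-force cosine search replaces the FAISS index, and the prompt layout in `build_prompt` is an assumption.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]
Example = Tuple[str, str]  # (docstring, code)


def embed(text: str) -> Vector:
    """Toy embedding: L2-normalised bag of lowercase word counts."""
    counts = Counter(re.findall(r"[a-z0-9_]+", text.lower()))
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    return {tok: v / norm for tok, v in counts.items()}


def cosine(a: Vector, b: Vector) -> float:
    """Dot product of two unit-length sparse vectors."""
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def build_index(corpus: List[Example]) -> List[Tuple[Vector, str, str]]:
    """Embed every docstring once, keeping the code alongside it."""
    return [(embed(doc), doc, code) for doc, code in corpus]


def retrieve(index: List[Tuple[Vector, str, str]], query: str, k: int = 2) -> List[Example]:
    """Exact top-k search by cosine similarity (brute force)."""
    q = embed(query)
    ranked = sorted(index, key=lambda row: cosine(q, row[0]), reverse=True)
    return [(doc, code) for _, doc, code in ranked[:k]]


def build_prompt(query: str, examples: List[Example]) -> str:
    """Few-shot prompt: retrieved (task, code) pairs, then the new task."""
    parts = [f"# Task: {doc}\n{code}\n" for doc, code in examples]
    parts.append(f"# Task: {query}\n")
    return "\n".join(parts)


if __name__ == "__main__":
    corpus = [
        ("check whether a number is prime",
         "def is_prime(n):\n    return n > 1 and all(n % i for i in range(2, int(n ** 0.5) + 1))"),
        ("reverse a string", "def reverse(s):\n    return s[::-1]"),
        ("read a json file",
         "import json\n\ndef read_json(p):\n    with open(p) as f:\n        return json.load(f)"),
    ]
    index = build_index(corpus)
    query = "function to check if a number is prime"
    print(build_prompt(query, retrieve(index, query, k=1)))
```

A real index would store dense vectors and persist them to disk; the ranking logic and prompt assembly keep the same shape.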

## Project layout

```
config.yaml                 # single source of truth (models, paths, thresholds)
src/
  data/      load.py  clean.py  make_sample.py                   # Phase 1
  eda/       analyze.py                                          # Phase 1
  rag/       embedder.py  generator.py                           # Phase 3 + 5 (CodeAssistant)
  eval/      functional_eval.py  retrieval_eval.py  sandbox.py   # Phase 2
  agent/     repair_loop.py                                      # Phase 6
  finetune/  train_codet5.py                                     # Phase 4
app/
  api.py                    # FastAPI REST service
  gradio_app.py             # Gradio chat UI (Hugging Face Spaces)
  streamlit_app.py          # Streamlit chat UI
scripts/                    # numbered phase entrypoints
notebooks/                  # experimentation notebook
Dockerfile                  # container for the API
```

## Is the notebook the right vehicle?

The notebook is for **experimentation, EDA, and reporting eval numbers** - keep
it for your capstone appendix. For anything you *deploy*, the logic belongs in
the `src/` package (importable, testable, version-controlled). The apps in
`app/` all import the same `CodeAssistant`, so there is one implementation, not
three copies. Workflow: prototype in the notebook -> harden into `src/` ->
push to GitHub -> deploy an app.

## Deployment interfaces

**1. Gradio on Hugging Face Spaces (easiest).** Copy `app/gradio_app.py` to a new
Gradio Space as `app.py`, add `requirements.txt`, pick a GPU tier. Public chat UI
in minutes, no servers to manage.

**2. FastAPI (production / integration).**

```bash
uvicorn app.api:app --host 0.0.0.0 --port 8000   # docs at /docs
curl -X POST localhost:8000/generate -H 'Content-Type: application/json' \
  -d '{"intent": "function to check if a number is prime", "mode": "rag"}'
```

Use this when another system (an IDE plugin, a CI bot, a web frontend) needs to
call the assistant programmatically. Endpoints: `GET /health`, `POST /generate`
(supports `mode`, `repair`, `return_sources`).
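
A caller written in Python might look like the following sketch, which mirrors the `curl` command above using only the standard library. The field names come from this README; treating `repair` and `return_sources` as booleans is an assumption, and the response is returned as parsed JSON without assuming any particular keys. The helper names are illustrative.

```python
import json
import urllib.request
from typing import Any, Dict


def build_generate_request(
    intent: str,
    mode: str = "rag",
    repair: bool = False,          # assumed boolean flag
    return_sources: bool = False,  # assumed boolean flag
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build the POST /generate request without sending it."""
    payload: Dict[str, Any] = {
        "intent": intent,
        "mode": mode,
        "repair": repair,
        "return_sources": return_sources,
    }
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(intent: str, timeout_s: float = 60.0, **options: Any) -> Any:
    """Send the request and return the decoded JSON body as-is."""
    request = build_generate_request(intent, **options)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)


if __name__ == "__main__":
    # Requires the API to be running locally (see the uvicorn command above).
    print(generate("function to check if a number is prime", mode="rag"))
```

Separating request construction from sending makes the payload easy to inspect or unit-test without a running server.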

**3. Streamlit.** `streamlit run app/streamlit_app.py` - deploys free to
Streamlit Community Cloud from a GitHub repo.

**4. Docker (any cloud).** `docker build -t cga . && docker run -p 8000:8000 cga`.

## Security note

Generated code is executed (for pass@k and the repair loop) via a subprocess +
timeout in `src/eval/sandbox.py`. That guards against hangs but is **not** a
security sandbox. For public deployment, run execution inside a container with
`--network none` and dropped privileges, or disable the repair feature.
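
To make the distinction concrete, here is a sketch of both options. `run_with_timeout` shows the subprocess + timeout pattern (it stops hangs, but the child still has the caller's file and network access). `docker_argv` builds a `docker run` command line with networking disabled and privileges dropped. Neither function is taken from `src/eval/sandbox.py`; the image name, resource limits, user id, and mount path are assumptions chosen for illustration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List, Tuple


def run_with_timeout(code: str, timeout_s: float = 5.0) -> Tuple[bool, str]:
    """Run code in a fresh interpreter; return (ok, stderr or timeout note).

    Guards against infinite loops only. It is not a security boundary.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout_s,
                cwd=tmp,
            )
        except subprocess.TimeoutExpired:
            return False, f"timed out after {timeout_s} s"
    return proc.returncode == 0, proc.stderr


def docker_argv(
    host_dir: str,
    image: str = "python:3.11-slim",   # assumed image
    script_name: str = "candidate.py",
) -> List[str]:
    """Command line for running the script in a locked-down container."""
    return [
        "docker", "run", "--rm",
        "--network", "none",                    # no network access
        "--cap-drop", "ALL",                    # drop Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation
        "--user", "65534:65534",                # unprivileged uid:gid (assumed)
        "--read-only",                          # read-only root filesystem
        "--memory", "256m",                     # assumed memory cap
        "--pids-limit", "64",                   # assumed process cap
        "-v", f"{host_dir}:/work:ro",           # mount the script read-only
        "-w", "/work",
        image, "python", script_name,
    ]


if __name__ == "__main__":
    print(run_with_timeout("print('hello')"))
    print(run_with_timeout("while True:\n    pass", timeout_s=1.0))
    print(" ".join(docker_argv("/tmp/job")))
```

The container command can be passed to `subprocess.run` with the same `timeout` argument, so the hang protection and the isolation layer combine instead of replacing one another.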