title: Code Generation Assistant
sdk: gradio
app_file: app/gradio_app.py
pinned: false
Code Generation Assistant
Generate Python code from natural-language descriptions, grounded in CodeSearchNet via retrieval (RAG), with functional evaluation and a deployable chat interface.
Approaches compared
- Baseline - frozen code LLM, zero/few-shot
- RAG - retrieve similar CodeSearchNet examples, condition the LLM
- Fine-tuned - CodeT5+ trained on docstring -> code
- Agentic - generate -> run -> read error -> repair loop
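The agentic approach (generate -> run -> read error -> repair) can be sketched as follows. This is a minimal illustration, not the code in src/agent/repair_loop.py: run_python, repair_loop, and toy_generate are hypothetical names, and the generator is a stub standing in for the LLM call.

```python
import subprocess
import sys
from typing import Callable, Optional, Tuple


def run_python(code: str, timeout_s: float = 5.0) -> Tuple[bool, str]:
    """Run code in a fresh interpreter; return (succeeded, error text)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_s} s"
    return proc.returncode == 0, proc.stderr


def repair_loop(
    intent: str,
    generate: Callable[[str, Optional[str]], str],
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    """Generate code, run it, and feed any error back into the next attempt.

    `generate(intent, feedback)` is an assumed interface: feedback is None on
    the first round and the previous error text afterwards.
    """
    feedback: Optional[str] = None
    code = ""
    for _ in range(max_rounds):
        code = generate(intent, feedback)
        ok, error = run_python(code)
        if ok:
            return code, True
        feedback = error  # the "read error" step
    return code, False  # last attempt, still failing


def toy_generate(intent: str, feedback: Optional[str]) -> str:
    """Stub generator: buggy on the first try, fixed once it sees an error."""
    if feedback is None:
        return "print(undefined_name)"
    return "print('ok')"
```

Calling repair_loop("print ok", toy_generate) fails on the first round with a NameError and succeeds on the second. Capping the rounds keeps a model that never converges from looping forever.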
Evaluation
- CodeBLEU (similarity to reference) - in the notebook
- pass@k on HumanEval / MBPP (functional correctness) - src/eval/functional_eval.py
- recall@k / MRR (retrieval quality) - src/eval/retrieval_eval.py
CodeSearchNet ships no unit tests, so pass@k is measured on HumanEval/MBPP (which do), while CodeSearchNet powers retrieval and the CodeBLEU similarity metric.
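For reference, pass@k is usually reported with the unbiased estimator from the HumanEval paper: with n samples per problem, of which c pass the tests, pass@k = 1 - C(n-c, k) / C(n, k). The sketch below implements that estimator along with recall@k and MRR. The function names are illustrative and are not taken from src/eval/functional_eval.py or src/eval/retrieval_eval.py, which may compute these metrics differently.

```python
from math import comb
from typing import Hashable, Sequence, Set


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: 1 - C(n - c, k) / C(n, k).

    n = samples generated, c = samples that passed, k = budget being scored.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples failed, giving 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def recall_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of the relevant items that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]],
    relevants: Sequence[Set[Hashable]],
) -> float:
    """Mean over queries of 1 / rank of the first relevant result (0 if none)."""
    if not rankings:
        return 0.0
    total = 0.0
    for ranked, relevant in zip(rankings, relevants):
        for rank, item in enumerate(ranked, start=1):
            if item in relevant:
                total += 1.0 / rank
                break
    return total / len(rankings)
```

With k = 1 the estimator reduces to c / n, the plain fraction of passing samples. The benchmark score is the mean of pass_at_k over all problems.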
Pipeline (run in order)
pip install -r requirements.txt
python scripts/01_prepare_data.py # load -> clean -> split
python scripts/02_run_eda.py # stats + plots
python scripts/03_build_index.py # embed corpus -> FAISS index (persisted)
python scripts/04_run_eval.py # retrieval + functional pass@1 (baseline vs RAG)
python scripts/05_finetune.py # (optional) fine-tune CodeT5+
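Steps 03 and 04 boil down to embedding the corpus, finding the nearest docstrings for a query, and placing the retrieved examples in the prompt. A minimal sketch of that flow, with loud assumptions: a word-count vector stands in for the real embedder, a brute-force cosine scan stands in for the FAISS index, the prompt layout is invented for illustration, and embed, retrieve, and build_prompt are hypothetical names rather than the functions in src/rag/.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

Example = Tuple[str, str]  # (docstring, code)


def embed(text: str) -> Counter:
    """Toy stand-in for a learned embedder: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: List[Example], k: int = 2) -> List[Example]:
    """Exact top-k search by scanning every docstring (no index)."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda ex: cosine(q, embed(ex[0])), reverse=True)
    return ranked[:k]


def build_prompt(intent: str, examples: List[Example]) -> str:
    """Few-shot prompt: retrieved examples first, then the new task."""
    parts = [f"# Task: {doc}\n{code}\n" for doc, code in examples]
    parts.append(f"# Task: {intent}\n")
    return "\n".join(parts)


corpus = [
    ("check if a number is prime", "def is_prime(n): ..."),
    ("reverse a string", "def reverse(s): return s[::-1]"),
    ("read a json file", "def read_json(path): ..."),
]
top = retrieve("function to check if a number is prime", corpus, k=1)
prompt = build_prompt("function to check if a number is prime", top)
```

A persisted index earns its keep once the corpus is too large to re-embed and scan on every query, which is why the index is built once in step 03 and reused afterwards.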
Project layout
config.yaml          # single source of truth (models, paths, thresholds)
src/
  data/      load.py clean.py make_sample.py             # Phase 1
  eda/       analyze.py                                  # Phase 1
  rag/       embedder.py generator.py                    # Phase 3 + 5 (CodeAssistant)
  eval/      functional_eval.py retrieval_eval.py sandbox.py   # Phase 2
  agent/     repair_loop.py                              # Phase 6
  finetune/  train_codet5.py                             # Phase 4
app/
  api.py             # FastAPI REST service
  gradio_app.py      # Gradio chat UI (Hugging Face Spaces)
  streamlit_app.py   # Streamlit chat UI
scripts/             # numbered phase entrypoints
notebooks/           # experimentation notebook
Dockerfile           # container for the API
Is the notebook the right vehicle?
The notebook is for experimentation, EDA, and reporting eval numbers - keep
it for your capstone appendix. For anything you deploy, the logic belongs in
the src/ package (importable, testable, version-controlled). The apps in
app/ all import the same CodeAssistant, so there is one implementation, not
three copies. Workflow: prototype in the notebook -> harden into src/ ->
push to GitHub -> deploy an app.
Deployment interfaces
1. Gradio on Hugging Face Spaces (easiest). Copy app/gradio_app.py to a new
Gradio Space as app.py, add requirements.txt, pick a GPU tier. Public chat UI
in minutes, no servers to manage.
2. FastAPI (production / integration).
uvicorn app.api:app --host 0.0.0.0 --port 8000 # docs at /docs
curl -X POST localhost:8000/generate -H 'Content-Type: application/json' \
-d '{"intent": "function to check if a number is prime", "mode": "rag"}'
Use this when another system (an IDE plugin, a CI bot, a web frontend) needs to
call the assistant programmatically. Endpoints: GET /health, POST /generate
(supports mode, repair, return_sources).
3. Streamlit. Run locally with streamlit run app/streamlit_app.py; the same app
deploys for free to Streamlit Community Cloud from a GitHub repo.
4. Docker (any cloud).
docker build -t cga . && docker run -p 8000:8000 cga
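The POST /generate call from option 2 can also be issued from Python, which is closer to what an IDE plugin or CI bot would do. A standard-library sketch, under assumptions: repair and return_sources are treated as booleans (the README names the fields but not their types), the response is assumed to be JSON, and build_generate_request and call_generate are hypothetical helper names.

```python
import json
import urllib.request


def build_generate_request(
    intent: str,
    mode: str = "rag",
    repair: bool = False,           # assumed boolean flag
    return_sources: bool = False,   # assumed boolean flag
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build (but do not send) a POST /generate request with a JSON body."""
    payload = {
        "intent": intent,
        "mode": mode,
        "repair": repair,
        "return_sources": return_sources,
    }
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_generate(request: urllib.request.Request, timeout_s: float = 60.0):
    """Send the request and decode the body, assuming the API returns JSON."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server from option 2 running, call_generate(build_generate_request("function to check if a number is prime")) sends the same request as the curl example. Check the generated docs at /docs for the actual request and response schema.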
Security note
Generated code is executed (for pass@k and the repair loop) via a subprocess +
timeout in src/eval/sandbox.py. That guards against hangs but is not a
security sandbox. For public deployment, run execution inside a container with
--network none and dropped privileges, or disable the repair feature.
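To make the distinction concrete, here is a minimal sketch of the subprocess + timeout pattern. It is not the code in src/eval/sandbox.py; RunResult and run_with_timeout are hypothetical names. It stops runaway code and captures errors, but the child process still has the same file and network access as the user who launched it.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass


@dataclass
class RunResult:
    ok: bool          # True if the process exited with status 0
    stdout: str
    stderr: str
    timed_out: bool


def run_with_timeout(code: str, timeout_s: float = 5.0) -> RunResult:
    """Run untrusted code in a child interpreter, killing it after timeout_s.

    Guards against hangs only. This is NOT a security boundary: the child
    can still read files, open sockets, and spawn processes.
    """
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                # -I: isolated mode, ignores PYTHON* env vars and user site-packages
                [sys.executable, "-I", "-c", code],
                capture_output=True,
                text=True,
                timeout=timeout_s,
                cwd=workdir,  # scratch directory, removed afterwards
            )
        except subprocess.TimeoutExpired:
            return RunResult(False, "", f"timed out after {timeout_s} s", True)
    return RunResult(proc.returncode == 0, proc.stdout, proc.stderr, False)
```

For a public deployment, a runner like this would execute inside the locked-down container described above rather than directly on the host.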