title: Code Generation Assistant
sdk: gradio
app_file: app/gradio_app.py
pinned: false

Code Generation Assistant

Generate Python code from natural-language descriptions, grounded in CodeSearchNet via retrieval (RAG), with functional evaluation and a deployable chat interface.

Approaches compared

  1. Baseline - frozen code LLM, zero/few-shot
  2. RAG - retrieve similar CodeSearchNet examples, condition the LLM
  3. Fine-tuned - CodeT5+ trained on docstring -> code
  4. Agentic - generate -> run -> read error -> repair loop
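
The agentic approach (item 4) can be sketched as a bounded loop. This is a minimal illustration under stated assumptions, not the code in src/agent/repair_loop.py: the names `repair_loop`, `generate`, `run`, and `max_rounds` are hypothetical, and the model call and the executor are passed in as plain callables so the control flow is visible on its own.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures, for illustration only:
#   generate(intent, previous_code, error) -> new candidate source code
#   run(code) -> (ok, error_message)
Generate = Callable[[str, Optional[str], Optional[str]], str]
Run = Callable[[str], Tuple[bool, str]]


def repair_loop(intent: str, generate: Generate, run: Run,
                max_rounds: int = 3) -> Tuple[Optional[str], bool, int]:
    """Generate code, execute it, and feed any error back to the generator.

    Returns (code, succeeded, rounds_used). Stops at the first candidate
    that runs cleanly, or after max_rounds attempts.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be >= 1")
    code: Optional[str] = None
    error: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        # Round 1 has no feedback; later rounds see the last code and its error.
        code = generate(intent, code, error)
        ok, error = run(code)
        if ok:
            return code, True, round_no
    return code, False, max_rounds
```

Capping the number of rounds keeps latency and cost bounded when the model cannot fix its own error; the last candidate is still returned so the caller can show it with a warning.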

Evaluation

  • CodeBLEU (similarity to reference) - in the notebook
  • pass@k on HumanEval / MBPP (functional correctness) - src/eval/functional_eval.py
  • recall@k / MRR (retrieval quality) - src/eval/retrieval_eval.py
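
The retrieval metrics in the last bullet can be sketched in a few lines. This is an illustration, not necessarily how src/eval/retrieval_eval.py computes them: the function names are hypothetical, and it assumes each query comes with a ranked list of retrieved ids and a set of ids judged relevant.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    hits = len(set(ranked[:k]) & relevant)
    return hits / len(relevant)


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Average reciprocal rank over (ranked, relevant) pairs, one per query."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)
```

recall@k rewards getting relevant examples anywhere in the top k, while MRR rewards putting the first relevant example near the top, which matters when only a few retrieved snippets fit in the prompt.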

CodeSearchNet ships no unit tests, so pass@k is measured on HumanEval and MBPP, which do. CodeSearchNet supplies the retrieval corpus and the references for the CodeBLEU similarity metric.
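
For reference, pass@k is commonly computed with the unbiased estimator introduced alongside HumanEval: for a problem with n generated samples of which c pass all tests, pass@k = 1 - C(n - c, k) / C(n, k). The sketch below assumes that estimator; whether src/eval/functional_eval.py uses it is not stated here, and the function names are hypothetical.

```python
from math import comb
from statistics import mean
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: 1 - C(n - c, k) / C(n, k).

    n = samples generated, c = samples that passed all tests, k <= n.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples failed, giving 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Mean pass@k over (n, c) pairs, one pair per benchmark problem."""
    return mean(pass_at_k(n, c, k) for n, c in results)
```

With k = 1 the estimator reduces to c / n, the plain fraction of passing samples, which is the pass@1 figure the pipeline reports.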

Pipeline (run in order)

pip install -r requirements.txt
python scripts/01_prepare_data.py   # load -> clean -> split
python scripts/02_run_eda.py        # stats + plots
python scripts/03_build_index.py    # embed corpus -> FAISS index (persisted)
python scripts/04_run_eval.py       # retrieval + functional pass@1 (baseline vs RAG)
python scripts/05_finetune.py       # (optional) fine-tune CodeT5+

Project layout

config.yaml              # single source of truth (models, paths, thresholds)
src/
  data/        load.py clean.py make_sample.py                  # Phase 1
  eda/         analyze.py                                       # Phase 1
  rag/         embedder.py generator.py                         # Phase 3 + 5 (CodeAssistant)
  eval/        functional_eval.py retrieval_eval.py sandbox.py  # Phase 2
  agent/       repair_loop.py                                   # Phase 6
  finetune/    train_codet5.py                                  # Phase 4
app/
  api.py            # FastAPI REST service
  gradio_app.py     # Gradio chat UI (Hugging Face Spaces)
  streamlit_app.py  # Streamlit chat UI
scripts/            # numbered phase entrypoints
notebooks/          # experimentation notebook
Dockerfile          # container for the API

Is the notebook the right vehicle?

The notebook is for experimentation, EDA, and reporting eval numbers - keep it for your capstone appendix. For anything you deploy, the logic belongs in the src/ package (importable, testable, version-controlled). The apps in app/ all import the same CodeAssistant, so there is one implementation, not three copies. Workflow: prototype in the notebook -> harden into src/ -> push to GitHub -> deploy an app.

Deployment interfaces

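1. Gradio on Hugging Face Spaces (easiest). Copy app/gradio_app.py to a new Gradio Space as app.py (or keep it in place and set app_file: app/gradio_app.py in the Space's README front matter, as this repository does), add requirements.txt plus the src/ package and config.yaml that the app depends on, then pick a GPU tier. Public chat UI in minutes, no servers to manage.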

2. FastAPI (production / integration).

uvicorn app.api:app --host 0.0.0.0 --port 8000   # docs at /docs
curl -X POST localhost:8000/generate -H 'Content-Type: application/json' \
     -d '{"intent": "function to check if a number is prime", "mode": "rag"}'

Use this when another system (an IDE plugin, a CI bot, a web frontend) needs to call the assistant programmatically. Endpoints: GET /health, POST /generate (supports mode, repair, return_sources).
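
A caller written in Python might look like the sketch below, which mirrors the curl command using only the standard library. The field names come from the endpoint description above; treating repair and return_sources as booleans is an assumption, the helper names are hypothetical, and the response schema is not documented here, so the decoded JSON is returned unchanged.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # matches the uvicorn command above


def build_generate_request(intent: str, mode: str = "rag", repair: bool = False,
                           return_sources: bool = False,
                           base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /generate request."""
    payload = {
        "intent": intent,
        "mode": mode,
        # Assumed to be booleans; the README only names these options.
        "repair": repair,
        "return_sources": return_sources,
    }
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(intent: str, timeout_s: float = 60.0, **options):
    """Send the request and return the decoded JSON body as-is."""
    request = build_generate_request(intent, **options)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)
```

With the API running, `generate("function to check if a number is prime", mode="rag")` sends the same request as the curl example. A generous timeout is worth keeping, since model inference can take several seconds per call.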

3. Streamlit. Run locally with streamlit run app/streamlit_app.py; the same app deploys for free to Streamlit Community Cloud from a GitHub repo.

4. Docker (any cloud). Build and run the API container: docker build -t cga . && docker run -p 8000:8000 cga

Security note

Generated code is executed (for pass@k and the repair loop) in a subprocess with a timeout, implemented in src/eval/sandbox.py. That guards against hangs but is not a security sandbox: the child process can still read and write files and reach the network with the app's own permissions. For public deployment, run execution inside a container with --network none and dropped privileges, or disable the repair feature.
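
To make the distinction concrete, here is a minimal sketch of both layers. It is not the contents of src/eval/sandbox.py: `run_with_timeout` and `docker_command` are hypothetical names, and the image tag, user id, and resource limits are placeholder choices. The first function shows the subprocess-plus-timeout pattern; the second only builds a `docker run` argument list with the network disabled and privileges dropped.

```python
import subprocess
import sys
from typing import List, Tuple


def run_with_timeout(code: str, timeout_s: float = 5.0) -> Tuple[bool, str]:
    """Run code in a child Python process; return (ok, output_or_error).

    The timeout stops infinite loops, but the child keeps the parent's
    file-system and network access, so this is not a security boundary.
    """
    try:
        proc = subprocess.run(
            # -I: isolated mode (ignores PYTHON* env vars and user site-packages)
            [sys.executable, "-I", "-c", code],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_s}s"
    if proc.returncode != 0:
        return False, proc.stderr.strip()
    return True, proc.stdout.strip()


def docker_command(script_path: str, image: str = "python:3.11-slim") -> List[str]:
    """Build a `docker run` argv that executes one script with no network
    and reduced privileges. script_path must be an absolute host path."""
    return [
        "docker", "run", "--rm",
        "--network", "none",                    # no network access
        "--cap-drop", "ALL",                    # drop Linux capabilities
        "--security-opt", "no-new-privileges",  # block privilege escalation
        "--user", "65534:65534",                # unprivileged uid:gid (placeholder)
        "--read-only",                          # read-only root file system
        "--memory", "256m",
        "--pids-limit", "64",
        "-v", f"{script_path}:/work/main.py:ro",
        image, "python", "-I", "/work/main.py",
    ]
```

The returned list can be passed to `subprocess.run(..., timeout=...)` in place of the direct interpreter call, so the hang protection and the container isolation combine.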