Rushabh147/code-gen-assistant

Initial deploy to HF Spaces (clean history, LFS for all binaries)

b89e6d6 13 days ago

	---
	title: Code Generation Assistant
	sdk: gradio
	app_file: app/gradio_app.py
	pinned: false
	---

	# Code Generation Assistant

	Generate Python code from natural-language descriptions, grounded in
	CodeSearchNet via retrieval (RAG), with functional evaluation and a
	deployable chat interface.

	## Approaches compared
	1. Baseline - frozen code LLM, zero/few-shot
	2. RAG - retrieve similar CodeSearchNet examples, condition the LLM
	3. Fine-tuned - CodeT5+ trained on `docstring -> code`
	4. Agentic - generate -> run -> read error -> repair loop
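The agentic approach (item 4) can be pictured with a minimal sketch like the one below. It is not the code in `src/agent/repair_loop.py`: the names `repair_loop`, `exec_runner`, and `fake_generate` are hypothetical, the "LLM" is a stub that fixes its bug on the second attempt, and the demo runner uses in-process `exec` purely for brevity (the project itself runs code in a subprocess with a timeout).

```python
import traceback
from typing import Callable, Optional, Tuple

# Hypothetical signatures, chosen for this sketch only.
Generator = Callable[[str, Optional[str]], str]  # (intent, last_error) -> code
Runner = Callable[[str], Tuple[bool, str]]       # code -> (ok, error_text)


def repair_loop(intent: str, generate: Generator, run: Runner,
                max_rounds: int = 3) -> Tuple[str, bool, int]:
    """Generate code, run it, and feed any error back until it runs cleanly.

    Returns (last_code, succeeded, rounds_used).
    """
    error: Optional[str] = None
    code = ""
    for round_no in range(1, max_rounds + 1):
        code = generate(intent, error)  # error is None on the first round
        ok, error_text = run(code)
        if ok:
            return code, True, round_no
        error = error_text              # the error becomes input to the next round
    return code, False, max_rounds


def exec_runner(code: str) -> Tuple[bool, str]:
    """Demo-only runner: in-process exec, with no isolation and no timeout."""
    try:
        exec(code, {"__name__": "__snippet__"})
        return True, ""
    except Exception:
        return False, traceback.format_exc()


def fake_generate(intent: str, last_error: Optional[str]) -> str:
    """Stand-in for the LLM: the first draft has a typo, the retry fixes it."""
    if last_error is None:
        return "def is_even(n):\n    return n % 2 == O\nassert is_even(4)"
    return "def is_even(n):\n    return n % 2 == 0\nassert is_even(4)"


code, ok, rounds = repair_loop("check if a number is even",
                               fake_generate, exec_runner)
print(ok, rounds)
```

Passing the generator and the runner in as arguments keeps the loop itself independent of any particular model or execution backend.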

	## Evaluation
	- CodeBLEU (similarity to reference) - in the notebook
	- pass@k on HumanEval / MBPP (functional correctness) - `src/eval/functional_eval.py`
	- recall@k / MRR (retrieval quality) - `src/eval/retrieval_eval.py`
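For reference, the retrieval metrics can be sketched with their standard definitions. The README does not say how `src/eval/retrieval_eval.py` computes them, so the function names and the toy data below are assumptions made for illustration.

```python
from typing import Hashable, List, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """1 / rank of the first relevant item, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[Tuple[Sequence[Hashable], Set[Hashable]]]) -> float:
    """Average reciprocal rank over (ranked_results, relevant_set) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


# Toy data: three queries with made-up document ids.
runs = [
    (["a", "b", "c"], {"b"}),  # first hit at rank 2
    (["d", "e", "f"], {"d"}),  # first hit at rank 1
    (["g", "h", "i"], {"z"}),  # no hit
]
mrr = mean_reciprocal_rank(runs)
print(mrr)
```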

	> CodeSearchNet ships no unit tests, so pass@k is measured on HumanEval/MBPP
	> (which do include tests), while CodeSearchNet powers retrieval and the
	> CodeBLEU similarity metric.
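As a worked illustration of pass@k, the sketch below uses the widely cited unbiased estimator `1 - C(n-c, k) / C(n, k)`, where `n` samples are drawn per problem and `c` of them pass the tests. Whether `src/eval/functional_eval.py` uses this exact estimator is an assumption; the function names are hypothetical.

```python
from math import comb
from typing import Iterable, Tuple


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples, drawn from n, is correct."""
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: Iterable[Tuple[int, int]], k: int) -> float:
    """Mean pass@k over problems, each given as (n_samples, n_correct)."""
    scores = [pass_at_k(n, c, k) for n, c in results]
    return sum(scores) / len(scores) if scores else 0.0


# One problem with 10 samples, 3 of which pass the unit tests.
print(pass_at_k(10, 3, 1))
print(pass_at_k(10, 3, 5))
```

With a single sample per problem (`n = k = 1`), the benchmark score reduces to the fraction of problems solved, which matches the pass@1 reported by the evaluation script.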

	## Pipeline (run in order)
	```bash
	pip install -r requirements.txt
	python scripts/01_prepare_data.py # load -> clean -> split
	python scripts/02_run_eda.py # stats + plots
	python scripts/03_build_index.py # embed corpus -> FAISS index (persisted)
	python scripts/04_run_eval.py # retrieval + functional pass@1 (baseline vs RAG)
	python scripts/05_finetune.py # (optional) fine-tune CodeT5+
	```
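To make the index and retrieval steps concrete, here is a toy, dependency-free sketch of the retrieve-then-prompt idea. It deliberately swaps in word-count vectors and a linear scan for the project's real embedding model and FAISS index; `embed`, `build_index`, `retrieve`, `build_prompt`, and the three-entry corpus are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

Example = Tuple[str, str]  # (docstring, code)


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a neural encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def build_index(corpus: List[Example]) -> List[Tuple[Counter, str, str]]:
    """Embed every docstring once (stand-in for building a FAISS index)."""
    return [(embed(doc), doc, code) for doc, code in corpus]


def retrieve(index: List[Tuple[Counter, str, str]], query: str, k: int = 2) -> List[Example]:
    """Return the k examples whose docstrings are most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(q, entry[0]), reverse=True)
    return [(doc, code) for _, doc, code in ranked[:k]]


def build_prompt(intent: str, examples: List[Example]) -> str:
    """Place retrieved examples before the new task so the LLM can imitate them."""
    parts = [f"# Task: {doc}\n{code}\n" for doc, code in examples]
    parts.append(f"# Task: {intent}\n")
    return "\n".join(parts)


corpus = [
    ("Check whether a number is prime",
     "def is_prime(n):\n    return n > 1 and all(n % i for i in range(2, int(n ** 0.5) + 1))"),
    ("Reverse a string", "def reverse(s):\n    return s[::-1]"),
    ("Read a JSON file into a dict",
     "def load(path):\n    import json\n    with open(path) as f:\n        return json.load(f)"),
]
index = build_index(corpus)
intent = "function to check if a number is prime"
top = retrieve(index, intent, k=2)
prompt = build_prompt(intent, top)
print(prompt)
```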

	## Project layout
	```
	config.yaml            # single source of truth (models, paths, thresholds)
	src/
	  data/      load.py clean.py make_sample.py                   # Phase 1
	  eda/       analyze.py                                        # Phase 1
	  rag/       embedder.py generator.py                          # Phase 3 + 5 (CodeAssistant)
	  eval/      functional_eval.py retrieval_eval.py sandbox.py   # Phase 2
	  agent/     repair_loop.py                                    # Phase 6
	  finetune/  train_codet5.py                                   # Phase 4
	app/
	  api.py               # FastAPI REST service
	  gradio_app.py        # Gradio chat UI (Hugging Face Spaces)
	  streamlit_app.py     # Streamlit chat UI
	scripts/               # numbered phase entrypoints
	notebooks/             # experimentation notebook
	Dockerfile             # container for the API
	```

	## Is the notebook the right vehicle?
	The notebook is for experimentation, EDA, and reporting eval numbers - keep
	it for your capstone appendix. For anything you deploy, the logic belongs in
	the `src/` package (importable, testable, version-controlled). The apps in
	`app/` all import the same `CodeAssistant`, so there is one implementation, not
	three copies. Workflow: prototype in the notebook -> harden into `src/` ->
	push to GitHub -> deploy an app.
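The "one implementation, not three copies" idea can be sketched as follows. The class body is a stub and its interface is an assumption: the README names `CodeAssistant` but does not show its methods, and the `baseline` mode, `api_generate`, and `chat_reply` are hypothetical names used only here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CodeAssistant:
    """Stub standing in for the shared class that would live in `src/`."""
    model_name: str = "stub-model"

    def generate(self, intent: str, mode: str = "rag") -> str:
        if mode not in {"baseline", "rag"}:
            raise ValueError(f"unknown mode: {mode}")
        # A real implementation would retrieve examples and call the LLM here.
        return f"# [{mode}] {intent}\npass\n"


# One shared instance; every front end below is a thin wrapper around it.
assistant = CodeAssistant()


def api_generate(payload: Dict[str, str]) -> Dict[str, str]:
    """Roughly what a REST handler could do with a parsed JSON body."""
    code = assistant.generate(payload["intent"], payload.get("mode", "rag"))
    return {"code": code}


def chat_reply(message: str, history: List[str]) -> str:
    """Roughly what a chat-UI callback could do with the latest message."""
    return assistant.generate(message)


print(api_generate({"intent": "reverse a string"})["code"])
```

Because both wrappers delegate to the same object, a fix made in `src/` reaches every interface at once.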

	## Deployment interfaces

	1. Gradio on Hugging Face Spaces (easiest). Copy `app/gradio_app.py` to a new
	Gradio Space as `app.py`, add `requirements.txt`, pick a GPU tier. Public chat UI
	in minutes, no servers to manage.

	2. FastAPI (production / integration).
	```bash
	uvicorn app.api:app --host 0.0.0.0 --port 8000 # docs at /docs
	curl -X POST localhost:8000/generate -H 'Content-Type: application/json' \
	-d '{"intent": "function to check if a number is prime", "mode": "rag"}'
	```
	Use this when another system (an IDE plugin, a CI bot, a web frontend) needs to
	call the assistant programmatically. Endpoints: `GET /health`, `POST /generate`
	(supports `mode`, `repair`, `return_sources`).

	3. Streamlit. `streamlit run app/streamlit_app.py` - deploys for free to
	Streamlit Community Cloud from a GitHub repo.

	4. Docker (any cloud). `docker build -t cga . && docker run -p 8000:8000 cga`.
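The `POST /generate` call from option 2 can also be made from Python using only the standard library, as in the sketch below. It assumes that `repair` and `return_sources` are boolean fields in the JSON body and that the service replies with JSON; the README lists the option names but not their types or the response schema, and the helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict


def build_generate_request(intent: str, mode: str = "rag", repair: bool = False,
                           return_sources: bool = False,
                           base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but do not send) the POST /generate request."""
    payload = {
        "intent": intent,
        "mode": mode,
        "repair": repair,                  # assumed to be a boolean field
        "return_sources": return_sources,  # assumed to be a boolean field
    }
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(intent: str, timeout_s: float = 60.0, **options: Any) -> Dict[str, Any]:
    """Send the request and decode the reply (assumed to be a JSON object)."""
    request = build_generate_request(intent, **options)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_generate_request("function to check if a number is prime")
print(request.get_method(), request.full_url)
```

Calling `generate(...)` requires the API from option 2 to be running; building the request separately makes the payload easy to inspect without a server.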

	## Security note
	Generated code is executed (for pass@k and the repair loop) via a subprocess +
	timeout in `src/eval/sandbox.py`. That guards against hangs but is not a
	security sandbox. For public deployment, run execution inside a container with
	`--network none` and dropped privileges, or disable the repair feature.
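A minimal sketch of the subprocess-plus-timeout pattern described above is shown below. It is not the contents of `src/eval/sandbox.py`; `RunResult` and `run_with_timeout` are hypothetical names. As the note says, this bounds runtime but provides no isolation from the filesystem or network.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class RunResult:
    ok: bool         # True when the process exited with status 0
    stdout: str
    stderr: str
    timed_out: bool


def run_with_timeout(code: str, timeout_s: float = 5.0) -> RunResult:
    """Run code in a fresh interpreter and stop it if it exceeds the timeout.

    This guards against hangs only; it is NOT a security sandbox.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return RunResult(False, "", f"timed out after {timeout_s}s", True)
    return RunResult(proc.returncode == 0, proc.stdout, proc.stderr, False)


result = run_with_timeout("print(1 + 1)")
print(result.ok, result.stdout.strip())
```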