	---
	title: AI Code Review Agent
	emoji: 🚀
	sdk: docker
	app_port: 7860
	pinned: false
	---

	# 🔬 AI Code Review Agent

	> An autonomous LLM-powered agent that reviews GitHub Pull Requests — fetches diffs, analyzes code changes, and returns structured feedback with severity ratings and interactive visualizations.

	[![Python](https://img.shields.io/badge/Python-3.10+-blue.svg)](https://python.org)
	[![LangChain](https://img.shields.io/badge/LangChain-0.2.16-green.svg)](https://langchain.com)
	[![Streamlit](https://img.shields.io/badge/Streamlit-1.36.0-red.svg)](https://streamlit.io)
	[![HuggingFace](https://img.shields.io/badge/HuggingFace-Space-yellow.svg)](https://huggingface.co)

	---

	## 📌 What This Project Is

	This is Project 3 of my AI/ML portfolio — an autonomous agent built with LangChain's tool-calling framework. Unlike a simple LLM wrapper that just sends code to a model, this is a true agent: it decides which tools to call, in what order, and reasons over intermediate results before producing a final review.

	The agent follows the ReAct pattern (Reason + Act):

	```
	User pastes PR URL
	↓
	Agent calls get_pr_metadata() → understands intent of PR
	↓
	Agent calls get_pr_diff() → fetches what changed
	↓
	Agent calls get_file_content() → gets surrounding context if needed
	↓
	LLM reasons over all collected data
	↓
	Structured review: issues, severity, fixes
	```

	The sequence above is a typical run, not a hardcoded pipeline: the LLM chooses which tools to call and in what order.

	---

	## ✨ Features

	- Autonomous agent loop — LLM decides which GitHub API tools to call and when
	- 3 custom tools — PR metadata, code diff, file content fetcher
	- Interactive diff visualization — bar chart showing additions/deletions per file
	- Severity breakdown chart — donut chart of High/Medium/Low issues
	- Inline diff viewer — color-coded added/removed lines per file
	- Structured review cards — each issue has location, severity, problem, and fix
	- Multi-provider support — works with Groq, Google Gemini, or OpenAI
	- Session-persisted keys — save API keys once per session, reuse across reviews
	- Free-text model input — type any model name, no dropdown restrictions
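As a rough illustration of the data behind the additions/deletions bar chart, the sketch below counts added and removed lines in one file's unified-diff patch. The function name `count_patch_changes` is hypothetical, and the sketch assumes the patch text contains only hunks (no `+++`/`---` file headers); it is not taken from `app.py`.

```python
def count_patch_changes(patch: str) -> tuple[int, int]:
    """Return (additions, deletions) for one file's unified-diff patch text.

    Assumption: the patch holds only hunks, so every line starting with
    "+" or "-" is a changed line rather than a file header.
    """
    additions = deletions = 0
    for line in patch.splitlines():
        if line.startswith("+"):
            additions += 1
        elif line.startswith("-"):
            deletions += 1
    return additions, deletions


# Made-up sample patch, for illustration only.
sample_patch = """@@ -1,3 +1,4 @@
 import os
-def login(user):
+def login(user: str):
+    validate(user)
     return True"""

# One (additions, deletions) pair per file is enough to draw the bar chart.
stats = {"auth.py": count_patch_changes(sample_patch)}
print(stats)
```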

	---

	## 🧠 How The Agent Works (Technical)

	### Tool-Calling Architecture

	The LLM does not execute code itself. Instead it emits a structured request for a tool call, shown here in simplified form:

	```json
	{
	  "tool": "get_pr_diff",
	  "arguments": {"pr_url": "https://github.com/owner/repo/pull/123"}
	}
	```

	LangChain reads this output, executes the Python function, returns the result to the LLM as context, and the loop continues until the LLM decides it has enough information.
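The loop described above can be sketched in plain Python. This is a simplified illustration, not LangChain's implementation or the code in `app.py`: `run_agent`, the `TOOLS` registry, and the canned tool bodies are hypothetical, and the "LLM" is a scripted stand-in so the example runs without API keys.

```python
import json
from typing import Callable


# Hypothetical stand-ins for the real tools; they return canned strings.
def get_pr_metadata(pr_url: str) -> str:
    return f"metadata for {pr_url}"


def get_pr_diff(pr_url: str) -> str:
    return f"diff for {pr_url}"


TOOLS: dict[str, Callable[..., str]] = {
    "get_pr_metadata": get_pr_metadata,
    "get_pr_diff": get_pr_diff,
}


def run_agent(llm: Callable[[list[str]], str], pr_url: str, max_steps: int = 5) -> str:
    """Ask the model, run the tool it requests, feed the result back, repeat."""
    context = [f"Review this PR: {pr_url}"]
    for _ in range(max_steps):
        reply = llm(context)
        try:
            request = json.loads(reply)
        except json.JSONDecodeError:
            return reply  # plain text: the model has written its final review
        if not isinstance(request, dict) or "tool" not in request:
            return reply
        tool = TOOLS.get(request["tool"])
        if tool is None:
            context.append(f"Unknown tool: {request['tool']}")
            continue
        # The tool result becomes context for the next model call.
        context.append(tool(**request.get("arguments", {})))
    return "Stopped: step limit reached"


url = "https://github.com/owner/repo/pull/123"
replies = iter([
    json.dumps({"tool": "get_pr_metadata", "arguments": {"pr_url": url}}),
    json.dumps({"tool": "get_pr_diff", "arguments": {"pr_url": url}}),
    "REVIEW: no issues found",
])
result = run_agent(lambda context: next(replies), url)
print(result)
```

The step limit is a design choice in this sketch: it keeps a model that never stops requesting tools from looping forever.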

	### The Three Tools

	| Tool | Purpose |
	|------|---------|
	| `get_pr_metadata` | Fetches PR title, author, description, branch info |
	| `get_pr_diff` | Fetches file-level diffs: which lines were added or removed |
	| `get_file_content` | Fetches full file content for deeper context |
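The tool-call example earlier passes a `pr_url` argument, so a tool presumably has to split that URL into owner, repository, and PR number before it can query GitHub. A minimal sketch of that step follows; the helper name `parse_pr_url` and the `PullRequestRef` type are hypothetical and not taken from `app.py`.

```python
import re
from typing import NamedTuple


class PullRequestRef(NamedTuple):
    owner: str
    repo: str
    number: int


# Accepts the PR page itself plus sub-pages such as /files or a query string.
_PR_URL = re.compile(
    r"^https://github\.com/([^/\s]+)/([^/\s]+)/pull/(\d+)(?:[/?#].*)?$"
)


def parse_pr_url(pr_url: str) -> PullRequestRef:
    """Split a GitHub pull request URL into owner, repo, and PR number."""
    match = _PR_URL.match(pr_url.strip())
    if match is None:
        raise ValueError(f"Not a GitHub pull request URL: {pr_url!r}")
    owner, repo, number = match.groups()
    return PullRequestRef(owner, repo, int(number))


ref = parse_pr_url("https://github.com/owner/repo/pull/123")
print(ref.owner, ref.repo, ref.number)
```

Rejecting malformed URLs up front gives the LLM a clear error message to react to instead of an opaque API failure.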

	### Why This Is Not Just a Wrapper

	A wrapper sends one prompt and gets one response. This agent:
	- Makes multiple sequential API calls based on intermediate findings
	- Decides dynamically whether to fetch file content based on what it sees in the diff
	- Produces structured output parsed into visual components — not just a text blob

	---

	## 🗂️ Project Structure

	```
	ai-code-review-agent/
	├── app.py              # Streamlit app (production ready)
	├── app_colab.ipynb     # Development notebook with all blocks
	├── requirements.txt    # Pinned dependencies
	└── README.md
	```

	---

	## ⚙️ Setup & Installation

	### Local (VS Code)

	```bash
	# Clone the repo
	git clone https://github.com/aneebnaqvi15/ai-code-review-agent
	cd ai-code-review-agent

	# Create and activate a virtual environment (run only the activate line for your OS)
	python -m venv venv
	source venv/bin/activate    # Mac/Linux
	# venv\Scripts\activate     # Windows

	# Install dependencies
	pip install -r requirements.txt

	# Run the app
	streamlit run app.py
	```

	The app opens at `http://localhost:8501` (Streamlit's default port).

	### Google Colab

	Open `app_colab.ipynb` and run all cells. The notebook uses ngrok to expose the app at a public URL.

	---

	## 🔑 API Keys Required

	| Key | Where to Get | Purpose |
	|-----|--------------|---------|
	| Groq API Key | [console.groq.com](https://console.groq.com) | LLM inference (free) |
	| Google Gemini Key | [aistudio.google.com](https://aistudio.google.com) | Alternative LLM (free tier) |
	| GitHub Token | GitHub → Settings → Developer Settings → PAT | Fetch PR data |

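	GitHub token permissions: the agent only needs read access. Note that the classic `repo` scope grants full read/write access to private repositories; a fine-grained token with read-only `Contents` and `Pull requests` permissions is a narrower option.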

	---

	## 📦 Dependencies

	```
	langchain==0.2.16
	langchain-google-genai==1.0.10
	langchain-community==0.2.16
	langchain-groq==0.1.10
	google-generativeai==0.7.2
	pygithub==2.3.0
	pydantic==2.7.4
	streamlit==1.36.0
	plotly
	python-dotenv==1.0.1
	```

	---

	## 📊 Example Output

	Given a real PR URL, the agent produces:

	Metrics:
	- Total issues found, broken down by severity

	Charts:
	- Bar chart: lines added vs deleted per file
	- Donut chart: High / Medium / Low issue distribution

	Per-issue cards:
	```
	ISSUE: Missing input validation
	FILE: auth.py
	SEVERITY: High
	PROBLEM: Function accepts raw user input without sanitization
	FIX: Add input validation using pydantic or manual type checks before processing
	```

	---

	## 🔭 What I Learned

	- Agents vs wrappers: the difference is dynamic tool selection, not just chaining prompts
	- Tool definitions are prompts: how you describe a tool directly affects whether the LLM calls it correctly
	- Structured output matters: getting the LLM to return a parseable, consistent format requires careful prompt design
	- Context window management: large diffs need a truncation strategy, or the LLM loses coherence
	- LangChain version pinning: in my experience, newer LangChain versions can break tool-calling with older Gemini/Groq integrations, so the versions are pinned
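One simple truncation strategy for large diffs is to keep whole lines from the start of the diff until a character budget is used up, then append a marker saying how much was dropped. The sketch below is an assumption about how this could be done, not the strategy used in `app.py`; the name `truncate_diff` and the 6000-character default are arbitrary.

```python
def truncate_diff(diff_text: str, max_chars: int = 6000) -> str:
    """Keep whole lines from the start of the diff within a character budget."""
    if len(diff_text) <= max_chars:
        return diff_text
    lines = diff_text.splitlines()
    kept: list[str] = []
    used = 0
    for line in lines:
        cost = len(line) + 1  # +1 accounts for the newline
        if used + cost > max_chars:
            break
        kept.append(line)
        used += cost
    omitted = len(lines) - len(kept)
    # The marker tells the LLM that it is seeing only part of the diff.
    kept.append(f"... [{omitted} more lines truncated]")
    return "\n".join(kept)


long_diff = "\n".join(f"+line {i}" for i in range(100))
short_diff = truncate_diff(long_diff, max_chars=50)
print(short_diff)
```

Cutting on line boundaries avoids handing the model half a line of code. The budget applies to the kept lines only, so the marker adds a few characters on top.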

	---

	## 🔮 What I'd Build Next

	- [ ] Add evaluation metric: measure review quality against human-written reviews
	- [ ] Support for multi-file reasoning across a full repo
	- [ ] Webhook integration — auto-review every new PR on push
	- [ ] Fine-tuned reviewer model trained on accepted/rejected PR feedback

	---

	## 🗺️ Portfolio Context

	This is part of a 3-project AI/ML portfolio showing progression:

	| Project | Skill Demonstrated |
	|---------|--------------------|
	| [Banking77 Intent Classifier](https://github.com/aneebnaqvi15/banking77-intent-classifier) | Fine-tuning (DistilBERT + LoRA) |
	| [Multi-Doc RAG Assistant](#) | Retrieval systems (LangChain + ChromaDB) |
	| AI Code Review Agent (this) | Autonomous agents + tool-calling |

	---

	## 👨‍💻 Author

	Aneeb Naqvi — CS Graduate, Full-Stack & AI Engineer

	[![GitHub](https://img.shields.io/badge/GitHub-aneebnaqvi15-black)](https://github.com/aneebnaqvi15)