---
title: AI Code Review Agent
emoji: 🚀
sdk: docker
app_port: 7860
pinned: false
---

# 🔬 AI Code Review Agent

> An autonomous LLM-powered agent that reviews GitHub Pull Requests — fetches diffs, analyzes code changes, and returns structured feedback with severity ratings and interactive visualizations.

[![Python](https://img.shields.io/badge/Python-3.10+-blue.svg)](https://python.org)
[![LangChain](https://img.shields.io/badge/LangChain-0.2.16-green.svg)](https://langchain.com)
[![Streamlit](https://img.shields.io/badge/Streamlit-1.36.0-red.svg)](https://streamlit.io)
[![HuggingFace](https://img.shields.io/badge/HuggingFace-Space-yellow.svg)](https://huggingface.co)

---

## 📌 What This Project Is

This is **Project 3** of my AI/ML portfolio — an autonomous agent built with LangChain's tool-calling framework. Unlike a simple LLM wrapper that just sends code to a model, this is a true agent: it **decides which tools to call, in what order, and reasons over intermediate results** before producing a final review.

The agent follows the **ReAct pattern** (Reason + Act):

```
User pastes PR URL
       ↓
Agent calls get_pr_metadata()   → understands intent of PR
       ↓
Agent calls get_pr_diff()       → fetches what changed
       ↓
Agent calls get_file_content()  → gets surrounding context if needed
       ↓
LLM reasons over all collected data
       ↓
Structured review: issues, severity, fixes
```

No hardcoded tool sequence — the LLM decides autonomously.

---

## ✨ Features

- **Autonomous agent loop** — LLM decides which GitHub API tools to call and when
- **3 custom tools** — PR metadata, code diff, file content fetcher
- **Interactive diff visualization** — bar chart showing additions/deletions per file
- **Severity breakdown chart** — donut chart of High/Medium/Low issues
- **Inline diff viewer** — color-coded added/removed lines per file
- **Structured review cards** — each issue has location, severity, problem, and fix
- **Multi-provider support** — works with Groq, Google Gemini, or OpenAI
- **Session-persisted keys** — save API keys once per session, reuse across reviews
- **Free-text model input** — type any model name, no dropdown restrictions

---

## 🧠 How The Agent Works (Technical)

### Tool-Calling Architecture

The LLM does not execute code. It outputs **structured JSON** requesting a tool call:

```json
{
  "tool": "get_pr_diff",
  "arguments": {"pr_url": "https://github.com/owner/repo/pull/123"}
}
```

LangChain reads this output, executes the Python function, returns the result to the LLM as context, and the loop continues until the LLM decides it has enough information.

### The Three Tools

| Tool | Purpose |
|------|---------|
| `get_pr_metadata` | Fetches PR title, author, description, branch info |
| `get_pr_diff` | Fetches file-level diffs — what lines were added/removed |
| `get_file_content` | Fetches full file content for deeper context |

### Why This Is Not Just a Wrapper

A wrapper sends one prompt and gets one response. This agent:
- Makes **multiple sequential API calls** based on intermediate findings
- **Decides dynamically** whether to fetch file content based on what it sees in the diff
- Produces **structured output** parsed into visual components — not just a text blob

---

## 🗂️ Project Structure

```
ai-code-review-agent/
├── app.py                  # Streamlit app — production ready
├── app_colab.ipynb         # Development notebook with all blocks
├── requirements.txt        # Pinned dependencies
└── README.md
```

---

## ⚙️ Setup & Installation

### Local (VS Code)

```bash
# Clone the repo
git clone https://github.com/aneebnaqvi15/ai-code-review-agent
cd ai-code-review-agent

# Create virtual environment
python -m venv venv
venv\Scripts\activate        # Windows
source venv/bin/activate     # Mac/Linux

# Install dependencies
pip install -r requirements.txt

# Run the app
streamlit run app.py
```

App opens at `http://localhost:8501`

### Google Colab

Open `app_colab.ipynb` and run all cells. Uses ngrok for public URL tunneling.

---

## 🔑 API Keys Required

| Key | Where to Get | Purpose |
|-----|-------------|---------|
| Groq API Key | [console.groq.com](https://console.groq.com) | LLM inference (free) |
| Google Gemini Key | [aistudio.google.com](https://aistudio.google.com) | Alternative LLM (free tier) |
| GitHub Token | GitHub → Settings → Developer Settings → PAT | Fetch PR data |

**GitHub Token permissions needed:** `repo` (read only)

---

## 📦 Dependencies

```
langchain==0.2.16
langchain-google-genai==1.0.10
langchain-community==0.2.16
langchain-groq==0.1.10
google-generativeai==0.7.2
pygithub==2.3.0
pydantic==2.7.4
streamlit==1.36.0
plotly
python-dotenv==1.0.1
```

---

## 📊 Example Output

Given a real PR URL, the agent produces:

**Metrics:**
- Total issues found, broken down by severity

**Charts:**
- Bar chart: lines added vs deleted per file
- Donut chart: High / Medium / Low issue distribution

**Per-issue cards:**
```
ISSUE: Missing input validation
FILE: auth.py
SEVERITY: High
PROBLEM: Function accepts raw user input without sanitization
FIX: Add input validation using pydantic or manual type checks before processing
```

---

## 🔭 What I Learned

- **Agents vs wrappers** — the difference is dynamic tool selection, not just chaining prompts
- **Tool definitions are prompts** — how you describe a tool directly affects whether the LLM calls it correctly
- **Structured output matters** — getting the LLM to return parseable, consistent format requires careful prompt design
- **Context window management** — large diffs need truncation strategy or the LLM loses coherence
- **LangChain version pinning** — newer LangChain versions break tool-calling with older Gemini/Groq integrations

---

## 🔮 What I'd Build Next

- [ ] Add evaluation metric: measure review quality against human-written reviews
- [ ] Support for multi-file reasoning across a full repo
- [ ] Webhook integration — auto-review every new PR on push
- [ ] Fine-tuned reviewer model trained on accepted/rejected PR feedback

---

## 🗺️ Portfolio Context

This is part of a 3-project AI/ML portfolio showing progression:

| Project | Skill Demonstrated |
|---------|-------------------|
| [Banking77 Intent Classifier](https://github.com/aneebnaqvi15/banking77-intent-classifier) | Fine-tuning (DistilBERT + LoRA) |
| [Multi-Doc RAG Assistant](#) | Retrieval systems (LangChain + ChromaDB) |
| **AI Code Review Agent** (this) | Autonomous agents + tool-calling |

---

## 👨‍💻 Author

**Aneeb Naqvi** — CS Graduate, Full-Stack & AI Engineer

[![GitHub](https://img.shields.io/badge/GitHub-aneebnaqvi15-black)](https://github.com/aneebnaqvi15)