Spaces:

aneeb15
/

ai-code-review-agent

Sleeping

App Files Files Community

ai-code-review-agent / README.md

aneeb15

Switch to Docker SDK with Python 3.10 to fix build errors

3e00265 26 days ago

preview code

raw

history blame contribute delete

6.82 kB

metadata

title: AI Code Review Agent
emoji: 🚀
sdk: docker
app_port: 7860
pinned: false

🔬 AI Code Review Agent

An autonomous LLM-powered agent that reviews GitHub Pull Requests — fetches diffs, analyzes code changes, and returns structured feedback with severity ratings and interactive visualizations.

📌 What This Project Is

This is Project 3 of my AI/ML portfolio — an autonomous agent built with LangChain's tool-calling framework. Unlike a simple LLM wrapper that just sends code to a model, this is a true agent: it decides which tools to call, in what order, and reasons over intermediate results before producing a final review.

The agent follows the ReAct pattern (Reason + Act):

User pastes PR URL
       ↓
Agent calls get_pr_metadata()   → understands intent of PR
       ↓
Agent calls get_pr_diff()       → fetches what changed
       ↓
Agent calls get_file_content()  → gets surrounding context if needed
       ↓
LLM reasons over all collected data
       ↓
Structured review: issues, severity, fixes

No hardcoded tool sequence — the LLM decides autonomously.

✨ Features

Autonomous agent loop — LLM decides which GitHub API tools to call and when
3 custom tools — PR metadata, code diff, file content fetcher
Interactive diff visualization — bar chart showing additions/deletions per file
Severity breakdown chart — donut chart of High/Medium/Low issues
Inline diff viewer — color-coded added/removed lines per file
Structured review cards — each issue has location, severity, problem, and fix
Multi-provider support — works with Groq, Google Gemini, or OpenAI
Session-persisted keys — save API keys once per session, reuse across reviews
Free-text model input — type any model name, no dropdown restrictions

🧠 How The Agent Works (Technical)

Tool-Calling Architecture

The LLM does not execute code. It outputs structured JSON requesting a tool call:

{
  "tool": "get_pr_diff",
  "arguments": {"pr_url": "https://github.com/owner/repo/pull/123"}
}

LangChain reads this output, executes the Python function, returns the result to the LLM as context, and the loop continues until the LLM decides it has enough information.

The Three Tools

Tool	Purpose
`get_pr_metadata`	Fetches PR title, author, description, branch info
`get_pr_diff`	Fetches file-level diffs — what lines were added/removed
`get_file_content`	Fetches full file content for deeper context

Why This Is Not Just a Wrapper

A wrapper sends one prompt and gets one response. This agent:

Makes multiple sequential API calls based on intermediate findings
Decides dynamically whether to fetch file content based on what it sees in the diff
Produces structured output parsed into visual components — not just a text blob

🗂️ Project Structure

ai-code-review-agent/
├── app.py                  # Streamlit app — production ready
├── app_colab.ipynb         # Development notebook with all blocks
├── requirements.txt        # Pinned dependencies
└── README.md

⚙️ Setup & Installation

Local (VS Code)

# Clone the repo
git clone https://github.com/aneebnaqvi15/ai-code-review-agent
cd ai-code-review-agent

# Create virtual environment
python -m venv venv
venv\Scripts\activate        # Windows
source venv/bin/activate     # Mac/Linux

# Install dependencies
pip install -r requirements.txt

# Run the app
streamlit run app.py

App opens at http://localhost:8501

Google Colab

Open app_colab.ipynb and run all cells. Uses ngrok for public URL tunneling.

🔑 API Keys Required

Key	Where to Get	Purpose
Groq API Key	console.groq.com	LLM inference (free)
Google Gemini Key	aistudio.google.com	Alternative LLM (free tier)
GitHub Token	GitHub → Settings → Developer Settings → PAT	Fetch PR data

GitHub Token permissions needed: repo (read only)

📦 Dependencies

langchain==0.2.16
langchain-google-genai==1.0.10
langchain-community==0.2.16
langchain-groq==0.1.10
google-generativeai==0.7.2
pygithub==2.3.0
pydantic==2.7.4
streamlit==1.36.0
plotly
python-dotenv==1.0.1

📊 Example Output

Given a real PR URL, the agent produces:

Metrics:

Total issues found, broken down by severity

Charts:

Bar chart: lines added vs deleted per file
Donut chart: High / Medium / Low issue distribution

Per-issue cards:

ISSUE: Missing input validation
FILE: auth.py
SEVERITY: High
PROBLEM: Function accepts raw user input without sanitization
FIX: Add input validation using pydantic or manual type checks before processing

🔭 What I Learned

Agents vs wrappers — the difference is dynamic tool selection, not just chaining prompts
Tool definitions are prompts — how you describe a tool directly affects whether the LLM calls it correctly
Structured output matters — getting the LLM to return parseable, consistent format requires careful prompt design
Context window management — large diffs need truncation strategy or the LLM loses coherence
LangChain version pinning — newer LangChain versions break tool-calling with older Gemini/Groq integrations

🔮 What I'd Build Next

Add evaluation metric: measure review quality against human-written reviews
Support for multi-file reasoning across a full repo
Webhook integration — auto-review every new PR on push
Fine-tuned reviewer model trained on accepted/rejected PR feedback

🗺️ Portfolio Context

This is part of a 3-project AI/ML portfolio showing progression:

Project	Skill Demonstrated
Banking77 Intent Classifier	Fine-tuning (DistilBERT + LoRA)
Multi-Doc RAG Assistant	Retrieval systems (LangChain + ChromaDB)
AI Code Review Agent (this)	Autonomous agents + tool-calling

👨‍💻 Author

Aneeb Naqvi — CS Graduate, Full-Stack & AI Engineer