---
title: CommitLens
emoji: π
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.18.0
python_version: '3.12'
app_file: app.py
pinned: true
license: mit
short_description: Turn any Git commit into a human-readable engineering report.
tags:
- track:backyard
- sponsor:nvidia
- achievement:offgrid
- achievement:offbrand
- achievement:sharing
- achievement:fieldnotes
---
# CommitLens: AI-Powered Code Review Pipeline
https://huggingface.co/pkheria
**CommitLens** is a high-performance information extraction and analysis pipeline that transforms raw GitHub diffs into structured, human-readable engineering reports. It uses a hybrid LLM approach: **JetBrains Mellum 2** for deep per-file analysis and **Groq-hosted Llama 3.3** for lightning-fast synthesis.



## Resources & Links
- **Demo Video:** [Watch the Product Demo](https://youtu.be/TR8iNU5XnXw)
- **Social Post:** [LinkedIn Post](https://www.linkedin.com/posts/piyushkheria7_buildsmall-opensource-devtools-share-7472362275476041728-FdIx/)
## Key Features
- **Automated Diff Extraction**: Fetches the two latest commits from any GitHub repository and generates semantic diffs.
- **Top-Impact Filtering**: Automatically identifies and prioritizes the most significant changes (top 2 files by lines changed) to ensure high-signal reviews.
- **Hybrid LLM Pipeline**:
  - **Mellum 2 (12B)**: Performs focused, per-file code analysis. Quantized to 4-bit NF4 for efficient GPU utilization.
  - **Groq (Llama 3.3 70B)**: Generates a high-level executive summary and key takeaways with low latency.
- **Cinematic UI**: A bespoke, low-latency frontend featuring a custom particle engine, real-time status tracking, and a "git-graph" hero visualization.
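
The top-impact filtering described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `top_impact_files` and the extension allow-list are assumptions, and the `filename`, `additions`, and `deletions` keys mirror the per-file entries that the GitHub REST API returns for a commit.

```python
from pathlib import PurePosixPath

# Assumed allow-list of source extensions; the real project may use a different set.
SOURCE_EXTENSIONS = {".py", ".js", ".ts", ".go", ".rs", ".java", ".c", ".cpp"}


def top_impact_files(files, limit=2):
    """Return the `limit` source files with the most changed lines (hypothetical helper)."""
    candidates = [
        f for f in files
        if PurePosixPath(f["filename"]).suffix.lower() in SOURCE_EXTENSIONS
    ]
    # Impact = additions + deletions; sorted() is stable, so ties keep input order.
    return sorted(
        candidates,
        key=lambda f: f["additions"] + f["deletions"],
        reverse=True,
    )[:limit]


changed = [
    {"filename": "app.py", "additions": 40, "deletions": 12},
    {"filename": "package-lock.json", "additions": 900, "deletions": 850},
    {"filename": "commitlens.py", "additions": 5, "deletions": 1},
    {"filename": "static/logo.png", "additions": 0, "deletions": 0},
    {"filename": "index.js", "additions": 20, "deletions": 20},
]
print([f["filename"] for f in top_impact_files(changed)])
```

Filtering before ranking keeps a large lock-file diff from crowding out the source changes a reviewer actually cares about.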
## Tech Stack
- **Core**: Python 3.12, FastAPI, Gradio (Server Mode).
- **ML/Inference**: `transformers`, `bitsandbytes` (4-bit NF4 quantization), `torch`, `spaces` (ZeroGPU).
- **APIs**: GitHub REST API, Groq Cloud API.
- **Frontend**: Vanilla JavaScript (ES6+), HTML5 Canvas, CSS3 Grid/Flexbox.
## Project Structure
| File | Purpose |
|------|---------|
| `app.py` | Main application server; manages model lifecycle and GPU/API orchestration. |
| `commitlens.py` | Data pipeline; handles GitHub API interaction, file filtering, and prompt engineering. |
| `index.html` | Custom-built, high-fidelity frontend with interactive Git visualizations. |
| `requirements.txt` | Dependency manifest (requests, gradio, torch, transformers, etc.). |
## How It Works
1. **Extraction**: The `GitHubClient` fetches commit metadata and raw patches.
2. **Filtering**: Files are filtered by extension (keeping source code, ignoring binaries/locks) and sorted by impact.
3. **Mellum Analysis**: The pipeline builds structured prompts containing "Before", "After", and "Diff" blocks. Mellum 2 generates concise summaries for each file.
4. **Groq Synthesis**: Per-file summaries are batched and sent to Groq for a final structured Markdown report including a "Commit Overview" and "Key Takeaways".
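
As a rough illustration of step 3, a per-file prompt with "Before", "After", and "Diff" blocks might be assembled like this. The function name `build_file_prompt`, the section headers, and the closing instruction are all assumptions made for the sketch; the prompts in `commitlens.py` may be worded and laid out differently.

```python
def build_file_prompt(filename, before, after, diff):
    """Assemble one per-file prompt with labelled Before/After/Diff sections.

    Hypothetical helper: the layout is illustrative, not the project's template.
    """
    sections = [
        f"File: {filename}",
        "### Before\n" + before.strip(),
        "### After\n" + after.strip(),
        "### Diff\n" + diff.strip(),
        "Summarize what changed in this file and why it matters.",
    ]
    return "\n\n".join(sections)


prompt = build_file_prompt(
    "app.py",
    before="def greet():\n    return 'hi'\n",
    after="def greet(name):\n    return f'hi {name}'\n",
    diff="-def greet():\n+def greet(name):\n",
)
print(prompt)
```

Keeping the three blocks in a fixed order gives the model the same structure for every file, which tends to make the per-file summaries easier to batch for the synthesis step.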
## Setup & Usage
### Local Development
1. **Install dependencies**:
```bash
pip install -r requirements.txt
```
2. **Set Environment Variables**:
```bash
export GROQ_API_KEY="your_groq_api_key"
```
3. **Run the application**:
```bash
python app.py
```
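
If the key from step 2 is missing, failing early with a clear message is usually friendlier than an API error mid-run. A minimal sketch, assuming a hypothetical `require_env` helper that is not part of the project:

```python
import os


def require_env(name, environ=os.environ):
    """Return a required environment variable or raise a clear error (hypothetical helper)."""
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before starting the app.")
    return value


# Demonstrated with an explicit mapping so the example runs without a real key.
print(require_env("GROQ_API_KEY", {"GROQ_API_KEY": "your_groq_api_key"}))
```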
### CLI Mode
You can also run the extraction pipeline directly:
```bash
python commitlens.py <github_repo_url> --token <optional_pat> --print-prompts
```
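
For orientation, the command line above maps onto an argument parser along these lines. This is a sketch built only from the flags shown in the command; the help strings and the `build_parser` name are assumptions, and the actual parser in `commitlens.py` may define more options.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented CLI (illustrative, not the project's code)."""
    parser = argparse.ArgumentParser(prog="commitlens.py")
    parser.add_argument("repo_url", help="GitHub repository URL")
    parser.add_argument("--token", default=None, help="optional personal access token")
    parser.add_argument(
        "--print-prompts",
        action="store_true",
        help="print the generated prompts",
    )
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["https://github.com/owner/repo", "--print-prompts"])
print(args.repo_url, args.token, args.print_prompts)
```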
## License
MIT