---
title: AI-CodeLens
emoji: 🔍
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: false
---
# 🔍 CodeLens: Agentic Code Intelligence
**CodeLens** is a **ReAct-style Agentic RAG system** for understanding any public GitHub codebase. It goes beyond simple search: it *thinks*, *plans*, and *investigates* a codebase the way a senior developer would.
---
## 🚀 Key Features
* **🧠 Agentic Brain (Plan & Execute)**: An LLM planner breaks your query into a multi-step investigation plan. An execution agent then iteratively calls tools, accumulates context, and synthesizes a final answer.
* **🔀 Hybrid RAG Search**: Combines **ChromaDB** semantic similarity with **grep-style** keyword matching, so code is retrieved whether it matches the query by meaning or by exact identifier.
* **🔧 Tool-Calling Agent Loop**: The agent autonomously uses the `search_code`, `open_file`, and `list_files` tools, deciding which to call at each step based on what it has found so far.
* **🐛 Bug Detection Mode**: Mention "bug", "error", or "security" in your query to trigger a specialized **Security Engineer mode** that produces a full vulnerability/bug report with recommended fixes.
* **🤖 LLM Re-Ranker**: Filters retrieved code chunks using the LLM itself, prioritizing actual logic over boilerplate and UI wrappers.
* **💾 Persistent Caching**: Saves vector store state to disk, so reloading a previously indexed repo skips re-indexing.
* **🌐 Full Stack Support**: Python, JS/TS (TSX/JSX), Go, Rust, C++, Java, and more.
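
As an illustration of the hybrid search idea, here is a minimal sketch that merges a semantic ranking with a grep-style keyword ranking. This README does not say how CodeLens combines the two result lists; the sketch uses reciprocal rank fusion, which is one common choice. The names `grep_search` and `fuse_rankings`, the chunk ids, and the hard-coded `semantic_hits` list (a stand-in for a ChromaDB similarity query) are all assumptions made for the example.

```python
import re
from collections import defaultdict


def grep_search(chunks, query, limit=5):
    """Keyword leg: rank chunk ids by how often the query terms occur in the text."""
    terms = [t.lower() for t in re.findall(r"\w+", query)]
    scored = []
    for chunk_id, text in chunks.items():
        lowered = text.lower()
        hits = sum(lowered.count(term) for term in terms)
        if hits:
            scored.append((hits, chunk_id))
    # Highest hit count first; the chunk id breaks ties deterministically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [chunk_id for _, chunk_id in scored[:limit]]


def fuse_rankings(rankings, k=60, limit=5):
    """Reciprocal rank fusion: each list adds 1 / (k + rank) to a chunk's score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    ordered = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return ordered[:limit]


# Hypothetical indexed chunks, keyed by "file:chunk-number".
chunks = {
    "auth.py:1": "def login(user, password): return check_password(user, password)",
    "db.py:1": "def connect(url): return create_engine(url)",
    "ui.py:1": "st.button('Login')",
}
semantic_hits = ["auth.py:1", "db.py:1"]  # stand-in for a vector-store query result
keyword_hits = grep_search(chunks, "login password")
merged = fuse_rankings([semantic_hits, keyword_hits])
print(merged)
```

A chunk that appears in both lists (here `auth.py:1`) accumulates score from each and rises to the top, which is the behaviour a hybrid retriever is after.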
---
## 🏗️ Architecture
```
User Query
↓
Planner (LLM) → generates step-by-step investigation plan
↓
Execution Agent Loop:
  → Reason: decides which tool to use
  → Act: search_code / open_file / list_files
  → Observe: accumulates tool results as context
↓
Synthesizer (LLM) → final answer grounded in actual code
```
This is a **ReAct (Reason + Act)** pattern combined with **Plan & Execute**, two widely used agentic AI patterns.
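
The Reason / Act / Observe loop from the diagram can be sketched roughly as follows. This is not the code in `agent.py`: the `run_agent` function, the in-memory `REPO`, the stub tools, and the scripted `decide` policy (standing in for the LLM call that picks the next tool) are assumptions chosen so the example runs without a model or a cloned repository.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], str]
Decision = Tuple[str, str]  # (tool name, tool argument)


def run_agent(
    plan: List[str],
    decide: Callable[[List[str], List[str]], Decision],
    tools: Dict[str, Tool],
    max_steps: int = 6,
) -> List[str]:
    """Loop Reason -> Act -> Observe until the policy finishes or steps run out."""
    observations: List[str] = []
    for _ in range(max_steps):
        tool_name, tool_arg = decide(plan, observations)  # Reason
        if tool_name == "finish":
            break
        tool = tools.get(tool_name)
        result = tool(tool_arg) if tool else f"unknown tool: {tool_name}"  # Act
        observations.append(f"{tool_name}({tool_arg!r}) -> {result}")  # Observe
    return observations


# Hypothetical two-file repository and stub versions of the three tools.
REPO = {"auth.py": "def login(user): ...", "db.py": "def connect(url): ..."}
TOOLS: Dict[str, Tool] = {
    "list_files": lambda _arg: ", ".join(sorted(REPO)),
    "search_code": lambda term: ", ".join(
        sorted(path for path, src in REPO.items() if term in src)
    ) or "no matches",
    "open_file": lambda path: REPO.get(path, "file not found"),
}


def scripted_decide(plan: List[str], observations: List[str]) -> Decision:
    """Stand-in for the LLM: replay a fixed script, then signal completion."""
    script = [("list_files", ""), ("search_code", "login"), ("open_file", "auth.py")]
    step = len(observations)
    return script[step] if step < len(script) else ("finish", "")


observations = run_agent(["Locate the login entry point"], scripted_decide, TOOLS)
for line in observations:
    print(line)
```

In the real system the accumulated observations would be handed to the synthesizer LLM; the `max_steps` cap is there so a confused policy cannot loop forever.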
---
## 🛠️ Technical Stack
* **UI**: [Streamlit](https://streamlit.io/)
* **Agent System**: Custom ReAct Agent (`planner.py` + `agent.py`)
* **Tool System**: `tools.py` (`search_code`, `open_file`, `list_files`)
* **RAG Pipeline**: [LangChain](https://www.langchain.com/) with Hybrid Search
* **Vector Database**: [ChromaDB](https://www.trychroma.com/)
* **Embeddings**: `intfloat/e5-small-v2` or `all-MiniLM-L6-v2` (Local CPU)
* **LLM Engine**: `LongCat-Flash-Lite` via OpenAI-compatible API
* **Repo Loader**: `GitLoader` with multi-branch auto-detection (main/master)
---
## 📁 Project Structure
```
codelens/
├── app.py            # Streamlit UI with Quick Search + Agentic Search
├── rag_pipeline.py   # Hybrid RAG: ChromaDB + Grep search + LLM re-ranker
├── planner.py        # LLM-based query planner → JSON step list
├── agent.py          # ReAct execution loop + Bug Detection mode
├── tools.py          # search_code, open_file, list_files
├── utils_llm.py      # Shared LLM instance factory
└── requirements.txt
```
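
Since `planner.py` turns the model's reply into a JSON step list, the parsing side might look something like the sketch below. The exact schema is not documented here, so this assumes the plan is a JSON array of step strings; `parse_plan`, its parameters, and the single-step fallback are hypothetical.

```python
import json
import re


def parse_plan(raw_reply, fallback_query, max_steps=5):
    """Pull a JSON array of step strings out of a model reply, or fall back to one step."""
    # Models often wrap JSON in extra prose, so grab the outermost [...] span.
    match = re.search(r"\[.*\]", raw_reply, flags=re.DOTALL)
    if match:
        try:
            steps = json.loads(match.group(0))
        except json.JSONDecodeError:
            steps = None
        if isinstance(steps, list):
            cleaned = [str(step).strip() for step in steps if str(step).strip()]
            if cleaned:
                return cleaned[:max_steps]
    # Unparseable or empty plan: degrade to a single search step.
    return [f"Search the codebase for: {fallback_query}"]


reply = 'Here is the plan: ["List the files", "Search for login", "Open auth.py"]'
print(parse_plan(reply, "authentication flow"))
print(parse_plan("Sorry, I cannot plan this.", "authentication flow"))
```

Falling back to one plain search step keeps the agent usable when the model returns malformed JSON, at the cost of a shallower investigation.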
---
## 🛰️ Deployment on Hugging Face Spaces
### 1. Configure Environment Secrets
Go to your HF Space **Settings** → **Variables and secrets**:
- **Key**: `OPENAI_API_KEY`
- **Value**: *Your LongCat/OpenAI API Key*
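
Spaces exposes secrets to the container as environment variables, so the app can read the key at startup. A minimal sketch of that lookup, assuming a hypothetical `load_llm_settings` helper; the `LLM_BASE_URL` and `LLM_MODEL` variable names are invented for the example and are not settings this project is known to read:

```python
import os


def load_llm_settings(env=None):
    """Collect LLM connection settings from the environment, failing early if the key is missing."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; add it under Settings -> Variables and secrets."
        )
    return {
        "api_key": api_key,
        "base_url": env.get("LLM_BASE_URL"),  # hypothetical: OpenAI-compatible endpoint
        "model": env.get("LLM_MODEL", "LongCat-Flash-Lite"),  # hypothetical override
    }


# Passing a dict instead of os.environ keeps the example self-contained.
settings = load_llm_settings({"OPENAI_API_KEY": "sk-example"})
print(settings["model"])
```

Failing at startup with a clear message is usually easier to debug on Spaces than an authentication error surfacing in the middle of a query.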
### 2. Dockerfile
The Space uses the Docker SDK (`sdk: docker` in the front matter). The `Dockerfile` is already configured to run the Streamlit app on port 7860.
---
## 🛠️ Local Installation
1. **Clone this repository**:
```bash
git clone https://github.com/your-username/codelens.git
cd codelens
```
2. **Install dependencies**:
```bash
pip install -r requirements.txt
```
3. **Run the application**:
```bash
streamlit run app.py
```
---
## 📖 How to Use
1. **Enter a public GitHub URL** in the sidebar: the app clones and indexes the repo.
2. **Wait for indexing**: roughly 2-5 minutes for 1,000 chunks on a standard CPU. The index is cached for subsequent loads.
3. **Quick Search**: a fast, single-step hybrid RAG answer.
4. **Agentic Search**: a multi-step investigation in which the agent plans, searches, reads files, and synthesizes an answer.
5. **Bug Detection**: include "bug", "error", "security", or "vulnerability" in your query for a full bug report.
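
The keyword trigger for Bug Detection could be as simple as the sketch below. It uses the four keywords listed in step 5; the `pick_mode` function, the mode names, and the word-start matching rule (so that "debug" does not count as "bug") are assumptions, not a description of what `agent.py` does.

```python
import re

BUG_KEYWORDS = ("bug", "error", "security", "vulnerability")
# \b anchors each keyword to the start of a word: "bugs" matches, "debug" does not.
_BUG_PATTERN = re.compile(r"\b(" + "|".join(BUG_KEYWORDS) + r")", re.IGNORECASE)


def pick_mode(query):
    """Return 'bug_detection' when the query mentions a trigger keyword, else 'standard'."""
    return "bug_detection" if _BUG_PATTERN.search(query) else "standard"


for query in (
    "Find potential security bugs in the input validation",
    "Trace the full user authentication flow",
    "How do I debug the loader?",
):
    print(f"{pick_mode(query):>13}  {query}")
```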
### Example Queries
- *"Trace the full user authentication flow"*
- *"How is the database connected and what ORM is used?"*
- *"Find potential security bugs in the input validation"*
- *"What design patterns are used in this project?"*
---
## 💬 Interview Summary
> *"I built a ReAct-style agentic system where an LLM planner decomposes queries into steps, then an execution agent iteratively calls tools (semantic search, file reading, directory listing), accumulating context before synthesizing a final answer. It's grounded by a hybrid RAG pipeline using ChromaDB and grep-based keyword search, with an LLM re-ranker to filter results."*
---
### 🐱 Built with ❤️ using LangChain, ChromaDB, and Streamlit
*Intelligent Code Understanding, powered by Agentic AI.*