---
title: AI-CodeLens
emoji: 🔍
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: false
---

# 🔍 CodeLens: Agentic Code Intelligence

CodeLens is a ReAct-style agentic RAG system for exploring and understanding public GitHub codebases. Instead of answering from a single search, it plans an investigation, calls tools to search and read the code, and synthesizes an answer grounded in what it found.

---

## 🚀 Key Features

* 🧠 Agentic Brain (Plan & Execute): An LLM planner breaks your query into a multi-step investigation plan. An execution agent then iteratively calls tools, accumulates context, and synthesizes a final answer.
* 🔀 Hybrid RAG Search: Combines ChromaDB semantic similarity search with grep-style keyword matching, so retrieval catches both conceptually related code and exact identifier matches.
* 🔧 Tool-Calling Agent Loop: At each step the agent autonomously chooses among the `search_code`, `open_file`, and `list_files` tools, based on what it has found so far.
* 🐛 Bug Detection Mode: Including "bug", "error", "security", or "vulnerability" in your query switches the agent to a specialized Security Engineer mode that produces a full bug/vulnerability report with recommended fixes.
* 🤖 LLM Re-Ranker: Uses the LLM itself to filter retrieved code chunks, prioritizing actual logic over boilerplate and UI wrappers.
* 💾 Persistent Caching: Saves the vector store to disk, so reloading a previously indexed repo skips re-indexing.
* 🌐 Multi-Language Support: Python, JS/TS (including TSX/JSX), Go, Rust, C++, Java, and more.
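
The README does not show how the LLM re-ranker is prompted. As a minimal sketch, assuming one pointwise relevance score per chunk, it could look like the following; `llm_rerank`, `ask_llm`, the 0 to 10 scale, and the prompt wording are all hypothetical, and `ask_llm` stands in for whatever OpenAI-compatible chat call the app actually makes.

```python
import re
from typing import Callable, List


def llm_rerank(query: str, chunks: List[str], ask_llm: Callable[[str], str], keep: int = 3) -> List[str]:
    """Score each chunk with the LLM and keep the `keep` highest-scoring ones."""
    scored = []
    for index, chunk in enumerate(chunks):
        # Hypothetical prompt; the real re-ranker's wording is not documented.
        prompt = (
            "Rate from 0 to 10 how useful this code is for answering the question. "
            "Prefer real logic over boilerplate, imports, and UI wrappers. "
            "Reply with a number only.\n"
            f"Question: {query}\nCode:\n{chunk}"
        )
        match = re.search(r"\d+(?:\.\d+)?", ask_llm(prompt))
        score = float(match.group()) if match else 0.0  # unparseable reply -> lowest score
        scored.append((score, index, chunk))
    # Highest score first; ties keep the retriever's original order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [chunk for _, _, chunk in scored[:keep]]


# Usage with a stand-in "LLM" that simply favors function definitions.
def fake_llm(prompt: str) -> str:
    code = prompt.split("Code:\n", 1)[1]
    return "9" if "def " in code else "2"


chunks = ["import os", "def authenticate(user): ...", "st.title('Login')"]
print(llm_rerank("How does login work?", chunks, fake_llm, keep=1))
# → ['def authenticate(user): ...']
```

Scoring one chunk per call keeps reply parsing trivial but costs one LLM round-trip per chunk; packing all candidates into a single prompt is the cheaper alternative when latency matters.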

---

## 🏗️ Architecture

```
User Query
   ↓
Planner (LLM) → generates a step-by-step investigation plan
   ↓
Execution Agent Loop:
   → Reason: decide which tool to use next
   → Act: call search_code / open_file / list_files
   → Observe: add the tool result to the accumulated context
   ↓
Synthesizer (LLM) → final answer grounded in the retrieved code
```

This combines the ReAct (Reason + Act) pattern with Plan & Execute, two widely used agentic AI patterns.
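
As a minimal sketch, assuming the planner's steps arrive as a list of strings and the LLM calls are wrapped in plain callables, the Reason/Act/Observe loop above could be written like this. `run_agent`, `decide`, `synthesize`, and the `max_steps` cap are hypothetical names, not the actual API of `agent.py`; only the three tool names come from this README.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (tool_name, argument), or None when the model decides the current step is done.
Action = Optional[Tuple[str, str]]


def run_agent(
    query: str,
    plan: List[str],
    decide: Callable[[str, str, List[str]], Action],
    tools: Dict[str, Callable[[str], str]],
    synthesize: Callable[[str, List[str]], str],
    max_steps: int = 8,
) -> str:
    context: List[str] = []
    steps_taken = 0
    for step in plan:
        while steps_taken < max_steps:
            action = decide(step, query, context)  # Reason: pick the next tool, or stop
            if action is None:
                break
            tool_name, arg = action
            tool = tools.get(tool_name)
            result = tool(arg) if tool else f"Unknown tool: {tool_name}"  # Act
            context.append(f"[{tool_name}({arg!r})]\n{result}")  # Observe
            steps_taken += 1
    return synthesize(query, context)  # final answer from the accumulated context


# Usage with a fake two-file repo and a scripted stand-in for the LLM.
fake_repo = {"app.py": "import streamlit as st", "agent.py": "def run(): ..."}
tools = {
    "list_files": lambda _: "\n".join(sorted(fake_repo)),
    "open_file": lambda path: fake_repo.get(path, "not found"),
    "search_code": lambda q: "\n".join(p for p, src in fake_repo.items() if q in src),
}


def scripted_decide(step: str, query: str, context: List[str]) -> Action:
    if not context:
        return ("list_files", "")
    if len(context) == 1:
        return ("open_file", "app.py")
    return None


answer = run_agent(
    "What UI framework is used?",
    plan=["Find the entry point"],
    decide=scripted_decide,
    tools=tools,
    synthesize=lambda q, ctx: f"{len(ctx)} observations gathered",
)
print(answer)  # → 2 observations gathered
```

The step cap matters in practice: without it, a model that keeps choosing tools would never reach the synthesis stage.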

---

## 🛠️ Technical Stack

* UI: [Streamlit](https://streamlit.io/)
* Agent System: Custom ReAct agent (`planner.py` + `agent.py`)
* Tool System: `tools.py` (`search_code`, `open_file`, `list_files`)
* RAG Pipeline: [LangChain](https://www.langchain.com/) with hybrid search
* Vector Database: [ChromaDB](https://www.trychroma.com/)
* Embeddings: `intfloat/e5-small-v2` or `all-MiniLM-L6-v2`, run locally on CPU
* LLM Engine: `LongCat-Flash-Lite` via an OpenAI-compatible API
* Repo Loader: `GitLoader` with automatic default-branch detection (`main` or `master`)
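
The README does not say how the semantic and keyword result lists are merged. One common option is reciprocal rank fusion (RRF), sketched below under that assumption; `keyword_search` is a naive term-count scorer standing in for the grep step, and the semantic ranking is hard-coded where ChromaDB results would normally go. The function names and the `k=60` constant are illustrative, not taken from `rag_pipeline.py`.

```python
import re
from collections import defaultdict
from typing import Dict, List


def keyword_search(chunks: Dict[str, str], query: str, top_k: int = 5) -> List[str]:
    """Grep-style retrieval: rank chunk ids by how often the query terms occur verbatim."""
    terms = [term.lower() for term in re.findall(r"\w+", query)]
    scored = []
    for chunk_id, text in chunks.items():
        lowered = text.lower()
        hits = sum(lowered.count(term) for term in terms)
        if hits:
            scored.append((hits, chunk_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [chunk_id for _, chunk_id in scored[:top_k]]


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Merge ranked id lists: each list adds 1 / (k + rank) to every id it contains."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


chunks = {
    "auth.py#1": "def login(user, password): verify_password(password)",
    "db.py#1": "engine = create_engine(DATABASE_URL)",
    "ui.py#1": "st.button('Login')",
}
semantic_hits = ["db.py#1", "auth.py#1"]  # placeholder for ChromaDB similarity results
keyword_hits = keyword_search(chunks, "verify_password login")
print(reciprocal_rank_fusion([semantic_hits, keyword_hits]))
# → ['auth.py#1', 'db.py#1', 'ui.py#1']
```

RRF only looks at ranks, which sidesteps the fact that cosine similarities and keyword hit counts live on different scales; a chunk found by both retrievers (`auth.py#1` here) rises to the top.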

---

## 📁 Project Structure

```
codelens/
├── app.py            # Streamlit UI with Quick Search + Agentic Search
├── rag_pipeline.py   # Hybrid RAG: ChromaDB + grep search + LLM re-ranker
├── planner.py        # LLM-based query planner → JSON step list
├── agent.py          # ReAct execution loop + Bug Detection mode
├── tools.py          # search_code, open_file, list_files
├── utils_llm.py      # Shared LLM instance factory
└── requirements.txt
```
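
`planner.py` is described as turning the query into a JSON step list. A minimal sketch of the parsing side, assuming the model replies with either a JSON array of step strings or an object with a `steps` key (a hypothetical schema), might look like this. It also tolerates the Markdown code fence models often wrap JSON in, and falls back to one step per line when the reply is not JSON at all; `parse_plan` and `max_steps` are illustrative names.

```python
import json
import re
from typing import List

# Matches a Markdown code fence (three backticks, optional "json" tag), written as `{3}.
_FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def parse_plan(raw: str, max_steps: int = 6) -> List[str]:
    """Turn the planner LLM's reply into a list of step strings."""
    fenced = _FENCE.search(raw)
    text = fenced.group(1) if fenced else raw
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Not JSON: treat each non-empty line (minus bullet markers) as a step.
        lines = [line.strip().strip("-* ").strip() for line in text.splitlines()]
        return [line for line in lines if line][:max_steps]
    if isinstance(data, dict):
        data = data.get("steps", [])
    if not isinstance(data, list):
        return []
    steps = [str(step).strip() for step in data]
    return [step for step in steps if step][:max_steps]


fence = "`" * 3  # built this way only to keep the example readable inside a README
reply = "Here is the plan:\n" + fence + 'json\n{"steps": ["List files", "Open app.py"]}\n' + fence
print(parse_plan(reply))  # → ['List files', 'Open app.py']
print(parse_plan("- Search for login\n- Open auth.py"))  # → ['Search for login', 'Open auth.py']
```

Capping the number of steps keeps a verbose planner from turning one question into a long, expensive investigation.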

---

## 🛰️ Deployment on Hugging Face Spaces

### 1. Configure Environment Secrets
In your Space, go to Settings → Variables and secrets and add a secret:
- Key: `OPENAI_API_KEY`
- Value: your LongCat/OpenAI API key
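
A minimal sketch of how that secret could be read once and shared, in the spirit of `utils_llm.py`'s "shared LLM instance factory". Hugging Face exposes Space secrets to the running app as environment variables, so this only builds a settings object from `OPENAI_API_KEY`. `LLMSettings`, `get_llm_settings`, and passing the model name straight through as the API model id are assumptions; the provider base URL is deliberately left out because this README does not state it.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class LLMSettings:
    api_key: str
    model: str


@lru_cache(maxsize=None)
def get_llm_settings(model: str = "LongCat-Flash-Lite") -> LLMSettings:
    """Read the API key once per model name; later calls reuse the cached object."""
    api_key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        # Fail early with a clear message instead of an auth error inside the first LLM call.
        raise RuntimeError(
            "OPENAI_API_KEY is not set. On Hugging Face, add it under "
            "Settings → Variables and secrets; locally, export it in your shell."
        )
    return LLMSettings(api_key=api_key, model=model)
```

In the app, these settings would then be handed to whatever OpenAI-compatible client `utils_llm.py` constructs, so every module shares one configuration instead of reading the environment separately.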

### 2. Dockerfile
The Space uses the Docker SDK (`sdk: docker`). The included `Dockerfile` runs the Streamlit app on port 7860, the port Hugging Face Docker Spaces expect by default.

---

## 🛠️ Local Installation

1. Clone this repository:
   ```bash
   git clone https://github.com/your-username/codelens.git
   cd codelens
   ```

2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```

3. Set `OPENAI_API_KEY` in your environment (the same key as in the deployment step above), then run the application:
   ```bash
   streamlit run app.py
   ```

---

## 📖 How to Use

1. Enter a public GitHub URL in the sidebar. The app clones and indexes the repository.
2. Wait for indexing: expect about 2 to 5 minutes for 1,000 chunks on a standard CPU. The index is cached, so later loads of the same repo skip this step.
3. Quick Search: a fast, single-step hybrid RAG answer.
4. Agentic Search: a multi-step investigation in which the agent plans, searches, reads files, and synthesizes an answer.
5. Bug Detection: include "bug", "error", "security", or "vulnerability" in your query to get a full bug report.
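
Steps 2 and 5 rely on two small pieces of logic: a stable cache location per repository and a keyword check that switches on Bug Detection. A minimal sketch of both follows; `cache_dir_for`, `select_mode`, the `vector_cache` folder, and the whole-word matching rule are assumptions rather than the project's actual code, and the keyword list mirrors step 5.

```python
import hashlib
import re
from pathlib import Path

# Whole-word, case-insensitive match on the trigger words from step 5 (plus plurals).
_BUG_PATTERN = re.compile(
    r"\b(?:bugs?|errors?|security|vulnerabilit(?:y|ies))\b", re.IGNORECASE
)


def select_mode(query: str) -> str:
    """Return "bug_report" when a trigger word appears, else the normal "explain" mode."""
    return "bug_report" if _BUG_PATTERN.search(query) else "explain"


def cache_dir_for(repo_url: str, root: str = "vector_cache") -> Path:
    """Map a repo URL to a stable folder so a re-submitted URL reuses its index."""
    normalized = repo_url.strip().rstrip("/").lower()
    if normalized.endswith(".git"):
        normalized = normalized[: -len(".git")]
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]
    return Path(root) / digest


print(select_mode("Find potential security bugs in the input validation"))  # → bug_report
print(select_mode("Explain the debugger setup"))  # → explain
print(cache_dir_for("https://github.com/Org/Repo.git") == cache_dir_for("https://github.com/org/repo/"))  # → True
```

Whole-word matching keeps words like "debugger" from accidentally switching modes, which a plain substring check on "bug" would do. Normalizing the URL before hashing means trivially different spellings of the same repo hit the same cache folder.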

### Example Queries
- "Trace the full user authentication flow"
- "How is the database connected and what ORM is used?"
- "Find potential security bugs in the input validation"
- "What design patterns are used in this project?"

---

## 💬 Interview Summary

> "I built a ReAct-style agentic system where an LLM planner decomposes queries into steps, then an execution agent iteratively calls tools (semantic search, file reading, directory listing), accumulating context before synthesizing a final answer. It's grounded by a hybrid RAG pipeline using ChromaDB and grep-based keyword search, with an LLM re-ranker to filter results."

---

### 🐱 Built with ❤️ using LangChain, ChromaDB, and Streamlit
Intelligent Code Understanding, powered by Agentic AI.