title: AI-CodeLens
emoji: π
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: false
CodeLens: Agentic Code Intelligence
CodeLens is a ReAct-style Agentic RAG system for understanding any GitHub codebase. It goes beyond simple search: it thinks, plans, and investigates codebases like a senior developer would.
Key Features
- Agentic Brain (Plan & Execute): An LLM planner breaks your query into a multi-step investigation plan. An execution agent then iteratively calls tools, accumulates context, and synthesizes a final answer.
- Hybrid RAG Search: Combines ChromaDB semantic similarity with Grep-style keyword matching for maximum code retrieval accuracy.
- Tool-Calling Agent Loop: The agent autonomously uses the search_code, open_file, and list_files tools, deciding which to use at each step based on what it has found.
- Bug Detection Mode: Mention "bug", "error", or "security" in your query to trigger a specialized Security Engineer mode that produces a full vulnerability/bug report with recommended fixes.
- LLM Re-Ranker: Filters retrieved code chunks using the LLM itself, prioritizing actual logic over boilerplate and UI wrappers.
- Persistent Caching: Saves vector store state to disk, so re-loading a previously indexed repo is instant.
- Full Stack Support: Python, JS/TS (TSX/JSX), Go, Rust, C++, Java, and more.
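To make the Hybrid RAG Search idea concrete, here is a minimal sketch of how a semantic ranking and a grep-style keyword ranking could be merged. It is not the code in rag_pipeline.py: the function names are hypothetical, the semantic hits are a hard-coded stand-in for a ChromaDB query, and reciprocal rank fusion is an assumed merge strategy that the README does not confirm.

```python
import re
from collections import defaultdict


def keyword_search(chunks: dict[str, str], query: str, limit: int = 5) -> list[str]:
    """Grep-style search: rank chunk ids by how many query terms occur in the text."""
    terms = [t.lower() for t in re.findall(r"\w+", query)]
    scored = []
    for chunk_id, text in chunks.items():
        lowered = text.lower()
        hits = sum(1 for term in terms if term in lowered)
        if hits:
            scored.append((hits, chunk_id))
    # Most matching terms first; ties broken by chunk id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [chunk_id for _, chunk_id in scored[:limit]]


def fuse_rankings(semantic: list[str], keyword: list[str], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: each list adds 1 / (k + rank) to a chunk's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in (semantic, keyword):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Tiny in-memory corpus standing in for indexed code chunks.
chunks = {
    "auth.py:1": "def login(user, password): ...",
    "db.py:1": "def connect(url): ...",
    "ui.py:1": "st.button('Login')",
}
semantic_hits = ["db.py:1", "auth.py:1"]  # assumption: output of a vector store query
keyword_hits = keyword_search(chunks, "login password")
merged = fuse_rankings(semantic_hits, keyword_hits)
```

A chunk that appears in both rankings (auth.py:1 here) rises to the top, which is the practical benefit of combining the two retrieval signals.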
Architecture
User Query
    ↓
Planner (LLM) → generates step-by-step investigation plan
    ↓
Execution Agent Loop:
    → Reason: decides which tool to use
    → Act: search_code / open_file / list_files
    → Observe: accumulates tool results as context
    ↓
Synthesizer (LLM) → final answer grounded in actual code
This is a ReAct (Reason + Act) pattern combined with Plan & Execute, two widely used agentic AI patterns.
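The flow above can be sketched as a small loop. This is an illustration under stated assumptions, not the contents of planner.py or agent.py: run_agent and its parameters are hypothetical names, the LLM calls are injected as plain callables so the example runs without an API key, and the real agent may decide its next action more dynamically than one tool call per planned step.

```python
from typing import Callable

Tool = Callable[[str], str]


def run_agent(
    query: str,
    plan: Callable[[str], list[str]],
    choose_action: Callable[[str, list[str]], tuple[str, str]],
    synthesize: Callable[[str, list[str]], str],
    tools: dict[str, Tool],
    max_steps: int = 8,
) -> str:
    """Plan once, then reason/act/observe for each step, then synthesize."""
    observations: list[str] = []
    for step in plan(query)[:max_steps]:
        # Reason: pick a tool and its argument for this step.
        tool_name, argument = choose_action(step, observations)
        tool = tools.get(tool_name)
        if tool is None:
            observations.append(f"[{step}] unknown tool: {tool_name}")
            continue
        # Act, then Observe: keep the tool result as context for later steps.
        result = tool(argument)
        observations.append(f"[{step}] {tool_name}({argument!r}) -> {result}")
    return synthesize(query, observations)


# Stub planner, reasoner, and synthesizer standing in for LLM calls.
def fake_plan(query: str) -> list[str]:
    return ["find login code", "open the file"]


def fake_choose(step: str, observations: list[str]) -> tuple[str, str]:
    return ("search_code", "login") if not observations else ("open_file", "auth.py")


def fake_synthesize(query: str, observations: list[str]) -> str:
    return f"{len(observations)} observations for: {query}"


stub_tools = {
    "search_code": lambda needle: "auth.py",
    "open_file": lambda path: "def login(): ...",
}
answer = run_agent("How does login work?", fake_plan, fake_choose, fake_synthesize, stub_tools)
```

Capping the loop with max_steps is a common safeguard so that a long or malformed plan cannot trigger an unbounded number of tool calls.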
Technical Stack
- UI: Streamlit
- Agent System: Custom ReAct Agent (planner.py + agent.py)
- Tool System: tools.py → search_code, open_file, list_files
- RAG Pipeline: LangChain with Hybrid Search
- Vector Database: ChromaDB
- Embeddings: intfloat/e5-small-v2 or all-MiniLM-L6-v2 (Local CPU)
- LLM Engine: LongCat-Flash-Lite via OpenAI-compatible API
- Repo Loader: GitLoader with multi-branch auto-detection (main/master)
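As a rough picture of what the three tools might look like over a cloned repository on disk, here is a standard-library sketch. The signatures, the suffix list, and the path-escape check are assumptions for illustration; the actual tools.py may differ, and its search_code presumably also queries the vector store rather than only scanning lines.

```python
import tempfile
from pathlib import Path

# Assumed subset of the file types the README lists under Full Stack Support.
CODE_SUFFIXES = (".py", ".js", ".ts", ".tsx", ".jsx", ".go", ".rs", ".cpp", ".java")


def list_files(repo_root: str) -> list[str]:
    """Return repo-relative paths of source files, skipping the .git directory."""
    root = Path(repo_root)
    found = []
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        if path.is_file() and path.suffix in CODE_SUFFIXES and ".git" not in rel.parts:
            found.append(rel.as_posix())
    return sorted(found)


def open_file(repo_root: str, relative_path: str, max_chars: int = 4000) -> str:
    """Read a file inside the repo, truncated so it fits in an LLM prompt."""
    root = Path(repo_root).resolve()
    target = (root / relative_path).resolve()
    # Refuse paths that escape the repository root (e.g. "../secrets").
    if root != target and root not in target.parents:
        raise ValueError(f"path outside repository: {relative_path}")
    return target.read_text(encoding="utf-8", errors="replace")[:max_chars]


def search_code(repo_root: str, needle: str, limit: int = 20) -> list[str]:
    """Case-insensitive line search, returned as 'path:line: text' strings."""
    matches = []
    for rel in list_files(repo_root):
        lines = open_file(repo_root, rel, max_chars=10**6).splitlines()
        for lineno, line in enumerate(lines, start=1):
            if needle.lower() in line.lower():
                matches.append(f"{rel}:{lineno}: {line.strip()}")
                if len(matches) >= limit:
                    return matches
    return matches


# Demo against a throwaway directory instead of a real clone.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "auth.py").write_text("def login(user):\n    return True\n", encoding="utf-8")
    (Path(tmp) / "notes.txt").write_text("login notes\n", encoding="utf-8")
    files = list_files(tmp)
    hits = search_code(tmp, "LOGIN")
    try:
        open_file(tmp, "../outside.py")
        escaped = True
    except ValueError:
        escaped = False
```

Returning plain strings keeps tool output easy to append to the agent's accumulated context.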
Project Structure
codelens/
├── app.py              # Streamlit UI with Quick Search + Agentic Search
├── rag_pipeline.py     # Hybrid RAG: ChromaDB + Grep search + LLM re-ranker
├── planner.py          # LLM-based query planner → JSON step list
├── agent.py            # ReAct execution loop + Bug Detection mode
├── tools.py            # search_code, open_file, list_files
├── utils_llm.py        # Shared LLM instance factory
└── requirements.txt
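Since planner.py is described as turning a query into a JSON step list, a defensive parser for the model's reply is one piece worth sketching. The helper below is hypothetical (parse_plan is not a name taken from the project) and assumes the planner is asked to reply with a JSON array of strings; models often wrap such output in extra text, so the sketch extracts the array and falls back to a single step when parsing fails.

```python
import json
import re


def parse_plan(raw: str, fallback_query: str, max_steps: int = 6) -> list[str]:
    """Extract a list of step strings from an LLM reply, with a safe fallback."""
    # Grab the span from the first "[" to the last "]", ignoring chatter around it.
    match = re.search(r"\[.*\]", raw, flags=re.DOTALL)
    if match:
        try:
            data = json.loads(match.group(0))
        except json.JSONDecodeError:
            data = None
        if isinstance(data, list):
            steps = [str(item).strip() for item in data if str(item).strip()]
            if steps:
                return steps[:max_steps]
    # Fallback: treat the whole query as a single investigation step.
    return [fallback_query]


reply = 'Here is the plan: ["Find the auth module", "Open the login handler"]'
steps = parse_plan(reply, fallback_query="Trace the full user authentication flow")
```

The fallback means a malformed plan degrades to an ordinary single-step search instead of an error in the UI.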
Deployment on Hugging Face Spaces
1. Configure Environment Secrets
Go to your HF Space Settings → Variables and secrets:
- Key: OPENAI_API_KEY
- Value: Your LongCat/OpenAI API Key
2. Dockerfile
The project uses the Docker SDK. The Dockerfile is already configured to run the Streamlit app on port 7860.
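For reference, this is roughly how the app side could pick up that secret at startup. It is a sketch, not the contents of utils_llm.py: load_llm_settings is a hypothetical helper, and LLM_BASE_URL and LLM_MODEL are assumed variable names that this README does not define (only OPENAI_API_KEY is documented).

```python
import os


def load_llm_settings(env=None) -> dict:
    """Collect LLM connection settings from environment variables."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; add it under the Space's Variables and secrets."
        )
    return {
        "api_key": api_key,
        # Hypothetical variables: the README only documents OPENAI_API_KEY.
        "base_url": env.get("LLM_BASE_URL") or None,
        "model": env.get("LLM_MODEL", "LongCat-Flash-Lite"),
    }


# Passing a dict instead of os.environ keeps the example runnable anywhere.
settings = load_llm_settings({"OPENAI_API_KEY": " sk-example "})
```

Failing fast with a clear message is friendlier than letting the first LLM call fail with an authentication error deep inside the agent loop.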
Local Installation
Clone this repository:
    git clone https://github.com/your-username/codelens.git
    cd codelens
Install dependencies:
    pip install -r requirements.txt
Run the application:
    streamlit run app.py
How to Use
- Enter a Public GitHub URL in the sidebar: the bot will clone and index the repo.
- Wait for Indexing: ~2-5 min for 1,000 chunks on standard CPU. Cached on subsequent loads.
- Quick Search: Fast single-step hybrid RAG answer.
- Agentic Search: Multi-step investigation in which the agent plans, searches, reads files, and synthesizes.
- Bug Detection: Include "bug", "error", "security", or "vulnerability" in your query for a full bug report.
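The keyword trigger for Bug Detection can be pictured as a small routing function. This sketch assumes a simple case-insensitive substring check and uses hypothetical names (select_mode and the mode strings); the actual logic in agent.py and app.py may be stricter.

```python
BUG_KEYWORDS = ("bug", "error", "security", "vulnerability")


def select_mode(query: str, agentic: bool = True) -> str:
    """Pick a mode name from the query text and the chosen search type."""
    lowered = query.lower()
    # Assumption: any keyword appearing anywhere in the query enables bug detection.
    if any(keyword in lowered for keyword in BUG_KEYWORDS):
        return "bug_detection"
    return "agentic" if agentic else "quick"


mode_security = select_mode("Find potential security bugs in the input validation")
mode_trace = select_mode("Trace the full user authentication flow")
mode_quick = select_mode("What design patterns are used in this project?", agentic=False)
```

A plain substring check also fires on words such as "debug", so a word-boundary match may be preferable if false triggers become a problem.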
Example Queries
- "Trace the full user authentication flow"
- "How is the database connected and what ORM is used?"
- "Find potential security bugs in the input validation"
- "What design patterns are used in this project?"
Interview Summary
"I built a ReAct-style agentic system where an LLM planner decomposes queries into steps, then an execution agent iteratively calls tools (semantic search, file reading, directory listing), accumulating context before synthesizing a final answer. It's grounded by a hybrid RAG pipeline using ChromaDB and grep-based keyword search, with an LLM re-ranker to filter results."
Built with ❤️ using LangChain, ChromaDB, and Streamlit
Intelligent Code Understanding, powered by Agentic AI.