metadata
title: AI-CodeLens
emoji: 🔍
colorFrom: indigo
colorTo: blue
sdk: docker
pinned: false

🔍 CodeLens: Agentic Code Intelligence

CodeLens is a ReAct-style Agentic RAG system for understanding any public GitHub codebase. It goes beyond simple search: it plans an investigation, calls tools, and reads code the way a senior developer would.


🚀 Key Features

  • 🧠 Agentic Brain (Plan & Execute): An LLM planner breaks your query into a multi-step investigation plan. An execution agent then iteratively calls tools, accumulates context, and synthesizes a final answer.
  • 🔀 Hybrid RAG Search: Combines ChromaDB semantic similarity with grep-style keyword matching to improve code retrieval accuracy.
  • 🔧 Tool-Calling Agent Loop: The agent autonomously uses the search_code, open_file, and list_files tools, deciding which to use at each step based on what it has found so far.
  • 🐛 Bug Detection Mode: Mention "bug", "error", "security", or "vulnerability" in your query to trigger a specialized Security Engineer mode that produces a full vulnerability/bug report with recommended fixes.
  • 🤖 LLM Re-Ranker: Filters retrieved code chunks using the LLM itself, prioritizing actual logic over boilerplate and UI wrappers.
  • 💾 Persistent Caching: Saves vector store state to disk, so re-loading a previously indexed repo is instant.
  • 🌐 Full Stack Support: Python, JS/TS (TSX/JSX), Go, Rust, C++, Java, and more.
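
The hybrid search idea can be illustrated with a small, dependency-free sketch. It is not the code in rag_pipeline.py: keyword_search and merge_hits are hypothetical names, the semantic hits are assumed to arrive as an already-ranked list of chunk ids (for example from ChromaDB), and simple interleaving is an assumed merge strategy.

```python
import re
from typing import Iterable


def keyword_search(chunks: dict[str, str], query: str, limit: int = 5) -> list[str]:
    """Grep-style lookup: rank chunk ids by how many query terms they contain."""
    terms = [t for t in re.findall(r"\w+", query.lower()) if len(t) > 2]
    scored = []
    for chunk_id, text in chunks.items():
        lowered = text.lower()
        hits = sum(1 for term in terms if term in lowered)
        if hits:
            scored.append((hits, chunk_id))
    # Highest hit count first; ties are broken by chunk id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [chunk_id for _, chunk_id in scored[:limit]]


def merge_hits(semantic: Iterable[str], keyword: Iterable[str]) -> list[str]:
    """Interleave two ranked id lists (assumed strategy), dropping duplicates."""
    sem, key = list(semantic), list(keyword)
    merged: list[str] = []
    seen: set[str] = set()
    for i in range(max(len(sem), len(key))):
        for ranked in (sem, key):
            if i < len(ranked) and ranked[i] not in seen:
                seen.add(ranked[i])
                merged.append(ranked[i])
    return merged
```

Interleaving keeps the top result of each retriever near the front, which is one simple way to avoid either signal drowning out the other before the re-ranker runs.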

🏗️ Architecture

User Query
    ↓
Planner (LLM) → generates step-by-step investigation plan
    ↓
Execution Agent Loop:
    → Reason: decides which tool to use
    → Act: search_code / open_file / list_files
    → Observe: accumulates tool results as context
    ↓
Synthesizer (LLM) → final answer grounded in actual code

This is a ReAct (Reason + Act) pattern combined with Plan & Execute, two foundational agentic AI patterns.
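
The Reason / Act / Observe loop can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code in agent.py: run_agent, decide, and synthesize are hypothetical names, and the LLM calls are replaced by plain callables so the control flow is visible.

```python
from typing import Callable

Tool = Callable[[str], str]


def run_agent(
    plan: list[str],
    decide: Callable[[str, list[str]], tuple[str, str]],
    tools: dict[str, Tool],
    synthesize: Callable[[list[str]], str],
    max_steps: int = 8,
) -> str:
    """Walk the plan, calling one tool per step and accumulating observations."""
    context: list[str] = []
    for step in plan[:max_steps]:
        # Reason: pick a tool name and its argument for this step.
        tool_name, tool_arg = decide(step, context)
        tool = tools.get(tool_name)
        # Act: run the tool, or record the failure so the loop can continue.
        if tool is None:
            observation = f"unknown tool: {tool_name}"
        else:
            observation = tool(tool_arg)
        # Observe: keep the result as context for later steps.
        context.append(f"[{tool_name}({tool_arg})] {observation}")
    return synthesize(context)


if __name__ == "__main__":
    # Stub tools and stub "LLM" decisions, for demonstration only.
    demo_tools = {"search_code": lambda q: f"found {q}"}
    answer = run_agent(
        plan=["login handler"],
        decide=lambda step, ctx: ("search_code", step),
        tools=demo_tools,
        synthesize=lambda ctx: " | ".join(ctx),
    )
    print(answer)
```

Capping the loop with max_steps is an assumed safeguard; it bounds cost when a plan is longer than expected.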


🛠️ Technical Stack

  • UI: Streamlit
  • Agent System: Custom ReAct Agent (planner.py + agent.py)
  • Tool System: tools.py (search_code, open_file, list_files)
  • RAG Pipeline: LangChain with Hybrid Search
  • Vector Database: ChromaDB
  • Embeddings: intfloat/e5-small-v2 or all-MiniLM-L6-v2 (Local CPU)
  • LLM Engine: LongCat-Flash-Lite via OpenAI-compatible API
  • Repo Loader: GitLoader with multi-branch auto-detection (main/master)
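
To give a feel for the tool system, here is a hedged sketch of what list_files and open_file might look like over a cloned repo directory. The signatures, the suffix filter, the character cap, and the path check are all assumptions for illustration; the real tools.py may differ.

```python
from pathlib import Path


def list_files(repo_root: str, suffixes: tuple[str, ...] = (".py",)) -> list[str]:
    """Return repo-relative paths of files whose suffix is in `suffixes`."""
    root = Path(repo_root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix in suffixes
    )


def open_file(repo_root: str, relative_path: str, max_chars: int = 4000) -> str:
    """Read a file inside the repo, truncated so it fits in an LLM prompt."""
    root = Path(repo_root).resolve()
    target = (root / relative_path).resolve()
    # Refuse paths such as "../secrets" that resolve outside the repo (Python 3.9+).
    if not target.is_relative_to(root):
        return "error: path escapes the repository"
    if not target.is_file():
        return f"error: no such file: {relative_path}"
    return target.read_text(encoding="utf-8", errors="replace")[:max_chars]
```

Returning error strings instead of raising lets an agent loop treat failures as ordinary observations and choose a different tool on the next step.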

📁 Project Structure

codelens/
├── app.py              # Streamlit UI with Quick Search + Agentic Search
├── rag_pipeline.py     # Hybrid RAG: ChromaDB + Grep search + LLM re-ranker
├── planner.py          # LLM-based query planner → JSON step list
├── agent.py            # ReAct execution loop + Bug Detection mode
├── tools.py            # search_code, open_file, list_files
├── utils_llm.py        # Shared LLM instance factory
└── requirements.txt
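
Since planner.py turns an LLM reply into a JSON step list, the parsing side can be sketched as follows. parse_plan is a hypothetical helper, and the assumption that the reply contains a single JSON array of strings (possibly wrapped in extra prose) is mine, not something the project documents.

```python
import json
import re


def parse_plan(raw: str, max_steps: int = 6) -> list[str]:
    """Extract a list of step strings from an LLM reply containing a JSON array."""
    # Grab everything from the first "[" to the last "]" so surrounding prose is ignored.
    match = re.search(r"\[.*\]", raw, flags=re.DOTALL)
    if match is None:
        return []
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []
    # Keep only non-empty string items; anything else is treated as noise.
    steps = [item.strip() for item in data if isinstance(item, str) and item.strip()]
    return steps[:max_steps]
```

Returning an empty list on malformed output gives the caller a clear signal to fall back, for example to a single-step search.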

🛰️ Deployment on Hugging Face Spaces

1. Configure Environment Secrets

Go to your HF Space Settings → Variables and secrets:

  • Key: OPENAI_API_KEY
  • Value: Your LongCat/OpenAI API Key
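
Secrets configured this way are exposed to the container as environment variables. A minimal sketch of reading the key at startup is shown below; load_llm_settings is a hypothetical helper, and OPENAI_BASE_URL is an assumed optional variable for pointing at an OpenAI-compatible endpoint, not something this README defines.

```python
import os
from typing import Mapping, Optional


def load_llm_settings(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Read LLM credentials from the environment and fail early if the key is missing."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; add it under Variables and secrets"
        )
    # OPENAI_BASE_URL is an assumption here; an empty string means "use the default".
    return {"api_key": api_key, "base_url": env.get("OPENAI_BASE_URL", "").strip()}
```

Failing at startup with a clear message is usually easier to debug on a hosted Space than an authentication error raised in the middle of a query.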

2. Dockerfile

The Space uses the Docker SDK (sdk: docker in the metadata above). The included Dockerfile is already configured to run the Streamlit app on port 7860.


🛠️ Local Installation

  1. Clone this repository:

    git clone https://github.com/your-username/codelens.git
    cd codelens
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Run the application:

    streamlit run app.py
    

📖 How to Use

  1. Enter a public GitHub URL in the sidebar. CodeLens clones and indexes the repo.
  2. Wait for indexing: roughly 2-5 minutes per 1,000 chunks on a standard CPU. Subsequent loads of the same repo use the cache.
  3. Quick Search: a fast, single-step hybrid RAG answer.
  4. Agentic Search: a multi-step investigation in which the agent plans, searches, reads files, and synthesizes an answer.
  5. Bug Detection: include "bug", "error", "security", or "vulnerability" in your query for a full bug report.
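
The keyword trigger for Bug Detection can be pictured as a simple check on the query text. This is a sketch under assumptions: is_bug_query is a hypothetical name, and matching on word starts (so "bugs" triggers but "debug" does not) is my choice, not documented behavior of agent.py.

```python
import re

# The four trigger words listed in the step above.
BUG_KEYWORDS = ("bug", "error", "security", "vulnerability")

# \b anchors each keyword at the start of a word; plural forms like "bugs" still match.
_BUG_PATTERN = re.compile(r"\b(" + "|".join(BUG_KEYWORDS) + r")", re.IGNORECASE)


def is_bug_query(query: str) -> bool:
    """Return True when the query should be routed to the bug-report mode."""
    return _BUG_PATTERN.search(query) is not None
```

A plain substring test would be simpler but would also fire on words like "debug", which is why the sketch anchors on word boundaries.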

Example Queries

  • "Trace the full user authentication flow"
  • "How is the database connected and what ORM is used?"
  • "Find potential security bugs in the input validation"
  • "What design patterns are used in this project?"

💬 Interview Summary

"I built a ReAct-style agentic system where an LLM planner decomposes queries into steps, then an execution agent iteratively calls tools (semantic search, file reading, directory listing), accumulating context before synthesizing a final answer. It's grounded by a hybrid RAG pipeline using ChromaDB and grep-based keyword search, with an LLM re-ranker to filter results."


🐱 Built with ❤️ using LangChain, ChromaDB, and Streamlit

Intelligent Code Understanding, powered by Agentic AI.