import React, { useState } from 'react';
import {
  ChevronRight, ChevronDown, Database, Code, Brain, Search, FileText,
  GitBranch, Layers, Workflow, Server, Cpu, ArrowRight, Zap,
} from 'lucide-react';

const ArchitectureViz = () => {
  const [activeTab, setActiveTab] = useState('overview');
  const [expandedSections, setExpandedSections] = useState({});

  const toggleSection = (section) => {
    setExpandedSections(prev => ({ ...prev, [section]: !prev[section] }));
  };

  const tabs = [
    { id: 'overview', label: 'System Overview', icon: Layers },
    { id: 'rag', label: 'RAG Pipeline', icon: Search },
    { id: 'ast', label: 'AST & Graphs', icon: GitBranch },
    { id: 'chunking', label: 'Code Chunking', icon: Code },
    { id: 'agent', label: 'Agentic Workflow', icon: Brain },
    { id: 'retrieval', label: 'Retrieval System', icon: Database },
  ];

  const ComponentCard = ({ title, description, icon: Icon, color, children }) => (
    <div className={`rounded-lg border p-4 ${color}`}>
      <div className="flex items-center gap-2 mb-1">
        <Icon size={18} />
        <h3 className="font-semibold">{title}</h3>
      </div>
      <p className="text-sm text-gray-600">{description}</p>
      {children}
    </div>
  );

An AI-powered codebase assistant combining RAG, AST analysis, graph databases, and agentic workflows.
The RAG (Retrieval-Augmented Generation) system combines vector search with LLM-based file selection and cross-encoder reranking for high-precision code retrieval.
{`query = "How does authentication work?"
# Optionally expand with multi-query
expanded_queries = multi_query_expander(query)`}
{`# Vector similarity search (60% weight)
vector_docs = chroma_db.similarity_search(query, k=10)
# LLM-based file selection (40% weight)
llm_docs = llm_retriever.select_files(query, file_tree)
# Combine with EnsembleRetriever
combined = ensemble([vector_docs, llm_docs], weights=[0.6, 0.4])`}
{`# For each retrieved doc, find related files via the AST graph
for doc in combined:
    for neighbor, relation in ast_graph.neighbors(doc.file_path):
        if relation in ("imports", "calls"):
            augmented_docs.append(read_file(neighbor))`}
{`# Score each (query, document) pair with the cross-encoder
pairs = [[query, doc.content] for doc in augmented_docs]
scores = cross_encoder.predict(pairs)
# Keep the top 5 by score
ranked = sorted(zip(augmented_docs, scores), key=lambda p: p[1], reverse=True)
final_docs = [doc for doc, _ in ranked[:5]]`}
{`# Build context from retrieved docs
context = format_docs(final_docs)
# Generate answer with LLM
prompt = system_prompt.format(context=context)
answer = llm.invoke([SystemMessage(prompt), HumanMessage(query)])`}
ChatEngine class
RerankingRetriever
LLM-based file selection
Cross-encoder reranking
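The "combine with EnsembleRetriever" step above deserves a closer look: LangChain's EnsembleRetriever fuses ranked lists with weighted Reciprocal Rank Fusion. A dependency-free sketch of that merge, with hypothetical file names standing in for retrieved documents:

```python
def weighted_rrf(ranked_lists, weights, k=60):
    """Fuse several ranked lists into one, weighting each source.

    Each document's score is the weighted sum of 1/(k + rank + 1)
    across the lists it appears in; k dampens the influence of rank.
    """
    scores = {}
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(docs):
            scores[doc] = scores.get(doc, 0.0) + weight / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results from the two retrievers
vector_docs = ["auth.py", "login.py", "db.py"]   # 60% weight
llm_docs = ["login.py", "session.py"]            # 40% weight
combined = weighted_rrf([vector_docs, llm_docs], weights=[0.6, 0.4])
# "login.py" ranks first: it appears in both lists
```

A document returned by both retrievers accumulates score from each, which is why rank fusion tends to surface files that vector search and the LLM selector agree on.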
Uses tree-sitter to parse code into Abstract Syntax Trees, then builds a NetworkX directed graph capturing code relationships.
{`# Source Code
class UserService:
    def get_user(self, user_id):
        return self.db.find(user_id)  # calls db.find

# Generated Graph
(file: user_service.py)
  │
  └──defines──▶ (class: UserService)
        │
        └──has_method──▶ (method: get_user)
              │
              └──calls──▶ (function: db.find)`}
find_callers("authenticate")
→ Returns all functions that call authenticate()
find_callees("process_request")
→ Returns all functions called by process_request()
find_call_chain("main", "save_to_db")
→ Returns execution paths from main() to save_to_db()
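The real system answers these queries against a NetworkX DiGraph; the same logic can be sketched dependency-free over a plain adjacency dict. The call edges below are hypothetical:

```python
# caller -> set of callees (hypothetical call edges)
calls = {
    "main": {"process_request"},
    "process_request": {"authenticate", "save_to_db"},
    "authenticate": {"db.find"},
}

def find_callers(fn):
    """All functions with an outgoing 'calls' edge to fn."""
    return {caller for caller, callees in calls.items() if fn in callees}

def find_callees(fn):
    """All functions fn calls directly."""
    return calls.get(fn, set())

def find_call_chain(src, dst, path=None):
    """Depth-first search for one execution path from src to dst."""
    path = (path or []) + [src]
    if src == dst:
        return path
    for callee in calls.get(src, ()):
        chain = find_call_chain(callee, dst, path)
        if chain:
            return chain
    return None
```

With NetworkX, `find_callers`/`find_callees` map onto `graph.predecessors`/`graph.successors`, and `find_call_chain` onto `nx.all_simple_paths`.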
Unlike naive text splitting, this system uses tree-sitter to chunk code at semantic boundaries (functions, classes) while respecting token limits.
{`def process_data():
    data = load()
    # ──────────────── CHUNK BREAK ────────────────
    result = transform(data)
    return result  # Broken mid-function!`}
{`# CHUNK 1 - Complete function
def process_data():
    data = load()
    result = transform(data)
    return result

# CHUNK 2 - Complete function
def another_func():
    ...`}
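The production chunker uses tree-sitter so it works across languages; for Python sources alone, the stdlib `ast` module exposes the same semantic boundaries. A minimal sketch (function names in the sample are hypothetical):

```python
import ast

def chunk_python(source, max_lines=200):
    """Split source at top-level function/class boundaries so every
    chunk is a complete semantic unit, skipping oversized nodes."""
    lines = source.splitlines()
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            start, end = node.lineno, node.end_lineno
            if end - start + 1 <= max_lines:
                chunks.append("\n".join(lines[start - 1:end]))
    return chunks

code = '''
def process_data():
    data = load()
    return transform(data)

def another_func():
    pass
'''
chunks = chunk_python(code)  # two chunks, each a complete function
```

A real implementation would also recurse into classes, attach metadata, and fall back to sub-splitting when a single definition exceeds the token limit.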
{`FileChunk {
  file_path: "src/auth/login.py",
  start_byte: 245,
  end_byte: 892,
  line_range: "L12-L45",
  language: "python",
  chunk_type: "function_definition",
  name: "authenticate",
  // Enhanced metadata
  symbols_defined: ["authenticate", "verify_token"],
  imports_used: ["from jwt import decode"],
  complexity_score: 7,            // Cyclomatic complexity
  parent_context: "AuthService"   // Parent class
}`}
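The metadata above maps naturally onto a Python dataclass. Field names follow the listing; the defaults are assumptions:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class FileChunk:
    file_path: str
    start_byte: int
    end_byte: int
    line_range: str
    language: str
    chunk_type: str                  # e.g. "function_definition"
    name: str
    # Enhanced metadata (assumed optional, hence defaults)
    symbols_defined: List[str] = field(default_factory=list)
    imports_used: List[str] = field(default_factory=list)
    complexity_score: int = 0        # cyclomatic complexity
    parent_context: Optional[str] = None  # enclosing class, if any

chunk = FileChunk("src/auth/login.py", 245, 892, "L12-L45",
                  "python", "function_definition", "authenticate",
                  parent_context="AuthService")
```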
The agent can perform multi-step reasoning using tools, enabling complex analysis that simple RAG cannot handle.
search_codebase
Vector search in codebase
read_file
Read complete file content
list_files
Directory listing
find_callers
Who calls this function?
find_callees
What does this call?
find_call_chain
Trace execution path
search_codebase("login authentication")
read_file("src/auth/login.py")
find_callees("authenticate")
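The three calls above illustrate one agent trajectory. A minimal sketch of the dispatch loop behind it, with stub tool bodies and a hardcoded plan where the real agent would ask the LLM for the next step:

```python
# Stub tools; real ones hit the vector store, filesystem, and AST graph.
TOOLS = {
    "search_codebase": lambda q: ["src/auth/login.py"],
    "read_file": lambda path: f"<contents of {path}>",
    "find_callees": lambda fn: ["verify_token", "db.find"],
}

def run_agent(steps):
    """Execute (tool, argument) calls in order, recording each
    observation as it would be fed back into the LLM's context."""
    transcript = []
    for tool, arg in steps:
        observation = TOOLS[tool](arg)
        transcript.append((tool, arg, observation))
    return transcript

trace = run_agent([
    ("search_codebase", "login authentication"),
    ("read_file", "src/auth/login.py"),
    ("find_callees", "authenticate"),
])
```

In the real workflow the LLM chooses each next tool from the accumulated transcript, stopping when it has enough context to answer.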
Semantic similarity search in Chroma/FAISS using embeddings
LLM analyzes file tree structure and selects relevant files
Weighted merge: 60% vector + 40% LLM selection
Add related files from AST graph (imports, calls)
Score each (query, doc) pair, return top 5
Interactive System Documentation