---
title: EPITA CodeVoyager on 🤗 Transformers Library
emoji: π
colorFrom: pink
colorTo: red
sdk: docker
pinned: false
tags:
- mcp-in-action-track-enterprise
---
# EPITA CodeVoyager

A conversational AI agent that helps you navigate and understand large codebases through natural language.

## What is EPITA CodeVoyager?

EPITA CodeVoyager is an interactive chat agent powered by Smolagents that connects to the EPITA Codebase Knowledge Graph MCP Server. It lets users ask natural language questions about a codebase and receive accurate answers grounded in the actual code, not hallucinations.
### How It Works

Traditional LLMs generate answers from their training data, which can lead to outdated or fabricated information about specific codebases. EPITA CodeVoyager takes a different approach:

- **Tool-Augmented Reasoning**: Instead of guessing, the agent uses MCP (Model Context Protocol) tools to actively query the knowledge graph: searching for code, navigating relationships, and retrieving actual implementations.
- **Grounded Responses**: Every answer is backed by real code snippets, file paths, and structural information extracted directly from the repository.
- **Multi-Step Exploration**: Complex questions trigger chains of tool calls. For example, understanding how a class works might require searching for its definition → examining its methods → tracing its inheritance hierarchy → finding usage examples.
- **Streaming Transparency**: Users see the agent's reasoning process in real time: which tools are called, what information is retrieved, and how the final answer is synthesized.
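The tool-augmented loop described above can be sketched in a few lines of Python. This is an illustrative stand-in, not the actual smolagents machinery: the `decide` and `synthesize` callables play the role of the LLM, and the toy `search_nodes` tool stands in for a live MCP tool.

```python
# Minimal sketch of a tool-augmented reasoning loop (illustrative only;
# the real agent uses smolagents' ToolCallingAgent with live MCP tools).

def run_agent(question, tools, decide, synthesize, max_steps=5):
    """Call tools until the decision policy stops or max_steps is reached."""
    observations = []
    for _ in range(max_steps):
        step = decide(question, observations)  # pick next tool, or None to stop
        if step is None:
            break
        tool_name, arg = step
        observations.append((tool_name, tools[tool_name](arg)))
    return synthesize(question, observations)

# Toy stand-ins so the loop can run without an LLM or an MCP server.
tools = {"search_nodes": lambda q: f"nodes matching {q!r}"}
decide = lambda q, obs: ("search_nodes", "AutoModel") if not obs else None
synthesize = lambda q, obs: f"Answer based on {len(obs)} tool result(s)"

print(run_agent("How does AutoModel work?", tools, decide, synthesize))
# → Answer based on 1 tool result(s)
```

The `max_steps` cap is the same knob exposed in the UI (see "Configurable Reasoning Steps" below): it bounds how many tool calls one query may trigger.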
## 🤗 Showcase: Hugging Face Transformers Library

We demonstrate EPITA CodeVoyager on the Hugging Face Transformers library, one of the most popular open-source ML libraries, with:

- 4,000+ files across multiple modules
- 400,000+ lines of code
- Hundreds of model implementations (BERT, GPT, LLaMA, etc.)
- Complex inheritance hierarchies and cross-file dependencies
This showcase demonstrates how the agent can help users understand and navigate even the most complex codebases through simple conversational queries.
## Why This Matters for Education

Understanding large codebases is a fundamental skill for software engineers. At EPITA (École pour l'informatique et les techniques avancées), students work on increasingly complex projects and need to understand codebases, whether their own, their teammates', or open-source libraries.

LLM-based coding assistants face significant challenges with large repositories: context window limitations, lack of structural awareness, missing relationships, and inefficient search. EPITA CodeVoyager addresses these problems by using MCP tools to search, navigate, and understand code repositories, making it a practical assistant for developers, students, and educators exploring complex codebases.
## Use Case: EPITA Coding Courses

EPITA CodeVoyager was developed with educational applications in mind, specifically to support EPITA coding courses.

### The Educational Challenge

At EPITA, students work on increasingly complex software projects throughout their curriculum. Understanding large codebases, whether their own, their teammates', or open-source libraries like Transformers, is a fundamental skill for any computer science engineer.
However, navigating a library with thousands of files is overwhelming. Students often:
- Struggle to find where specific functionality is implemented
- Don't understand how different components connect
- Spend hours reading code without grasping the big picture
- Miss important design patterns and architectural decisions
### How EPITA CodeVoyager Helps

EPITA CodeVoyager addresses these challenges by enabling students to ask questions in natural language:

#### Intelligent Code Q&A

Instead of manually searching through thousands of files, users can simply ask:

- "How does the `AutoModel` class work?"
- "What classes inherit from `PreTrainedModel`?"
- "How is tokenization implemented in the library?"
- "What files are involved in the BERT implementation?"

The agent uses MCP tools from the EPITA Codebase Knowledge Graph MCP Server to explore the codebase, gather relevant information, and provide accurate, well-structured answers grounded in the actual code.
#### Learning Through Exploration

For EPITA courses and code learning in general, EPITA CodeVoyager helps users:

- **Understand Architecture**: Ask about how components are organized and connected
- **Trace Code Flow**: Follow function calls and understand execution paths
- **Learn Design Patterns**: Discover implementation patterns used in real-world libraries
- **Prepare for Code Reviews**: Understand unfamiliar code before reviewing or contributing

#### EPITA Course Integration

- **Interactive Learning**: Students can explore open-source libraries conversationally
- **Office Hours Support**: Integrate with tutoring systems to answer code-related questions
- **Project Onboarding**: Help students understand project codebases quickly
- **Self-Paced Study**: Enable students to learn complex libraries at their own pace
### Broader Applications

Beyond the Transformers library showcase, EPITA CodeVoyager (backed by the EPITA Codebase Knowledge Graph MCP Server) can be applied to any codebase:

- **Student Projects**: Help students understand their teammates' code during group projects
- **Open Source Onboarding**: Quickly learn how popular libraries are structured
- **Code Reviews**: Understand unfamiliar code before reviewing or contributing
- **Research**: Analyze code patterns across different repositories
- **Industry**: Onboard new developers to large enterprise codebases
## Architecture

```
┌───────────────────────────────────────────────────────────────────┐
│                         USER (Gradio UI)                          │
│                                                                   │
│            "How does BertModel's forward method work?"            │
└───────────────────────────────┬───────────────────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────────┐
│                        EPITA CODEVOYAGER                          │
│                                                                   │
│   ┌───────────────────────────────────────────────────────────┐   │
│   │                     ToolCallingAgent                      │   │
│   │  • Receives natural language question                     │   │
│   │  • Decides which MCP tools to call                        │   │
│   │  • Chains multiple tool calls if needed                   │   │
│   │  • Synthesizes final answer from tool results             │   │
│   └───────────────────────────────────────────────────────────┘   │
│                                                                   │
│  LLM Backend: Any OpenAI-compatible API                           │
│   ┌───────────────────────────────────────────────────────────┐   │
│   │ Supports any HTTP REST service with OpenAI-style interface│   │
│   │ (OpenAI, Azure, HuggingFace Inference, vLLM, Ollama, etc.)│   │
│   └───────────────────────────────────────────────────────────┘   │
└───────────────────────────────┬───────────────────────────────────┘
                                │ MCP Protocol (HTTP)
                                ▼
┌───────────────────────────────────────────────────────────────────┐
│             EPITA CODEBASE KNOWLEDGE GRAPH MCP SERVER             │
│                                                                   │
│  Tools Used by Agent:                                             │
│  ┌─────────────┐  ┌────────────┐  ┌──────────────┐                │
│  │search_nodes │  │go_to_def   │  │find_usages   │                │
│  └─────────────┘  └────────────┘  └──────────────┘                │
│  ┌─────────────┐  ┌────────────┐  ┌──────────────┐                │
│  │get_node_info│  │get_file_   │  │get_neighbors │                │
│  │             │  │structure   │  │              │                │
│  └─────────────┘  └────────────┘  └──────────────┘                │
└───────────────────────────────┬───────────────────────────────────┘
                                │
                                ▼
┌───────────────────────────────────────────────────────────────────┐
│                         KNOWLEDGE GRAPH                           │
│                (Hugging Face Transformers Library)                │
│                                                                   │
│  • 4,000+ files indexed                                           │
│  • 400k+ lines of code                                            │
│  • Functions, classes, relationships extracted                    │
└───────────────────────────────────────────────────────────────────┘
```
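For a concrete feel of what crosses the "MCP Protocol (HTTP)" arrow above, here is the JSON-RPC `tools/call` request shape defined by the MCP specification, using `search_nodes` as the example tool. The argument key `query` is an assumption about this particular server's tool schema.

```python
import json

# Sketch of the JSON-RPC payload an MCP client sends to invoke a tool.
# The request shape (method "tools/call" with name/arguments params)
# comes from the MCP spec; the tool name mirrors the server above.

def build_tool_call(tool_name, arguments, request_id=1):
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

payload = build_tool_call("search_nodes", {"query": "BertModel"})
print(json.dumps(payload, indent=2))
```

The server answers with a matching JSON-RPC response whose result carries the tool output that the agent then feeds back into its reasoning loop.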
## Features

### Multi-Provider LLM Support

The agent supports multiple LLM backends:

| Provider | Models | Configuration |
|---|---|---|
| OpenAI | gpt-5, gpt-4o, gpt-4o-mini, etc. | API key + base URL |
| Azure OpenAI | gpt-5, gpt-4, gpt-4o (deployed) | API key + endpoint + version |
| HuggingFace Inference | Qwen2.5-Coder-32B, Llama-3.1, etc. | HF token + optional provider |
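As an illustration of how the three configurations differ, here is a sketch that normalizes the Web UI fields into one config dict per provider. The field names and defaults are assumptions drawn from this README; the real application delegates this to smolagents model classes.

```python
# Sketch of normalizing the Web UI fields per provider (field names and
# defaults are assumptions based on this README, not the app's real code).

def build_llm_config(model_type: str, **kw) -> dict:
    if model_type == "openai":
        return {"model": kw["model_name"], "api_key": kw["api_key"],
                "base_url": kw.get("base_url", "https://api.openai.com/v1")}
    if model_type == "azure":
        return {"deployment": kw["model_name"], "api_key": kw["api_key"],
                "endpoint": kw["endpoint"],
                "api_version": kw.get("api_version", "2024-02-15-preview")}
    if model_type == "hf_inference":
        return {"model": kw["model_name"], "token": kw["token"],
                "provider": kw.get("provider")}  # e.g. "together"
    raise ValueError(f"unknown model_type: {model_type!r}")

cfg = build_llm_config("openai", model_name="gpt-4o-mini", api_key="sk-...")
```

Keeping the normalization in one place makes it easy to add further OpenAI-compatible backends (vLLM, Ollama, etc.) later.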
### Streaming Responses

The agent streams responses in real time, showing:

- **Model Thinking**: Internal reasoning displayed in collapsible sections
- **Tool Calls**: Which MCP tools are being invoked
- **Final Answer**: Synthesized response based on code exploration
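The three event kinds can be pictured as a Python generator yielding typed chunks; this is only an illustration of the stream's shape (the real Space streams smolagents events via `stream_to_gradio`, and the event names and texts here are made up for the example).

```python
# Minimal sketch of the streamed event kinds (illustrative; the real app
# streams smolagents events through stream_to_gradio).

def stream_chat(question):
    """Yield (kind, text) chunks in the order the UI displays them."""
    yield ("thinking", f"Planning how to answer: {question}")
    yield ("tool_call", 'search_nodes(query="BertModel")')
    yield ("final", "Answer synthesized from the retrieved code.")

events = list(stream_chat("How does BertModel's forward method work?"))
for kind, text in events:
    print(f"[{kind}] {text}")
```

A Gradio chat callback can consume such a generator incrementally, so the user watches the exploration unfold instead of waiting for one final blob.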
### Configurable Reasoning Steps

Control how deeply the agent explores:

- **Max Steps**: Limit the number of tool calls per query (default: 5)
- Lower values give faster responses; higher values allow more thorough exploration
## Getting Started

### Configure the LLM in the Web UI

Once launched, open the Gradio interface and configure your LLM provider:

**For OpenAI:**

- Model Type: `openai`
- Model Name: `gpt-4o-mini` (or `gpt-4o`, `gpt-4-turbo`)
- API Key: Your OpenAI API key
- Base URL: `https://api.openai.com/v1`

**For Azure OpenAI:**

- Model Type: `azure`
- Model Name: Your deployment name
- Azure API Key: Your Azure API key
- Azure Endpoint: `https://your-resource.openai.azure.com`
- API Version: `2024-02-15-preview`

**For HuggingFace Inference:**

- Model Type: `hf_inference`
- Model Name: `Qwen/Qwen2.5-Coder-32B-Instruct`
- HuggingFace Token: Your HF API token
- Provider (optional): `together`, `fireworks-ai`, `cerebras`
## Example Interactions

### Understanding a Class

User: "How does the `AutoModel` class work?"

Agent:

1. Calls `search_nodes("AutoModel")`
2. Calls `get_node_info("src/transformers/models/auto/auto_factory.py_3")`
3. Calls `get_file_structure("src/transformers/models/auto/auto_factory.py")`
4. Synthesizes a response explaining the auto-class factory pattern

### Tracing Dependencies

User: "What classes inherit from `PreTrainedModel`?"

Agent:

1. Calls `go_to_definition("PreTrainedModel")`
2. Calls `find_usages("PreTrainedModel")`
3. Returns a list of model classes with inheritance relationships

### Exploring Implementation

User: "How does tokenization work in the library?"

Agent:

1. Calls `search_nodes("tokenization")`
2. Calls `get_neighbors("src/transformers/tokenization_utils_base.py")`
3. Calls `get_file_structure("src/transformers/tokenization_utils.py")`
4. Explains the tokenizer hierarchy and key methods
## Agent Internals

### KnowledgeGraphChatAgent Class

The main agent class (powering EPITA CodeVoyager) handles:

```python
class KnowledgeGraphChatAgent:
    def __init__(self, mcp_server_url: str):
        # Connect to the MCP server and load its tools
        self._initialize_mcp_tools()

    def _initialize_model(self, model_type, api_key, ...):
        # Configure the OpenAI, Azure, or HF Inference backend
        ...

    def _initialize_agent(self, max_steps):
        # Create a ToolCallingAgent with the MCP tools
        ...

    def chat(self, message, history):
        # Stream responses using stream_to_gradio
        ...
```
### Custom Instructions

EPITA CodeVoyager is configured with domain-specific instructions for the Transformers library:

```python
CUSTOM_INSTRUCTIONS = """You are an expert assistant for understanding the Hugging Face Transformers library.

Your role is to help users understand the Transformers codebase by exploring the repository using the available tools. You can:
- Search for functions, classes, and methods in the codebase
- Navigate the file structure and understand code organization
- Find relationships between different components
- Trace how code flows through the library
- Explain implementation details and design patterns

When answering questions:
1. Use the available tools to explore the repository and gather accurate information
2. Provide clear, well-structured explanations based on the actual code
3. Reference specific files, functions, or classes when relevant
4. If you're unsure about something, search the codebase to verify before answering

Always base your answers on the actual code in the repository, not assumptions."""
```
## Team
Team Name: CEPIA Ionis Team
Team Members:
- Laila ELKOUSSY - @lailaelkoussy - Research Engineer, Data Scientist
- Julien PEREZ - @jnm38 - Research Director
## License
This project is developed as part of research at EPITA / Ionis Group.
## Related Resources
- Smolagents - The agent framework used
- Model Context Protocol (MCP) - The protocol standard
- Gradio - Web interface framework