LlamaIndex Integration Guide - MCP Server & Gradio UI
Complete integration of the LlamaIndex knowledge base into the EcoMCP MCP server and Gradio UI.
What's Integrated
1. MCP Server (src/server/mcp_server.py)
- Knowledge base initialization on server startup
- New tools:
  - `knowledge_search` - Semantic search across indexed documents
  - `product_query` - Natural language Q&A with the query engine
- Fallback support if LlamaIndex is unavailable
2. Gradio UI (src/ui/app.py)
- Knowledge Search tab for semantic search
- Search type options: All, Products, Documentation
- Result display with similarity scores
- Dynamic tab (only appears if KB initialized)
- Consistent styling with existing UI
3. Core Knowledge Base (src/core/)
- Pre-indexed documentation (./docs)
- Product data ready for indexing
- Metadata extraction (titles, keywords)
- Multiple search strategies
New MCP Tools
knowledge_search
Semantic search across the knowledge base.
Parameters:
- `query` (string, required): Search query
- `search_type` (string): "all", "products", or "documentation"
- `top_k` (integer): Number of results (1-20, default: 5)
Example:
```json
{
  "name": "knowledge_search",
  "arguments": {
    "query": "wireless headphones features",
    "search_type": "products",
    "top_k": 5
  }
}
```
Response:
```json
{
  "status": "success",
  "query": "wireless headphones features",
  "search_type": "products",
  "result_count": 3,
  "results": [
    {
      "rank": 1,
      "score": 0.95,
      "content": "Premium wireless headphones with noise cancellation...",
      "source": "products.json"
    },
    ...
  ],
  "timestamp": "2025-11-27T..."
}
```
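On the client side, a response in this shape can be handled with plain JSON parsing. The sketch below is hedged: the field names come from the example response above, but the helper `top_result` is hypothetical and not part of EcoMCP.

```python
import json

def top_result(response_text: str):
    """Return the highest-ranked hit from a knowledge_search response, or None.

    Hypothetical helper; field names mirror the example response above.
    """
    data = json.loads(response_text)
    if data.get("status") != "success" or not data.get("results"):
        return None
    # Results carry an explicit rank; rank 1 is the best match.
    return min(data["results"], key=lambda r: r["rank"])

# A response with the structure shown above (content elided):
sample = json.dumps({
    "status": "success",
    "result_count": 2,
    "results": [
        {"rank": 2, "score": 0.81, "content": "...", "source": "docs.md"},
        {"rank": 1, "score": 0.95, "content": "...", "source": "products.json"},
    ],
})
best = top_result(sample)
```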
product_query
Natural language Q&A with automatic context retrieval.
Parameters:
- `question` (string, required): Natural language question
Example:
```json
{
  "name": "product_query",
  "arguments": {
    "question": "What are the main features of our flagship product?"
  }
}
```
Response:
```json
{
  "status": "success",
  "question": "What are the main features of our flagship product?",
  "answer": "Based on the documentation, the flagship product offers...",
  "timestamp": "2025-11-27T..."
}
```
Gradio UI Features
Knowledge Search Tab
- Search query input - Natural language or keyword search
- Search type selector - Filter by document type
- Search button - Trigger semantic search
- Results display - Ranked results with scores
Usage:
- Enter query: "How to deploy this?"
- Select type: "Documentation"
- Results show matching docs with relevance scores
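The handler wired to the search button can be sketched as follows. This is a hedged illustration based on the UI description above, not the actual `app.py` code; `perform_search` exists in the app, but the formatting details and the `kb_search` stub here are assumptions.

```python
def perform_search(query: str, search_type: str, top_k: int = 5) -> str:
    """Hypothetical Gradio handler: run a KB search, return Markdown results."""
    if not query.strip():
        return "Please enter a search query."
    results = kb_search(query, search_type, top_k)
    if not results:
        return "No results found for your query."
    lines = [f"### Results for: {query}"]
    for r in results:
        lines.append(f"**{r['rank']}. (score {r['score']:.2f})** {r['content']}")
    return "\n\n".join(lines)

def kb_search(query, search_type, top_k):
    """Stub standing in for the EcoMCPKnowledgeBase search call."""
    return [{"rank": 1, "score": 0.95, "content": "Sample matching document."}]
```

In the real app the handler calls the module-level knowledge base instance instead of the stub, and the returned Markdown is rendered in the results component.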
Implementation Details
MCP Server Integration
Initialization:
```python
class EcoMCPServer:
    def __init__(self):
        # ... existing code ...
        self.kb = None
        self._init_knowledge_base()

    def _init_knowledge_base(self):
        """Initialize LlamaIndex knowledge base"""
        if LLAMAINDEX_AVAILABLE:
            self.kb = EcoMCPKnowledgeBase()
            self.kb.initialize("./docs")
```
Tool Handlers:
```python
async def call_tool(self, name: str, arguments: Dict) -> Any:
    if name == "knowledge_search":
        return await self._knowledge_search(arguments)
    elif name == "product_query":
        return await self._product_query(arguments)
```
Search Implementation:
```python
async def _knowledge_search(self, args: Dict) -> Dict:
    # Extract arguments with defaults matching the tool schema
    query = args.get("query", "")
    search_type = args.get("search_type", "all")
    top_k = args.get("top_k", 5)
    if search_type == "products":
        results = self.kb.search_products(query, top_k=top_k)
    elif search_type == "documentation":
        results = self.kb.search_documentation(query, top_k=top_k)
    else:
        results = self.kb.search(query, top_k=top_k)
```
Gradio UI Integration
Knowledge Base Initialization:
```python
kb = None
if LLAMAINDEX_AVAILABLE:
    try:
        kb = EcoMCPKnowledgeBase()
        if os.path.exists("./docs"):
            kb.initialize("./docs")
    except Exception as e:
        print(f"Warning: {e}")
        kb = None
```
Search Tab Creation:
```python
if kb and LLAMAINDEX_AVAILABLE:
    with gr.Tab("🔍 Knowledge Search"):
        # Search UI components
        search_btn.click(
            fn=perform_search,
            inputs=[search_query, search_type],
            outputs=output_search
        )
```
Running the Integration
Prerequisites
```bash
pip install -r requirements.txt
export OPENAI_API_KEY=sk-...
```
Start MCP Server
```bash
python src/server/mcp_server.py
```
Start Gradio UI
```bash
python src/ui/app.py
# Opens at http://localhost:7860
```
Verify Integration
- Check MCP server logs for "Knowledge base initialized successfully"
- In Gradio UI, verify "Knowledge Search" tab appears
- Try a search query to test functionality
Integration Flow
```
User Input (Gradio UI)
    ↓
Gradio Handler (perform_search)
    ↓
EcoMCPKnowledgeBase.search()
    ↓
VectorSearchEngine.search()
    ↓
VectorStoreIndex.retrieve()
    ↓
Display Results (Gradio Markdown)

OR (via MCP)

Client → MCP JSON-RPC
    ↓
EcoMCPServer.call_tool("knowledge_search")
    ↓
Server._knowledge_search()
    ↓
Knowledge Base Search
    ↓
Return Results (JSON)
```
Search Behavior
Semantic Search
- Uses OpenAI embeddings (text-embedding-3-small)
- Finds semantically similar content
- Works with natural language queries
- Returns similarity scores (0-1)
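Behind those scores, semantic search typically ranks documents by cosine similarity between embedding vectors. The sketch below illustrates the idea with toy 3-dimensional vectors; the real system uses OpenAI's text-embedding-3-small, which this self-contained example does not call.

```python
import math

def cosine_similarity(a, b) -> float:
    """Cosine similarity; vectors pointing the same way score near 1.0."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings": similar direction -> high similarity score.
query_vec = [0.9, 0.1, 0.0]
doc_vec = [0.8, 0.2, 0.05]
score = cosine_similarity(query_vec, doc_vec)
```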
Search Types
- All: Searches products and documentation
- Products: Only product-related documents
- Documentation: Only documentation files
Result Scoring
- Score 0.95+ : Highly relevant
- Score 0.80-0.95 : Very relevant
- Score 0.70-0.80 : Relevant
- Score < 0.70 : Loosely related
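The bands above can be applied mechanically when labeling results. A small helper, hypothetical but mirroring the documented thresholds:

```python
def relevance_label(score: float) -> str:
    """Map a similarity score to the relevance bands documented above."""
    if score >= 0.95:
        return "Highly relevant"
    if score >= 0.80:
        return "Very relevant"
    if score >= 0.70:
        return "Relevant"
    return "Loosely related"
```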
Data Sources
Indexed Documents
Documentation (./docs/*.md)
- Guides, tutorials, references
- Implementation details
- Deployment instructions
Products (optional)
- Product catalog data
- Features and specifications
- Pricing information
Adding More Data
Index new documents:
```python
kb = EcoMCPKnowledgeBase()
kb.initialize("./docs")
kb.add_products(product_list)
kb.add_urls(["https://example.com/page"])
```
Save indexed data:
```python
kb.save("./kb_backup")
```
Load from backup:
```python
kb2 = EcoMCPKnowledgeBase()
kb2.load("./kb_backup")
```
Configuration
Server-Side (mcp_server.py)
```python
# Knowledge base path
docs_path = "./docs"

# Automatic initialization on startup
self.kb = EcoMCPKnowledgeBase()
self.kb.initialize(docs_path)
```
Gradio UI (app.py)
```python
# Knowledge base initialization
kb = EcoMCPKnowledgeBase()
kb.initialize("./docs")

# Search parameters
top_k = 5  # Number of results
```
Error Handling
KB Not Initialized
```json
{
  "status": "error",
  "error": "Knowledge base not initialized"
}
```
Query Empty
```json
{
  "status": "error",
  "error": "Query is required"
}
```
No Results Found
No results found for your query.
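Validation that produces these error payloads can be sketched as guard clauses at the top of a tool handler. This is a hedged illustration: `validate_search_args` is a hypothetical helper, but the error strings match the payloads shown above.

```python
def validate_search_args(kb, args: dict):
    """Return an error payload matching the formats above, or None if valid."""
    if kb is None:
        return {"status": "error", "error": "Knowledge base not initialized"}
    if not args.get("query", "").strip():
        return {"status": "error", "error": "Query is required"}
    return None
```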
Performance
Search Speed
- First search: 1-2 seconds (loading model)
- Subsequent searches: 0.1-0.5 seconds
- With Pinecone: < 100ms
Index Size
- Small (100 docs): < 100 MB
- Medium (1000 docs): < 500 MB
- Large (10000 docs): < 5 GB
Optimization Tips
- Use `similarity_top_k=3` for speed
- Use `similarity_top_k=10` for quality
- Use Pinecone for production (millions of docs)
- Cache results when possible
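Caching repeated queries avoids re-embedding identical strings. One simple approach is `functools.lru_cache`; this is a sketch that assumes results are safe to reuse while the index is unchanged, with `run_search` standing in for the real knowledge base call.

```python
from functools import lru_cache

calls = {"count": 0}

def run_search(query, search_type, top_k):
    """Stub backend that counts invocations to show the cache working."""
    calls["count"] += 1
    return [f"result for {query!r}"]

@lru_cache(maxsize=256)
def cached_search(query: str, search_type: str = "all", top_k: int = 5) -> tuple:
    """Memoize search calls; return a tuple so the cached value is immutable."""
    return tuple(run_search(query, search_type, top_k))

cached_search("deploy guide")
cached_search("deploy guide")  # served from cache; run_search not called again
```

Note that the cache should be cleared (`cached_search.cache_clear()`) whenever the index is rebuilt, or stale results will be served.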
Troubleshooting
Knowledge base not initializing
Check that ./docs directory exists and contains files
Search tab not appearing
Verify LlamaIndex is installed: pip install -r requirements.txt
Check for errors in server logs
Slow searches
Reduce top_k parameter
Use smaller embedding model (text-embedding-3-small)
Enable Pinecone backend for production
API errors
Verify OPENAI_API_KEY is set
Check OpenAI account has credits
Monitor API usage and rate limits
Testing the Integration
Test MCP Tool
```python
# Test knowledge_search
tool_args = {
    "query": "product features",
    "search_type": "all",
    "top_k": 5
}
result = await server.call_tool("knowledge_search", tool_args)

# Test product_query
tool_args = {
    "question": "What is the main product?"
}
result = await server.call_tool("product_query", tool_args)
```
Test Gradio UI
- Navigate to http://localhost:7860
- Click "Knowledge Search" tab
- Enter test query: "documentation"
- Select search type: "Documentation"
- Click "Search"
- Verify results appear
Next Steps
- Index Product Data: Add your product catalog
- Deploy Server: Use Modal or Docker
- Customize Search: Adjust chunk size and embedding model
- Add Analytics: Track search queries and results
- Optimize Performance: Profile and benchmark