
LlamaIndex Integration Guide - MCP Server & Gradio UI

Complete integration of the LlamaIndex knowledge base into the EcoMCP MCP server and Gradio UI.

What's Integrated

1. MCP Server (src/server/mcp_server.py)

  • Knowledge base initialization on server startup
  • New tools: knowledge_search, product_query
  • Semantic search across indexed documents
  • Natural language Q&A with query engine
  • Fallback support if LlamaIndex unavailable

2. Gradio UI (src/ui/app.py)

  • Knowledge Search tab for semantic search
  • Search type options: All, Products, Documentation
  • Result display with similarity scores
  • Dynamic tab (only appears if KB initialized)
  • Consistent styling with existing UI

3. Core Knowledge Base (src/core/)

  • Pre-indexed documentation (./docs)
  • Product data ready for indexing
  • Metadata extraction (titles, keywords)
  • Multiple search strategies

New MCP Tools

knowledge_search

Semantic search across the knowledge base.

Parameters:

  • query (string, required): Search query
  • search_type (string): "all", "products", or "documentation"
  • top_k (integer): Number of results (1-20, default: 5)

Example:

{
  "name": "knowledge_search",
  "arguments": {
    "query": "wireless headphones features",
    "search_type": "products",
    "top_k": 5
  }
}

Response:

{
  "status": "success",
  "query": "wireless headphones features",
  "search_type": "products",
  "result_count": 3,
  "results": [
    {
      "rank": 1,
      "score": 0.95,
      "content": "Premium wireless headphones with noise cancellation...",
      "source": "products.json"
    },
    ...
  ],
  "timestamp": "2025-11-27T..."
}

product_query

Natural language Q&A with automatic context retrieval.

Parameters:

  • question (string, required): Natural language question

Example:

{
  "name": "product_query",
  "arguments": {
    "question": "What are the main features of our flagship product?"
  }
}

Response:

{
  "status": "success",
  "question": "What are the main features of our flagship product?",
  "answer": "Based on the documentation, the flagship product offers...",
  "timestamp": "2025-11-27T..."
}

Gradio UI Features

Knowledge Search Tab

  1. Search query input - Natural language or keyword search
  2. Search type selector - Filter by document type
  3. Search button - Trigger semantic search
  4. Results display - Ranked results with scores

Usage:

  • Enter query: "How to deploy this?"
  • Select type: "Documentation"
  • Results show matching docs with relevance scores

Implementation Details

MCP Server Integration

Initialization:

class EcoMCPServer:
    def __init__(self):
        # ... existing code ...
        self.kb = None
        self._init_knowledge_base()
    
    def _init_knowledge_base(self):
        """Initialize LlamaIndex knowledge base"""
        if LLAMAINDEX_AVAILABLE:
            self.kb = EcoMCPKnowledgeBase()
            self.kb.initialize("./docs")

Tool Handlers:

async def call_tool(self, name: str, arguments: Dict) -> Any:
    if name == "knowledge_search":
        return await self._knowledge_search(arguments)
    elif name == "product_query":
        return await self._product_query(arguments)
    # ... existing tools ...
    raise ValueError(f"Unknown tool: {name}")

Search Implementation:

async def _knowledge_search(self, args: Dict) -> Dict:
    query = args["query"]
    search_type = args.get("search_type", "all")
    top_k = args.get("top_k", 5)
    if search_type == "products":
        results = self.kb.search_products(query, top_k=top_k)
    elif search_type == "documentation":
        results = self.kb.search_documentation(query, top_k=top_k)
    else:
        results = self.kb.search(query, top_k=top_k)

Gradio UI Integration

Knowledge Base Initialization:

kb = None
if LLAMAINDEX_AVAILABLE:
    try:
        kb = EcoMCPKnowledgeBase()
        if os.path.exists("./docs"):
            kb.initialize("./docs")
    except Exception as e:
        print(f"Warning: {e}")
        kb = None

Search Tab Creation:

if kb and LLAMAINDEX_AVAILABLE:
    with gr.Tab("πŸ” Knowledge Search"):
        # Search UI components
        search_btn.click(
            fn=perform_search,
            inputs=[search_query, search_type],
            outputs=output_search
        )
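The click wiring above assumes a perform_search callback, which the guide does not show. A minimal sketch is below; the result formatting and the explicit kb parameter are assumptions, not the shipped implementation (in app.py the module-level kb could be bound with functools.partial(perform_search, kb=kb) as the click handler):

```python
# Hypothetical perform_search handler for the search_btn.click wiring.
# Assumes results are dicts with "score", "content", and "source" keys,
# matching the knowledge_search response shape documented above.
def perform_search(query: str, search_type: str, kb=None) -> str:
    """Run a semantic search and format results as Markdown for Gradio."""
    if not query or not query.strip():
        return "Please enter a search query."
    if kb is None:
        return "Knowledge base not initialized."
    # Map the UI's search-type selector to the matching KB method
    if search_type == "Products":
        results = kb.search_products(query, top_k=5)
    elif search_type == "Documentation":
        results = kb.search_documentation(query, top_k=5)
    else:
        results = kb.search(query, top_k=5)
    if not results:
        return "No results found for your query."
    lines = []
    for rank, result in enumerate(results, start=1):
        lines.append(f"**{rank}.** (score: {result['score']:.2f}) {result['content']}")
        lines.append(f"*Source: {result['source']}*")
    return "\n\n".join(lines)
```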

Running the Integration

Prerequisites

pip install -r requirements.txt
export OPENAI_API_KEY=sk-...

Start MCP Server

python src/server/mcp_server.py

Start Gradio UI

python src/ui/app.py
# Opens at http://localhost:7860

Verify Integration

  1. Check MCP server logs for "Knowledge base initialized successfully"
  2. In Gradio UI, verify "Knowledge Search" tab appears
  3. Try a search query to test functionality

Integration Flow

User Input (Gradio UI)
    ↓
Gradio Handler (perform_search)
    ↓
EcoMCPKnowledgeBase.search()
    ↓
VectorSearchEngine.search()
    ↓
VectorStoreIndex.retrieve()
    ↓
Display Results (Gradio Markdown)

OR (via MCP)

Client β†’ MCP JSON-RPC
    ↓
EcoMCPServer.call_tool("knowledge_search")
    ↓
Server._knowledge_search()
    ↓
Knowledge Base Search
    ↓
Return Results (JSON)
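On the MCP leg, the flow starts with a standard JSON-RPC tools/call request. The envelope below follows the MCP specification (the id value is arbitrary):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "knowledge_search",
    "arguments": {
      "query": "wireless headphones features",
      "search_type": "products",
      "top_k": 5
    }
  }
}
```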

Search Behavior

Semantic Search

  • Uses OpenAI embeddings (text-embedding-3-small)
  • Finds semantically similar content
  • Works with natural language queries
  • Returns similarity scores (0-1)

Search Types

  • All: Searches products and documentation
  • Products: Only product-related documents
  • Documentation: Only documentation files

Result Scoring

  • Score 0.95+: Highly relevant
  • Score 0.80-0.95: Very relevant
  • Score 0.70-0.80: Relevant
  • Score < 0.70: Loosely related
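These bands can be expressed as a small helper; relevance_label is a hypothetical convenience, not part of the shipped API:

```python
# Hypothetical helper mapping a similarity score to the bands above.
def relevance_label(score: float) -> str:
    """Translate a 0-1 similarity score into a relevance label."""
    if score >= 0.95:
        return "Highly relevant"
    if score >= 0.80:
        return "Very relevant"
    if score >= 0.70:
        return "Relevant"
    return "Loosely related"
```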

Data Sources

Indexed Documents

  1. Documentation (./docs/*.md)

    • Guides, tutorials, references
    • Implementation details
    • Deployment instructions
  2. Products (optional)

    • Product catalog data
    • Features and specifications
    • Pricing information

Adding More Data

Index new documents:

kb = EcoMCPKnowledgeBase()
kb.initialize("./docs")
kb.add_products(product_list)
kb.add_urls(["https://example.com/page"])

Save indexed data:

kb.save("./kb_backup")

Load from backup:

kb2 = EcoMCPKnowledgeBase()
kb2.load("./kb_backup")

Configuration

Server-Side (mcp_server.py)

# Knowledge base path
docs_path = "./docs"

# Automatic initialization on startup
self.kb = EcoMCPKnowledgeBase()
self.kb.initialize(docs_path)

Gradio UI (app.py)

# Knowledge base initialization
kb = EcoMCPKnowledgeBase()
kb.initialize("./docs")

# Search parameters
top_k = 5  # Number of results

Error Handling

KB Not Initialized

{
  "status": "error",
  "error": "Knowledge base not initialized"
}

Query Empty

{
  "status": "error",
  "error": "Query is required"
}

No Results Found

No results found for your query.
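A tool handler can produce the error payloads above with a small guard before searching. The sketch below uses a hypothetical validate_search_args helper; the payload shapes match the examples in this section:

```python
from typing import Optional

# Hypothetical guard run at the top of a search tool handler; returns
# an error payload matching the examples above, or None when valid.
def validate_search_args(args: dict, kb) -> Optional[dict]:
    if kb is None:
        return {"status": "error", "error": "Knowledge base not initialized"}
    if not str(args.get("query", "")).strip():
        return {"status": "error", "error": "Query is required"}
    return None
```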

Performance

Search Speed

  • First search: 1-2 seconds (loading model)
  • Subsequent searches: 0.1-0.5 seconds
  • With Pinecone: < 100ms

Index Size

  • Small (100 docs): < 100 MB
  • Medium (1000 docs): < 500 MB
  • Large (10000 docs): < 5 GB

Optimization Tips

  1. Use similarity_top_k=3 for speed
  2. Use similarity_top_k=10 for quality
  3. Use Pinecone for production (millions of docs)
  4. Cache results when possible
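Tip 4 can be sketched with functools.lru_cache; make_cached_search is a hypothetical wrapper and assumes kb.search is deterministic for a given (query, top_k) while the index is unchanged:

```python
from functools import lru_cache

# Hypothetical result cache for repeated queries. Invalidate with
# cached_search.cache_clear() after re-indexing the knowledge base.
def make_cached_search(kb, maxsize: int = 128):
    @lru_cache(maxsize=maxsize)
    def cached_search(query: str, top_k: int = 5):
        # Return a tuple so cached results cannot be mutated in place.
        return tuple(kb.search(query, top_k=top_k))
    return cached_search
```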

Troubleshooting

Knowledge base not initializing

Check that the ./docs directory exists and contains files.

Search tab not appearing

Verify LlamaIndex is installed: pip install -r requirements.txt
Check for errors in server logs

Slow searches

Reduce top_k parameter
Use smaller embedding model (text-embedding-3-small)
Enable Pinecone backend for production

API errors

Verify OPENAI_API_KEY is set
Check OpenAI account has credits
Monitor API usage and rate limits

Testing the Integration

Test MCP Tool

# Test knowledge_search
tool_args = {
    "query": "product features",
    "search_type": "all",
    "top_k": 5
}
result = await server.call_tool("knowledge_search", tool_args)

# Test product_query
tool_args = {
    "question": "What is the main product?"
}
result = await server.call_tool("product_query", tool_args)

Test Gradio UI

  1. Navigate to http://localhost:7860
  2. Click "Knowledge Search" tab
  3. Enter test query: "documentation"
  4. Select search type: "Documentation"
  5. Click "Search"
  6. Verify results appear

Next Steps

  1. Index Product Data: Add your product catalog
  2. Deploy Server: Use Modal or Docker
  3. Customize Search: Adjust chunk size and embedding model
  4. Add Analytics: Track search queries and results
  5. Optimize Performance: Profile and benchmark

Reference