# LlamaIndex Integration Guide - MCP Server & Gradio UI

Complete integration of the LlamaIndex knowledge base into the EcoMCP MCP server and Gradio UI.

## What's Integrated

### 1. MCP Server (src/server/mcp_server.py)

- **Knowledge base initialization** on server startup
- **New tools**: `knowledge_search`, `product_query`
- **Semantic search** across indexed documents
- **Natural language Q&A** with query engine
- **Fallback support** if LlamaIndex is unavailable

### 2. Gradio UI (src/ui/app.py)

- **Knowledge Search tab** for semantic search
- **Search type options**: All, Products, Documentation
- **Result display** with similarity scores
- **Dynamic tab** (only appears if the KB initialized)
- **Consistent styling** with the existing UI

### 3. Core Knowledge Base (src/core/)

- Pre-indexed documentation (./docs)
- Product data ready for indexing
- Metadata extraction (titles, keywords)
- Multiple search strategies
## New MCP Tools

### knowledge_search

Semantic search across the knowledge base.

**Parameters:**

- `query` (string, required): Search query
- `search_type` (string): "all", "products", or "documentation"
- `top_k` (integer): Number of results (1-20, default: 5)

**Example:**

```json
{
  "name": "knowledge_search",
  "arguments": {
    "query": "wireless headphones features",
    "search_type": "products",
    "top_k": 5
  }
}
```

**Response:**

```json
{
  "status": "success",
  "query": "wireless headphones features",
  "search_type": "products",
  "result_count": 3,
  "results": [
    {
      "rank": 1,
      "score": 0.95,
      "content": "Premium wireless headphones with noise cancellation...",
      "source": "products.json"
    },
    ...
  ],
  "timestamp": "2025-11-27T..."
}
```
### product_query

Natural language Q&A with automatic context retrieval.

**Parameters:**

- `question` (string, required): Natural language question

**Example:**

```json
{
  "name": "product_query",
  "arguments": {
    "question": "What are the main features of our flagship product?"
  }
}
```

**Response:**

```json
{
  "status": "success",
  "question": "What are the main features of our flagship product?",
  "answer": "Based on the documentation, the flagship product offers...",
  "timestamp": "2025-11-27T..."
}
```
## Gradio UI Features

### Knowledge Search Tab

1. **Search query input** - Natural language or keyword search
2. **Search type selector** - Filter by document type
3. **Search button** - Trigger semantic search
4. **Results display** - Ranked results with scores

**Usage:**

- Enter a query: "How to deploy this?"
- Select a type: "Documentation"
- Results show matching docs with relevance scores
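The search flow above can be sketched as a plain Python handler. This is an illustrative version only: the knowledge-base search methods and their result shape (dicts with `score`, `content`, and `source` keys, matching the `knowledge_search` response shown earlier) are assumed, and `kb` is passed in as a parameter here for clarity rather than read from a module-level global as in the real app.

```python
def perform_search(query: str, search_type: str, kb, top_k: int = 5) -> str:
    """Run a knowledge-base search and format results as Markdown.

    Illustrative sketch; the real handler in src/ui/app.py may differ.
    """
    # Guard against empty input and a missing knowledge base
    if not query.strip():
        return "Please enter a search query."
    if kb is None:
        return "Knowledge base not initialized."

    # Dispatch based on the selected search type
    if search_type == "Products":
        results = kb.search_products(query, top_k=top_k)
    elif search_type == "Documentation":
        results = kb.search_documentation(query, top_k=top_k)
    else:
        results = kb.search(query, top_k=top_k)

    if not results:
        return "No results found for your query."

    # Render ranked results with their similarity scores
    lines = []
    for rank, r in enumerate(results, start=1):
        lines.append(f"**{rank}. {r['source']}** (score: {r['score']:.2f})")
        lines.append(r["content"])
        lines.append("")
    return "\n".join(lines)
```

In the Gradio UI this function would be wired to the search button via `search_btn.click(...)`, with the query box and type selector as inputs.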
## Implementation Details

### MCP Server Integration

**Initialization:**

```python
class EcoMCPServer:
    def __init__(self):
        # ... existing code ...
        self.kb = None
        self._init_knowledge_base()

    def _init_knowledge_base(self):
        """Initialize the LlamaIndex knowledge base."""
        if LLAMAINDEX_AVAILABLE:
            self.kb = EcoMCPKnowledgeBase()
            self.kb.initialize("./docs")
```

**Tool Handlers:**

```python
async def call_tool(self, name: str, arguments: Dict) -> Any:
    if name == "knowledge_search":
        return await self._knowledge_search(arguments)
    elif name == "product_query":
        return await self._product_query(arguments)
    # ... existing tool dispatch ...
```
**Search Implementation:**

```python
async def _knowledge_search(self, args: Dict) -> Dict:
    # Extract arguments with defaults
    query = args.get("query", "")
    search_type = args.get("search_type", "all")
    top_k = args.get("top_k", 5)

    # Dispatch to the appropriate search strategy
    if search_type == "products":
        results = self.kb.search_products(query, top_k=top_k)
    elif search_type == "documentation":
        results = self.kb.search_documentation(query, top_k=top_k)
    else:
        results = self.kb.search(query, top_k=top_k)
```
### Gradio UI Integration

**Knowledge Base Initialization:**

```python
kb = None
if LLAMAINDEX_AVAILABLE:
    try:
        kb = EcoMCPKnowledgeBase()
        if os.path.exists("./docs"):
            kb.initialize("./docs")
    except Exception as e:
        print(f"Warning: knowledge base initialization failed: {e}")
        kb = None
```

**Search Tab Creation:**

```python
if kb and LLAMAINDEX_AVAILABLE:
    with gr.Tab("🔍 Knowledge Search"):
        # Search UI components
        search_btn.click(
            fn=perform_search,
            inputs=[search_query, search_type],
            outputs=output_search
        )
```
## Running the Integration

### Prerequisites

```bash
pip install -r requirements.txt
export OPENAI_API_KEY=sk-...
```

### Start the MCP Server

```bash
python src/server/mcp_server.py
```

### Start the Gradio UI

```bash
python src/ui/app.py
# Opens at http://localhost:7860
```

### Verify the Integration

1. Check the MCP server logs for "Knowledge base initialized successfully"
2. In the Gradio UI, verify that the "Knowledge Search" tab appears
3. Try a search query to test functionality
## Integration Flow

```
User Input (Gradio UI)
    ↓
Gradio Handler (perform_search)
    ↓
EcoMCPKnowledgeBase.search()
    ↓
VectorSearchEngine.search()
    ↓
VectorStoreIndex.retrieve()
    ↓
Display Results (Gradio Markdown)

OR (via MCP)

Client → MCP JSON-RPC
    ↓
EcoMCPServer.call_tool("knowledge_search")
    ↓
Server._knowledge_search()
    ↓
Knowledge Base Search
    ↓
Return Results (JSON)
```
## Search Behavior

### Semantic Search

- Uses OpenAI embeddings (text-embedding-3-small)
- Finds semantically similar content
- Works with natural language queries
- Returns similarity scores (0-1)

### Search Types

- **All**: Searches both products and documentation
- **Products**: Only product-related documents
- **Documentation**: Only documentation files

### Result Scoring

- 0.95 and above: Highly relevant
- 0.80-0.95: Very relevant
- 0.70-0.80: Relevant
- Below 0.70: Loosely related
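The scoring bands above can be expressed as a small helper, useful when labeling results in a UI. This function is illustrative and not part of the EcoMCP codebase.

```python
def relevance_label(score: float) -> str:
    """Map a similarity score in [0, 1] to the relevance bands above."""
    if score >= 0.95:
        return "Highly relevant"
    if score >= 0.80:
        return "Very relevant"
    if score >= 0.70:
        return "Relevant"
    return "Loosely related"
```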
## Data Sources

### Indexed Documents

1. **Documentation** (./docs/*.md)
   - Guides, tutorials, references
   - Implementation details
   - Deployment instructions
2. **Products** (optional)
   - Product catalog data
   - Features and specifications
   - Pricing information

### Adding More Data

**Index new documents:**

```python
kb = EcoMCPKnowledgeBase()
kb.initialize("./docs")
kb.add_products(product_list)
kb.add_urls(["https://example.com/page"])
```

**Save indexed data:**

```python
kb.save("./kb_backup")
```

**Load from a backup:**

```python
kb2 = EcoMCPKnowledgeBase()
kb2.load("./kb_backup")
```
## Configuration

### Server-Side (mcp_server.py)

```python
# Knowledge base path
docs_path = "./docs"

# Automatic initialization on startup
self.kb = EcoMCPKnowledgeBase()
self.kb.initialize(docs_path)
```

### Gradio UI (app.py)

```python
# Knowledge base initialization
kb = EcoMCPKnowledgeBase()
kb.initialize("./docs")

# Search parameters
top_k = 5  # Number of results
```
## Error Handling

### KB Not Initialized

```json
{
  "status": "error",
  "error": "Knowledge base not initialized"
}
```

### Empty Query

```json
{
  "status": "error",
  "error": "Query is required"
}
```
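Both error payloads above can be produced by a single validation step before running a search. The sketch below is hypothetical (the real server may validate differently) but returns exactly the shapes shown:

```python
from typing import Any, Dict, Optional

def validate_search_args(args: Dict[str, Any], kb: Optional[object]) -> Optional[Dict[str, str]]:
    """Return an error payload matching the shapes above, or None if valid."""
    # KB must be initialized before any search can run
    if kb is None:
        return {"status": "error", "error": "Knowledge base not initialized"}
    # A non-empty query string is required
    if not str(args.get("query", "")).strip():
        return {"status": "error", "error": "Query is required"}
    return None
```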
### No Results Found

```
No results found for your query.
```
## Performance

### Search Speed

- First search: 1-2 seconds (model loading)
- Subsequent searches: 0.1-0.5 seconds
- With Pinecone: < 100 ms

### Index Size

- Small (100 docs): < 100 MB
- Medium (1,000 docs): < 500 MB
- Large (10,000 docs): < 5 GB

### Optimization Tips

1. Use `similarity_top_k=3` for speed
2. Use `similarity_top_k=10` for quality
3. Use Pinecone in production (millions of docs)
4. Cache results when possible
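Tip 4 can be as simple as memoizing repeated queries with the standard library. This sketch stubs out the actual search (`_run_search` is a placeholder, not an EcoMCP function); in practice you would wrap the real `kb.search()` call instead:

```python
from functools import lru_cache

def _run_search(query: str, search_type: str, top_k: int) -> list:
    # Placeholder for the real kb.search() call.
    return [f"result for {query!r}"]

@lru_cache(maxsize=256)
def cached_search(query: str, search_type: str = "all", top_k: int = 5) -> tuple:
    """Memoize search results for identical (query, type, top_k) calls.

    Results are returned as a tuple so the cached value is immutable.
    """
    return tuple(_run_search(query, search_type, top_k))
```

Note that `lru_cache` keys on the exact argument values, so "Headphones" and "headphones" are cached separately; normalize queries first if that matters.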
## Troubleshooting

### Knowledge base not initializing

```
Check that the ./docs directory exists and contains files
```

### Search tab not appearing

```
Verify that LlamaIndex is installed: pip install -r requirements.txt
Check for errors in the server logs
```

### Slow searches

```
Reduce the top_k parameter
Use a smaller embedding model (text-embedding-3-small)
Enable the Pinecone backend for production
```

### API errors

```
Verify that OPENAI_API_KEY is set
Check that your OpenAI account has credits
Monitor API usage and rate limits
```
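A quick preflight check for the first item above can catch a missing key before any API call is made. This helper is illustrative, not part of the EcoMCP codebase:

```python
import os

def openai_key_present() -> bool:
    """Return True if OPENAI_API_KEY is set to a non-empty value."""
    return bool(os.environ.get("OPENAI_API_KEY", "").strip())
```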
## Testing the Integration

### Test the MCP Tools

```python
import asyncio

async def main():
    server = EcoMCPServer()

    # Test knowledge_search
    result = await server.call_tool("knowledge_search", {
        "query": "product features",
        "search_type": "all",
        "top_k": 5
    })
    print(result)

    # Test product_query
    result = await server.call_tool("product_query", {
        "question": "What is the main product?"
    })
    print(result)

asyncio.run(main())
```
### Test the Gradio UI

1. Navigate to http://localhost:7860
2. Click the "Knowledge Search" tab
3. Enter a test query: "documentation"
4. Select the search type: "Documentation"
5. Click "Search"
6. Verify that results appear
## Next Steps

1. **Index Product Data**: Add your product catalog
2. **Deploy the Server**: Use Modal or Docker
3. **Customize Search**: Adjust the chunk size and embedding model
4. **Add Analytics**: Track search queries and results
5. **Optimize Performance**: Profile and benchmark

## Reference

- [MCP Server Implementation](./src/server/mcp_server.py)
- [Gradio UI Implementation](./src/ui/app.py)
- [Knowledge Base Module](./src/core/knowledge_base.py)
- [LlamaIndex Framework Guide](./LLAMA_FRAMEWORK_REFINED.md)
- [Quick Integration Guide](./QUICK_INTEGRATION.md)