# LlamaIndex Integration Guide

Complete guide to the knowledge base indexing and retrieval system powered by LlamaIndex.
## Overview

The LlamaIndex integration provides:

- **Knowledge Base Indexing**: Foundation for indexing documents and products
- **Vector Similarity Search**: Semantic search across indexed content
- **Document Retrieval**: Easy retrieval of relevant documents
## Components

### 1. Core Modules

#### `KnowledgeBase` (knowledge_base.py)

Low-level interface for index management.

```python
from src.core import KnowledgeBase, IndexConfig

# Initialize with a custom config
config = IndexConfig(
    embedding_model="text-embedding-3-small",
    chunk_size=1024,
    use_pinecone=False,
)
kb = KnowledgeBase(config)

# Index documents
kb.index_documents("./docs")

# Search
results = kb.search("your query", top_k=5)

# Query with QA
response = kb.query("What is the main feature?")
```
#### `DocumentLoader` (document_loader.py)

Loads documents from various sources.

```python
from src.core import DocumentLoader

# Load from a directory
docs = DocumentLoader.load_markdown_documents("./docs")
docs += DocumentLoader.load_text_documents("./docs")

# Load products
products = [
    {
        "id": "prod_001",
        "name": "Product Name",
        "description": "Description",
        "price": "$99",
        "category": "Category",
        "features": ["Feature 1", "Feature 2"],
    }
]
product_docs = DocumentLoader.create_product_documents(products)

# Load from URLs
urls = ["https://example.com/page1", "https://example.com/page2"]
url_docs = DocumentLoader.load_documents_from_urls(urls)

# Load everything at once
all_docs = DocumentLoader.load_all_documents(
    docs_dir="./docs",
    products=products,
    urls=urls,
)
```
#### `VectorSearchEngine` (vector_search.py)

High-level search interface with advanced features.

```python
from src.core import VectorSearchEngine

search_engine = VectorSearchEngine(kb)

# Basic search
results = search_engine.search("query", top_k=5)

# Product search only
products = search_engine.search_products("laptop", top_k=10)

# Documentation search only
docs = search_engine.search_documentation("how to setup", top_k=5)

# Semantic search with a similarity threshold
results = search_engine.semantic_search(
    "installation guide",
    top_k=5,
    similarity_threshold=0.5,
)

# Hierarchical search across content types
# Returns: {"products": [...], "documentation": [...]}
results = search_engine.hierarchical_search("e-commerce")

# Weighted combined search
results = search_engine.combined_search(
    "shopping platform",
    weights={"product": 0.6, "documentation": 0.4},
)

# Contextual search
results = search_engine.contextual_search(
    "laptop",
    context={"category": "electronics", "price_range": "$1000-2000"},
    top_k=5,
)

# Get recommendations
recs = search_engine.get_recommendations("laptop under $1000", limit=5)
```
### 2. High-Level Integration

#### `EcoMCPKnowledgeBase` (llama_integration.py)

Complete integration wrapper for the EcoMCP application.

```python
from src.core import EcoMCPKnowledgeBase

# Initialize and index the docs directory
kb = EcoMCPKnowledgeBase()
kb.initialize("./docs")

# Add products
kb.add_products(products)

# Add URLs
kb.add_urls(["https://example.com"])

# Search
results = kb.search("query", top_k=5)

# Search specific content types
products = kb.search_products("laptop", top_k=10)
docs = kb.search_documentation("deploy", top_k=5)

# Get recommendations
recs = kb.get_recommendations("gaming laptop", limit=5)

# Natural-language query
answer = kb.query("What is the platform about?")

# Save and load the index
kb.save("./kb_index")
kb.load("./kb_index")

# Get index statistics
stats = kb.get_stats()
```
### 3. Global Singleton Pattern

```python
from src.core import initialize_knowledge_base, get_knowledge_base

# Initialize once, globally
kb = initialize_knowledge_base("./docs")

# Access from anywhere
kb = get_knowledge_base()
results = kb.search("query")
```
## Configuration

### IndexConfig Options

```python
config = IndexConfig(
    # Embedding model (OpenAI)
    embedding_model="text-embedding-3-small",  # or "text-embedding-3-large"

    # Chunking settings
    chunk_size=1024,   # Size of text chunks
    chunk_overlap=20,  # Overlap between consecutive chunks

    # Vector store backend
    use_pinecone=False,  # True to use Pinecone
    pinecone_index_name="ecomcp-knowledge",
    pinecone_dimension=1536,
)
```
## Installation

Add to requirements.txt:

```
llama-index>=0.9.0
llama-index-embeddings-openai>=0.1.0
llama-index-vector-stores-pinecone>=0.1.0
```

Environment variables:

```bash
OPENAI_API_KEY=sk-...
PINECONE_API_KEY=...  # Optional, only needed when using Pinecone
```
## Usage Examples

### Example 1: Basic Document Indexing

```python
from src.core import EcoMCPKnowledgeBase

kb = EcoMCPKnowledgeBase()
kb.initialize("./docs")

# Search
results = kb.search("deployment guide", top_k=3)
for result in results:
    print(f"Score: {result.score:.2f}")
    print(f"Content: {result.content[:200]}")
```
### Example 2: Product Recommendation

```python
from src.core import EcoMCPKnowledgeBase

kb = EcoMCPKnowledgeBase()
products = [
    {
        "id": "1",
        "name": "Wireless Headphones",
        "description": "Noise-canceling",
        "price": "$299",
        "category": "Electronics",
        "features": ["ANC", "30h Battery"],
        "tags": ["audio", "wireless"],
    },
    # ... more products
]
kb.add_products(products)

# Get recommendations
recs = kb.get_recommendations("best headphones for music", limit=3)
for rec in recs:
    print(f"Rank: {rec['rank']}")
    print(f"Confidence: {rec['confidence']:.2f}")
```
### Example 3: Semantic Search with Filtering

```python
from src.core import VectorSearchEngine

search = VectorSearchEngine(kb)

# Search with context
results = search.contextual_search(
    "laptop computer",
    context={
        "category": "computers",
        "price_range": "$500-1000",
        "processor": "Intel",
    },
    top_k=5,
)
```
### Example 4: Knowledge Base Persistence

```python
from src.core import EcoMCPKnowledgeBase

# Create and save
kb1 = EcoMCPKnowledgeBase()
kb1.initialize("./docs")
kb1.save("./kb_backup")

# Load later
kb2 = EcoMCPKnowledgeBase()
kb2.load("./kb_backup")

# Use immediately
results = kb2.search("something")
```
## Integration with Server

### In Your Server/MCP Implementation

```python
from src.core import initialize_knowledge_base, get_knowledge_base

# During startup
def initialize_app():
    kb = initialize_knowledge_base("./docs")
    kb.add_products(get_all_products())  # Your product source

# In your handlers
def search_handler(query: str):
    kb = get_knowledge_base()
    return kb.search(query)

def recommend_handler(user_query: str):
    kb = get_knowledge_base()
    return kb.get_recommendations(user_query)
```
## Advanced Features

### Custom Metadata

```python
from llama_index.core.schema import Document

doc = Document(
    text="Content here",
    metadata={
        "source": "custom_source",
        "author": "John Doe",
        "date": "2024-01-01",
        "category": "tutorial",
    },
)

# kb.kb is the underlying KnowledgeBase instance
kb.kb.add_documents([doc])
```

### Pinecone Integration

```python
config = IndexConfig(use_pinecone=True)
kb = EcoMCPKnowledgeBase(config=config)

# Automatically creates/uses the Pinecone index
kb.initialize("./docs")
```

### Custom Query Engine

```python
# Low-level query with custom settings
query_engine = kb.kb.index.as_query_engine(
    similarity_top_k=10,
    response_mode="compact",  # or "tree_summarize", "refine"
)
response = query_engine.query("Your question")
```
## Performance Tips

1. **Chunk Size**: Use larger chunks (2048) for long documents, smaller chunks (512) for varied content
2. **Vector Store**: Use Pinecone for production deployments
3. **Batch Processing**: Index documents in batches for large datasets
4. **Caching**: Load the index from disk instead of re-indexing frequently
5. **Top-K**: Start with top_k=5 and adjust based on relevance
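The batch-processing tip can be sketched with a small helper. `chunked` is a hypothetical utility, not part of this codebase; the commented usage assumes the `kb.kb.add_documents` call shown in the Advanced Features section:

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Hypothetical usage with the knowledge base from this guide:
# for batch in chunked(all_docs, 100):
#     kb.kb.add_documents(batch)

print(list(chunked(list(range(5)), 2)))  # → [[0, 1], [2, 3], [4]]
```

Keeping batches small bounds peak memory during embedding, at the cost of more round trips.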
## Troubleshooting

### No OpenAI API Key

```
Error: OPENAI_API_KEY not set
Solution: export OPENAI_API_KEY=sk-... in your environment
```
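To surface this error at startup rather than mid-request, a fail-fast guard can be added. This is just a standard-library sketch; `require_openai_key` is a hypothetical helper, not part of the documented API:

```python
import os

def require_openai_key() -> str:
    """Fail fast with a clear message if OPENAI_API_KEY is missing."""
    key = os.environ.get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY not set - export OPENAI_API_KEY=sk-... before starting"
        )
    return key
```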
### Pinecone Connection Failed

```
Error: Pinecone connection failed
Solution: Check PINECONE_API_KEY and network connectivity
Note: Falls back to in-memory indexing automatically
```
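The automatic fallback described above follows a generic pattern, sketched here with placeholder factory callables (neither function below is from this codebase):

```python
def init_vector_store(primary_factory, fallback_factory):
    """Try the primary backend (e.g. Pinecone); fall back on any failure."""
    try:
        return primary_factory()
    except Exception:
        return fallback_factory()

def failing_pinecone():
    # Stand-in for a Pinecone client that cannot reach the service
    raise ConnectionError("Pinecone unreachable")

store = init_vector_store(failing_pinecone, lambda: {"backend": "in-memory"})
# store == {"backend": "in-memory"}
```

A production version would log the caught exception so silent fallbacks don't mask misconfiguration.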
### Out of Memory with Large Datasets

```
Solution:
- Reduce chunk_size in IndexConfig
- Process documents in batches
- Use the Pinecone backend (scales to millions of documents)
```
## Testing

Run the tests:

```bash
pytest tests/test_llama_integration.py -v
```

## API Reference

See the docstrings in `src/core/` for detailed API documentation.
## Files Structure

```
src/core/
├── __init__.py           # Package exports
├── knowledge_base.py     # Core KnowledgeBase class
├── document_loader.py    # Document loading utilities
├── vector_search.py      # VectorSearchEngine with advanced features
├── llama_integration.py  # EcoMCP integration wrapper
└── examples.py           # Usage examples
```
## Related Documentation

- OpenAI API: https://platform.openai.com/docs
- LlamaIndex: https://docs.llamaindex.ai
- Pinecone: https://docs.pinecone.io