# LlamaIndex Framework Integration - Refined

Implementation refined based on official LlamaIndex framework documentation and best practices.
## Key Framework Concepts Implemented

### 1. Ingestion Pipeline

**Modern LlamaIndex Pattern:** processing documents through transformations before indexing.
```python
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SimpleNodeParser
from llama_index.core.extractors import TitleExtractor, KeywordExtractor

# The pipeline automatically:
# - Parses documents into nodes
# - Extracts metadata (titles, keywords)
# - Handles deduplication (when a docstore is attached; see the sketch below)
# - Manages state across runs
pipeline = IngestionPipeline(
    transformations=[
        SimpleNodeParser(chunk_size=1024, chunk_overlap=20),
        TitleExtractor(nodes=5),
        KeywordExtractor(keywords=10),
    ]
)

nodes = pipeline.run(documents=documents)
```
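Note that deduplication only takes effect when the pipeline is given a document store to track document hashes. A minimal sketch, assuming the input `documents` carry stable `doc_id`s:

```python
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.storage.docstore import SimpleDocumentStore

# With a docstore attached, the pipeline hashes each document and
# skips unchanged ones on subsequent runs
pipeline = IngestionPipeline(
    transformations=[SentenceSplitter(chunk_size=1024, chunk_overlap=20)],
    docstore=SimpleDocumentStore(),
)

nodes = pipeline.run(documents=documents)  # first run: everything is processed
nodes = pipeline.run(documents=documents)  # second run: unchanged docs are skipped
```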
### 2. Storage Context

**Modern LlamaIndex Pattern:** unified storage management.
```python
from llama_index.core import StorageContext, VectorStoreIndex

# Default (in-memory with local persistence)
storage_context = StorageContext.from_defaults()

# Pinecone backend (pinecone_vector_store is constructed elsewhere;
# see the Pinecone sketch under Performance Tuning)
storage_context = StorageContext.from_defaults(
    vector_store=pinecone_vector_store
)

# Create index with storage context
index = VectorStoreIndex(
    nodes=nodes,
    storage_context=storage_context,
    show_progress=True,
)

# Persist to disk
index.storage_context.persist(persist_dir="./kb_storage")
```
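To reload the persisted index in a later session, rebuild a `StorageContext` from the same directory:

```python
from llama_index.core import StorageContext, load_index_from_storage

# Rebuild the index from the directory written by persist() above
storage_context = StorageContext.from_defaults(persist_dir="./kb_storage")
index = load_index_from_storage(storage_context)
```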
### 3. Query Engines

**Modern LlamaIndex Pattern:** end-to-end QA with response synthesis.
```python
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(documents)

# Create query engine with response synthesis
query_engine = index.as_query_engine(
    similarity_top_k=5,
    response_mode="compact",  # Options: compact, tree_summarize, refine
)

response = query_engine.query("What is the main feature?")
# Returns a Response object with the answer and source nodes
```
Response modes:

- `compact`: concise, single-pass synthesis
- `tree_summarize`: hierarchical summarization
- `refine`: iterative refinement across results
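Whatever the mode, the returned `Response` object carries both the synthesized answer and the source nodes it was built from, which is useful for showing citations:

```python
response = query_engine.query("What is the main feature?")

print(response.response)  # the synthesized answer text

# Each source node pairs the retrieved text with its similarity score
for source in response.source_nodes:
    print(f"{source.score:.3f}  {source.node.get_content()[:100]}")
```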
### 4. Chat Engines

**Modern LlamaIndex Pattern:** multi-turn conversational interface.
```python
# Create chat engine for conversation
chat_engine = index.as_chat_engine()

# Multi-turn conversation; history is maintained automatically
response = chat_engine.chat("What's the main topic?")
response = chat_engine.chat("Tell me more about it")
```
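The engine's behavior is configurable: `as_chat_engine` accepts a `chat_mode`, and `reset()` clears the accumulated history. For example:

```python
# Condense each follow-up into a standalone question, then answer it
# with retrieved context
chat_engine = index.as_chat_engine(chat_mode="condense_plus_context")

response = chat_engine.chat("What's the main topic?")
response = chat_engine.chat("Tell me more about it")  # sees the prior turn

chat_engine.reset()  # drop the history and start a fresh conversation
```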
### 5. Global Settings

**Modern LlamaIndex Pattern:** centralized configuration.
```python
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Configure globally; all components use these settings automatically
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.llm = OpenAI(model="gpt-5")
Settings.chunk_size = 1024
Settings.chunk_overlap = 20
```
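Global settings act as defaults that individual components can still override. A sketch, assuming an `index` built as in the sections above (the model name is illustrative):

```python
from llama_index.llms.openai import OpenAI

# Settings.llm stays the default; only this engine uses the cheaper model
fast_engine = index.as_query_engine(
    llm=OpenAI(model="gpt-3.5-turbo"),
    similarity_top_k=3,
)
```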
## Architecture Overview

```
┌─────────────────────────────────────────────────┐
│               EcoMCPKnowledgeBase               │
│        (High-level integration wrapper)         │
├─────────────────────────────────────────────────┤
│                                                 │
│  ┌───────────────────────────────────────────┐  │
│  │ DocumentLoader                            │  │
│  │ - Load markdown, text, JSON, URLs         │  │
│  │ - Create product documents                │  │
│  └──────────────────┬────────────────────────┘  │
│                     │                           │
│                     ▼                           │
│  ┌───────────────────────────────────────────┐  │
│  │ IngestionPipeline                         │  │
│  │ - Node parsing                            │  │
│  │ - Metadata extraction (title, keywords)   │  │
│  │ - Transformations                         │  │
│  └──────────────────┬────────────────────────┘  │
│                     │                           │
│                     ▼                           │
│  ┌───────────────────────────────────────────┐  │
│  │ VectorStoreIndex                          │  │
│  │ (with StorageContext)                     │  │
│  │ - In-memory or Pinecone backend           │  │
│  │ - Embeddings                              │  │
│  └────────────┬─────────────────┬────────────┘  │
│               │                 │               │
│               ▼                 ▼               │
│        ┌─────────────┐ ┌─────────────────┐      │
│        │ QueryEngine │ │   ChatEngine    │      │
│        │  (QA mode)  │ │ (Conversational)│      │
│        └─────────────┘ └─────────────────┘      │
│                                                 │
└─────────────────────────────────────────────────┘
                         │
                         ▼
            ┌─────────────────────────┐
            │   VectorSearchEngine    │
            │    (Advanced search)    │
            │ - Product search        │
            │ - Documentation search  │
            │ - Semantic search       │
            │ - Recommendations       │
            └─────────────────────────┘
```
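Expressed as plain LlamaIndex calls, the flow in the diagram looks roughly like this (a sketch of what the `EcoMCPKnowledgeBase` wrapper does internally, not its exact code):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter

# DocumentLoader -> IngestionPipeline -> VectorStoreIndex -> engines
documents = SimpleDirectoryReader("./docs").load_data()

pipeline = IngestionPipeline(
    transformations=[SentenceSplitter(chunk_size=1024, chunk_overlap=20)]
)
nodes = pipeline.run(documents=documents)

index = VectorStoreIndex(nodes=nodes)

query_engine = index.as_query_engine()  # QA mode
chat_engine = index.as_chat_engine()    # conversational mode
```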
## Usage Patterns

### Pattern 1: Question-Answering
```python
from src.core import EcoMCPKnowledgeBase

kb = EcoMCPKnowledgeBase()
kb.initialize("./docs")

# Query with automatic response synthesis
answer = kb.query("How do I deploy this?")
print(answer)  # Full answer synthesized from retrieved context
```
### Pattern 2: Conversational
```python
kb = EcoMCPKnowledgeBase()
kb.initialize("./docs")

# Multi-turn conversation
messages = [
    {"role": "user", "content": "What are the main features?"}
]
response = kb.chat(messages)
print(response)

# Continue the conversation
messages.append({"role": "assistant", "content": response})
messages.append({"role": "user", "content": "Tell me more about feature X"})
response = kb.chat(messages)
```
### Pattern 3: Semantic Search
```python
kb = EcoMCPKnowledgeBase()
kb.initialize("./docs")

# Get search results with scores
results = kb.search("setup guide", top_k=5)
for result in results:
    print(f"Score: {result.score:.2f}")
    print(f"Content: {result.content[:200]}")
```
### Pattern 4: Product Recommendations
```python
kb = EcoMCPKnowledgeBase()
products = [...]  # product documents defined elsewhere
kb.add_products(products)

# Get recommendations with confidence scores
recs = kb.get_recommendations("laptop under $1000", limit=5)
for rec in recs:
    print(f"Confidence: {rec['confidence']:.2f}")
    print(f"Product: {rec['content']}")
```
## Configuration Best Practices
```python
from src.core import IndexConfig, EcoMCPKnowledgeBase

# Development
dev_config = IndexConfig(
    embedding_model="text-embedding-3-small",
    llm_model="gpt-3.5-turbo",
    chunk_size=512,
    use_pinecone=False,
)

# Production
prod_config = IndexConfig(
    embedding_model="text-embedding-3-large",
    llm_model="gpt-5",
    chunk_size=1024,
    use_pinecone=True,
    pinecone_index_name="ecomcp-prod",
)

kb = EcoMCPKnowledgeBase(config=prod_config)
```
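A common pattern is selecting between the two from an environment flag (`APP_ENV` here is an assumed variable name):

```python
import os

# Fall back to the development config unless explicitly in production
config = prod_config if os.getenv("APP_ENV") == "production" else dev_config
kb = EcoMCPKnowledgeBase(config=config)
```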
## Response Synthesis Modes

### Compact (Recommended for speed)

- Single LLM call
- Combines all retrieved context
- Returns a concise answer
- Best for: direct factual questions

```python
query_engine = index.as_query_engine(response_mode="compact")
```
### Tree Summarize

- Hierarchical summarization
- Better for complex topics
- Multiple LLM calls
- Best for: complex multi-step answers

```python
query_engine = index.as_query_engine(response_mode="tree_summarize")
```
### Refine

- Iteratively refines the answer
- Processes retrieved results one by one
- Highest token usage
- Best for: detailed, nuanced answers

```python
query_engine = index.as_query_engine(response_mode="refine")
```
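When in doubt, the trade-off can be measured directly by running the same question through each mode:

```python
import time

question = "How do I deploy this?"
for mode in ("compact", "tree_summarize", "refine"):
    engine = index.as_query_engine(response_mode=mode)
    start = time.perf_counter()
    response = engine.query(question)
    elapsed = time.perf_counter() - start
    print(f"{mode:>14}: {elapsed:.1f}s, {len(str(response))} chars")
```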
## Integration with Server

### MCP Server Handler
```python
from typing import Dict, List

from src.core import initialize_knowledge_base, get_knowledge_base

# Assumes `app` (FastAPI) and `mcp` (MCP server) instances created elsewhere

# Startup
@app.on_event("startup")
def startup():
    initialize_knowledge_base("./docs")

# Query handler
@mcp.tool()
def search(query: str) -> str:
    kb = get_knowledge_base()
    results = kb.search(query, top_k=5)
    return "\n".join([r.content for r in results])

# Chat handler
@mcp.tool()
def chat(messages: List[Dict[str, str]]) -> str:
    kb = get_knowledge_base()
    return kb.chat(messages)
```
### API Endpoint
```python
from typing import Dict, List

from fastapi import FastAPI
from src.core import initialize_knowledge_base, get_knowledge_base

app = FastAPI()

@app.on_event("startup")
async def startup():
    initialize_knowledge_base("./docs")

@app.post("/search")
async def search(query: str, top_k: int = 5):
    kb = get_knowledge_base()
    results = kb.search(query, top_k=top_k)
    return [r.to_dict() for r in results]

@app.post("/query")
async def query(question: str):
    kb = get_knowledge_base()
    answer = kb.query(question)
    return {"answer": answer}

@app.post("/chat")
async def chat(messages: List[Dict[str, str]]):
    kb = get_knowledge_base()
    response = kb.chat(messages)
    return {"response": response}
```
## Metadata Extraction

The ingestion pipeline automatically extracts:

- **Titles:** section titles and document headers
- **Keywords:** key terms and concepts
```python
# Metadata is available on search results
results = kb.search("topic")
for result in results:
    print(result.metadata)
    # {
    #     "source": "docs/guide.md",
    #     "title": "Getting Started Guide",
    #     "keywords": ["setup", "installation", "requirements"],
    #     "type": "markdown"
    # }
```
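Extracted metadata can also constrain retrieval. LlamaIndex supports metadata filters on retrievers; for example, restricting results to markdown sources via the `type` key shown above:

```python
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

# Only retrieve nodes whose metadata marks them as markdown documents
filters = MetadataFilters(filters=[ExactMatchFilter(key="type", value="markdown")])
retriever = index.as_retriever(filters=filters, similarity_top_k=5)
nodes = retriever.retrieve("setup guide")
```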
## Performance Tuning

### For Speed
```python
config = IndexConfig(
    embedding_model="text-embedding-3-small",
    llm_model="gpt-3.5-turbo",
    chunk_size=1024,
    similarity_top_k=3,  # Fewer results
)
kb = EcoMCPKnowledgeBase(config=config)
query_engine = kb.index.as_query_engine(response_mode="compact")
```
### For Quality
```python
config = IndexConfig(
    embedding_model="text-embedding-3-large",
    llm_model="gpt-5",
    chunk_size=512,  # Smaller chunks
    similarity_top_k=10,  # More results
)
kb = EcoMCPKnowledgeBase(config=config)
query_engine = kb.index.as_query_engine(response_mode="refine")
```
### For Production Scalability
```python
config = IndexConfig(
    embedding_model="text-embedding-3-large",
    llm_model="gpt-5",
    chunk_size=1024,
    use_pinecone=True,
    pinecone_index_name="ecomcp-prod",
)
kb = EcoMCPKnowledgeBase(config=config)
# Pinecone scales to millions of documents
```
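For reference, the wiring that `use_pinecone=True` implies looks roughly like this in raw LlamaIndex (a sketch, assuming the `llama-index-vector-stores-pinecone` package and an existing Pinecone index named `ecomcp-prod`):

```python
import os

from pinecone import Pinecone
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.pinecone import PineconeVectorStore

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
vector_store = PineconeVectorStore(pinecone_index=pc.Index("ecomcp-prod"))

# Route vector storage through Pinecone instead of local memory
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex(nodes=nodes, storage_context=storage_context)
```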
## Error Handling
```python
import logging

logger = logging.getLogger(__name__)

try:
    kb = EcoMCPKnowledgeBase()
    kb.initialize("./docs")
except FileNotFoundError:
    logger.error("Documentation directory not found")
except Exception as e:
    logger.error(f"Failed to initialize knowledge base: {e}")

def safe_query(kb: EcoMCPKnowledgeBase, question: str) -> str:
    try:
        return kb.query(question)
    except Exception as e:
        logger.error(f"Query failed: {e}")
        return "Unable to process query"
```
## Updates from Refining

- Added IngestionPipeline for metadata extraction
- Enhanced StorageContext management
- Added ChatEngine for multi-turn conversation
- Improved Settings configuration
- Better response synthesis options
- Enhanced error handling
- More detailed documentation