LlamaIndex Framework Integration - Refined

Implementation refined based on official LlamaIndex framework documentation and best practices.

Key Framework Concepts Implemented

1. Ingestion Pipeline

Modern LlamaIndex Pattern: Processing documents through transformations before indexing

from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SimpleNodeParser
from llama_index.core.extractors import TitleExtractor, KeywordExtractor

# Pipeline automatically:
# - Parses documents into nodes
# - Extracts metadata (titles, keywords)
# - Handles deduplication
# - Manages state across runs

pipeline = IngestionPipeline(
    transformations=[
        SimpleNodeParser(chunk_size=1024, chunk_overlap=20),
        TitleExtractor(nodes=5),
        KeywordExtractor(keywords=10),
    ]
)

nodes = pipeline.run(documents=documents)
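
The deduplication and cross-run state management noted above come from attaching a document store to the pipeline. A minimal sketch, assuming the same `documents` list as above (`SimpleDocumentStore` is LlamaIndex's in-memory docstore):

```python
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SimpleNodeParser
from llama_index.core.storage.docstore import SimpleDocumentStore

# With a docstore attached, the pipeline hashes each document and
# skips unchanged documents on subsequent runs.
pipeline = IngestionPipeline(
    transformations=[SimpleNodeParser(chunk_size=1024, chunk_overlap=20)],
    docstore=SimpleDocumentStore(),
)

nodes = pipeline.run(documents=documents)

# Persist pipeline state so deduplication survives restarts
pipeline.persist("./pipeline_storage")
```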

2. Storage Context

Modern LlamaIndex Pattern: Unified storage management

from llama_index.core import StorageContext, VectorStoreIndex

# Default (in-memory with local persistence)
storage_context = StorageContext.from_defaults()

# Pinecone backend
storage_context = StorageContext.from_defaults(
    vector_store=pinecone_vector_store
)

# Create index with storage context
index = VectorStoreIndex(
    nodes=nodes,
    storage_context=storage_context,
    show_progress=True
)

# Persist to disk
index.storage_context.persist(persist_dir="./kb_storage")
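
A persisted index can be reloaded later without re-running embeddings; a sketch assuming the `./kb_storage` directory from above:

```python
from llama_index.core import StorageContext, load_index_from_storage

# Rebuild the StorageContext from the persisted directory,
# then reload the index from disk.
storage_context = StorageContext.from_defaults(persist_dir="./kb_storage")
index = load_index_from_storage(storage_context)
```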

3. Query Engines

Modern LlamaIndex Pattern: End-to-end QA with response synthesis

from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(documents)

# Create query engine with response synthesis
query_engine = index.as_query_engine(
    similarity_top_k=5,
    response_mode="compact"  # Options: compact, tree_summarize, refine
)

response = query_engine.query("What is the main feature?")
# Returns: Response object with answer and source nodes

Response modes:

  • compact: Concise, single-pass synthesis
  • tree_summarize: Hierarchical summarization
  • refine: Iterative refinement across results

4. Chat Engines

Modern LlamaIndex Pattern: Multi-turn conversational interface

# Create chat engine for conversation
chat_engine = index.as_chat_engine()

# Multi-turn conversation
response = chat_engine.chat("What's the main topic?")
response = chat_engine.chat("Tell me more about it")
# Maintains conversation history automatically
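
The chat engine's behavior can be tuned via `chat_mode`, and its history can be cleared between sessions; a sketch using one of the built-in modes:

```python
# "condense_plus_context" rewrites each follow-up into a standalone
# question, then retrieves context before answering.
chat_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",
    similarity_top_k=5,
)

response = chat_engine.chat("What's the main topic?")

# Clear the conversation history before starting a new session
chat_engine.reset()
```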

5. Global Settings

Modern LlamaIndex Pattern: Centralized configuration

from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Configure globally
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Settings.llm = OpenAI(model="gpt-5")
Settings.chunk_size = 1024
Settings.chunk_overlap = 20

# All components use these settings automatically
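
Global `Settings` can also be overridden per component when a single engine needs a different model; a sketch reusing the models configured in this document:

```python
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

# Global default
Settings.llm = OpenAI(model="gpt-3.5-turbo")

# Local override: this engine alone uses a different LLM,
# without touching the global Settings.
query_engine = index.as_query_engine(llm=OpenAI(model="gpt-5"))
```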

Architecture Overview

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚          EcoMCPKnowledgeBase                    β”‚
β”‚  (High-level integration wrapper)               β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                 β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
β”‚  β”‚ DocumentLoader                          β”‚    β”‚
β”‚  β”‚ - Load markdown, text, JSON, URLs       β”‚    β”‚
β”‚  β”‚ - Create product documents              β”‚    β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
β”‚                    β”‚                            β”‚
β”‚                    β–Ό                            β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
β”‚  β”‚ IngestionPipeline                       β”‚    β”‚
β”‚  β”‚ - Node parsing                          β”‚    β”‚
β”‚  β”‚ - Metadata extraction (title, keywords) β”‚    β”‚
β”‚  β”‚ - Transformations                       β”‚    β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
β”‚                    β”‚                            β”‚
β”‚                    β–Ό                            β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
β”‚  β”‚ VectorStoreIndex                        β”‚    β”‚
β”‚  β”‚ (with StorageContext)                   β”‚    β”‚
β”‚  β”‚ - In-memory or Pinecone backend         β”‚    β”‚
β”‚  β”‚ - Embeddings                            β”‚    β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
β”‚               β”‚                β”‚                β”‚
β”‚               β–Ό                β–Ό                β”‚
β”‚        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚
β”‚        β”‚ QueryEngine β”‚  β”‚ ChatEngine       β”‚    β”‚
β”‚        β”‚ (QA mode)   β”‚  β”‚ (Conversational) β”‚    β”‚
β”‚        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚
β”‚                                                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                     β”‚
                     β–Ό
          β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
          β”‚ VectorSearchEngine      β”‚
          β”‚ (Advanced search)       β”‚
          β”‚ - Product search        β”‚
          β”‚ - Documentation search  β”‚
          β”‚ - Semantic search       β”‚
          β”‚ - Recommendations       β”‚
          β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Usage Patterns

Pattern 1: Question-Answering

from src.core import EcoMCPKnowledgeBase

kb = EcoMCPKnowledgeBase()
kb.initialize("./docs")

# Query with automatic response synthesis
answer = kb.query("How do I deploy this?")
print(answer)  # Returns full answer with context

Pattern 2: Conversational

kb = EcoMCPKnowledgeBase()
kb.initialize("./docs")

# Multi-turn conversation
messages = [
    {"role": "user", "content": "What are the main features?"}
]
response = kb.chat(messages)
print(response)

# Continue conversation
messages.append({"role": "assistant", "content": response})
messages.append({"role": "user", "content": "Tell me more about feature X"})
response = kb.chat(messages)
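
When managing the `messages` list by hand as above, it helps to cap its length so long sessions don't outgrow the model's context window. A small helper (hypothetical; not part of `EcoMCPKnowledgeBase`):

```python
def add_turn(messages, user_text, assistant_text, max_messages=20):
    """Append one user/assistant exchange, dropping the oldest
    messages once the history exceeds max_messages entries."""
    messages = messages + [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": assistant_text},
    ]
    return messages[-max_messages:]
```

Each call returns a new list, so earlier snapshots of the history stay intact.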

Pattern 3: Semantic Search

kb = EcoMCPKnowledgeBase()
kb.initialize("./docs")

# Get search results with scores
results = kb.search("setup guide", top_k=5)
for result in results:
    print(f"Score: {result.score:.2f}")
    print(f"Content: {result.content[:200]}")

Pattern 4: Product Recommendations

kb = EcoMCPKnowledgeBase()
products = [...]
kb.add_products(products)

# Get recommendations with confidence scores
recs = kb.get_recommendations("laptop under $1000", limit=5)
for rec in recs:
    print(f"Confidence: {rec['confidence']:.2f}")
    print(f"Product: {rec['content']}")

Configuration Best Practices

from src.core import IndexConfig, EcoMCPKnowledgeBase

# Development
dev_config = IndexConfig(
    embedding_model="text-embedding-3-small",
    llm_model="gpt-3.5-turbo",
    chunk_size=512,
    use_pinecone=False,
)

# Production
prod_config = IndexConfig(
    embedding_model="text-embedding-3-large",
    llm_model="gpt-5",
    chunk_size=1024,
    use_pinecone=True,
    pinecone_index_name="ecomcp-prod",
)

kb = EcoMCPKnowledgeBase(config=prod_config)

Response Synthesis Modes

Compact (Recommended for speed)

  • Single LLM call
  • Combines all retrieved context
  • Returns concise answer
  • Best for: Direct factual questions
query_engine = index.as_query_engine(response_mode="compact")

Tree Summarize

  • Hierarchical summarization
  • Better for complex topics
  • Multiple LLM calls
  • Best for: Complex multi-step answers
query_engine = index.as_query_engine(response_mode="tree_summarize")

Refine

  • Iteratively refines answer
  • Processes results one by one
  • Best for: Detailed, nuanced answers
  • Most token usage
query_engine = index.as_query_engine(response_mode="refine")

Integration with Server

MCP Server Handler

from typing import Dict, List

from src.core import initialize_knowledge_base, get_knowledge_base

# Startup
@app.on_event("startup")
def startup():
    initialize_knowledge_base("./docs")

# Query handler
@mcp.tool()
def search(query: str) -> str:
    kb = get_knowledge_base()
    results = kb.search(query, top_k=5)
    return "\n".join([r.content for r in results])

# Chat handler
@mcp.tool()
def chat(messages: List[Dict[str, str]]) -> str:
    kb = get_knowledge_base()
    return kb.chat(messages)

API Endpoint

from typing import Dict, List

from fastapi import FastAPI
from src.core import initialize_knowledge_base, get_knowledge_base

app = FastAPI()

@app.on_event("startup")
async def startup():
    initialize_knowledge_base("./docs")

@app.post("/search")
async def search(query: str, top_k: int = 5):
    kb = get_knowledge_base()
    results = kb.search(query, top_k=top_k)
    return [r.to_dict() for r in results]

@app.post("/query")
async def query(question: str):
    kb = get_knowledge_base()
    answer = kb.query(question)
    return {"answer": answer}

@app.post("/chat")
async def chat(messages: List[Dict[str, str]]):
    kb = get_knowledge_base()
    response = kb.chat(messages)
    return {"response": response}

Metadata Extraction

The ingestion pipeline automatically extracts:

  • Titles: Section titles and document headers
  • Keywords: Key terms and concepts
# Metadata available in search results
results = kb.search("topic")
for result in results:
    print(result.metadata)
    # {
    #   "source": "docs/guide.md",
    #   "title": "Getting Started Guide",
    #   "keywords": ["setup", "installation", "requirements"],
    #   "type": "markdown"
    # }
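
Extracted metadata can also drive filtered retrieval, restricting results to nodes whose metadata matches a condition. A sketch using LlamaIndex's metadata filters (the `type` key assumes the metadata layout shown above):

```python
from llama_index.core.vector_stores import ExactMatchFilter, MetadataFilters

# Only retrieve nodes whose metadata has type == "markdown"
filters = MetadataFilters(
    filters=[ExactMatchFilter(key="type", value="markdown")]
)

query_engine = index.as_query_engine(filters=filters, similarity_top_k=5)
response = query_engine.query("setup guide")
```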

Performance Tuning

For Speed

config = IndexConfig(
    embedding_model="text-embedding-3-small",
    llm_model="gpt-3.5-turbo",
    chunk_size=1024,
    similarity_top_k=3,  # Fewer results
)
kb = EcoMCPKnowledgeBase(config=config)
query_engine = kb.kb.index.as_query_engine(response_mode="compact")

For Quality

config = IndexConfig(
    embedding_model="text-embedding-3-large",
    llm_model="gpt-5",
    chunk_size=512,  # Smaller chunks
    similarity_top_k=10,  # More results
)
kb = EcoMCPKnowledgeBase(config=config)
query_engine = kb.kb.index.as_query_engine(response_mode="refine")

For Production Scalability

config = IndexConfig(
    embedding_model="text-embedding-3-large",
    llm_model="gpt-5",
    chunk_size=1024,
    use_pinecone=True,
    pinecone_index_name="ecomcp-prod",
)
kb = EcoMCPKnowledgeBase(config=config)
# Pinecone automatically scales to millions of documents
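
Wiring the Pinecone backend into a `StorageContext` looks roughly like this, assuming the `pinecone` and `llama-index-vector-stores-pinecone` packages are installed and an index named `ecomcp-prod` already exists:

```python
import os

from pinecone import Pinecone
from llama_index.core import StorageContext
from llama_index.vector_stores.pinecone import PineconeVectorStore

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Wrap the Pinecone index so LlamaIndex can read and write vectors there
vector_store = PineconeVectorStore(pinecone_index=pc.Index("ecomcp-prod"))
storage_context = StorageContext.from_defaults(vector_store=vector_store)
```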

Error Handling

try:
    kb = EcoMCPKnowledgeBase()
    kb.initialize("./docs")
except FileNotFoundError:
    logger.error("Documentation directory not found")
except Exception as e:
    logger.error(f"Failed to initialize knowledge base: {e}")

try:
    response = kb.query("question")
except Exception as e:
    logger.error(f"Query failed: {e}")
    return "Unable to process query"
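
For transient failures such as rate limits or network errors, the bare try/except above can be extended with retries and exponential backoff. A small wrapper (hypothetical helper; not part of the package):

```python
import logging
import time

logger = logging.getLogger(__name__)


def query_with_retry(query_fn, question, max_attempts=3, base_delay=1.0):
    """Call query_fn(question), retrying with exponential backoff.

    Returns the answer, or a fallback string once all attempts fail.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return query_fn(question)
        except Exception as e:
            logger.warning("Attempt %d/%d failed: %s", attempt, max_attempts, e)
            if attempt == max_attempts:
                return "Unable to process query"
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Usage: `answer = query_with_retry(kb.query, "How do I deploy this?")`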

Updates from Refining

βœ… Added IngestionPipeline for metadata extraction
βœ… Enhanced StorageContext management
βœ… Added ChatEngine for multi-turn conversation
βœ… Improved Settings configuration
βœ… Better response synthesis options
βœ… Enhanced error handling
βœ… More detailed documentation