---
title: AI with Pinecone Integrated Inference RAG
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---

# AI Assistant with Pinecone Integrated Inference RAG

This Gradio application creates an AI assistant that uses Pinecone's integrated inference capabilities for embedding and reranking, combined with vector storage for persistent memory.

## 🚀 Features

- **Pinecone Integrated Inference**: Uses Pinecone's hosted embedding and reranking models
- **Advanced Memory System**: Stores conversation experiences as vectors with automatic embedding
- **Smart Reranking**: Improves search relevance using Pinecone's reranking models
- **OpenRouter Integration**: Supports multiple AI models through the OpenRouter API
- **Learning Over Time**: The AI gains experience and provides more contextual responses
- **Real-time Context Display**: Shows retrieved and reranked experiences

## 🧠 AI Models Used

### Embedding Model

- **multilingual-e5-large**: 1024-dimensional multilingual embeddings
  - Excellent for semantic search across languages
  - Optimized for both passage and query embeddings
  - Automatically handles text-to-vector conversion

### Reranking Model

- **pinecone-rerank-v0**: Pinecone's state-of-the-art reranking model
  - Up to 60% improvement in search accuracy
  - Reorders results by relevance before sending them to the LLM
  - Reduces token waste and improves response quality

### Alternative Models Available

- **cohere-rerank-v3.5**: Cohere's leading reranking model
- **pinecone-sparse-english-v0**: Sparse embeddings for keyword-based search
- **bge-reranker-v2-m3**: Open-source multilingual reranking

## 🛠 Setup

### Required Environment Variables

1. **PINECONE_API_KEY**: Your Pinecone API key (hardcoded in this demo)
   - Get it from: https://www.pinecone.io/
2. **OPENROUTER_API_KEY**: Your OpenRouter API key
   - Get it from: https://openrouter.ai/

### Optional Environment Variables

3. **PINECONE_EMBEDDING_MODEL**: Embedding model name (default: "multilingual-e5-large")
4. **PINECONE_RERANK_MODEL**: Reranking model name (default: "pinecone-rerank-v0")
5. **MODEL_NAME**: OpenRouter model name (default: "anthropic/claude-3-haiku")

## 🔄 How It Works

### 1. Integrated Inference Pipeline

1. **User Input** → Pinecone automatically converts the text to embeddings
2. **Vector Search** → Retrieves relevant past conversations from the vector database
3. **Reranking** → Pinecone reranks the results by relevance to the query
4. **Context Building** → Formats the reranked experiences for the AI
5. **AI Response** → OpenRouter generates a response using the retrieved context
6. **Memory Storage** → The new conversation is automatically embedded and stored

### 2. Advanced Features

- **Automatic Embedding**: No manual embedding generation required
- **Smart Reranking**: Improves the relevance of retrieved memories
- **Multilingual Support**: Works across multiple languages
- **Serverless Architecture**: Automatically scales based on usage
- **Real-time Learning**: Each conversation improves future responses

## 📊 Benefits of Pinecone Integrated Inference

### Traditional Approach vs Pinecone Integrated

- ❌ **Traditional**: Manage a separate embedding service, vector DB, and reranker
- ✅ **Pinecone Integrated**: A single API for embedding, storage, search, and reranking

### Performance Improvements

- 🚀 **Up to 60% better search accuracy** with integrated reranking
- ⚡ **Lower latency** with co-located inference and storage
- 💰 **Cost efficient** with serverless scaling
- 🔒 **More secure** with private networking (no cross-service calls)

## 🎯 Use Cases Perfect for This System

1. **Customer Support**: AI that remembers previous interactions
2. **Personal Assistant**: Learning user preferences over time
3. **Knowledge Management**: Building institutional memory
4. **Content Recommendation**: Improving suggestions based on history
5. **Research Assistant**: Connecting related information across conversations

## 🔧 Technical Architecture

```mermaid
graph TD
    A[User Input] --> B[Pinecone Inference API]
    B --> C[multilingual-e5-large Embedding]
    C --> D[Vector Search in Pinecone]
    D --> E[pinecone-rerank-v0 Reranking]
    E --> F[OpenRouter LLM]
    F --> G[AI Response]
    G --> H[Auto-embed & Store]
    H --> D
```

## 🚀 Getting Started

1. **Clone/Deploy** this HuggingFace Space
2. **Set Environment Variables** in the Space settings
3. **Start Chatting**: the system will auto-create everything it needs!

The AI will automatically:

- Create a new Pinecone index with integrated inference
- Generate embeddings for all conversations
- Build a memory of interactions over time
- Provide increasingly contextual responses

## 📈 Monitoring & Analytics

The interface provides real-time monitoring of:

- Connection status for Pinecone and OpenRouter
- The number of stored experiences
- Embedding and reranking model information
- The retrieved context for each response

## 🔐 Privacy & Security

- Conversations are stored in your own Pinecone database
- Integrated inference runs on Pinecone's secure infrastructure
- No cross-network communication between services
- Full control over your data and models

## 📚 Learn More

- [Pinecone Integrated Inference](https://docs.pinecone.io/guides/inference/understanding-inference)
- [OpenRouter API](https://openrouter.ai/docs)
- [RAG Best Practices](https://www.pinecone.io/learn/retrieval-augmented-generation/)

## 🏷 License

MIT License. Feel free to modify and use it in your own projects!

---

*Powered by Pinecone's state-of-the-art integrated inference and vector database technology* 🚀
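The retrieval side of the pipeline described above (auto-embedding, vector search, reranking) can be sketched with the Pinecone Python SDK. This is a minimal, untested sketch assuming the current `pinecone` package (`pip install pinecone`): the index name `ai-memories`, the `memories` namespace, the `text` field mapping, and all helper names are illustrative, not the app's actual identifiers.

```python
import os


def build_context(memories):
    # Pure helper: number the reranked memory texts for the LLM prompt.
    return "\n".join(f"{i}. {m}" for i, m in enumerate(memories, 1))


def get_memory_index(index_name="ai-memories"):
    # Imported lazily so this sketch stays optional without the SDK installed.
    from pinecone import Pinecone

    pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
    if not pc.has_index(index_name):
        # Integrated inference: Pinecone embeds the mapped "text" field itself,
        # so no embedding model runs on our side.
        pc.create_index_for_model(
            name=index_name,
            cloud="aws",
            region="us-east-1",
            embed={
                "model": os.environ.get(
                    "PINECONE_EMBEDDING_MODEL", "multilingual-e5-large"
                ),
                "field_map": {"text": "text"},
            },
        )
    return pc.Index(index_name)


def retrieve_memories(index, query, top_k=10, top_n=3):
    # Search by raw text: Pinecone embeds the query, retrieves top_k hits,
    # then reranks them and returns the best top_n.
    result = index.search(
        namespace="memories",
        query={"inputs": {"text": query}, "top_k": top_k},
        rerank={
            "model": os.environ.get("PINECONE_RERANK_MODEL", "pinecone-rerank-v0"),
            "top_n": top_n,
            "rank_fields": ["text"],
        },
    )
    # Response objects are dict-like in the Pinecone SDK.
    return [hit["fields"]["text"] for hit in result["result"]["hits"]]
```

`build_context` is the pure "Context Building" step; the two other helpers wrap the Pinecone calls and need a valid `PINECONE_API_KEY` to run.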
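OpenRouter exposes an OpenAI-compatible chat completions endpoint, so the "AI Response" step can be sketched with only the Python standard library. The prompt wording and helper names below are assumptions for illustration, not the app's actual prompts.

```python
import json
import os
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_messages(context, user_input):
    # Pure helper: fold retrieved memories into the system prompt.
    system = "You are a helpful assistant with long-term memory."
    if context:
        system += "\n\nRelevant past conversations:\n" + context
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_input},
    ]


def chat(context, user_input):
    # POST an OpenAI-style chat completion request to OpenRouter.
    payload = {
        "model": os.environ.get("MODEL_NAME", "anthropic/claude-3-haiku"),
        "messages": build_messages(context, user_input),
    }
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Only `chat` touches the network; `build_messages` can be reused or tested on its own.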
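The "Memory Storage" step relies on Pinecone's server-side embedding: each turn is upserted as plain text and vectorized automatically. A sketch, assuming the field mapped at index creation is called `text`; `make_memory_record` and the `memories` namespace are illustrative.

```python
import time
import uuid


def make_memory_record(user_input, ai_response):
    # Pure helper: shape one conversation turn as a Pinecone record.
    # With integrated inference, the "text" field is embedded server-side.
    return {
        "_id": str(uuid.uuid4()),
        "text": f"User: {user_input}\nAssistant: {ai_response}",
        "timestamp": int(time.time()),
    }


def store_memory(index, user_input, ai_response):
    # index is a Pinecone index created with an embedding model attached;
    # upsert_records triggers automatic embedding of the mapped field.
    index.upsert_records(
        "memories", [make_memory_record(user_input, ai_response)]
    )
```

Extra fields such as `timestamp` are stored alongside the text and come back in search results, which is how the "learning over time" context display can show when each memory was created.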