---
title: AI with Pinecone Integrated Inference RAG
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---

# AI Assistant with Pinecone Integrated Inference RAG

This Gradio application creates an AI assistant that uses Pinecone's integrated inference capabilities for embeddings and reranking, combined with vector storage for persistent memory.

## Features

- **Pinecone Integrated Inference**: Uses Pinecone's hosted embedding and reranking models
- **Advanced Memory System**: Stores conversation experiences as vectors with automatic embedding
- **Smart Reranking**: Improves search relevance using Pinecone's reranking models
- **OpenRouter Integration**: Supports multiple AI models through the OpenRouter API
- **Learning Over Time**: The AI accumulates experience and provides more contextual responses
- **Real-time Context Display**: Shows the retrieved and reranked experiences

## AI Models Used

### Embedding Model
- **multilingual-e5-large**: 1024-dimensional multilingual embeddings
- Excellent for semantic search across languages
- Optimized for both passage and query embeddings
- Automatically handles text-to-vector conversion
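
As a rough sketch of what this looks like in code (assuming the official `pinecone` Python SDK and a client object created elsewhere; the helper name is ours, not the app's):

```python
def embed_texts(pc, texts, input_type="passage"):
    """Embed texts with Pinecone's hosted multilingual-e5-large model.

    `pc` is a pinecone.Pinecone client. Use input_type="query" when
    embedding a search query and "passage" when embedding stored text.
    """
    result = pc.inference.embed(
        model="multilingual-e5-large",
        inputs=texts,
        parameters={"input_type": input_type, "truncate": "END"},
    )
    # Each entry in result.data carries a 1024-dimensional vector.
    return [item["values"] for item in result.data]
```

Note the `input_type` parameter: e5-style models embed queries and passages differently, which is why the model is described as optimized for both.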

### Reranking Model
- **pinecone-rerank-v0**: Pinecone's state-of-the-art reranking model
- Pinecone reports up to 60% improvement in search accuracy
- Reorders results by relevance before they are sent to the LLM
- Reduces token waste and improves response quality
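
A hedged sketch of the reranking step (again assuming the official `pinecone` SDK; the helper name and tuple return shape are our own choices):

```python
def rerank_memories(pc, query, memories, top_n=3):
    """Reorder retrieved memory texts by relevance to the query using
    Pinecone's hosted pinecone-rerank-v0 model."""
    result = pc.inference.rerank(
        model="pinecone-rerank-v0",
        query=query,
        documents=memories,      # plain strings; dicts with a text field also work
        top_n=top_n,
        return_documents=False,
    )
    # Each row in result.data carries the original index and a relevance score.
    return [(memories[row["index"]], row["score"]) for row in result.data]
```

Only the `top_n` best memories survive this step, which is where the token savings mentioned above come from.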

### Alternative Models Available
- **cohere-rerank-v3.5**: Cohere's leading reranking model
- **pinecone-sparse-english-v0**: Sparse embeddings for keyword-based search
- **bge-reranker-v2-m3**: Open-source multilingual reranking

## Setup

### Required Environment Variables

1. **PINECONE_API_KEY**: Your Pinecone API key (hardcoded in this demo)
   - Get it from: https://www.pinecone.io/

2. **OPENROUTER_API_KEY**: Your OpenRouter API key
   - Get it from: https://openrouter.ai/

### Optional Environment Variables

3. **PINECONE_EMBEDDING_MODEL**: Embedding model name (default: "multilingual-e5-large")
4. **PINECONE_RERANK_MODEL**: Reranking model name (default: "pinecone-rerank-v0")
5. **MODEL_NAME**: OpenRouter model name (default: "anthropic/claude-3-haiku")
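
In code, the optional variables can be read with fallbacks to the documented defaults; a minimal sketch (the function name and dict shape are illustrative):

```python
import os


def load_config():
    """Read model configuration from the environment, falling back to
    the defaults documented above."""
    return {
        "embedding_model": os.getenv("PINECONE_EMBEDDING_MODEL", "multilingual-e5-large"),
        "rerank_model": os.getenv("PINECONE_RERANK_MODEL", "pinecone-rerank-v0"),
        "llm_model": os.getenv("MODEL_NAME", "anthropic/claude-3-haiku"),
    }
```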

## How It Works

### 1. Integrated Inference Pipeline
1. **User Input** → Pinecone automatically converts text to embeddings
2. **Vector Search** → Retrieves relevant past conversations from the vector database
3. **Reranking** → Pinecone reranks results by relevance to the query
4. **Context Building** → Formats reranked experiences for the AI
5. **AI Response** → OpenRouter generates a response with the retrieved context
6. **Memory Storage** → The new conversation is automatically embedded and stored
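
The steps above can be sketched end to end. Everything here is illustrative: the `search_memories`, `rerank`, and `call_llm` helpers and the prompt format are assumptions, not the app's actual code.

```python
def answer_with_memory(query, search_memories, rerank, call_llm, top_n=3):
    """Illustrative RAG loop: retrieve -> rerank -> build context -> generate.

    search_memories(query)      -> list[str] of stored conversation snippets
    rerank(query, docs, top_n)  -> list[(doc, score)], best first
    call_llm(prompt)            -> str
    """
    candidates = search_memories(query)                   # steps 1-2: embed + search
    ranked = rerank(query, candidates, top_n)             # step 3: rerank
    context = "\n".join(f"- {doc}" for doc, _ in ranked)  # step 4: build context
    prompt = (
        "Relevant past conversations:\n"
        f"{context}\n\n"
        f"User: {query}\nAssistant:"
    )
    return call_llm(prompt)                               # step 5: generate
```

Step 6 (storing the new turn) would run after the response comes back, outside this function.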

### 2. Advanced Features
- **Automatic Embedding**: No manual embedding generation required
- **Smart Reranking**: Improves relevance of retrieved memories
- **Multilingual Support**: Works across multiple languages
- **Serverless Architecture**: Automatically scales based on usage
- **Real-time Learning**: Each conversation improves future responses
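
With an index created for integrated inference, storing a turn can be as simple as upserting a record with a raw text field and letting Pinecone embed it server-side. A sketch (the record schema, field names, and namespace are assumptions):

```python
import time
import uuid


def store_turn(index, user_msg, ai_msg, namespace="memories"):
    """Store one conversation turn. With an integrated-inference index,
    Pinecone embeds the `text` field automatically on upsert."""
    record = {
        "_id": str(uuid.uuid4()),
        "text": f"User: {user_msg}\nAssistant: {ai_msg}",
        "timestamp": int(time.time()),  # extra fields are kept as metadata
    }
    index.upsert_records(namespace, [record])
    return record["_id"]
```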

## Benefits of Pinecone Integrated Inference

### Traditional Approach vs Pinecone Integrated
- ❌ **Traditional**: Manage a separate embedding service + vector DB + reranking service
- ✅ **Pinecone Integrated**: A single API for embedding, storage, search, and reranking

### Performance Improvements
- **Up to 60% better search accuracy** with integrated reranking (per Pinecone's published figures)
- **Lower latency** with co-located inference and storage
- **Cost efficient** with serverless scaling
- **More secure** with private networking (no cross-service calls)
| ## π― Use Cases Perfect for This System |
| |
| 1. **Customer Support**: AI that remembers previous interactions |
| 2. **Personal Assistant**: Learning user preferences over time |
| 3. **Knowledge Management**: Building institutional memory |
| 4. **Content Recommendation**: Improving suggestions based on history |
| 5. **Research Assistant**: Connecting related information across conversations |
| |
| ## π§ Technical Architecture |
| |
| ```mermaid |
| graph TD |
| A[User Input] --> B[Pinecone Inference API] |
| B --> C[multilingual-e5-large Embedding] |
| C --> D[Vector Search in Pinecone] |
| D --> E[pinecone-rerank-v0 Reranking] |
| E --> F[OpenRouter LLM] |
| F --> G[AI Response] |
| G --> H[Auto-embed & Store] |
| H --> D |
| ``` |
| |
| ## π Getting Started |
| |
| 1. **Clone/Deploy** this HuggingFace Space |
| 2. **Set Environment Variables** in Space settings |
| 3. **Start Chatting** - the system will auto-create everything needed! |
| |
| The AI will automatically: |
| - Create a new Pinecone index with integrated inference |
| - Generate embeddings for all conversations |
| - Build a memory of interactions over time |
| - Provide increasingly contextual responses |
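
Index auto-creation might look like the following sketch, which ties a serverless index to a hosted embedding model at creation time (the index name, cloud, and region are placeholders, not the app's actual settings):

```python
def ensure_index(pc, name="ai-memory"):
    """Create a serverless index wired to a hosted embedding model if it
    doesn't already exist, then return a handle to it."""
    if not pc.has_index(name):
        pc.create_index_for_model(
            name=name,
            cloud="aws",
            region="us-east-1",
            embed={
                "model": "multilingual-e5-large",
                "field_map": {"text": "text"},  # which record field gets embedded
            },
        )
    return pc.Index(name)
```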

## Monitoring & Analytics

The interface provides real-time monitoring of:
- Connection status to Pinecone and OpenRouter
- Number of stored experiences
- Embedding and reranking model information
- Retrieved context for each response
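
The experience count might be assembled from the index's stats endpoint; a sketch (the exact shape of the stats response and the namespace name are assumptions):

```python
def memory_stats(index, namespace="memories"):
    """Summarize how many experiences are stored, using index stats."""
    stats = index.describe_index_stats()
    ns = stats.get("namespaces", {}).get(namespace, {})
    return {
        "total_vectors": stats.get("total_vector_count", 0),
        "experiences": ns.get("vector_count", 0),
        "dimension": stats.get("dimension"),
    }
```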

## Privacy & Security

- Conversations stored in your personal Pinecone database
- Integrated inference runs on Pinecone's secure infrastructure
- No cross-network communication between services
- Full control over your data and models
| ## π Learn More |
| |
| - [Pinecone Integrated Inference](https://docs.pinecone.io/guides/inference/understanding-inference) |
| - [OpenRouter API](https://openrouter.ai/docs) |
| - [RAG Best Practices](https://www.pinecone.io/learn/retrieval-augmented-generation/) |
| |
| ## π· License |
| |
| MIT License - feel free to modify and use for your projects! |
| |
| --- |
| |
| *Powered by Pinecone's state-of-the-art integrated inference and vector database technology* π |
| |