---
title: AI with Pinecone Integrated Inference RAG
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---

# AI Assistant with Pinecone Integrated Inference RAG

This Gradio application creates an AI assistant that uses Pinecone's integrated inference capabilities for embedding and reranking, combined with vector storage for persistent memory.

## 🚀 Features

- **Pinecone Integrated Inference**: Uses Pinecone's hosted embedding and reranking models
- **Advanced Memory System**: Stores conversation experiences as vectors with automatic embedding
- **Smart Reranking**: Improves search relevance using Pinecone's reranking models
- **OpenRouter Integration**: Supports multiple AI models through the OpenRouter API
- **Learning Over Time**: The AI gains experience and provides more contextual responses
- **Real-time Context Display**: Shows retrieved and reranked experiences

## 🧠 AI Models Used

### Embedding Model

- **multilingual-e5-large**: 1024-dimensional multilingual embeddings
  - Excellent for semantic search across languages
  - Optimized for both passage and query embeddings
  - Automatically handles text-to-vector conversion

### Reranking Model

- **pinecone-rerank-v0**: Pinecone's state-of-the-art reranking model
  - Up to 60% improvement in search accuracy
  - Reorders results by relevance before sending them to the LLM
  - Reduces token waste and improves response quality

### Alternative Models Available

- **cohere-rerank-v3.5**: Cohere's leading reranking model
- **pinecone-sparse-english-v0**: Sparse embeddings for keyword-based search
- **bge-reranker-v2-m3**: Open-source multilingual reranking

## 🛠 Setup

### Required Environment Variables

1. **PINECONE_API_KEY**: Your Pinecone API key (hardcoded in this demo)
   - Get it from: https://www.pinecone.io/
2. **OPENROUTER_API_KEY**: Your OpenRouter API key
   - Get it from: https://openrouter.ai/

### Optional Environment Variables

3. **PINECONE_EMBEDDING_MODEL**: Embedding model name (default: "multilingual-e5-large")
4. **PINECONE_RERANK_MODEL**: Reranking model name (default: "pinecone-rerank-v0")
5. **MODEL_NAME**: OpenRouter model name (default: "anthropic/claude-3-haiku")

## 🔄 How It Works

### 1. Integrated Inference Pipeline

1. **User Input** → Pinecone automatically converts the text to embeddings
2. **Vector Search** → Retrieves relevant past conversations from the vector database
3. **Reranking** → Pinecone reranks the results by relevance to the query
4. **Context Building** → Formats the reranked experiences for the AI
5. **AI Response** → OpenRouter generates a response using the retrieved context
6. **Memory Storage** → The new conversation is automatically embedded and stored

### 2. Advanced Features

- **Automatic Embedding**: No manual embedding generation required
- **Smart Reranking**: Improves the relevance of retrieved memories
- **Multilingual Support**: Works across multiple languages
- **Serverless Architecture**: Automatically scales based on usage
- **Real-time Learning**: Each conversation improves future responses

## 📊 Benefits of Pinecone Integrated Inference

### Traditional Approach vs Pinecone Integrated

- ❌ **Traditional**: Manage a separate embedding service, vector DB, and reranker
- ✅ **Pinecone Integrated**: A single API for embedding, storage, search, and reranking

### Performance Improvements

- 🚀 **Up to 60% better search accuracy** with integrated reranking
- ⚡ **Lower latency** with co-located inference and storage
- 💰 **Cost efficient** with serverless scaling
- 🔒 **More secure** with private networking (no cross-service calls)

## 🎯 Use Cases Perfect for This System

1. **Customer Support**: AI that remembers previous interactions
2. **Personal Assistant**: Learning user preferences over time
3. **Knowledge Management**: Building institutional memory
4. **Content Recommendation**: Improving suggestions based on history
5. **Research Assistant**: Connecting related information across conversations

## 🔧 Technical Architecture

```mermaid
graph TD
    A[User Input] --> B[Pinecone Inference API]
    B --> C[multilingual-e5-large Embedding]
    C --> D[Vector Search in Pinecone]
    D --> E[pinecone-rerank-v0 Reranking]
    E --> F[OpenRouter LLM]
    F --> G[AI Response]
    G --> H[Auto-embed & Store]
    H --> D
```

## 🚀 Getting Started

1. **Clone/Deploy** this HuggingFace Space
2. **Set Environment Variables** in the Space settings
3. **Start Chatting**: the system will auto-create everything it needs!

The AI will automatically:

- Create a new Pinecone index with integrated inference
- Generate embeddings for all conversations
- Build a memory of interactions over time
- Provide increasingly contextual responses

## 📈 Monitoring & Analytics

The interface provides real-time monitoring of:

- Connection status for Pinecone and OpenRouter
- The number of stored experiences
- Embedding and reranking model information
- The retrieved context for each response

## 🔐 Privacy & Security

- Conversations are stored in your own Pinecone database
- Integrated inference runs on Pinecone's secure infrastructure
- No cross-network communication between services
- Full control over your data and models

## 📚 Learn More

- [Pinecone Integrated Inference](https://docs.pinecone.io/guides/inference/understanding-inference)
- [OpenRouter API](https://openrouter.ai/docs)
- [RAG Best Practices](https://www.pinecone.io/learn/retrieval-augmented-generation/)

## 🏷 License

MIT License. Feel free to modify and use it in your own projects!

---

*Powered by Pinecone's state-of-the-art integrated inference and vector database technology* 🚀
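The retrieval side of the pipeline described above (auto-embedding, vector search, reranking) can be sketched with the Pinecone Python SDK. This is a minimal, untested sketch assuming the current `pinecone` package (`pip install pinecone`): the index name `ai-memories`, the `memories` namespace, the `text` field mapping, and all helper names are illustrative, not the app's actual identifiers.

```python
import os


def build_context(memories):
    # Pure helper: number the reranked memory texts for the LLM prompt.
    return "\n".join(f"{i}. {m}" for i, m in enumerate(memories, 1))


def get_memory_index(index_name="ai-memories"):
    # Imported lazily so this sketch stays optional without the SDK installed.
    from pinecone import Pinecone

    pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
    if not pc.has_index(index_name):
        # Integrated inference: Pinecone embeds the mapped "text" field itself,
        # so no embedding model runs on our side.
        pc.create_index_for_model(
            name=index_name,
            cloud="aws",
            region="us-east-1",
            embed={
                "model": os.environ.get(
                    "PINECONE_EMBEDDING_MODEL", "multilingual-e5-large"
                ),
                "field_map": {"text": "text"},
            },
        )
    return pc.Index(index_name)


def retrieve_memories(index, query, top_k=10, top_n=3):
    # Search by raw text: Pinecone embeds the query, retrieves top_k hits,
    # then reranks them and returns the best top_n.
    result = index.search(
        namespace="memories",
        query={"inputs": {"text": query}, "top_k": top_k},
        rerank={
            "model": os.environ.get("PINECONE_RERANK_MODEL", "pinecone-rerank-v0"),
            "top_n": top_n,
            "rank_fields": ["text"],
        },
    )
    # Response objects are dict-like in the Pinecone SDK.
    return [hit["fields"]["text"] for hit in result["result"]["hits"]]
```

`build_context` is the pure "Context Building" step; the two other helpers wrap the Pinecone calls and need a valid `PINECONE_API_KEY` to run.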
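OpenRouter exposes an OpenAI-compatible chat completions endpoint, so the "AI Response" step can be sketched with only the Python standard library. The prompt wording and helper names below are assumptions for illustration, not the app's actual prompts.

```python
import json
import os
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_messages(context, user_input):
    # Pure helper: fold retrieved memories into the system prompt.
    system = "You are a helpful assistant with long-term memory."
    if context:
        system += "\n\nRelevant past conversations:\n" + context
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_input},
    ]


def chat(context, user_input):
    # POST an OpenAI-style chat completion request to OpenRouter.
    payload = {
        "model": os.environ.get("MODEL_NAME", "anthropic/claude-3-haiku"),
        "messages": build_messages(context, user_input),
    }
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Only `chat` touches the network; `build_messages` can be reused or tested on its own.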
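The "Memory Storage" step relies on Pinecone's server-side embedding: each turn is upserted as plain text and vectorized automatically. A sketch, assuming the field mapped at index creation is called `text`; `make_memory_record` and the `memories` namespace are illustrative.

```python
import time
import uuid


def make_memory_record(user_input, ai_response):
    # Pure helper: shape one conversation turn as a Pinecone record.
    # With integrated inference, the "text" field is embedded server-side.
    return {
        "_id": str(uuid.uuid4()),
        "text": f"User: {user_input}\nAssistant: {ai_response}",
        "timestamp": int(time.time()),
    }


def store_memory(index, user_input, ai_response):
    # index is a Pinecone index created with an embedding model attached;
    # upsert_records triggers automatic embedding of the mapped field.
    index.upsert_records(
        "memories", [make_memory_record(user_input, ai_response)]
    )
```

Extra fields such as `timestamp` are stored alongside the text and come back in search results, which is how the "learning over time" context display can show when each memory was created.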