---
title: AI with Pinecone Integrated Inference RAG
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
---

# AI Assistant with Pinecone Integrated Inference RAG

This Gradio application creates an AI assistant that uses Pinecone's integrated inference capabilities for embeddings and reranking, combined with vector storage for persistent memory.

## Features

- **Pinecone Integrated Inference**: Uses Pinecone's hosted embedding and reranking models
- **Advanced Memory System**: Stores conversation experiences as vectors with automatic embedding
- **Smart Reranking**: Improves search relevance using Pinecone's reranking models
- **OpenRouter Integration**: Supports multiple AI models through the OpenRouter API
- **Learning Over Time**: The AI accumulates experience and provides more contextual responses
- **Real-time Context Display**: Shows the retrieved and reranked experiences

## AI Models Used

### Embedding Model
- **multilingual-e5-large**: 1024-dimensional multilingual embeddings
- Excellent for semantic search across languages
- Optimized for both passage and query embeddings
- Automatically handles text-to-vector conversion
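
As a rough sketch of what this looks like in code (assuming the official `pinecone` Python SDK and a client object created elsewhere; the helper name is ours, not the app's):

```python
def embed_texts(pc, texts, input_type="passage"):
    """Embed texts with Pinecone's hosted multilingual-e5-large model.

    `pc` is a pinecone.Pinecone client. Use input_type="query" when
    embedding a search query and "passage" when embedding stored text.
    """
    result = pc.inference.embed(
        model="multilingual-e5-large",
        inputs=texts,
        parameters={"input_type": input_type, "truncate": "END"},
    )
    # Each entry in result.data carries a 1024-dimensional vector.
    return [item["values"] for item in result.data]
```

Note the `input_type` parameter: e5-style models embed queries and passages differently, which is why the model is described as optimized for both.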

### Reranking Model
- **pinecone-rerank-v0**: Pinecone's state-of-the-art reranking model
- Pinecone reports up to 60% improvement in search accuracy
- Reorders results by relevance before they are sent to the LLM
- Reduces token waste and improves response quality
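
A hedged sketch of the reranking step (again assuming the official `pinecone` SDK; the helper name and tuple return shape are our own choices):

```python
def rerank_memories(pc, query, memories, top_n=3):
    """Reorder retrieved memory texts by relevance to the query using
    Pinecone's hosted pinecone-rerank-v0 model."""
    result = pc.inference.rerank(
        model="pinecone-rerank-v0",
        query=query,
        documents=memories,      # plain strings; dicts with a text field also work
        top_n=top_n,
        return_documents=False,
    )
    # Each row in result.data carries the original index and a relevance score.
    return [(memories[row["index"]], row["score"]) for row in result.data]
```

Only the `top_n` best memories survive this step, which is where the token savings mentioned above come from.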

### Alternative Models Available
- **cohere-rerank-v3.5**: Cohere's leading reranking model
- **pinecone-sparse-english-v0**: Sparse embeddings for keyword-based search
- **bge-reranker-v2-m3**: Open-source multilingual reranking

## Setup

### Required Environment Variables

1. **PINECONE_API_KEY**: Your Pinecone API key (hardcoded in this demo)
   - Get it from: https://www.pinecone.io/

2. **OPENROUTER_API_KEY**: Your OpenRouter API key
   - Get it from: https://openrouter.ai/

### Optional Environment Variables

3. **PINECONE_EMBEDDING_MODEL**: Embedding model name (default: "multilingual-e5-large")
4. **PINECONE_RERANK_MODEL**: Reranking model name (default: "pinecone-rerank-v0")
5. **MODEL_NAME**: OpenRouter model name (default: "anthropic/claude-3-haiku")
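
In code, the optional variables can be read with fallbacks to the documented defaults; a minimal sketch (the function name and dict shape are illustrative):

```python
import os


def load_config():
    """Read model configuration from the environment, falling back to
    the defaults documented above."""
    return {
        "embedding_model": os.getenv("PINECONE_EMBEDDING_MODEL", "multilingual-e5-large"),
        "rerank_model": os.getenv("PINECONE_RERANK_MODEL", "pinecone-rerank-v0"),
        "llm_model": os.getenv("MODEL_NAME", "anthropic/claude-3-haiku"),
    }
```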

## How It Works

### 1. Integrated Inference Pipeline
1. **User Input** → Pinecone automatically converts text to embeddings
2. **Vector Search** → Retrieves relevant past conversations from the vector database
3. **Reranking** → Pinecone reranks results by relevance to the query
4. **Context Building** → Formats reranked experiences for the AI
5. **AI Response** → OpenRouter generates a response with the retrieved context
6. **Memory Storage** → The new conversation is automatically embedded and stored
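
The steps above can be sketched end to end. Everything here is illustrative: the `search_memories`, `rerank`, and `call_llm` helpers and the prompt format are assumptions, not the app's actual code.

```python
def answer_with_memory(query, search_memories, rerank, call_llm, top_n=3):
    """Illustrative RAG loop: retrieve -> rerank -> build context -> generate.

    search_memories(query)      -> list[str] of stored conversation snippets
    rerank(query, docs, top_n)  -> list[(doc, score)], best first
    call_llm(prompt)            -> str
    """
    candidates = search_memories(query)                   # steps 1-2: embed + search
    ranked = rerank(query, candidates, top_n)             # step 3: rerank
    context = "\n".join(f"- {doc}" for doc, _ in ranked)  # step 4: build context
    prompt = (
        "Relevant past conversations:\n"
        f"{context}\n\n"
        f"User: {query}\nAssistant:"
    )
    return call_llm(prompt)                               # step 5: generate
```

Step 6 (storing the new turn) would run after the response comes back, outside this function.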

### 2. Advanced Features
- **Automatic Embedding**: No manual embedding generation required
- **Smart Reranking**: Improves relevance of retrieved memories
- **Multilingual Support**: Works across multiple languages
- **Serverless Architecture**: Automatically scales based on usage
- **Real-time Learning**: Each conversation improves future responses
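
With an index created for integrated inference, storing a turn can be as simple as upserting a record with a raw text field and letting Pinecone embed it server-side. A sketch (the record schema, field names, and namespace are assumptions):

```python
import time
import uuid


def store_turn(index, user_msg, ai_msg, namespace="memories"):
    """Store one conversation turn. With an integrated-inference index,
    Pinecone embeds the `text` field automatically on upsert."""
    record = {
        "_id": str(uuid.uuid4()),
        "text": f"User: {user_msg}\nAssistant: {ai_msg}",
        "timestamp": int(time.time()),  # extra fields are kept as metadata
    }
    index.upsert_records(namespace, [record])
    return record["_id"]
```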

## Benefits of Pinecone Integrated Inference

### Traditional Approach vs Pinecone Integrated
- ❌ **Traditional**: Manage a separate embedding service + vector DB + reranking service
- ✅ **Pinecone Integrated**: A single API for embedding, storage, search, and reranking

### Performance Improvements
- **Up to 60% better search accuracy** with integrated reranking (per Pinecone's published figures)
- **Lower latency** with co-located inference and storage
- **Cost efficient** with serverless scaling
- **More secure** with private networking (no cross-service calls)
| ## π― Use Cases Perfect for This System |
| |
| 1. **Customer Support**: AI that remembers previous interactions |
| 2. **Personal Assistant**: Learning user preferences over time |
| 3. **Knowledge Management**: Building institutional memory |
| 4. **Content Recommendation**: Improving suggestions based on history |
| 5. **Research Assistant**: Connecting related information across conversations |
| |
| ## π§ Technical Architecture |
| |
| ```mermaid |
| graph TD |
| A[User Input] --> B[Pinecone Inference API] |
| B --> C[multilingual-e5-large Embedding] |
| C --> D[Vector Search in Pinecone] |
| D --> E[pinecone-rerank-v0 Reranking] |
| E --> F[OpenRouter LLM] |
| F --> G[AI Response] |
| G --> H[Auto-embed & Store] |
| H --> D |
| ``` |
| |
| ## π Getting Started |
| |
| 1. **Clone/Deploy** this HuggingFace Space |
| 2. **Set Environment Variables** in Space settings |
| 3. **Start Chatting** - the system will auto-create everything needed! |
| |
| The AI will automatically: |
| - Create a new Pinecone index with integrated inference |
| - Generate embeddings for all conversations |
| - Build a memory of interactions over time |
| - Provide increasingly contextual responses |
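
Index auto-creation might look like the following sketch, which ties a serverless index to a hosted embedding model at creation time (the index name, cloud, and region are placeholders, not the app's actual settings):

```python
def ensure_index(pc, name="ai-memory"):
    """Create a serverless index wired to a hosted embedding model if it
    doesn't already exist, then return a handle to it."""
    if not pc.has_index(name):
        pc.create_index_for_model(
            name=name,
            cloud="aws",
            region="us-east-1",
            embed={
                "model": "multilingual-e5-large",
                "field_map": {"text": "text"},  # which record field gets embedded
            },
        )
    return pc.Index(name)
```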

## Monitoring & Analytics

The interface provides real-time monitoring of:
- Connection status to Pinecone and OpenRouter
- Number of stored experiences
- Embedding and reranking model information
- Retrieved context for each response
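
The experience count might be assembled from the index's stats endpoint; a sketch (the exact shape of the stats response and the namespace name are assumptions):

```python
def memory_stats(index, namespace="memories"):
    """Summarize how many experiences are stored, using index stats."""
    stats = index.describe_index_stats()
    ns = stats.get("namespaces", {}).get(namespace, {})
    return {
        "total_vectors": stats.get("total_vector_count", 0),
        "experiences": ns.get("vector_count", 0),
        "dimension": stats.get("dimension"),
    }
```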

## Privacy & Security

- Conversations stored in your personal Pinecone database
- Integrated inference runs on Pinecone's secure infrastructure
- No cross-network communication between services
- Full control over your data and models
| ## π Learn More |
| |
| - [Pinecone Integrated Inference](https://docs.pinecone.io/guides/inference/understanding-inference) |
| - [OpenRouter API](https://openrouter.ai/docs) |
| - [RAG Best Practices](https://www.pinecone.io/learn/retrieval-augmented-generation/) |
| |
| ## π· License |
| |
| MIT License - feel free to modify and use for your projects! |
| |
| --- |
| |
| *Powered by Pinecone's state-of-the-art integrated inference and vector database technology* π |
| |