Pinecone Vector Database Setup Guide
This guide will help you set up Pinecone cloud storage for your vector database.
Prerequisites
- A Pinecone account (sign up at https://app.pinecone.io/)
- A Pinecone API key
Setup Steps
1. Get Your Pinecone API Key
- Go to https://app.pinecone.io/
- Sign up or log in
- Navigate to your API keys section
- Copy your API key
2. Set the API Key
You have two options:
Option A: Environment Variable (Recommended)
Set the environment variable before running your code:
Windows (PowerShell):
$env:PINECONE_API_KEY="your-api-key-here"
Windows (Command Prompt):
set PINECONE_API_KEY=your-api-key-here
Linux/Mac:
export PINECONE_API_KEY="your-api-key-here"
Option B: Direct Configuration
Edit module_a/config.py and set:
PINECONE_API_KEY = "your-api-key-here"
Note: Option A is recommended for security reasons.
3. Install Dependencies
Make sure you have the Pinecone client installed:
pip install pinecone-client[grpc]>=3.0.0
Or install all requirements:
pip install -r module_a/requirements.txt
4. Build Your Vector Database
Run the build script:
python -m module_a.build_vector_db
The script will automatically detect if Pinecone is configured and use it instead of ChromaDB.
5. Verify Setup
The build script will:
- Create a Pinecone index if it doesn't exist
- Upload your document chunks to Pinecone
- Store full text in a local JSON file (to avoid Pinecone metadata limits)
How It Works
Text Storage
Pinecone has a 40KB limit on metadata per vector. To work around this:
- Full text is stored in a local JSON file (
data/module-A/pinecone_text_storage.json) - Only a text preview is stored in Pinecone metadata
- The system automatically loads and saves this storage file
Index Configuration
- Index Name:
nepal-legal-docs(configurable inconfig.py) - Dimension: 384 (matches the embedding model)
- Metric: Cosine similarity
- Cloud: AWS
- Region: us-east-1 (configurable in
pinecone_vector_db.py)
Troubleshooting
"PINECONE_API_KEY must be set"
- Make sure you've set the API key (see Step 2)
- Check that the environment variable is set in the same terminal session
"Index creation failed"
- Check your Pinecone dashboard for quota limits
- Verify your API key is valid
- Try a different region if us-east-1 is unavailable
"Failed to connect to index"
- Wait a few minutes after index creation (it takes time to initialize)
- Check your network connection
- Verify the index exists in your Pinecone dashboard
Text not found in queries
- Make sure
pinecone_text_storage.jsonexists and contains your data - The file is automatically created when you build the database
- If you delete it, you'll need to rebuild the database
Switching Between ChromaDB and Pinecone
The system automatically uses Pinecone if PINECONE_API_KEY is set, otherwise it falls back to ChromaDB.
Important: The application will automatically fall back to ChromaDB if:
- Pinecone API key is not set
- Pinecone initialization fails
- Pinecone client is not installed
This means your application will work even without Pinecone configured - it will just use the local ChromaDB instead.
To switch:
- Use Pinecone: Set
PINECONE_API_KEYenvironment variable - Use ChromaDB: Unset or remove
PINECONE_API_KEY
The RAG chain (LegalRAGChain) automatically detects which vector database to use at initialization time.
Cost Considerations
Pinecone offers a free tier with:
- 1 index
- 100K vectors
- 1M queries/month
Check https://www.pinecone.io/pricing/ for current pricing.
Support
For Pinecone-specific issues, check:
- Pinecone Documentation: https://docs.pinecone.io/
- Pinecone Console: https://app.pinecone.io/