# Pinecone Vector Database Setup Guide

This guide will help you set up Pinecone cloud storage for your vector database.

## Prerequisites

1. A Pinecone account (sign up at https://app.pinecone.io/)
2. A Pinecone API key

## Setup Steps

### 1. Get Your Pinecone API Key

1. Go to https://app.pinecone.io/
2. Sign up or log in
3. Navigate to your API keys section
4. Copy your API key

### 2. Set the API Key

You have two options:

#### Option A: Environment Variable (Recommended)
Set the environment variable before running your code:

**Windows (PowerShell):**
```powershell
$env:PINECONE_API_KEY="your-api-key-here"
```

**Windows (Command Prompt):**
```cmd
set PINECONE_API_KEY=your-api-key-here
```

**Linux/Mac:**
```bash
export PINECONE_API_KEY="your-api-key-here"
```

#### Option B: Direct Configuration
Edit `module_a/config.py` and set:
```python
PINECONE_API_KEY = "your-api-key-here"
```

**Note:** Option A is recommended for security reasons.

### 3. Install Dependencies

Make sure you have the Pinecone client installed:
```bash
pip install pinecone-client[grpc]>=3.0.0
```

Or install all requirements:
```bash
pip install -r module_a/requirements.txt
```

### 4. Build Your Vector Database

Run the build script:
```bash
python -m module_a.build_vector_db
```

The script will automatically detect if Pinecone is configured and use it instead of ChromaDB.

### 5. Verify Setup

The build script will:
- Create a Pinecone index if it doesn't exist
- Upload your document chunks to Pinecone
- Store full text in a local JSON file (to avoid Pinecone metadata limits)

## How It Works

### Text Storage
Pinecone has a 40KB limit on metadata per vector. To work around this:
- Full text is stored in a local JSON file (`data/module-A/pinecone_text_storage.json`)
- Only a text preview is stored in Pinecone metadata
- The system automatically loads and saves this storage file

### Index Configuration
- **Index Name:** `nepal-legal-docs` (configurable in `config.py`)
- **Dimension:** 384 (matches the embedding model)
- **Metric:** Cosine similarity
- **Cloud:** AWS
- **Region:** us-east-1 (configurable in `pinecone_vector_db.py`)

## Troubleshooting

### "PINECONE_API_KEY must be set"
- Make sure you've set the API key (see Step 2)
- Check that the environment variable is set in the same terminal session

### "Index creation failed"
- Check your Pinecone dashboard for quota limits
- Verify your API key is valid
- Try a different region if us-east-1 is unavailable

### "Failed to connect to index"
- Wait a few minutes after index creation (it takes time to initialize)
- Check your network connection
- Verify the index exists in your Pinecone dashboard

### Text not found in queries
- Make sure `pinecone_text_storage.json` exists and contains your data
- The file is automatically created when you build the database
- If you delete it, you'll need to rebuild the database

## Switching Between ChromaDB and Pinecone

The system automatically uses Pinecone if `PINECONE_API_KEY` is set, otherwise it falls back to ChromaDB.

**Important:** The application will automatically fall back to ChromaDB if:
- Pinecone API key is not set
- Pinecone initialization fails
- Pinecone client is not installed

This means your application will work even without Pinecone configured - it will just use the local ChromaDB instead.

To switch:
- **Use Pinecone:** Set `PINECONE_API_KEY` environment variable
- **Use ChromaDB:** Unset or remove `PINECONE_API_KEY`

The RAG chain (`LegalRAGChain`) automatically detects which vector database to use at initialization time.

## Cost Considerations

Pinecone offers a free tier with:
- 1 index
- 100K vectors
- 1M queries/month

Check https://www.pinecone.io/pricing/ for current pricing.

## Support

For Pinecone-specific issues, check:
- Pinecone Documentation: https://docs.pinecone.io/
- Pinecone Console: https://app.pinecone.io/