# Pinecone Vector Database Setup Guide

This guide will help you set up Pinecone cloud storage for your vector database.

## Prerequisites

1. A Pinecone account (sign up at https://app.pinecone.io/)
2. A Pinecone API key

## Setup Steps

### 1. Get Your Pinecone API Key

1. Go to https://app.pinecone.io/
2. Sign up or log in
3. Navigate to your API keys section
4. Copy your API key

### 2. Set the API Key

You have two options:

#### Option A: Environment Variable (Recommended)
Set the environment variable before running your code:

**Windows (PowerShell):**
```powershell
$env:PINECONE_API_KEY="your-api-key-here"
```

**Windows (Command Prompt):**
```cmd
set PINECONE_API_KEY=your-api-key-here
```

**Linux/Mac:**
```bash
export PINECONE_API_KEY="your-api-key-here"
```

#### Option B: Direct Configuration
Edit `module_a/config.py` and set:
```python
PINECONE_API_KEY = "your-api-key-here"
```

**Note:** Option A is recommended because it keeps the key out of source files. If you use Option B, do not commit the key to version control.
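
As a minimal sketch of how the two options could be combined, the helper below prefers the environment variable and falls back to a value set in the config file. The name `resolve_api_key` and its parameters are hypothetical; the actual lookup in `module_a/config.py` may differ.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    config_value: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the Pinecone API key, preferring the environment variable.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    # Option A: environment variable, ignoring blank values.
    key = env.get("PINECONE_API_KEY", "").strip()
    if key:
        return key
    # Option B: value set directly in the config file, if any.
    if config_value and config_value.strip():
        return config_value.strip()
    return None
```

A return value of `None` here would correspond to the "PINECONE_API_KEY must be set" situation described under Troubleshooting.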

### 3. Install Dependencies

Make sure you have the Pinecone client installed:
```bash
pip install "pinecone-client[grpc]>=3.0.0"
```

Or install all requirements:
```bash
pip install -r module_a/requirements.txt
```

### 4. Build Your Vector Database

Run the build script:
```bash
python -m module_a.build_vector_db
```

The script will automatically detect if Pinecone is configured and use it instead of ChromaDB.

### 5. Verify Setup

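If the build succeeds, the script will have:
- Created a Pinecone index if one did not already exist
- Uploaded your document chunks to Pinecone
- Stored the full text in a local JSON file (to avoid Pinecone metadata limits)

You can confirm the index and its vector count in the Pinecone console.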
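
The upload can also be checked programmatically. The sketch below assumes the v3+ Pinecone client, where an index handle obtained from `Pinecone(api_key=...).Index(name)` exposes `describe_index_stats()`. The function names and the `expected_chunks` parameter are hypothetical.

```python
def uploaded_vector_count(index) -> int:
    """Return the number of vectors the index currently reports.

    `index` is assumed to be a Pinecone index handle (v3+ client).
    """
    stats = index.describe_index_stats()
    return int(stats.total_vector_count)


def verify_upload(index, expected_chunks: int) -> bool:
    """Check that at least `expected_chunks` vectors are present."""
    return uploaded_vector_count(index) >= expected_chunks
```

The reported count may lag briefly behind a fresh upload, so a short wait before checking can avoid a false alarm.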

## How It Works

### Text Storage
Pinecone limits metadata to 40 KB per vector. To work around this:
- Full text is stored in a local JSON file (`data/module-A/pinecone_text_storage.json`)
- Only a text preview is stored in Pinecone metadata
- The system automatically loads and saves this storage file
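
The split between preview and full text can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the preview length, the metadata field names, and the function names are all hypothetical.

```python
import json
from pathlib import Path

PREVIEW_CHARS = 500  # assumed preview length, not taken from the project


def split_for_pinecone(chunk_id: str, text: str, store: dict) -> dict:
    """Keep the full text in `store` and return small metadata for Pinecone."""
    store[chunk_id] = text
    return {"chunk_id": chunk_id, "text_preview": text[:PREVIEW_CHARS]}


def save_store(store: dict, path: Path) -> None:
    """Write the full-text store to a local JSON file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(store, ensure_ascii=False), encoding="utf-8")


def load_store(path: Path) -> dict:
    """Load the full-text store, or return an empty one if the file is missing."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))
```

At query time, the chunk ID returned by Pinecone would be used to look up the full text in the loaded store.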

### Index Configuration
- **Index Name:** `nepal-legal-docs` (configurable in `config.py`)
- **Dimension:** 384 (matches the embedding model)
- **Metric:** Cosine similarity
- **Cloud:** AWS
- **Region:** us-east-1 (configurable in `pinecone_vector_db.py`)
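
For reference, these settings map onto index creation roughly as shown below. The sketch assumes the v3+ Pinecone client and a serverless index (the guide lists only a cloud and region, so `ServerlessSpec` is an assumption). `INDEX_SETTINGS` and `ensure_index` are hypothetical names, not taken from `pinecone_vector_db.py`.

```python
INDEX_SETTINGS = {
    "name": "nepal-legal-docs",
    "dimension": 384,  # must match the embedding model's output size
    "metric": "cosine",
    "cloud": "aws",
    "region": "us-east-1",
}


def ensure_index(pc, settings: dict = INDEX_SETTINGS) -> bool:
    """Create the index if it is missing. Returns True if it was created.

    `pc` is assumed to be a `pinecone.Pinecone` client (v3+ API).
    """
    if settings["name"] in pc.list_indexes().names():
        return False
    # Imported lazily so this code loads even without the Pinecone client.
    from pinecone import ServerlessSpec

    pc.create_index(
        name=settings["name"],
        dimension=settings["dimension"],
        metric=settings["metric"],
        spec=ServerlessSpec(cloud=settings["cloud"], region=settings["region"]),
    )
    return True
```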

## Troubleshooting

### "PINECONE_API_KEY must be set"
- Make sure you've set the API key (see Step 2)
- Check that the environment variable is set in the same terminal session

### "Index creation failed"
- Check your Pinecone dashboard for quota limits
- Verify your API key is valid
- Try a different region if us-east-1 is unavailable

### "Failed to connect to index"
- Wait a few minutes after index creation (it takes time to initialize)
- Check your network connection
- Verify the index exists in your Pinecone dashboard
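
Instead of waiting a fixed amount of time, the readiness check can be polled. The helper below is a generic sketch; `wait_until_ready` and its parameters are hypothetical and not part of this project.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout_s: float = 120.0,
    poll_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `is_ready` until it returns True or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)
```

With the v3+ Pinecone client, `is_ready` could be something like `lambda: pc.describe_index("nepal-legal-docs").status["ready"]`, where `pc` is a `Pinecone` client instance.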

### Text not found in queries
- Make sure `pinecone_text_storage.json` exists and contains your data
- The file is automatically created when you build the database
- If you delete it, you'll need to rebuild the database
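
A quick way to tell these cases apart is to inspect the storage file directly. The sketch below uses a hypothetical function name and assumes the file holds a single JSON object or list.

```python
import json
from pathlib import Path


def text_storage_status(path: Path) -> str:
    """Classify the storage file as 'missing', 'corrupt', 'empty', or 'ok'."""
    if not path.exists():
        return "missing"
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return "corrupt"
    return "ok" if data else "empty"
```

For example, `text_storage_status(Path("data/module-A/pinecone_text_storage.json"))` returning anything other than `"ok"` suggests the database should be rebuilt.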

## Switching Between ChromaDB and Pinecone

The system uses Pinecone when `PINECONE_API_KEY` is set; otherwise it falls back to ChromaDB.

**Important:** The application also falls back to ChromaDB if:
- Pinecone initialization fails
- The Pinecone client is not installed

Your application therefore works even without Pinecone configured; it simply uses the local ChromaDB instead.

To switch:
- **Use Pinecone:** Set `PINECONE_API_KEY` environment variable
- **Use ChromaDB:** Unset or remove `PINECONE_API_KEY`

The RAG chain (`LegalRAGChain`) automatically detects which vector database to use at initialization time.
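
The fallback rules described in this section can be sketched as a single decision function. This is an illustration only: `choose_backend` and its parameters are hypothetical, and the actual detection logic in `LegalRAGChain` may be structured differently.

```python
import importlib.util
import os
from typing import Callable, Mapping, Optional


def choose_backend(
    env: Optional[Mapping[str, str]] = None,
    client_installed: Optional[bool] = None,
    init_pinecone: Optional[Callable[[str], object]] = None,
) -> str:
    """Return 'pinecone' or 'chromadb' following the fallback rules."""
    env = os.environ if env is None else env
    api_key = env.get("PINECONE_API_KEY", "").strip()
    if not api_key:
        return "chromadb"  # API key is not set
    if client_installed is None:
        client_installed = importlib.util.find_spec("pinecone") is not None
    if not client_installed:
        return "chromadb"  # Pinecone client is not installed
    if init_pinecone is not None:
        try:
            init_pinecone(api_key)
        except Exception:
            return "chromadb"  # Pinecone initialization failed
    return "pinecone"
```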

## Cost Considerations

At the time of writing, Pinecone's free tier included:
- 1 index
- 100K vectors
- 1M queries/month

These limits change over time; check https://www.pinecone.io/pricing/ for current pricing.

## Support

For Pinecone-specific issues, check:
- Pinecone Documentation: https://docs.pinecone.io/
- Pinecone Console: https://app.pinecone.io/