IsmatS committed
Commit cd7918d · 1 Parent(s): f597a1b
Files changed (3):
  1. .env.example +6 -0
  2. DEPLOYMENT_TROUBLESHOOTING.md +189 -0
  3. app/main.py +35 -11
.env.example CHANGED
```diff
@@ -3,6 +3,12 @@ AZURE_OPENAI_API_KEY=your_azure_openai_api_key_here
 AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
 AZURE_OPENAI_API_VERSION=2024-08-01-preview
 
+# Azure OpenAI Embedding Configuration (for /llm endpoint)
+# IMPORTANT: Deploy text-embedding-3-small in Azure OpenAI Studio first!
+# See DEPLOYMENT_TROUBLESHOOTING.md for a step-by-step guide
+AZURE_EMBEDDING_MODEL=text-embedding-3-small
+AZURE_EMBEDDING_DIMS=1024
+
 # Azure Document Intelligence (using same credentials as OpenAI for hackathon)
 AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT=https://your-resource.services.ai.azure.com/
 AZURE_DOCUMENT_INTELLIGENCE_KEY=your_document_intelligence_key_here
```
DEPLOYMENT_TROUBLESHOOTING.md ADDED
@@ -0,0 +1,189 @@
# Deployment Troubleshooting Guide

## Error: "DeploymentNotFound" - Embedding Model

### Problem
```
Embedding error: Error code: 404 - {'error': {'code': 'DeploymentNotFound', 'message': 'The API deployment for this resource does not exist...'}}
```

### Root Cause
The application is deployed on the **Render free tier (512MB RAM)**, which is too small to load the local `BAAI/bge-large-en-v1.5` embedding model (~400MB). To work around this, the app uses **Azure OpenAI embeddings** instead, but the required deployment doesn't exist yet.

---

## Solution: Deploy Embedding Model in Azure OpenAI

### Step 1: Go to Azure OpenAI Studio
1. Navigate to: https://oai.azure.com/portal
2. Select your Azure OpenAI resource

### Step 2: Create Embedding Deployment
1. Click **Deployments** in the left sidebar
2. Click **+ Create new deployment**
3. Fill in the form:
   - **Model**: `text-embedding-3-small`
   - **Deployment name**: `text-embedding-3-small`
   - **Model version**: Latest available
   - **Dimensions**: `1024` (⚠️ IMPORTANT - must match the Pinecone index)
   - **Tokens per Minute Rate Limit**: Set as desired (e.g., 350K)
4. Click **Create**
5. Wait ~1 minute for deployment
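Once created, the deployment is addressed by name in the request path - the same URL shape the verification curl later in this guide uses. A minimal sketch of how that URL is assembled (the resource name, deployment name, and API version here are placeholders, not real values):

```python
# Sketch: build the deployment-scoped embeddings URL that the new
# deployment must answer on. All values below are placeholders.

def embeddings_url(endpoint: str, deployment: str, api_version: str) -> str:
    """Azure OpenAI embeddings endpoint for a named deployment."""
    return (
        f"{endpoint.rstrip('/')}/openai/deployments/"
        f"{deployment}/embeddings?api-version={api_version}"
    )

url = embeddings_url(
    "https://your-resource.openai.azure.com/",
    "text-embedding-3-small",
    "2024-08-01-preview",
)
print(url)
```

A 404 `DeploymentNotFound` means nothing answers at this path - i.e., the deployment name in the URL does not exist on the resource.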
### Step 3: Verify Deployment
1. In the Deployments tab, confirm you see:
   - `Llama-4-Maverick-17B-128E-Instruct-FP8` (for LLM/OCR)
   - `text-embedding-3-small` (for embeddings) ✅ NEW

### Step 4: Restart Your Application
- **Render**: Will auto-restart on the next request
- **Local**: Restart the uvicorn server

---

## Alternative Solution: Use an Existing Deployment

If you already have a different embedding model deployed in Azure, you can use it instead:

### Option A: Set Environment Variables
Add to your `.env` or Render environment variables:
```bash
AZURE_EMBEDDING_MODEL=your-existing-embedding-deployment-name
AZURE_EMBEDDING_DIMS=1024  # Must be 1024 to match Pinecone
```
### Option B: Supported Models
Either of these works (with `dimensions=1024`):
- `text-embedding-3-small` (recommended - cheapest, fastest)
- `text-embedding-3-large`

Note: `text-embedding-ada-002` is legacy and does not accept a `dimensions` parameter, so it won't work here.
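A hypothetical pre-flight check (not part of `app/main.py`) that encodes the two constraints above - no legacy ada-002, and dimensions fixed at 1024:

```python
# Hypothetical config check: flag embedding settings that this guide
# says won't work with the 1024-dim Pinecone index.

def check_embedding_config(model: str, dims: int) -> list:
    problems = []
    if model == "text-embedding-ada-002":
        # ada-002 always returns 1536-dim vectors and ignores no such knob
        problems.append("ada-002 has no dimensions parameter (fixed 1536-dim)")
    if dims != 1024:
        problems.append(f"dims={dims} will not match the 1024-dim Pinecone index")
    return problems

print(check_embedding_config("text-embedding-3-small", 1024))  # []
```

Running this once at startup turns a confusing retrieval failure into an immediate, readable message.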
---

## Memory Constraints

### Why Not Use the Local Model?
The `BAAI/bge-large-en-v1.5` model requires:
- **Model size**: ~400MB
- **Runtime overhead**: ~100MB
- **Total**: ~500MB+ (exceeds the Render free tier limit)

### Render Free Tier Limits
- **RAM**: 512MB max
- **Solution**: Use the Azure OpenAI API (no local model loading)

### If You Have Paid Hosting (1GB+ RAM)
You can use the local model by editing `app/main.py`:

```python
from typing import List

from sentence_transformers import SentenceTransformer

# Initialize at startup so the model loads once
embedding_model = SentenceTransformer("BAAI/bge-large-en-v1.5")

def get_embedding(text: str) -> List[float]:
    return embedding_model.encode(text, show_progress_bar=False).tolist()
```

**Benefits**:
- No Azure API calls for embeddings
- Exactly the same model as ingestion
- Lower latency

**Tradeoffs**:
- Requires 1GB+ RAM
- Slower startup time (~10 seconds)
---

## Verification

### Test Embedding Endpoint
```bash
# Check that the deployment exists
curl -X POST "https://YOUR-RESOURCE.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-08-01-preview" \
  -H "Content-Type: application/json" \
  -H "api-key: YOUR-API-KEY" \
  -d '{"input": "test", "dimensions": 1024}'
```

**Expected Response**:
```json
{
  "data": [{"embedding": [0.123, -0.456, ...], "index": 0}],
  "model": "text-embedding-3-small",
  "usage": {"prompt_tokens": 1, "total_tokens": 1}
}
```
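Beyond "did it return 200", it's worth confirming the vector length before trusting it against the Pinecone index. A small sketch (the stub payload below is illustrative, not a real API response):

```python
# Sketch: verify an embeddings response has the expected dimensionality.
# The stub dict mimics the response shape shown above.

def embedding_dims_ok(response: dict, expected_dims: int = 1024) -> bool:
    embedding = response["data"][0]["embedding"]
    return len(embedding) == expected_dims

stub = {"data": [{"embedding": [0.0] * 1024, "index": 0}]}
print(embedding_dims_ok(stub))  # True
```

If this check fails, the `dimensions` parameter was likely dropped from the request, and every Pinecone query will be rejected for a dimension mismatch.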
### Test the LLM Endpoint After the Fix
```bash
curl -X POST "https://socar-hackathon.onrender.com/llm" \
  -H "Content-Type: application/json" \
  -d '{"question": "SOCAR haqqında məlumat verin"}'
```

**Expected**: No embedding errors; a proper answer with sources

---

## Cost Implications

### Azure OpenAI Embeddings Pricing (text-embedding-3-small)
- **Cost**: $0.02 per 1M tokens (~$0.00000002 per token)
- **Typical query**: 10-50 tokens ≈ $0.000001 (negligible)

### vs. Local Model
- **Cost**: $0 (but requires paid hosting with more RAM)
- **Hosting cost**: Render 1GB plan = $7/month

**Recommendation**: Use Azure embeddings on the free tier; only switch to the local model if you already have paid hosting.
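The arithmetic above in two lines, for anyone estimating a different query volume:

```python
# Cost sketch: $0.02 per 1M tokens for text-embedding-3-small.

PRICE_PER_MILLION_TOKENS = 0.02  # USD

def embedding_cost_usd(tokens: int) -> float:
    return tokens * PRICE_PER_MILLION_TOKENS / 1_000_000

print(embedding_cost_usd(50))       # ≈ 1e-06: a 50-token query
print(embedding_cost_usd(1000000))  # ≈ 0.02: one million tokens
```

Even at thousands of queries per day, embedding cost stays far below the $7/month Render upgrade the local model would require.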
---

## Still Having Issues?

### Check Logs
```bash
# Render
render logs --tail 100

# Local: logs appear in the terminal where you ran uvicorn
```

**Look for**:
```
❌ EMBEDDING ERROR: Deployment 'text-embedding-3-small' not found in Azure OpenAI
```

### Common Issues
1. **Wrong deployment name**: Check the `AZURE_EMBEDDING_MODEL` env var
2. **Deployment still creating**: Wait 1-2 minutes after creating it
3. **Wrong API version**: Use `2024-08-01-preview` or later
4. **Dimensions mismatch**: MUST be 1024 (Pinecone index requirement)
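Several of these issues reduce to "a required setting is missing". A hypothetical fail-fast helper (not in `app/main.py`) that takes the environment as a plain dict, so it can be checked at startup rather than on the first failing request:

```python
# Hypothetical startup check: report which required settings are absent
# instead of discovering a 404 or auth error on the first query.

REQUIRED_SETTINGS = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "PINECONE_API_KEY",
]

def missing_settings(env: dict) -> list:
    return [key for key in REQUIRED_SETTINGS if not env.get(key)]

# Example with an incomplete environment:
print(missing_settings({"AZURE_OPENAI_API_KEY": "x"}))
```

In the real app this would be called with `dict(os.environ)` and the process would exit with the list of missing keys.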
---

## Environment Variables Reference

```bash
# Required for LLM/OCR
AZURE_OPENAI_API_KEY=<your-key>
AZURE_OPENAI_ENDPOINT=https://<resource>.openai.azure.com/
AZURE_OPENAI_API_VERSION=2024-08-01-preview

# Required for embeddings (NEW)
AZURE_EMBEDDING_MODEL=text-embedding-3-small  # Your deployment name
AZURE_EMBEDDING_DIMS=1024  # Must match Pinecone

# Required for the vector DB
PINECONE_API_KEY=<your-key>
PINECONE_INDEX_NAME=hackathon
```

---

**Last Updated**: December 14, 2025
**Status**: ✅ Fixed in app/main.py with better error messages
app/main.py CHANGED
```diff
@@ -112,13 +112,18 @@ def get_pinecone_index():
 
 def get_embedding(text: str) -> List[float]:
     """
-    Get embedding using Azure OpenAI API instead of local model.
-    This saves ~400MB memory by not loading SentenceTransformer locally.
-
-    Uses text-embedding-3-small with dimensions=1024 to match existing Pinecone index
-    (which was created with BAAI/bge-large-en-v1.5 at 1024 dimensions).
+    Generate embedding for semantic search.
+
+    IMPORTANT: You need to deploy an embedding model in Azure OpenAI Studio:
+    1. Go to Azure OpenAI Studio Deployments
+    2. Create deployment: text-embedding-3-small
+    3. Set dimensions to 1024 to match Pinecone index
+
+    Alternative: Set AZURE_EMBEDDING_MODEL env var to your deployment name
     """
     client = get_azure_client()
+
+    # Get embedding model from env or use default
     embedding_model = os.getenv("AZURE_EMBEDDING_MODEL", "text-embedding-3-small")
     embedding_dims = int(os.getenv("AZURE_EMBEDDING_DIMS", "1024"))
 
@@ -126,12 +131,29 @@ def get_embedding(text: str) -> List[float]:
         response = client.embeddings.create(
             input=text,
             model=embedding_model,
-            dimensions=embedding_dims  # Match Pinecone index dimensions
+            dimensions=embedding_dims
         )
         return response.data[0].embedding
     except Exception as e:
-        # Fallback: return zero vector if embedding fails
-        print(f"Embedding error: {e}")
+        error_msg = str(e)
+
+        # Provide a helpful error message
+        if "DeploymentNotFound" in error_msg or "404" in error_msg:
+            print(f"❌ EMBEDDING ERROR: Deployment '{embedding_model}' not found in Azure OpenAI")
+            print("")
+            print("📋 FIX THIS BY DEPLOYING THE MODEL:")
+            print("   1. Go to: https://oai.azure.com/portal")
+            print("   2. Navigate to: Deployments → Create new deployment")
+            print("   3. Model: text-embedding-3-small")
+            print("   4. Deployment name: text-embedding-3-small")
+            print("   5. Set: dimensions=1024")
+            print("")
+            print("   OR set environment variable:")
+            print("   AZURE_EMBEDDING_MODEL=<your-existing-embedding-deployment>")
+        else:
+            print(f"Embedding error: {e}")
+
+        # Return zero vector (will not match documents, but API won't crash)
         return [0.0] * embedding_dims
 
@@ -170,10 +192,12 @@ def retrieve_documents(query: str, top_k: int = 3) -> List[Dict]:
     """
     Retrieve relevant documents from Pinecone vector database.
     Best strategy from benchmark: vanilla top-3
+
+    Uses Azure OpenAI embeddings (1024-dim) for memory efficiency on Render free tier.
     """
     index = get_pinecone_index()
 
-    # Generate query embedding using Azure OpenAI (memory efficient)
+    # Generate query embedding
     query_embedding = get_embedding(query)
 
     # Search vector database
@@ -287,13 +311,13 @@ async def llm_endpoint(request: Request):
     LLM chatbot endpoint for SOCAR historical documents.
 
     Uses RAG (Retrieval Augmented Generation) with:
-    - Embedding: BAAI/bge-large-en-v1.5
-    - Retrieval: Top-3 documents
+    - Embedding: Azure OpenAI text-embedding-3-small @ 1024-dim
+    - Retrieval: Top-3 documents (Pinecone)
     - LLM: Llama-4-Maverick-17B (open-source)
     - Prompt: Citation-focused
 
     Expected performance:
-    - Response time: ~3.6s
+    - Response time: ~4.0s
     - LLM Judge Score: 55.67%
     - Citation Score: 73.33%
```
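One caveat with the zero-vector fallback in `get_embedding`: an all-zero query embedding will still be sent to Pinecone and will return essentially arbitrary, low-relevance matches. A small guard, sketched here as a hypothetical addition (not part of this commit), would surface the failure to the caller instead:

```python
# Hypothetical guard (not in this commit): detect the zero-vector
# fallback before querying Pinecone, so the caller sees a clear error
# rather than meaningless retrieval results.

from typing import List

def is_zero_vector(vector: List[float]) -> bool:
    return all(value == 0.0 for value in vector)

def guard_embedding(vector: List[float]) -> List[float]:
    if is_zero_vector(vector):
        raise RuntimeError(
            "Embedding failed (zero vector) - check the AZURE_EMBEDDING_MODEL deployment"
        )
    return vector

print(is_zero_vector([0.0] * 1024))  # True
```

`retrieve_documents` could call `guard_embedding(get_embedding(query))` so the /llm endpoint returns an explicit error instead of confidently citing unrelated documents.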