IsmatS committed on
Commit a9089e1 · 1 Parent(s): 585eacf
Files changed (2):
  1. DEPLOYMENT_TROUBLESHOOTING.md +0 -189
  2. app/main.py +22 -8
DEPLOYMENT_TROUBLESHOOTING.md DELETED
@@ -1,189 +0,0 @@
- # Deployment Troubleshooting Guide
-
- ## Error: "DeploymentNotFound" - Embedding Model
-
- ### Problem
- ```
- Embedding error: Error code: 404 - {'error': {'code': 'DeploymentNotFound', 'message': 'The API deployment for this resource does not exist...'}}
- ```
-
- ### Root Cause
- The application is deployed on the **Render free tier (512MB RAM)**, which is too small to load the local `BAAI/bge-large-en-v1.5` embedding model (~400MB). To work around this, the app uses **Azure OpenAI embeddings** instead, but the required deployment doesn't exist yet.
-
- ---
-
- ## Solution: Deploy Embedding Model in Azure OpenAI
-
- ### Step 1: Go to Azure OpenAI Studio
- 1. Navigate to: https://oai.azure.com/portal
- 2. Select your Azure OpenAI resource
-
- ### Step 2: Create Embedding Deployment
- 1. Click **Deployments** in the left sidebar
- 2. Click **+ Create new deployment**
- 3. Fill in the form:
-    - **Model**: `text-embedding-3-small`
-    - **Deployment name**: `text-embedding-3-small`
-    - **Model version**: Latest available
-    - **Dimensions**: `1024` (⚠️ IMPORTANT - must match the Pinecone index)
-    - **Tokens per Minute Rate Limit**: Set as desired (e.g., 350K)
- 4. Click **Create**
- 5. Wait ~1 minute for the deployment
-
- ### Step 3: Verify Deployment
- 1. In the Deployments tab, confirm you see:
-    - `Llama-4-Maverick-17B-128E-Instruct-FP8` (for LLM/OCR)
-    - `text-embedding-3-small` (for embeddings) ✅ NEW
-
- ### Step 4: Restart Your Application
- - **Render**: Will auto-restart on the next request
- - **Local**: Restart the uvicorn server
-
- ---
-
- ## Alternative Solution: Use Existing Deployment
-
- If you already have a different embedding model deployed in Azure, you can use it instead:
-
- ### Option A: Set Environment Variable
- Add to your `.env` or Render environment variables:
- ```bash
- AZURE_EMBEDDING_MODEL=your-existing-embedding-deployment-name
- AZURE_EMBEDDING_DIMS=1024  # Must be 1024 to match Pinecone
- ```
-
- ### Option B: Supported Models
- Either of these works (with `dimensions=1024`):
- - `text-embedding-3-small` (recommended - cheapest, fastest)
- - `text-embedding-3-large`
-
- `text-embedding-ada-002` does **not** work: it is a legacy model with no `dimensions` parameter.
-
- ---
-
- ## Memory Constraints
-
- ### Why Not Use the Local Model?
- The `BAAI/bge-large-en-v1.5` model requires:
- - **Model size**: ~400MB
- - **Runtime overhead**: ~100MB
- - **Total**: ~500MB+ (exceeds the Render free tier limit)
-
- ### Render Free Tier Limits
- - **RAM**: 512MB max
- - **Solution**: Use the Azure OpenAI API (no local model loading)
-
- ### If You Have Paid Hosting (1GB+ RAM)
- You can use the local model by editing `app/main.py`:
-
- ```python
- from sentence_transformers import SentenceTransformer
-
- # Initialize at startup
- embedding_model = SentenceTransformer("BAAI/bge-large-en-v1.5")
-
- def get_embedding(text: str) -> List[float]:
-     return embedding_model.encode(text, show_progress_bar=False).tolist()
- ```
-
- **Benefits**:
- - No Azure API calls for embeddings
- - Exact same model as ingestion
- - Lower latency
-
- **Tradeoffs**:
- - Requires 1GB+ RAM
- - Slower startup time (~10 seconds)
-
- ---
-
- ## Verification
-
- ### Test Embedding Endpoint
- ```bash
- # Check if the deployment exists
- curl -X POST "https://YOUR-RESOURCE.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-08-01-preview" \
-   -H "Content-Type: application/json" \
-   -H "api-key: YOUR-API-KEY" \
-   -d '{"input": "test", "dimensions": 1024}'
- ```
-
- **Expected response**:
- ```json
- {
-   "data": [{"embedding": [0.123, -0.456, ...], "index": 0}],
-   "model": "text-embedding-3-small",
-   "usage": {"prompt_tokens": 1, "total_tokens": 1}
- }
- ```
-
- ### Test LLM Endpoint After Fix
- ```bash
- curl -X POST "https://socar-hackathon.onrender.com/llm" \
-   -H "Content-Type: application/json" \
-   -d '{"question": "SOCAR haqqında məlumat verin"}'
- ```
-
- **Expected**: No embedding errors; a proper answer with sources
-
- ---
-
- ## Cost Implications
-
- ### Azure OpenAI Embeddings Pricing (text-embedding-3-small)
- - **Cost**: $0.02 per 1M tokens ($0.00000002 per token)
- - **Typical query**: 10-50 tokens ≈ $0.000001 (negligible)
-
- ### vs. Local Model
- - **Cost**: $0 (but requires paid hosting with more RAM)
- - **Hosting cost**: Render 1GB plan = $7/month
-
- **Recommendation**: Use Azure embeddings on the free tier; only switch to the local model if you already have paid hosting.
-
- ---
-
- ## Still Having Issues?
-
- ### Check Logs
- ```bash
- # Render
- render logs --tail 100
-
- # Local: logs appear in the terminal where you ran uvicorn
- ```
-
- **Look for**:
- ```
- ❌ EMBEDDING ERROR: Deployment 'text-embedding-3-small' not found in Azure OpenAI
- ```
-
- ### Common Issues
- 1. **Wrong deployment name**: Check the `AZURE_EMBEDDING_MODEL` env var
- 2. **Deployment still creating**: Wait 1-2 minutes after creating it
- 3. **Wrong API version**: Use `2024-08-01-preview` or later
- 4. **Dimensions mismatch**: MUST be 1024 (Pinecone index requirement)
-
- ---
-
- ## Environment Variables Reference
-
- ```bash
- # Required for LLM/OCR
- AZURE_OPENAI_API_KEY=<your-key>
- AZURE_OPENAI_ENDPOINT=https://<resource>.openai.azure.com/
- AZURE_OPENAI_API_VERSION=2024-08-01-preview
-
- # Required for embeddings (NEW)
- AZURE_EMBEDDING_MODEL=text-embedding-3-small  # Your deployment name
- AZURE_EMBEDDING_DIMS=1024                     # Must match Pinecone
-
- # Required for vector DB
- PINECONE_API_KEY=<your-key>
- PINECONE_INDEX_NAME=hackathon
- ```
-
- ---
-
- **Last Updated**: December 14, 2025
- **Status**: ✅ Fixed in app/main.py with better error messages
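The environment-variable reference in the deleted guide can be sanity-checked with a small standalone helper. The variable names below come from the guide; the helper function itself is a hypothetical sketch, not part of this repository:

```python
# Required variables, as listed in the deleted troubleshooting guide
REQUIRED_VARS = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_EMBEDDING_MODEL",
    "AZURE_EMBEDDING_DIMS",
    "PINECONE_API_KEY",
    "PINECONE_INDEX_NAME",
]

def missing_env_vars(env: dict) -> list:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Example: a deployment where only the LLM variables were configured
partial = {
    "AZURE_OPENAI_API_KEY": "xxx",
    "AZURE_OPENAI_ENDPOINT": "https://example.openai.azure.com/",
    "AZURE_OPENAI_API_VERSION": "2024-08-01-preview",
}
print(missing_env_vars(partial))
```

In practice this would be called with `dict(os.environ)` at startup so a missing embedding deployment name fails fast instead of surfacing later as a 404 from Azure.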
app/main.py CHANGED
@@ -138,12 +138,26 @@ def get_embedding(text: str) -> List[float]:
     embedding_dims = int(os.getenv("AZURE_EMBEDDING_DIMS", "1024"))
 
     try:
+        # Azure OpenAI doesn't support 'dimensions' parameter (different from OpenAI API)
+        # Just create embedding and handle dimension mismatch afterward
         response = embedding_client.embeddings.create(
             input=text,
-            model=embedding_model,
-            dimensions=embedding_dims
+            model=embedding_model
         )
-        return response.data[0].embedding
+
+        embedding = response.data[0].embedding
+
+        # text-embedding-3-small returns 1536 dims by default, but Pinecone expects 1024
+        # Truncate or pad to match expected dimensions
+        if len(embedding) != embedding_dims:
+            if len(embedding) < embedding_dims:
+                # Pad with zeros
+                embedding = embedding + [0.0] * (embedding_dims - len(embedding))
+            else:
+                # Truncate to required dimensions
+                embedding = embedding[:embedding_dims]
+
+        return embedding
     except Exception as e:
         error_msg = str(e)
 
@@ -190,12 +204,12 @@ class AnswerResponse(BaseModel):
     response_time: float
 
 
-def retrieve_documents(query: str, top_k: int = 3) -> List[Dict]:
+def retrieve_documents(query: str, top_k: int = 10) -> List[Dict]:
     """
     Retrieve relevant documents from Pinecone vector database.
-    Best strategy from benchmark: vanilla top-3
+    Increased to top-10 due to dimension truncation (1536→1024) affecting similarity scores.
 
-    Uses Azure OpenAI embeddings (1024-dim) for memory efficiency on Render free tier.
+    Uses Azure OpenAI embeddings (truncated to 1024-dim for Pinecone compatibility).
     """
     index = get_pinecone_index()
 
@@ -407,8 +421,8 @@ async def llm_endpoint(request: Request):
         response_time=0.0
     )
 
-    # Retrieve relevant documents
-    documents = retrieve_documents(query, top_k=3)
+    # Retrieve relevant documents (increased to 10 due to dimension truncation issues)
+    documents = retrieve_documents(query, top_k=10)
 
     # Generate answer
     answer, response_time = generate_answer(
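The pad/truncate step this commit adds to `get_embedding` can be exercised in isolation. One caveat worth noting: truncating a longer embedding shrinks its norm, so for dot-product or cosine search the truncated vector is usually rescaled to unit length. The `renormalize` helper below is an illustrative addition under that assumption, not part of the committed code:

```python
import math

def fit_dimensions(embedding, target_dims):
    # Truncate or zero-pad, mirroring the logic added to get_embedding
    if len(embedding) < target_dims:
        return embedding + [0.0] * (target_dims - len(embedding))
    return embedding[:target_dims]

def renormalize(embedding):
    # Illustrative addition (not in the commit): rescale to unit length,
    # since truncation shrinks the norm and skews raw dot-product scores
    norm = math.sqrt(sum(x * x for x in embedding))
    return [x / norm for x in embedding] if norm > 0 else embedding

vec = [1.0] * 6                   # stand-in for a longer model output
fitted = fit_dimensions(vec, 4)   # -> [1.0, 1.0, 1.0, 1.0]
print(renormalize(fitted))        # -> [0.5, 0.5, 0.5, 0.5], unit norm
```

Zero-padding (the `len(embedding) < target_dims` branch) is harmless for cosine similarity, since the padded components contribute nothing to the dot product; truncation without re-normalization, by contrast, changes vector lengths and therefore raw score magnitudes.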