IsmatS/SOCAR_Hackathon

@@ -3,6 +3,12 @@ AZURE_OPENAI_API_KEY=your_azure_openai_api_key_here
 AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
 AZURE_OPENAI_API_VERSION=2024-08-01-preview
+# Azure OpenAI Embedding Configuration (for /llm endpoint)
+# IMPORTANT: Deploy text-embedding-3-small in Azure OpenAI Studio first!
+# See DEPLOYMENT_TROUBLESHOOTING.md for step-by-step guide
+AZURE_EMBEDDING_MODEL=text-embedding-3-small
+AZURE_EMBEDDING_DIMS=1024
 # Azure Document Intelligence (using same credentials as OpenAI for hackathon)
 AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT=https://your-resource.services.ai.azure.com/
 AZURE_DOCUMENT_INTELLIGENCE_KEY=your_document_intelligence_key_here
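
`get_embedding` in `app/main.py` reads the two new variables on every call, with defaults `text-embedding-3-small` and `1024`. The following is a minimal sketch of reading them once at startup and failing fast on a dimension mismatch. It assumes the Pinecone index dimension is 1024, as this PR states. `EmbeddingConfig`, `load_embedding_config` and `EXPECTED_INDEX_DIMS` are hypothetical names and are not part of the repository.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: the Pinecone index dimension is 1024, as this PR states.
EXPECTED_INDEX_DIMS = 1024


@dataclass(frozen=True)
class EmbeddingConfig:
    model: str  # Azure OpenAI *deployment* name, not the model family name
    dims: int


def load_embedding_config(env: Optional[Mapping[str, str]] = None) -> EmbeddingConfig:
    """Read the embedding settings with the same defaults as get_embedding()."""
    source = os.environ if env is None else env
    model = source.get("AZURE_EMBEDDING_MODEL", "text-embedding-3-small")
    dims = int(source.get("AZURE_EMBEDDING_DIMS", "1024"))
    if dims != EXPECTED_INDEX_DIMS:
        # Fail at startup instead of at query time
        raise ValueError(
            f"AZURE_EMBEDDING_DIMS={dims} does not match the Pinecone index "
            f"dimension {EXPECTED_INDEX_DIMS}"
        )
    return EmbeddingConfig(model=model, dims=dims)
```

With this check, a wrong `AZURE_EMBEDDING_DIMS` shows up when the service boots. Without it, the error would only appear when the first query reaches Pinecone.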

DEPLOYMENT_TROUBLESHOOTING.md ADDED

@@ -0,0 +1,189 @@

+# Deployment Troubleshooting Guide
+## Error: "DeploymentNotFound" - Embedding Model
+### Problem
+```
+Embedding error: Error code: 404 - {'error': {'code': 'DeploymentNotFound', 'message': 'The API deployment for this resource does not exist...'}}
+```
+### Root Cause
+The application is deployed on **Render free tier (512MB RAM)**, which is too small to load the local `BAAI/bge-large-en-v1.5` embedding model (~335M parameters, ~1.3GB of fp32 weights). To work around this, the app uses **Azure OpenAI embeddings** instead, but the required deployment doesn't exist yet.
+---
+## Solution: Deploy Embedding Model in Azure OpenAI
+### Step 1: Go to Azure OpenAI Studio
+1. Navigate to: https://oai.azure.com/portal
+2. Select your Azure OpenAI resource
+### Step 2: Create Embedding Deployment
+1. Click **Deployments** in the left sidebar
+2. Click **+ Create new deployment**
+3. Fill in the form:
+   - **Model**: `text-embedding-3-small`
+   - **Deployment name**: `text-embedding-3-small`
+   - **Model version**: Latest available
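+   - **Dimensions**: not part of the deployment form; the app sends `dimensions=1024` with each request (`AZURE_EMBEDDING_DIMS`), and this value must match the Pinecone index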
+   - **Tokens per Minute Rate Limit**: Set as desired (e.g., 350K)
+4. Click **Create**
+5. Wait ~1 minute for deployment
+### Step 3: Verify Deployment
+1. In Deployments tab, confirm you see:
+   - `Llama-4-Maverick-17B-128E-Instruct-FP8` (for LLM/OCR)
+   - `text-embedding-3-small` (for embeddings) ✅ NEW
+### Step 4: Restart Your Application
+- **Render**: Will auto-restart on next request
+- **Local**: Restart uvicorn server
+---
+## Alternative Solution: Use Existing Deployment
+If you already have a different embedding model deployed in Azure, you can use it instead:
+### Option A: Set Environment Variable
+Add to your `.env` or Render environment variables:
+```bash
+AZURE_EMBEDDING_MODEL=your-existing-embedding-deployment-name
+AZURE_EMBEDDING_DIMS=1024  # Must be 1024 to match Pinecone
+```
+### Option B: Supported Models
+These work with `dimensions=1024`:
+- `text-embedding-3-small` (recommended - cheapest, fastest)
+- `text-embedding-3-large`
+This one does not:
+- `text-embedding-ada-002` (legacy; fixed 1536-dim output, no `dimensions` parameter)
+---
+## Memory Constraints
+### Why Not Use Local Model?
+The `BAAI/bge-large-en-v1.5` model requires:
+- **Model size**: ~335M parameters, ~1.3GB of fp32 weights
+- **Runtime overhead**: PyTorch and tokenizer memory on top of the weights
+- **Total**: well over the Render free tier limit
+### Render Free Tier Limits
+- **RAM**: 512MB max
+- **Solution**: Use Azure OpenAI API (no local model loading)
+### If You Have Paid Hosting With More RAM
+You can use the local model by editing `app/main.py`:
+```python
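+from typing import List
+from sentence_transformers import SentenceTransformer
+# Initialize once at startup (loads the model weights into RAM)
+embedding_model = SentenceTransformer("BAAI/bge-large-en-v1.5")
+def get_embedding(text: str) -> List[float]:
+    # encode() returns a NumPy array for a single string; convert to a plain list
+    return embedding_model.encode(text, show_progress_bar=False).tolist()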
+```
+**Benefits**:
+- No Azure API calls for embeddings
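+- Exact same model as ingestion (the Pinecone index was built with `bge-large-en-v1.5`; vectors from a different model are not comparable with it, even at the same dimension)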
+- Lower latency
+**Tradeoffs**:
+- Requires enough RAM for ~1.3GB of weights plus PyTorch overhead
+- Slower startup time (~10 seconds)
+---
+## Verification
+### Test Embedding Endpoint
+```bash
+# Check if deployment exists
+curl -X POST "https://YOUR-RESOURCE.openai.azure.com/openai/deployments/text-embedding-3-small/embeddings?api-version=2024-08-01-preview" \
+  -H "Content-Type: application/json" \
+  -H "api-key: YOUR-API-KEY" \
+  -d '{"input": "test", "dimensions": 1024}'
+```
+**Expected Response**:
+```json
+{
+  "data": [{"embedding": [0.123, -0.456, ...], "index": 0}],
+  "model": "text-embedding-3-small",
+  "usage": {"prompt_tokens": 1, "total_tokens": 1}
+}
+```
+### Test LLM Endpoint After Fix
+```bash
+curl -X POST "https://socar-hackathon.onrender.com/llm" \
+  -H "Content-Type: application/json" \
+  -d '{"question": "SOCAR haqqında məlumat verin"}'  # Azerbaijani: "Give information about SOCAR"
+```
+**Expected**: No embedding errors, proper answer with sources
+---
+## Cost Implications
+### Azure OpenAI Embeddings Pricing (text-embedding-3-small)
+- **Cost**: $0.02 per 1M tokens (= $0.00000002 per token)
+- **Typical query**: 10-50 tokens ≈ $0.0000002 to $0.000001 (negligible)
+### vs. Local Model
+- **Cost**: $0 (but requires paid hosting with more RAM)
+- **Hosting cost**: Render 1GB plan = $7/month
+**Recommendation**: Use Azure embeddings on free tier, only switch to local if you already have paid hosting.
+---
+## Still Having Issues?
+### Check Logs
+```bash
+# Render
+render logs --tail 100
+# Local
+# Logs appear in terminal where you ran uvicorn
+```
+**Look for**:
+```
+❌ EMBEDDING ERROR: Deployment 'text-embedding-3-small' not found in Azure OpenAI
+```
+### Common Issues
+1. **Wrong deployment name**: Check `AZURE_EMBEDDING_MODEL` env var
+2. **Deployment still creating**: Wait 1-2 minutes after creating
+3. **Wrong API version**: Use `2024-08-01-preview` or later
+4. **Dimensions mismatch**: MUST be 1024 (Pinecone index requirement)
+---
+## Environment Variables Reference
+```bash
+# Required for LLM/OCR
+AZURE_OPENAI_API_KEY=<your-key>
+AZURE_OPENAI_ENDPOINT=https://<resource>.openai.azure.com/
+AZURE_OPENAI_API_VERSION=2024-08-01-preview
+# Required for embeddings (NEW)
+AZURE_EMBEDDING_MODEL=text-embedding-3-small  # Your deployment name
+AZURE_EMBEDDING_DIMS=1024  # Must match Pinecone
+# Required for vector DB
+PINECONE_API_KEY=<your-key>
+PINECONE_INDEX_NAME=hackathon
+```
+---
+**Last Updated**: December 14, 2025
+**Status**: ✅ Fixed in app/main.py with better error messages
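
The guide's curl check can also be run from Python. The following is a minimal sketch that assumes the `openai` 1.x SDK (`AzureOpenAI`, `client.embeddings.create`) and the environment variables listed in the guide. `make_azure_client` and `check_embedding_deployment` are hypothetical helpers and are not functions in `app/main.py`.

```python
import os
from typing import Any


def make_azure_client() -> Any:
    """Build an Azure OpenAI client from the environment variables in the guide."""
    from openai import AzureOpenAI  # imported lazily; requires the openai 1.x SDK

    return AzureOpenAI(
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_version=os.getenv("AZURE_OPENAI_API_VERSION", "2024-08-01-preview"),
    )


def check_embedding_deployment(client: Any, deployment: str, expected_dims: int = 1024) -> int:
    """Request one embedding and confirm its length matches the Pinecone index.

    A missing deployment surfaces as the SDK's 404 error (propagated unchanged);
    a wrong vector length raises ValueError.
    """
    response = client.embeddings.create(
        input="test",
        model=deployment,          # on Azure this is the deployment name
        dimensions=expected_dims,  # honored by text-embedding-3-* models
    )
    got = len(response.data[0].embedding)
    if got != expected_dims:
        raise ValueError(f"{deployment!r} returned {got} dims, expected {expected_dims}")
    return got
```

Calling `check_embedding_deployment(make_azure_client(), os.getenv("AZURE_EMBEDDING_MODEL", "text-embedding-3-small"))` has two possible outcomes. It returns `1024` once the deployment exists and honors `dimensions`. Otherwise it lets the SDK's not-found error propagate, which is the same failure the guide describes.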

app/main.py CHANGED

@@ -112,13 +112,18 @@ def get_pinecone_index():
 def get_embedding(text: str) -> List[float]:
     """
-    Get embedding using Azure OpenAI API instead of local model.
-    This saves ~400MB memory by not loading SentenceTransformer locally.
-    Uses text-embedding-3-small with dimensions=1024 to match existing Pinecone index
-    (which was created with BAAI/bge-large-en-v1.5 at 1024 dimensions).
+    Generate embedding for semantic search.
+    IMPORTANT: You need to deploy an embedding model in Azure OpenAI Studio:
+    1. Go to Azure OpenAI Studio → Deployments
+    2. Create deployment: text-embedding-3-small
+    3. Dimensions are not a deployment setting: each request passes dimensions=AZURE_EMBEDDING_DIMS (1024, to match the Pinecone index)
+    Alternative: Set AZURE_EMBEDDING_MODEL env var to your deployment name
     """
     client = get_azure_client()
+    # Get embedding model from env or use default
     embedding_model = os.getenv("AZURE_EMBEDDING_MODEL", "text-embedding-3-small")
     embedding_dims = int(os.getenv("AZURE_EMBEDDING_DIMS", "1024"))
@@ -126,12 +131,29 @@ def get_embedding(text: str) -> List[float]:
         response = client.embeddings.create(
             input=text,
             model=embedding_model,
-            dimensions=embedding_dims  # Match Pinecone index dimensions
+            dimensions=embedding_dims
         )
         return response.data[0].embedding
     except Exception as e:
-        # Fallback: return zero vector if embedding fails
-        print(f"Embedding error: {e}")
+        error_msg = str(e)
+        # Provide helpful error message
+        if "DeploymentNotFound" in error_msg or "404" in error_msg:
+            print(f"❌ EMBEDDING ERROR: Deployment '{embedding_model}' not found in Azure OpenAI")
+            print(f"")
+            print(f"📋 FIX THIS BY DEPLOYING THE MODEL:")
+            print(f"   1. Go to: https://oai.azure.com/portal")
+            print(f"   2. Navigate to: Deployments → Create new deployment")
+            print(f"   3. Model: text-embedding-3-small")
+            print(f"   4. Deployment name: text-embedding-3-small")
+            print(f"   5. No dimensions field to set; requests send dimensions={embedding_dims}")
+            print(f"")
+            print(f"   OR set environment variable:")
+            print(f"   AZURE_EMBEDDING_MODEL=<your-existing-embedding-deployment>")
+        else:
+            print(f"Embedding error: {e}")
+        # Return zero vector (will not match documents, but API won't crash)
         return [0.0] * embedding_dims
@@ -170,10 +192,12 @@ def retrieve_documents(query: str, top_k: int = 3) -> List[Dict]:
     """
     Retrieve relevant documents from Pinecone vector database.
     Best strategy from benchmark: vanilla top-3
+    Uses Azure OpenAI embeddings (1024-dim) for memory efficiency on Render free tier.
     """
     index = get_pinecone_index()
-    # Generate query embedding using Azure OpenAI (memory efficient)
+    # Generate query embedding
     query_embedding = get_embedding(query)
     # Search vector database
@@ -287,13 +311,13 @@ async def llm_endpoint(request: Request):
     LLM chatbot endpoint for SOCAR historical documents.
     Uses RAG (Retrieval Augmented Generation) with:
-    - Embedding: BAAI/bge-large-en-v1.5
-    - Retrieval: Top-3 documents
+    - Embedding: Azure OpenAI text-embedding-3-small @ 1024-dim
+    - Retrieval: Top-3 documents (Pinecone)
     - LLM: Llama-4-Maverick-17B (open-source)
     - Prompt: Citation-focused
     Expected performance:
-    - Response time: ~3.6s
+    - Response time: ~4.0s
     - LLM Judge Score: 55.67%
     - Citation Score: 73.33%
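
`get_embedding` classifies failures by substring matching on `str(e)`. The `"404"` test also matches any message that merely contains those digits. The following sketch shows a more structured check. It assumes `openai` 1.x error objects, where `APIStatusError` exposes `status_code` and `APIError` exposes `code` (for example `"DeploymentNotFound"`). The `getattr` defaults keep it safe for other exception types. `is_deployment_not_found` and `embedding_error_hint` are hypothetical names.

```python
from typing import Optional


def is_deployment_not_found(exc: BaseException) -> bool:
    """Detect a missing Azure deployment from structured error fields.

    Assumes openai 1.x errors: APIStatusError carries `status_code` and
    APIError carries `code`. getattr() defaults make this safe for any
    other exception type.
    """
    status: Optional[int] = getattr(exc, "status_code", None)
    code: Optional[str] = getattr(exc, "code", None)
    if code == "DeploymentNotFound" or status == 404:
        return True
    # Last resort for exceptions without structured fields
    return "DeploymentNotFound" in str(exc)


def embedding_error_hint(exc: BaseException, deployment: str) -> str:
    """Build a one-line log message for an embedding failure."""
    if is_deployment_not_found(exc):
        return (
            f"EMBEDDING ERROR: Deployment '{deployment}' not found in Azure OpenAI. "
            "Create it under Deployments in Azure OpenAI Studio, or set "
            "AZURE_EMBEDDING_MODEL to an existing embedding deployment."
        )
    return f"Embedding error: {exc}"
```

Inside the `except` block, `print(embedding_error_hint(e, embedding_model))` could replace the current branching. The zero-vector fallback would stay unchanged.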