Spaces:

sanilahmed2019
/

backend-deploy

Sleeping

App Files Files Community

sanilahmed2019 commited on Jan 2

Commit

ce4595c

1 Parent(s): 2ade705

Update backend logic

Browse files

Files changed (21) hide show

.env +2 -2
.env.example +2 -2
README.md +57 -32
backend.log +230 -81
book_ingestor.egg-info/PKG-INFO +49 -24
check_qdrant.py +59 -0
rag_agent_api/README.md +9 -9
rag_agent_api/__init__.py +2 -2
rag_agent_api/__pycache__/__init__.cpython-313.pyc +0 -0
rag_agent_api/__pycache__/agent.cpython-313.pyc +0 -0
rag_agent_api/__pycache__/config.cpython-313.pyc +0 -0
rag_agent_api/__pycache__/main.cpython-313.pyc +0 -0
rag_agent_api/__pycache__/openrouter_agent.cpython-313.pyc +0 -0
rag_agent_api/__pycache__/retrieval.cpython-313.pyc +0 -0
rag_agent_api/agent.py +363 -0
rag_agent_api/config.py +0 -1
rag_agent_api/main.py +6 -11
rag_agent_api/retrieval.py +126 -35
requirements.txt +9 -11
test_retrieval.py +60 -0
tests/test_integration.py +18 -21

.env CHANGED Viewed

@@ -4,7 +4,7 @@ QDRANT_API_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhY2Nlc3MiOiJtIn0.BDBAtGf7
 REACT_APP_RAG_API_URL=http://localhost:8000
 # RAG Agent and API Layer Environment Variables
-# OpenAI API Configuration
 OPENROUTER_API_KEY=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
 # Qdrant Configuration
@@ -13,7 +13,7 @@ QDRANT_API_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhY2Nlc3MiOiJtIn0.BDBAtGf7
 QDRANT_COLLECTION_NAME=rag_embedding
 # Cohere Configuration (for query embeddings)
-COHERE_API_KEY=Dq2dLJlwDOZwAg4K7XalSEC91kXnucGd52KmkJh7
 # Application Configuration
 DEFAULT_CONTEXT_WINDOW=5

 REACT_APP_RAG_API_URL=http://localhost:8000
 # RAG Agent and API Layer Environment Variables
+# OpenRouter API Configuration
 OPENROUTER_API_KEY=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
 # Qdrant Configuration
 QDRANT_COLLECTION_NAME=rag_embedding
 # Cohere Configuration (for query embeddings)
+COHERE_API_KEY=RGfPBR6t5Ev2VXgIA00o5XcHiuXYkyCVL8TjkSZs
 # Application Configuration
 DEFAULT_CONTEXT_WINDOW=5

.env.example CHANGED Viewed

@@ -1,14 +1,14 @@
 # RAG Agent and API Layer Environment Variables
 # OpenRouter API Configuration
-OPENROUTER_API_KEY=your-openrouter-api-key-here
 # Qdrant Configuration
 QDRANT_URL=https://72888a6e-0dfc-4620-bf85-0b9025951e0c.us-east4-0.gcp.cloud.qdrant.io:6333
 QDRANT_API_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhY2Nlc3MiOiJtIn0.BDBAtGf7x_XGCu3lO4-kNxgJeVgnSTKUjHeZBT6qJkQ
 QDRANT_COLLECTION_NAME=rag_embedding
 REACT_APP_RAG_API_URL=http://localhost:8000
 # Cohere Configuration (for query embeddings)
-COHERE_API_KEY=Dq2dLJlwDOZwAg4K7XalSEC91kXnucGd52KmkJh7
 # Application Configuration
 DEFAULT_CONTEXT_WINDOW=5

 # RAG Agent and API Layer Environment Variables
 # OpenRouter API Configuration
+OPENROUTER_API_KEY=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
 # Qdrant Configuration
 QDRANT_URL=https://72888a6e-0dfc-4620-bf85-0b9025951e0c.us-east4-0.gcp.cloud.qdrant.io:6333
 QDRANT_API_KEY=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhY2Nlc3MiOiJtIn0.BDBAtGf7x_XGCu3lO4-kNxgJeVgnSTKUjHeZBT6qJkQ
 QDRANT_COLLECTION_NAME=rag_embedding
 REACT_APP_RAG_API_URL=http://localhost:8000
 # Cohere Configuration (for query embeddings)
+COHERE_API_KEY=RGfPBR6t5Ev2VXgIA00o5XcHiuXYkyCVL8TjkSZs
 # Application Configuration
 DEFAULT_CONTEXT_WINDOW=5

README.md CHANGED Viewed

@@ -1,32 +1,57 @@
----
-title: Backend Deploy
-emoji: 🚀
-colorFrom: blue
-colorTo: purple
-sdk: docker
-pinned: false
----
-# RAG Agent and API Layer
-This is a FastAPI application that provides a question-answering API using Gemini agents and Qdrant retrieval for RAG (Retrieval Augmented Generation) functionality.
-## API Endpoints
-- `GET /` - Root endpoint with API information
-- `POST /ask` - Main question-answering endpoint
-- `GET /health` - Health check endpoint
-- `GET /ready` - Readiness check endpoint
-- `/docs` - API documentation (Swagger UI)
-- `/redoc` - API documentation (Redoc)
-## Configuration
-The application requires the following environment variables:
-- `GEMINI_API_KEY` - API key for Google Gemini
-- `QDRANT_URL` - URL for Qdrant vector database
-- `QDRANT_API_KEY` - API key for Qdrant database
-## Deployment
-This application is configured for deployment on Hugging Face Spaces using Docker.

+# Book Content Ingestor & RAG Verification
+A system to extract content from Docusaurus-based book websites, chunk and embed it using Cohere, store embeddings in Qdrant Cloud for RAG applications, and verify the retrieval pipeline functionality.
+## Setup
+1. Install dependencies using uv:
+```bash
+cd backend
+uv sync
+```
+2. Create a `.env` file with your API keys:
+```bash
+cp .env.example .env
+# Edit .env with your actual API keys
+```
+## Environment Variables
+- `COHERE_API_KEY`: Your Cohere API key
+- `QDRANT_URL`: Your Qdrant Cloud URL
+- `QDRANT_API_KEY`: Your Qdrant API key
+- `QDRANT_COLLECTION_NAME`: Name of the collection to use (default: "rag_embedding")
+## Usage
+### Run the ingestion pipeline:
+```bash
+cd backend
+uv run python main.py
+```
+This will:
+1. Collect all URLs from the target book (https://sanilahmed.github.io/hackathon-ai-book/)
+2. Extract text content from each URL
+3. Chunk the content into fixed-size segments
+4. Generate embeddings using Cohere
+5. Store embeddings with metadata in Qdrant Cloud collection named "rag_embedding"
+### Run the verification pipeline:
+```bash
+cd backend
+python -m verify_retrieval.main
+```
+Or with specific options:
+```bash
+python -m verify_retrieval.main --query "transformer architecture in NLP" --top-k 10
+```
+The verification system will:
+1. Load vectors and metadata stored in Qdrant from the original ingestion
+2. Implement retrieval functions to query Qdrant using sample keywords or phrases
+3. Validate that retrieved chunks are accurate and relevant
+4. Check that metadata (URL, title, chunk_id) matches source content
+5. Log results and confirm the pipeline executes end-to-end without errors

backend.log CHANGED Viewed

@@ -1,14 +1,14 @@
-2025-12-28 03:28:11,862 - root - INFO - OpenRouter agent initialized with model: arcee-ai/trinity-mini:free
-2025-12-28 03:28:11,862 - root - INFO - OpenRouter agent initialized successfully
-2025-12-28 03:28:12,881 - httpx - INFO - HTTP Request: GET https://72888a6e-0dfc-4620-bf85-0b9025951e0c.us-east4-0.gcp.cloud.qdrant.io:6333 "HTTP/1.1 200 OK"
-2025-12-28 03:28:12,935 - root - INFO - Initialized Qdrant retriever for collection: rag_embedding
-2025-12-28 03:28:12,935 - root - INFO - Qdrant retriever initialized successfully
-2025-12-28 03:28:12,935 - root - INFO - Application startup completed
-2025-12-28 03:33:25,861 - root - INFO - Processing query: what about this book?...
-2025-12-28 03:33:25,866 - root - INFO - Step 1: Retrieving relevant content from Qdrant...
-2025-12-28 03:33:25,872 - root - INFO - Retrieving context for query: 'what about this book?' from collection: rag_embedding
-2025-12-28 03:33:27,515 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 429 Too Many Requests"
-2025-12-28 03:33:27,529 - root - ERROR - Error embedding query with Cohere: headers: {'access-control-expose-headers': 'X-Debug-Trace-ID', 'cache-control': 'no-cache, no-store, no-transform, must-revalidate, private, max-age=0', 'content-encoding': 'gzip', 'content-type': 'application/json', 'expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'pragma': 'no-cache', 'vary': 'Origin,Accept-Encoding', 'x-accel-expires': '0', 'x-debug-trace-id': 'd089f3fe358a80aeb61a8713a62bb51e', 'x-trial-endpoint-call-limit': '100', 'x-trial-endpoint-call-remaining': '99', 'date': 'Sat, 27 Dec 2025 22:33:27 GMT', 'x-envoy-upstream-service-time': '22', 'server': 'envoy', 'via': '1.1 google', 'alt-svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000', 'transfer-encoding': 'chunked'}, status_code: 429, body: {'id': 'd2182ca7-5051-4c05-b9b7-a79e9dbe1312', 'message': "You are using a Trial key, which is limited to 1000 API calls / month. You can continue to use the Trial key for free or upgrade to a Production key with higher rate limits at 'https://dashboard.cohere.com/api-keys'. Contact us on 'https://discord.gg/XW44jPfYJu' or email us at support@cohere.com with any questions"}
 Traceback (most recent call last):
   File "/mnt/d/Hackathon/book/backend/rag_agent_api/retrieval.py", line 132, in _embed_query
     response = await self.cohere_client.embed(
@@ -32,20 +32,20 @@ Traceback (most recent call last):
     raise TooManyRequestsError(
     ...<8 lines>...
     )
-cohere.errors.too_many_requests_error.TooManyRequestsError: headers: {'access-control-expose-headers': 'X-Debug-Trace-ID', 'cache-control': 'no-cache, no-store, no-transform, must-revalidate, private, max-age=0', 'content-encoding': 'gzip', 'content-type': 'application/json', 'expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'pragma': 'no-cache', 'vary': 'Origin,Accept-Encoding', 'x-accel-expires': '0', 'x-debug-trace-id': 'd089f3fe358a80aeb61a8713a62bb51e', 'x-trial-endpoint-call-limit': '100', 'x-trial-endpoint-call-remaining': '99', 'date': 'Sat, 27 Dec 2025 22:33:27 GMT', 'x-envoy-upstream-service-time': '22', 'server': 'envoy', 'via': '1.1 google', 'alt-svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000', 'transfer-encoding': 'chunked'}, status_code: 429, body: {'id': 'd2182ca7-5051-4c05-b9b7-a79e9dbe1312', 'message': "You are using a Trial key, which is limited to 1000 API calls / month. You can continue to use the Trial key for free or upgrade to a Production key with higher rate limits at 'https://dashboard.cohere.com/api-keys'. Contact us on 'https://discord.gg/XW44jPfYJu' or email us at support@cohere.com with any questions"}
-2025-12-28 03:33:29,957 - root - WARNING - Using zero vector as final fallback for query embedding
-2025-12-28 03:33:32,465 - httpx - INFO - HTTP Request: POST https://72888a6e-0dfc-4620-bf85-0b9025951e0c.us-east4-0.gcp.cloud.qdrant.io:6333/collections/rag_embedding/points/query "HTTP/1.1 200 OK"
-2025-12-28 03:33:32,482 - root - INFO - Retrieved 5 valid chunks from Qdrant
-2025-12-28 03:33:32,482 - root - INFO - Retrieved 5 chunks from Qdrant
-2025-12-28 03:33:32,482 - root - INFO - Step 2: Generating response with OpenAI agent...
-2025-12-28 03:33:40,381 - httpx - INFO - HTTP Request: POST https://openrouter.ai/api/v1/chat/completions "HTTP/1.1 200 OK"
-2025-12-28 03:33:41,893 - root - INFO - Step 3: Formatting response...
-2025-12-28 03:33:41,893 - root - INFO - Query processed successfully, response ID: resp_d77ed446
-2025-12-28 03:48:54,357 - root - INFO - Processing query: What is this book about?...
-2025-12-28 03:48:54,360 - root - INFO - Step 1: Retrieving relevant content from Qdrant...
-2025-12-28 03:48:54,363 - root - INFO - Retrieving context for query: 'What is this book about?' from collection: rag_embedding
-2025-12-28 03:48:55,736 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 429 Too Many Requests"
-2025-12-28 03:48:55,750 - root - ERROR - Error embedding query with Cohere: headers: {'access-control-expose-headers': 'X-Debug-Trace-ID', 'cache-control': 'no-cache, no-store, no-transform, must-revalidate, private, max-age=0', 'content-encoding': 'gzip', 'content-type': 'application/json', 'expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'pragma': 'no-cache', 'vary': 'Origin,Accept-Encoding', 'x-accel-expires': '0', 'x-debug-trace-id': 'b1f3f38920e419721e629c6abc56371b', 'x-trial-endpoint-call-limit': '100', 'x-trial-endpoint-call-remaining': '99', 'date': 'Sat, 27 Dec 2025 22:48:55 GMT', 'x-envoy-upstream-service-time': '15', 'server': 'envoy', 'via': '1.1 google', 'alt-svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000', 'transfer-encoding': 'chunked'}, status_code: 429, body: {'id': '41a0eefd-af53-4f33-b084-34c94d377f38', 'message': "You are using a Trial key, which is limited to 1000 API calls / month. You can continue to use the Trial key for free or upgrade to a Production key with higher rate limits at 'https://dashboard.cohere.com/api-keys'. Contact us on 'https://discord.gg/XW44jPfYJu' or email us at support@cohere.com with any questions"}
 Traceback (most recent call last):
   File "/mnt/d/Hackathon/book/backend/rag_agent_api/retrieval.py", line 132, in _embed_query
     response = await self.cohere_client.embed(
@@ -69,21 +69,61 @@ Traceback (most recent call last):
     raise TooManyRequestsError(
     ...<8 lines>...
     )
-cohere.errors.too_many_requests_error.TooManyRequestsError: headers: {'access-control-expose-headers': 'X-Debug-Trace-ID', 'cache-control': 'no-cache, no-store, no-transform, must-revalidate, private, max-age=0', 'content-encoding': 'gzip', 'content-type': 'application/json', 'expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'pragma': 'no-cache', 'vary': 'Origin,Accept-Encoding', 'x-accel-expires': '0', 'x-debug-trace-id': 'b1f3f38920e419721e629c6abc56371b', 'x-trial-endpoint-call-limit': '100', 'x-trial-endpoint-call-remaining': '99', 'date': 'Sat, 27 Dec 2025 22:48:55 GMT', 'x-envoy-upstream-service-time': '15', 'server': 'envoy', 'via': '1.1 google', 'alt-svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000', 'transfer-encoding': 'chunked'}, status_code: 429, body: {'id': '41a0eefd-af53-4f33-b084-34c94d377f38', 'message': "You are using a Trial key, which is limited to 1000 API calls / month. You can continue to use the Trial key for free or upgrade to a Production key with higher rate limits at 'https://dashboard.cohere.com/api-keys'. Contact us on 'https://discord.gg/XW44jPfYJu' or email us at support@cohere.com with any questions"}
-2025-12-28 03:48:55,790 - root - WARNING - Using zero vector as final fallback for query embedding
-2025-12-28 03:48:56,887 - httpx - INFO - HTTP Request: POST https://72888a6e-0dfc-4620-bf85-0b9025951e0c.us-east4-0.gcp.cloud.qdrant.io:6333/collections/rag_embedding/points/query "HTTP/1.1 200 OK"
-2025-12-28 03:48:56,897 - root - INFO - Retrieved 5 valid chunks from Qdrant
-2025-12-28 03:48:56,897 - root - INFO - Retrieved 5 chunks from Qdrant
-2025-12-28 03:48:56,897 - root - INFO - Step 2: Generating response with OpenAI agent...
-2025-12-28 03:49:00,669 - httpx - INFO - HTTP Request: POST https://openrouter.ai/api/v1/chat/completions "HTTP/1.1 200 OK"
-2025-12-28 03:49:02,263 - root - INFO - Agent response generated successfully. Confidence: 0.30
-2025-12-28 03:49:02,265 - root - INFO - Step 3: Formatting response...
-2025-12-28 03:49:02,269 - root - INFO - Query processed successfully, response ID: resp_523ca795
-2025-12-28 03:51:03,381 - root - INFO - Processing query: What is this book about?...
-2025-12-28 03:51:03,381 - root - INFO - Step 1: Retrieving relevant content from Qdrant...
-2025-12-28 03:51:03,382 - root - INFO - Retrieving context for query: 'What is this book about?' from collection: rag_embedding
-2025-12-28 03:51:03,863 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 429 Too Many Requests"
-2025-12-28 03:51:03,868 - root - ERROR - Error embedding query with Cohere: headers: {'access-control-expose-headers': 'X-Debug-Trace-ID', 'cache-control': 'no-cache, no-store, no-transform, must-revalidate, private, max-age=0', 'content-encoding': 'gzip', 'content-type': 'application/json', 'expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'pragma': 'no-cache', 'vary': 'Origin,Accept-Encoding', 'x-accel-expires': '0', 'x-debug-trace-id': '19432b16b53a7488ff206de4686f4925', 'x-trial-endpoint-call-limit': '100', 'x-trial-endpoint-call-remaining': '99', 'date': 'Sat, 27 Dec 2025 22:51:03 GMT', 'x-envoy-upstream-service-time': '11', 'server': 'envoy', 'via': '1.1 google', 'alt-svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000', 'transfer-encoding': 'chunked'}, status_code: 429, body: {'id': '6cc43158-05de-4534-8bb1-dfe337b29d9e', 'message': "You are using a Trial key, which is limited to 1000 API calls / month. You can continue to use the Trial key for free or upgrade to a Production key with higher rate limits at 'https://dashboard.cohere.com/api-keys'. Contact us on 'https://discord.gg/XW44jPfYJu' or email us at support@cohere.com with any questions"}
 Traceback (most recent call last):
   File "/mnt/d/Hackathon/book/backend/rag_agent_api/retrieval.py", line 132, in _embed_query
     response = await self.cohere_client.embed(
@@ -103,54 +143,163 @@ Traceback (most recent call last):
     ...<7 lines>...
     )
     ^
-  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/cohere/raw_base_client.py", line 4637, in embed
-    raise TooManyRequestsError(
-    ...<8 lines>...
     )
-cohere.errors.too_many_requests_error.TooManyRequestsError: headers: {'access-control-expose-headers': 'X-Debug-Trace-ID', 'cache-control': 'no-cache, no-store, no-transform, must-revalidate, private, max-age=0', 'content-encoding': 'gzip', 'content-type': 'application/json', 'expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'pragma': 'no-cache', 'vary': 'Origin,Accept-Encoding', 'x-accel-expires': '0', 'x-debug-trace-id': '19432b16b53a7488ff206de4686f4925', 'x-trial-endpoint-call-limit': '100', 'x-trial-endpoint-call-remaining': '99', 'date': 'Sat, 27 Dec 2025 22:51:03 GMT', 'x-envoy-upstream-service-time': '11', 'server': 'envoy', 'via': '1.1 google', 'alt-svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000', 'transfer-encoding': 'chunked'}, status_code: 429, body: {'id': '6cc43158-05de-4534-8bb1-dfe337b29d9e', 'message': "You are using a Trial key, which is limited to 1000 API calls / month. You can continue to use the Trial key for free or upgrade to a Production key with higher rate limits at 'https://dashboard.cohere.com/api-keys'. Contact us on 'https://discord.gg/XW44jPfYJu' or email us at support@cohere.com with any questions"}
-2025-12-28 03:51:03,882 - root - WARNING - Using zero vector as final fallback for query embedding
-2025-12-28 03:51:05,002 - httpx - INFO - HTTP Request: POST https://72888a6e-0dfc-4620-bf85-0b9025951e0c.us-east4-0.gcp.cloud.qdrant.io:6333/collections/rag_embedding/points/query "HTTP/1.1 200 OK"
-2025-12-28 03:51:05,009 - root - INFO - Retrieved 5 valid chunks from Qdrant
-2025-12-28 03:51:05,009 - root - INFO - Retrieved 5 chunks from Qdrant
-2025-12-28 03:51:05,010 - root - INFO - Step 2: Generating response with OpenAI agent...
-2025-12-28 03:51:08,526 - httpx - INFO - HTTP Request: POST https://openrouter.ai/api/v1/chat/completions "HTTP/1.1 200 OK"
-2025-12-28 03:51:09,866 - root - INFO - Agent response generated successfully. Confidence: 0.30
-2025-12-28 03:51:09,866 - root - INFO - Step 3: Formatting response...
-2025-12-28 03:51:09,869 - root - INFO - Query processed successfully, response ID: resp_b7ce931e
-2026-01-01 15:42:34,257 - root - INFO - Processing query: what about this book??...
-2026-01-01 15:42:34,275 - root - INFO - Step 1: Retrieving relevant content from Qdrant...
-2026-01-01 15:42:34,279 - root - INFO - Retrieving context for query: 'what about this book??' from collection: rag_embedding
-2026-01-01 15:42:36,042 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 429 Too Many Requests"
-2026-01-01 15:42:36,079 - root - ERROR - Error embedding query with Cohere: headers: {'access-control-expose-headers': 'X-Debug-Trace-ID', 'cache-control': 'no-cache, no-store, no-transform, must-revalidate, private, max-age=0', 'content-encoding': 'gzip', 'content-type': 'application/json', 'expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'pragma': 'no-cache', 'vary': 'Origin,Accept-Encoding', 'x-accel-expires': '0', 'x-debug-trace-id': '8021dc50cfd962fdc707cc6726dff6b3', 'date': 'Thu, 01 Jan 2026 10:42:34 GMT', 'x-envoy-upstream-service-time': '27', 'server': 'envoy', 'via': '1.1 google', 'alt-svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000', 'transfer-encoding': 'chunked'}, status_code: 429, body: {'id': '542f6133-103b-4cf2-a9a2-3778972e6290', 'message': 'Please wait and try again later'}
 Traceback (most recent call last):
-  File "/mnt/d/Hackathon/book/backend/rag_agent_api/retrieval.py", line 132, in _embed_query
-    response = await self.cohere_client.embed(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     ...<3 lines>...
     )
     ^
-  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/cohere/client.py", line 402, in embed
-    await asyncio.gather(
-    ^^^^^^^^^^^^^^^^^^^^^
-    ...<12 lines>...
-    ),
     ^
-  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/cohere/base_client.py", line 2598, in embed
-    _response = await self._raw_client.embed(
-                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-    ...<7 lines>...
     )
     ^
-  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/cohere/raw_base_client.py", line 4637, in embed
-    raise TooManyRequestsError(
-    ...<8 lines>...
     )
-cohere.errors.too_many_requests_error.TooManyRequestsError: headers: {'access-control-expose-headers': 'X-Debug-Trace-ID', 'cache-control': 'no-cache, no-store, no-transform, must-revalidate, private, max-age=0', 'content-encoding': 'gzip', 'content-type': 'application/json', 'expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'pragma': 'no-cache', 'vary': 'Origin,Accept-Encoding', 'x-accel-expires': '0', 'x-debug-trace-id': '8021dc50cfd962fdc707cc6726dff6b3', 'date': 'Thu, 01 Jan 2026 10:42:34 GMT', 'x-envoy-upstream-service-time': '27', 'server': 'envoy', 'via': '1.1 google', 'alt-svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000', 'transfer-encoding': 'chunked'}, status_code: 429, body: {'id': '542f6133-103b-4cf2-a9a2-3778972e6290', 'message': 'Please wait and try again later'}
-2026-01-01 15:42:36,199 - root - WARNING - Using zero vector as final fallback for query embedding
-2026-01-01 15:42:37,468 - httpx - INFO - HTTP Request: POST https://72888a6e-0dfc-4620-bf85-0b9025951e0c.us-east4-0.gcp.cloud.qdrant.io:6333/collections/rag_embedding/points/query "HTTP/1.1 200 OK"
-2026-01-01 15:42:37,506 - root - INFO - Retrieved 5 valid chunks from Qdrant
-2026-01-01 15:42:37,506 - root - INFO - Retrieved 5 chunks from Qdrant
-2026-01-01 15:42:37,507 - root - INFO - Step 2: Generating response with OpenAI agent...
-2026-01-01 15:42:40,636 - httpx - INFO - HTTP Request: POST https://openrouter.ai/api/v1/chat/completions "HTTP/1.1 200 OK"
-2026-01-01 15:42:42,440 - root - INFO - Step 3: Formatting response...
-2026-01-01 15:42:42,443 - root - INFO - Query processed successfully, response ID: resp_159653f4

+2026-01-02 21:51:07,979 - root - INFO - OpenRouter agent initialized with model: arcee-ai/trinity-mini:free
+2026-01-02 21:51:07,980 - root - INFO - OpenRouter agent initialized successfully
+2026-01-02 21:51:09,509 - httpx - INFO - HTTP Request: GET https://72888a6e-0dfc-4620-bf85-0b9025951e0c.us-east4-0.gcp.cloud.qdrant.io:6333 "HTTP/1.1 200 OK"
+2026-01-02 21:51:09,616 - root - INFO - Initialized Qdrant retriever for collection: rag_embedding
+2026-01-02 21:51:09,616 - root - INFO - Qdrant retriever initialized successfully
+2026-01-02 21:51:09,616 - root - INFO - Application startup completed
+2026-01-02 21:56:18,858 - root - INFO - Processing query: what about this book?...
+2026-01-02 21:56:18,858 - root - INFO - Step 1: Retrieving relevant content from Qdrant...
+2026-01-02 21:56:18,858 - root - INFO - Retrieving context for query: 'what about this book?' from collection: rag_embedding
+2026-01-02 21:56:20,085 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 429 Too Many Requests"
+2026-01-02 21:56:20,158 - root - ERROR - Error embedding query with Cohere: headers: {'access-control-expose-headers': 'X-Debug-Trace-ID', 'cache-control': 'no-cache, no-store, no-transform, must-revalidate, private, max-age=0', 'content-encoding': 'gzip', 'content-type': 'application/json', 'expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'pragma': 'no-cache', 'vary': 'Origin,Accept-Encoding', 'x-accel-expires': '0', 'x-debug-trace-id': 'a074d2b0b8f1166420f46cc0e91c3ef8', 'date': 'Fri, 02 Jan 2026 16:56:15 GMT', 'x-envoy-upstream-service-time': '16', 'server': 'envoy', 'via': '1.1 google', 'alt-svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000', 'transfer-encoding': 'chunked'}, status_code: 429, body: {'id': '0d36b9be-f4cc-4559-b824-e673736abec0', 'message': 'Please wait and try again later'}
 Traceback (most recent call last):
   File "/mnt/d/Hackathon/book/backend/rag_agent_api/retrieval.py", line 132, in _embed_query
     response = await self.cohere_client.embed(
     raise TooManyRequestsError(
     ...<8 lines>...
     )
+cohere.errors.too_many_requests_error.TooManyRequestsError: headers: {'access-control-expose-headers': 'X-Debug-Trace-ID', 'cache-control': 'no-cache, no-store, no-transform, must-revalidate, private, max-age=0', 'content-encoding': 'gzip', 'content-type': 'application/json', 'expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'pragma': 'no-cache', 'vary': 'Origin,Accept-Encoding', 'x-accel-expires': '0', 'x-debug-trace-id': 'a074d2b0b8f1166420f46cc0e91c3ef8', 'date': 'Fri, 02 Jan 2026 16:56:15 GMT', 'x-envoy-upstream-service-time': '16', 'server': 'envoy', 'via': '1.1 google', 'alt-svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000', 'transfer-encoding': 'chunked'}, status_code: 429, body: {'id': '0d36b9be-f4cc-4559-b824-e673736abec0', 'message': 'Please wait and try again later'}
+2026-01-02 21:56:21,542 - root - WARNING - Using zero vector as final fallback for query embedding
+2026-01-02 21:56:23,990 - httpx - INFO - HTTP Request: POST https://72888a6e-0dfc-4620-bf85-0b9025951e0c.us-east4-0.gcp.cloud.qdrant.io:6333/collections/rag_embedding/points/query "HTTP/1.1 200 OK"
+2026-01-02 21:56:24,063 - root - INFO - Retrieved 5 valid chunks from Qdrant
+2026-01-02 21:56:24,063 - root - INFO - Retrieved 5 chunks from Qdrant
+2026-01-02 21:56:24,063 - root - INFO - Step 2: Generating response with OpenAI agent...
+2026-01-02 21:56:27,063 - httpx - INFO - HTTP Request: POST https://openrouter.ai/api/v1/chat/completions "HTTP/1.1 200 OK"
+2026-01-02 21:56:28,191 - root - INFO - Step 3: Formatting response...
+2026-01-02 21:56:28,191 - root - INFO - Query processed successfully, response ID: resp_12b8d406
+2026-01-02 22:18:31,661 - root - INFO - Processing query: what about this book?...
+2026-01-02 22:18:31,672 - root - INFO - Step 1: Retrieving relevant content from Qdrant...
+2026-01-02 22:18:31,679 - root - INFO - Retrieving context for query: 'what about this book?' from collection: rag_embedding
+2026-01-02 22:18:32,663 - httpx - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 429 Too Many Requests"
+2026-01-02 22:18:32,681 - root - ERROR - Error embedding query with Cohere: headers: {'access-control-expose-headers': 'X-Debug-Trace-ID', 'cache-control': 'no-cache, no-store, no-transform, must-revalidate, private, max-age=0', 'content-encoding': 'gzip', 'content-type': 'application/json', 'expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'pragma': 'no-cache', 'vary': 'Origin,Accept-Encoding', 'x-accel-expires': '0', 'x-debug-trace-id': '16258d9e56f535c3a9cda7da3a75bc2d', 'date': 'Fri, 02 Jan 2026 17:18:28 GMT', 'x-envoy-upstream-service-time': '13', 'server': 'envoy', 'via': '1.1 google', 'alt-svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000', 'transfer-encoding': 'chunked'}, status_code: 429, body: {'id': '69b64fea-d70d-43f9-a1d9-9fc56b940914', 'message': 'Please wait and try again later'}
 Traceback (most recent call last):
   File "/mnt/d/Hackathon/book/backend/rag_agent_api/retrieval.py", line 132, in _embed_query
     response = await self.cohere_client.embed(
     raise TooManyRequestsError(
     ...<8 lines>...
     )
+cohere.errors.too_many_requests_error.TooManyRequestsError: headers: {'access-control-expose-headers': 'X-Debug-Trace-ID', 'cache-control': 'no-cache, no-store, no-transform, must-revalidate, private, max-age=0', 'content-encoding': 'gzip', 'content-type': 'application/json', 'expires': 'Thu, 01 Jan 1970 00:00:00 GMT', 'pragma': 'no-cache', 'vary': 'Origin,Accept-Encoding', 'x-accel-expires': '0', 'x-debug-trace-id': '16258d9e56f535c3a9cda7da3a75bc2d', 'date': 'Fri, 02 Jan 2026 17:18:28 GMT', 'x-envoy-upstream-service-time': '13', 'server': 'envoy', 'via': '1.1 google', 'alt-svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000', 'transfer-encoding': 'chunked'}, status_code: 429, body: {'id': '69b64fea-d70d-43f9-a1d9-9fc56b940914', 'message': 'Please wait and try again later'}
+2026-01-02 22:18:32,704 - root - WARNING - Using zero vector as final fallback for query embedding
+2026-01-02 22:18:34,063 - httpx - INFO - HTTP Request: POST https://72888a6e-0dfc-4620-bf85-0b9025951e0c.us-east4-0.gcp.cloud.qdrant.io:6333/collections/rag_embedding/points/query "HTTP/1.1 200 OK"
+2026-01-02 22:18:34,095 - root - INFO - Retrieved 5 valid chunks from Qdrant
+2026-01-02 22:18:34,097 - root - INFO - Retrieved 5 chunks from Qdrant
+2026-01-02 22:18:34,098 - root - INFO - Step 2: Generating response with OpenAI agent...
+2026-01-02 22:18:38,176 - httpx - INFO - HTTP Request: POST https://openrouter.ai/api/v1/chat/completions "HTTP/1.1 200 OK"
+2026-01-02 22:18:40,245 - root - INFO - Agent response generated successfully. Confidence: 0.30
+2026-01-02 22:18:40,245 - root - INFO - Step 3: Formatting response...
+2026-01-02 22:18:40,246 - root - INFO - Query processed successfully, response ID: resp_c32d1dbe
+2026-01-02 22:20:37,532 - root - INFO - Processing query: what about this book?...
+2026-01-02 22:20:37,533 - root - INFO - Step 1: Retrieving relevant content from Qdrant...
+2026-01-02 22:20:37,533 - root - INFO - Retrieving context for query: 'what about this book?' from collection: rag_embedding
+2026-01-02 22:20:47,620 - root - ERROR - Error embedding query with Cohere: [Errno -3] Temporary failure in name resolution
+Traceback (most recent call last):
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpx/_transports/default.py", line 101, in map_httpcore_exceptions
+    yield
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpx/_transports/default.py", line 394, in handle_async_request
+    resp = await self._pool.handle_async_request(req)
+           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpcore/_async/connection_pool.py", line 256, in handle_async_request
+    raise exc from None
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpcore/_async/connection_pool.py", line 236, in handle_async_request
+    response = await connection.handle_async_request(
+               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+        pool_request.request
+        ^^^^^^^^^^^^^^^^^^^^
+    )
+    ^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpcore/_async/connection.py", line 101, in handle_async_request
+    raise exc
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpcore/_async/connection.py", line 78, in handle_async_request
+    stream = await self._connect(request)
+             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpcore/_async/connection.py", line 124, in _connect
+    stream = await self._network_backend.connect_tcp(**kwargs)
+             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpcore/_backends/auto.py", line 31, in connect_tcp
+    return await self._backend.connect_tcp(
+           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+    ...<5 lines>...
+    )
+    ^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpcore/_backends/anyio.py", line 113, in connect_tcp
+    with map_exceptions(exc_map):
+         ~~~~~~~~~~~~~~^^^^^^^^^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/contextlib.py", line 162, in __exit__
+    self.gen.throw(value)
+    ~~~~~~~~~~~~~~^^^^^^^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
+    raise to_exc(exc) from exc
+httpcore.ConnectError: [Errno -3] Temporary failure in name resolution
+The above exception was the direct cause of the following exception:
 Traceback (most recent call last):
   File "/mnt/d/Hackathon/book/backend/rag_agent_api/retrieval.py", line 132, in _embed_query
     response = await self.cohere_client.embed(
     ...<7 lines>...
     )
     ^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/cohere/raw_base_client.py", line 4554, in embed
+    _response = await self._client_wrapper.httpx_client.request(
+                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+    ...<15 lines>...
+    )
+    ^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/cohere/core/http_client.py", line 412, in request
+    response = await self.httpx_client.request(
+               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+    ...<33 lines>...
+    )
+    ^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpx/_client.py", line 1540, in request
+    return await self.send(request, auth=auth, follow_redirects=follow_redirects)
+           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpx/_client.py", line 1629, in send
+    response = await self._send_handling_auth(
+               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+    ...<4 lines>...
+    )
+    ^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpx/_client.py", line 1657, in _send_handling_auth
+    response = await self._send_handling_redirects(
+               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+    ...<3 lines>...
     )
+    ^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpx/_client.py", line 1694, in _send_handling_redirects
+    response = await self._send_single_request(request)
+               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpx/_client.py", line 1730, in _send_single_request
+    response = await transport.handle_async_request(request)
+               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpx/_transports/default.py", line 393, in handle_async_request
+    with map_httpcore_exceptions():
+         ~~~~~~~~~~~~~~~~~~~~~~~^^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/contextlib.py", line 162, in __exit__
+    self.gen.throw(value)
+    ~~~~~~~~~~~~~~^^^^^^^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpx/_transports/default.py", line 118, in map_httpcore_exceptions
+    raise mapped_exc(message) from exc
+httpx.ConnectError: [Errno -3] Temporary failure in name resolution
+2026-01-02 22:20:48,168 - root - WARNING - Using zero vector as final fallback for query embedding
+2026-01-02 22:20:58,240 - root - ERROR - Error retrieving context from Qdrant: [Errno -3] Temporary failure in name resolution
 Traceback (most recent call last):
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpx/_transports/default.py", line 101, in map_httpcore_exceptions
+    yield
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpx/_transports/default.py", line 394, in handle_async_request
+    resp = await self._pool.handle_async_request(req)
+           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpcore/_async/connection_pool.py", line 256, in handle_async_request
+    raise exc from None
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpcore/_async/connection_pool.py", line 236, in handle_async_request
+    response = await connection.handle_async_request(
+               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+        pool_request.request
+        ^^^^^^^^^^^^^^^^^^^^
+    )
+    ^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpcore/_async/connection.py", line 101, in handle_async_request
+    raise exc
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpcore/_async/connection.py", line 78, in handle_async_request
+    stream = await self._connect(request)
+             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpcore/_async/connection.py", line 124, in _connect
+    stream = await self._network_backend.connect_tcp(**kwargs)
+             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpcore/_backends/auto.py", line 31, in connect_tcp
+    return await self._backend.connect_tcp(
+           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+    ...<5 lines>...
+    )
+    ^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpcore/_backends/anyio.py", line 113, in connect_tcp
+    with map_exceptions(exc_map):
+         ~~~~~~~~~~~~~~^^^^^^^^^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/contextlib.py", line 162, in __exit__
+    self.gen.throw(value)
+    ~~~~~~~~~~~~~~^^^^^^^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
+    raise to_exc(exc) from exc
+httpcore.ConnectError: [Errno -3] Temporary failure in name resolution
+The above exception was the direct cause of the following exception:
+Traceback (most recent call last):
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/qdrant_client/http/api_client.py", line 223, in send_inner
+    response = await self._async_client.send(request)
+               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpx/_client.py", line 1629, in send
+    response = await self._send_handling_auth(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+    ...<4 lines>...
+    )
+    ^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpx/_client.py", line 1657, in _send_handling_auth
+    response = await self._send_handling_redirects(
+               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     ...<3 lines>...
     )
     ^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpx/_client.py", line 1694, in _send_handling_redirects
+    response = await self._send_single_request(request)
+               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpx/_client.py", line 1730, in _send_single_request
+    response = await transport.handle_async_request(request)
+               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpx/_transports/default.py", line 393, in handle_async_request
+    with map_httpcore_exceptions():
+         ~~~~~~~~~~~~~~~~~~~~~~~^^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/contextlib.py", line 162, in __exit__
+    self.gen.throw(value)
+    ~~~~~~~~~~~~~~^^^^^^^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/httpx/_transports/default.py", line 118, in map_httpcore_exceptions
+    raise mapped_exc(message) from exc
+httpx.ConnectError: [Errno -3] Temporary failure in name resolution
+During handling of the above exception, another exception occurred:
+Traceback (most recent call last):
+  File "/mnt/d/Hackathon/book/backend/rag_agent_api/retrieval.py", line 80, in retrieve_context
+    search_results = await self.client.query_points(
+                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+    ...<5 lines>...
+    )
     ^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/qdrant_client/async_qdrant_client.py", line 400, in query_points
+    return await self._client.query_points(
+           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+    ...<16 lines>...
     )
     ^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/qdrant_client/async_qdrant_remote.py", line 461, in query_points
+    query_result = await self.http.search_api.query_points(
+                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+    ...<4 lines>...
     )
+    ^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/qdrant_client/http/api/search_api.py", line 560, in query_points
+    return await self._build_for_query_points(
+           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+    ...<4 lines>...
+    )
+    ^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/qdrant_client/http/api_client.py", line 184, in request
+    return await self.send(request, type_)
+           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/qdrant_client/http/api_client.py", line 201, in send
+    response = await self.middleware(request, self.send_inner)
+               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/qdrant_client/http/api_client.py", line 245, in __call__
+    return await call_next(request)
+           ^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/sobiafatima/miniconda3/lib/python3.13/site-packages/qdrant_client/http/api_client.py", line 225, in send_inner
+    raise ResponseHandlingException(e)
+qdrant_client.http.exceptions.ResponseHandlingException: [Errno -3] Temporary failure in name resolution
+2026-01-02 22:20:58,441 - root - INFO - Retrieved 0 chunks from Qdrant
+2026-01-02 22:20:58,441 - root - INFO - Step 2: Generating response with OpenAI agent...
+2026-01-02 22:20:58,441 - root - INFO - Step 3: Formatting response...
+2026-01-02 22:20:58,441 - root - INFO - Query processed successfully, response ID: resp_ab31a354

book_ingestor.egg-info/PKG-INFO CHANGED Viewed

@@ -14,35 +14,60 @@ Requires-Dist: uvicorn>=0.24.0
 Requires-Dist: openai>=1.0.0
 Requires-Dist: pydantic>=2.0.0
----
-title: Backend Deploy
-emoji: 🚀
-colorFrom: blue
-colorTo: purple
-sdk: docker
-pinned: false
----
-# RAG Agent and API Layer
-This is a FastAPI application that provides a question-answering API using Gemini agents and Qdrant retrieval for RAG (Retrieval Augmented Generation) functionality.
-## API Endpoints
-- `GET /` - Root endpoint with API information
-- `POST /ask` - Main question-answering endpoint
-- `GET /health` - Health check endpoint
-- `GET /ready` - Readiness check endpoint
-- `/docs` - API documentation (Swagger UI)
-- `/redoc` - API documentation (Redoc)
-## Configuration
-The application requires the following environment variables:
-- `GEMINI_API_KEY` - API key for Google Gemini
-- `QDRANT_URL` - URL for Qdrant vector database
-- `QDRANT_API_KEY` - API key for Qdrant database
-## Deployment
-This application is configured for deployment on Hugging Face Spaces using Docker.

 Requires-Dist: openai>=1.0.0
 Requires-Dist: pydantic>=2.0.0
+# Book Content Ingestor & RAG Verification
+A system to extract content from Docusaurus-based book websites, chunk and embed it using Cohere, store embeddings in Qdrant Cloud for RAG applications, and verify the retrieval pipeline functionality.
+## Setup
+1. Install dependencies using uv:
+```bash
+cd backend
+uv sync
+```
+2. Create a `.env` file with your API keys:
+```bash
+cp .env.example .env
+# Edit .env with your actual API keys
+```
+## Environment Variables
+- `COHERE_API_KEY`: Your Cohere API key
+- `QDRANT_URL`: Your Qdrant Cloud URL
+- `QDRANT_API_KEY`: Your Qdrant API key
+- `QDRANT_COLLECTION_NAME`: Name of the collection to use (default: "rag_embedding")
+## Usage
+### Run the ingestion pipeline:
+```bash
+cd backend
+uv run python main.py
+```
+This will:
+1. Collect all URLs from the target book (https://sanilahmed.github.io/hackathon-ai-book/)
+2. Extract text content from each URL
+3. Chunk the content into fixed-size segments
+4. Generate embeddings using Cohere
+5. Store embeddings with metadata in Qdrant Cloud collection named "rag_embedding"
+### Run the verification pipeline:
+```bash
+cd backend
+python -m verify_retrieval.main
+```
+Or with specific options:
+```bash
+python -m verify_retrieval.main --query "transformer architecture in NLP" --top-k 10
+```
+The verification system will:
+1. Load vectors and metadata stored in Qdrant from the original ingestion
+2. Implement retrieval functions to query Qdrant using sample keywords or phrases
+3. Validate that retrieved chunks are accurate and relevant
+4. Check that metadata (URL, title, chunk_id) matches source content
+5. Log results and confirm the pipeline executes end-to-end without errors

check_qdrant.py ADDED Viewed

	@@ -0,0 +1,59 @@

+#!/usr/bin/env python3
+"""
+Script to check if Qdrant collection exists and has data.
+"""
+import os
+from qdrant_client import QdrantClient
+from dotenv import load_dotenv
+# Load environment variables
+load_dotenv()
+# Get environment variables
+qdrant_url = os.getenv('QDRANT_URL')
+qdrant_api_key = os.getenv('QDRANT_API_KEY')
+if not qdrant_url or not qdrant_api_key:
+    print("Error: QDRANT_URL or QDRANT_API_KEY not found in environment variables")
+    exit(1)
+# Initialize Qdrant client
+client = QdrantClient(
+    url=qdrant_url,
+    api_key=qdrant_api_key,
+    timeout=30
+)
+try:
+    # List all collections
+    collections = client.get_collections()
+    print("Available collections:")
+    for collection in collections.collections:
+        # For newer Qdrant versions, get the collection info to get point count
+        collection_info = client.get_collection(collection.name)
+        print(f"  - {collection.name} (points: {collection_info.points_count})")
+    # Check specifically for the rag_embedding collection
+    try:
+        collection_info = client.get_collection("rag_embedding")
+        print(f"\nCollection 'rag_embedding' exists with {collection_info.points_count} points")
+        if collection_info.points_count > 0:
+            # Get a sample point to verify data exists
+            points = client.scroll(
+                collection_name="rag_embedding",
+                limit=1
+            )
+            if len(points[0]) > 0:
+                sample_point = points[0][0]
+                print(f"Sample point ID: {sample_point.id}")
+                print(f"Sample point payload keys: {list(sample_point.payload.keys())}")
+                print(f"Sample text preview: {sample_point.payload.get('text', '')[:100]}...")
+        else:
+            print("Collection 'rag_embedding' exists but is empty")
+    except Exception as e:
+        print(f"\nCollection 'rag_embedding' does not exist: {e}")
+except Exception as e:
+    print(f"Error connecting to Qdrant: {e}")

rag_agent_api/README.md CHANGED Viewed

@@ -1,17 +1,17 @@
 # RAG Agent and API Layer
-A FastAPI-based question-answering system that uses OpenRouter Agents and Qdrant retrieval to generate grounded responses based on book content.
 ## Overview
-The RAG Agent and API Layer provides a question-answering API that retrieves relevant content from Qdrant and uses an OpenRouter agent to generate accurate, source-grounded responses. The system ensures that all answers are based only on the provided context to prevent hallucinations.
 ## Architecture
 The system consists of several key components:
 - **FastAPI Application**: Main entry point for the question-answering API
-- **OpenRouter Agent**: Generates responses based on retrieved context
 - **Qdrant Retriever**: Retrieves relevant content chunks from Qdrant database
 - **Configuration Manager**: Handles environment variables and settings
 - **Data Models**: Pydantic models for API requests/responses
@@ -22,7 +22,7 @@ The system consists of several key components:
 ### Prerequisites
 - Python 3.9+
-- OpenRouter API key
 - Qdrant Cloud instance with book content embeddings
 - Cohere API key (for query embeddings)
@@ -42,7 +42,7 @@ The system consists of several key components:
 3. Edit `.env` with your API keys and configuration:
    ```env
-   OPENROUTER_API_KEY=your-openrouter-api-key-here
    QDRANT_URL=your-qdrant-instance-url
    QDRANT_API_KEY=your-qdrant-api-key
    QDRANT_COLLECTION_NAME=rag_embedding
@@ -103,7 +103,7 @@ Root endpoint with API information.
 ### Environment Variables
-- `OPENROUTER_API_KEY`: Your OpenRouter API key
 - `QDRANT_URL`: URL of your Qdrant instance
 - `QDRANT_API_KEY`: Your Qdrant API key
 - `QDRANT_COLLECTION_NAME`: Name of the collection with book embeddings (default: `rag_embedding`)
@@ -123,8 +123,8 @@ Pydantic models for API request/response schemas.
 ### Schemas (`schemas.py`)
 Additional schemas for internal data structures.
-### Agent (`openrouter_agent.py`)
-OpenRouter agent implementation with context injection and response validation.
 ### Retrieval (`retrieval.py`)
 Qdrant integration for content retrieval with semantic search.
@@ -160,7 +160,7 @@ pytest
 # Run specific test files
 pytest tests/test_api.py
-pytest tests/test_openrouter_agent.py
 pytest tests/test_retrieval.py
 ```

 # RAG Agent and API Layer
+A FastAPI-based question-answering system that uses OpenAI Agents and Qdrant retrieval to generate grounded responses based on book content.
 ## Overview
+The RAG Agent and API Layer provides a question-answering API that retrieves relevant content from Qdrant and uses an OpenAI agent to generate accurate, source-grounded responses. The system ensures that all answers are based only on the provided context to prevent hallucinations.
 ## Architecture
 The system consists of several key components:
 - **FastAPI Application**: Main entry point for the question-answering API
+- **OpenAI Agent**: Generates responses based on retrieved context
 - **Qdrant Retriever**: Retrieves relevant content chunks from Qdrant database
 - **Configuration Manager**: Handles environment variables and settings
 - **Data Models**: Pydantic models for API requests/responses
 ### Prerequisites
 - Python 3.9+
+- OpenAI API key
 - Qdrant Cloud instance with book content embeddings
 - Cohere API key (for query embeddings)
 3. Edit `.env` with your API keys and configuration:
    ```env
+   OPENAI_API_KEY=your-openai-api-key-here
    QDRANT_URL=your-qdrant-instance-url
    QDRANT_API_KEY=your-qdrant-api-key
    QDRANT_COLLECTION_NAME=rag_embedding
 ### Environment Variables
+- `OPENAI_API_KEY`: Your OpenAI API key
 - `QDRANT_URL`: URL of your Qdrant instance
 - `QDRANT_API_KEY`: Your Qdrant API key
 - `QDRANT_COLLECTION_NAME`: Name of the collection with book embeddings (default: `rag_embedding`)
 ### Schemas (`schemas.py`)
 Additional schemas for internal data structures.
+### Agent (`agent.py`)
+OpenAI agent implementation with context injection and response validation.
 ### Retrieval (`retrieval.py`)
 Qdrant integration for content retrieval with semantic search.
 # Run specific test files
 pytest tests/test_api.py
+pytest tests/test_agent.py
 pytest tests/test_retrieval.py
 ```

rag_agent_api/__init__.py CHANGED Viewed

@@ -10,7 +10,7 @@ __license__ = "MIT"
 # Import main components for easy access
 from .main import app
 from .config import Config, get_config, validate_config
-from .openrouter_agent import OpenRouterAgent
 from .retrieval import QdrantRetriever
 # Define what gets imported with "from rag_agent_api import *"
@@ -19,6 +19,6 @@ __all__ = [
     "Config",
     "get_config",
     "validate_config",
-    "OpenRouterAgent",
     "QdrantRetriever"
 ]

 # Import main components for easy access
 from .main import app
 from .config import Config, get_config, validate_config
+from .agent import GeminiAgent
 from .retrieval import QdrantRetriever
 # Define what gets imported with "from rag_agent_api import *"
     "Config",
     "get_config",
     "validate_config",
+    "GeminiAgent",
     "QdrantRetriever"
 ]

rag_agent_api/__pycache__/__init__.cpython-313.pyc CHANGED Viewed

Binary files a/rag_agent_api/__pycache__/__init__.cpython-313.pyc and b/rag_agent_api/__pycache__/__init__.cpython-313.pyc differ

rag_agent_api/__pycache__/agent.cpython-313.pyc CHANGED Viewed

Binary files a/rag_agent_api/__pycache__/agent.cpython-313.pyc and b/rag_agent_api/__pycache__/agent.cpython-313.pyc differ

rag_agent_api/__pycache__/config.cpython-313.pyc CHANGED Viewed

Binary files a/rag_agent_api/__pycache__/config.cpython-313.pyc and b/rag_agent_api/__pycache__/config.cpython-313.pyc differ

rag_agent_api/__pycache__/main.cpython-313.pyc CHANGED Viewed

Binary files a/rag_agent_api/__pycache__/main.cpython-313.pyc and b/rag_agent_api/__pycache__/main.cpython-313.pyc differ

rag_agent_api/__pycache__/openrouter_agent.cpython-313.pyc CHANGED Viewed

Binary files a/rag_agent_api/__pycache__/openrouter_agent.cpython-313.pyc and b/rag_agent_api/__pycache__/openrouter_agent.cpython-313.pyc differ

rag_agent_api/__pycache__/retrieval.cpython-313.pyc CHANGED Viewed

Binary files a/rag_agent_api/__pycache__/retrieval.cpython-313.pyc and b/rag_agent_api/__pycache__/retrieval.cpython-313.pyc differ

rag_agent_api/agent.py ADDED Viewed

	@@ -0,0 +1,363 @@

+"""
+Google Gemini Agent module for the RAG Agent and API Layer system.
+This module provides functionality for creating and managing a Google Gemini agent
+that generates responses based on retrieved context.
+"""
+import asyncio
+import logging
+from typing import List, Dict, Any, Optional
+import google.generativeai as genai
+from .config import get_config
+from .schemas import AgentContext, AgentResponse, SourceChunkSchema
+from .utils import format_confidence_score
+class GeminiAgent:
+    """
+    A class to manage the Google Gemini agent for generating responses based on context.
+    """
+    def __init__(self, model_name: str = "gemini-2.5-flash"):
+        """
+        Initialize the Google Gemini agent with configuration.
+        Args:
+            model_name: Name of the Gemini model to use (default: gemini-2.5-flash)
+        """
+        config = get_config()
+        api_key = config.gemini_api_key
+        if not api_key:
+            raise ValueError("GEMINI_API_KEY environment variable not set")
+        # Configure the Gemini client
+        genai.configure(api_key=api_key)
+        # Create the generative model instance
+        self.model = genai.GenerativeModel(model_name)
+        self.model_name = model_name
+        self.default_temperature = config.default_temperature
+        logging.info(f"Gemini agent initialized with model: {model_name}")
+    async def generate_response(self, context: AgentContext) -> AgentResponse:
+        """
+        Generate a response based on the provided context.
+        Args:
+            context: AgentContext containing the query and retrieved context chunks
+        Returns:
+            AgentResponse with the generated answer and metadata
+        """
+        # Check if retrieved context is empty (no chunks at all)
+        if not context.retrieved_chunks:
+            return AgentResponse(
+                raw_response="I could not find this information in the book.",
+                used_sources=[],
+                confidence_score=0.0,
+                is_valid=True,
+                validation_details="No context chunks retrieved from the database",
+                unsupported_claims=[]
+            )
+        # Check if context is insufficient (very short content)
+        total_context_length = sum(len(chunk.content) for chunk in context.retrieved_chunks)
+        if total_context_length < 10:  # Much lower threshold, but still meaningful
+            return AgentResponse(
+                raw_response="I could not find this information in the book.",
+                used_sources=[],
+                confidence_score=0.0,
+                is_valid=True,
+                validation_details="No sufficient context provided to answer the question",
+                unsupported_claims=[]
+            )
+        try:
+            # Prepare the system message with instructions for grounding responses
+            system_message = self._create_system_message(context)
+            # Prepare the user message with the query
+            user_message = self._create_user_message(context)
+            # For Google Gemini, we need to format the prompt differently
+            # Combine system instructions and user query
+            full_prompt = f"{system_message}\n\n{user_message}"
+            # Generate response from Google Gemini
+            # For async generation, we need to use the appropriate async method
+            chat = self.model.start_chat()
+            response = await chat.send_message_async(
+                full_prompt,
+                generation_config={
+                    "temperature": context.source_policy if hasattr(context, 'temperature') else self.default_temperature,
+                    "max_output_tokens": 1000
+                }
+            )
+            # Extract the response text
+            raw_response = response.text if response and hasattr(response, 'text') else str(response)
+            # If the response indicates no information was found, return the exact message
+            if "I could not find this information in the book" in raw_response:
+                return AgentResponse(
+                    raw_response="I could not find this information in the book.",
+                    used_sources=[],
+                    confidence_score=0.0,
+                    is_valid=True,
+                    validation_details="No relevant information found in the provided context",
+                    unsupported_claims=[]
+                )
+            # Determine which sources were used (this is a simplified approach)
+            used_sources = self._identify_used_sources(raw_response, context.retrieved_chunks)
+            # Calculate confidence score (based on similarity scores of used sources)
+            confidence_score = self._calculate_confidence_score(used_sources, context.retrieved_chunks)
+            # Validate that the response is grounded in the provided context
+            grounding_validation = self._validate_response_grounding(
+                raw_response, context.retrieved_chunks, context.query
+            )
+            # Create and return the agent response
+            agent_response = AgentResponse(
+                raw_response=raw_response,
+                used_sources=used_sources,
+                confidence_score=confidence_score,
+                is_valid=grounding_validation["is_valid"],
+                validation_details=grounding_validation["details"],
+                unsupported_claims=grounding_validation["unsupported_claims"]
+            )
+            logging.info(f"Agent response generated successfully. Confidence: {confidence_score:.2f}")
+            return agent_response
+        except Exception as e:
+            logging.error(f"Error generating response from Google Gemini agent: {e}", exc_info=True)
+            # Return the specific message when there's an error
+            return AgentResponse(
+                raw_response="I could not find this information in the book.",
+                used_sources=[],
+                confidence_score=0.0,
+                is_valid=False,
+                validation_details=f"Error generating response: {str(e)}",
+                unsupported_claims=[]
+            )
+    def _create_system_message(self, context: AgentContext) -> str:
+        """
+        Create the system message that instructs the agent on how to behave.
+        Args:
+            context: AgentContext containing the query and retrieved context chunks
+        Returns:
+            Formatted system message string
+        """
+        system_prompt = """You are a documentation-based assistant.
+Answer ONLY using the provided context from the book
+"Physical AI & Humanoid Robotics".
+If the answer is not found, reply EXACTLY:
+"I could not find this information in the book."""
+        return system_prompt
+    def _create_user_message(self, context: AgentContext) -> str:
+        """
+        Create the user message containing the query.
+        Args:
+            context: AgentContext containing the query and retrieved context chunks
+        Returns:
+            Formatted user message string
+        """
+        return f"""CONTEXT:
+{self._format_context_chunks(context.retrieved_chunks)}
+QUESTION:
+{context.query}"""
+    def _format_context_chunks(self, chunks: List[SourceChunkSchema]) -> str:
+        """
+        Format the context chunks for the prompt.
+        Args:
+            chunks: List of source chunks to format
+        Returns:
+            Formatted context string
+        """
+        if not chunks:
+            return ""
+        formatted_chunks = []
+        for i, chunk in enumerate(chunks):
+            formatted_chunks.append(f"[Chunk {i+1}]\n{chunk.content}\n[/Chunk {i+1}]")
+        return "\n".join(formatted_chunks)
+    def _create_context_messages(self, context: AgentContext) -> List[Dict[str, str]]:
+        """
+        Create context messages from the retrieved chunks.
+        With the new format, context is now provided in the user message,
+        so this method returns an empty list to avoid duplication.
+        Args:
+            context: AgentContext containing the query and retrieved context chunks
+        Returns:
+            Empty list since context is now in user message
+        """
+        return []
+    def _identify_used_sources(self, response: str, chunks: List[SourceChunkSchema]) -> List[str]:
+        """
+        Identify which sources were likely used in the response.
+        This is a simplified approach - in a real implementation, you might use
+        more sophisticated techniques like semantic similarity.
+        Args:
+            response: The agent's response text
+            chunks: List of source chunks that were provided to the agent
+        Returns:
+            List of source IDs that were likely used
+        """
+        used_sources = []
+        response_lower = response.lower()
+        for chunk in chunks:
+            # Check if any significant words from the chunk appear in the response
+            content_words = set(chunk.content.lower().split()[:20])  # Check first 20 words
+            response_words = set(response_lower.split())
+            # If there's significant overlap, consider this chunk as used
+            overlap = content_words.intersection(response_words)
+            if len(overlap) > 2:  # Arbitrary threshold
+                used_sources.append(chunk.id)
+        # If no sources were identified, return all sources (conservative approach)
+        if not used_sources:
+            used_sources = [chunk.id for chunk in chunks]
+        return used_sources
+    def _calculate_confidence_score(self, used_sources: List[str], chunks: List[SourceChunkSchema]) -> float:
+        """
+        Calculate a confidence score based on the quality of the used sources.
+        Args:
+            used_sources: List of source IDs that were used
+            chunks: List of all source chunks that were provided to the agent
+        Returns:
+            Confidence score between 0.0 and 1.0
+        """
+        if not used_sources:
+            return 0.1  # Low confidence if no sources were used
+        # Calculate average similarity score of used sources
+        total_similarity = 0.0
+        used_count = 0
+        for chunk in chunks:
+            if chunk.id in used_sources:
+                total_similarity += chunk.similarity_score
+                used_count += 1
+        if used_count == 0:
+            return 0.1  # Low confidence if no matching chunks found
+        avg_similarity = total_similarity / used_count
+        # If similarity scores are very low (e.g., due to embedding issues),
+        # but we have content, still provide some confidence
+        if avg_similarity < 0.1 and len(used_sources) > 0:
+            # If we have relevant content but low similarity scores,
+            # it might be due to embedding issues, not lack of relevance
+            # So we'll set a minimum confidence if content exists
+            return 0.3  # Low but not zero confidence
+        else:
+            # Normalize the confidence score (adjust based on your requirements)
+            # Higher similarity scores contribute to higher confidence
+            confidence = avg_similarity
+        return format_confidence_score(confidence)
+    def _validate_response_grounding(self, response: str, chunks: List[SourceChunkSchema], query: str) -> Dict[str, Any]:
+        """
+        Validate that the response is grounded in the provided context.
+        Args:
+            response: The agent's response text
+            chunks: List of source chunks that were provided to the agent
+            query: The original query
+        Returns:
+            Dictionary with validation results
+        """
+        # Check if the response contains elements from the provided context
+        response_lower = response.lower()
+        context_text = " ".join([chunk.content.lower() for chunk in chunks])
+        # Simple heuristic: check if response contains significant terms from context
+        response_words = set(response_lower.split())
+        context_words = set(context_text.split())
+        # Calculate overlap between response and context
+        overlap = response_words.intersection(context_words)
+        total_response_words = len(response_words)
+        overlap_count = len(overlap)
+        # If less than 30% of response words come from context, flag as potentially ungrounded
+        is_grounded = True
+        unsupported_claims = []
+        if total_response_words > 0:
+            grounding_ratio = overlap_count / total_response_words
+            is_grounded = grounding_ratio >= 0.3  # At least 30% of words should come from context
+        # For now, we'll just return the basic validation
+        # In a more sophisticated implementation, you'd analyze the response more deeply
+        details = f"Response grounding validation completed. Context overlap ratio: {overlap_count/total_response_words if total_response_words > 0 else 0:.2f}"
+        return {
+            "is_valid": is_grounded,
+            "details": details,
+            "unsupported_claims": unsupported_claims
+        }
+    async def validate_response_quality(self, response: str, context: AgentContext) -> bool:
+        """
+        Validate the quality of the agent's response.
+        Args:
+            response: The agent's response text
+            context: AgentContext containing the query and retrieved context chunks
+        Returns:
+            True if response meets quality standards, False otherwise
+        """
+        # Check for common signs of poor quality responses
+        if not response or response.strip() == "":
+            logging.warning("Agent returned an empty response")
+            return False
+        # Check if response contains generic fallback phrases
+        lower_response = response.lower()
+        if "i don't know" in lower_response or "i don't have" in lower_response:
+            # This might be a valid response if there's no relevant context
+            if len(context.retrieved_chunks) == 0:
+                return True  # Valid response if no context was provided
+            else:
+                # Check if the response is justified given the context
+                # For now, we'll consider it valid if it acknowledges the lack of relevant information
+                return True
+        # In a more sophisticated implementation, you'd validate against the context more rigorously
+        return True
+# Global agent instance (if needed)
+# agent_instance = OpenAIAgent()

rag_agent_api/config.py CHANGED Viewed

@@ -19,7 +19,6 @@ class Config:
     def __init__(self):
         """Initialize configuration by loading environment variables."""
-        self.openai_api_key = os.getenv('OPENAI_API_KEY')
         self.cohere_api_key = os.getenv('COHERE_API_KEY')
         self.openrouter_api_key = os.getenv('OPENROUTER_API_KEY')
         self.qdrant_url = os.getenv('QDRANT_URL')

     def __init__(self):
         """Initialize configuration by loading environment variables."""
         self.cohere_api_key = os.getenv('COHERE_API_KEY')
         self.openrouter_api_key = os.getenv('OPENROUTER_API_KEY')
         self.qdrant_url = os.getenv('QDRANT_URL')

rag_agent_api/main.py CHANGED Viewed

@@ -82,22 +82,22 @@ async def health_check() -> HealthResponse:
         HealthResponse with status of services
     """
     # Check if all required components are initialized
-    openrouter_status = "up" if agent else "down"
     qdrant_status = "up" if retriever else "down"
     agent_status = "up" if agent else "down"
     # Determine overall status
     overall_status = "healthy"
-    if openrouter_status == "down" or qdrant_status == "down":
         overall_status = "unhealthy"
-    elif openrouter_status == "degraded" or qdrant_status == "degraded":
         overall_status = "degraded"
     return HealthResponse(
         status=overall_status,
         timestamp=format_timestamp(),
         services={
-            "openrouter": openrouter_status,
             "qdrant": qdrant_status,
             "agent": agent_status
         }
@@ -194,7 +194,7 @@ async def root() -> Dict[str, Any]:
     return {
         "message": "RAG Agent and API Layer",
         "version": "1.0.0",
-        "description": "Question-answering API using OpenRouter Agents and Qdrant retrieval",
         "endpoints": {
             "POST /ask": "Main question-answering endpoint",
             "GET /health": "Health check endpoint",
@@ -243,9 +243,4 @@ async def readiness_check() -> Dict[str, str]:
     if retriever and agent:
         return {"status": "ready"}
     else:
-        raise HTTPException(status_code=503, detail="Service not ready")
-if __name__ == "__main__":
-    import uvicorn
-    uvicorn.run(app, host="0.0.0.0", port=8000)

         HealthResponse with status of services
     """
     # Check if all required components are initialized
+    gemini_status = "up" if agent else "down"
     qdrant_status = "up" if retriever else "down"
     agent_status = "up" if agent else "down"
     # Determine overall status
     overall_status = "healthy"
+    if gemini_status == "down" or qdrant_status == "down":
         overall_status = "unhealthy"
+    elif gemini_status == "degraded" or qdrant_status == "degraded":
         overall_status = "degraded"
     return HealthResponse(
         status=overall_status,
         timestamp=format_timestamp(),
         services={
+            "gemini": gemini_status,
             "qdrant": qdrant_status,
             "agent": agent_status
         }
     return {
         "message": "RAG Agent and API Layer",
         "version": "1.0.0",
+        "description": "Question-answering API using OpenAI Agents and Qdrant retrieval",
         "endpoints": {
             "POST /ask": "Main question-answering endpoint",
             "GET /health": "Health check endpoint",
     if retriever and agent:
         return {"status": "ready"}
     else:
+        raise HTTPException(status_code=503, detail="Service not ready")

rag_agent_api/retrieval.py CHANGED Viewed

@@ -76,6 +76,16 @@ class QdrantRetriever:
             # Embed the query using Cohere
             query_embedding = await self._embed_query(query)
             # Perform semantic search in Qdrant
             search_results = await self.client.query_points(
                 collection_name=self.collection_name,
@@ -116,53 +126,134 @@ class QdrantRetriever:
             # Return empty list instead of raising exception to allow graceful handling
             return []
-    async def _embed_query(self, query: str) -> List[float]:
         """
-        Embed the query using Cohere to prepare for semantic search.
         Args:
-            query: The query string to embed
         Returns:
-            List of floats representing the query embedding
         """
         try:
-            # Use Cohere to embed the query
-            # The original book content was likely embedded with Cohere embed-english-v3.0
-            response = await self.cohere_client.embed(
-                texts=[query],
-                model="embed-english-v3.0",  # 1024-dimensional embedding model
-                input_type="search_query"  # Specify this is a search query
             )
-            # Extract the embedding from the response
-            embedding = response.embeddings[0]  # Get the first (and only) embedding
-            return embedding
         except Exception as e:
-            logging.error(f"Error embedding query with Cohere: {e}", exc_info=True)
-            # Try using OpenAI embeddings as fallback if available
             try:
-                from openai import OpenAI
-                from .config import get_config
-                config = get_config()
-                if config.openai_api_key:
-                    client = OpenAI(api_key=config.openai_api_key)
-                    response = client.embeddings.create(
-                        input=query,
-                        model="text-embedding-ada-002"
-                    )
-                    embedding = response.data[0].embedding
-                    logging.info("Successfully used OpenAI embedding as fallback")
-                    return embedding
-            except Exception as openai_error:
-                logging.warning(f"OpenAI fallback also failed: {openai_error}")
-            # If both fail, return a zero vector of the correct size (1024) as a last resort
-            # This will result in poor semantic matches but won't crash the system
-            logging.warning("Using zero vector as final fallback for query embedding")
-            return [0.0] * 1024
     def _validate_chunk(self, chunk: SourceChunkSchema) -> bool:
         """

             # Embed the query using Cohere
             query_embedding = await self._embed_query(query)
+            # Check if we got a zero vector fallback (indicating embedding service failure)
+            is_zero_vector = all(x == 0.0 for x in query_embedding)
+            if is_zero_vector:
+                # If we have a zero vector, try a different approach - keyword search
+                logging.warning("Zero vector detected, attempting keyword-based fallback search")
+                retrieved_chunks = await self._keyword_search_fallback(query, top_k)
+                logging.info(f"Keyword fallback search retrieved {len(retrieved_chunks)} chunks from Qdrant")
+                return retrieved_chunks
             # Perform semantic search in Qdrant
             search_results = await self.client.query_points(
                 collection_name=self.collection_name,
             # Return empty list instead of raising exception to allow graceful handling
             return []
+    async def _keyword_search_fallback(self, query: str, top_k: int = 5) -> List[SourceChunkSchema]:
         """
+        Fallback method to search using keyword matching when embedding service is unavailable.
         Args:
+            query: The user's query string
+            top_k: Number of results to return (default: 5)
         Returns:
+            List of SourceChunkSchema objects containing relevant content
         """
         try:
+            # Use Qdrant's full-text search capability or filter-based approach
+            # For now, we'll use a scroll + filter approach to find relevant chunks
+            from qdrant_client.http import models
+            # Simple approach: get all points and filter based on keyword matching
+            # In a production system, you'd want to use Qdrant's text indexing capabilities
+            all_points = await self.client.scroll(
+                collection_name=self.collection_name,
+                limit=10000,  # Get up to 10000 points (or as many as exist)
+                with_payload=True,
+                with_vectors=False
             )
+            # Extract points from the result (structure may vary depending on Qdrant client version)
+            points = all_points[0] if isinstance(all_points, tuple) else all_points
+            # Score points based on keyword matching
+            scored_chunks = []
+            query_lower = query.lower()
+            query_words = set(query_lower.split())
+            for point in points:
+                payload = point.payload if hasattr(point, 'payload') else point
+                content = payload.get('text', '') if isinstance(payload, dict) else getattr(payload, 'text', '')
+                content_lower = content.lower()
+                # Calculate a simple keyword match score
+                content_words = set(content_lower.split())
+                overlap = query_words.intersection(content_words)
+                score = len(overlap) / len(query_words) if query_words else 0  # Jaccard similarity
+                if score > 0 or query_lower in content_lower:  # Only include if there's some match
+                    chunk = SourceChunkSchema(
+                        id=point.id if hasattr(point, 'id') else getattr(point, 'point_id', None),
+                        url=payload.get('url', '') if isinstance(payload, dict) else getattr(payload, 'url', ''),
+                        title=payload.get('title', '') if isinstance(payload, dict) else getattr(payload, 'title', ''),
+                        content=content,
+                        similarity_score=score,
+                        chunk_index=payload.get('chunk_index', 0) if isinstance(payload, dict) else getattr(payload, 'chunk_index', 0)
+                    )
+                    if self._validate_chunk(chunk):
+                        scored_chunks.append((chunk, score))
+            # Sort by score and return top_k
+            scored_chunks.sort(key=lambda x: x[1], reverse=True)
+            top_chunks = [chunk for chunk, score in scored_chunks[:top_k]]
+            return top_chunks
         except Exception as e:
+            logging.error(f"Error in keyword fallback search: {e}", exc_info=True)
+            return []
+    async def _embed_query(self, query: str) -> List[float]:
+        """
+        Embed the query using Cohere to prepare for semantic search with retry logic for rate limits.
+        Args:
+            query: The query string to embed
+        Returns:
+            List of floats representing the query embedding
+        """
+        import time
+        import random
+        from cohere.errors.too_many_requests_error import TooManyRequestsError
+        # Try Cohere with retry logic for rate limits
+        for attempt in range(3):  # Try up to 3 times
             try:
+                # Use Cohere to embed the query
+                # The original book content was likely embedded with Cohere embed-english-v3.0
+                response = await self.cohere_client.embed(
+                    texts=[query],
+                    model="embed-english-v3.0",  # 1024-dimensional embedding model
+                    input_type="search_query"  # Specify this is a search query
+                )
+                # Extract the embedding from the response
+                embedding = response.embeddings[0]  # Get the first (and only) embedding
+                return embedding
+            except TooManyRequestsError as e:
+                if attempt < 2:  # Don't wait after the last attempt
+                    # Exponential backoff with jitter
+                    wait_time = (2 ** attempt) + random.uniform(0, 1)
+                    logging.warning(f"Cohere rate limited (attempt {attempt + 1}), waiting {wait_time:.2f}s: {e}")
+                    await asyncio.sleep(wait_time)
+                else:
+                    logging.error(f"Cohere rate limited after {attempt + 1} attempts: {e}")
+            except Exception as e:
+                logging.error(f"Error embedding query with Cohere: {e}", exc_info=True)
+                break  # Don't retry for other types of errors
+        # If Cohere fails, try using OpenAI embeddings as fallback if available
+        try:
+            from openai import OpenAI
+            from .config import get_config
+            config = get_config()
+            if config.openai_api_key:
+                client = OpenAI(api_key=config.openai_api_key)
+                response = client.embeddings.create(
+                    input=query,
+                    model="text-embedding-ada-002"
+                )
+                embedding = response.data[0].embedding
+                logging.info("Successfully used OpenAI embedding as fallback")
+                return embedding
+        except Exception as openai_error:
+            logging.warning(f"OpenAI fallback also failed: {openai_error}")
+        # If all fail, return a zero vector of the correct size (1024) as a last resort
+        # This will result in poor semantic matches but won't crash the system
+        logging.warning("Using zero vector as final fallback for query embedding")
+        return [0.0] * 1024
     def _validate_chunk(self, chunk: SourceChunkSchema) -> bool:
         """

requirements.txt CHANGED Viewed

@@ -1,12 +1,10 @@
-# Backend Service Dependencies
-requests>=2.31.0
-beautifulsoup4>=4.12.0
-cohere>=4.9.0
-qdrant-client>=1.7.0
 python-dotenv>=1.0.0
-fastapi>=0.104.0
-uvicorn>=0.24.0
-openai>=1.0.0
-pydantic>=2.0.0
-numpy>=1.21.0
-httpx>=0.27.0

+fastapi>=0.104.1
+uvicorn[standard]>=0.24.0
+qdrant-client>=1.8.0
 python-dotenv>=1.0.0
+httpx>=0.25.0
+cohere>=4.9.0
+google-generativeai>=0.4.0
+openai>=1.6.0
+pydantic>=2.5.0
+typing-extensions>=4.8.0

test_retrieval.py ADDED Viewed

	@@ -0,0 +1,60 @@

+#!/usr/bin/env python3
+"""
+Test script to directly test the Qdrant retrieval functionality
+"""
+import asyncio
+import os
+from dotenv import load_dotenv
+from rag_agent_api.retrieval import QdrantRetriever
+from rag_agent_api.config import get_config
+# Load environment variables
+load_dotenv()
+async def test_retrieval():
+    print("Testing Qdrant retrieval functionality...")
+    # Create a QdrantRetriever instance
+    retriever = QdrantRetriever()
+    print("1. Testing collection existence...")
+    exists = await retriever.validate_collection_exists()
+    print(f"   Collection exists: {exists}")
+    if exists:
+        print("2. Getting total points in collection...")
+        total_points = await retriever.get_total_points()
+        print(f"   Total points: {total_points}")
+    print("3. Testing query embedding...")
+    try:
+        query = "what about this book?"
+        embedding = await retriever._embed_query(query)
+        print(f"   Query embedding successful, length: {len(embedding)}")
+    except Exception as e:
+        print(f"   Query embedding failed: {e}")
+        return
+    print("4. Testing direct search...")
+    try:
+        results = await retriever.retrieve_context(query, top_k=5)
+        print(f"   Retrieved {len(results)} results")
+        if results:
+            print("   Sample results:")
+            for i, result in enumerate(results[:2]):  # Show first 2 results
+                print(f"     Result {i+1}:")
+                print(f"       ID: {result.id}")
+                print(f"       Title: {result.title}")
+                print(f"       Content preview: {result.content[:100]}...")
+                print(f"       Similarity: {result.similarity_score}")
+                print(f"       URL: {result.url}")
+        else:
+            print("   No results retrieved - this indicates the main issue")
+    except Exception as e:
+        print(f"   Direct search failed: {e}")
+        import traceback
+        traceback.print_exc()
+if __name__ == "__main__":
+    asyncio.run(test_retrieval())

tests/test_integration.py CHANGED Viewed

@@ -7,7 +7,7 @@ from fastapi.testclient import TestClient
 from unittest.mock import Mock, patch, AsyncMock
 from rag_agent_api.main import app, retriever, agent
 from rag_agent_api.retrieval import QdrantRetriever
-from rag_agent_api.openrouter_agent import OpenRouterAgent
 from rag_agent_api.schemas import SourceChunkSchema, AgentResponse, AgentContext
@@ -17,13 +17,13 @@ def test_full_query_flow_with_mocked_components():
         'QDRANT_URL': 'http://test-qdrant:6333',
         'QDRANT_API_KEY': 'test-api-key',
         'COHERE_API_KEY': 'test-cohere-key',
-        'OPENROUTER_API_KEY': 'test-openrouter-key'
     }):
         with patch('rag_agent_api.main.QdrantRetriever') as mock_retriever_class:
-            with patch('rag_agent_api.main.OpenRouterAgent') as mock_agent_class:
                 # Create mock instances
                 mock_retriever = Mock(spec=QdrantRetriever)
-                mock_agent = Mock(spec=OpenRouterAgent)
                 # Configure the class mocks to return our instance mocks
                 mock_retriever_class.return_value = mock_retriever
@@ -84,11 +84,11 @@ async def test_agent_context_creation():
         'QDRANT_URL': 'http://test-qdrant:6333',
         'QDRANT_API_KEY': 'test-api-key',
         'COHERE_API_KEY': 'test-cohere-key',
-        'OPENROUTER_API_KEY': 'test-openrouter-key'
     }):
         with patch('rag_agent_api.retrieval.AsyncQdrantClient') as mock_qdrant_client:
             with patch('rag_agent_api.retrieval.cohere.Client') as mock_cohere_client:
-                with patch('rag_agent_api.openrouter_agent.httpx.AsyncClient'):
                     # Mock the Qdrant client
                     mock_qdrant_instance = Mock()
                     mock_qdrant_client.return_value = mock_qdrant_instance
@@ -101,7 +101,7 @@ async def test_agent_context_creation():
                     # Initialize components
                     retriever = QdrantRetriever(collection_name="test_collection")
-                    agent = OpenRouterAgent(model_name="gpt-4-test")
                     # Create test chunks
                     test_chunk = SourceChunkSchema(
@@ -145,7 +145,7 @@ def test_health_endpoint_integration():
             assert "services" in data
             # Check that services status is included
-            assert "openrouter" in data["services"]
             assert "qdrant" in data["services"]
             assert "agent" in data["services"]
@@ -157,11 +157,11 @@ async def test_retrieval_and_agent_integration():
         'QDRANT_URL': 'http://test-qdrant:6333',
         'QDRANT_API_KEY': 'test-api-key',
         'COHERE_API_KEY': 'test-cohere-key',
-        'OPENROUTER_API_KEY': 'test-openrouter-key'
     }):
         with patch('rag_agent_api.retrieval.AsyncQdrantClient') as mock_qdrant_client:
             with patch('rag_agent_api.retrieval.cohere.Client') as mock_cohere_client:
-                with patch('rag_agent_api.openrouter_agent.httpx.AsyncClient') as mock_httpx_client:
                     # Mock the Qdrant client
                     mock_qdrant_instance = Mock()
                     mock_qdrant_client.return_value = mock_qdrant_instance
@@ -172,21 +172,18 @@ async def test_retrieval_and_agent_integration():
                     mock_cohere_client.return_value = mock_cohere_instance
                     mock_cohere_instance.embed.return_value = Mock(embeddings=[[0.1, 0.2, 0.3]])
-                    # Mock the httpx client for OpenRouter
-                    mock_httpx_instance = Mock()
-                    mock_httpx_client.return_value.__aenter__.return_value = mock_httpx_instance
                     mock_completion = Mock()
-                    mock_completion.json.return_value = {
-                        "choices": [
-                            {"message": {"content": "This is a test response"}}
-                        ]
-                    }
-                    mock_httpx_instance.post = AsyncMock(return_value=mock_completion)
-                    mock_httpx_instance.post.return_value.status_code = 200
                     # Initialize components
                     test_retriever = QdrantRetriever(collection_name="test_collection")
-                    test_agent = OpenRouterAgent(model_name="gpt-4-test")
                     # Mock the retrieval result
                     mock_chunk = SourceChunkSchema(

 from unittest.mock import Mock, patch, AsyncMock
 from rag_agent_api.main import app, retriever, agent
 from rag_agent_api.retrieval import QdrantRetriever
+from rag_agent_api.agent import OpenAIAgent
 from rag_agent_api.schemas import SourceChunkSchema, AgentResponse, AgentContext
         'QDRANT_URL': 'http://test-qdrant:6333',
         'QDRANT_API_KEY': 'test-api-key',
         'COHERE_API_KEY': 'test-cohere-key',
+        'OPENAI_API_KEY': 'test-openai-key'
     }):
         with patch('rag_agent_api.main.QdrantRetriever') as mock_retriever_class:
+            with patch('rag_agent_api.main.OpenAIAgent') as mock_agent_class:
                 # Create mock instances
                 mock_retriever = Mock(spec=QdrantRetriever)
+                mock_agent = Mock(spec=OpenAIAgent)
                 # Configure the class mocks to return our instance mocks
                 mock_retriever_class.return_value = mock_retriever
         'QDRANT_URL': 'http://test-qdrant:6333',
         'QDRANT_API_KEY': 'test-api-key',
         'COHERE_API_KEY': 'test-cohere-key',
+        'OPENAI_API_KEY': 'test-openai-key'
     }):
         with patch('rag_agent_api.retrieval.AsyncQdrantClient') as mock_qdrant_client:
             with patch('rag_agent_api.retrieval.cohere.Client') as mock_cohere_client:
+                with patch('rag_agent_api.agent.AsyncOpenAI'):
                     # Mock the Qdrant client
                     mock_qdrant_instance = Mock()
                     mock_qdrant_client.return_value = mock_qdrant_instance
                     # Initialize components
                     retriever = QdrantRetriever(collection_name="test_collection")
+                    agent = OpenAIAgent(model_name="gpt-4-test")
                     # Create test chunks
                     test_chunk = SourceChunkSchema(
             assert "services" in data
             # Check that services status is included
+            assert "openai" in data["services"]
             assert "qdrant" in data["services"]
             assert "agent" in data["services"]
         'QDRANT_URL': 'http://test-qdrant:6333',
         'QDRANT_API_KEY': 'test-api-key',
         'COHERE_API_KEY': 'test-cohere-key',
+        'OPENAI_API_KEY': 'test-openai-key'
     }):
         with patch('rag_agent_api.retrieval.AsyncQdrantClient') as mock_qdrant_client:
             with patch('rag_agent_api.retrieval.cohere.Client') as mock_cohere_client:
+                with patch('rag_agent_api.agent.AsyncOpenAI') as mock_openai:
                     # Mock the Qdrant client
                     mock_qdrant_instance = Mock()
                     mock_qdrant_client.return_value = mock_qdrant_instance
                     mock_cohere_client.return_value = mock_cohere_instance
                     mock_cohere_instance.embed.return_value = Mock(embeddings=[[0.1, 0.2, 0.3]])
+                    # Mock the OpenAI client
+                    mock_openai_instance = Mock()
+                    mock_openai.return_value = mock_openai_instance
                     mock_completion = Mock()
+                    mock_completion.choices = [Mock()]
+                    mock_completion.choices[0].message = Mock()
+                    mock_completion.choices[0].message.content = "This is a test response"
+                    mock_openai_instance.chat.completions.create = AsyncMock(return_value=mock_completion)
                     # Initialize components
                     test_retriever = QdrantRetriever(collection_name="test_collection")
+                    test_agent = OpenAIAgent(model_name="gpt-4-test")
                     # Mock the retrieval result
                     mock_chunk = SourceChunkSchema(