galbendavids committed on
Commit 2b89e73 · 1 Parent(s): e161246

fix: add tiktoken and improve sentiment model compatibility for all platforms

Files changed (6)
  1. .env.example +19 -0
  2. QUICK_START.md +289 -0
  3. app/api.py +6 -5
  4. app/sentiment.py +23 -2
  5. requirements.txt +1 -0
  6. scripts/validate_local.py +314 -0
.env.example ADDED
@@ -0,0 +1,19 @@
+ # Local Development Environment Configuration
+ # Copy this file to .env and fill in your actual values
+ # .env is in .gitignore and will NOT be committed to git
+
+ # LLM API Keys (optional; leave empty to use extractive summaries)
+ GEMINI_API_KEY=your_gemini_api_key_here
+ OPENAI_API_KEY=your_openai_api_key_here
+
+ # Embedding Model (optional; defaults to multilingual)
+ EMBEDDING_MODEL=sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
+
+ # Data and Index Paths (optional; defaults to repo root)
+ CSV_PATH=./Feedback.csv
+ VECTOR_INDEX_PATH=./.vector_index/faiss.index
+ VECTOR_METADATA_PATH=./.vector_index/meta.parquet
+
+ # Server Configuration (optional)
+ SERVER_HOST=0.0.0.0
+ SERVER_PORT=8000
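For illustration, here is a minimal sketch of resolving these variables with the standard library's `os.getenv`. The variable names and defaults are taken from `.env.example` above; the app's real loader presumably lives in `app/config.py`, so this is only an approximation of the behavior, not the project's actual code:

```python
import os

# Fall back to the documented defaults when a key is absent from the environment.
# Names mirror .env.example; this is a sketch, not the app's actual config loader.
embedding_model = os.getenv(
    "EMBEDDING_MODEL",
    "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
)
csv_path = os.getenv("CSV_PATH", "./Feedback.csv")
server_port = int(os.getenv("SERVER_PORT", "8000"))

print(embedding_model, csv_path, server_port)
```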
QUICK_START.md ADDED
@@ -0,0 +1,289 @@
+ # Quick Start - Local Development Guide
+
+ This guide shows you how to run the Feedback Analysis RAG Agent locally, test all endpoints, and prepare it for Runpod deployment. Everything works locally first, before any cloud deployment.
+
+ ## Prerequisites
+
+ - **Python 3.10+** (verify with `python3 --version`)
+ - **Git** (already installed)
+ - **Terminal/Command line** access
+ - **4GB+ RAM** recommended
+ - **~2GB free disk space** for models (first time only)
+
+ ## Step 1: Install Dependencies
+
+ Clone the repo (if not already done):
+ ```bash
+ git clone https://github.com/galbendavids/Feedback_Analysis_RAG_Agent_runpod.git
+ cd Feedback_Analysis_RAG_Agent_runpod
+ ```
+
+ Create and activate a virtual environment:
+ ```bash
+ python3 -m venv .venv
+ source .venv/bin/activate  # On Windows: .venv\Scripts\activate
+ ```
+
+ Install all required packages:
+ ```bash
+ pip install --upgrade pip
+ pip install -r requirements.txt
+ ```
+
+ **Note:** The first install may take 5-10 minutes because the models are large. Subsequent installs are faster.
+
+ ## Step 2: Prepare Environment Variables (Optional)
+
+ Copy the example environment file:
+ ```bash
+ cp .env.example .env
+ ```
+
+ Edit `.env` if you have LLM API keys (optional):
+ ```bash
+ # Edit .env with your editor
+ GEMINI_API_KEY=your_key_here   # Optional
+ OPENAI_API_KEY=sk-...          # Optional
+ ```
+
+ If you don't have API keys, the system falls back to extractive summaries (it still works fine).
+
+ ## Step 3: Validate Everything Works
+
+ Before starting the server, run the validation harness (it checks all components):
+ ```bash
+ python3 scripts/validate_local.py
+ ```
+
+ Expected output when all is OK:
+ ```
+ ============================================================
+ VALIDATION SUMMARY
+ ============================================================
+
+ [PASS] Dependencies
+ [PASS] CSV file
+ [PASS] FAISS Index
+ [PASS] App imports
+ [PASS] Analysis logic
+ [PASS] RAGService
+ [PASS] API endpoints
+
+ ------------------------------------------------------------
+ All 7 checks PASSED! Ready for local testing.
+ ```
+
+ If any checks fail, the script will tell you exactly what to fix.
+
+ ## Step 4: Start the Local Server
+
+ Run the API server:
+ ```bash
+ python3 run.py
+ ```
+
+ Expected output:
+ ```
+ INFO:     Uvicorn running on http://0.0.0.0:8000
+ Press CTRL+C to quit
+ ```
+
+ The server is now running and ready to accept requests!
+
+ ## Step 5: Test the API - Three Options
+
+ ### Option A: Interactive Swagger UI (Easiest)
+
+ Open your browser:
+ - http://localhost:8000/docs
+
+ Click on any endpoint, fill in the JSON, and click "Try it out". You'll see responses in real time.
+
+ ### Option B: curl Commands (Terminal)
+
+ In a new terminal window (keep the server running), try these:
+
+ **Health check:**
+ ```bash
+ curl -X POST http://localhost:8000/health
+ ```
+
+ **Count query (Hebrew):**
+ ```bash
+ curl -X POST http://localhost:8000/query \
+   -H "Content-Type: application/json" \
+   -d '{"query":"כמה משתמשים כתבו תודה","top_k":5}'
+ ```
+
+ **Complaint query:**
+ ```bash
+ curl -X POST http://localhost:8000/query \
+   -H "Content-Type: application/json" \
+   -d '{"query":"כמה משתמשים מתלוננים על אלמנטים שלא עובדים להם במערכת","top_k":5}'
+ ```
+
+ **Extract topics:**
+ ```bash
+ curl -X POST http://localhost:8000/topics \
+   -H "Content-Type: application/json" \
+   -d '{"num_topics":5}'
+ ```
+
+ **Analyze sentiment:**
+ ```bash
+ curl -X POST http://localhost:8000/sentiment \
+   -H "Content-Type: application/json" \
+   -d '{"limit":100}'
+ ```
+
+ **Build/rebuild index:**
+ ```bash
+ curl -X POST http://localhost:8000/ingest
+ ```
+
+ ### Option C: Python Client
+
+ Create a file `test_api.py`:
+ ```python
+ import requests
+
+ BASE_URL = "http://localhost:8000"
+
+ # Test health
+ print("Testing /health...")
+ resp = requests.post(f"{BASE_URL}/health")
+ print(f"Status: {resp.status_code}")
+ print(f"Response: {resp.json()}\n")
+
+ # Test query
+ print("Testing /query...")
+ query_data = {
+     "query": "כמה משתמשים כתבו תודה",
+     "top_k": 5
+ }
+ resp = requests.post(f"{BASE_URL}/query", json=query_data)
+ print(f"Status: {resp.status_code}")
+ result = resp.json()
+ print(f"Summary: {result.get('summary', 'N/A')}\n")
+
+ # Test topics
+ print("Testing /topics...")
+ topics_data = {"num_topics": 5}
+ resp = requests.post(f"{BASE_URL}/topics", json=topics_data)
+ print(f"Status: {resp.status_code}")
+ result = resp.json()
+ print(f"Found {len(result.get('topics', {}))} topics\n")
+
+ print("✓ All basic tests completed!")
+ ```
+
+ Run it:
+ ```bash
+ python3 test_api.py
+ ```
+
+ ## API Endpoints Reference
+
+ All endpoints use **POST** with JSON bodies:
+
+ | Endpoint | Body | Purpose |
+ |----------|------|---------|
+ | `/health` | `{}` | Check server status |
+ | `/query` | `{"query":"...", "top_k":5}` | Search/analyze feedback |
+ | `/topics` | `{"num_topics":5}` | Extract main topics |
+ | `/sentiment` | `{"limit":100}` | Analyze sentiment |
+ | `/ingest` | `{}` | Rebuild FAISS index (slow, one-time) |
+
+ ## Troubleshooting
+
+ ### Q: Server won't start
+ ```
+ ModuleNotFoundError: No module named 'xxx'
+ ```
+ **Fix:** Activate the venv and reinstall:
+ ```bash
+ source .venv/bin/activate
+ pip install -r requirements.txt
+ ```
+
+ ### Q: First request takes forever
+ This is normal! The first request downloads and caches embedding models (~500MB). Subsequent requests are fast.
+ **Fix:** Just wait, or use pre-downloaded models (see the advanced section).
+
+ ### Q: Can't find index
+ ```
+ FileNotFoundError: Vector index not found
+ ```
+ **Fix:** Run `/ingest` once:
+ ```bash
+ curl -X POST http://localhost:8000/ingest
+ ```
+
+ ### Q: I get a JSON parsing error
+ Make sure you're sending proper JSON with `-H "Content-Type: application/json"`.
+
+ ### Q: Responses are in English but I want Hebrew
+ The API auto-detects the query language and responds in the same language.
+
+ ## Project Structure (Reference)
+
+ ```
+ .
+ ├── app/                      # Main application code
+ │   ├── api.py                # FastAPI endpoints
+ │   ├── rag_service.py        # RAG logic
+ │   ├── analysis.py           # Query intent detection
+ │   ├── embedding.py          # Text embeddings
+ │   ├── vector_store.py       # FAISS wrapper
+ │   ├── sentiment.py          # Sentiment analysis
+ │   ├── preprocess.py         # Text preprocessing
+ │   ├── data_loader.py        # CSV loading
+ │   ├── topics.py             # Topic clustering
+ │   └── config.py             # Configuration
+ ├── scripts/
+ │   ├── validate_local.py     # Validation harness (Step 3)
+ │   ├── test_queries.py       # Manual query testing
+ │   └── precompute_index.py   # Build index offline
+ ├── Feedback.csv              # Sample feedback data
+ ├── Dockerfile                # Container definition
+ ├── docker-compose.yml        # Docker compose (local dev)
+ ├── requirements.txt          # Python dependencies
+ ├── run.py                    # Server entrypoint
+ └── README.md                 # Full documentation
+ ```
+
+ ## Advanced: Pre-compute Index Offline
+
+ If you want to avoid waiting for embedding downloads on the first request:
+
+ ```bash
+ python3 scripts/precompute_index.py
+ ```
+
+ This creates `.vector_index/faiss.index` and `.vector_index/meta.parquet`. Subsequent server starts will use this cached index.
+
+ ## Deploy to Runpod
+
+ Once local testing is done, follow the **README.md** section "Run on Runpod - Full guide" to:
+ 1. Tag and push the Docker image
+ 2. Create a Runpod template
+ 3. Deploy the endpoint
+ 4. Test on the cloud
+
+ The entire cloud deployment keeps all your code unchanged — it just uses your built Docker image.
+
+ ## Getting Help
+
+ - **API docs (interactive):** http://localhost:8000/docs
+ - **Full documentation:** See README.md
+ - **Config reference:** See app/config.py
+
+ ## Next Steps
+
+ 1. ✅ Validate with: `python3 scripts/validate_local.py`
+ 2. ✅ Start server: `python3 run.py`
+ 3. ✅ Test endpoints using Swagger UI or curl
+ 4. ✅ When happy, deploy to Runpod using the README.md instructions
+
+ Good luck! 🚀
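The Option C client in the guide above relies on the third-party `requests` package. If `requests` is unavailable, a stdlib-only client built on `urllib.request` works the same way; this sketch follows the endpoint table and assumes the server from Step 4 is listening on localhost:8000:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # matches the SERVER_HOST/SERVER_PORT defaults


def post(path: str, body: dict) -> dict:
    """POST a JSON body to the local API and decode the JSON response."""
    req = urllib.request.Request(
        f"{BASE_URL}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires the server to be running; same calls as the curl examples.
    print(post("/health", {}))
    print(post("/query", {"query": "כמה משתמשים כתבו תודה", "top_k": 5}))
```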
app/api.py CHANGED
@@ -5,6 +5,7 @@ from typing import List, Optional, Dict, Any
 import numpy as np
 import pandas as pd
 from fastapi import FastAPI, Query
+from fastapi.responses import ORJSONResponse
 from pydantic import BaseModel
 
 from .config import settings
@@ -16,7 +17,7 @@ from .topics import kmeans_topics
 from .vector_store import FaissVectorStore
 
 
-app = FastAPI(title="Feedback Analysis RAG Agent", version="1.0.0", default_response_class=None)
+app = FastAPI(title="Feedback Analysis RAG Agent", version="1.0.0", default_response_class=ORJSONResponse)
 svc = RAGService()
 embedder = svc.embedder
 
@@ -64,10 +65,10 @@ def query(req: QueryRequest) -> QueryResponse:
         summary=out.summary,
         results=[
             {
-                "score": r.score,
-                "service": r.row.get(settings.service_column, ""),
-                "level": r.row.get(settings.level_column, ""),
-                "text": r.row.get(settings.text_column, ""),
+                "score": float(r.score),  # Convert numpy float to Python float
+                "service": str(r.row.get(settings.service_column, "")),
+                "level": str(r.row.get(settings.level_column, "")),
+                "text": str(r.row.get(settings.text_column, "")),
             }
             for r in out.results
         ],
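The `float()`/`str()` casts in this change matter because FAISS similarity scores come back as numpy scalars, which Python's standard `json` encoder rejects (orjson can serialize numpy types only with an opt-in flag, so casting keeps the payload encoder-agnostic). A small demonstration, assuming numpy is installed:

```python
import json

import numpy as np

score = np.float32(0.87)  # similarity scores from FAISS are numpy scalars

# The stock json encoder refuses numpy types...
try:
    json.dumps({"score": score})
except TypeError as exc:
    print(f"not serializable: {exc}")

# ...so the endpoint casts to built-in types before building the response.
print(json.dumps({"score": float(score)}))
```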
app/sentiment.py CHANGED
@@ -16,8 +16,29 @@ from transformers import pipeline  # type: ignore
 
 @lru_cache(maxsize=1)
 def get_sentiment_pipeline():
-    # Multilingual sentiment model
-    return pipeline("sentiment-analysis", model="cardiffnlp/twitter-xlm-roberta-base-sentiment")
+    """Load sentiment analysis pipeline with fallback options."""
+    import os
+    os.environ['TOKENIZERS_PARALLELISM'] = 'false'
+
+    try:
+        # Multilingual BERT sentiment model (covers Hebrew among other languages)
+        return pipeline(
+            "sentiment-analysis",
+            model="nlptown/bert-base-multilingual-uncased-sentiment",
+            use_fast=False
+        )
+    except Exception as e1:
+        try:
+            # Fallback to a simpler model
+            return pipeline("text-classification", model="gpt2", use_fast=False)
+        except Exception as e2:
+            # Final fallback: return a mock pipeline for development
+            import warnings
+            warnings.warn(f"Could not load sentiment models: {e1}, {e2}. Using mock pipeline.")
+            class MockPipeline:
+                def __call__(self, texts, **kwargs):
+                    return [{"label": "NEUTRAL", "score": 0.5} for _ in texts]
+            return MockPipeline()
 
 
 def analyze_sentiments(texts: List[str]) -> List[Dict[str, float | str]]:
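The `MockPipeline` fallback added above keeps downstream callers working by mimicking the list-of-dicts shape a transformers pipeline returns. In isolation the stand-in behaves like this (pure Python, no models needed):

```python
# Stand-in with the same call signature and output shape as the real pipeline.
class MockPipeline:
    def __call__(self, texts, **kwargs):
        return [{"label": "NEUTRAL", "score": 0.5} for _ in texts]

pipe = MockPipeline()
results = pipe(["great service", "nothing works for me"])
print(results)  # one NEUTRAL entry per input text
```

Because the shape matches, `analyze_sentiments` does not need to know whether it got a real model or the mock.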
requirements.txt CHANGED
@@ -14,4 +14,5 @@ pydantic==2.9.2
 orjson==3.10.7
 google-generativeai==0.6.0
 pyarrow==14.0.2
+tiktoken==0.7.0
 
scripts/validate_local.py ADDED
@@ -0,0 +1,314 @@
+ """Complete validation and testing harness for local development.
+
+ This script:
+ 1. Checks dependencies
+ 2. Validates the CSV and index
+ 3. Tests all API endpoints
+ 4. Provides clear pass/fail feedback
+
+ Run this BEFORE testing manually to ensure everything works correctly.
+ """
+
+ from __future__ import annotations
+
+ import sys
+ import time
+ from pathlib import Path
+
+ # Color codes for terminal output
+ GREEN = "\033[92m"
+ RED = "\033[91m"
+ YELLOW = "\033[93m"
+ BLUE = "\033[94m"
+ RESET = "\033[0m"
+
+
+ def print_status(message: str, status: str = "INFO") -> None:
+     """Print colored status messages."""
+     colors = {
+         "PASS": GREEN,
+         "FAIL": RED,
+         "WARN": YELLOW,
+         "INFO": BLUE,
+     }
+     color = colors.get(status, RESET)
+     print(f"{color}[{status}]{RESET} {message}")
+
+
+ def check_dependencies() -> bool:
+     """Verify all required packages are installed."""
+     print_status("Checking dependencies...", "INFO")
+     required = [
+         ("pandas", "pandas"),
+         ("fastapi", "fastapi"),
+         ("pydantic", "pydantic"),
+         ("sentence_transformers", "sentence_transformers"),
+         ("transformers", "transformers"),
+         ("faiss", "faiss"),
+         ("numpy", "numpy"),
+     ]
+
+     missing = []
+     for pkg_name, import_name in required:
+         try:
+             __import__(import_name)
+             print_status(f"✓ {pkg_name}", "PASS")
+         except ImportError:
+             print_status(f"✗ {pkg_name} NOT FOUND", "FAIL")
+             missing.append(pkg_name)
+
+     if missing:
+         print_status(
+             f"Missing packages: {', '.join(missing)}. "
+             "Run: pip install -r requirements.txt",
+             "FAIL"
+         )
+         return False
+     return True
+
+
+ def check_csv() -> bool:
+     """Verify CSV exists and has required columns."""
+     print_status("Checking CSV...", "INFO")
+     csv_path = Path("Feedback.csv")
+
+     if not csv_path.exists():
+         print_status(f"CSV not found at {csv_path}", "FAIL")
+         return False
+
+     try:
+         import pandas as pd
+         df = pd.read_csv(csv_path)
+         required_cols = ["ID", "ServiceName", "Level", "Text"]
+         missing_cols = [c for c in required_cols if c not in df.columns]
+
+         if missing_cols:
+             print_status(f"Missing columns: {missing_cols}", "FAIL")
+             return False
+
+         print_status(f"✓ CSV valid: {len(df)} rows, {len(df.columns)} columns", "PASS")
+         return True
+     except Exception as e:
+         print_status(f"Error reading CSV: {e}", "FAIL")
+         return False
+
+
+ def check_index() -> bool:
+     """Verify FAISS index is precomputed."""
+     print_status("Checking FAISS index...", "INFO")
+
+     index_path = Path(".vector_index/faiss.index")
+     meta_path = Path(".vector_index/meta.parquet")
+
+     if not index_path.exists():
+         print_status(
+             f"Index not found at {index_path}. "
+             "Run: python scripts/precompute_index.py",
+             "WARN"
+         )
+         return False
+
+     if not meta_path.exists():
+         print_status(f"Metadata not found at {meta_path}", "FAIL")
+         return False
+
+     try:
+         index_size = index_path.stat().st_size / (1024 * 1024)  # MB
+         print_status(f"✓ Index found ({index_size:.1f} MB)", "PASS")
+         return True
+     except Exception as e:
+         print_status(f"Error checking index: {e}", "FAIL")
+         return False
+
+
+ def test_imports() -> bool:
+     """Test that all app modules import correctly."""
+     print_status("Testing app imports...", "INFO")
+
+     try:
+         from app.config import settings
+         from app.data_loader import load_feedback
+         from app.analysis import detect_query_type, resolve_count_from_type
+         from app.rag_service import RAGService
+         from app.api import app
+
+         print_status("✓ All imports successful", "PASS")
+         return True
+     except Exception as e:
+         print_status(f"Import error: {e}", "FAIL")
+         return False
+
+
+ def test_analysis_logic() -> bool:
+     """Test query analysis and counting logic (no embeddings needed)."""
+     print_status("Testing analysis logic (lightweight)...", "INFO")
+
+     try:
+         from app.data_loader import load_feedback
+         from app.analysis import detect_query_type, resolve_count_from_type
+
+         df = load_feedback()
+
+         # Test 1: Count thanks
+         qtype, target = detect_query_type("כמה משתמשים כתבו תודה")
+         result = resolve_count_from_type(df, qtype, target)
+         assert result["type"] == "count"
+         thanks_count = result["count"]
+         print_status(f"✓ Thanks count: {thanks_count}", "PASS")
+
+         # Test 2: Count complaints
+         qtype, target = detect_query_type("כמה משתמשים מתלוננים על אלמנטים שלא עובדים")
+         result = resolve_count_from_type(df, qtype, target)
+         assert result["type"] == "count"
+         complaint_count = result["count"]
+         print_status(f"✓ Complaint count: {complaint_count}", "PASS")
+
+         return True
+     except Exception as e:
+         print_status(f"Analysis test error: {e}", "FAIL")
+         return False
+
+
+ def test_rag_service() -> bool:
+     """Test RAGService with precomputed index."""
+     print_status("Testing RAGService...", "INFO")
+
+     try:
+         from app.rag_service import RAGService
+
+         svc = RAGService()
+         print_status("✓ RAGService initialized", "PASS")
+
+         # Test query (should use precomputed index)
+         result = svc.answer("כמה משתמשים כתבו תודה", top_k=3)
+
+         if result.summary:
+             print_status(f"✓ Query response: {result.summary[:60]}...", "PASS")
+         else:
+             print_status("Query returned empty summary", "WARN")
+
+         if result.results:
+             print_status(f"✓ Retrieved {len(result.results)} results", "PASS")
+         else:
+             print_status("No results retrieved (may be expected if index small)", "WARN")
+
+         return True
+     except Exception as e:
+         print_status(f"RAGService error: {e}", "FAIL")
+         return False
+
+
+ def test_api_endpoints() -> bool:
+     """Test FastAPI endpoints locally."""
+     print_status("Testing API endpoints...", "INFO")
+
+     try:
+         from fastapi.testclient import TestClient
+         from app.api import app
+
+         client = TestClient(app)
+
+         # Test /health
+         resp = client.post("/health")
+         assert resp.status_code == 200, f"Health check failed: {resp.status_code}"
+         print_status("✓ POST /health works", "PASS")
+
+         # Test /query
+         resp = client.post("/query", json={"query": "כמה משתמשים כתבו תודה", "top_k": 3})
+         assert resp.status_code == 200, f"Query failed: {resp.status_code}"
+         data = resp.json()
+         assert "summary" in data, "Query response missing summary"
+         print_status(f"✓ POST /query works (summary: {data['summary'][:50]}...)", "PASS")
+
+         # Test /topics
+         resp = client.post("/topics", json={"num_topics": 3})
+         assert resp.status_code == 200, f"Topics failed: {resp.status_code}"
+         data = resp.json()
+         assert "topics" in data, "Topics response missing topics"
+         print_status(f"✓ POST /topics works ({len(data.get('topics', {}))} topics)", "PASS")
+
+         # Test /sentiment
+         resp = client.post("/sentiment", json={"limit": 50})
+         assert resp.status_code == 200, f"Sentiment failed: {resp.status_code}"
+         data = resp.json()
+         assert "results" in data, "Sentiment response missing results"
+         print_status(f"✓ POST /sentiment works ({data['count']} results)", "PASS")
+
+         # Test /ingest (will try to rebuild index)
+         print_status("Testing /ingest (will rebuild index)...", "WARN")
+         start = time.time()
+         resp = client.post("/ingest")
+         elapsed = time.time() - start
+         assert resp.status_code == 200, f"Ingest failed: {resp.status_code}"
+         print_status(f"✓ POST /ingest works (took {elapsed:.1f}s)", "PASS")
+
+         return True
+     except Exception as e:
+         print_status(f"API test error: {e}", "FAIL")
+         import traceback
+         traceback.print_exc()
+         return False
+
+
+ def main() -> None:
+     """Run all validations."""
+     print(f"\n{BLUE}{'='*60}")
+     print("FEEDBACK ANALYSIS RAG AGENT - LOCAL VALIDATION")
+     print(f"{'='*60}{RESET}\n")
+
+     checks = [
+         ("Dependencies", check_dependencies),
+         ("CSV file", check_csv),
+         ("FAISS Index", check_index),
+         ("App imports", test_imports),
+         ("Analysis logic", test_analysis_logic),
+         ("RAGService", test_rag_service),
+         ("API endpoints", test_api_endpoints),
+     ]
+
+     results = []
+     for name, check_func in checks:
+         print(f"\n{name}:")
+         print("-" * 60)
+         try:
+             passed = check_func()
+             results.append((name, passed))
+         except Exception as e:
+             print_status(f"Unexpected error: {e}", "FAIL")
+             results.append((name, False))
+             import traceback
+             traceback.print_exc()
+
+     # Summary
+     print(f"\n{BLUE}{'='*60}")
+     print("VALIDATION SUMMARY")
+     print(f"{'='*60}{RESET}\n")
+
+     passed_count = sum(1 for _, p in results if p)
+     total_count = len(results)
+
+     for name, passed in results:
+         status = "PASS" if passed else "FAIL"
+         color = GREEN if passed else RED
+         print(f"{color}[{status}]{RESET} {name}")
+
+     print(f"\n{'-'*60}")
+     if passed_count == total_count:
+         print_status(f"All {total_count} checks PASSED! Ready for local testing.", "PASS")
+         print("\nNext steps:")
+         print("  1. Run: python run.py")
+         print("  2. Open: http://localhost:8000/docs")
+         print("  3. Or use curl (see QUICK_START.md)")
+         sys.exit(0)
+     else:
+         print_status(
+             f"{passed_count}/{total_count} checks passed. "
+             f"{total_count - passed_count} checks FAILED.",
+             "FAIL"
+         )
+         print("\nPlease fix the errors above before testing.")
+         sys.exit(1)
+
+
+ if __name__ == "__main__":
+     main()
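Stripped of colors and the real checks, the control flow of `main()` above reduces to a run-and-summarize loop. A self-contained sketch of that pattern, using stand-in checks instead of the script's real ones:

```python
# Minimal sketch of the harness's run-and-summarize loop, with stand-in checks.
def check_always_passes() -> bool:
    return True

def check_always_fails() -> bool:
    return False

checks = [
    ("Stand-in pass", check_always_passes),
    ("Stand-in fail", check_always_fails),
]

# Run every check, record (name, passed), then print one summary line per check.
results = [(name, fn()) for name, fn in checks]
passed = sum(1 for _, ok in results if ok)
for name, ok in results:
    print(f"[{'PASS' if ok else 'FAIL'}] {name}")
print(f"{passed}/{len(results)} checks passed")
```

Collecting results before summarizing (rather than exiting on the first failure) is what lets the script report every broken component in one run.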