USF00 committed on
Commit 4b5c25d · 0 Parent(s)

Initial deployment setup for Recommendation_Deploy

Files changed (11)
  1. .dockerignore +11 -0
  2. .gitattributes +1 -0
  3. .gitignore +7 -0
  4. Dockerfile +26 -0
  5. Final_Recommendation.ipynb +0 -0
  6. README.md +198 -0
  7. app.py +325 -0
  8. recommender.py +553 -0
  9. requirements.txt +13 -0
  10. sample_data/books.csv +201 -0
  11. utils.py +106 -0
.dockerignore ADDED
@@ -0,0 +1,11 @@
+ __pycache__/
+ *.pyc
+ *.pyo
+ .git/
+ .gitignore
+ *.ipynb
+ *.md
+ .env
+ venv/
+ .venv/
+ extracted_notebook.py
.gitattributes ADDED
@@ -0,0 +1 @@
+ * text=auto
.gitignore ADDED
@@ -0,0 +1,7 @@
+ __pycache__/
+ *.pyc
+ *.pyo
+ venv/
+ .venv/
+ .env
+ extracted_notebook.py
Dockerfile ADDED
@@ -0,0 +1,26 @@
+ FROM python:3.9-slim
+
+ ENV HF_HOME=/tmp/huggingface
+ ENV PYTHONUNBUFFERED=1
+
+ # Set up a new user named "user" with user ID 1000
+ RUN useradd -m -u 1000 user
+ USER user
+ ENV PATH="/home/user/.local/bin:$PATH"
+
+ # Set working directory
+ WORKDIR /app
+
+ # Copy requirements and install
+ COPY --chown=user requirements.txt .
+ RUN pip install --no-cache-dir --upgrade pip && \
+     pip install --no-cache-dir -r requirements.txt
+
+ # Copy project files
+ COPY --chown=user . .
+
+ # Expose port 7860 for Hugging Face Spaces
+ EXPOSE 7860
+
+ # Run uvicorn
+ CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
Final_Recommendation.ipynb ADDED
The diff for this file is too large to render.
README.md ADDED
@@ -0,0 +1,198 @@
+ ---
+ title: LITVISION Recommendation API
+ emoji: 📚
+ colorFrom: blue
+ colorTo: purple
+ sdk: docker
+ pinned: false
+ license: mit
+ ---
+
+ # LITVISION Book Recommendation API
+
+ A production-ready FastAPI service for the LITVISION Book Recommendation Feature. This API provides personalized book recommendations using zero-shot genre classification, SentenceTransformer embeddings, FAISS similarity search, and an event-weighted ranking pipeline.
+
+ Fully configured for deployment on Hugging Face Spaces with Docker SDK.
+
+ ## Features
+
+ - **Zero-Shot Genre Classification** using `joeddav/xlm-roberta-large-xnli`
+ - **SentenceTransformer Embeddings** using `paraphrase-multilingual-MiniLM-L12-v2`
+ - **FAISS Similarity Search** with `IndexFlatIP` for cosine similarity on normalized vectors
+ - **Event-Weighted User Profiling** with configurable view/like weights
+ - **Genre-Balanced Feed Allocation** for diverse recommendations
+ - **Cosine-Similarity Ranking Pipeline** for personalized ordering
+ - **GPU/CPU Fallback** with FP16 optimization on CUDA
+ - **Async Processing** via `asyncio.to_thread` for non-blocking inference
+ - **Production Error Handling** including CUDA OOM recovery
+
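Review note: the feature list pairs `IndexFlatIP` with "cosine similarity on normalized vectors". The reason this works is that the inner product of two unit-length vectors equals their cosine similarity. Below is a minimal pure-Python sketch of that identity; it does not use FAISS, and the helper names and toy vectors are illustrative assumptions rather than code from this commit.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]


def inner_product(a, b):
    """Plain dot product, the score an inner-product index ranks by."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Reference definition: dot(a, b) / (|a| * |b|)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return inner_product(a, b) / (norm_a * norm_b)


# Toy data (hypothetical): one query vector and three "book" vectors.
query = [3.0, 4.0]
books = {"A": [6.0, 8.0], "B": [4.0, -3.0], "C": [1.0, 0.0]}

# Rank by inner product of unit vectors, as an exhaustive inner-product search would.
unit_query = l2_normalize(query)
ranked = sorted(
    ((inner_product(unit_query, l2_normalize(vec)), name) for name, vec in books.items()),
    reverse=True,
)
```

Because both sides are normalized before scoring, the ranking is the same as ranking by cosine similarity on the raw vectors.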
+ ## API Endpoints
+
+ ### GET /
+
+ Returns basic API information.
+
+ ```json
+ {
+   "api": "LITVISION Book Recommendation API",
+   "status": "online",
+   "version": "1.0.0",
+   "endpoints": ["/health", "/recommend", "/version"]
+ }
+ ```
+
+ ### GET /health
+
+ Returns health status and model readiness.
+
+ ```json
+ {
+   "status": "healthy",
+   "models_loaded": true,
+   "device": "cuda",
+   "total_books": 200,
+   "faiss_index_size": 200
+ }
+ ```
+
+ ### POST /recommend
+
+ Generates personalized book recommendations for a user.
+
+ **Request Body:**
+
+ ```json
+ {
+   "user_id": 1,
+   "interactions": [
+     {
+       "book_id": 5,
+       "event_type": "like",
+       "timestamp": "2025-01-01T00:00:00"
+     },
+     {
+       "book_id": 12,
+       "event_type": "view",
+       "timestamp": "2025-01-02T00:00:00"
+     }
+   ],
+   "favorite_genres": ["Fantasy", "Science Fiction"],
+   "viewed_books": [1, 2, 3],
+   "feed_size": 20
+ }
+ ```
+
+ **Parameters:**
+
+ | Field | Type | Required | Default | Description |
+ |---|---|---|---|---|
+ | user_id | int | Yes | — | Unique user identifier (> 0) |
+ | interactions | list | No | null | Explicit user-book interaction events |
+ | favorite_genres | list | No | null | Preferred genres for boosting |
+ | viewed_books | list | No | null | Book IDs already viewed by the user |
+ | feed_size | int | No | 20 | Number of recommendations (1-100) |
+
+ **Valid Genres:**
+
+ Fantasy, Romance, Mystery, Science Fiction, Self-Help, History, Business, Children, Horror, Poetry
+
+ **Response:**
+
+ ```json
+ {
+   "success": true,
+   "user_id": 1,
+   "recommendations": [
+     {
+       "book_id": 42,
+       "title": "Fantasy Book 42",
+       "author": "Author 7",
+       "genre": "Fantasy",
+       "score": 0.9234
+     }
+   ],
+   "genre_distribution": {
+     "Fantasy": 8,
+     "Romance": 4,
+     "Mystery": 3,
+     "Science Fiction": 2,
+     "Self-Help": 1,
+     "History": 1,
+     "Children": 1
+   },
+   "total_recommendations": 20,
+   "processing_time_seconds": 1.234
+ }
+ ```
+
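Review note: the `genre_distribution` field comes from the proportional allocation in `allocate_feed` in `recommender.py`. The sketch below mirrors the visible part of that method on a shortened genre list. The tail of the leftover loop is not visible in this diff, so decrementing `remaining` and returning `alloc` are assumptions, as are the toy ratios and counts.

```python
GENRES = ["Fantasy", "Romance", "Mystery"]  # shortened list, for illustration only


def allocate_feed(ratios, unseen_counts, feed_size):
    """Split feed_size slots across genres in proportion to interest ratios."""
    alloc = {}
    remaining = feed_size
    for genre in GENRES:
        # Target share for the genre, capped by how many unseen books it has.
        target = int(round(ratios.get(genre, 0.0) * feed_size))
        alloc[genre] = min(target, unseen_counts.get(genre, 0))
        remaining -= alloc[genre]

    # Hand leftover slots to the highest-ratio genre that still has unseen books.
    while remaining > 0:
        candidates = [g for g in GENRES if alloc[g] < unseen_counts.get(g, 0)]
        if not candidates:
            break
        best = max(candidates, key=lambda g: ratios.get(g, 0.0))
        alloc[best] += 1
        remaining -= 1  # assumption: one slot is consumed per iteration
    return alloc


ratios = {"Fantasy": 0.6, "Romance": 0.3, "Mystery": 0.1}
unseen = {"Fantasy": 4, "Romance": 10, "Mystery": 10}
allocation = allocate_feed(ratios, unseen, feed_size=10)
```

Here Fantasy is capped at its 4 unseen books, so the 2 slots it could not fill go to Romance, the next-highest ratio that still has books available.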
+ ## Folder Structure
+
+ ```text
+ .
+ ├── app.py               # FastAPI endpoints and lifespan events
+ ├── recommender.py       # Full recommendation pipeline engine
+ ├── utils.py             # Logging, device helpers, and cleanup
+ ├── requirements.txt     # Python dependencies
+ ├── Dockerfile           # Container configuration for HF Spaces
+ ├── .dockerignore        # Docker build exclusions
+ ├── .gitignore           # Git exclusions
+ ├── .gitattributes       # Line ending configuration
+ ├── README.md            # This file
+ └── sample_data/
+     └── books.csv        # Sample book dataset (200 books)
+ ```
+
+ ## Local Development
+
+ ### 1. Install Python Dependencies
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### 2. Run the Server
+
+ ```bash
+ uvicorn app:app --host 0.0.0.0 --port 7860 --reload
+ ```
+
+ ### 3. Test with cURL
+
+ ```bash
+ curl -X POST http://localhost:7860/recommend \
+   -H "Content-Type: application/json" \
+   -d '{"user_id": 1, "feed_size": 10}'
+ ```
+
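Review note: the cURL call above can also be made from Python with only the standard library. The sketch builds the request without sending it, so it runs without a server. The helper name `build_recommend_request` is hypothetical, and the base URL is the one the README uses for local development.

```python
import json
import urllib.request


def build_recommend_request(base_url, payload):
    """Build (but do not send) a POST request for the /recommend endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/recommend",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_recommend_request(
    "http://localhost:7860",
    {"user_id": 1, "feed_size": 10},
)

# To send it against a running server:
# with urllib.request.urlopen(request, timeout=60) as response:
#     result = json.loads(response.read())
```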
+ ## Docker Build and Run
+
+ ### Build the Image
+
+ ```bash
+ docker build -t litvision-recommender .
+ ```
+
+ ### Run the Container
+
+ ```bash
+ docker run -p 7860:7860 litvision-recommender
+ ```
+
+ With GPU support:
+
+ ```bash
+ docker run -p 7860:7860 --gpus all litvision-recommender
+ ```
+
+ ## Deployment to Hugging Face Spaces
+
+ 1. Go to [Hugging Face](https://huggingface.co) and create a new Space.
+ 2. Select **Docker** as the Space SDK.
+ 3. Upload all the files in this directory to the repository.
+ 4. The Space will automatically build the container and start the Uvicorn server on port 7860.
+
+ ## Troubleshooting
+
+ - **Models loading slowly:** The first startup downloads `xlm-roberta-large-xnli` (~2.2 GB) and `paraphrase-multilingual-MiniLM-L12-v2`. Subsequent starts use the cached models.
+ - **CUDA OOM:** The API automatically clears CUDA cache and returns HTTP 503. Retry the request or reduce `feed_size`.
+ - **503 on first request:** Models may still be loading. Check `/health` endpoint for status.
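Review note: the "503 on first request" advice suggests a client should wait for `/health` to report `models_loaded` before calling `/recommend`. One way to do that is sketched below with the standard library. The function names, attempt count, and delay are assumptions, and the demonstration uses a stubbed fetch so it runs without a server.

```python
import json
import time
import urllib.request


def fetch_health(base_url):
    """GET /health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=10) as response:
        return json.loads(response.read())


def wait_until_ready(fetch, attempts=30, delay_seconds=2.0, sleep=time.sleep):
    """Poll until the health payload reports models_loaded; return the attempt number or None."""
    for attempt in range(1, attempts + 1):
        try:
            if fetch().get("models_loaded"):
                return attempt
        except OSError:
            pass  # server not accepting connections yet (URLError is an OSError)
        sleep(delay_seconds)
    return None


# Offline demonstration: a stubbed fetch that becomes ready on the third call.
responses = iter([
    {"status": "loading", "models_loaded": False},
    {"status": "loading", "models_loaded": False},
    {"status": "healthy", "models_loaded": True},
])
ready_after = wait_until_ready(lambda: next(responses), sleep=lambda _: None)
```

Against a real deployment the call would look like `wait_until_ready(lambda: fetch_health("http://localhost:7860"))`.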
app.py ADDED
@@ -0,0 +1,325 @@
+ """
+ LITVISION Recommendation API
+ ==============================
+ Production FastAPI application for personalized book recommendations.
+ Deployed on Hugging Face Spaces via Docker SDK.
+ """
+
+ import time
+ import logging
+ import asyncio
+ from contextlib import asynccontextmanager
+ from typing import Dict, List, Optional
+
+ import torch
+ from fastapi import FastAPI, HTTPException, Request, BackgroundTasks
+ from fastapi.middleware.cors import CORSMiddleware
+ from fastapi.responses import JSONResponse
+ from pydantic import BaseModel, Field, field_validator
+
+ from utils import setup_logging, safe_cuda_empty_cache, cleanup_temp_files, validate_positive_int
+ from recommender import engine, GENRES
+
+ # ---------------------------------------------------------------------------
+ # Logging
+ # ---------------------------------------------------------------------------
+
+ setup_logging()
+ logger = logging.getLogger("litvision.recommendation")
+
+ # ---------------------------------------------------------------------------
+ # Pydantic models
+ # ---------------------------------------------------------------------------
+
+ class Interaction(BaseModel):
+     """A single user–book interaction event."""
+     book_id: int = Field(..., gt=0, description="ID of the book")
+     event_type: str = Field(
+         default="view",
+         description="Interaction type: 'view' or 'like'",
+     )
+     timestamp: Optional[str] = Field(
+         default=None,
+         description="ISO-8601 timestamp of the interaction",
+     )
+
+     @field_validator("event_type")
+     @classmethod
+     def validate_event_type(cls, v: str) -> str:
+         allowed = {"view", "like"}
+         if v not in allowed:
+             raise ValueError(f"event_type must be one of {allowed}, got '{v}'")
+         return v
+
+
+ class RecommendRequest(BaseModel):
+     """Payload for POST /recommend."""
+     user_id: int = Field(..., gt=0, description="Unique user identifier")
+     interactions: Optional[List[Interaction]] = Field(
+         default=None,
+         description="List of explicit user–book interactions",
+     )
+     favorite_genres: Optional[List[str]] = Field(
+         default=None,
+         description="User's preferred genres",
+     )
+     viewed_books: Optional[List[int]] = Field(
+         default=None,
+         description="IDs of books the user has already viewed",
+     )
+     feed_size: int = Field(
+         default=20,
+         ge=1,
+         le=100,
+         description="Number of recommendations to return (1-100)",
+     )
+
+     @field_validator("favorite_genres")
+     @classmethod
+     def validate_genres(cls, v: Optional[List[str]]) -> Optional[List[str]]:
+         if v is not None:
+             invalid = [g for g in v if g not in GENRES]
+             if invalid:
+                 raise ValueError(
+                     f"Invalid genres: {invalid}. Valid genres: {GENRES}"
+                 )
+         return v
+
+     @field_validator("viewed_books")
+     @classmethod
+     def validate_viewed_books(cls, v: Optional[List[int]]) -> Optional[List[int]]:
+         if v is not None:
+             for bid in v:
+                 if bid < 1:
+                     raise ValueError(f"viewed_books IDs must be positive, got {bid}")
+         return v
+
+
+ class BookResponse(BaseModel):
+     """A single recommended book."""
+     book_id: int
+     title: str
+     author: str
+     genre: str
+     score: Optional[float] = None
+
+
+ class RecommendResponse(BaseModel):
+     """Response from POST /recommend."""
+     success: bool
+     user_id: int
+     recommendations: List[BookResponse]
+     genre_distribution: Dict[str, int]
+     total_recommendations: int
+     processing_time_seconds: float
+
+
+ class RootResponse(BaseModel):
+     api: str
+     status: str
+     version: str
+     endpoints: List[str]
+
+
+ class HealthResponse(BaseModel):
+     status: str
+     models_loaded: bool
+     device: str
+     total_books: int
+     faiss_index_size: int
+
+
+ class VersionResponse(BaseModel):
+     service: str
+     version: str
+
+
+ # ---------------------------------------------------------------------------
+ # Application lifespan
+ # ---------------------------------------------------------------------------
+
+ @asynccontextmanager
+ async def lifespan(app: FastAPI):
+     """Startup: load models. Shutdown: cleanup caches."""
+     logger.info("API starting...")
+     try:
+         logger.info("Loading recommendation engine...")
+         await asyncio.to_thread(engine.load_models)
+         logger.info("Recommendation engine loaded successfully")
+         logger.info("Server ready")
+     except Exception as exc:
+         logger.error(f"Failed to load models on startup: {exc}", exc_info=True)
+         # Allow the app to start anyway so /health can report the issue
+     yield
+     logger.info("Shutting down — cleaning up …")
+     cleanup_temp_files()
+     safe_cuda_empty_cache()
+     logger.info("Shutdown complete")
+
+
+ # ---------------------------------------------------------------------------
+ # FastAPI app
+ # ---------------------------------------------------------------------------
+
+ app = FastAPI(
+     title="LITVISION Book Recommendation API",
+     description=(
+         "AI-powered personalized book recommendation service using "
+         "zero-shot classification, SentenceTransformer embeddings, "
+         "and FAISS similarity search."
+     ),
+     version="1.0.0",
+     lifespan=lifespan,
+ )
+
+ app.add_middleware(
+     CORSMiddleware,
+     allow_origins=["*"],
+     allow_credentials=True,
+     allow_methods=["*"],
+     allow_headers=["*"],
+ )
+
+ # ---------------------------------------------------------------------------
+ # Endpoints
+ # ---------------------------------------------------------------------------
+
+
+ @app.get("/", response_model=RootResponse, tags=["Recommendation"])
+ async def root():
+     """Basic API information."""
+     return {
+         "api": "LITVISION Book Recommendation API",
+         "status": "online",
+         "version": "1.0.0",
+         "endpoints": ["/health", "/recommend", "/version"],
+     }
+
+
+ @app.get("/health", response_model=HealthResponse, tags=["Recommendation"])
+ async def health():
+     """Health check — reports model readiness and device info."""
+     return {
+         "status": "healthy" if engine._loaded else "loading",
+         "models_loaded": engine._loaded,
+         "device": engine.device,
+         "total_books": len(engine.books_df) if engine.books_df is not None else 0,
+         "faiss_index_size": engine.faiss_index.ntotal if engine.faiss_index else 0,
+     }
+
+ @app.get("/version", response_model=VersionResponse, tags=["Recommendation"])
+ async def version():
+     """Return API version information."""
+     return {
+         "service": "LITVISION Recommendation API",
+         "version": "1.0.0"
+     }
+
+
+ @app.post("/recommend", response_model=RecommendResponse, tags=["Recommendation"])
+ async def recommend(request: RecommendRequest, background_tasks: BackgroundTasks):
+     """
+     Generate personalized book recommendations.
+
+     Uses the full notebook pipeline:
+     1. Build user interaction DataFrame from request payload
+     2. Compute genre interest ratios (event-weighted)
+     3. Genre-balanced feed allocation
+     4. Cosine-similarity ranking via user embedding vector
+     """
+     start_time = time.time()
+
+     # Guard: models must be loaded
+     if not engine._loaded:
+         raise HTTPException(
+             status_code=503,
+             detail="Models are still loading. Please retry in a few moments.",
+         )
+
+     try:
+         # 1. Build interactions DataFrame
+         interactions_dicts = None
+         if request.interactions:
+             interactions_dicts = [
+                 {
+                     "book_id": i.book_id,
+                     "event_type": i.event_type,
+                     "timestamp": i.timestamp,
+                 }
+                 for i in request.interactions
+             ]
+
+         interactions_df = await asyncio.to_thread(
+             engine.build_interactions_df,
+             request.user_id,
+             interactions_dicts,
+             request.viewed_books,
+             request.favorite_genres,
+         )
+
+         # 2. Generate recommendations (heavy — offloaded from event loop)
+         feed = await asyncio.wait_for(
+             asyncio.to_thread(
+                 engine.build_mixed_feed,
+                 request.user_id,
+                 interactions_df,
+                 request.feed_size,
+             ),
+             timeout=60.0
+         )
+
+         # 3. Build response
+         recommendations: List[BookResponse] = []
+         for _, row in feed.iterrows():
+             recommendations.append(
+                 BookResponse(
+                     book_id=int(row["book_id"]),
+                     title=str(row["title"]),
+                     author=str(row["author"]),
+                     genre=str(row["genre"]),
+                     score=round(float(row["score"]), 4) if "score" in row.index else None,
+                 )
+             )
+
+         genre_dist = feed["genre"].value_counts().to_dict()
+         elapsed = round(time.time() - start_time, 3)
+
+         logger.info(
+             f"Recommendation for user {request.user_id}: "
+             f"{len(recommendations)} books in {elapsed}s"
+         )
+
+         return RecommendResponse(
+             success=True,
+             user_id=request.user_id,
+             recommendations=recommendations,
+             genre_distribution=genre_dist,
+             total_recommendations=len(recommendations),
+             processing_time_seconds=elapsed,
+         )
+
+     except asyncio.TimeoutError:
+         # Mapped here, not inside the try body, so `except Exception` below cannot turn the 504 into a 500.
+         raise HTTPException(status_code=504, detail="Request processing timed out.")
+     except torch.cuda.OutOfMemoryError as exc:
+         safe_cuda_empty_cache()
+         logger.error(f"CUDA OOM during recommendation: {exc}")
+         raise HTTPException(
+             status_code=503,
+             detail="GPU out of memory. CUDA cache cleared — please retry.",
+         )
+     except ValueError as exc:
+         logger.warning(f"Validation error: {exc}")
+         raise HTTPException(status_code=400, detail=str(exc))
+     except Exception as exc:
+         logger.error(f"Recommendation error: {exc}", exc_info=True)
+         error_msg = str(exc).lower()
+         if "out of memory" in error_msg:
+             safe_cuda_empty_cache()
+             raise HTTPException(
+                 status_code=503,
+                 detail="Out of memory. Cache cleared — please retry.",
+             )
+         raise HTTPException(status_code=500, detail=f"Internal error: {exc}")
+     finally:
+         background_tasks.add_task(cleanup_temp_files)
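Review note: the `/recommend` handler wraps `asyncio.to_thread` in `asyncio.wait_for` with a 60-second timeout. The sketch below isolates that pattern with a stand-in blocking function; `slow_job` and the short durations are illustrative assumptions. The timeout only stops the handler from waiting. The worker thread itself is not interrupted and keeps running until the blocking call returns.

```python
import asyncio
import time


def slow_job(seconds):
    """Stand-in for a blocking inference call."""
    time.sleep(seconds)
    return "done"


async def run_with_timeout(seconds, timeout):
    """Run the blocking job in a worker thread, giving up after `timeout` seconds."""
    try:
        return await asyncio.wait_for(
            asyncio.to_thread(slow_job, seconds),
            timeout=timeout,
        )
    except asyncio.TimeoutError:
        # The awaiting coroutine is released; the thread still finishes in the background.
        return "timed out"


fast = asyncio.run(run_with_timeout(0.01, timeout=1.0))
slow = asyncio.run(run_with_timeout(0.3, timeout=0.05))
```

If timeouts are frequent under load, the threads that are still running continue to hold CPU or GPU resources. A concurrency limit in front of the endpoint may be worth considering.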
recommender.py ADDED
@@ -0,0 +1,553 @@
+ """
+ LITVISION Recommendation Engine
+ ================================
+ Complete recommendation pipeline preserving every component from
+ the original Jupyter Notebook:
+
+ • Zero-shot genre classification (joeddav/xlm-roberta-large-xnli)
+ • SentenceTransformer embeddings (paraphrase-multilingual-MiniLM-L12-v2)
+ • FAISS IndexFlatIP similarity search
+ • Event-weighted user vector construction
+ • Proportional genre-balanced feed allocation
+ • Cosine-similarity ranking pipeline
+
+ NO logic has been simplified, removed, or replaced.
+ """
+
+ import os
+ import logging
+ from datetime import datetime, timedelta
+ from typing import Dict, List, Optional, Set, Tuple
+
+ import numpy as np
+ import pandas as pd
+ import torch
+ import faiss
+ from transformers import pipeline as hf_pipeline
+ from sentence_transformers import SentenceTransformer
+
+ from utils import get_device, safe_cuda_empty_cache
+
+ logger = logging.getLogger("litvision.recommendation")
+
+ # ═══════════════════════════════════════════════════════════════════════════
+ # Constants — identical to the notebook
+ # ═══════════════════════════════════════════════════════════════════════════
+
+ GENRES: List[str] = [
+     "Fantasy", "Romance", "Mystery", "Science Fiction", "Self-Help",
+     "History", "Business", "Children", "Horror", "Poetry",
+ ]
+
+ EVENT_W: Dict[str, float] = {"view": 1.0, "like": 3.0}
+
+ TEMPLATES: Dict[str, List[str]] = {
+     "Fantasy": [
+         "A young hero discovers a hidden kingdom and must defeat a dark sorcerer.",
+         "Dragons rise again as an ancient prophecy awakens in the north.",
+     ],
+     "Romance": [
+         "Two strangers meet in a small cafe and find love against all odds.",
+         "A long-distance relationship is tested by secrets and time.",
+     ],
+     "Mystery": [
+         "A detective investigates a series of murders in a quiet town.",
+         "A missing diary reveals clues to an old family crime.",
+     ],
+     "Science Fiction": [
+         "A crew travels through a wormhole to save humanity from collapse.",
+         "An AI gains consciousness and changes the future of Earth.",
+     ],
+     "Self-Help": [
+         "A practical guide to build habits and improve focus every day.",
+         "Learn to manage anxiety with simple routines and mindset shifts.",
+     ],
+     "History": [
+         "An account of ancient empires and the wars that shaped the world.",
+         "A deep dive into the political revolutions of the 20th century.",
+     ],
+     "Business": [
+         "How startups scale products and build strong teams.",
+         "Negotiation tactics and leadership strategies for managers.",
+     ],
+     "Children": [
+         "A curious cat explores the city and learns about friendship.",
+         "A magical school adventure for kids with puzzles and fun.",
+     ],
+     "Horror": [
+         "A haunted house whispers at night, luring visitors inside.",
+         "A village faces a terrifying creature in the woods.",
+     ],
+     "Poetry": [
+         "A collection of poems about love, loss, and hope.",
+         "Minimalist poems inspired by nature and silence.",
+     ],
+ }
+
+ SAMPLE_DATA_DIR = os.path.join(os.path.dirname(__file__), "sample_data")
+ BOOKS_CSV_PATH = os.path.join(SAMPLE_DATA_DIR, "books.csv")
+
+ # ═══════════════════════════════════════════════════════════════════════════
+ # Recommendation Engine
+ # ═══════════════════════════════════════════════════════════════════════════
+
+
+ class RecommendationEngine:
+     """
+     Production wrapper around the full notebook recommendation pipeline.
+     Models are loaded lazily on first call or explicitly via ``load_models()``.
+     """
+
+     def __init__(self) -> None:
+         self.device: str = "cpu"
+         self.embed_model: Optional[SentenceTransformer] = None
+         self.zero_shot = None
+         self.books_df: Optional[pd.DataFrame] = None
+         self.book_embeddings: Optional[np.ndarray] = None
+         self.faiss_index: Optional[faiss.IndexFlatIP] = None
+         self.bookid_to_idx: Dict[int, int] = {}
+         self._loaded = False
+         # Per-user feed state (identical to notebook)
+         self.user_feed_state: Dict[int, Set[int]] = {}
+
+     # ------------------------------------------------------------------
+     # Model loading
+     # ------------------------------------------------------------------
+
+     def load_models(self) -> None:
+         """Load all AI models and build the FAISS index."""
+         if self._loaded:
+             return
+
+         self.device = get_device()
+         logger.info("Loading recommendation models …")
+
+         # 1. Zero-shot classifier — identical model to notebook
+         logger.info("Loading zero-shot classifier: joeddav/xlm-roberta-large-xnli")
+         zs_device = 0 if self.device == "cuda" else -1
+         self.zero_shot = hf_pipeline(
+             "zero-shot-classification",
+             model="joeddav/xlm-roberta-large-xnli",
+             device=zs_device,
+         )
+
+         # FP16 on CUDA for the zero-shot model
+         if self.device == "cuda":
+             try:
+                 self.zero_shot.model.half()
+                 logger.info("Zero-shot model converted to FP16")
+             except Exception as e:
+                 logger.warning(f"Could not convert zero-shot to FP16: {e}")
+
+         # 2. SentenceTransformer — identical model to notebook
+         logger.info(
+             "Loading SentenceTransformer: "
+             "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
+         )
+         self.embed_model = SentenceTransformer(
+             "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
+             device=self.device,
+         )
+         if self.device == "cuda":
+             try:
+                 self.embed_model.half()
+                 logger.info("SentenceTransformer converted to FP16")
+             except Exception as e:
+                 logger.warning(f"Could not convert embed model to FP16: {e}")
+
+         # 3. Load / generate books dataset
+         self.books_df = self._load_books()
+
+         # 4. Classify genres (zero-shot) — identical to notebook
+         self._classify_genres()
+
+         # 5. Build embeddings + FAISS index — identical to notebook
+         self._build_index()
+
+         self._loaded = True
+         logger.info(
+             f"Recommendation engine ready — "
+             f"{len(self.books_df)} books, FAISS dim={self.faiss_index.d}"
+         )
+
+     # ------------------------------------------------------------------
+     # Books dataset
+     # ------------------------------------------------------------------
+
+     def _load_books(self) -> pd.DataFrame:
+         """Load books from CSV or generate the default sample set."""
+         if os.path.exists(BOOKS_CSV_PATH):
+             logger.info(f"Loading books from {BOOKS_CSV_PATH}")
+             df = pd.read_csv(BOOKS_CSV_PATH)
+             required = {"book_id", "title", "author", "description"}
+             if not required.issubset(set(df.columns)):
+                 raise ValueError(
+                     f"books.csv must contain columns {required}, "
+                     f"found {set(df.columns)}"
+                 )
+             return df
+
+         logger.info("No books.csv found — generating sample data (seed=42)")
+         df = self._make_books(200)
+         os.makedirs(SAMPLE_DATA_DIR, exist_ok=True)
+         df.to_csv(BOOKS_CSV_PATH, index=False)
+         logger.info(f"Saved {len(df)} books to {BOOKS_CSV_PATH}")
+         return df
+
+     @staticmethod
+     def _make_books(n_books: int = 200) -> pd.DataFrame:
+         """Generate sample books — identical to notebook ``make_books``."""
+         np.random.seed(42)
+         rows = []
+         for i in range(1, n_books + 1):
+             g = np.random.choice(GENRES)
+             desc = np.random.choice(TEMPLATES[g])
+             title = f"{g} Book {i}"
+             author = f"Author {np.random.randint(1, 60)}"
+             rows.append([i, title, author, desc])
+         return pd.DataFrame(rows, columns=["book_id", "title", "author", "description"])
+
+     # ------------------------------------------------------------------
+     # Genre classification — identical to notebook
+     # ------------------------------------------------------------------
+
+     def classify_genre(self, text: str) -> Tuple[str, float]:
+         """Zero-shot genre classification — identical to notebook."""
+         out = self.zero_shot(text, candidate_labels=GENRES, multi_label=False)
+         return out["labels"][0], float(out["scores"][0])
+
+     def _classify_genres(self) -> None:
+         """Classify all books in the dataset — identical to notebook loop."""
+         if "genre" in self.books_df.columns and "genre_confidence" in self.books_df.columns:
+             logger.info("Genre columns already present — skipping classification")
+             return
+
+         logger.info("Classifying genres for all books …")
+         genres, scores = [], []
+         texts = (self.books_df["title"] + " | " + self.books_df["description"]).tolist()
+         for i, txt in enumerate(texts):
+             g, s = self.classify_genre(txt)
+             genres.append(g)
+             scores.append(s)
+             if (i + 1) % 50 == 0:
+                 logger.info(f" classified {i + 1}/{len(texts)} books")
+
+         self.books_df["genre"] = genres
+         self.books_df["genre_confidence"] = scores
+         # Persist updated CSV
+         self.books_df.to_csv(BOOKS_CSV_PATH, index=False)
+         logger.info("Genre classification complete")
+
+     # ------------------------------------------------------------------
+     # Embeddings + FAISS — identical to notebook
+     # ------------------------------------------------------------------
+
+     def _build_index(self) -> None:
+         """Build SentenceTransformer embeddings and FAISS index."""
+         self.books_df["text"] = (
+             "Title: " + self.books_df["title"]
+             + " | Author: " + self.books_df["author"]
+             + " | Genre: " + self.books_df["genre"]
+             + " | Description: " + self.books_df["description"]
+         )
+
+         logger.info("Encoding book embeddings …")
+         self.book_embeddings = self.embed_model.encode(
+             self.books_df["text"].tolist(),
+             batch_size=64,
+             show_progress_bar=True,
+             convert_to_numpy=True,
+             normalize_embeddings=True,
+         ).astype("float32")
+
+         dim = self.book_embeddings.shape[1]
+         self.faiss_index = faiss.IndexFlatIP(dim) # cosine (normalized)
+         self.faiss_index.add(self.book_embeddings)
+
+         self.bookid_to_idx = {
+             int(bid): i
+             for i, bid in enumerate(self.books_df["book_id"].tolist())
+         }
+
+         logger.info(
+             f"FAISS index built — embeddings {self.book_embeddings.shape}, "
+             f"ntotal={self.faiss_index.ntotal}"
+         )
+
+     # ------------------------------------------------------------------
+     # User interest ratios — identical to notebook
+     # ------------------------------------------------------------------
+
+     def user_interest_ratios(
+         self,
+         user_id: int,
+         interactions_df: pd.DataFrame,
+     ) -> Dict[str, float]:
+         """Compute weighted genre interest ratios — identical to notebook."""
+         u = interactions_df[interactions_df.user_id == user_id].merge(
+             self.books_df[["book_id", "genre"]], on="book_id", how="left"
+         )
+         if u.empty:
+             return {g: 1 / len(GENRES) for g in GENRES}
+
+         u["w"] = u["event_type"].map(EVENT_W).fillna(0.0)
+         s = u.groupby("genre")["w"].sum().reindex(GENRES, fill_value=0.0)
+
+         total = s.sum()
+         if total == 0:
+             return {g: 1 / len(GENRES) for g in GENRES}
+
+         return (s / total).to_dict()
+
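Review note: `user_interest_ratios` sums event weights per genre and normalizes them, falling back to a uniform distribution when there is no usable signal. The same arithmetic can be sketched without pandas. The shortened genre list, the `(book_id, event_type)` tuple format, and the toy book-to-genre mapping are assumptions made for illustration.

```python
EVENT_W = {"view": 1.0, "like": 3.0}
GENRES = ["Fantasy", "Romance", "Mystery"]  # shortened list, for illustration only


def interest_ratios(events, book_genre):
    """Return each genre's share of the user's total event weight."""
    totals = {genre: 0.0 for genre in GENRES}
    for book_id, event_type in events:
        genre = book_genre.get(book_id)
        if genre in totals:
            # Unknown event types contribute zero weight.
            totals[genre] += EVENT_W.get(event_type, 0.0)

    grand_total = sum(totals.values())
    if grand_total == 0:
        # No usable signal: fall back to a uniform distribution.
        return {genre: 1 / len(GENRES) for genre in GENRES}
    return {genre: weight / grand_total for genre, weight in totals.items()}


book_genre = {5: "Fantasy", 12: "Romance", 7: "Fantasy"}
events = [(5, "like"), (12, "view"), (7, "view")]
ratios = interest_ratios(events, book_genre)
```

One like (3.0) plus one view (1.0) on Fantasy against a single Romance view (1.0) gives Fantasy 4/5 of the total weight and Romance 1/5.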
302
+ # ------------------------------------------------------------------
303
+ # User vector — identical to notebook
304
+ # ------------------------------------------------------------------
305
+
306
+ def build_user_vector(
307
+ self,
308
+ user_id: int,
309
+ interactions_df: pd.DataFrame,
310
+ ) -> Tuple[Optional[np.ndarray], Set[int]]:
311
+ """Build weighted user embedding vector — identical to notebook."""
312
+ u = interactions_df[interactions_df.user_id == user_id]
313
+ if u.empty:
314
+ return None, set()
315
+
316
+ vecs: List[np.ndarray] = []
317
+ weights: List[float] = []
318
+ seen: Set[int] = set()
319
+
320
+ for _, row in u.iterrows():
321
+ bid = int(row["book_id"])
322
+ ev = row["event_type"]
323
+ if bid not in self.bookid_to_idx:
324
+ continue
325
+ w = EVENT_W.get(ev, 0.0)
326
+ if w == 0:
327
+ continue
328
+ vecs.append(self.book_embeddings[self.bookid_to_idx[bid]])
329
+ weights.append(w)
330
+ seen.add(bid)
331
+
332
+ if not vecs:
333
+ return None, seen
334
+
335
+ vecs_arr = np.array(vecs)
336
+ weights_arr = np.array(weights).reshape(-1, 1)
337
+
338
+ user_vec = np.sum(vecs_arr * weights_arr, axis=0) / (
339
+ np.sum(np.abs(weights_arr)) + 1e-9
340
+ )
341
+ user_vec = user_vec / (np.linalg.norm(user_vec) + 1e-9)
342
+ return user_vec.astype("float32"), seen
343
+
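The arithmetic of the user vector is a weighted mean of item embeddings followed by L2 normalization. A minimal pure-Python sketch of the same two steps, with two toy unit vectors standing in for book embeddings and hypothetical weights (`user_vector` is an illustrative name, not the method above):

```python
import math


def user_vector(vectors, weights, eps=1e-9):
    """Weighted mean of item vectors, rescaled to unit length."""
    dim = len(vectors[0])
    # Divide by the sum of absolute weights, as the method above does.
    denom = sum(abs(w) for w in weights) + eps
    mean = [
        sum(w * v[i] for v, w in zip(vectors, weights)) / denom
        for i in range(dim)
    ]
    norm = math.sqrt(sum(x * x for x in mean)) + eps
    return [x / norm for x in mean]


# Two unit vectors standing in for book embeddings; weights are hypothetical.
vec = user_vector([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])
```

Because the result is re-normalized, the division by the weight sum only changes the scale of the intermediate vector; the direction of the final vector is determined by the weighted sum alone.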
+     # ------------------------------------------------------------------
+     # Feed allocation — identical to notebook
+     # ------------------------------------------------------------------
+
+     @staticmethod
+     def allocate_feed(
+         ratios: Dict[str, float],
+         unseen_counts: Dict[str, int],
+         feed_size: int = 50,
+     ) -> Dict[str, int]:
+         """Proportional genre allocation — identical to notebook."""
+         alloc = {g: 0 for g in GENRES}
+         remaining = feed_size
+
+         # Target counts proportional to ratios
+         target = {
+             g: int(round(ratios.get(g, 0.0) * feed_size)) for g in GENRES
+         }
+
+         # Cap by availability
+         for g in GENRES:
+             alloc[g] = min(target[g], unseen_counts.get(g, 0))
+             remaining -= alloc[g]
+
+         # Distribute leftovers to best-ratio genres that still have items
+         while remaining > 0:
+             candidates = [
+                 g for g in GENRES if alloc[g] < unseen_counts.get(g, 0)
+             ]
+             if not candidates:
+                 break
+             g = max(candidates, key=lambda x: ratios.get(x, 0.0))
+             alloc[g] += 1
+             remaining -= 1
+
+         return alloc
+
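Because each target in `allocate_feed` is rounded independently, the targets need not sum to `feed_size`: two genres at 0.5 each with `feed_size=3` both round to 2, giving 4 slots. One possible alternative, which is not what this commit does, is largest-remainder apportionment. The sketch below is a hypothetical variant that takes its genre list from `ratios` rather than from the module-level `GENRES`:

```python
import math


def allocate_largest_remainder(ratios, unseen_counts, feed_size):
    """Apportion feed_size slots by ratio, capped by availability.

    Hypothetical alternative to round()-based targets; not the commit's code.
    """
    genres = list(ratios)
    quotas = {g: ratios[g] * feed_size for g in genres}
    # Start from the floor of each quota, capped by what is available.
    alloc = {
        g: min(math.floor(quotas[g]), unseen_counts.get(g, 0)) for g in genres
    }
    remaining = feed_size - sum(alloc.values())
    # Hand out leftover slots by largest fractional remainder, then by ratio.
    order = sorted(
        genres,
        key=lambda g: (quotas[g] - math.floor(quotas[g]), ratios[g]),
        reverse=True,
    )
    while remaining > 0:
        progressed = False
        for g in order:
            if remaining > 0 and alloc[g] < unseen_counts.get(g, 0):
                alloc[g] += 1
                remaining -= 1
                progressed = True
        if not progressed:
            break  # every genre is exhausted
    return alloc


alloc = allocate_largest_remainder({"A": 0.5, "B": 0.5}, {"A": 10, "B": 10}, 3)
scarce = allocate_largest_remainder({"A": 0.9, "B": 0.1}, {"A": 1, "B": 1}, 5)
```

Starting from floors guarantees the total never exceeds `feed_size`; when the catalogue runs short, as in `scarce`, the function returns fewer slots instead of looping forever.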
+     # ------------------------------------------------------------------
+     # Mixed feed builder — identical to notebook
+     # ------------------------------------------------------------------
+
+     def build_mixed_feed(
+         self,
+         user_id: int,
+         interactions_df: pd.DataFrame,
+         feed_size: int = 50,
+         random_state: int = 42,
+     ) -> pd.DataFrame:
+         """Build a genre-balanced, similarity-ranked feed — identical to notebook."""
+         ratios = self.user_interest_ratios(user_id, interactions_df)
+
+         seen = set(
+             interactions_df.loc[
+                 interactions_df.user_id == user_id, "book_id"
+             ]
+             .astype(int)
+             .tolist()
+         )
+         unseen_df = self.books_df[~self.books_df.book_id.isin(seen)].copy()
+
+         unseen_counts = (
+             unseen_df.groupby("genre")["book_id"]
+             .count()
+             .reindex(GENRES, fill_value=0)
+             .to_dict()
+         )
+         alloc = self.allocate_feed(ratios, unseen_counts, feed_size=feed_size)
+
+         parts: List[pd.DataFrame] = []
+         for g, k in alloc.items():
+             if k <= 0:
+                 continue
+             g_df = unseen_df[unseen_df.genre == g]
+             if len(g_df) == 0:
+                 continue
+             parts.append(
+                 g_df.sample(n=min(k, len(g_df)), random_state=random_state)
+             )
+
+         if not parts:
+             # Nothing unseen is left: fall back to a random sample of the
+             # whole catalogue, capped so sample() never asks for more rows
+             # than exist.
+             n = min(feed_size, len(self.books_df))
+             return self.books_df.sample(n=n, random_state=random_state)[
+                 ["book_id", "title", "author", "genre"]
+             ]
+
+         feed = pd.concat(parts, ignore_index=True)
+
+         # Shuffle / ranking — identical to notebook
+         feed = feed.sample(frac=1.0, random_state=random_state).reset_index(
+             drop=True
+         )
+
+         user_vec, _ = self.build_user_vector(user_id, interactions_df)
+         if user_vec is not None:
+             idxs = [
+                 self.bookid_to_idx[int(b)] for b in feed["book_id"].tolist()
+             ]
+             feed_vecs = self.book_embeddings[idxs]
+             feed["score"] = (feed_vecs @ user_vec).astype(float)
+             feed = feed.sort_values("score", ascending=False).reset_index(
+                 drop=True
+             )
+
+         cols = ["book_id", "title", "author", "genre"]
+         if "score" in feed.columns:
+             cols.append("score")
+         return feed[cols]
+
+     # ------------------------------------------------------------------
+     # Paginated feed — identical to notebook
+     # ------------------------------------------------------------------
+
+     def get_next_feed_page(
+         self,
+         user_id: int,
+         interactions_df: pd.DataFrame,
+         page_size: int = 20,
+     ) -> pd.DataFrame:
+         """Return the next page of unseen recommendations — identical to notebook."""
+         shown = self.user_feed_state.get(user_id, set())
+
+         # Add temporary "view" interactions for already-shown books
+         if len(shown) > 0:
+             temp_rows = pd.DataFrame(
+                 {
+                     "user_id": [user_id] * len(shown),
+                     "book_id": list(shown),
+                     "event_type": ["view"] * len(shown),
+                     "timestamp": [datetime.now().isoformat()] * len(shown),
+                 }
+             )
+             temp_interactions = pd.concat(
+                 [interactions_df, temp_rows], ignore_index=True
+             )
+         else:
+             temp_interactions = interactions_df
+
+         page = self.build_mixed_feed(
+             user_id,
+             temp_interactions,
+             feed_size=page_size,
+             random_state=np.random.randint(0, 10_000),
+         )
+
+         # Update shown state
+         self.user_feed_state[user_id] = shown.union(
+             set(page["book_id"].astype(int).tolist())
+         )
+         return page
+
+     def reset_user_feed(self, user_id: int) -> None:
+         """Clear pagination state for a user."""
+         self.user_feed_state.pop(user_id, None)
+
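The pagination state is a per-user set of already-shown book IDs kept in process memory. A much simplified sketch of that bookkeeping follows; it pages through a fixed ranked list, whereas the method above rebuilds the feed on every call with the shown books injected as temporary "view" events. The names `feed_state`, `next_page`, and `reset` are illustrative.

```python
from typing import Dict, List, Set

feed_state: Dict[int, Set[int]] = {}  # user_id -> book ids already shown


def next_page(user_id: int, ranked_ids: List[int], page_size: int) -> List[int]:
    """Return the next page_size ids not yet shown to this user."""
    shown = feed_state.get(user_id, set())
    page = [b for b in ranked_ids if b not in shown][:page_size]
    # Remember what was served so the next call skips it.
    feed_state[user_id] = shown | set(page)
    return page


def reset(user_id: int) -> None:
    """Forget everything shown to this user."""
    feed_state.pop(user_id, None)


catalogue = [5, 3, 9, 1, 7]
first = next_page(42, catalogue, 2)
second = next_page(42, catalogue, 2)
reset(42)
after_reset = next_page(42, catalogue, 2)
```

State held this way is lost on restart and is not shared between worker processes, which is worth keeping in mind if the service is ever run with more than one worker.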
+     # ------------------------------------------------------------------
+     # Generate interactions from request payload
+     # ------------------------------------------------------------------
+
+     @staticmethod
+     def build_interactions_df(
+         user_id: int,
+         interactions: Optional[List[dict]] = None,
+         viewed_books: Optional[List[int]] = None,
+         favorite_genres: Optional[List[str]] = None,
+     ) -> pd.DataFrame:
+         """
+         Construct a pandas DataFrame of user interactions from the API
+         request payload. This merges explicit interaction events with
+         viewed-book IDs (recorded as implicit "view" events).
+         ``favorite_genres`` is accepted but not used by this method.
+         """
+         rows: List[dict] = []
+         now_iso = datetime.now().isoformat()
+
+         # 1. Explicit interactions
+         if interactions:
+             for inter in interactions:
+                 rows.append(
+                     {
+                         "user_id": user_id,
+                         "book_id": int(inter["book_id"]),
+                         "event_type": inter.get("event_type", "view"),
+                         "timestamp": inter.get("timestamp", now_iso),
+                     }
+                 )
+
+         # 2. Viewed books → implicit "view" events
+         if viewed_books:
+             for bid in viewed_books:
+                 rows.append(
+                     {
+                         "user_id": user_id,
+                         "book_id": int(bid),
+                         "event_type": "view",
+                         "timestamp": now_iso,
+                     }
+                 )
+
+         if not rows:
+             return pd.DataFrame(
+                 columns=["user_id", "book_id", "event_type", "timestamp"]
+             )
+
+         return pd.DataFrame(rows)
+
+
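To make the payload handling concrete, here is a pandas-free sketch of the same flattening step that returns plain dicts. The `"like"` event type and the sample payload are assumptions for illustration; the event types the API actually accepts are defined outside this section.

```python
from datetime import datetime


def build_rows(user_id, interactions=None, viewed_books=None):
    """Flatten a request payload into interaction rows (plain dicts)."""
    now_iso = datetime.now().isoformat()
    rows = []
    # Explicit events: missing event_type defaults to "view".
    for inter in interactions or []:
        rows.append({
            "user_id": user_id,
            "book_id": int(inter["book_id"]),
            "event_type": inter.get("event_type", "view"),
            "timestamp": inter.get("timestamp", now_iso),
        })
    # Viewed book ids become implicit "view" events.
    for bid in viewed_books or []:
        rows.append({
            "user_id": user_id,
            "book_id": int(bid),
            "event_type": "view",
            "timestamp": now_iso,
        })
    return rows


rows = build_rows(
    7,
    interactions=[{"book_id": "10", "event_type": "like"}, {"book_id": 14}],
    viewed_books=[16],
)
```

String IDs such as `"10"` are coerced with `int()`, so a non-numeric `book_id` raises `ValueError` at this point and not later in the pipeline.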
+ # ═══════════════════════════════════════════════════════════════════════════
+ # Module-level singleton
+ # ═══════════════════════════════════════════════════════════════════════════
+
+ engine = RecommendationEngine()
requirements.txt ADDED
@@ -0,0 +1,13 @@
+ fastapi>=0.100.0
+ uvicorn>=0.23.0
+ python-multipart>=0.0.6
+ transformers>=4.33.0
+ torch>=2.0.0
+ sentence-transformers>=2.2.0
+ faiss-cpu>=1.7.4
+ pandas>=2.0.0
+ numpy>=1.24.0
+ scikit-learn>=1.3.0
+ accelerate>=0.23.0
+ tqdm>=4.66.0
+ pydantic>=2.0.0
sample_data/books.csv ADDED
@@ -0,0 +1,201 @@
+ book_id,title,author,description
+ 1,Business Book 1,Author 29,Negotiation tactics and leadership strategies for managers.
+ 2,Children Book 2,Author 21,A curious cat explores the city and learns about friendship.
+ 3,Business Book 3,Author 19,Negotiation tactics and leadership strategies for managers.
+ 4,Business Book 4,Author 11,How startups scale products and build strong teams.
+ 5,Children Book 5,Author 36,A curious cat explores the city and learns about friendship.
+ 6,Children Book 6,Author 3,A magical school adventure for kids with puzzles and fun.
+ 7,History Book 7,Author 2,An account of ancient empires and the wars that shaped the world.
+ 8,Children Book 8,Author 30,A magical school adventure for kids with puzzles and fun.
+ 9,History Book 9,Author 21,A deep dive into the political revolutions of the 20th century.
+ 10,Fantasy Book 10,Author 58,Dragons rise again as an ancient prophecy awakens in the north.
+ 11,History Book 11,Author 44,An account of ancient empires and the wars that shaped the world.
+ 12,Horror Book 12,Author 27,"A haunted house whispers at night, luring visitors inside."
+ 13,Poetry Book 13,Author 28,Minimalist poems inspired by nature and silence.
+ 14,Mystery Book 14,Author 55,A missing diary reveals clues to an old family crime.
+ 15,Science Fiction Book 15,Author 57,An AI gains consciousness and changes the future of Earth.
+ 16,Mystery Book 16,Author 51,A detective investigates a series of murders in a quiet town.
+ 17,Business Book 17,Author 9,How startups scale products and build strong teams.
+ 18,Business Book 18,Author 4,Negotiation tactics and leadership strategies for managers.
+ 19,Horror Book 19,Author 14,A village faces a terrifying creature in the woods.
+ 20,Romance Book 20,Author 9,A long-distance relationship is tested by secrets and time.
+ 21,Poetry Book 21,Author 2,"A collection of poems about love, loss, and hope."
+ 22,Science Fiction Book 22,Author 47,An AI gains consciousness and changes the future of Earth.
+ 23,Business Book 23,Author 8,Negotiation tactics and leadership strategies for managers.
+ 24,Mystery Book 24,Author 17,A missing diary reveals clues to an old family crime.
+ 25,Science Fiction Book 25,Author 40,An AI gains consciousness and changes the future of Earth.
+ 26,Science Fiction Book 26,Author 6,An AI gains consciousness and changes the future of Earth.
+ 27,History Book 27,Author 4,A deep dive into the political revolutions of the 20th century.
+ 28,History Book 28,Author 18,An account of ancient empires and the wars that shaped the world.
+ 29,Poetry Book 29,Author 34,Minimalist poems inspired by nature and silence.
+ 30,Poetry Book 30,Author 36,Minimalist poems inspired by nature and silence.
+ 31,Children Book 31,Author 23,A magical school adventure for kids with puzzles and fun.
+ 32,Horror Book 32,Author 40,A village faces a terrifying creature in the woods.
+ 33,Self-Help Book 33,Author 45,Learn to manage anxiety with simple routines and mindset shifts.
+ 34,Romance Book 34,Author 53,Two strangers meet in a small cafe and find love against all odds.
+ 35,Children Book 35,Author 25,A magical school adventure for kids with puzzles and fun.
+ 36,Horror Book 36,Author 15,"A haunted house whispers at night, luring visitors inside."
+ 37,Fantasy Book 37,Author 7,A young hero discovers a hidden kingdom and must defeat a dark sorcerer.
+ 38,Horror Book 38,Author 1,A village faces a terrifying creature in the woods.
+ 39,Children Book 39,Author 11,A magical school adventure for kids with puzzles and fun.
+ 40,Mystery Book 40,Author 8,A detective investigates a series of murders in a quiet town.
+ 41,Mystery Book 41,Author 33,A detective investigates a series of murders in a quiet town.
+ 42,Self-Help Book 42,Author 39,Learn to manage anxiety with simple routines and mindset shifts.
+ 43,Poetry Book 43,Author 28,"A collection of poems about love, loss, and hope."
+ 44,Business Book 44,Author 8,How startups scale products and build strong teams.
+ 45,Romance Book 45,Author 48,Two strangers meet in a small cafe and find love against all odds.
+ 46,Business Book 46,Author 24,How startups scale products and build strong teams.
+ 47,Self-Help Book 47,Author 44,A practical guide to build habits and improve focus every day.
+ 48,Children Book 48,Author 27,A magical school adventure for kids with puzzles and fun.
+ 49,Mystery Book 49,Author 35,A detective investigates a series of murders in a quiet town.
+ 50,Self-Help Book 50,Author 14,A practical guide to build habits and improve focus every day.
+ 51,Mystery Book 51,Author 5,A detective investigates a series of murders in a quiet town.
+ 52,Poetry Book 52,Author 14,"A collection of poems about love, loss, and hope."
+ 53,Business Book 53,Author 9,How startups scale products and build strong teams.
+ 54,Poetry Book 54,Author 13,Minimalist poems inspired by nature and silence.
+ 55,Mystery Book 55,Author 32,A detective investigates a series of murders in a quiet town.
+ 56,Business Book 56,Author 52,How startups scale products and build strong teams.
+ 57,Science Fiction Book 57,Author 37,An AI gains consciousness and changes the future of Earth.
+ 58,Business Book 58,Author 45,How startups scale products and build strong teams.
+ 59,Science Fiction Book 59,Author 32,A crew travels through a wormhole to save humanity from collapse.
+ 60,Business Book 60,Author 51,How startups scale products and build strong teams.
+ 61,History Book 61,Author 2,A deep dive into the political revolutions of the 20th century.
+ 62,Poetry Book 62,Author 57,"A collection of poems about love, loss, and hope."
+ 63,Self-Help Book 63,Author 28,Learn to manage anxiety with simple routines and mindset shifts.
+ 64,Science Fiction Book 64,Author 11,An AI gains consciousness and changes the future of Earth.
+ 65,Poetry Book 65,Author 28,"A collection of poems about love, loss, and hope."
+ 66,Horror Book 66,Author 33,"A haunted house whispers at night, luring visitors inside."
+ 67,Fantasy Book 67,Author 27,A young hero discovers a hidden kingdom and must defeat a dark sorcerer.
+ 68,Horror Book 68,Author 13,A village faces a terrifying creature in the woods.
+ 69,Horror Book 69,Author 39,"A haunted house whispers at night, luring visitors inside."
+ 70,History Book 70,Author 27,A deep dive into the political revolutions of the 20th century.
+ 71,Horror Book 71,Author 37,A village faces a terrifying creature in the woods.
+ 72,Fantasy Book 72,Author 42,A young hero discovers a hidden kingdom and must defeat a dark sorcerer.
+ 73,Children Book 73,Author 59,A curious cat explores the city and learns about friendship.
+ 74,History Book 74,Author 32,A deep dive into the political revolutions of the 20th century.
+ 75,Children Book 75,Author 52,A curious cat explores the city and learns about friendship.
+ 76,Fantasy Book 76,Author 49,Dragons rise again as an ancient prophecy awakens in the north.
+ 77,Poetry Book 77,Author 12,Minimalist poems inspired by nature and silence.
+ 78,Business Book 78,Author 3,Negotiation tactics and leadership strategies for managers.
+ 79,Fantasy Book 79,Author 49,A young hero discovers a hidden kingdom and must defeat a dark sorcerer.
+ 80,Children Book 80,Author 59,A curious cat explores the city and learns about friendship.
+ 81,Fantasy Book 81,Author 2,Dragons rise again as an ancient prophecy awakens in the north.
+ 82,History Book 82,Author 37,An account of ancient empires and the wars that shaped the world.
+ 83,Fantasy Book 83,Author 19,A young hero discovers a hidden kingdom and must defeat a dark sorcerer.
+ 84,Romance Book 84,Author 44,Two strangers meet in a small cafe and find love against all odds.
+ 85,Poetry Book 85,Author 6,Minimalist poems inspired by nature and silence.
+ 86,Business Book 86,Author 55,Negotiation tactics and leadership strategies for managers.
+ 87,Children Book 87,Author 17,A curious cat explores the city and learns about friendship.
+ 88,History Book 88,Author 5,A deep dive into the political revolutions of the 20th century.
+ 89,Science Fiction Book 89,Author 6,An AI gains consciousness and changes the future of Earth.
+ 90,History Book 90,Author 48,An account of ancient empires and the wars that shaped the world.
+ 91,Fantasy Book 91,Author 59,A young hero discovers a hidden kingdom and must defeat a dark sorcerer.
+ 92,History Book 92,Author 29,A deep dive into the political revolutions of the 20th century.
+ 93,Mystery Book 93,Author 59,A missing diary reveals clues to an old family crime.
+ 94,Science Fiction Book 94,Author 26,A crew travels through a wormhole to save humanity from collapse.
+ 95,Mystery Book 95,Author 19,A detective investigates a series of murders in a quiet town.
+ 96,Science Fiction Book 96,Author 7,An AI gains consciousness and changes the future of Earth.
+ 97,Science Fiction Book 97,Author 33,A crew travels through a wormhole to save humanity from collapse.
+ 98,Children Book 98,Author 39,A magical school adventure for kids with puzzles and fun.
+ 99,Romance Book 99,Author 1,A long-distance relationship is tested by secrets and time.
+ 100,Horror Book 100,Author 50,"A haunted house whispers at night, luring visitors inside."
+ 101,Business Book 101,Author 30,How startups scale products and build strong teams.
+ 102,Poetry Book 102,Author 7,"A collection of poems about love, loss, and hope."
+ 103,Poetry Book 103,Author 57,Minimalist poems inspired by nature and silence.
+ 104,Science Fiction Book 104,Author 49,An AI gains consciousness and changes the future of Earth.
+ 105,Romance Book 105,Author 48,Two strangers meet in a small cafe and find love against all odds.
+ 106,Self-Help Book 106,Author 32,A practical guide to build habits and improve focus every day.
+ 107,Business Book 107,Author 41,How startups scale products and build strong teams.
+ 108,Mystery Book 108,Author 48,A detective investigates a series of murders in a quiet town.
+ 109,Mystery Book 109,Author 24,A missing diary reveals clues to an old family crime.
+ 110,History Book 110,Author 33,A deep dive into the political revolutions of the 20th century.
+ 111,Children Book 111,Author 11,A magical school adventure for kids with puzzles and fun.
+ 112,Fantasy Book 112,Author 36,Dragons rise again as an ancient prophecy awakens in the north.
+ 113,History Book 113,Author 20,A deep dive into the political revolutions of the 20th century.
+ 114,Mystery Book 114,Author 25,A missing diary reveals clues to an old family crime.
+ 115,Mystery Book 115,Author 29,A detective investigates a series of murders in a quiet town.
+ 116,Romance Book 116,Author 46,A long-distance relationship is tested by secrets and time.
+ 117,Romance Book 117,Author 54,A long-distance relationship is tested by secrets and time.
+ 118,Mystery Book 118,Author 41,A missing diary reveals clues to an old family crime.
+ 119,Science Fiction Book 119,Author 4,A crew travels through a wormhole to save humanity from collapse.
+ 120,Fantasy Book 120,Author 21,Dragons rise again as an ancient prophecy awakens in the north.
+ 121,Science Fiction Book 121,Author 8,An AI gains consciousness and changes the future of Earth.
+ 122,Business Book 122,Author 17,How startups scale products and build strong teams.
+ 123,Fantasy Book 123,Author 12,Dragons rise again as an ancient prophecy awakens in the north.
+ 124,Mystery Book 124,Author 55,A missing diary reveals clues to an old family crime.
+ 125,History Book 125,Author 30,A deep dive into the political revolutions of the 20th century.
+ 126,History Book 126,Author 45,A deep dive into the political revolutions of the 20th century.
+ 127,Mystery Book 127,Author 8,A missing diary reveals clues to an old family crime.
+ 128,Romance Book 128,Author 30,Two strangers meet in a small cafe and find love against all odds.
+ 129,Fantasy Book 129,Author 47,Dragons rise again as an ancient prophecy awakens in the north.
+ 130,Fantasy Book 130,Author 48,A young hero discovers a hidden kingdom and must defeat a dark sorcerer.
+ 131,Mystery Book 131,Author 35,A missing diary reveals clues to an old family crime.
+ 132,Fantasy Book 132,Author 17,Dragons rise again as an ancient prophecy awakens in the north.
+ 133,Self-Help Book 133,Author 35,Learn to manage anxiety with simple routines and mindset shifts.
+ 134,Horror Book 134,Author 24,"A haunted house whispers at night, luring visitors inside."
+ 135,Fantasy Book 135,Author 53,Dragons rise again as an ancient prophecy awakens in the north.
+ 136,Mystery Book 136,Author 33,A missing diary reveals clues to an old family crime.
+ 137,Science Fiction Book 137,Author 21,A crew travels through a wormhole to save humanity from collapse.
+ 138,Business Book 138,Author 3,How startups scale products and build strong teams.
+ 139,Romance Book 139,Author 42,Two strangers meet in a small cafe and find love against all odds.
+ 140,History Book 140,Author 3,A deep dive into the political revolutions of the 20th century.
+ 141,Children Book 141,Author 24,A magical school adventure for kids with puzzles and fun.
+ 142,Romance Book 142,Author 47,A long-distance relationship is tested by secrets and time.
+ 143,History Book 143,Author 2,An account of ancient empires and the wars that shaped the world.
+ 144,Poetry Book 144,Author 26,Minimalist poems inspired by nature and silence.
+ 145,Fantasy Book 145,Author 33,Dragons rise again as an ancient prophecy awakens in the north.
+ 146,Horror Book 146,Author 54,"A haunted house whispers at night, luring visitors inside."
+ 147,Business Book 147,Author 42,How startups scale products and build strong teams.
+ 148,Business Book 148,Author 35,Negotiation tactics and leadership strategies for managers.
+ 149,Romance Book 149,Author 24,Two strangers meet in a small cafe and find love against all odds.
+ 150,Poetry Book 150,Author 57,"A collection of poems about love, loss, and hope."
+ 151,Science Fiction Book 151,Author 20,A crew travels through a wormhole to save humanity from collapse.
+ 152,Fantasy Book 152,Author 46,Dragons rise again as an ancient prophecy awakens in the north.
+ 153,Mystery Book 153,Author 15,A detective investigates a series of murders in a quiet town.
+ 154,Romance Book 154,Author 32,A long-distance relationship is tested by secrets and time.
+ 155,Business Book 155,Author 22,How startups scale products and build strong teams.
+ 156,Mystery Book 156,Author 58,A detective investigates a series of murders in a quiet town.
+ 157,History Book 157,Author 58,A deep dive into the political revolutions of the 20th century.
+ 158,History Book 158,Author 52,An account of ancient empires and the wars that shaped the world.
+ 159,Poetry Book 159,Author 15,Minimalist poems inspired by nature and silence.
+ 160,History Book 160,Author 37,An account of ancient empires and the wars that shaped the world.
+ 161,Fantasy Book 161,Author 53,Dragons rise again as an ancient prophecy awakens in the north.
+ 162,Self-Help Book 162,Author 4,A practical guide to build habits and improve focus every day.
+ 163,History Book 163,Author 32,An account of ancient empires and the wars that shaped the world.
+ 164,Science Fiction Book 164,Author 47,An AI gains consciousness and changes the future of Earth.
+ 165,Mystery Book 165,Author 40,A detective investigates a series of murders in a quiet town.
+ 166,Science Fiction Book 166,Author 13,An AI gains consciousness and changes the future of Earth.
+ 167,Romance Book 167,Author 42,A long-distance relationship is tested by secrets and time.
+ 168,Mystery Book 168,Author 56,A detective investigates a series of murders in a quiet town.
+ 169,Mystery Book 169,Author 58,A missing diary reveals clues to an old family crime.
+ 170,Business Book 170,Author 37,Negotiation tactics and leadership strategies for managers.
+ 171,Poetry Book 171,Author 23,"A collection of poems about love, loss, and hope."
+ 172,Horror Book 172,Author 53,A village faces a terrifying creature in the woods.
+ 173,Fantasy Book 173,Author 58,Dragons rise again as an ancient prophecy awakens in the north.
+ 174,Fantasy Book 174,Author 34,A young hero discovers a hidden kingdom and must defeat a dark sorcerer.
+ 175,History Book 175,Author 25,A deep dive into the political revolutions of the 20th century.
+ 176,Children Book 176,Author 53,A curious cat explores the city and learns about friendship.
+ 177,Fantasy Book 177,Author 39,Dragons rise again as an ancient prophecy awakens in the north.
+ 178,Self-Help Book 178,Author 29,Learn to manage anxiety with simple routines and mindset shifts.
+ 179,Business Book 179,Author 12,How startups scale products and build strong teams.
+ 180,Poetry Book 180,Author 51,Minimalist poems inspired by nature and silence.
+ 181,Self-Help Book 181,Author 57,Learn to manage anxiety with simple routines and mindset shifts.
+ 182,Self-Help Book 182,Author 49,A practical guide to build habits and improve focus every day.
+ 183,Science Fiction Book 183,Author 12,A crew travels through a wormhole to save humanity from collapse.
+ 184,Poetry Book 184,Author 48,Minimalist poems inspired by nature and silence.
+ 185,Self-Help Book 185,Author 36,A practical guide to build habits and improve focus every day.
+ 186,Fantasy Book 186,Author 30,Dragons rise again as an ancient prophecy awakens in the north.
+ 187,Self-Help Book 187,Author 58,A practical guide to build habits and improve focus every day.
+ 188,Poetry Book 188,Author 5,Minimalist poems inspired by nature and silence.
+ 189,Science Fiction Book 189,Author 52,An AI gains consciousness and changes the future of Earth.
+ 190,Poetry Book 190,Author 19,Minimalist poems inspired by nature and silence.
+ 191,Poetry Book 191,Author 1,Minimalist poems inspired by nature and silence.
+ 192,Children Book 192,Author 45,A curious cat explores the city and learns about friendship.
+ 193,Science Fiction Book 193,Author 24,An AI gains consciousness and changes the future of Earth.
+ 194,Business Book 194,Author 49,Negotiation tactics and leadership strategies for managers.
+ 195,Science Fiction Book 195,Author 12,An AI gains consciousness and changes the future of Earth.
+ 196,Romance Book 196,Author 33,Two strangers meet in a small cafe and find love against all odds.
+ 197,Fantasy Book 197,Author 51,A young hero discovers a hidden kingdom and must defeat a dark sorcerer.
+ 198,Self-Help Book 198,Author 3,Learn to manage anxiety with simple routines and mindset shifts.
+ 199,Fantasy Book 199,Author 40,A young hero discovers a hidden kingdom and must defeat a dark sorcerer.
+ 200,Poetry Book 200,Author 44,"A collection of poems about love, loss, and hope."
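The sample CSV has no `genre` column, although `recommender.py` reads `books_df["genre"]`; how the genre is derived is not shown in this section. Since every sample title follows the pattern `<Genre> Book <id>`, one plausible approach is to take the genre from the title. The sketch below is a guess under that assumption (`genre_from_title` is a hypothetical helper), and it also shows that descriptions containing commas survive parsing because they are quoted.

```python
import csv
import io

# Two rows copied from sample_data/books.csv, one with a quoted description.
sample = '''book_id,title,author,description
12,Horror Book 12,Author 27,"A haunted house whispers at night, luring visitors inside."
15,Science Fiction Book 15,Author 57,An AI gains consciousness and changes the future of Earth.
'''


def genre_from_title(title):
    """Assumption: sample titles look like '<Genre> Book <id>'."""
    return title.rsplit(" Book ", 1)[0]


books = list(csv.DictReader(io.StringIO(sample)))
for row in books:
    row["genre"] = genre_from_title(row["title"])
```

Using `rsplit` with a limit of 1 keeps multi-word genres such as "Science Fiction" intact.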
utils.py ADDED
@@ -0,0 +1,106 @@
+ """
+ LITVISION Recommendation API — Utility Module
+ ===============================================
+ Production logging, device management, CUDA OOM handling,
+ and temp/cache cleanup helpers.
+ """
+
+ import os
+ import gc
+ import logging
+ import shutil
+ from typing import Optional
+
+ import torch
+
+ # ---------------------------------------------------------------------------
+ # Logging
+ # ---------------------------------------------------------------------------
+
+ LOG_FORMAT = "%(asctime)s | %(levelname)-8s | %(name)s | %(message)s"
+ LOG_DATE_FORMAT = "%Y-%m-%d %H:%M:%S"
+
+
+ def setup_logging(level: int = logging.INFO) -> None:
+     """Configure production-grade structured logging."""
+     logging.basicConfig(
+         level=level,
+         format=LOG_FORMAT,
+         datefmt=LOG_DATE_FORMAT,
+         force=True,
+     )
+     # Silence overly chatty third-party loggers
+     for noisy in ("transformers", "sentence_transformers", "faiss", "urllib3"):
+         logging.getLogger(noisy).setLevel(logging.WARNING)
+
+
+ logger = logging.getLogger("litvision.recommendation")
+
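To see what `LOG_FORMAT` produces, the sketch below formats a single hand-built record with the standard library's `logging.Formatter`. The record contents (path, line number, message) are made up for illustration; only the two format strings come from `utils.py`.

```python
import logging

LOG_FORMAT = "%(asctime)s | %(levelname)-8s | %(name)s | %(message)s"
LOG_DATE_FORMAT = "%Y-%m-%d %H:%M:%S"

formatter = logging.Formatter(LOG_FORMAT, datefmt=LOG_DATE_FORMAT)

# A hand-built record; pathname and lineno are placeholders.
record = logging.LogRecord(
    name="litvision.recommendation",
    level=logging.INFO,
    pathname="example.py",
    lineno=1,
    msg="FAISS index built",
    args=(),
    exc_info=None,
)
line = formatter.format(record)
parts = line.split(" | ")
```

The `%(levelname)-8s` field left-justifies the level name in eight characters, so the pipe-separated columns line up across INFO, WARNING, and ERROR lines.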
+ # ---------------------------------------------------------------------------
+ # Device helpers
+ # ---------------------------------------------------------------------------
+
+
+ def get_device() -> str:
+     """Return the best available torch device string."""
+     if torch.cuda.is_available():
+         device = "cuda"
+         gpu_name = torch.cuda.get_device_name(0)
+         mem = torch.cuda.get_device_properties(0).total_memory / (1024 ** 3)
+         logger.info(f"CUDA device detected: {gpu_name} ({mem:.1f} GB)")
+     else:
+         device = "cpu"
+         logger.info("No CUDA device — running on CPU")
+     return device
+
+
+ def safe_cuda_empty_cache() -> None:
+     """Clear CUDA cache if available; silently no-op on CPU."""
+     if torch.cuda.is_available():
+         torch.cuda.empty_cache()
+         gc.collect()
+         logger.info("CUDA cache cleared")
+
+
+ def handle_cuda_oom(exc: Exception) -> str:
+     """Handle a CUDA OOM exception: clear caches and return a user message."""
+     safe_cuda_empty_cache()
+     msg = (
+         "GPU out of memory during recommendation generation. "
+         "The CUDA cache has been cleared. Please retry with a smaller request."
+     )
+     logger.error(f"CUDA OOM: {exc}")
+     return msg
+
+ # ---------------------------------------------------------------------------
+ # Temp / cache cleanup
+ # ---------------------------------------------------------------------------
+
+ _TEMP_DIRS = [
+     os.environ.get("HF_HOME", "/tmp/huggingface"),
+ ]
+
+
+ def cleanup_temp_files() -> None:
+     """Remove transient cache artefacts that are safe to delete."""
+     for d in _TEMP_DIRS:
+         cache_dir = os.path.join(d, "hub", ".locks")
+         if os.path.isdir(cache_dir):
+             try:
+                 shutil.rmtree(cache_dir, ignore_errors=True)
+                 logger.info(f"Cleaned lock dir: {cache_dir}")
+             except Exception as e:
+                 logger.warning(f"Could not clean {cache_dir}: {e}")
+
+ # ---------------------------------------------------------------------------
+ # Validation helpers
+ # ---------------------------------------------------------------------------
+
+
+ def validate_positive_int(value: int, name: str, max_val: Optional[int] = None) -> int:
+     """Ensure *value* is a positive integer, optionally capped at *max_val*."""
+     if not isinstance(value, int) or value < 1:
+         raise ValueError(f"{name} must be a positive integer, got {value!r}")
+     if max_val is not None and value > max_val:
+         raise ValueError(f"{name} must be ≤ {max_val}, got {value}")
+     return value
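A short usage sketch for the validator. The version below restates the checks from `utils.py` and adds an explicit `bool` rejection, which is an addition made for this sketch: in Python `bool` is a subclass of `int`, so `True` would otherwise pass an `isinstance(value, int)` check. The `try_validate` wrapper and the `page_size` name are illustrative.

```python
from typing import Optional


def validate_positive_int(value: int, name: str, max_val: Optional[int] = None) -> int:
    # Same checks as utils.py, plus a bool rejection added for this sketch.
    if isinstance(value, bool) or not isinstance(value, int) or value < 1:
        raise ValueError(f"{name} must be a positive integer, got {value!r}")
    if max_val is not None and value > max_val:
        raise ValueError(f"{name} must be <= {max_val}, got {value}")
    return value


def try_validate(value, max_val=None):
    """Return the validated value, or the error message if validation fails."""
    try:
        return validate_positive_int(value, "page_size", max_val)
    except ValueError as err:
        return str(err)


results = [
    try_validate(20, 100),
    try_validate(0),
    try_validate(500, 100),
    try_validate(True),
]
```

In an API handler, the `ValueError` would typically be translated into a 4xx response so that callers see the message and not a server error.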