Commit · a422c4e
Parent(s): 24a293d
commit the app
- README.md +83 -14
- amazon_movies_2023/gcl_embeddings.npz +3 -0
- amazon_movies_2023/title_embeddings.npz +3 -0
- amazon_movies_2023/title_embeddings_mapping.csv +3 -0
- app.py +531 -0
- ranking_agent.py +128 -0
- requirements.txt +14 -0
README.md
CHANGED
|
@@ -1,14 +1,83 @@
+# Movie Recommender System
+**Tag:** `agent-demo-track`
+A hybrid movie recommender system that combines collaborative filtering, language model embeddings, and graph convolutional networks to provide personalized movie recommendations.
+
+## Features
+
+- **Dual Embedding Types:**
+  - Pure Language Model (LLM) embeddings from Mistral AI
+  - Graph-enhanced embeddings (LLM + GCL) that combine language understanding with user interaction patterns
+- **Hybrid Input:**
+  - Select movies you've enjoyed (the app does not enforce a limit)
+  - Describe what kind of movie you're looking for in natural language
+  - Adjust the weight (α) between your movie selections and text description
+- **Rich Results:**
+  - Get up to 20 personalized recommendations
+  - View similarity scores for each recommendation
+  - Search through a database of over 100,000 movies
+
+## Requirements
+
+1. Python 3.8+
+2. Virtual environment (recommended)
+3. Mistral AI API key (get one at https://console.mistral.ai/)
+
+Install the required packages:
+
+```bash
+pip install -r requirements.txt
+```
+
+## Environment Setup
+
+1. Create a `.env` file in the project root:
+   ```bash
+   MISTRAL_API_KEY=your_api_key_here
+   ```
+
+2. Ensure you have the necessary data files in the `amazon_movies_2023` directory:
+   - `title_embeddings.npz`: Movie title embeddings from Mistral AI
+   - `gcl_embeddings.npz`: Graph-enhanced embeddings
+   - `title_embeddings_mapping.csv`: Movie metadata mapping
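
The two setup steps above can be checked before launching the app. The following is a minimal sketch, not part of the commit: the function name `preflight_problems` and the constant `REQUIRED_FILES` are hypothetical, and the file names are the ones listed above. It only inspects the mapping it is given, so values stored in `.env` are visible to it only after they have been loaded into the environment (the app does this with `load_dotenv()`).

```python
from pathlib import Path
from typing import List, Mapping

# File names taken from the Environment Setup list; the constant itself is hypothetical.
REQUIRED_FILES = (
    "title_embeddings.npz",
    "gcl_embeddings.npz",
    "title_embeddings_mapping.csv",
)


def preflight_problems(data_dir: str, env: Mapping[str, str]) -> List[str]:
    """Return human-readable setup problems; an empty list means nothing is missing."""
    problems = []
    # An unset or empty key is treated as missing.
    if not env.get("MISTRAL_API_KEY"):
        problems.append("MISTRAL_API_KEY is not set")
    root = Path(data_dir)
    for name in REQUIRED_FILES:
        if not (root / name).is_file():
            problems.append(f"missing data file: {root / name}")
    return problems
```

A typical call would be `preflight_problems("amazon_movies_2023", os.environ)` after `import os`, printing each returned string and aborting if the list is non-empty.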
+
+## Usage
+
+1. Activate your virtual environment:
+   ```bash
+   source venv/bin/activate  # On Unix/macOS
+   ```
+
+2. Run the recommender app:
+   ```bash
+   python app.py
+   ```
+
+3. Open your browser to the local URL shown in the terminal (typically http://127.0.0.1:7860)
+
+## How It Works
+
+1. **Movie Selection:**
+   - Search and select movies you've enjoyed
+   - The system uses these as a baseline for your taste
+
+2. **Text Preferences:**
+   - Describe what you're looking for (e.g., "A thrilling sci-fi movie with deep philosophical themes")
+   - Your description is converted to embeddings using Mistral AI
+
+3. **Preference Weighting:**
+   - Use the α slider to balance between your selected movies and text description
+   - α = 0: Only use movie history
+   - α = 1: Only use text description
+   - Values in between combine both signals
+
+4. **Embedding Types:**
+   - LLM: Pure language model embeddings for semantic understanding
+   - LLM + GCL: Graph-enhanced embeddings that also consider user interaction patterns
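
The preference weighting described above amounts to a convex combination of two vectors followed by a cosine-similarity ranking. The sketch below illustrates that idea in dependency-free Python. It is an illustration under stated assumptions, not the app's code: the helper names (`unit`, `blend_profile`, `top_k`) are hypothetical, plain lists stand in for NumPy arrays, and scaling both signals to unit length before blending is an assumption made here so that α weights them comparably.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def unit(v: Vector) -> List[float]:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def blend_profile(e_u: Optional[Vector], e_h: Optional[Vector],
                  alpha: float) -> Optional[List[float]]:
    """Combine text (e_u) and history (e_h) signals as alpha*e_u + (1-alpha)*e_h.

    If only one signal exists it is used alone; if neither exists, None is returned.
    """
    if e_u is None and e_h is None:
        return None
    if e_u is None:
        return unit(e_h)
    if e_h is None:
        return unit(e_u)
    u, h = unit(e_u), unit(e_h)
    return [alpha * a + (1 - alpha) * b for a, b in zip(u, h)]


def top_k(profile: Vector, catalog: Dict[str, Vector], k: int) -> List[Tuple[str, float]]:
    """Rank catalog entries by cosine similarity to the profile, best first."""
    p = unit(profile)
    scored = [(title, sum(a * b for a, b in zip(p, unit(vec))))
              for title, vec in catalog.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

With α = 0 the blended profile equals the normalized history vector, and with α = 1 it equals the normalized text vector, matching the slider semantics listed above.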
+
+## Data Processing
+
+For information about the dataset processing pipeline, see [DATA_PROCESSING.md](DATA_PROCESSING.md)
+
+## Contributing
+
+Feel free to open issues or submit pull requests with improvements!
amazon_movies_2023/gcl_embeddings.npz
ADDED
|
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:957c1b970c9d8371da883523871c956593e81205a499c6962696df545806f6d6
+size 580096202
amazon_movies_2023/title_embeddings.npz
ADDED
|
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1d134b5950985ee7009a30f370d7b3b281351893d4d440ec5131bc759cf219ab
+size 173284697
amazon_movies_2023/title_embeddings_mapping.csv
ADDED
|
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:20e2e163e9591dcd7eaf13e72b7c0666e41c0734f303599113c161bb7c9f0bdc
+size 3386200
app.py
ADDED
|
@@ -0,0 +1,531 @@
+import gradio as gr
+import numpy as np
+from sklearn.preprocessing import StandardScaler
+import pandas as pd
+import os
+import zlib
+from typing import Dict, List, Tuple, Optional, Literal
+from langchain_mistralai import MistralAIEmbeddings
+from langchain_core.embeddings import Embeddings
+from dotenv import load_dotenv
+from ranking_agent import rank_with_ai
+from scipy.sparse import load_npz
+from rapidfuzz import process, fuzz
+import re
+from sklearn.metrics.pairwise import cosine_similarity
+
+load_dotenv()
+
+class MovieRecommender:
+    def __init__(self, data_dir: str = "amazon_movies_2023"):
+        self.data_dir = data_dir
+        self.embeddings = MistralAIEmbeddings(
+            model="mistral-embed",
+            mistral_api_key=os.getenv("MISTRAL_API_KEY")
+        )
+        # Load both types of embeddings
+        self.load_embeddings()
+
+    def load_embeddings(self) -> None:
+        # Load LLM embeddings
+        llm_embeddings_path = os.path.join(self.data_dir, "title_embeddings.npz")
+        try:
+            llm_data = np.load(llm_embeddings_path)
+            self.llm_embeddings = llm_data['embeddings']
+            self.llm_item_ids = llm_data['item_ids'].astype(str)  # Ensure string type
+            print(f"Loaded LLM embeddings with shape: {self.llm_embeddings.shape}")
+            print(f"Number of LLM item IDs: {len(self.llm_item_ids)}")
+        except (IOError, zlib.error) as e:
+            raise RuntimeError(
+                f"Error loading LLM embeddings file: {str(e)}\n"
+                "The embeddings file appears to be corrupted or invalid."
+            )
+
+        # Load GCL embeddings
+        gcl_embeddings_path = os.path.join(self.data_dir, "gcl_embeddings.npz")
+        try:
+            gcl_data = np.load(gcl_embeddings_path)
+            self.gcl_embeddings = gcl_data['embeddings']
+            self.gcl_item_ids = gcl_data['item_ids'].astype(str)  # Ensure string type
+            print(f"Loaded GCL embeddings with shape: {self.gcl_embeddings.shape}")
+            print(f"Number of GCL item IDs: {len(self.gcl_item_ids)}")
+        except (IOError, zlib.error) as e:
+            raise RuntimeError(
+                f"Error loading GCL embeddings file: {str(e)}\n"
+                "Please run gcl_embeddings.py first to generate GCL embeddings."
+            )
+
+        # Load movie mapping
+        mapping_path = os.path.join(self.data_dir, "title_embeddings_mapping.csv")
+        self.movies_df = pd.read_csv(mapping_path)
+        self.movies_df['item_id'] = self.movies_df['item_id'].astype(str)  # Ensure string type
+
+        # Create standardized embeddings for both types
+        scaler = StandardScaler()
+        self.llm_embeddings = scaler.fit_transform(self.llm_embeddings)
+        self.gcl_embeddings = scaler.fit_transform(self.gcl_embeddings)
+
+        # Create item_id to index mappings for both types
+        self.llm_id_to_idx = {str(item_id): idx for idx, item_id in enumerate(self.llm_item_ids)}
+        self.gcl_id_to_idx = {str(item_id): idx for idx, item_id in enumerate(self.gcl_item_ids)}
+
+        # Create title to id mapping for search
+        self.title_to_id = dict(zip(self.movies_df['title'], self.movies_df['item_id']))
+
+        # Store all titles for search
+        self.all_titles = self.movies_df['title'].tolist()
+
+        print(f"Number of movies in mapping: {len(self.movies_df)}")
+        print(f"Number of titles with LLM embeddings: {len(set(self.llm_id_to_idx.keys()) & set(self.title_to_id.values()))}")
+        print(f"Number of titles with GCL embeddings: {len(set(self.gcl_id_to_idx.keys()) & set(self.title_to_id.values()))}")
+
+        # Pre-process titles for fuzzy matching
+        self.clean_titles = {self.clean_title_for_comparison(title): title for title in self.title_to_id.keys()}
+
+    def clean_title_for_comparison(self, title):
+        """Clean title for comparison purposes"""
+        # Remove special characters and extra spaces
+        title = re.sub(r'[^\w\s]', '', str(title))
+        # Convert to lowercase and strip
+        return ' '.join(title.lower().split())
+
+    def search_movies(self, query: str) -> List[str]:
+        if not query:
+            return []  # Return empty if no query to avoid overwhelming UI
+
+        clean_query = self.clean_title_for_comparison(query)
+        # Use rapidfuzz to find matches across entire dataset
+        matches = process.extract(
+            clean_query,
+            self.clean_titles.keys(),
+            scorer=fuzz.WRatio,  # WRatio works well for movie titles
+            limit=None,  # No limit - show all matches
+            score_cutoff=60  # Only return matches with a score of at least 60
+        )
+
+        # Convert matches back to original titles
+        return [self.clean_titles[match[0]] for match in matches]
+
+    def get_text_embedding(self, text: str) -> Optional[np.ndarray]:
+        """Get embedding for text using LangChain Mistral embeddings (None if the API call fails)"""
+        try:
+            embedding = self.embeddings.embed_query(text)
+            # Convert embedding to numpy array
+            embedding = np.array(embedding, dtype=np.float32)
+            # Normalize the embedding
+            if np.any(embedding):  # Only normalize if not all zeros
+                embedding = embedding / np.linalg.norm(embedding)
+            return embedding
+        except Exception as e:
+            print(f"Error getting embedding from Mistral API: {str(e)}")
+            return None
+
+    def get_recommendations(self, selected_movies: List[str], embedding_type: str = "LLM + GCL", user_preferences: str = "", alpha: float = 0.5) -> "List[Tuple[str, float]] | str":
+        """
+        Get recommendations using proper embedding aggregation:
+        - e_h: embedding from user history (selected movies)
+        - e_u: embedding from user preferences (text)
+        - Combined: alpha * e_u + (1-alpha) * e_h
+
+        Returns a list of (title, similarity) tuples, or a message string
+        when no profile or recommendations can be produced.
+        """
+        if not selected_movies and not user_preferences:
+            return "Please select some movies or provide preferences."
+
+        # Choose embeddings based on type
+        if embedding_type == "LLM + GCL":
+            embeddings = self.gcl_embeddings
+            id_to_idx = self.gcl_id_to_idx
+        else:
+            embeddings = self.llm_embeddings
+            id_to_idx = self.llm_id_to_idx
+
+        user_profile = None
+
+        # Get embedding from user history (e_h)
+        e_h = None
+        if selected_movies:
+            movie_ids = [self.title_to_id[title] for title in selected_movies if title in self.title_to_id]
+            if movie_ids:
+                selected_embeddings = []
+                for movie_id in movie_ids:
+                    if movie_id in id_to_idx:
+                        idx = id_to_idx[movie_id]
+                        selected_embeddings.append(embeddings[idx])
+
+                if selected_embeddings:
+                    e_h = np.mean(selected_embeddings, axis=0)
+                    # Scale e_h to unit length so that alpha weights it comparably
+                    # to e_u, which get_text_embedding() already normalizes
+                    h_norm = np.linalg.norm(e_h)
+                    if h_norm > 0:
+                        e_h = e_h / h_norm
+
+        # Get embedding from user preferences (e_u)
+        e_u = None
+        if user_preferences.strip():
+            e_u = self.get_text_embedding(user_preferences)
+
+        # Apply aggregation algorithm
+        if e_h is not None and e_u is not None:
+            # Both available: alpha * e_u + (1-alpha) * e_h
+            user_profile = alpha * e_u + (1 - alpha) * e_h
+            print(f"Using combined embedding: α={alpha} (preferences weight)")
+        elif e_u is not None:
+            # Only preferences available
+            user_profile = e_u
+            print("Using preferences-only embedding")
+        elif e_h is not None:
+            # Only history available
+            user_profile = e_h
+            print("Using history-only embedding")
+        else:
+            return "Could not create user profile from provided input."
+
+        # Calculate similarity with all movies
+        # Normalize user profile and embeddings for proper cosine similarity
+        user_profile_norm = user_profile / np.linalg.norm(user_profile)
+        embeddings_norm = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
+
+        # Calculate cosine similarity (normalized dot product)
+        similarities = np.dot(embeddings_norm, user_profile_norm)
+
+        print(f"Similarity range: {similarities.min():.3f} to {similarities.max():.3f}")
+
+        # Get top 100 most similar movies
+        top_indices = np.argsort(similarities)[-100:][::-1]
+
+        # Filter out selected movies and create recommendations
+        seen_titles = set(selected_movies) if selected_movies else set()
+        seen_clean_titles = set(self.clean_title_for_comparison(title) for title in seen_titles)
+        final_recommendations = []
+
+        # Get reverse mapping for the chosen embedding type
+        if embedding_type == "LLM + GCL":
+            idx_to_id = {idx: item_id for item_id, idx in self.gcl_id_to_idx.items()}
+        else:
+            idx_to_id = {idx: item_id for item_id, idx in self.llm_id_to_idx.items()}
+
+        # Build the item_id -> title lookup once (first title wins) instead of
+        # scanning title_to_id for every candidate
+        id_to_title = {}
+        for t, id_ in self.title_to_id.items():
+            id_to_title.setdefault(id_, t)
+
+        for idx in top_indices:
+            if idx not in idx_to_id:
+                continue
+
+            item_id = idx_to_id[idx]
+
+            # Find the title for this item_id
+            title = id_to_title.get(item_id)
+
+            if not title:
+                continue
+
+            clean_title = self.clean_title_for_comparison(title)
+
+            # Skip if exact title is in seen titles
+            if title in seen_titles:
+                continue
+
+            # Skip if clean version of title is in seen titles
+            if clean_title in seen_clean_titles:
+                continue
+
+            # Skip collections/trilogies if user has seen any part
+            is_collection = False
+            for seen_title in seen_titles:
+                seen_clean = self.clean_title_for_comparison(seen_title)
+                if seen_clean in clean_title or clean_title in seen_clean:
+                    if any(marker in title.lower() for marker in ['collection', 'trilogy', 'series', 'complete']):
+                        is_collection = True
+                        break
+            if is_collection:
+                continue
+
+            # Check if this is a duplicate of already recommended movie
+            is_duplicate = any(
+                fuzz.ratio(clean_title, self.clean_title_for_comparison(rec[0])) > 90
+                for rec in final_recommendations
+            )
+            if is_duplicate:
+                continue
+
+            # Add with similarity score
+            final_recommendations.append((title, similarities[idx]))
+            if len(final_recommendations) >= 100:
+                break
+
+        if not final_recommendations:
+            return "No recommendations found based on your input."
+
+        return final_recommendations[:100]  # Return top 100 for ranking agent
+
+def create_interface():
+    try:
+        recommender = MovieRecommender()
+    except Exception as e:
+        print(f"Error initializing recommender: {str(e)}")
+        return None
+
+    with gr.Blocks() as iface:
+        gr.Markdown(
+            """
+            # Movie Recommender
+            Get personalized movie recommendations based on your taste and preferences!
+
+            **How to use:**
+            1. Search and select movies you've enjoyed (no limit!)
+            2. Describe what kind of movie you're looking for (optional)
+            3. Adjust the preference weight (α) to balance between your description and movie history
+            4. Get personalized recommendations
+            """
+        )
+
+        selected_movies = gr.State([])
+        retrieval_results = gr.State([])  # Store retrieval results for ranking
+
+        with gr.Row():
+            with gr.Column():
+                # Movie search and selection
+                movie_search_input = gr.Textbox(
+                    label="Search movies",
+                    placeholder="Type to search...",
+                    interactive=True,
+                    every=True
+                )
+
+                # Show search results as a list of clickable buttons
+                search_results = gr.Radio(
+                    choices=[],
+                    label="Search Results",
+                    interactive=True,
+                    visible=True
+                )
+
+                # Display selected movies with functional red cross buttons
+                with gr.Column(elem_id="selected_movies_container") as selected_movies_container:
+                    selected_display = gr.HTML(
+                        label="Your Selected Movies",
+                        value="<p><i>No movies selected yet</i></p>"
+                    )
+
+                    # Individual delete buttons (simpler approach)
+                    delete_buttons = []
+                    for i in range(20):  # Support up to 20 movies
+                        btn = gr.Button(f"× Remove Movie {i+1}", visible=False, size="sm", variant="secondary")
+                        delete_buttons.append(btn)
+
+                    # Clear all button
+                    clear_btn = gr.Button("Clear All", size="sm", variant="secondary")
+
+                # User preferences text field
+                user_preferences = gr.Textbox(
+                    label="Describe what kind of movie you're looking for",
+                    placeholder="E.g., 'A thrilling sci-fi movie with deep philosophical themes'",
+                    lines=3
+                )
+
+                # Alpha slider
+                alpha = gr.Slider(
+                    minimum=0,
+                    maximum=1,
+                    value=0.5,
+                    step=0.1,
+                    label="Preference Weight (α)",
+                    info="0: Use only movie history, 1: Use only your description"
+                )
+
+                # Embedding type selection (defaulting to GCL)
+                embedding_type = gr.Radio(
+                    choices=["LLM + GCL", "LLM"],
+                    value="LLM + GCL",
+                    label="Embedding Type",
+                    info="Choose between pure language model embeddings (LLM) or graph-enhanced embeddings (LLM + GCL)"
+                )
+
+                # Get recommendations button
+                recommend_btn = gr.Button("Get Recommendations", variant="primary")
+
+            with gr.Column():
+                # Display recommendations with streaming
+                recommendations = gr.Markdown(
+                    label="Your Personalized Recommendations",
+                    value="Recommendations will appear here"
+                )
+
+        def update_search_results(query):
+            """Update search results based on input"""
+            if not query or len(query.strip()) < 2:
+                return gr.Radio(choices=[], visible=False)
+
+            matches = recommender.search_movies(query)
+            # Limit display to first 20 for UI performance
+            display_matches = matches[:20]
+
+            if display_matches:
+                return gr.Radio(choices=display_matches, visible=True)
+            else:
+                return gr.Radio(choices=[], visible=False)
+
+        def format_selected_movies_display(movies):
+            """Format selected movies with remove buttons on same line"""
+            if not movies:
+                return "<p><i>No movies selected yet</i></p>"
+
+            html_items = []
+            for i, movie in enumerate(movies):
+                html_items.append(f"""
+                <div style="display: flex; align-items: center; justify-content: space-between;
+                            padding: 8px 12px; margin: 4px 0; background-color: #f8f9fa;
+                            border-radius: 6px; border-left: 3px solid #007bff;">
+                    <span style="flex-grow: 1; font-size: 14px; margin-right: 10px;">{i+1}. {movie}</span>
+                </div>
+                """)
+
+            return f"<div>{''.join(html_items)}</div>"
+
+        def update_delete_buttons_visibility(movies):
+            """Update visibility and labels of delete buttons"""
+            button_updates = []
+            for i in range(20):  # Support up to 20 movies
+                if i < len(movies):
+                    movie_name = movies[i][:40] + ("..." if len(movies[i]) > 40 else "")
+                    button_updates.append(gr.Button(f"🗑️ {movie_name}", visible=True, size="sm", variant="secondary"))
+                else:
+                    button_updates.append(gr.Button(f"× Remove Movie {i+1}", visible=False, size="sm", variant="secondary"))
+
+            return button_updates
+
+        def delete_movie_by_index(index, current_movies):
+            """Delete movie at specific index"""
+            if not current_movies or index >= len(current_movies):
+                return current_movies, format_selected_movies_display(current_movies)
+
+            current_movies.pop(index)
+            return current_movies, format_selected_movies_display(current_movies)
+
+        def handle_movie_selection(selected_movie, current_movies):
+            """Handle movie selection from radio buttons"""
+            if not selected_movie:
+                return [current_movies, format_selected_movies_display(current_movies)] + update_delete_buttons_visibility(current_movies)
+
+            # Check if it's a movie title (exists in our database)
+            if selected_movie in recommender.title_to_id:
+                # It's a movie selection - add it to the list
+                current_movies = current_movies or []
+                # Remove the 5-movie limit - users can now select as many as they want
+
+                if selected_movie not in current_movies:
+                    current_movies.append(selected_movie)
+
+                return [current_movies, format_selected_movies_display(current_movies)] + update_delete_buttons_visibility(current_movies)
+            else:
+                # Not a movie from database
+                return [current_movies, format_selected_movies_display(current_movies)] + update_delete_buttons_visibility(current_movies)
+
+        def clear_all_movies():
+            """Clear all selected movies"""
+            empty_movies = []
+            return [empty_movies, "<p><i>No movies selected yet</i></p>"] + update_delete_buttons_visibility(empty_movies)
+
+        def get_recommendations(movies, emb_type, preferences, pref_weight):
+            """Get recommendations: retrieval phase only, then delegate to ranking_agent with streaming"""
+            if not movies and not preferences:
+                yield "Please select some movies or provide preferences"
+                return
+
+            try:
+                # RETRIEVAL PHASE: Get top 100 candidates using proper embedding aggregation
+                print(f"\n=== RETRIEVAL PHASE ===")
+                print(f"Selected movies: {movies}")
+                print(f"User preferences: '{preferences}'")
+                print(f"Alpha weight: {pref_weight}")
+                print(f"Embedding type: {emb_type}")
+
+                yield "🔍 Searching for similar movies..."
+
+                recommendations = recommender.get_recommendations(
+                    selected_movies=movies,
+                    embedding_type=emb_type,
+                    user_preferences=preferences,
+                    alpha=pref_weight
+                )
+
+                # Handle error cases
+                if isinstance(recommendations, str):
+                    yield recommendations
+                    return
+
+                # Print retrieval results
+                print(f"\nRETRIEVAL RESULTS: Found {len(recommendations)} candidates")
+                print("Top 100 from retrieval phase:")
+                for i, (title, score) in enumerate(recommendations[:100], 1):
+                    print(f" {i:2d}. {title} (score: {score:.3f})")
+
+                # RERANKING + EXPLANATION PHASE: Delegate to ranking_agent with streaming
+                print(f"\n=== RERANKING PHASE ===")
+                print(f"Calling rank_with_ai with:")
+                print(f" - {len(recommendations)} recommendations")
+                print(f" - preferences: '{preferences}'")
+                print(f" - alpha: {pref_weight}")
+                print(f" - user_movies: {movies}")
+
+                yield "🤖 AI is ranking and explaining your recommendations..."
+
+                # Stream the responses from ranking agent
+                for partial_result in rank_with_ai(
+                    recommendations=recommendations,
+                    user_preferences=preferences,
+                    alpha=pref_weight,
+                    user_movies=movies
+                ):
+                    yield partial_result
+
+            except Exception as e:
+                print(f"ERROR in get_recommendations: {str(e)}")
+                import traceback
+                traceback.print_exc()
+                yield f"Error getting recommendations: {str(e)}"
+
+        # Event handlers
+        movie_search_input.change(
+            fn=update_search_results,
+            inputs=movie_search_input,
+            outputs=search_results
+        )
+
+        search_results.change(
+            fn=handle_movie_selection,
+            inputs=[search_results, selected_movies],
+            outputs=[selected_movies, selected_display] + delete_buttons
+        )
+
+        # Add individual delete button handlers
+        for i, btn in enumerate(delete_buttons):
+            def make_delete_handler(btn_idx):
+                def delete_handler(current_movies):
+                    updated_movies, updated_display = delete_movie_by_index(btn_idx, current_movies)
+                    return [updated_movies, updated_display] + update_delete_buttons_visibility(updated_movies)
+                return delete_handler
+
+            btn.click(
+                fn=make_delete_handler(i),
+                inputs=[selected_movies],
+                outputs=[selected_movies, selected_display] + delete_buttons
+            )
+
+        clear_btn.click(
+            fn=clear_all_movies,
+            inputs=[],
+            outputs=[selected_movies, selected_display] + delete_buttons
+        )
+
+        recommend_btn.click(
+            fn=get_recommendations,
+            inputs=[selected_movies, embedding_type, user_preferences, alpha],
+            outputs=recommendations
+        )
+
+    return iface
+
+if __name__ == "__main__":
+    iface = create_interface()
+    if iface is not None:
+        iface.launch()
+    else:
+        print("\nPlease fix the issues above and try again.")
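
The title clean-up and near-duplicate filtering used in `app.py` can be tried in isolation. The sketch below is a dependency-free approximation, not the committed code: `clean_title` mirrors the regular expression in `clean_title_for_comparison`, while `difflib.SequenceMatcher` is swapped in for rapidfuzz's `fuzz.ratio`, so scores (here on a 0 to 1 scale, with an assumed threshold of 0.9) will not match rapidfuzz exactly. The names `is_near_duplicate` and `drop_near_duplicates` are hypothetical.

```python
import re
from difflib import SequenceMatcher
from typing import Iterable, List


def clean_title(title: str) -> str:
    """Strip punctuation, lowercase, and collapse whitespace."""
    title = re.sub(r"[^\w\s]", "", str(title))
    return " ".join(title.lower().split())


def is_near_duplicate(a: str, b: str, threshold: float = 0.9) -> bool:
    """True when the cleaned titles are more similar than the threshold.

    SequenceMatcher is a stand-in for rapidfuzz here, not the same scorer.
    """
    return SequenceMatcher(None, clean_title(a), clean_title(b)).ratio() > threshold


def drop_near_duplicates(titles: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each title, skipping later near-duplicates."""
    kept: List[str] = []
    for title in titles:
        if not any(is_near_duplicate(title, seen) for seen in kept):
            kept.append(title)
    return kept
```

Keeping the first occurrence matters because candidates arrive sorted by similarity, so the highest-scoring variant of a title is the one that survives.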
ranking_agent.py
ADDED
|
@@ -0,0 +1,128 @@
| 1 |
+
from typing import List, Tuple, Dict
|
| 2 |
+
from langchain_core.prompts import ChatPromptTemplate
|
| 3 |
+
from langchain_mistralai.chat_models import ChatMistralAI
|
| 4 |
+
import os
|
| 5 |
+
from dotenv import load_dotenv
|
| 6 |
+
|
| 7 |
+
load_dotenv()
|
| 8 |
+
|
| 9 |
+
def create_ranking_chain():
|
| 10 |
+
"""Create the ranking chain as a LangChain runnable sequence (prompt | model)."""
|
| 11 |
+
prompt = ChatPromptTemplate.from_messages([
|
| 12 |
+
("system", """You are a movie recommendation expert. Your task is to select the top 10 most relevant movies from a list of recommended movies and provide the final formatted output with brief explanations.
|
| 13 |
+
|
| 14 |
+
Rules:
|
| 15 |
+
1. Always return exactly 10 movies
|
| 16 |
+
2. Consider both relevance scores and how well each movie matches user preferences
|
| 17 |
+
3. Pay attention to the alpha weighting parameter - it tells you how much to prioritize text preferences vs viewing history
|
| 18 |
+
4. Return only movies from the provided list
|
| 19 |
+
5. NEVER recommend movies that are already in the user's viewing history - these should be completely excluded
|
| 20 |
+
6. Format each movie exactly as: **1. Movie Title**\n[Exactly 2 sentences explaining why this movie matches their taste]\n\n
|
| 21 |
+
7. Number from 1 to 10, no additional text before or after"""),
|
| 22 |
+
("user", """Given these movie recommendations with their relevance scores:
|
| 23 |
+
{movie_scores}
|
| 24 |
+
|
| 25 |
+
User preferences: {preferences}
|
| 26 |
+
|
| 27 |
+
User's viewing history (DO NOT RECOMMEND ANY OF THESE): {user_movies}
|
| 28 |
+
|
| 29 |
+
Alpha weighting: {alpha}
|
| 30 |
+
(α=0.0 means recommendations were based entirely on viewing history, α=1.0 means entirely on text preferences, α=0.5 means equal balance)
|
| 31 |
+
|
| 32 |
+
Select the 10 most relevant movies and provide the final formatted output with explanations. Format each as:
|
| 33 |
+
**1. Movie Title**
|
| 34 |
+
[Exactly 2 sentences explaining why this movie matches their taste based on the weighted combination of their preferences and history]
|
| 35 |
+
|
| 36 |
+
**2. Movie Title**
|
| 37 |
+
[Exactly 2 sentences explaining why this movie matches their taste based on the weighted combination of their preferences and history]
|
| 38 |
+
|
| 39 |
+
...continue for all 10 movies.
|
| 40 |
+
|
| 41 |
+
Remember: NEVER include any movie from the user's viewing history in your recommendations.""")
|
| 42 |
+
])
|
| 43 |
+
|
| 44 |
+
model = ChatMistralAI(
|
| 45 |
+
mistral_api_key=os.environ["MISTRAL_API_KEY"],
|
| 46 |
+
model="mistral-large-latest",
|
| 47 |
+
temperature=0.5,
|
| 48 |
+
max_tokens=1200,
|
| 49 |
+
streaming=True
|
| 50 |
+
)
|
| 51 |
+
|
| 52 |
+
return prompt | model
|
| 53 |
+
|
| 54 |
+
|
| 55 |
+
|
| 56 |
+
def rank_with_ai(recommendations: List[Tuple[str, float]], user_preferences: str = "", alpha: float = 0.5, user_movies: List[str] = None):
|
| 57 |
+
"""
|
| 58 |
+
Complete reranking and explanation pipeline with streaming:
|
| 59 |
+
1. Takes top 100 candidates from retrieval phase
|
| 60 |
+
2. Reranks to top 10 using AI
|
| 61 |
+
3. Generates explanations with streaming
|
| 62 |
+
4. Yields partial formatted responses
|
| 63 |
+
|
| 64 |
+
Args:
|
| 65 |
+
recommendations: List of (movie_title, relevance_score) tuples from retrieval phase
|
| 66 |
+
user_preferences: User's textual preferences/description
|
| 67 |
+
alpha: Weighting parameter (0.0 = only history matters, 1.0 = only preferences matter)
|
| 68 |
+
user_movies: List of the user's selected movies (viewing history); passed as context and excluded from the results
|
| 69 |
+
"""
|
| 70 |
+
print("\n=== RANKING_AGENT DEBUG ===")
|
| 71 |
+
print(f"Received {len(recommendations) if recommendations else 0} recommendations")
|
| 72 |
+
print(f"User preferences: '{user_preferences}' (length: {len(user_preferences) if user_preferences else 0})")
|
| 73 |
+
print(f"Alpha: {alpha}")
|
| 74 |
+
print(f"User movies: {user_movies}")
|
| 75 |
+
|
| 76 |
+
if not recommendations:
|
| 77 |
+
yield "No recommendations available."
|
| 78 |
+
return
|
| 79 |
+
|
| 80 |
+
# Take only top 100 recommendations if more are provided
|
| 81 |
+
recommendations = recommendations[:100]
|
| 82 |
+
|
| 83 |
+
try:
|
| 84 |
+
# Format movie scores for ranking
|
| 85 |
+
movie_scores = "\n".join(
|
| 86 |
+
f"{title} (relevance: {score:.3f})"
|
| 87 |
+
for title, score in recommendations
|
| 88 |
+
)
|
| 89 |
+
|
| 90 |
+
# Start with header
|
| 91 |
+
result_header = "## 🎬 Your Personalized Movie Recommendations\n\n"
|
| 92 |
+
|
| 93 |
+
if user_movies and user_preferences:
|
| 94 |
+
result_header += f"*Based on α={alpha} weighting: {round((1-alpha)*100)}% your viewing history + {round(alpha*100)}% your preferences*\n\n"
|
| 95 |
+
elif user_preferences:
|
| 96 |
+
result_header += f"*Based entirely on your preferences: \"{user_preferences}\"*\n\n"
|
| 97 |
+
elif user_movies:
|
| 98 |
+
result_header += "*Based entirely on your viewing history*\n\n"
|
| 99 |
+
|
| 100 |
+
result_header += "---\n\n"
|
| 101 |
+
yield result_header
|
| 102 |
+
|
| 103 |
+
# Single chain that does both ranking and explanation
|
| 104 |
+
ranking_chain = create_ranking_chain()
|
| 105 |
+
print("Calling unified ranking + explanation chain...")
|
| 106 |
+
|
| 107 |
+
# Stream the response directly
|
| 108 |
+
accumulated_text = result_header
|
| 109 |
+
for chunk in ranking_chain.stream({
|
| 110 |
+
"movie_scores": movie_scores,
|
| 111 |
+
"preferences": user_preferences if user_preferences else "No specific preferences provided",
|
| 112 |
+
"user_movies": ", ".join(user_movies) if user_movies else "None",
|
| 113 |
+
"alpha": alpha
|
| 114 |
+
}):
|
| 115 |
+
if chunk.content:
|
| 116 |
+
accumulated_text += chunk.content
|
| 117 |
+
yield accumulated_text
|
| 118 |
+
|
| 119 |
+
except Exception as e:
|
| 120 |
+
print(f"ERROR in rank_with_ai: {str(e)}")
|
| 121 |
+
import traceback
|
| 122 |
+
traceback.print_exc()
|
| 123 |
+
# Fallback to simple format
|
| 124 |
+
result = "## 🎬 Your Recommendations\n\n"
|
| 125 |
+
for i, (title, score) in enumerate(recommendations[:10], 1):
|
| 126 |
+
result += f"**{i}. {title}**\n"
|
| 127 |
+
result += f"*Similarity: {score:.3f}*\n\n"
|
| 128 |
+
yield result
|
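Note on consuming `rank_with_ai`: the generator yields the full accumulated text on every step (header first, then header plus each streamed chunk), not just the newest chunk. A caller can therefore overwrite its display with each value, or keep only the last one. The sketch below illustrates that contract with a stand-in generator. `fake_rank_stream` and `final_text` are hypothetical names, and no Mistral or LangChain call is made.

```python
from typing import Iterable, Iterator


def fake_rank_stream(chunks: Iterable[str], header: str = "") -> Iterator[str]:
    """Hypothetical stand-in for rank_with_ai: yields cumulative snapshots."""
    accumulated = header
    yield accumulated  # the header is emitted before any model output
    for chunk in chunks:
        if chunk:  # empty chunks are skipped, as in the streaming loop above
            accumulated += chunk
            yield accumulated


def final_text(stream: Iterable[str]) -> str:
    """Return the last snapshot, which already contains the whole response."""
    last = ""
    for snapshot in stream:
        last = snapshot
    return last


if __name__ == "__main__":
    demo_chunks = ["**1. Example Movie**\n", "", "Two sentences would go here.\n\n"]
    for snapshot in fake_rank_stream(demo_chunks, header="## Recommendations\n\n"):
        # A UI would replace its output with `snapshot` rather than append it.
        print(repr(snapshot))
```

Because each yield is a complete snapshot, appending the yielded values to one another would duplicate text.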
requirements.txt
ADDED
|
@@ -0,0 +1,14 @@
|
| 1 |
+
flask==2.0.1
|
| 2 |
+
numpy>=1.21.0
|
| 3 |
+
pandas>=1.3.0
|
| 4 |
+
scipy>=1.7.1
|
| 5 |
+
rapidfuzz>=3.0.0
|
| 6 |
+
requests>=2.31.0
|
| 7 |
+
tqdm>=4.66.1
|
| 8 |
+
scikit-learn>=1.0.0
|
| 9 |
+
datasets>=2.17.0
|
| 10 |
+
python-dotenv>=1.0.1
|
| 11 |
+
langchain>=0.1.9
|
| 12 |
+
langchain-core>=0.1.27
|
| 13 |
+
langchain-mistralai>=0.0.5
|
| 14 |
+
gradio>=4.19.2
|