AventIQ-AI/Movie-Recommendation-Using-Sentence-Transormer

nimishgarg committed on May 15, 2025

Commit dfa48b6 · verified · 1 Parent(s): 9b58be1

Upload README.md

Files changed (1)

README.md +119 -0

README.md ADDED

	@@ -0,0 +1,119 @@

+# Sentence Transformer Quantized Model for Movie Recommendation on the MovieLens Dataset
+This repository hosts a quantized version of a Sentence Transformer model, fine-tuned for movie recommendation on the MovieLens dataset. The weights have been converted to FP16 (half precision) for efficient deployment without significant accuracy loss.
+## Model Details
+- **Model Architecture:** Sentence Transformer
+- **Task:** Movie Recommendation
+- **Dataset:** MovieLens Dataset
+- **Quantization:** Float16
+- **Fine-tuning Framework:** Hugging Face Transformers
+---
+## Installation
+```bash
+pip install pandas torch sentence-transformers scikit-learn
+```
+---
+## Loading the Model
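+```python
+from difflib import get_close_matches
+
+import torch
+from sentence_transformers import SentenceTransformer, util
+
+# Load the model (this is the base checkpoint; point it at the fine-tuned weights if you have them)
+device = 'cuda' if torch.cuda.is_available() else 'cpu'
+model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2', device=device)
+
+# NOTE: `movie_subset` and `movie_embeddings` are not defined in this README.
+# Assumption: `movie_subset` is a pandas DataFrame with a "title" column, and
+# `movie_embeddings` is a tensor with one row per row of `movie_subset`, e.g.
+#   movie_embeddings = model.encode(movie_subset["title"].tolist(), convert_to_tensor=True)
+
+# Recommend movies
+def recommend_by_movie_name(movie_name, top_k=5):
+    titles = movie_subset["title"].tolist()
+    # Fuzzy-match the query against the known titles (the comparison is case-sensitive)
+    matches = get_close_matches(movie_name, titles, n=1, cutoff=0.6)
+    if not matches:
+        print(f"❌ Movie '{movie_name}' not found in dataset.")
+        return
+    matched_title = matches[0]
+    # Use the positional index so it lines up with the rows of movie_embeddings
+    movie_index = titles.index(matched_title)
+    query_embedding = movie_embeddings[movie_index]
+    scores = util.pytorch_cos_sim(query_embedding, movie_embeddings)[0]
+    # Request one extra result because the top hit is the query movie itself
+    top_results = torch.topk(scores, k=min(top_k + 1, len(titles)))
+    print(f"\n🎬 Recommendations for: {matched_title}")
+    for score, idx_tensor in zip(top_results[0][1:], top_results[1][1:]):  # skip itself
+        idx = idx_tensor.item()  # convert tensor to int
+        title = movie_subset.iloc[idx]["title"]
+        print(f"  {title} (Score: {score.item():.4f})")
+
+# Pass the movie name (after the function has been defined)
+recommend_by_movie_name("Toy Story")
+```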
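The two steps inside `recommend_by_movie_name` (resolving a free-text query to a known title, then ranking the other titles by cosine similarity) can be tried without downloading a model. The sketch below is a standard-library stand-in, not the repository's code: the three-dimensional vectors are made-up placeholders for real sentence embeddings, and `resolve_title`, `cosine`, and `recommend` are hypothetical helper names. It also lowercases both sides before matching, because `get_close_matches` compares characters case-sensitively and a lowercase query such as "toy story" can otherwise fall below the 0.6 cutoff.

```python
import math
from difflib import get_close_matches

# Illustrative data only: real embeddings come from the Sentence Transformer
# and have hundreds of dimensions, not three.
TOY_EMBEDDINGS = {
    "Toy Story (1995)": [1.0, 1.0, 0.0],
    "Toy Story 2 (1999)": [1.0, 0.9, 0.0],
    "Jumanji (1995)": [1.0, 0.5, 0.0],
    "Heat (1995)": [0.0, 0.0, 1.0],
}


def resolve_title(query, titles, cutoff=0.6):
    """Return the closest title ignoring case, or None if nothing clears the cutoff."""
    by_lower = {title.lower(): title for title in titles}
    matches = get_close_matches(query.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def recommend(query, embeddings, top_k=2):
    """Rank every other title by cosine similarity to the matched title."""
    matched = resolve_title(query, list(embeddings))
    if matched is None:
        return []
    scored = [
        (title, cosine(embeddings[matched], vector))
        for title, vector in embeddings.items()
        if title != matched
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for title, score in recommend("toy story", TOY_EMBEDDINGS):
        print(f"{title} (Score: {score:.4f})")
```

Excluding the matched title by name, rather than dropping the first result, avoids relying on the query movie always ranking first.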
+---
+## Fine-Tuning Details
+### Dataset
+The dataset is sourced from Hugging Face’s `Movie-Lens` dataset. It contains 20,000 movies and their genres.
+### Training
+- **Epochs:** 2
+- **Warmup steps (`warmup_steps`):** 100
+- **Progress bar (`show_progress_bar`):** `True`
+- **Evaluation strategy:** `epoch`
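To make the `warmup_steps` setting concrete, here is a minimal sketch of how a warmup schedule scales the learning rate. The README does not name the scheduler, the peak learning rate, or the total number of optimizer steps, so the linear-warmup-then-linear-decay shape, `peak_lr`, and `total_steps` below are all assumptions, and `warmup_linear_factor` is a hypothetical helper name.

```python
def warmup_linear_factor(step, warmup_steps=100, total_steps=1000):
    """Multiplier applied to the peak learning rate at a given optimizer step.

    Assumption: linear warmup from 0 to 1 over `warmup_steps`, then linear
    decay back to 0 at `total_steps`. Only warmup_steps=100 comes from the
    README; total_steps is a placeholder.
    """
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    remaining = total_steps - step
    return max(0.0, remaining / max(1, total_steps - warmup_steps))


if __name__ == "__main__":
    peak_lr = 2e-5  # hypothetical peak learning rate, not stated in the README
    for step in (0, 50, 100, 550, 1000):
        print(step, peak_lr * warmup_linear_factor(step))
```

With 100 warmup steps the model sees small updates at first, which tends to keep early fine-tuning from disturbing the pretrained weights too abruptly.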
+---
+## Quantization
+Post-training quantization was applied by converting the model weights to half precision (FP16) with PyTorch’s `half()` method, which reduces model size and inference time.
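The effect of an FP16 conversion on storage and precision can be illustrated without PyTorch. The sketch below swaps in Python's `struct` module, whose `e` format is the same IEEE 754 half-precision layout, as a stand-in for `half()`; the weight values are made up and `to_fp16` and `packed_size` are hypothetical helper names.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def packed_size(values, fmt):
    """Bytes needed to store the values with the given struct format character."""
    return len(struct.pack(f"<{len(values)}{fmt}", *values))


if __name__ == "__main__":
    weights = [0.1, -0.25, 0.333, 1e-8]  # made-up values, not real model weights
    print("FP32 bytes:", packed_size(weights, "f"))
    print("FP16 bytes:", packed_size(weights, "e"))
    for w in weights:
        rounded = to_fp16(w)
        print(f"{w!r} -> {rounded!r} (abs error {abs(w - rounded):.2e})")
```

Each value takes two bytes instead of four, values such as 0.1 pick up a small rounding error, and very small magnitudes can round to zero.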
+---
+## Repository Structure
+```
+.
+├── quantized-model/               # Contains the quantized model files
+│   ├── config.json
+│   ├── model.safetensors
+│   ├── tokenizer_config.json
+│   ├── modules.json
+│   ├── special_tokens_map.json
+│   ├── sentence_bert_config.json
+│   ├── tokenizer.json
+│   ├── config_sentence_transformers.json
+│   └── vocab.txt
+└── README.md                      # Model documentation
+```
+---
+## Limitations
+- The model is fine-tuned specifically for movie recommendation on the MovieLens dataset.
+- FP16 quantization may result in slight numerical instability in edge cases.
+---
+## Contributing
+Feel free to open issues or submit pull requests to improve the model or documentation.