Commit dfa48b6 (verified), committed by nimishgarg
1 Parent(s): 9b58be1

Upload README.md

  1. README.md +119 -0
README.md ADDED
@@ -0,0 +1,119 @@
# Sentence Transformer Quantized Model for Movie Recommendation on the Movie Lens Dataset

This repository hosts a quantized version of a Sentence Transformer model, fine-tuned for movie recommendation on the Movie Lens dataset. The weights have been converted to FP16 (half precision) for efficient deployment without significant accuracy loss.

## Model Details

- **Model Architecture:** Sentence Transformer
- **Task:** Movie Recommendation
- **Dataset:** Movie Lens Dataset
- **Quantization:** Float16
- **Fine-tuning Framework:** Hugging Face Transformers


---

## Installation

```bash
pip install pandas torch sentence-transformers scikit-learn
```

In a notebook cell, prefix the command with `!`.


---

## Loading the Model

```python
from difflib import get_close_matches

import pandas as pd
import torch
from sentence_transformers import SentenceTransformer, util

# Load model. Note: this is the base checkpoint; to use the fine-tuned FP16
# weights, pass the path to this repository's quantized-model/ directory instead.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2', device=device)

# Assumption: movie_subset is a DataFrame with a "title" column. The three rows
# below are placeholders; replace them with the movies table from the dataset.
movie_subset = pd.DataFrame({"title": ["Toy Story (1995)", "Jumanji (1995)", "Heat (1995)"]})

# Assumption: one embedding per movie, computed here from the title alone.
movie_embeddings = model.encode(movie_subset["title"].tolist(), convert_to_tensor=True)


# Recommend movies
def recommend_by_movie_name(movie_name, top_k=5):
    titles = movie_subset["title"].tolist()
    matches = get_close_matches(movie_name, titles, n=1, cutoff=0.6)

    if not matches:
        print(f"❌ Movie '{movie_name}' not found in dataset.")
        return

    matched_title = matches[0]
    movie_index = titles.index(matched_title)  # positional index into movie_embeddings

    query_embedding = movie_embeddings[movie_index]
    scores = util.pytorch_cos_sim(query_embedding, movie_embeddings)[0]
    # Request one extra result because the query movie also matches itself
    top_results = torch.topk(scores, k=min(top_k + 1, len(titles)))

    print(f"\n🎬 Recommendations for: {matched_title}")
    for score, idx_tensor in zip(top_results[0], top_results[1]):
        idx = idx_tensor.item()  # convert tensor to int
        if idx == movie_index:  # skip the query movie itself
            continue
        title = movie_subset.iloc[idx]["title"]
        print(f"  {title} (Score: {score.item():.4f})")


# Pass the movie name
recommend_by_movie_name("Toy Story")
```

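The title lookup in the code above relies on `difflib.get_close_matches`, whose similarity ratio is case-sensitive, so a lowercase query can fall below the 0.6 cutoff even when the movie is present. A minimal sketch of a case-insensitive variant follows; `find_title` and the three sample titles are hypothetical and not part of the original code.

```python
from difflib import get_close_matches

# Hypothetical sample titles; in practice use movie_subset["title"].tolist()
titles = ["Toy Story (1995)", "Jumanji (1995)", "Heat (1995)"]


def find_title(query, titles, cutoff=0.6):
    """Return the closest title ignoring case, or None if nothing passes the cutoff."""
    # Map lowercased titles back to their original spelling
    lowered = {title.lower(): title for title in titles}
    matches = get_close_matches(query.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None


print(get_close_matches("toy story", titles, n=1, cutoff=0.6))  # case-sensitive lookup
print(find_title("toy story", titles))                          # case-insensitive lookup
```

If two titles differ only by case, the dictionary keeps the last one; that trade-off is acceptable for a sketch but may need handling on a full dataset.
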

---

## Fine-Tuning Details

### Dataset

The data comes from the `Movie-Lens` dataset on Hugging Face, which contains 20,000 movies and their genres.

### Training

- **Epochs:** 2
- **`warmup_steps`:** 100
- **`show_progress_bar`:** `True`
- **Evaluation strategy:** `epoch`

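The `epochs`, `warmup_steps`, and `show_progress_bar` names match arguments of `SentenceTransformer.fit`, whose default scheduler is a linear warmup followed by linear decay. Assuming that default was used (the README does not say), the learning-rate multiplier can be sketched as follows. `total_steps` is a hypothetical value, because the batch size and number of training pairs are not stated.

```python
def warmup_linear_multiplier(step, warmup_steps, total_steps):
    """Learning-rate multiplier: linear ramp to 1.0, then linear decay to 0.0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


warmup_steps = 100   # from the training settings listed above
total_steps = 1000   # hypothetical: epochs * batches per epoch

for step in (0, 50, 100, 550, 1000):
    print(step, round(warmup_linear_multiplier(step, warmup_steps, total_steps), 3))
```

With only 2 epochs, 100 warmup steps can be a sizeable share of training on a small dataset, so it is worth checking the ratio against the real step count.
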

---

## Quantization

After fine-tuning, the weights were converted to half precision (FP16) with PyTorch's `half()` method to reduce model size and inference time.

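To give a feel for the size reduction, the sketch below compares the storage cost of FP32 and FP16 values using only the standard library. The parameter count is a hypothetical round number, not a measurement of this model.

```python
import struct

BYTES_FP32 = struct.calcsize("<f")  # bytes per single-precision value
BYTES_FP16 = struct.calcsize("<e")  # bytes per half-precision value


def weights_size_mb(num_params, bytes_per_param):
    """Approximate size of the raw weights in megabytes (10**6 bytes)."""
    return num_params * bytes_per_param / 1e6


num_params = 20_000_000  # hypothetical parameter count
print(f"FP32: {weights_size_mb(num_params, BYTES_FP32):.0f} MB")
print(f"FP16: {weights_size_mb(num_params, BYTES_FP16):.0f} MB")
```

The file on disk is slightly larger than this estimate because it also stores metadata, but the weights themselves shrink by half.
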

---

## Repository Structure

```text
.
├── quantized-model/                       # Contains the quantized model files
│   ├── config.json
│   ├── model.safetensors
│   ├── tokenizer_config.json
│   ├── modules.json
│   ├── special_tokens_map.json
│   ├── sentence_bert_config.json
│   ├── tokenizer.json
│   ├── config_sentence_transformers.json
│   └── vocab.txt
└── README.md                              # Model documentation
```


---

## Limitations

- The model is fine-tuned specifically for movie recommendation on the Movie Lens dataset and may not transfer well to other domains.
- FP16 has a narrower range and lower precision than FP32, which may cause slight numerical instability in edge cases.

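The FP16 edge cases mentioned above come from its limited precision and range. The sketch below round-trips a few values through half precision with the standard `struct` module; the sample values are illustrative only, and `to_fp16` is a hypothetical helper.

```python
import struct


def to_fp16(value):
    """Round-trip a Python float through IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Precision: nearby similarity scores can collapse to the same FP16 value
print(to_fp16(0.9000), to_fp16(0.9001))

# Underflow: very small magnitudes round to zero
print(to_fp16(1e-8))

# Range: values beyond the FP16 maximum (65504) cannot be represented
try:
    to_fp16(70000.0)
except OverflowError as exc:
    print("overflow:", exc)
```

`struct` raises an error on overflow, whereas PyTorch FP16 tensors produce `inf` instead, so in practice the symptom is usually `inf` or `nan` scores rather than an exception.
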

---

## Contributing

Feel free to open issues or submit pull requests to improve the model or documentation.