AventIQ-AI / Movie-Recommendation-Using-Sentence-Transormer

README.md uploaded by nimishgarg (commit dfa48b6, 9 months ago)


# Sentence Transformer Quantized Model for Movie Recommendation on the MovieLens Dataset

This repository hosts a quantized version of a Sentence Transformer model fine-tuned for movie recommendation on the MovieLens dataset. The model has been converted to FP16 (half precision) for efficient deployment without significant accuracy loss.

	## Model Details

	- Model Architecture: Sentence Transformer
	- Task: Movie Recommendation
- Dataset: MovieLens
- Quantization: Float16 (FP16)
	- Fine-tuning Framework: Hugging Face Transformers

	---

	## Installation

```bash
pip install pandas torch sentence-transformers scikit-learn
```

In a Jupyter or Colab notebook, prefix the command with `!`.

	---

	## Loading the Model

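```python
from difflib import get_close_matches

import torch
from sentence_transformers import SentenceTransformer, util

# Load the model on the GPU if one is available, otherwise on the CPU.
# To use this repository's fine-tuned FP16 weights instead of the base model,
# replace the model ID with the path to the downloaded `quantized-model/` folder.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2', device=device)

# Assumed to exist before calling the function below (not defined in this card):
#   movie_subset     - pandas DataFrame with a "title" column
#   movie_embeddings - tensor of shape (len(movie_subset), embedding_dim), rows in
#                      the same order, e.g. model.encode(texts, convert_to_tensor=True)


# Recommend movies similar to a (fuzzy-matched) title
def recommend_by_movie_name(movie_name, top_k=5):
    titles = movie_subset["title"].tolist()
    # Find the closest title by difflib similarity ratio (at least 0.6).
    matches = get_close_matches(movie_name, titles, n=1, cutoff=0.6)

    if not matches:
        print(f"❌ Movie '{movie_name}' not found in dataset.")
        return

    matched_title = matches[0]
    # Positional index, so it lines up with the rows of movie_embeddings
    # even when the DataFrame index is not 0..n-1.
    movie_index = titles.index(matched_title)

    query_embedding = movie_embeddings[movie_index]
    scores = util.cos_sim(query_embedding, movie_embeddings)[0]
    # Ask for one extra result, because the query movie itself ranks first.
    top_results = torch.topk(scores, k=min(top_k + 1, len(titles)))

    print(f"\n🎬 Recommendations for: {matched_title}")
    for score, idx_tensor in zip(top_results.values[1:], top_results.indices[1:]):  # skip itself
        idx = idx_tensor.item()  # convert tensor to int
        title = movie_subset.iloc[idx]["title"]
        print(f"  {title} (Score: {score.item():.4f})")


# Pass the movie name (after the function is defined)
recommend_by_movie_name("Toy Story")
```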
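To see the ranking step in isolation, here is a dependency-free sketch of what the cosine-similarity and `torch.topk` lines above compute: score every catalog row against the query row, drop the query itself, and keep the highest scores. The titles and three-dimensional vectors are made up for illustration (real all-MiniLM-L6-v2 embeddings have 384 dimensions), and `cosine` and `top_k_similar` are hypothetical helpers.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k_similar(query_index, embeddings, top_k=5):
    """Return (index, score) pairs for the top_k rows most similar to the
    query row, excluding the query row itself, highest score first."""
    query = embeddings[query_index]
    scored = [
        (i, cosine(query, emb))
        for i, emb in enumerate(embeddings)
        if i != query_index
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical catalog: titles and toy embeddings in the same row order.
titles = ["Toy Story (1995)", "Antz (1998)", "Heat (1995)", "A Bug's Life (1998)"]
embeddings = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.1, 0.9],
    [0.85, 0.15, 0.05],
]

for idx, score in top_k_similar(0, embeddings, top_k=2):
    print(f"  {titles[idx]} (Score: {score:.4f})")
```

Filtering by index rather than slicing off the first result avoids relying on the query movie always ranking first, which can fail when duplicate titles share identical embeddings.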

	---



	## Fine-Tuning Details

	### Dataset

	The dataset is sourced from Hugging Face’s `Movie-Lens` dataset. It contains 20,000 movies and their genres.
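The card does not say how each movie is turned into text before encoding. A minimal sketch, assuming each record has a `title` and a pipe-separated `genres` string as in the original MovieLens `movies.csv`, is to concatenate the two; the `movie_to_text` helper and its output format are hypothetical, not the authors' preprocessing.

```python
def movie_to_text(title, genres):
    """Hypothetical helper: build the string that gets embedded for one movie.

    Assumes MovieLens-style genres such as "Adventure|Animation|Children".
    """
    genre_list = [g for g in genres.split("|") if g and g != "(no genres listed)"]
    if not genre_list:
        return title
    return f"{title}. Genres: {', '.join(genre_list)}"


movies = [
    {"title": "Toy Story (1995)", "genres": "Adventure|Animation|Children|Comedy|Fantasy"},
    {"title": "Heat (1995)", "genres": "Action|Crime|Thriller"},
]
texts = [movie_to_text(m["title"], m["genres"]) for m in movies]
print(texts[0])
# -> Toy Story (1995). Genres: Adventure, Animation, Children, Comedy, Fantasy
```

A list like `texts` could then be passed to `model.encode(texts, convert_to_tensor=True)` to produce `movie_embeddings`, keeping the rows in the same order as `movie_subset`.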

	### Training

- Epochs: 2
- Warmup steps: 100
- Progress bar: enabled (`show_progress_bar=True`)
- Evaluation strategy: `epoch`
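The listed settings correspond to the `epochs`, `warmup_steps`, and `show_progress_bar` arguments of `SentenceTransformer.fit`. The card does not state the loss or how training pairs were built, so the sketch below assumes pairs of titles labeled by genre overlap (Jaccard similarity) and trained with `losses.CosineSimilarityLoss`; `genre_jaccard`, `build_pairs`, and `fine_tune` are hypothetical helpers, not the authors' code.

```python
from itertools import combinations


def genre_jaccard(genres_a, genres_b):
    """Jaccard similarity of two pipe-separated genre strings (0.0 to 1.0).

    str.split always yields at least one element, so the union is never empty.
    """
    a, b = set(genres_a.split("|")), set(genres_b.split("|"))
    return len(a & b) / len(a | b)


def build_pairs(movies):
    """Return (title_a, title_b, label) for every pair of movies."""
    return [
        (m1["title"], m2["title"], genre_jaccard(m1["genres"], m2["genres"]))
        for m1, m2 in combinations(movies, 2)
    ]


def fine_tune(model, pairs, batch_size=16):
    """Fine-tune with the settings from this card (needs sentence-transformers)."""
    from sentence_transformers import InputExample, losses
    from torch.utils.data import DataLoader

    examples = [InputExample(texts=[a, b], label=float(label)) for a, b, label in pairs]
    loader = DataLoader(examples, shuffle=True, batch_size=batch_size)
    loss = losses.CosineSimilarityLoss(model)
    model.fit(
        train_objectives=[(loader, loss)],
        epochs=2,
        warmup_steps=100,
        show_progress_bar=True,
    )
    return model
```

Enumerating every pair grows quadratically: for the 20,000 movies mentioned above that is roughly 200 million pairs, so in practice you would sample pairs rather than build them all.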

	---

	## Quantization

Post-training quantization was applied by casting the weights to FP16 with PyTorch's `half()` method, to reduce model size and, on hardware with fast FP16 support such as modern GPUs, inference time.
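FP16 stores each weight in 2 bytes instead of the 4 bytes used by FP32, so weight storage roughly halves. The sketch below estimates this with Python's `struct` half-precision format (`'e'`) and shows one way to produce an FP16 copy using `half()` and `SentenceTransformer.save`; `save_fp16_copy` and the 20-million-parameter figure are hypothetical, chosen only for illustration.

```python
import struct


def bytes_per_value(fmt):
    """Size in bytes of one IEEE 754 value: 'e' = float16, 'f' = float32."""
    return struct.calcsize("<" + fmt)


def estimated_size_mb(num_params, fmt):
    """Approximate weight storage (in MB) for num_params values in the given format."""
    return num_params * bytes_per_value(fmt) / 1e6


def save_fp16_copy(model_id_or_path, output_dir):
    """Hypothetical helper: cast a Sentence Transformer to FP16 and save it."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id_or_path)
    model.half()  # torch.nn.Module.half(): floating-point params/buffers -> float16
    model.save(output_dir)
    return model


print(estimated_size_mb(20_000_000, "f"), estimated_size_mb(20_000_000, "e"))  # 80.0 40.0
```

The estimate covers weights only; tokenizer and config files add a small fixed overhead.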

	---

	## Repository Structure

```text
.
├── quantized-model/                       # Contains the quantized model files
│   ├── config.json
│   ├── model.safetensors
│   ├── tokenizer_config.json
│   ├── modules.json
│   ├── special_tokens_map.json
│   ├── sentence_bert_config.json
│   ├── tokenizer.json
│   ├── config_sentence_transformers.json
│   └── vocab.txt
└── README.md                              # Model documentation
```

	---

	## Limitations

- The model is trained specifically for movie recommendation on the MovieLens dataset.
	- FP16 quantization may result in slight numerical instability in edge cases.
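One concrete source of this instability is FP16's limited precision. With 10 explicit mantissa bits, values between 0.5 and 1 are spaced about 0.0005 apart, so two similarity scores that differ slightly in FP32 can round to the same FP16 value and tie in the ranking. The `to_fp16` helper below is an illustration based on Python's `struct` half-precision format, not part of this repository, and the scores are hypothetical.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


score_a, score_b = 0.9000, 0.9001  # hypothetical FP32 similarity scores
print(score_a < score_b)                    # True
print(to_fp16(score_a), to_fp16(score_b))   # 0.89990234375 0.89990234375
print(to_fp16(score_a) < to_fp16(score_b))  # False: the two movies now tie
```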


	---

	## Contributing

	Feel free to open issues or submit pull requests to improve the model or documentation.