---
library_name: transformers
license: mit
language:
- en
metrics:
- f1
- precision
- recall
- accuracy
---

# Model Card for Movie_Genre_Classifier

This fine-tuned BERT model is a multilabel classifier that predicts the genre(s) of a movie from its summary. It has been trained to classify movies into one or more of the following genres: Drama, Action, Comedy, Animation, and Crime. The model leverages the BERT architecture to interpret the nuances of movie summaries and can assign multiple genres to a single movie.

## Model Details

### Model Description

- **Developed by:** Sinanmz
- **Model type:** Multilabel classifier (BERT-based)
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** google-bert/bert-base-uncased

### Model Sources

- **Repository:** https://github.com/Sinanmz/MIR

## Uses

This BERT-based multilabel classifier predicts the genre(s) of a movie from its summary. It can be utilized in various applications, including but not limited to:

- **Content Recommendation Systems:** Enhancing movie recommendation engines by predicting genres from summaries, allowing for better personalization.
- **Movie Cataloging:** Assisting in the organization and tagging of movies in large databases or streaming platforms.
- **Search Optimization:** Improving search results by classifying movies into multiple genres, thereby providing more relevant hits for user queries.
- **Content Filtering:** Helping users find movies that match their preferences by identifying and categorizing movies into multiple genres.

**Foreseeable Users:**

- **Streaming Services:** To enhance content recommendation algorithms and search functionalities.
- **Movie Database Administrators:** To automate the process of tagging and organizing movies.
- **Developers:** To build applications that require genre classification from textual summaries.

**Affected Parties:**

- **Viewers/Consumers:** Benefit from improved content recommendations and search results.
- **Content Creators:** Gain better visibility through accurate classification and tagging of their work.
- **Platform Operators:** Improve user engagement and satisfaction with more personalized and accurate content delivery.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained('Sinanmz/Movie_Genre_Classifier')
model = AutoModelForSequenceClassification.from_pretrained('Sinanmz/Movie_Genre_Classifier')
model.eval()

# Example movie summary (summary of Dune: Part Two)
movie_summary = """Paul Atreides unites with Chani and the Fremen while on a warpath of
revenge against the conspirators who destroyed his family. Facing a choice between the
love of his life and the fate of the known universe, he endeavors to prevent a terrible
future only he can foresee."""

# Tokenize the input
inputs = tokenizer(movie_summary, return_tensors="pt", truncation=True, padding=True)

# Get model predictions (no gradients needed at inference time)
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits

# Convert logits to independent per-genre probabilities
probs = torch.sigmoid(logits)

# Keep every genre whose probability clears the 0.5 threshold
genre_labels = ["Action", "Drama", "Comedy", "Animation", "Crime"]
predicted_genres = [genre_labels[i] for i in range(len(genre_labels)) if probs[0][i] >= 0.5]

print(f"Predicted genres: {predicted_genres}")

# Output:
# Predicted genres: ['Action', 'Drama']
```

## Evaluation

### Metrics

The evaluation metrics used for this model are precision, recall, and F1-score. They were chosen because they provide a comprehensive view of the model's performance in a multilabel setting, where it is important to understand not only how many predictions are correct but also the balance between precision (the accuracy of the positive predictions) and recall (the ability to find all positive instances).
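
As an illustration of how these metrics behave in the multilabel setting, the sketch below computes micro-averaged precision, recall, and F1 with scikit-learn; the label arrays are invented for illustration, not this model's actual predictions.

```python
# Illustrative sketch only: y_true / y_pred are made-up label arrays,
# not this model's real predictions on any split.
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

# One row per movie, one column per genre (1 = genre applies).
# Columns: Action, Drama, Comedy, Animation, Crime
y_true = np.array([[1, 1, 0, 0, 0],
                   [0, 1, 0, 0, 1],
                   [0, 0, 1, 1, 0]])
y_pred = np.array([[1, 1, 0, 0, 0],
                   [0, 1, 0, 0, 0],
                   [0, 0, 1, 0, 0]])

# "micro" pools every (movie, genre) decision before averaging;
# "macro" would instead take the unweighted mean of per-genre scores.
p = precision_score(y_true, y_pred, average="micro", zero_division=0)
r = recall_score(y_true, y_pred, average="micro", zero_division=0)
f = f1_score(y_true, y_pred, average="micro", zero_division=0)
print(f"micro precision={p:.2f} recall={r:.2f} f1={f:.2f}")
# → micro precision=1.00 recall=0.67 f1=0.80
```

Here every predicted label is correct (precision 1.00) but two true labels are missed (recall 0.67), which is exactly the trade-off the reports in the Results section summarize per genre and per averaging scheme.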

### Results

#### Summary

Below are the classification reports for the train, validation, and test splits of the dataset.

**Classification Report for Train Split:**

```
              precision    recall  f1-score   support

      Action       1.00      1.00      1.00      1655
       Drama       1.00      1.00      1.00      4109
      Comedy       1.00      1.00      1.00      2094
   Animation       1.00      1.00      1.00       669
       Crime       1.00      1.00      1.00      1284

   micro avg       1.00      1.00      1.00      9811
   macro avg       1.00      1.00      1.00      9811
weighted avg       1.00      1.00      1.00      9811
 samples avg       1.00      1.00      1.00      9811
```

**Classification Report for Val Split:**

```
              precision    recall  f1-score   support

      Action       0.70      0.73      0.71       220
       Drama       0.77      0.84      0.80       507
      Comedy       0.69      0.54      0.61       260
   Animation       0.59      0.44      0.50        80
       Crime       0.72      0.66      0.69       165

   micro avg       0.73      0.71      0.72      1232
   macro avg       0.70      0.64      0.66      1232
weighted avg       0.72      0.71      0.71      1232
 samples avg       0.75      0.74      0.71      1232
```

**Classification Report for Test Split:**

```
              precision    recall  f1-score   support

      Action       0.62      0.66      0.64       191
       Drama       0.80      0.85      0.82       520
      Comedy       0.69      0.58      0.63       260
   Animation       0.60      0.49      0.54        78
       Crime       0.65      0.67      0.66       154

   micro avg       0.72      0.71      0.72      1203
   macro avg       0.67      0.65      0.66      1203
weighted avg       0.72      0.71      0.71      1203
 samples avg       0.75      0.75      0.72      1203
```

The model fits the training data almost perfectly, with precision, recall, and F1-scores of 1.00 across all genres, but performance drops noticeably on the validation and test splits (micro-averaged F1 of about 0.72). This gap points to overfitting and highlights room for the model to generalize better to unseen data.
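
One inexpensive lever for narrowing part of that gap, sketched below with synthetic numbers (not the model's real validation outputs), is to tune the sigmoid decision threshold per genre on the validation split rather than using a fixed 0.5 for every genre:

```python
# Hypothetical sketch: per-genre threshold tuning on held-out data.
# The labels and probabilities below are synthetic placeholders.
import numpy as np

def best_threshold(y_true, probs, grid=np.linspace(0.1, 0.9, 17)):
    """Return the threshold in `grid` that maximizes F1 for one genre."""
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        pred = (probs >= t).astype(int)
        tp = int(np.sum((pred == 1) & (y_true == 1)))
        fp = int(np.sum((pred == 1) & (y_true == 0)))
        fn = int(np.sum((pred == 0) & (y_true == 1)))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Synthetic validation labels and predicted probabilities for one genre.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
probs = np.array([0.8, 0.42, 0.47, 0.9, 0.22, 0.33, 0.6, 0.52])

t, f1 = best_threshold(y_true, probs)
print(f"best threshold={t:.2f}, F1={f1:.2f}")
# → best threshold=0.45, F1=0.89
```

In this toy example the default 0.5 threshold would miss a true positive at probability 0.47; lowering the threshold to 0.45 recovers it. Applied per genre, such tuned thresholds would replace the fixed 0.5 used in the getting-started snippet above.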

## Model Card Authors

Sina Namazi

## Model Card Contact

- **Github:** github.com/Sinanmz
- **Hugging Face:** huggingface.co/Sinanmz