---
library_name: transformers
license: mit
language:
- en
metrics:
- f1
- precision
- recall
- accuracy
---

# Model Card for Movie_Genre_Classifier

This fine-tuned BERT model is a multilabel classifier that predicts the genre(s) of a movie from its summary. It has been trained to classify movies into one or more of the following genres: Drama, Action, Comedy, Animation, and Crime. The model leverages the BERT architecture to interpret the nuances of movie summaries and can assign multiple genres to a single movie.

## Model Details

### Model Description

- **Developed by:** Sinanmz
- **Model type:** Multilabel classifier (BERT-based)
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** google-bert/bert-base-uncased

### Model Sources

- **Repository:** https://github.com/Sinanmz/MIR

## Uses

This BERT-based multilabel classifier predicts the genre(s) of a movie from its summary. It can be utilized in various applications, including but not limited to:

- **Content Recommendation Systems:** Enhancing movie recommendation engines by predicting genres from summaries, allowing for better personalization.
- **Movie Cataloging:** Assisting in the organization and tagging of movies in large databases or streaming platforms.
- **Search Optimization:** Improving search results by classifying movies into multiple genres, thereby providing more relevant hits for user queries.
- **Content Filtering:** Helping users find movies that match their preferences by identifying and categorizing movies into multiple genres.

**Foreseeable Users:**

- **Streaming Services:** To enhance content recommendation algorithms and search functionalities.
- **Movie Database Administrators:** To automate the process of tagging and organizing movies.
- **Developers:** To build applications that require genre classification from textual summaries.

**Affected Parties:**

- **Viewers/Consumers:** Benefit from improved content recommendations and search results.
- **Content Creators:** Gain better visibility through accurate classification and tagging of their work.
- **Platform Operators:** Improve user engagement and satisfaction with more personalized and accurate content delivery.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained('Sinanmz/Movie_Genre_Classifier')
model = AutoModelForSequenceClassification.from_pretrained('Sinanmz/Movie_Genre_Classifier')
model.eval()

# Example movie summary (summary of Dune: Part Two)
movie_summary = """Paul Atreides unites with Chani and the Fremen while on a warpath of
revenge against the conspirators who destroyed his family. Facing a choice between the
love of his life and the fate of the known universe, he endeavors to prevent a terrible
future only he can foresee."""

# Tokenize the input
inputs = tokenizer(movie_summary, return_tensors="pt", truncation=True, padding=True)

# Get model predictions (no gradients needed at inference time)
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits

# Convert logits to independent per-genre probabilities
probs = torch.sigmoid(logits)

# Keep every genre whose probability clears the 0.5 threshold
genre_labels = ["Action", "Drama", "Comedy", "Animation", "Crime"]
predicted_genres = [genre_labels[i] for i in range(len(genre_labels)) if probs[0][i] >= 0.5]

print(f"Predicted genres: {predicted_genres}")

# Output:
# Predicted genres: ['Action', 'Drama']
```

## Evaluation

### Metrics

The evaluation metrics used for this model are precision, recall, and F1-score. They were chosen because they provide a comprehensive view of the model's performance in a multilabel setting, where it is important to understand not only how many predictions are correct but also the balance between precision (the accuracy of the positive predictions) and recall (the ability to find all positive instances).
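
As an illustration of how these metrics behave in the multilabel setting, the sketch below computes micro-averaged precision, recall, and F1 with scikit-learn; the label arrays are invented for illustration, not this model's actual predictions.

```python
# Illustrative sketch only: y_true / y_pred are made-up label arrays,
# not this model's real predictions on any split.
import numpy as np
from sklearn.metrics import precision_score, recall_score, f1_score

# One row per movie, one column per genre (1 = genre applies).
# Columns: Action, Drama, Comedy, Animation, Crime
y_true = np.array([[1, 1, 0, 0, 0],
                   [0, 1, 0, 0, 1],
                   [0, 0, 1, 1, 0]])
y_pred = np.array([[1, 1, 0, 0, 0],
                   [0, 1, 0, 0, 0],
                   [0, 0, 1, 0, 0]])

# "micro" pools every (movie, genre) decision before averaging;
# "macro" would instead take the unweighted mean of per-genre scores.
p = precision_score(y_true, y_pred, average="micro", zero_division=0)
r = recall_score(y_true, y_pred, average="micro", zero_division=0)
f = f1_score(y_true, y_pred, average="micro", zero_division=0)
print(f"micro precision={p:.2f} recall={r:.2f} f1={f:.2f}")
# → micro precision=1.00 recall=0.67 f1=0.80
```

Here every predicted label is correct (precision 1.00) but two true labels are missed (recall 0.67), which is exactly the trade-off the reports in the Results section summarize per genre and per averaging scheme.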

### Results

#### Summary

Below are the classification reports for the train, validation, and test splits of the dataset.

**Classification Report for Train Split:**

```
              precision    recall  f1-score   support

      Action       1.00      1.00      1.00      1655
       Drama       1.00      1.00      1.00      4109
      Comedy       1.00      1.00      1.00      2094
   Animation       1.00      1.00      1.00       669
       Crime       1.00      1.00      1.00      1284

   micro avg       1.00      1.00      1.00      9811
   macro avg       1.00      1.00      1.00      9811
weighted avg       1.00      1.00      1.00      9811
 samples avg       1.00      1.00      1.00      9811
```

**Classification Report for Val Split:**

```
              precision    recall  f1-score   support

      Action       0.70      0.73      0.71       220
       Drama       0.77      0.84      0.80       507
      Comedy       0.69      0.54      0.61       260
   Animation       0.59      0.44      0.50        80
       Crime       0.72      0.66      0.69       165

   micro avg       0.73      0.71      0.72      1232
   macro avg       0.70      0.64      0.66      1232
weighted avg       0.72      0.71      0.71      1232
 samples avg       0.75      0.74      0.71      1232
```

**Classification Report for Test Split:**

```
              precision    recall  f1-score   support

      Action       0.62      0.66      0.64       191
       Drama       0.80      0.85      0.82       520
      Comedy       0.69      0.58      0.63       260
   Animation       0.60      0.49      0.54        78
       Crime       0.65      0.67      0.66       154

   micro avg       0.72      0.71      0.72      1203
   macro avg       0.67      0.65      0.66      1203
weighted avg       0.72      0.71      0.71      1203
 samples avg       0.75      0.75      0.72      1203
```

The model fits the training data almost perfectly, with precision, recall, and F1-scores of 1.00 across all genres, but performance drops noticeably on the validation and test splits (micro-averaged F1 of about 0.72). This gap points to overfitting and highlights room for the model to generalize better to unseen data.
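
One inexpensive lever for narrowing part of that gap, sketched below with synthetic numbers (not the model's real validation outputs), is to tune the sigmoid decision threshold per genre on the validation split rather than using a fixed 0.5 for every genre:

```python
# Hypothetical sketch: per-genre threshold tuning on held-out data.
# The labels and probabilities below are synthetic placeholders.
import numpy as np

def best_threshold(y_true, probs, grid=np.linspace(0.1, 0.9, 17)):
    """Return the threshold in `grid` that maximizes F1 for one genre."""
    best_t, best_f1 = 0.5, -1.0
    for t in grid:
        pred = (probs >= t).astype(int)
        tp = int(np.sum((pred == 1) & (y_true == 1)))
        fp = int(np.sum((pred == 1) & (y_true == 0)))
        fn = int(np.sum((pred == 0) & (y_true == 1)))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Synthetic validation labels and predicted probabilities for one genre.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
probs = np.array([0.8, 0.42, 0.47, 0.9, 0.22, 0.33, 0.6, 0.52])

t, f1 = best_threshold(y_true, probs)
print(f"best threshold={t:.2f}, F1={f1:.2f}")
# → best threshold=0.45, F1=0.89
```

In this toy example the default 0.5 threshold would miss a true positive at probability 0.47; lowering the threshold to 0.45 recovers it. Applied per genre, such tuned thresholds would replace the fixed 0.5 used in the getting-started snippet above.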

## Model Card Authors

Sina Namazi

## Model Card Contact

- **Github:** github.com/Sinanmz
- **Hugging Face:** huggingface.co/Sinanmz