---
library_name: transformers
license: mit
language:
- en
metrics:
- f1
- precision
- recall
- accuracy
---

# Model Card for Movie_Genre_Classifier

This fine-tuned BERT model is a multilabel, multiclass classifier that predicts a movie's genre(s) from its plot summary. It was trained to assign each movie one or more of five genres: Drama, Action, Comedy, Animation, and Crime. The model leverages the BERT architecture to capture the nuances of movie summaries and can return multiple genres per movie.


## Model Details

### Model Description


- **Developed by:** Sinanmz
- **Model type:** Multiclass Multilabel Classifier
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** google-bert/bert-base-uncased

### Model Sources


- **Repository:** https://github.com/Sinanmz/MIR

## Uses


This BERT-based multilabel multiclass classifier is designed to predict the genre(s) of a movie based on its summary. It can be utilized in various applications, including but not limited to:

- **Content Recommendation Systems:** Enhancing the accuracy of movie recommendation engines by predicting genres from summaries, allowing for better personalization.
- **Movie Cataloging:** Assisting in the organization and tagging of movies in large databases or streaming platforms.
- **Search Optimization:** Improving search results by classifying movies into multiple genres, thereby providing more relevant hits for user queries.
- **Content Filtering:** Helping users find movies that match their preferences by identifying and categorizing movies into multiple genres.

**Foreseeable Users:**

- **Streaming Services:** To enhance content recommendation algorithms and search functionalities.
- **Movie Database Administrators:** To automate the process of tagging and organizing movies.
- **Developers:** Building applications that require genre classification from textual summaries.

**Affected Parties:**

- **Viewers/Consumers:** Benefiting from improved content recommendations and search results.
- **Content Creators:** Gaining better visibility through accurate classification and tagging of their work.
- **Platform Operators:** Improving user engagement and satisfaction with more personalized and accurate content delivery.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained('Sinanmz/Movie_Genre_Classifier')
model = AutoModelForSequenceClassification.from_pretrained('Sinanmz/Movie_Genre_Classifier')

# Example movie summary (summary of Dune: Part Two)
movie_summary = """Paul Atreides unites with Chani and the Fremen while on a warpath of 
revenge against the conspirators who destroyed his family. Facing a choice between the 
love of his life and the fate of the known universe, he endeavors to prevent a terrible 
future only he can foresee."""

# Tokenize the input
inputs = tokenizer(movie_summary, return_tensors="pt", truncation=True, padding=True)

# Get model predictions (inference mode: no gradients needed)
model.eval()
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits

# Convert logits to probabilities
probs = torch.sigmoid(logits)

# Print the predicted genres
genre_labels = ["Action", "Drama", "Comedy", "Animation", "Crime"]
predicted_genres = [genre_labels[i] for i in range(len(genre_labels)) if probs[0][i] >= 0.5]

print(f"Predicted genres: {predicted_genres}")

# Output:
# Predicted genres: ['Action', 'Drama']
```
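The 0.5 cutoff above is a common default, but in multilabel classification per-class thresholds tuned on the validation split often improve F1, since rarer genres such as Animation may need a lower cutoff. A minimal sketch (the threshold values are illustrative, not taken from this model):

```python
def decode_genres(probs, labels, thresholds):
    """Map a vector of sigmoid probabilities to genre labels,
    applying a separate decision threshold per class."""
    return [label for p, label, t in zip(probs, labels, thresholds) if p >= t]

genre_labels = ["Action", "Drama", "Comedy", "Animation", "Crime"]
# Hypothetical per-class thresholds; real values would be tuned on the val split.
thresholds = [0.5, 0.5, 0.4, 0.35, 0.45]

# Example probability vector, e.g. torch.sigmoid(logits)[0].tolist()
probs = [0.81, 0.67, 0.20, 0.05, 0.30]
print(decode_genres(probs, genre_labels, thresholds))  # ['Action', 'Drama']
```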




## Evaluation

### Metrics

The model is evaluated with precision, recall, and F1-score. These metrics were chosen because, in a multilabel setting, it matters not only how many predictions are correct but also the balance between precision (the accuracy of the positive predictions) and recall (the ability to find all positive instances).
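For intuition, micro-averaging pools true positives, false positives, and false negatives across all classes before computing a single F1, while macro-averaging computes F1 per class and takes the unweighted mean, so macro F1 is pulled down by weaker minority classes. A small illustration on toy counts (not this model's actual outputs):

```python
def f1(tp, fp, fn):
    """F1-score from true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy per-class counts: (true positives, false positives, false negatives)
per_class = {"Action": (8, 2, 2), "Drama": (15, 5, 3), "Comedy": (4, 4, 6)}

# Macro: unweighted mean of per-class F1 scores
macro_f1 = sum(f1(*c) for c in per_class.values()) / len(per_class)
# Micro: a single F1 computed from the pooled counts
micro_f1 = f1(*(sum(c[i] for c in per_class.values()) for i in range(3)))

print(f"macro F1: {macro_f1:.3f}, micro F1: {micro_f1:.3f}")
```

The same effect shows up in the reports below, where the macro averages sit below the micro averages.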

### Results

#### Summary

Below are the classification reports for the train, validation, and test splits of the dataset.

**Classification Report for Train Split:**
```
              precision    recall  f1-score   support

     Action       1.00      1.00      1.00      1655
      Drama       1.00      1.00      1.00      4109
     Comedy       1.00      1.00      1.00      2094
  Animation       1.00      1.00      1.00       669
      Crime       1.00      1.00      1.00      1284

  micro avg       1.00      1.00      1.00      9811
  macro avg       1.00      1.00      1.00      9811
weighted avg      1.00      1.00      1.00      9811
samples avg       1.00      1.00      1.00      9811
```

**Classification Report for Val Split:**
```
              precision    recall  f1-score   support

     Action       0.70      0.73      0.71       220
      Drama       0.77      0.84      0.80       507
     Comedy       0.69      0.54      0.61       260
  Animation       0.59      0.44      0.50        80
      Crime       0.72      0.66      0.69       165

  micro avg       0.73      0.71      0.72      1232
  macro avg       0.70      0.64      0.66      1232
weighted avg      0.72      0.71      0.71      1232
samples avg       0.75      0.74      0.71      1232
```

**Classification Report for Test Split:**
```
              precision    recall  f1-score   support

     Action       0.62      0.66      0.64       191
      Drama       0.80      0.85      0.82       520
     Comedy       0.69      0.58      0.63       260
  Animation       0.60      0.49      0.54        78
      Crime       0.65      0.67      0.66       154

  micro avg       0.72      0.71      0.72      1203
  macro avg       0.67      0.65      0.66      1203
weighted avg      0.72      0.71      0.71      1203
samples avg       0.75      0.75      0.72      1203
```

The model fits the training data perfectly (1.00 precision, recall, and F1 across all genres), which suggests overfitting: performance drops noticeably on the validation and test splits (micro-average F1 of about 0.72 on both), so there is room to improve the model's generalization to unseen data.

## Model Card Authors 

Sina Namazi

## Model Card Contact

- **Github:** github.com/Sinanmz
- **Hugging Face:** huggingface.co/Sinanmz