ashaduzzaman committed
Commit d16e73b · verified · 1 Parent(s): 24b57f3

Update README.md

Files changed (1): README.md (+71 −23)
README.md CHANGED
@@ -8,42 +8,75 @@ metrics:
  model-index:
  - name: imdb-distilbert-funetuned
    results: []
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

- # imdb-distilbert-funetuned
-
- This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on an unknown dataset.
- It achieves the following results on the evaluation set:
- - Loss: 0.2319
- - Accuracy: 0.9320
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 2e-05
- - train_batch_size: 16
- - eval_batch_size: 16
- - seed: 42
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - num_epochs: 2

  ### Training results
@@ -52,10 +85,25 @@ The following hyperparameters were used during training:
  | 0.2239 | 1.0 | 1563 | 0.2026 | 0.9227 |
  | 0.1468 | 2.0 | 3126 | 0.2319 | 0.9320 |

- ### Framework versions

  - Transformers 4.42.4
  - Pytorch 2.3.1+cu121
  - Datasets 2.21.0
- - Tokenizers 0.19.1
 
model-index:
- name: imdb-distilbert-funetuned
  results: []
datasets:
- ajaykarthick/imdb-movie-reviews
language:
- en
library_name: transformers
pipeline_tag: text-classification
---
26
+ # DistilBERT IMDb Sentiment Classifier
27
+
28
+ ## Model Description
29
+ This is a fine-tuned version of [DistilBERT](https://huggingface.co/distilbert-base-uncased) for sentiment analysis on the IMDb movie review dataset. DistilBERT is a smaller, faster, and lighter variant of BERT, designed to perform efficiently while retaining the core strengths of BERT in natural language understanding.
30
+
31
+ The model is trained to classify movie reviews as either **positive** or **negative** sentiments, making it ideal for applications where sentiment analysis is needed, such as analyzing customer feedback, social media posts, or reviews.
32
 
33
+ ## Intended Use
34
+ This model is intended for text classification tasks, specifically sentiment analysis. It can be used to automatically label a piece of text as either having a positive or negative sentiment.
35
 
36
+ ### Use Cases
37
+ - **Movie review sentiment analysis**
38
+ - **Customer feedback analysis**
39
+ - **Social media sentiment monitoring**
40
+ - **Product review classification**
41
 
## How to Use

Here is how you can use this model with the Hugging Face `transformers` library:

```python
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
import torch

# Load the model and tokenizer
model_name = "Ashaduzzaman/imdb-distilbert-funetuned"
tokenizer = DistilBertTokenizer.from_pretrained(model_name)
model = DistilBertForSequenceClassification.from_pretrained(model_name)

# Example text
text = "The movie was absolutely fantastic! The acting was superb and the story was gripping."

# Tokenize and predict (no gradients needed for inference)
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.softmax(outputs.logits, dim=1)

# Get the predicted label
predicted_label = torch.argmax(predictions).item()
labels = ["Negative", "Positive"]
print(f"Predicted sentiment: {labels[predicted_label]}")
```
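For intuition about the post-processing step above (softmax over the two logits, then argmax), here is a self-contained sketch using hypothetical logit values, with no model download required. The label order (index 0 = Negative, 1 = Positive) is the same assumption the snippet above makes:

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a clearly positive review (illustrative values only)
logits = [-2.1, 3.4]  # order assumed: [Negative, Positive]
probs = softmax(logits)

labels = ["Negative", "Positive"]
predicted = labels[max(range(len(probs)), key=probs.__getitem__)]
print(predicted)  # Positive
```

The probabilities always sum to 1, so the argmax simply picks whichever class the model scored higher.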
## Training Data

This model was trained on the IMDb movie review dataset, a large benchmark for binary sentiment classification. It contains 50,000 highly polarized movie reviews, balanced between 25,000 positive and 25,000 negative examples.

## Training Procedure

The model was fine-tuned on the IMDb dataset with the following configuration:

- **Optimizer**: AdamW (betas=(0.9, 0.999), epsilon=1e-08)
- **Learning Rate**: 2e-5
- **LR Scheduler**: linear
- **Seed**: 42
- **Batch Size**: 16
- **Epochs**: 2
- **Max Sequence Length**: 512 tokens
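The step counts in the training results are consistent with this configuration: 25,000 training reviews at batch size 16 give 1,563 optimizer steps per epoch (a quick arithmetic check, assuming the full IMDb training split is used with no gradient accumulation):

```python
import math

train_examples = 25_000  # IMDb training split size
batch_size = 16          # per the hyperparameters above
epochs = 2

steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch)  # 1563, the step count logged at epoch 1.0
print(total_steps)      # 3126, the step count logged at epoch 2.0
```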
 
### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 0.2239        | 1.0   | 1563 | 0.2026          | 0.9227   |
| 0.1468        | 2.0   | 3126 | 0.2319          | 0.9320   |

Final results on the evaluation set:

- **Loss:** 0.2319
- **Accuracy:** 0.9320
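To put the final accuracy in absolute terms (assuming evaluation on the standard 25,000-review IMDb test split):

```python
test_examples = 25_000  # standard IMDb test split size (assumption)
accuracy = 0.9320       # final evaluation accuracy reported above

misclassified = round((1 - accuracy) * test_examples)
print(f"~{misclassified} of {test_examples} test reviews misclassified")  # ~1700
```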
## Limitations

- The model is trained specifically on the IMDb dataset, so its effectiveness may be reduced on other domains or types of text.
- Sentiment detection is binary (positive or negative); neutral or more nuanced emotions are not captured.
- The model may not perform well on text that is highly sarcastic, heavy with slang, or very short (e.g., one-word reviews).

## Ethical Considerations

- **Bias**: The model may reflect biases present in the IMDb dataset. Users should be cautious about applying it in sensitive settings.
- **Content**: Since the IMDb dataset consists of movie reviews, the model may not generalize well to text outside that context.

## Acknowledgments

- The original [DistilBERT](https://huggingface.co/distilbert-base-uncased) model was developed by Hugging Face.
- The IMDb dataset is provided by Stanford and can be found [here](https://ai.stanford.edu/~amaas/data/sentiment/).
## Framework versions

- Transformers 4.42.4
- Pytorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1