# DistilBERT Quantized Model for IMDB Sentiment Analysis

This repository contains a quantized DistilBERT model fine-tuned for binary sentiment classification on IMDB movie reviews. It is intended for efficient CPU deployment: dynamic INT8 quantization reduces model size and inference latency while preserving accuracy (89.1% on the IMDB test set).

## Model Details

- **Model Architecture:** DistilBERT Base Uncased  
- **Task:** Binary Sentiment Analysis (Positive/Negative)  
- **Dataset:** IMDB Movie Reviews (50K samples)  
- **Quantization:** Dynamic Quantization (INT8)  
- **Framework:** Hugging Face Transformers + PyTorch  

## Usage

### Installation

```sh
pip install transformers torch scikit-learn pandas
```

### Loading the Model

```python
from transformers import DistilBertForSequenceClassification, DistilBertTokenizer
import torch

# Rebuild the architecture and apply the same dynamic quantization
# BEFORE loading the saved weights: the quantized state dict's keys
# only match a quantized model, not the full-precision one
model = DistilBertForSequenceClassification.from_pretrained("./sentiment_model")
model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
model.load_state_dict(torch.load("./quantized_sentiment_model.pth"))
model.eval()

# Load tokenizer
tokenizer = DistilBertTokenizer.from_pretrained("./sentiment_model")

def predict_sentiment(text):
    inputs = tokenizer(text, return_tensors="pt",
                       padding=True, truncation=True,
                       max_length=128)

    with torch.no_grad():
        outputs = model(**inputs)

    prediction = torch.argmax(outputs.logits, dim=-1).item()
    return "Positive" if prediction == 1 else "Negative"

# Example usage
review = "This movie blew me away with its stunning visuals and gripping storyline."
print(predict_sentiment(review))  # Output: Positive
```

## πŸ“Š Performance Metrics

| Metric                  | Value |
|-------------------------|-------|
| Accuracy                | 89.1% |
| F1 Score                | 89.0% |
| Inference Latency (CPU) | 12 ms |
| Model Size              | 67 MB |
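Accuracy and F1 can be reproduced with scikit-learn (already in the install command above). A minimal sketch with made-up labels, assuming the positive class is encoded as `1` as in `predict_sentiment`:

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical gold labels and model predictions (illustrative only)
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]

print(accuracy_score(y_true, y_pred))  # 0.8
print(f1_score(y_true, y_pred))        # 0.8
```

In practice `y_pred` would come from running `predict_sentiment` over the held-out test split.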

## πŸ‹οΈ Training Details

### Dataset

- 50,000 IMDB movie reviews  
- Balanced binary classes (50% positive, 50% negative)  

### Hyperparameters

- Epochs: 5  
- Batch Size: 24 (effective 48 with gradient accumulation)  
- Learning Rate: 8e-6  
- Warmup Ratio: 10%  
- Weight Decay: 0.005  
- Optimizer: AdamW with Cosine LR Schedule  

### Quantization

Applied dynamic post-training quantization:

```python
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```
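The same call works on any module containing `torch.nn.Linear` layers. A self-contained sketch using a toy module in place of DistilBERT (filenames here are illustrative):

```python
import torch

# Toy stand-in for the fine-tuned model: quantize_dynamic replaces each
# nn.Linear with a dynamically quantized INT8 equivalent at load time
toy = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU())
q_toy = torch.quantization.quantize_dynamic(
    toy, {torch.nn.Linear}, dtype=torch.qint8
)

# Quantized weights can then be saved for later loading
torch.save(q_toy.state_dict(), "quantized_toy.pth")

# Inference works exactly as before quantization
x = torch.randn(2, 128)
print(q_toy(x).shape)  # torch.Size([2, 64])
```

Because quantization is dynamic (weights INT8, activations quantized on the fly), no calibration data is needed.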

## πŸ“ Repository Structure

```
.
β”œβ”€β”€ sentiment_model/                # Full-precision model files
β”‚   β”œβ”€β”€ config.json
β”‚   β”œβ”€β”€ pytorch_model.bin
β”‚   └── tokenizer files...
β”œβ”€β”€ quantized_sentiment_model.pth  # Quantized weights
β”œβ”€β”€ imdb_train.csv                 # Sample training data
β”œβ”€β”€ train.py                       # Training script
└── inference.py                   # Usage examples
```

## ⚠️ Limitations

- Accuracy may drop on reviews with:
  - Sarcasm or nuanced language
  - Domain-specific terminology (non-movie content)
- Maximum sequence length: 128 tokens
- English language only