---
license: apache-2.0
datasets:
- nyu-mll/glue
- stanfordnlp/sst2
base_model:
- google-bert/bert-base-uncased
tags:
- sentiment-analysis
- text-classification
- transformers
- pytorch
- bert
- sst2
- glue
pipeline_tag: text-classification
---
# BERT-base-uncased fine-tuned on SST-2 (GLUE)

This repository contains a `bert-base-uncased` model fine-tuned for **binary sentiment classification** on the [GLUE/SST-2](https://huggingface.co/datasets/glue/viewer/sst2) dataset.

## Model summary

- **Task**: sentiment analysis (binary classification)  
- **Labels**: negative (`0`), positive (`1`)  
- **Base model**: `bert-base-uncased`  
- **Library**: Transformers (`Trainer` API)  
- **Note**: In the training notebook, the model was fine-tuned on a small subset (640 train / 640 validation) for demonstration purposes. For production use, fine-tune on the full dataset and validate thoroughly.
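A minimal inference sketch is shown below. The repository id `your-username/bert-base-uncased-sst2` is a placeholder, not this repo's actual id; replace it before running. The label mapping follows the SST-2 convention stated above (`0` = negative, `1` = positive).

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# SST-2 label convention used by this model card.
ID2LABEL = {0: "negative", 1: "positive"}

def logits_to_label(logits: torch.Tensor) -> str:
    """Map a (batch, 2) logits tensor to the label of its first row."""
    return ID2LABEL[int(logits.argmax(dim=-1)[0].item())]

def classify(text: str, repo_id: str) -> str:
    """Download the fine-tuned checkpoint and classify a single sentence."""
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForSequenceClassification.from_pretrained(repo_id)
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    model.eval()
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits_to_label(logits)

# Example (requires network access; replace the placeholder repo id):
# classify("A thoughtful, well-acted film.", "your-username/bert-base-uncased-sst2")
```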

## Intended uses

### ✅ Supported
- Quick demos of sentiment classification on English sentences  
- Educational examples of fine-tuning with `Trainer`  
- Baseline experiments on SST-2-like sentiment data  

### ⚠️ Not recommended
- High-stakes or safety-critical decisions (medical, legal, hiring, etc.)  
- Domains significantly different from SST-2 (e.g., clinical notes, finance news) without further fine-tuning  
- Non-English text (model and data are English-focused)  

## Limitations and biases

- **Dataset bias**: SST-2 reflects movie review sentiment distribution and language patterns; performance may degrade on other domains.  
- **Small fine-tuning subset**: this checkpoint was fine-tuned on only 640 examples, so results are not representative of the full SST-2 benchmark.  
- **Short-text behavior**: very short/ambiguous or sarcastic statements can be misclassified.  
- **Offensive/toxic content**: the model may output confident predictions on harmful text; it does not provide safety filtering.  

## Training data

Fine-tuning used the **SST-2** configuration of the GLUE benchmark (Stanford Sentiment Treebank v2, as distributed in GLUE).

- **Dataset**: `glue`, config `sst2`  
- **Text field**: `sentence`  
- **Label field**: `label` (`0`/`1`)  

In the provided Colab:
- `train`: selected `range(640)`
- `validation`: selected `range(640)`
- `test`: predictions generated without labels (GLUE test split)

## Training procedure

### Preprocessing
- Tokenizer: `AutoTokenizer.from_pretrained("bert-base-uncased")`
- Truncation enabled (`truncation=True`)
- Dynamic padding via `DataCollatorWithPadding`
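The preprocessing steps above can be sketched as follows. Truncation happens at tokenization time, while padding is deferred to `DataCollatorWithPadding` so each batch is padded only to its own longest sequence. The factory function `make_preprocessor` is an illustrative helper, not the Colab's exact code.

```python
from transformers import AutoTokenizer, DataCollatorWithPadding

def make_preprocessor(tokenizer):
    """Return a batched map function that tokenizes the `sentence` field."""
    def preprocess(batch):
        # Truncation only; padding is left to the collator (dynamic padding).
        return tokenizer(batch["sentence"], truncation=True)
    return preprocess

# Usage (downloads the tokenizer):
# tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# collator = DataCollatorWithPadding(tokenizer=tokenizer)
# tokenized = dataset.map(make_preprocessor(tokenizer), batched=True)
```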

### Hyperparameters (from Colab)
- `epochs`: 3  
- `learning_rate`: 2e-5  
- `batch_size`: 16 (per device)  
- `weight_decay`: 0.01  
- `evaluation`: each epoch  
- `checkpointing`: each epoch  
- `best model selection`: accuracy on validation  
- `logging`: disabled (`report_to="none"`)
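Since validation accuracy drove best-model selection, the metric function passed to `Trainer` would look roughly like the sketch below (consistent with the setup above, but not the Colab's exact code):

```python
import numpy as np

def compute_metrics(eval_pred):
    """Accuracy over the validation split, used for best-model selection."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float((predictions == labels).mean())}
```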

## Results (validation)

- **Accuracy**: 0.8625  
- **Loss**: 0.3392  

> *(Optional: add confusion matrix, F1, etc. if available)*