---
license: apache-2.0
datasets:
- nyu-mll/glue
- stanfordnlp/sst2
base_model:
- google-bert/bert-base-uncased
tags:
- sentiment-analysis
- text-classification
- transformers
- pytorch
- bert
- sst2
- glue
pipeline_tag: text-classification
---

# BERT-base-uncased fine-tuned on SST-2 (GLUE)

This repository contains a `bert-base-uncased` model fine-tuned for **binary sentiment classification** on the [GLUE/SST-2](https://huggingface.co/datasets/glue/viewer/sst2) dataset.

## Model summary

- **Task**: sentiment analysis (binary classification)
- **Labels**: negative (`0`), positive (`1`)
- **Base model**: `bert-base-uncased`
- **Library**: Transformers (`Trainer` API)
- **Note**: in the training notebook, the model was fine-tuned on a small subset (640 train / 640 validation examples) for demonstration purposes. For production use, fine-tune on the full dataset and validate thoroughly.

## Intended uses

### ✅ Supported

- Quick demos of sentiment classification on English sentences
- Educational examples of fine-tuning with `Trainer`
- Baseline experiments on SST-2-like sentiment data

### ⚠️ Not recommended

- High-stakes or safety-critical decisions (medical, legal, hiring, etc.)
- Domains significantly different from SST-2 (e.g., clinical notes, financial news) without further fine-tuning
- Non-English text (the model and training data are English only)

## Limitations and biases

- **Dataset bias**: SST-2 reflects the sentiment distribution and language patterns of movie reviews; performance may degrade on other domains.
- **Small fine-tuning subset**: the model was fine-tuned on only 640 examples, so the reported results are not representative of the full SST-2 benchmark.
- **Short-text behavior**: very short, ambiguous, or sarcastic statements can be misclassified.
- **Offensive/toxic content**: the model may produce confident predictions on harmful text; it does not provide safety filtering.

## Training data

Fine-tuning used the **SST-2** configuration of the GLUE benchmark (Stanford Sentiment Treebank v2 as used in GLUE).

- **Dataset**: `glue`, config `sst2`
- **Text field**: `sentence`
- **Label field**: `label` (`0`/`1`)

In the provided Colab:

- `train`: selected `range(640)`
- `validation`: selected `range(640)`
- `test`: predictions generated on the unlabeled GLUE test split

## Training procedure

### Preprocessing

- Tokenizer: `AutoTokenizer.from_pretrained("bert-base-uncased")`
- Truncation enabled (`truncation=True`)
- Dynamic padding via `DataCollatorWithPadding`

### Hyperparameters (from Colab)

- `epochs`: 3
- `learning_rate`: 2e-5
- `batch_size`: 16 (per device)
- `weight_decay`: 0.01
- `evaluation`: each epoch
- `checkpointing`: each epoch
- `best model selection`: highest validation accuracy
- `logging`: disabled (`report_to="none"`)

## Results (validation)

- **Accuracy**: 0.8625
- **Loss**: 0.3392

> *(Optional: add confusion matrix, F1, etc. if available)*
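
## How to use

A minimal inference sketch using the `transformers` pipeline API. The repository id `your-username/bert-base-uncased-sst2` below is a placeholder; replace it with this model's actual id. Depending on whether `id2label` was set when the model was saved, the predicted label may appear as `LABEL_0`/`LABEL_1` (mapping to negative/positive) or as human-readable names.

```python
from transformers import pipeline

# Placeholder repo id -- replace with the actual id of this repository.
model_id = "your-username/bert-base-uncased-sst2"

classifier = pipeline("text-classification", model=model_id)

print(classifier("A touching and beautifully acted film."))
# e.g. [{'label': 'LABEL_1', 'score': 0.99}]  ->  LABEL_1 corresponds to positive (1)
```

If you prefer to work with the logits directly (for example to get both class probabilities), a sketch without the pipeline, assuming the same placeholder repo id:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

inputs = tokenizer(
    "A touching and beautifully acted film.",
    return_tensors="pt",
    truncation=True,
)
with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1)      # [p(negative), p(positive)]
pred = probs.argmax(dim=-1).item()         # 0 = negative, 1 = positive
```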
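
## Reproducing the fine-tuning (sketch)

The sketch below reconstructs the setup described in this card (640-example subsets, truncation, `DataCollatorWithPadding`, and the listed hyperparameters). It is not the original Colab: the output directory name and the use of the `evaluate` library for accuracy are assumptions, and the `eval_strategy` argument is called `evaluation_strategy` in older versions of `transformers`.

```python
import numpy as np
import evaluate
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

# GLUE / SST-2: the "sentence" column holds the text, "label" is 0 (negative) or 1 (positive).
raw = load_dataset("glue", "sst2")
train_ds = raw["train"].select(range(640))       # small demo subset, as in the Colab
eval_ds = raw["validation"].select(range(640))   # first 640 of the 872 validation rows

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True)

train_ds = train_ds.map(tokenize, batched=True)
eval_ds = eval_ds.map(tokenize, batched=True)

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)  # dynamic padding per batch

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=preds, references=labels)

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Hyperparameters listed in the card; "bert-sst2-demo" is a placeholder output directory.
args = TrainingArguments(
    output_dir="bert-sst2-demo",
    num_train_epochs=3,
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    weight_decay=0.01,
    eval_strategy="epoch",          # evaluate each epoch
    save_strategy="epoch",          # checkpoint each epoch
    load_best_model_at_end=True,
    metric_for_best_model="accuracy",
    report_to="none",               # logging disabled
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

trainer.train()
print(trainer.evaluate())

# The GLUE test split is unlabeled (labels are -1); predictions can still be generated:
# test_logits = trainer.predict(raw["test"].map(tokenize, batched=True)).predictions
```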