heican/sentiment-bert-base

Text Classification

heican committed 25 days ago

Commit 9c81d7d · verified · 1 parent: cc42736

Create README.md

Files changed (1)

README.md +40 -0

	@@ -0,0 +1,40 @@

+---
+base_model: google-bert/bert-base-uncased
+datasets:
+- stanfordnlp/sentiment140
+---
+# sentiment-bert-base
+Fine-tuned BERT-base for binary sentiment classification on the Sentiment140 dataset (1.6M tweets).
+## Base model
+[google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased) — the original BERT-base-uncased from Devlin et al. (2019), 110M parameters.
+## Training
+- Dataset: Sentiment140 (1.6M tweets, 80/20 split, seed 42)
+- Hyperparameters: learning rate 2e-5, batch size 16, 3 epochs
+- Hardware: NVIDIA A10G, AWS SageMaker (g5.2xlarge)
+- Training time: 7.3 hours
+- Trainer: Hugging Face Transformers `Trainer` API with `load_best_model_at_end=True`
+## Test set performance
+| Metric | Value |
+|---|---|
+| Accuracy | 87.46% |
+| Precision | 0.880 |
+| Recall | 0.869 |
+| F1 | 0.874 |
+## Intended use
+Demonstration model for academic purposes.
+## Limitations
+- English only, binary sentiment, 2009-era Twitter language.
+- Sentiment140 labels were generated automatically from emoticons (distant supervision), which introduces systematic label noise.
+- Does not handle sarcasm reliably (the dataset does not label sarcasm separately).
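
The figures in the README can be cross-checked with a little arithmetic. The sketch below is not part of the commit. It recomputes F1 from the reported precision and recall, and estimates the number of optimizer steps implied by the training settings. The helper names are hypothetical, and the step count rests on assumptions the README does not state: that the 80% partition is the training set, that the batch size of 16 is the effective batch size on a single device, and that no gradient accumulation was used.

```python
# Sanity checks on the figures reported in the README.
# All numeric inputs are copied from the README; helper names are hypothetical.


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def optimizer_steps(n_examples: int, train_fraction: float,
                    batch_size: int, epochs: int) -> int:
    """Total optimizer steps.

    Assumes a single device and no gradient accumulation, which the
    README does not confirm.
    """
    n_train = round(n_examples * train_fraction)
    steps_per_epoch = -(-n_train // batch_size)  # ceiling division
    return steps_per_epoch * epochs


# Precision and recall from the "Test set performance" table.
reported_f1 = f1_score(0.880, 0.869)

# 1.6M tweets, 80/20 split, batch size 16, 3 epochs.
total_steps = optimizer_steps(1_600_000, 0.8, 16, 3)

# 7.3 hours of training, converted to seconds.
steps_per_second = total_steps / (7.3 * 3600)

print(f"F1 from precision/recall: {reported_f1:.3f}")
print(f"Total optimizer steps: {total_steps}")
print(f"Average steps per second: {steps_per_second:.1f}")
```

Under these assumptions the recomputed F1 rounds to 0.874, which agrees with the table, and the run comes to 240,000 optimizer steps, or roughly 9 steps per second over the reported 7.3 hours.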