---
language:
- en
license: apache-2.0
tags:
- text-classification
- sentiment-analysis
- distilbert
- transformers
pipeline_tag: text-classification
library_name: transformers
datasets:
- Amazon_Unlocked_Mobile
base_model: distilbert-base-uncased
metrics:
- accuracy
- f1
- recall
- precision
widget:
- text: "Great handset! Works flawlessly."
- text: "Terrible product, waste of money."
---

# DistilBERT for Binary Sentiment Classification

A lightweight sentiment classifier fine-tuned from `distilbert-base-uncased` to predict binary sentiment (negative vs. positive) for short English product reviews. Trained on a filtered subset of the Amazon Unlocked Mobile dataset.

## Model Details
- Base model: `distilbert-base-uncased`
- Task: binary sentiment classification
- Labels: `0 -> negative` (rating 1), `1 -> positive` (rating 5)
- Max input length: 128 tokens
- Tokenizer: `AutoTokenizer` for the same checkpoint
- Mixed precision: fp16 (when CUDA is available)

## Intended Use and Limitations
- Use for short English product-review style texts.
- Binary only (negative/positive). Not suited for nuanced or multi-class sentiment.
- Not for safety-critical decisions or content moderation on its own.

## Dataset and Preprocessing
- Source: Amazon Unlocked Mobile (`Amazon_Unlocked_Mobile.csv`)
- Filtering: keep rows where `Rating ∈ {1, 5}`; drop unrelated columns
- Tokenization: padding to max length, truncation at 128 tokens
- Split: train/test with `test_size = 0.3`, `seed = 100`
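
The filtering and split steps above can be sketched as follows (a minimal illustration with a tiny in-memory stand-in for the CSV; the `Reviews` column name is an assumption not stated in this card):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Tiny stand-in for Amazon_Unlocked_Mobile.csv; "Reviews" is a hypothetical column name
df = pd.DataFrame({
    "Reviews": ["Great handset!", "Broke in a week", "It is okay", "Love it", "Waste of money"],
    "Rating":  [5, 1, 3, 5, 1],
})

# Keep only 1- and 5-star reviews, then map ratings to binary labels
df = df[df["Rating"].isin([1, 5])].copy()
df["label"] = (df["Rating"] == 5).astype(int)  # 0 -> negative, 1 -> positive

# 70/30 split with a fixed seed, matching test_size = 0.3, seed = 100
train_df, test_df = train_test_split(df, test_size=0.3, random_state=100)
```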

## Training Configuration
- Optimizer and schedule: handled by `transformers.Trainer`
- Learning rate: `2e-5`
- Batch size: `48` (train/eval per device)
- Epochs: `2`
- Weight decay: `0.01`
- Save/eval strategy: `epoch`
- Push to Hub: enabled
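
The hyperparameters above map onto `transformers.TrainingArguments` roughly as follows (a sketch, not the exact run configuration; `output_dir` is a placeholder, and older Transformers releases spell the evaluation-strategy argument `evaluation_strategy`):

```python
from transformers import TrainingArguments

# Sketch of the configuration listed above; output_dir is a hypothetical name
training_args = TrainingArguments(
    output_dir="sentiment_classification_from_distillbert",
    learning_rate=2e-5,
    per_device_train_batch_size=48,
    per_device_eval_batch_size=48,
    num_train_epochs=2,
    weight_decay=0.01,
    eval_strategy="epoch",   # "evaluation_strategy" on older Transformers versions
    save_strategy="epoch",
    fp16=True,               # only when CUDA is available
    push_to_hub=True,
)
```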

## Evaluation
Computed with `accuracy` and `f1` on the held-out test split. See the repository "Files and versions" and "Training metrics" tabs for run artifacts and exact scores.
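
A metrics function of the usual `Trainer` `compute_metrics` shape could look like this (a sketch using scikit-learn; the exact function used for this run is not shown in the card):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    # eval_pred is a (logits, labels) pair as passed by Trainer
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds),
    }

# Small smoke check with fake logits
print(compute_metrics((np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]),
                       np.array([1, 0, 0]))))
```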

## How to Use

Python (Transformers pipeline):
```python
from transformers import pipeline

clf = pipeline(
    "text-classification",
    model="Floressek/sentiment_classification_from_distillbert",
    top_k=None,  # return scores for all labels; omit to get only the top label
)

print(clf("Great handset!"))
print(clf("Shame. I wish I hadn't bought it."))
```