Floressek committed (verified)
Commit 3fbb5e3 · Parent: 4c819e2
Update README.md
Files changed (1): README.md (+66 −3)
1
  ---
 
2
  language:
3
  - en
4
+ license: apache-2.0
5
+ tags:
6
+ - text-classification
7
+ - sentiment-analysis
8
+ - distilbert
9
+ - transformers
10
+ pipeline_tag: text-classification
11
+ library_name: transformers
12
+ datasets:
13
+ - Amazon_Unlocked_Mobile
14
+ base_model: distilbert-base-uncased
15
  metrics:
16
  - accuracy
17
  - f1
18
  - recall
19
  - precision
20
+ widget:
21
+ - text: "Great handset! Works flawlessly."
22
+ - text: "Terrible product, waste of money."
23
+ ---

# DistilBERT for Binary Sentiment Classification

A lightweight sentiment classifier fine-tuned from `distilbert-base-uncased` to predict binary sentiment (negative vs. positive) for short English product reviews, trained on a filtered subset of the Amazon Unlocked Mobile dataset.

## Model Details
- Base model: `distilbert-base-uncased`
- Task: binary sentiment classification
- Labels: `0 -> negative` (rating 1), `1 -> positive` (rating 5)
- Max input length: 128 tokens
- Tokenizer: `AutoTokenizer` for the same checkpoint
- Mixed precision: fp16 (when CUDA is available)

## Intended Use and Limitations
- Use for short, English, product-review-style texts.
- Binary only (negative/positive); not suited for nuanced or multi-class sentiment.
- Not for safety-critical decisions or content moderation on its own.

## Dataset and Preprocessing
- Source: Amazon Unlocked Mobile (`Amazon_Unlocked_Mobile.csv`)
- Filtering: keep only rows where `Rating ∈ {1, 5}`; drop unrelated columns
- Tokenization: padding to max length, truncation at 128 tokens
- Split: train/test with `test_size = 0.3`, `seed = 100`

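The filtering and label-mapping step above can be sketched in plain Python. The column names (`Rating`, `Reviews`) are assumptions based on this card's description of the CSV; the original notebook may use different names.

```python
# Sketch of the preprocessing described above: keep only 1- and 5-star
# reviews and map them to binary labels. Column names are assumptions.

def filter_and_label(rows):
    examples = []
    for row in rows:
        rating = int(row["Rating"])
        if rating not in (1, 5):  # drop 2-4 star (ambiguous) reviews
            continue
        examples.append({
            "text": row["Reviews"],
            "label": 0 if rating == 1 else 1,  # 0 -> negative, 1 -> positive
        })
    return examples

rows = [
    {"Rating": 5, "Reviews": "Great handset! Works flawlessly."},
    {"Rating": 3, "Reviews": "It is okay, I guess."},
    {"Rating": 1, "Reviews": "Terrible product, waste of money."},
]
print(filter_and_label(rows))  # the 3-star row is dropped
```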
## Training Configuration
- Optimizer and schedule: handled by `transformers.Trainer`
- Learning rate: `2e-5`
- Batch size: `48` (train/eval, per device)
- Epochs: `2`
- Weight decay: `0.01`
- Save/eval strategy: `epoch`
- Push to Hub: enabled

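The hyperparameters above translate into roughly the following `Trainer` setup. This is a sketch, not the exact training script: `output_dir` is an arbitrary name, `train_ds`/`test_ds`/`compute_metrics` are placeholders for the tokenized splits and metric callback, and the argument was named `evaluation_strategy` in older `transformers` releases.

```python
# Sketch of the Trainer configuration implied by the card.
from transformers import (AutoModelForSequenceClassification, Trainer,
                          TrainingArguments)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

args = TrainingArguments(
    output_dir="sentiment-distilbert",  # placeholder name
    learning_rate=2e-5,
    per_device_train_batch_size=48,
    per_device_eval_batch_size=48,
    num_train_epochs=2,
    weight_decay=0.01,
    eval_strategy="epoch",   # `evaluation_strategy` in older releases
    save_strategy="epoch",
    fp16=True,               # set only when CUDA is available
    push_to_hub=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,   # placeholder: tokenized train split
    eval_dataset=test_ds,     # placeholder: tokenized test split
    compute_metrics=compute_metrics,
)
trainer.train()
```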
## Evaluation
Computed with `accuracy` and `f1` on the held-out test split. See the repository's "Files and versions" and "Training metrics" tabs for run artifacts and exact scores.

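For reference, the reported metrics can be computed from predictions with a plain-Python helper like the one below (a sketch of the logic only; the actual run likely used `evaluate` or `scikit-learn`):

```python
# Binary classification metrics, computed by hand to make the
# definitions explicit (positive class = label 1).

def binary_metrics(preds, labels):
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "f1": f1,
            "precision": precision, "recall": recall}

print(binary_metrics([1, 0, 1, 1], [1, 0, 0, 1]))
```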
## How to Use

Python (Transformers pipeline):

```python
from transformers import pipeline

clf = pipeline(
    "text-classification",
    model="Floressek/sentiment_classification_from_distillbert",
    top_k=None,  # return scores for all labels, not just the top one
)

print(clf("Great handset!"))
print(clf("Shame. I wish I hadn't bought it."))
```

---
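If you run the raw model (`AutoModelForSequenceClassification`) instead of the pipeline, its logits still need a softmax and argmax to become a pipeline-style result. A minimal pure-Python sketch of that post-processing (the logit values are invented for illustration):

```python
import math

ID2LABEL = {0: "negative", 1: "positive"}  # label mapping from this card

def postprocess(logits):
    """Convert the model's two class logits into a {label, score} result."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # numerically stable softmax
    probs = [e / sum(exps) for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[best], "score": probs[best]}

print(postprocess([-2.1, 3.4]))  # made-up logits
```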