---
language: en
license: mit
tags:
- sentiment-analysis
- customer-reviews
- transformers
- distilbert
- text-classification
datasets:
- IberaSoft/ecommerce-reviews-sentiment
metrics:
- accuracy
- f1
model-index:
- name: customer-sentiment-analyzer
  results:
  - task:
      type: text-classification
      name: Sentiment Analysis
    dataset:
      name: E-commerce Reviews
      type: IberaSoft/ecommerce-reviews-sentiment
    metrics:
    - type: accuracy
      value: 90.2
      name: Accuracy
    - type: f1
      value: 0.89
      name: F1 Score
widget:
- text: "This product exceeded my expectations! Fast shipping and great quality."
  example_title: "Positive Review"
- text: "Terrible experience. Product broke after one week and customer service was unhelpful."
  example_title: "Negative Review"
- text: "It's okay, nothing special. Does what it's supposed to do."
  example_title: "Neutral Review"
---

# 🎯 Customer Sentiment Analyzer

> Fine-tuned DistilBERT model for analyzing customer review sentiment in e-commerce and SaaS domains.

[![Model on HF](https://huggingface.co/datasets/huggingface/badges/resolve/main/model-on-hf-sm.svg)](https://huggingface.co/IberaSoft/customer-sentiment-analyzer)
[![Dataset](https://img.shields.io/badge/Dataset-HuggingFace-yellow)](https://huggingface.co/datasets/IberaSoft/ecommerce-reviews-sentiment)
[![Demo](https://img.shields.io/badge/Demo-Spaces-orange)](https://huggingface.co/spaces/IberaSoft/sentiment-analyzer-demo)
[![License](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)

## 🌟 Model Description

This model is a fine-tuned version of [`distilbert-base-uncased`](https://huggingface.co/distilbert-base-uncased) on a custom dataset of 20,000 customer reviews from e-commerce and SaaS platforms. It classifies text into three sentiment categories: **positive**, **negative**, and **neutral**.

### Key Features

- ✅ **Fast Inference**: ~35 ms per prediction (CPU, batch size 1)
- ✅ **High Accuracy**: 90.2% on the held-out test set
- ✅ **Domain-Specific**: Trained on e-commerce and SaaS customer reviews
- ✅ **Production-Ready**: Optimized for real-world deployment
- ✅ **Multi-Class**: Handles positive, negative, and neutral sentiment

## 🚀 Quick Start

### Using Transformers Pipeline
```python
from transformers import pipeline

# Load the model
classifier = pipeline(
    "sentiment-analysis",
    model="IberaSoft/customer-sentiment-analyzer"
)

# Analyze sentiment
result = classifier("This product is amazing! Highly recommend.")
print(result)
# [{'label': 'positive', 'score': 0.9823}]
```

### Using AutoModel
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "IberaSoft/customer-sentiment-analyzer"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare text
text = "Great quality but shipping took forever"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Map to labels using the model's own config (avoids hardcoding label order)
labels = model.config.id2label
predicted_class = predictions.argmax().item()
confidence = predictions[0][predicted_class].item()

print(f"Sentiment: {labels[predicted_class]}")
print(f"Confidence: {confidence:.2%}")
```

### Batch Processing
```python
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="IberaSoft/customer-sentiment-analyzer",
    device=0  # GPU 0; pass device=-1 to run on CPU
)

reviews = [
    "Excellent product, will buy again!",
    "Disappointed with the quality.",
    "It's okay, nothing special."
]

results = classifier(reviews)
for review, result in zip(reviews, results):
    print(f"{review[:30]}... → {result['label']} ({result['score']:.2f})")
```

## 📊 Model Performance

### Evaluation Metrics

| Metric | Score |
|--------|-------|
| **Accuracy** | 90.2% |
| **F1 Score (Macro)** | 0.89 |
| **Precision** | 0.90 |
| **Recall** | 0.89 |

### Per-Class Performance

| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| **Positive** | 0.92 | 0.91 | 0.91 | 800 |
| **Negative** | 0.89 | 0.90 | 0.89 | 700 |
| **Neutral** | 0.88 | 0.86 | 0.87 | 500 |

### Confusion Matrix
```
                   Predicted
                 Pos   Neu   Neg
Actual  Pos  [   728    45    27 ]
        Neu  [    38   430    32 ]
        Neg  [    22    48   630 ]
```
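Per-class recall in the table above can be read directly off the rows of this matrix (diagonal count divided by row total). A quick sanity check in plain Python:

```python
# Confusion matrix from above: rows = actual class, columns = predicted class.
cm = {
    "positive": {"positive": 728, "neutral": 45,  "negative": 27},
    "neutral":  {"positive": 38,  "neutral": 430, "negative": 32},
    "negative": {"positive": 22,  "neutral": 48,  "negative": 630},
}

for cls, row in cm.items():
    recall = row[cls] / sum(row.values())  # correct predictions / row total
    print(f"{cls}: recall = {recall:.2f}")
# positive: recall = 0.91
# neutral: recall = 0.86
# negative: recall = 0.90
```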

### Inference Speed

| Batch Size | CPU (ms) | GPU (ms) |
|------------|----------|----------|
| 1 | 35 | 8 |
| 8 | 180 | 25 |
| 32 | 650 | 75 |

*Tested on an Intel i7-11700K (CPU) and an NVIDIA RTX 3080 (GPU).*

## 🎯 Intended Use

### Primary Use Cases

- **Customer Support**: Automatically triage support tickets by sentiment
- **Product Reviews**: Analyze product feedback at scale
- **Brand Monitoring**: Track customer sentiment over time
- **Market Research**: Understand customer opinions
- **Quality Assurance**: Flag negative feedback for review

### Out-of-Scope Use

- ❌ Medical or health-related sentiment analysis
- ❌ Financial advice or stock sentiment (not trained on financial data)
- ❌ Political sentiment analysis (potential bias)
- ❌ Languages other than English
- ❌ Detecting sarcasm or irony (limited capability)

## 📚 Training Details

### Training Data

The model was fine-tuned on **20,000 labeled customer reviews** consisting of:

- **Amazon Customer Reviews**: 8,000 reviews
- **Yelp Business Reviews**: 7,000 reviews
- **SaaS Product Reviews**: 5,000 reviews (G2, Capterra, TrustRadius)

**Dataset Distribution**:
- Training: 15,000 (75%)
- Validation: 3,000 (15%)
- Test: 2,000 (10%)

**Class Balance**:
- Positive: 40% (8,000 reviews)
- Negative: 35% (7,000 reviews)
- Neutral: 25% (5,000 reviews)

📦 **[View Dataset on HuggingFace](https://huggingface.co/datasets/IberaSoft/ecommerce-reviews-sentiment)**

### Training Procedure

**Base Model**: `distilbert-base-uncased` (66M parameters)

**Hyperparameters**:
```yaml
learning_rate: 2e-5
batch_size: 16
epochs: 3
warmup_steps: 500
weight_decay: 0.01
max_length: 512
optimizer: AdamW
scheduler: linear with warmup
```
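The `linear with warmup` schedule ramps the learning rate from 0 to its peak over the first 500 steps, then decays it linearly to 0. A minimal sketch of that curve; the total step count here (ceil(15,000 / 16) × 3 epochs ≈ 2,814) is derived from the numbers above, not taken from training logs:

```python
def lr_at_step(step: int, total_steps: int,
               peak_lr: float = 2e-5, warmup_steps: int = 500) -> float:
    """Linear warmup to peak_lr, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps                 # ramp-up phase
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / (total_steps - warmup_steps))  # decay

total_steps = -(-15_000 // 16) * 3  # 938 steps/epoch x 3 epochs = 2814
print(lr_at_step(0, total_steps))            # 0.0
print(lr_at_step(500, total_steps))          # 2e-05 (peak, end of warmup)
print(lr_at_step(total_steps, total_steps))  # 0.0
```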

**Training Environment**:
- **Hardware**: NVIDIA Tesla V100 (16GB)
- **Training Time**: ~2.5 hours
- **Framework**: PyTorch 2.1, Transformers 4.36
- **Mixed Precision**: FP16

**Training Code**: [GitHub Repository](https://github.com/IberaSoft/sentiment-analysis-api)

### Preprocessing

Text preprocessing steps:
1. Lowercase conversion
2. URL removal
3. Excessive whitespace normalization
4. Emoji handling (converted to text)
5. HTML tag removal
6. Truncation to 512 tokens
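The steps above can be sketched as a small cleaning function. The regexes and the emoji-to-text mapping below are illustrative assumptions, not the exact rules used in training; truncation (step 6) is left to the tokenizer's `max_length=512`:

```python
import re

# Hypothetical subset of an emoji-to-text mapping (step 4).
EMOJI_MAP = {"👍": " thumbs_up ", "🙂": " smiling_face "}

def preprocess(text: str) -> str:
    text = text.lower()                                 # 1. lowercase
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # 2. strip URLs
    text = re.sub(r"<[^>]+>", " ", text)                # 5. strip HTML tags
    for emoji, name in EMOJI_MAP.items():               # 4. emoji -> text
        text = text.replace(emoji, name)
    text = re.sub(r"\s+", " ", text).strip()            # 3. normalize whitespace
    return text  # 6. truncation to 512 tokens is handled by the tokenizer

print(preprocess("Visit https://shop.example.com <b>NOW</b> 👍"))
# visit now thumbs_up
```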

## ⚠️ Limitations and Bias

### Known Limitations

1. **English Only**: Trained exclusively on English text
2. **Domain Specificity**: Best performance on e-commerce/SaaS reviews
3. **Sarcasm**: May misclassify sarcastic reviews
4. **Context Length**: Limited to 512 tokens (~350 words)
5. **Informal Language**: May struggle with heavy slang or abbreviations

### Potential Biases

- **Product Category Bias**: Training data skewed toward electronics and software
- **Platform Bias**: Amazon and Yelp reviews may have different characteristics
- **Temporal Bias**: Reviews collected 2020-2023
- **Rating Correlation**: 5-star reviews assumed positive (may not always be true)

### Recommendations

- ✅ Test on your specific domain before production use
- ✅ Implement human review for edge cases
- ✅ Monitor performance on your data distribution
- ✅ Consider retraining for specialized domains
- ✅ Use confidence scores to flag uncertain predictions
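The last recommendation can be as simple as a score cutoff that routes uncertain predictions to a person. The 0.75 threshold below is an arbitrary placeholder; tune it on your own validation data:

```python
# Hypothetical cutoff: tune on your own labeled data.
CONFIDENCE_THRESHOLD = 0.75

def route(result: dict) -> str:
    """Takes one pipeline result, e.g. {'label': 'negative', 'score': 0.97}."""
    if result["score"] >= CONFIDENCE_THRESHOLD:
        return result["label"]       # confident: handle automatically
    return "needs_human_review"      # uncertain: flag for a person

print(route({"label": "negative", "score": 0.97}))  # negative
print(route({"label": "neutral", "score": 0.51}))   # needs_human_review
```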

## 🔧 Optimization

### Model Size Reduction

**Standard Model**: 268 MB
**Quantized (INT8)**: 67 MB (4x smaller, <2% accuracy drop)
```python
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Export to ONNX
model = ORTModelForSequenceClassification.from_pretrained(
    "IberaSoft/customer-sentiment-analyzer",
    export=True,
    provider="CPUExecutionProvider"
)

# Apply dynamic INT8 quantization and save
quantizer = ORTQuantizer.from_pretrained(model)
qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
quantizer.quantize(save_dir="./optimized_model", quantization_config=qconfig)
```

### Performance Tips
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_name = "IberaSoft/customer-sentiment-analyzer"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Use GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Inference mode: no gradients needed in this process
model.eval()
torch.set_grad_enabled(False)

# Batch processing for better throughput
classifier = pipeline(
    "sentiment-analysis",
    model=model,
    tokenizer=tokenizer,
    batch_size=32,
    device=0 if device == "cuda" else -1
)
```

## 🌐 Production Deployment

### FastAPI Example
```python
from fastapi import FastAPI
from transformers import pipeline
from pydantic import BaseModel

app = FastAPI()

# Load model once at startup
classifier = pipeline(
    "sentiment-analysis",
    model="IberaSoft/customer-sentiment-analyzer"
)

class ReviewRequest(BaseModel):
    text: str

@app.post("/predict")
def predict_sentiment(request: ReviewRequest):
    result = classifier(request.text)[0]
    return {
        "sentiment": result["label"],
        "confidence": round(result["score"], 4)
    }
```

### Docker Deployment
```dockerfile
FROM python:3.11-slim

RUN pip install transformers torch fastapi uvicorn

# Download model during build so containers start without network access
RUN python -c "from transformers import pipeline; \
    pipeline('sentiment-analysis', \
             model='IberaSoft/customer-sentiment-analyzer')"

COPY app.py .

CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```

**Full API**: [GitHub Repository](https://github.com/IberaSoft/sentiment-analysis-api)

## 📖 Citation

If you use this model in your research or application, please cite:
```bibtex
@misc{customer-sentiment-analyzer,
  author       = {IberaSoft},
  title        = {Customer Sentiment Analyzer: Fine-tuned DistilBERT for E-commerce Reviews},
  year         = {2026},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/IberaSoft/customer-sentiment-analyzer}},
}
```

## 📝 License

This model is licensed under the **MIT License**. See [LICENSE](LICENSE) for details.

The base model `distilbert-base-uncased` is licensed under Apache 2.0.

## 🤝 Contributing

Found an issue or want to improve the model?

- 🐛 [Report bugs](https://github.com/IberaSoft/sentiment-analysis-api/issues)
- 💡 [Suggest features](https://github.com/IberaSoft/sentiment-analysis-api/issues)
- 🔧 [Submit pull requests](https://github.com/IberaSoft/sentiment-analysis-api/pulls)

## 🙏 Acknowledgments

- **HuggingFace** for the Transformers library and model hub
- **DistilBERT Authors** for the efficient base model
- **Dataset Contributors** for publicly available reviews
- **Community** for feedback and testing

---

<div align="center">

### ⭐ Star this model if you find it useful!

**Try the live demo**: [HuggingFace Spaces](https://huggingface.co/spaces/IberaSoft/sentiment-analyzer-demo)

</div>