Martin Rodrigo Morales committed on
Commit b3fbeb1 · 1 Parent(s): 6609649

Update: Simplified Gradio app with fine-tuned model

Files changed (3)
  1. README.md +29 -38
  2. app.py +285 -0
  3. requirements.txt +4 -12
README.md CHANGED
@@ -4,61 +4,52 @@ emoji: 🤖
  colorFrom: blue
  colorTo: purple
  sdk: gradio
- sdk_version: "4.0"
- app_file: gradio_app.py
+ sdk_version: 4.44.0
+ app_file: app.py
  pinned: false
- license: mit
+ license: apache-2.0
  tags:
  - sentiment-analysis
  - transformers
- - pytorch
- - nlp
  - distilbert
- - machine-learning
+ - mlflow
+ - pytorch
  models:
- - distilbert-base-uncased-finetuned-sst-2-english
- datasets:
- - imdb
- - sst2
+ - MartinRodrigo/distilbert-sentiment-imdb
  ---

  # 🤖 Transformer Sentiment Analysis

- Advanced AI-powered sentiment analysis using state-of-the-art transformer models.
-
- ## ✨ Features
+ Advanced sentiment analysis using DistilBERT fine-tuned on IMDB dataset with MLflow experiment tracking.

- - **Real-time Analysis**: Instant sentiment classification with confidence scores
- - **Batch Processing**: Analyze multiple texts simultaneously
- - **Interactive Visualizations**: Probability distributions and analytics
- - **Professional Interface**: Modern, responsive UI design
- - **Production-Ready**: Optimized for performance and scalability
+ ## 🎯 Model Performance

- ## 🧠 Model Details
+ - **Accuracy:** 80% on IMDB test set
+ - **F1 Score:** 0.7981
+ - **Model:** DistilBERT (66M parameters)
+ - **Speed:** ~100ms per prediction

- - **Architecture**: DistilBERT (66M parameters)
- - **Performance**: 74% accuracy on IMDB dataset
- - **Speed**: ~100ms inference time
- - **Training**: Fine-tuned on Stanford Sentiment Treebank
+ ## 🚀 Features

- ## 🚀 Tech Stack
+ - Real-time sentiment analysis
+ - Batch text processing
+ - Confidence scores and probabilities
+ - Interactive visualizations

- - **Framework**: PyTorch + Hugging Face Transformers
- - **Interface**: Gradio with custom CSS
- - **Backend**: FastAPI with async support
- - **Deployment**: Docker + Cloud platforms
+ ## 🔗 Links

- ## 🎯 Use Cases
+ - **Model Repository:** [MartinRodrigo/distilbert-sentiment-imdb](https://huggingface.co/MartinRodrigo/distilbert-sentiment-imdb)
+ - **GitHub:** [transformer-sentiment-analysis](https://github.com/mrdesautu/ransformer-sentiment-analysis)

- - Social media monitoring
- - Customer feedback analysis
- - Market research insights
- - Product review classification
+ ## 💡 Usage

- ## 🔗 Links
+ ```python
+ from transformers import pipeline

- - **GitHub Repository**: [Complete source code and documentation](https://github.com/mrdesautu/ransformer-sentiment-analysis)
- - **Live Demo**: Try the interactive demo above
- - **Documentation**: Comprehensive guides and API docs
+ classifier = pipeline("sentiment-analysis",
+                       model="MartinRodrigo/distilbert-sentiment-imdb")
+ result = classifier("I love this movie!")
+ print(result)
+ ```

- Built with modern ML engineering practices including comprehensive testing, CI/CD, and scalable deployment configurations.
+ Built with Transformers, MLflow, and Gradio 🚀
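The new README reports both accuracy and an F1 score. As a reminder of how the two figures are derived from the same predictions, here is a minimal sketch in plain Python. The function name `binary_metrics` and the toy labels are illustrative assumptions; they are not the project's evaluation code or its IMDB results.

```python
from typing import Sequence, Tuple


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, F1 of the positive class) for 0/1 labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Hypothetical toy labels (1 = positive, 0 = negative).
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1]
accuracy, f1 = binary_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.2f} f1={f1:.2f}")
```

Accuracy counts every correct prediction, while F1 ignores true negatives, which is why the two numbers can differ slightly even on a balanced test set.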
app.py ADDED
@@ -0,0 +1,285 @@
+ #!/usr/bin/env python3
+ """
+ Gradio app for HuggingFace Spaces
+ Sentiment analysis with MLflow metrics visualization
+ """
+
+ import gradio as gr
+ import torch
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import numpy as np
+ import plotly.express as px
+ import plotly.graph_objects as go
+ import pandas as pd
+ from typing import Dict, List, Tuple
+ import logging
+
+ # Configure logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ class SentimentAnalyzer:
+     """Sentiment analyzer for production"""
+
+     def __init__(self):
+         # Use the deployed model from HuggingFace Hub
+         self.model_name = "MartinRodrigo/distilbert-sentiment-imdb"
+         self.tokenizer = None
+         self.model = None
+         self.load_model()
+
+     def load_model(self):
+         """Load the fine-tuned model"""
+         try:
+             logger.info(f"Loading model: {self.model_name}")
+             self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)
+             self.model = AutoModelForSequenceClassification.from_pretrained(self.model_name)
+             logger.info("Model loaded successfully!")
+         except Exception as e:
+             logger.error(f"Error loading model: {e}")
+             # Fallback to base model
+             logger.info("Falling back to base model...")
+             self.model_name = "distilbert-base-uncased-finetuned-sst-2-english"
+             self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)
+             self.model = AutoModelForSequenceClassification.from_pretrained(self.model_name)
+
+     def analyze_single(self, text: str) -> Dict:
+         """Analyze sentiment of a single text"""
+         if not text.strip():
+             return {
+                 "sentiment": "Please enter some text",
+                 "confidence": 0.0,
+                 "probabilities": None
+             }
+
+         try:
+             inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
+
+             with torch.no_grad():
+                 outputs = self.model(**inputs)
+                 predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
+
+             probs = predictions[0].numpy()
+             predicted_class = np.argmax(probs)
+             confidence = float(probs[predicted_class])
+
+             sentiment = "POSITIVE" if predicted_class == 1 else "NEGATIVE"
+
+             return {
+                 "sentiment": sentiment,
+                 "confidence": confidence,
+                 "probabilities": {
+                     "Negative": float(probs[0]),
+                     "Positive": float(probs[1])
+                 }
+             }
+
+         except Exception as e:
+             logger.error(f"Error in analysis: {e}")
+             return {
+                 "sentiment": f"Error: {str(e)}",
+                 "confidence": 0.0,
+                 "probabilities": None
+             }
+
+     def analyze_batch(self, texts: List[str]) -> List[Dict]:
+         """Analyze multiple texts"""
+         results = []
+         for text in texts:
+             if text.strip():
+                 results.append(self.analyze_single(text))
+         return results
+
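Review note: `load_model` tries the fine-tuned checkpoint first and falls back to the public SST-2 checkpoint if loading fails. The same try-in-order pattern can be sketched without network access. `load_first_available` and `stub_loader` below are hypothetical names for illustration only; the stub stands in for `from_pretrained` and simulates a failed download.

```python
from typing import Callable, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def load_first_available(names: Sequence[str], loader: Callable[[str], T]) -> Tuple[str, T]:
    """Try each name in order and return the first one that loads."""
    last_error: Optional[Exception] = None
    for name in names:
        try:
            return name, loader(name)
        except Exception as exc:  # broad on purpose, mirroring app.py
            last_error = exc
    raise RuntimeError(f"none of {list(names)} could be loaded") from last_error


def stub_loader(name: str) -> str:
    """Stand-in for from_pretrained: fails for the fine-tuned model name."""
    if name == "MartinRodrigo/distilbert-sentiment-imdb":
        raise OSError("simulated download failure")
    return f"model<{name}>"


used, model = load_first_available(
    [
        "MartinRodrigo/distilbert-sentiment-imdb",
        "distilbert-base-uncased-finetuned-sst-2-english",
    ],
    stub_loader,
)
print(used)
```

Unlike the code in the diff, this sketch raises a single error if every candidate fails, which may make a broken deployment easier to diagnose.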
+ # Initialize analyzer
+ analyzer = SentimentAnalyzer()
+
+ def analyze_sentiment(text: str) -> Tuple[str, float, dict]:
+     """Main analysis function for Gradio"""
+     result = analyzer.analyze_single(text)
+
+     if result["probabilities"]:
+         df = pd.DataFrame([
+             {"Sentiment": "Negative", "Probability": result["probabilities"]["Negative"]},
+             {"Sentiment": "Positive", "Probability": result["probabilities"]["Positive"]}
+         ])
+
+         fig = px.bar(
+             df,
+             x="Sentiment",
+             y="Probability",
+             color="Sentiment",
+             color_discrete_map={"Negative": "#ff4444", "Positive": "#44ff44"},
+             title="Sentiment Probability Distribution"
+         )
+         fig.update_layout(showlegend=False, height=300)
+
+         return (
+             f"**{result['sentiment']}** (Confidence: {result['confidence']:.1%})",
+             result['confidence'],
+             fig
+         )
+
+     return result['sentiment'], result['confidence'], None
+
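Review note: the probabilities plotted by `analyze_sentiment` come from a softmax over the classifier's two logits in `analyze_single`, with index 0 read as negative and index 1 as positive. A minimal sketch of that step in plain Python follows. The logit values and the helper name `softmax` are assumptions for illustration; the app itself calls `torch.nn.functional.softmax`.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [-1.2, 2.3]  # hypothetical [negative, positive] logits
probs = softmax(logits)

# Same decision rule as analyze_single: pick the most probable class.
predicted_class = max(range(len(probs)), key=probs.__getitem__)
sentiment = "POSITIVE" if predicted_class == 1 else "NEGATIVE"
confidence = probs[predicted_class]
print(sentiment, f"{confidence:.1%}")
```

Because the two probabilities sum to 1, the reported confidence for a binary classifier is never below 50%.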
+ def analyze_batch_texts(text_input: str) -> Tuple[str, dict]:
+     """Analyze multiple texts separated by newlines"""
+     if not text_input.strip():
+         return "Please enter some texts (one per line)", None
+
+     texts = [line.strip() for line in text_input.split('\n') if line.strip()]
+
+     if not texts:
+         return "No valid texts found", None
+
+     results = analyzer.analyze_batch(texts)
+
+     summary_lines = []
+     plot_data = []
+
+     for i, (text, result) in enumerate(zip(texts, results)):
+         sentiment = result['sentiment']
+         confidence = result['confidence']
+         summary_lines.append(f"{i+1}. **{sentiment}** ({confidence:.1%}) - {text[:50]}{'...' if len(text) > 50 else ''}")
+
+         plot_data.append({
+             "Text": f"Text {i+1}",
+             "Sentiment": sentiment,
+             "Confidence": confidence
+         })
+
+     summary = "\n".join(summary_lines)
+
+     if plot_data:
+         df = pd.DataFrame(plot_data)
+         fig = px.bar(
+             df,
+             x="Text",
+             y="Confidence",
+             color="Sentiment",
+             color_discrete_map={"NEGATIVE": "#ff4444", "POSITIVE": "#44ff44"},
+             title="Batch Analysis Results"
+         )
+         fig.update_layout(height=400)
+
+         return summary, fig
+
+     return summary, None
+
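Review note: `analyze_batch_texts` splits the textbox content into non-empty lines and renders one Markdown row per text, truncating the preview at 50 characters. The sketch below isolates those two steps so they can be checked without loading a model. The helper names `split_lines` and `summary_line` and the scores are hypothetical.

```python
from typing import List


def split_lines(text_input: str) -> List[str]:
    """Return stripped, non-empty lines, as analyze_batch_texts does."""
    return [line.strip() for line in text_input.split("\n") if line.strip()]


def summary_line(index: int, sentiment: str, confidence: float,
                 text: str, width: int = 50) -> str:
    """Format one Markdown summary row, truncating long texts."""
    preview = text[:width] + ("..." if len(text) > width else "")
    return f"{index}. **{sentiment}** ({confidence:.1%}) - {preview}"


raw = "Great film!\n\n   \nTerrible plot, " + "very " * 12 + "dull.\n"
texts = split_lines(raw)
rows = [
    summary_line(1, "POSITIVE", 0.98, texts[0]),  # hypothetical scores
    summary_line(2, "NEGATIVE", 0.91, texts[1]),
]
print("\n".join(rows))
```

Filtering blank lines before calling the analyzer keeps `texts` and `results` the same length, which the `zip` in the diff relies on.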
+ # Demo examples
+ EXAMPLES = [
+     "🎬 This movie absolutely blew my mind! Best film I've seen this year!",
+     "😞 Worst customer service ever. Total waste of money.",
+     "🚀 Revolutionary AI technology! Incredible understanding of language.",
+     "❌ I regret this purchase deeply. Poor quality materials.",
+     "✈️ Amazing travel experience! The hotel exceeded expectations!",
+     "🎵 Concert was phenomenal! Everything was absolutely perfect!"
+ ]
+
+ BATCH_EXAMPLE = """🛍️ This online store has amazing customer service!
+ 😡 Terrible experience with their support team.
+ ⭐ Outstanding quality! Exceeded all my expectations.
+ 💸 Disappointed with this expensive purchase."""
+
+ # Create Gradio interface
+ with gr.Blocks(
+     title="🤖 Transformer Sentiment Analysis",
+     theme=gr.themes.Soft(
+         primary_hue="blue",
+         secondary_hue="purple",
+         neutral_hue="slate"
+     )
+ ) as demo:
+
+     gr.Markdown("""
+     # 🤖 Transformer Sentiment Analysis
+
+     Advanced AI-powered sentiment analysis using **DistilBERT** fine-tuned on IMDB dataset.
+
+     **Model Performance:**
+     - 🎯 Accuracy: **80%** on test set
+     - 📊 F1 Score: **0.7981**
+     - ⚡ Speed: ~100ms per prediction
+     - 🧠 Parameters: 66M (DistilBERT)
+     """)
+
+     with gr.Tabs():
+         with gr.TabItem("🔍 Single Analysis"):
+             with gr.Row():
+                 with gr.Column(scale=2):
+                     single_input = gr.Textbox(
+                         label="💬 Enter your text",
+                         placeholder="Type your text here...",
+                         lines=4
+                     )
+                     single_btn = gr.Button("🚀 Analyze Sentiment", variant="primary", size="lg")
+
+                 with gr.Column(scale=2):
+                     single_output = gr.Markdown(label="📋 Result")
+                     confidence_score = gr.Number(label="🎯 Confidence", precision=3)
+                     probability_plot = gr.Plot(label="📊 Probabilities")
+
+             gr.Examples(
+                 examples=EXAMPLES,
+                 inputs=single_input,
+                 label="💡 Try these examples:"
+             )
+
+         with gr.TabItem("📊 Batch Processing"):
+             with gr.Row():
+                 with gr.Column(scale=2):
+                     batch_input = gr.Textbox(
+                         label="📝 Multiple texts (one per line)",
+                         placeholder="Enter texts, one per line...",
+                         lines=8,
+                         value=BATCH_EXAMPLE
+                     )
+                     batch_btn = gr.Button("🚀 Process Batch", variant="primary", size="lg")
+
+                 with gr.Column(scale=2):
+                     batch_output = gr.Markdown(label="📈 Results")
+                     batch_plot = gr.Plot(label="📊 Analytics")
+
+         with gr.TabItem("ℹ️ About"):
+             gr.Markdown("""
+             ## About This Model
+
+             ### 🏗️ Architecture
+             - **Model:** DistilBERT (Distilled BERT)
+             - **Parameters:** 66 million
+             - **Training:** Fine-tuned on IMDB dataset
+             - **Accuracy:** 80% on test set
+
+             ### ⚡ Performance
+             - **Speed:** ~100ms per prediction
+             - **Batch Processing:** Supported
+             - **Memory:** Optimized for production
+
+             ### 🚀 Tech Stack
+             - **Framework:** PyTorch + Transformers
+             - **Tracking:** MLflow experiments
+             - **UI:** Gradio
+
+             ### 🔗 Links
+             - **Model:** [MartinRodrigo/distilbert-sentiment-imdb](https://huggingface.co/MartinRodrigo/distilbert-sentiment-imdb)
+             - **GitHub:** [transformer-sentiment-analysis](https://github.com/mrdesautu/ransformer-sentiment-analysis)
+
+             ---
+
+             Built with ❤️ using Transformers, MLflow, and Gradio
+             """)
+
+     # Event handlers
+     single_btn.click(
+         fn=analyze_sentiment,
+         inputs=single_input,
+         outputs=[single_output, confidence_score, probability_plot]
+     )
+
+     batch_btn.click(
+         fn=analyze_batch_texts,
+         inputs=batch_input,
+         outputs=[batch_output, batch_plot]
+     )
+
+ if __name__ == "__main__":
+     demo.launch()
requirements.txt CHANGED
@@ -1,14 +1,6 @@
  transformers>=4.30.0
  torch>=2.0.0
- datasets>=2.0.0
- evaluate>=0.4.0
- scikit-learn>=1.0.0
- matplotlib>=3.5.0
- seaborn>=0.11.0
- numpy>=1.21.0
- pytest>=7.0.0
- fastapi>=0.100.0
- uvicorn[standard]>=0.20.0
- pydantic>=2.0.0
- python-multipart
- aiofiles
+ gradio>=4.0.0
+ plotly>=5.0.0
+ pandas>=1.5.0
+ numpy>=1.24.0
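Each line of the trimmed requirements.txt sets a minimum version. The sketch below shows how such `name>=version` pins can be read and compared, assuming plain dotted numeric versions. The helper names `parse_pin` and `satisfies` are hypothetical, and real tooling would normally leave this to pip or the `packaging` library, which handle the full specifier syntax.

```python
from typing import Tuple


def parse_pin(line: str) -> Tuple[str, Tuple[int, ...]]:
    """Split 'name>=1.2.3' into ('name', (1, 2, 3))."""
    name, _, version = line.strip().partition(">=")
    if not version:
        raise ValueError(f"expected 'name>=version', got {line!r}")
    return name, tuple(int(part) for part in version.split("."))


def satisfies(installed: str, minimum: Tuple[int, ...]) -> bool:
    """Compare dotted numeric versions component by component."""
    return tuple(int(part) for part in installed.split(".")) >= minimum


pins = dict(parse_pin(line) for line in [
    "transformers>=4.30.0",
    "torch>=2.0.0",
    "gradio>=4.0.0",
    "plotly>=5.0.0",
    "pandas>=1.5.0",
    "numpy>=1.24.0",
])

# The Space's sdk_version (4.44.0) meets the gradio pin; numpy 1.21.0 would not
# meet the raised numpy pin.
gradio_ok = satisfies("4.44.0", pins["gradio"])
old_numpy_ok = satisfies("1.21.0", pins["numpy"])
print(gradio_ok, old_numpy_ok)
```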