---
title: Real-Time Sentiment Analysis Dashboard
emoji: 📊
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
---

# 📊 Sentiment Analysis Dashboard

A real-time sentiment analysis dashboard that processes tweets and displays sentiment trends using Apache Kafka, Spark, and Docker.

**Live Demo Dashboard**: [https://huggingface.co/spaces/xtinkarpiu/sentiment-analysis](https://huggingface.co/spaces/xtinkarpiu/sentiment-analysis). *The demo runs in mock mode with simulated tweets.*

Author: Kristine Karp (karpkristine@gmail.com)

## 📸 Preview
![Dashboard Overview](assets/dashboard_screenshot1.jpg)
![Real-Time Tweets and Charts](assets/dashboard_screenshot2.jpg)

## 🌟 Features

- Real-time tweet processing - Live streaming with Apache Kafka
- Intelligent sentiment analysis - Keyword-based classification with Spark
- Live dashboard updates - WebSocket-powered real-time interface
- Comprehensive visualization - Sentiment trends and recent tweet streams
- Flexible data sources - Support for both live Twitter API and mock data
- Containerized deployment - Full Docker orchestration for easy setup
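
To make the keyword-based classification concrete, here is a minimal sketch in plain Python. The keyword sets and the `classify_sentiment` name are assumptions for illustration; the vocabulary and scoring rules actually used in `consumer.py` are not shown in this README.

```python
import re

# Hypothetical keyword lists, chosen only for illustration.
POSITIVE_WORDS = {"good", "great", "love", "happy", "excellent"}
NEGATIVE_WORDS = {"bad", "terrible", "hate", "sad", "awful"}


def classify_sentiment(text: str) -> str:
    """Label text as positive, negative, or neutral by counting keyword hits."""
    # Lowercase and split into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(token in POSITIVE_WORDS for token in tokens)
    negative_hits = sum(token in NEGATIVE_WORDS for token in tokens)
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    # Ties and texts without any keyword are treated as neutral.
    return "neutral"
```

In a Spark job, a function like this could be wrapped as a user-defined function and applied to the tweet text column; that wiring is omitted here.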

## 🔧 Tech Stack
- Backend: 🐍 Python | 🌶️ Flask | 🔌 Flask-SocketIO  
- Message Streaming: 📨 Apache Kafka  
- Stream Processing: ✨ Apache Spark  
- Frontend: 🖼️ HTML5 | 🎨 CSS3 | ⚡ JavaScript | 📊 Chart.js  
- Real-time Communication: 🔄 WebSocket  
- Containerization: 🐳 Docker | 📦 Docker Compose  
- API Integration: 🐦 Twitter API v2

## ⚙️ Workflow Overview
This project implements a real-time sentiment analysis ETL pipeline using **Python scripts**, all orchestrated with **Docker**:

1. **Extract** - `producer.py` connects to the Twitter API for live streaming, or `mock_tweet_producer.py` generates realistic demo data for testing and for the Hugging Face demo.
2. **Transform** - Apache Kafka ingests tweets on the `sentiment-topic` topic, and `consumer.py` applies sentiment analysis using Spark streaming.
3. **Load** - `consumer.py` also publishes the processed results to the `sentiment-results` topic, from which they are displayed in real time.
4. **Visualize** - `dashboard.py` provides a web interface with live sentiment metrics and trend charts.
5. **Orchestrate** - `docker-compose.yml` and Docker manage all services for consistent deployment.
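
The data flow in the steps above can be sketched without a running broker. The snippet below is an illustration under stated assumptions: the message fields (`id`, `text`, `created_at`, `sentiment`), the sample texts, and all function names are hypothetical and may differ from what `mock_tweet_producer.py`, `consumer.py`, and `dashboard.py` actually use. Kafka message values are raw bytes, so the sketch serializes each tweet as UTF-8 JSON.

```python
import json
import random
from collections import Counter
from datetime import datetime, timezone

# Hypothetical sample texts standing in for the mock producer's data.
SAMPLE_TEXTS = [
    "I love the new release",
    "This update is terrible",
    "Just another Monday",
]


def make_mock_tweet(rng: random.Random) -> dict:
    """Build one simulated tweet (field names are assumptions)."""
    return {
        "id": rng.randrange(10**9),
        "text": rng.choice(SAMPLE_TEXTS),
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


def encode_message(record: dict) -> bytes:
    """Serialize a record the way it might be written to a Kafka topic."""
    return json.dumps(record).encode("utf-8")


def decode_message(payload: bytes) -> dict:
    """Parse a payload the way a consumer might read it back."""
    return json.loads(payload.decode("utf-8"))


def summarize(results: list) -> Counter:
    """Count sentiment labels, as a dashboard might for its trend chart."""
    return Counter(result["sentiment"] for result in results)
```

A producer would pass the output of `encode_message` to its Kafka client for `sentiment-topic`, and the dashboard would call something like `summarize` on the decoded records it reads from `sentiment-results`.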

---

## 🧪 How to Reproduce Locally

**🛠️ Project Structure**

| File/Folder            | Purpose                                           |
|------------------------|---------------------------------------------------|
| `dashboard.py`         | Flask app + Kafka consumer for real-time dashboard, flexible for real Kafka data or Hugging Face demo data |
| `templates/dashboard.html` | HTML UI template with real-time charts and tweet display |
| `mock_tweet_producer.py` | Generates realistic mock tweets for demo/testing |
| `producer.py`          | Connects to Twitter API to stream live tweets   |
| `consumer.py`          | Runs Spark-based sentiment analysis on Kafka stream        |
| `docker-compose.yml`   | Docker setup orchestrating Kafka, Spark, producer, dashboard |
| `requirements.txt`     | Python dependencies                               |
| `.env` (optional)      | Contains Twitter API credentials           |

**Option 1: Mock Mode (No API Required)**

```bash
git clone https://huggingface.co/spaces/xtinkarpiu/sentiment-analysis
cd sentiment-analysis
docker-compose up --build
```

This launches:

- Kafka: Message broker for tweet streaming
- Spark: Real-time sentiment analysis processing
- Producer: Tweet ingestion (default: mock tweets)
- Dashboard: Web interface at http://localhost:5000

**Option 2: Live Twitter Integration**

1. Create a Twitter/X Developer App at [developer.twitter.com](https://developer.twitter.com). Ensure your app has read and write permissions enabled, and your API access level is at least Basic to access streaming endpoints.
2. Add your **Bearer Token** to a `.env` file:
```env
BEARER_TOKEN=your_token_here
```
3. Set mock mode to false in `app.py`:
```python
os.environ["USE_MOCK"] = "false"
```
4. In `docker-compose.yml`, under `sentiment-producer`, replace the command so that it runs `producer.py` instead of `mock_tweet_producer.py`:
```yaml
command: ["python", "producer.py"]
```
5. Restart the Docker Compose stack to begin processing live tweets:
```bash
docker-compose up --build
```
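
As a rough illustration of how the `USE_MOCK` flag and `BEARER_TOKEN` from the steps above might be consumed, here is a minimal sketch. The function names, the default of mock mode when `USE_MOCK` is unset, and the error on a missing token are all assumptions; the project's own handling may differ.

```python
import os


def use_mock_mode(env=None) -> bool:
    """Return True unless USE_MOCK is explicitly set to "false"."""
    env = os.environ if env is None else env
    # Assumption: mock mode is the default when the variable is missing.
    return env.get("USE_MOCK", "true").strip().lower() != "false"


def select_producer_script(env=None) -> str:
    """Pick which producer script to run for the given environment."""
    env = os.environ if env is None else env
    if use_mock_mode(env):
        return "mock_tweet_producer.py"
    # Live mode needs credentials, so fail early if the token is absent.
    if not env.get("BEARER_TOKEN"):
        raise RuntimeError("BEARER_TOKEN must be set when USE_MOCK is false")
    return "producer.py"
```

Passing a plain dictionary instead of `os.environ` makes the selection easy to check without touching the real environment, for example `select_producer_script({"USE_MOCK": "false", "BEARER_TOKEN": "your_token_here"})`.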