Spaces: xtinkarpiu/sentiment-analysis

xtinkarpiu committed on Aug 8, 2025

Commit c1c559f (verified) · 1 Parent(s): e18a159

Upload folder using huggingface_hub

Files changed (4)

Dockerfile +3 -0
README.md +49 -20
app.py +3 -0
dashboard.py +245 -0

Dockerfile CHANGED

@@ -11,4 +11,7 @@ RUN pip install --no-cache-dir -r requirements.txt
 COPY . .
+ENV USE_MOCK=true
+ENV PORT=7860
 CMD ["python", "dashboard.py"]
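The two `ENV` lines above are consumed by `dashboard.py` at startup through `USE_MOCK = os.environ.get("USE_MOCK", "true").lower() == "true"` and `port = int(os.environ.get('PORT', 5000))`. The sketch below restates those two expressions as standalone functions that take a plain mapping instead of `os.environ`, so the rules are easy to check. The names `is_mock_mode` and `get_port` are hypothetical and not part of the project. Note that only the string `true` (in any letter case) enables mock mode; values such as `1` or `yes` switch the dashboard to Kafka mode.

```python
from typing import Mapping


def is_mock_mode(env: Mapping[str, str]) -> bool:
    # Same rule as dashboard.py: default "true", case-insensitive exact match on "true"
    return env.get("USE_MOCK", "true").lower() == "true"


def get_port(env: Mapping[str, str]) -> int:
    # Same rule as dashboard.py: default 5000; the Dockerfile sets PORT=7860 for Hugging Face
    return int(env.get("PORT", 5000))


print(is_mock_mode({}))                    # True (default)
print(is_mock_mode({"USE_MOCK": "TRUE"}))  # True
print(is_mock_mode({"USE_MOCK": "1"}))     # False
print(get_port({"PORT": "7860"}))          # 7860
print(get_port({}))                        # 5000
```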

README.md CHANGED

@@ -1,17 +1,21 @@
 ---
-title: Sentiment Analysis Dashboard
 emoji: 📊
 colorFrom: blue
 colorTo: green
 sdk: docker
-app_file: dashboard.py
 pinned: false
 ---
 # 📊 Sentiment Analysis Dashboard
-A real-time dashboard for visualizing tweet sentiment (positive, negative, neutral) using **Kafka**, **Spark**, and **Flask**.
-It supports both live Twitter streams (via producer.py) and demo mode with mock tweets (via mock_tweet_producer.py).
 This version runs in mock/demo mode on Hugging Face Spaces.
@@ -22,7 +26,9 @@ Author: Kristine Karp (karpkristine@gmail.com)
 ## 🚀 Demo Mode (Hugging Face)
 > This Space runs in **mock mode**, generating fake tweets using `mock_tweet_producer.py`.
-This allows users to explore the dashboard **without requiring Twitter API credentials or external Kafka setup**.
 ---
@@ -30,10 +36,12 @@ This allows users to explore the dashboard **without requiring Twitter API crede
 ## 🧠 Features
 - Real-time tweet ingestion (simulated or live)
-- Sentiment counts: Positive, Neutral, Negative
-- Recent tweet stream with sentiment tags
-- Hourly sentiment trend summary
-- WebSocket-powered live updates
 ---
@@ -41,13 +49,14 @@ This allows users to explore the dashboard **without requiring Twitter API crede
 | File/Folder            | Purpose                                           |
 |------------------------|---------------------------------------------------|
-| `dashboard.py`         | Main Flask app + Kafka consumer for hugging faces demo purposes             |
-| `local_dashboard.py`         | Flask app + Kafka consumer that can be run locally in http://localhost:5000/            |
-| `templates/dashboard.html` | HTML UI template for the dashboard          |
-| `mock_tweet_producer.py` | Generates mock tweets for demo/testing        |
-| `producer.py`          | Optional Twitter producer if you have API token  |
-| `requirements.txt`     | All Python dependencies                           |
-| `.env` (optional)      | Set up your Twitter API token if using real data |
 ---
@@ -55,11 +64,13 @@ This allows users to explore the dashboard **without requiring Twitter API crede
 If you want to stream real tweets and analyze their sentiment:
-1. Create a Twitter/X Developer App
 2. Add your **Bearer Token** to a `.env` file:
    ```env
    BEARER_TOKEN=your_token_here
-3. Run producer.py instead
 ## 🧪 Local Development
@@ -67,11 +78,29 @@ git clone https://huggingface.co/spaces/xtinkarpiu/sentiment-analysis
 cd sentiment-analysis
 docker-compose up --build
 ## 📷 Dashboard Preview
 Here's a preview of the sentiment dashboard in action:
 ![Dashboard Overview](assets/dashboard_screenshot1.jpg)
 ![Real-Time Tweets and Charts](assets/dashboard_screenshot2.jpg)
-*Demo hosted on Hugging Face Spaces*

 ---
+title: Real-Time Sentiment Analysis Dashboard
 emoji: 📊
 colorFrom: blue
 colorTo: green
 sdk: docker
+app_port: 7860
 pinned: false
 ---
 # 📊 Sentiment Analysis Dashboard
+A real-time sentiment analysis dashboard that processes tweets and displays sentiment trends.
+- 🟢 **Live Demo Mode**: Shows mock data for demonstration
+- 🔄 **Real-time Updates**: Uses WebSocket for live data streaming
+- 📊 **Interactive Charts**: Pie charts and trend analysis
+- 📱 **Recent Tweets**: Live feed of processed tweets
 This version runs in mock/demo mode on Hugging Face Spaces.
 ## 🚀 Demo Mode (Hugging Face)
 > This Space runs in **mock mode**, generating fake tweets using `mock_tweet_producer.py`.
+> This allows users to explore the dashboard **without requiring Twitter API credentials or external Kafka setup**.
+>
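+> If you have a Twitter API token, you can use `producer.py` and disable mock mode by setting `USE_MOCK` to `"false"`, either with `os.environ["USE_MOCK"] = "false"` in `app.py` or with `ENV USE_MOCK=false` in the Dockerfile (the Docker image runs `dashboard.py` directly).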
 ---
 ## 🧠 Features
 - Real-time tweet ingestion (simulated or live)
+- Sentiment analysis using keyword-based classification
+- Live sentiment counts: Positive, Neutral, Negative
+- Recent tweet stream with color-coded sentiment tags
+- Hourly sentiment trend visualization
+- WebSocket-powered live dashboard updates
+- Responsive design with modern UI
 ---
 | File/Folder            | Purpose                                           |
 |------------------------|---------------------------------------------------|
+| `dashboard.py`         | Main Flask app + Kafka consumer; runs on real Kafka data or in Hugging Face demo mode |
+| `templates/dashboard.html` | HTML UI template with real-time charts and tweet display |
+| `mock_tweet_producer.py` | Generates realistic mock tweets for demo/testing |
+| `producer.py`          | Twitter API producer for live tweet streaming    |
+| `consumer.py`          | Spark-based sentiment analysis processor         |
+| `docker-compose.yml`   | Full microservices setup (Kafka + Spark + Dashboard) |
+| `requirements.txt`     | Python dependencies                               |
+| `.env` (optional)      | Twitter API credentials for live data            |
 ---
 If you want to stream real tweets and analyze their sentiment:
+1. Create a Twitter/X Developer App at [developer.twitter.com](https://developer.twitter.com)
 2. Add your **Bearer Token** to a `.env` file:
    ```env
    BEARER_TOKEN=your_token_here
+3. Set mock mode to false in app.py:
+   os.environ["USE_MOCK"] = "false"
+4. Run producer.py instead of the mock producer. With real data, the system connects to the Twitter API and processes live tweets.
 ## 🧪 Local Development
 cd sentiment-analysis
 docker-compose up --build
+This will start:
+- 🔴 Kafka: Message broker for tweet streaming
+- ⚡ Spark: Real-time sentiment analysis processing
+- 🐍 Producer: Tweet ingestion (mock or real)
+- 📊 Dashboard: Web interface at http://localhost:5000
 ## 📷 Dashboard Preview
 Here's a preview of the sentiment dashboard in action:
 ![Dashboard Overview](assets/dashboard_screenshot1.jpg)
+Main dashboard with real-time sentiment counters and charts
 ![Real-Time Tweets and Charts](assets/dashboard_screenshot2.jpg)
+Live tweet feed with sentiment analysis and hourly trends
+## 🔧 Technologies Used
+- Backend: Python, Flask, Flask-SocketIO
+- Message Streaming: Apache Kafka
+- Stream Processing: Apache Spark
+- Frontend: HTML5, CSS3, JavaScript, Chart.js
+- Real-time Communication: WebSocket
+- Containerization: Docker, Docker Compose
+- API Integration: Twitter API v2
+*Demo hosted on Hugging Face Spaces with mock data for demonstration purposes.*
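The README lists keyword-based classification among the features, but the classifier itself lives in `consumer.py`, which is not part of this commit (and in mock mode, `mock_tweet_generator` in `dashboard.py` picks each label with `random.choice`, independently of the tweet text). Purely as an illustration, a keyword classifier of this kind might look like the sketch below. The word lists, the `classify` name, and the tie rule (equal hits count as neutral) are all assumptions, not the project's actual rules.

```python
import re

# Hypothetical keyword lists; consumer.py's real lists are not shown in this commit
POSITIVE_WORDS = {"love", "amazing", "awesome", "excited", "beautiful", "perfect", "good"}
NEGATIVE_WORDS = {"terrible", "frustrated", "painful", "ugh", "hate", "broken", "worst"}


def classify(text: str) -> str:
    """Label text positive/negative/neutral by counting keyword hits."""
    words = re.findall(r"[a-z']+", text.lower())  # lowercase word tokens; emoji are ignored
    pos = sum(word in POSITIVE_WORDS for word in words)
    neg = sum(word in NEGATIVE_WORDS for word in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # no hits, or a tie


for tweet in [
    "I absolutely love this new Python framework! Amazing! 🐍✨",
    "This API documentation is terrible. Nothing works 😡",
    "Updated the dependencies. Everything seems fine.",
]:
    print(classify(tweet), "|", tweet)
```

Counting hits keeps the sketch transparent, but it cannot handle negation ("not good") or sarcasm; a trained model would be the usual next step.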

app.py ADDED

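@@ -0,0 +1,6 @@

+import os
+os.environ["USE_MOCK"] = "true"  # Set to "false" for real tweets or local Kafka; must be set before dashboard loads
+import runpy
+if __name__ == "__main__":
+    # A plain `from dashboard import *` skips dashboard's __main__ block, so run the module as __main__
+    runpy.run_module("dashboard", run_name="__main__")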
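`dashboard.py` evaluates `USE_MOCK` once, when the module is first loaded, so `app.py` has to assign `os.environ["USE_MOCK"]` before `dashboard` is loaded; changing the variable afterwards does not affect the already-loaded module. The self-contained sketch below demonstrates this with a throwaway module, `fake_dashboard` (hypothetical), that contains only the same `USE_MOCK` line.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-in for dashboard.py, reduced to the line that reads the flag
MODULE_SOURCE = 'import os\nUSE_MOCK = os.environ.get("USE_MOCK", "true").lower() == "true"\n'


def import_fresh(name: str, directory: str):
    """Import module `name` from `directory`, discarding any cached copy first."""
    sys.modules.pop(name, None)
    importlib.invalidate_caches()
    sys.path.insert(0, directory)
    try:
        return importlib.import_module(name)
    finally:
        sys.path.remove(directory)


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "fake_dashboard.py"), "w", encoding="utf-8") as f:
        f.write(MODULE_SOURCE)

    os.environ["USE_MOCK"] = "false"  # set before import: the module sees it
    module = import_fresh("fake_dashboard", tmp)
    flag_at_import = module.USE_MOCK

    os.environ["USE_MOCK"] = "true"   # set after import: the module keeps its old value
    flag_after_change = module.USE_MOCK

print(flag_at_import, flag_after_change)  # False False
```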

dashboard.py ADDED

	@@ -0,0 +1,245 @@

+from flask import Flask, render_template, jsonify
+from flask_socketio import SocketIO, emit
+import json
+import threading
+import time
+from datetime import datetime
+from collections import defaultdict, deque
+import logging
+import os
+import random
+app = Flask(__name__)
+app.config['SECRET_KEY'] = 'sentiment-dashboard-secret'
+socketio = SocketIO(app, cors_allowed_origins="*")
+# In-memory storage for dashboard data
+sentiment_counts = {'positive': 0, 'negative': 0, 'neutral': 0}
+recent_tweets = deque(maxlen=50)  # Keep last 50 tweets
+hourly_sentiment = defaultdict(lambda: {'positive': 0, 'negative': 0, 'neutral': 0})
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+# Check environment for mock mode
+USE_MOCK = os.environ.get("USE_MOCK", "true").lower() == "true"
+def kafka_consumer_thread():
+    """Background thread to consume processed tweets from Kafka or generate mock data"""
+    if USE_MOCK:
+        logger.info("Running in MOCK mode - generating demo data")
+        mock_tweet_generator()
+    else:
+        logger.info("Running in KAFKA mode - connecting to real Kafka")
+        real_kafka_consumer()
+def mock_tweet_generator():
+    """Generate mock tweets for demo purposes"""
+    sentiments = ["positive", "neutral", "negative"]
+    # Sample mock tweets for demo
+    sample_tweets = [
+        "I absolutely love this new Python framework! Amazing! 🐍✨",
+        "Just finished my first machine learning project! So excited! 🚀",
+        "Beautiful sunny day! Perfect for coding ☕️💻",
+        "Finally understood how Kafka works! Awesome technology 🎉",
+        "Ugh, spent 3 hours debugging this error. So frustrated 😤",
+        "This API documentation is terrible. Nothing works 😡",
+        "Why is deployment always so painful? 💔",
+        "Working on a new feature. Should be ready next week.",
+        "Attending a tech conference tomorrow. Looking forward to it.",
+        "Updated the dependencies. Everything seems fine.",
+        "Django vs Flask debate continues. Both are good.",
+        "Love how clean Python code can be. Beautiful language!",
+        "FastAPI is becoming my go-to for REST APIs. So fast!",
+        "NumPy arrays are much faster than regular lists.",
+        "Jupyter notebooks are perfect for data exploration.",
+    ]
+    tweet_count = 0
+    while True:
+        try:
+            # Generate a mock tweet
+            sentiment = random.choice(sentiments)
+            tweet_text = random.choice(sample_tweets)
+            tweet_data = {
+                'text': tweet_text,
+                'sentiment': sentiment,
+                'timestamp': datetime.now().strftime('%H:%M:%S'),
+                'author_id': f'user_{random.randint(1000, 9999)}'
+            }
+            # Update sentiment counts
+            sentiment_counts[sentiment] += 1
+            # Add to recent tweets
+            recent_tweets.append(tweet_data)
+            # Update hourly data
+            hour = datetime.now().strftime('%H:00')
+            hourly_sentiment[hour][sentiment] += 1
+            # Emit real-time update to connected clients
+            socketio.emit('sentiment_update', {
+                'sentiment_counts': dict(sentiment_counts),
+                'recent_tweets': list(recent_tweets),
+                'hourly_data': dict(hourly_sentiment)
+            })
+            tweet_count += 1
+            logger.info(f"Generated mock tweet #{tweet_count} with sentiment: {sentiment}")
+            # Random delay between tweets (1-3 seconds for demo)
+            time.sleep(random.uniform(1, 3))
+        except Exception as e:
+            logger.error(f"Error in mock tweet generator: {e}")
+            time.sleep(5)
+def real_kafka_consumer():
+    """Real Kafka consumer for production use"""
+    try:
+        from kafka import KafkaConsumer
+        from kafka.errors import NoBrokersAvailable
+        def create_kafka_consumer(max_retries=10, retry_delay=5):
+            """Create Kafka consumer with retry logic"""
+            for attempt in range(max_retries):
+                try:
+                    consumer = KafkaConsumer(
+                        'sentiment-results',
+                        bootstrap_servers=['kafka:9092'],
+                        value_deserializer=lambda m: json.loads(m.decode('utf-8')),
+                        consumer_timeout_ms=1000,
+                        auto_offset_reset='earliest',
+                        enable_auto_commit=True,
+                        group_id='dashboard-group'
+                    )
+                    logger.info("Successfully connected to Kafka consumer!")
+                    return consumer
+                except NoBrokersAvailable as e:
+                    logger.warning(f"Kafka not ready, attempt {attempt + 1}/{max_retries}. Retrying in {retry_delay}s...")
+                    time.sleep(retry_delay)
+                except Exception as e:
+                    logger.error(f"Unexpected error connecting to Kafka: {e}")
+                    time.sleep(retry_delay)
+            raise Exception(f"Could not connect to Kafka consumer after {max_retries} attempts")
+        # Wait for Kafka and Spark to be ready
+        logger.info("Waiting for Kafka and Spark services to be ready...")
+        time.sleep(10)
+        consumer = create_kafka_consumer()
+        logger.info("Connected to Kafka consumer for dashboard - waiting for processed tweets...")
+        message_count = 0
+        while True:
+            try:
+                # Poll for messages with timeout
+                message_batch = consumer.poll(timeout_ms=1000)
+                if message_batch:
+                    logger.info(f"Received batch with {len(message_batch)} topic partitions")
+                    for topic_partition, messages in message_batch.items():
+                        logger.info(f"Processing {len(messages)} messages from {topic_partition}")
+                        for message in messages:
+                            try:
+                                tweet_data = message.value
+                                message_count += 1
+                                logger.info(f"Message {message_count}: Received tweet data: {tweet_data}")
+                                # Update sentiment counts
+                                sentiment = tweet_data.get('sentiment', 'neutral')
+                                sentiment_counts[sentiment] += 1
+                                # Add to recent tweets, truncating text longer than 100 characters
+                                text = tweet_data.get('tweet_text', '')
+                                recent_tweets.append({
+                                    'text': text[:100] + '...' if len(text) > 100 else text,
+                                    'sentiment': sentiment,
+                                    'timestamp': datetime.now().strftime('%H:%M:%S'),
+                                    'author_id': tweet_data.get('author_id', 'Unknown')
+                                })
+                                # Update hourly data
+                                hour = datetime.now().strftime('%H:00')
+                                hourly_sentiment[hour][sentiment] += 1
+                                # Emit real-time update to connected clients
+                                socketio.emit('sentiment_update', {
+                                    'sentiment_counts': dict(sentiment_counts),
+                                    'recent_tweets': list(recent_tweets),
+                                    'hourly_data': dict(hourly_sentiment)
+                                })
+                                logger.info(f"Processed tweet with sentiment: {sentiment} - Total counts: {dict(sentiment_counts)}")
+                            except Exception as e:
+                                logger.error(f"Error processing individual tweet data: {e}")
+                else:
+                    if message_count == 0:
+                        logger.info("No messages received yet, continuing to poll...")
+                    time.sleep(1)
+            except Exception as e:
+                logger.error(f"Error in polling loop: {e}")
+                time.sleep(5)
+    except ImportError:
+        logger.warning("kafka-python not available, falling back to mock mode")
+        mock_tweet_generator()
+    except Exception as e:
+        logger.error(f"Error in real Kafka consumer: {e}")
+        logger.info("Falling back to mock mode")
+        mock_tweet_generator()
+@app.route('/')
+def dashboard():
+    """Main dashboard page"""
+    return render_template('dashboard.html')
+@app.route('/api/data')
+def get_data():
+    """API endpoint to get current dashboard data"""
+    data = {
+        'sentiment_counts': dict(sentiment_counts),
+        'recent_tweets': list(recent_tweets),
+        'hourly_data': dict(hourly_sentiment),
+        'total_tweets': sum(sentiment_counts.values())
+    }
+    logger.info(f"API request - returning data: {data}")
+    return jsonify(data)
+@socketio.on('connect')
+def handle_connect():
+    """Handle client connection"""
+    logger.info("Client connected to dashboard")
+    emit('sentiment_update', {
+        'sentiment_counts': dict(sentiment_counts),
+        'recent_tweets': list(recent_tweets),
+        'hourly_data': dict(hourly_sentiment)
+    })
+if __name__ == '__main__':
+    # Start consumer thread (either mock or real Kafka)
+    consumer_thread = threading.Thread(target=kafka_consumer_thread, daemon=True)
+    consumer_thread.start()
+    mode = "MOCK" if USE_MOCK else "KAFKA"
+    # Get port from environment (Hugging Face Spaces uses port 7860; local default is 5000)
+    port = int(os.environ.get('PORT', 5000))
+    logger.info(f"Starting sentiment dashboard in {mode} mode on port {port}")
+    if USE_MOCK:
+        logger.info("Dashboard will display mock demo data")
+    else:
+        logger.info("Dashboard will display data once Spark processes tweets from Kafka")
+    # allow_unsafe_werkzeug=True: Flask-SocketIO otherwise refuses to serve via the Werkzeug dev server outside development
+    socketio.run(app, host='0.0.0.0', port=port, debug=False, allow_unsafe_werkzeug=True)
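Both `mock_tweet_generator` and `real_kafka_consumer` repeat the same steps: increment `sentiment_counts`, append to the `recent_tweets` deque (capped at 50 by `maxlen`), increment the per-hour bucket in `hourly_sentiment`, then emit a snapshot. One possible refactor, sketched below, moves those steps into a single helper and guards the shared state with a `threading.Lock`, because the background thread mutates it while the `/api/data` route and the `connect` handler read it. The names `record_tweet`, `snapshot`, and `snapshot_locked` are hypothetical, mapping unknown labels to neutral is an assumption (the original code raises a `KeyError`, which is caught and logged), and the `socketio.emit` call is left out so the sketch runs without Flask-SocketIO.

```python
import threading
from collections import defaultdict, deque
from datetime import datetime
from typing import Optional

SENTIMENTS = ("positive", "negative", "neutral")

state_lock = threading.Lock()
sentiment_counts = {s: 0 for s in SENTIMENTS}
recent_tweets = deque(maxlen=50)  # oldest entries are dropped automatically
hourly_sentiment = defaultdict(lambda: {s: 0 for s in SENTIMENTS})


def snapshot_locked() -> dict:
    """Copy the state into plain JSON-serializable types (caller must hold state_lock)."""
    return {
        "sentiment_counts": dict(sentiment_counts),
        "recent_tweets": list(recent_tweets),
        "hourly_data": {hour: dict(counts) for hour, counts in hourly_sentiment.items()},
    }


def snapshot() -> dict:
    """Thread-safe read, e.g. for the /api/data route."""
    with state_lock:
        return snapshot_locked()


def record_tweet(text: str, sentiment: str, author_id: str = "Unknown",
                 now: Optional[datetime] = None) -> dict:
    """Apply one classified tweet to the shared dashboard state and return a snapshot."""
    now = now or datetime.now()
    if sentiment not in sentiment_counts:
        sentiment = "neutral"  # assumption: unknown labels count as neutral
    with state_lock:
        sentiment_counts[sentiment] += 1
        recent_tweets.append({
            "text": text[:100] + "..." if len(text) > 100 else text,
            "sentiment": sentiment,
            "timestamp": now.strftime("%H:%M:%S"),
            "author_id": author_id,
        })
        hourly_sentiment[now.strftime("%H:00")][sentiment] += 1
        return snapshot_locked()


# In dashboard.py the returned dict would be passed to socketio.emit('sentiment_update', ...)
fixed = datetime(2025, 8, 8, 14, 30, 0)
for i in range(60):
    payload = record_tweet(f"tweet {i}", "positive" if i % 2 else "negative", now=fixed)
payload = record_tweet("x" * 120, "sarcastic", now=fixed)

print(payload["sentiment_counts"])          # {'positive': 30, 'negative': 30, 'neutral': 1}
print(len(payload["recent_tweets"]))        # 50
print(payload["recent_tweets"][0]["text"])  # tweet 11
```

Holding one coarse lock is adequate at demo volumes (one tweet every 1 to 3 seconds); a high-throughput consumer would more likely batch updates per poll and emit once per batch.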