Spaces: xtinkarpiu/sentiment-analysis


xtinkarpiu committed on Aug 8, 2025

Commit e18a159 (verified) · 1 Parent(s): 44819aa

Upload folder using huggingface_hub


Files changed (12)

.gitignore +38 -0
Dockerfile +14 -0
README.md +77 -11
assets/dashboard_screenshot1.jpg +0 -0
assets/dashboard_screenshot2.jpg +0 -0
consumer.py +112 -0
docker-compose.yml +101 -0
local_dashboard.py +155 -0
mock_tweet_producer.py +157 -0
producer.py +149 -0
requirements.txt +8 -0
templates/dashboard.html +431 -0

.gitignore ADDED

	@@ -0,0 +1,38 @@

+# Ignore Python cache and logs
+__pycache__/
+*.pyc
+*.pyo
+*.log
+# Ignore environment files with secrets
+.env
+*.env
+# Ignore IDE/editor config
+.vscode/
+.idea/
+# Ignore OS/system files
+.DS_Store
+Thumbs.db
+# Ignore test scripts or diagnostics
+test_*.py
+diagnostic.sh
+# Ignore raw media or large unneeded assets
+*.mp4
+*.mov
+*.avi
+assets/raw_videos/
+screenshots/
+# Ignore Kafka jars or build artifacts (if any)
+*.jar
+*.class
+build/
+dist/
+# Ignore Docker stuff not needed in repo
+*.pid
+*.sock

Dockerfile ADDED

	@@ -0,0 +1,14 @@

+FROM python:3.9-slim
+WORKDIR /app
+RUN apt-get update && apt-get install -y \
+    gcc \
+    && rm -rf /var/lib/apt/lists/*
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+COPY . .
+CMD ["python", "dashboard.py"]

README.md CHANGED

@@ -1,11 +1,77 @@
----
-title: Sentiment Analysis
-emoji: 🦀
-colorFrom: red
-colorTo: indigo
-sdk: docker
-pinned: false
-short_description: Real-time dashboard for visualizing tweet sentiment
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+---
+title: Sentiment Analysis Dashboard
+emoji: 📊
+colorFrom: blue
+colorTo: green
+sdk: docker
+app_file: dashboard.py
+pinned: false
+---
+# 📊 Sentiment Analysis Dashboard
+A real-time dashboard for visualizing tweet sentiment (positive, negative, neutral) using **Kafka**, **Spark**, and **Flask**.
+It supports both live Twitter streams (via producer.py) and demo mode with mock tweets (via mock_tweet_producer.py).
+This version runs in mock/demo mode on Hugging Face Spaces.
+Author: Kristine Karp (karpkristine@gmail.com)
+---
+## 🚀 Demo Mode (Hugging Face)
+> This Space runs in **mock mode**, generating fake tweets using `mock_tweet_producer.py`.
+This allows users to explore the dashboard **without requiring Twitter API credentials or external Kafka setup**.
+---
+## 🧠 Features
+- Real-time tweet ingestion (simulated or live)
+- Sentiment counts: Positive, Neutral, Negative
+- Recent tweet stream with sentiment tags
+- Hourly sentiment trend summary
+- WebSocket-powered live updates
+---
+## 🛠️ File Overview
+| File/Folder            | Purpose                                           |
+|------------------------|---------------------------------------------------|
+| `dashboard.py`         | Main Flask app + Kafka consumer for the Hugging Face demo |
+| `local_dashboard.py`   | Flask app + Kafka consumer that can be run locally at http://localhost:5000/ |
+| `templates/dashboard.html` | HTML UI template for the dashboard          |
+| `mock_tweet_producer.py` | Generates mock tweets for demo/testing        |
+| `producer.py`          | Optional Twitter producer if you have an API token |
+| `requirements.txt`     | All Python dependencies                           |
+| `.env` (optional)      | Set up your Twitter API token if using real data |
+---
+## 📡 Using Real Twitter Streaming
+If you want to stream real tweets and analyze their sentiment:
+1. Create a Twitter/X Developer App
+2. Add your **Bearer Token** to a `.env` file, using the variable name that `producer.py` reads:
+   ```env
+   TWITTER_BEARER_TOKEN=your_token_here
+   ```
+3. Run `producer.py` instead of `mock_tweet_producer.py`
+## 🧪 Local Development
+git clone https://huggingface.co/spaces/xtinkarpiu/sentiment-analysis
+cd sentiment-analysis
+docker-compose up --build
+## 📷 Dashboard Preview
+Here's a preview of the sentiment dashboard in action:
+![Dashboard Overview](assets/dashboard_screenshot1.jpg)
+![Real-Time Tweets and Charts](assets/dashboard_screenshot2.jpg)
+*Demo hosted on Hugging Face Spaces*

assets/dashboard_screenshot1.jpg ADDED

assets/dashboard_screenshot2.jpg ADDED

consumer.py ADDED

	@@ -0,0 +1,112 @@

+from pyspark.sql import SparkSession
+from pyspark.sql.functions import col, udf, from_json, to_json, struct
+from pyspark.sql.types import StringType, StructType, StructField, LongType, DoubleType
+import time
+import logging
+# Set up logging
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+def simple_sentiment(text):
+    if text is None:
+        return 'neutral'
+    text = text.lower()
+    if any(word in text for word in ['good', 'great', 'awesome', 'happy', 'love', 'excellent', 'amazing']):
+        return 'positive'
+    elif any(word in text for word in ['bad', 'terrible', 'awful', 'sad', 'hate', 'worst', 'horrible']):
+        return 'negative'
+    return 'neutral'
+sentiment_udf = udf(simple_sentiment, StringType())
+logger.info("Waiting for services to start...")
+time.sleep(15)  # Wait for services
+try:
+    spark = SparkSession.builder \
+        .appName("KafkaSentimentConsumer") \
+        .config("spark.sql.streaming.forceDeleteTempCheckpointLocation", "true") \
+        .config("spark.sql.adaptive.enabled", "false") \
+        .config("spark.sql.adaptive.coalescePartitions.enabled", "false") \
+        .getOrCreate()
+    spark.sparkContext.setLogLevel("WARN")
+    logger.info("Spark session created successfully")
+    tweet_schema = StructType([
+        StructField("id", LongType(), True),
+        StructField("text", StringType(), True),
+        StructField("created_at", StringType(), True),
+        StructField("author_id", LongType(), True),
+        StructField("timestamp", DoubleType(), True)
+    ])
+    logger.info("Connecting to Kafka...")
+    # Read from input topic
+    df = spark.readStream \
+        .format("kafka") \
+        .option("kafka.bootstrap.servers", "kafka:9092") \
+        .option("subscribe", "sentiment-topic") \
+        .option("startingOffsets", "latest") \
+        .option("failOnDataLoss", "false") \
+        .load()
+    logger.info("Connected to Kafka, processing tweets...")
+    # Parse and process tweets
+    parsed_df = df.select(
+        col("timestamp").alias("kafka_timestamp"),
+        from_json(col("value").cast("string"), tweet_schema).alias("tweet_data")
+    ).filter(col("tweet_data").isNotNull())
+    result_df = parsed_df.select(
+        col("tweet_data.id").alias("tweet_id"),
+        col("tweet_data.text").alias("tweet_text"),
+        col("tweet_data.created_at").alias("created_at"),
+        col("tweet_data.author_id").alias("author_id"),
+        col("kafka_timestamp")
+    ).withColumn("sentiment", sentiment_udf(col("tweet_text")))
+    # Create a copy for console output
+    console_query = result_df.writeStream \
+        .outputMode("append") \
+        .format("console") \
+        .option("truncate", False) \
+        .trigger(processingTime='5 seconds') \
+        .start()
+    logger.info("Console output stream started")
+    # Send results to dashboard topic
+    dashboard_df = result_df.select(
+        to_json(struct(
+            col("tweet_id"),
+            col("tweet_text"),
+            col("sentiment"),
+            col("author_id"),
+            col("created_at")
+        )).alias("value")
+    )
+    dashboard_query = dashboard_df.writeStream \
+        .format("kafka") \
+        .option("kafka.bootstrap.servers", "kafka:9092") \
+        .option("topic", "sentiment-results") \
+        .option("checkpointLocation", "/tmp/checkpoint-dashboard") \
+        .outputMode("append") \
+        .trigger(processingTime='5 seconds') \
+        .start()
+    logger.info("Dashboard output stream started")
+    logger.info("Starting sentiment analysis consumer...")
+    logger.info("Processing tweets and sending results to dashboard...")
+    logger.info("Topics: sentiment-topic (input) -> sentiment-results (output)")
+    # Wait for both streams
+    spark.streams.awaitAnyTermination()
+except Exception as e:
+    logger.error(f"Error in consumer: {e}")
+    raise
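
Note on `simple_sentiment` in this file: it uses substring checks (`word in text`), so a keyword such as `sad` also matches inside unrelated words like `saddle`. A minimal sketch of a whole-word variant follows, assuming the same keyword lists. The name `keyword_sentiment` and the regex approach are illustrative and not part of this commit.

```python
import re

# Same keyword lists as simple_sentiment in consumer.py.
POSITIVE_WORDS = ['good', 'great', 'awesome', 'happy', 'love', 'excellent', 'amazing']
NEGATIVE_WORDS = ['bad', 'terrible', 'awful', 'sad', 'hate', 'worst', 'horrible']

# \b anchors restrict matches to whole words.
POSITIVE_RE = re.compile(r'\b(?:' + '|'.join(POSITIVE_WORDS) + r')\b')
NEGATIVE_RE = re.compile(r'\b(?:' + '|'.join(NEGATIVE_WORDS) + r')\b')


def keyword_sentiment(text):
    """Hypothetical whole-word variant of simple_sentiment."""
    if text is None:
        return 'neutral'
    text = text.lower()
    # Positive keywords are checked first, as in the original function.
    if POSITIVE_RE.search(text):
        return 'positive'
    if NEGATIVE_RE.search(text):
        return 'negative'
    return 'neutral'
```

If adopted, it could be registered the same way as the original, via `udf(keyword_sentiment, StringType())`.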

docker-compose.yml ADDED

	@@ -0,0 +1,101 @@

+services:
+  kafka:
+    image: bitnami/kafka:latest
+    container_name: kafka
+    environment:
+      - KAFKA_CFG_PROCESS_ROLES=broker,controller
+      - KAFKA_CFG_NODE_ID=1
+      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
+      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092,CONTROLLER://:9093
+      - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092
+      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=1@localhost:9093
+      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
+      - KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE=true
+    ports:
+      - "9092:9092"
+    healthcheck:
+      test: ["CMD-SHELL", "kafka-topics.sh --bootstrap-server localhost:9092 --list"]
+      interval: 30s
+      timeout: 10s
+      retries: 5
+      start_period: 60s
+    networks:
+      - kafka-network
+  sentiment-producer:
+    container_name: sentiment-producer
+    build: .
+    depends_on:
+      kafka:
+        condition: service_healthy
+    command: ["python", "mock_tweet_producer.py"]
+    restart: on-failure
+    networks:
+      - kafka-network
+  spark:
+    image: bitnami/spark:3.4
+    container_name: spark
+    environment:
+      - SPARK_MODE=master
+      - SPARK_RPC_AUTHENTICATION_ENABLED=no
+      - SPARK_RPC_ENCRYPTION_ENABLED=no
+      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
+      - SPARK_SSL_ENABLED=no
+    ports:
+      - "4040:4040"
+      - "7077:7077"
+    depends_on:
+      kafka:
+        condition: service_healthy
+    networks:
+      - kafka-network
+  spark-worker:
+    image: bitnami/spark:3.4
+    container_name: spark-worker
+    environment:
+      - SPARK_MODE=worker
+      - SPARK_MASTER_URL=spark://spark:7077
+      - SPARK_RPC_AUTHENTICATION_ENABLED=no
+      - SPARK_RPC_ENCRYPTION_ENABLED=no
+      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
+      - SPARK_SSL_ENABLED=no
+    depends_on:
+      - spark
+    networks:
+      - kafka-network
+  sentiment-consumer:
+    image: bitnami/spark:3.4
+    container_name: sentiment-consumer
+    depends_on:
+      kafka:
+        condition: service_healthy
+      spark:
+        condition: service_started
+      spark-worker:
+        condition: service_started
+    command: ["spark-submit", "--master", "spark://spark:7077", "--packages", "org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.0", "/app/consumer.py"]
+    volumes:
+      - .:/app
+    restart: on-failure
+    networks:
+      - kafka-network
+  dashboard:
+    container_name: dashboard
+    build: .
+    depends_on:
+      kafka:
+        condition: service_healthy
+    command: ["python", "dashboard.py"]
+    ports:
+      - "5000:5000"
+    restart: on-failure
+    networks:
+      - kafka-network
+networks:
+  kafka-network:
+    driver: bridge
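
The compose file gates the Python services on Kafka's healthcheck, yet the scripts in this commit also sleep for a fixed 10 to 15 seconds at startup. One alternative is to poll the broker's TCP port before connecting. The following is a minimal sketch using only the standard library. `wait_for_port` is a hypothetical helper, not part of this commit, and an open port only shows that something is listening, not that the broker is fully ready.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError if the port is closed or unreachable.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Inside the compose network a call might look like `wait_for_port('kafka', 9092)`, followed by the existing retry logic in `create_kafka_producer` or `create_kafka_consumer`.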

local_dashboard.py ADDED

	@@ -0,0 +1,155 @@

+from flask import Flask, render_template, jsonify
+from flask_socketio import SocketIO, emit
+from kafka import KafkaConsumer
+from kafka.errors import NoBrokersAvailable
+import json
+import threading
+import time
+from datetime import datetime
+from collections import defaultdict, deque
+import logging
+import os
+app = Flask(__name__)
+app.config['SECRET_KEY'] = 'sentiment-dashboard-secret'
+socketio = SocketIO(app, cors_allowed_origins="*")
+# In-memory storage for dashboard data
+sentiment_counts = {'positive': 0, 'negative': 0, 'neutral': 0}
+recent_tweets = deque(maxlen=50)  # Keep last 50 tweets
+hourly_sentiment = defaultdict(lambda: {'positive': 0, 'negative': 0, 'neutral': 0})
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+def create_kafka_consumer(max_retries=10, retry_delay=5):
+    """Create Kafka consumer with retry logic"""
+    for attempt in range(max_retries):
+        try:
+            consumer = KafkaConsumer(
+                'sentiment-results',
+                bootstrap_servers=['kafka:9092'],
+                value_deserializer=lambda m: json.loads(m.decode('utf-8')),
+                consumer_timeout_ms=1000,
+                auto_offset_reset='earliest',  # Start from the oldest retained message when no offset is committed
+                enable_auto_commit=True,
+                group_id='dashboard-group'  # Consumer group used for offset tracking
+            )
+            logger.info("Successfully connected to Kafka consumer!")
+            return consumer
+        except NoBrokersAvailable as e:
+            logger.warning(f"Kafka not ready, attempt {attempt + 1}/{max_retries}. Retrying in {retry_delay}s...")
+            time.sleep(retry_delay)
+        except Exception as e:
+            logger.error(f"Unexpected error connecting to Kafka: {e}")
+            time.sleep(retry_delay)
+    raise Exception(f"Could not connect to Kafka consumer after {max_retries} attempts")
+def kafka_consumer_thread():
+    """Background thread to consume processed tweets from Kafka"""
+    try:
+        # Wait for Kafka and Spark to be ready
+        logger.info("Waiting for Kafka and Spark services to be ready...")
+        time.sleep(10)  # Fixed startup delay before the first connection attempt
+        consumer = create_kafka_consumer()
+        logger.info("Connected to Kafka consumer for dashboard - waiting for processed tweets...")
+        logger.info("Starting to poll for messages from sentiment-results topic...")
+        message_count = 0
+        while True:
+            try:
+                # Poll for messages with timeout
+                message_batch = consumer.poll(timeout_ms=1000)
+                if message_batch:
+                    logger.info(f"Received batch with {len(message_batch)} topic partitions")
+                    for topic_partition, messages in message_batch.items():
+                        logger.info(f"Processing {len(messages)} messages from {topic_partition}")
+                        for message in messages:
+                            try:
+                                tweet_data = message.value
+                                message_count += 1
+                                logger.info(f"Message {message_count}: Received tweet data: {tweet_data}")
+                                # Update sentiment counts
+                                sentiment = tweet_data.get('sentiment', 'neutral')
+                                sentiment_counts[sentiment] += 1
+                                # Add to recent tweets
+                                recent_tweets.append({
+                                    'text': tweet_data.get('tweet_text', '')[:100] + '...' if len(tweet_data.get('tweet_text', '')) > 100 else tweet_data.get('tweet_text', ''),
+                                    'sentiment': sentiment,
+                                    'timestamp': datetime.now().strftime('%H:%M:%S'),
+                                    'author_id': tweet_data.get('author_id', 'Unknown')
+                                })
+                                # Update hourly data
+                                hour = datetime.now().strftime('%H:00')
+                                hourly_sentiment[hour][sentiment] += 1
+                                # Emit real-time update to connected clients
+                                socketio.emit('sentiment_update', {
+                                    'sentiment_counts': dict(sentiment_counts),
+                                    'recent_tweets': list(recent_tweets),
+                                    'hourly_data': dict(hourly_sentiment)
+                                })
+                                logger.info(f"Processed tweet with sentiment: {sentiment} - Total counts: {dict(sentiment_counts)}")
+                            except Exception as e:
+                                logger.error(f"Error processing individual tweet data: {e}")
+                else:
+                    # No messages received
+                    if message_count == 0:
+                        logger.info("No messages received yet, continuing to poll...")
+                    time.sleep(1)
+            except Exception as e:
+                logger.error(f"Error in polling loop: {e}")
+                time.sleep(5)
+    except Exception as e:
+        logger.error(f"Error in Kafka consumer thread: {e}")
+@app.route('/')
+def dashboard():
+    """Main dashboard page"""
+    return render_template('dashboard.html')
+@app.route('/api/data')
+def get_data():
+    """API endpoint to get current dashboard data"""
+    data = {
+        'sentiment_counts': dict(sentiment_counts),
+        'recent_tweets': list(recent_tweets),
+        'hourly_data': dict(hourly_sentiment),
+        'total_tweets': sum(sentiment_counts.values())
+    }
+    logger.info(f"API request - returning data: {data}")
+    return jsonify(data)
+@socketio.on('connect')
+def handle_connect():
+    """Handle client connection"""
+    logger.info("Client connected to dashboard")
+    emit('sentiment_update', {
+        'sentiment_counts': dict(sentiment_counts),
+        'recent_tweets': list(recent_tweets),
+        'hourly_data': dict(hourly_sentiment)
+    })
+if __name__ == '__main__':
+    # Start Kafka consumer in background thread
+    consumer_thread = threading.Thread(target=kafka_consumer_thread, daemon=True)
+    consumer_thread.start()
+    logger.info("Starting sentiment dashboard on port 5000")
+    logger.info("Dashboard will display data once Spark processes tweets from Kafka")
+    # Fix for Werkzeug warning - use allow_unsafe_werkzeug for development
+    socketio.run(app, host='0.0.0.0', port=5000, debug=False, allow_unsafe_werkzeug=True)
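
The update logic inside `kafka_consumer_thread` can be read as a small state update per message. The sketch below restates it as a standalone function so it can be exercised without Kafka or Flask. `record_tweet`, the `now` parameter, and the choice to fold unexpected labels into `neutral` are assumptions for illustration; the code in the diff would instead raise `KeyError` on an unknown label and log it.

```python
from collections import defaultdict, deque
from datetime import datetime

LABELS = ('positive', 'negative', 'neutral')

sentiment_counts = {label: 0 for label in LABELS}
recent_tweets = deque(maxlen=50)  # same cap as local_dashboard.py
hourly_sentiment = defaultdict(lambda: {label: 0 for label in LABELS})


def record_tweet(tweet_data, now=None):
    """Fold one processed tweet into the in-memory dashboard state."""
    now = now or datetime.now()
    sentiment = tweet_data.get('sentiment', 'neutral')
    if sentiment not in LABELS:
        sentiment = 'neutral'  # assumption: unknown labels count as neutral
    text = tweet_data.get('tweet_text') or ''
    if len(text) > 100:
        text = text[:100] + '...'  # truncate long tweets for display
    sentiment_counts[sentiment] += 1
    recent_tweets.append({
        'text': text,
        'sentiment': sentiment,
        'timestamp': now.strftime('%H:%M:%S'),
        'author_id': tweet_data.get('author_id', 'Unknown'),
    })
    hourly_sentiment[now.strftime('%H:00')][sentiment] += 1
    return sentiment
```

Keeping the state update separate from the polling loop makes the truncation and bucketing rules easy to check in isolation.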

mock_tweet_producer.py ADDED

	@@ -0,0 +1,157 @@

+import json
+import time
+import random
+from kafka import KafkaProducer
+from kafka.errors import NoBrokersAvailable
+import logging
+from datetime import datetime
+# Set up logging
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+# Kafka settings
+KAFKA_TOPIC = "sentiment-topic"
+KAFKA_BOOTSTRAP_SERVERS = ['kafka:9092']
+# Sample tweets with different sentiments
+SAMPLE_TWEETS = [
+    # Positive tweets
+    "I absolutely love this new Python framework! It's amazing how easy it is to use 🐍✨",
+    "Just finished my first machine learning project and I'm so excited about the results! 🚀",
+    "Beautiful sunny day! Perfect for coding outside with a cup of coffee ☕️💻",
+    "Finally understood how Kafka works! This is such an awesome technology 🎉",
+    "Great job team! Our deployment went smoothly and everyone is happy 👏",
+    "Python makes data analysis so much fun! Love working with pandas and numpy 📊",
+    "Incredible performance boost after optimizing our database queries! 🔥",
+    "Happy Friday everyone! Time to celebrate another successful sprint 🎊",
+    "Just discovered this amazing open source library. The community is fantastic! 💖",
+    "Feeling grateful for all the learning opportunities in tech. Best career choice ever! 🙏",
+    # Negative tweets
+    "Ugh, spent 3 hours debugging this stupid error. So frustrated right now 😤",
+    "This API documentation is terrible. Nothing works as described 😡",
+    "Why is deployment always so painful? Something always breaks in production 💔",
+    "Hate it when the server crashes right before the demo. Murphy's law strikes again 😭",
+    "This legacy code is a nightmare. Who wrote this mess? 🤬",
+    "Another day, another merge conflict. Git is driving me crazy today 😵",
+    "The client changed requirements again. This project is becoming impossible 😞",
+    "Performance is awful after the latest update. Users are complaining non-stop 📉",
+    "Terrible meeting. Two hours of my life I'll never get back 😴",
+    "Bug fixes breaking more things. This codebase is cursed 👻",
+    # Neutral tweets
+    "Working on a new feature for our application. Should be ready next week.",
+    "Attending a tech conference tomorrow. Looking forward to the presentations.",
+    "Updated the dependencies in our project. Everything seems to be working fine.",
+    "Reading about microservices architecture. Interesting design patterns.",
+    "Team meeting scheduled for 2 PM. We'll discuss the quarterly roadmap.",
+    "Deploying version 2.3.1 to staging environment for testing.",
+    "Database migration completed successfully. All tables are updated.",
+    "Code review session with the team. Found a few minor issues to fix.",
+    "Working with the new intern on their first task. They're learning quickly.",
+    "Backup process completed. All data is safely stored in the cloud.",
+    # Python-specific tweets
+    "Python 3.12 has some interesting new features. Time to upgrade our projects.",
+    "Django vs Flask debate continues. Both have their strengths and use cases.",
+    "Love how clean and readable Python code can be. Truly a beautiful language.",
+    "Pandas is incredibly powerful for data manipulation. Such a time saver!",
+    "FastAPI is becoming my go-to choice for building REST APIs. So fast!",
+    "NumPy arrays are so much faster than regular Python lists for calculations.",
+    "Jupyter notebooks are perfect for data exploration and prototyping.",
+    "PEP 8 style guide helps keep Python code consistent across the team.",
+    "Virtual environments in Python save so much dependency headache.",
+    "Type hints in Python make the code much more maintainable and clear."
+]
+def create_kafka_producer(max_retries=10, retry_delay=5):
+    """Create Kafka producer with retry logic"""
+    for attempt in range(max_retries):
+        try:
+            producer = KafkaProducer(
+                bootstrap_servers=KAFKA_BOOTSTRAP_SERVERS,
+                value_serializer=lambda v: json.dumps(v).encode('utf-8'),
+                key_serializer=lambda k: k.encode('utf-8') if k else None
+            )
+            logger.info("Successfully connected to Kafka!")
+            return producer
+        except NoBrokersAvailable as e:
+            logger.warning(f"Kafka not ready, attempt {attempt + 1}/{max_retries}. Retrying in {retry_delay}s...")
+            time.sleep(retry_delay)
+        except Exception as e:
+            logger.error(f"Unexpected error connecting to Kafka: {e}")
+            time.sleep(retry_delay)
+    raise Exception(f"Could not connect to Kafka after {max_retries} attempts")
+def generate_mock_tweet():
+    """Generate a mock tweet with realistic data"""
+    tweet_text = random.choice(SAMPLE_TWEETS)
+    tweet_data = {
+        'id': random.randint(100000000000000000, 999999999999999999),  # Twitter-like ID
+        'text': tweet_text,
+        'created_at': datetime.now().isoformat(),
+        'author_id': random.randint(100000000, 999999999),  # Random author ID
+        'timestamp': time.time()
+    }
+    return tweet_data
+def main():
+    """Main function to start mock tweet streaming"""
+    logger.info("Starting Mock Tweet Kafka Producer...")
+    # Wait for services to be ready
+    logger.info("Waiting for Kafka to be ready...")
+    time.sleep(10)
+    try:
+        # Create Kafka producer
+        producer = create_kafka_producer()
+        logger.info("Starting mock tweet stream...")
+        logger.info("Generating tweets with various sentiments...")
+        tweet_count = 0
+        while True:
+            try:
+                # Generate a mock tweet
+                tweet_data = generate_mock_tweet()
+                # Send to Kafka
+                producer.send(
+                    KAFKA_TOPIC,
+                    value=tweet_data,
+                    key=str(tweet_data['id'])
+                )
+                tweet_count += 1
+                # Log tweet info
+                tweet_preview = tweet_data['text'][:50] + "..." if len(tweet_data['text']) > 50 else tweet_data['text']
+                logger.info(f"Tweet {tweet_count}: {tweet_preview}")
+                # Random delay between tweets (1-5 seconds)
+                delay = random.uniform(1, 5)
+                time.sleep(delay)
+            except KeyboardInterrupt:
+                logger.info("Stopping tweet generation...")
+                break
+            except Exception as e:
+                logger.error(f"Error generating tweet: {e}")
+                time.sleep(1)
+    except Exception as e:
+        logger.error(f"Error in main: {e}")
+        raise
+    finally:
+        if 'producer' in locals():
+            producer.close()
+            logger.info("Kafka producer closed")
+if __name__ == "__main__":
+    main()
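
`generate_mock_tweet` has to emit JSON that `consumer.py` can parse with its `tweet_schema` (`id` and `author_id` as `LongType`, `timestamp` as `DoubleType`). The following is a minimal sketch of that contract check using only the standard library. `EXPECTED_FIELDS`, `make_payload`, and `matches_schema` are hypothetical names, and the payload mirrors the one built in the diff with a placeholder text.

```python
import json
import random
import time
from datetime import datetime

# Field names and Python types corresponding to tweet_schema in consumer.py.
EXPECTED_FIELDS = {
    'id': int,
    'text': str,
    'created_at': str,
    'author_id': int,
    'timestamp': float,
}
INT64_MAX = 2**63 - 1  # upper bound of Spark's LongType


def make_payload(text):
    """Build a payload shaped like generate_mock_tweet's output."""
    return {
        'id': random.randint(100000000000000000, 999999999999999999),
        'text': text,
        'created_at': datetime.now().isoformat(),
        'author_id': random.randint(100000000, 999999999),
        'timestamp': time.time(),
    }


def matches_schema(payload):
    """Check field names, types, and the 64-bit range for integer fields."""
    if set(payload) != set(EXPECTED_FIELDS):
        return False
    for name, expected_type in EXPECTED_FIELDS.items():
        value = payload[name]
        if not isinstance(value, expected_type):
            return False
        if expected_type is int and not 0 <= value <= INT64_MAX:
            return False
    return True


# Round-trip through JSON bytes, as the producer's value_serializer would.
decoded = json.loads(json.dumps(make_payload('sample text')).encode('utf-8'))
```

The 18-digit upper bound used for mock IDs stays below `INT64_MAX`, so the values fit in a `LongType` column.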

producer.py ADDED

	@@ -0,0 +1,149 @@

+import tweepy
+import json
+import time
+from kafka import KafkaProducer
+from kafka.errors import NoBrokersAvailable
+import logging
+import os
+from dotenv import load_dotenv
+import urllib.parse
+# Set up logging
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+load_dotenv()
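+encoded_token = os.getenv("TWITTER_BEARER_TOKEN")
+if not encoded_token:
+    # unquote() raises TypeError on None, so fail early with a clear message
+    raise RuntimeError("TWITTER_BEARER_TOKEN is not set; add it to your .env file")
+BEARER_TOKEN = urllib.parse.unquote(encoded_token)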
+# Kafka settings
+KAFKA_TOPIC = "sentiment-topic"
+KAFKA_BOOTSTRAP_SERVERS = ['kafka:9092']
+def create_kafka_producer(max_retries=10, retry_delay=5):
+    """Create Kafka producer with retry logic"""
+    for attempt in range(max_retries):
+        try:
+            producer = KafkaProducer(
+                bootstrap_servers=KAFKA_BOOTSTRAP_SERVERS,
+                value_serializer=lambda v: json.dumps(v).encode('utf-8'),
+                key_serializer=lambda k: k.encode('utf-8') if k else None
+            )
+            logger.info("Successfully connected to Kafka!")
+            return producer
+        except NoBrokersAvailable as e:
+            logger.warning(f"Kafka not ready, attempt {attempt + 1}/{max_retries}. Retrying in {retry_delay}s...")
+            time.sleep(retry_delay)
+        except Exception as e:
+            logger.error(f"Unexpected error connecting to Kafka: {e}")
+            time.sleep(retry_delay)
+    raise Exception(f"Could not connect to Kafka after {max_retries} attempts")
+class KafkaTweetStreamer(tweepy.StreamingClient):
+    """Streaming client for X API v2"""
+    def __init__(self, bearer_token, kafka_producer, topic):
+        super().__init__(bearer_token, wait_on_rate_limit=True)
+        self.kafka_producer = kafka_producer
+        self.topic = topic
+        self.tweet_count = 0
+    def on_tweet(self, tweet):
+        """Handle incoming tweets"""
+        try:
+            # Extract tweet data
+            tweet_data = {
+                'id': tweet.id,
+                'text': tweet.text,
+                'created_at': tweet.created_at.isoformat() if tweet.created_at else time.strftime('%Y-%m-%d %H:%M:%S'),
+                'author_id': tweet.author_id if tweet.author_id else 0,
+                'timestamp': time.time()
+            }
+            # Send to Kafka
+            self.kafka_producer.send(
+                self.topic,
+                value=tweet_data,
+                key=str(tweet.id)
+            )
+            self.tweet_count += 1
+            # Log every tweet for debugging
+            tweet_preview = tweet_data['text'][:50] + "..." if len(tweet_data['text']) > 50 else tweet_data['text']
+            logger.info(f"Tweet {self.tweet_count}: {tweet_preview}")
+            return True
+        except Exception as e:
+            logger.error(f"Error processing tweet: {e}")
+            return True  # Continue streaming
+    def on_errors(self, errors):
+        """Handle streaming errors"""
+        logger.error(f"Streaming error: {errors}")
+    def on_connection_error(self):
+        """Handle connection errors"""
+        logger.error("Connection error occurred")
+def main():
+    """Main function to start tweet streaming"""
+    logger.info("Starting X (Twitter) Kafka Producer...")
+    # Wait for services to be ready
+    logger.info("Waiting for services to be ready...")
+    time.sleep(10)
+    try:
+        # Create Kafka producer
+        producer = create_kafka_producer()
+        # Create the streaming client
+        streamer = KafkaTweetStreamer(BEARER_TOKEN, producer, KAFKA_TOPIC)
+        # Clean up any existing rules first
+        try:
+            rules = streamer.get_rules()
+            if rules.data:
+                rule_ids = [rule.id for rule in rules.data]
+                streamer.delete_rules(rule_ids)
+                logger.info(f"Deleted {len(rule_ids)} existing rules")
+        except Exception as e:
+            logger.warning(f"Could not clean up existing rules: {e}")
+        # Add simple, broad rules that should get tweets
+        new_rules = [
+            tweepy.StreamRule("python", tag="python"),
+            tweepy.StreamRule("happy OR excited", tag="positive"),
+            tweepy.StreamRule("sad OR angry", tag="negative"),
+        ]
+        streamer.add_rules(new_rules)
+        logger.info("Added streaming rules")
+        # Start streaming with basic fields
+        logger.info("Starting tweet stream...")
+        logger.info("Listening for tweets containing: python, happy, excited, sad, angry")
+        # Start the stream
+        streamer.filter(tweet_fields=['created_at', 'author_id'])
+    except tweepy.Forbidden as e:
+        logger.error(f"Forbidden error (403): {e}")
+        logger.error("This might be a project/app attachment issue")
+    except tweepy.Unauthorized as e:
+        logger.error(f"Unauthorized error (401): {e}")
+        logger.error("Check your Bearer Token")
+    except Exception as e:
+        logger.error(f"Error in main: {e}")
+        raise
+    finally:
+        if 'producer' in locals():
+            producer.close()
+            logger.info("Kafka producer closed")
+if __name__ == "__main__":
+    main()
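
`producer.py` reads a percent-encoded token from `TWITTER_BEARER_TOKEN` and decodes it with `urllib.parse.unquote`. A minimal sketch of that step as a testable helper, which fails with a clear message when the variable is unset, is shown below. `load_bearer_token` is a hypothetical name, and the `env` parameter exists only so the function can be exercised without touching the real environment.

```python
import os
import urllib.parse


def load_bearer_token(env=None, name="TWITTER_BEARER_TOKEN"):
    """Return the URL-decoded bearer token, or raise if it is unset or empty."""
    env = os.environ if env is None else env
    encoded = env.get(name)
    if not encoded:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return urllib.parse.unquote(encoded)
```

A token without percent-escapes passes through `unquote` unchanged, so the helper works for both encoded and plain values.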

requirements.txt ADDED

	@@ -0,0 +1,8 @@

+tweepy>=4.14.0
+kafka-python>=2.0.2
+pyspark>=3.4.0
+requests>=2.28.0
+python-dotenv>=0.19.0
+flask>=2.3.0
+flask-socketio>=5.3.0

templates/dashboard.html ADDED

	@@ -0,0 +1,431 @@

+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>Real-Time Sentiment Analysis Dashboard</title>
+    <script src="https://cdn.socket.io/4.7.2/socket.io.min.js"></script>
+    <script src="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/3.9.1/chart.min.js"></script>
+    <style>
+        * {
+            margin: 0;
+            padding: 0;
+            box-sizing: border-box;
+        }
+        body {
+            font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
+            background: linear-gradient(135deg, #1e3c72 0%, #2a5298 100%);
+            color: white;
+            min-height: 100vh;
+        }
+        .container {
+            max-width: 1400px;
+            margin: 0 auto;
+            padding: 20px;
+        }
+        .header {
+            text-align: center;
+            margin-bottom: 30px;
+        }
+        .header h1 {
+            font-size: 2.5rem;
+            margin-bottom: 10px;
+            text-shadow: 2px 2px 4px rgba(0,0,0,0.3);
+        }
+        .header p {
+            font-size: 1.1rem;
+            opacity: 0.9;
+        }
+        .stats-grid {
+            display: grid;
+            grid-template-columns: repeat(auto-fit, minmax(250px, 1fr));
+            gap: 20px;
+            margin-bottom: 30px;
+        }
+        .stat-card {
+            background: rgba(255, 255, 255, 0.1);
+            backdrop-filter: blur(10px);
+            border: 1px solid rgba(255, 255, 255, 0.2);
+            border-radius: 15px;
+            padding: 25px;
+            text-align: center;
+            transition: transform 0.3s ease;
+        }
+        .stat-card:hover {
+            transform: translateY(-5px);
+        }
+        .stat-number {
+            font-size: 2.5rem;
+            font-weight: bold;
+            margin-bottom: 10px;
+        }
+        .stat-label {
+            font-size: 1rem;
+            opacity: 0.8;
+            text-transform: uppercase;
+            letter-spacing: 1px;
+        }
+        .positive { color: #4ade80; }
+        .negative { color: #f87171; }
+        .neutral { color: #60a5fa; }
+        .total { color: #fbbf24; }
+        .charts-section {
+            display: grid;
+            grid-template-columns: 1fr 1fr;
+            gap: 30px;
+            margin-bottom: 30px;
+        }
+        .chart-container {
+            background: rgba(255, 255, 255, 0.1);
+            backdrop-filter: blur(10px);
+            border: 1px solid rgba(255, 255, 255, 0.2);
+            border-radius: 15px;
+            padding: 25px;
+        }
+        .chart-title {
+            font-size: 1.3rem;
+            margin-bottom: 20px;
+            text-align: center;
+        }
+        .tweets-section {
+            background: rgba(255, 255, 255, 0.1);
+            backdrop-filter: blur(10px);
+            border: 1px solid rgba(255, 255, 255, 0.2);
+            border-radius: 15px;
+            padding: 25px;
+        }
+        .section-title {
+            font-size: 1.5rem;
+            margin-bottom: 20px;
+            text-align: center;
+        }
+        .tweets-container {
+            max-height: 400px;
+            overflow-y: auto;
+        }
+        .tweet-item {
+            background: rgba(255, 255, 255, 0.05);
+            border-radius: 10px;
+            padding: 15px;
+            margin-bottom: 10px;
+            border-left: 4px solid;
+            transition: all 0.3s ease;
+        }
+        .tweet-item:hover {
+            background: rgba(255, 255, 255, 0.1);
+        }
+        .tweet-item.positive { border-left-color: #4ade80; }
+        .tweet-item.negative { border-left-color: #f87171; }
+        .tweet-item.neutral { border-left-color: #60a5fa; }
+        .tweet-header {
+            display: flex;
+            justify-content: space-between;
+            align-items: center;
+            margin-bottom: 8px;
+        }
+        .tweet-sentiment {
+            font-size: 0.8rem;
+            padding: 4px 8px;
+            border-radius: 12px;
+            font-weight: bold;
+            text-transform: uppercase;
+        }
+        .tweet-sentiment.positive { background: #4ade80; color: #000; }
+        .tweet-sentiment.negative { background: #f87171; color: #000; }
+        .tweet-sentiment.neutral { background: #60a5fa; color: #000; }
+        .tweet-time {
+            font-size: 0.8rem;
+            opacity: 0.7;
+        }
+        .tweet-text {
+            font-size: 0.9rem;
+            line-height: 1.4;
+        }
+        .status-indicator {
+            position: fixed;
+            top: 20px;
+            right: 20px;
+            padding: 10px 15px;
+            border-radius: 20px;
+            font-size: 0.8rem;
+            font-weight: bold;
+        }
+        .status-connected {
+            background: #4ade80;
+            color: #000;
+        }
+        .status-disconnected {
+            background: #f87171;
+            color: #000;
+        }
+        @media (max-width: 768px) {
+            .charts-section {
+                grid-template-columns: 1fr;
+            }
+            .stats-grid {
+                grid-template-columns: repeat(2, 1fr);
+            }
+            .header h1 {
+                font-size: 2rem;
+            }
+        }
+        /* Custom scrollbar */
+        .tweets-container::-webkit-scrollbar {
+            width: 8px;
+        }
+        .tweets-container::-webkit-scrollbar-track {
+            background: rgba(255, 255, 255, 0.1);
+            border-radius: 10px;
+        }
+        .tweets-container::-webkit-scrollbar-thumb {
+            background: rgba(255, 255, 255, 0.3);
+            border-radius: 10px;
+        }
+        .tweets-container::-webkit-scrollbar-thumb:hover {
+            background: rgba(255, 255, 255, 0.5);
+        }
+    </style>
+</head>
+<body>
+    <div class="status-indicator" id="status">Connecting...</div>
+    <div class="container">
+        <div class="header">
+            <h1>🚀 Real-Time Sentiment Analysis</h1>
+            <p>Live Tweet Processing with Apache Kafka &amp; Apache Spark</p>
+        </div>
+        <div class="stats-grid">
+            <div class="stat-card">
+                <div class="stat-number positive" id="positive-count">0</div>
+                <div class="stat-label">Positive Tweets</div>
+            </div>
+            <div class="stat-card">
+                <div class="stat-number negative" id="negative-count">0</div>
+                <div class="stat-label">Negative Tweets</div>
+            </div>
+            <div class="stat-card">
+                <div class="stat-number neutral" id="neutral-count">0</div>
+                <div class="stat-label">Neutral Tweets</div>
+            </div>
+            <div class="stat-card">
+                <div class="stat-number total" id="total-count">0</div>
+                <div class="stat-label">Total Processed</div>
+            </div>
+        </div>
+        <div class="charts-section">
+            <div class="chart-container">
+                <h3 class="chart-title">Sentiment Distribution</h3>
+                <canvas id="sentiment-pie-chart"></canvas>
+            </div>
+            <div class="chart-container">
+                <h3 class="chart-title">Hourly Sentiment Trend</h3>
+                <canvas id="hourly-chart"></canvas>
+            </div>
+        </div>
+        <div class="tweets-section">
+            <h3 class="section-title">📱 Recent Tweets</h3>
+            <div class="tweets-container" id="tweets-container">
+                <div style="text-align: center; opacity: 0.7; padding: 20px;">
+                    Waiting for tweets...
+                </div>
+            </div>
+        </div>
+    </div>
+    <script>
+        // Initialize Socket.IO
+        const socket = io();
+        // Status indicator
+        const statusElement = document.getElementById('status');
+        // Charts
+        let pieChart, hourlyChart;
+        // Initialize charts
+        function initCharts() {
+            // Pie Chart
+            const pieCtx = document.getElementById('sentiment-pie-chart').getContext('2d');
+            pieChart = new Chart(pieCtx, {
+                type: 'doughnut',
+                data: {
+                    labels: ['Positive', 'Negative', 'Neutral'],
+                    datasets: [{
+                        data: [0, 0, 0],
+                        backgroundColor: ['#4ade80', '#f87171', '#60a5fa'],
+                        borderWidth: 0
+                    }]
+                },
+                options: {
+                    responsive: true,
+                    plugins: {
+                        legend: {
+                            position: 'bottom',
+                            labels: { color: '#fff' }
+                        }
+                    }
+                }
+            });
+            // Hourly Chart
+            const hourlyCtx = document.getElementById('hourly-chart').getContext('2d');
+            hourlyChart = new Chart(hourlyCtx, {
+                type: 'line',
+                data: {
+                    labels: [],
+                    datasets: [
+                        {
+                            label: 'Positive',
+                            data: [],
+                            borderColor: '#4ade80',
+                            backgroundColor: 'rgba(74, 222, 128, 0.1)',
+                            tension: 0.4
+                        },
+                        {
+                            label: 'Negative',
+                            data: [],
+                            borderColor: '#f87171',
+                            backgroundColor: 'rgba(248, 113, 113, 0.1)',
+                            tension: 0.4
+                        },
+                        {
+                            label: 'Neutral',
+                            data: [],
+                            borderColor: '#60a5fa',
+                            backgroundColor: 'rgba(96, 165, 250, 0.1)',
+                            tension: 0.4
+                        }
+                    ]
+                },
+                options: {
+                    responsive: true,
+                    plugins: {
+                        legend: {
+                            labels: { color: '#fff' }
+                        }
+                    },
+                    scales: {
+                        y: {
+                            ticks: { color: '#fff' },
+                            grid: { color: 'rgba(255, 255, 255, 0.1)' }
+                        },
+                        x: {
+                            ticks: { color: '#fff' },
+                            grid: { color: 'rgba(255, 255, 255, 0.1)' }
+                        }
+                    }
+                }
+            });
+        }
+        // Update dashboard with new data
+        function updateDashboard(data) {
+            // Update counters (tolerate a payload that lacks sentiment_counts)
+            const counts = data.sentiment_counts || {};
+            const positive = counts.positive || 0;
+            const negative = counts.negative || 0;
+            const neutral = counts.neutral || 0;
+            document.getElementById('positive-count').textContent = positive;
+            document.getElementById('negative-count').textContent = negative;
+            document.getElementById('neutral-count').textContent = neutral;
+            document.getElementById('total-count').textContent = positive + negative + neutral;
+            // Update pie chart
+            pieChart.data.datasets[0].data = [positive, negative, neutral];
+            pieChart.update();
+            // Update hourly chart
+            if (data.hourly_data) {
+                const hours = Object.keys(data.hourly_data).sort();
+                hourlyChart.data.labels = hours;
+                hourlyChart.data.datasets[0].data = hours.map(h => data.hourly_data[h].positive || 0);
+                hourlyChart.data.datasets[1].data = hours.map(h => data.hourly_data[h].negative || 0);
+                hourlyChart.data.datasets[2].data = hours.map(h => data.hourly_data[h].neutral || 0);
+                hourlyChart.update();
+            }
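+            // Update recent tweets; escape every field so tweet content cannot inject markup
+            if (data.recent_tweets && data.recent_tweets.length > 0) {
+                const escapeHtml = value => String(value).replace(/[&<>"']/g, ch => (
+                    { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' }[ch]
+                ));
+                const container = document.getElementById('tweets-container');
+                container.innerHTML = data.recent_tweets.map(tweet => {
+                    const sentiment = escapeHtml(tweet.sentiment);
+                    return `
+                    <div class="tweet-item ${sentiment}">
+                        <div class="tweet-header">
+                            <span class="tweet-sentiment ${sentiment}">${sentiment}</span>
+                            <span class="tweet-time">${escapeHtml(tweet.timestamp)}</span>
+                        </div>
+                        <div class="tweet-text">${escapeHtml(tweet.text)}</div>
+                    </div>
+                `;
+                }).join('');
+            }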
+        }
+        // Socket event handlers
+        socket.on('connect', function() {
+            statusElement.textContent = '🟢 Connected';
+            statusElement.className = 'status-indicator status-connected';
+        });
+        socket.on('disconnect', function() {
+            statusElement.textContent = '🔴 Disconnected';
+            statusElement.className = 'status-indicator status-disconnected';
+        });
+        socket.on('sentiment_update', function(data) {
+            updateDashboard(data);
+        });
+        // Initialize everything when page loads
+        document.addEventListener('DOMContentLoaded', function() {
+            initCharts();
+            // Fetch initial data
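+            fetch('/api/data')
+                .then(response => {
+                    // Surface HTTP errors instead of trying to parse an error page as JSON
+                    if (!response.ok) throw new Error(`HTTP ${response.status}`);
+                    return response.json();
+                })
+                .then(data => updateDashboard(data))
+                .catch(error => console.error('Error fetching initial data:', error));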
+        });
+    </script>
+</body>
+</html>
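
The template's `updateDashboard` function reads three keys from each payload: `sentiment_counts`, `hourly_data` (keyed by an hour label that the client sorts lexicographically), and `recent_tweets` (objects with `sentiment`, `timestamp`, and `text`). The server code that produces this payload is not part of this file, so the following is only a minimal Python sketch of one way to build it. The function name `build_payload`, the `max_recent` parameter, the `HH:00` hour label, and the ISO-8601 timestamp format are all assumptions for illustration, not the repository's actual implementation.

```python
from collections import Counter, defaultdict

SENTIMENTS = ("positive", "negative", "neutral")


def build_payload(tweets, max_recent=20):
    """Aggregate classified tweets into the dict shape dashboard.html reads.

    Hypothetical helper: each tweet is assumed to be a dict with "text",
    "sentiment", and an ISO-8601 "timestamp" such as "2025-08-08T09:15:00",
    and the input list is assumed to be in chronological order.
    """
    counts = Counter()
    hourly = defaultdict(lambda: dict.fromkeys(SENTIMENTS, 0))
    valid = []
    for tweet in tweets:
        sentiment = tweet["sentiment"]
        if sentiment not in SENTIMENTS:
            continue  # the template only styles these three labels
        counts[sentiment] += 1
        # Characters 11-12 of an ISO-8601 timestamp hold the hour ("09").
        hour_label = tweet["timestamp"][11:13] + ":00"
        hourly[hour_label][sentiment] += 1
        valid.append(tweet)
    return {
        "sentiment_counts": {s: counts[s] for s in SENTIMENTS},
        "hourly_data": dict(hourly),
        # Newest first, capped so the tweet list stays short.
        "recent_tweets": valid[-max_recent:][::-1],
    }
```

A dict of this shape could be returned as JSON from `/api/data` or sent with the `sentiment_update` Socket.IO event, the two sources the template consumes. Zero-padded hour labels keep the client's plain `.sort()` in chronological order within a single day; data spanning several days would need a label that includes the date.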