
	---
	title: PulseAI
	emoji: 🦀
	colorFrom: green
	colorTo: indigo
	sdk: docker
	pinned: false
	---


	# 🚀 Social Intelligence Platform

	AI-powered brand monitoring, sentiment analysis, and competitive intelligence

	A production-grade NLP platform that helps product teams discover customer insights, detect brand crises, and track competitive signals, all in real time.

	---

	## 🎯 Problem Solved

	Before: Product teams were drowning in thousands of reviews and social posts, manually trying to identify recurring themes, sentiment trends, and competitive threats. By the time they spotted a brand crisis, it had already gone viral.

	After: An automated NLP pipeline processes customer conversations in real time and surfaces actionable insights:
	- Sentiment Analysis — BERT-powered classification with aspect-level granularity
	- Topic Discovery — NMF clustering finds recurring themes automatically
	- Crisis Detection — Multi-signal scoring catches PR disasters before they escalate
	- Trend Forecasting — Statistical forecasting predicts sentiment trajectory
	- Competitor Intelligence — Tracks competitor mentions and switch signals

	---

	## ✨ Key Features

	### 🧠 NLP Pipeline
	- BERT Sentiment Analysis (`cardiffnlp/twitter-roberta-base-sentiment-latest`)
	  - Document-level sentiment (positive/negative/neutral)
	  - Aspect-based sentiment extraction (Performance, Pricing, Support, UI, etc.)
	  - Confidence scoring with fallback to VADER/keyword analysis

	- Topic Modeling (NMF + TF-IDF)
	  - Automated topic discovery from short-text corpus
	  - Named clusters with keyword extraction
	  - Sentiment distribution per topic

	- Trend Analysis & Forecasting
	  - Rolling statistical analysis with anomaly detection
	  - Exponential smoothing for 14-day sentiment forecast
	  - Volume trend analysis and spike detection

	- Crisis Detection Engine
	  - Multi-signal crisis scoring (legal, data breach, outrage, viral threats)
	  - Severity classification (low/medium/high/critical)
	  - Engagement amplification (viral posts get higher weight)

	- Competitor Intelligence
	  - Competitor mention extraction and sentiment comparison
	  - Switch signal detection (users leaving competitors)
	  - Opportunity gap identification
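
To make the crisis engine concrete, here is a minimal sketch of multi-signal scoring with engagement amplification. The signal categories come from the list above; the keyword lists, weights, and the logarithmic amplification are illustrative assumptions, not the values used in `crisis_detector.py`.

```python
import math
import re

# Hypothetical keywords and weights per signal category (assumed for illustration).
SIGNALS = {
    "legal": (("lawsuit", "sue", "attorney"), 4.0),
    "data_breach": (("breach", "leaked", "hacked"), 5.0),
    "outrage": (("scam", "boycott", "unacceptable"), 2.0),
}


def crisis_score(text: str, engagement: int = 0) -> float:
    """Sum the weights of every matched signal category, then amplify by engagement."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    base = sum(weight for keywords, weight in SIGNALS.values()
               if words & set(keywords))
    # Assumed log scaling: viral posts count more without dominating the score.
    amplification = 1.0 + math.log10(1 + max(engagement, 0))
    return round(base * amplification, 2)
```

Each category contributes at most once per post, so a long rant repeating one keyword does not outscore a short post that trips several signals.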

	### 🎨 Dashboard Features
	- Real-time KPIs — Sentiment score, NPS estimate, volume trends, crisis alerts
	- Interactive Visualizations — Time series, donut charts, topic bubbles, competitor comparison
	- Topic Explorer — Click-to-explore topic clusters with keyword clouds
	- Crisis Radar — Prioritized list of high-severity posts requiring action
	- Live Analyzer — Real-time sentiment + aspect + crisis analysis for any text
	- Post Feed — Filterable feed with sentiment labels and source badges

	---

	## 🏗️ Architecture

	```
	┌─────────────────────────────────────────────────────────────┐
	│                    Frontend (Vanilla JS)                    │
	│  • Dark SaaS UI with Syne/Instrument Sans typography        │
	│  • Chart.js for time series, D3.js for topic bubbles        │
	│  • Real-time API polling, demo fallback when offline        │
	└──────────────────────┬──────────────────────────────────────┘
	                       │ REST API
	┌──────────────────────▼──────────────────────────────────────┐
	│                  FastAPI Backend (Python)                   │
	│  • /api/dashboard    Full analytics payload                 │
	│  • /api/analyze      Single text sentiment + crisis scoring │
	│  • /api/topics       Topic clusters with examples           │
	│  • /api/trends       Time series + forecast                 │
	│  • /api/competitors  Competitive intelligence               │
	└──────────────────────┬──────────────────────────────────────┘
	                       │
	        ┌──────────────┴──────────────┐
	        ▼                             ▼
	┌───────────────────────┐  ┌───────────────────────┐
	│ NLP Pipeline          │  │ Sample Data Gen       │
	│ • sentiment.py        │  │ • 500 synthetic       │
	│ • topic_model.py      │  │   reviews/tweets      │
	│ • trend_analysis.py   │  │ • Realistic crisis    │
	│ • crisis_detector.py  │  │   scenarios           │
	│ • competitor_intel.py │  │ • Time series data    │
	└───────────────────────┘  └───────────────────────┘
	```

	---

	## 📦 Tech Stack

	Backend:
	- FastAPI — Modern async Python web framework
	- Transformers (Hugging Face) — BERT sentiment model
	- scikit-learn — NMF topic modeling, TF-IDF vectorization
	- NumPy/Pandas — Statistical analysis and data manipulation
	- NLTK — Fallback sentiment analysis (VADER)

	Frontend:
	- Vanilla JavaScript (no framework dependencies)
	- Chart.js — Time series and bar/donut charts
	- D3.js — Topic bubble visualization
	- Custom CSS — Dark enterprise SaaS design system
	- Fonts: Syne (display), Instrument Sans (body), DM Mono (code)

	Models:
	- Primary: `cardiffnlp/twitter-roberta-base-sentiment-latest` (RoBERTa-base pretrained on ~124M tweets, then fine-tuned for sentiment on the TweetEval benchmark)
	- Fallback: VADER lexicon-based sentiment (works offline)
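
The primary/fallback arrangement can be sketched as a small wrapper. This is an assumption-laden illustration: `classify_with_fallback`, the `min_confidence` threshold, and the idea that low confidence (not only a missing model) triggers the fallback are hypothetical, and `keyword_sentiment` is a toy stand-in for VADER so the sketch runs without NLTK.

```python
from typing import Callable, Tuple

Prediction = Tuple[str, float]  # (label, confidence in [0, 1])

# Toy lexicon standing in for VADER; not the project's real word lists.
POSITIVE = {"love", "great", "fast"}
NEGATIVE = {"slow", "broken", "hate"}


def keyword_sentiment(text: str) -> Prediction:
    """Offline fallback: compare counts of positive and negative keywords."""
    words = set(text.lower().split())
    pos, neg = len(words & POSITIVE), len(words & NEGATIVE)
    if pos == neg:
        return "neutral", 0.5
    return ("positive" if pos > neg else "negative"), 0.7


def classify_with_fallback(
    text: str,
    primary: Callable[[str], Prediction],
    fallback: Callable[[str], Prediction] = keyword_sentiment,
    min_confidence: float = 0.6,  # assumed threshold
) -> Tuple[str, float, str]:
    """Try the primary model; use the fallback if it fails or is unsure."""
    try:
        label, confidence = primary(text)
        if confidence >= min_confidence:
            return label, confidence, "primary"
    except Exception:
        pass  # e.g. model not downloaded or transformers not installed
    label, confidence = fallback(text)
    return label, confidence, "fallback"
```

The third element of the result records which path produced the label, which is useful when comparing transformer and lexicon output on the same corpus.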

	---

	## 🚀 Quick Start

	### Prerequisites
	- Python 3.8+
	- pip (Python package manager)
	- Modern web browser (Chrome, Firefox, Safari, Edge)

	### Installation

	1. Extract the project
	```bash
	unzip social-intelligence-platform.zip
	cd social-intelligence-platform
	```

	2. Install Python dependencies
	```bash
	cd backend
	pip install -r requirements.txt
	```

	3. Download NLTK data (for fallback sentiment)
	```bash
	python -c "import nltk; nltk.download('vader_lexicon')"
	```

	### Running the Application

	#### Option 1: Run Backend + Frontend (Recommended)

	Terminal 1 — Start Backend:
	```bash
	cd backend
	python main.py
	```

	The backend will:
	- Start on `http://localhost:8000`
	- Generate 500 sample posts on startup
	- Run BERT sentiment analysis (or fallback to VADER if model unavailable)
	- Fit topic model (NMF)
	- Build trend forecasts
	- Scan for crisis signals
	- Assemble competitor intelligence

	Bootstrap takes 15-30 seconds on first run; the one-time model download (~440MB) adds to this depending on your connection.

	Terminal 2 — Serve Frontend:
	```bash
	cd frontend
	python -m http.server 3000
	```

	Open browser to: http://localhost:3000

	#### Option 2: Frontend Only (Demo Mode)

	If the backend is unavailable, the frontend falls back to demo data automatically.

	```bash
	cd frontend
	python -m http.server 3000
	```

	Open browser to: http://localhost:3000

	You'll see "Backend offline — showing demo data" during load. The dashboard will render with pre-generated synthetic data.

	---

	## 📊 Usage Guide

	### Dashboard Views

	1. Dashboard (Home)
	- Overview KPIs: Sentiment score, volume, NPS estimate, crisis alert level
	- 90-day sentiment trend with forecast
	- Sentiment mix (donut chart)
	- Volume by source (Twitter, Reddit, G2, etc.)
	- Top crisis posts requiring immediate action
	- Recent post feed with filters

	2. Trends
	- 7-day vs 30-day sentiment comparison
	- Trend direction (improving/declining/stable)
	- 14-day forecast with confidence bands
	- Anomaly detection (spikes and dips)
	- Daily volume trend

	3. Topic Clusters
	- 8 auto-discovered topics with keyword weights
	- Interactive bubble chart (size = post volume)
	- Click to explore: top keywords, sample posts, sentiment distribution

	4. Crisis Radar
	- Overall alert level (🟢 Low → 🔴 Critical)
	- Active high-severity posts
	- Signal frequency breakdown (legal, data breach, outrage, etc.)
	- Recommended actions

	5. Competitors
	- Sentiment comparison across brands
	- Share of voice (% of corpus mentions)
	- Opportunity intelligence (AI-identified competitive gaps)
	- Switch signal detection

	6. Live Analyzer
	- Paste any text for real-time analysis
	- Returns: sentiment label, confidence, crisis score, aspect breakdown
	- Quick example templates

	7. Post Feed
	- Full scrollable feed with sentiment labels
	- Filter by positive/negative/neutral/crisis
	- Topic tags and source badges
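
The 14-day forecast on the Trends view can be illustrated with simple exponential smoothing. This is a sketch under stated assumptions: the function name, the `alpha` value, and the constant-width band built from one-step-ahead errors are hypothetical and may differ from what `trend_analysis.py` does.

```python
from statistics import pstdev
from typing import List, Tuple


def forecast_sentiment(
    daily_scores: List[float],
    alpha: float = 0.3,  # assumed smoothing factor
    horizon: int = 14,
) -> List[Tuple[float, float, float]]:
    """Return (forecast, lower, upper) for each of the next `horizon` days."""
    if not daily_scores:
        raise ValueError("daily_scores must not be empty")
    level = daily_scores[0]
    errors = []
    for score in daily_scores[1:]:
        errors.append(score - level)  # one-step-ahead forecast error
        level = alpha * score + (1 - alpha) * level
    # Assumed band: 1.96 standard deviations of past errors, same for every day.
    spread = 1.96 * pstdev(errors) if len(errors) > 1 else 0.0
    return [(level, level - spread, level + spread) for _ in range(horizon)]
```

Simple exponential smoothing produces a flat forecast at the last smoothed level. A trend-aware variant such as Holt's method would be needed to project a rising or falling line.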

	### API Endpoints

	```bash
	# Health check
	curl http://localhost:8000/api/health

	# Full dashboard data
	curl http://localhost:8000/api/dashboard

	# Summary metrics only
	curl http://localhost:8000/api/summary

	# Topic clusters
	curl http://localhost:8000/api/topics

	# Trend analysis + forecast
	curl http://localhost:8000/api/trends

	# Crisis scan results
	curl http://localhost:8000/api/crisis

	# Competitor intelligence
	curl http://localhost:8000/api/competitors

	# Post feed (with filters); quote the URL so the shell does not interpret "&"
	curl "http://localhost:8000/api/posts?limit=50&sentiment=negative&source=Twitter"

	# Analyze single text
	curl -X POST http://localhost:8000/api/analyze \
	  -H "Content-Type: application/json" \
	  -d '{"text": "Your review text here", "include_aspects": true, "include_crisis": true}'

	# Batch analysis
	curl -X POST http://localhost:8000/api/batch-analyze \
	  -H "Content-Type: application/json" \
	  -d '{"texts": ["Review 1", "Review 2", "Review 3"]}'
	```
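
The same `/api/analyze` call can be made from Python using only the standard library. The URL and request body follow the endpoint list above; the helper names `build_analyze_request` and `analyze` are hypothetical, and the response is returned as a plain dict because its exact fields are not documented here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_analyze_request(text: str) -> urllib.request.Request:
    """Build the POST request for /api/analyze without sending it."""
    payload = {"text": text, "include_aspects": True, "include_crisis": True}
    return urllib.request.Request(
        f"{BASE_URL}/api/analyze",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response (backend must be running)."""
    request = build_analyze_request(text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the backend running, `analyze("The new release keeps crashing")` returns the parsed JSON payload.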

	---

	## 🧪 Sample Data

	The platform generates 500 realistic posts on startup:
	- 60% Positive — Praise for features, support, UI
	- 25% Negative — Complaints about performance, pricing, bugs
	- 10% Neutral — Migration stories, feature requests
	- 5% Crisis — Data breaches, outages, legal threats, scams
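
One way to produce this mix deterministically is to allocate label counts from the percentages and then shuffle. The sketch below is illustrative only: `allocate_labels` and `sample_days_ago` are hypothetical names, and squaring a uniform draw is just one assumed way to bias timestamps toward recent days; `sample_data.py` may work differently.

```python
import random
from typing import List

# Percentages taken from the list above; integers avoid float rounding surprises.
MIX = {"positive": 60, "negative": 25, "neutral": 10, "crisis": 5}


def allocate_labels(n: int, seed: int = 0) -> List[str]:
    """Return n shuffled labels whose counts follow MIX."""
    counts = {label: n * pct // 100 for label, pct in MIX.items()}
    counts["positive"] += n - sum(counts.values())  # absorb rounding remainder
    labels = [label for label, count in counts.items() for _ in range(count)]
    random.Random(seed).shuffle(labels)
    return labels


def sample_days_ago(rng: random.Random, window: int = 90) -> int:
    """Pick a post age in [0, window); squaring skews draws toward recent days."""
    return int(window * rng.random() ** 2)
```

For `n=500` this yields 300 positive, 125 negative, 50 neutral, and 25 crisis labels.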

	Sources: Twitter, Reddit, G2, Trustpilot, ProductHunt, AppStore, LinkedIn

	Time Range: Last 90 days with recency bias (more recent posts)

	Topics Covered:
	- Performance & Speed
	- Customer Support
	- Pricing & Billing
	- UI & Design
	- Features & Integrations
	- Data Quality & Accuracy
	- Onboarding & Documentation
	- Security & Compliance

	Competitor Mentions: RivalOne, CompeteX, AltStream appear in ~15% of posts

	Crisis Cluster: Injected 7 days ago to simulate a real brand crisis event
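
Competitor mention tracking can be sketched as a share-of-voice calculation over the corpus. The competitor names come from the sample data described above; the function name and the choice to count each post at most once per competitor are assumptions for illustration.

```python
import re
from collections import Counter
from typing import Dict, List

COMPETITORS = ("RivalOne", "CompeteX", "AltStream")


def share_of_voice(posts: List[str]) -> Dict[str, float]:
    """Fraction of posts that mention each competitor (case-insensitive)."""
    mentions = Counter()
    for post in posts:
        for name in COMPETITORS:
            if re.search(rf"\b{re.escape(name)}\b", post, flags=re.IGNORECASE):
                mentions[name] += 1
    total = len(posts)
    return {name: (mentions[name] / total if total else 0.0) for name in COMPETITORS}
```

The word-boundary match keeps a name embedded in a longer token from being counted as a mention.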

	---

	## 🎨 Design System

	The UI uses a dark enterprise SaaS aesthetic inspired by Linear, Vercel, and Notion:

	Colors:
	- `--bg-void: #080b12` — Deep background
	- `--bg-surface: #111827` — Card backgrounds
	- `--blue-500: #5b9cf6` — Primary accent
	- `--green-500: #10b981` — Positive sentiment
	- `--red-500: #ef4444` — Negative sentiment / crisis
	- `--amber-500: #f59e0b` — Warnings / neutral

	Typography:
	- Display (Headings): Syne — Bold, modern, slightly geometric
	- Body (UI Text): Instrument Sans — Clean, readable, professional
	- Monospace (Data): DM Mono — Metrics, badges, code

	Layout:
	- Sidebar navigation (240px fixed)
	- Header with search and status indicators
	- Card-based grid system
	- Consistent 16px/20px/24px spacing rhythm

	Animations:
	- Staggered fade-in on page load
	- Smooth chart transitions (800ms easing)
	- Hover states with subtle elevation
	- Loading states with branded skeleton screens

	---

	## 🔧 Configuration

	### Backend Settings

	Model Selection (in `backend/nlp/sentiment.py`):
	```python
	MODEL_ID = "cardiffnlp/twitter-roberta-base-sentiment-latest" # Primary model
	FALLBACK_MODE = False # Set True to skip transformer download
	```

	Topic Count (in `backend/main.py`):
	```python
	modeler = get_modeler(n_topics=8) # Adjust number of topics
	```

	Sample Data Size (in `backend/main.py`):
	```python
	_corpus = generate_posts(n=500) # Generate 500 posts (adjust as needed)
	```

	### Crisis Detection Thresholds

	Edit `backend/nlp/crisis_detector.py`:
	```python
	ALERT_LEVELS = {
	    (0, 4): ("low", "🟢", "No action required."),
	    (4, 8): ("medium", "🟡", "Monitor closely."),
	    (8, 15): ("high", "🟠", "Escalate to communications team."),
	    (15, 99): ("critical", "🔴", "Activate crisis response immediately."),
	}
	```
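
A score can be mapped to one of these levels with a simple range lookup. The sketch below assumes each range is half-open, `[low, high)`, so a score of exactly 4 is "medium", and it clamps out-of-range scores to the nearest level; the README does not state either behavior, and `alert_level` is a hypothetical helper.

```python
from typing import Tuple

ALERT_LEVELS = {
    (0, 4): ("low", "🟢", "No action required."),
    (4, 8): ("medium", "🟡", "Monitor closely."),
    (8, 15): ("high", "🟠", "Escalate to communications team."),
    (15, 99): ("critical", "🔴", "Activate crisis response immediately."),
}


def alert_level(score: float) -> Tuple[str, str, str]:
    """Return (severity, icon, recommended action) for a crisis score."""
    for (low, high), level in ALERT_LEVELS.items():
        if low <= score < high:  # assumed half-open interval
            return level
    # Assumed clamping: anything at or above 99 is critical, below 0 is low.
    return ALERT_LEVELS[(15, 99)] if score >= 99 else ALERT_LEVELS[(0, 4)]
```

When adjusting thresholds, keep the ranges contiguous so that no score falls between two levels.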

	---

	## 📈 Performance Notes

	First Run:
	- Model download: ~440MB (RoBERTa weights)
	- Bootstrap time: 15-30 seconds (sentiment + topic modeling + trends)

	Subsequent Runs:
	- Model loads from cache: ~3-5 seconds
	- Bootstrap time: 5-10 seconds

	Runtime Performance:
	- Sentiment analysis: ~50ms per post (transformer mode)
	- Topic modeling fit: ~2 seconds (500 posts, 8 topics)
	- Trend forecast: <1 second (90-day series)
	- Dashboard payload: ~1 second (full analysis)

	Offline Mode:
	- If transformers unavailable: Falls back to VADER (100x faster)
	- If backend offline: Frontend uses demo data (instant load)

	---

	## 🚀 Production Deployment

	This is a demo/portfolio project. For production use:

	1. Replace sample data with real data sources:
	- Twitter API / Reddit API / Review aggregators
	- Implement proper data ingestion pipeline
	- Add database (PostgreSQL / MongoDB) for persistence

	2. Fine-tune the BERT model on your domain:
	- Collect labeled training data from your industry
	- Fine-tune with the Hugging Face `Trainer` API
	- Deploy custom model endpoint

	3. Add authentication:
	- OAuth 2.0 / JWT tokens
	- User accounts and multi-tenancy
	- API rate limiting

	4. Scale the backend:
	- Containerize with Docker
	- Deploy to AWS/GCP/Azure
	- Add Redis cache for analytics
	- Use Celery for async NLP jobs

	5. Enhance frontend:
	- Add React/Vue for state management
	- Implement WebSocket for real-time updates
	- Add export to PDF/CSV functionality

	---

	## 📝 Project Structure

	```
	social-intelligence-platform/
	├── backend/
	│   ├── main.py                  # FastAPI application
	│   ├── requirements.txt         # Python dependencies
	│   ├── data/
	│   │   ├── __init__.py
	│   │   └── sample_data.py       # Synthetic data generator
	│   └── nlp/
	│       ├── __init__.py
	│       ├── sentiment.py         # BERT sentiment pipeline
	│       ├── topic_model.py       # NMF topic modeling
	│       ├── trend_analysis.py    # Time series forecasting
	│       ├── crisis_detector.py   # Crisis scoring engine
	│       └── competitor_intel.py  # Competitor mention analysis
	├── frontend/
	│   └── index.html               # Dashboard UI (self-contained)
	├── docs/
	│   └── CASE_STUDY.md            # Detailed project writeup
	└── README.md                    # This file
	```

	---

	## 🎓 Skills Demonstrated

	### NLP & Machine Learning
	- ✅ Transformer (RoBERTa) inference with a pretrained sentiment model
	- ✅ Topic modeling (NMF, LDA alternatives)
	- ✅ Time series forecasting (exponential smoothing)
	- ✅ Aspect-based sentiment analysis
	- ✅ Anomaly detection (statistical outliers)
	- ✅ Multi-signal classification (crisis scoring)

	### Backend Engineering
	- ✅ FastAPI REST API design
	- ✅ Async Python patterns
	- ✅ Model serving and caching
	- ✅ Batch processing pipelines
	- ✅ Error handling and fallbacks

	### Frontend Development
	- ✅ Modern vanilla JS (no framework bloat)
	- ✅ Chart.js and D3.js visualizations
	- ✅ Responsive CSS Grid layouts
	- ✅ Design system implementation
	- ✅ Performance optimization (lazy loading, debouncing)

	### Product Thinking
	- ✅ Problem-first approach (not technology-first)
	- ✅ User-centered design (product teams, not ML researchers)
	- ✅ Actionable insights over raw metrics
	- ✅ Crisis prioritization and triage

	---

	## 📧 Questions?

	This project demonstrates production-ready NLP engineering, API design, and data visualization skills. Built to solve real product team pain points with modern ML techniques.

	Author: [Your Name]
	Portfolio: [Your Portfolio URL]
	GitHub: [Your GitHub]
	LinkedIn: [Your LinkedIn]

	---

	## 📄 License

	MIT License — Free to use for educational and portfolio purposes.

	---

	Built with: 🐍 Python • ⚡ FastAPI • 🤗 Transformers • 📊 Chart.js • 🎨 Custom CSS