PulseAI / README.md
aasthav18's picture
Fix Hugging Face README config
0eb1d54
---
title: PulseAI
emoji: πŸ¦€
colorFrom: green
colorTo: indigo
sdk: docker
pinned: false
---
# πŸš€ Social Intelligence Platform
**AI-powered brand monitoring, sentiment analysis, and competitive intelligence**
A production-grade NLP platform that helps product teams discover customer insights, detect brand crises, and track competitive signals β€” all in real-time.
---
## 🎯 Problem Solved
**Before:** Product teams were drowning in thousands of reviews and social posts, manually trying to identify recurring themes, sentiment trends, and competitive threats. By the time they spotted a brand crisis, it had already gone viral.
**After:** Automated NLP pipeline processes all customer conversations in real-time, surfacing actionable insights:
- **Sentiment Analysis** β€” BERT-powered classification with aspect-level granularity
- **Topic Discovery** β€” NMF clustering finds recurring themes automatically
- **Crisis Detection** β€” Multi-signal scoring catches PR disasters before they escalate
- **Trend Forecasting** β€” Statistical forecasting predicts sentiment trajectory
- **Competitor Intelligence** β€” Tracks competitor mentions and switch signals
---
## ✨ Key Features
### 🧠 NLP Pipeline
- **BERT Sentiment Analysis** (`cardiffnlp/twitter-roberta-base-sentiment-latest`)
- Document-level sentiment (positive/negative/neutral)
- Aspect-based sentiment extraction (Performance, Pricing, Support, UI, etc.)
- Confidence scoring with fallback to VADER/keyword analysis
- **Topic Modeling** (NMF + TF-IDF)
- Automated topic discovery from short-text corpus
- Named clusters with keyword extraction
- Sentiment distribution per topic
- **Trend Analysis & Forecasting**
- Rolling statistical analysis with anomaly detection
- Exponential smoothing for 14-day sentiment forecast
- Volume trend analysis and spike detection
- **Crisis Detection Engine**
- Multi-signal crisis scoring (legal, data breach, outrage, viral threats)
- Severity classification (low/medium/high/critical)
- Engagement amplification (viral posts get higher weight)
- **Competitor Intelligence**
- Competitor mention extraction and sentiment comparison
- Switch signal detection (users leaving competitors)
- Opportunity gap identification
### 🎨 Dashboard Features
- **Real-time KPIs** β€” Sentiment score, NPS estimate, volume trends, crisis alerts
- **Interactive Visualizations** β€” Time series, donut charts, topic bubbles, competitor comparison
- **Topic Explorer** β€” Click-to-explore topic clusters with keyword clouds
- **Crisis Radar** β€” Prioritized list of high-severity posts requiring action
- **Live Analyzer** β€” Real-time sentiment + aspect + crisis analysis for any text
- **Post Feed** β€” Filterable feed with sentiment labels and source badges
---
## πŸ—οΈ Architecture
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Frontend (Vanilla JS) β”‚
β”‚ β€’ Dark SaaS UI with Syne/Instrument Sans typography β”‚
β”‚ β€’ Chart.js for time series, D3.js for topic bubbles β”‚
β”‚ β€’ Real-time API polling, demo fallback when offline β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚ REST API
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ FastAPI Backend (Python) β”‚
β”‚ β€’ /api/dashboard β€” Full analytics payload β”‚
β”‚ β€’ /api/analyze β€” Single text sentiment + crisis scoring β”‚
β”‚ β€’ /api/topics β€” Topic clusters with examples β”‚
β”‚ β€’ /api/trends β€” Time series + forecast β”‚
β”‚ β€’ /api/competitors β€” Competitive intelligence β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β–Ό β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ NLP Pipeline β”‚ β”‚ Sample Data Gen β”‚
β”‚ β€’ sentiment.py β”‚ β”‚ β€’ 500 synthetic β”‚
β”‚ β€’ topic_model.py β”‚ β”‚ reviews/tweets β”‚
β”‚ β€’ trends.py β”‚ β”‚ β€’ Realistic crisis β”‚
β”‚ β€’ crisis.py β”‚ β”‚ scenarios β”‚
β”‚ β€’ competitor.py β”‚ β”‚ β€’ Time series data β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
---
## πŸ“¦ Tech Stack
**Backend:**
- FastAPI β€” Modern async Python web framework
- Transformers (Hugging Face) β€” BERT sentiment model
- scikit-learn β€” NMF topic modeling, TF-IDF vectorization
- NumPy/Pandas β€” Statistical analysis and data manipulation
- NLTK β€” Fallback sentiment analysis (VADER)
**Frontend:**
- Vanilla JavaScript (no framework dependencies)
- Chart.js β€” Time series and bar/donut charts
- D3.js β€” Topic bubble visualization
- Custom CSS β€” Dark enterprise SaaS design system
- Fonts: Syne (display), Instrument Sans (body), DM Mono (code)
**Models:**
- Primary: `cardiffnlp/twitter-roberta-base-sentiment-latest` (RoBERTa fine-tuned on 124M tweets)
- Fallback: VADER lexicon-based sentiment (works offline)
---
## πŸš€ Quick Start
### Prerequisites
- Python 3.8+
- pip (Python package manager)
- Modern web browser (Chrome, Firefox, Safari, Edge)
### Installation
1. **Extract the project**
```bash
unzip social-intelligence-platform.zip
cd social-intelligence-platform
```
2. **Install Python dependencies**
```bash
cd backend
pip install -r requirements.txt
```
3. **Download NLTK data (for fallback sentiment)**
```bash
python -c "import nltk; nltk.download('vader_lexicon')"
```
### Running the Application
#### Option 1: Run Backend + Frontend (Recommended)
**Terminal 1 β€” Start Backend:**
```bash
cd backend
python main.py
```
The backend will:
- Start on `http://localhost:8000`
- Generate 500 sample posts on startup
- Run BERT sentiment analysis (or fallback to VADER if model unavailable)
- Fit topic model (NMF)
- Build trend forecasts
- Scan for crisis signals
- Assemble competitor intelligence
This takes **15-30 seconds** on first run (model download + bootstrap).
**Terminal 2 β€” Serve Frontend:**
```bash
cd frontend
python -m http.server 3000
```
Open browser to: **http://localhost:3000**
#### Option 2: Frontend Only (Demo Mode)
If the backend is unavailable, the frontend falls back to **demo data** automatically.
```bash
cd frontend
python -m http.server 3000
```
Open browser to: **http://localhost:3000**
You'll see "Backend offline β€” showing demo data" during load. The dashboard will render with pre-generated synthetic data.
---
## πŸ“Š Usage Guide
### Dashboard Views
**1. Dashboard (Home)**
- Overview KPIs: Sentiment score, volume, NPS estimate, crisis alert level
- 90-day sentiment trend with forecast
- Sentiment mix (donut chart)
- Volume by source (Twitter, Reddit, G2, etc.)
- Top crisis posts requiring immediate action
- Recent post feed with filters
**2. Trends**
- 7-day vs 30-day sentiment comparison
- Trend direction (improving/declining/stable)
- 14-day forecast with confidence bands
- Anomaly detection (spikes and dips)
- Daily volume trend
**3. Topic Clusters**
- 8 auto-discovered topics with keyword weights
- Interactive bubble chart (size = post volume)
- Click to explore: top keywords, sample posts, sentiment distribution
**4. Crisis Radar**
- Overall alert level (🟒 Low β†’ πŸ”΄ Critical)
- Active high-severity posts
- Signal frequency breakdown (legal, data breach, outrage, etc.)
- Recommended actions
**5. Competitors**
- Sentiment comparison across brands
- Share of voice (% of corpus mentions)
- Opportunity intelligence (AI-identified competitive gaps)
- Switch signal detection
**6. Live Analyzer**
- Paste any text for real-time analysis
- Returns: sentiment label, confidence, crisis score, aspect breakdown
- Quick example templates
**7. Post Feed**
- Full scrollable feed with sentiment labels
- Filter by positive/negative/neutral/crisis
- Topic tags and source badges
### API Endpoints
```bash
# Health check
GET http://localhost:8000/api/health
# Full dashboard data
GET http://localhost:8000/api/dashboard
# Summary metrics only
GET http://localhost:8000/api/summary
# Topic clusters
GET http://localhost:8000/api/topics
# Trend analysis + forecast
GET http://localhost:8000/api/trends
# Crisis scan results
GET http://localhost:8000/api/crisis
# Competitor intelligence
GET http://localhost:8000/api/competitors
# Post feed (with filters)
GET http://localhost:8000/api/posts?limit=50&sentiment=negative&source=Twitter
# Analyze single text
POST http://localhost:8000/api/analyze
Body: {"text": "Your review text here", "include_aspects": true, "include_crisis": true}
# Batch analysis
POST http://localhost:8000/api/batch-analyze
Body: {"texts": ["Review 1", "Review 2", "Review 3"]}
```
---
## πŸ§ͺ Sample Data
The platform generates **500 realistic posts** on startup:
- **60% Positive** β€” Praise for features, support, UI
- **25% Negative** β€” Complaints about performance, pricing, bugs
- **10% Neutral** β€” Migration stories, feature requests
- **5% Crisis** β€” Data breaches, outages, legal threats, scams
**Sources:** Twitter, Reddit, G2, Trustpilot, ProductHunt, AppStore, LinkedIn
**Time Range:** Last 90 days with recency bias (more recent posts)
**Topics Covered:**
- Performance & Speed
- Customer Support
- Pricing & Billing
- UI & Design
- Features & Integrations
- Data Quality & Accuracy
- Onboarding & Documentation
- Security & Compliance
**Competitor Mentions:** RivalOne, CompeteX, AltStream appear in ~15% of posts
**Crisis Cluster:** Injected 7 days ago to simulate a real brand crisis event
---
## 🎨 Design System
The UI uses a **dark enterprise SaaS aesthetic** inspired by Linear, Vercel, and Notion:
**Colors:**
- `--bg-void: #080b12` β€” Deep background
- `--bg-surface: #111827` β€” Card backgrounds
- `--blue-500: #5b9cf6` β€” Primary accent
- `--green-500: #10b981` β€” Positive sentiment
- `--red-500: #ef4444` β€” Negative sentiment / crisis
- `--amber-500: #f59e0b` β€” Warnings / neutral
**Typography:**
- **Display (Headings):** Syne β€” Bold, modern, slightly geometric
- **Body (UI Text):** Instrument Sans β€” Clean, readable, professional
- **Monospace (Data):** DM Mono β€” Metrics, badges, code
**Layout:**
- Sidebar navigation (240px fixed)
- Header with search and status indicators
- Card-based grid system
- Consistent 16px/20px/24px spacing rhythm
**Animations:**
- Staggered fade-in on page load
- Smooth chart transitions (800ms easing)
- Hover states with subtle elevation
- Loading states with branded skeleton screens
---
## πŸ”§ Configuration
### Backend Settings
**Model Selection** (in `backend/nlp/sentiment.py`):
```python
MODEL_ID = "cardiffnlp/twitter-roberta-base-sentiment-latest" # Primary model
FALLBACK_MODE = False # Set True to skip transformer download
```
**Topic Count** (in `backend/main.py`):
```python
modeler = get_modeler(n_topics=8) # Adjust number of topics
```
**Sample Data Size** (in `backend/main.py`):
```python
_corpus = generate_posts(n=500) # Generate 500 posts (adjust as needed)
```
### Crisis Detection Thresholds
Edit `backend/nlp/crisis_detector.py`:
```python
ALERT_LEVELS = {
(0, 4): ("low", "🟒", "No action required."),
(4, 8): ("medium", "🟑", "Monitor closely."),
(8, 15): ("high", "🟠", "Escalate to communications team."),
(15, 99): ("critical", "πŸ”΄", "Activate crisis response immediately."),
}
```
---
## πŸ“ˆ Performance Notes
**First Run:**
- Model download: ~440MB (RoBERTa weights)
- Bootstrap time: 15-30 seconds (sentiment + topic modeling + trends)
**Subsequent Runs:**
- Model loads from cache: ~3-5 seconds
- Bootstrap time: 5-10 seconds
**Runtime Performance:**
- Sentiment analysis: ~50ms per post (transformer mode)
- Topic modeling fit: ~2 seconds (500 posts, 8 topics)
- Trend forecast: <1 second (90-day series)
- Dashboard payload: ~1 second (full analysis)
**Offline Mode:**
- If transformers unavailable: Falls back to VADER (100x faster)
- If backend offline: Frontend uses demo data (instant load)
---
## πŸš€ Production Deployment
This is a **demo/portfolio project**. For production use:
1. **Replace sample data** with real data sources:
- Twitter API / Reddit API / Review aggregators
- Implement proper data ingestion pipeline
- Add database (PostgreSQL / MongoDB) for persistence
2. **Fine-tune the BERT model** on your domain:
- Collect labeled training data from your industry
- Fine-tune on HuggingFace Trainer
- Deploy custom model endpoint
3. **Add authentication**:
- OAuth 2.0 / JWT tokens
- User accounts and multi-tenancy
- API rate limiting
4. **Scale the backend**:
- Containerize with Docker
- Deploy to AWS/GCP/Azure
- Add Redis cache for analytics
- Use Celery for async NLP jobs
5. **Enhance frontend**:
- Add React/Vue for state management
- Implement WebSocket for real-time updates
- Add export to PDF/CSV functionality
---
## πŸ“ Project Structure
```
social-intelligence-platform/
β”œβ”€β”€ backend/
β”‚ β”œβ”€β”€ main.py # FastAPI application
β”‚ β”œβ”€β”€ requirements.txt # Python dependencies
β”‚ β”œβ”€β”€ data/
β”‚ β”‚ β”œβ”€β”€ __init__.py
β”‚ β”‚ └── sample_data.py # Synthetic data generator
β”‚ └── nlp/
β”‚ β”œβ”€β”€ __init__.py
β”‚ β”œβ”€β”€ sentiment.py # BERT sentiment pipeline
β”‚ β”œβ”€β”€ topic_model.py # NMF topic modeling
β”‚ β”œβ”€β”€ trend_analysis.py # Time series forecasting
β”‚ β”œβ”€β”€ crisis_detector.py # Crisis scoring engine
β”‚ └── competitor_intel.py # Competitor mention analysis
β”œβ”€β”€ frontend/
β”‚ └── index.html # Dashboard UI (self-contained)
β”œβ”€β”€ docs/
β”‚ └── CASE_STUDY.md # Detailed project writeup
└── README.md # This file
```
---
## πŸŽ“ Skills Demonstrated
### NLP & Machine Learning
- βœ… BERT/Transformer fine-tuning and inference
- βœ… Topic modeling (NMF, LDA alternatives)
- βœ… Time series forecasting (exponential smoothing)
- βœ… Aspect-based sentiment analysis
- βœ… Anomaly detection (statistical outliers)
- βœ… Multi-signal classification (crisis scoring)
### Backend Engineering
- βœ… FastAPI REST API design
- βœ… Async Python patterns
- βœ… Model serving and caching
- βœ… Batch processing pipelines
- βœ… Error handling and fallbacks
### Frontend Development
- βœ… Modern vanilla JS (no framework bloat)
- βœ… Chart.js and D3.js visualizations
- βœ… Responsive CSS Grid layouts
- βœ… Design system implementation
- βœ… Performance optimization (lazy loading, debouncing)
### Product Thinking
- βœ… Problem-first approach (not technology-first)
- βœ… User-centered design (product teams, not ML researchers)
- βœ… Actionable insights over raw metrics
- βœ… Crisis prioritization and triage
---
## πŸ“§ Questions?
This project demonstrates production-ready NLP engineering, API design, and data visualization skills. Built to solve real product team pain points with modern ML techniques.
**Author:** [Your Name]
**Portfolio:** [Your Portfolio URL]
**GitHub:** [Your GitHub]
**LinkedIn:** [Your LinkedIn]
---
## πŸ“„ License
MIT License β€” Free to use for educational and portfolio purposes.
---
**Built with:** 🐍 Python β€’ ⚑ FastAPI β€’ πŸ€— Transformers β€’ πŸ“Š Chart.js β€’ 🎨 Custom CSS