DivYonko committed on
Commit · b8d79b9
Parent(s): 906e964

docs: add session changelog

CHANGELOG.md +75 -139 (CHANGED)

@@ -1,177 +1,113 @@
# LivePulse —
**Date:** April
**Session

---

##

| File | Lines before | Lines after | Change |
|------|---------------|-------------|--------|
| `frontend/streamlit_app.py` | ~540 | 1354 | +814 |
| `backend/scraper.py` | 115 | 135 | +20 |
| `requirements.txt` | 22 | 35 | +13 |

---

##

### 1.

### 1.2 Sentiment Velocity
- Added `compute_velocity()`, which compares the positive-sentiment ratio of the last 20 messages with that of the previous 20
- Displayed as a 5th stat card alongside the cumulative counts
- Three states: ↑ Rising (green), → Stable (yellow), ↓ Falling (red)
- Shows the shift as a delta percentage

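A minimal sketch of the 20-vs-20 comparison described above. It assumes each message has already been labelled "Positive", "Neutral" or "Negative". The name `sentiment_velocity`, its signature, and the ±5 percentage-point dead band separating Stable from Rising/Falling are hypothetical; the changelog only specifies the two windows and the three states.

```python
from typing import Sequence, Tuple


def sentiment_velocity(labels: Sequence[str], window: int = 20,
                       threshold: float = 5.0) -> Tuple[str, float]:
    """Compare the positive share of the newest `window` labels with the `window` before them.

    Returns (state, delta), where delta is the change in percentage points.
    `threshold` is a hypothetical dead band separating Stable from Rising/Falling.
    """
    recent = labels[-window:]
    previous = labels[-2 * window:-window]
    if not recent or not previous:
        return "Stable", 0.0  # not enough history to compare yet

    def positive_pct(chunk: Sequence[str]) -> float:
        return 100.0 * sum(1 for label in chunk if label == "Positive") / len(chunk)

    delta = positive_pct(recent) - positive_pct(previous)
    if delta > threshold:
        return "Rising", delta
    if delta < -threshold:
        return "Falling", delta
    return "Stable", delta
```

With fewer than 40 messages the "previous" window is shorter than 20. When it is empty, the sketch reports Stable.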

### 1.3 Notification / Alert System
- **Negative spike alert** — pulsing red banner when the negative % in the rolling window exceeds a configurable threshold (default 40%)
- **Spam surge alert** — separate orange banner when the spam-topic % exceeds a configurable threshold (default 30%)
- Both alerts are dismissible with a ✕ button and re-arm automatically when new messages arrive
- Alert window size and thresholds are configurable from sidebar sliders

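Both banners come down to the same check: the share of one label among the most recent N messages, compared against a threshold. The sketch below assumes label strings such as "Negative" and "Spam". It models dismiss/re-arm by remembering the message count at the moment of dismissal. `ratio_exceeds` and `DismissibleAlert` are hypothetical helpers, not the app's own functions.

```python
from typing import Optional, Sequence


def ratio_exceeds(labels: Sequence[str], target: str, window: int, threshold_pct: float) -> bool:
    """True when `target` makes up more than `threshold_pct` % of the last `window` labels."""
    recent = labels[-window:]
    if not recent:
        return False
    pct = 100.0 * sum(1 for label in recent if label == target) / len(recent)
    return pct > threshold_pct


class DismissibleAlert:
    """Hides an alert after the user dismisses it and re-arms it once new messages arrive."""

    def __init__(self) -> None:
        self.dismissed_at: Optional[int] = None  # message count when ✕ was clicked

    def dismiss(self, message_count: int) -> None:
        self.dismissed_at = message_count

    def visible(self, triggered: bool, message_count: int) -> bool:
        if self.dismissed_at is not None and message_count > self.dismissed_at:
            self.dismissed_at = None  # new messages since dismissal: re-arm
        return triggered and self.dismissed_at is None
```

With the defaults above, the negative banner would call `ratio_exceeds(sentiments, "Negative", window, 40)` and the spam banner would call `ratio_exceeds(topics, "Spam", window, 30)`.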

### 1.4 Pinned Messages
- Every message in the live feed has a 📍 pin button
- Pinned messages appear in a dedicated "Pinned Messages" section above the feed with gold highlight styling
- Individual unpin buttons per message
- Sidebar shows pin count and a "Clear pins" button
- Pin state persists across auto-refreshes via `st.session_state`

### 1.5 Multi-Stream Comparison (fully rebuilt)
- Sidebar now manages up to **5 independent stream slots** (A–E), each with its own color, video ID field, Redis key field, and Start/Stop buttons
- **+ Add stream / - Remove last** buttons to dynamically add/remove slots
- Comparison section appears automatically when 2+ streams have data — no toggle needed
- Renders sentiment bar charts in rows of 3
- Overlay line chart shows rolling positive % for all active streams on the same axis
- Fixed a Streamlit widget re-render bug: widget keys are now the single source of truth instead of `value=` overrides

---

## 2.

- Message rate (msgs/min) — 40% weight
- Positive ratio — 40% weight
- Question density — 20% weight
- Displayed as a large score card with a fill bar and grade (🔥 High / ⚡ Medium / 💤 Low)
- Three supporting metric tiles: Msgs/min, Positive ratio, Question density

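The weighted composite can be written as score = 100 × (0.4 × rate + 0.4 × positive ratio + 0.2 × question density). The changelog does not say how msgs/min is normalised or where the grade boundaries sit. This sketch therefore assumes a hypothetical saturation point of 30 msgs/min and hypothetical cut-offs of 70 (High) and 40 (Medium).

```python
from typing import Tuple

WEIGHTS = (0.4, 0.4, 0.2)  # msgs/min, positive ratio, question density (from the changelog)
MAX_RATE = 30.0            # hypothetical: msgs/min at which the rate component saturates


def engagement_score(msgs_per_min: float, positive_ratio: float,
                     question_density: float) -> Tuple[float, str]:
    """Return (score in 0-100, grade). Both ratios are expected in [0, 1]."""
    rate = min(msgs_per_min / MAX_RATE, 1.0)
    w_rate, w_pos, w_q = WEIGHTS
    score = 100.0 * (w_rate * rate + w_pos * positive_ratio + w_q * question_density)
    # Hypothetical grade boundaries.
    if score >= 70:
        grade = "High"
    elif score >= 40:
        grade = "Medium"
    else:
        grade = "Low"
    return round(score, 1), grade
```

Capping the rate component keeps one very busy stream from dominating the score, so the two ratio terms still matter.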

### 2.2 Top Contributors Leaderboard
- `compute_top_contributors()` — ranks authors by message count and tracks a per-author sentiment breakdown
- Left panel: ranked list with 🥇🥈🥉 medals, progress bar, and colored sentiment dots per author
- Right panel: stacked horizontal bar chart showing sentiment % for the top 5 authors
- CSV export of the full leaderboard

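Ranking by message count while keeping a per-author sentiment tally maps naturally onto `collections.Counter`. The sketch below assumes each message is a dict with hypothetical `author` and `sentiment` fields. `top_contributors` is an illustrative name, not the app's function.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


def top_contributors(messages: Iterable[Dict[str, str]], n: int = 10) -> List[dict]:
    """Rank authors by message count and keep a per-author sentiment tally."""
    counts: Counter = Counter()
    sentiments: Dict[str, Counter] = defaultdict(Counter)
    for msg in messages:
        author = msg.get("author", "unknown")
        counts[author] += 1
        sentiments[author][msg.get("sentiment", "Neutral")] += 1
    # most_common keeps first-seen order among authors with equal counts.
    return [
        {"author": author, "messages": count, "sentiment": dict(sentiments[author])}
        for author, count in counts.most_common(n)
    ]
```

The returned list of flat dicts converts directly into a table for the leaderboard panel or a CSV export.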

### 2.3 Word Cloud
- `compute_word_freq()` — extracts the top 60 words after removing stopwords (English plus common Hinglish filler words)
- Filterable by sentiment (All / Positive / Neutral / Negative) and topic
- Renders the word cloud image via the `wordcloud` library using `wc.to_array()` directly (no matplotlib pipeline)
- Top-20 frequency bar chart shown below the cloud
- Falls back to the bar chart only if `wordcloud` is not installed

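A sketch of the word-frequency step, assuming Latin-script (romanised) messages. The two stopword sets are short hypothetical placeholders, because the changelog does not list the actual English and Hinglish words that are removed.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical, deliberately short stopword lists.
ENGLISH_STOPWORDS = {"the", "a", "an", "is", "are", "to", "and", "of", "in", "it", "this", "you"}
HINGLISH_FILLERS = {"hai", "ka", "ki", "ke", "bhi", "ho", "na", "toh", "ye", "wo"}
STOPWORDS = ENGLISH_STOPWORDS | HINGLISH_FILLERS


def word_freq(texts: Iterable[str], top_n: int = 60) -> List[Tuple[str, int]]:
    """Count lower-cased alphabetic tokens, skipping stopwords and single letters."""
    counts: Counter = Counter()
    for text in texts:
        for token in re.findall(r"[a-z]+", text.lower()):
            if len(token) > 1 and token not in STOPWORDS:
                counts[token] += 1
    return counts.most_common(top_n)
```

Once converted with `dict(...)`, the resulting `(word, count)` pairs can feed both the cloud and the top-20 bar chart.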

### 2.4 Spam Rate Alert
- `check_spam_alert()` — monitors the spam-topic ratio in the rolling window
- Separate dismissible banner, distinct from the negative-sentiment alert
- Configurable threshold and window from the sidebar

---

## 3.

- Added `argparse` CLI interface with two arguments:
  - `--video_id` — YouTube video ID to scrape (defaults to `config.py` value)
  - `--redis_key` — Redis list key to write messages to (defaults to `chat_messages`)
- `run()` function now accepts `video_id` and `redis_key` as parameters instead of reading globals
- Redis connection moved inside `run()` so each scraper instance is fully independent
- Each stream writes to its own Redis key, enabling true parallel multi-stream operation

``

python -m backend.scraper --video_id XYZ789 --redis_key chat_messages_b

#
``

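The two CLI arguments can be sketched with the standard `argparse` module. `DEFAULT_VIDEO_ID` is a hypothetical stand-in for the value read from `config.py`, which is not shown here. The `run()` body is a placeholder rather than the real scraper loop. A module entry point would call `run(args.video_id, args.redis_key)` after `parse_args()`, so each `python -m backend.scraper ...` invocation becomes one independent stream.

```python
import argparse
from typing import List, Optional

DEFAULT_VIDEO_ID = "YOUR_VIDEO_ID"   # hypothetical stand-in for the config.py value
DEFAULT_REDIS_KEY = "chat_messages"  # default key named in the changelog


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Scrape one YouTube live chat into one Redis list.")
    parser.add_argument("--video_id", default=DEFAULT_VIDEO_ID,
                        help="YouTube video ID to scrape")
    parser.add_argument("--redis_key", default=DEFAULT_REDIS_KEY,
                        help="Redis list key to write messages to")
    return parser.parse_args(argv)


def run(video_id: str, redis_key: str) -> None:
    # Placeholder body. Per the changelog, the real run() opens its own Redis
    # connection here, so every scraper process is independent of the others.
    print(f"would scrape {video_id} into {redis_key}")
```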

---

##

- `load_stream_data("chat_messages")` called **once** per refresh cycle
- Windowed slice (`data = all_data[-msg_limit:]`) derived in memory instead of via a second Redis read
- Multi-stream comparison reuses cached data instead of calling `load_stream_data` twice per stream

###

| Function | Cache TTL | Effect |
|----------|-----------|--------|
| `compute_velocity()` | 10s | Skips recompute if data unchanged |
| `build_heatmap_data()` | 10s | Skips full groupby on every refresh |
| `compute_engagement()` | 10s | Skips recompute if data unchanged |
| `compute_top_contributors()` | 10s | Skips recompute if data unchanged |
| `compute_word_freq()` | 10s | Skips word counting on every refresh |

- `compute_velocity()` and `build_heatmap_data()` refactored to accept JSON strings instead of DataFrames — `st.cache_data` requires hashable arguments and DataFrames are not hashable

###

---

##

- **Problem:** `r.exists(key)` returns an integer (0 or 1), not a bool, and returns 1 for any existing key including empty lists
- **Fix:** Changed to `r.llen(key) > 0`, which correctly checks for actual message data

##
- **Problem:** `background_color="transparent"` is not a valid PIL color specifier, causing `ValueError: unknown color specifier: 'transparent'`
- **Fix:** Changed to `background_color="white"` and render via `wc.to_array()` directly — removes the matplotlib pipeline entirely

-

---

##

``

---

##

```
└── video_title ← Stream A title for page header

backend/scraper.py ← One process per stream, --video_id + --redis_key args
backend/main.py ← FastAPI REST API (reads from chat_messages)
frontend/streamlit_app.py ← Dashboard (reads from all active Redis keys)
ml/sentiment_model.py ← 3-model ensemble (MuRIL + XLM-R + Multilingual)
ml/topic_model.py ← Keyword fast-path + BART zero-shot fallback
```

# LivePulse — Session Changelog
**Date:** April 16, 2026
**Session:** HF Spaces Deployment Debugging & Fixes

---

## Summary

This session was entirely focused on getting the deployed LivePulse app on Hugging Face Spaces (`huggingface.co/spaces/Divyonko/LivePulse`) to actually work end-to-end — from scraping YouTube live chat to displaying analytics in the dashboard.

---

## Issues Found & Fixed (in order)

### 1. Missing `return None` in `_get_live_chat_id`
**File:** `app.py`
**Problem:** The `except` block in `_get_live_chat_id` was missing `return None`, so after an exception, execution fell through past the `except` block instead of returning a clear failure value.
**Fix:** Added an explicit `return None` in the `except` block.

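
A sketch of the fixed lookup. It assumes the function queries the YouTube Data API v3 `videos` endpoint with `part=liveStreamingDetails` and reads `activeLiveChatId` from the response. `get_live_chat_id` and its injectable `fetch` parameter are illustrative stand-ins for `_get_live_chat_id`; `fetch` exists only so the failure path can be exercised without a network. The comment inside the `except` block marks what the missing `return None` allowed: execution continuing with `payload` unbound.

```python
import json
import logging
import urllib.parse
import urllib.request
from typing import Callable, Optional

logger = logging.getLogger(__name__)

VIDEOS_URL = "https://www.googleapis.com/youtube/v3/videos"


def _http_get(url: str) -> bytes:
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()


def get_live_chat_id(video_id: str, api_key: str,
                     fetch: Callable[[str], bytes] = _http_get) -> Optional[str]:
    """Return the video's activeLiveChatId, or None if it cannot be determined."""
    query = urllib.parse.urlencode(
        {"part": "liveStreamingDetails", "id": video_id, "key": api_key}
    )
    try:
        payload = json.loads(fetch(f"{VIDEOS_URL}?{query}"))
    except Exception as exc:
        logger.error("Live chat lookup failed for video %s: %s", video_id, exc)
        # The fix: without this return, execution would continue below and
        # fail with UnboundLocalError, because `payload` was never assigned.
        return None
    items = payload.get("items", [])
    if not items:
        return None  # no video with that ID
    # Typically absent when the video is not (or no longer) live.
    return items[0].get("liveStreamingDetails", {}).get("activeLiveChatId")
```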

---

### 2. No logging output visible in HF Spaces logs
**File:** `app.py`
**Problem:** Python's root logger defaults to the WARNING level, so all our `logger.info()` calls were silently dropped and nothing useful appeared in the logs.
**Fix:** Added `logging.basicConfig(level=logging.INFO, force=True)` so all INFO-and-above messages appear in the HF Spaces logs.

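
`force=True` (available since Python 3.8) matters when the root logger already has handlers, for example because a library configured logging first. Without it, `basicConfig()` does nothing in that case. This is a minimal sketch: the format string and the optional `stream` argument are illustrative choices, not taken from `app.py`.

```python
import logging
from typing import IO, Optional


def configure_logging(stream: Optional[IO[str]] = None) -> None:
    """Send INFO-and-above records from every logger to `stream` (stderr by default)."""
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        stream=stream,
        # force=True first removes and closes any handlers already on the root
        # logger; without it, basicConfig() is a no-op once a handler exists.
        force=True,
    )
```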

---

### 3. Torchvision warnings flooding the logs
**File:** `Dockerfile`
**Problem:** Streamlit's file watcher scans all imported modules, including `transformers`, which tries to import `torchvision` (not installed). This produced hundreds of `ModuleNotFoundError: No module named 'torchvision'` lines, making real errors impossible to find.
**Fix:** Added `ENV STREAMLIT_SERVER_FILE_WATCHER_TYPE=none` to the Dockerfile to disable the file watcher entirely.

---

### 4. Improved HTTP error logging in `_get_live_chat_id`
**File:** `app.py`
**Problem:** A generic `except Exception` swallowed the actual YouTube API error body (e.g. "API key invalid", "quota exceeded").
**Fix:** Added a separate `urllib.error.HTTPError` handler that reads and logs the full error response body, making API failures immediately diagnosable.

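
`urllib.error.HTTPError` also behaves as a response object, so the API's JSON error body can be read from the exception itself. This sketch shows the handler ordering described above: the specific `HTTPError` clause must come before the generic one. `fetch_text` is a hypothetical helper name. The request URL is deliberately left out of the log lines, because for the YouTube API it would contain the `key=` parameter.

```python
import logging
import urllib.error
import urllib.request
from typing import Optional

logger = logging.getLogger(__name__)


def fetch_text(url: str, timeout: float = 10.0) -> Optional[str]:
    """GET `url` and return the body, logging the API's own error message on HTTP failures."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Must precede the generic handler: HTTPError is an Exception too, and
        # its body holds the API's JSON error details (reason, message).
        body = err.read().decode("utf-8", errors="replace")
        logger.error("HTTP %s from YouTube API: %s", err.code, body)
        return None
    except Exception as exc:  # DNS failures, timeouts, connection resets
        logger.error("YouTube API request failed: %s", exc)
        return None
```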

---

### 5. API key presence logging
**File:** `app.py`
**Problem:** There was no way to confirm whether the `YOUTUBE_API_KEY` secret was actually being read from the HF Spaces environment.
**Fix:** Added `logger.info("YOUTUBE_API_KEY present: %s (length=%d)", ...)` at scraper thread start.

---

### 6. Chat message fetch logging
**File:** `app.py`
**Problem:** There was no confirmation that `liveChat/messages` API calls were succeeding.
**Fix:** Added `logger.info("Fetched %d chat messages ...")` after each successful API poll.


---

### 7. `@st.cache_data` on `load_stream_data` returning stale empty results
**File:** `app.py`
**Problem:** `load_stream_data` was decorated with `@st.cache_data(ttl=5)`. The cache key was just `redis_key` (a constant string), so it cached the first result (an empty list) and kept returning it even after the scraper had written messages. An attempted fix with a `_store_len` cache-busting parameter failed because Streamlit excludes parameters prefixed with `_` from hashing.
**Fix:** Removed `@st.cache_data` entirely from `load_stream_data`. Because the store is local (in-memory at the time, later SQLite), reading it directly on every rerun is cheap.

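
The failed `_store_len` attempt can be reproduced without Streamlit. The toy decorator below is not Streamlit's implementation. It only mimics the documented rule that parameters whose names start with `_` are left out of the cache key, and it ignores `ttl`. In the sketch, a call with `_store_len=0`, followed by a call with `_store_len=1` after a message has been added, returns the same cached empty list both times, because the only hashed argument, `redis_key`, never changes. `STORE` is a hypothetical stand-in for the message store.

```python
import functools
import inspect
from typing import Any, Callable, Dict, Tuple


def cache_ignoring_underscore_args(func: Callable) -> Callable:
    """Toy memoiser: like st.cache_data, it leaves `_`-prefixed parameters out of the key."""
    sig = inspect.signature(func)
    cache: Dict[Tuple, Any] = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        bound.apply_defaults()
        key = tuple(
            (name, value)
            for name, value in bound.arguments.items()
            if not name.startswith("_")  # `_store_len` never reaches the key
        )
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    return wrapper


STORE = {"chat_messages": []}  # hypothetical stand-in for the message store


@cache_ignoring_underscore_args
def load_stream_data(redis_key: str, _store_len: int = 0) -> list:
    return list(STORE[redis_key])
```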

---

### 8. Scraper thread blocking on ML inference for 60+ backlog messages
**File:** `app.py`
**Problem:** On startup, the YouTube API returns a backlog of 50-70 messages from the last few minutes. The scraper was running full ML inference (MuRIL + XLM-R + BART = 3 models × 60 messages = 180 forward passes on CPU) before writing a single message to the store. This took several minutes, during which the UI showed "No messages yet" and users kept clicking Start again, killing and restarting the thread.
**Fix:** Added an `is_first_page` flag. On the first API page (the backlog), messages are stored immediately with `Neutral/General` placeholder labels so the UI shows data within seconds. Full ML inference only runs on subsequent pages (new live messages, typically 5-15 at a time).

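
A sketch of the first-page shortcut. It assumes pages arrive as lists of message texts and that `classify` wraps the full model ensemble and returns a (sentiment, topic) pair. `store_page`, `scrape_pages`, `classify` and the list-based `store` are hypothetical names. The `Neutral`/`General` placeholder comes from the changelog.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Classifier = Callable[[str], Tuple[str, str]]
PLACEHOLDER = ("Neutral", "General")  # placeholder labels named in the changelog


def store_page(texts: Iterable[str], is_first_page: bool,
               classify: Classifier, store: List[Dict[str, str]]) -> None:
    for text in texts:
        if is_first_page:
            # Backlog: store right away so the UI has data within seconds.
            sentiment, topic = PLACEHOLDER
        else:
            # New live messages: run the full (slow) model ensemble.
            sentiment, topic = classify(text)
        store.append({"text": text, "sentiment": sentiment, "topic": topic})


def scrape_pages(pages: Iterable[List[str]], classify: Classifier,
                 store: List[Dict[str, str]]) -> None:
    is_first_page = True
    for page in pages:
        store_page(page, is_first_page, classify, store)
        is_first_page = False  # only the very first API page is treated as backlog
```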

---

### 9. Per-message ML inference error logging
**File:** `app.py`
**Problem:** If `predict_sentiment` or `predict_topic` threw an exception for a specific message, it was silently caught by `_safe_sentiment`/`_safe_topic` with no indication of which message failed or why.
**Fix:** Added an explicit `try/except` with `logger.error("ML inference failed for text=%r: %s", ...)` around each message's inference call in the scraper loop.

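
A sketch of per-message error isolation. The changelog does not say what the real loop does after logging, so falling back to the `Neutral`/`General` placeholder here is an assumption. `classify_batch` and its callable parameters are hypothetical.

```python
import logging
from typing import Callable, List, Tuple

logger = logging.getLogger(__name__)
FALLBACK = ("Neutral", "General")  # assumed fallback labels


def classify_batch(texts: List[str],
                   predict_sentiment: Callable[[str], str],
                   predict_topic: Callable[[str], str]) -> List[Tuple[str, str, str]]:
    results = []
    for text in texts:
        try:
            sentiment = predict_sentiment(text)
            topic = predict_topic(text)
        except Exception as exc:
            # Name the failing message and the error, then keep going with the rest.
            logger.error("ML inference failed for text=%r: %s", text, exc)
            sentiment, topic = FALLBACK
        results.append((text, sentiment, topic))
    return results
```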

---

### 10. Root cause: In-memory store not shared across Streamlit worker processes
**File:** `app.py`
**Problem:** This was the fundamental bug causing "No messages yet" despite the scraper working correctly. HF Spaces runs Streamlit with multiple worker processes. The scraper thread ran in worker process A and wrote to `_STORE` (a Python `dict` in that process's RAM). Browser requests were served by worker process B, which had its own separate, empty `_STORE`. The two processes never shared memory, so the UI always saw zero messages regardless of how many the scraper had collected.
**Fix:** Replaced the entire in-memory `deque`-based store with **SQLite** at `/tmp/livepulse.db`. SQLite is a file on disk that all worker processes in the container share. The scraper writes to it; any worker serving the UI reads from the same file. All store functions (`store_rpush`, `store_lrange`, `store_llen`, `store_delete`) were rewritten to use SQLite queries with a threading lock.

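
A sketch of a Redis-list-like store on SQLite. It assumes messages are JSON-serialisable and stored one row per message in a single table. The function names match the changelog, but the schema, the `path` parameter (added for testing) and the Redis-style `LRANGE` index handling (inclusive stop, negative indices counted from the end) are assumptions. The `threading.Lock` only serialises threads inside one process. Coordination between processes relies on SQLite's own file locking, and the connection `timeout` makes a blocked caller wait rather than fail immediately.

```python
import json
import sqlite3
import threading
from typing import Any, List

DB_PATH = "/tmp/livepulse.db"  # location given in the changelog
_LOCK = threading.Lock()  # serialises threads within this process only


def _connect(path: str) -> sqlite3.Connection:
    # timeout: wait up to 10 s if another process holds the database lock.
    conn = sqlite3.connect(path, timeout=10)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, key TEXT NOT NULL, value TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX IF NOT EXISTS idx_messages_key ON messages (key, id)")
    return conn


def store_rpush(key: str, value: Any, path: str = DB_PATH) -> None:
    """Append one JSON-encoded message to the end of the list stored under `key`."""
    with _LOCK:
        conn = _connect(path)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute("INSERT INTO messages (key, value) VALUES (?, ?)",
                             (key, json.dumps(value)))
        finally:
            conn.close()


def store_llen(key: str, path: str = DB_PATH) -> int:
    with _LOCK:
        conn = _connect(path)
        try:
            (count,) = conn.execute(
                "SELECT COUNT(*) FROM messages WHERE key = ?", (key,)).fetchone()
            return count
        finally:
            conn.close()


def store_lrange(key: str, start: int, stop: int, path: str = DB_PATH) -> List[Any]:
    """Return items start..stop inclusive; negative indices count from the end."""
    with _LOCK:
        conn = _connect(path)
        try:
            (n,) = conn.execute(
                "SELECT COUNT(*) FROM messages WHERE key = ?", (key,)).fetchone()
            if start < 0:
                start = max(n + start, 0)
            if stop < 0:
                stop = n + stop
            stop = min(stop, n - 1)
            if start > stop:
                return []
            rows = conn.execute(
                "SELECT value FROM messages WHERE key = ? ORDER BY id LIMIT ? OFFSET ?",
                (key, stop - start + 1, start)).fetchall()
            return [json.loads(value) for (value,) in rows]
        finally:
            conn.close()


def store_delete(key: str, path: str = DB_PATH) -> None:
    with _LOCK:
        conn = _connect(path)
        try:
            with conn:
                conn.execute("DELETE FROM messages WHERE key = ?", (key,))
        finally:
            conn.close()
```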

---

## Files Changed

| File | Changes |
|------|---------|
| `app.py` | SQLite store, logging setup, backlog fix, cache removal, HTTP error handling, `return None` fix |
| `Dockerfile` | Added `STREAMLIT_SERVER_FILE_WATCHER_TYPE=none` |

---

## What Was NOT Changed

- All dashboard features preserved: charts, alerts, word cloud, engagement score, leaderboard, multi-stream comparison, pinned messages, sentiment heatmap, topic distribution, confidence trend, CSV export
- ML models unchanged: MuRIL + XLM-R + BART ensemble still runs on new messages
- YouTube Data API v3 scraper logic unchanged
- `requirements.txt` unchanged
- `.gitattributes` (Git LFS for model weights) unchanged
- `README.md` unchanged

---

## Current State

The app is fully functional on HF Spaces:
- Scraper fetches YouTube live chat via YouTube Data API v3
- API key read from HF Spaces secret `YOUTUBE_API_KEY`
- Backlog messages stored immediately on start (with placeholder sentiment)
- New messages processed with full ML inference
- SQLite ensures scraper and UI share data across all worker processes
- Dashboard displays all analytics once messages are in the store