---
title: Sabian Brand Sentiment Dashboard
emoji: 🔥
colorFrom: red
colorTo: yellow
sdk: streamlit
sdk_version: "1.50.0"
app_file: app.py
pinned: false
---
# Musora Sentiment Analysis Dashboard
A Streamlit dashboard for visualising sentiment analysis results from **social media comments** (Facebook, Instagram, YouTube, Twitter) and the **Musora internal app** across brands (Drumeo, Pianote, Guitareo, Singeo, Musora).
---
## Table of Contents
1. [Project Structure](#project-structure)
2. [How Data Flows](#how-data-flows)
3. [Data Loading Strategy](#data-loading-strategy)
4. [Pages](#pages)
5. [Global Filters & Session State](#global-filters--session-state)
6. [Snowflake Queries](#snowflake-queries)
7. [Adding or Changing Things](#adding-or-changing-things)
8. [Running the App](#running-the-app)
9. [Configuration Reference](#configuration-reference)
---
## Project Structure
```
visualization/
├── app.py                        # Entry point — routing, sidebar, session state
├── config/
│   └── viz_config.json           # Colors, query strings, dashboard settings
├── data/
│   └── data_loader.py            # All Snowflake queries and caching logic
├── utils/
│   ├── data_processor.py         # Pandas aggregations (intent dist, content summary, etc.)
│   └── metrics.py                # KPI calculations (sentiment score, urgency, etc.)
├── components/
│   ├── dashboard.py              # Dashboard page renderer
│   ├── sentiment_analysis.py     # Sentiment Analysis page renderer
│   └── reply_required.py         # Reply Required page renderer
├── visualizations/
│   ├── sentiment_charts.py       # Plotly sentiment chart functions
│   ├── distribution_charts.py    # Plotly distribution / heatmap / scatter functions
│   ├── demographic_charts.py     # Plotly demographic chart functions
│   └── content_cards.py          # Streamlit card components (comment cards, content cards)
├── agents/
│   └── content_summary_agent.py  # AI analysis agent (OpenAI) for comment summarisation
├── img/
│   └── musora.png                # Sidebar logo
└── SnowFlakeConnection.py        # Snowflake connection wrapper (Snowpark session)
```
---
## How Data Flows
```
Snowflake
    │
    ▼
data_loader.py — Three separate loading modes (see below)
    │
    ├── load_dashboard_data() ──► st.session_state['dashboard_df']
    │                               ──► app.py sidebar (filter options, counts)
    │                               ──► dashboard.py (all charts)
    │
    ├── load_sa_data()        ──► st.session_state['sa_contents']
    │   (on-demand, button)       st.session_state['sa_comments']
    │                               ──► sentiment_analysis.py
    │
    └── load_reply_required_data() ──► st.session_state['rr_df']
        (on-demand, button)         ──► reply_required.py
```
**Key principle:** Data is loaded as little as possible, as late as possible.
- The **Dashboard** uses a lightweight query (no text columns, no content join) cached for 24 hours.
- The **Sentiment Analysis** and **Reply Required** pages never load data automatically — they wait for the user to click **Fetch Data**.
- All data is stored in `st.session_state` so page navigation and widget interactions do not re-trigger Snowflake queries.
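
The load-once pattern above can be sketched in plain Python. This is a minimal, hypothetical sketch: a dict stands in for `st.session_state`, and `load_dashboard_data` is a stub counting calls, not the real Snowflake loader.

```python
# Plain-dict stand-in for st.session_state, to show the "load once, reuse" idea.
calls = {'n': 0}

def load_dashboard_data():
    calls['n'] += 1  # stands in for the real Snowflake query
    return [{'brand': 'drumeo', 'sentiment_polarity': 'positive'}]

session_state = {}

def get_dashboard_df():
    # Only hit the loader if the result is not already in session state.
    if 'dashboard_df' not in session_state:
        session_state['dashboard_df'] = load_dashboard_data()
    return session_state['dashboard_df']

get_dashboard_df()
get_dashboard_df()  # second call reuses the stored result; no new query
```

In the real app, `@st.cache_data` adds a second caching layer on top of this, so even a cleared session state only re-queries Snowflake after the TTL expires.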
---
## Data Loading Strategy
All loading logic lives in **`data/data_loader.py`** (`SentimentDataLoader` class).
### `load_dashboard_data()`
- Uses `dashboard_query` from `viz_config.json`.
- Fetches only: `comment_sk, content_sk, platform, brand, sentiment_polarity, intent, requires_reply, detected_language, comment_timestamp, processed_at, author_id`.
- No text columns, no `DIM_CONTENT` join — significantly faster than the full query.
- Also merges demographics data if `demographics_query` is configured.
- Cached for **24 hours** (`@st.cache_data(ttl=86400)`).
- Called once by `app.py` at startup; result stored in `st.session_state['dashboard_df']`.
### `load_sa_data(platform, brand, top_n, min_comments, sort_by, sentiments, intents, date_range)`
- Runs **two** sequential Snowflake queries:
  1. **Content aggregation** — groups by `content_sk`, counts per sentiment, computes severity score, returns top N.
  2. **Sampled comments** — for the top N `content_sk`s only, fetches up to 50 comments per sentiment group per content (negative, positive, other), using Snowflake `QUALIFY ROW_NUMBER()`. `display_text` is computed in SQL (`CASE WHEN IS_ENGLISH = FALSE AND TRANSLATED_TEXT IS NOT NULL THEN TRANSLATED_TEXT ELSE ORIGINAL_TEXT END`).
- Returns a tuple `(contents_df, comments_df)`.
- Cached for **24 hours**.
- Called only when the user clicks **Fetch Data** on the Sentiment Analysis page.
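
The per-group sampling in step 2 behaves like a pandas `groupby(...).head(n)`. The sketch below uses toy data and a cap of 2 instead of 50, purely to illustrate the shape of the result.

```python
import pandas as pd

# Keep at most N comments per (content_sk, sentiment_group), mirroring the SQL
# QUALIFY ROW_NUMBER() OVER (PARTITION BY content_sk, sentiment_group ...) <= 50.
comments = pd.DataFrame({
    'content_sk':      [1, 1, 1, 1, 1, 2, 2, 2],
    'sentiment_group': ['negative'] * 3 + ['positive'] * 2 + ['negative'] * 3,
    'text':            list('abcdefgh'),
})
sampled = comments.groupby(['content_sk', 'sentiment_group']).head(2)
```

Doing this in Snowflake rather than pandas keeps the transferred comment set small regardless of how many comments a piece of content has.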
### `load_reply_required_data(platforms, brands, date_range)`
- Runs a single query filtering `REQUIRES_REPLY = TRUE`.
- Dynamically includes/excludes the social media table and musora table based on selected platforms.
- `display_text` computed in SQL.
- Cached for **24 hours**.
- Called only when the user clicks **Fetch Data** on the Reply Required page.
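
The `display_text` rule used by both loaders can be expressed in pandas as follows. The rows here are invented examples; the real computation happens inside the Snowflake query.

```python
import pandas as pd

# Pandas equivalent of the SQL CASE WHEN that builds display_text.
rows = pd.DataFrame({
    'IS_ENGLISH':      [True, False, False],
    'TRANSLATED_TEXT': [None, 'hello', None],
    'ORIGINAL_TEXT':   ['great video', 'hola', 'bonjour'],
})
# Use the translation only for non-English comments that actually have one.
use_translation = (~rows['IS_ENGLISH']) & rows['TRANSLATED_TEXT'].notna()
rows['display_text'] = rows['ORIGINAL_TEXT'].where(~use_translation, rows['TRANSLATED_TEXT'])
```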
### Important: SQL Column Qualification
Both the social media table (`COMMENT_SENTIMENT_FEATURES`) and the content dimension table (`DIM_CONTENT`) share column names. Any `WHERE` clause inside a query that joins these two tables **must** use the table alias prefix (e.g. `s.PLATFORM`, `s.COMMENT_TIMESTAMP`, `s.CHANNEL_NAME`) to avoid Snowflake `ambiguous column name` errors. The musora table (`MUSORA_COMMENT_SENTIMENT_FEATURES`) has no joins so unqualified column names are fine there.
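
As an illustration of the alias rule (the SELECT list and filters below are abbreviated and hypothetical, not the real query):

```python
# Abbreviated, hypothetical joined query; every column referenced in the WHERE
# clause is qualified with the alias `s` to avoid ambiguity with DIM_CONTENT.
query = """
SELECT s.COMMENT_SK, s.PLATFORM, c.PERMALINK_URL
FROM SOCIAL_MEDIA_DB.ML_FEATURES.COMMENT_SENTIMENT_FEATURES s
LEFT JOIN SOCIAL_MEDIA_DB.CORE.DIM_CONTENT c
  ON s.CONTENT_SK = c.CONTENT_SK
WHERE s.PLATFORM = 'youtube'
  AND s.COMMENT_TIMESTAMP >= '2025-01-01'
"""
```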
---
## Pages
### Dashboard (`components/dashboard.py`)
**Receives:** `filtered_df` — the lightweight dashboard dataframe (after any global filters are applied by `app.py`).
**Does not need:** text, translations, content URLs. All charts work purely on aggregated columns (sentiment_polarity, brand, platform, intent, requires_reply, comment_timestamp).
**Key sections:**
- Summary stats + health indicator
- Sentiment distribution (pie + gauge)
- Sentiment by brand and platform (stacked + percentage bar charts)
- Intent analysis
- Brand-Platform heatmap
- Reply requirements + urgency breakdown
- Demographics (age, timezone, experience level) — only rendered if `author_id` is present and demographics were merged
**To add a new chart:** create the chart function in `visualizations/` and call it from `render_dashboard()`. The function receives `filtered_df`.
---
### Sentiment Analysis (`components/sentiment_analysis.py`)
**Receives:** `data_loader` instance only (no dataframe).
**Flow:**
1. Reads `st.session_state['dashboard_df']` for filter option lists (platforms, brands, sentiments, intents).
2. Pre-populates platform/brand dropdowns from `st.session_state['global_filters']`.
3. Shows filter controls (platform, brand, sentiment, intent, top_n, min_comments, sort_by).
4. On **Fetch Data** click: calls `data_loader.load_sa_data(...)` and stores results in `st.session_state['sa_contents']` and `['sa_comments']`.
5. Renders content cards, per-content sentiment + intent charts, AI analysis buttons, and sampled comment expanders.
**Pagination:** `st.session_state['sentiment_page']` (5 contents per page). Reset on new fetch.
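
The pagination state amounts to a simple slice. This is a hypothetical helper for illustration; the real page renders cards from `st.session_state['sentiment_page']` rather than returning a list.

```python
# 5 contents per page, matching the Sentiment Analysis page.
PAGE_SIZE = 5

def page_slice(items, page):
    # `page` is 0-based; returns only the items for that page.
    start = page * PAGE_SIZE
    return items[start:start + PAGE_SIZE]

contents = list(range(12))  # e.g. 12 content cards -> pages of 5, 5, 2
```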
**Comments:** Sampled (up to 50 negative + 50 positive + 50 neutral per content). These are already in memory after the fetch — no extra query is needed when the user expands a comment section.
**AI Analysis:** Uses `ContentSummaryAgent` (see `agents/`). Results cached in `st.session_state['content_summaries']`.
---
### Reply Required (`components/reply_required.py`)
**Receives:** `data_loader` instance only.
**Flow:**
1. Reads `st.session_state['dashboard_df']` for filter option lists.
2. Pre-populates platform, brand, and date from `st.session_state['global_filters']`.
3. On **Fetch Data** click: calls `data_loader.load_reply_required_data(...)` and stores result in `st.session_state['rr_df']`.
4. Shows urgency breakdown, in-page view filters (priority, platform, brand, intent — applied in Python, no new query), paginated comment cards, and a "Reply by Content" summary.
**Pagination:** `st.session_state['reply_page']` (10 comments per page). Reset on new fetch.
---
## Global Filters & Session State
Global filters live in the sidebar (`app.py`) and are stored in `st.session_state['global_filters']` as a dict:
```python
{
'platforms': ['facebook', 'instagram'], # list or []
'brands': ['drumeo'],
'sentiments': [],
'date_range': (date(2025, 1, 1), date(2025, 12, 31)), # or None
}
```
- **Dashboard:** `app.py` applies global filters to `dashboard_df` using `data_loader.apply_filters()` and passes the result to `render_dashboard()`.
- **Sentiment Analysis / Reply Required:** global filters are used to pre-populate their own filter widgets. The actual Snowflake query uses those values when the user clicks Fetch. The pages do **not** receive a pre-filtered dataframe.
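
A minimal sketch of how `apply_filters()` might consume this dict. This is a hypothetical implementation under the "empty list means no filter" convention shown above; the real method lives in `data_loader.py`.

```python
import pandas as pd

def apply_filters(df, filters):
    # Empty lists (or missing keys) leave that dimension unfiltered.
    if filters.get('platforms'):
        df = df[df['platform'].isin(filters['platforms'])]
    if filters.get('brands'):
        df = df[df['brand'].isin(filters['brands'])]
    if filters.get('sentiments'):
        df = df[df['sentiment_polarity'].isin(filters['sentiments'])]
    return df

df = pd.DataFrame({
    'platform': ['facebook', 'youtube', 'instagram'],
    'brand': ['drumeo', 'pianote', 'drumeo'],
    'sentiment_polarity': ['positive', 'negative', 'neutral'],
})
filtered = apply_filters(df, {'platforms': ['facebook', 'instagram'],
                              'brands': ['drumeo'], 'sentiments': []})
```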
### Full session state key reference
| Key | Set by | Used by |
|-----|--------|---------|
| `dashboard_df` | `app.py` on startup | sidebar (filter options), dashboard, SA + RR (filter option lists) |
| `global_filters` | sidebar "Apply Filters" button | app.py (dashboard filter), SA + RR (pre-populate widgets) |
| `filters_applied` | sidebar buttons | app.py (whether to apply filters) |
| `sa_contents` | SA fetch button | SA page rendering |
| `sa_comments` | SA fetch button | SA page rendering |
| `sa_fetch_key` | SA fetch button | SA page (detect stale data) |
| `rr_df` | RR fetch button | RR page rendering |
| `rr_fetch_key` | RR fetch button | RR page (detect stale data) |
| `sentiment_page` | SA page / fetch | SA pagination |
| `reply_page` | RR page / fetch | RR pagination |
| `content_summaries` | AI analysis buttons | SA AI analysis display |
---
## Snowflake Queries
All query strings are either stored in `config/viz_config.json` (static queries) or built dynamically in `data/data_loader.py` (page-specific queries).
### Static queries (in `viz_config.json`)
| Key | Purpose |
|-----|---------|
| `query` | Full query with all columns (legacy, kept for compatibility) |
| `dashboard_query` | Lightweight query — no text, no DIM_CONTENT join |
| `demographics_query` | Joins `usora_users` with `preprocessed.users` to get age/timezone/experience |
### Dynamic queries (built in `data_loader.py`)
| Method | Description |
|--------|-------------|
| `_build_sa_content_query()` | Content aggregation for SA page; filters by platform + brand + date |
| `_build_sa_comments_query()` | Sampled comments for SA page; uses `QUALIFY ROW_NUMBER() <= 50` |
| `_build_rr_query()` | Reply-required comments; filters by platform/brand/date; conditionally includes social media and/or musora table |
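
The conditional include/exclude logic behind `_build_rr_query()` can be sketched like this. Table names and column lists are heavily abbreviated, and the platform handling is a simplified assumption, not the real builder.

```python
def build_rr_query(platforms):
    # Hypothetical sketch: emit a social media block and/or a musora block,
    # joined with UNION ALL, depending on the selected platforms.
    social = [p for p in platforms if p != 'musora_app']
    parts = []
    if social:
        plist = ", ".join(f"'{p}'" for p in social)
        parts.append(
            "SELECT s.COMMENT_SK, s.PLATFORM FROM COMMENT_SENTIMENT_FEATURES s "
            f"WHERE s.REQUIRES_REPLY = TRUE AND s.PLATFORM IN ({plist})"
        )
    if 'musora_app' in platforms:
        parts.append(
            "SELECT COMMENT_SK, 'musora_app' AS PLATFORM "
            "FROM MUSORA_COMMENT_SENTIMENT_FEATURES WHERE REQUIRES_REPLY = TRUE"
        )
    return "\nUNION ALL\n".join(parts)
```

Skipping a whole block (rather than filtering it to zero rows) saves Snowflake from scanning a table that cannot contribute results.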
### Data source tables
| Table | Platform | Notes |
|-------|----------|-------|
| `SOCIAL_MEDIA_DB.ML_FEATURES.COMMENT_SENTIMENT_FEATURES` | facebook, instagram, youtube, twitter | Needs `LEFT JOIN SOCIAL_MEDIA_DB.CORE.DIM_CONTENT` for `PERMALINK_URL` |
| `SOCIAL_MEDIA_DB.ML_FEATURES.MUSORA_COMMENT_SENTIMENT_FEATURES` | musora_app | Has `PERMALINK_URL` and `THUMBNAIL_URL` natively; platform stored as `'musora'`, mapped to `'musora_app'` in queries |
---
## Adding or Changing Things
### Add a new chart to the Dashboard
1. Write the chart function in the appropriate `visualizations/` file.
2. Call it from `render_dashboard()` in `components/dashboard.py`, passing `filtered_df`.
3. The chart function receives a lightweight df — it has no text columns but has all the columns listed in `dashboard_query`.
### Add a new filter to the Dashboard sidebar
1. Add the widget in `app.py` under the "Global Filters" section.
2. Store the selected value in the `global_filters` dict under `st.session_state`.
3. Pass it to `data_loader.apply_filters()`.
### Change what the Sentiment Analysis page queries
- Edit `_build_sa_content_query()` and/or `_build_sa_comments_query()` in `data_loader.py`.
- If you add new columns to the content aggregation result, also update `_process_sa_content_stats()` so they are available in `contents_df`.
- If you add new columns to the comments result, update `_process_sa_comments()`.
### Change what the Reply Required page queries
- Edit `_build_rr_query()` in `data_loader.py`.
- Remember: all column references inside the social media block (which has a `JOIN`) must be prefixed with `s.` to avoid Snowflake ambiguity errors.
### Change the cache duration
- `@st.cache_data(ttl=86400)` is set on `load_dashboard_data`, `_fetch_sa_data`, `_fetch_rr_data`, and `load_demographics_data`.
- Change `86400` (seconds) to the desired TTL, or set `ttl=None` for no expiry.
- Users can always force a refresh with the "Reload Data" button in the sidebar (which calls `st.cache_data.clear()` and deletes `st.session_state['dashboard_df']`).
### Add a new page
1. Create `components/new_page.py` with a `render_new_page(data_loader)` function.
2. Import and add a radio option in `app.py`.
3. If the page needs its own Snowflake data, add a `load_new_page_data()` method to `SentimentDataLoader` following the same pattern as `load_sa_data`.
### Add a new column to the Dashboard query
- Edit `dashboard_query` in `config/viz_config.json`.
- Both UNION branches must select the same columns in the same order.
- `_process_dashboard_dataframe()` in `data_loader.py` handles basic type casting — add processing there if needed.
---
## Running the App
```bash
# From the project root
streamlit run visualization/app.py
```
**Required environment variables** (in `.env` at project root):
```
SNOWFLAKE_USER
SNOWFLAKE_PASSWORD
SNOWFLAKE_ACCOUNT
SNOWFLAKE_ROLE
SNOWFLAKE_DATABASE
SNOWFLAKE_WAREHOUSE
SNOWFLAKE_SCHEMA
```
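
A fail-fast check over these variables could look like the sketch below. This is a hypothetical helper; the actual connection logic lives in `SnowFlakeConnection.py`.

```python
import os

# The variables listed in the README above.
REQUIRED = ['SNOWFLAKE_USER', 'SNOWFLAKE_PASSWORD', 'SNOWFLAKE_ACCOUNT',
            'SNOWFLAKE_ROLE', 'SNOWFLAKE_DATABASE', 'SNOWFLAKE_WAREHOUSE',
            'SNOWFLAKE_SCHEMA']

def snowflake_params(env=None):
    # Collect the settings, raising early if any are missing or empty.
    env = os.environ if env is None else env
    missing = [k for k in REQUIRED if not env.get(k)]
    if missing:
        raise RuntimeError(f'Missing environment variables: {missing}')
    # e.g. SNOWFLAKE_USER -> 'user', the shape typical Snowpark session
    # builders expect (an assumption, not taken from this codebase).
    return {k.replace('SNOWFLAKE_', '').lower(): env[k] for k in REQUIRED}
```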
---
## Configuration Reference
`config/viz_config.json` controls:
| Section | What it configures |
|---------|-------------------|
| `color_schemes.sentiment_polarity` | Hex colors for each sentiment level |
| `color_schemes.intent` | Hex colors for each intent label |
| `color_schemes.platform` | Hex colors for each platform |
| `color_schemes.brand` | Hex colors for each brand |
| `sentiment_order` | Display order for sentiment categories in charts |
| `intent_order` | Display order for intent categories |
| `negative_sentiments` | Which sentiment values count as "negative" |
| `dashboard.default_date_range_days` | Default date filter window (days) |
| `dashboard.max_comments_display` | Max comments shown per pagination page |
| `dashboard.chart_height` | Default Plotly chart height |
| `dashboard.top_n_contents` | Default top-N for content ranking |
| `snowflake.query` | Full query (legacy, all columns) |
| `snowflake.dashboard_query` | Lightweight dashboard query (no text columns) |
| `snowflake.demographics_query` | Demographics join query |
| `demographics.age_groups` | Age bucket definitions (label → [min, max]) |
| `demographics.experience_groups` | Experience bucket definitions |
| `demographics.top_timezones_count` | How many timezones to show in the geographic chart |
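
For instance, an `age_groups` mapping in the label → [min, max] shape could be applied to raw ages like this (the bucket labels and boundaries below are invented, not the real config values):

```python
# Hypothetical buckets in the same shape as demographics.age_groups.
age_groups = {'18-24': [18, 24], '25-34': [25, 34], '35-44': [35, 44]}

def bucket_age(age):
    # Return the label of the first bucket whose inclusive range contains `age`.
    for label, (lo, hi) in age_groups.items():
        if lo <= age <= hi:
            return label
    return 'other'
```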