| | --- |
| | title: Sabian Brand Sentiment Dashboard |
| | emoji: π₯ |
| | colorFrom: red |
| | colorTo: yellow |
| | sdk: streamlit |
| | sdk_version: "1.50.0" |
| | app_file: app.py |
| | pinned: false |
| | --- |
| | |
| |
|
| | # Musora Sentiment Analysis Dashboard |
| |
|
| | A Streamlit dashboard for visualising sentiment analysis results from **social media comments** (Facebook, Instagram, YouTube, Twitter) and the **Musora internal app** across brands (Drumeo, Pianote, Guitareo, Singeo, Musora). |
| |
|
| | --- |
| |
|
| | ## Table of Contents |
| |
|
| | 1. [Project Structure](#project-structure) |
| | 2. [How Data Flows](#how-data-flows) |
| | 3. [Data Loading Strategy](#data-loading-strategy) |
| | 4. [Pages](#pages) |
| | 5. [Global Filters & Session State](#global-filters--session-state) |
| | 6. [Snowflake Queries](#snowflake-queries) |
| | 7. [Adding or Changing Things](#adding-or-changing-things) |
| | 8. [Running the App](#running-the-app) |
| | 9. [Configuration Reference](#configuration-reference) |
| |
|
| | --- |
| |
|
| | ## Project Structure |
| |
|
| | ``` |
| | visualization/ |
| | βββ app.py # Entry point β routing, sidebar, session state |
| | βββ config/ |
| | β βββ viz_config.json # Colors, query strings, dashboard settings |
| | βββ data/ |
| | β βββ data_loader.py # All Snowflake queries and caching logic |
| | βββ utils/ |
| | β βββ data_processor.py # Pandas aggregations (intent dist, content summary, etc.) |
| | β βββ metrics.py # KPI calculations (sentiment score, urgency, etc.) |
| | βββ components/ |
| | β βββ dashboard.py # Dashboard page renderer |
| | β βββ sentiment_analysis.py # Sentiment Analysis page renderer |
| | β βββ reply_required.py # Reply Required page renderer |
| | βββ visualizations/ |
| | β βββ sentiment_charts.py # Plotly sentiment chart functions |
| | β βββ distribution_charts.py # Plotly distribution / heatmap / scatter functions |
| | β βββ demographic_charts.py # Plotly demographic chart functions |
| | β βββ content_cards.py # Streamlit card components (comment cards, content cards) |
| | βββ agents/ |
| | β βββ content_summary_agent.py # AI analysis agent (OpenAI) for comment summarisation |
| | βββ img/ |
| | β βββ musora.png # Sidebar logo |
| | βββ SnowFlakeConnection.py # Snowflake connection wrapper (Snowpark session) |
| | ``` |
| |
|
| | --- |
| |
|
| | ## How Data Flows |
| |
|
| | ``` |
| | Snowflake |
| | β |
| | βΌ |
| | data_loader.py β Three separate loading modes (see below) |
| | β |
| | βββ load_dashboard_data() βββΊ st.session_state['dashboard_df'] |
| | β βββΊ app.py sidebar (filter options, counts) |
| | β βββΊ dashboard.py (all charts) |
| | β |
| | βββ load_sa_data() βββΊ st.session_state['sa_contents'] |
| | β (on-demand, button) st.session_state['sa_comments'] |
| | β βββΊ sentiment_analysis.py |
| | β |
| | βββ load_reply_required_data() βΊ st.session_state['rr_df'] |
| | (on-demand, button) βββΊ reply_required.py |
| | ``` |
| |
|
| | **Key principle:** Data is loaded as little as possible, as late as possible. |
| |
|
| | - The **Dashboard** uses a lightweight query (no text columns, no content join) cached for 24 hours. |
| | - The **Sentiment Analysis** and **Reply Required** pages never load data automatically β they wait for the user to click **Fetch Data**. |
| | - All data is stored in `st.session_state` so page navigation and widget interactions do not re-trigger Snowflake queries. |
| |
|
| | --- |
| |
|
| | ## Data Loading Strategy |
| |
|
| | All loading logic lives in **`data/data_loader.py`** (`SentimentDataLoader` class). |
| | |
| | ### `load_dashboard_data()` |
| | - Uses `dashboard_query` from `viz_config.json`. |
| | - Fetches only: `comment_sk, content_sk, platform, brand, sentiment_polarity, intent, requires_reply, detected_language, comment_timestamp, processed_at, author_id`. |
| | - No text columns, no `DIM_CONTENT` join β significantly faster than the full query. |
| | - Also merges demographics data if `demographics_query` is configured. |
| | - Cached for **24 hours** (`@st.cache_data(ttl=86400)`). |
| | - Called once by `app.py` at startup; result stored in `st.session_state['dashboard_df']`. |
| | |
| | ### `load_sa_data(platform, brand, top_n, min_comments, sort_by, sentiments, intents, date_range)` |
| | - Runs **two** sequential Snowflake queries: |
| | 1. **Content aggregation** β groups by `content_sk`, counts per sentiment, computes severity score, returns top N. |
| | 2. **Sampled comments** β for the top N `content_sk`s only, fetches up to 50 comments per sentiment group per content (negative, positive, other), using Snowflake `QUALIFY ROW_NUMBER()`. `display_text` is computed in SQL (`CASE WHEN IS_ENGLISH = FALSE AND TRANSLATED_TEXT IS NOT NULL THEN TRANSLATED_TEXT ELSE ORIGINAL_TEXT END`). |
| | - Returns a tuple `(contents_df, comments_df)`. |
| | - Cached for **24 hours**. |
| | - Called only when the user clicks **Fetch Data** on the Sentiment Analysis page. |
| |
|
| | ### `load_reply_required_data(platforms, brands, date_range)` |
| | - Runs a single query filtering `REQUIRES_REPLY = TRUE`. |
| | - Dynamically includes/excludes the social media table and musora table based on selected platforms. |
| | - `display_text` computed in SQL. |
| | - Cached for **24 hours**. |
| | - Called only when the user clicks **Fetch Data** on the Reply Required page. |
| |
|
| | ### Important: SQL Column Qualification |
| | Both the social media table (`COMMENT_SENTIMENT_FEATURES`) and the content dimension table (`DIM_CONTENT`) share column names. Any `WHERE` clause inside a query that joins these two tables **must** use the table alias prefix (e.g. `s.PLATFORM`, `s.COMMENT_TIMESTAMP`, `s.CHANNEL_NAME`) to avoid Snowflake `ambiguous column name` errors. The musora table (`MUSORA_COMMENT_SENTIMENT_FEATURES`) has no joins so unqualified column names are fine there. |
| |
|
| | --- |
| |
|
| | ## Pages |
| |
|
| | ### Dashboard (`components/dashboard.py`) |
| |
|
| | **Receives:** `filtered_df` β the lightweight dashboard dataframe (after optional global filter applied by `app.py`). |
| |
|
| | **Does not need:** text, translations, content URLs. All charts work purely on aggregated columns (sentiment_polarity, brand, platform, intent, requires_reply, comment_timestamp). |
| | |
| | **Key sections:** |
| | - Summary stats + health indicator |
| | - Sentiment distribution (pie + gauge) |
| | - Sentiment by brand and platform (stacked + percentage bar charts) |
| | - Intent analysis |
| | - Brand-Platform heatmap |
| | - Reply requirements + urgency breakdown |
| | - Demographics (age, timezone, experience level) β only rendered if `author_id` is present and demographics were merged |
| |
|
| | **To add a new chart:** create the chart function in `visualizations/` and call it from `render_dashboard()`. The function receives `filtered_df`. |
| |
|
| | --- |
| |
|
| | ### Sentiment Analysis (`components/sentiment_analysis.py`) |
| | |
| | **Receives:** `data_loader` instance only (no dataframe). |
| |
|
| | **Flow:** |
| | 1. Reads `st.session_state['dashboard_df']` for filter option lists (platforms, brands, sentiments, intents). |
| | 2. Pre-populates platform/brand dropdowns from `st.session_state['global_filters']`. |
| | 3. Shows filter controls (platform, brand, sentiment, intent, top_n, min_comments, sort_by). |
| | 4. On **Fetch Data** click: calls `data_loader.load_sa_data(...)` and stores results in `st.session_state['sa_contents']` and `['sa_comments']`. |
| | 5. Renders content cards, per-content sentiment + intent charts, AI analysis buttons, and sampled comment expanders. |
| | |
| | **Pagination:** `st.session_state['sentiment_page']` (5 contents per page). Reset on new fetch. |
| | |
| | **Comments:** Sampled (up to 50 negative + 50 positive + 50 neutral per content). These are already in memory after the fetch β no extra query is needed when the user expands a comment section. |
| | |
| | **AI Analysis:** Uses `ContentSummaryAgent` (see `agents/`). Results cached in `st.session_state['content_summaries']`. |
| | |
| | --- |
| | |
| | ### Reply Required (`components/reply_required.py`) |
| |
|
| | **Receives:** `data_loader` instance only. |
| |
|
| | **Flow:** |
| | 1. Reads `st.session_state['dashboard_df']` for filter option lists. |
| | 2. Pre-populates platform, brand, and date from `st.session_state['global_filters']`. |
| | 3. On **Fetch Data** click: calls `data_loader.load_reply_required_data(...)` and stores result in `st.session_state['rr_df']`. |
| | 4. Shows urgency breakdown, in-page view filters (priority, platform, brand, intent β applied in Python, no new query), paginated comment cards, and a "Reply by Content" summary. |
| |
|
| | **Pagination:** `st.session_state['reply_page']` (10 comments per page). Reset on new fetch. |
| |
|
| | --- |
| |
|
| | ## Global Filters & Session State |
| |
|
| | Global filters live in the sidebar (`app.py`) and are stored in `st.session_state['global_filters']` as a dict: |
| |
|
| | ```python |
| | { |
| | 'platforms': ['facebook', 'instagram'], # list or [] |
| | 'brands': ['drumeo'], |
| | 'sentiments': [], |
| | 'date_range': (date(2025, 1, 1), date(2025, 12, 31)), # or None |
| | } |
| | ``` |
| |
|
| | - **Dashboard:** `app.py` applies global filters to `dashboard_df` using `data_loader.apply_filters()` and passes the result to `render_dashboard()`. |
| | - **Sentiment Analysis / Reply Required:** global filters are used to pre-populate their own filter widgets. The actual Snowflake query uses those values when the user clicks Fetch. The pages do **not** receive a pre-filtered dataframe. |
| |
|
| | ### Full session state key reference |
| |
|
| | | Key | Set by | Used by | |
| | |-----|--------|---------| |
| | | `dashboard_df` | `app.py` on startup | sidebar (filter options), dashboard, SA + RR (filter option lists) | |
| | | `global_filters` | sidebar "Apply Filters" button | app.py (dashboard filter), SA + RR (pre-populate widgets) | |
| | | `filters_applied` | sidebar buttons | app.py (whether to apply filters) | |
| | | `sa_contents` | SA fetch button | SA page rendering | |
| | | `sa_comments` | SA fetch button | SA page rendering | |
| | | `sa_fetch_key` | SA fetch button | SA page (detect stale data) | |
| | | `rr_df` | RR fetch button | RR page rendering | |
| | | `rr_fetch_key` | RR fetch button | RR page (detect stale data) | |
| | | `sentiment_page` | SA page / fetch | SA pagination | |
| | | `reply_page` | RR page / fetch | RR pagination | |
| | | `content_summaries` | AI analysis buttons | SA AI analysis display | |
| |
|
| | --- |
| |
|
| | ## Snowflake Queries |
| |
|
| | All query strings are either stored in `config/viz_config.json` (static queries) or built dynamically in `data/data_loader.py` (page-specific queries). |
| |
|
| | ### Static queries (in `viz_config.json`) |
| | |
| | | Key | Purpose | |
| | |-----|---------| |
| | | `query` | Full query with all columns (legacy, kept for compatibility) | |
| | | `dashboard_query` | Lightweight query β no text, no DIM_CONTENT join | |
| | | `demographics_query` | Joins `usora_users` with `preprocessed.users` to get age/timezone/experience | |
| | |
| | ### Dynamic queries (built in `data_loader.py`) |
| |
|
| | | Method | Description | |
| | |--------|-------------| |
| | | `_build_sa_content_query()` | Content aggregation for SA page; filters by platform + brand + date | |
| | | `_build_sa_comments_query()` | Sampled comments for SA page; uses `QUALIFY ROW_NUMBER() <= 50` | |
| | | `_build_rr_query()` | Reply-required comments; filters by platform/brand/date; conditionally includes social media and/or musora table | |
| |
|
| | ### Data source tables |
| |
|
| | | Table | Platform | Notes | |
| | |-------|----------|-------| |
| | | `SOCIAL_MEDIA_DB.ML_FEATURES.COMMENT_SENTIMENT_FEATURES` | facebook, instagram, youtube, twitter | Needs `LEFT JOIN SOCIAL_MEDIA_DB.CORE.DIM_CONTENT` for `PERMALINK_URL` | |
| | | `SOCIAL_MEDIA_DB.ML_FEATURES.MUSORA_COMMENT_SENTIMENT_FEATURES` | musora_app | Has `PERMALINK_URL` and `THUMBNAIL_URL` natively; platform stored as `'musora'`, mapped to `'musora_app'` in queries | |
| |
|
| | --- |
| |
|
| | ## Adding or Changing Things |
| |
|
| | ### Add a new chart to the Dashboard |
| | 1. Write the chart function in the appropriate `visualizations/` file. |
| | 2. Call it from `render_dashboard()` in `components/dashboard.py`, passing `filtered_df`. |
| | 3. The chart function receives a lightweight df β it has no text columns but has all the columns listed in `dashboard_query`. |
| |
|
| | ### Add a new filter to the Dashboard sidebar |
| | 1. Add the widget in `app.py` under the "Global Filters" section. |
| | 2. Store the selected value in the `global_filters` dict under `st.session_state`. |
| | 3. Pass it to `data_loader.apply_filters()`. |
| |
|
| | ### Change what the Sentiment Analysis page queries |
| | - Edit `_build_sa_content_query()` and/or `_build_sa_comments_query()` in `data_loader.py`. |
| | - If you add new columns to the content aggregation result, also update `_process_sa_content_stats()` so they are available in `contents_df`. |
| | - If you add new columns to the comments result, update `_process_sa_comments()`. |
| |
|
| | ### Change what the Reply Required page queries |
| | - Edit `_build_rr_query()` in `data_loader.py`. |
| | - Remember: all column references inside the social media block (which has a `JOIN`) must be prefixed with `s.` to avoid Snowflake ambiguity errors. |
| |
|
| | ### Change the cache duration |
| | - `@st.cache_data(ttl=86400)` is set on `load_dashboard_data`, `_fetch_sa_data`, `_fetch_rr_data`, and `load_demographics_data`. |
| | - Change `86400` (seconds) to the desired TTL, or set `ttl=None` for no expiry. |
| | - Users can always force a refresh with the "Reload Data" button in the sidebar (which calls `st.cache_data.clear()` and deletes `st.session_state['dashboard_df']`). |
| |
|
| | ### Add a new page |
| | 1. Create `components/new_page.py` with a `render_new_page(data_loader)` function. |
| | 2. Import and add a radio option in `app.py`. |
| | 3. If the page needs its own Snowflake data, add a `load_new_page_data()` method to `SentimentDataLoader` following the same pattern as `load_sa_data`. |
| |
|
| | ### Add a new column to the Dashboard query |
| | - Edit `dashboard_query` in `config/viz_config.json`. |
| | - Both UNION branches must select the same columns in the same order. |
| | - `_process_dashboard_dataframe()` in `data_loader.py` handles basic type casting β add processing there if needed. |
| |
|
| | --- |
| |
|
| | ## Running the App |
| |
|
| | ```bash |
| | # From the project root |
| | streamlit run visualization/app.py |
| | ``` |
| |
|
| | **Required environment variables** (in `.env` at project root): |
| |
|
| | ``` |
| | SNOWFLAKE_USER |
| | SNOWFLAKE_PASSWORD |
| | SNOWFLAKE_ACCOUNT |
| | SNOWFLAKE_ROLE |
| | SNOWFLAKE_DATABASE |
| | SNOWFLAKE_WAREHOUSE |
| | SNOWFLAKE_SCHEMA |
| | ``` |
| |
|
| | --- |
| |
|
| | ## Configuration Reference |
| |
|
| | `config/viz_config.json` controls: |
| |
|
| | | Section | What it configures | |
| | |---------|-------------------| |
| | | `color_schemes.sentiment_polarity` | Hex colors for each sentiment level | |
| | | `color_schemes.intent` | Hex colors for each intent label | |
| | | `color_schemes.platform` | Hex colors for each platform | |
| | | `color_schemes.brand` | Hex colors for each brand | |
| | | `sentiment_order` | Display order for sentiment categories in charts | |
| | | `intent_order` | Display order for intent categories | |
| | | `negative_sentiments` | Which sentiment values count as "negative" | |
| | | `dashboard.default_date_range_days` | Default date filter window (days) | |
| | | `dashboard.max_comments_display` | Max comments shown per pagination page | |
| | | `dashboard.chart_height` | Default Plotly chart height | |
| | | `dashboard.top_n_contents` | Default top-N for content ranking | |
| | | `snowflake.query` | Full query (legacy, all columns) | |
| | | `snowflake.dashboard_query` | Lightweight dashboard query (no text columns) | |
| | | `snowflake.demographics_query` | Demographics join query | |
| | | `demographics.age_groups` | Age bucket definitions (label β [min, max]) | |
| | | `demographics.experience_groups` | Experience bucket definitions | |
| | | `demographics.top_timezones_count` | How many timezones to show in the geographic chart | |