Musora Sentiment Analysis Dashboard

A Streamlit dashboard for visualising sentiment analysis results from social media comments (Facebook, Instagram, YouTube, Twitter) and the Musora internal app across brands (Drumeo, Pianote, Guitareo, Singeo, Musora).


Table of Contents

  1. Project Structure
  2. How Data Flows
  3. Data Loading Strategy
  4. Pages
  5. Global Filters & Session State
  6. Snowflake Queries
  7. Adding or Changing Things
  8. Running the App
  9. Configuration Reference

Project Structure

visualization/
├── app.py                          # Entry point: routing, sidebar, session state
├── config/
│   └── viz_config.json             # Colors, query strings, dashboard settings
├── data/
│   └── data_loader.py              # All Snowflake queries and caching logic
├── utils/
│   ├── data_processor.py           # Pandas aggregations (intent dist, content summary, etc.)
│   └── metrics.py                  # KPI calculations (sentiment score, urgency, etc.)
├── components/
│   ├── dashboard.py                # Dashboard page renderer
│   ├── sentiment_analysis.py       # Sentiment Analysis page renderer
│   └── reply_required.py           # Reply Required page renderer
├── visualizations/
│   ├── sentiment_charts.py         # Plotly sentiment chart functions
│   ├── distribution_charts.py      # Plotly distribution / heatmap / scatter functions
│   ├── demographic_charts.py       # Plotly demographic chart functions
│   └── content_cards.py            # Streamlit card components (comment cards, content cards)
├── agents/
│   └── content_summary_agent.py    # AI analysis agent (OpenAI) for comment summarisation
├── img/
│   └── musora.png                  # Sidebar logo
└── SnowFlakeConnection.py          # Snowflake connection wrapper (Snowpark session)

How Data Flows

Snowflake
    │
    ▼
data_loader.py          ← Three separate loading modes (see below)
    │
    ├── load_dashboard_data()   ──► st.session_state['dashboard_df']
    │                                   ├─► app.py sidebar (filter options, counts)
    │                                   └─► dashboard.py (all charts)
    │
    ├── load_sa_data()          ──► st.session_state['sa_contents']
    │   (on-demand, button)          st.session_state['sa_comments']
    │                                   └─► sentiment_analysis.py
    │
    └── load_reply_required_data() ──► st.session_state['rr_df']
        (on-demand, button)             └─► reply_required.py

Key principle: Data is loaded as little as possible, as late as possible.

  • The Dashboard uses a lightweight query (no text columns, no content join) cached for 24 hours.
  • The Sentiment Analysis and Reply Required pages never load data automatically; they wait for the user to click Fetch Data.
  • All data is stored in st.session_state so page navigation and widget interactions do not re-trigger Snowflake queries.
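
The "load once, reuse" principle can be sketched with a plain dict standing in for st.session_state (names here are illustrative, not the actual app code):

```python
# Illustrative sketch: st.session_state modeled as a plain dict so the pattern
# runs without Streamlit. The loader runs only on first access; later reruns
# (page switches, widget interactions) reuse the stored result.
calls = {"n": 0}

def load_dashboard_data():
    calls["n"] += 1  # stands in for an expensive Snowflake query
    return [{"platform": "facebook", "sentiment_polarity": "positive"}]

session_state = {}  # stand-in for st.session_state

def get_dashboard_df():
    if "dashboard_df" not in session_state:
        session_state["dashboard_df"] = load_dashboard_data()
    return session_state["dashboard_df"]

get_dashboard_df()
get_dashboard_df()  # second call: no new "query"
```

In the real app, Streamlit reruns the whole script on every interaction, which is exactly why the guard on the session-state key matters.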

Data Loading Strategy

All loading logic lives in data/data_loader.py (SentimentDataLoader class).

load_dashboard_data()

  • Uses dashboard_query from viz_config.json.
  • Fetches only: comment_sk, content_sk, platform, brand, sentiment_polarity, intent, requires_reply, detected_language, comment_timestamp, processed_at, author_id.
  • No text columns and no DIM_CONTENT join, so it is significantly faster than the full query.
  • Also merges demographics data if demographics_query is configured.
  • Cached for 24 hours (@st.cache_data(ttl=86400)).
  • Called once by app.py at startup; result stored in st.session_state['dashboard_df'].
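
For intuition, @st.cache_data(ttl=86400) behaves roughly like a memoising decorator that expires entries after the TTL. A stdlib-only sketch (not Streamlit's actual implementation, which additionally hashes arguments and serialises results):

```python
import time
import functools

def cache_with_ttl(ttl_seconds):
    """Rough analogue of @st.cache_data(ttl=...): memoise per arguments, expire after TTL."""
    def decorator(fn):
        store = {}
        @functools.wraps(fn)
        def wrapper(*args):
            now = time.time()
            hit = store.get(args)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]          # fresh cached result
            result = fn(*args)
            store[args] = (now, result)
            return result
        return wrapper
    return decorator

calls = {"n": 0}

@cache_with_ttl(ttl_seconds=86400)
def load_dashboard_data(brand):
    calls["n"] += 1                    # stands in for the Snowflake round trip
    return f"rows for {brand}"

load_dashboard_data("drumeo")
load_dashboard_data("drumeo")          # served from cache within the TTL
```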

load_sa_data(platform, brand, top_n, min_comments, sort_by, sentiments, intents, date_range)

  • Runs two sequential Snowflake queries:
    1. Content aggregation: groups by content_sk, counts comments per sentiment, computes a severity score, and returns the top N.
    2. Sampled comments: for the top N content_sks only, fetches up to 50 comments per sentiment group (negative, positive, other) per content, using Snowflake's QUALIFY ROW_NUMBER(). display_text is computed in SQL (CASE WHEN IS_ENGLISH = FALSE AND TRANSLATED_TEXT IS NOT NULL THEN TRANSLATED_TEXT ELSE ORIGINAL_TEXT END).
  • Returns a tuple (contents_df, comments_df).
  • Cached for 24 hours.
  • Called only when the user clicks Fetch Data on the Sentiment Analysis page.
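
The per-sentiment sampling in step 2 relies on Snowflake's QUALIFY clause. A simplified sketch of the query shape (the real query is built in _build_sa_comments_query(); table and column names here are abbreviated and partly assumed):

```python
def build_sampled_comments_query(content_sks, per_group=50):
    """Illustrative only: a simplified shape of the sampled-comments query."""
    sk_list = ", ".join(str(sk) for sk in content_sks)
    return f"""
SELECT s.CONTENT_SK,
       s.SENTIMENT_POLARITY,
       CASE WHEN s.IS_ENGLISH = FALSE AND s.TRANSLATED_TEXT IS NOT NULL
            THEN s.TRANSLATED_TEXT ELSE s.ORIGINAL_TEXT END AS DISPLAY_TEXT
FROM COMMENT_SENTIMENT_FEATURES s
WHERE s.CONTENT_SK IN ({sk_list})
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY s.CONTENT_SK, s.SENTIMENT_POLARITY
    ORDER BY s.COMMENT_TIMESTAMP DESC
) <= {per_group}
"""

query = build_sampled_comments_query([101, 102])
```

QUALIFY filters on the window-function result directly, which is what keeps the sample at no more than 50 rows per (content, sentiment-group) pair without a second pass.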

load_reply_required_data(platforms, brands, date_range)

  • Runs a single query filtering REQUIRES_REPLY = TRUE.
  • Dynamically includes/excludes the social media table and musora table based on selected platforms.
  • display_text computed in SQL.
  • Cached for 24 hours.
  • Called only when the user clicks Fetch Data on the Reply Required page.

Important: SQL Column Qualification

Both the social media table (COMMENT_SENTIMENT_FEATURES) and the content dimension table (DIM_CONTENT) share column names. Any WHERE clause inside a query that joins these two tables must use the table alias prefix (e.g. s.PLATFORM, s.COMMENT_TIMESTAMP, s.CHANNEL_NAME) to avoid Snowflake ambiguous-column-name errors. The musora table (MUSORA_COMMENT_SENTIMENT_FEATURES) has no joins, so unqualified column names are fine there.
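
As a simplified illustration (not the production query, and the join key is assumed), the joined social-media query follows this shape:

```python
# Simplified illustration of the join described above. Every shared column in
# WHERE carries the alias 's'; an unqualified PLATFORM here would raise an
# ambiguous-column-name error because DIM_CONTENT defines it too.
joined_query = """
SELECT s.COMMENT_SK, s.PLATFORM, c.PERMALINK_URL
FROM SOCIAL_MEDIA_DB.ML_FEATURES.COMMENT_SENTIMENT_FEATURES s
LEFT JOIN SOCIAL_MEDIA_DB.CORE.DIM_CONTENT c
  ON s.CONTENT_SK = c.CONTENT_SK
WHERE s.PLATFORM = 'youtube'
  AND s.COMMENT_TIMESTAMP >= '2025-01-01'
"""
```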


Pages

Dashboard (components/dashboard.py)

Receives: filtered_df, the lightweight dashboard dataframe (after any global filters have been applied by app.py).

Does not need: text, translations, content URLs. All charts work purely on aggregated columns (sentiment_polarity, brand, platform, intent, requires_reply, comment_timestamp).

Key sections:

  • Summary stats + health indicator
  • Sentiment distribution (pie + gauge)
  • Sentiment by brand and platform (stacked + percentage bar charts)
  • Intent analysis
  • Brand-Platform heatmap
  • Reply requirements + urgency breakdown
  • Demographics (age, timezone, experience level), rendered only if author_id is present and demographics were merged

To add a new chart: create the chart function in visualizations/ and call it from render_dashboard(). The function receives filtered_df.


Sentiment Analysis (components/sentiment_analysis.py)

Receives: data_loader instance only (no dataframe).

Flow:

  1. Reads st.session_state['dashboard_df'] for filter option lists (platforms, brands, sentiments, intents).
  2. Pre-populates platform/brand dropdowns from st.session_state['global_filters'].
  3. Shows filter controls (platform, brand, sentiment, intent, top_n, min_comments, sort_by).
  4. On Fetch Data click: calls data_loader.load_sa_data(...) and stores results in st.session_state['sa_contents'] and ['sa_comments'].
  5. Renders content cards, per-content sentiment + intent charts, AI analysis buttons, and sampled comment expanders.

Pagination: st.session_state['sentiment_page'] (5 contents per page). Reset on new fetch.
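
The page arithmetic is a plain list slice; an illustrative sketch with stand-in data:

```python
# Illustrative pagination: 5 contents per page, with the page index held in
# st.session_state['sentiment_page'] (modeled as a plain int here).
PAGE_SIZE = 5
contents = [f"content_{i}" for i in range(12)]  # stand-in for the fetched rows

def page_slice(items, page, size=PAGE_SIZE):
    return items[page * size:(page + 1) * size]

first_page = page_slice(contents, 0)
second_page = page_slice(contents, 1)
last_page = page_slice(contents, 2)  # shorter final page
```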

Comments: Sampled (up to 50 negative + 50 positive + 50 neutral per content). These are already in memory after the fetch β€” no extra query is needed when the user expands a comment section.
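
Because the sampled comments already sit in session state, expanding a section is a pure in-memory lookup. A sketch (field names assumed for illustration):

```python
from collections import defaultdict

# Assumed (illustrative) shape of rows in st.session_state['sa_comments'].
sa_comments = [
    {"content_sk": 1, "sentiment_group": "negative", "display_text": "Too fast for beginners"},
    {"content_sk": 1, "sentiment_group": "positive", "display_text": "Loved this lesson"},
    {"content_sk": 2, "sentiment_group": "negative", "display_text": "Audio is out of sync"},
]

# Group once after the fetch; expanders then read from this dict, no new query.
comments_by_group = defaultdict(list)
for row in sa_comments:
    comments_by_group[(row["content_sk"], row["sentiment_group"])].append(row["display_text"])
```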

AI Analysis: Uses ContentSummaryAgent (see agents/). Results cached in st.session_state['content_summaries'].


Reply Required (components/reply_required.py)

Receives: data_loader instance only.

Flow:

  1. Reads st.session_state['dashboard_df'] for filter option lists.
  2. Pre-populates platform, brand, and date from st.session_state['global_filters'].
  3. On Fetch Data click: calls data_loader.load_reply_required_data(...) and stores result in st.session_state['rr_df'].
  4. Shows urgency breakdown, in-page view filters (priority, platform, brand, intent; applied in Python, no new query), paginated comment cards, and a "Reply by Content" summary.

Pagination: st.session_state['reply_page'] (10 comments per page). Reset on new fetch.


Global Filters & Session State

Global filters live in the sidebar (app.py) and are stored in st.session_state['global_filters'] as a dict:

{
    'platforms':  ['facebook', 'instagram'],   # list or []
    'brands':     ['drumeo'],
    'sentiments': [],
    'date_range': (date(2025, 1, 1), date(2025, 12, 31)),  # or None
}
  • Dashboard: app.py applies global filters to dashboard_df using data_loader.apply_filters() and passes the result to render_dashboard().
  • Sentiment Analysis / Reply Required: global filters are used to pre-populate their own filter widgets. The actual Snowflake query uses those values when the user clicks Fetch. The pages do not receive a pre-filtered dataframe.
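
A minimal sketch of how apply_filters() can interpret this dict, operating on plain dict rows for illustration (the real implementation works on the pandas dataframe, and this sketch ignores the sentiments key):

```python
from datetime import date

def apply_filters(rows, filters):
    # Illustrative stand-in for data_loader.apply_filters().
    # An empty list or None means "no filtering on that field".
    out = []
    for r in rows:
        if filters["platforms"] and r["platform"] not in filters["platforms"]:
            continue
        if filters["brands"] and r["brand"] not in filters["brands"]:
            continue
        if filters["date_range"] is not None:
            start, end = filters["date_range"]
            if not (start <= r["comment_date"] <= end):
                continue
        out.append(r)
    return out

rows = [
    {"platform": "facebook", "brand": "drumeo", "comment_date": date(2025, 3, 1)},
    {"platform": "youtube", "brand": "pianote", "comment_date": date(2025, 3, 2)},
]
filters = {"platforms": ["facebook"], "brands": [], "sentiments": [],
           "date_range": (date(2025, 1, 1), date(2025, 12, 31))}
filtered = apply_filters(rows, filters)
```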

Full session state key reference

| Key | Set by | Used by |
| --- | --- | --- |
| dashboard_df | app.py on startup | sidebar (filter options), dashboard, SA + RR (filter option lists) |
| global_filters | sidebar "Apply Filters" button | app.py (dashboard filter), SA + RR (pre-populate widgets) |
| filters_applied | sidebar buttons | app.py (whether to apply filters) |
| sa_contents | SA fetch button | SA page rendering |
| sa_comments | SA fetch button | SA page rendering |
| sa_fetch_key | SA fetch button | SA page (detect stale data) |
| rr_df | RR fetch button | RR page rendering |
| rr_fetch_key | RR fetch button | RR page (detect stale data) |
| sentiment_page | SA page / fetch | SA pagination |
| reply_page | RR page / fetch | RR pagination |
| content_summaries | AI analysis buttons | SA AI analysis display |

Snowflake Queries

All query strings are either stored in config/viz_config.json (static queries) or built dynamically in data/data_loader.py (page-specific queries).

Static queries (in viz_config.json)

| Key | Purpose |
| --- | --- |
| query | Full query with all columns (legacy, kept for compatibility) |
| dashboard_query | Lightweight query: no text columns, no DIM_CONTENT join |
| demographics_query | Joins usora_users with preprocessed.users to get age/timezone/experience |

Dynamic queries (built in data_loader.py)

| Method | Description |
| --- | --- |
| _build_sa_content_query() | Content aggregation for SA page; filters by platform + brand + date |
| _build_sa_comments_query() | Sampled comments for SA page; uses QUALIFY ROW_NUMBER() <= 50 |
| _build_rr_query() | Reply-required comments; filters by platform/brand/date; conditionally includes the social media and/or musora table |

Data source tables

| Table | Platform | Notes |
| --- | --- | --- |
| SOCIAL_MEDIA_DB.ML_FEATURES.COMMENT_SENTIMENT_FEATURES | facebook, instagram, youtube, twitter | Needs LEFT JOIN SOCIAL_MEDIA_DB.CORE.DIM_CONTENT for PERMALINK_URL |
| SOCIAL_MEDIA_DB.ML_FEATURES.MUSORA_COMMENT_SENTIMENT_FEATURES | musora_app | Has PERMALINK_URL and THUMBNAIL_URL natively; platform stored as 'musora', mapped to 'musora_app' in queries |

Adding or Changing Things

Add a new chart to the Dashboard

  1. Write the chart function in the appropriate visualizations/ file.
  2. Call it from render_dashboard() in components/dashboard.py, passing filtered_df.
  3. The chart function receives a lightweight df: it has no text columns, but it has all the columns listed in dashboard_query.

Add a new filter to the Dashboard sidebar

  1. Add the widget in app.py under the "Global Filters" section.
  2. Store the selected value in the global_filters dict under st.session_state.
  3. Pass it to data_loader.apply_filters().

Change what the Sentiment Analysis page queries

  • Edit _build_sa_content_query() and/or _build_sa_comments_query() in data_loader.py.
  • If you add new columns to the content aggregation result, also update _process_sa_content_stats() so they are available in contents_df.
  • If you add new columns to the comments result, update _process_sa_comments().

Change what the Reply Required page queries

  • Edit _build_rr_query() in data_loader.py.
  • Remember: all column references inside the social media block (which has a JOIN) must be prefixed with s. to avoid Snowflake ambiguity errors.

Change the cache duration

  • @st.cache_data(ttl=86400) is set on load_dashboard_data, _fetch_sa_data, _fetch_rr_data, and load_demographics_data.
  • Change 86400 (seconds) to the desired TTL, or set ttl=None for no expiry.
  • Users can always force a refresh with the "Reload Data" button in the sidebar (which calls st.cache_data.clear() and deletes st.session_state['dashboard_df']).

Add a new page

  1. Create components/new_page.py with a render_new_page(data_loader) function.
  2. Import and add a radio option in app.py.
  3. If the page needs its own Snowflake data, add a load_new_page_data() method to SentimentDataLoader following the same pattern as load_sa_data.

Add a new column to the Dashboard query

  • Edit dashboard_query in config/viz_config.json.
  • Both UNION branches must select the same columns in the same order.
  • _process_dashboard_dataframe() in data_loader.py handles basic type casting; add processing there if needed.

Running the App

# From the project root
streamlit run visualization/app.py

Required environment variables (in .env at project root):

SNOWFLAKE_USER
SNOWFLAKE_PASSWORD
SNOWFLAKE_ACCOUNT
SNOWFLAKE_ROLE
SNOWFLAKE_DATABASE
SNOWFLAKE_WAREHOUSE
SNOWFLAKE_SCHEMA

Configuration Reference

config/viz_config.json controls:

| Section | What it configures |
| --- | --- |
| color_schemes.sentiment_polarity | Hex colors for each sentiment level |
| color_schemes.intent | Hex colors for each intent label |
| color_schemes.platform | Hex colors for each platform |
| color_schemes.brand | Hex colors for each brand |
| sentiment_order | Display order for sentiment categories in charts |
| intent_order | Display order for intent categories |
| negative_sentiments | Which sentiment values count as "negative" |
| dashboard.default_date_range_days | Default date filter window (days) |
| dashboard.max_comments_display | Max comments shown per pagination page |
| dashboard.chart_height | Default Plotly chart height |
| dashboard.top_n_contents | Default top-N for content ranking |
| snowflake.query | Full query (legacy, all columns) |
| snowflake.dashboard_query | Lightweight dashboard query (no text columns) |
| snowflake.demographics_query | Demographics join query |
| demographics.age_groups | Age bucket definitions (label → [min, max]) |
| demographics.experience_groups | Experience bucket definitions |
| demographics.top_timezones_count | How many timezones to show in the geographic chart |