A newer version of the Streamlit SDK is available:
1.55.0
Musora Sentiment Analysis Dashboard
A Streamlit dashboard for visualising sentiment analysis results from social media comments (Facebook, Instagram, YouTube, Twitter) and the Musora internal app across brands (Drumeo, Pianote, Guitareo, Singeo, Musora).
Table of Contents
- Project Structure
- How Data Flows
- Data Loading Strategy
- Pages
- Global Filters & Session State
- Snowflake Queries
- Adding or Changing Things
- Running the App
- Configuration Reference
Project Structure
visualization/
βββ app.py # Entry point β routing, sidebar, session state
βββ config/
β βββ viz_config.json # Colors, query strings, dashboard settings
βββ data/
β βββ data_loader.py # All Snowflake queries and caching logic
βββ utils/
β βββ data_processor.py # Pandas aggregations (intent dist, content summary, etc.)
β βββ metrics.py # KPI calculations (sentiment score, urgency, etc.)
βββ components/
β βββ dashboard.py # Dashboard page renderer
β βββ sentiment_analysis.py # Sentiment Analysis page renderer
β βββ reply_required.py # Reply Required page renderer
βββ visualizations/
β βββ sentiment_charts.py # Plotly sentiment chart functions
β βββ distribution_charts.py # Plotly distribution / heatmap / scatter functions
β βββ demographic_charts.py # Plotly demographic chart functions
β βββ content_cards.py # Streamlit card components (comment cards, content cards)
βββ agents/
β βββ content_summary_agent.py # AI analysis agent (OpenAI) for comment summarisation
βββ img/
β βββ musora.png # Sidebar logo
βββ SnowFlakeConnection.py # Snowflake connection wrapper (Snowpark session)
How Data Flows
Snowflake
β
βΌ
data_loader.py β Three separate loading modes (see below)
β
βββ load_dashboard_data() βββΊ st.session_state['dashboard_df']
β βββΊ app.py sidebar (filter options, counts)
β βββΊ dashboard.py (all charts)
β
βββ load_sa_data() βββΊ st.session_state['sa_contents']
β (on-demand, button) st.session_state['sa_comments']
β βββΊ sentiment_analysis.py
β
βββ load_reply_required_data() βΊ st.session_state['rr_df']
(on-demand, button) βββΊ reply_required.py
Key principle: Data is loaded as little as possible, as late as possible.
- The Dashboard uses a lightweight query (no text columns, no content join) cached for 24 hours.
- The Sentiment Analysis and Reply Required pages never load data automatically β they wait for the user to click Fetch Data.
- All data is stored in
st.session_stateso page navigation and widget interactions do not re-trigger Snowflake queries.
Data Loading Strategy
All loading logic lives in data/data_loader.py (SentimentDataLoader class).
load_dashboard_data()
- Uses
dashboard_queryfromviz_config.json. - Fetches only:
comment_sk, content_sk, platform, brand, sentiment_polarity, intent, requires_reply, detected_language, comment_timestamp, processed_at, author_id. - No text columns, no
DIM_CONTENTjoin β significantly faster than the full query. - Also merges demographics data if
demographics_queryis configured. - Cached for 24 hours (
@st.cache_data(ttl=86400)). - Called once by
app.pyat startup; result stored inst.session_state['dashboard_df'].
load_sa_data(platform, brand, top_n, min_comments, sort_by, sentiments, intents, date_range)
- Runs two sequential Snowflake queries:
- Content aggregation β groups by
content_sk, counts per sentiment, computes severity score, returns top N. - Sampled comments β for the top N
content_sks only, fetches up to 50 comments per sentiment group per content (negative, positive, other), using SnowflakeQUALIFY ROW_NUMBER().display_textis computed in SQL (CASE WHEN IS_ENGLISH = FALSE AND TRANSLATED_TEXT IS NOT NULL THEN TRANSLATED_TEXT ELSE ORIGINAL_TEXT END).
- Content aggregation β groups by
- Returns a tuple
(contents_df, comments_df). - Cached for 24 hours.
- Called only when the user clicks Fetch Data on the Sentiment Analysis page.
load_reply_required_data(platforms, brands, date_range)
- Runs a single query filtering
REQUIRES_REPLY = TRUE. - Dynamically includes/excludes the social media table and musora table based on selected platforms.
display_textcomputed in SQL.- Cached for 24 hours.
- Called only when the user clicks Fetch Data on the Reply Required page.
Important: SQL Column Qualification
Both the social media table (COMMENT_SENTIMENT_FEATURES) and the content dimension table (DIM_CONTENT) share column names. Any WHERE clause inside a query that joins these two tables must use the table alias prefix (e.g. s.PLATFORM, s.COMMENT_TIMESTAMP, s.CHANNEL_NAME) to avoid Snowflake ambiguous column name errors. The musora table (MUSORA_COMMENT_SENTIMENT_FEATURES) has no joins so unqualified column names are fine there.
Pages
Dashboard (components/dashboard.py)
Receives: filtered_df β the lightweight dashboard dataframe (after optional global filter applied by app.py).
Does not need: text, translations, content URLs. All charts work purely on aggregated columns (sentiment_polarity, brand, platform, intent, requires_reply, comment_timestamp).
Key sections:
- Summary stats + health indicator
- Sentiment distribution (pie + gauge)
- Sentiment by brand and platform (stacked + percentage bar charts)
- Intent analysis
- Brand-Platform heatmap
- Reply requirements + urgency breakdown
- Demographics (age, timezone, experience level) β only rendered if
author_idis present and demographics were merged
To add a new chart: create the chart function in visualizations/ and call it from render_dashboard(). The function receives filtered_df.
Sentiment Analysis (components/sentiment_analysis.py)
Receives: data_loader instance only (no dataframe).
Flow:
- Reads
st.session_state['dashboard_df']for filter option lists (platforms, brands, sentiments, intents). - Pre-populates platform/brand dropdowns from
st.session_state['global_filters']. - Shows filter controls (platform, brand, sentiment, intent, top_n, min_comments, sort_by).
- On Fetch Data click: calls
data_loader.load_sa_data(...)and stores results inst.session_state['sa_contents']and['sa_comments']. - Renders content cards, per-content sentiment + intent charts, AI analysis buttons, and sampled comment expanders.
Pagination: st.session_state['sentiment_page'] (5 contents per page). Reset on new fetch.
Comments: Sampled (up to 50 negative + 50 positive + 50 neutral per content). These are already in memory after the fetch β no extra query is needed when the user expands a comment section.
AI Analysis: Uses ContentSummaryAgent (see agents/). Results cached in st.session_state['content_summaries'].
Reply Required (components/reply_required.py)
Receives: data_loader instance only.
Flow:
- Reads
st.session_state['dashboard_df']for filter option lists. - Pre-populates platform, brand, and date from
st.session_state['global_filters']. - On Fetch Data click: calls
data_loader.load_reply_required_data(...)and stores result inst.session_state['rr_df']. - Shows urgency breakdown, in-page view filters (priority, platform, brand, intent β applied in Python, no new query), paginated comment cards, and a "Reply by Content" summary.
Pagination: st.session_state['reply_page'] (10 comments per page). Reset on new fetch.
Global Filters & Session State
Global filters live in the sidebar (app.py) and are stored in st.session_state['global_filters'] as a dict:
{
'platforms': ['facebook', 'instagram'], # list or []
'brands': ['drumeo'],
'sentiments': [],
'date_range': (date(2025, 1, 1), date(2025, 12, 31)), # or None
}
- Dashboard:
app.pyapplies global filters todashboard_dfusingdata_loader.apply_filters()and passes the result torender_dashboard(). - Sentiment Analysis / Reply Required: global filters are used to pre-populate their own filter widgets. The actual Snowflake query uses those values when the user clicks Fetch. The pages do not receive a pre-filtered dataframe.
Full session state key reference
| Key | Set by | Used by |
|---|---|---|
dashboard_df |
app.py on startup |
sidebar (filter options), dashboard, SA + RR (filter option lists) |
global_filters |
sidebar "Apply Filters" button | app.py (dashboard filter), SA + RR (pre-populate widgets) |
filters_applied |
sidebar buttons | app.py (whether to apply filters) |
sa_contents |
SA fetch button | SA page rendering |
sa_comments |
SA fetch button | SA page rendering |
sa_fetch_key |
SA fetch button | SA page (detect stale data) |
rr_df |
RR fetch button | RR page rendering |
rr_fetch_key |
RR fetch button | RR page (detect stale data) |
sentiment_page |
SA page / fetch | SA pagination |
reply_page |
RR page / fetch | RR pagination |
content_summaries |
AI analysis buttons | SA AI analysis display |
Snowflake Queries
All query strings are either stored in config/viz_config.json (static queries) or built dynamically in data/data_loader.py (page-specific queries).
Static queries (in viz_config.json)
| Key | Purpose |
|---|---|
query |
Full query with all columns (legacy, kept for compatibility) |
dashboard_query |
Lightweight query β no text, no DIM_CONTENT join |
demographics_query |
Joins usora_users with preprocessed.users to get age/timezone/experience |
Dynamic queries (built in data_loader.py)
| Method | Description |
|---|---|
_build_sa_content_query() |
Content aggregation for SA page; filters by platform + brand + date |
_build_sa_comments_query() |
Sampled comments for SA page; uses QUALIFY ROW_NUMBER() <= 50 |
_build_rr_query() |
Reply-required comments; filters by platform/brand/date; conditionally includes social media and/or musora table |
Data source tables
| Table | Platform | Notes |
|---|---|---|
SOCIAL_MEDIA_DB.ML_FEATURES.COMMENT_SENTIMENT_FEATURES |
facebook, instagram, youtube, twitter | Needs LEFT JOIN SOCIAL_MEDIA_DB.CORE.DIM_CONTENT for PERMALINK_URL |
SOCIAL_MEDIA_DB.ML_FEATURES.MUSORA_COMMENT_SENTIMENT_FEATURES |
musora_app | Has PERMALINK_URL and THUMBNAIL_URL natively; platform stored as 'musora', mapped to 'musora_app' in queries |
Adding or Changing Things
Add a new chart to the Dashboard
- Write the chart function in the appropriate
visualizations/file. - Call it from
render_dashboard()incomponents/dashboard.py, passingfiltered_df. - The chart function receives a lightweight df β it has no text columns but has all the columns listed in
dashboard_query.
Add a new filter to the Dashboard sidebar
- Add the widget in
app.pyunder the "Global Filters" section. - Store the selected value in the
global_filtersdict underst.session_state. - Pass it to
data_loader.apply_filters().
Change what the Sentiment Analysis page queries
- Edit
_build_sa_content_query()and/or_build_sa_comments_query()indata_loader.py. - If you add new columns to the content aggregation result, also update
_process_sa_content_stats()so they are available incontents_df. - If you add new columns to the comments result, update
_process_sa_comments().
Change what the Reply Required page queries
- Edit
_build_rr_query()indata_loader.py. - Remember: all column references inside the social media block (which has a
JOIN) must be prefixed withs.to avoid Snowflake ambiguity errors.
Change the cache duration
@st.cache_data(ttl=86400)is set onload_dashboard_data,_fetch_sa_data,_fetch_rr_data, andload_demographics_data.- Change
86400(seconds) to the desired TTL, or setttl=Nonefor no expiry. - Users can always force a refresh with the "Reload Data" button in the sidebar (which calls
st.cache_data.clear()and deletesst.session_state['dashboard_df']).
Add a new page
- Create
components/new_page.pywith arender_new_page(data_loader)function. - Import and add a radio option in
app.py. - If the page needs its own Snowflake data, add a
load_new_page_data()method toSentimentDataLoaderfollowing the same pattern asload_sa_data.
Add a new column to the Dashboard query
- Edit
dashboard_queryinconfig/viz_config.json. - Both UNION branches must select the same columns in the same order.
_process_dashboard_dataframe()indata_loader.pyhandles basic type casting β add processing there if needed.
Running the App
# From the project root
streamlit run visualization/app.py
Required environment variables (in .env at project root):
SNOWFLAKE_USER
SNOWFLAKE_PASSWORD
SNOWFLAKE_ACCOUNT
SNOWFLAKE_ROLE
SNOWFLAKE_DATABASE
SNOWFLAKE_WAREHOUSE
SNOWFLAKE_SCHEMA
Configuration Reference
config/viz_config.json controls:
| Section | What it configures |
|---|---|
color_schemes.sentiment_polarity |
Hex colors for each sentiment level |
color_schemes.intent |
Hex colors for each intent label |
color_schemes.platform |
Hex colors for each platform |
color_schemes.brand |
Hex colors for each brand |
sentiment_order |
Display order for sentiment categories in charts |
intent_order |
Display order for intent categories |
negative_sentiments |
Which sentiment values count as "negative" |
dashboard.default_date_range_days |
Default date filter window (days) |
dashboard.max_comments_display |
Max comments shown per pagination page |
dashboard.chart_height |
Default Plotly chart height |
dashboard.top_n_contents |
Default top-N for content ranking |
snowflake.query |
Full query (legacy, all columns) |
snowflake.dashboard_query |
Lightweight dashboard query (no text columns) |
snowflake.demographics_query |
Demographics join query |
demographics.age_groups |
Age bucket definitions (label β [min, max]) |
demographics.experience_groups |
Experience bucket definitions |
demographics.top_timezones_count |
How many timezones to show in the geographic chart |