Musora Sentiment Analysis Dashboard

A Streamlit dashboard for visualising sentiment analysis results from social media comments (Facebook, Instagram, YouTube, Twitter) and the Musora internal app across brands (Drumeo, Pianote, Guitareo, Singeo, Musora).


Table of Contents

  1. Project Structure
  2. How Data Flows
  3. Data Loading Strategy
  4. Pages
  5. Global Filters & Session State
  6. Snowflake Queries
  7. Adding or Changing Things
  8. Running the App
  9. Configuration Reference

Project Structure

visualization/
├── app.py                          # Entry point: routing, sidebar, session state
├── config/
│   └── viz_config.json             # Colors, query strings, dashboard settings
├── data/
│   └── data_loader.py              # All Snowflake queries and caching logic
├── utils/
│   ├── data_processor.py           # Pandas aggregations (intent dist, content summary, etc.)
│   └── metrics.py                  # KPI calculations (sentiment score, urgency, etc.)
├── components/
│   ├── dashboard.py                # Dashboard page renderer
│   ├── sentiment_analysis.py       # Sentiment Analysis page renderer
│   └── reply_required.py           # Reply Required page renderer
├── visualizations/
│   ├── sentiment_charts.py         # Plotly sentiment chart functions
│   ├── distribution_charts.py      # Plotly distribution / heatmap / scatter functions
│   ├── demographic_charts.py       # Plotly demographic chart functions
│   └── content_cards.py            # Streamlit card components (comment cards, content cards)
├── agents/
│   └── content_summary_agent.py    # AI analysis agent (OpenAI) for comment summarisation
├── img/
│   └── musora.png                  # Sidebar logo
└── SnowFlakeConnection.py          # Snowflake connection wrapper (Snowpark session)

How Data Flows

Snowflake
    │
    ▼
data_loader.py          ← Three separate loading modes (see below)
    │
    ├── load_dashboard_data()   ──► st.session_state['dashboard_df']
    │                                   ├─► app.py sidebar (filter options, counts)
    │                                   └─► dashboard.py (all charts)
    │
    ├── load_sa_data()          ──► st.session_state['sa_contents']
    │   (on-demand, button)          st.session_state['sa_comments']
    │                                   └─► sentiment_analysis.py
    │
    └── load_reply_required_data() ──► st.session_state['rr_df']
        (on-demand, button)             └─► reply_required.py

Key principle: Data is loaded as little as possible, as late as possible.

  • The Dashboard uses a lightweight query (no text columns, no content join) cached for 24 hours.
  • The Sentiment Analysis and Reply Required pages never load data automatically; they wait for the user to click Fetch Data.
  • All data is stored in st.session_state so page navigation and widget interactions do not re-trigger Snowflake queries.
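
The "load once, reuse" principle can be sketched with a plain dict standing in for st.session_state (names here are illustrative, not the actual app code):

```python
# Illustrative sketch: st.session_state modeled as a plain dict so the pattern
# runs without Streamlit. The loader runs only on first access; later reruns
# (page switches, widget interactions) reuse the stored result.
calls = {"n": 0}

def load_dashboard_data():
    calls["n"] += 1  # stands in for an expensive Snowflake query
    return [{"platform": "facebook", "sentiment_polarity": "positive"}]

session_state = {}  # stand-in for st.session_state

def get_dashboard_df():
    if "dashboard_df" not in session_state:
        session_state["dashboard_df"] = load_dashboard_data()
    return session_state["dashboard_df"]

get_dashboard_df()
get_dashboard_df()  # second call: no new "query"
```

In the real app, Streamlit reruns the whole script on every interaction, which is exactly why the guard on the session-state key matters.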

Data Loading Strategy

All loading logic lives in data/data_loader.py (SentimentDataLoader class).

load_dashboard_data()

  • Uses dashboard_query from viz_config.json.
  • Fetches only: comment_sk, content_sk, platform, brand, sentiment_polarity, intent, requires_reply, detected_language, comment_timestamp, processed_at, author_id.
  • No text columns and no DIM_CONTENT join, so it is significantly faster than the full query.
  • Also merges demographics data if demographics_query is configured.
  • Cached for 24 hours (@st.cache_data(ttl=86400)).
  • Called once by app.py at startup; result stored in st.session_state['dashboard_df'].
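
For intuition, @st.cache_data(ttl=86400) behaves roughly like a memoising decorator that expires entries after the TTL. A stdlib-only sketch (not Streamlit's actual implementation, which additionally hashes arguments and serialises results):

```python
import time
import functools

def cache_with_ttl(ttl_seconds):
    """Rough analogue of @st.cache_data(ttl=...): memoise per arguments, expire after TTL."""
    def decorator(fn):
        store = {}
        @functools.wraps(fn)
        def wrapper(*args):
            now = time.time()
            hit = store.get(args)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]          # fresh cached result
            result = fn(*args)
            store[args] = (now, result)
            return result
        return wrapper
    return decorator

calls = {"n": 0}

@cache_with_ttl(ttl_seconds=86400)
def load_dashboard_data(brand):
    calls["n"] += 1                    # stands in for the Snowflake round trip
    return f"rows for {brand}"

load_dashboard_data("drumeo")
load_dashboard_data("drumeo")          # served from cache within the TTL
```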

load_sa_data(platform, brand, top_n, min_comments, sort_by, sentiments, intents, date_range)

  • Runs two sequential Snowflake queries:
    1. Content aggregation: groups by content_sk, counts comments per sentiment, computes a severity score, and returns the top N.
    2. Sampled comments: for the top N content_sks only, fetches up to 50 comments per sentiment group (negative, positive, other) per content, using Snowflake's QUALIFY ROW_NUMBER(). display_text is computed in SQL (CASE WHEN IS_ENGLISH = FALSE AND TRANSLATED_TEXT IS NOT NULL THEN TRANSLATED_TEXT ELSE ORIGINAL_TEXT END).
  • Returns a tuple (contents_df, comments_df).
  • Cached for 24 hours.
  • Called only when the user clicks Fetch Data on the Sentiment Analysis page.
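
The per-sentiment sampling in step 2 relies on Snowflake's QUALIFY clause. A simplified sketch of the query shape (the real query is built in _build_sa_comments_query(); table and column names here are abbreviated and partly assumed):

```python
def build_sampled_comments_query(content_sks, per_group=50):
    """Illustrative only: a simplified shape of the sampled-comments query."""
    sk_list = ", ".join(str(sk) for sk in content_sks)
    return f"""
SELECT s.CONTENT_SK,
       s.SENTIMENT_POLARITY,
       CASE WHEN s.IS_ENGLISH = FALSE AND s.TRANSLATED_TEXT IS NOT NULL
            THEN s.TRANSLATED_TEXT ELSE s.ORIGINAL_TEXT END AS DISPLAY_TEXT
FROM COMMENT_SENTIMENT_FEATURES s
WHERE s.CONTENT_SK IN ({sk_list})
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY s.CONTENT_SK, s.SENTIMENT_POLARITY
    ORDER BY s.COMMENT_TIMESTAMP DESC
) <= {per_group}
"""

query = build_sampled_comments_query([101, 102])
```

QUALIFY filters on the window-function result directly, which is what keeps the sample at no more than 50 rows per (content, sentiment-group) pair without a second pass.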

load_reply_required_data(platforms, brands, date_range)

  • Runs a single query filtering REQUIRES_REPLY = TRUE.
  • Dynamically includes/excludes the social media table and musora table based on selected platforms.
  • display_text computed in SQL.
  • Cached for 24 hours.
  • Called only when the user clicks Fetch Data on the Reply Required page.

Important: SQL Column Qualification

Both the social media table (COMMENT_SENTIMENT_FEATURES) and the content dimension table (DIM_CONTENT) share column names. Any WHERE clause inside a query that joins these two tables must use the table alias prefix (e.g. s.PLATFORM, s.COMMENT_TIMESTAMP, s.CHANNEL_NAME) to avoid Snowflake ambiguous-column-name errors. The musora table (MUSORA_COMMENT_SENTIMENT_FEATURES) has no joins, so unqualified column names are fine there.
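
As a simplified illustration (not the production query, and the join key is assumed), the joined social-media query follows this shape:

```python
# Simplified illustration of the join described above. Every shared column in
# WHERE carries the alias 's'; an unqualified PLATFORM here would raise an
# ambiguous-column-name error because DIM_CONTENT defines it too.
joined_query = """
SELECT s.COMMENT_SK, s.PLATFORM, c.PERMALINK_URL
FROM SOCIAL_MEDIA_DB.ML_FEATURES.COMMENT_SENTIMENT_FEATURES s
LEFT JOIN SOCIAL_MEDIA_DB.CORE.DIM_CONTENT c
  ON s.CONTENT_SK = c.CONTENT_SK
WHERE s.PLATFORM = 'youtube'
  AND s.COMMENT_TIMESTAMP >= '2025-01-01'
"""
```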


Pages

Dashboard (components/dashboard.py)

Receives: filtered_df, the lightweight dashboard dataframe (after any global filters have been applied by app.py).

Does not need: text, translations, content URLs. All charts work purely on aggregated columns (sentiment_polarity, brand, platform, intent, requires_reply, comment_timestamp).

Key sections:

  • Summary stats + health indicator
  • Sentiment distribution (pie + gauge)
  • Sentiment by brand and platform (stacked + percentage bar charts)
  • Intent analysis
  • Brand-Platform heatmap
  • Reply requirements + urgency breakdown
  • Demographics (age, timezone, experience level), rendered only if author_id is present and demographics were merged

To add a new chart: create the chart function in visualizations/ and call it from render_dashboard(). The function receives filtered_df.


Sentiment Analysis (components/sentiment_analysis.py)

Receives: data_loader instance only (no dataframe).

Flow:

  1. Reads st.session_state['dashboard_df'] for filter option lists (platforms, brands, sentiments, intents).
  2. Pre-populates platform/brand dropdowns from st.session_state['global_filters'].
  3. Shows filter controls (platform, brand, sentiment, intent, top_n, min_comments, sort_by).
  4. On Fetch Data click: calls data_loader.load_sa_data(...) and stores results in st.session_state['sa_contents'] and ['sa_comments'].
  5. Renders content cards, per-content sentiment + intent charts, AI analysis buttons, and sampled comment expanders.

Pagination: st.session_state['sentiment_page'] (5 contents per page). Reset on new fetch.
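
The page arithmetic is a plain list slice; an illustrative sketch with stand-in data:

```python
# Illustrative pagination: 5 contents per page, with the page index held in
# st.session_state['sentiment_page'] (modeled as a plain int here).
PAGE_SIZE = 5
contents = [f"content_{i}" for i in range(12)]  # stand-in for the fetched rows

def page_slice(items, page, size=PAGE_SIZE):
    return items[page * size:(page + 1) * size]

first_page = page_slice(contents, 0)
second_page = page_slice(contents, 1)
last_page = page_slice(contents, 2)  # shorter final page
```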

Comments: Sampled (up to 50 negative + 50 positive + 50 neutral per content). These are already in memory after the fetch β€” no extra query is needed when the user expands a comment section.
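
Because the sampled comments already sit in session state, expanding a section is a pure in-memory lookup. A sketch (field names assumed for illustration):

```python
from collections import defaultdict

# Assumed (illustrative) shape of rows in st.session_state['sa_comments'].
sa_comments = [
    {"content_sk": 1, "sentiment_group": "negative", "display_text": "Too fast for beginners"},
    {"content_sk": 1, "sentiment_group": "positive", "display_text": "Loved this lesson"},
    {"content_sk": 2, "sentiment_group": "negative", "display_text": "Audio is out of sync"},
]

# Group once after the fetch; expanders then read from this dict, no new query.
comments_by_group = defaultdict(list)
for row in sa_comments:
    comments_by_group[(row["content_sk"], row["sentiment_group"])].append(row["display_text"])
```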

AI Analysis: Uses ContentSummaryAgent (see agents/). Results cached in st.session_state['content_summaries'].


Reply Required (components/reply_required.py)

Receives: data_loader instance only.

Flow:

  1. Reads st.session_state['dashboard_df'] for filter option lists.
  2. Pre-populates platform, brand, and date from st.session_state['global_filters'].
  3. On Fetch Data click: calls data_loader.load_reply_required_data(...) and stores result in st.session_state['rr_df'].
  4. Shows urgency breakdown, in-page view filters (priority, platform, brand, intent; applied in Python, no new query), paginated comment cards, and a "Reply by Content" summary.

Pagination: st.session_state['reply_page'] (10 comments per page). Reset on new fetch.


Global Filters & Session State

Global filters live in the sidebar (app.py) and are stored in st.session_state['global_filters'] as a dict:

{
    'platforms':  ['facebook', 'instagram'],   # list or []
    'brands':     ['drumeo'],
    'sentiments': [],
    'date_range': (date(2025, 1, 1), date(2025, 12, 31)),  # or None
}
  • Dashboard: app.py applies global filters to dashboard_df using data_loader.apply_filters() and passes the result to render_dashboard().
  • Sentiment Analysis / Reply Required: global filters are used to pre-populate their own filter widgets. The actual Snowflake query uses those values when the user clicks Fetch. The pages do not receive a pre-filtered dataframe.
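
A minimal sketch of how apply_filters() can interpret this dict, operating on plain dict rows for illustration (the real implementation works on the pandas dataframe, and this sketch ignores the sentiments key):

```python
from datetime import date

def apply_filters(rows, filters):
    # Illustrative stand-in for data_loader.apply_filters().
    # An empty list or None means "no filtering on that field".
    out = []
    for r in rows:
        if filters["platforms"] and r["platform"] not in filters["platforms"]:
            continue
        if filters["brands"] and r["brand"] not in filters["brands"]:
            continue
        if filters["date_range"] is not None:
            start, end = filters["date_range"]
            if not (start <= r["comment_date"] <= end):
                continue
        out.append(r)
    return out

rows = [
    {"platform": "facebook", "brand": "drumeo", "comment_date": date(2025, 3, 1)},
    {"platform": "youtube", "brand": "pianote", "comment_date": date(2025, 3, 2)},
]
filters = {"platforms": ["facebook"], "brands": [], "sentiments": [],
           "date_range": (date(2025, 1, 1), date(2025, 12, 31))}
filtered = apply_filters(rows, filters)
```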

Full session state key reference

| Key | Set by | Used by |
| --- | --- | --- |
| dashboard_df | app.py on startup | sidebar (filter options), dashboard, SA + RR (filter option lists) |
| global_filters | sidebar "Apply Filters" button | app.py (dashboard filter), SA + RR (pre-populate widgets) |
| filters_applied | sidebar buttons | app.py (whether to apply filters) |
| sa_contents | SA fetch button | SA page rendering |
| sa_comments | SA fetch button | SA page rendering |
| sa_fetch_key | SA fetch button | SA page (detect stale data) |
| rr_df | RR fetch button | RR page rendering |
| rr_fetch_key | RR fetch button | RR page (detect stale data) |
| sentiment_page | SA page / fetch | SA pagination |
| reply_page | RR page / fetch | RR pagination |
| content_summaries | AI analysis buttons | SA AI analysis display |

Snowflake Queries

All query strings are either stored in config/viz_config.json (static queries) or built dynamically in data/data_loader.py (page-specific queries).

Static queries (in viz_config.json)

| Key | Purpose |
| --- | --- |
| query | Full query with all columns (legacy, kept for compatibility) |
| dashboard_query | Lightweight query: no text columns, no DIM_CONTENT join |
| demographics_query | Joins usora_users with preprocessed.users to get age/timezone/experience |

Dynamic queries (built in data_loader.py)

| Method | Description |
| --- | --- |
| _build_sa_content_query() | Content aggregation for SA page; filters by platform + brand + date |
| _build_sa_comments_query() | Sampled comments for SA page; uses QUALIFY ROW_NUMBER() <= 50 |
| _build_rr_query() | Reply-required comments; filters by platform/brand/date; conditionally includes the social media and/or musora table |

Data source tables

| Table | Platform | Notes |
| --- | --- | --- |
| SOCIAL_MEDIA_DB.ML_FEATURES.COMMENT_SENTIMENT_FEATURES | facebook, instagram, youtube, twitter | Needs LEFT JOIN SOCIAL_MEDIA_DB.CORE.DIM_CONTENT for PERMALINK_URL |
| SOCIAL_MEDIA_DB.ML_FEATURES.MUSORA_COMMENT_SENTIMENT_FEATURES | musora_app | Has PERMALINK_URL and THUMBNAIL_URL natively; platform stored as 'musora', mapped to 'musora_app' in queries |

Adding or Changing Things

Add a new chart to the Dashboard

  1. Write the chart function in the appropriate visualizations/ file.
  2. Call it from render_dashboard() in components/dashboard.py, passing filtered_df.
  3. The chart function receives a lightweight df: it has no text columns, but it has all the columns listed in dashboard_query.

Add a new filter to the Dashboard sidebar

  1. Add the widget in app.py under the "Global Filters" section.
  2. Store the selected value in the global_filters dict under st.session_state.
  3. Pass it to data_loader.apply_filters().

Change what the Sentiment Analysis page queries

  • Edit _build_sa_content_query() and/or _build_sa_comments_query() in data_loader.py.
  • If you add new columns to the content aggregation result, also update _process_sa_content_stats() so they are available in contents_df.
  • If you add new columns to the comments result, update _process_sa_comments().

Change what the Reply Required page queries

  • Edit _build_rr_query() in data_loader.py.
  • Remember: all column references inside the social media block (which has a JOIN) must be prefixed with s. to avoid Snowflake ambiguity errors.

Change the cache duration

  • @st.cache_data(ttl=86400) is set on load_dashboard_data, _fetch_sa_data, _fetch_rr_data, and load_demographics_data.
  • Change 86400 (seconds) to the desired TTL, or set ttl=None for no expiry.
  • Users can always force a refresh with the "Reload Data" button in the sidebar (which calls st.cache_data.clear() and deletes st.session_state['dashboard_df']).

Add a new page

  1. Create components/new_page.py with a render_new_page(data_loader) function.
  2. Import and add a radio option in app.py.
  3. If the page needs its own Snowflake data, add a load_new_page_data() method to SentimentDataLoader following the same pattern as load_sa_data.

Add a new column to the Dashboard query

  • Edit dashboard_query in config/viz_config.json.
  • Both UNION branches must select the same columns in the same order.
  • _process_dashboard_dataframe() in data_loader.py handles basic type casting; add processing there if needed.

Running the App

# From the project root
streamlit run visualization/app.py

Required environment variables (in .env at project root):

SNOWFLAKE_USER
SNOWFLAKE_PASSWORD
SNOWFLAKE_ACCOUNT
SNOWFLAKE_ROLE
SNOWFLAKE_DATABASE
SNOWFLAKE_WAREHOUSE
SNOWFLAKE_SCHEMA

Configuration Reference

config/viz_config.json controls:

| Section | What it configures |
| --- | --- |
| color_schemes.sentiment_polarity | Hex colors for each sentiment level |
| color_schemes.intent | Hex colors for each intent label |
| color_schemes.platform | Hex colors for each platform |
| color_schemes.brand | Hex colors for each brand |
| sentiment_order | Display order for sentiment categories in charts |
| intent_order | Display order for intent categories |
| negative_sentiments | Which sentiment values count as "negative" |
| dashboard.default_date_range_days | Default date filter window (days) |
| dashboard.max_comments_display | Max comments shown per pagination page |
| dashboard.chart_height | Default Plotly chart height |
| dashboard.top_n_contents | Default top-N for content ranking |
| snowflake.query | Full query (legacy, all columns) |
| snowflake.dashboard_query | Lightweight dashboard query (no text columns) |
| snowflake.demographics_query | Demographics join query |
| demographics.age_groups | Age bucket definitions (label → [min, max]) |
| demographics.experience_groups | Experience bucket definitions |
| demographics.top_timezones_count | How many timezones to show in the geographic chart |