github-actions[bot] committed · 1d4a839
Parent(s): ad19bb2
sync: automatic content update from github

Files changed:
- .gitattributes +0 -35
- INSTRUCTIONS.md +251 -0
- README.md +29 -10
- app.py +433 -0
- changelog.md +3 -0
- index.html +0 -19
- requirements.txt +10 -0
- style.css +0 -28
.gitattributes
DELETED
@@ -1,35 +0,0 @@
-*.7z filter=lfs diff=lfs merge=lfs -text
-*.arrow filter=lfs diff=lfs merge=lfs -text
-*.bin filter=lfs diff=lfs merge=lfs -text
-*.bz2 filter=lfs diff=lfs merge=lfs -text
-*.ckpt filter=lfs diff=lfs merge=lfs -text
-*.ftz filter=lfs diff=lfs merge=lfs -text
-*.gz filter=lfs diff=lfs merge=lfs -text
-*.h5 filter=lfs diff=lfs merge=lfs -text
-*.joblib filter=lfs diff=lfs merge=lfs -text
-*.lfs.* filter=lfs diff=lfs merge=lfs -text
-*.mlmodel filter=lfs diff=lfs merge=lfs -text
-*.model filter=lfs diff=lfs merge=lfs -text
-*.msgpack filter=lfs diff=lfs merge=lfs -text
-*.npy filter=lfs diff=lfs merge=lfs -text
-*.npz filter=lfs diff=lfs merge=lfs -text
-*.onnx filter=lfs diff=lfs merge=lfs -text
-*.ot filter=lfs diff=lfs merge=lfs -text
-*.parquet filter=lfs diff=lfs merge=lfs -text
-*.pb filter=lfs diff=lfs merge=lfs -text
-*.pickle filter=lfs diff=lfs merge=lfs -text
-*.pkl filter=lfs diff=lfs merge=lfs -text
-*.pt filter=lfs diff=lfs merge=lfs -text
-*.pth filter=lfs diff=lfs merge=lfs -text
-*.rar filter=lfs diff=lfs merge=lfs -text
-*.safetensors filter=lfs diff=lfs merge=lfs -text
-saved_model/**/* filter=lfs diff=lfs merge=lfs -text
-*.tar.* filter=lfs diff=lfs merge=lfs -text
-*.tar filter=lfs diff=lfs merge=lfs -text
-*.tflite filter=lfs diff=lfs merge=lfs -text
-*.tgz filter=lfs diff=lfs merge=lfs -text
-*.wasm filter=lfs diff=lfs merge=lfs -text
-*.xz filter=lfs diff=lfs merge=lfs -text
-*.zip filter=lfs diff=lfs merge=lfs -text
-*.zst filter=lfs diff=lfs merge=lfs -text
-*tfevents* filter=lfs diff=lfs merge=lfs -text
INSTRUCTIONS.md
ADDED
@@ -0,0 +1,251 @@
🧠 Purpose

Craft copy-paste-ready SQL queries for Redash (Snowflake) that pull Raptive content using URL keyword filtering, ingredient matching, and optional vertical matching. These queries answer custom RFPs across themes like food, family, travel, business, and more — always precision-focused to avoid irrelevant matches.

✅ Key Behavior Rules

Keyword Count

Default: Include 20–25 of the best-performing URL path keywords.

Override: If the user says “add more” or asks for “at least X,” always meet or exceed the requested count with maximum specificity.

Intent + Root Matching

Use high-intent, high-signal keywords.

Add root forms where relevant — e.g., '%kebab%' covers 'kebabs', so don’t use '%kebabs%' alone.

Risky Short Words

Wrap ambiguous short words (e.g., dip, sub, rib, ham) using safe URL-specific or ingredient-specific patterns:

✅ Use for URL: '%/rib-%', '%rib/%', '%rib-%'.

✅ Use for ingredient: '% ham', 'ham', 'ham %'

❌ Avoid: '%rib%' (matches ribeye, attribute, etc.), '%ham%' (matches hamburger, graham, etc.)

Ask yourself: “Could this appear inside another word?” If yes, wrap it.
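The wrapping rule above can be sketched as a small helper. This is a minimal illustration; the `RISKY` set and the function name are assumptions for the example, not part of the instructions:

```python
# Sketch: wrap ambiguous short keywords in URL-safe LIKE patterns.
# The RISKY set below is an illustrative assumption, not an official list.
RISKY = {"dip", "sub", "rib", "ham"}

def url_patterns(keyword: str) -> list[str]:
    """Return LIKE patterns for a URL-path keyword, wrapping risky short words."""
    if keyword in RISKY:
        # delimiter-bounded forms so 'rib' cannot match 'ribeye' or 'attribute'
        return [f"%/{keyword}-%", f"%{keyword}/%", f"%{keyword}-%"]
    return [f"%{keyword}%"]
```

For example, `url_patterns("rib")` yields the three delimiter-bounded patterns from the rule above, while a safe word like "appetizer" passes through as a single `'%appetizer%'` pattern.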
Root > Plural

Use the root if it naturally covers plural/singular forms.

❌ Never use only the plural if the singular/root is sufficient.

Multi-Word Keyword Handling

❌ NEVER use spaces in LIKE statements for URLs.

✅ Use:

Wildcards: '%dinner%party%' for general multi-word coverage

Hyphens: '%dinner-party%' only if a tighter match is needed

❌ Never include both unless the user explicitly requests both.

❌ Never write '%dinner party%'.
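The multi-word rule reduces to one transformation: replace every space with '%'. A minimal sketch (the helper name is an assumption for illustration):

```python
# Sketch: convert a multi-word phrase into a space-free wildcard LIKE pattern.
# Spaces are never emitted; words are joined with '%' per the rule above.
def wildcard_pattern(phrase: str) -> str:
    return "%" + "%".join(phrase.lower().split()) + "%"
```

So "dinner party" becomes '%dinner%party%' and "visit castle" becomes '%visit%castle%', with no spaces left in the pattern.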
Wildcards > Hyphens by Default

Use wildcards first for multi-word phrases ('%castle%trip%', '%visit%castle%').

Use hyphens only if:

The phrase is short AND

The wildcard creates too many irrelevant matches

Include both only when necessary for coverage — otherwise pick the cleaner option.

Root Coverage & Redundancy Elimination

If a root term (e.g., '%soccer%') already captures meaningful variations, do not include those variations unless:

The root is too broad/noisy, or

The variation has clear standalone value and isn't already implied.

✅ OK: '%soccer%', '%fifa%', '%mls%', '%world%cup%'

❌ Redundant: '%soccer%game%', '%soccer%tips%', '%soccer%tournament%' if '%soccer%' is already present.

Date Logic

Use full-month BETWEEN ranges unless specified otherwise.

Tailor to reflect the campaign's timing or seasonality.
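A full-month BETWEEN range can be computed from the calendar rather than hand-typed; a minimal sketch, assuming the query builder works with Python `date` objects:

```python
# Sketch: build a full-month BETWEEN range for a campaign window.
from datetime import date
import calendar

def full_month_range(year: int, month: int) -> tuple[date, date]:
    # monthrange returns (weekday of first day, number of days in the month)
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)
```

For February 2025 this yields 2025-02-01 through 2025-02-28, ready to drop into `BETWEEN date '...' AND date '...'`.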
Keyword Scan Before Sending

Confirm the following:

✅ Short words safely wrapped?

✅ Root > plural where appropriate?

✅ Redundancies eliminated?

✅ Wildcards used instead of hyphens unless otherwise needed?

✅ Root keyword included when appropriate?

✅ All spaces removed from LIKE patterns?

Output Rules

Always return a full, runnable SQL query (unless snippets are explicitly requested).

Format cleanly — no cleanup required.

Use only the approved templates below — never improvise structure.

Include Iconic Entities When Relevant

For any topic (travel, sports, auto, entertainment, etc.), include:

🏝️ Places: top destinations, cities, landmarks ('%hawaii%', '%italy%')

🏎️ Brands: leading products/models ('%tesla%', '%mustang%', '%toyota%')

📺 Celebs/Franchises: top entertainment hooks ('%netflix%', '%oscars%', '%taylor%swift%')

⚽ Teams/Players: top sports figures and organizations ('%messi%', '%uswnt%', '%fifa%')

Add these if they:

Frequently appear in content

Are search-motivated

Represent high-value interest signals
🧾 Templates to Use – DO NOT ALTER

Use these exact query templates. Replace the LIKE '%appetizer%' and ingredient terms with those given by the user. Leave all filters intact.

🔑 JUST KEYWORD, NO VERTICAL

```sql
SELECT
  parse_url(concat('http://', r.url)):"host"::string AS domain,
  parse_url(concat('http://', r.url)):"path"::string AS article_title,
  r.url,
  SUM(pageviews) AS pageviews,
  r.primary_vertical
FROM sigma_aggregations.rpm_base_agg r
WHERE date BETWEEN date '2025-02-04' AND date '2025-03-05'
  AND site_id IN (
    SELECT site_id FROM ADTHRIVE.SITE_EXTENDED WHERE status = 'Active'
  )
  AND pageviews > 9
  AND (parse_url(concat('http://', r.url)):"path" LIKE '%appetizer%'
    OR parse_url(concat('http://', r.url)):"path" LIKE '%finger%food%'
    OR parse_url(concat('http://', r.url)):"path" LIKE '%dip-recipe%')
  AND pmp_enabled = 'true'
  AND r.url NOT LIKE '%atlantablack%'
  AND r.url != ''
  AND r.url NOT LIKE '%forum%'
  AND r.url NOT LIKE '%mediaite%'
  AND r.url NOT LIKE '%page%'
  AND r.url NOT LIKE '%comment%'
  AND r.url NOT LIKE '%print%'
  AND r.url NOT LIKE '%staging%'
  AND r.url NOT LIKE '%width=%'
  AND r.url NOT LIKE '%subscribe%'
GROUP BY 1, 2, 3, 5
ORDER BY 4 DESC
```
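The keyword OR-block in the template above is the only part that varies per request, so it can be generated mechanically. A minimal sketch (the helper name and indentation style are assumptions):

```python
# Sketch: expand a keyword list into the "AND ( ... OR ... )" block used in
# the template above. Each keyword becomes one parse_url LIKE line.
def like_block(keywords: list[str]) -> str:
    lines = [
        f"parse_url(concat('http://', r.url)):\"path\" LIKE '%{kw}%'"
        for kw in keywords
    ]
    return "AND (" + "\n  OR ".join(lines) + ")"
```

Calling it with `["appetizer", "finger%food", "dip-recipe"]` reproduces the shape of the block in the template, one `OR` line per keyword.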
📌 WITH PRIMARY VERTICAL

```sql
... AND LOWER(primary_vertical) LIKE '%food%' ...
```

📍 WITH PRIMARY OR SECONDARY VERTICAL

```sql
... AND LOWER(verticals) LIKE '%food%' ...
```

🌐 IN THE URL OR INGREDIENT

```sql
WITH base_agg AS (
  SELECT r.url, SUM(r.pageviews) AS pageviews,
    parse_url(concat('http://', r.url)):"host"::string AS domain,
    parse_url(concat('http://', r.url)):"path"::string AS article_title,
    r.primary_vertical, r.verticals
  FROM sigma_aggregations.rpm_base_agg r
  WHERE r.date BETWEEN date '2025-02-04' AND date '2025-03-05'
    AND r.site_id IN (SELECT site_id FROM ADTHRIVE.SITE_EXTENDED WHERE status = 'Active')
    AND r.pageviews > 9
    AND r.pmp_enabled = 'true'
    AND r.url NOT LIKE '%atlantablack%' AND r.url != '' AND ...
  GROUP BY r.url, r.primary_vertical, r.verticals
),
ingredient_clean AS (
  SELECT DISTINCT
    regexp_replace(regexp_replace(regexp_replace(url, '^http://',''), '/$',''),'^https://','') AS url_clean,
    ingredient
  FROM DI.SALES_AVAILS_INGREDIENTS
)
SELECT b.domain, b.article_title, b.url, b.pageviews, b.primary_vertical
FROM base_agg b
LEFT JOIN ingredient_clean i ON b.url = i.url_clean
WHERE (
  b.article_title LIKE '%appetizer%'
  OR lower(coalesce(i.ingredient, 'none')) LIKE '%cream cheese%'
  OR lower(coalesce(i.ingredient, 'none')) LIKE '% ham'
)
AND lower(b.verticals) LIKE '%food%'
ORDER BY b.pageviews DESC
```

🧀 URL AND INGREDIENT

```sql
WITH base_agg AS (...), ingredient_clean AS (...)
SELECT ...
FROM base_agg b
LEFT JOIN ingredient_clean i ON b.url = i.url_clean
WHERE b.article_title LIKE '%appetizer%'
  AND lower(coalesce(i.ingredient, 'none')) LIKE '%cream cheese%'
  AND lower(b.verticals) LIKE '%food%'
```

🥄 INGREDIENT ONLY

```sql
WITH base_agg AS (...), ingredient_clean AS (...)
SELECT ...
FROM base_agg b
LEFT JOIN ingredient_clean i ON b.url = i.url_clean
WHERE lower(coalesce(i.ingredient, 'none')) LIKE '%cream cheese%'
  AND lower(b.verticals) LIKE '%food%'
```

📅 BY DAY TRAFFIC

```sql
SELECT date, SUM(pageviews) AS pageviews
FROM sigma_aggregations.rpm_base_agg r
WHERE date BETWEEN date '2024-10-01' AND '2025-03-06'
  AND ...
  AND (
    parse_url(...) LIKE '%winter%' OR
    parse_url(...) LIKE '%december%' OR ...
  )
GROUP BY 1
ORDER BY 1 ASC
```
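Seasonal by-day pulls like the template above need a season expanded into its month keywords. A minimal sketch; the season-to-month mapping here is an illustrative assumption, not part of the instructions:

```python
# Sketch: expand a season name into keywords for the by-day template above.
# The mapping is an illustrative assumption; adjust per campaign.
SEASON_MONTHS = {
    "winter": ["december", "january", "february"],
    "summer": ["june", "july", "august"],
}

def season_keywords(season: str) -> list[str]:
    """Return the season term plus its month terms for LIKE expansion."""
    return [season] + SEASON_MONTHS.get(season, [])
```

Each returned term would then become one `parse_url(...) LIKE '%<term>%'` line in the OR block.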
🧪 PROMPT EXAMPLES

“Write a full SQL query for ‘family activity content’ with food vertical.”

“Ingredient only: ‘evaporated milk.’”

“Pull daily traffic for winter holidays.”

💬 TONE + PERSONALITY

Energetic, enthusiastic, and super supportive 🥳

Give compliments! Make the user feel like a data queen or king 👑

Examples:

“Oooooh, this one is chef’s kiss — ready to roll 🍽️”

“Marie, you slay. Here’s your pixel-perfect query 💅”

“Here comes a beautiful block of SQL brilliance for your brilliance 💡”
README.md
CHANGED
|
@@ -1,10 +1,29 @@
# Content Analysis Workflow Automation

This Streamlit dashboard integrates OpenAI and Snowflake to generate and run
SQL queries for content analysis. Provide a description of the content you want
to analyze and the app will:

1. Use OpenAI to craft a Snowflake SQL query based on custom instructions.
2. Execute the query against your Snowflake warehouse.
3. Display the results in an interactive table.

## Setup

1. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
2. Set the required environment variables for OpenAI and Snowflake:
   - `OPENAI_API_KEY`
   - `SNOWFLAKE_USER`
   - `SNOWFLAKE_PASSWORD`
   - `SNOWFLAKE_ACCOUNT`
   - `SNOWFLAKE_WAREHOUSE`
   - `SNOWFLAKE_DATABASE`
   - `SNOWFLAKE_SCHEMA`
3. Run the app:
   ```bash
   streamlit run app.py
   ```
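The environment variables in the setup steps above can be checked before launch. A minimal sketch, assuming the variable names listed in this README (note `app.py` reads a different, lowercase set of secrets):

```python
# Sketch: report which of the README's environment variables are unset.
# The REQUIRED list mirrors the README above; names are case-sensitive.
REQUIRED = [
    "OPENAI_API_KEY",
    "SNOWFLAKE_USER",
    "SNOWFLAKE_PASSWORD",
    "SNOWFLAKE_ACCOUNT",
    "SNOWFLAKE_WAREHOUSE",
    "SNOWFLAKE_DATABASE",
    "SNOWFLAKE_SCHEMA",
]

def missing_vars(env: dict) -> list[str]:
    """Return the required variable names that are absent or empty in env."""
    return [v for v in REQUIRED if not env.get(v)]
```

Passing `os.environ` to `missing_vars` before running `streamlit run app.py` surfaces missing secrets early instead of at connection time.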
app.py
ADDED
|
@@ -0,0 +1,433 @@
import os
import re
import streamlit as st
import pandas as pd
import snowflake.connector
from openai import OpenAI
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.backends import default_backend
from dateutil.relativedelta import relativedelta
from typing import Optional

STATIC_PRIMARY_VERTICALS = [
    "Arts & Creativity",
    "Auto",
    "Baby",
    "Beauty",
    "Business",
    "Careers",
    "Clean Eating",
    "Crafts",
    "Deals",
    "Education",
    "Entertainment",
    "Family and Parenting",
    "Fitness",
    "Food",
    "Gaming",
    "Gardening",
    "Green Living",
    "Health and Wellness",
    "History & Culture",
    "Hobbies & Interests",
    "Home Decor and Design",
    "Law, Gov't & Politics",
    "Lifestyle",
    "Mens Style and Grooming",
    "Natural Parenting",
    "News",
    "Other",
    "Personal Finance",
    "Pets",
    "Pregnancy",
    "Professional Finance",
    "Real Estate",
    "Religion & Spirituality",
    "Science",
    "Shopping",
    "Sports",
    "Tech",
    "Toddler",
    "Travel",
    "Vegetarian",
    "Wedding",
    "Womens Style",
]


def extract_primary_verticals(text: str) -> list[str]:
    text = text.lower()
    candidates = set()
    m = re.search(r"themes like ([^—]+)", text)
    if m:
        for part in re.split(r",|and", m.group(1)):
            w = part.strip()
            if w and w not in {"more"}:
                candidates.add(w)
    m2 = re.search(r"topic \(([^)]+)\)", text)
    if m2:
        for part in m2.group(1).split(","):
            w = part.strip().strip(" etc.")
            if w:
                candidates.add(w)
    return [w.title() for w in sorted(candidates)]

# ——————————————
# 1) STREAMLIT PAGE CONFIG
# ——————————————
st.set_page_config(page_title="Content Analysis Workflow", layout="wide")
st.title("Content Analysis Workflow Automation")

# ——————————————
# 2) LOAD SYSTEM PROMPT
# ——————————————
INSTRUCTIONS_PATH = os.path.join(os.path.dirname(__file__), "INSTRUCTIONS.md")
try:
    with open(INSTRUCTIONS_PATH, "r", encoding="utf-8") as f:
        SYSTEM_PROMPT = f.read()
    extracted_verticals = extract_primary_verticals(SYSTEM_PROMPT)
except FileNotFoundError:
    SYSTEM_PROMPT = ""
    extracted_verticals = []
    st.warning(f"Could not find INSTRUCTIONS.md at {INSTRUCTIONS_PATH}")

PRIMARY_VERTICALS = sorted(set(STATIC_PRIMARY_VERTICALS) | set(extracted_verticals))

# ——————————————
# 3) DATE RANGE FILTERS
# ——————————————
col1, col2 = st.columns(2)
with col1:
    start_date = st.date_input("Start date", value=pd.to_datetime("2025-02-01"))
with col2:
    end_date = st.date_input("End date", value=pd.to_datetime("2025-03-01"))

col3, col4 = st.columns(2)
with col3:
    prior_start = st.date_input(
        "Prior year start date", value=start_date - relativedelta(years=1)
    )
with col4:
    prior_end = st.date_input(
        "Prior year end date", value=end_date - relativedelta(years=1)
    )

if start_date > end_date or prior_start > prior_end:
    st.error("Start date must be on or before end date for both ranges.")
    st.stop()

col5, col6 = st.columns(2)
with col5:
    include_verticals = st.multiselect(
        "Filter to primary vertical", PRIMARY_VERTICALS, default=[]
    )
with col6:
    exclude_verticals = st.multiselect(
        "Exclude primary vertical", PRIMARY_VERTICALS, default=[]
    )

# ——————————————
# 4) CHECK ENVIRONMENT VARIABLES
# ——————————————
REQUIRED_VARS = [
    "snowflake_user",
    "snowflake_account_identifier",
    "snowflake_warehouse",
    "snowflake_database",
    "snowflake_role",
    "snowflake_private_key",
    "OPENAI_API_KEY",
]
missing = [v for v in REQUIRED_VARS if not os.getenv(v)]
if missing:
    st.error("Missing required secrets: " + ", ".join(missing))
    st.stop()

# ——————————————
# 5) INSTANTIATE OPENAI CLIENT
# ——————————————
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# ——————————————
# 6) PARSE PRIVATE KEY → DER BYTES
# ——————————————
pem_bytes = os.getenv("snowflake_private_key").encode("utf-8")
try:
    key_obj = serialization.load_pem_private_key(
        pem_bytes, password=None, backend=default_backend()
    )
    private_key_der = key_obj.private_bytes(
        encoding=serialization.Encoding.DER,
        format=serialization.PrivateFormat.PKCS8,
        encryption_algorithm=serialization.NoEncryption(),
    )
except Exception as e:
    st.error(f"Failed to load Snowflake private key: {e}")
    st.stop()

# ——————————————
# 7) BUILD SNOWFLAKE CONFIG
# ——————————————
SNOWFLAKE_CONFIG = {
    "user": os.getenv("snowflake_user"),
    "account": os.getenv("snowflake_account_identifier"),
    "warehouse": os.getenv("snowflake_warehouse"),
    "database": os.getenv("snowflake_database"),
    "role": os.getenv("snowflake_role"),
    "private_key": private_key_der,
}


# ——————————————
# 8) HELPERS
# ——————————————
def extract_sql_block(text: str) -> str:
    """Extract SQL from the first ```sql …``` fence."""
    m = re.search(r"```(?:sql)?\s*([\s\S]*?)```", text, re.IGNORECASE)
    return m.group(1).strip() if m else text.strip()


def extract_keywords(sql: str) -> list[str]:
    found = re.findall(r"(?<!NOT\s)LIKE\s+'%([^%]+)%'", sql, flags=re.IGNORECASE)
    seen, kws = set(), []
    for kw in found:
        if kw not in seen:
            seen.add(kw)
            kws.append(kw)
    return kws

def extract_title_words(df: pd.DataFrame) -> list[str]:
    """Split article titles into unique lowercase words."""
    seen = set()
    words = []
    for title in df.get("article_title", []):
        for w in re.split(r"\W+", str(title)):
            w = w.lower().strip()
            if not w or w.isdigit():
                continue
            if w not in seen:
                seen.add(w)
                words.append(w)
    return words


def apply_vertical_filter(
    sql: str,
    include: Optional[list[str]],
    exclude: Optional[list[str]],
) -> str:
    clauses = []

    if include:
        inc_clauses = []
        for v in include:
            # sanitize any single-quotes by doubling them
            sanitized = v.lower().replace("'", "''")
            inc_clauses.append(
                f"LOWER(primary_vertical) LIKE '%{sanitized}%'"
            )
        clauses.append("(" + " OR ".join(inc_clauses) + ")")

    if exclude:
        exc_clauses = []
        for v in exclude:
            sanitized = v.lower().replace("'", "''")
            exc_clauses.append(
                f"LOWER(primary_vertical) NOT LIKE '%{sanitized}%'"
            )
        clauses.append("(" + " AND ".join(exc_clauses) + ")")

    if not clauses:
        return sql

    full_clause = "AND " + " AND ".join(clauses)

    # strip any old single-vertical filters
    sql = re.sub(
        r"\s+AND\s+LOWER\(primary_vertical\)[^\n]*", "", sql, flags=re.IGNORECASE
    )
    sql = re.sub(
        r"\s+AND\s+r\.primary_vertical\s*=\s*'[^']*'", "", sql, flags=re.IGNORECASE
    )

    # inject before GROUP BY
    return re.sub(
        r"(WHERE[\s\S]*?)(GROUP BY)",
        lambda m: f"{m.group(1)} {full_clause}\n{m.group(2)}",
        sql,
        count=1,
        flags=re.IGNORECASE,
    )


def highlight_sov(val: float) -> str:
    """Color SOV change green for positive, red for negative."""
    if pd.isna(val):
        return ""
    color = "green" if val > 0 else "red"
    return f"color: {color};"


def get_sql_template_from_openai(user_text: str) -> str:
    prompt = f"""
You are a SQL maestro.

1) From the user’s description:
\"\"\"{user_text}\"\"\"
identify the top **25** keywords.

2) Generate one complete SQL query that:
 • Selects domain, article_title, url, pageviews, primary_vertical
 • Filters date BETWEEN '{{START_DATE}}' AND '{{END_DATE}}'
 • Filters only active sites
 • Only includes pageviews > 9 and pmp_enabled = 'true'
 • Excludes unwanted URLs (e.g. '%atlanta%', '%forum%', etc.)
 • Uses **at least 20** lines of:
 `OR parse_url(...):"path" LIKE '%<keyword>%'`
 all wrapped in a single `AND ( … )` block
 • GROUPs and ORDERs as needed

Return *only* the SQL, with the placeholders literally in the BETWEEN clause, inside a ```sql …``` fence—no extra text.
"""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": prompt},
        ],
    )
    return extract_sql_block(resp.choices[0].message.content)


def run_query(sql: str) -> pd.DataFrame:
    """Execute SQL on Snowflake and return a lowercase-column DataFrame."""
    conn = snowflake.connector.connect(**SNOWFLAKE_CONFIG)
    cur = conn.cursor()
    cur.execute(sql)
    rows = cur.fetchall()
    cols = [c[0].lower() for c in cur.description]
    conn.close()
    return pd.DataFrame(rows, columns=cols)

# ——————————————
|
| 317 |
+
# 9) USER INPUT & EXECUTION
|
| 318 |
+
# ——————————————
|
| 319 |
+
user_prompt = st.text_area(
|
| 320 |
+
"Describe the content or keywords for your analysis:",
|
| 321 |
+
height=150,
|
| 322 |
+
)
|
| 323 |
+
|
| 324 |
+
if st.button("Generate Table"):
|
| 325 |
+
if not user_prompt.strip():
|
| 326 |
+
st.warning("Enter some analysis keywords or description.")
|
| 327 |
+
else:
|
| 328 |
+
# Generate SQL once and swap the date range for prior-year query
|
| 329 |
+
template_sql = get_sql_template_from_openai(user_prompt)
sql_current = template_sql.replace(
    "{START_DATE}", start_date.isoformat()
).replace("{END_DATE}", end_date.isoformat())
sql_prior = template_sql.replace(
    "{START_DATE}", prior_start.isoformat()
).replace("{END_DATE}", prior_end.isoformat())

include_sel = include_verticals or None
exclude_sel = exclude_verticals or None
sql_current = apply_vertical_filter(sql_current, include_sel, exclude_sel)
sql_prior = apply_vertical_filter(sql_prior, include_sel, exclude_sel)

# Run queries
df_current = run_query(sql_current)
df_prior = run_query(sql_prior)

# Extract terms
url_kws = extract_keywords(sql_current)
if len(url_kws) < 20:
    st.warning(
        "Fewer than 20 keywords detected; refine your prompt for broader coverage."
    )
title_kws = extract_title_words(df_current) + extract_title_words(df_prior)
all_terms = []
seen = set()
for term in url_kws + title_kws:
    term = term.strip()
    if len(term) <= 3 or term in seen:
        continue
    seen.add(term)
    all_terms.append(term)

# Totals for pageview display
total_cy = df_current["pageviews"].sum()
total_py = df_prior["pageviews"].sum()

# Build metrics without a totals row
metrics = []
for term in all_terms:
    cy = df_current[
        df_current["article_title"].str.contains(term, case=False, na=False)
        | df_current["url"].str.contains(term, case=False, na=False)
    ]["pageviews"].sum()
    py = df_prior[
        df_prior["article_title"].str.contains(term, case=False, na=False)
        | df_prior["url"].str.contains(term, case=False, na=False)
    ]["pageviews"].sum()
    yoy = (cy - py) / py * 100 if py else float("nan")
    metrics.append(
        {
            "term": term,
            "CY pageviews": cy,
            "PY pageviews": py,
            "YoY %": yoy,
        }
    )

sum_cy_terms = sum(m["CY pageviews"] for m in metrics)
sum_py_terms = sum(m["PY pageviews"] for m in metrics)
for m in metrics:
    m["SOV CY"] = (
        m["CY pageviews"] / sum_cy_terms if sum_cy_terms else float("nan")
    )
    m["SOV PY"] = (
        m["PY pageviews"] / sum_py_terms if sum_py_terms else float("nan")
    )
    m["SOV % Change"] = (
        (m["SOV CY"] / m["SOV PY"] - 1)
        if (not pd.isna(m["SOV CY"]) and not pd.isna(m["SOV PY"]))
        else float("nan")
    )

metrics_df = pd.DataFrame(metrics).sort_values("CY pageviews", ascending=False)
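The YoY and share-of-voice arithmetic above can be sanity-checked on toy data. This is a sketch with made-up rows; the column names (`article_title`, `url`, `pageviews`) and the matching logic mirror the app's, but the data and the `term_pageviews` helper are illustrative only:

```python
import pandas as pd

# Hypothetical toy frames shaped like the query results above.
df_current = pd.DataFrame({
    "article_title": ["Solar Panels 101", "EV Charging Guide"],
    "url": ["/solar-panels", "/ev-charging"],
    "pageviews": [300, 100],
})
df_prior = pd.DataFrame({
    "article_title": ["Solar Panels 101", "EV Charging Guide"],
    "url": ["/solar-panels", "/ev-charging"],
    "pageviews": [200, 50],
})

def term_pageviews(df, term):
    """Sum pageviews for rows whose title or URL contains the term."""
    mask = (
        df["article_title"].str.contains(term, case=False, na=False)
        | df["url"].str.contains(term, case=False, na=False)
    )
    return df.loc[mask, "pageviews"].sum()

cy = term_pageviews(df_current, "solar")  # 300
py = term_pageviews(df_prior, "solar")    # 200
yoy = (cy - py) / py * 100 if py else float("nan")  # +50.0%

# Share of voice: each term's slice of the summed term pageviews.
sum_cy = sum(term_pageviews(df_current, t) for t in ["solar", "charging"])
sov_cy = cy / sum_cy if sum_cy else float("nan")  # 300 / 400 = 0.75
```

Note that a page matching several terms is counted once per term, so SOV shares are relative to the summed term totals, not to total site pageviews.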
# Display SQL in a collapsed expander above metrics
with st.expander("Show SQL Queries"):
    st.subheader("Current Year SQL")
    st.code(sql_current, language="sql")
    st.subheader("Prior Year SQL")
    st.code(sql_prior, language="sql")

# Format counts and percentages
fmt = {
    "CY pageviews": "{:,}",  # thousands separators
    "PY pageviews": "{:,}",  # thousands separators
    "YoY %": "{:.1f}%",
    "SOV CY": "{:.1%}",
    "SOV PY": "{:.1%}",
    "SOV % Change": "{:.1%}",
}

# Display with conditional formatting
# (Styler.applymap is deprecated since pandas 2.1; Styler.map is the newer name)
st.subheader("Term Performance Metrics")
styled = metrics_df.style.format(fmt, na_rep="-").applymap(
    highlight_sov, subset=["SOV % Change"]
)
st.dataframe(styled, height=400)

# Show raw result tables with totals
with st.expander(f"Current Year Results: {start_date} to {end_date}"):
    st.dataframe(df_current.style.format({"pageviews": "{:,}"}))
    st.write(f"Total pageviews: {total_cy:,}")
with st.expander(f"Prior Year Results: {prior_start} to {prior_end}"):
    st.dataframe(df_prior.style.format({"pageviews": "{:,}"}))
    st.write(f"Total pageviews: {total_py:,}")
changelog.md ADDED
@@ -0,0 +1,3 @@
# Changelog

- 2025-08-07 14:28 UTC: Initialized changelog to track project updates.
index.html DELETED
@@ -1,19 +0,0 @@
<!doctype html>
<html>
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width" />
    <title>My static Space</title>
    <link rel="stylesheet" href="style.css" />
  </head>
  <body>
    <div class="card">
      <h1>Welcome to your static Space!</h1>
      <p>You can modify this app directly by editing <i>index.html</i> in the Files and versions tab.</p>
      <p>
        Also don't forget to check the
        <a href="https://huggingface.co/docs/hub/spaces" target="_blank">Spaces documentation</a>.
      </p>
    </div>
  </body>
</html>
requirements.txt ADDED
@@ -0,0 +1,5 @@
streamlit
openai>=1.0.0
pandas
python-dotenv
snowflake-connector-python
style.css DELETED
@@ -1,28 +0,0 @@
body {
  padding: 2rem;
  font-family: -apple-system, BlinkMacSystemFont, "Arial", sans-serif;
}

h1 {
  font-size: 16px;
  margin-top: 0;
}

p {
  color: rgb(107, 114, 128);
  font-size: 15px;
  margin-bottom: 10px;
  margin-top: 5px;
}

.card {
  max-width: 620px;
  margin: 0 auto;
  padding: 16px;
  border: 1px solid lightgray;
  border-radius: 16px;
}

.card p:last-child {
  margin-bottom: 0;
}