Upload 11 files
SEO Keyword Research AI Agent
An AI-powered SEO keyword research agent that discovers, analyzes, and ranks keyword opportunities using SerpAPI, featuring an interactive Streamlit dashboard for visualization and n8n automation integration.
This project showcases end-to-end skills in Python, AI agents, API integration, data visualization, and deployment (Render + n8n).
Features
Keyword Discovery: Finds semantically related keywords for any seed keyword.
Keyword Analysis: Scores each keyword based on volume, competition, and SERP metrics.
Data Export: Export analyzed results as CSV/Excel files.
Interactive Dashboard: Visualize keyword trends, heatmaps, and search intent using Streamlit + Plotly.
AI Agent Workflow: Automates the research → processing → reporting pipeline.
n8n Integration: Trigger workflows via webhooks (e.g., run research + auto-send reports to Slack/Email).
Deployment: Hosted on Render, accessible via API and dashboard.
- README.md +156 -0
- __init__.py +0 -0
- app.py +584 -0
- dashboard.py +830 -0
- git +0 -0
- keyword_agent.py +19 -0
- postprocess.py +366 -0
- ranking.py +569 -0
- requirements.txt +60 -0
- server.py +625 -0
- tempCodeRunnerFile.py +1 -0
@@ -0,0 +1,156 @@
# SEO Keyword Research AI Agent

An **AI-powered SEO keyword research agent** that discovers, analyzes, and ranks keyword opportunities using **SerpAPI**, with an interactive **Streamlit dashboard** for visualization and an **n8n integration** for automation.

This project was built to demonstrate skills in **Python, AI agents, API integration, data visualization, and deployment** (Render + n8n).

---

## Features

- **Keyword Discovery**: Finds related keywords for any seed keyword.
- **Keyword Analysis**: Scores keywords based on estimated search volume, competition, and SERP signals.
- **Data Export**: Saves results to CSV/Excel with metadata.
- **Interactive Dashboard**: Streamlit + Plotly for keyword trends, competition heatmaps, and intent analysis.
- **AI Agent Workflow**: Automates tasks like keyword research → processing → reporting.
- **n8n Integration**: Trigger workflows via webhooks (e.g., run keyword research and auto-send results to Slack/Email).
- **Deployment**: Hosted on **Render** for API and dashboard access.
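
The intent and tail labels behind the "intent analysis" feature come from simple keyword heuristics in `app.py` (`classify_intent` and `classify_tail`). The sketch below condenses that idea. It is not the commit's code: the brand/navigational branch is left out, and whole-word matching is an assumption swapped in for the substring matching used in `app.py`, so that "top" does not match inside "laptop".

```python
import re

# Signal words copied from app.py; whole-word matching is an assumption of this sketch.
INTENT_SIGNALS = {
    "informational": ["how to", "what is", "why", "guide", "tutorial"],
    "transactional": ["buy", "price", "cost", "apply", "register"],
    "commercial": ["best", "top", "compare", "vs", "reviews"],
}


def classify_intent(keyword: str) -> str:
    """Return the first intent whose signal appears as a whole word or phrase."""
    text = keyword.lower()
    for intent, signals in INTENT_SIGNALS.items():
        if any(re.search(rf"\b{re.escape(s)}\b", text) for s in signals):
            return intent
    return "informational"  # default, as in app.py


def classify_tail(keyword: str) -> str:
    """Bucket a keyword by word count: 4+ long-tail, 3 mid-tail, else short-tail."""
    words = len(keyword.split())
    if words >= 4:
        return "long-tail"
    return "mid-tail" if words == 3 else "short-tail"


print(classify_intent("best global internship programs"))  # commercial
print(classify_tail("global internship remote"))  # mid-tail
```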

---

## Project Structure

    seo-keyword-ai-agent/
    │
    ├── app.py              # Master pipeline orchestrator
    ├── dashboard.py        # Streamlit visualization
    ├── src/
    │   ├── postprocess.py  # Cleans & enriches results
    │   ├── ranking.py      # Keyword discovery & scoring
    │   └── server.py       # FastAPI/Render server
    ├── output/             # Generated keyword results
    ├── .env                # API keys (not committed)
    ├── requirements.txt    # Python dependencies
    └── README.md           # Project documentation

---

## Installation

1. **Clone the repo**
   ```bash
   git clone https://github.com/omraghu07/seo-keyword-ai-agent.git
   cd seo-keyword-ai-agent
   ```

2. **Create a virtual environment**
   ```bash
   python -m venv agent_venv

   # Mac/Linux
   source agent_venv/bin/activate

   # Windows
   agent_venv\Scripts\activate
   ```

3. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```

4. **Set up the .env file**
   Create a `.env` file in the root directory and add your API key:
   ```
   SERPAPI_KEY=your_serpapi_key_here
   ```
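
`app.py` reads this key through `python-dotenv` (`load_dotenv()`), and `python app.py --check-only` reports whether it was found. For environments where `python-dotenv` is not installed yet, a standard-library stand-in could look like the sketch below. The helper names `load_env_file` and `has_serpapi_key` are hypothetical and do not appear in the repository.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Copy KEY=VALUE pairs from a .env file into os.environ (existing values win)."""
    env_path = Path(path)
    if not env_path.exists():
        return
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines without an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip().strip("'\""))


def has_serpapi_key() -> bool:
    """Return True when SERPAPI_KEY is available after reading .env."""
    load_env_file()
    return bool(os.getenv("SERPAPI_KEY"))
```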

---

## Usage

### Run the full pipeline

```bash
python app.py "global internship"
```

### Launch the dashboard

```bash
streamlit run dashboard.py
```

### Run as an API (Render/FastAPI)

```bash
gunicorn -k uvicorn.workers.UvicornWorker src.server:app --bind 0.0.0.0:8000 --workers 2
```

## n8n Integration

- Create an n8n workflow with a Webhook node.
- Connect it to the Render API with a POST request:

  ```http
  POST https://seo-keyword-ai-agent.onrender.com/analyze

  {
    "seed": "global internship",
    "top": 10
  }
  ```

- Add email/Slack nodes to auto-send reports.
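
For testing the endpoint outside n8n, the same request can be built with Python's standard library. This is a minimal sketch: the URL and the `seed`/`top` fields come from the example above, while the function names and the timeout are assumptions, and the shape of the response is not documented here.

```python
import json
import urllib.request

# Endpoint and payload fields are taken from the README example above.
ANALYZE_URL = "https://seo-keyword-ai-agent.onrender.com/analyze"


def build_analyze_request(seed: str, top: int = 10) -> urllib.request.Request:
    """Build (but do not send) the JSON POST request for the /analyze endpoint."""
    body = json.dumps({"seed": seed, "top": top}).encode("utf-8")
    return urllib.request.Request(
        ANALYZE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_analysis(seed: str, top: int = 10, timeout: float = 60.0) -> object:
    """Send the request and decode the JSON response (response shape is an unknown here)."""
    with urllib.request.urlopen(build_analyze_request(seed, top), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to inspect the payload without hitting the deployed service.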

## Example Output

### Top 5 Keyword Opportunities

| Keyword                           | Est. Monthly Volume | Competition | Opportunity Score | Google Results |
| --------------------------------- | ------------------- | ----------- | ----------------- | -------------- |
| UCLA Global Internship Program    | 2000                | 0.00        | 330.12            | 0              |
| Summer Internship Programs - CIEE | 1666                | 0.33        | 9.26              | 54,000         |
| Global Internship Program HENNGE  | 2000                | 0.35        | 9.01              | 10,200         |
| Berkeley Global Internships Paid  | 1666                | 0.45        | 6.98              | 219,000        |
| Global Internship Remote          | 2500                | 0.50        | 6.66              | 174M           |
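
The volume and score columns can be reproduced from the formulas in `app.py`: volume is a word-count heuristic rather than a measured figure, and the opportunity score is `log10(volume + 1) / (competition + 0.01)`. A small sketch of both follows; the function names are illustrative and are not taken from the repository.

```python
import math


def estimate_volume(keyword: str) -> int:
    """Heuristic volume from app.py: fewer words means a higher assumed volume."""
    return max(10, 10000 // (len(keyword.split()) + 1))


def opportunity_score(volume: int, competition: float) -> float:
    """log10(volume + 1) / (competition + 0.01), rounded to two decimals."""
    return round(math.log10(volume + 1) / (competition + 0.01), 2)


volume = estimate_volume("UCLA Global Internship Program")
print(volume)  # 2000
print(opportunity_score(volume, 0.0))  # 330.12
```

Because the denominator bottoms out at 0.01, a keyword whose SERP shows no results, ads, or rich features gets a very large score, which is why the first row stands far above the rest.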

## Tech Stack

- Python (Core language)
- SerpAPI (Google search results API)
- Pandas, Requests, Tabulate (Data processing)
- Streamlit + Plotly (Dashboard & charts)
- FastAPI + Gunicorn (API server)
- Render (Deployment)
- n8n (Workflow automation)

## Author

Om Raghuwanshi, engineering student passionate about AI

## Links

[LinkedIn](https://www.linkedin.com/in/om-raghuwanshi-b5136a298)

If you like this project, don't forget to star the repo and fork it!
File without changes
@@ -0,0 +1,584 @@
# app.py
"""
Complete Keyword Research Pipeline
Integrates keyword discovery, analysis, and post-processing into one workflow
"""

import os
import sys
import argparse
from pathlib import Path
from dotenv import load_dotenv

# Load environment variables first
load_dotenv()

# Add current directory to path for imports
current_dir = Path(__file__).parent
sys.path.insert(0, str(current_dir))

def check_setup():
    """Check if all requirements are met"""
    print("Checking setup...")

    # Check API key
    api_key = os.getenv("SERPAPI_KEY")
    if not api_key:
        print("SERPAPI_KEY not found in environment variables")
        print("Make sure your .env file contains: SERPAPI_KEY=your_key_here")
        return False

    # Show only a short prefix so the key does not end up in logs
    print(f"API key found: {api_key[:4]}...")

    # Check required packages as (import name, pip package name) pairs
    required_packages = [
        ('serpapi', 'google-search-results'),
        ('pandas', 'pandas'),
        ('tabulate', 'tabulate'),
        ('openpyxl', 'openpyxl')
    ]

    missing = []
    for import_name, pip_name in required_packages:
        try:
            __import__(import_name)
        except ImportError:
            missing.append(pip_name)

    if missing:
        print("Missing packages:")
        for pkg in missing:
            print(f"  pip install {pkg}")
        return False

    print("All packages available")
    return True

def run_keyword_analysis(seed_keyword, use_volume_api=False):
    """Run the keyword analysis using the professional tool"""
    print("\nStep 1: Running keyword analysis...")

    try:
        # Imported lazily so check_setup() can report missing packages first
        import math
        import csv
        import re
        import logging
        from datetime import date
        from dataclasses import dataclass
        from serpapi import GoogleSearch

        # Configure logging to be less verbose
        logging.basicConfig(level=logging.WARNING)

        @dataclass
        class KeywordMetrics:
            keyword: str
            monthly_searches: int
            competition_score: float
            opportunity_score: float
            total_results: int
            ads_count: int
            has_featured_snippet: bool
            has_people_also_ask: bool
            has_knowledge_graph: bool

        class CompetitionCalculator:
            # Relative weight of each SERP signal; the weights sum to 1.0
            WEIGHTS = {
                'total_results': 0.50,
                'ads': 0.25,
                'featured_snippet': 0.15,
                'people_also_ask': 0.07,
                'knowledge_graph': 0.03
            }

            @staticmethod
            def extract_total_results(search_info):
                if not search_info:
                    return 0

                total = (search_info.get("total_results") or
                         search_info.get("total_results_raw") or
                         search_info.get("total"))

                if isinstance(total, int):
                    return total

                if isinstance(total, str):
                    # Strip separators such as commas before parsing
                    numbers_only = re.sub(r"[^\d]", "", total)
                    try:
                        return int(numbers_only) if numbers_only else 0
                    except ValueError:
                        return 0

                return 0

            def calculate_score(self, search_results):
                search_info = search_results.get("search_information", {})

                # Log-scale the result count; saturates at roughly 10 million results
                total_results = self.extract_total_results(search_info)
                normalized_results = min(math.log10(total_results + 1) / 7, 1.0)

                # Ads saturate at three per results page
                ads = search_results.get("ads_results", [])
                ads_count = len(ads) if ads else 0
                ads_score = min(ads_count / 3, 1.0)

                has_featured_snippet = bool(
                    search_results.get("featured_snippet") or
                    search_results.get("answer_box")
                )

                has_people_also_ask = bool(
                    search_results.get("related_questions") or
                    search_results.get("people_also_ask")
                )

                has_knowledge_graph = bool(search_results.get("knowledge_graph"))

                competition_score = (
                    self.WEIGHTS['total_results'] * normalized_results +
                    self.WEIGHTS['ads'] * ads_score +
                    self.WEIGHTS['featured_snippet'] * has_featured_snippet +
                    self.WEIGHTS['people_also_ask'] * has_people_also_ask +
                    self.WEIGHTS['knowledge_graph'] * has_knowledge_graph
                )

                competition_score = max(0.0, min(1.0, competition_score))

                breakdown = {
                    "total_results": total_results,
                    "ads_count": ads_count,
                    "has_featured_snippet": has_featured_snippet,
                    "has_people_also_ask": has_people_also_ask,
                    "has_knowledge_graph": has_knowledge_graph
                }

                return competition_score, breakdown

        # Main analysis functions
        def find_related_keywords(seed_keyword, max_results=120):
            print(f"Finding related keywords for: '{seed_keyword}'...")

            params = {
                "engine": "google",
                "q": seed_keyword,
                "api_key": os.getenv("SERPAPI_KEY"),
                "hl": "en",
                "gl": "us"
            }

            try:
                search = GoogleSearch(params)
                results = search.get_dict()
            except Exception as e:
                print(f"Error getting related keywords: {e}")
                return []

            keyword_candidates = set()

            # Get related searches
            related_searches = results.get("related_searches", [])
            for item in related_searches:
                query = item.get("query") or item.get("suggestion")
                if query and len(query.strip()) > 0:
                    keyword_candidates.add(query.strip())

            # Get people also ask
            related_questions = results.get("related_questions", [])
            for item in related_questions:
                question = item.get("question") or item.get("query")
                if question and len(question.strip()) > 0:
                    keyword_candidates.add(question.strip())

            # Get organic titles
            organic_results = results.get("organic_results", [])
            for result in organic_results[:10]:
                title = result.get("title", "")
                if title and len(title.strip()) > 0:
                    keyword_candidates.add(title.strip())

            # Sort before truncating so repeated runs keep the same candidates
            final_keywords = sorted(keyword_candidates)[:max_results]
            print(f"Found {len(final_keywords)} keyword candidates")

            return final_keywords

        def analyze_keywords(keywords, use_volume_api=False):
            # use_volume_api is accepted but not implemented; volumes are always heuristic
            print(f"Analyzing {len(keywords)} keywords...")

            calculator = CompetitionCalculator()
            analyzed_keywords = []

            for i, keyword in enumerate(keywords, 1):
                if i % 10 == 0:
                    print(f"Progress: {i}/{len(keywords)} keywords processed")

                # Search for keyword
                params = {
                    "engine": "google",
                    "q": keyword,
                    "api_key": os.getenv("SERPAPI_KEY"),
                    "hl": "en",
                    "gl": "us",
                    "num": 10
                }

                try:
                    search = GoogleSearch(params)
                    search_results = search.get_dict()
                except Exception as e:
                    print(f"Error analyzing '{keyword}': {e}")
                    continue

                # Calculate competition
                competition_score, breakdown = calculator.calculate_score(search_results)

                # Estimate volume from word count (shorter queries are assumed to be searched more)
                word_count = len(keyword.split())
                search_volume = max(10, 10000 // (word_count + 1))

                # Calculate opportunity score
                volume_score = math.log10(search_volume + 1)
                opportunity_score = volume_score / (competition_score + 0.01)

                metrics = KeywordMetrics(
                    keyword=keyword,
                    monthly_searches=search_volume,
                    competition_score=round(competition_score, 4),
                    opportunity_score=round(opportunity_score, 2),
                    total_results=breakdown["total_results"],
                    ads_count=breakdown["ads_count"],
                    has_featured_snippet=breakdown["has_featured_snippet"],
                    has_people_also_ask=breakdown["has_people_also_ask"],
                    has_knowledge_graph=breakdown["has_knowledge_graph"]
                )

                analyzed_keywords.append(metrics)

            # Sort by opportunity score
            analyzed_keywords.sort(key=lambda x: x.opportunity_score, reverse=True)

            print(f"Analysis complete! {len(analyzed_keywords)} keywords analyzed")
            return analyzed_keywords

        def save_to_csv(keyword_metrics, seed_keyword, top_count=50):
            if not keyword_metrics:
                print("No data to save!")
                return None

            # Create filename
            today = date.today()
            safe_seed = re.sub(r"[^\w\s-]", "", seed_keyword).strip().replace(" ", "_")[:30]
            filename = f"keywords_{safe_seed}_{today}.csv"

            try:
                with open(filename, "w", newline='', encoding='utf-8') as file:
                    writer = csv.writer(file)

                    # Write header
                    headers = [
                        "Keyword", "Monthly Searches", "Competition Score",
                        "Opportunity Score", "Total Results", "Ads Count",
                        "Featured Snippet", "People Also Ask", "Knowledge Graph"
                    ]
                    writer.writerow(headers)

                    # Write data
                    for metrics in keyword_metrics[:top_count]:
                        row = [
                            metrics.keyword,
                            metrics.monthly_searches,
                            metrics.competition_score,
                            metrics.opportunity_score,
                            metrics.total_results,
                            metrics.ads_count,
                            "Yes" if metrics.has_featured_snippet else "No",
                            "Yes" if metrics.has_people_also_ask else "No",
                            "Yes" if metrics.has_knowledge_graph else "No"
                        ]
                        writer.writerow(row)

                saved_count = min(top_count, len(keyword_metrics))
                print(f"Saved {saved_count} keywords to {filename}")
                return filename

            except Exception as e:
                print(f"Error saving CSV: {e}")
                return None

        def display_top_results(keyword_metrics, top_count=5):
            if not keyword_metrics:
                print("No results to display!")
                return

            print(f"\nTop {min(top_count, len(keyword_metrics))} Keywords:")
            print("-" * 80)

            for i, metrics in enumerate(keyword_metrics[:top_count], 1):
                print(f"{i}. {metrics.keyword}")
                print(f"   Score: {metrics.opportunity_score} | Volume: {metrics.monthly_searches:,} | Competition: {metrics.competition_score}")
                print()

        # Run the analysis
        related_keywords = find_related_keywords(seed_keyword)
        if not related_keywords:
            print("No keyword candidates found")
            return None

        analyzed_keywords = analyze_keywords(related_keywords, use_volume_api)
        if not analyzed_keywords:
            print("No keywords analyzed successfully")
            return None

        filename = save_to_csv(analyzed_keywords, seed_keyword)
        display_top_results(analyzed_keywords)

        return filename

    except Exception as e:
        print(f"Error in keyword analysis: {e}")
        return None

def run_postprocessing(csv_filename, seed_keyword):
    """Run post-processing on the CSV file"""
    print("\nStep 2: Running post-processing...")

    try:
        import pandas as pd
        import re
        import json
        from datetime import date, datetime, timezone

        # Try to import optional packages
        try:
            from tabulate import tabulate
            HAS_TABULATE = True
        except ImportError:
            HAS_TABULATE = False

        try:
            import openpyxl  # noqa: F401 (imported only to confirm Excel support)
            HAS_EXCEL = True
        except ImportError:
            HAS_EXCEL = False

        # Configuration
        BRAND_KEYWORDS = {
            "linkedin", "indeed", "glassdoor", "ucla", "asu", "berkeley",
            "hennge", "ciee", "google", "facebook", "microsoft", "amazon"
        }

        def is_brand_query(keyword):
            if not keyword:
                return False
            keyword_lower = keyword.lower()
            # Match brands as whole words so "asu" does not fire on "measure"
            for brand in BRAND_KEYWORDS:
                if re.search(rf"\b{re.escape(brand)}\b", keyword_lower):
                    return True
            if re.search(r"\.(com|edu|org|net|gov|io)\b", keyword_lower):
                return True
            return False

        def classify_intent(keyword):
            if not keyword:
                return "informational"

            k = keyword.lower()
            if any(signal in k for signal in ["how to", "what is", "why", "guide", "tutorial"]):
                return "informational"
            if any(signal in k for signal in ["buy", "price", "cost", "apply", "register"]):
                return "transactional"
            if any(signal in k for signal in ["best", "top", "compare", "vs", "reviews"]):
                return "commercial"
            if is_brand_query(keyword):
                return "navigational"
            return "informational"

        def classify_tail(keyword):
            if not keyword:
                return "short-tail"
            word_count = len(str(keyword).split())
            if word_count >= 4:
                return "long-tail"
            elif word_count == 3:
                return "mid-tail"
            else:
                return "short-tail"

        # Load and process the CSV
        print(f"Loading {csv_filename}...")
        df = pd.read_csv(csv_filename)
        print(f"Loaded {len(df)} keywords")

        # Clean and enhance the data
        print("Processing data...")

        # Standardize column names (rename ignores columns that are absent)
        column_mapping = {
            'Competition Score': 'Competition',
            'Total Results': 'Google Results',
            'Ads Count': 'Ads Shown',
            'Featured Snippet': 'Featured Snippet?',
            'People Also Ask': 'PAA Available?',
            'Knowledge Graph': 'Knowledge Graph?'
        }
        df = df.rename(columns=column_mapping)

        # Remove duplicates and sort; reset the index so row numbers match the ranking
        df = df.drop_duplicates(subset=['Keyword'], keep='first')
        df = df.sort_values('Opportunity Score', ascending=False).reset_index(drop=True)

        # Add enhancement columns
        df['Intent'] = df['Keyword'].apply(classify_intent)
        df['Tail'] = df['Keyword'].apply(classify_tail)
        df['Is Brand/Navigational'] = df['Keyword'].apply(lambda x: "Yes" if is_brand_query(x) else "No")

        # Reorder columns
        column_order = [
            'Keyword', 'Intent', 'Tail', 'Is Brand/Navigational',
            'Monthly Searches', 'Competition', 'Opportunity Score',
            'Google Results', 'Ads Shown', 'Featured Snippet?',
            'PAA Available?', 'Knowledge Graph?'
        ]

        available_columns = [col for col in column_order if col in df.columns]
        df = df[available_columns]

        # Create output directory
        os.makedirs("results", exist_ok=True)

        # Generate filenames
        today = date.today().isoformat()
        safe_seed = re.sub(r"[^\w\s-]", "", seed_keyword).strip().replace(" ", "_")[:30]
        base_name = f"keywords_{safe_seed}_{today}"

        csv_path = f"results/{base_name}.csv"
        meta_path = f"results/{base_name}.meta.json"

        # Save enhanced CSV
        df.to_csv(csv_path, index=False)
        print(f"Saved enhanced CSV: {csv_path}")

        # Save Excel if available; excel_path stays None when nothing was written
        excel_path = None
        if HAS_EXCEL:
            excel_path = f"results/{base_name}.xlsx"
            with pd.ExcelWriter(excel_path, engine="openpyxl") as writer:
                df.head(50).to_excel(writer, sheet_name="Top_50", index=False)
                df.to_excel(writer, sheet_name="All_Keywords", index=False)
            print(f"Saved Excel: {excel_path}")

        # Save metadata (timezone-aware replacement for the deprecated datetime.utcnow())
        metadata = {
            "seed_keyword": seed_keyword,
            "generated_at": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
            "total_keywords": len(df),
            "data_source": "SerpApi with heuristic search volumes",
            "methodology": "Opportunity Score = log10(volume+1) / (competition + 0.01)"
        }

        with open(meta_path, "w", encoding="utf-8") as f:
            json.dump(metadata, f, indent=2)

        print(f"Saved metadata: {meta_path}")

        # Display results
        print("\nTop 10 Enhanced Results:")

        preview_df = df.head(10)
        if HAS_TABULATE:
            display_columns = ['Keyword', 'Intent', 'Tail', 'Monthly Searches', 'Competition', 'Opportunity Score']
            display_data = preview_df[display_columns]
            print(tabulate(display_data, headers="keys", tablefmt="github", showindex=False))
        else:
            for rank, (_, row) in enumerate(preview_df.iterrows(), 1):
                print(f"{rank}. {row['Keyword']} | Score: {row['Opportunity Score']} | Intent: {row['Intent']} | Tail: {row['Tail']}")

        # Summary stats
        print("\nSummary:")
        print(f"• Total keywords: {len(df)}")
        print(f"• Long-tail keywords: {len(df[df['Tail'] == 'long-tail'])}")
        print(f"• Non-brand keywords: {len(df[df['Is Brand/Navigational'] == 'No'])}")
        print(f"• High opportunity (score > 50): {len(df[df['Opportunity Score'] > 50])}")

        return csv_path, excel_path, meta_path

    except Exception as e:
        print(f"Error in post-processing: {e}")
        return None, None, None

def run_complete_pipeline(seed_keyword, use_volume_api=False):
    """Run the complete pipeline"""
    print("Starting Complete Keyword Research Pipeline")
    print("=" * 60)
    print(f"Seed Keyword: '{seed_keyword}'")
    print("=" * 60)

    # Step 1: Run keyword analysis
    csv_filename = run_keyword_analysis(seed_keyword, use_volume_api)

    if not csv_filename:
        print("Pipeline failed at Step 1")
        return False

    # Step 2: Run post-processing
    csv_path, excel_path, meta_path = run_postprocessing(csv_filename, seed_keyword)

    if not csv_path:
        print("Pipeline failed at Step 2")
        return False

    # Final summary
    print("\nPIPELINE COMPLETE!")
    print("=" * 60)
    print(f"Original CSV: {csv_filename}")
    print(f"Enhanced CSV: {csv_path}")
    if excel_path:
        print(f"Excel file: {excel_path}")
    if meta_path:
        print(f"Metadata: {meta_path}")
    print("=" * 60)

    return True
|
| 550 |
+
|
| 551 |
+
def main():
|
| 552 |
+
"""Main function with command line support"""
|
| 553 |
+
parser = argparse.ArgumentParser(description="Complete Keyword Research Pipeline")
|
| 554 |
+
parser.add_argument("seed_keyword", nargs="?", default="global internship",
|
| 555 |
+
help="Seed keyword (default: 'global internship')")
|
| 556 |
+
parser.add_argument("--use-volume-api", action="store_true",
|
| 557 |
+
help="Use real volume API (requires implementation)")
|
| 558 |
+
parser.add_argument("--check-only", action="store_true",
|
| 559 |
+
help="Only check setup, don't run pipeline")
|
| 560 |
+
|
| 561 |
+
args = parser.parse_args()
|
| 562 |
+
|
| 563 |
+
# Check setup
|
| 564 |
+
if not check_setup():
|
| 565 |
+
return 1
|
| 566 |
+
|
| 567 |
+
if args.check_only:
|
| 568 |
+
print("✅ Setup check complete!")
|
| 569 |
+
return 0
|
| 570 |
+
|
| 571 |
+
# Run pipeline
|
| 572 |
+
success = run_complete_pipeline(args.seed_keyword, args.use_volume_api)
|
| 573 |
+
return 0 if success else 1
|
| 574 |
+
|
| 575 |
+
if __name__ == "__main__":
|
| 576 |
+
try:
|
| 577 |
+
exit_code = main()
|
| 578 |
+
sys.exit(exit_code)
|
| 579 |
+
except KeyboardInterrupt:
|
| 580 |
+
print("\n⚠️ Pipeline interrupted by user")
|
| 581 |
+
sys.exit(1)
|
| 582 |
+
except Exception as e:
|
| 583 |
+
print(f"\n❌ Unexpected error: {e}")
|
| 584 |
+
sys.exit(1)
|
|
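The command-line interface that `main()` defines above can be exercised without running the pipeline or calling any API. The following is a minimal sketch, assuming the same three arguments as app.py; `build_parser` is a hypothetical helper name introduced here for illustration and is not part of the repository.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper: rebuilds the parser that main() defines inline."""
    parser = argparse.ArgumentParser(description="Complete Keyword Research Pipeline")
    # Optional positional argument: falls back to the default when omitted.
    parser.add_argument("seed_keyword", nargs="?", default="global internship",
                        help="Seed keyword (default: 'global internship')")
    # Boolean flags: False unless present on the command line.
    parser.add_argument("--use-volume-api", action="store_true",
                        help="Use real volume API (requires implementation)")
    parser.add_argument("--check-only", action="store_true",
                        help="Only check setup, don't run pipeline")
    return parser


if __name__ == "__main__":
    # An explicit argument list keeps the example independent of sys.argv.
    args = build_parser().parse_args(["remote internship", "--check-only"])
    print(args.seed_keyword, args.use_volume_api, args.check_only)
```

Because the seed keyword is declared with `nargs="?"`, running the script with no arguments still works and uses the default seed, while `--check-only` lets a deployment verify its setup before spending API credits.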
@@ -0,0 +1,830 @@
| 1 |
+
# dashboard.py
|
| 2 |
+
"""
|
| 3 |
+
SEO Keyword Research Dashboard
|
| 4 |
+
|
| 5 |
+
A Streamlit web interface for the keyword research pipeline.
|
| 6 |
+
Provides interactive analysis, visualization, and download capabilities.
|
| 7 |
+
|
| 8 |
+
Requirements:
|
| 9 |
+
pip install streamlit plotly pandas python-dotenv google-search-results openpyxl
|
| 10 |
+
|
| 11 |
+
Usage:
|
| 12 |
+
streamlit run dashboard.py
|
| 13 |
+
"""
|
| 14 |
+
|
| 15 |
+
import streamlit as st
|
| 16 |
+
import pandas as pd
|
| 17 |
+
import plotly.express as px
|
| 18 |
+
import plotly.graph_objects as go
|
| 19 |
+
from plotly.subplots import make_subplots
|
| 20 |
+
import os
|
| 21 |
+
import sys
|
| 22 |
+
from pathlib import Path
|
| 23 |
+
from datetime import date, datetime
|
| 24 |
+
import re
|
| 25 |
+
import json
|
| 26 |
+
import io
|
| 27 |
+
from typing import Optional, Tuple, Dict, Any
|
| 28 |
+
|
| 29 |
+
# Add project directories to path
|
| 30 |
+
project_root = Path(__file__).parent
|
| 31 |
+
src_path = project_root / "src"
|
| 32 |
+
if src_path.exists():
|
| 33 |
+
sys.path.insert(0, str(src_path))
|
| 34 |
+
sys.path.insert(0, str(project_root))
|
| 35 |
+
|
| 36 |
+
# Load environment variables (e.g. SERPAPI_KEY) from a .env file
|
| 37 |
+
try:
|
| 38 |
+
from dotenv import load_dotenv
|
| 39 |
+
load_dotenv()
|
| 40 |
+
except ImportError:
|
| 41 |
+
st.error("Missing required package: python-dotenv. Install with: pip install python-dotenv")
|
| 42 |
+
st.stop()
|
| 43 |
+
|
| 44 |
+
# Page configuration
|
| 45 |
+
st.set_page_config(
|
| 46 |
+
page_title="SEO Keyword Research Dashboard",
|
| 47 |
+
page_icon="🔍",
|
| 48 |
+
layout="wide",
|
| 49 |
+
initial_sidebar_state="expanded"
|
| 50 |
+
)
|
| 51 |
+
|
| 52 |
+
# Custom CSS for better styling
|
| 53 |
+
st.markdown("""
|
| 54 |
+
<style>
|
| 55 |
+
.main-header {
|
| 56 |
+
font-size: 3rem;
|
| 57 |
+
color: #1f77b4;
|
| 58 |
+
text-align: center;
|
| 59 |
+
margin-bottom: 2rem;
|
| 60 |
+
background: linear-gradient(90deg, #1f77b4, #ff7f0e);
|
| 61 |
+
-webkit-background-clip: text;
|
| 62 |
+
-webkit-text-fill-color: transparent;
|
| 63 |
+
background-clip: text;
|
| 64 |
+
}
|
| 65 |
+
|
| 66 |
+
.metric-card {
|
| 67 |
+
background-color: #f0f2f6;
|
| 68 |
+
padding: 1rem;
|
| 69 |
+
border-radius: 0.5rem;
|
| 70 |
+
border-left: 4px solid #1f77b4;
|
| 71 |
+
margin: 0.5rem 0;
|
| 72 |
+
}
|
| 73 |
+
|
| 74 |
+
.success-message {
|
| 75 |
+
background-color: #d4edda;
|
| 76 |
+
color: #155724;
|
| 77 |
+
padding: 1rem;
|
| 78 |
+
border-radius: 0.5rem;
|
| 79 |
+
border: 1px solid #c3e6cb;
|
| 80 |
+
margin: 1rem 0;
|
| 81 |
+
}
|
| 82 |
+
|
| 83 |
+
.error-message {
|
| 84 |
+
background-color: #f8d7da;
|
| 85 |
+
color: #721c24;
|
| 86 |
+
padding: 1rem;
|
| 87 |
+
border-radius: 0.5rem;
|
| 88 |
+
border: 1px solid #f5c6cb;
|
| 89 |
+
margin: 1rem 0;
|
| 90 |
+
}
|
| 91 |
+
|
| 92 |
+
.stDataFrame {
|
| 93 |
+
border-radius: 0.5rem;
|
| 94 |
+
overflow: hidden;
|
| 95 |
+
}
|
| 96 |
+
</style>
|
| 97 |
+
""", unsafe_allow_html=True)
|
| 98 |
+
|
| 99 |
+
class KeywordDashboard:
|
| 100 |
+
"""Main dashboard class for SEO keyword research interface."""
|
| 101 |
+
|
| 102 |
+
def __init__(self):
|
| 103 |
+
"""Initialize the dashboard with necessary configurations."""
|
| 104 |
+
self.setup_directories()
|
| 105 |
+
self.check_environment()
|
| 106 |
+
|
| 107 |
+
def setup_directories(self):
|
| 108 |
+
"""Create necessary output directories."""
|
| 109 |
+
self.output_dir = Path("output")
|
| 110 |
+
self.processed_dir = self.output_dir / "processed"
|
| 111 |
+
self.reports_dir = self.output_dir / "reports"
|
| 112 |
+
|
| 113 |
+
self.output_dir.mkdir(exist_ok=True)
|
| 114 |
+
self.processed_dir.mkdir(exist_ok=True)
|
| 115 |
+
self.reports_dir.mkdir(exist_ok=True)
|
| 116 |
+
|
| 117 |
+
def check_environment(self):
|
| 118 |
+
"""Check if the environment is properly configured."""
|
| 119 |
+
self.api_key = os.getenv("SERPAPI_KEY")
|
| 120 |
+
self.environment_ready = bool(self.api_key)
|
| 121 |
+
|
| 122 |
+
def render_header(self):
|
| 123 |
+
"""Render the main dashboard header."""
|
| 124 |
+
st.markdown('<h1 class="main-header">SEO Keyword Research Dashboard</h1>',
|
| 125 |
+
unsafe_allow_html=True)
|
| 126 |
+
|
| 127 |
+
if not self.environment_ready:
|
| 128 |
+
st.markdown("""
|
| 129 |
+
<div class="error-message">
|
| 130 |
+
⚠️ <strong>Environment Setup Required</strong><br>
|
| 131 |
+
Please ensure your .env file contains: SERPAPI_KEY=your_key_here
|
| 132 |
+
</div>
|
| 133 |
+
""", unsafe_allow_html=True)
|
| 134 |
+
return False
|
| 135 |
+
|
| 136 |
+
st.markdown("""
|
| 137 |
+
<div class="success-message">
|
| 138 |
+
✅ <strong>Environment Ready</strong><br>
|
| 139 |
+
API key detected and ready for keyword research.
|
| 140 |
+
</div>
|
| 141 |
+
""", unsafe_allow_html=True)
|
| 142 |
+
return True
|
| 143 |
+
|
| 144 |
+
def render_sidebar(self) -> Dict[str, Any]:
|
| 145 |
+
"""Render the sidebar with input controls."""
|
| 146 |
+
st.sidebar.markdown("## 🎯 Analysis Parameters")
|
| 147 |
+
|
| 148 |
+
# Input parameters
|
| 149 |
+
seed_keyword = st.sidebar.text_input(
|
| 150 |
+
"Seed Keyword",
|
| 151 |
+
value="global internship",
|
| 152 |
+
help="Enter the main keyword to research"
|
| 153 |
+
)
|
| 154 |
+
|
| 155 |
+
max_candidates = st.sidebar.slider(
|
| 156 |
+
"Max Candidates",
|
| 157 |
+
min_value=20,
|
| 158 |
+
max_value=300,
|
| 159 |
+
value=120,
|
| 160 |
+
step=10,
|
| 161 |
+
help="Maximum number of keyword candidates to analyze"
|
| 162 |
+
)
|
| 163 |
+
|
| 164 |
+
top_results = st.sidebar.slider(
|
| 165 |
+
"Top Results",
|
| 166 |
+
min_value=10,
|
| 167 |
+
max_value=100,
|
| 168 |
+
value=50,
|
| 169 |
+
step=5,
|
| 170 |
+
help="Number of top results to display and save"
|
| 171 |
+
)
|
| 172 |
+
|
| 173 |
+
# Advanced options
|
| 174 |
+
st.sidebar.markdown("## ⚙️ Advanced Options")
|
| 175 |
+
|
| 176 |
+
use_volume_api = st.sidebar.checkbox(
|
| 177 |
+
"Use Real Volume API",
|
| 178 |
+
value=False,
|
| 179 |
+
help="Enable when volume API is implemented",
|
| 180 |
+
disabled=True # Disabled until implemented
|
| 181 |
+
)
|
| 182 |
+
|
| 183 |
+
# Filtering options
|
| 184 |
+
st.sidebar.markdown("## 🔧 Filters")
|
| 185 |
+
|
| 186 |
+
min_search_volume = st.sidebar.number_input(
|
| 187 |
+
"Min Search Volume",
|
| 188 |
+
min_value=0,
|
| 189 |
+
max_value=10000,
|
| 190 |
+
value=10,
|
| 191 |
+
step=10,
|
| 192 |
+
help="Minimum monthly search volume"
|
| 193 |
+
)
|
| 194 |
+
|
| 195 |
+
max_competition = st.sidebar.slider(
|
| 196 |
+
"Max Competition Score",
|
| 197 |
+
min_value=0.0,
|
| 198 |
+
max_value=1.0,
|
| 199 |
+
value=1.0,
|
| 200 |
+
step=0.1,
|
| 201 |
+
help="Maximum competition score (0=easy, 1=hard)"
|
| 202 |
+
)
|
| 203 |
+
|
| 204 |
+
# Run button
|
| 205 |
+
run_analysis = st.sidebar.button(
|
| 206 |
+
"Run Analysis",
|
| 207 |
+
type="primary",
|
| 208 |
+
help="Start the keyword research analysis"
|
| 209 |
+
)
|
| 210 |
+
|
| 211 |
+
return {
|
| 212 |
+
"seed_keyword": seed_keyword,
|
| 213 |
+
"max_candidates": max_candidates,
|
| 214 |
+
"top_results": top_results,
|
| 215 |
+
"use_volume_api": use_volume_api,
|
| 216 |
+
"min_search_volume": min_search_volume,
|
| 217 |
+
"max_competition": max_competition,
|
| 218 |
+
"run_analysis": run_analysis
|
| 219 |
+
}
|
| 220 |
+
|
| 221 |
+
def run_keyword_analysis(self, params: Dict[str, Any]) -> Optional[pd.DataFrame]:
|
| 222 |
+
"""Run the keyword analysis using the backend pipeline."""
|
| 223 |
+
try:
|
| 224 |
+
# Make sure the project root is importable
|
| 225 |
+
sys.path.insert(0, str(project_root))
|
| 226 |
+
|
| 227 |
+
# The scoring logic from app.py is re-implemented below; import its dependencies here
|
| 228 |
+
import math
|
| 229 |
+
import csv
|
| 230 |
+
import re
|
| 231 |
+
from serpapi import GoogleSearch
|
| 232 |
+
from dataclasses import dataclass
|
| 233 |
+
|
| 234 |
+
@dataclass
|
| 235 |
+
class KeywordMetrics:
|
| 236 |
+
keyword: str
|
| 237 |
+
monthly_searches: int
|
| 238 |
+
competition_score: float
|
| 239 |
+
opportunity_score: float
|
| 240 |
+
total_results: int
|
| 241 |
+
ads_count: int
|
| 242 |
+
has_featured_snippet: bool
|
| 243 |
+
has_people_also_ask: bool
|
| 244 |
+
has_knowledge_graph: bool
|
| 245 |
+
|
| 246 |
+
# Competition calculator (mirrors the one in app.py)
|
| 247 |
+
class CompetitionCalculator:
|
| 248 |
+
WEIGHTS = {
|
| 249 |
+
'total_results': 0.50,
|
| 250 |
+
'ads': 0.25,
|
| 251 |
+
'featured_snippet': 0.15,
|
| 252 |
+
'people_also_ask': 0.07,
|
| 253 |
+
'knowledge_graph': 0.03
|
| 254 |
+
}
|
| 255 |
+
|
| 256 |
+
@staticmethod
|
| 257 |
+
def extract_total_results(search_info):
|
| 258 |
+
if not search_info:
|
| 259 |
+
return 0
|
| 260 |
+
|
| 261 |
+
total = (search_info.get("total_results") or
|
| 262 |
+
search_info.get("total_results_raw") or
|
| 263 |
+
search_info.get("total"))
|
| 264 |
+
|
| 265 |
+
if isinstance(total, int):
|
| 266 |
+
return total
|
| 267 |
+
|
| 268 |
+
if isinstance(total, str):
|
| 269 |
+
numbers_only = re.sub(r"[^\d]", "", total)
|
| 270 |
+
try:
|
| 271 |
+
return int(numbers_only) if numbers_only else 0
|
| 272 |
+
except ValueError:
|
| 273 |
+
return 0
|
| 274 |
+
|
| 275 |
+
return 0
|
| 276 |
+
|
| 277 |
+
def calculate_score(self, search_results):
|
| 278 |
+
search_info = search_results.get("search_information", {})
|
| 279 |
+
|
| 280 |
+
total_results = self.extract_total_results(search_info)
|
| 281 |
+
normalized_results = min(math.log10(total_results + 1) / 7, 1.0)  # log scale, saturates at ~10 million results
|
| 282 |
+
|
| 283 |
+
ads = search_results.get("ads_results", [])
|
| 284 |
+
ads_count = len(ads) if ads else 0
|
| 285 |
+
ads_score = min(ads_count / 3, 1.0)
|
| 286 |
+
|
| 287 |
+
has_featured_snippet = bool(
|
| 288 |
+
search_results.get("featured_snippet") or
|
| 289 |
+
search_results.get("answer_box")
|
| 290 |
+
)
|
| 291 |
+
|
| 292 |
+
has_people_also_ask = bool(
|
| 293 |
+
search_results.get("related_questions") or
|
| 294 |
+
search_results.get("people_also_ask")
|
| 295 |
+
)
|
| 296 |
+
|
| 297 |
+
has_knowledge_graph = bool(search_results.get("knowledge_graph"))
|
| 298 |
+
|
| 299 |
+
competition_score = (
|
| 300 |
+
self.WEIGHTS['total_results'] * normalized_results +
|
| 301 |
+
self.WEIGHTS['ads'] * ads_score +
|
| 302 |
+
self.WEIGHTS['featured_snippet'] * has_featured_snippet +
|
| 303 |
+
self.WEIGHTS['people_also_ask'] * has_people_also_ask +
|
| 304 |
+
self.WEIGHTS['knowledge_graph'] * has_knowledge_graph
|
| 305 |
+
)
|
| 306 |
+
|
| 307 |
+
competition_score = max(0.0, min(1.0, competition_score))
|
| 308 |
+
|
| 309 |
+
breakdown = {
|
| 310 |
+
"total_results": total_results,
|
| 311 |
+
"ads_count": ads_count,
|
| 312 |
+
"has_featured_snippet": has_featured_snippet,
|
| 313 |
+
"has_people_also_ask": has_people_also_ask,
|
| 314 |
+
"has_knowledge_graph": has_knowledge_graph
|
| 315 |
+
}
|
| 316 |
+
|
| 317 |
+
return competition_score, breakdown
|
| 318 |
+
|
| 319 |
+
def find_related_keywords(seed_keyword, max_results=120):
|
| 320 |
+
progress_placeholder = st.empty()
|
| 321 |
+
progress_placeholder.info(f"Finding related keywords for: '{seed_keyword}'...")
|
| 322 |
+
|
| 323 |
+
search_params = {
|
| 324 |
+
"engine": "google",
|
| 325 |
+
"q": seed_keyword,
|
| 326 |
+
"api_key": self.api_key,
|
| 327 |
+
"hl": "en",
|
| 328 |
+
"gl": "us"
|
| 329 |
+
}
|
| 330 |
+
|
| 331 |
+
try:
|
| 332 |
+
search = GoogleSearch(search_params)
|
| 333 |
+
results = search.get_dict()
|
| 334 |
+
except Exception as e:
|
| 335 |
+
progress_placeholder.error(f"❌ Error getting related keywords: {e}")
|
| 336 |
+
return []
|
| 337 |
+
|
| 338 |
+
keyword_candidates = set()
|
| 339 |
+
|
| 340 |
+
# Extract keywords from different sources
|
| 341 |
+
related_searches = results.get("related_searches", [])
|
| 342 |
+
for item in related_searches:
|
| 343 |
+
query = item.get("query") or item.get("suggestion")
|
| 344 |
+
if query and len(query.strip()) > 0:
|
| 345 |
+
keyword_candidates.add(query.strip())
|
| 346 |
+
|
| 347 |
+
related_questions = results.get("related_questions", [])
|
| 348 |
+
for item in related_questions:
|
| 349 |
+
question = item.get("question") or item.get("query")
|
| 350 |
+
if question and len(question.strip()) > 0:
|
| 351 |
+
keyword_candidates.add(question.strip())
|
| 352 |
+
|
| 353 |
+
organic_results = results.get("organic_results", [])
|
| 354 |
+
for result in organic_results[:10]:
|
| 355 |
+
title = result.get("title", "")
|
| 356 |
+
if title and len(title.strip()) > 0:
|
| 357 |
+
keyword_candidates.add(title.strip())
|
| 358 |
+
|
| 359 |
+
final_keywords = sorted(keyword_candidates)[:max_results]  # sorted so truncation is deterministic (set order is not)
|
| 360 |
+
progress_placeholder.success(f"✅ Found {len(final_keywords)} keyword candidates")
|
| 361 |
+
return final_keywords
|
| 362 |
+
|
| 363 |
+
def analyze_keywords_batch(keywords):
|
| 364 |
+
calculator = CompetitionCalculator()
|
| 365 |
+
analyzed_keywords = []
|
| 366 |
+
|
| 367 |
+
progress_bar = st.progress(0)
|
| 368 |
+
status_text = st.empty()
|
| 369 |
+
|
| 370 |
+
for i, keyword in enumerate(keywords):
|
| 371 |
+
progress = (i + 1) / len(keywords)
|
| 372 |
+
progress_bar.progress(progress)
|
| 373 |
+
status_text.text(f"Analyzing keyword {i+1}/{len(keywords)}: {keyword}")
|
| 374 |
+
|
| 375 |
+
# Search for keyword
|
| 376 |
+
search_params = {
|
| 377 |
+
"engine": "google",
|
| 378 |
+
"q": keyword,
|
| 379 |
+
"api_key": self.api_key,
|
| 380 |
+
"hl": "en",
|
| 381 |
+
"gl": "us",
|
| 382 |
+
"num": 10
|
| 383 |
+
}
|
| 384 |
+
|
| 385 |
+
try:
|
| 386 |
+
search = GoogleSearch(search_params)
|
| 387 |
+
search_results = search.get_dict()
|
| 388 |
+
except Exception as e:
|
| 389 |
+
st.warning(f"Skipping '{keyword}': {e}")  # surface the failure instead of dropping it silently
continue
|
| 390 |
+
|
| 391 |
+
# Calculate competition
|
| 392 |
+
competition_score, breakdown = calculator.calculate_score(search_results)
|
| 393 |
+
|
| 394 |
+
# Estimate volume with a word-count heuristic (placeholder, not real search-volume data)
|
| 395 |
+
word_count = len(keyword.split())
|
| 396 |
+
search_volume = max(10, 10000 // (word_count + 1))
|
| 397 |
+
|
| 398 |
+
# Calculate opportunity score
|
| 399 |
+
volume_score = math.log10(search_volume + 1)
|
| 400 |
+
opportunity_score = volume_score / (competition_score + 0.01)
|
| 401 |
+
|
| 402 |
+
metrics = KeywordMetrics(
|
| 403 |
+
keyword=keyword,
|
| 404 |
+
monthly_searches=search_volume,
|
| 405 |
+
competition_score=round(competition_score, 4),
|
| 406 |
+
opportunity_score=round(opportunity_score, 2),
|
| 407 |
+
total_results=breakdown["total_results"],
|
| 408 |
+
ads_count=breakdown["ads_count"],
|
| 409 |
+
has_featured_snippet=breakdown["has_featured_snippet"],
|
| 410 |
+
has_people_also_ask=breakdown["has_people_also_ask"],
|
| 411 |
+
has_knowledge_graph=breakdown["has_knowledge_graph"]
|
| 412 |
+
)
|
| 413 |
+
|
| 414 |
+
analyzed_keywords.append(metrics)
|
| 415 |
+
|
| 416 |
+
progress_bar.empty()
|
| 417 |
+
status_text.empty()
|
| 418 |
+
|
| 419 |
+
# Sort by opportunity score
|
| 420 |
+
analyzed_keywords.sort(key=lambda x: x.opportunity_score, reverse=True)
|
| 421 |
+
return analyzed_keywords
|
| 422 |
+
|
| 423 |
+
# Run the analysis
|
| 424 |
+
with st.spinner("Discovering related keywords..."):
|
| 425 |
+
related_keywords = find_related_keywords(
|
| 426 |
+
params["seed_keyword"],
|
| 427 |
+
params["max_candidates"]
|
| 428 |
+
)
|
| 429 |
+
|
| 430 |
+
if not related_keywords:
|
| 431 |
+
st.error("❌ No keyword candidates found. Please check your API key and try again.")
|
| 432 |
+
return None
|
| 433 |
+
|
| 434 |
+
with st.spinner("Analyzing keywords and calculating scores..."):
|
| 435 |
+
analyzed_keywords = analyze_keywords_batch(related_keywords)
|
| 436 |
+
|
| 437 |
+
if not analyzed_keywords:
|
| 438 |
+
st.error("❌ No keywords were successfully analyzed.")
|
| 439 |
+
return None
|
| 440 |
+
|
| 441 |
+
# Convert to DataFrame
|
| 442 |
+
data = []
|
| 443 |
+
for metrics in analyzed_keywords:
|
| 444 |
+
data.append({
|
| 445 |
+
'Keyword': metrics.keyword,
|
| 446 |
+
'Monthly Searches': metrics.monthly_searches,
|
| 447 |
+
'Competition': metrics.competition_score,
|
| 448 |
+
'Opportunity Score': metrics.opportunity_score,
|
| 449 |
+
'Total Results': metrics.total_results,
|
| 450 |
+
'Ads Count': metrics.ads_count,
|
| 451 |
+
'Featured Snippet': 'Yes' if metrics.has_featured_snippet else 'No',
|
| 452 |
+
'People Also Ask': 'Yes' if metrics.has_people_also_ask else 'No',
|
| 453 |
+
'Knowledge Graph': 'Yes' if metrics.has_knowledge_graph else 'No'
|
| 454 |
+
})
|
| 455 |
+
|
| 456 |
+
df = pd.DataFrame(data)
|
| 457 |
+
|
| 458 |
+
# Apply filters
|
| 459 |
+
df = df[
|
| 460 |
+
(df['Monthly Searches'] >= params['min_search_volume']) &
|
| 461 |
+
(df['Competition'] <= params['max_competition'])
|
| 462 |
+
]
|
| 463 |
+
|
| 464 |
+
return df
|
| 465 |
+
|
| 466 |
+
except Exception as e:
|
| 467 |
+
st.error(f"❌ Analysis failed: {str(e)}")
|
| 468 |
+
return None
|
| 469 |
+
|
| 470 |
+
def add_enhancement_columns(self, df: pd.DataFrame) -> pd.DataFrame:
|
| 471 |
+
"""Add intent and tail classification columns."""
|
| 472 |
+
def classify_intent(keyword):
|
| 473 |
+
if not keyword:
|
| 474 |
+
return "informational"
|
| 475 |
+
|
| 476 |
+
k = keyword.lower()
|
| 477 |
+
if any(signal in k for signal in ["how to", "what is", "why", "guide", "tutorial"]):
|
| 478 |
+
return "informational"
|
| 479 |
+
if any(signal in k for signal in ["buy", "price", "cost", "apply", "register"]):
|
| 480 |
+
return "transactional"
|
| 481 |
+
if any(signal in k for signal in ["best", "top", "compare", "vs", "reviews"]):
|
| 482 |
+
return "commercial"
|
| 483 |
+
return "informational"
|
| 484 |
+
|
| 485 |
+
def classify_tail(keyword):
|
| 486 |
+
if not keyword:
|
| 487 |
+
return "short-tail"
|
| 488 |
+
word_count = len(str(keyword).split())
|
| 489 |
+
if word_count >= 4:
|
| 490 |
+
return "long-tail"
|
| 491 |
+
elif word_count == 3:
|
| 492 |
+
return "mid-tail"
|
| 493 |
+
else:
|
| 494 |
+
return "short-tail"
|
| 495 |
+
|
| 496 |
+
df['Intent'] = df['Keyword'].apply(classify_intent)
|
| 497 |
+
df['Tail'] = df['Keyword'].apply(classify_tail)
|
| 498 |
+
|
| 499 |
+
return df
|
| 500 |
+
|
| 501 |
+
def render_summary_metrics(self, df: pd.DataFrame):
|
| 502 |
+
"""Render summary metrics cards."""
|
| 503 |
+
col1, col2, col3, col4 = st.columns(4)
|
| 504 |
+
|
| 505 |
+
with col1:
|
| 506 |
+
st.markdown("""
|
| 507 |
+
<div class="metric-card">
|
| 508 |
+
<h3>Total Keywords</h3>
|
| 509 |
+
<h2 style="color: #1f77b4;">{}</h2>
|
| 510 |
+
</div>
|
| 511 |
+
""".format(len(df)), unsafe_allow_html=True)
|
| 512 |
+
|
| 513 |
+
with col2:
|
| 514 |
+
avg_score = df['Opportunity Score'].mean()
|
| 515 |
+
st.markdown("""
|
| 516 |
+
<div class="metric-card">
|
| 517 |
+
<h3>Avg Opportunity Score</h3>
|
| 518 |
+
<h2 style="color: #ff7f0e;">{:.2f}</h2>
|
| 519 |
+
</div>
|
| 520 |
+
""".format(avg_score), unsafe_allow_html=True)
|
| 521 |
+
|
| 522 |
+
with col3:
|
| 523 |
+
high_opportunity = len(df[df['Opportunity Score'] > 50])
|
| 524 |
+
st.markdown("""
|
| 525 |
+
<div class="metric-card">
|
| 526 |
+
<h3>High Opportunity</h3>
|
| 527 |
+
<h2 style="color: #2ca02c;">{}</h2>
|
| 528 |
+
</div>
|
| 529 |
+
""".format(high_opportunity), unsafe_allow_html=True)
|
| 530 |
+
|
| 531 |
+
with col4:
|
| 532 |
+
long_tail = len(df[df['Tail'] == 'long-tail'])
|
| 533 |
+
st.markdown("""
|
| 534 |
+
<div class="metric-card">
|
| 535 |
+
<h3>🎯 Long-tail Keywords</h3>
|
| 536 |
+
<h2 style="color: #d62728;">{}</h2>
|
| 537 |
+
</div>
|
| 538 |
+
""".format(long_tail), unsafe_allow_html=True)
|
| 539 |
+
|
| 540 |
+
def render_top_keywords_table(self, df: pd.DataFrame, top_n: int = 10):
|
| 541 |
+
"""Render the top keywords table with styling."""
|
| 542 |
+
st.markdown("## Top Keyword Opportunities")
|
| 543 |
+
|
| 544 |
+
if df.empty:
|
| 545 |
+
st.warning("No keywords to display.")
|
| 546 |
+
return
|
| 547 |
+
|
| 548 |
+
# Prepare display DataFrame
|
| 549 |
+
display_df = df.head(top_n).copy()
|
| 550 |
+
|
| 551 |
+
# Format columns for better display
|
| 552 |
+
display_df['Monthly Searches'] = display_df['Monthly Searches'].apply(lambda x: f"{x:,}")
|
| 553 |
+
display_df['Total Results'] = display_df['Total Results'].apply(lambda x: f"{x:,}")
|
| 554 |
+
|
| 555 |
+
# Style the dataframe
|
| 556 |
+
def highlight_max_score(s):
|
| 557 |
+
is_max = s == s.max()
|
| 558 |
+
return ['background-color: lightgreen' if v else '' for v in is_max]
|
| 559 |
+
|
| 560 |
+
styled_df = display_df.style.apply(
|
| 561 |
+
highlight_max_score,
|
| 562 |
+
subset=['Opportunity Score']
|
| 563 |
+
).format({
|
| 564 |
+
'Competition': '{:.3f}',
|
| 565 |
+
'Opportunity Score': '{:.2f}'
|
| 566 |
+
})
|
| 567 |
+
|
| 568 |
+
st.dataframe(styled_df, use_container_width=True)
|
| 569 |
+
|
| 570 |
+
def render_visualizations(self, df: pd.DataFrame):
|
| 571 |
+
"""Render interactive charts and visualizations."""
|
| 572 |
+
if df.empty:
|
| 573 |
+
st.warning("No data available for visualization.")
|
| 574 |
+
return
|
| 575 |
+
|
| 576 |
+
# Chart selection tabs
|
| 577 |
+
chart_tab1, chart_tab2, chart_tab3 = st.tabs(["Opportunity Scores", "🎯 Intent Analysis", "Volume vs Competition"])
|
| 578 |
+
|
| 579 |
+
with chart_tab1:
|
| 580 |
+
st.markdown("### Top 10 Keywords by Opportunity Score")
|
| 581 |
+
top_10 = df.head(10)
|
| 582 |
+
|
| 583 |
+
fig = px.bar(
|
| 584 |
+
top_10,
|
| 585 |
+
x='Opportunity Score',
|
| 586 |
+
y='Keyword',
|
| 587 |
+
orientation='h',
|
| 588 |
+
title="Top 10 Keyword Opportunities",
|
| 589 |
+
color='Opportunity Score',
|
| 590 |
+
color_continuous_scale='viridis'
|
| 591 |
+
)
|
| 592 |
+
fig.update_layout(height=500, yaxis={'categoryorder': 'total ascending'})
|
| 593 |
+
st.plotly_chart(fig, use_container_width=True)
|
| 594 |
+
|
| 595 |
+
with chart_tab2:
|
| 596 |
+
st.markdown("### Intent Distribution")
|
| 597 |
+
col1, col2 = st.columns(2)
|
| 598 |
+
|
| 599 |
+
with col1:
|
| 600 |
+
intent_counts = df['Intent'].value_counts()
|
| 601 |
+
fig_pie = px.pie(
|
| 602 |
+
values=intent_counts.values,
|
| 603 |
+
names=intent_counts.index,
|
| 604 |
+
title="Search Intent Distribution",
|
| 605 |
+
color_discrete_sequence=['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728']
|
| 606 |
+
)
|
| 607 |
+
st.plotly_chart(fig_pie, use_container_width=True)
|
| 608 |
+
|
| 609 |
+
with col2:
|
| 610 |
+
tail_counts = df['Tail'].value_counts()
|
| 611 |
+
fig_tail = px.pie(
|
| 612 |
+
values=tail_counts.values,
|
| 613 |
+
names=tail_counts.index,
|
| 614 |
+
title="Keyword Tail Distribution",
|
| 615 |
+
color_discrete_sequence=['#9467bd', '#8c564b', '#e377c2']
|
| 616 |
+
)
|
| 617 |
+
st.plotly_chart(fig_tail, use_container_width=True)
|
| 618 |
+
|
| 619 |
+
with chart_tab3:
|
| 620 |
+
st.markdown("### Search Volume vs Competition Analysis")
|
| 621 |
+
|
| 622 |
+
fig_scatter = px.scatter(
|
| 623 |
+
df.head(50), # Limit to top 50 for readability
|
| 624 |
+
x='Competition',
|
| 625 |
+
y='Monthly Searches',
|
| 626 |
+
size='Opportunity Score',
|
| 627 |
+
color='Intent',
|
| 628 |
+
hover_name='Keyword',
|
| 629 |
+
title="Search Volume vs Competition (Size = Opportunity Score)",
|
| 630 |
+
labels={'Competition': 'Competition Score', 'Monthly Searches': 'Est. Monthly Searches'}
|
| 631 |
+
)
|
| 632 |
+
fig_scatter.update_layout(height=500)
|
| 633 |
+
st.plotly_chart(fig_scatter, use_container_width=True)
|
| 634 |
+
|
| 635 |
+
def save_results(self, df: pd.DataFrame, params: Dict[str, Any]) -> Tuple[Optional[str], Optional[str], Optional[str]]:
|
| 636 |
+
"""Save results to files and return the file paths, or (None, None, None) on failure."""
|
| 637 |
+
if df.empty:
|
| 638 |
+
return None, None, None
|
| 639 |
+
|
| 640 |
+
# Generate file names
|
| 641 |
+
today = date.today().isoformat()
|
| 642 |
+
        safe_seed = re.sub(r"[^\w\s-]", "", params['seed_keyword']).strip().replace(" ", "_")[:30]
        base_name = f"keywords_{safe_seed}_{today}"

        # File paths
        csv_path = self.processed_dir / f"{base_name}.csv"
        excel_path = self.processed_dir / f"{base_name}.xlsx"
        report_path = self.reports_dir / f"{base_name}_report.json"

        try:
            # Save CSV
            df.to_csv(csv_path, index=False)

            # Save Excel with multiple sheets
            with pd.ExcelWriter(excel_path, engine='openpyxl') as writer:
                df.head(params['top_results']).to_excel(writer, sheet_name='Top_Results', index=False)
                df.to_excel(writer, sheet_name='All_Keywords', index=False)

                # Summary sheet
                summary_data = {
                    'Metric': [
                        'Total Keywords',
                        'Average Opportunity Score',
                        'High Opportunity Keywords (>50)',
                        'Long-tail Keywords',
                        'Informational Intent',
                        'Commercial Intent',
                        'Transactional Intent'
                    ],
                    'Value': [
                        len(df),
                        round(df['Opportunity Score'].mean(), 2),
                        len(df[df['Opportunity Score'] > 50]),
                        len(df[df['Tail'] == 'long-tail']),
                        len(df[df['Intent'] == 'informational']),
                        len(df[df['Intent'] == 'commercial']),
                        len(df[df['Intent'] == 'transactional'])
                    ]
                }
                pd.DataFrame(summary_data).to_excel(writer, sheet_name='Summary', index=False)

            # Save JSON report
            report_data = {
                'analysis_date': datetime.now().isoformat(),
                'seed_keyword': params['seed_keyword'],
                'parameters': {
                    'max_candidates': params['max_candidates'],
                    'top_results': params['top_results'],
                    'min_search_volume': params['min_search_volume'],
                    'max_competition': params['max_competition']
                },
                'summary': {
                    'total_keywords': len(df),
                    'average_opportunity_score': float(df['Opportunity Score'].mean()),
                    'top_keyword': df.iloc[0]['Keyword'] if not df.empty else None,
                    'intent_distribution': df['Intent'].value_counts().to_dict(),
                    'tail_distribution': df['Tail'].value_counts().to_dict()
                }
            }

            with open(report_path, 'w', encoding='utf-8') as f:
                json.dump(report_data, f, indent=2, ensure_ascii=False)

            return str(csv_path), str(excel_path), str(report_path)

        except Exception as e:
            st.error(f"❌ Error saving files: {e}")
            return None, None, None

    def render_download_section(self, csv_path: str, excel_path: str, report_path: str):
        """Render download buttons for generated files."""
        st.markdown("## 📥 Download Results")

        col1, col2, col3 = st.columns(3)

        if csv_path and os.path.exists(csv_path):
            with col1:
                with open(csv_path, 'rb') as file:
                    st.download_button(
                        label="Download CSV",
                        data=file.read(),
                        file_name=os.path.basename(csv_path),
                        mime="text/csv"
                    )

        if excel_path and os.path.exists(excel_path):
            with col2:
                with open(excel_path, 'rb') as file:
                    st.download_button(
                        label="Download Excel",
                        data=file.read(),
                        file_name=os.path.basename(excel_path),
                        mime="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
                    )

        if report_path and os.path.exists(report_path):
            with col3:
                with open(report_path, 'rb') as file:
                    st.download_button(
                        label="Download Report",
                        data=file.read(),
                        file_name=os.path.basename(report_path),
                        mime="application/json"
                    )

    def run(self):
        """Main dashboard execution method."""
        # Render header
        if not self.render_header():
            st.stop()

        # Render sidebar
        params = self.render_sidebar()

        # Main content area
        if params["run_analysis"]:
            # Store analysis state
            if 'analysis_complete' not in st.session_state:
                st.session_state.analysis_complete = False

            # Run analysis
            df = self.run_keyword_analysis(params)

            if df is not None and not df.empty:
                # Add enhancement columns
                df = self.add_enhancement_columns(df)

                # Store results in session state
                st.session_state.results_df = df
                st.session_state.analysis_params = params
                st.session_state.analysis_complete = True

                # Success message
                st.success(f"✅ Analysis complete! Found {len(df)} keywords matching your criteria.")

        # Display results if analysis is complete
        if st.session_state.get('analysis_complete', False) and 'results_df' in st.session_state:
            df = st.session_state.results_df
            params = st.session_state.analysis_params

            # Render summary metrics
            self.render_summary_metrics(df)

            # Create view toggle
            view_option = st.radio("Choose View", ["Table View", "Chart View"], horizontal=True)

            if view_option == "Table View":
                self.render_top_keywords_table(df, params['top_results'])
            else:
                self.render_visualizations(df)

            # Save results and provide downloads
            with st.spinner("💾 Preparing download files..."):
                csv_path, excel_path, report_path = self.save_results(df, params)

            if csv_path:
                self.render_download_section(csv_path, excel_path, report_path)

        elif not st.session_state.get('analysis_complete', False):
            # Show welcome message
            st.markdown("""
            ## Welcome to the SEO Keyword Research Dashboard

            This dashboard helps you discover and analyze keyword opportunities using advanced SEO metrics.

            ### Getting Started:
            1. **Enter your seed keyword** in the sidebar (e.g., "digital marketing")
            2. **Adjust analysis parameters** (candidates, results, filters)
            3. **Click "Run Analysis"** to start the keyword research
            4. **Explore results** through tables and interactive charts
            5. **Download reports** in CSV, Excel, or JSON format

            ### Features:
            - **Real-time keyword discovery** using SerpAPI
            - **Competition analysis** based on SERP features
            - **Intent classification** (informational, commercial, transactional)
            - **Interactive visualizations** with Plotly charts
            - **Advanced filtering** by volume and competition
            - **Multi-format exports** (CSV, Excel, JSON reports)
            """)


def main():
    """Main function to run the Streamlit dashboard."""
    dashboard = KeywordDashboard()
    dashboard.run()


if __name__ == "__main__":
    main()
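
The export paths in `save_results` are derived from the seed keyword by stripping punctuation and replacing spaces with underscores. A minimal sketch of that step in isolation follows. `build_base_name` is a hypothetical helper name, and it assumes `today` is an ISO-formatted date as in `postprocess.py`; the dashboard's own definition of `today` is not shown in this part of the diff.

```python
import re
from datetime import date


def build_base_name(seed_keyword: str, today: date, max_len: int = 30) -> str:
    """Build an export base name from a seed keyword (hypothetical helper)."""
    # Keep only word characters, whitespace, and hyphens
    cleaned = re.sub(r"[^\w\s-]", "", seed_keyword).strip()
    # Spaces become underscores, then the seed part is truncated
    safe_seed = cleaned.replace(" ", "_")[:max_len]
    return f"keywords_{safe_seed}_{today.isoformat()}"


print(build_base_name("global internship", date(2025, 9, 23)))
print(build_base_name("C++ jobs: remote?", date(2025, 9, 23)))
```

Note that removed punctuation can leave consecutive spaces behind, which turn into consecutive underscores; collapsing whitespace first would avoid that if it matters for the filenames.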
File without changes
@@ -0,0 +1,19 @@
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

def main():
    # Get the API key from environment variables
    api_key = os.getenv("SERPAPI_KEY")

    if api_key:
        print("✅ Project setup complete!")
        print(f"API key loaded: {api_key[:5]}...")
    else:
        # The variable read above is SERPAPI_KEY, so the message names it explicitly
        print("Warning: SERPAPI_KEY not found in environment variables")
        print("Make sure you have a .env file with your SERPAPI_KEY")

if __name__ == "__main__":
    main()
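
The setup check above reads one environment variable and prints a masked prefix of it. As a rough sketch of the same idea in a testable form, the helper below returns the status message instead of printing it. `describe_api_key` and its parameters are hypothetical names, not part of the uploaded files.

```python
import os


def describe_api_key(env_var: str = "SERPAPI_KEY", visible: int = 5) -> str:
    """Return a status line for an API key without revealing the full value."""
    key = os.getenv(env_var)
    if not key:
        return f"{env_var} not found in environment variables"
    # Show only the first few characters so the key never lands in logs in full
    return f"{env_var} loaded: {key[:visible]}..."


print(describe_api_key())
```

Keeping the variable name in one place avoids the message and the lookup drifting apart.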
@@ -0,0 +1,366 @@
# src/postprocess.py
"""
Post-processing tool for keyword research results
Cleans, annotates, and formats CSV output for professional presentation
"""

import pandas as pd
from datetime import date, datetime
import os
import re
import json

# Install these if you haven't: pip install pandas openpyxl tabulate
try:
    from tabulate import tabulate
    TABULATE_AVAILABLE = True
except ImportError:
    TABULATE_AVAILABLE = False
    print("Note: Install 'tabulate' for prettier table output: pip install tabulate")

try:
    import openpyxl
    EXCEL_AVAILABLE = True
except ImportError:
    EXCEL_AVAILABLE = False
    print("Note: Install 'openpyxl' for Excel export: pip install openpyxl")

# Configuration
BRAND_KEYWORDS = {
    "linkedin", "indeed", "glassdoor", "ucla", "asu", "berkeley",
    "hennge", "ciee", "google", "facebook", "microsoft", "amazon",
    "apple", "netflix", "spotify", "youtube", "instagram", "twitter"
}
OUTPUT_DIR = "results"  # Directory to save processed files

def normalize_keyword(keyword):
    """Clean and normalize keyword text"""
    if not keyword or pd.isna(keyword):
        return ""
    return str(keyword).strip()

def is_brand_query(keyword, brand_set=BRAND_KEYWORDS):
    """
    Check if keyword is a brand/navigational query
    These are harder to rank for if you're not that brand
    """
    if not keyword:
        return False

    keyword_lower = keyword.lower()

    # Check if any brand name appears in keyword
    for brand in brand_set:
        if brand in keyword_lower:
            return True

    # Check for domains (.com, .edu, etc.)
    if re.search(r"\.(com|edu|org|net|gov|io)\b", keyword_lower):
        return True

    return False

def classify_search_intent(keyword):
    """
    Classify keyword by search intent:
    - informational: seeking information
    - commercial: researching before buying
    - transactional: ready to take action
    - navigational: looking for specific site/brand
    """
    if not keyword:
        return "informational"

    keyword_lower = keyword.lower()

    # Informational intent signals
    if any(signal in keyword_lower for signal in [
        "how to", "what is", "why", "are", "do ", "does ", "can ",
        "guide", "tutorial", "learn", "definition", "meaning"
    ]):
        return "informational"

    # Transactional intent signals
    if any(signal in keyword_lower for signal in [
        "buy", "price", "cost", "apply", "register", "admission",
        "apply now", "enroll", "join", "signup", "book", "order"
    ]):
        return "transactional"

    # Commercial intent signals
    if any(signal in keyword_lower for signal in [
        "best", "top", "compare", "vs", "reviews", "review",
        "cheap", "affordable", "discount", "deal"
    ]):
        return "commercial"

    # Navigational intent (brand queries)
    if is_brand_query(keyword):
        return "navigational"

    # Default to informational
    return "informational"

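
`classify_search_intent` matches signals as plain substrings, so a short signal such as "are" also fires inside "software", and "top" fires inside "laptop". One possible refinement, shown here only as a sketch and not as the file's own method, is to match signals on word boundaries. `has_signal` and the trimmed signal lists are hypothetical names introduced for illustration.

```python
import re
from typing import Iterable

# Trimmed, illustrative subsets of the signal lists used above
INFORMATIONAL = ["how to", "what is", "why", "are", "guide", "tutorial"]
COMMERCIAL = ["best", "top", "compare", "vs", "review"]


def has_signal(keyword: str, signals: Iterable[str]) -> bool:
    """Return True if any signal occurs as a whole word or phrase in keyword."""
    # \b anchors keep "are" from matching inside "software"
    pattern = r"\b(?:" + "|".join(re.escape(s) for s in signals) + r")\b"
    return re.search(pattern, keyword.lower()) is not None


print(has_signal("software internship", INFORMATIONAL))
print(has_signal("are internships paid", INFORMATIONAL))
print(has_signal("laptop stand", COMMERCIAL))
print(has_signal("best laptop for students", COMMERCIAL))
```

Whole-word matching is stricter, so plural or inflected forms ("reviews", "guides") would need to be listed explicitly or handled with an optional suffix in the pattern.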
def classify_keyword_tail(keyword):
    """
    Classify keyword by tail length:
    - short-tail: 1-2 words (high competition, high volume)
    - mid-tail: 3 words (moderate competition/volume)
    - long-tail: 4+ words (low competition, low volume)
    """
    if not keyword:
        return "short-tail"

    word_count = len(str(keyword).split())

    if word_count >= 4:
        return "long-tail"
    elif word_count == 3:
        return "mid-tail"
    else:
        return "short-tail"

def format_large_number(number):
    """Format large numbers with commas for readability"""
    try:
        return f"{int(number):,}"
    except (ValueError, TypeError):
        return str(number)

def clean_and_process_dataframe(df, seed_keyword):
    """Main processing function to clean and enhance the dataframe"""

    # Make a copy to avoid modifying original
    df = df.copy()

    print("🧹 Cleaning and processing data...")

    # 1. Normalize keywords and remove duplicates
    # Apply normalize_keyword directly: casting to str first would turn NaN
    # into the literal "nan" and bypass the missing-value check
    df["Keyword"] = df["Keyword"].apply(normalize_keyword)

    # Remove empty keywords
    df = df[df["Keyword"].str.len() > 0]

    # Sort by Opportunity Score and remove duplicates (keep highest score)
    df = df.sort_values(by="Opportunity Score", ascending=False)
    df = df.drop_duplicates(subset=["Keyword"], keep="first")

    # 2. Fix data types and handle missing values

    # Monthly Searches: convert to int, fill missing with 0
    df["Monthly Searches"] = pd.to_numeric(df["Monthly Searches"], errors="coerce").fillna(0).astype(int)

    # Competition: round to 4 decimal places
    df["Competition"] = pd.to_numeric(df["Competition"], errors="coerce").fillna(0.0).round(4)

    # Opportunity Score: round to 2 decimal places for readability
    df["Opportunity Score"] = pd.to_numeric(df["Opportunity Score"], errors="coerce").fillna(0.0).round(2)

    # Google Results: clean and convert to int
    if "Google Results" in df.columns:
        # Remove any non-digit characters and convert to int
        df["Google Results"] = df["Google Results"].astype(str).str.replace(r"[^\d]", "", regex=True)
        df["Google Results"] = pd.to_numeric(df["Google Results"], errors="coerce").fillna(0).astype(int)

    # Ads Shown: convert to int
    if "Ads Shown" in df.columns:
        df["Ads Shown"] = pd.to_numeric(df["Ads Shown"], errors="coerce").fillna(0).astype(int)

    # 3. Add enhancement columns
    print("Adding analysis columns...")

    df["Intent"] = df["Keyword"].apply(classify_search_intent)
    df["Tail"] = df["Keyword"].apply(classify_keyword_tail)
    df["Is Brand/Navigational"] = df["Keyword"].apply(lambda x: "Yes" if is_brand_query(x) else "No")

    # 4. Reorder columns for better presentation
    column_order = [
        "Keyword",
        "Intent",
        "Tail",
        "Is Brand/Navigational",
        "Monthly Searches",
        "Competition",
        "Opportunity Score",
        "Google Results",
        "Ads Shown",
        "Featured Snippet?",
        "PAA Available?",
        "Knowledge Graph?"
    ]

    # Only include columns that exist in the dataframe
    available_columns = [col for col in column_order if col in df.columns]
    df = df[available_columns]

    # 5. Final sort by Opportunity Score
    df = df.sort_values(by="Opportunity Score", ascending=False).reset_index(drop=True)

    print(f"✅ Processing complete! {len(df)} keywords ready")
    return df

def save_processed_results(df, seed_keyword, output_dir=OUTPUT_DIR):
    """Save processed results in multiple formats with metadata"""

    # Create output directory
    os.makedirs(output_dir, exist_ok=True)

    # Generate safe filename from seed keyword
    today = date.today().isoformat()
    safe_seed = re.sub(r"[^\w\s-]", "", seed_keyword).strip().replace(" ", "_")[:50]
    base_filename = f"keywords_{safe_seed}_{today}"

    # File paths
    csv_path = os.path.join(output_dir, f"{base_filename}.csv")
    excel_path = os.path.join(output_dir, f"{base_filename}.xlsx")
    meta_path = os.path.join(output_dir, f"{base_filename}.meta.json")

    # Save CSV
    df.to_csv(csv_path, index=False)
    print(f"💾 Saved CSV: {csv_path}")

    # Save Excel with multiple sheets (if openpyxl is available)
    if EXCEL_AVAILABLE:
        try:
            with pd.ExcelWriter(excel_path, engine="openpyxl") as writer:
                # Top 50 sheet
                df.head(50).to_excel(writer, sheet_name="Top_50", index=False)
                # All results sheet
                df.to_excel(writer, sheet_name="All_Keywords", index=False)
                # Summary sheet
                summary_data = {
                    "Metric": [
                        "Total Keywords",
                        "Informational Keywords",
                        "Commercial Keywords",
                        "Transactional Keywords",
                        "Navigational Keywords",
                        "Long-tail Keywords",
                        "Brand/Navigational Keywords"
                    ],
                    "Count": [
                        len(df),
                        len(df[df["Intent"] == "informational"]),
                        len(df[df["Intent"] == "commercial"]),
                        len(df[df["Intent"] == "transactional"]),
                        len(df[df["Intent"] == "navigational"]),
                        len(df[df["Tail"] == "long-tail"]),
                        len(df[df["Is Brand/Navigational"] == "Yes"])
                    ]
                }
                pd.DataFrame(summary_data).to_excel(writer, sheet_name="Summary", index=False)

            print(f"Saved Excel: {excel_path}")
        except Exception as e:
            print(f"⚠️ Could not save Excel file: {e}")
    else:
        print("Excel export skipped (install openpyxl to enable)")

    # Save metadata
    metadata = {
        "seed_keyword": seed_keyword,
        "generated_at": datetime.utcnow().isoformat() + "Z",
        "total_keywords": len(df),
        "data_source": "SerpApi with heuristic search volumes",
        "methodology": "Opportunity Score = log10(volume+1) / (competition + 0.01)",
        "notes": [
            "Brand/navigational queries are flagged for filtering",
            "Search volumes are estimated - replace with real API data for production",
            "Competition scores based on SERP feature analysis"
        ],
        "intent_breakdown": {
            "informational": int(len(df[df["Intent"] == "informational"])),
            "commercial": int(len(df[df["Intent"] == "commercial"])),
            "transactional": int(len(df[df["Intent"] == "transactional"])),
            "navigational": int(len(df[df["Intent"] == "navigational"]))
        },
        "tail_breakdown": {
            "short-tail": int(len(df[df["Tail"] == "short-tail"])),
            "mid-tail": int(len(df[df["Tail"] == "mid-tail"])),
            "long-tail": int(len(df[df["Tail"] == "long-tail"]))
        }
    }

    with open(meta_path, "w", encoding="utf-8") as f:
        json.dump(metadata, f, indent=2, ensure_ascii=False)

    print(f"Saved metadata: {meta_path}")

    return csv_path, excel_path, meta_path

def display_results_preview(df, top_n=10):
    """Display a nice preview of the top results"""

    if df.empty:
        print("❌ No results to display!")
        return

    print(f"\nTop {min(top_n, len(df))} Keywords:")

    # Prepare data for display
    preview_df = df.head(top_n).copy()

    # Format large numbers for readability
    if "Monthly Searches" in preview_df.columns:
        preview_df["Monthly Searches"] = preview_df["Monthly Searches"].apply(format_large_number)

    if "Google Results" in preview_df.columns:
        preview_df["Google Results"] = preview_df["Google Results"].apply(format_large_number)

    # Display using tabulate if available
    if TABULATE_AVAILABLE:
        print(tabulate(preview_df, headers="keys", tablefmt="github", showindex=False))
    else:
        # Fallback display
        for i, row in preview_df.iterrows():
            print(f"{i+1}. {row['Keyword']} | Score: {row['Opportunity Score']} | "
                  f"Volume: {row['Monthly Searches']} | Competition: {row['Competition']} | "
                  f"Intent: {row['Intent']} | Tail: {row['Tail']}")

def postprocess_keywords(csv_file_path, seed_keyword):
    """
    Main postprocessing function
    Call this after your ranking.py generates the initial CSV
    """

    print(f"Starting postprocessing for: '{seed_keyword}'")
    print(f"Input file: {csv_file_path}")

    try:
        # Load the CSV from ranking.py
        df = pd.read_csv(csv_file_path)
        print(f"Loaded {len(df)} keywords from CSV")

        # Clean and process the data
        processed_df = clean_and_process_dataframe(df, seed_keyword)

        # Save in multiple formats
        csv_path, excel_path, meta_path = save_processed_results(processed_df, seed_keyword)

        # Display preview
        display_results_preview(processed_df, top_n=10)

        # Summary stats
        print("\nSummary Statistics:")
        print(f"• Total keywords analyzed: {len(processed_df)}")
        print(f"• Long-tail opportunities: {len(processed_df[processed_df['Tail'] == 'long-tail'])}")
        print(f"• Non-brand keywords: {len(processed_df[processed_df['Is Brand/Navigational'] == 'No'])}")
        print(f"• High opportunity (score > 50): {len(processed_df[processed_df['Opportunity Score'] > 50])}")

        return csv_path, excel_path, meta_path, processed_df

    except Exception as e:
        print(f"❌ Error during postprocessing: {e}")
        raise

# Example usage
if __name__ == "__main__":
    # Example: process a CSV file generated by ranking.py
    input_csv = "best_keywords_2025-09-23.csv"  # Replace with your actual file
    seed_keyword = "global internship"

    if os.path.exists(input_csv):
        postprocess_keywords(input_csv, seed_keyword)
    else:
        print(f"❌ Input file not found: {input_csv}")
        print("Run your ranking.py script first to generate the initial CSV")
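
The metadata written by `save_processed_results` describes the scoring as `Opportunity Score = log10(volume+1) / (competition + 0.01)`. The sketch below evaluates that formula directly, assuming the string reflects what `ranking.py` actually computes (that code is not shown in this part of the diff). `opportunity_score` is a hypothetical function name.

```python
import math


def opportunity_score(volume: int, competition: float) -> float:
    """Evaluate the formula quoted in the metadata (assumed, not confirmed)."""
    # log10 compresses volume; the 0.01 offset avoids division by zero
    return math.log10(volume + 1) / (competition + 0.01)


for volume, competition in [(999, 0.49), (9_999, 0.07), (9_999, 0.5)]:
    score = opportunity_score(volume, competition)
    print(f"volume={volume:>6} competition={competition:.2f} score={score:.2f}")
```

Under this formula the numerator stays in the single digits for realistic volumes, so the `> 50` "high opportunity" threshold is only reached when competition is very low (about 0.07 or less at roughly 10,000 monthly searches).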
@@ -0,0 +1,569 @@
"""
Professional Keyword Research Tool

A comprehensive tool for analyzing keyword opportunities using SerpApi.
Calculates competition scores and opportunity rankings based on SERP analysis.

Requirements:
    pip install google-search-results tabulate python-dotenv

Setup:
    1. Create a .env file with your SerpApi key: SERPAPI_KEY=your_key_here
    2. Run the script with your desired seed keyword
"""

import os
import math
import csv
import re
import logging
from datetime import date
from typing import List, Dict, Optional, Tuple, Any
from dataclasses import dataclass
from dotenv import load_dotenv
from serpapi import GoogleSearch

# Optional dependency for better table formatting
try:
    from tabulate import tabulate
    HAS_TABULATE = True
except ImportError:
    HAS_TABULATE = False
    print("💡 Tip: Install 'tabulate' for prettier output: pip install tabulate")

# Configure logging
logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')
logger = logging.getLogger(__name__)


@dataclass
class KeywordMetrics:
    """Container for keyword analysis results."""
    keyword: str
    monthly_searches: int
    competition_score: float
    opportunity_score: float
    total_results: int
    ads_count: int
    has_featured_snippet: bool
    has_people_also_ask: bool
    has_knowledge_graph: bool


class Config:
    """Configuration settings for the keyword research tool."""

    def __init__(self):
        load_dotenv()
        self.serpapi_key = os.getenv("SERPAPI_KEY")
        self.default_location = "United States"
        self.results_per_query = 10
        self.max_related_keywords = 150
        self.top_keywords_to_save = 50
        self.progress_update_interval = 10

        if not self.serpapi_key:
            raise ValueError("SERPAPI_KEY not found in environment variables")


class CompetitionCalculator:
    """Calculates keyword competition scores based on SERP features."""

    # Scoring weights for different competition factors
    WEIGHTS = {
        'total_results': 0.50,
        'ads': 0.25,
        'featured_snippet': 0.15,
        'people_also_ask': 0.07,
        'knowledge_graph': 0.03
    }

    @staticmethod
    def extract_total_results(search_info: Dict[str, Any]) -> int:
        """
        Extract total results count from SerpApi response.

        Args:
            search_info: Search information dictionary from SerpApi

        Returns:
            Total number of results as integer, 0 if not found
        """
        if not search_info:
            return 0

        # Try different possible field names
        total = (search_info.get("total_results") or
                 search_info.get("total_results_raw") or
                 search_info.get("total"))

        if isinstance(total, int):
            return total

        if isinstance(total, str):
            # Extract only digits (remove commas, spaces, etc.)
            numbers_only = re.sub(r"[^\d]", "", total)
            try:
                return int(numbers_only) if numbers_only else 0
            except ValueError:
                return 0

        return 0

def calculate_score(self, search_results: Dict[str, Any]) -> Tuple[float, Dict[str, Any]]:
|
| 114 |
+
"""
|
| 115 |
+
Calculate competition score based on SERP features.
|
| 116 |
+
|
| 117 |
+
Args:
|
| 118 |
+
search_results: Complete search results from SerpApi
|
| 119 |
+
|
| 120 |
+
Returns:
|
| 121 |
+
Tuple of (competition_score, analysis_breakdown)
|
| 122 |
+
Score ranges from 0-1 where 1 = very competitive
|
| 123 |
+
"""
|
| 124 |
+
search_info = search_results.get("search_information", {})
|
| 125 |
+
|
| 126 |
+
# Factor 1: Total number of results (normalized using log scale)
|
| 127 |
+
total_results = self.extract_total_results(search_info)
|
| 128 |
+
normalized_results = min(math.log10(total_results + 1) / 7, 1.0)
|
| 129 |
+
|
| 130 |
+
# Factor 2: Number of ads (more ads = more competition)
|
| 131 |
+
ads = search_results.get("ads_results", [])
|
| 132 |
+
ads_count = len(ads) if ads else 0
|
| 133 |
+
ads_score = min(ads_count / 3, 1.0)
|
| 134 |
+
|
| 135 |
+
# Factor 3: SERP features that make ranking more difficult
|
| 136 |
+
has_featured_snippet = bool(
|
| 137 |
+
search_results.get("featured_snippet") or
|
| 138 |
+
search_results.get("answer_box")
|
| 139 |
+
)
|
| 140 |
+
|
| 141 |
+
has_people_also_ask = bool(
|
| 142 |
+
search_results.get("related_questions") or
|
| 143 |
+
search_results.get("people_also_ask")
|
| 144 |
+
)
|
| 145 |
+
|
| 146 |
+
has_knowledge_graph = bool(search_results.get("knowledge_graph"))
|
| 147 |
+
|
| 148 |
+
# Calculate weighted competition score
|
| 149 |
+
competition_score = (
|
| 150 |
+
self.WEIGHTS['total_results'] * normalized_results +
|
| 151 |
+
self.WEIGHTS['ads'] * ads_score +
|
| 152 |
+
self.WEIGHTS['featured_snippet'] * has_featured_snippet +
|
| 153 |
+
self.WEIGHTS['people_also_ask'] * has_people_also_ask +
|
| 154 |
+
self.WEIGHTS['knowledge_graph'] * has_knowledge_graph
|
| 155 |
+
)
|
| 156 |
+
|
| 157 |
+
# Ensure score stays within bounds
|
| 158 |
+
competition_score = max(0.0, min(1.0, competition_score))
|
| 159 |
+
|
| 160 |
+
# Create analysis breakdown for reporting
|
| 161 |
+
breakdown = {
|
| 162 |
+
"total_results": total_results,
|
| 163 |
+
"ads_count": ads_count,
|
| 164 |
+
"has_featured_snippet": has_featured_snippet,
|
| 165 |
+
"has_people_also_ask": has_people_also_ask,
|
| 166 |
+
"has_knowledge_graph": has_knowledge_graph
|
| 167 |
+
}
|
| 168 |
+
|
| 169 |
+
return competition_score, breakdown
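To make the weighting concrete, here is a minimal standalone sketch of the same formula. It assumes the weights and normalizations shown above (log scale capped at 10^7 results, ads capped at 3). The free function `competition_score` is a hypothetical helper for illustration and is not part of the diff.

```python
import math

# Weights copied from the WEIGHTS table in the diff above.
WEIGHTS = {
    "total_results": 0.50,
    "ads": 0.25,
    "featured_snippet": 0.15,
    "people_also_ask": 0.07,
    "knowledge_graph": 0.03,
}


def competition_score(total_results, ads_count, featured_snippet,
                      people_also_ask, knowledge_graph):
    """Hypothetical free-function version of calculate_score's arithmetic."""
    # log10 of the result count, scaled so ~10 million results saturates at 1.0
    normalized_results = min(math.log10(total_results + 1) / 7, 1.0)
    # Three or more ads count as maximum ad pressure
    ads_score = min(ads_count / 3, 1.0)
    score = (
        WEIGHTS["total_results"] * normalized_results
        + WEIGHTS["ads"] * ads_score
        + WEIGHTS["featured_snippet"] * featured_snippet
        + WEIGHTS["people_also_ask"] * people_also_ask
        + WEIGHTS["knowledge_graph"] * knowledge_graph
    )
    return max(0.0, min(1.0, score))


# A crowded SERP: ~10M results, 3 ads, featured snippet, no PAA, no knowledge graph
print(round(competition_score(9_999_999, 3, True, False, False), 4))
```

Because the five weights sum to 1.0, the final clamp only guards against floating-point drift; a keyword hits the ceiling only when every factor is saturated.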
|
| 170 |
+
|
| 171 |
+
|
| 172 |
+
class SearchVolumeEstimator:
|
| 173 |
+
"""Handles search volume estimation and integration with volume APIs."""
|
| 174 |
+
|
| 175 |
+
def get_search_volume(self, keyword: str) -> Optional[int]:
|
| 176 |
+
"""
|
| 177 |
+
Get search volume for a keyword.
|
| 178 |
+
|
| 179 |
+
TODO: Integrate with DataForSEO, Google Keyword Planner, or similar API
|
| 180 |
+
|
| 181 |
+
Args:
|
| 182 |
+
keyword: The keyword to get volume for
|
| 183 |
+
|
| 184 |
+
Returns:
|
| 185 |
+
Monthly search volume or None if unavailable
|
| 186 |
+
"""
|
| 187 |
+
# Placeholder for real volume API integration
|
| 188 |
+
# Examples of what you might implement:
|
| 189 |
+
# - return self._call_dataforseo_api(keyword)
|
| 190 |
+
# - return self._call_google_ads_api(keyword)
|
| 191 |
+
return None
|
| 192 |
+
|
| 193 |
+
def estimate_volume(self, keyword: str) -> int:
|
| 194 |
+
"""
|
| 195 |
+
Estimate search volume using simple heuristics.
|
| 196 |
+
|
| 197 |
+
Args:
|
| 198 |
+
keyword: The keyword to estimate volume for
|
| 199 |
+
|
| 200 |
+
Returns:
|
| 201 |
+
Estimated monthly search volume
|
| 202 |
+
"""
|
| 203 |
+
# Simple heuristic: longer phrases typically have lower volume
|
| 204 |
+
word_count = len(keyword.split())
|
| 205 |
+
# This is rough estimation - replace with real data when possible
|
| 206 |
+
return max(10, 10000 // (word_count + 1))
|
| 207 |
+
|
| 208 |
+
|
| 209 |
+
class KeywordDiscovery:
|
| 210 |
+
"""Discovers related keywords from search results."""
|
| 211 |
+
|
| 212 |
+
def __init__(self, config: Config):
|
| 213 |
+
self.config = config
|
| 214 |
+
|
| 215 |
+
def find_related_keywords(self, seed_keyword: str) -> List[str]:
|
| 216 |
+
"""
|
| 217 |
+
Find related keywords from Google's suggestions and related searches.
|
| 218 |
+
|
| 219 |
+
Args:
|
| 220 |
+
seed_keyword: The base keyword to find related terms for
|
| 221 |
+
|
| 222 |
+
Returns:
|
| 223 |
+
List of related keyword candidates
|
| 224 |
+
"""
|
| 225 |
+
logger.info(f"Discovering related keywords for: '{seed_keyword}'")
|
| 226 |
+
|
| 227 |
+
search_params = {
|
| 228 |
+
"engine": "google",
|
| 229 |
+
"q": seed_keyword,
|
| 230 |
+
"api_key": self.config.serpapi_key,
|
| 231 |
+
"hl": "en",
|
| 232 |
+
"gl": "us"
|
| 233 |
+
}
|
| 234 |
+
|
| 235 |
+
try:
|
| 236 |
+
search = GoogleSearch(search_params)
|
| 237 |
+
results = search.get_dict()
|
| 238 |
+
except Exception as e:
|
| 239 |
+
logger.error(f"Failed to get related keywords: {e}")
|
| 240 |
+
return []
|
| 241 |
+
|
| 242 |
+
keyword_candidates = set()
|
| 243 |
+
|
| 244 |
+
# Extract keywords from different sources
|
| 245 |
+
self._extract_from_related_searches(results, keyword_candidates)
|
| 246 |
+
self._extract_from_people_also_ask(results, keyword_candidates)
|
| 247 |
+
self._extract_from_organic_titles(results, keyword_candidates)
|
| 248 |
+
|
| 249 |
+
# Convert to list and limit results
|
| 250 |
+
final_keywords = list(keyword_candidates)[:self.config.max_related_keywords]
|
| 251 |
+
logger.info(f"Found {len(final_keywords)} keyword candidates")
|
| 252 |
+
|
| 253 |
+
return final_keywords
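One thing to note about the method above: candidates are collected in a `set`, so the order of `list(keyword_candidates)` is arbitrary, and which entries survive the `max_related_keywords` cap can differ between runs. A minimal sketch of an order-preserving alternative follows; `dedupe_keep_order` is a hypothetical helper, not something the diff defines.

```python
def dedupe_keep_order(candidates, limit):
    """Strip, drop empties, remove duplicates, keep first-seen order, then cap."""
    cleaned = (c.strip() for c in candidates if c and c.strip())
    # dict preserves insertion order, so fromkeys() dedupes without reordering
    return list(dict.fromkeys(cleaned))[:limit]


raw = ["seo tools ", "keyword research", "seo tools", "", "what is seo"]
print(dedupe_keep_order(raw, 2))
```

With this approach, related searches (added first) would consistently take priority over organic titles when the cap is reached.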
|
| 254 |
+
|
| 255 |
+
def _extract_from_related_searches(self, results: Dict[str, Any],
|
| 256 |
+
candidates: set) -> None:
|
| 257 |
+
"""Extract keywords from 'related searches' section."""
|
| 258 |
+
related_searches = results.get("related_searches", [])
|
| 259 |
+
for item in related_searches:
|
| 260 |
+
query = item.get("query") or item.get("suggestion")
|
| 261 |
+
if query and len(query.strip()) > 0:
|
| 262 |
+
candidates.add(query.strip())
|
| 263 |
+
|
| 264 |
+
def _extract_from_people_also_ask(self, results: Dict[str, Any],
|
| 265 |
+
candidates: set) -> None:
|
| 266 |
+
"""Extract keywords from 'People also ask' questions."""
|
| 267 |
+
related_questions = results.get("related_questions", [])
|
| 268 |
+
for item in related_questions:
|
| 269 |
+
question = item.get("question") or item.get("query")
|
| 270 |
+
if question and len(question.strip()) > 0:
|
| 271 |
+
candidates.add(question.strip())
|
| 272 |
+
|
| 273 |
+
def _extract_from_organic_titles(self, results: Dict[str, Any],
|
| 274 |
+
candidates: set) -> None:
|
| 275 |
+
"""Extract potential keywords from organic result titles."""
|
| 276 |
+
organic_results = results.get("organic_results", [])
|
| 277 |
+
for result in organic_results[:10]: # Only top 10 results
|
| 278 |
+
title = result.get("title", "")
|
| 279 |
+
if title and len(title.strip()) > 0:
|
| 280 |
+
candidates.add(title.strip())
|
| 281 |
+
|
| 282 |
+
|
| 283 |
+
class KeywordAnalyzer:
|
| 284 |
+
"""Main class for analyzing keywords and calculating opportunity scores."""
|
| 285 |
+
|
| 286 |
+
def __init__(self, config: Config):
|
| 287 |
+
self.config = config
|
| 288 |
+
self.competition_calc = CompetitionCalculator()
|
| 289 |
+
self.volume_estimator = SearchVolumeEstimator()
|
| 290 |
+
self.keyword_discovery = KeywordDiscovery(config)
|
| 291 |
+
|
| 292 |
+
def search_google(self, keyword: str) -> Dict[str, Any]:
|
| 293 |
+
"""
|
| 294 |
+
Fetch search results for a keyword using SerpApi.
|
| 295 |
+
|
| 296 |
+
Args:
|
| 297 |
+
keyword: The keyword to search for
|
| 298 |
+
|
| 299 |
+
Returns:
|
| 300 |
+
Search results dictionary from SerpApi
|
| 301 |
+
"""
|
| 302 |
+
search_params = {
|
| 303 |
+
"engine": "google",
|
| 304 |
+
"q": keyword,
|
| 305 |
+
"api_key": self.config.serpapi_key,
|
| 306 |
+
"hl": "en",
|
| 307 |
+
"gl": "us",
|
| 308 |
+
"num": self.config.results_per_query
|
| 309 |
+
}
|
| 310 |
+
|
| 311 |
+
try:
|
| 312 |
+
search = GoogleSearch(search_params)
|
| 313 |
+
return search.get_dict()
|
| 314 |
+
except Exception as e:
|
| 315 |
+
logger.error(f"Search failed for '{keyword}': {e}")
|
| 316 |
+
return {}
|
| 317 |
+
|
| 318 |
+
def analyze_keyword(self, keyword: str, use_volume_api: bool = False) -> Optional[KeywordMetrics]:
|
| 319 |
+
"""
|
| 320 |
+
Analyze a single keyword and calculate its opportunity score.
|
| 321 |
+
|
| 322 |
+
Args:
|
| 323 |
+
keyword: The keyword to analyze
|
| 324 |
+
use_volume_api: Whether to use real volume API (not implemented yet)
|
| 325 |
+
|
| 326 |
+
Returns:
|
| 327 |
+
KeywordMetrics object or None if analysis failed
|
| 328 |
+
"""
|
| 329 |
+
# Get search results
|
| 330 |
+
search_results = self.search_google(keyword)
|
| 331 |
+
if not search_results:
|
| 332 |
+
return None
|
| 333 |
+
|
| 334 |
+
# Calculate competition score
|
| 335 |
+
competition_score, breakdown = self.competition_calc.calculate_score(search_results)
|
| 336 |
+
|
| 337 |
+
# Get or estimate search volume
|
| 338 |
+
if use_volume_api:
|
| 339 |
+
search_volume = self.volume_estimator.get_search_volume(keyword)
|
| 340 |
+
else:
|
| 341 |
+
search_volume = None
|
| 342 |
+
|
| 343 |
+
if search_volume is None:
|
| 344 |
+
search_volume = self.volume_estimator.estimate_volume(keyword)
|
| 345 |
+
|
| 346 |
+
# Calculate opportunity score
|
| 347 |
+
# Higher volume = better, lower competition = better
|
| 348 |
+
volume_score = math.log10(search_volume + 1)
|
| 349 |
+
opportunity_score = volume_score / (competition_score + 0.01) # Avoid division by zero
|
| 350 |
+
|
| 351 |
+
return KeywordMetrics(
|
| 352 |
+
keyword=keyword,
|
| 353 |
+
monthly_searches=search_volume,
|
| 354 |
+
competition_score=round(competition_score, 4),
|
| 355 |
+
opportunity_score=round(opportunity_score, 2),
|
| 356 |
+
total_results=breakdown["total_results"],
|
| 357 |
+
ads_count=breakdown["ads_count"],
|
| 358 |
+
has_featured_snippet=breakdown["has_featured_snippet"],
|
| 359 |
+
has_people_also_ask=breakdown["has_people_also_ask"],
|
| 360 |
+
has_knowledge_graph=breakdown["has_knowledge_graph"]
|
| 361 |
+
)
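As a worked example of the scoring above, the sketch below reproduces the fallback volume heuristic and the opportunity formula as standalone functions. The function names mirror the diff, but these free-function versions are illustrative assumptions rather than the module's API.

```python
import math


def estimate_volume(keyword):
    """Fallback heuristic from the diff: longer phrases get lower volume."""
    word_count = len(keyword.split())
    return max(10, 10000 // (word_count + 1))


def opportunity_score(search_volume, competition_score):
    """log10(volume) divided by competition; 0.01 avoids division by zero."""
    return math.log10(search_volume + 1) / (competition_score + 0.01)


volume = estimate_volume("global internship")   # two words -> 10000 // 3
for competition in (0.2, 0.5, 0.8):
    print(competition, round(opportunity_score(volume, competition), 2))
```

Since the heuristic volume depends only on word count, keywords of equal length end up ranked purely by competition until a real volume API is wired in.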
|
| 362 |
+
|
| 363 |
+
def analyze_keywords_batch(self, keywords: List[str],
|
| 364 |
+
use_volume_api: bool = False) -> List[KeywordMetrics]:
|
| 365 |
+
"""
|
| 366 |
+
Analyze multiple keywords and return sorted results.
|
| 367 |
+
|
| 368 |
+
Args:
|
| 369 |
+
keywords: List of keywords to analyze
|
| 370 |
+
use_volume_api: Whether to use real volume API
|
| 371 |
+
|
| 372 |
+
Returns:
|
| 373 |
+
List of KeywordMetrics sorted by opportunity score (highest first)
|
| 374 |
+
"""
|
| 375 |
+
logger.info(f"Analyzing {len(keywords)} keywords...")
|
| 376 |
+
analyzed_keywords = []
|
| 377 |
+
|
| 378 |
+
for i, keyword in enumerate(keywords, 1):
|
| 379 |
+
if i % self.config.progress_update_interval == 0:
|
| 380 |
+
logger.info(f"Progress: {i}/{len(keywords)} keywords processed")
|
| 381 |
+
|
| 382 |
+
metrics = self.analyze_keyword(keyword, use_volume_api)
|
| 383 |
+
if metrics:
|
| 384 |
+
analyzed_keywords.append(metrics)
|
| 385 |
+
|
| 386 |
+
# Sort by opportunity score (highest first)
|
| 387 |
+
analyzed_keywords.sort(key=lambda x: x.opportunity_score, reverse=True)
|
| 388 |
+
|
| 389 |
+
logger.info(f"Analysis complete! {len(analyzed_keywords)} keywords analyzed")
|
| 390 |
+
return analyzed_keywords
|
| 391 |
+
|
| 392 |
+
|
| 393 |
+
class ResultsExporter:
|
| 394 |
+
"""Handles exporting results to various formats."""
|
| 395 |
+
|
| 396 |
+
def save_to_csv(self, keyword_metrics: List[KeywordMetrics],
|
| 397 |
+
base_filename: str = "keyword_analysis",
|
| 398 |
+
top_count: int = 50) -> Optional[str]:
|
| 399 |
+
"""
|
| 400 |
+
Save keyword analysis results to CSV file.
|
| 401 |
+
|
| 402 |
+
Args:
|
| 403 |
+
keyword_metrics: List of analyzed keyword metrics
|
| 404 |
+
base_filename: Base name for the output file
|
| 405 |
+
top_count: Number of top results to save
|
| 406 |
+
|
| 407 |
+
Returns:
|
| 408 |
+
Filename if successful, None if failed
|
| 409 |
+
"""
|
| 410 |
+
if not keyword_metrics:
|
| 411 |
+
logger.warning("No data to save!")
|
| 412 |
+
return None
|
| 413 |
+
|
| 414 |
+
# Create filename with timestamp
|
| 415 |
+
today = date.today()
|
| 416 |
+
filename = f"{base_filename}_{today}.csv"
|
| 417 |
+
|
| 418 |
+
try:
|
| 419 |
+
with open(filename, "w", newline='', encoding='utf-8') as file:
|
| 420 |
+
writer = csv.writer(file)
|
| 421 |
+
|
| 422 |
+
# Write header
|
| 423 |
+
headers = [
|
| 424 |
+
"Keyword", "Monthly Searches", "Competition Score",
|
| 425 |
+
"Opportunity Score", "Total Results", "Ads Count",
|
| 426 |
+
"Featured Snippet", "People Also Ask", "Knowledge Graph"
|
| 427 |
+
]
|
| 428 |
+
writer.writerow(headers)
|
| 429 |
+
|
| 430 |
+
# Write data rows
|
| 431 |
+
for metrics in keyword_metrics[:top_count]:
|
| 432 |
+
row = [
|
| 433 |
+
metrics.keyword,
|
| 434 |
+
metrics.monthly_searches,
|
| 435 |
+
metrics.competition_score,
|
| 436 |
+
metrics.opportunity_score,
|
| 437 |
+
metrics.total_results,
|
| 438 |
+
metrics.ads_count,
|
| 439 |
+
"Yes" if metrics.has_featured_snippet else "No",
|
| 440 |
+
"Yes" if metrics.has_people_also_ask else "No",
|
| 441 |
+
"Yes" if metrics.has_knowledge_graph else "No"
|
| 442 |
+
]
|
| 443 |
+
writer.writerow(row)
|
| 444 |
+
|
| 445 |
+
saved_count = min(top_count, len(keyword_metrics))
|
| 446 |
+
logger.info(f"Results saved to {filename} ({saved_count} keywords)")
|
| 447 |
+
return filename
|
| 448 |
+
|
| 449 |
+
except Exception as e:
|
| 450 |
+
logger.error(f"Failed to save CSV: {e}")
|
| 451 |
+
return None
|
| 452 |
+
|
| 453 |
+
def display_top_results(self, keyword_metrics: List[KeywordMetrics],
|
| 454 |
+
top_count: int = 5) -> None:
|
| 455 |
+
"""
|
| 456 |
+
Display top results in formatted table.
|
| 457 |
+
|
| 458 |
+
Args:
|
| 459 |
+
keyword_metrics: List of analyzed keyword metrics
|
| 460 |
+
top_count: Number of top results to display
|
| 461 |
+
"""
|
| 462 |
+
if not keyword_metrics:
|
| 463 |
+
logger.warning("No results to display!")
|
| 464 |
+
return
|
| 465 |
+
|
| 466 |
+
top_results = keyword_metrics[:top_count]
|
| 467 |
+
|
| 468 |
+
print(f"\nTop {len(top_results)} Keyword Opportunities:")
|
| 469 |
+
|
| 470 |
+
if HAS_TABULATE:
|
| 471 |
+
# Create table data
|
| 472 |
+
table_data = []
|
| 473 |
+
for metrics in top_results:
|
| 474 |
+
table_data.append([
|
| 475 |
+
metrics.keyword,
|
| 476 |
+
f"{metrics.monthly_searches:,}",
|
| 477 |
+
f"{metrics.competition_score:.3f}",
|
| 478 |
+
f"{metrics.opportunity_score:.2f}",
|
| 479 |
+
f"{metrics.total_results:,}",
|
| 480 |
+
metrics.ads_count
|
| 481 |
+
])
|
| 482 |
+
|
| 483 |
+
headers = ["Keyword", "Volume", "Competition", "Score", "Results", "Ads"]
|
| 484 |
+
print(tabulate(table_data, headers=headers, tablefmt="pretty"))
|
| 485 |
+
else:
|
| 486 |
+
# Fallback to simple format
|
| 487 |
+
for i, metrics in enumerate(top_results, 1):
|
| 488 |
+
print(f"{i}. {metrics.keyword}")
|
| 489 |
+
print(f" Score: {metrics.opportunity_score}, "
|
| 490 |
+
f"Volume: {metrics.monthly_searches:,}, "
|
| 491 |
+
f"Competition: {metrics.competition_score:.3f}")
|
| 492 |
+
|
| 493 |
+
|
| 494 |
+
class KeywordResearchTool:
|
| 495 |
+
"""Main application class that orchestrates the keyword research process."""
|
| 496 |
+
|
| 497 |
+
def __init__(self, seed_keyword: str):
|
| 498 |
+
self.seed_keyword = seed_keyword
|
| 499 |
+
self.config = Config()
|
| 500 |
+
self.analyzer = KeywordAnalyzer(self.config)
|
| 501 |
+
self.exporter = ResultsExporter()
|
| 502 |
+
|
| 503 |
+
def run_analysis(self, use_volume_api: bool = False) -> None:
|
| 504 |
+
"""
|
| 505 |
+
Run the complete keyword research analysis.
|
| 506 |
+
|
| 507 |
+
Args:
|
| 508 |
+
use_volume_api: Whether to use real volume API (requires implementation)
|
| 509 |
+
"""
|
| 510 |
+
print("Starting keyword research analysis...")
|
| 511 |
+
print(f"Seed keyword: '{self.seed_keyword}'")
|
| 512 |
+
|
| 513 |
+
try:
|
| 514 |
+
# Step 1: Discover related keywords
|
| 515 |
+
related_keywords = self.analyzer.keyword_discovery.find_related_keywords(
|
| 516 |
+
self.seed_keyword
|
| 517 |
+
)
|
| 518 |
+
|
| 519 |
+
if not related_keywords:
|
| 520 |
+
logger.error("No keyword candidates found. Check your SerpApi key.")
|
| 521 |
+
return
|
| 522 |
+
|
| 523 |
+
# Step 2: Analyze keywords and calculate scores
|
| 524 |
+
analyzed_keywords = self.analyzer.analyze_keywords_batch(
|
| 525 |
+
related_keywords, use_volume_api
|
| 526 |
+
)
|
| 527 |
+
|
| 528 |
+
if not analyzed_keywords:
|
| 529 |
+
logger.error("No keywords were successfully analyzed.")
|
| 530 |
+
return
|
| 531 |
+
|
| 532 |
+
# Step 3: Save results to file
|
| 533 |
+
self.exporter.save_to_csv(
|
| 534 |
+
analyzed_keywords,
|
| 535 |
+
base_filename=f"keywords_{self.seed_keyword.replace(' ', '_')}",
|
| 536 |
+
top_count=self.config.top_keywords_to_save
|
| 537 |
+
)
|
| 538 |
+
|
| 539 |
+
# Step 4: Display top results
|
| 540 |
+
self.exporter.display_top_results(analyzed_keywords, top_count=5)
|
| 541 |
+
|
| 542 |
+
except Exception as e:
|
| 543 |
+
logger.error(f"Analysis failed: {e}")
|
| 544 |
+
raise
|
| 545 |
+
|
| 546 |
+
|
| 547 |
+
def main():
|
| 548 |
+
"""Main entry point for the keyword research tool."""
|
| 549 |
+
# Configuration
|
| 550 |
+
SEED_KEYWORD = "global internship"
|
| 551 |
+
USE_VOLUME_API = False # Set to True when you implement get_search_volume()
|
| 552 |
+
|
| 553 |
+
try:
|
| 554 |
+
tool = KeywordResearchTool(SEED_KEYWORD)
|
| 555 |
+
tool.run_analysis(use_volume_api=USE_VOLUME_API)
|
| 556 |
+
|
| 557 |
+
except ValueError as e:
|
| 558 |
+
logger.error(f"Configuration error: {e}")
|
| 559 |
+
print("\nSetup Instructions:")
|
| 560 |
+
print("1. Create a .env file in the same directory")
|
| 561 |
+
print("2. Add your SerpApi key: SERPAPI_KEY=your_key_here")
|
| 562 |
+
print("3. Get your free key at: https://serpapi.com/")
|
| 563 |
+
|
| 564 |
+
except Exception as e:
|
| 565 |
+
logger.error(f"Unexpected error: {e}")
|
| 566 |
+
|
| 567 |
+
|
| 568 |
+
if __name__ == "__main__":
|
| 569 |
+
main()
|
|
@@ -0,0 +1,60 @@
|
| 1 |
+
altair==5.5.0
|
| 2 |
+
annotated-types==0.7.0
|
| 3 |
+
anyio==4.11.0
|
| 4 |
+
attrs==25.3.0
|
| 5 |
+
blinker==1.9.0
|
| 6 |
+
cachetools==6.2.0
|
| 7 |
+
certifi==2025.8.3
|
| 8 |
+
charset-normalizer==3.4.3
|
| 9 |
+
click==8.1.8
|
| 10 |
+
colorama==0.4.6
|
| 11 |
+
et_xmlfile==2.0.0
|
| 12 |
+
exceptiongroup==1.3.0
|
| 13 |
+
fastapi==0.118.0
|
| 14 |
+
gitdb==4.0.12
|
| 15 |
+
GitPython==3.1.45
|
| 16 |
+
google_search_results==2.4.2
|
| 17 |
+
gunicorn==23.0.0
|
| 18 |
+
h11==0.16.0
|
| 19 |
+
httptools==0.6.4
|
| 20 |
+
idna==3.10
|
| 21 |
+
Jinja2==3.1.6
|
| 22 |
+
jsonschema==4.25.1
|
| 23 |
+
jsonschema-specifications==2025.9.1
|
| 24 |
+
MarkupSafe==3.0.3
|
| 25 |
+
narwhals==2.6.0
|
| 26 |
+
numpy==2.0.2
|
| 27 |
+
openpyxl==3.1.5
|
| 28 |
+
packaging==25.0
|
| 29 |
+
pandas==2.3.2
|
| 30 |
+
pillow==11.3.0
|
| 31 |
+
plotly==6.3.0
|
| 32 |
+
protobuf==6.32.1
|
| 33 |
+
pyarrow==21.0.0
|
| 34 |
+
pydantic==2.11.9
|
| 35 |
+
pydantic_core==2.33.2
|
| 36 |
+
pydeck==0.9.1
|
| 37 |
+
python-dateutil==2.9.0.post0
|
| 38 |
+
python-dotenv==1.1.1
|
| 39 |
+
pytz==2025.2
|
| 40 |
+
PyYAML==6.0.3
|
| 41 |
+
referencing==0.36.2
|
| 42 |
+
requests==2.32.5
|
| 43 |
+
rpds-py==0.27.1
|
| 44 |
+
six==1.17.0
|
| 45 |
+
smmap==5.0.2
|
| 46 |
+
sniffio==1.3.1
|
| 47 |
+
starlette==0.48.0
|
| 48 |
+
streamlit==1.50.0
|
| 49 |
+
tabulate==0.9.0
|
| 50 |
+
tenacity==9.1.2
|
| 51 |
+
toml==0.10.2
|
| 52 |
+
tornado==6.5.2
|
| 53 |
+
typing-inspection==0.4.1
|
| 54 |
+
typing_extensions==4.15.0
|
| 55 |
+
tzdata==2025.2
|
| 56 |
+
urllib3==2.5.0
|
| 57 |
+
uvicorn==0.37.0
|
| 58 |
+
watchdog==6.0.0
|
| 59 |
+
watchfiles==1.1.0
|
| 60 |
+
websockets==15.0.1
|
|
@@ -0,0 +1,625 @@
|
| 1 |
+
# src/server.py
|
| 2 |
+
"""
|
| 3 |
+
Free-Plan Friendly SEO Keyword Research API
|
| 4 |
+
Optimized to minimize SerpAPI calls while maximizing keyword discovery
|
| 5 |
+
|
| 6 |
+
Key Features:
|
| 7 |
+
- Configurable keyword count (5, 10, 20, 50, etc.)
|
| 8 |
+
- Only 1 SerpAPI call per seed for candidate collection
|
| 9 |
+
- Mock scoring for initial ranking
|
| 10 |
+
- Optional SerpAPI verification for top N results
|
| 11 |
+
- Strict mode for free plan protection (max 5 API calls per request)
|
| 12 |
+
"""
|
| 13 |
+
|
| 14 |
+
import os
|
| 15 |
+
import logging
|
| 16 |
+
import time
|
| 17 |
+
import math
|
| 18 |
+
import re
|
| 19 |
+
import io
|
| 20 |
+
from typing import List, Dict, Any, Optional, Tuple
|
| 21 |
+
from datetime import datetime
|
| 22 |
+
from collections import Counter
|
| 23 |
+
|
| 24 |
+
from fastapi import FastAPI, HTTPException, Query, Request
|
| 25 |
+
from fastapi.middleware.cors import CORSMiddleware
|
| 26 |
+
from fastapi.responses import JSONResponse, StreamingResponse
|
| 27 |
+
from pydantic import BaseModel, Field
|
| 28 |
+
from dotenv import load_dotenv
|
| 29 |
+
|
| 30 |
+
try:
|
| 31 |
+
import pandas as pd
|
| 32 |
+
HAS_PANDAS = True
|
| 33 |
+
except ImportError:
|
| 34 |
+
HAS_PANDAS = False
|
| 35 |
+
|
| 36 |
+
try:
|
| 37 |
+
from serpapi import GoogleSearch
|
| 38 |
+
HAS_SERPAPI = True
|
| 39 |
+
except ImportError:
|
| 40 |
+
try:
|
| 41 |
+
from google_search_results import GoogleSearch
|
| 42 |
+
HAS_SERPAPI = True
|
| 43 |
+
except ImportError:
|
| 44 |
+
HAS_SERPAPI = False
|
| 45 |
+
|
| 46 |
+
# Load environment
|
| 47 |
+
load_dotenv()
|
| 48 |
+
|
| 49 |
+
# Configure logging
|
| 50 |
+
logging.basicConfig(
|
| 51 |
+
level=logging.INFO,
|
| 52 |
+
format='%(asctime)s - %(levelname)s - %(message)s'
|
| 53 |
+
)
|
| 54 |
+
logger = logging.getLogger(__name__)
|
| 55 |
+
|
| 56 |
+
# Initialize FastAPI
|
| 57 |
+
app = FastAPI(
|
| 58 |
+
title="Free-Plan Friendly SEO Keyword API",
|
| 59 |
+
description="Efficient keyword research optimized for SerpAPI free plan",
|
| 60 |
+
version="4.0.0",
|
| 61 |
+
docs_url="/docs"
|
| 62 |
+
)
|
| 63 |
+
|
| 64 |
+
# CORS
|
| 65 |
+
app.add_middleware(
|
| 66 |
+
CORSMiddleware,
|
| 67 |
+
allow_origins=["*"],
|
| 68 |
+
allow_credentials=True,
|
| 69 |
+
allow_methods=["GET", "POST", "OPTIONS"],
|
| 70 |
+
allow_headers=["*"],
|
| 71 |
+
)
|
| 72 |
+
|
| 73 |
+
# Configuration
|
| 74 |
+
SERPAPI_KEY = os.getenv("SERPAPI_KEY")
|
| 75 |
+
API_AUTH_KEY = os.getenv("API_AUTH_KEY")
|
| 76 |
+
USE_SERPAPI_STRICT_MODE = os.getenv("USE_SERPAPI_STRICT_MODE", "true").lower() == "true"
|
| 77 |
+
MAX_SERPAPI_CALLS_STRICT = 5 # Maximum API calls in strict mode
|
| 78 |
+
MAX_SERPAPI_CALLS_NORMAL = 20 # Maximum API calls in normal mode
|
| 79 |
+
|
| 80 |
+
# Rate limiting
|
| 81 |
+
REQUEST_TIMES = {}
|
| 82 |
+
RATE_LIMIT_WINDOW = 60
|
| 83 |
+
RATE_LIMIT_MAX_REQUESTS = 30
|
| 84 |
+
|
| 85 |
+
# Request counter for monitoring
|
| 86 |
+
API_CALL_COUNTER = {"total": 0, "session_start": time.time()}
|
| 87 |
+
|
| 88 |
+
class KeywordResponse(BaseModel):
|
| 89 |
+
"""API response model."""
|
| 90 |
+
success: bool = True
|
| 91 |
+
seed: str
|
| 92 |
+
requested: int
|
| 93 |
+
returned: int
|
| 94 |
+
results: List[Dict[str, Any]]
|
| 95 |
+
processing_time: float
|
| 96 |
+
api_calls_used: int
|
| 97 |
+
api_budget_remaining: int
|
| 98 |
+
data_source: str
|
| 99 |
+
timestamp: str
|
| 100 |
+
|
| 101 |
+
def count_api_call():
|
| 102 |
+
"""Track API usage."""
|
| 103 |
+
API_CALL_COUNTER["total"] += 1
|
| 104 |
+
logger.info(f"API call #{API_CALL_COUNTER['total']} - Session time: {time.time() - API_CALL_COUNTER['session_start']:.1f}s")
|
| 105 |
+
|
| 106 |
+
def get_api_budget() -> int:
|
| 107 |
+
"""Calculate remaining API budget. API_CALL_COUNTER is process-wide and never reset, so this is a per-session budget, not per-request."""
|
| 108 |
+
max_calls = MAX_SERPAPI_CALLS_STRICT if USE_SERPAPI_STRICT_MODE else MAX_SERPAPI_CALLS_NORMAL
|
| 109 |
+
used = API_CALL_COUNTER["total"]
|
| 110 |
+
return max(0, max_calls - used)
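Because `get_api_budget()` subtracts the process-wide `API_CALL_COUNTER["total"]`, the budget stays at zero after the first few calls until the server restarts. If a per-request cap is what the module docstring intends, one option is a small budget object created for each request. This is a sketch under that assumption; `RequestBudget` is a hypothetical name, not part of the diff.

```python
class RequestBudget:
    """Tracks SerpAPI calls for a single request (hypothetical helper)."""

    def __init__(self, max_calls):
        self.max_calls = max_calls
        self.used = 0

    @property
    def remaining(self):
        return self.max_calls - self.used

    def try_spend(self):
        """Reserve one call; return False when the cap is already reached."""
        if self.used >= self.max_calls:
            return False
        self.used += 1
        return True


budget = RequestBudget(max_calls=5)   # e.g. the strict-mode limit
enriched = [kw for kw in ["a", "b", "c", "d", "e", "f"] if budget.try_spend()]
print(len(enriched), budget.remaining)
```

The global counter can still be kept alongside this for monitoring; only the enforcement would move to the per-request object.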
|
| 111 |
+
|
| 112 |
+
def heuristic_competition_score(keyword: str) -> float:
|
| 113 |
+
"""
|
| 114 |
+
Calculate mock competition score based on keyword characteristics.
|
| 115 |
+
Does NOT use any API calls.
|
| 116 |
+
"""
|
| 117 |
+
words = keyword.lower().split()
|
| 118 |
+
word_count = len(words)
|
| 119 |
+
|
| 120 |
+
# Base competition by word count
|
| 121 |
+
base_scores = {1: 0.8, 2: 0.6, 3: 0.4, 4: 0.25, 5: 0.2}
|
| 122 |
+
base_score = base_scores.get(word_count, max(0.15, 0.3 - (word_count * 0.02)))
|
| 123 |
+
|
| 124 |
+
# Adjust for question keywords (lower competition)
|
| 125 |
+
question_words = ["how", "what", "why", "when", "where", "who", "which", "can", "should", "is", "are", "does"]
|
| 126 |
+
if any(word in words for word in question_words):
|
| 127 |
+
base_score *= 0.7
|
| 128 |
+
|
| 129 |
+
# Adjust for commercial intent (higher competition)
|
| 130 |
+
commercial_words = ["buy", "best", "top", "review", "price", "cheap", "discount"]
|
| 131 |
+
if any(word in words for word in commercial_words):
|
| 132 |
+
base_score *= 1.3
|
| 133 |
+
|
| 134 |
+
# Adjust for specific/niche keywords (lower competition)
|
| 135 |
+
specific_words = ["beginner", "tutorial", "guide", "explained", "step", "diy", "simple"]
|
| 136 |
+
if any(word in words for word in specific_words):
|
| 137 |
+
base_score *= 0.8
|
| 138 |
+
|
| 139 |
+
# Add small hash-based variation (stable within a process; str hashes are salted per run unless PYTHONHASHSEED is set)
|
| 140 |
+
variation = ((hash(keyword) % 20) - 10) / 100 # -0.10 to +0.09
|
| 141 |
+
base_score += variation
|
| 142 |
+
|
| 143 |
+
return max(0.05, min(0.95, base_score))
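The variation term above uses Python's built-in `hash()`, which is salted per process for strings, so the same keyword can score differently after a restart. If reproducible scores matter, a digest-based bucket is one alternative. The sketch below is an assumption about how that could look; `stable_bucket` and `stable_variation` are hypothetical helpers.

```python
import hashlib


def stable_bucket(keyword, modulus):
    """Map a keyword to an integer in [0, modulus) that is stable across runs."""
    digest = hashlib.sha256(keyword.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % modulus


def stable_variation(keyword):
    """Symmetric offset in [-0.10, +0.09], in steps of 0.01."""
    return (stable_bucket(keyword, 20) - 10) / 100


for kw in ("seo tools", "global internship"):
    print(kw, stable_variation(kw))
```

SHA-256 is used here only for its stability, not for security; any fixed hash such as `zlib.crc32` would serve the same purpose.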
|
| 144 |
+
|
| 145 |
+
def heuristic_search_volume(keyword: str) -> int:
|
| 146 |
+
"""
|
| 147 |
+
Estimate search volume based on keyword characteristics.
|
| 148 |
+
Does NOT use any API calls.
|
| 149 |
+
"""
|
| 150 |
+
words = keyword.lower().split()
|
| 151 |
+
word_count = len(words)
|
| 152 |
+
|
| 153 |
+
# Base volumes
|
| 154 |
+
base_volumes = {1: 10000, 2: 5000, 3: 2000, 4: 800, 5: 400}
|
| 155 |
+
base_volume = base_volumes.get(word_count, max(100, 500 - (word_count * 50)))
|
| 156 |
+
|
| 157 |
+
# Adjust for popular terms
|
| 158 |
+
popular_terms = ["free", "online", "best", "how", "tutorial", "guide"]
|
| 159 |
+
if any(term in words for term in popular_terms):
|
| 160 |
+
base_volume = int(base_volume * 1.5)
|
| 161 |
+
|
| 162 |
+
# Adjust for very specific/niche terms
|
| 163 |
+
niche_terms = ["advanced", "professional", "enterprise", "custom"]
|
| 164 |
+
if any(term in words for term in niche_terms):
|
| 165 |
+
base_volume = int(base_volume * 0.6)
|
| 166 |
+
|
| 167 |
+
# Add deterministic variation
|
| 168 |
+
variation_factor = 1 + ((hash(keyword) % 40) - 20) / 100 # 0.80 to 1.19
|
| 169 |
+
volume = int(base_volume * variation_factor)
|
| 170 |
+
|
| 171 |
+
return max(10, min(100000, volume))
|
| 172 |
+
|
| 173 |
+
def calculate_opportunity_score(volume: int, competition: float) -> float:
|
| 174 |
+
"""Calculate opportunity score."""
|
| 175 |
+
volume_score = math.log10(volume + 1)
|
| 176 |
+
return volume_score / (competition + 0.1)
|
| 177 |
+
|
| 178 |
+
def score_keyword_heuristic(keyword: str) -> Dict[str, Any]:
|
| 179 |
+
"""
|
| 180 |
+
Score a keyword using only heuristics (NO API calls).
|
| 181 |
+
Fast and free method for initial ranking.
|
| 182 |
+
"""
|
| 183 |
+
competition = heuristic_competition_score(keyword)
|
| 184 |
+
volume = heuristic_search_volume(keyword)
|
| 185 |
+
opportunity = calculate_opportunity_score(volume, competition)
|
| 186 |
+
|
| 187 |
+
# Determine difficulty
|
| 188 |
+
if competition < 0.3:
|
| 189 |
+
difficulty = "Easy"
|
| 190 |
+
elif competition < 0.5:
|
| 191 |
+
difficulty = "Medium"
|
| 192 |
+
elif competition < 0.7:
|
| 193 |
+
difficulty = "Hard"
|
| 194 |
+
else:
|
| 195 |
+
difficulty = "Very Hard"
|
| 196 |
+
|
| 197 |
+
# Estimate ranking potential
|
| 198 |
+
if competition < 0.4 and volume >= 300:
|
| 199 |
+
ranking_chance = "High"
|
| 200 |
+
elif competition < 0.6 and volume >= 100:
|
| 201 |
+
ranking_chance = "Medium"
|
| 202 |
+
else:
|
| 203 |
+
ranking_chance = "Low"
|
| 204 |
+
|
| 205 |
+
return {
|
| 206 |
+
"keyword": keyword,
|
| 207 |
+
"monthly_searches": volume,
|
| 208 |
+
"competition_score": round(competition, 4),
|
| 209 |
+
"opportunity_score": round(opportunity, 2),
|
| 210 |
+
"difficulty": difficulty,
|
| 211 |
+
"ranking_chance": ranking_chance,
|
| 212 |
+
"data_source": "heuristic"
|
| 213 |
+
}

def enrich_with_serpapi(keyword: str) -> Optional[Dict[str, Any]]:
    """
    Enrich a keyword with real SerpAPI data.
    Uses 1 API call per keyword.
    """
    if not HAS_SERPAPI or not SERPAPI_KEY:
        logger.warning("SerpAPI not available for enrichment")
        return None

    try:
        count_api_call()

        params = {
            "engine": "google",
            "q": keyword,
            "api_key": SERPAPI_KEY,
            "hl": "en",
            "gl": "us",
            "num": 10
        }

        search = GoogleSearch(params)
        results = search.get_dict()

        if "error" in results:
            logger.error(f"SerpAPI error: {results['error']}")
            return None

        # Extract metrics
        search_info = results.get("search_information", {})
        total_results_raw = search_info.get("total_results") or search_info.get("total_results_raw") or ""
        total_results = 0
        if isinstance(total_results_raw, int):
            total_results = total_results_raw
        elif isinstance(total_results_raw, str):
            nums = re.sub(r"[^\d]", "", total_results_raw)
            total_results = int(nums) if nums else 0

        ads_count = len(results.get("ads_results", []))
        has_featured_snippet = bool(results.get("featured_snippet") or results.get("answer_box"))
        has_paa = bool(results.get("related_questions") or results.get("people_also_ask"))
        has_kg = bool(results.get("knowledge_graph"))

        # Calculate real competition
        normalized_results = min(math.log10(total_results + 1) / 7, 1.0) if total_results > 0 else 0
        ads_score = min(ads_count / 3, 1.0)

        competition = (
            0.40 * normalized_results +
            0.25 * ads_score +
            0.15 * (1 if has_featured_snippet else 0) +
            0.10 * (1 if has_paa else 0) +
            0.10 * (1 if has_kg else 0)
        )
        competition = max(0.0, min(1.0, competition))

        # Estimate volume from signals
        word_count = len(keyword.split())
        base_volume = max(100, 8000 // (word_count + 1))

        if ads_count > 2:
            base_volume = int(base_volume * 1.5)
        if has_featured_snippet:
            base_volume = int(base_volume * 1.2)

        volume = min(base_volume, 50000)
        opportunity = calculate_opportunity_score(volume, competition)

        # Determine difficulty
        if competition < 0.3:
            difficulty = "Easy"
        elif competition < 0.5:
            difficulty = "Medium"
        elif competition < 0.7:
            difficulty = "Hard"
        else:
            difficulty = "Very Hard"

        # Ranking chance
        if competition < 0.35:
            ranking_chance = "High"
        elif competition < 0.55:
            ranking_chance = "Medium"
        else:
            ranking_chance = "Low"

        return {
            "keyword": keyword,
            "monthly_searches": volume,
            "competition_score": round(competition, 4),
            "opportunity_score": round(opportunity, 2),
            "difficulty": difficulty,
            "ranking_chance": ranking_chance,
            "total_results": total_results,
            "ads_count": ads_count,
            "featured_snippet": "Yes" if has_featured_snippet else "No",
            "people_also_ask": "Yes" if has_paa else "No",
            "knowledge_graph": "Yes" if has_kg else "No",
            "data_source": "serpapi"
        }

    except Exception as e:
        logger.error(f"SerpAPI enrichment failed for '{keyword}': {e}")
        return None
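
The competition blend inside `enrich_with_serpapi` can be isolated as a pure function, which makes the weighting easier to test without any API call. This is a sketch under that assumption; `competition_from_signals` is a hypothetical helper name, while the weights and normalization are copied from the function above.

```python
import math


def competition_from_signals(total_results: int, ads_count: int,
                             has_featured_snippet: bool, has_paa: bool,
                             has_kg: bool) -> float:
    """Weighted blend of SERP signals, clamped to [0, 1]."""
    # Result count is log-scaled; 10**7 or more results saturates at 1.0.
    normalized_results = (
        min(math.log10(total_results + 1) / 7, 1.0) if total_results > 0 else 0.0
    )
    # Three or more ads saturates the ads signal.
    ads_score = min(ads_count / 3, 1.0)
    competition = (
        0.40 * normalized_results
        + 0.25 * ads_score
        + 0.15 * has_featured_snippet  # bools count as 0 or 1
        + 0.10 * has_paa
        + 0.10 * has_kg
    )
    return max(0.0, min(1.0, competition))
```

The five weights sum to 1.0, so a SERP with a saturated result count, three or more ads, and all three features present scores at the maximum, and a SERP with no signals scores 0.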

def collect_candidates_from_seed(seed: str) -> Tuple[List[str], int]:
    """
    Collect keyword candidates using ONLY 1 SerpAPI call.
    Returns (candidates, api_calls_used)
    """
    candidates = set()
    candidates.add(seed)  # Always include seed
    api_calls = 0

    # Generate synthetic candidates (NO API calls)
    question_words = ["how to", "what is", "why", "when", "where", "can i", "should i"]
    modifiers = ["best", "free", "online", "guide", "tutorial", "tips", "examples",
                 "for beginners", "explained", "2024", "2025", "cheap", "review"]

    for q in question_words[:5]:
        candidates.add(f"{q} {seed}")

    for mod in modifiers[:15]:
        candidates.add(f"{seed} {mod}")
        candidates.add(f"{mod} {seed}")

    # Make ONE SerpAPI call to get real related keywords
    if HAS_SERPAPI and SERPAPI_KEY:
        try:
            count_api_call()
            api_calls = 1

            params = {
                "engine": "google",
                "q": seed,
                "api_key": SERPAPI_KEY,
                "hl": "en",
                "gl": "us"
            }

            search = GoogleSearch(params)
            results = search.get_dict()

            if "error" not in results:
                # Extract related searches
                for item in results.get("related_searches", [])[:20]:
                    query = item.get("query", "")
                    if query and len(query.split()) <= 6:
                        candidates.add(query.lower().strip())

                # Extract PAA questions
                for item in results.get("related_questions", [])[:15]:
                    question = item.get("question", "")
                    if question:
                        candidates.add(question.lower().strip())

                logger.info("SerpAPI call successful: collected real suggestions")
            else:
                logger.warning(f"SerpAPI error: {results.get('error')}")

        except Exception as e:
            logger.error(f"SerpAPI collection failed: {e}")

    final_candidates = list(candidates)
    logger.info(f"Collected {len(final_candidates)} candidates ({api_calls} API call(s))")

    return final_candidates, api_calls

def check_rate_limit(client_ip: str) -> bool:
    """Sliding-window rate limit: allow at most RATE_LIMIT_MAX_REQUESTS
    requests per RATE_LIMIT_WINDOW seconds for each client IP."""
    current_time = time.time()

    if client_ip not in REQUEST_TIMES:
        REQUEST_TIMES[client_ip] = []

    # Drop timestamps that have aged out of the window
    REQUEST_TIMES[client_ip] = [
        t for t in REQUEST_TIMES[client_ip]
        if current_time - t < RATE_LIMIT_WINDOW
    ]

    if len(REQUEST_TIMES[client_ip]) >= RATE_LIMIT_MAX_REQUESTS:
        return False

    REQUEST_TIMES[client_ip].append(current_time)
    return True
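
The sliding-window behaviour of `check_rate_limit` is easier to see with an injectable clock. The sketch below mirrors the same logic. The names `allow_request`, `WINDOW_SECONDS`, and `MAX_REQUESTS` and their values are assumptions for illustration, since the real constants are defined outside this excerpt.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical limits for the demo; the real values live elsewhere in the file.
WINDOW_SECONDS = 60.0
MAX_REQUESTS = 3

_request_times: Dict[str, List[float]] = defaultdict(list)


def allow_request(client_ip: str, now: float) -> bool:
    """Return True if client_ip is still under the limit at time `now`."""
    # Keep only timestamps that are still inside the window.
    recent = [t for t in _request_times[client_ip] if now - t < WINDOW_SECONDS]
    allowed = len(recent) < MAX_REQUESTS
    if allowed:
        recent.append(now)
    _request_times[client_ip] = recent
    return allowed
```

With these limits, three requests at t = 0, 1, 2 are accepted, a fourth at t = 3 is rejected, and a request at t = 61.5 is accepted again because the first two timestamps have aged out. The state is in process memory, so it resets on restart and is not shared between workers.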

@app.on_event("startup")
async def startup():
    """Startup logging."""
    logger.info("=" * 60)
    logger.info("SEO Keyword API - Free Plan Optimized")
    logger.info(f"Strict Mode: {USE_SERPAPI_STRICT_MODE}")
    logger.info(f"Max API calls per request: {MAX_SERPAPI_CALLS_STRICT if USE_SERPAPI_STRICT_MODE else MAX_SERPAPI_CALLS_NORMAL}")
    logger.info(f"SerpAPI Available: {HAS_SERPAPI and bool(SERPAPI_KEY)}")
    logger.info("=" * 60)

@app.get("/")
async def root():
    """Root endpoint."""
    return {
        "service": "Free-Plan Friendly SEO Keyword API",
        "version": "4.0.0",
        "strict_mode": USE_SERPAPI_STRICT_MODE,
        "max_api_calls": MAX_SERPAPI_CALLS_STRICT if USE_SERPAPI_STRICT_MODE else MAX_SERPAPI_CALLS_NORMAL,
        "strategy": "1 API call for candidate collection + optional enrichment for top N",
        "endpoints": {
            "/keywords": "Main keyword research (configurable count)",
            "/export/csv": "Export keyword results as CSV",
            "/health": "Health check",
            "/stats": "API usage statistics"
        }
    }

@app.get("/health")
async def health():
    """Health check."""
    return {
        "status": "healthy",
        "timestamp": datetime.utcnow().isoformat(),
        "serpapi_available": HAS_SERPAPI and bool(SERPAPI_KEY),
        "strict_mode": USE_SERPAPI_STRICT_MODE,
        "session_api_calls": API_CALL_COUNTER["total"]
    }

@app.get("/stats")
async def stats():
    """API usage statistics."""
    uptime = time.time() - API_CALL_COUNTER["session_start"]
    return {
        "session_start": datetime.fromtimestamp(API_CALL_COUNTER["session_start"]).isoformat(),
        "uptime_seconds": round(uptime, 1),
        "total_api_calls": API_CALL_COUNTER["total"],
        "strict_mode": USE_SERPAPI_STRICT_MODE,
        "max_calls_per_request": MAX_SERPAPI_CALLS_STRICT if USE_SERPAPI_STRICT_MODE else MAX_SERPAPI_CALLS_NORMAL
    }

@app.get("/keywords", response_model=KeywordResponse)
async def get_keywords(
    request: Request,
    seed: str = Query(..., description="Seed keyword", min_length=1, max_length=100),
    top: int = Query(50, description="Number of keywords to return", ge=1, le=100),
    enrich_top: int = Query(4, description="Number of top results to enrich with SerpAPI", ge=0, le=20)
):
    """
    Main keyword research endpoint.

    Strategy:
    1. Make 1 SerpAPI call to collect candidates from seed
    2. Score all candidates with heuristics (free)
    3. Optionally enrich top N with real SerpAPI data

    Parameters:
    - seed: Your main keyword
    - top: How many keywords you want (e.g., 5, 10, 20, 50)
    - enrich_top: How many of the top results to verify with SerpAPI (0 = none, saves API calls)

    Example: top=10, enrich_top=3 means:
    - 1 API call to collect candidates
    - Return 10 keywords scored with heuristics
    - Enrich the top 3 with real SerpAPI data (3 more API calls)
    - Total: 4 API calls
    """
    start_time = time.time()
    # request.client can be None (e.g. under some test clients), so guard it
    client_ip = request.client.host if request.client else "unknown"

    # Authentication
    if API_AUTH_KEY:
        auth = request.headers.get("Authorization", "").replace("Bearer ", "")
        if auth != API_AUTH_KEY:
            raise HTTPException(401, "Invalid or missing API key")

    # Rate limiting
    if not check_rate_limit(client_ip):
        raise HTTPException(429, "Rate limit exceeded")

    # Validate
    seed = seed.strip().lower()
    if not seed:
        raise HTTPException(400, "Invalid seed keyword")

    # Check API budget
    max_calls = MAX_SERPAPI_CALLS_STRICT if USE_SERPAPI_STRICT_MODE else MAX_SERPAPI_CALLS_NORMAL
    if enrich_top > 0:
        required_calls = 1 + enrich_top  # 1 for collection + N for enrichment
        if required_calls > max_calls:
            raise HTTPException(
                400,
                f"Request would use {required_calls} API calls, but budget is {max_calls}. "
                f"Reduce enrich_top to {max_calls - 1} or fewer."
            )

    try:
        logger.info(f"Request: seed='{seed}', top={top}, enrich_top={enrich_top}")

        # Step 1: Collect candidates (1 API call)
        candidates, api_calls_used = collect_candidates_from_seed(seed)

        if not candidates:
            raise HTTPException(404, "No candidates found")

        # Step 2: Score all candidates with heuristics (FREE - no API calls)
        logger.info(f"Scoring {len(candidates)} candidates with heuristics...")
        scored_candidates = []
        for candidate in candidates:
            try:
                result = score_keyword_heuristic(candidate)
                scored_candidates.append(result)
            except Exception as e:
                logger.warning(f"Heuristic scoring failed for '{candidate}': {e}")
                continue

        # Sort by opportunity score (highest first)
        scored_candidates.sort(key=lambda x: x["opportunity_score"], reverse=True)

        # Get top N requested
        top_results = scored_candidates[:top]

        # Step 3: Optionally enrich top results with real SerpAPI data
        data_source = "heuristic"
        if enrich_top > 0 and HAS_SERPAPI and SERPAPI_KEY:
            logger.info(f"Enriching top {enrich_top} results with SerpAPI...")

            for i in range(min(enrich_top, len(top_results))):
                keyword = top_results[i]["keyword"]

                # Check budget before each call
                if api_calls_used >= max_calls:
                    logger.warning(f"API budget exhausted at {api_calls_used} calls")
                    break

                enriched = enrich_with_serpapi(keyword)
                # The call counts against the budget even when it fails
                api_calls_used += 1
                if enriched:
                    top_results[i] = enriched
                    data_source = "mixed"

                # Small delay between calls.
                # NOTE: time.sleep blocks the event loop inside an async handler.
                time.sleep(0.2)

            logger.info(f"Enrichment complete: {api_calls_used} total API calls used")

        # Add ranking
        for rank, result in enumerate(top_results, 1):
            result["rank"] = rank

        processing_time = time.time() - start_time
        budget_remaining = max_calls - api_calls_used

        logger.info(
            f"SUCCESS: Returned {len(top_results)} keywords, "
            f"API calls: {api_calls_used}/{max_calls}, "
            f"Time: {processing_time:.2f}s"
        )

        return KeywordResponse(
            success=True,
            seed=seed,
            requested=top,
            returned=len(top_results),
            results=top_results,
            processing_time=round(processing_time, 2),
            api_calls_used=api_calls_used,
            api_budget_remaining=budget_remaining,
            data_source=data_source,
            timestamp=datetime.utcnow().isoformat()
        )

    except HTTPException:
        raise
    except Exception as e:
        logger.error(f"Request failed: {e}")
        raise HTTPException(500, f"Processing error: {str(e)}")
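
The budget rule applied by `/keywords` (one collection call plus one call per enriched keyword, checked only when enrichment is requested) can be sketched as two small helpers. The names `required_api_calls` and `fits_budget` are hypothetical, and the budget values in the usage line are made up for the example.

```python
def required_api_calls(enrich_top: int) -> int:
    """One candidate-collection call plus one call per enriched keyword."""
    return 1 + enrich_top


def fits_budget(enrich_top: int, max_calls: int) -> bool:
    """Mirror the endpoint's check: it is skipped when enrich_top is 0."""
    if enrich_top == 0:
        return True
    return required_api_calls(enrich_top) <= max_calls


# With a hypothetical budget of 4 calls, enrich_top=3 fits and enrich_top=4 does not.
print(fits_budget(3, 4), fits_budget(4, 4))
```

A client can use the same arithmetic to choose the largest `enrich_top` that stays within the configured budget, which is `max_calls - 1`.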

@app.get("/export/csv")
async def export_csv(
    request: Request,
    seed: str = Query(..., min_length=1, max_length=100),
    top: int = Query(50, ge=1, le=100),
    enrich_top: int = Query(0, ge=0, le=20)
):
    """Export results as CSV."""
    if not HAS_PANDAS:
        raise HTTPException(500, "CSV export unavailable (pandas not installed)")

    # Get keyword data. Forward the real request so that authentication and
    # per-client rate limiting behave the same as for /keywords.
    response = await get_keywords(request, seed, top, enrich_top)

    # Convert to DataFrame
    df = pd.DataFrame(response.results)

    # Create CSV
    output = io.StringIO()
    df.to_csv(output, index=False)
    output.seek(0)

    return StreamingResponse(
        iter([output.getvalue()]),
        media_type="text/csv",
        headers={"Content-Disposition": f"attachment; filename=keywords_{seed.replace(' ', '_')}.csv"}
    )
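
Enriched rows carry extra fields (`total_results`, `ads_count`, and so on) that heuristic rows lack, so a CSV writer has to cope with uneven keys. The sketch below shows one way to do that with only the standard library, as a possible fallback when pandas is not installed. `rows_to_csv` is a hypothetical helper, not part of the original file.

```python
import csv
import io
from typing import Any, Dict, List


def rows_to_csv(rows: List[Dict[str, Any]]) -> str:
    """Serialize dict rows to CSV, using the union of keys as the header."""
    # Build the header in first-seen order so columns stay stable.
    fieldnames: List[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    # restval fills cells for keys that a given row does not have.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The returned string could be passed to `StreamingResponse` in the same way as the pandas output above.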

if __name__ == "__main__":
    import uvicorn

    port = int(os.getenv("PORT", 8000))
    logger.info(f"Starting server on port {port}")

    uvicorn.run(
        app,
        host="0.0.0.0",
        port=port,
        log_level="info"
    )
@@ -0,0 +1 @@
pip install serpapi tabulate python-dotenv