Omraghu07 committed on
Commit
b71ce4b
·
verified ·
1 Parent(s): dd50577

Upload 11 files


🚀 SEO Keyword Research AI Agent

An AI-powered SEO keyword research agent that discovers, analyzes, and ranks keyword opportunities using SerpAPI, featuring an interactive Streamlit dashboard for visualization and n8n automation integration.
This project showcases end-to-end skills in Python, AI agents, API integration, data visualization, and deployment (Render + n8n).

✨ Features

🔍 Keyword Discovery – Finds semantically related keywords for any seed keyword.
📊 Keyword Analysis – Scores each keyword based on volume, competition, and SERP metrics.
📂 Data Export – Export analyzed results as CSV/Excel files.
📈 Interactive Dashboard – Visualize keyword trends, heatmaps, and search intent using Streamlit + Plotly.
🤖 AI Agent Workflow – Automates the research → processing → reporting pipeline.
🔗 n8n Integration – Trigger workflows via webhooks (e.g., run research + auto-send reports to Slack/Email).
🌐 Deployment – Hosted on Render, accessible via API and dashboard.

Files changed (11)
  1. README.md +156 -0
  2. __init__.py +0 -0
  3. app.py +584 -0
  4. dashboard.py +830 -0
  5. git +0 -0
  6. keyword_agent.py +19 -0
  7. postprocess.py +366 -0
  8. ranking.py +569 -0
  9. requirements.txt +60 -0
  10. server.py +625 -0
  11. tempCodeRunnerFile.py +1 -0
README.md ADDED
@@ -0,0 +1,156 @@
+ # SEO Keyword Research AI Agent
+
+ An **AI-powered SEO keyword research agent** that discovers, analyzes, and ranks keyword opportunities using **SerpAPI**, with an interactive **Streamlit dashboard** for visualization and an **n8n integration** for automation.
+
+ This project was built to demonstrate skills in **Python, AI agents, API integration, data visualization, and deployment** (Render + n8n).
+
+ ---
+
+ ## 🚀 Features
+
+ - 🔍 **Keyword Discovery** – Finds related keywords for any seed keyword.
+ - 📊 **Keyword Analysis** – Scores keywords based on search volume, competition, and SERP signals.
+ - 📂 **Data Export** – Saves results to CSV/Excel with metadata.
+ - 📈 **Interactive Dashboard** – Streamlit + Plotly for keyword trends, competition heatmaps, and intent analysis.
+ - 🤖 **AI Agent Workflow** – Automates keyword research → processing → reporting.
+ - 🔗 **n8n Integration** – Trigger workflows via webhooks (e.g., run keyword research and auto-send results to Slack/Email).
+ - 🌐 **Deployment** – Hosted on **Render** for API and dashboard access.
+
+ ---
+
+ ## 🏗️ Project Structure
+
+ ```
+ seo-keyword-ai-agent/
+ ├── app.py              # Master pipeline orchestrator
+ ├── dashboard.py        # Streamlit visualization
+ ├── src/
+ │   ├── postprocess.py  # Cleans & enriches results
+ │   ├── ranking.py      # Keyword discovery & scoring
+ │   └── server.py       # FastAPI/Render server
+ ├── output/             # Generated keyword results
+ ├── .env                # API keys (not committed)
+ ├── requirements.txt    # Python dependencies
+ └── README.md           # Project documentation
+ ```
+
+ ---
+
+ ## ⚙️ Installation
+
+ 1. **Clone the repo**
+    ```bash
+    git clone https://github.com/omraghu07/seo-keyword-ai-agent.git
+    cd seo-keyword-ai-agent
+    ```
+
+ 2. **Create a virtual environment**
+    ```bash
+    python -m venv agent_venv
+
+    # Mac/Linux
+    source agent_venv/bin/activate
+
+    # Windows
+    agent_venv\Scripts\activate
+    ```
+
+ 3. **Install dependencies**
+    ```bash
+    pip install -r requirements.txt
+    ```
+
+ 4. **Set up the `.env` file**
+    Create a `.env` file in the root directory and add your API key:
+    ```
+    SERPAPI_KEY=your_serpapi_key_here
+    ```
+
+ ---
+
+ ## ▶️ Usage
+
+ ### Run the full pipeline
+ ```bash
+ python app.py "global internship" --max-candidates 100 --top-results 50
+ ```
+
+ ### Launch the dashboard
+ ```bash
+ streamlit run dashboard.py
+ ```
+
+ ### Run as an API (Render/FastAPI)
+ ```bash
+ gunicorn -k uvicorn.workers.UvicornWorker src.server:app --bind 0.0.0.0:8000 --workers 2
+ ```
+
+ ## 🔗 n8n Integration
+
+ - Create an n8n workflow with a Webhook node.
+ - Connect it to the Render API:
+
+   ```
+   POST https://seo-keyword-ai-agent.onrender.com/analyze
+   {
+     "seed": "global internship",
+     "top": 10
+   }
+   ```
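The same webhook call can be scripted without n8n. A minimal sketch using only the Python standard library; the endpoint URL and JSON body are taken from the README above, and whether the deployed service is currently live is not guaranteed:

```python
import json

API_URL = "https://seo-keyword-ai-agent.onrender.com/analyze"  # endpoint from the README

def build_payload(seed: str, top: int = 10) -> str:
    """Serialize the request body the /analyze endpoint expects."""
    return json.dumps({"seed": seed, "top": top})

if __name__ == "__main__":
    import urllib.request
    req = urllib.request.Request(
        API_URL,
        data=build_payload("global internship").encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Network call only runs when executed as a script
    with urllib.request.urlopen(req, timeout=60) as resp:
        print(resp.read().decode("utf-8"))
```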
+ - Add email/Slack nodes to auto-send reports.
+
+ ## 📊 Example Output
+
+ ### Top 5 Keyword Opportunities
+
+ | Keyword | Volume | Competition | Score | Results |
+ | --------------------------------- | ------ | ----------- | ------ | ------- |
+ | UCLA Global Internship Program | 2000 | 0.0 | 330.12 | 0 |
+ | Summer Internship Programs - CIEE | 1666 | 0.33 | 9.26 | 54,000 |
+ | Global Internship Program HENNGE | 2000 | 0.35 | 9.01 | 10,200 |
+ | Berkeley Global Internships Paid | 1666 | 0.45 | 6.98 | 219,000 |
+ | Global Internship Remote | 2500 | 0.50 | 6.66 | 174M |
+
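The Score column comes from the pipeline's opportunity formula, `log10(volume + 1) / (competition + 0.01)`, as recorded in the metadata that app.py writes. A minimal sketch that reproduces the first table row:

```python
import math

def opportunity_score(monthly_searches: int, competition: float) -> float:
    """Opportunity formula used by the pipeline: rewards volume, penalizes competition."""
    return round(math.log10(monthly_searches + 1) / (competition + 0.01), 2)

# UCLA row: volume 2000, competition 0.0
print(opportunity_score(2000, 0.0))  # → 330.12
```

The `+ 0.01` floor keeps zero-competition keywords from dividing by zero, which is why they dominate the table.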
+ ## 🛠️ Tech Stack
+
+ - Python (core language)
+ - SerpAPI (Google search results API)
+ - Pandas, Requests, Tabulate (data processing)
+ - Streamlit + Plotly (dashboard & charts)
+ - FastAPI + Gunicorn (API server)
+ - Render (deployment)
+ - n8n (workflow automation)
+
+ ## 👨‍💻 Author
+
+ Om Raghuwanshi – Engineering student passionate about AI
+
+ ## 🔗 Links
+
+ [![linkedin](https://img.shields.io/badge/linkedin-0A66C2?style=for-the-badge&logo=linkedin&logoColor=white)](https://www.linkedin.com/in/om-raghuwanshi-b5136a298)
+
+ ⚡ If you like this project, don't forget to ⭐ star the repo and fork it!
__init__.py ADDED
File without changes
app.py ADDED
@@ -0,0 +1,584 @@
+ # app.py
+ """
+ Complete Keyword Research Pipeline
+ Integrates keyword discovery, analysis, and post-processing into one workflow
+ """
+
+ import os
+ import sys
+ import argparse
+ from pathlib import Path
+ from dotenv import load_dotenv
+
+ # Load environment variables first
+ load_dotenv()
+
+ # Add current directory to path for imports
+ current_dir = Path(__file__).parent
+ sys.path.insert(0, str(current_dir))
+
+ def check_setup():
+     """Check if all requirements are met"""
+     print("🔍 Checking setup...")
+
+     # Check API key
+     api_key = os.getenv("SERPAPI_KEY")
+     if not api_key:
+         print("❌ SERPAPI_KEY not found in environment variables")
+         print("Make sure your .env file contains: SERPAPI_KEY=your_key_here")
+         return False
+
+     print(f"✅ API key found: {api_key[:10]}...")
+
+     # Check required packages
+     required_packages = [
+         ('serpapi', 'google-search-results'),
+         ('pandas', 'pandas'),
+         ('tabulate', 'tabulate'),
+         ('openpyxl', 'openpyxl')
+     ]
+
+     missing = []
+     for import_name, pip_name in required_packages:
+         try:
+             __import__(import_name)
+         except ImportError:
+             missing.append(pip_name)
+
+     if missing:
+         print("❌ Missing packages:")
+         for pkg in missing:
+             print(f"    pip install {pkg}")
+         return False
+
+     print("✅ All packages available")
+     return True
+
+ def run_keyword_analysis(seed_keyword, use_volume_api=False):
+     """Run the keyword analysis using the professional tool"""
+     print("\n🔍 Step 1: Running keyword analysis...")
+
+     try:
+         # Import and run the KeywordResearchTool
+         import os
+         import math
+         import csv
+         import re
+         import logging
+         from datetime import date
+         from typing import List, Dict, Optional, Tuple, Any
+         from dataclasses import dataclass
+         from serpapi import GoogleSearch
+
+         # Configure logging to be less verbose
+         logging.basicConfig(level=logging.WARNING)
+
+         @dataclass
+         class KeywordMetrics:
+             keyword: str
+             monthly_searches: int
+             competition_score: float
+             opportunity_score: float
+             total_results: int
+             ads_count: int
+             has_featured_snippet: bool
+             has_people_also_ask: bool
+             has_knowledge_graph: bool
+
+         class CompetitionCalculator:
+             WEIGHTS = {
+                 'total_results': 0.50,
+                 'ads': 0.25,
+                 'featured_snippet': 0.15,
+                 'people_also_ask': 0.07,
+                 'knowledge_graph': 0.03
+             }
+
+             @staticmethod
+             def extract_total_results(search_info):
+                 if not search_info:
+                     return 0
+
+                 total = (search_info.get("total_results") or
+                          search_info.get("total_results_raw") or
+                          search_info.get("total"))
+
+                 if isinstance(total, int):
+                     return total
+
+                 if isinstance(total, str):
+                     numbers_only = re.sub(r"[^\d]", "", total)
+                     try:
+                         return int(numbers_only) if numbers_only else 0
+                     except ValueError:
+                         return 0
+
+                 return 0
+
+             def calculate_score(self, search_results):
+                 search_info = search_results.get("search_information", {})
+
+                 total_results = self.extract_total_results(search_info)
+                 normalized_results = min(math.log10(total_results + 1) / 7, 1.0)
+
+                 ads = search_results.get("ads_results", [])
+                 ads_count = len(ads) if ads else 0
+                 ads_score = min(ads_count / 3, 1.0)
+
+                 has_featured_snippet = bool(
+                     search_results.get("featured_snippet") or
+                     search_results.get("answer_box")
+                 )
+
+                 has_people_also_ask = bool(
+                     search_results.get("related_questions") or
+                     search_results.get("people_also_ask")
+                 )
+
+                 has_knowledge_graph = bool(search_results.get("knowledge_graph"))
+
+                 competition_score = (
+                     self.WEIGHTS['total_results'] * normalized_results +
+                     self.WEIGHTS['ads'] * ads_score +
+                     self.WEIGHTS['featured_snippet'] * has_featured_snippet +
+                     self.WEIGHTS['people_also_ask'] * has_people_also_ask +
+                     self.WEIGHTS['knowledge_graph'] * has_knowledge_graph
+                 )
+
+                 competition_score = max(0.0, min(1.0, competition_score))
+
+                 breakdown = {
+                     "total_results": total_results,
+                     "ads_count": ads_count,
+                     "has_featured_snippet": has_featured_snippet,
+                     "has_people_also_ask": has_people_also_ask,
+                     "has_knowledge_graph": has_knowledge_graph
+                 }
+
+                 return competition_score, breakdown
+
+         # Main analysis functions
+         def find_related_keywords(seed_keyword, max_results=120):
+             print(f"Finding related keywords for: '{seed_keyword}'...")
+
+             params = {
+                 "engine": "google",
+                 "q": seed_keyword,
+                 "api_key": os.getenv("SERPAPI_KEY"),
+                 "hl": "en",
+                 "gl": "us"
+             }
+
+             try:
+                 search = GoogleSearch(params)
+                 results = search.get_dict()
+             except Exception as e:
+                 print(f"Error getting related keywords: {e}")
+                 return []
+
+             keyword_candidates = set()
+
+             # Get related searches
+             related_searches = results.get("related_searches", [])
+             for item in related_searches:
+                 query = item.get("query") or item.get("suggestion")
+                 if query and len(query.strip()) > 0:
+                     keyword_candidates.add(query.strip())
+
+             # Get people also ask
+             related_questions = results.get("related_questions", [])
+             for item in related_questions:
+                 question = item.get("question") or item.get("query")
+                 if question and len(question.strip()) > 0:
+                     keyword_candidates.add(question.strip())
+
+             # Get organic titles
+             organic_results = results.get("organic_results", [])
+             for result in organic_results[:10]:
+                 title = result.get("title", "")
+                 if title and len(title.strip()) > 0:
+                     keyword_candidates.add(title.strip())
+
+             final_keywords = list(keyword_candidates)[:max_results]
+             print(f"Found {len(final_keywords)} keyword candidates")
+
+             return final_keywords
+
+         def analyze_keywords(keywords, use_volume_api=False):
+             print(f"Analyzing {len(keywords)} keywords...")
+
+             calculator = CompetitionCalculator()
+             analyzed_keywords = []
+
+             for i, keyword in enumerate(keywords, 1):
+                 if i % 10 == 0:
+                     print(f"Progress: {i}/{len(keywords)} keywords processed")
+
+                 # Search for keyword
+                 params = {
+                     "engine": "google",
+                     "q": keyword,
+                     "api_key": os.getenv("SERPAPI_KEY"),
+                     "hl": "en",
+                     "gl": "us",
+                     "num": 10
+                 }
+
+                 try:
+                     search = GoogleSearch(params)
+                     search_results = search.get_dict()
+                 except Exception as e:
+                     print(f"Error analyzing '{keyword}': {e}")
+                     continue
+
+                 # Calculate competition
+                 competition_score, breakdown = calculator.calculate_score(search_results)
+
+                 # Estimate volume
+                 word_count = len(keyword.split())
+                 search_volume = max(10, 10000 // (word_count + 1))
+
+                 # Calculate opportunity score
+                 volume_score = math.log10(search_volume + 1)
+                 opportunity_score = volume_score / (competition_score + 0.01)
+
+                 metrics = KeywordMetrics(
+                     keyword=keyword,
+                     monthly_searches=search_volume,
+                     competition_score=round(competition_score, 4),
+                     opportunity_score=round(opportunity_score, 2),
+                     total_results=breakdown["total_results"],
+                     ads_count=breakdown["ads_count"],
+                     has_featured_snippet=breakdown["has_featured_snippet"],
+                     has_people_also_ask=breakdown["has_people_also_ask"],
+                     has_knowledge_graph=breakdown["has_knowledge_graph"]
+                 )
+
+                 analyzed_keywords.append(metrics)
+
+             # Sort by opportunity score
+             analyzed_keywords.sort(key=lambda x: x.opportunity_score, reverse=True)
+
+             print(f"Analysis complete! {len(analyzed_keywords)} keywords analyzed")
+             return analyzed_keywords
+
+         def save_to_csv(keyword_metrics, seed_keyword, top_count=50):
+             if not keyword_metrics:
+                 print("No data to save!")
+                 return None
+
+             # Create filename
+             today = date.today()
+             safe_seed = re.sub(r"[^\w\s-]", "", seed_keyword).strip().replace(" ", "_")[:30]
+             filename = f"keywords_{safe_seed}_{today}.csv"
+
+             try:
+                 with open(filename, "w", newline='', encoding='utf-8') as file:
+                     writer = csv.writer(file)
+
+                     # Write header
+                     headers = [
+                         "Keyword", "Monthly Searches", "Competition Score",
+                         "Opportunity Score", "Total Results", "Ads Count",
+                         "Featured Snippet", "People Also Ask", "Knowledge Graph"
+                     ]
+                     writer.writerow(headers)
+
+                     # Write data
+                     for metrics in keyword_metrics[:top_count]:
+                         row = [
+                             metrics.keyword,
+                             metrics.monthly_searches,
+                             metrics.competition_score,
+                             metrics.opportunity_score,
+                             metrics.total_results,
+                             metrics.ads_count,
+                             "Yes" if metrics.has_featured_snippet else "No",
+                             "Yes" if metrics.has_people_also_ask else "No",
+                             "Yes" if metrics.has_knowledge_graph else "No"
+                         ]
+                         writer.writerow(row)
+
+                 saved_count = min(top_count, len(keyword_metrics))
+                 print(f"✅ Saved {saved_count} keywords to {filename}")
+                 return filename
+
+             except Exception as e:
+                 print(f"Error saving CSV: {e}")
+                 return None
+
+         def display_top_results(keyword_metrics, top_count=5):
+             if not keyword_metrics:
+                 print("No results to display!")
+                 return
+
+             print(f"\n🏆 Top {min(top_count, len(keyword_metrics))} Keywords:")
+             print("-" * 80)
+
+             for i, metrics in enumerate(keyword_metrics[:top_count], 1):
+                 print(f"{i}. {metrics.keyword}")
+                 print(f"   Score: {metrics.opportunity_score} | Volume: {metrics.monthly_searches:,} | Competition: {metrics.competition_score}")
+                 print()
+
+         # Run the analysis
+         related_keywords = find_related_keywords(seed_keyword)
+         if not related_keywords:
+             print("❌ No keyword candidates found")
+             return None
+
+         analyzed_keywords = analyze_keywords(related_keywords, use_volume_api)
+         if not analyzed_keywords:
+             print("❌ No keywords analyzed successfully")
+             return None
+
+         filename = save_to_csv(analyzed_keywords, seed_keyword)
+         display_top_results(analyzed_keywords)
+
+         return filename
+
+     except Exception as e:
+         print(f"❌ Error in keyword analysis: {e}")
+         return None
+
+ def run_postprocessing(csv_filename, seed_keyword):
+     """Run post-processing on the CSV file"""
+     print("\n🧹 Step 2: Running post-processing...")
+
+     try:
+         import pandas as pd
+         import re
+         import json
+         from datetime import date, datetime
+
+         # Try to import optional packages
+         try:
+             from tabulate import tabulate
+             HAS_TABULATE = True
+         except ImportError:
+             HAS_TABULATE = False
+
+         try:
+             import openpyxl
+             HAS_EXCEL = True
+         except ImportError:
+             HAS_EXCEL = False
+
+         # Configuration
+         BRAND_KEYWORDS = {
+             "linkedin", "indeed", "glassdoor", "ucla", "asu", "berkeley",
+             "hennge", "ciee", "google", "facebook", "microsoft", "amazon"
+         }
+
+         def is_brand_query(keyword):
+             if not keyword:
+                 return False
+             keyword_lower = keyword.lower()
+             for brand in BRAND_KEYWORDS:
+                 if brand in keyword_lower:
+                     return True
+             if re.search(r"\.(com|edu|org|net|gov|io)\b", keyword_lower):
+                 return True
+             return False
+
+         def classify_intent(keyword):
+             if not keyword:
+                 return "informational"
+
+             k = keyword.lower()
+             if any(signal in k for signal in ["how to", "what is", "why", "guide", "tutorial"]):
+                 return "informational"
+             if any(signal in k for signal in ["buy", "price", "cost", "apply", "register"]):
+                 return "transactional"
+             if any(signal in k for signal in ["best", "top", "compare", "vs", "reviews"]):
+                 return "commercial"
+             if is_brand_query(keyword):
+                 return "navigational"
+             return "informational"
+
+         def classify_tail(keyword):
+             if not keyword:
+                 return "short-tail"
+             word_count = len(str(keyword).split())
+             if word_count >= 4:
+                 return "long-tail"
+             elif word_count == 3:
+                 return "mid-tail"
+             else:
+                 return "short-tail"
+
+         # Load and process the CSV
+         print(f"Loading {csv_filename}...")
+         df = pd.read_csv(csv_filename)
+         print(f"Loaded {len(df)} keywords")
+
+         # Clean and enhance the data
+         print("Processing data...")
+
+         # Standardize column names
+         column_mapping = {
+             'Keyword': 'Keyword',
+             'Monthly Searches': 'Monthly Searches',
+             'Competition Score': 'Competition',
+             'Opportunity Score': 'Opportunity Score',
+             'Total Results': 'Google Results',
+             'Ads Count': 'Ads Shown',
+             'Featured Snippet': 'Featured Snippet?',
+             'People Also Ask': 'PAA Available?',
+             'Knowledge Graph': 'Knowledge Graph?'
+         }
+
+         # Rename columns that exist
+         for old_name, new_name in column_mapping.items():
+             if old_name in df.columns:
+                 df = df.rename(columns={old_name: new_name})
+
+         # Remove duplicates and sort
+         df = df.drop_duplicates(subset=['Keyword'], keep='first')
+         df = df.sort_values('Opportunity Score', ascending=False)
+
+         # Add enhancement columns
+         df['Intent'] = df['Keyword'].apply(classify_intent)
+         df['Tail'] = df['Keyword'].apply(classify_tail)
+         df['Is Brand/Navigational'] = df['Keyword'].apply(lambda x: "Yes" if is_brand_query(x) else "No")
+
+         # Reorder columns
+         column_order = [
+             'Keyword', 'Intent', 'Tail', 'Is Brand/Navigational',
+             'Monthly Searches', 'Competition', 'Opportunity Score',
+             'Google Results', 'Ads Shown', 'Featured Snippet?',
+             'PAA Available?', 'Knowledge Graph?'
+         ]
+
+         available_columns = [col for col in column_order if col in df.columns]
+         df = df[available_columns]
+
+         # Create output directory
+         os.makedirs("results", exist_ok=True)
+
+         # Generate filenames
+         today = date.today().isoformat()
+         safe_seed = re.sub(r"[^\w\s-]", "", seed_keyword).strip().replace(" ", "_")[:30]
+         base_name = f"keywords_{safe_seed}_{today}"
+
+         csv_path = f"results/{base_name}.csv"
+         excel_path = f"results/{base_name}.xlsx"
+         meta_path = f"results/{base_name}.meta.json"
+
+         # Save enhanced CSV
+         df.to_csv(csv_path, index=False)
+         print(f"💾 Saved enhanced CSV: {csv_path}")
+
+         # Save Excel if available
+         if HAS_EXCEL:
+             with pd.ExcelWriter(excel_path, engine="openpyxl") as writer:
+                 df.head(50).to_excel(writer, sheet_name="Top_50", index=False)
+                 df.to_excel(writer, sheet_name="All_Keywords", index=False)
+             print(f"📊 Saved Excel: {excel_path}")
+
+         # Save metadata
+         metadata = {
+             "seed_keyword": seed_keyword,
+             "generated_at": datetime.utcnow().isoformat() + "Z",
+             "total_keywords": len(df),
+             "data_source": "SerpApi with heuristic search volumes",
+             "methodology": "Opportunity Score = log10(volume+1) / (competition + 0.01)"
+         }
+
+         with open(meta_path, "w", encoding="utf-8") as f:
+             json.dump(metadata, f, indent=2)
+
+         print(f"📋 Saved metadata: {meta_path}")
+
+         # Display results
+         print("\n🏆 Top 10 Enhanced Results:")
+
+         preview_df = df.head(10)
+         if HAS_TABULATE:
+             display_columns = ['Keyword', 'Intent', 'Tail', 'Monthly Searches', 'Competition', 'Opportunity Score']
+             display_data = preview_df[display_columns]
+             print(tabulate(display_data, headers="keys", tablefmt="github", showindex=False))
+         else:
+             for i, row in preview_df.iterrows():
+                 print(f"{i+1}. {row['Keyword']} | Score: {row['Opportunity Score']} | Intent: {row['Intent']} | Tail: {row['Tail']}")
+
+         # Summary stats
+         print("\n📈 Summary:")
+         print(f"• Total keywords: {len(df)}")
+         print(f"• Long-tail keywords: {len(df[df['Tail'] == 'long-tail'])}")
+         print(f"• Non-brand keywords: {len(df[df['Is Brand/Navigational'] == 'No'])}")
+         print(f"• High opportunity (score > 50): {len(df[df['Opportunity Score'] > 50])}")
+
+         return csv_path, excel_path, meta_path
+
+     except Exception as e:
+         print(f"❌ Error in post-processing: {e}")
+         return None, None, None
+
+ def run_complete_pipeline(seed_keyword, use_volume_api=False):
+     """Run the complete pipeline"""
+     print("🚀 Starting Complete Keyword Research Pipeline")
+     print("=" * 60)
+     print(f"Seed Keyword: '{seed_keyword}'")
+     print("=" * 60)
+
+     # Step 1: Run keyword analysis
+     csv_filename = run_keyword_analysis(seed_keyword, use_volume_api)
+
+     if not csv_filename:
+         print("❌ Pipeline failed at Step 1")
+         return False
+
+     # Step 2: Run post-processing
+     csv_path, excel_path, meta_path = run_postprocessing(csv_filename, seed_keyword)
+
+     if not csv_path:
+         print("❌ Pipeline failed at Step 2")
+         return False
+
+     # Final summary
+     print("\n🎯 PIPELINE COMPLETE! 🎯")
+     print("=" * 60)
+     print(f"📁 Original CSV: {csv_filename}")
+     print(f"📁 Enhanced CSV: {csv_path}")
+     if excel_path:
+         print(f"📁 Excel file: {excel_path}")
+     if meta_path:
+         print(f"📁 Metadata: {meta_path}")
+     print("=" * 60)
+
+     return True
+
+ def main():
+     """Main function with command line support"""
+     parser = argparse.ArgumentParser(description="Complete Keyword Research Pipeline")
+     parser.add_argument("seed_keyword", nargs="?", default="global internship",
+                         help="Seed keyword (default: 'global internship')")
+     parser.add_argument("--use-volume-api", action="store_true",
+                         help="Use real volume API (requires implementation)")
+     parser.add_argument("--check-only", action="store_true",
+                         help="Only check setup, don't run pipeline")
+
+     args = parser.parse_args()
+
+     # Check setup
+     if not check_setup():
+         return 1
+
+     if args.check_only:
+         print("✅ Setup check complete!")
+         return 0
+
+     # Run pipeline
+     success = run_complete_pipeline(args.seed_keyword, args.use_volume_api)
+     return 0 if success else 1
+
+ if __name__ == "__main__":
+     try:
+         exit_code = main()
+         sys.exit(exit_code)
+     except KeyboardInterrupt:
+         print("\n⚠️ Pipeline interrupted by user")
+         sys.exit(1)
+     except Exception as e:
+         print(f"\n❌ Unexpected error: {e}")
+         sys.exit(1)
dashboard.py ADDED
@@ -0,0 +1,830 @@
1
+ # dashboard.py
2
+ """
3
+ SEO Keyword Research Dashboard
4
+
5
+ A Streamlit web interface for the keyword research pipeline.
6
+ Provides interactive analysis, visualization, and download capabilities.
7
+
8
+ Requirements:
9
+ pip install streamlit plotly pandas
10
+
11
+ Usage:
12
+ streamlit run dashboard.py
13
+ """
14
+
15
+ import streamlit as st
16
+ import pandas as pd
17
+ import plotly.express as px
18
+ import plotly.graph_objects as go
19
+ from plotly.subplots import make_subplots
20
+ import os
21
+ import sys
22
+ from pathlib import Path
23
+ from datetime import date, datetime
24
+ import re
25
+ import json
26
+ import io
27
+ from typing import Optional, Tuple, Dict, Any
28
+
29
+ # Add project directories to path
30
+ project_root = Path(__file__).parent
31
+ src_path = project_root / "src"
32
+ if src_path.exists():
33
+ sys.path.insert(0, str(src_path))
34
+ sys.path.insert(0, str(project_root))
35
+
36
+ # Import backend functions
37
+ try:
38
+ from dotenv import load_dotenv
39
+ load_dotenv()
40
+ except ImportError:
41
+ st.error("Missing required package: python-dotenv. Install with: pip install python-dotenv")
42
+ st.stop()
43
+
44
+ # Page configuration
45
+ st.set_page_config(
46
+ page_title="SEO Keyword Research Dashboard",
47
+ page_icon="πŸ”",
48
+ layout="wide",
49
+ initial_sidebar_state="expanded"
50
+ )
51
+
52
+ # Custom CSS for better styling
53
+ st.markdown("""
54
+ <style>
55
+ .main-header {
56
+ font-size: 3rem;
57
+ color: #1f77b4;
58
+ text-align: center;
59
+ margin-bottom: 2rem;
60
+ background: linear-gradient(90deg, #1f77b4, #ff7f0e);
61
+ -webkit-background-clip: text;
62
+ -webkit-text-fill-color: transparent;
63
+ background-clip: text;
64
+ }
65
+
66
+ .metric-card {
67
+ background-color: #f0f2f6;
68
+ padding: 1rem;
69
+ border-radius: 0.5rem;
70
+ border-left: 4px solid #1f77b4;
71
+ margin: 0.5rem 0;
72
+ }
73
+
74
+ .success-message {
75
+ background-color: #d4edda;
76
+ color: #155724;
77
+ padding: 1rem;
78
+ border-radius: 0.5rem;
79
+ border: 1px solid #c3e6cb;
80
+ margin: 1rem 0;
81
+ }
82
+
83
+ .error-message {
84
+ background-color: #f8d7da;
85
+ color: #721c24;
86
+ padding: 1rem;
87
+ border-radius: 0.5rem;
88
+ border: 1px solid #f5c6cb;
89
+ margin: 1rem 0;
90
+ }
91
+
92
+ .stDataFrame {
93
+ border-radius: 0.5rem;
94
+ overflow: hidden;
95
+ }
96
+ </style>
97
+ """, unsafe_allow_html=True)
98
+
+ class KeywordDashboard:
+     """Main dashboard class for SEO keyword research interface."""
+
+     def __init__(self):
+         """Initialize the dashboard with necessary configurations."""
+         self.setup_directories()
+         self.check_environment()
+
+     def setup_directories(self):
+         """Create necessary output directories."""
+         self.output_dir = Path("output")
+         self.processed_dir = self.output_dir / "processed"
+         self.reports_dir = self.output_dir / "reports"
+
+         self.output_dir.mkdir(exist_ok=True)
+         self.processed_dir.mkdir(exist_ok=True)
+         self.reports_dir.mkdir(exist_ok=True)
+
+     def check_environment(self):
+         """Check if the environment is properly configured."""
+         self.api_key = os.getenv("SERPAPI_KEY")
+         self.environment_ready = bool(self.api_key)
+
+     def render_header(self):
+         """Render the main dashboard header."""
+         st.markdown('<h1 class="main-header">πŸ” SEO Keyword Research Dashboard</h1>',
+                     unsafe_allow_html=True)
+
+         if not self.environment_ready:
+             st.markdown("""
+             <div class="error-message">
+                 ⚠️ <strong>Environment Setup Required</strong><br>
+                 Please ensure your .env file contains: SERPAPI_KEY=your_key_here
+             </div>
+             """, unsafe_allow_html=True)
+             return False
+
+         st.markdown("""
+         <div class="success-message">
+             βœ… <strong>Environment Ready</strong><br>
+             API key detected and ready for keyword research.
+         </div>
+         """, unsafe_allow_html=True)
+         return True
+
+     def render_sidebar(self) -> Dict[str, Any]:
+         """Render the sidebar with input controls."""
+         st.sidebar.markdown("## 🎯 Analysis Parameters")
+
+         # Input parameters
+         seed_keyword = st.sidebar.text_input(
+             "πŸ” Seed Keyword",
+             value="global internship",
+             help="Enter the main keyword to research"
+         )
+
+         max_candidates = st.sidebar.slider(
+             "πŸ“Š Max Candidates",
+             min_value=20,
+             max_value=300,
+             value=120,
+             step=10,
+             help="Maximum number of keyword candidates to analyze"
+         )
+
+         top_results = st.sidebar.slider(
+             "πŸ† Top Results",
+             min_value=10,
+             max_value=100,
+             value=50,
+             step=5,
+             help="Number of top results to display and save"
+         )
+
+         # Advanced options
+         st.sidebar.markdown("## βš™οΈ Advanced Options")
+
+         use_volume_api = st.sidebar.checkbox(
+             "πŸ“ˆ Use Real Volume API",
+             value=False,
+             help="Enable when volume API is implemented",
+             disabled=True  # Disabled until implemented
+         )
+
+         # Filtering options
+         st.sidebar.markdown("## πŸ”§ Filters")
+
+         min_search_volume = st.sidebar.number_input(
+             "πŸ“ˆ Min Search Volume",
+             min_value=0,
+             max_value=10000,
+             value=10,
+             step=10,
+             help="Minimum monthly search volume"
+         )
+
+         max_competition = st.sidebar.slider(
+             "βš”οΈ Max Competition Score",
+             min_value=0.0,
+             max_value=1.0,
+             value=1.0,
+             step=0.1,
+             help="Maximum competition score (0=easy, 1=hard)"
+         )
+
+         # Run button
+         run_analysis = st.sidebar.button(
+             "πŸš€ Run Analysis",
+             type="primary",
+             help="Start the keyword research analysis"
+         )
+
+         return {
+             "seed_keyword": seed_keyword,
+             "max_candidates": max_candidates,
+             "top_results": top_results,
+             "use_volume_api": use_volume_api,
+             "min_search_volume": min_search_volume,
+             "max_competition": max_competition,
+             "run_analysis": run_analysis
+         }
+
+     def run_keyword_analysis(self, params: Dict[str, Any]) -> Optional[pd.DataFrame]:
+         """Run the keyword analysis using the backend pipeline."""
+         try:
+             # Import the analysis function from app.py
+             sys.path.insert(0, str(project_root))
+
+             # Since we need to reuse the logic from app.py, let's import what we need
+             import math
+             import csv
+             import re
+             from serpapi import GoogleSearch
+             from dataclasses import dataclass
+
+             @dataclass
+             class KeywordMetrics:
+                 keyword: str
+                 monthly_searches: int
+                 competition_score: float
+                 opportunity_score: float
+                 total_results: int
+                 ads_count: int
+                 has_featured_snippet: bool
+                 has_people_also_ask: bool
+                 has_knowledge_graph: bool
+
+             # Competition calculator (from your app.py)
+             class CompetitionCalculator:
+                 WEIGHTS = {
+                     'total_results': 0.50,
+                     'ads': 0.25,
+                     'featured_snippet': 0.15,
+                     'people_also_ask': 0.07,
+                     'knowledge_graph': 0.03
+                 }
+
+                 @staticmethod
+                 def extract_total_results(search_info):
+                     if not search_info:
+                         return 0
+
+                     total = (search_info.get("total_results") or
+                              search_info.get("total_results_raw") or
+                              search_info.get("total"))
+
+                     if isinstance(total, int):
+                         return total
+
+                     if isinstance(total, str):
+                         numbers_only = re.sub(r"[^\d]", "", total)
+                         try:
+                             return int(numbers_only) if numbers_only else 0
+                         except ValueError:
+                             return 0
+
+                     return 0
+
+                 def calculate_score(self, search_results):
+                     search_info = search_results.get("search_information", {})
+
+                     total_results = self.extract_total_results(search_info)
+                     normalized_results = min(math.log10(total_results + 1) / 7, 1.0)
+
+                     ads = search_results.get("ads_results", [])
+                     ads_count = len(ads) if ads else 0
+                     ads_score = min(ads_count / 3, 1.0)
+
+                     has_featured_snippet = bool(
+                         search_results.get("featured_snippet") or
+                         search_results.get("answer_box")
+                     )
+
+                     has_people_also_ask = bool(
+                         search_results.get("related_questions") or
+                         search_results.get("people_also_ask")
+                     )
+
+                     has_knowledge_graph = bool(search_results.get("knowledge_graph"))
+
+                     competition_score = (
+                         self.WEIGHTS['total_results'] * normalized_results +
+                         self.WEIGHTS['ads'] * ads_score +
+                         self.WEIGHTS['featured_snippet'] * has_featured_snippet +
+                         self.WEIGHTS['people_also_ask'] * has_people_also_ask +
+                         self.WEIGHTS['knowledge_graph'] * has_knowledge_graph
+                     )
+
+                     competition_score = max(0.0, min(1.0, competition_score))
+
+                     breakdown = {
+                         "total_results": total_results,
+                         "ads_count": ads_count,
+                         "has_featured_snippet": has_featured_snippet,
+                         "has_people_also_ask": has_people_also_ask,
+                         "has_knowledge_graph": has_knowledge_graph
+                     }
+
+                     return competition_score, breakdown
+
+             def find_related_keywords(seed_keyword, max_results=120):
+                 progress_placeholder = st.empty()
+                 progress_placeholder.info(f"πŸ” Finding related keywords for: '{seed_keyword}'...")
+
+                 search_params = {
+                     "engine": "google",
+                     "q": seed_keyword,
+                     "api_key": self.api_key,
+                     "hl": "en",
+                     "gl": "us"
+                 }
+
+                 try:
+                     search = GoogleSearch(search_params)
+                     results = search.get_dict()
+                 except Exception as e:
+                     progress_placeholder.error(f"❌ Error getting related keywords: {e}")
+                     return []
+
+                 keyword_candidates = set()
+
+                 # Extract keywords from different sources
+                 related_searches = results.get("related_searches", [])
+                 for item in related_searches:
+                     query = item.get("query") or item.get("suggestion")
+                     if query and len(query.strip()) > 0:
+                         keyword_candidates.add(query.strip())
+
+                 related_questions = results.get("related_questions", [])
+                 for item in related_questions:
+                     question = item.get("question") or item.get("query")
+                     if question and len(question.strip()) > 0:
+                         keyword_candidates.add(question.strip())
+
+                 organic_results = results.get("organic_results", [])
+                 for result in organic_results[:10]:
+                     title = result.get("title", "")
+                     if title and len(title.strip()) > 0:
+                         keyword_candidates.add(title.strip())
+
+                 final_keywords = list(keyword_candidates)[:max_results]
+                 progress_placeholder.success(f"βœ… Found {len(final_keywords)} keyword candidates")
+                 return final_keywords
+
+             def analyze_keywords_batch(keywords):
+                 calculator = CompetitionCalculator()
+                 analyzed_keywords = []
+
+                 progress_bar = st.progress(0)
+                 status_text = st.empty()
+
+                 for i, keyword in enumerate(keywords):
+                     progress = (i + 1) / len(keywords)
+                     progress_bar.progress(progress)
+                     status_text.text(f"Analyzing keyword {i+1}/{len(keywords)}: {keyword}")
+
+                     # Search for keyword
+                     search_params = {
+                         "engine": "google",
+                         "q": keyword,
+                         "api_key": self.api_key,
+                         "hl": "en",
+                         "gl": "us",
+                         "num": 10
+                     }
+
+                     try:
+                         search = GoogleSearch(search_params)
+                         search_results = search.get_dict()
+                     except Exception:
+                         continue
+
+                     # Calculate competition
+                     competition_score, breakdown = calculator.calculate_score(search_results)
+
+                     # Estimate volume
+                     word_count = len(keyword.split())
+                     search_volume = max(10, 10000 // (word_count + 1))
+
+                     # Calculate opportunity score
+                     volume_score = math.log10(search_volume + 1)
+                     opportunity_score = volume_score / (competition_score + 0.01)
+
+                     metrics = KeywordMetrics(
+                         keyword=keyword,
+                         monthly_searches=search_volume,
+                         competition_score=round(competition_score, 4),
+                         opportunity_score=round(opportunity_score, 2),
+                         total_results=breakdown["total_results"],
+                         ads_count=breakdown["ads_count"],
+                         has_featured_snippet=breakdown["has_featured_snippet"],
+                         has_people_also_ask=breakdown["has_people_also_ask"],
+                         has_knowledge_graph=breakdown["has_knowledge_graph"]
+                     )
+
+                     analyzed_keywords.append(metrics)
+
+                 progress_bar.empty()
+                 status_text.empty()
+
+                 # Sort by opportunity score
+                 analyzed_keywords.sort(key=lambda x: x.opportunity_score, reverse=True)
+                 return analyzed_keywords
+
+             # Run the analysis
+             with st.spinner("πŸ” Discovering related keywords..."):
+                 related_keywords = find_related_keywords(
+                     params["seed_keyword"],
+                     params["max_candidates"]
+                 )
+
+             if not related_keywords:
+                 st.error("❌ No keyword candidates found. Please check your API key and try again.")
+                 return None
+
+             with st.spinner("πŸ“Š Analyzing keywords and calculating scores..."):
+                 analyzed_keywords = analyze_keywords_batch(related_keywords)
+
+             if not analyzed_keywords:
+                 st.error("❌ No keywords were successfully analyzed.")
+                 return None
+
+             # Convert to DataFrame
+             data = []
+             for metrics in analyzed_keywords:
+                 data.append({
+                     'Keyword': metrics.keyword,
+                     'Monthly Searches': metrics.monthly_searches,
+                     'Competition': metrics.competition_score,
+                     'Opportunity Score': metrics.opportunity_score,
+                     'Total Results': metrics.total_results,
+                     'Ads Count': metrics.ads_count,
+                     'Featured Snippet': 'Yes' if metrics.has_featured_snippet else 'No',
+                     'People Also Ask': 'Yes' if metrics.has_people_also_ask else 'No',
+                     'Knowledge Graph': 'Yes' if metrics.has_knowledge_graph else 'No'
+                 })
+
+             df = pd.DataFrame(data)
+
+             # Apply filters
+             df = df[
+                 (df['Monthly Searches'] >= params['min_search_volume']) &
+                 (df['Competition'] <= params['max_competition'])
+             ]
+
+             return df
+
+         except Exception as e:
+             st.error(f"❌ Analysis failed: {str(e)}")
+             return None
+
+     def add_enhancement_columns(self, df: pd.DataFrame) -> pd.DataFrame:
+         """Add intent and tail classification columns."""
+         def classify_intent(keyword):
+             if not keyword:
+                 return "informational"
+
+             k = keyword.lower()
+             if any(signal in k for signal in ["how to", "what is", "why", "guide", "tutorial"]):
+                 return "informational"
+             if any(signal in k for signal in ["buy", "price", "cost", "apply", "register"]):
+                 return "transactional"
+             if any(signal in k for signal in ["best", "top", "compare", "vs", "reviews"]):
+                 return "commercial"
+             return "informational"
+
+         def classify_tail(keyword):
+             if not keyword:
+                 return "short-tail"
+             word_count = len(str(keyword).split())
+             if word_count >= 4:
+                 return "long-tail"
+             elif word_count == 3:
+                 return "mid-tail"
+             else:
+                 return "short-tail"
+
+         df['Intent'] = df['Keyword'].apply(classify_intent)
+         df['Tail'] = df['Keyword'].apply(classify_tail)
+
+         return df
+
+     def render_summary_metrics(self, df: pd.DataFrame):
+         """Render summary metrics cards."""
+         col1, col2, col3, col4 = st.columns(4)
+
+         with col1:
+             st.markdown("""
+             <div class="metric-card">
+                 <h3>πŸ“Š Total Keywords</h3>
+                 <h2 style="color: #1f77b4;">{}</h2>
+             </div>
+             """.format(len(df)), unsafe_allow_html=True)
+
+         with col2:
+             avg_score = df['Opportunity Score'].mean()
+             st.markdown("""
+             <div class="metric-card">
+                 <h3>⭐ Avg Opportunity Score</h3>
+                 <h2 style="color: #ff7f0e;">{:.2f}</h2>
+             </div>
+             """.format(avg_score), unsafe_allow_html=True)
+
+         with col3:
+             high_opportunity = len(df[df['Opportunity Score'] > 50])
+             st.markdown("""
+             <div class="metric-card">
+                 <h3>πŸš€ High Opportunity</h3>
+                 <h2 style="color: #2ca02c;">{}</h2>
+             </div>
+             """.format(high_opportunity), unsafe_allow_html=True)
+
+         with col4:
+             long_tail = len(df[df['Tail'] == 'long-tail'])
+             st.markdown("""
+             <div class="metric-card">
+                 <h3>🎯 Long-tail Keywords</h3>
+                 <h2 style="color: #d62728;">{}</h2>
+             </div>
+             """.format(long_tail), unsafe_allow_html=True)
+
+     def render_top_keywords_table(self, df: pd.DataFrame, top_n: int = 10):
+         """Render the top keywords table with styling."""
+         st.markdown("## πŸ† Top Keyword Opportunities")
+
+         if df.empty:
+             st.warning("No keywords to display.")
+             return
+
+         # Prepare display DataFrame
+         display_df = df.head(top_n).copy()
+
+         # Format columns for better display
+         display_df['Monthly Searches'] = display_df['Monthly Searches'].apply(lambda x: f"{x:,}")
+         display_df['Total Results'] = display_df['Total Results'].apply(lambda x: f"{x:,}")
+
+         # Style the dataframe
+         def highlight_max_score(s):
+             is_max = s == s.max()
+             return ['background-color: lightgreen' if v else '' for v in is_max]
+
+         styled_df = display_df.style.apply(
+             highlight_max_score,
+             subset=['Opportunity Score']
+         ).format({
+             'Competition': '{:.3f}',
+             'Opportunity Score': '{:.2f}'
+         })
+
+         st.dataframe(styled_df, use_container_width=True)
+
+     def render_visualizations(self, df: pd.DataFrame):
+         """Render interactive charts and visualizations."""
+         if df.empty:
+             st.warning("No data available for visualization.")
+             return
+
+         # Chart selection tabs
+         chart_tab1, chart_tab2, chart_tab3 = st.tabs(["πŸ“Š Opportunity Scores", "🎯 Intent Analysis", "πŸ’Ή Volume vs Competition"])
+
+         with chart_tab1:
+             st.markdown("### Top 10 Keywords by Opportunity Score")
+             top_10 = df.head(10)
+
+             fig = px.bar(
+                 top_10,
+                 x='Opportunity Score',
+                 y='Keyword',
+                 orientation='h',
+                 title="Top 10 Keyword Opportunities",
+                 color='Opportunity Score',
+                 color_continuous_scale='viridis'
+             )
+             fig.update_layout(height=500, yaxis={'categoryorder': 'total ascending'})
+             st.plotly_chart(fig, use_container_width=True)
+
+         with chart_tab2:
+             st.markdown("### Intent Distribution")
+             col1, col2 = st.columns(2)
+
+             with col1:
+                 intent_counts = df['Intent'].value_counts()
+                 fig_pie = px.pie(
+                     values=intent_counts.values,
+                     names=intent_counts.index,
+                     title="Search Intent Distribution",
+                     color_discrete_sequence=['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728']
+                 )
+                 st.plotly_chart(fig_pie, use_container_width=True)
+
+             with col2:
+                 tail_counts = df['Tail'].value_counts()
+                 fig_tail = px.pie(
+                     values=tail_counts.values,
+                     names=tail_counts.index,
+                     title="Keyword Tail Distribution",
+                     color_discrete_sequence=['#9467bd', '#8c564b', '#e377c2']
+                 )
+                 st.plotly_chart(fig_tail, use_container_width=True)
+
+         with chart_tab3:
+             st.markdown("### Search Volume vs Competition Analysis")
+
+             fig_scatter = px.scatter(
+                 df.head(50),  # Limit to top 50 for readability
+                 x='Competition',
+                 y='Monthly Searches',
+                 size='Opportunity Score',
+                 color='Intent',
+                 hover_name='Keyword',
+                 title="Search Volume vs Competition (Size = Opportunity Score)",
+                 labels={'Competition': 'Competition Score', 'Monthly Searches': 'Est. Monthly Searches'}
+             )
+             fig_scatter.update_layout(height=500)
+             st.plotly_chart(fig_scatter, use_container_width=True)
+
+     def save_results(self, df: pd.DataFrame, params: Dict[str, Any]) -> Tuple[str, str, str]:
+         """Save results to files and return file paths."""
+         if df.empty:
+             return None, None, None
+
+         # Generate file names
+         today = date.today().isoformat()
+         safe_seed = re.sub(r"[^\w\s-]", "", params['seed_keyword']).strip().replace(" ", "_")[:30]
+         base_name = f"keywords_{safe_seed}_{today}"
+
+         # File paths
+         csv_path = self.processed_dir / f"{base_name}.csv"
+         excel_path = self.processed_dir / f"{base_name}.xlsx"
+         report_path = self.reports_dir / f"{base_name}_report.json"
+
+         try:
+             # Save CSV
+             df.to_csv(csv_path, index=False)
+
+             # Save Excel with multiple sheets
+             with pd.ExcelWriter(excel_path, engine='openpyxl') as writer:
+                 df.head(params['top_results']).to_excel(writer, sheet_name='Top_Results', index=False)
+                 df.to_excel(writer, sheet_name='All_Keywords', index=False)
+
+                 # Summary sheet
+                 summary_data = {
+                     'Metric': [
+                         'Total Keywords',
+                         'Average Opportunity Score',
+                         'High Opportunity Keywords (>50)',
+                         'Long-tail Keywords',
+                         'Informational Intent',
+                         'Commercial Intent',
+                         'Transactional Intent'
+                     ],
+                     'Value': [
+                         len(df),
+                         round(df['Opportunity Score'].mean(), 2),
+                         len(df[df['Opportunity Score'] > 50]),
+                         len(df[df['Tail'] == 'long-tail']),
+                         len(df[df['Intent'] == 'informational']),
+                         len(df[df['Intent'] == 'commercial']),
+                         len(df[df['Intent'] == 'transactional'])
+                     ]
+                 }
+                 pd.DataFrame(summary_data).to_excel(writer, sheet_name='Summary', index=False)
+
+             # Save JSON report
+             report_data = {
+                 'analysis_date': datetime.now().isoformat(),
+                 'seed_keyword': params['seed_keyword'],
+                 'parameters': {
+                     'max_candidates': params['max_candidates'],
+                     'top_results': params['top_results'],
+                     'min_search_volume': params['min_search_volume'],
+                     'max_competition': params['max_competition']
+                 },
+                 'summary': {
+                     'total_keywords': len(df),
+                     'average_opportunity_score': float(df['Opportunity Score'].mean()),
+                     'top_keyword': df.iloc[0]['Keyword'] if not df.empty else None,
+                     'intent_distribution': df['Intent'].value_counts().to_dict(),
+                     'tail_distribution': df['Tail'].value_counts().to_dict()
+                 }
+             }
+
+             with open(report_path, 'w', encoding='utf-8') as f:
+                 json.dump(report_data, f, indent=2, ensure_ascii=False)
+
+             return str(csv_path), str(excel_path), str(report_path)
+
+         except Exception as e:
+             st.error(f"❌ Error saving files: {e}")
+             return None, None, None
+
+     def render_download_section(self, csv_path: str, excel_path: str, report_path: str):
+         """Render download buttons for generated files."""
+         st.markdown("## πŸ“₯ Download Results")
+
+         col1, col2, col3 = st.columns(3)
+
+         if csv_path and os.path.exists(csv_path):
+             with col1:
+                 with open(csv_path, 'rb') as file:
+                     st.download_button(
+                         label="πŸ“Š Download CSV",
+                         data=file.read(),
+                         file_name=os.path.basename(csv_path),
+                         mime="text/csv"
+                     )
+
+         if excel_path and os.path.exists(excel_path):
+             with col2:
+                 with open(excel_path, 'rb') as file:
+                     st.download_button(
+                         label="πŸ“ˆ Download Excel",
+                         data=file.read(),
+                         file_name=os.path.basename(excel_path),
+                         mime="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
+                     )
+
+         if report_path and os.path.exists(report_path):
+             with col3:
+                 with open(report_path, 'rb') as file:
+                     st.download_button(
+                         label="πŸ“‹ Download Report",
+                         data=file.read(),
+                         file_name=os.path.basename(report_path),
+                         mime="application/json"
+                     )
+
+     def run(self):
+         """Main dashboard execution method."""
+         # Render header
+         if not self.render_header():
+             st.stop()
+
+         # Render sidebar
+         params = self.render_sidebar()
+
+         # Main content area
+         if params["run_analysis"]:
+             # Store analysis state
+             if 'analysis_complete' not in st.session_state:
+                 st.session_state.analysis_complete = False
+
+             # Run analysis
+             df = self.run_keyword_analysis(params)
+
+             if df is not None and not df.empty:
+                 # Add enhancement columns
+                 df = self.add_enhancement_columns(df)
+
+                 # Store results in session state
+                 st.session_state.results_df = df
+                 st.session_state.analysis_params = params
+                 st.session_state.analysis_complete = True
+
+                 # Success message
+                 st.success(f"βœ… Analysis complete! Found {len(df)} keywords matching your criteria.")
+
+         # Display results if analysis is complete
+         if st.session_state.get('analysis_complete', False) and 'results_df' in st.session_state:
+             df = st.session_state.results_df
+             params = st.session_state.analysis_params
+
+             # Render summary metrics
+             self.render_summary_metrics(df)
+
+             # Create view toggle
+             view_option = st.radio("πŸ“‹ Choose View", ["Table View", "Chart View"], horizontal=True)
+
+             if view_option == "Table View":
+                 self.render_top_keywords_table(df, params['top_results'])
+             else:
+                 self.render_visualizations(df)
+
+             # Save results and provide downloads
+             with st.spinner("πŸ’Ύ Preparing download files..."):
+                 csv_path, excel_path, report_path = self.save_results(df, params)
+
+             if csv_path:
+                 self.render_download_section(csv_path, excel_path, report_path)
+
+         elif not st.session_state.get('analysis_complete', False):
+             # Show welcome message
+             st.markdown("""
+             ## πŸ‘‹ Welcome to the SEO Keyword Research Dashboard
+
+             This dashboard helps you discover and analyze keyword opportunities using advanced SEO metrics.
+
+             ### πŸš€ Getting Started:
+             1. **Enter your seed keyword** in the sidebar (e.g., "digital marketing")
+             2. **Adjust analysis parameters** (candidates, results, filters)
+             3. **Click "Run Analysis"** to start the keyword research
+             4. **Explore results** through tables and interactive charts
+             5. **Download reports** in CSV, Excel, or JSON format
+
+             ### πŸ“Š Features:
+             - **Real-time keyword discovery** using SerpAPI
+             - **Competition analysis** based on SERP features
+             - **Intent classification** (informational, commercial, transactional)
+             - **Interactive visualizations** with Plotly charts
+             - **Advanced filtering** by volume and competition
+             - **Multi-format exports** (CSV, Excel, JSON reports)
+             """)
+
+
+ def main():
+     """Main function to run the Streamlit dashboard."""
+     dashboard = KeywordDashboard()
+     dashboard.run()
+
+
+ if __name__ == "__main__":
+     main()
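The scoring heuristic used throughout dashboard.py can be exercised standalone. This is a minimal sketch, not part of the uploaded files: the weights, the `log10(total_results + 1) / 7` normalization, and the `log10(volume + 1) / (competition + 0.01)` opportunity formula are taken from the diff above, while the sample SERP values are made up for illustration.

```python
import math

# Weights copied from CompetitionCalculator.WEIGHTS in dashboard.py
WEIGHTS = {'total_results': 0.50, 'ads': 0.25, 'featured_snippet': 0.15,
           'people_also_ask': 0.07, 'knowledge_graph': 0.03}

def competition_score(total_results, ads_count, snippet, paa, kg):
    # Log-normalized result count plus weighted SERP features, clamped to [0, 1]
    normalized = min(math.log10(total_results + 1) / 7, 1.0)
    score = (WEIGHTS['total_results'] * normalized +
             WEIGHTS['ads'] * min(ads_count / 3, 1.0) +
             WEIGHTS['featured_snippet'] * snippet +
             WEIGHTS['people_also_ask'] * paa +
             WEIGHTS['knowledge_graph'] * kg)
    return max(0.0, min(1.0, score))

def opportunity_score(volume, competition):
    # Opportunity Score = log10(volume + 1) / (competition + 0.01)
    return math.log10(volume + 1) / (competition + 0.01)

if __name__ == "__main__":
    # Hypothetical SERP: 1.5M results, 2 ads, snippet and PAA present
    comp = competition_score(1_500_000, ads_count=2, snippet=True, paa=True, kg=False)
    print(round(comp, 4), round(opportunity_score(3333, comp), 2))
```

A useful property of this formula: because volume enters logarithmically but competition divides linearly, a low-competition long-tail keyword can outscore a high-volume head term.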
git ADDED
File without changes
keyword_agent.py ADDED
@@ -0,0 +1,19 @@
+ import os
+ from dotenv import load_dotenv
+
+ # Load environment variables from .env file
+ load_dotenv()
+
+ def main():
+     # Get the API key from environment variables
+     api_key = os.getenv("SERPAPI_KEY")
+
+     if api_key:
+         print("βœ… Project setup complete!")
+         print(f"API key loaded: {api_key[:5]}...")
+     else:
+         print("❌ Warning: SERPAPI_KEY not found in environment variables")
+         print("Make sure you have a .env file with your SERPAPI_KEY")
+
+ if __name__ == "__main__":
+     main()
postprocess.py ADDED
@@ -0,0 +1,366 @@
+ # src/postprocess.py
+ """
+ Post-processing tool for keyword research results
+ Cleans, annotates, and formats CSV output for professional presentation
+ """
+
+ import pandas as pd
+ from datetime import date, datetime
+ import os
+ import re
+ import json
+
+ # Install these if you haven't: pip install pandas openpyxl tabulate
+ try:
+     from tabulate import tabulate
+     TABULATE_AVAILABLE = True
+ except ImportError:
+     TABULATE_AVAILABLE = False
+     print("Note: Install 'tabulate' for prettier table output: pip install tabulate")
+
+ try:
+     import openpyxl
+     EXCEL_AVAILABLE = True
+ except ImportError:
+     EXCEL_AVAILABLE = False
+     print("Note: Install 'openpyxl' for Excel export: pip install openpyxl")
+
+ # Configuration
+ BRAND_KEYWORDS = {
+     "linkedin", "indeed", "glassdoor", "ucla", "asu", "berkeley",
+     "hennge", "ciee", "google", "facebook", "microsoft", "amazon",
+     "apple", "netflix", "spotify", "youtube", "instagram", "twitter"
+ }
+ OUTPUT_DIR = "results"  # Directory to save processed files
+
+ def normalize_keyword(keyword):
+     """Clean and normalize keyword text"""
+     if not keyword or pd.isna(keyword):
+         return ""
+     return str(keyword).strip()
+
+ def is_brand_query(keyword, brand_set=BRAND_KEYWORDS):
+     """
+     Check if keyword is a brand/navigational query
+     These are harder to rank for if you're not that brand
+     """
+     if not keyword:
+         return False
+
+     keyword_lower = keyword.lower()
+
+     # Check if any brand name appears in keyword
+     for brand in brand_set:
+         if brand in keyword_lower:
+             return True
+
+     # Check for domains (.com, .edu, etc.)
+     if re.search(r"\.(com|edu|org|net|gov|io)\b", keyword_lower):
+         return True
+
+     return False
+
+ def classify_search_intent(keyword):
+     """
+     Classify keyword by search intent:
+     - informational: seeking information
+     - commercial: researching before buying
+     - transactional: ready to take action
+     - navigational: looking for specific site/brand
+     """
+     if not keyword:
+         return "informational"
+
+     keyword_lower = keyword.lower()
+
+     # Informational intent signals
+     if any(signal in keyword_lower for signal in [
+         "how to", "what is", "why", "are", "do ", "does ", "can ",
+         "guide", "tutorial", "learn", "definition", "meaning"
+     ]):
+         return "informational"
+
+     # Transactional intent signals
+     if any(signal in keyword_lower for signal in [
+         "buy", "price", "cost", "apply", "register", "admission",
+         "apply now", "enroll", "join", "signup", "book", "order"
+     ]):
+         return "transactional"
+
+     # Commercial intent signals
+     if any(signal in keyword_lower for signal in [
+         "best", "top", "compare", "vs", "reviews", "review",
+         "cheap", "affordable", "discount", "deal"
+     ]):
+         return "commercial"
+
+     # Navigational intent (brand queries)
+     if is_brand_query(keyword):
+         return "navigational"
+
+     # Default to informational
+     return "informational"
+
+ def classify_keyword_tail(keyword):
+     """
+     Classify keyword by tail length:
+     - short-tail: 1-2 words (high competition, high volume)
+     - mid-tail: 3 words (moderate competition/volume)
+     - long-tail: 4+ words (low competition, low volume)
+     """
+     if not keyword:
+         return "short-tail"
+
+     word_count = len(str(keyword).split())
+
+     if word_count >= 4:
+         return "long-tail"
+     elif word_count == 3:
+         return "mid-tail"
+     else:
+         return "short-tail"
+
+ def format_large_number(number):
+     """Format large numbers with commas for readability"""
+     try:
+         return f"{int(number):,}"
+     except (ValueError, TypeError):
+         return str(number)
+
+ def clean_and_process_dataframe(df, seed_keyword):
+     """Main processing function to clean and enhance the dataframe"""
+
+     # Make a copy to avoid modifying original
+     df = df.copy()
+
+     print("🧹 Cleaning and processing data...")
+
+     # 1. Normalize keywords and remove duplicates
+     df["Keyword"] = df["Keyword"].astype(str).apply(normalize_keyword)
+
+     # Remove empty keywords
+     df = df[df["Keyword"].str.len() > 0]
+
+     # Sort by Opportunity Score and remove duplicates (keep highest score)
+     df = df.sort_values(by="Opportunity Score", ascending=False)
+     df = df.drop_duplicates(subset=["Keyword"], keep="first")
+
+     # 2. Fix data types and handle missing values
+
+     # Monthly Searches: convert to int, fill missing with 0
+     df["Monthly Searches"] = pd.to_numeric(df["Monthly Searches"], errors="coerce").fillna(0).astype(int)
+
+     # Competition: round to 4 decimal places
+     df["Competition"] = pd.to_numeric(df["Competition"], errors="coerce").fillna(0.0).round(4)
+
+     # Opportunity Score: round to 2 decimal places for readability
+     df["Opportunity Score"] = pd.to_numeric(df["Opportunity Score"], errors="coerce").fillna(0.0).round(2)
+
+     # Google Results: clean and convert to int
+     if "Google Results" in df.columns:
+         # Remove any non-digit characters and convert to int
+         df["Google Results"] = df["Google Results"].astype(str).str.replace(r"[^\d]", "", regex=True)
+         df["Google Results"] = pd.to_numeric(df["Google Results"], errors="coerce").fillna(0).astype(int)
+
+     # Ads Shown: convert to int
+     if "Ads Shown" in df.columns:
+         df["Ads Shown"] = pd.to_numeric(df["Ads Shown"], errors="coerce").fillna(0).astype(int)
+
+     # 3. Add enhancement columns
+     print("πŸ“Š Adding analysis columns...")
+
+     df["Intent"] = df["Keyword"].apply(classify_search_intent)
+     df["Tail"] = df["Keyword"].apply(classify_keyword_tail)
+     df["Is Brand/Navigational"] = df["Keyword"].apply(lambda x: "Yes" if is_brand_query(x) else "No")
+
+     # 4. Reorder columns for better presentation
+     column_order = [
+         "Keyword",
+         "Intent",
+         "Tail",
+         "Is Brand/Navigational",
+         "Monthly Searches",
+         "Competition",
+         "Opportunity Score",
+         "Google Results",
+         "Ads Shown",
+         "Featured Snippet?",
+         "PAA Available?",
+         "Knowledge Graph?"
+     ]
+
+     # Only include columns that exist in the dataframe
+     available_columns = [col for col in column_order if col in df.columns]
+     df = df[available_columns]
+
+     # 5. Final sort by Opportunity Score
+     df = df.sort_values(by="Opportunity Score", ascending=False).reset_index(drop=True)
+
+     print(f"βœ… Processing complete! {len(df)} keywords ready")
+     return df
+
+ def save_processed_results(df, seed_keyword, output_dir=OUTPUT_DIR):
+     """Save processed results in multiple formats with metadata"""
+
+     # Create output directory
+     os.makedirs(output_dir, exist_ok=True)
+
+     # Generate safe filename from seed keyword
+     today = date.today().isoformat()
+     safe_seed = re.sub(r"[^\w\s-]", "", seed_keyword).strip().replace(" ", "_")[:50]
+     base_filename = f"keywords_{safe_seed}_{today}"
+
+     # File paths
+     csv_path = os.path.join(output_dir, f"{base_filename}.csv")
+     excel_path = os.path.join(output_dir, f"{base_filename}.xlsx")
+     meta_path = os.path.join(output_dir, f"{base_filename}.meta.json")
+
+     # Save CSV
+     df.to_csv(csv_path, index=False)
+     print(f"πŸ’Ύ Saved CSV: {csv_path}")
+
+     # Save Excel with multiple sheets (if openpyxl is available)
+     if EXCEL_AVAILABLE:
+         try:
+             with pd.ExcelWriter(excel_path, engine="openpyxl") as writer:
+                 # Top 50 sheet
+                 df.head(50).to_excel(writer, sheet_name="Top_50", index=False)
+                 # All results sheet
+                 df.to_excel(writer, sheet_name="All_Keywords", index=False)
+                 # Summary sheet
+                 summary_data = {
+                     "Metric": [
+                         "Total Keywords",
+                         "Informational Keywords",
+                         "Commercial Keywords",
+                         "Transactional Keywords",
+                         "Navigational Keywords",
+                         "Long-tail Keywords",
+                         "Brand/Navigational Keywords"
+                     ],
+                     "Count": [
+                         len(df),
+                         len(df[df["Intent"] == "informational"]),
+                         len(df[df["Intent"] == "commercial"]),
+                         len(df[df["Intent"] == "transactional"]),
+                         len(df[df["Intent"] == "navigational"]),
+                         len(df[df["Tail"] == "long-tail"]),
+                         len(df[df["Is Brand/Navigational"] == "Yes"])
+                     ]
+                 }
+                 pd.DataFrame(summary_data).to_excel(writer, sheet_name="Summary", index=False)
+
+             print(f"πŸ“Š Saved Excel: {excel_path}")
+         except Exception as e:
+             print(f"⚠️ Could not save Excel file: {e}")
+     else:
+         print("πŸ“Š Excel export skipped (install openpyxl to enable)")
+
+     # Save metadata
+     metadata = {
+         "seed_keyword": seed_keyword,
+         "generated_at": datetime.utcnow().isoformat() + "Z",
+         "total_keywords": len(df),
+         "data_source": "SerpApi with heuristic search volumes",
+         "methodology": "Opportunity Score = log10(volume+1) / (competition + 0.01)",
+         "notes": [
+             "Brand/navigational queries are flagged for filtering",
+             "Search volumes are estimated - replace with real API data for production",
+             "Competition scores based on SERP feature analysis"
+         ],
+         "intent_breakdown": {
+             "informational": int(len(df[df["Intent"] == "informational"])),
+             "commercial": int(len(df[df["Intent"] == "commercial"])),
+             "transactional": int(len(df[df["Intent"] == "transactional"])),
+             "navigational": int(len(df[df["Intent"] == "navigational"]))
+         },
+         "tail_breakdown": {
+             "short-tail": int(len(df[df["Tail"] == "short-tail"])),
+             "mid-tail": int(len(df[df["Tail"] == "mid-tail"])),
+             "long-tail": int(len(df[df["Tail"] == "long-tail"]))
+         }
+     }
+
+     with open(meta_path, "w", encoding="utf-8") as f:
+         json.dump(metadata, f, indent=2, ensure_ascii=False)
+
+ print(f"πŸ“‹ Saved metadata: {meta_path}")
288
+
289
+ return csv_path, excel_path, meta_path
290
+
291
+ def display_results_preview(df, top_n=10):
292
+ """Display a nice preview of the top results"""
293
+
294
+ if df.empty:
295
+ print("❌ No results to display!")
296
+ return
297
+
298
+ print(f"\nπŸ† Top {min(top_n, len(df))} Keywords:")
299
+
300
+ # Prepare data for display
301
+ preview_df = df.head(top_n).copy()
302
+
303
+ # Format large numbers for readability
304
+ if "Monthly Searches" in preview_df.columns:
305
+ preview_df["Monthly Searches"] = preview_df["Monthly Searches"].apply(format_large_number)
306
+
307
+ if "Google Results" in preview_df.columns:
308
+ preview_df["Google Results"] = preview_df["Google Results"].apply(format_large_number)
309
+
310
+ # Display using tabulate if available
311
+ if TABULATE_AVAILABLE:
312
+ print(tabulate(preview_df, headers="keys", tablefmt="github", showindex=False))
313
+ else:
314
+ # Fallback display
315
+ for i, row in preview_df.iterrows():
316
+ print(f"{i+1}. {row['Keyword']} | Score: {row['Opportunity Score']} | "
317
+ f"Volume: {row['Monthly Searches']} | Competition: {row['Competition']} | "
318
+ f"Intent: {row['Intent']} | Tail: {row['Tail']}")
319
+
320
+ def postprocess_keywords(csv_file_path, seed_keyword):
321
+ """
322
+ Main postprocessing function
323
+ Call this after your ranking.py generates the initial CSV
324
+ """
325
+
326
+ print(f"πŸš€ Starting postprocessing for: '{seed_keyword}'")
327
+ print(f"πŸ“ Input file: {csv_file_path}")
328
+
329
+ try:
330
+ # Load the CSV from ranking.py
331
+ df = pd.read_csv(csv_file_path)
332
+ print(f"πŸ“Š Loaded {len(df)} keywords from CSV")
333
+
334
+ # Clean and process the data
335
+ processed_df = clean_and_process_dataframe(df, seed_keyword)
336
+
337
+ # Save in multiple formats
338
+ csv_path, excel_path, meta_path = save_processed_results(processed_df, seed_keyword)
339
+
340
+ # Display preview
341
+ display_results_preview(processed_df, top_n=10)
342
+
343
+ # Summary stats
344
+ print(f"\nπŸ“ˆ Summary Statistics:")
345
+ print(f"β€’ Total keywords analyzed: {len(processed_df)}")
346
+ print(f"β€’ Long-tail opportunities: {len(processed_df[processed_df['Tail'] == 'long-tail'])}")
347
+ print(f"β€’ Non-brand keywords: {len(processed_df[processed_df['Is Brand/Navigational'] == 'No'])}")
348
+ print(f"β€’ High opportunity (score > 50): {len(processed_df[processed_df['Opportunity Score'] > 50])}")
349
+
350
+ return csv_path, excel_path, meta_path, processed_df
351
+
352
+ except Exception as e:
353
+ print(f"❌ Error during postprocessing: {e}")
354
+ raise
355
+
356
+ # Example usage
357
+ if __name__ == "__main__":
358
+ # Example: process a CSV file generated by ranking.py
359
+ input_csv = "best_keywords_2025-09-23.csv" # Replace with your actual file
360
+ seed_keyword = "global internship"
361
+
362
+ if os.path.exists(input_csv):
363
+ postprocess_keywords(input_csv, seed_keyword)
364
+ else:
365
+ print(f"❌ Input file not found: {input_csv}")
366
+ print("Run your ranking.py script first to generate the initial CSV")
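For reference, the filename sanitization used in `save_processed_results` can be exercised on its own. This is a hedged standalone sketch — the regex and slicing match the code above, but the helper name `make_base_filename` is ours:

```python
import re
from datetime import date

def make_base_filename(seed_keyword: str) -> str:
    # Drop punctuation (keep word chars, spaces, hyphens), replace spaces
    # with underscores, cap at 50 chars, then append today's ISO date.
    safe = re.sub(r"[^\w\s-]", "", seed_keyword).strip().replace(" ", "_")[:50]
    return f"keywords_{safe}_{date.today().isoformat()}"

print(make_base_filename("global internship: 2025 (remote)"))
```

The cap at 50 characters keeps very long seed phrases from producing unwieldy (or invalid) file paths.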
ranking.py ADDED
@@ -0,0 +1,569 @@
+"""
+Professional Keyword Research Tool
+
+A comprehensive tool for analyzing keyword opportunities using SerpApi.
+Calculates competition scores and opportunity rankings based on SERP analysis.
+
+Requirements:
+    pip install google-search-results tabulate python-dotenv
+
+Setup:
+    1. Create a .env file with your SerpApi key: SERPAPI_KEY=your_key_here
+    2. Run the script with your desired seed keyword
+"""
+
+import os
+import math
+import csv
+import re
+import logging
+from datetime import date
+from typing import List, Dict, Optional, Tuple, Any
+from dataclasses import dataclass
+from dotenv import load_dotenv
+from serpapi import GoogleSearch
+
+# Optional dependency for better table formatting
+try:
+    from tabulate import tabulate
+    HAS_TABULATE = True
+except ImportError:
+    HAS_TABULATE = False
+    print("πŸ’‘ Tip: Install 'tabulate' for prettier output: pip install tabulate")
+
+# Configure logging
+logging.basicConfig(level=logging.INFO, format='%(levelname)s: %(message)s')
+logger = logging.getLogger(__name__)
+
+
+@dataclass
+class KeywordMetrics:
+    """Container for keyword analysis results."""
+    keyword: str
+    monthly_searches: int
+    competition_score: float
+    opportunity_score: float
+    total_results: int
+    ads_count: int
+    has_featured_snippet: bool
+    has_people_also_ask: bool
+    has_knowledge_graph: bool
+
+
+class Config:
+    """Configuration settings for the keyword research tool."""
+
+    def __init__(self):
+        load_dotenv()
+        self.serpapi_key = os.getenv("SERPAPI_KEY")
+        self.default_location = "United States"
+        self.results_per_query = 10
+        self.max_related_keywords = 150
+        self.top_keywords_to_save = 50
+        self.progress_update_interval = 10
+
+        if not self.serpapi_key:
+            raise ValueError("SERPAPI_KEY not found in environment variables")
+
+
+class CompetitionCalculator:
+    """Calculates keyword competition scores based on SERP features."""
+
+    # Scoring weights for different competition factors
+    WEIGHTS = {
+        'total_results': 0.50,
+        'ads': 0.25,
+        'featured_snippet': 0.15,
+        'people_also_ask': 0.07,
+        'knowledge_graph': 0.03
+    }
+
+    @staticmethod
+    def extract_total_results(search_info: Dict[str, Any]) -> int:
+        """
+        Extract total results count from SerpApi response.
+
+        Args:
+            search_info: Search information dictionary from SerpApi
+
+        Returns:
+            Total number of results as integer, 0 if not found
+        """
+        if not search_info:
+            return 0
+
+        # Try different possible field names
+        total = (search_info.get("total_results") or
+                 search_info.get("total_results_raw") or
+                 search_info.get("total"))
+
+        if isinstance(total, int):
+            return total
+
+        if isinstance(total, str):
+            # Extract only digits (remove commas, spaces, etc.)
+            numbers_only = re.sub(r"[^\d]", "", total)
+            try:
+                return int(numbers_only) if numbers_only else 0
+            except ValueError:
+                return 0
+
+        return 0
+
+    def calculate_score(self, search_results: Dict[str, Any]) -> Tuple[float, Dict[str, Any]]:
+        """
+        Calculate competition score based on SERP features.
+
+        Args:
+            search_results: Complete search results from SerpApi
+
+        Returns:
+            Tuple of (competition_score, analysis_breakdown)
+            Score ranges from 0-1 where 1 = very competitive
+        """
+        search_info = search_results.get("search_information", {})
+
+        # Factor 1: Total number of results (normalized using log scale)
+        total_results = self.extract_total_results(search_info)
+        normalized_results = min(math.log10(total_results + 1) / 7, 1.0)
+
+        # Factor 2: Number of ads (more ads = more competition)
+        ads = search_results.get("ads_results", [])
+        ads_count = len(ads) if ads else 0
+        ads_score = min(ads_count / 3, 1.0)
+
+        # Factor 3: SERP features that make ranking more difficult
+        has_featured_snippet = bool(
+            search_results.get("featured_snippet") or
+            search_results.get("answer_box")
+        )
+
+        has_people_also_ask = bool(
+            search_results.get("related_questions") or
+            search_results.get("people_also_ask")
+        )
+
+        has_knowledge_graph = bool(search_results.get("knowledge_graph"))
+
+        # Calculate weighted competition score
+        competition_score = (
+            self.WEIGHTS['total_results'] * normalized_results +
+            self.WEIGHTS['ads'] * ads_score +
+            self.WEIGHTS['featured_snippet'] * has_featured_snippet +
+            self.WEIGHTS['people_also_ask'] * has_people_also_ask +
+            self.WEIGHTS['knowledge_graph'] * has_knowledge_graph
+        )
+
+        # Ensure score stays within bounds
+        competition_score = max(0.0, min(1.0, competition_score))
+
+        # Create analysis breakdown for reporting
+        breakdown = {
+            "total_results": total_results,
+            "ads_count": ads_count,
+            "has_featured_snippet": has_featured_snippet,
+            "has_people_also_ask": has_people_also_ask,
+            "has_knowledge_graph": has_knowledge_graph
+        }
+
+        return competition_score, breakdown
+
+
+class SearchVolumeEstimator:
+    """Handles search volume estimation and integration with volume APIs."""
+
+    def get_search_volume(self, keyword: str) -> Optional[int]:
+        """
+        Get search volume for a keyword.
+
+        TODO: Integrate with DataForSEO, Google Keyword Planner, or similar API
+
+        Args:
+            keyword: The keyword to get volume for
+
+        Returns:
+            Monthly search volume or None if unavailable
+        """
+        # Placeholder for real volume API integration
+        # Examples of what you might implement:
+        # - return self._call_dataforseo_api(keyword)
+        # - return self._call_google_ads_api(keyword)
+        return None
+
+    def estimate_volume(self, keyword: str) -> int:
+        """
+        Estimate search volume using simple heuristics.
+
+        Args:
+            keyword: The keyword to estimate volume for
+
+        Returns:
+            Estimated monthly search volume
+        """
+        # Simple heuristic: longer phrases typically have lower volume
+        word_count = len(keyword.split())
+        # This is rough estimation - replace with real data when possible
+        return max(10, 10000 // (word_count + 1))
+
+
+class KeywordDiscovery:
+    """Discovers related keywords from search results."""
+
+    def __init__(self, config: Config):
+        self.config = config
+
+    def find_related_keywords(self, seed_keyword: str) -> List[str]:
+        """
+        Find related keywords from Google's suggestions and related searches.
+
+        Args:
+            seed_keyword: The base keyword to find related terms for
+
+        Returns:
+            List of related keyword candidates
+        """
+        logger.info(f"Discovering related keywords for: '{seed_keyword}'")
+
+        search_params = {
+            "engine": "google",
+            "q": seed_keyword,
+            "api_key": self.config.serpapi_key,
+            "hl": "en",
+            "gl": "us"
+        }
+
+        try:
+            search = GoogleSearch(search_params)
+            results = search.get_dict()
+        except Exception as e:
+            logger.error(f"Failed to get related keywords: {e}")
+            return []
+
+        keyword_candidates = set()
+
+        # Extract keywords from different sources
+        self._extract_from_related_searches(results, keyword_candidates)
+        self._extract_from_people_also_ask(results, keyword_candidates)
+        self._extract_from_organic_titles(results, keyword_candidates)
+
+        # Convert to list and limit results
+        final_keywords = list(keyword_candidates)[:self.config.max_related_keywords]
+        logger.info(f"Found {len(final_keywords)} keyword candidates")
+
+        return final_keywords
+
+    def _extract_from_related_searches(self, results: Dict[str, Any],
+                                       candidates: set) -> None:
+        """Extract keywords from 'related searches' section."""
+        related_searches = results.get("related_searches", [])
+        for item in related_searches:
+            query = item.get("query") or item.get("suggestion")
+            if query and len(query.strip()) > 0:
+                candidates.add(query.strip())
+
+    def _extract_from_people_also_ask(self, results: Dict[str, Any],
+                                      candidates: set) -> None:
+        """Extract keywords from 'People also ask' questions."""
+        related_questions = results.get("related_questions", [])
+        for item in related_questions:
+            question = item.get("question") or item.get("query")
+            if question and len(question.strip()) > 0:
+                candidates.add(question.strip())
+
+    def _extract_from_organic_titles(self, results: Dict[str, Any],
+                                     candidates: set) -> None:
+        """Extract potential keywords from organic result titles."""
+        organic_results = results.get("organic_results", [])
+        for result in organic_results[:10]:  # Only top 10 results
+            title = result.get("title", "")
+            if title and len(title.strip()) > 0:
+                candidates.add(title.strip())
+
+
+class KeywordAnalyzer:
+    """Main class for analyzing keywords and calculating opportunity scores."""
+
+    def __init__(self, config: Config):
+        self.config = config
+        self.competition_calc = CompetitionCalculator()
+        self.volume_estimator = SearchVolumeEstimator()
+        self.keyword_discovery = KeywordDiscovery(config)
+
+    def search_google(self, keyword: str) -> Dict[str, Any]:
+        """
+        Fetch search results for a keyword using SerpApi.
+
+        Args:
+            keyword: The keyword to search for
+
+        Returns:
+            Search results dictionary from SerpApi
+        """
+        search_params = {
+            "engine": "google",
+            "q": keyword,
+            "api_key": self.config.serpapi_key,
+            "hl": "en",
+            "gl": "us",
+            "num": self.config.results_per_query
+        }
+
+        try:
+            search = GoogleSearch(search_params)
+            return search.get_dict()
+        except Exception as e:
+            logger.error(f"Search failed for '{keyword}': {e}")
+            return {}
+
+    def analyze_keyword(self, keyword: str, use_volume_api: bool = False) -> Optional[KeywordMetrics]:
+        """
+        Analyze a single keyword and calculate its opportunity score.
+
+        Args:
+            keyword: The keyword to analyze
+            use_volume_api: Whether to use real volume API (not implemented yet)
+
+        Returns:
+            KeywordMetrics object or None if analysis failed
+        """
+        # Get search results
+        search_results = self.search_google(keyword)
+        if not search_results:
+            return None
+
+        # Calculate competition score
+        competition_score, breakdown = self.competition_calc.calculate_score(search_results)
+
+        # Get or estimate search volume
+        if use_volume_api:
+            search_volume = self.volume_estimator.get_search_volume(keyword)
+        else:
+            search_volume = None
+
+        if search_volume is None:
+            search_volume = self.volume_estimator.estimate_volume(keyword)
+
+        # Calculate opportunity score
+        # Higher volume = better, lower competition = better
+        volume_score = math.log10(search_volume + 1)
+        opportunity_score = volume_score / (competition_score + 0.01)  # Avoid division by zero
+
+        return KeywordMetrics(
+            keyword=keyword,
+            monthly_searches=search_volume,
+            competition_score=round(competition_score, 4),
+            opportunity_score=round(opportunity_score, 2),
+            total_results=breakdown["total_results"],
+            ads_count=breakdown["ads_count"],
+            has_featured_snippet=breakdown["has_featured_snippet"],
+            has_people_also_ask=breakdown["has_people_also_ask"],
+            has_knowledge_graph=breakdown["has_knowledge_graph"]
+        )
+
+    def analyze_keywords_batch(self, keywords: List[str],
+                               use_volume_api: bool = False) -> List[KeywordMetrics]:
+        """
+        Analyze multiple keywords and return sorted results.
+
+        Args:
+            keywords: List of keywords to analyze
+            use_volume_api: Whether to use real volume API
+
+        Returns:
+            List of KeywordMetrics sorted by opportunity score (highest first)
+        """
+        logger.info(f"Analyzing {len(keywords)} keywords...")
+        analyzed_keywords = []
+
+        for i, keyword in enumerate(keywords, 1):
+            if i % self.config.progress_update_interval == 0:
+                logger.info(f"Progress: {i}/{len(keywords)} keywords processed")
+
+            metrics = self.analyze_keyword(keyword, use_volume_api)
+            if metrics:
+                analyzed_keywords.append(metrics)
+
+        # Sort by opportunity score (highest first)
+        analyzed_keywords.sort(key=lambda x: x.opportunity_score, reverse=True)
+
+        logger.info(f"Analysis complete! {len(analyzed_keywords)} keywords analyzed")
+        return analyzed_keywords
+
+
+class ResultsExporter:
+    """Handles exporting results to various formats."""
+
+    def save_to_csv(self, keyword_metrics: List[KeywordMetrics],
+                    base_filename: str = "keyword_analysis",
+                    top_count: int = 50) -> Optional[str]:
+        """
+        Save keyword analysis results to CSV file.
+
+        Args:
+            keyword_metrics: List of analyzed keyword metrics
+            base_filename: Base name for the output file
+            top_count: Number of top results to save
+
+        Returns:
+            Filename if successful, None if failed
+        """
+        if not keyword_metrics:
+            logger.warning("No data to save!")
+            return None
+
+        # Create filename with timestamp
+        today = date.today()
+        filename = f"{base_filename}_{today}.csv"
+
+        try:
+            with open(filename, "w", newline='', encoding='utf-8') as file:
+                writer = csv.writer(file)
+
+                # Write header
+                headers = [
+                    "Keyword", "Monthly Searches", "Competition Score",
+                    "Opportunity Score", "Total Results", "Ads Count",
+                    "Featured Snippet", "People Also Ask", "Knowledge Graph"
+                ]
+                writer.writerow(headers)
+
+                # Write data rows
+                for metrics in keyword_metrics[:top_count]:
+                    row = [
+                        metrics.keyword,
+                        metrics.monthly_searches,
+                        metrics.competition_score,
+                        metrics.opportunity_score,
+                        metrics.total_results,
+                        metrics.ads_count,
+                        "Yes" if metrics.has_featured_snippet else "No",
+                        "Yes" if metrics.has_people_also_ask else "No",
+                        "Yes" if metrics.has_knowledge_graph else "No"
+                    ]
+                    writer.writerow(row)
+
+            saved_count = min(top_count, len(keyword_metrics))
+            logger.info(f"βœ… Results saved to {filename} ({saved_count} keywords)")
+            return filename
+
+        except Exception as e:
+            logger.error(f"Failed to save CSV: {e}")
+            return None
+
+    def display_top_results(self, keyword_metrics: List[KeywordMetrics],
+                            top_count: int = 5) -> None:
+        """
+        Display top results in formatted table.
+
+        Args:
+            keyword_metrics: List of analyzed keyword metrics
+            top_count: Number of top results to display
+        """
+        if not keyword_metrics:
+            logger.warning("No results to display!")
+            return
+
+        top_results = keyword_metrics[:top_count]
+
+        print(f"\nπŸ† Top {len(top_results)} Keyword Opportunities:")
+
+        if HAS_TABULATE:
+            # Create table data
+            table_data = []
+            for metrics in top_results:
+                table_data.append([
+                    metrics.keyword,
+                    f"{metrics.monthly_searches:,}",
+                    f"{metrics.competition_score:.3f}",
+                    f"{metrics.opportunity_score:.2f}",
+                    f"{metrics.total_results:,}",
+                    metrics.ads_count
+                ])
+
+            headers = ["Keyword", "Volume", "Competition", "Score", "Results", "Ads"]
+            print(tabulate(table_data, headers=headers, tablefmt="pretty"))
+        else:
+            # Fallback to simple format
+            for i, metrics in enumerate(top_results, 1):
+                print(f"{i}. {metrics.keyword}")
+                print(f"   Score: {metrics.opportunity_score}, "
+                      f"Volume: {metrics.monthly_searches:,}, "
+                      f"Competition: {metrics.competition_score:.3f}")
+
+
+class KeywordResearchTool:
+    """Main application class that orchestrates the keyword research process."""
+
+    def __init__(self, seed_keyword: str):
+        self.seed_keyword = seed_keyword
+        self.config = Config()
+        self.analyzer = KeywordAnalyzer(self.config)
+        self.exporter = ResultsExporter()
+
+    def run_analysis(self, use_volume_api: bool = False) -> None:
+        """
+        Run the complete keyword research analysis.
+
+        Args:
+            use_volume_api: Whether to use real volume API (requires implementation)
+        """
+        print("πŸ” Starting keyword research analysis...")
+        print(f"Seed keyword: '{self.seed_keyword}'")
+
+        try:
+            # Step 1: Discover related keywords
+            related_keywords = self.analyzer.keyword_discovery.find_related_keywords(
+                self.seed_keyword
+            )
+
+            if not related_keywords:
+                logger.error("No keyword candidates found. Check your SerpApi key.")
+                return
+
+            # Step 2: Analyze keywords and calculate scores
+            analyzed_keywords = self.analyzer.analyze_keywords_batch(
+                related_keywords, use_volume_api
+            )
+
+            if not analyzed_keywords:
+                logger.error("No keywords were successfully analyzed.")
+                return
+
+            # Step 3: Save results to file
+            self.exporter.save_to_csv(
+                analyzed_keywords,
+                base_filename=f"keywords_{self.seed_keyword.replace(' ', '_')}",
+                top_count=self.config.top_keywords_to_save
+            )
+
+            # Step 4: Display top results
+            self.exporter.display_top_results(analyzed_keywords, top_count=5)
+
+        except Exception as e:
+            logger.error(f"Analysis failed: {e}")
+            raise
+
+
+def main():
+    """Main entry point for the keyword research tool."""
+    # Configuration
+    SEED_KEYWORD = "global internship"
+    USE_VOLUME_API = False  # Set to True when you implement get_search_volume()
+
+    try:
+        tool = KeywordResearchTool(SEED_KEYWORD)
+        tool.run_analysis(use_volume_api=USE_VOLUME_API)
+
+    except ValueError as e:
+        logger.error(f"Configuration error: {e}")
+        print("\nπŸ’‘ Setup Instructions:")
+        print("1. Create a .env file in the same directory")
+        print("2. Add your SerpApi key: SERPAPI_KEY=your_key_here")
+        print("3. Get your free key at: https://serpapi.com/")
+
+    except Exception as e:
+        logger.error(f"Unexpected error: {e}")
+
+
+if __name__ == "__main__":
+    main()
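The weighted scoring in `CompetitionCalculator.calculate_score` can be sanity-checked in isolation. This standalone sketch repeats the same weights and normalization (the free function `competition` is ours; the sample inputs are illustrative, not real SERP data):

```python
import math

# Same weights as CompetitionCalculator.WEIGHTS in ranking.py
WEIGHTS = {"total_results": 0.50, "ads": 0.25, "featured_snippet": 0.15,
           "people_also_ask": 0.07, "knowledge_graph": 0.03}

def competition(total_results: int, ads_count: int,
                snippet: bool, paa: bool, kg: bool) -> float:
    # Log-normalize the result count (10^7 results saturates this factor)
    normalized_results = min(math.log10(total_results + 1) / 7, 1.0)
    ads_score = min(ads_count / 3, 1.0)  # 3+ ads saturates this factor
    score = (WEIGHTS["total_results"] * normalized_results
             + WEIGHTS["ads"] * ads_score
             + WEIGHTS["featured_snippet"] * snippet
             + WEIGHTS["people_also_ask"] * paa
             + WEIGHTS["knowledge_graph"] * kg)
    return max(0.0, min(1.0, score))

# A saturated SERP (10M results, 3 ads, every feature present) scores ~1.0;
# a sparse SERP with no ads and no features scores far lower.
print(competition(10_000_000, 3, True, True, True))
print(competition(50_000, 0, False, False, False))
```

Because the weights sum to 1.0 and every factor is clamped to [0, 1], the final clamp only guards against floating-point drift.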
requirements.txt ADDED
@@ -0,0 +1,60 @@
+altair==5.5.0
+annotated-types==0.7.0
+anyio==4.11.0
+attrs==25.3.0
+blinker==1.9.0
+cachetools==6.2.0
+certifi==2025.8.3
+charset-normalizer==3.4.3
+click==8.1.8
+colorama==0.4.6
+et_xmlfile==2.0.0
+exceptiongroup==1.3.0
+fastapi==0.118.0
+gitdb==4.0.12
+GitPython==3.1.45
+google_search_results==2.4.2
+gunicorn==23.0.0
+h11==0.16.0
+httptools==0.6.4
+idna==3.10
+Jinja2==3.1.6
+jsonschema==4.25.1
+jsonschema-specifications==2025.9.1
+MarkupSafe==3.0.3
+narwhals==2.6.0
+numpy==2.0.2
+openpyxl==3.1.5
+packaging==25.0
+pandas==2.3.2
+pillow==11.3.0
+plotly==6.3.0
+protobuf==6.32.1
+pyarrow==21.0.0
+pydantic==2.11.9
+pydantic_core==2.33.2
+pydeck==0.9.1
+python-dateutil==2.9.0.post0
+python-dotenv==1.1.1
+pytz==2025.2
+PyYAML==6.0.3
+referencing==0.36.2
+requests==2.32.5
+rpds-py==0.27.1
+six==1.17.0
+smmap==5.0.2
+sniffio==1.3.1
+starlette==0.48.0
+streamlit==1.50.0
+tabulate==0.9.0
+tenacity==9.1.2
+toml==0.10.2
+tornado==6.5.2
+typing-inspection==0.4.1
+typing_extensions==4.15.0
+tzdata==2025.2
+urllib3==2.5.0
+uvicorn==0.37.0
+watchdog==6.0.0
+watchfiles==1.1.0
+websockets==15.0.1
server.py ADDED
@@ -0,0 +1,625 @@
1
+ # src/server.py
2
+ """
3
+ Free-Plan Friendly SEO Keyword Research API
4
+ Optimized to minimize SerpAPI calls while maximizing keyword discovery
5
+
6
+ Key Features:
7
+ - Configurable keyword count (5, 10, 20, 50, etc.)
8
+ - Only 1 SerpAPI call per seed for candidate collection
9
+ - Mock scoring for initial ranking
10
+ - Optional SerpAPI verification for top N results
11
+ - Strict mode for free plan protection (max 5 API calls per request)
12
+ """
13
+
14
+ import os
15
+ import logging
16
+ import time
17
+ import math
18
+ import re
19
+ import io
20
+ from typing import List, Dict, Any, Optional, Tuple
21
+ from datetime import datetime
22
+ from collections import Counter
23
+
24
+ from fastapi import FastAPI, HTTPException, Query, Request
25
+ from fastapi.middleware.cors import CORSMiddleware
26
+ from fastapi.responses import JSONResponse, StreamingResponse
27
+ from pydantic import BaseModel, Field
28
+ from dotenv import load_dotenv
29
+
30
+ try:
31
+ import pandas as pd
32
+ HAS_PANDAS = True
33
+ except ImportError:
34
+ HAS_PANDAS = False
35
+
36
+ try:
37
+ from serpapi import GoogleSearch
38
+ HAS_SERPAPI = True
39
+ except ImportError:
40
+ try:
41
+ from google_search_results import GoogleSearch
42
+ HAS_SERPAPI = True
43
+ except ImportError:
44
+ HAS_SERPAPI = False
45
+
46
+ # Load environment
47
+ load_dotenv()
48
+
49
+ # Configure logging
50
+ logging.basicConfig(
51
+ level=logging.INFO,
52
+ format='%(asctime)s - %(levelname)s - %(message)s'
53
+ )
54
+ logger = logging.getLogger(__name__)
55
+
56
+ # Initialize FastAPI
57
+ app = FastAPI(
58
+ title="Free-Plan Friendly SEO Keyword API",
59
+ description="Efficient keyword research optimized for SerpAPI free plan",
60
+ version="4.0.0",
61
+ docs_url="/docs"
62
+ )
63
+
64
+ # CORS
65
+ app.add_middleware(
66
+ CORSMiddleware,
67
+ allow_origins=["*"],
68
+ allow_credentials=True,
69
+ allow_methods=["GET", "POST", "OPTIONS"],
70
+ allow_headers=["*"],
71
+ )
72
+
73
+ # Configuration
74
+ SERPAPI_KEY = os.getenv("SERPAPI_KEY")
75
+ API_AUTH_KEY = os.getenv("API_AUTH_KEY")
76
+ USE_SERPAPI_STRICT_MODE = os.getenv("USE_SERPAPI_STRICT_MODE", "true").lower() == "true"
77
+ MAX_SERPAPI_CALLS_STRICT = 5 # Maximum API calls in strict mode
78
+ MAX_SERPAPI_CALLS_NORMAL = 20 # Maximum API calls in normal mode
79
+
80
+ # Rate limiting
81
+ REQUEST_TIMES = {}
82
+ RATE_LIMIT_WINDOW = 60
83
+ RATE_LIMIT_MAX_REQUESTS = 30
84
+
85
+ # Request counter for monitoring
86
+ API_CALL_COUNTER = {"total": 0, "session_start": time.time()}
87
+
88
+ class KeywordResponse(BaseModel):
89
+ """API response model."""
90
+ success: bool = True
91
+ seed: str
92
+ requested: int
93
+ returned: int
94
+ results: List[Dict[str, Any]]
95
+ processing_time: float
96
+ api_calls_used: int
97
+ api_budget_remaining: int
98
+ data_source: str
99
+ timestamp: str
100
+
101
+ def count_api_call():
102
+ """Track API usage."""
103
+ API_CALL_COUNTER["total"] += 1
104
+ logger.info(f"API call #{API_CALL_COUNTER['total']} - Session time: {time.time() - API_CALL_COUNTER['session_start']:.1f}s")
105
+
106
+ def get_api_budget() -> int:
107
+ """Calculate remaining API budget for this request."""
108
+ max_calls = MAX_SERPAPI_CALLS_STRICT if USE_SERPAPI_STRICT_MODE else MAX_SERPAPI_CALLS_NORMAL
109
+ used = API_CALL_COUNTER["total"]
110
+ return max(0, max_calls - used)
111
+
112
+ def heuristic_competition_score(keyword: str) -> float:
113
+ """
114
+ Calculate mock competition score based on keyword characteristics.
115
+ Does NOT use any API calls.
116
+ """
117
+ words = keyword.lower().split()
118
+ word_count = len(words)
119
+
120
+ # Base competition by word count
121
+ base_scores = {1: 0.8, 2: 0.6, 3: 0.4, 4: 0.25, 5: 0.2}
122
+ base_score = base_scores.get(word_count, max(0.15, 0.3 - (word_count * 0.02)))
123
+
124
+ # Adjust for question keywords (lower competition)
125
+ question_words = ["how", "what", "why", "when", "where", "who", "which", "can", "should", "is", "are", "does"]
126
+ if any(word in words for word in question_words):
127
+ base_score *= 0.7
128
+
129
+ # Adjust for commercial intent (higher competition)
130
+ commercial_words = ["buy", "best", "top", "review", "price", "cheap", "discount"]
131
+ if any(word in words for word in commercial_words):
132
+ base_score *= 1.3
133
+
134
+ # Adjust for specific/niche keywords (lower competition)
135
+ specific_words = ["beginner", "tutorial", "guide", "explained", "step", "diy", "simple"]
136
+ if any(word in words for word in specific_words):
137
+ base_score *= 0.8
138
+
139
+ # Add keyword-hash variation (stable within one process; str hashes are salted per run)
140
+ variation = ((hash(keyword) % 21) - 10) / 100 # -0.10 to +0.10
141
+ base_score += variation
142
+
143
+ return max(0.05, min(0.95, base_score))
144
+
145
+ def heuristic_search_volume(keyword: str) -> int:
146
+ """
147
+ Estimate search volume based on keyword characteristics.
148
+ Does NOT use any API calls.
149
+ """
150
+ words = keyword.lower().split()
151
+ word_count = len(words)
152
+
153
+ # Base volumes
154
+ base_volumes = {1: 10000, 2: 5000, 3: 2000, 4: 800, 5: 400}
155
+ base_volume = base_volumes.get(word_count, max(100, 500 - (word_count * 50)))
156
+
157
+ # Adjust for popular terms
158
+ popular_terms = ["free", "online", "best", "how", "tutorial", "guide"]
159
+ if any(term in words for term in popular_terms):
160
+ base_volume = int(base_volume * 1.5)
161
+
162
+ # Adjust for very specific/niche terms
163
+ niche_terms = ["advanced", "professional", "enterprise", "custom"]
164
+ if any(term in words for term in niche_terms):
165
+ base_volume = int(base_volume * 0.6)
166
+
167
+ # Add keyword-hash variation (stable within one process; str hashes are salted per run)
168
+ variation_factor = 1 + ((hash(keyword) % 41) - 20) / 100 # 0.80 to 1.20
169
+ volume = int(base_volume * variation_factor)
170
+
171
+ return max(10, min(100000, volume))
172
+
173
+ def calculate_opportunity_score(volume: int, competition: float) -> float:
174
+ """Calculate opportunity score."""
175
+ volume_score = math.log10(volume + 1)
176
+ return volume_score / (competition + 0.1)
177
+
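As a sanity check on the formula above, here is a standalone sketch of the opportunity math (the keyword volumes and competition values are invented for illustration):

```python
import math

def opportunity(volume: int, competition: float) -> float:
    # Same shape as calculate_opportunity_score above:
    # log-scaled volume divided by competition, floored at 0.1.
    return math.log10(volume + 1) / (competition + 0.1)

# A low-competition long-tail keyword can outscore a high-volume head term.
head = opportunity(10000, 0.8)  # high volume, crowded SERP
tail = opportunity(2000, 0.3)   # lower volume, easier SERP
assert tail > head
```

The `log10` damping is the point of the formula: a 5x volume advantage is worth far less than a large competition gap.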
178
+ def score_keyword_heuristic(keyword: str) -> Dict[str, Any]:
179
+ """
180
+ Score a keyword using only heuristics (NO API calls).
181
+ Fast and free method for initial ranking.
182
+ """
183
+ competition = heuristic_competition_score(keyword)
184
+ volume = heuristic_search_volume(keyword)
185
+ opportunity = calculate_opportunity_score(volume, competition)
186
+
187
+ # Determine difficulty
188
+ if competition < 0.3:
189
+ difficulty = "Easy"
190
+ elif competition < 0.5:
191
+ difficulty = "Medium"
192
+ elif competition < 0.7:
193
+ difficulty = "Hard"
194
+ else:
195
+ difficulty = "Very Hard"
196
+
197
+ # Estimate ranking potential
198
+ if competition < 0.4 and volume >= 300:
199
+ ranking_chance = "High"
200
+ elif competition < 0.6 and volume >= 100:
201
+ ranking_chance = "Medium"
202
+ else:
203
+ ranking_chance = "Low"
204
+
205
+ return {
206
+ "keyword": keyword,
207
+ "monthly_searches": volume,
208
+ "competition_score": round(competition, 4),
209
+ "opportunity_score": round(opportunity, 2),
210
+ "difficulty": difficulty,
211
+ "ranking_chance": ranking_chance,
212
+ "data_source": "heuristic"
213
+ }
214
+
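The threshold tables above can be exercised in isolation; this standalone sketch reproduces them for quick checks:

```python
def classify(volume: int, competition: float):
    # Mirrors the difficulty / ranking-chance thresholds
    # used in score_keyword_heuristic above.
    if competition < 0.3:
        difficulty = "Easy"
    elif competition < 0.5:
        difficulty = "Medium"
    elif competition < 0.7:
        difficulty = "Hard"
    else:
        difficulty = "Very Hard"

    if competition < 0.4 and volume >= 300:
        chance = "High"
    elif competition < 0.6 and volume >= 100:
        chance = "Medium"
    else:
        chance = "Low"
    return difficulty, chance

assert classify(500, 0.35) == ("Medium", "High")
assert classify(50, 0.8) == ("Very Hard", "Low")
```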
215
+ def enrich_with_serpapi(keyword: str) -> Optional[Dict[str, Any]]:
216
+ """
217
+ Enrich a keyword with real SerpAPI data.
218
+ Uses 1 API call per keyword.
219
+ """
220
+ if not HAS_SERPAPI or not SERPAPI_KEY:
221
+ logger.warning("SerpAPI not available for enrichment")
222
+ return None
223
+
224
+ try:
225
+ count_api_call()
226
+
227
+ params = {
228
+ "engine": "google",
229
+ "q": keyword,
230
+ "api_key": SERPAPI_KEY,
231
+ "hl": "en",
232
+ "gl": "us",
233
+ "num": 10
234
+ }
235
+
236
+ search = GoogleSearch(params)
237
+ results = search.get_dict()
238
+
239
+ if "error" in results:
240
+ logger.error(f"SerpAPI error: {results['error']}")
241
+ return None
242
+
243
+ # Extract metrics
244
+ search_info = results.get("search_information", {})
245
+ total_results_raw = search_info.get("total_results") or search_info.get("total_results_raw") or ""
246
+ total_results = 0
247
+ if isinstance(total_results_raw, int):
248
+ total_results = total_results_raw
249
+ elif isinstance(total_results_raw, str):
250
+ nums = re.sub(r"[^\d]", "", total_results_raw)
251
+ total_results = int(nums) if nums else 0
252
+
253
+ ads_count = len(results.get("ads_results", []))
254
+ has_featured_snippet = bool(results.get("featured_snippet") or results.get("answer_box"))
255
+ has_paa = bool(results.get("related_questions") or results.get("people_also_ask"))
256
+ has_kg = bool(results.get("knowledge_graph"))
257
+
258
+ # Calculate real competition
259
+ normalized_results = min(math.log10(total_results + 1) / 7, 1.0) if total_results > 0 else 0
260
+ ads_score = min(ads_count / 3, 1.0)
261
+
262
+ competition = (
263
+ 0.40 * normalized_results +
264
+ 0.25 * ads_score +
265
+ 0.15 * (1 if has_featured_snippet else 0) +
266
+ 0.10 * (1 if has_paa else 0) +
267
+ 0.10 * (1 if has_kg else 0)
268
+ )
269
+ competition = max(0.0, min(1.0, competition))
270
+
271
+ # Estimate volume from signals
272
+ word_count = len(keyword.split())
273
+ base_volume = max(100, 8000 // (word_count + 1))
274
+
275
+ if ads_count > 2:
276
+ base_volume = int(base_volume * 1.5)
277
+ if has_featured_snippet:
278
+ base_volume = int(base_volume * 1.2)
279
+
280
+ volume = min(base_volume, 50000)
281
+ opportunity = calculate_opportunity_score(volume, competition)
282
+
283
+ # Determine difficulty
284
+ if competition < 0.3:
285
+ difficulty = "Easy"
286
+ elif competition < 0.5:
287
+ difficulty = "Medium"
288
+ elif competition < 0.7:
289
+ difficulty = "Hard"
290
+ else:
291
+ difficulty = "Very Hard"
292
+
293
+ # Ranking chance
294
+ if competition < 0.35:
295
+ ranking_chance = "High"
296
+ elif competition < 0.55:
297
+ ranking_chance = "Medium"
298
+ else:
299
+ ranking_chance = "Low"
300
+
301
+ return {
302
+ "keyword": keyword,
303
+ "monthly_searches": volume,
304
+ "competition_score": round(competition, 4),
305
+ "opportunity_score": round(opportunity, 2),
306
+ "difficulty": difficulty,
307
+ "ranking_chance": ranking_chance,
308
+ "total_results": total_results,
309
+ "ads_count": ads_count,
310
+ "featured_snippet": "Yes" if has_featured_snippet else "No",
311
+ "people_also_ask": "Yes" if has_paa else "No",
312
+ "knowledge_graph": "Yes" if has_kg else "No",
313
+ "data_source": "serpapi"
314
+ }
315
+
316
+ except Exception as e:
317
+ logger.error(f"SerpAPI enrichment failed for '{keyword}': {e}")
318
+ return None
319
+
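Two pieces of the enrichment above are easy to verify offline: the `total_results` string parsing and the weighted competition blend. A standalone sketch (the sample SERP values are invented):

```python
import math
import re

def parse_total_results(raw):
    # Mirrors the digit-extraction above: "About 1,230,000 results" -> 1230000
    if isinstance(raw, int):
        return raw
    digits = re.sub(r"[^\d]", "", raw or "")
    return int(digits) if digits else 0

def serp_competition(total_results, ads_count, snippet, paa, kg):
    # Same weighted blend as enrich_with_serpapi above.
    norm = min(math.log10(total_results + 1) / 7, 1.0) if total_results > 0 else 0.0
    score = (0.40 * norm
             + 0.25 * min(ads_count / 3, 1.0)
             + 0.15 * (1 if snippet else 0)
             + 0.10 * (1 if paa else 0)
             + 0.10 * (1 if kg else 0))
    return max(0.0, min(1.0, score))

n = parse_total_results("About 1,230,000 results")
c = serp_competition(n, ads_count=3, snippet=True, paa=True, kg=False)
assert n == 1230000
assert 0.0 <= c <= 1.0
```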
320
+ def collect_candidates_from_seed(seed: str) -> Tuple[List[str], int]:
321
+ """
322
+ Collect keyword candidates using ONLY 1 SerpAPI call.
323
+ Returns (candidates, api_calls_used)
324
+ """
325
+ candidates = set()
326
+ candidates.add(seed) # Always include seed
327
+ api_calls = 0
328
+
329
+ # Generate synthetic candidates (NO API calls)
330
+ question_words = ["how to", "what is", "why", "when", "where", "can i", "should i"]
331
+ modifiers = ["best", "free", "online", "guide", "tutorial", "tips", "examples",
332
+ "for beginners", "explained", "2024", "2025", "cheap", "review"]
333
+
334
+ for q in question_words[:5]:
335
+ candidates.add(f"{q} {seed}")
336
+
337
+ for mod in modifiers[:15]:
338
+ candidates.add(f"{seed} {mod}")
339
+ candidates.add(f"{mod} {seed}")
340
+
341
+ # Make ONE SerpAPI call to get real related keywords
342
+ if HAS_SERPAPI and SERPAPI_KEY:
343
+ try:
344
+ count_api_call()
345
+ api_calls = 1
346
+
347
+ params = {
348
+ "engine": "google",
349
+ "q": seed,
350
+ "api_key": SERPAPI_KEY,
351
+ "hl": "en",
352
+ "gl": "us"
353
+ }
354
+
355
+ search = GoogleSearch(params)
356
+ results = search.get_dict()
357
+
358
+ if "error" not in results:
359
+ # Extract related searches
360
+ for item in results.get("related_searches", [])[:20]:
361
+ query = item.get("query", "")
362
+ if query and len(query.split()) <= 6:
363
+ candidates.add(query.lower().strip())
364
+
365
+ # Extract PAA questions
366
+ for item in results.get("related_questions", [])[:15]:
367
+ question = item.get("question", "")
368
+ if question:
369
+ candidates.add(question.lower().strip())
370
+
371
+ logger.info(f"SerpAPI call successful: collected real suggestions")
372
+ else:
373
+ logger.warning(f"SerpAPI error: {results.get('error')}")
374
+
375
+ except Exception as e:
376
+ logger.error(f"SerpAPI collection failed: {e}")
377
+
378
+ final_candidates = list(candidates)
379
+ logger.info(f"Collected {len(final_candidates)} candidates ({api_calls} API call)")
380
+
381
+ return final_candidates, api_calls
382
+
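The zero-cost part of the collection step (synthetic expansion before the single SerpAPI call) can be sketched standalone; the word lists here are a subset of those above:

```python
def synthetic_candidates(seed: str):
    # API-free expansion, as in collect_candidates_from_seed above.
    questions = ["how to", "what is", "why", "when", "where"]
    modifiers = ["best", "free", "online", "guide", "tutorial"]
    out = {seed}
    out.update(f"{q} {seed}" for q in questions)
    for mod in modifiers:
        out.add(f"{seed} {mod}")
        out.add(f"{mod} {seed}")
    return sorted(out)

cands = synthetic_candidates("seo tools")
assert "seo tools" in cands
assert len(cands) == 16  # 1 seed + 5 questions + 2 * 5 modifiers
```

Using a set means duplicate phrasings collapse automatically before scoring.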
383
+ def check_rate_limit(client_ip: str) -> bool:
384
+ """Rate limiting."""
385
+ current_time = time.time()
386
+
387
+ if client_ip not in REQUEST_TIMES:
388
+ REQUEST_TIMES[client_ip] = []
389
+
390
+ REQUEST_TIMES[client_ip] = [
391
+ t for t in REQUEST_TIMES[client_ip]
392
+ if current_time - t < RATE_LIMIT_WINDOW
393
+ ]
394
+
395
+ if len(REQUEST_TIMES[client_ip]) >= RATE_LIMIT_MAX_REQUESTS:
396
+ return False
397
+
398
+ REQUEST_TIMES[client_ip].append(current_time)
399
+ return True
400
+
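The limiter above keeps a per-client list of timestamps inside a 60-second window; the same idea as a standalone class (deque-based, with an injectable clock so it can be tested without sleeping):

```python
import time
from collections import defaultdict, deque
from typing import Optional

class SlidingWindowLimiter:
    """Standalone sketch of the per-IP limiter in check_rate_limit above."""

    def __init__(self, window: float = 60.0, max_requests: int = 30):
        self.window = window
        self.max_requests = max_requests
        self.hits = defaultdict(deque)

    def allow(self, client_ip: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        q = self.hits[client_ip]
        while q and now - q[0] >= self.window:
            q.popleft()  # evict timestamps that fell out of the window
        if len(q) >= self.max_requests:
            return False
        q.append(now)
        return True

limiter = SlidingWindowLimiter(window=60, max_requests=3)
assert [limiter.allow("1.2.3.4", now=t) for t in (0, 1, 2, 3)] == [True, True, True, False]
assert limiter.allow("1.2.3.4", now=61)  # the t=0 hit has expired
```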
401
+ @app.on_event("startup")
402
+ async def startup():
403
+ """Startup logging."""
404
+ logger.info("=" * 60)
405
+ logger.info("SEO Keyword API - Free Plan Optimized")
406
+ logger.info(f"Strict Mode: {USE_SERPAPI_STRICT_MODE}")
407
+ logger.info(f"Max API calls per request: {MAX_SERPAPI_CALLS_STRICT if USE_SERPAPI_STRICT_MODE else MAX_SERPAPI_CALLS_NORMAL}")
408
+ logger.info(f"SerpAPI Available: {HAS_SERPAPI and bool(SERPAPI_KEY)}")
409
+ logger.info("=" * 60)
410
+
411
+ @app.get("/")
412
+ async def root():
413
+ """Root endpoint."""
414
+ return {
415
+ "service": "Free-Plan Friendly SEO Keyword API",
416
+ "version": "4.0.0",
417
+ "strict_mode": USE_SERPAPI_STRICT_MODE,
418
+ "max_api_calls": MAX_SERPAPI_CALLS_STRICT if USE_SERPAPI_STRICT_MODE else MAX_SERPAPI_CALLS_NORMAL,
419
+ "strategy": "1 API call for candidate collection + optional enrichment for top N",
420
+ "endpoints": {
421
+ "/keywords": "Main keyword research (configurable count)",
422
+ "/health": "Health check",
423
+ "/stats": "API usage statistics"
424
+ }
425
+ }
426
+
427
+ @app.get("/health")
428
+ async def health():
429
+ """Health check."""
430
+ return {
431
+ "status": "healthy",
432
+ "timestamp": datetime.utcnow().isoformat(),
433
+ "serpapi_available": HAS_SERPAPI and bool(SERPAPI_KEY),
434
+ "strict_mode": USE_SERPAPI_STRICT_MODE,
435
+ "session_api_calls": API_CALL_COUNTER["total"]
436
+ }
437
+
438
+ @app.get("/stats")
439
+ async def stats():
440
+ """API usage statistics."""
441
+ uptime = time.time() - API_CALL_COUNTER["session_start"]
442
+ return {
443
+ "session_start": datetime.fromtimestamp(API_CALL_COUNTER["session_start"]).isoformat(),
444
+ "uptime_seconds": round(uptime, 1),
445
+ "total_api_calls": API_CALL_COUNTER["total"],
446
+ "strict_mode": USE_SERPAPI_STRICT_MODE,
447
+ "max_calls_per_request": MAX_SERPAPI_CALLS_STRICT if USE_SERPAPI_STRICT_MODE else MAX_SERPAPI_CALLS_NORMAL
448
+ }
449
+
450
+ @app.get("/keywords", response_model=KeywordResponse)
451
+ async def get_keywords(
452
+ request: Request,
453
+ seed: str = Query(..., description="Seed keyword", min_length=1, max_length=100),
454
+ top: int = Query(50, description="Number of keywords to return", ge=1, le=100),
455
+ enrich_top: int = Query(4, description="Number of top results to enrich with SerpAPI", ge=0, le=20)
456
+ ):
457
+ """
458
+ Main keyword research endpoint.
459
+
460
+ Strategy:
461
+ 1. Make 1 SerpAPI call to collect candidates from seed
462
+ 2. Score all candidates with heuristics (free)
463
+ 3. Optionally enrich top N with real SerpAPI data
464
+
465
+ Parameters:
466
+ - seed: Your main keyword
467
+ - top: How many keywords you want (e.g., 5, 10, 20, 50)
468
+ - enrich_top: How many of the top results to verify with SerpAPI (0 = none, saves API calls)
469
+
470
+ Example: top=10, enrich_top=3 means:
471
+ - 1 API call to collect candidates
472
+ - Return 10 keywords scored with heuristics
473
+ - Enrich the top 3 with real SerpAPI data (3 more API calls)
474
+ - Total: 4 API calls
475
+ """
476
+ start_time = time.time()
477
+ client_ip = request.client.host if request.client else "unknown"
478
+
479
+ # Authentication
480
+ if API_AUTH_KEY:
481
+ auth = request.headers.get("Authorization", "").replace("Bearer ", "")
482
+ if auth != API_AUTH_KEY:
483
+ raise HTTPException(401, "Invalid or missing API key")
484
+
485
+ # Rate limiting
486
+ if not check_rate_limit(client_ip):
487
+ raise HTTPException(429, "Rate limit exceeded")
488
+
489
+ # Validate
490
+ seed = seed.strip().lower()
491
+ if not seed:
492
+ raise HTTPException(400, "Invalid seed keyword")
493
+
494
+ # Check API budget
495
+ max_calls = MAX_SERPAPI_CALLS_STRICT if USE_SERPAPI_STRICT_MODE else MAX_SERPAPI_CALLS_NORMAL
496
+ if enrich_top > 0:
497
+ required_calls = 1 + enrich_top # 1 for collection + N for enrichment
498
+ if required_calls > max_calls:
499
+ raise HTTPException(
500
+ 400,
501
+ f"Request would use {required_calls} API calls, but budget is {max_calls}. "
502
+ f"Reduce enrich_top to {max_calls - 1} or less."
503
+ )
504
+
505
+ try:
506
+ logger.info(f"Request: seed='{seed}', top={top}, enrich_top={enrich_top}")
507
+
508
+ # Step 1: Collect candidates (1 API call)
509
+ candidates, api_calls_used = collect_candidates_from_seed(seed)
510
+
511
+ if not candidates:
512
+ raise HTTPException(404, "No candidates found")
513
+
514
+ # Step 2: Score all candidates with heuristics (FREE - no API calls)
515
+ logger.info(f"Scoring {len(candidates)} candidates with heuristics...")
516
+ scored_candidates = []
517
+ for candidate in candidates:
518
+ try:
519
+ result = score_keyword_heuristic(candidate)
520
+ scored_candidates.append(result)
521
+ except Exception as e:
522
+ logger.warning(f"Heuristic scoring failed for '{candidate}': {e}")
523
+ continue
524
+
525
+ # Sort by opportunity score (highest first)
526
+ scored_candidates.sort(key=lambda x: x["opportunity_score"], reverse=True)
527
+
528
+ # Get top N requested
529
+ top_results = scored_candidates[:top]
530
+
531
+ # Step 3: Optionally enrich top results with real SerpAPI data
532
+ data_source = "heuristic"
533
+ if enrich_top > 0 and HAS_SERPAPI and SERPAPI_KEY:
534
+ logger.info(f"Enriching top {enrich_top} results with SerpAPI...")
535
+
536
+ for i in range(min(enrich_top, len(top_results))):
537
+ keyword = top_results[i]["keyword"]
538
+
539
+ # Check budget before each call
540
+ if api_calls_used >= max_calls:
541
+ logger.warning(f"API budget exhausted at {api_calls_used} calls")
542
+ break
543
+
544
+ enriched = enrich_with_serpapi(keyword)
545
+ if enriched:
546
+ top_results[i] = enriched
547
+ api_calls_used += 1
548
+ data_source = "mixed"
549
+
550
+ # Small delay between calls
551
+ time.sleep(0.2)
552
+
553
+ logger.info(f"Enrichment complete: {api_calls_used} total API calls used")
554
+
555
+ # Add ranking
556
+ for rank, result in enumerate(top_results, 1):
557
+ result["rank"] = rank
558
+
559
+ processing_time = time.time() - start_time
560
+ budget_remaining = max_calls - api_calls_used
561
+
562
+ logger.info(
563
+ f"SUCCESS: Returned {len(top_results)} keywords, "
564
+ f"API calls: {api_calls_used}/{max_calls}, "
565
+ f"Time: {processing_time:.2f}s"
566
+ )
567
+
568
+ return KeywordResponse(
569
+ success=True,
570
+ seed=seed,
571
+ requested=top,
572
+ returned=len(top_results),
573
+ results=top_results,
574
+ processing_time=round(processing_time, 2),
575
+ api_calls_used=api_calls_used,
576
+ api_budget_remaining=budget_remaining,
577
+ data_source=data_source,
578
+ timestamp=datetime.utcnow().isoformat()
579
+ )
580
+
581
+ except HTTPException:
582
+ raise
583
+ except Exception as e:
584
+ logger.error(f"Request failed: {e}")
585
+ raise HTTPException(500, f"Processing error: {str(e)}")
586
+
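The budget check inside the endpoint reduces to simple arithmetic; this standalone sketch shows how to size `enrich_top` against a call budget (the budget value here is illustrative, not the service's actual strict-mode cap):

```python
def required_api_calls(enrich_top: int) -> int:
    # 1 call to collect candidates, plus one per enriched keyword,
    # matching the check in the /keywords endpoint above.
    return 1 + enrich_top if enrich_top > 0 else 1

def max_enrich_top(budget: int) -> int:
    # Largest enrich_top that still fits the budget: budget - 1, never negative.
    return max(0, budget - 1)

budget = 5  # illustrative cap
assert required_api_calls(3) == 4  # the docstring example: top=10, enrich_top=3
assert max_enrich_top(budget) == 4
assert required_api_calls(max_enrich_top(budget)) <= budget
```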
587
+ @app.get("/export/csv")
588
+ async def export_csv(
589
+ seed: str = Query(...),
590
+ top: int = Query(50),
591
+ enrich_top: int = Query(0)
592
+ ):
593
+ """Export results as CSV."""
594
+ if not HAS_PANDAS:
595
+ raise HTTPException(500, "CSV export unavailable (pandas not installed)")
596
+
597
+ # Reuse the endpoint internally via a synthetic Request. Note: no auth
+ # header is forwarded, so this path returns 401 when API_AUTH_KEY is set.
598
+ response = await get_keywords(Request(scope={"type": "http", "client": ("127.0.0.1", 0), "headers": []}), seed, top, enrich_top)
599
+
600
+ # Convert to DataFrame
601
+ df = pd.DataFrame(response.results)
602
+
603
+ # Create CSV
604
+ output = io.StringIO()
605
+ df.to_csv(output, index=False)
606
+ output.seek(0)
607
+
608
+ return StreamingResponse(
609
+ iter([output.getvalue()]),
610
+ media_type="text/csv",
611
+ headers={"Content-Disposition": f"attachment; filename=keywords_{seed.replace(' ', '_')}.csv"}
612
+ )
613
+
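If pandas is unavailable, the same CSV payload can be produced with the standard library; a pandas-free sketch (the sample rows use made-up values):

```python
import csv
import io

def rows_to_csv(rows):
    """Serialize keyword result dicts to an in-memory CSV string."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

sample = [
    {"keyword": "python tutorial", "monthly_searches": 5000, "competition_score": 0.42},
    {"keyword": "learn python free", "monthly_searches": 1200, "competition_score": 0.21},
]
payload = rows_to_csv(sample)
assert payload.splitlines()[0] == "keyword,monthly_searches,competition_score"
assert len(payload.splitlines()) == 3
```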
614
+ if __name__ == "__main__":
615
+ import uvicorn
616
+
617
+ port = int(os.getenv("PORT", 8000))
618
+ logger.info(f"Starting server on port {port}")
619
+
620
+ uvicorn.run(
621
+ app,
622
+ host="0.0.0.0",
623
+ port=port,
624
+ log_level="info"
625
+ )
tempCodeRunnerFile.py ADDED
@@ -0,0 +1 @@
 
 
1
+ # pip install serpapi tabulate python-dotenv