Spaces: Tremick / PIOE

B1acB1rd committed on Jan 17

Commit 4d92cd5 (0 parents)

PIOE 2.0 ready for deployment

Files changed (37)

.env.example +62 -0
.gitignore +63 -0
Dockerfile +31 -0
Procfile +2 -0
README.md +136 -0
backend/__init__.py +17 -0
backend/config.py +76 -0
backend/database.py +32 -0
backend/delivery/__init__.py +6 -0
backend/delivery/digest.py +171 -0
backend/ingestion/__init__.py +27 -0
backend/ingestion/arxiv_client.py +124 -0
backend/ingestion/careers_client.py +403 -0
backend/ingestion/github_client.py +154 -0
backend/ingestion/grants_client.py +385 -0
backend/ingestion/jobboard_client.py +472 -0
backend/ingestion/reddit_client.py +185 -0
backend/ingestion/rss_client.py +220 -0
backend/ingestion/scheduler.py +371 -0
backend/ingestion/superteam_client.py +178 -0
backend/ingestion/web_scraper.py +227 -0
backend/intelligence/__init__.py +22 -0
backend/intelligence/classifier.py +214 -0
backend/intelligence/credibility.py +125 -0
backend/intelligence/llm_client.py +352 -0
backend/intelligence/novelty.py +118 -0
backend/intelligence/roi_scorer.py +340 -0
backend/intelligence/scorer.py +101 -0
backend/intelligence/silent_detector.py +313 -0
backend/main.py +481 -0
backend/models.py +237 -0
config/sources.yaml +135 -0
frontend/app.js +660 -0
frontend/index.html +162 -0
frontend/styles.css +905 -0
render.yaml +25 -0
requirements.txt +18 -0

.env.example ADDED

	@@ -0,0 +1,62 @@

+# ===========================================
+# PIOE 2.0 Environment Configuration
+# ===========================================
+# Copy this file to .env and fill in your values
+# ===========================================
+# AI Provider (Required - pick one)
+# ===========================================
+AI_PROVIDER=gemini
+# Gemini API (Free: https://makersuite.google.com/app/apikey)
+GEMINI_API_KEY=your_gemini_api_key_here
+# OpenAI API (Alternative to Gemini)
+OPENAI_API_KEY=
+# ===========================================
+# Job Board APIs (Optional - get for more jobs)
+# ===========================================
+# Adzuna API (Free: 250 requests/day)
+# Sign up at: https://developer.adzuna.com/
+ADZUNA_APP_ID=
+ADZUNA_API_KEY=
+# Jooble API (Free tier, aggregates LinkedIn/Indeed/Glassdoor)
+# Sign up at: https://jooble.org/api/about
+JOOBLE_API_KEY=
+# RapidAPI for LinkedIn Jobs (Free: 100 requests/month)
+# Sign up at: https://rapidapi.com/jaypat87/api/linkedin-jobs-search
+RAPIDAPI_KEY=
+# ===========================================
+# Social APIs (Optional - for more sources)
+# ===========================================
+# Reddit API (get from reddit.com/prefs/apps)
+REDDIT_CLIENT_ID=
+REDDIT_CLIENT_SECRET=
+REDDIT_USER_AGENT=PIOE/2.0
+# GitHub API (for higher rate limits)
+# Get at: https://github.com/settings/tokens
+GITHUB_TOKEN=
+# ===========================================
+# Database
+# ===========================================
+DATABASE_URL=sqlite:///./pioe.db
+# ===========================================
+# Ingestion Schedule
+# ===========================================
+INGESTION_INTERVAL_HOURS=6
+# ===========================================
+# Scoring Thresholds (Lower = More Results)
+# ===========================================
+MIN_RELEVANCE_SCORE=0.3
+MIN_NOVELTY_SCORE=0.3
+MIN_CREDIBILITY_SCORE=0.5
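
The three thresholds above are easier to reason about with a small example. The following is a minimal sketch, assuming an item is kept only when all three scores meet their minimum; the part of the diff shown here does not confirm how the scorer combines them, so `passes_thresholds` and the `scores` keys are hypothetical names.

```python
import os

# Defaults mirror the values in .env.example.
DEFAULTS = {
    "MIN_RELEVANCE_SCORE": 0.3,
    "MIN_NOVELTY_SCORE": 0.3,
    "MIN_CREDIBILITY_SCORE": 0.5,
}


def load_thresholds(env=None):
    """Read the three thresholds from a mapping, falling back to DEFAULTS."""
    env = os.environ if env is None else env
    thresholds = {}
    for name, default in DEFAULTS.items():
        raw = env.get(name, "")
        # Treat an empty value such as "KEY=" as unset.
        thresholds[name] = float(raw) if raw.strip() else default
    return thresholds


def passes_thresholds(scores, thresholds):
    """Hypothetical gate: keep an item only if every score meets its minimum."""
    return (
        scores["relevance"] >= thresholds["MIN_RELEVANCE_SCORE"]
        and scores["novelty"] >= thresholds["MIN_NOVELTY_SCORE"]
        and scores["credibility"] >= thresholds["MIN_CREDIBILITY_SCORE"]
    )
```

Under this assumption, lowering any one value admits more items, which matches the "Lower = More Results" comment in the file.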

.gitignore ADDED

	@@ -0,0 +1,63 @@

+# PIOE .gitignore
+# Environment files (contains secrets!)
+.env
+.env.local
+# Database
+*.db
+*.sqlite
+*.sqlite3
+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+# Virtual environments
+venv/
+ENV/
+env/
+.venv/
+# IDE
+.idea/
+.vscode/
+*.swp
+*.swo
+*~
+# OS
+.DS_Store
+Thumbs.db
+# Logs
+*.log
+logs/
+# Testing
+.pytest_cache/
+.coverage
+htmlcov/
+# Misc
+*.bak
+tmp/
+temp/

Dockerfile ADDED

	@@ -0,0 +1,31 @@

+# PIOE Docker Image
+FROM python:3.11-slim
+# Set working directory
+WORKDIR /app
+# Install system dependencies
+RUN apt-get update && apt-get install -y \
+    gcc curl \
+    && rm -rf /var/lib/apt/lists/*
+# Copy requirements first for caching
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+# Copy application code
+COPY . .
+# Create non-root user for security
+RUN useradd -m appuser && chown -R appuser:appuser /app
+USER appuser
+# Expose port
+EXPOSE 8000
+# Health check
+HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
+    CMD curl -f http://localhost:8000/api/stats || exit 1
+# Run the application
+CMD ["uvicorn", "backend.main:app", "--host", "0.0.0.0", "--port", "8000"]

Procfile ADDED

	@@ -0,0 +1,2 @@


+# Procfile for Render/Heroku
+web: uvicorn backend.main:app --host 0.0.0.0 --port ${PORT:-8000}

README.md ADDED

	@@ -0,0 +1,136 @@

+# PIOE 2.0 - Personal Intelligence & Opportunity Engine
+Signal intelligence system for detecting early opportunities in AI, Robotics, Computer Vision, Finance, Scholarships, and Hackathons.
+## Features
+- **Multi-Source Ingestion**: arXiv, GitHub, RSS, Superteam, Web scraping
+- **Job Board Aggregators**: Arbeitnow, TheMuse, Remotive, Adzuna, Jooble, LinkedIn
+- **AI Classification**: Gemini-powered categorization and summarization
+- **Smart Scoring**: Relevance, novelty, and credibility scoring with ROI analysis
+- **Anti-Noise Filters**: Rejects recycled content and discussion posts
+- **Modern Dashboard**: Real-time opportunity feed with filters
+## Quick Start
+### 1. Install Dependencies
+```bash
+cd PIOE
+pip install -r requirements.txt
+```
+### 2. Configure Environment
+```bash
+cp .env.example .env
+# Edit .env with your API keys
+```
+**Required:**
+- `GEMINI_API_KEY` - Get from [Google AI Studio](https://makersuite.google.com/app/apikey)
+**Optional (More Jobs):**
+- `ADZUNA_APP_ID` / `ADZUNA_API_KEY` - [Adzuna Developer](https://developer.adzuna.com/) (Free: 250 req/day)
+- `JOOBLE_API_KEY` - [Jooble API](https://jooble.org/api/about) (Free, aggregates LinkedIn/Indeed/Glassdoor)
+- `RAPIDAPI_KEY` - [RapidAPI LinkedIn](https://rapidapi.com/jaypat87/api/linkedin-jobs-search) (Free: 100 req/month)
+- `GITHUB_TOKEN` - For higher rate limits
+### 3. Run the Server
+```bash
+uvicorn backend.main:app --reload
+```
+Open http://localhost:8000 in your browser.
+### 4. Trigger First Ingestion
+Click "Run Ingestion" in the dashboard or:
+```bash
+curl -X POST http://localhost:8000/api/ingest/run
+```
+## Data Sources
+### Free (No API Key)
+| Source | Type | Coverage |
+|--------|------|----------|
+| Arbeitnow | Jobs | Tech jobs worldwide |
+| TheMuse | Jobs | Data Science, Engineering |
+| Remotive | Remote Jobs | Software, DevOps, Data |
+| ProFellow | Fellowships | Scholarships & Fellowships |
+| RemoteOK | Remote Jobs | AI, ML, Internships |
+| arXiv | Research | cs.CV, cs.RO, cs.AI papers |
+| HN Jobs | Jobs | Startup jobs |
+### With Free API Keys
+| Source | Type | Coverage |
+|--------|------|----------|
+| Adzuna | Jobs | Indeed, Monster, CareerBuilder |
+| Jooble | Jobs | LinkedIn, Indeed, Glassdoor (70+ sources) |
+| RapidAPI LinkedIn | Jobs | Direct LinkedIn job listings |
+| Superteam | Web3 | Bounties, grants |
+## API Endpoints
+| Endpoint | Method | Description |
+|----------|--------|-------------|
+| `/api/opportunities` | GET | List opportunities with filters |
+| `/api/opportunities/{id}` | GET | Get single opportunity |
+| `/api/opportunities/{id}/status` | PATCH | Update status (save, apply, dismiss) |
+| `/api/digest/daily` | GET | Get daily intelligence brief |
+| `/api/digest/weekly` | GET | Get weekly report |
+| `/api/digest/urgent` | GET | Get opportunities with deadlines |
+| `/api/ingest/run` | POST | Trigger full ingestion |
+| `/api/stats` | GET | Get system statistics |
+## Deployment
+### Local Development
+```bash
+uvicorn backend.main:app --reload
+```
+### Production (with Gunicorn)
+```bash
+gunicorn backend.main:app -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8000
+```
+### Docker (Optional)
+```dockerfile
+FROM python:3.11-slim
+WORKDIR /app
+COPY requirements.txt .
+RUN pip install -r requirements.txt
+COPY . .
+CMD ["uvicorn", "backend.main:app", "--host", "0.0.0.0", "--port", "8000"]
+```
+## Opportunity Categories
+- Scholarships & Fellowships
+- Internships & Jobs
+- Hackathons & Competitions
+- Research Opportunities
+- Grants & Funding
+- Open Source Programs
+- Web3 Bounties
+## Anti-Noise Rules
+PIOE automatically filters out:
+- Discussion posts ("How do I get an internship?")
+- Opinion-only content
+- Reposted/recycled news
+- "Top 10 tools" listicles
+- Low engagement social posts
+## License
+MIT
+---
+**Most people search. You detect.**
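
The README triggers ingestion with `curl -X POST`. A rough Python equivalent is sketched below using only the standard library. It assumes the server is listening on the default `http://localhost:8000` from the Quick Start; `build_ingest_request` and `trigger_ingestion` are illustrative names, not part of the repository.

```python
import urllib.request

# Assumption: default local address from the README's Quick Start.
BASE_URL = "http://localhost:8000"


def build_ingest_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request for /api/ingest/run."""
    return urllib.request.Request(f"{base_url}/api/ingest/run", method="POST")


def trigger_ingestion(base_url: str = BASE_URL, timeout: float = 30.0) -> int:
    """Send the request and return the HTTP status code.

    Raises urllib.error.URLError if the server is not reachable.
    """
    with urllib.request.urlopen(build_ingest_request(base_url), timeout=timeout) as resp:
        return resp.status
```

Separating request construction from sending keeps the URL and method checkable without a running server.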

backend/__init__.py ADDED

	@@ -0,0 +1,17 @@

+"""
+PIOE Backend - Init
+"""
+from .config import get_settings
+from .database import SessionLocal, init_db, get_db
+from .models import Opportunity, Source, OpportunityCategory, OpportunityStatus
+__all__ = [
+    "get_settings",
+    "SessionLocal",
+    "init_db",
+    "get_db",
+    "Opportunity",
+    "Source",
+    "OpportunityCategory",
+    "OpportunityStatus"
+]

backend/config.py ADDED

	@@ -0,0 +1,76 @@

+"""
+PIOE Configuration Management
+"""
+from pydantic_settings import BaseSettings
+from functools import lru_cache
+from typing import Literal
+class Settings(BaseSettings):
+    """Application settings loaded from environment variables."""
+    # AI Configuration
+    ai_provider: Literal["gemini", "openai"] = "gemini"
+    gemini_api_key: str = ""
+    openai_api_key: str = ""
+    # Reddit API
+    reddit_client_id: str = ""
+    reddit_client_secret: str = ""
+    reddit_user_agent: str = "PIOE/2.0"
+    # GitHub API
+    github_token: str = ""
+    # ===========================================
+    # JOB BOARD APIs (Optional - get free keys)
+    # ===========================================
+    # Adzuna API (Free: 250 req/day)
+    # Get at: https://developer.adzuna.com/
+    adzuna_app_id: str = ""
+    adzuna_api_key: str = ""
+    # Jooble API (Free tier available)
+    # Get at: https://jooble.org/api/about
+    jooble_api_key: str = ""
+    # RapidAPI LinkedIn Jobs (Free: 100 req/month)
+    # Get at: https://rapidapi.com/jaypat87/api/linkedin-jobs-search
+    rapidapi_key: str = ""
+    # ===========================================
+    # Database
+    # ===========================================
+    database_url: str = "sqlite:///./pioe.db"
+    # Ingestion
+    ingestion_interval_hours: int = 6
+    # Scoring Thresholds (lower = more results saved)
+    min_relevance_score: float = 0.3  # Lowered from 0.4 for more results
+    min_novelty_score: float = 0.3
+    min_credibility_score: float = 0.5
+    # Keywords for relevance scoring
+    high_priority_keywords: list[str] = [
+        "computer vision", "robotics", "ROS", "PyTorch", "TensorFlow",
+        "machine learning", "deep learning", "neural network",
+        "internship", "fellowship", "scholarship", "grant", "funding",
+        "hackathon", "competition", "challenge", "bounty",
+        "research assistant", "PhD", "postdoc", "hiring",
+        "early-stage", "seed", "Series A", "startup",
+        "AI", "artificial intelligence", "data science", "NLP"
+    ]
+    class Config:
+        env_file = ".env"
+        env_file_encoding = "utf-8"
+        extra = "ignore"
+@lru_cache
+def get_settings() -> Settings:
+    """Get cached settings instance."""
+    return Settings()
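
`get_settings()` is wrapped in `lru_cache`, so the environment is read once per process. The sketch below illustrates that behaviour with a plain dataclass instead of pydantic, to keep it dependency-free. `DemoSettings` and `get_demo_settings` are stand-in names, and only two of the real fields are mirrored.

```python
import os
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class DemoSettings:
    """Stand-in for Settings; the real class uses pydantic BaseSettings."""
    ai_provider: str = "gemini"
    ingestion_interval_hours: int = 6


@lru_cache
def get_demo_settings() -> DemoSettings:
    """Build settings once from the environment, then reuse the instance."""
    return DemoSettings(
        ai_provider=os.environ.get("AI_PROVIDER", "gemini"),
        ingestion_interval_hours=int(os.environ.get("INGESTION_INTERVAL_HOURS", "6")),
    )
```

One consequence worth knowing: changing an environment variable after the first call has no effect until `get_demo_settings.cache_clear()` is called, which matters mostly in tests.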

backend/database.py ADDED

	@@ -0,0 +1,32 @@

+"""
+PIOE Database Configuration
+"""
+from sqlalchemy import create_engine
+from sqlalchemy.orm import sessionmaker, declarative_base
+from .config import get_settings
+settings = get_settings()
+engine = create_engine(
+    settings.database_url,
+    connect_args={"check_same_thread": False} if "sqlite" in settings.database_url else {}
+)
+SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
+Base = declarative_base()
+def get_db():
+    """Dependency for FastAPI to get database session."""
+    db = SessionLocal()
+    try:
+        yield db
+    finally:
+        db.close()
+def init_db():
+    """Initialize database tables."""
+    from . import models  # noqa: F401
+    Base.metadata.create_all(bind=engine)
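
`get_db()` relies on the generator `try`/`yield`/`finally` pattern so the session is closed even when the request handler fails. The sketch below shows the same pattern with the standard library's `sqlite3` instead of SQLAlchemy; `get_conn` and `count_rows` are illustrative names, and the table is made up for the demonstration.

```python
import sqlite3
from typing import Iterator


def get_conn(path: str = ":memory:") -> Iterator[sqlite3.Connection]:
    """Yield a connection and always close it, mirroring get_db()."""
    conn = sqlite3.connect(path)
    try:
        yield conn
    finally:
        conn.close()


def count_rows() -> int:
    """Drive the generator by hand, roughly the way a framework would."""
    gen = get_conn()
    conn = next(gen)  # runs the generator up to the yield
    try:
        conn.execute("CREATE TABLE opportunity (id INTEGER PRIMARY KEY, title TEXT)")
        conn.execute("INSERT INTO opportunity (title) VALUES ('demo')")
        return conn.execute("SELECT COUNT(*) FROM opportunity").fetchone()[0]
    finally:
        gen.close()  # raises GeneratorExit at the yield, so conn.close() runs
```

FastAPI performs the equivalent of `next(gen)` and `gen.close()` itself when a dependency is declared with `Depends`.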

backend/delivery/__init__.py ADDED

	@@ -0,0 +1,6 @@

+"""
+PIOE Delivery Layer - Init
+"""
+from .digest import DigestGenerator
+__all__ = ["DigestGenerator"]

backend/delivery/digest.py ADDED

	@@ -0,0 +1,171 @@

+"""
+PIOE Delivery Layer - Daily Digest Generator
+"""
+from datetime import datetime, timedelta
+from typing import Optional
+from sqlalchemy.orm import Session
+from ..models import Opportunity, OpportunityCategory, OpportunityStatus
+class DigestGenerator:
+    """
+    Generates daily/weekly opportunity digests.
+    Outputs in markdown format for easy reading.
+    """
+    def __init__(self, db: Session):
+        self.db = db
+    def generate_daily(self, limit: int = 10) -> str:
+        """Generate today's top opportunities digest."""
+        since = datetime.utcnow() - timedelta(days=1)
+        opportunities = self.db.query(Opportunity).filter(
+            Opportunity.discovered_at >= since,
+            Opportunity.status == OpportunityStatus.NEW
+        ).order_by(
+            Opportunity.combined_score.desc()
+        ).limit(limit).all()
+        return self._format_digest(opportunities, "Daily Intelligence Brief")
+    def generate_weekly(self, limit: int = 25) -> str:
+        """Generate weekly opportunities digest."""
+        since = datetime.utcnow() - timedelta(days=7)
+        opportunities = self.db.query(Opportunity).filter(
+            Opportunity.discovered_at >= since,
+            Opportunity.status == OpportunityStatus.NEW
+        ).order_by(
+            Opportunity.combined_score.desc()
+        ).limit(limit).all()
+        return self._format_digest(opportunities, "Weekly Intelligence Report")
+    def generate_by_category(
+        self,
+        category: OpportunityCategory,
+        limit: int = 10
+    ) -> str:
+        """Generate digest for a specific category."""
+        since = datetime.utcnow() - timedelta(days=7)
+        opportunities = self.db.query(Opportunity).filter(
+            Opportunity.discovered_at >= since,
+            Opportunity.category == category,
+            Opportunity.status == OpportunityStatus.NEW
+        ).order_by(
+            Opportunity.combined_score.desc()
+        ).limit(limit).all()
+        return self._format_digest(
+            opportunities,
+            f"{category.value.title()} Opportunities"
+        )
+    def generate_urgent(self, limit: int = 10) -> str:
+        """Generate digest for time-sensitive opportunities."""
+        now = datetime.utcnow()
+        soon = now + timedelta(days=14)
+        opportunities = self.db.query(Opportunity).filter(
+            Opportunity.deadline.isnot(None),
+            Opportunity.deadline > now,
+            Opportunity.deadline <= soon,
+            Opportunity.status == OpportunityStatus.NEW
+        ).order_by(
+            Opportunity.deadline.asc()
+        ).limit(limit).all()
+        return self._format_digest(opportunities, "⚡ Urgent - Deadlines Approaching")
+    def _format_digest(self, opportunities: list[Opportunity], title: str) -> str:
+        """Format opportunities into markdown digest."""
+        lines = [
+            f"# {title}",
+            f"*Generated: {datetime.utcnow().strftime('%Y-%m-%d %H:%M UTC')}*",
+            "",
+            f"**{len(opportunities)} opportunities detected**",
+            "",
+            "---",
+            ""
+        ]
+        if not opportunities:
+            lines.append("*No new opportunities matching your criteria.*")
+            return "\n".join(lines)
+        for i, opp in enumerate(opportunities, 1):
+            lines.extend(self._format_opportunity(opp, i))
+        # Summary stats
+        lines.extend([
+            "",
+            "---",
+            "",
+            "## Quick Stats",
+            "",
+            self._generate_stats(opportunities)
+        ])
+        return "\n".join(lines)
+    def _format_opportunity(self, opp: Opportunity, index: int) -> list[str]:
+        """Format single opportunity."""
+        # Category emoji
+        cat_emoji = {
+            OpportunityCategory.SCHOLARSHIP: "🎓",
+            OpportunityCategory.FELLOWSHIP: "🏆",
+            OpportunityCategory.INTERNSHIP: "💼",
+            OpportunityCategory.JOB: "👔",
+            OpportunityCategory.HACKATHON: "🚀",
+            OpportunityCategory.COMPETITION: "🏅",
+            OpportunityCategory.GRANT: "💰",
+            OpportunityCategory.RESEARCH: "🔬",
+            OpportunityCategory.OPEN_SOURCE: "💻",
+            OpportunityCategory.CONFERENCE: "📅",
+        }.get(opp.category, "📌")
+        # Score indicator
+        score_stars = "⭐" * min(int(opp.combined_score * 5), 5)
+        lines = [
+            f"### {index}. {cat_emoji} {opp.title}",
+            "",
+            f"**Category:** {opp.category.value.replace('_', ' ').title()}",
+            f"**Domain:** {opp.domain.value.replace('_', ' ').title()}",
+            f"**Source:** {opp.source_name}",
+            f"**Score:** {score_stars} ({opp.combined_score:.2f})",
+        ]
+        if opp.deadline:
+            days_left = (opp.deadline - datetime.utcnow()).days
+            urgency = "🔴" if days_left < 7 else "🟡" if days_left < 14 else "🟢"
+            lines.append(f"**Deadline:** {urgency} {opp.deadline.strftime('%Y-%m-%d')} ({days_left} days)")
+        lines.extend([
+            "",
+            f"> {opp.raw_text[:300]}..." if len(opp.raw_text or '') > 300 else f"> {opp.raw_text or ''}",
+            "",
+            f"🔗 [View Opportunity]({opp.url})",
+            "",
+            "---",
+            ""
+        ])
+        return lines
+    def _generate_stats(self, opportunities: list[Opportunity]) -> str:
+        """Generate summary statistics."""
+        from collections import Counter
+        categories = Counter(o.category.value for o in opportunities)
+        domains = Counter(o.domain.value for o in opportunities)
+        stats = ["| Metric | Value |", "|--------|-------|"]
+        for cat, count in categories.most_common(5):
+            stats.append(f"| {cat.replace('_', ' ').title()} | {count} |")
+        return "\n".join(stats)
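
The star rating and deadline marker in `_format_opportunity` are compact enough to miss on a first read. The sketch below pulls the two expressions out into standalone functions with the same arithmetic; the function names are illustrative and `now` is passed in explicitly so the result is deterministic.

```python
from datetime import datetime, timedelta


def score_stars(combined_score: float) -> str:
    """Map a 0..1 score to 0..5 stars, truncating and capping at five."""
    return "⭐" * min(int(combined_score * 5), 5)


def urgency_marker(deadline: datetime, now: datetime) -> tuple[str, int]:
    """Return the colour marker and whole days left until the deadline."""
    days_left = (deadline - now).days
    marker = "🔴" if days_left < 7 else "🟡" if days_left < 14 else "🟢"
    return marker, days_left


if __name__ == "__main__":
    now = datetime(2024, 1, 17)
    print(score_stars(0.5))
    print(urgency_marker(now + timedelta(days=10), now))
```

Because `int()` truncates, a score below 0.2 renders no stars at all, and anything at or above 1.0 is capped at five.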

backend/ingestion/__init__.py ADDED

	@@ -0,0 +1,27 @@

+"""
+PIOE Ingestion Layer - Version 2.0
+"""
+from .arxiv_client import ArxivClient
+from .github_client import GitHubClient
+from .rss_client import RSSClient
+from .reddit_client import RedditClient
+from .superteam_client import SuperteamClient
+from .web_scraper import WebScraper
+from .careers_client import CareersClient, InternshipClient
+from .grants_client import GrantsClient, NigeriaGrantsClient
+from .scheduler import IngestionScheduler
+__all__ = [
+    "ArxivClient",
+    "GitHubClient",
+    "RSSClient",
+    "RedditClient",
+    "SuperteamClient",
+    "WebScraper",
+    "CareersClient",
+    "InternshipClient",
+    "GrantsClient",
+    "NigeriaGrantsClient",
+    "IngestionScheduler"
+]

backend/ingestion/arxiv_client.py ADDED

	@@ -0,0 +1,124 @@

+"""
+PIOE arXiv Client
+Fetches papers from the arXiv API for the cs.CV, cs.RO, cs.AI, cs.LG, and cs.CL categories.
+"""
+import httpx
+from datetime import datetime
+from typing import Optional
+import xml.etree.ElementTree as ET
+class ArxivClient:
+    """
+    Client for arXiv API to fetch recent papers.
+    High credibility source for academic research.
+    """
+    BASE_URL = "https://export.arxiv.org/api/query"
+    # Target categories for PIOE
+    CATEGORIES = [
+        "cs.CV",   # Computer Vision
+        "cs.RO",   # Robotics
+        "cs.AI",   # Artificial Intelligence
+        "cs.LG",   # Machine Learning
+        "cs.CL",   # Computation and Language (NLP)
+    ]
+    def __init__(self, max_results: int = 50):
+        self.max_results = max_results
+    async def fetch(self, categories: Optional[list[str]] = None) -> list[dict]:
+        """
+        Fetch recent papers from specified categories.
+        Returns list of normalized opportunity dicts.
+        """
+        categories = categories or self.CATEGORIES
+        # Build query for multiple categories
+        cat_query = " OR ".join(f"cat:{cat}" for cat in categories)
+        params = {
+            "search_query": cat_query,
+            "start": 0,
+            "max_results": self.max_results,
+            "sortBy": "submittedDate",
+            "sortOrder": "descending"
+        }
+        async with httpx.AsyncClient() as client:
+            response = await client.get(
+                self.BASE_URL,
+                params=params,
+                timeout=30,
+                follow_redirects=True
+            )
+            response.raise_for_status()
+        return self._parse_response(response.text)
+    def _parse_response(self, xml_content: str) -> list[dict]:
+        """Parse arXiv Atom feed into normalized opportunities."""
+        opportunities = []
+        # Parse XML
+        root = ET.fromstring(xml_content)
+        ns = {"atom": "http://www.w3.org/2005/Atom"}
+        for entry in root.findall("atom:entry", ns):
+            try:
+                # Extract fields
+                title = entry.find("atom:title", ns)
+                summary = entry.find("atom:summary", ns)
+                published = entry.find("atom:published", ns)
+                link = entry.find("atom:id", ns)
+                # Get authors
+                authors = [
+                    author.find("atom:name", ns).text
+                    for author in entry.findall("atom:author", ns)
+                    if author.find("atom:name", ns) is not None
+                ]
+                # Get categories
+                categories = [
+                    cat.get("term") for cat in entry.findall("atom:category", ns)
+                ]
+                opportunity = {
+                    "title": title.text.strip().replace("\n", " ") if title is not None else "",
+                    "raw_text": summary.text.strip().replace("\n", " ") if summary is not None else "",
+                    "url": link.text if link is not None else "",
+                    "source_type": "arxiv",
+                    "source_name": "arXiv",
+                    "published_at": self._parse_date(published.text) if published is not None else None,
+                    "metadata": {
+                        "authors": authors,
+                        "categories": categories
+                    }
+                }
+                opportunities.append(opportunity)
+            except Exception as e:
+                print(f"Error parsing arXiv entry: {e}")
+                continue
+        return opportunities
+    def _parse_date(self, date_str: str) -> Optional[datetime]:
+        """Parse arXiv date format."""
+        try:
+            return datetime.fromisoformat(date_str.replace("Z", "+00:00"))
+        except Exception:
+            return None
+# Sync wrapper for non-async usage
+def fetch_arxiv_sync(max_results: int = 50) -> list[dict]:
+    """Synchronous wrapper for arXiv fetch."""
+    import asyncio
+    client = ArxivClient(max_results)
+    return asyncio.run(client.fetch())
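
The query building and Atom parsing in `ArxivClient` can be exercised offline. The sketch below does so against a hand-written feed: the entry in `SAMPLE_FEED` is invented placeholder data, not a real paper, and `build_query` and `parse_entries` are illustrative names. Unlike `_parse_response`, it collapses all whitespace runs in the title rather than only replacing newlines, which is a deliberate swap.

```python
import xml.etree.ElementTree as ET

NS = {"atom": "http://www.w3.org/2005/Atom"}

# Invented placeholder entry, shaped like an arXiv Atom feed.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <title>A Made-Up
      Title</title>
    <summary>Placeholder abstract.</summary>
    <published>2024-01-17T00:00:00Z</published>
    <author><name>Ada Example</name></author>
    <category term="cs.CV"/>
  </entry>
</feed>"""


def build_query(categories):
    """Join categories into the search_query string, as fetch() does."""
    return " OR ".join(f"cat:{c}" for c in categories)


def parse_entries(xml_text):
    """Extract title, URL, authors, and categories from each Atom entry."""
    root = ET.fromstring(xml_text)
    items = []
    for entry in root.findall("atom:entry", NS):
        title = entry.findtext("atom:title", default="", namespaces=NS)
        items.append({
            # Collapse newlines and repeated spaces from wrapped titles.
            "title": " ".join(title.split()),
            "url": entry.findtext("atom:id", default="", namespaces=NS),
            "authors": [
                a.findtext("atom:name", default="", namespaces=NS)
                for a in entry.findall("atom:author", NS)
            ],
            "categories": [c.get("term") for c in entry.findall("atom:category", NS)],
        })
    return items
```

Using `findtext` with a default avoids the repeated `is not None` checks, at the cost of not distinguishing a missing element from an empty one.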

backend/ingestion/careers_client.py ADDED

	@@ -0,0 +1,403 @@

+"""
+PIOE Careers Client
+Tracks job/internship opportunities from major tech companies.
+Microsoft, NVIDIA, Google, Meta, OpenAI, DeepMind, etc.
+"""
+import httpx
+from datetime import datetime
+from typing import Optional
+from bs4 import BeautifulSoup
+import re
+class CareersClient:
+    """
+    Scrapes career pages from major tech companies.
+    Focuses on AI, robotics, and computer vision roles.
+    """
+    # Target companies with their career page configurations
+    COMPANIES = [
+        # Microsoft
+        {
+            "name": "Microsoft",
+            "search_url": "https://careers.microsoft.com/v2/global/en/search.json",
+            "type": "api",
+            "keywords": ["computer vision", "robotics", "machine learning", "AI", "research"],
+            "filters": {"lc": "United States", "exp": "Internship"}
+        },
+        # NVIDIA
+        {
+            "name": "NVIDIA",
+            "search_url": "https://nvidia.wd5.myworkdayjobs.com/wday/cxs/nvidia/NVIDIAExternalCareerSite/jobs",
+            "type": "workday",
+            "keywords": ["computer vision", "robotics", "deep learning", "AI research", "intern"]
+        },
+        # Google
+        {
+            "name": "Google",
+            "rss_url": "https://careers.google.com/jobs/rss",
+            "type": "rss",
+            "keywords": ["machine learning", "research", "robotics", "computer vision", "intern"]
+        },
+        # Meta
+        {
+            "name": "Meta",
+            "search_url": "https://www.metacareers.com/jobs",
+            "type": "scrape",
+            "keywords": ["AI", "research", "robotics", "computer vision", "intern"]
+        },
+        # OpenAI
+        {
+            "name": "OpenAI",
+            "careers_url": "https://openai.com/careers",
+            "type": "scrape",
+            "keywords": ["research", "engineering", "intern"]
+        },
+        # DeepMind
+        {
+            "name": "DeepMind",
+            "careers_url": "https://deepmind.google/about/careers/",
+            "type": "scrape",
+            "keywords": ["research", "intern", "robotics"]
+        },
+        # Boston Dynamics
+        {
+            "name": "Boston Dynamics",
+            "careers_url": "https://bostondynamics.wd1.myworkdayjobs.com/Boston_Dynamics",
+            "type": "workday",
+            "keywords": ["robotics", "perception", "control", "intern"]
+        },
+        # Tesla (Optimus/AI)
+        {
+            "name": "Tesla AI",
+            "careers_url": "https://www.tesla.com/careers/search/?query=AI%20robotics",
+            "type": "scrape",
+            "keywords": ["autopilot", "optimus", "robotics", "computer vision", "intern"]
+        },
+    ]
+    # Internship-specific keywords
+    INTERNSHIP_KEYWORDS = [
+        "intern", "internship", "co-op", "summer", "student",
+        "graduate", "new grad", "entry level", "early career"
+    ]
+    def __init__(self):
+        self._headers = {
+            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
+        }
+    async def fetch_all(self, internship_only: bool = False) -> list[dict]:
+        """
+        Fetch opportunities from all configured companies.
+        Args:
+            internship_only: If True, filter to only internship positions
+        """
+        all_opportunities = []
+        for company in self.COMPANIES:
+            try:
+                opps = await self.fetch_company(company)
+                if internship_only:
+                    opps = [o for o in opps if self._is_internship(o)]
+                all_opportunities.extend(opps)
+            except Exception as e:
+                print(f"Error fetching {company['name']}: {e}")
+        return all_opportunities
+    async def fetch_company(self, company: dict) -> list[dict]:
+        """Fetch jobs from a specific company."""
+        if company["type"] == "scrape":
+            return await self._scrape_careers_page(company)
+        elif company["type"] == "rss":
+            return await self._fetch_rss_careers(company)
+        elif company["type"] == "workday":
+            return await self._fetch_workday(company)
+        else:
+            return await self._scrape_careers_page(company)
+    async def _scrape_careers_page(self, company: dict) -> list[dict]:
+        """Scrape a generic careers page."""
+        url = company.get("careers_url") or company.get("search_url")
+        async with httpx.AsyncClient() as client:
+            response = await client.get(
+                url,
+                headers=self._headers,
+                timeout=30,
+                follow_redirects=True
+            )
+            response.raise_for_status()
+        soup = BeautifulSoup(response.text, "html.parser")
+        opportunities = []
+        # Look for job listing elements (common patterns)
+        job_selectors = [
+            "article", ".job-listing", ".job-card", ".position",
+            "[data-job]", ".career-item", ".opening"
+        ]
+        jobs = []
+        for selector in job_selectors:
+            jobs = soup.select(selector)
+            if jobs:
+                break
+        for job in jobs[:30]:
+            try:
+                title_el = job.select_one("h2, h3, h4, .title, .job-title")
+                link_el = job.select_one("a[href]")
+                location_el = job.select_one(".location, .job-location")
+                if not title_el:
+                    continue
+                title = title_el.get_text(strip=True)
+                # Filter by keywords
+                if not self._matches_keywords(title, company.get("keywords", [])):
+                    continue
+                link = ""
+                if link_el and link_el.get("href"):
+                    href = link_el["href"]
+                    if href.startswith("http"):
+                        link = href
+                    else:
+                        from urllib.parse import urljoin
+                        link = urljoin(url, href)
+                opportunity = {
+                    "title": f"[{company['name']}] {title}",
+                    "raw_text": job.get_text(strip=True)[:500],
+                    "url": link or url,
+                    "source_type": "web_scrape",
+                    "source_name": f"{company['name']} Careers",
+                    "published_at": datetime.utcnow(),
+                    "metadata": {
+                        "company": company["name"],
+                        "location": location_el.get_text(strip=True) if location_el else None,
+                        "is_internship": self._is_internship({"title": title})
+                    }
+                }
+                opportunities.append(opportunity)
+            except Exception as e:
+                print(f"Error parsing job listing: {e}")
+        return opportunities
+    async def _fetch_workday(self, company: dict) -> list[dict]:
+        """Fetch from Workday-based career sites."""
+        url = company.get("search_url") or company.get("careers_url")
+        # Workday API format
+        payload = {
+            "limit": 20,
+            "offset": 0,
+            "searchText": " ".join(company.get("keywords", [])[:3])
+        }
+        try:
+            async with httpx.AsyncClient() as client:
+                response = await client.post(
+                    url,
+                    json=payload,
+                    headers={**self._headers, "Content-Type": "application/json"},
+                    timeout=30
+                )
+                response.raise_for_status()
+            data = response.json()
+            jobs = data.get("jobPostings", [])
+            return [
+                {
+                    "title": f"[{company['name']}] {job.get('title', '')}",
+                    "raw_text": job.get("bulletFields", [""])[0] if job.get("bulletFields") else "",
+                    "url": job.get("externalPath", url),
+                    "source_type": "web_scrape",
+                    "source_name": f"{company['name']} Careers",
+                    "published_at": datetime.utcnow(),
+                    "metadata": {
+                        "company": company["name"],
+                        "location": job.get("locationsText"),
+                        "is_internship": self._is_internship({"title": job.get("title", "")})
+                    }
+                }
+                for job in jobs
+            ]
+        except Exception as e:
+            print(f"Workday fetch error: {e}")
+            return await self._scrape_careers_page(company)
+    async def _fetch_rss_careers(self, company: dict) -> list[dict]:
+        """Fetch from RSS-based career feeds."""
+        import feedparser
+        url = company.get("rss_url")
+        if not url:
+            return []
+        async with httpx.AsyncClient() as client:
+            response = await client.get(url, headers=self._headers, timeout=30)
+            content = response.text
+        feed = feedparser.parse(content)
+        opportunities = []
+        for entry in feed.entries[:20]:
+            title = entry.get("title", "")
+            if not self._matches_keywords(title, company.get("keywords", [])):
+                continue
+            opportunities.append({
+                "title": f"[{company['name']}] {title}",
+                "raw_text": entry.get("summary", "")[:500],
+                "url": entry.get("link", ""),
+                "source_type": "rss",
+                "source_name": f"{company['name']} Careers",
+                "published_at": datetime.utcnow(),
+                "metadata": {
+                    "company": company["name"],
+                    "is_internship": self._is_internship({"title": title})
+                }
+            })
+        return opportunities
+    def _matches_keywords(self, text: str, keywords: list[str]) -> bool:
+        """Check if text matches any keyword."""
+        if not keywords:
+            return True
+        text_lower = text.lower()
+        return any(kw.lower() in text_lower for kw in keywords)
+    def _is_internship(self, opportunity: dict) -> bool:
+        """Check if opportunity is an internship."""
+        title = opportunity.get("title", "").lower()
+        text = opportunity.get("raw_text", "").lower()
+        combined = f"{title} {text}"
+        return any(kw in combined for kw in self.INTERNSHIP_KEYWORDS)
+class InternshipClient:
+    """
+    Dedicated client for finding internship opportunities.
+    Aggregates from multiple sources with internship focus.
+    """
+    # Internship-focused sites
+    INTERNSHIP_SOURCES = [
+        {
+            "name": "LinkedIn Internships",
+            "url": "https://www.linkedin.com/jobs/search/?keywords=AI%20robotics%20internship",
+            "type": "scrape"
+        },
+        {
+            "name": "Indeed Internships",
+            "url": "https://www.indeed.com/jobs?q=machine+learning+intern",
+            "type": "scrape"
+        },
+        {
+            "name": "Glassdoor Internships",
+            "url": "https://www.glassdoor.com/Job/computer-vision-intern-jobs-SRCH_KO0,22.htm",
+            "type": "scrape"
+        },
+        {
+            "name": "WayUp",
+            "url": "https://www.wayup.com/s/internships/computer-science/",
+            "type": "scrape"
+        },
+        {
+            "name": "Handshake",
+            "url": "https://joinhandshake.com",
+            "type": "scrape"
+        }
+    ]
+    def __init__(self):
+        self.careers_client = CareersClient()
+        self._headers = {
+            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
+        }
+    async def fetch_all(self) -> list[dict]:
+        """Fetch internships from all sources."""
+        opportunities = []
+        # Get internships from major companies
+        try:
+            company_internships = await self.careers_client.fetch_all(internship_only=True)
+            opportunities.extend(company_internships)
+        except Exception as e:
+            print(f"Careers client error: {e}")
+        # Scrape internship-focused sites
+        for source in self.INTERNSHIP_SOURCES[:3]:  # Limit to avoid rate limiting
+            try:
+                opps = await self._scrape_internship_site(source)
+                opportunities.extend(opps)
+            except Exception as e:
+                print(f"Error fetching {source['name']}: {e}")
+        return opportunities
+    async def _scrape_internship_site(self, source: dict) -> list[dict]:
+        """Scrape an internship-focused site."""
+        try:
+            async with httpx.AsyncClient() as client:
+                response = await client.get(
+                    source["url"],
+                    headers=self._headers,
+                    timeout=30,
+                    follow_redirects=True
+                )
+                response.raise_for_status()
+        except Exception:
+            return []
+        soup = BeautifulSoup(response.text, "html.parser")
+        opportunities = []
+        # Find job cards
+        cards = soup.select(".job-card, .job-listing, article, .result")[:15]
+        for card in cards:
+            try:
+                title_el = card.select_one("h2, h3, .title, .job-title")
+                if not title_el:
+                    continue
+                title = title_el.get_text(strip=True)
+                link_el = card.select_one("a[href]")
+                link = ""
+                if link_el and link_el.get("href"):
+                    from urllib.parse import urljoin
+                    link = urljoin(source["url"], link_el["href"])
+                opportunities.append({
+                    "title": f"[Internship] {title}",
+                    "raw_text": card.get_text(strip=True)[:500],
+                    "url": link or source["url"],
+                    "source_type": "web_scrape",
+                    "source_name": source["name"],
+                    "published_at": datetime.utcnow(),
+                    "metadata": {
+                        "is_internship": True,
+                        "source_site": source["name"]
+                    }
+                })
+            except Exception:
+                continue
+        return opportunities
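The keyword helpers in careers_client.py above use plain substring matching. A minimal standalone sketch of that behaviour follows. The contents of `INTERNSHIP_KEYWORDS` are an assumption made for illustration (the real constant is defined outside this excerpt), and the function names are hypothetical free-function versions of the methods.

```python
# Assumed keyword list; the actual INTERNSHIP_KEYWORDS is not shown in this diff.
INTERNSHIP_KEYWORDS = ("intern", "internship", "co-op")


def matches_keywords(text: str, keywords: list[str]) -> bool:
    """Return True if any keyword occurs in text; an empty list matches everything."""
    if not keywords:
        return True
    text_lower = text.lower()
    return any(kw.lower() in text_lower for kw in keywords)


def is_internship(opportunity: dict) -> bool:
    """Substring check over the title and raw_text fields."""
    title = opportunity.get("title", "").lower()
    text = opportunity.get("raw_text", "").lower()
    combined = f"{title} {text}"
    return any(kw in combined for kw in INTERNSHIP_KEYWORDS)


if __name__ == "__main__":
    print(matches_keywords("Robotics Engineer", []))
    print(is_internship({"title": "ML Research Intern"}))
    # Substring matching also fires on longer words that contain a keyword.
    print(is_internship({"title": "International Sales Lead"}))
```

With the assumed keyword list, substring matching also flags titles such as "International Sales Lead", so a word-boundary match may be worth considering if precision matters.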

backend/ingestion/github_client.py ADDED

	@@ -0,0 +1,154 @@

+"""
+PIOE GitHub Client
+Tracks trending repositories and star velocity for AI/Robotics/CV projects.
+"""
+import httpx
+from datetime import datetime, timedelta
+from typing import Optional
+class GitHubClient:
+    """
+    Client for GitHub API to discover trending repositories.
+    Tracks star velocity and contributor growth.
+    """
+    BASE_URL = "https://api.github.com"
+    # Search queries for relevant topics
+    SEARCH_TOPICS = [
+        "computer-vision",
+        "robotics",
+        "machine-learning",
+        "deep-learning",
+        "ros",
+        "pytorch",
+        "transformers",
+        "llm"
+    ]
+    def __init__(self, token: Optional[str] = None, max_results: int = 30):
+        self.token = token
+        self.max_results = max_results
+        self._headers = {
+            "Accept": "application/vnd.github+json",
+            "X-GitHub-Api-Version": "2022-11-28"
+        }
+        if token:
+            self._headers["Authorization"] = f"Bearer {token}"
+    async def fetch_trending(self, topics: Optional[list[str]] = None) -> list[dict]:
+        """
+        Fetch recently popular repositories in target topics.
+        Returns list of normalized opportunity dicts.
+        """
+        topics = topics or self.SEARCH_TOPICS
+        opportunities = []
+        # Get repos pushed to in the last 7 days (the star threshold is applied in _search_repos)
+        week_ago = (datetime.utcnow() - timedelta(days=7)).strftime("%Y-%m-%d")
+        for topic in topics[:5]:  # Limit to avoid rate limiting
+            try:
+                repos = await self._search_repos(topic, week_ago)
+                opportunities.extend(repos)
+            except Exception as e:
+                print(f"GitHub search error for {topic}: {e}")
+        # Deduplicate by URL
+        seen_urls = set()
+        unique = []
+        for opp in opportunities:
+            if opp["url"] not in seen_urls:
+                seen_urls.add(opp["url"])
+                unique.append(opp)
+        return unique[:self.max_results]
+    async def _search_repos(self, topic: str, since_date: str) -> list[dict]:
+        """Search for repositories by topic."""
+        query = f"topic:{topic} pushed:>{since_date} stars:>50"
+        async with httpx.AsyncClient() as client:
+            response = await client.get(
+                f"{self.BASE_URL}/search/repositories",
+                params={
+                    "q": query,
+                    "sort": "stars",
+                    "order": "desc",
+                    "per_page": 10
+                },
+                headers=self._headers,
+                timeout=30,
+                follow_redirects=True
+            )
+            response.raise_for_status()
+        data = response.json()
+        return self._parse_repos(data.get("items", []), topic)
+    def _parse_repos(self, repos: list, topic: str) -> list[dict]:
+        """Parse GitHub repos into normalized opportunities."""
+        opportunities = []
+        for repo in repos:
+            try:
+                opportunity = {
+                    "title": f"[GitHub] {repo['full_name']}: {(repo.get('description') or '')[:100]}",
+                    "raw_text": repo.get("description", "") or "",
+                    "url": repo["html_url"],
+                    "source_type": "github",
+                    "source_name": f"GitHub/{topic}",
+                    "published_at": self._parse_date(repo.get("created_at")),
+                    "social_engagement": repo.get("stargazers_count", 0),
+                    "metadata": {
+                        "owner": repo["owner"]["login"],
+                        "stars": repo.get("stargazers_count", 0),
+                        "forks": repo.get("forks_count", 0),
+                        "language": repo.get("language"),
+                        "topics": repo.get("topics", []),
+                        "open_issues": repo.get("open_issues_count", 0),
+                        "updated_at": repo.get("updated_at")
+                    }
+                }
+                opportunities.append(opportunity)
+            except Exception as e:
+                print(f"Error parsing repo: {e}")
+        return opportunities
+    async def fetch_gsoc_repos(self) -> list[dict]:
+        """Fetch Google Summer of Code related repositories."""
+        async with httpx.AsyncClient() as client:
+            response = await client.get(
+                f"{self.BASE_URL}/search/repositories",
+                params={
+                    "q": "topic:gsoc OR topic:google-summer-of-code",
+                    "sort": "updated",
+                    "per_page": 20
+                },
+                headers=self._headers,
+                timeout=30,
+                follow_redirects=True
+            )
+            response.raise_for_status()
+        data = response.json()
+        repos = self._parse_repos(data.get("items", []), "gsoc")
+        # Mark as open source opportunity
+        for repo in repos:
+            repo["title"] = f"[GSoC] {repo['title'].replace('[GitHub] ', '')}"
+        return repos
+    def _parse_date(self, date_str: Optional[str]) -> Optional[datetime]:
+        """Parse GitHub date format."""
+        if not date_str:
+            return None
+        try:
+            return datetime.fromisoformat(date_str.replace("Z", "+00:00"))
+        except Exception:
+            return None
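The query construction, title formatting and URL de-duplication in github_client.py above can be exercised without network access. The sketch below restates them as pure functions; the function names and the explicit `now` parameter are hypothetical and only there to make the logic testable. It assumes, as the GitHub API does for repositories without a description, that `description` may be `None`.

```python
from datetime import datetime, timedelta


def build_query(topic: str, now: datetime, days: int = 7, min_stars: int = 50) -> str:
    """Build a repository search query for repos pushed to since `days` ago."""
    since = (now - timedelta(days=days)).strftime("%Y-%m-%d")
    return f"topic:{topic} pushed:>{since} stars:>{min_stars}"


def format_title(repo: dict) -> str:
    """Format a title, tolerating a missing or null description."""
    description = repo.get("description") or ""
    return f"[GitHub] {repo['full_name']}: {description[:100]}"


def dedupe_by_url(opportunities: list[dict]) -> list[dict]:
    """Keep the first opportunity seen for each URL, preserving order."""
    seen = set()
    unique = []
    for opp in opportunities:
        if opp["url"] not in seen:
            seen.add(opp["url"])
            unique.append(opp)
    return unique


if __name__ == "__main__":
    print(build_query("robotics", now=datetime(2024, 1, 8)))
    print(format_title({"full_name": "octo/demo", "description": None}))
```

De-duplication matters here because one repository can carry several of the searched topics and so appear in more than one result set.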

backend/ingestion/grants_client.py ADDED

	@@ -0,0 +1,385 @@

+"""
+PIOE Grants Client - Version 2.0
+Fetches grant opportunities from crypto ecosystems and funding platforms.
+High-leverage opportunities with money + credibility + access.
+"""
+import httpx
+from datetime import datetime
+from typing import Optional
+from bs4 import BeautifulSoup
+class GrantsClient:
+    """
+    Client for fetching grants from crypto ecosystems and funding platforms.
+    Prioritizes: Ethereum, Solana, Base, Starknet, Gitcoin.
+    """
+    # Grant sources with their configurations
+    GRANT_SOURCES = [
+        # Ethereum Ecosystem
+        {
+            "name": "Ethereum Foundation Grants",
+            "url": "https://esp.ethereum.foundation/",
+            "ecosystem": "ethereum",
+            "type": "ecosystem_grant",
+            "typical_size": (5000, 100000),
+        },
+        # Solana Ecosystem
+        {
+            "name": "Solana Foundation Grants",
+            "url": "https://solana.org/grants",
+            "ecosystem": "solana",
+            "type": "ecosystem_grant",
+            "typical_size": (5000, 50000),
+        },
+        # Base (Coinbase L2)
+        {
+            "name": "Base Builder Grants",
+            "url": "https://base.org/builders",
+            "ecosystem": "base",
+            "type": "ecosystem_grant",
+            "typical_size": (5000, 25000),
+        },
+        # Starknet
+        {
+            "name": "Starknet Grants",
+            "url": "https://www.starknet.io/ecosystem/grants/",
+            "ecosystem": "starknet",
+            "type": "ecosystem_grant",
+            "typical_size": (5000, 50000),
+        },
+        # Gitcoin
+        {
+            "name": "Gitcoin Grants",
+            "url": "https://gitcoin.co/grants",
+            "ecosystem": "gitcoin",
+            "type": "micro_grant",
+            "typical_size": (500, 10000),
+        },
+        # Protocol-specific
+        {
+            "name": "Uniswap Grants",
+            "url": "https://www.uniswapfoundation.org/grants",
+            "ecosystem": "ethereum",
+            "type": "ecosystem_grant",
+            "typical_size": (10000, 100000),
+        },
+        {
+            "name": "Aave Grants DAO",
+            "url": "https://aavegrants.org/",
+            "ecosystem": "ethereum",
+            "type": "ecosystem_grant",
+            "typical_size": (5000, 100000),
+        },
+        {
+            "name": "Polygon Grants",
+            "url": "https://polygon.technology/village/grants",
+            "ecosystem": "polygon",
+            "type": "ecosystem_grant",
+            "typical_size": (5000, 50000),
+        },
+    ]
+    # RSS/API sources for grants
+    GRANT_RSS_FEEDS = [
+        {
+            "name": "Ethereum Blog - Grants",
+            "url": "https://blog.ethereum.org/feed.xml",
+            "filter_keywords": ["grant", "funding", "ecosystem"],
+        },
+    ]
+    def __init__(self):
+        self._headers = {
+            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
+        }
+    async def fetch_all(self) -> list[dict]:
+        """Fetch grants from all configured sources."""
+        opportunities = []
+        # Fetch from grant pages
+        for source in self.GRANT_SOURCES:
+            try:
+                grants = await self._scrape_grant_page(source)
+                opportunities.extend(grants)
+            except Exception as e:
+                print(f"Error fetching {source['name']}: {e}")
+        return opportunities
+    async def _scrape_grant_page(self, source: dict) -> list[dict]:
+        """Scrape a grant program page for opportunities."""
+        try:
+            async with httpx.AsyncClient() as client:
+                response = await client.get(
+                    source["url"],
+                    headers=self._headers,
+                    timeout=30,
+                    follow_redirects=True
+                )
+                if response.status_code != 200:
+                    return []
+                html = response.text
+        except Exception as e:
+            print(f"HTTP error for {source['name']}: {e}")
+            return []
+        soup = BeautifulSoup(html, "html.parser")
+        # Create a single opportunity for the grant program
+        # (These pages describe the program, not individual grants)
+        opportunity = {
+            "title": f"[{source['ecosystem'].upper()}] {source['name']}",
+            "raw_text": self._extract_page_text(soup)[:2000],
+            "url": source["url"],
+            "source_type": "grant_platform",
+            "source_name": source["name"],
+            "published_at": datetime.utcnow(),
+            "metadata": {
+                "ecosystem": source["ecosystem"],
+                "grant_type": source["type"],
+                "grant_size_min": source["typical_size"][0],
+                "grant_size_max": source["typical_size"][1],
+                "region": "global",
+                "technical_depth": "intermediate",
+            }
+        }
+        return [opportunity]
+    def _extract_page_text(self, soup: BeautifulSoup) -> str:
+        """Extract meaningful text from page."""
+        # Remove scripts and styles
+        for tag in soup(["script", "style", "nav", "footer", "header"]):
+            tag.decompose()
+        # Get text
+        text = soup.get_text(separator=" ", strip=True)
+        return " ".join(text.split())[:2000]
+    async def fetch_active_rounds(self) -> list[dict]:
+        """Fetch currently active grant rounds from Gitcoin."""
+        # Gitcoin has an API for active rounds
+        try:
+            async with httpx.AsyncClient() as client:
+                # This is a simplified version - actual API may differ
+                response = await client.get(
+                    "https://api.gitcoin.co/grants/rounds/active",
+                    headers=self._headers,
+                    timeout=30,
+                    follow_redirects=True
+                )
+                if response.status_code == 200:
+                    data = response.json()
+                    return self._parse_gitcoin_rounds(data)
+        except Exception as e:
+            print(f"Error fetching Gitcoin rounds: {e}")
+        return []
+    def _parse_gitcoin_rounds(self, data: dict) -> list[dict]:
+        """Parse Gitcoin API response into opportunities."""
+        opportunities = []
+        for round_data in data.get("rounds", []):
+            opportunity = {
+                "title": f"[GITCOIN] {round_data.get('name', 'Gitcoin Round')}",
+                "raw_text": round_data.get("description", ""),
+                "url": f"https://gitcoin.co/grants/{round_data.get('id', '')}",
+                "source_type": "grant_platform",
+                "source_name": "Gitcoin",
+                "published_at": datetime.utcnow(),
+                "deadline": self._parse_date(round_data.get("end_date")),
+                "metadata": {
+                    "ecosystem": "gitcoin",
+                    "grant_type": "micro_grant",
+                    "matching_pool": round_data.get("matching_pool", 0),
+                    "grant_size_min": 100,
+                    "grant_size_max": 10000,
+                    "region": "global",
+                }
+            }
+            opportunities.append(opportunity)
+        return opportunities
+    def _parse_date(self, date_str: Optional[str]) -> Optional[datetime]:
+        """Parse date string."""
+        if not date_str:
+            return None
+        try:
+            return datetime.fromisoformat(date_str.replace("Z", "+00:00"))
+        except Exception:
+            return None
+class NigeriaGrantsClient:
+    """
+    Client for Nigeria-specific funding and grant opportunities.
+    Focuses on: NITDA, CcHub, BOI, Government programs.
+    """
+    # Nigeria-specific grant sources
+    NIGERIA_SOURCES = [
+        {
+            "name": "NITDA Programs",
+            "url": "https://nitda.gov.ng/",
+            "type": "innovation_fund",
+            "region": "nigeria",
+        },
+        {
+            "name": "CcHub Accelerator",
+            "url": "https://cchubnigeria.com/",
+            "type": "grant",
+            "region": "nigeria",
+        },
+        {
+            "name": "Tony Elumelu Foundation",
+            "url": "https://www.tonyelumelufoundation.org/",
+            "type": "grant",
+            "region": "africa",
+        },
+        {
+            "name": "Ventures Platform",
+            "url": "https://www.venturesplatform.com/",
+            "type": "investment",
+            "region": "africa",
+        },
+        {
+            "name": "BoI Youth Entrepreneurship",
+            "url": "https://www.boi.ng/",
+            "type": "innovation_fund",
+            "region": "nigeria",
+        },
+    ]
+    # RSS feeds for Nigeria tech news
+    NIGERIA_RSS = [
+        {"name": "TechCabal", "url": "https://techcabal.com/feed/"},
+        {"name": "Disrupt Africa", "url": "https://disrupt-africa.com/feed/"},
+    ]
+    def __init__(self):
+        self._headers = {
+            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
+        }
+    async def fetch_all(self) -> list[dict]:
+        """Fetch all Nigeria-specific opportunities."""
+        opportunities = []
+        # Fetch from Nigeria sources
+        for source in self.NIGERIA_SOURCES:
+            try:
+                opps = await self._fetch_source(source)
+                opportunities.extend(opps)
+            except Exception as e:
+                print(f"Error fetching {source['name']}: {e}")
+        # Fetch from RSS feeds
+        for feed in self.NIGERIA_RSS:
+            try:
+                opps = await self._fetch_rss(feed)
+                opportunities.extend(opps)
+            except Exception as e:
+                print(f"Error fetching {feed['name']}: {e}")
+        return opportunities
+    async def _fetch_source(self, source: dict) -> list[dict]:
+        """Fetch from a Nigeria source."""
+        try:
+            async with httpx.AsyncClient() as client:
+                response = await client.get(
+                    source["url"],
+                    headers=self._headers,
+                    timeout=30,
+                    follow_redirects=True
+                )
+                if response.status_code != 200:
+                    return []
+                html = response.text
+        except Exception as e:
+            print(f"HTTP error for {source['name']}: {e}")
+            return []
+        soup = BeautifulSoup(html, "html.parser")
+        # Create opportunity for the program
+        opportunity = {
+            "title": f"[NIGERIA] {source['name']}",
+            "raw_text": self._extract_text(soup)[:2000],
+            "url": source["url"],
+            "source_type": "gov_portal",
+            "source_name": source["name"],
+            "published_at": datetime.utcnow(),
+            "metadata": {
+                "region": source["region"],
+                "grant_type": source["type"],
+                "nigeria_specific": True,
+            }
+        }
+        return [opportunity]
+    async def _fetch_rss(self, feed: dict) -> list[dict]:
+        """Fetch from an RSS feed and filter for opportunities."""
+        import feedparser
+        try:
+            async with httpx.AsyncClient() as client:
+                response = await client.get(
+                    feed["url"],
+                    headers=self._headers,
+                    timeout=30,
+                    follow_redirects=True
+                )
+                content = response.text
+        except Exception as e:
+            print(f"Error fetching {feed['name']}: {e}")
+            return []
+        parsed = feedparser.parse(content)
+        opportunities = []
+        # Keywords indicating opportunities
+        opportunity_keywords = [
+            "grant", "funding", "accelerator", "apply", "opportunity",
+            "fellowship", "program", "investment", "startup", "launch"
+        ]
+        for entry in parsed.entries[:20]:
+            title = entry.get("title", "").lower()
+            summary = entry.get("summary", "").lower()
+            # Check if contains opportunity keywords
+            if any(kw in title or kw in summary for kw in opportunity_keywords):
+                opportunity = {
+                    "title": f"[AFRICA] {entry.get('title', '')}",
+                    "raw_text": entry.get("summary", "")[:2000],
+                    "url": entry.get("link", ""),
+                    "source_type": "rss",
+                    "source_name": feed["name"],
+                    "published_at": datetime.utcnow(),
+                    "metadata": {
+                        "region": "africa",
+                        "africa_focus": True,
+                    }
+                }
+                opportunities.append(opportunity)
+        return opportunities
+    def _extract_text(self, soup: BeautifulSoup) -> str:
+        """Extract text from soup."""
+        for tag in soup(["script", "style", "nav", "footer"]):
+            tag.decompose()
+        return " ".join(soup.get_text(separator=" ", strip=True).split())
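The RSS filtering step in `NigeriaGrantsClient._fetch_rss` above keeps only entries whose title or summary mentions an opportunity keyword. A minimal sketch of that filter as a pure function follows. It uses plain dicts in place of feedparser entries, which is an assumption made so the example runs without the library, and `filter_entries` is a hypothetical name.

```python
# Same keyword list as in _fetch_rss.
OPPORTUNITY_KEYWORDS = (
    "grant", "funding", "accelerator", "apply", "opportunity",
    "fellowship", "program", "investment", "startup", "launch",
)


def filter_entries(entries: list[dict], source_name: str, limit: int = 20) -> list[dict]:
    """Return normalized opportunities for entries that mention a keyword."""
    results = []
    for entry in entries[:limit]:
        title = entry.get("title", "")
        summary = entry.get("summary", "")
        title_lower, summary_lower = title.lower(), summary.lower()
        # Case-insensitive substring match against title or summary.
        if any(kw in title_lower or kw in summary_lower for kw in OPPORTUNITY_KEYWORDS):
            results.append({
                "title": f"[AFRICA] {title}",
                "raw_text": summary[:2000],
                "url": entry.get("link", ""),
                "source_name": source_name,
            })
    return results


if __name__ == "__main__":
    sample = [
        {"title": "New grant for founders", "summary": "", "link": "https://example.com/a"},
        {"title": "Weather update", "summary": "Rain expected", "link": "https://example.com/b"},
    ]
    for opp in filter_entries(sample, "TechCabal"):
        print(opp["title"])
```

Broad keywords such as "startup", "launch" and "program" will let through most tech-news items, so the filter works more as a coarse pre-screen than as a classifier.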

backend/ingestion/jobboard_client.py ADDED

	@@ -0,0 +1,472 @@

+"""
+PIOE Job Board Client
+Fetches REAL job opportunities from structured job board APIs.
+These return actual job listings, not discussions.
+Supports:
+- Arbeitnow (free, no key needed)
+- TheMuse (free, no key needed)
+- Remotive (free, no key needed)
+- Adzuna (free key, 250 req/day)
+- Jooble (free key, aggregates LinkedIn/Indeed/Glassdoor)
+- RapidAPI LinkedIn (free key, 100 req/month)
+"""
+import httpx
+from datetime import datetime
+from typing import Optional
+import re
+class JobBoardClient:
+    """
+    Client for structured job board APIs.
+    Returns actual job listings you can apply to.
+    Usage:
+        client = JobBoardClient(
+            adzuna_app_id="xxx",
+            adzuna_api_key="xxx",
+            jooble_api_key="xxx",
+            rapidapi_key="xxx"
+        )
+        jobs = await client.fetch_all()
+    """
+    def __init__(
+        self,
+        adzuna_app_id: str = "",
+        adzuna_api_key: str = "",
+        jooble_api_key: str = "",
+        rapidapi_key: str = ""
+    ):
+        self.adzuna_app_id = adzuna_app_id
+        self.adzuna_api_key = adzuna_api_key
+        self.jooble_api_key = jooble_api_key
+        self.rapidapi_key = rapidapi_key
+    async def fetch_all(self) -> list[dict]:
+        """Fetch from all available job board sources."""
+        opportunities = []
+        # === FREE APIs (no key needed) ===
+        # Arbeitnow (free job API)
+        try:
+            arbeitnow_jobs = await self.fetch_arbeitnow()
+            opportunities.extend(arbeitnow_jobs)
+            print(f"  Arbeitnow: {len(arbeitnow_jobs)} jobs")
+        except Exception as e:
+            print(f"  Arbeitnow error: {e}")
+        # TheMuse (free job API)
+        try:
+            muse_jobs = await self.fetch_themuse()
+            opportunities.extend(muse_jobs)
+            print(f"  TheMuse: {len(muse_jobs)} jobs")
+        except Exception as e:
+            print(f"  TheMuse error: {e}")
+        # Remotive (remote jobs, free)
+        try:
+            remote_jobs = await self.fetch_remotive()
+            opportunities.extend(remote_jobs)
+            print(f"  Remotive: {len(remote_jobs)} remote jobs")
+        except Exception as e:
+            print(f"  Remotive error: {e}")
+        # === APIs WITH FREE KEYS ===
+        # Adzuna (if API key provided)
+        if self.adzuna_app_id and self.adzuna_api_key:
+            try:
+                adzuna_jobs = await self.fetch_adzuna()
+                opportunities.extend(adzuna_jobs)
+                print(f"  Adzuna: {len(adzuna_jobs)} jobs")
+            except Exception as e:
+                print(f"  Adzuna error: {e}")
+        # Jooble (if API key provided) - aggregates LinkedIn, Indeed, Glassdoor
+        if self.jooble_api_key:
+            try:
+                jooble_jobs = await self.fetch_jooble()
+                opportunities.extend(jooble_jobs)
+                print(f"  Jooble: {len(jooble_jobs)} jobs (LinkedIn/Indeed/Glassdoor)")
+            except Exception as e:
+                print(f"  Jooble error: {e}")
+        # RapidAPI LinkedIn Jobs (if API key provided)
+        if self.rapidapi_key:
+            try:
+                linkedin_jobs = await self.fetch_linkedin_rapidapi()
+                opportunities.extend(linkedin_jobs)
+                print(f"  LinkedIn (via RapidAPI): {len(linkedin_jobs)} jobs")
+            except Exception as e:
+                print(f"  LinkedIn error: {e}")
+        return opportunities
+    # ===========================================
+    # FREE APIs (No registration needed)
+    # ===========================================
+    async def fetch_arbeitnow(self) -> list[dict]:
+        """Fetch from Arbeitnow API - free, no registration."""
+        opportunities = []
+        try:
+            url = "https://www.arbeitnow.com/api/job-board-api"
+            async with httpx.AsyncClient() as client:
+                response = await client.get(
+                    url,
+                    headers={"User-Agent": "PIOE/2.0"},
+                    timeout=30
+                )
+                if response.status_code != 200:
+                    return []
+                data = response.json()
+            for job in data.get("data", [])[:30]:
+                title = (job.get("title") or "").lower()
+                tags = " ".join(job.get("tags") or []).lower()
+                combined = f"{title} {tags}"
+                # Filter for relevant tech jobs ("ai" is matched as a whole word
+                # so it does not fire on substrings such as "email" or "retail")
+                keywords = ["machine learning", "data", "engineer", "developer",
+                           "software", "python", "intern", "research", "robotics",
+                           "backend", "frontend", "fullstack", "devops"]
+                if not (any(kw in combined for kw in keywords)
+                        or re.search(r"\bai\b", combined)):
+                    continue
+                opportunities.append({
+                    "title": f"[Arbeitnow] {job.get('title', '')}",
+                    "raw_text": self._strip_html(job.get("description", ""))[:2000],
+                    "url": job.get("url", ""),
+                    "source_type": "job",
+                    "source_name": f"Arbeitnow ({job.get('company_name', 'Unknown')})",
+                    "published_at": self._parse_date(job.get("created_at")),
+                    "metadata": {
+                        "company": job.get("company_name"),
+                        "location": job.get("location"),
+                        "remote": job.get("remote", False),
+                        "tags": job.get("tags", []),
+                        "region": "remote_global" if job.get("remote") else "global"
+                    }
+                })
+        except Exception as e:
+            print(f"    Arbeitnow fetch error: {e}")
+        return opportunities
+    async def fetch_themuse(self) -> list[dict]:
+        """Fetch from The Muse API - free, no registration."""
+        opportunities = []
+        categories = ["Data Science", "Engineering", "Software Engineering"]
+        for category in categories:
+            try:
+                url = "https://www.themuse.com/api/public/jobs"
+                params = {"category": category, "page": 1}
+                async with httpx.AsyncClient() as client:
+                    response = await client.get(
+                        url, params=params,
+                        headers={"User-Agent": "PIOE/2.0"},
+                        timeout=30
+                    )
+                    if response.status_code != 200:
+                        continue
+                    data = response.json()
+                for job in data.get("results", [])[:10]:
+                    company = job.get("company", {})
+                    opportunities.append({
+                        "title": f"[TheMuse] {job.get('name', '')}",
+                        "raw_text": self._strip_html(job.get("contents", ""))[:2000],
+                        "url": job.get("refs", {}).get("landing_page", ""),
+                        "source_type": "job",
+                        "source_name": f"TheMuse ({company.get('name', 'Unknown')})",
+                        "published_at": self._parse_date(job.get("publication_date")),
+                        "metadata": {
+                            "company": company.get("name"),
+                            "locations": [loc.get("name") for loc in job.get("locations", [])],
+                            "level": job.get("levels", [{}])[0].get("name") if job.get("levels") else None,
+                            "region": "global"
+                        }
+                    })
+            except Exception as e:
+                print(f"    TheMuse '{category}' error: {e}")
+        return opportunities
+    async def fetch_remotive(self) -> list[dict]:
+        """Fetch from Remotive API - free, no registration."""
+        opportunities = []
+        categories = ["software-dev", "data", "devops-sysadmin"]
+        for category in categories:
+            try:
+                url = "https://remotive.com/api/remote-jobs"
+                params = {"category": category, "limit": 15}
+                async with httpx.AsyncClient() as client:
+                    response = await client.get(url, params=params, timeout=30)
+                    if response.status_code != 200:
+                        continue
+                    data = response.json()
+                for job in data.get("jobs", []):
+                    title_lower = (job.get("title") or "").lower()
+                    # Skip non-tech roles
+                    skip_keywords = ["sales", "marketing", "recruiter", "hr ", "customer support"]
+                    if any(skip in title_lower for skip in skip_keywords):
+                        continue
+                    opportunities.append({
+                        "title": f"[Remote] {job.get('title', '')}",
+                        "raw_text": self._strip_html(job.get("description", ""))[:2000],
+                        "url": job.get("url", ""),
+                        "source_type": "job",
+                        "source_name": f"Remotive ({job.get('company_name', 'Unknown')})",
+                        "published_at": self._parse_date(job.get("publication_date")),
+                        "metadata": {
+                            "company": job.get("company_name"),
+                            "location": job.get("candidate_required_location"),
+                            "job_type": job.get("job_type"),
+                            "salary": job.get("salary"),
+                            "tags": job.get("tags", []),
+                            "region": "remote_global"
+                        }
+                    })
+            except Exception as e:
+                print(f"    Remotive '{category}' error: {e}")
+        return opportunities
+    # ===========================================
+    # APIs WITH FREE API KEYS
+    # ===========================================
+    async def fetch_adzuna(self) -> list[dict]:
+        """
+        Fetch from Adzuna API.
+        Free tier: 250 requests/day
+        Get key at: https://developer.adzuna.com/
+        """
+        opportunities = []
+        keywords = ["machine learning", "AI engineer", "data scientist", "robotics"]
+        for keyword in keywords[:2]:  # Limit to conserve quota
+            try:
+                url = "https://api.adzuna.com/v1/api/jobs/us/search/1"
+                params = {
+                    "app_id": self.adzuna_app_id,
+                    "app_key": self.adzuna_api_key,
+                    "what": keyword,
+                    "results_per_page": 10,
+                    "content-type": "application/json"
+                }
+                async with httpx.AsyncClient() as client:
+                    response = await client.get(url, params=params, timeout=30)
+                    if response.status_code != 200:
+                        continue
+                    data = response.json()
+                for job in data.get("results", []):
+                    company = job.get("company", {})
+                    location = job.get("location", {})
+                    opportunities.append({
+                        "title": f"[Adzuna] {job.get('title', '')}",
+                        "raw_text": job.get("description", "")[:2000],
+                        "url": job.get("redirect_url", ""),
+                        "source_type": "job",
+                        "source_name": f"Adzuna ({company.get('display_name', 'Unknown')})",
+                        "published_at": self._parse_date(job.get("created")),
+                        "metadata": {
+                            "company": company.get("display_name"),
+                            "location": location.get("display_name"),
+                            "salary_min": job.get("salary_min"),
+                            "salary_max": job.get("salary_max"),
+                            "contract_type": job.get("contract_type"),
+                            "region": "global"
+                        }
+                    })
+            except Exception as e:
+                print(f"    Adzuna '{keyword}' error: {e}")
+        return opportunities
+    async def fetch_jooble(self) -> list[dict]:
+        """
+        Fetch from Jooble API - aggregates 70+ sources including:
+        - LinkedIn
+        - Indeed
+        - Glassdoor
+        - Monster
+        - CareerBuilder
+        Free tier available.
+        Get key at: https://jooble.org/api/about
+        """
+        opportunities = []
+        search_queries = [
+            "machine learning engineer",
+            "AI internship",
+            "data scientist",
+            "robotics engineer",
+            "computer vision",
+            "scholarship",
+            "fellowship"
+        ]
+        for query in search_queries[:5]:  # Limit to conserve quota
+            try:
+                url = f"https://jooble.org/api/{self.jooble_api_key}"
+                payload = {
+                    "keywords": query,
+                    "location": "",  # Worldwide
+                }
+                async with httpx.AsyncClient() as client:
+                    response = await client.post(
+                        url,
+                        json=payload,
+                        headers={"Content-Type": "application/json"},
+                        timeout=30
+                    )
+                    if response.status_code != 200:
+                        continue
+                    data = response.json()
+                for job in data.get("jobs", [])[:10]:
+                    opportunities.append({
+                        "title": f"[Jooble] {job.get('title', '')}",
+                        "raw_text": self._strip_html(job.get("snippet", ""))[:2000],
+                        "url": job.get("link", ""),
+                        "source_type": "job",
+                        "source_name": f"Jooble ({job.get('company', 'Unknown')})",
+                        "published_at": self._parse_date(job.get("updated")),
+                        "metadata": {
+                            "company": job.get("company"),
+                            "location": job.get("location"),
+                            "salary": job.get("salary"),
+                            "source": job.get("source"),  # Original source (LinkedIn, Indeed, etc.)
+                            "region": "global"
+                        }
+                    })
+            except Exception as e:
+                print(f"    Jooble '{query}' error: {e}")
+        return opportunities
+    async def fetch_linkedin_rapidapi(self) -> list[dict]:
+        """
+        Fetch LinkedIn jobs via RapidAPI.
+        Free tier: 100 requests/month
+        Get key at: https://rapidapi.com/jaypat87/api/linkedin-jobs-search
+        """
+        opportunities = []
+        search_queries = [
+            "machine learning",
+            "AI engineer",
+            "computer vision intern",
+            "robotics"
+        ]
+        for query in search_queries[:2]:  # Limit to conserve quota
+            try:
+                url = "https://linkedin-jobs-search.p.rapidapi.com/"
+                payload = {
+                    "search_terms": query,
+                    "location": "United States",
+                    "page": "1"
+                }
+                headers = {
+                    "content-type": "application/json",
+                    "X-RapidAPI-Key": self.rapidapi_key,
+                    "X-RapidAPI-Host": "linkedin-jobs-search.p.rapidapi.com"
+                }
+                async with httpx.AsyncClient() as client:
+                    response = await client.post(
+                        url,
+                        json=payload,
+                        headers=headers,
+                        timeout=30
+                    )
+                    if response.status_code != 200:
+                        continue
+                    data = response.json()
+                for job in data[:10] if isinstance(data, list) else []:
+                    opportunities.append({
+                        "title": f"[LinkedIn] {job.get('job_title', '')}",
+                        "raw_text": job.get("job_description", "")[:2000],
+                        "url": job.get("linkedin_job_url_cleaned", job.get("job_url", "")),
+                        "source_type": "job",
+                        "source_name": f"LinkedIn ({job.get('company_name', 'Unknown')})",
+                        "published_at": self._parse_date(job.get("posted_date")),
+                        "metadata": {
+                            "company": job.get("company_name"),
+                            "location": job.get("job_location"),
+                            "linkedin_url": job.get("linkedin_job_url_cleaned"),
+                            "region": "global"
+                        }
+                    })
+            except Exception as e:
+                print(f"    LinkedIn '{query}' error: {e}")
+        return opportunities
+    # ===========================================
+    # HELPER METHODS
+    # ===========================================
+    def _parse_date(self, date_str: Optional[str]) -> Optional[datetime]:
+        """Parse various date formats."""
+        if not date_str:
+            return None
+        try:
+            if "T" in str(date_str):
+                return datetime.fromisoformat(str(date_str).replace("Z", "+00:00"))
+            return datetime.strptime(str(date_str)[:10], "%Y-%m-%d")
+        except Exception:
+            return None
+    def _strip_html(self, text: str) -> str:
+        """Remove HTML tags from text."""
+        if not text:
+            return ""
+        clean = re.sub(r'<[^>]+>', '', text)
+        return " ".join(clean.split())
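
The two helpers that close this file can be exercised on their own. The following is a minimal standalone sketch, assuming the same logic as `_parse_date` and `_strip_html` above; the free-function names `parse_date` and `strip_html` are illustrative and not part of the commit.

```python
import re
from datetime import datetime
from typing import Optional


def parse_date(date_str: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp or a YYYY-MM-DD date; return None on failure."""
    if not date_str:
        return None
    text = str(date_str)
    try:
        if "T" in text:
            # fromisoformat() on older Pythons rejects a trailing "Z",
            # so rewrite it as an explicit UTC offset first.
            return datetime.fromisoformat(text.replace("Z", "+00:00"))
        # Date-only strings: keep just the leading YYYY-MM-DD part.
        return datetime.strptime(text[:10], "%Y-%m-%d")
    except ValueError:
        return None


def strip_html(text: str) -> str:
    """Remove HTML tags and collapse runs of whitespace."""
    if not text:
        return ""
    without_tags = re.sub(r"<[^>]+>", "", text)
    return " ".join(without_tags.split())


if __name__ == "__main__":
    print(parse_date("2024-01-17T10:00:00Z"))
    print(parse_date("2024-01-17"))
    print(strip_html("<p>Hello <b>world</b></p>\n  again"))
```

One thing to keep in mind with this approach: a timestamp carrying `Z` or an offset comes back timezone-aware, while a date-only string comes back naive, so downstream code that compares or sorts `published_at` values may need to normalize them first.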

backend/ingestion/reddit_client.py ADDED Viewed

	@@ -0,0 +1,185 @@

+"""
+PIOE Reddit Client
+Monitors curated subreddits for opportunities with strict filtering.
+"""
+from datetime import datetime
+from typing import Optional
+import httpx
+class RedditClient:
+    """
+    Client for Reddit using public JSON API.
+    Note: For production, consider using PRAW with OAuth for better rate limits.
+    This implementation uses public endpoints which are rate-limited.
+    """
+    BASE_URL = "https://www.reddit.com"
+    # Curated subreddits for high-signal content
+    TARGET_SUBREDDITS = [
+        "computervision",
+        "robotics",
+        "MachineLearning",
+        "artificial",
+        "learnmachinelearning",
+        "deeplearning",
+        "hackathons",
+        "scholarships",
+        "cscareerquestions",
+        "roboticsengineering",
+    ]
+    # Keywords that indicate opportunities
+    OPPORTUNITY_KEYWORDS = [
+        "internship", "intern", "hiring", "job",
+        "hackathon", "competition", "challenge",
+        "scholarship", "fellowship", "grant", "funding",
+        "research assistant", "ra position", "phd",
+        "call for papers", "cfp", "workshop",
+        "applications open", "apply now", "deadline"
+    ]
+    # Keywords to filter out (noise)
+    NOISE_KEYWORDS = [
+        "meme", "funny", "eli5", "rant",
+        "top 10", "best tools", "what are",
+        "vs", "versus", "comparison"
+    ]
+    def __init__(self, user_agent: str = "PIOE/2.0"):
+        self.user_agent = user_agent
+        self._headers = {"User-Agent": user_agent}
+    async def fetch_all(self, subreddits: Optional[list[str]] = None) -> list[dict]:
+        """Fetch from all target subreddits with filtering."""
+        subreddits = subreddits or self.TARGET_SUBREDDITS
+        all_opportunities = []
+        for subreddit in subreddits:
+            try:
+                posts = await self.fetch_subreddit(subreddit)
+                all_opportunities.extend(posts)
+            except Exception as e:
+                print(f"Error fetching r/{subreddit}: {e}")
+        return all_opportunities
+    async def fetch_subreddit(
+        self,
+        subreddit: str,
+        sort: str = "new",
+        limit: int = 25
+    ) -> list[dict]:
+        """
+        Fetch posts from a subreddit with opportunity filtering.
+        Only returns posts that match opportunity keywords
+        and don't match noise keywords.
+        """
+        url = f"{self.BASE_URL}/r/{subreddit}/{sort}.json"
+        async with httpx.AsyncClient() as client:
+            response = await client.get(
+                url,
+                params={"limit": limit},
+                headers=self._headers,
+                timeout=30
+            )
+            response.raise_for_status()
+        data = response.json()
+        posts = data.get("data", {}).get("children", [])
+        return self._filter_and_parse(posts, subreddit)
+    def _filter_and_parse(self, posts: list, subreddit: str) -> list[dict]:
+        """Filter posts for opportunities and parse to normalized format."""
+        opportunities = []
+        for post_wrapper in posts:
+            post = post_wrapper.get("data", {})
+            # Skip removed/deleted posts
+            if post.get("removed_by_category") or post.get("selftext") == "[removed]":
+                continue
+            title = post.get("title", "").lower()
+            text = post.get("selftext", "").lower()
+            combined = f"{title} {text}"
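+            # Filter out noise. Match whole words only: a plain substring test
+            # would drop "grant" (contains "rant") and "devs" (contains "vs").
+            words = "".join(ch if ch.isalnum() else " " for ch in combined).split()
+            normalized = f" {' '.join(words)} "
+            if any(f" {noise} " in normalized for noise in self.NOISE_KEYWORDS):
+                continue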
+            # Check for opportunity keywords
+            has_opportunity = any(kw in combined for kw in self.OPPORTUNITY_KEYWORDS)
+            # Also include posts with high scores (community validated)
+            high_score = post.get("score", 0) > 50
+            if not has_opportunity and not high_score:
+                continue
+            # Calculate engagement
+            engagement = post.get("score", 0) + post.get("num_comments", 0)
+            opportunity = {
+                "title": f"[Reddit] {post.get('title', '')}",
+                "raw_text": post.get("selftext", "")[:2000] or post.get("title", ""),
+                "url": f"https://reddit.com{post.get('permalink', '')}",
+                "source_type": "reddit",
+                "source_name": f"r/{subreddit}",
+                "published_at": self._parse_timestamp(post.get("created_utc")),
+                "social_engagement": engagement,
+                "metadata": {
+                    "subreddit": subreddit,
+                    "author": post.get("author"),
+                    "score": post.get("score", 0),
+                    "num_comments": post.get("num_comments", 0),
+                    "flair": post.get("link_flair_text"),
+                    "is_self": post.get("is_self", True),
+                    "external_url": post.get("url") if not post.get("is_self") else None
+                }
+            }
+            opportunities.append(opportunity)
+        return opportunities
+    def _parse_timestamp(self, timestamp: Optional[float]) -> Optional[datetime]:
+        """Convert Unix timestamp to datetime."""
+        if not timestamp:
+            return None
+        try:
+            return datetime.utcfromtimestamp(timestamp)
+        except Exception:
+            return None
+    async def search(self, query: str, subreddit: Optional[str] = None) -> list[dict]:
+        """Search Reddit for specific opportunities."""
+        if subreddit:
+            url = f"{self.BASE_URL}/r/{subreddit}/search.json"
+        else:
+            url = f"{self.BASE_URL}/search.json"
+        async with httpx.AsyncClient() as client:
+            response = await client.get(
+                url,
+                params={
+                    "q": query,
+                    "sort": "new",
+                    "limit": 25,
+                    "restrict_sr": "on" if subreddit else "off"
+                },
+                headers=self._headers,
+                timeout=30
+            )
+            response.raise_for_status()
+        data = response.json()
+        posts = data.get("data", {}).get("children", [])
+        return self._filter_and_parse(posts, subreddit or "search")
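
Keyword filters like `NOISE_KEYWORDS` are sensitive to how matching is done: a short keyword tested as a substring also matches inside longer words ("rant" inside "grant", "vs" inside "devs"). The following is a minimal sketch of whole-word matching with a single compiled regex. The helper names `compile_keywords` and `is_noise`, and the shortened keyword list, are assumptions for illustration and not part of the commit.

```python
import re

# Shortened, illustrative copy of the noise list used by the Reddit client.
NOISE_KEYWORDS = ["meme", "rant", "vs", "top 10"]


def compile_keywords(keywords: list[str]) -> re.Pattern:
    """Build one case-insensitive regex that matches any keyword as a whole word."""
    alternation = "|".join(re.escape(kw) for kw in keywords)
    # \b anchors each keyword at word boundaries, so "rant" no longer
    # matches inside "grant" and "vs" no longer matches inside "devs".
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


NOISE_RE = compile_keywords(NOISE_KEYWORDS)


def is_noise(text: str) -> bool:
    """Return True when the text contains a noise keyword as a whole word."""
    return NOISE_RE.search(text) is not None


if __name__ == "__main__":
    for title in ["Research grant for hiring devs", "PyTorch vs JAX", "Weekly rant thread"]:
        print(title, "->", is_noise(title))
```

Compiling the pattern once also avoids rescanning the text for each keyword on every post.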

backend/ingestion/rss_client.py ADDED Viewed

	@@ -0,0 +1,220 @@

+"""
+PIOE RSS Client
+Parses RSS/Atom feeds from blogs, news sites, and announcement pages.
+"""
+import feedparser
+from datetime import datetime
+from typing import Optional
+import httpx
+import re
+class RSSClient:
+    """
+    Client for RSS/Atom feeds.
+    Supports multiple feeds with configurable filtering.
+    """
+    # Patterns that indicate non-actionable content (discussions, not opportunities)
+    FILTER_OUT_PATTERNS = [
+        r'^Ask HN:',           # Hacker News discussions
+        r'^Show HN:',          # Show HN posts (usually not opportunities)
+        r'^Tell HN:',          # Tell HN posts
+        r'my internship',      # Personal stories about internships
+        r'my experience',      # Personal experiences
+        r'I (got|landed|received|missed)',  # Personal stories
+        r'How (do|did|can|should) I',       # Questions, not opportunities
+        r'\?$',                # Questions
+        r'AMA$',               # AMAs
+        r'white british',      # News articles, not opportunities
+        r'is (this|it) (real|fake|legit)',  # Verification questions
+    ]
+    # Patterns that indicate REAL opportunities
+    OPPORTUNITY_PATTERNS = [
+        r'hiring',
+        r'apply now',
+        r'deadline',
+        r'applications? open',
+        r'we are looking',
+        r'join (our|the) team',
+        r'open position',
+        r'fellowship program',
+        r'grant program',
+        r'scholarship',
+        r'bounty',
+        r'\$\d+',              # Money amounts
+        r'remote (ok|friendly|position)',
+    ]
+    # Default feeds - ONLY actionable opportunity sources
+    DEFAULT_FEEDS = [
+        # HN Jobs - ACTUAL job postings, not discussions
+        {"name": "Hacker News Jobs", "url": "https://hnrss.org/jobs", "type": "job"},
+        # ArXiv RSS (research papers - always relevant)
+        {"name": "ArXiv CS.CV", "url": "https://rss.arxiv.org/rss/cs.CV", "type": "research"},
+        {"name": "ArXiv CS.RO", "url": "https://rss.arxiv.org/rss/cs.RO", "type": "research"},
+        {"name": "ArXiv CS.AI", "url": "https://rss.arxiv.org/rss/cs.AI", "type": "research"},
+        # Fellowships & Scholarships (working feeds only)
+        {"name": "ProFellow", "url": "https://www.profellow.com/feed/", "type": "fellowship"},
+        {"name": "Scholars4Dev", "url": "https://www.scholars4dev.com/feed/", "type": "scholarship"},
+        # NOTE: OpportunityDesk, AfterSchoolAfrica, WayUp removed - broken/invalid XML
+        # Remote Jobs
+        {"name": "RemoteOK AI", "url": "https://remoteok.com/remote-ai-jobs.rss", "type": "job"},
+        {"name": "RemoteOK Intern", "url": "https://remoteok.com/remote-intern-jobs.rss", "type": "internship"},
+        {"name": "RemoteOK ML", "url": "https://remoteok.com/remote-machine-learning-jobs.rss", "type": "job"},
+    ]
+    def __init__(self, custom_feeds: Optional[list[dict]] = None):
+        # Copy the list so add_feed() never mutates the shared class-level DEFAULT_FEEDS
+        self.feeds = list(custom_feeds or self.DEFAULT_FEEDS)
+    async def fetch_all(self) -> list[dict]:
+        """Fetch from all configured feeds."""
+        all_opportunities = []
+        for feed_config in self.feeds:
+            try:
+                opportunities = await self.fetch_feed(
+                    feed_config["url"],
+                    feed_config["name"],
+                    feed_config.get("type", "rss")
+                )
+                all_opportunities.extend(opportunities)
+            except Exception as e:
+                print(f"Error fetching {feed_config['name']}: {e}")
+        return all_opportunities
+    async def fetch_feed(self, url: str, source_name: str, feed_type: str = "rss") -> list[dict]:
+        """
+        Fetch and parse a single RSS feed.
+        Returns list of normalized opportunity dicts.
+        """
+        try:
+            async with httpx.AsyncClient() as client:
+                response = await client.get(url, timeout=30, follow_redirects=True)
+                response.raise_for_status()  # Treat 4xx/5xx as errors rather than parsing an error page
+                content = response.text
+        except Exception as e:
+            print(f"HTTP error for {url}: {e}")
+            return []
+        # Parse feed
+        feed = feedparser.parse(content)
+        if feed.bozo and not feed.entries:
+            print(f"Feed parse error for {url}: {feed.bozo_exception}")
+            return []
+        return self._parse_entries(feed.entries, source_name, feed_type)
+    def _is_discussion_not_opportunity(self, title: str, description: str) -> bool:
+        """Check if content is a discussion post rather than an actionable opportunity."""
+        # The filter-out patterns are anchored with ^ and $, so they are matched
+        # against the title only; description is currently unused.
+        for pattern in self.FILTER_OUT_PATTERNS:
+            if re.search(pattern, title, re.IGNORECASE):
+                return True
+        return False
+    def _is_likely_opportunity(self, title: str, description: str, feed_type: str) -> bool:
+        """Check if content is likely a real opportunity."""
+        # Research papers are always opportunities
+        if feed_type == "research":
+            return True
+        # Dedicated fellowship/scholarship feeds (ProFellow, Scholars4Dev) are trusted
+        if feed_type in ["fellowship", "scholarship"]:
+            return True
+        # Dedicated job feeds (HN Jobs, RemoteOK) are treated as real postings
+        if feed_type == "job":
+            return True
+        text = f"{title} {description}".lower()
+        # Check for opportunity patterns
+        for pattern in self.OPPORTUNITY_PATTERNS:
+            if re.search(pattern, text, re.IGNORECASE):
+                return True
+        return False
+    def _parse_entries(self, entries: list, source_name: str, feed_type: str) -> list[dict]:
+        """Parse feed entries into normalized opportunities."""
+        opportunities = []
+        for entry in entries[:20]:  # Limit per feed
+            try:
+                # Extract content
+                title = entry.get("title", "").strip()
+                # Get description/summary
+                description = ""
+                if "summary" in entry:
+                    description = entry.summary
+                elif "description" in entry:
+                    description = entry.description
+                elif "content" in entry and entry.content:
+                    description = entry.content[0].get("value", "")
+                # Clean HTML tags (basic)
+                description = self._strip_html(description)
+                # QUALITY FILTER: Skip discussions and non-opportunities
+                if self._is_discussion_not_opportunity(title, description):
+                    continue
+                # QUALITY FILTER: Only keep likely opportunities
+                if not self._is_likely_opportunity(title, description, feed_type):
+                    # News and blog feeds are kept regardless; every other type is dropped
+                    if feed_type not in ["news", "blog"]:
+                        continue
+                # Get published date
+                published = None
+                if "published_parsed" in entry and entry.published_parsed:
+                    published = datetime(*entry.published_parsed[:6])
+                elif "updated_parsed" in entry and entry.updated_parsed:
+                    published = datetime(*entry.updated_parsed[:6])
+                opportunity = {
+                    "title": title,
+                    "raw_text": description[:2000],
+                    "url": entry.get("link", ""),
+                    "source_type": "rss",
+                    "source_name": source_name,
+                    "published_at": published,
+                    "metadata": {
+                        "feed_type": feed_type,
+                        "author": entry.get("author"),
+                        "tags": [tag.term for tag in entry.get("tags", [])]
+                    }
+                }
+                opportunities.append(opportunity)
+            except Exception as e:
+                print(f"Error parsing entry: {e}")
+        return opportunities
+    def _strip_html(self, text: str) -> str:
+        """Remove HTML tags from text."""
+        clean = re.sub(r'<[^>]+>', '', text)
+        return " ".join(clean.split())
+    def add_feed(self, name: str, url: str, feed_type: str = "rss"):
+        """Add a new feed to monitor."""
+        self.feeds.append({
+            "name": name,
+            "url": url,
+            "type": feed_type
+        })
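
The quality filter in this file runs in two stages: title patterns reject discussions first, then the feed type or an opportunity pattern admits the entry. The following is a minimal standalone sketch of that order of checks, assuming shortened pattern lists; the function name `classify` and its string labels are hypothetical and not part of the commit.

```python
import re

# Shortened, illustrative subsets of the pattern lists used by RSSClient.
FILTER_OUT = [r"^Ask HN:", r"\?$", r"How (do|did|can|should) I"]
OPPORTUNITY = [r"hiring", r"applications? open", r"\$\d+"]
TRUSTED_TYPES = {"research", "fellowship", "scholarship", "job"}


def classify(title: str, description: str = "", feed_type: str = "rss") -> str:
    """Label an entry as 'discussion', 'opportunity', or 'unknown'."""
    # Stage 1: anchored patterns are tested against the title only.
    if any(re.search(p, title, re.IGNORECASE) for p in FILTER_OUT):
        return "discussion"
    # Stage 2a: entries from trusted feed types are accepted as-is.
    if feed_type in TRUSTED_TYPES:
        return "opportunity"
    # Stage 2b: otherwise look for opportunity wording in title + description.
    text = f"{title} {description}"
    if any(re.search(p, text, re.IGNORECASE) for p in OPPORTUNITY):
        return "opportunity"
    return "unknown"


if __name__ == "__main__":
    print(classify("Ask HN: Who is hiring?"))
    print(classify("Acme is hiring ML engineers"))
    print(classify("Weekly newsletter"))
```

Because the discussion check runs first, a title ending in a question mark is rejected even when it comes from a trusted feed type.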

backend/ingestion/scheduler.py ADDED Viewed

	@@ -0,0 +1,371 @@

+"""
+PIOE Ingestion Scheduler - Version 2.0
+Orchestrates periodic data collection from all sources.
+Now includes Grant Intelligence and ROI scoring.
+"""
+from apscheduler.schedulers.asyncio import AsyncIOScheduler
+from datetime import datetime
+from sqlalchemy.orm import Session
+from ..config import get_settings
+from ..database import SessionLocal
+from ..models import Opportunity, Source, SourceType, OpportunityCategory, Domain, Region, RiskLevel
+from ..intelligence import RelevanceScorer, NoveltyDetector, CredibilityScorer, OpportunityClassifier
+from ..intelligence import ROIScorer, SilentOpportunityDetector
+from .arxiv_client import ArxivClient
+from .github_client import GitHubClient
+from .rss_client import RSSClient
+from .reddit_client import RedditClient
+from .superteam_client import SuperteamClient
+from .web_scraper import WebScraper
+from .careers_client import CareersClient, InternshipClient
+from .grants_client import GrantsClient, NigeriaGrantsClient
+from .jobboard_client import JobBoardClient
+class IngestionScheduler:
+    """
+    Coordinates all data ingestion and processing.
+    PIOE 2.0: Now includes grant intelligence and ROI scoring.
+    """
+    def __init__(self, user_region: str = "nigeria"):
+        self.settings = get_settings()
+        self.scheduler = AsyncIOScheduler()
+        self.user_region = user_region
+        # Initialize clients
+        self.arxiv = ArxivClient(max_results=30)
+        self.github = GitHubClient(token=self.settings.github_token)
+        self.rss = RSSClient()
+        self.reddit = RedditClient()
+        self.superteam = SuperteamClient()
+        self.scraper = WebScraper()
+        self.careers = CareersClient()
+        self.internships = InternshipClient()
+        # PIOE 2.0: Job boards (REAL opportunities, not discussions)
+        self.jobboards = JobBoardClient(
+            adzuna_app_id=self.settings.adzuna_app_id,
+            adzuna_api_key=self.settings.adzuna_api_key,
+            jooble_api_key=self.settings.jooble_api_key,
+            rapidapi_key=self.settings.rapidapi_key
+        )
+        # PIOE 2.0: Grant clients
+        self.grants = GrantsClient()
+        self.nigeria_grants = NigeriaGrantsClient()
+        # Initialize intelligence
+        self.scorer = RelevanceScorer()
+        self.novelty = NoveltyDetector()
+        self.credibility = CredibilityScorer()
+        self.classifier = OpportunityClassifier()
+        # PIOE 2.0: Decision intelligence
+        self.roi_scorer = ROIScorer(user_region=user_region)
+        self.silent_detector = SilentOpportunityDetector()
+    def start(self):
+        """Start the scheduler."""
+        # Run ingestion every N hours
+        self.scheduler.add_job(
+            self.run_full_ingestion,
+            'interval',
+            hours=self.settings.ingestion_interval_hours,
+            id='full_ingestion'
+        )
+        # Run high-priority sources more frequently (every 2 hours)
+        self.scheduler.add_job(
+            self.run_priority_ingestion,
+            'interval',
+            hours=2,
+            id='priority_ingestion'
+        )
+        self.scheduler.start()
+        print(f"Scheduler started - full ingestion every {self.settings.ingestion_interval_hours}h")
+    def stop(self):
+        """Stop the scheduler."""
+        try:
+            if self.scheduler.running:
+                self.scheduler.shutdown()
+        except Exception:
+            pass  # Ignore if scheduler not running
+    async def run_full_ingestion(self):
+        """Run ingestion from all sources."""
+        print(f"[{datetime.utcnow()}] Starting full ingestion...")
+        results = {
+            "total_fetched": 0,
+            "total_saved": 0,
+            "sources": {}
+        }
+        db = SessionLocal()
+        try:
+            # Fetch from all sources (PIOE 2.0 includes grant + job board sources)
+            sources = [
+                ("arXiv", self.arxiv.fetch(), SourceType.ARXIV),
+                ("GitHub", self.github.fetch_trending(), SourceType.GITHUB),
+                ("RSS", self.rss.fetch_all(), SourceType.RSS),
+                # DISABLED: Reddit returns too many discussions, not opportunities
+                # ("Reddit", self.reddit.fetch_all(), SourceType.REDDIT),
+                ("Superteam", self.superteam.fetch_all(), SourceType.SUPERTEAM),
+                # ("Web Scraper", self.scraper.fetch_all(), SourceType.WEB_SCRAPE),  # Often blocked
+                # ("Careers", self.careers.fetch_all(), SourceType.WEB_SCRAPE),  # Often blocked
+                # ("Internships", self.internships.fetch_all(), SourceType.WEB_SCRAPE),  # Often blocked
+                # PIOE 2.0: Job boards (REAL opportunities)
+                ("Job Boards", self.jobboards.fetch_all(), SourceType.WEB_SCRAPE),
+                # PIOE 2.0: Grant sources
+                ("Ecosystem Grants", self.grants.fetch_all(), SourceType.GRANT_PLATFORM),
+                ("Nigeria Grants", self.nigeria_grants.fetch_all(), SourceType.GOV_PORTAL),
+            ]
+            for source_name, fetch_coro, source_type in sources:
+                try:
+                    opportunities = await fetch_coro
+                    saved = self._process_and_save(db, opportunities, source_type)
+                    results["sources"][source_name] = {
+                        "fetched": len(opportunities),
+                        "saved": saved
+                    }
+                    results["total_fetched"] += len(opportunities)
+                    results["total_saved"] += saved
+                    print(f"  {source_name}: {len(opportunities)} fetched, {saved} saved")
+                except Exception as e:
+                    print(f"  {source_name}: ERROR - {e}")
+                    results["sources"][source_name] = {"error": str(e)}
+        finally:
+            db.close()
+        print(f"[{datetime.utcnow()}] Ingestion complete: {results['total_saved']}/{results['total_fetched']} saved")
+        return results
+    async def run_priority_ingestion(self):
+        """Run ingestion for high-priority sources only."""
+        print(f"[{datetime.utcnow()}] Starting priority ingestion...")
+        db = SessionLocal()
+        try:
+            # Only run arXiv, GitHub, and Superteam (highest signal sources)
+            sources = [
+                ("arXiv", self.arxiv.fetch(), SourceType.ARXIV),
+                ("GitHub", self.github.fetch_trending(), SourceType.GITHUB),
+                ("Superteam", self.superteam.fetch_all(), SourceType.SUPERTEAM),
+            ]
+            for source_name, fetch_coro, source_type in sources:
+                try:
+                    opportunities = await fetch_coro
+                    saved = self._process_and_save(db, opportunities, source_type)
+                    print(f"  {source_name}: {saved} new")
+                except Exception as e:
+                    print(f"  {source_name}: ERROR - {e}")
+        finally:
+            db.close()
+    def _process_and_save(
+        self,
+        db: Session,
+        raw_opportunities: list[dict],
+        source_type: SourceType
+    ) -> int:
+        """
+        Process raw opportunities through intelligence layer and save.
+        Returns count of saved opportunities.
+        """
+        saved_count = 0
+        for raw in raw_opportunities:
+            try:
+                # Skip if already exists (by URL)
+                existing = db.query(Opportunity).filter(
+                    Opportunity.url == raw.get("url")
+                ).first()
+                if existing:
+                    continue
+                # Combine title and text for analysis
+                full_text = f"{raw.get('title', '')} {raw.get('raw_text', '')}"
+                # Score relevance
+                scores = self.scorer.score(raw.get("raw_text", ""), raw.get("title", ""))
+                # Skip low relevance
+                if scores["relevance_score"] < self.settings.min_relevance_score:
+                    continue
+                # Get embedding for novelty detection
+                embedding = self.scorer.get_embedding(full_text[:1000])
+                # Check novelty
+                novelty_result = self.novelty.calculate_novelty(embedding, db)
+                # Skip duplicates
+                if novelty_result["is_duplicate"]:
+                    continue
+                # Skip recycled content
+                if self.novelty.is_recycled_content(full_text):
+                    continue
+                # Calculate credibility
+                cred_result = self.credibility.score(
+                    source_type,
+                    raw.get("raw_text", ""),
+                    raw.get("metadata", {}),
+                    social_engagement=raw.get("social_engagement", 0)
+                )
+                # Skip low credibility
+                if cred_result["credibility_score"] < self.settings.min_credibility_score:
+                    continue
+                # Classify
+                classification = self.classifier.classify(
+                    raw.get("raw_text", ""),
+                    raw.get("title", ""),
+                    source_type=raw.get("source_type", ""),
+                    source_name=raw.get("source_name", "")
+                )
+                # PIOE 2.0: Check for silent opportunities
+                silent_result = self.silent_detector.detect(
+                    raw.get("raw_text", ""),
+                    raw.get("title", "")
+                )
+                # Override category if silent opportunity detected
+                final_category = classification["category"]
+                if silent_result["is_silent_opportunity"]:
+                    final_category = silent_result["recommended_category"]
+                # PIOE 2.0: Calculate ROI score
+                metadata = raw.get("metadata", {})
+                roi_result = self.roi_scorer.calculate_roi(
+                    category=final_category,
+                    deadline=raw.get("deadline"),
+                    grant_size=metadata.get("grant_size_max"),
+                    region=metadata.get("region", "global"),
+                    extra_data=metadata
+                )
+                # Calculate combined score (now includes ROI)
+                combined_score = (
+                    0.3 * scores["relevance_score"] +
+                    0.2 * novelty_result["novelty_score"] +
+                    0.2 * cred_result["credibility_score"] +
+                    0.3 * roi_result["roi_score"]  # PIOE 2.0: Weight ROI heavily
+                )
+                # Prepare enhanced metadata
+                enhanced_metadata = {
+                    **metadata,
+                    "silent_opportunity": silent_result["is_silent_opportunity"],
+                    "silent_type": silent_result.get("opportunity_type"),
+                    "roi_reasoning": roi_result["reasoning"],
+                }
+                # Determine region
+                region_str = (metadata.get("region") or "global").lower()
+                region_map = {
+                    "nigeria": Region.NIGERIA,
+                    "africa": Region.AFRICA,
+                    "global": Region.GLOBAL,
+                    "remote_africa": Region.REMOTE_AFRICA,
+                    "remote_global": Region.REMOTE_GLOBAL,
+                }
+                region = region_map.get(region_str, Region.GLOBAL)
+                # Map risk level
+                risk_map = {"low": RiskLevel.LOW, "medium": RiskLevel.MEDIUM, "high": RiskLevel.HIGH}
+                risk_level = risk_map.get(roi_result["risk_level"], RiskLevel.MEDIUM)
+                # Create opportunity record
+                opportunity = Opportunity(
+                    title=raw.get("title", "")[:500],
+                    source_type=source_type,
+                    source_name=raw.get("source_name", ""),
+                    domain=Domain(classification["domain"]) if classification["domain"] in [d.value for d in Domain] else Domain.MIXED,
+                    category=OpportunityCategory(final_category) if final_category in [c.value for c in OpportunityCategory] else OpportunityCategory.OTHER,
+                    region=region,
+                    region_weight=1.0 if region_str == self.user_region else 0.7,
+                    published_at=raw.get("published_at"),
+                    deadline=raw.get("deadline"),
+                    raw_text=raw.get("raw_text", "")[:5000],
+                    url=raw.get("url", ""),
+                    relevance_score=scores["relevance_score"],
+                    novelty_score=novelty_result["novelty_score"],
+                    credibility_score=cred_result["credibility_score"],
+                    signal_strength=cred_result["signal_strength"],
+                    combined_score=combined_score,
+                    roi_score=roi_result["roi_score"],
+                    unlock_potential=roi_result["unlock_potential"],
+                    risk_level=risk_level,
+                    competition_level=roi_result["competition_level"],
+                    social_engagement=raw.get("social_engagement", 0),
+                    extra_data=enhanced_metadata,
+                    embedding=embedding
+                )
+                db.add(opportunity)
+                saved_count += 1
+            except Exception as e:
+                print(f"Error processing opportunity: {e}")
+                continue
+        # Commit batch
+        try:
+            db.commit()
+        except Exception as e:
+            print(f"Database commit error: {e}")
+            db.rollback()
+            saved_count = 0
+        return saved_count
+    async def ingest_single_source(self, source_name: str) -> dict:
+        """Manually trigger ingestion for a single source."""
+        # Map each name to (fetch method, source type). The bound methods are stored
+        # uncalled so that only the selected source creates a coroutine; calling them
+        # all here would leave the unused coroutines un-awaited.
+        source_map = {
+            "arxiv": (self.arxiv.fetch, SourceType.ARXIV),
+            "github": (self.github.fetch_trending, SourceType.GITHUB),
+            "rss": (self.rss.fetch_all, SourceType.RSS),
+            "reddit": (self.reddit.fetch_all, SourceType.REDDIT),
+            "superteam": (self.superteam.fetch_all, SourceType.SUPERTEAM),
+            "scraper": (self.scraper.fetch_all, SourceType.WEB_SCRAPE),
+            "careers": (self.careers.fetch_all, SourceType.WEB_SCRAPE),
+            "internships": (self.internships.fetch_all, SourceType.WEB_SCRAPE),
+        }
+        key = source_name.lower()
+        if key not in source_map:
+            return {"error": f"Unknown source: {source_name}"}
+        fetch_fn, source_type = source_map[key]
+        # Open the session only after validation so the early return cannot leak it
+        db = SessionLocal()
+        try:
+            opportunities = await fetch_fn()
+            saved = self._process_and_save(db, opportunities, source_type)
+            return {
+                "source": source_name,
+                "fetched": len(opportunities),
+                "saved": saved
+            }
+        finally:
+            db.close()
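
As a rough illustration of the weighting used in `_process_and_save` above: the combined score is a weighted sum of four component scores, and the weights (0.3, 0.2, 0.2, 0.3) add up to 1.0. The sketch below is standalone. The function name `combined_score` and the sample values are hypothetical, and it assumes each component lies in [0, 1], which the diff does not state explicitly.

```python
# Hypothetical standalone sketch; the weights are copied from _process_and_save.
WEIGHTS = {"relevance": 0.3, "novelty": 0.2, "credibility": 0.2, "roi": 0.3}


def combined_score(relevance: float, novelty: float, credibility: float, roi: float) -> float:
    """Return the weighted sum of the four component scores."""
    return (
        WEIGHTS["relevance"] * relevance
        + WEIGHTS["novelty"] * novelty
        + WEIGHTS["credibility"] * credibility
        + WEIGHTS["roi"] * roi
    )


if __name__ == "__main__":
    # Sample values are made up for illustration only.
    print(round(combined_score(0.8, 0.5, 0.9, 0.6), 2))
```

Because relevance and ROI each carry 0.3, an item with weak novelty can still rank well if it is relevant and has a high ROI score.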

backend/ingestion/superteam_client.py ADDED

	@@ -0,0 +1,178 @@

+"""
+PIOE Superteam Client
+Fetches bounties, grants, and hackathons from Superteam ecosystem.
+High-value source for crypto/web3 opportunities.
+"""
+import httpx
+from datetime import datetime
+from typing import Optional
+from bs4 import BeautifulSoup
+class SuperteamClient:
+    """
+    Client for Superteam ecosystem opportunities.
+    Superteam aggregates bounties, grants, hackathons, and jobs
+    across the Solana ecosystem and beyond.
+    """
+    # Known Superteam endpoints
+    EARN_URL = "https://earn.superteam.fun"
+    BOUNTIES_API = "https://earn.superteam.fun/api/listings"
+    def __init__(self):
+        self._headers = {
+            "User-Agent": "PIOE/1.0",
+            "Accept": "application/json"
+        }
+    async def fetch_all(self) -> list[dict]:
+        """Fetch all opportunity types from Superteam."""
+        opportunities = []
+        # Try API first
+        try:
+            api_opps = await self.fetch_from_api()
+            opportunities.extend(api_opps)
+        except Exception as e:
+            print(f"Superteam API error: {e}")
+            # Fall back to scraping
+            try:
+                scraped = await self.fetch_by_scraping()
+                opportunities.extend(scraped)
+            except Exception as e2:
+                print(f"Superteam scrape error: {e2}")
+        return opportunities
+    async def fetch_from_api(self) -> list[dict]:
+        """Fetch listings from Superteam API."""
+        async with httpx.AsyncClient() as client:
+            response = await client.get(
+                self.BOUNTIES_API,
+                params={"type": "all"},
+                headers=self._headers,
+                timeout=30
+            )
+            response.raise_for_status()
+        data = response.json()
+        listings = data if isinstance(data, list) else data.get("listings", [])
+        return self._parse_listings(listings)
+    async def fetch_by_scraping(self) -> list[dict]:
+        """Fallback: scrape Superteam Earn page."""
+        async with httpx.AsyncClient() as client:
+            response = await client.get(
+                self.EARN_URL,
+                headers={"User-Agent": "PIOE/1.0"},
+                timeout=30,
+                follow_redirects=True
+            )
+            response.raise_for_status()
+        soup = BeautifulSoup(response.text, "html.parser")
+        opportunities = []
+        # Look for listing cards (structure may vary)
+        for card in soup.select("[data-testid='listing-card'], .listing-card, article"):
+            try:
+                title_el = card.select_one("h3, h2, .title")
+                link_el = card.select_one("a[href]")
+                reward_el = card.select_one(".reward, .prize, [data-testid='reward']")
+                deadline_el = card.select_one(".deadline, .due-date")
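+                if not title_el:
+                    continue
+                # Absolute hrefs are used as-is; relative ones are joined to the base URL
+                href = link_el.get("href") if link_el else ""
+                if href and not href.startswith("http"):
+                    href = f"{self.EARN_URL}/{href.lstrip('/')}"
+                opportunity = {
+                    "title": f"[Superteam] {title_el.get_text(strip=True)}",
+                    "raw_text": card.get_text(strip=True)[:500],
+                    "url": href or self.EARN_URL,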
+                    "source_type": "superteam",
+                    "source_name": "Superteam Earn",
+                    "published_at": datetime.utcnow(),
+                    "metadata": {
+                        "reward": reward_el.get_text(strip=True) if reward_el else None,
+                        "deadline": deadline_el.get_text(strip=True) if deadline_el else None,
+                    }
+                }
+                opportunities.append(opportunity)
+            except Exception as e:
+                print(f"Error parsing Superteam card: {e}")
+        return opportunities
+    def _parse_listings(self, listings: list) -> list[dict]:
+        """Parse API listings to normalized format."""
+        opportunities = []
+        for listing in listings:
+            try:
+                # Determine opportunity type
+                listing_type = listing.get("type", "bounty").lower()
+                type_prefix = {
+                    "bounty": "Bounty",
+                    "grant": "Grant",
+                    "hackathon": "Hackathon",
+                    "job": "Job"
+                }.get(listing_type, "Opportunity")
+                # Parse reward
+                reward = None
+                if listing.get("rewardAmount"):
+                    token = listing.get("token", "USDC")
+                    reward = f"{listing['rewardAmount']} {token}"
+                # Parse deadline
+                deadline = None
+                if listing.get("deadline"):
+                    try:
+                        deadline = datetime.fromisoformat(
+                            listing["deadline"].replace("Z", "+00:00")
+                        )
+                    except Exception:
+                        pass
+                # Extract skills/requirements
+                skills = listing.get("skills", [])
+                if isinstance(skills, str):
+                    skills = [s.strip() for s in skills.split(",")]
+                opportunity = {
+                    "title": f"[Superteam {type_prefix}] {listing.get('title', '')}",
+                    "raw_text": (listing.get("description") or "")[:2000],
+                    "url": listing.get("link") or f"{self.EARN_URL}/listing/{listing.get('slug', '')}",
+                    "source_type": "superteam",
+                    "source_name": "Superteam Earn",
+                    "published_at": self._parse_date(listing.get("publishedAt")),
+                    "deadline": deadline,
+                    "metadata": {
+                        "listing_type": listing_type,
+                        "reward": reward,
+                        "skills": skills,
+                        "sponsor": (listing.get("sponsor") or {}).get("name"),
+                        "region": listing.get("region"),
+                        "is_active": listing.get("isPublished", True)
+                    }
+                }
+                opportunities.append(opportunity)
+            except Exception as e:
+                print(f"Error parsing Superteam listing: {e}")
+        return opportunities
+    def _parse_date(self, date_str: Optional[str]) -> Optional[datetime]:
+        """Parse date string to datetime."""
+        if not date_str:
+            return None
+        try:
+            return datetime.fromisoformat(date_str.replace("Z", "+00:00"))
+        except Exception:
+            return None
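
The `replace("Z", "+00:00")` step in `_parse_date` is there because `datetime.fromisoformat` in Python versions before 3.11 rejects a trailing `Z`. The sketch below shows the same idea in a standalone, hypothetical helper (`parse_iso_utc` is not part of the commit). It also treats timestamps without an offset as UTC, which is an assumption, so that every parsed value is timezone-aware.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_iso_utc(date_str: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 string, accepting a trailing 'Z', into an aware datetime."""
    if not date_str:
        return None
    try:
        parsed = datetime.fromisoformat(date_str.replace("Z", "+00:00"))
    except ValueError:
        # Unparseable input is reported as "no date" rather than raising
        return None
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are meant as UTC
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


if __name__ == "__main__":
    print(parse_iso_utc("2024-01-17T12:00:00Z"))
```

One thing to keep in mind: the scraping fallback stamps `published_at` with `datetime.utcnow()`, which is naive, while dates parsed this way are aware. Python raises `TypeError` when the two kinds are compared, so downstream code may want to normalize them first.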

backend/ingestion/web_scraper.py ADDED

	@@ -0,0 +1,227 @@

+"""
+PIOE Web Scraper
+Generic web scraper for scholarship sites, hackathon platforms, and university pages.
+Uses BeautifulSoup for static pages, Playwright for dynamic content.
+"""
+import httpx
+from datetime import datetime
+from typing import Optional
+from bs4 import BeautifulSoup
+class WebScraper:
+    """
+    Generic web scraper for pages without APIs.
+    Supports static and dynamic (JavaScript) pages.
+    """
+    # Preconfigured scrape targets
+    TARGETS = [
+        # Hackathon Platforms
+        {
+            "name": "Devpost Hackathons",
+            "url": "https://devpost.com/hackathons",
+            "type": "hackathon",
+            "selectors": {
+                "items": ".hackathon-tile, .challenge-listing",
+                "title": "h2, h3, .title",
+                "link": "a",
+                "deadline": ".submission-period, .dates"
+            }
+        },
+        {
+            "name": "Devfolio Hackathons",
+            "url": "https://devfolio.co/hackathons",
+            "type": "hackathon",
+            "selectors": {
+                "items": "[class*='HackathonCard'], article",
+                "title": "h3, h2, [class*='Name']",
+                "link": "a",
+                "deadline": "[class*='Date']"
+            }
+        },
+        {
+            "name": "HackerEarth Challenges",
+            "url": "https://www.hackerearth.com/challenges/",
+            "type": "hackathon",
+            "selectors": {
+                "items": ".challenge-card, .event-card",
+                "title": ".challenge-name, h3",
+                "link": "a",
+                "deadline": ".date, .timing"
+            }
+        },
+        # Scholarship/Fellowship Sites
+        {
+            "name": "FindAPhD AI",
+            "url": "https://www.findaphd.com/phds/?Keywords=artificial+intelligence+machine+learning",
+            "type": "scholarship",
+            "selectors": {
+                "items": ".phd-result",
+                "title": "h4 a, .title a",
+                "link": "a",
+                "deadline": ".close-date"
+            }
+        },
+        {
+            "name": "FindAPhD Robotics",
+            "url": "https://www.findaphd.com/phds/?Keywords=robotics+computer+vision",
+            "type": "scholarship",
+            "selectors": {
+                "items": ".phd-result",
+                "title": "h4 a, .title a",
+                "link": "a",
+                "deadline": ".close-date"
+            }
+        },
+        # Grant/Fellowship
+        {
+            "name": "Opportunities Circle",
+            "url": "https://www.opportunitiescircle.com/category/fellowships/",
+            "type": "fellowship",
+            "selectors": {
+                "items": "article, .post",
+                "title": "h2, h3, .entry-title",
+                "link": "a",
+                "deadline": ".deadline"
+            }
+        },
+    ]
+    def __init__(self, use_playwright: bool = False):
+        self.use_playwright = use_playwright
+        self._headers = {
+            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
+        }
+    async def fetch_all(self, targets: Optional[list[dict]] = None) -> list[dict]:
+        """Fetch from all configured targets."""
+        targets = targets or self.TARGETS
+        all_opportunities = []
+        for target in targets:
+            try:
+                opps = await self.scrape_target(target)
+                all_opportunities.extend(opps)
+            except Exception as e:
+                print(f"Scrape error for {target['name']}: {e}")
+        return all_opportunities
+    async def scrape_target(self, target: dict) -> list[dict]:
+        """Scrape a single target configuration."""
+        html = await self._fetch_html(target["url"])
+        if not html:
+            return []
+        soup = BeautifulSoup(html, "html.parser")
+        selectors = target.get("selectors", {})
+        opportunities = []
+        items = soup.select(selectors.get("items", "article"))[:20]
+        for item in items:
+            try:
+                # Extract title
+                title_el = item.select_one(selectors.get("title", "h2, h3, .title"))
+                title = title_el.get_text(strip=True) if title_el else ""
+                if not title:
+                    continue
+                # Extract link
+                link_el = item.select_one(selectors.get("link", "a"))
+                link = ""
+                if link_el and link_el.get("href"):
+                    href = link_el.get("href")
+                    if href.startswith("http"):
+                        link = href
+                    else:
+                        # Relative URL - construct absolute
+                        from urllib.parse import urljoin
+                        link = urljoin(target["url"], href)
+                # Extract deadline if available
+                deadline_el = item.select_one(selectors.get("deadline", ".deadline"))
+                deadline_text = deadline_el.get_text(strip=True) if deadline_el else None
+                # Get full text content
+                raw_text = item.get_text(separator=" ", strip=True)[:1000]
+                opportunity = {
+                    "title": f"[{target['type'].title()}] {title}",
+                    "raw_text": raw_text,
+                    "url": link or target["url"],
+                    "source_type": "web_scrape",
+                    "source_name": target["name"],
+                    "published_at": datetime.utcnow(),
+                    "metadata": {
+                        "scrape_type": target["type"],
+                        "deadline_text": deadline_text
+                    }
+                }
+                opportunities.append(opportunity)
+            except Exception as e:
+                print(f"Error parsing item: {e}")
+        return opportunities
+    async def _fetch_html(self, url: str) -> Optional[str]:
+        """Fetch HTML content from URL."""
+        if self.use_playwright:
+            return await self._fetch_with_playwright(url)
+        try:
+            async with httpx.AsyncClient() as client:
+                response = await client.get(
+                    url,
+                    headers=self._headers,
+                    timeout=30,
+                    follow_redirects=True
+                )
+                response.raise_for_status()
+                return response.text
+        except Exception as e:
+            print(f"HTTP fetch error: {e}")
+            return None
+    async def _fetch_with_playwright(self, url: str) -> Optional[str]:
+        """Fetch dynamic content using Playwright."""
+        try:
+            from playwright.async_api import async_playwright
+            async with async_playwright() as p:
+                browser = await p.chromium.launch(headless=True)
+                try:
+                    page = await browser.new_page()
+                    await page.goto(url, wait_until="networkidle", timeout=30000)
+                    return await page.content()
+                finally:
+                    # Close the browser even if navigation fails or times out
+                    await browser.close()
+        except Exception as e:
+            print(f"Playwright error: {e}")
+            return None
+    async def scrape_custom(
+        self,
+        url: str,
+        name: str,
+        item_selector: str,
+        title_selector: str = "h2, h3",
+        link_selector: str = "a",
+        scrape_type: str = "custom"
+    ) -> list[dict]:
+        """Scrape a custom URL with provided selectors."""
+        target = {
+            "name": name,
+            "url": url,
+            "type": scrape_type,
+            "selectors": {
+                "items": item_selector,
+                "title": title_selector,
+                "link": link_selector
+            }
+        }
+        return await self.scrape_target(target)
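
The link handling in `scrape_target` relies on `urllib.parse.urljoin` to turn relative hrefs into absolute URLs. A minimal sketch of that behavior follows; `resolve_link` is a hypothetical helper and the hrefs are made-up examples, not values taken from the scraped sites.

```python
from urllib.parse import urljoin


def resolve_link(page_url: str, href: str) -> str:
    """Resolve an href found on page_url into an absolute URL."""
    # urljoin returns absolute hrefs unchanged and resolves relative ones
    # against the page URL, so no separate startswith("http") check is needed.
    return urljoin(page_url, href)


if __name__ == "__main__":
    base = "https://devpost.com/hackathons"
    print(resolve_link(base, "/software"))
    print(resolve_link(base, "https://example.org/event"))
```

Since `urljoin` already passes absolute URLs through, the explicit `href.startswith("http")` branch in `scrape_target` could likely be folded into a single call.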

backend/intelligence/__init__.py ADDED

	@@ -0,0 +1,22 @@

+"""
+PIOE Intelligence Layer - Version 2.0
+"""
+from .llm_client import LLMClient
+from .scorer import RelevanceScorer
+from .novelty import NoveltyDetector
+from .classifier import OpportunityClassifier
+from .credibility import CredibilityScorer
+from .roi_scorer import ROIScorer
+from .silent_detector import SilentOpportunityDetector, OpportunityLanguageDetector
+__all__ = [
+    "LLMClient",
+    "RelevanceScorer",
+    "NoveltyDetector",
+    "OpportunityClassifier",
+    "CredibilityScorer",
+    "ROIScorer",
+    "SilentOpportunityDetector",
+    "OpportunityLanguageDetector",
+]

backend/intelligence/classifier.py ADDED

	@@ -0,0 +1,214 @@

+"""
+PIOE Opportunity Classifier
+Classifies opportunities into categories using rules and LLM.
+"""
+from ..models import OpportunityCategory, Domain
+class OpportunityClassifier:
+    """
+    Classifies opportunities into categories and domains.
+    Uses rule-based classification first, LLM for ambiguous cases.
+    """
+    # Source type to category mapping (high priority)
+    SOURCE_CATEGORY_MAP = {
+        "arxiv": OpportunityCategory.RESEARCH,
+        "github": OpportunityCategory.OPEN_SOURCE,
+        "superteam": OpportunityCategory.BOUNTY,
+        "grant_platform": OpportunityCategory.GRANT,
+        "gov_portal": OpportunityCategory.GRANT,
+    }
+    # Keyword patterns for each category
+    CATEGORY_PATTERNS = {
+        OpportunityCategory.SCHOLARSHIP: [
+            "scholarship", "tuition", "financial aid", "merit award"
+        ],
+        OpportunityCategory.FELLOWSHIP: [
+            "fellowship", "fellow program", "research fellow"
+        ],
+        OpportunityCategory.INTERNSHIP: [
+            "internship", "intern ", "summer program", "co-op"
+        ],
+        OpportunityCategory.JOB: [
+            "hiring", "job opening", "position available", "career opportunity",
+            "we're looking for", "full-time", "remote job"
+        ],
+        OpportunityCategory.RESEARCH: [
+            "research assistant", "ra position", "research opportunity", "arxiv",
+            "abstract:", "we present", "we propose", "our method"
+        ],
+        OpportunityCategory.HACKATHON: [
+            "hackathon", "buildathon", "hackers wanted", "hack day"
+        ],
+        OpportunityCategory.COMPETITION: [
+            "competition", "challenge", "contest", "prize pool"
+        ],
+        OpportunityCategory.GRANT: [
+            "grant program", "grant application", "grant funding", "grant deadline"
+        ],
+        OpportunityCategory.CONFERENCE: [
+            "conference", "call for papers", "summit", "symposium"
+        ],
+        OpportunityCategory.OPEN_SOURCE: [
+            "open source", "gsoc", "outreachy", "contributor wanted"
+        ],
+        OpportunityCategory.INVESTMENT: [
+            "funding round", "series a", "series b", "vc funding", "raised $"
+        ],
+        OpportunityCategory.BOUNTY: [
+            "bounty", "bug bounty", "earn reward", "usdc reward"
+        ],
+    }
+    # Domain patterns
+    DOMAIN_PATTERNS = {
+        Domain.COMPUTER_VISION: [
+            "computer vision", "image", "visual", "object detection", "segmentation", "opencv"
+        ],
+        Domain.ROBOTICS: [
+            "robot", "ros", "autonomous", "manipulation", "navigation"
+        ],
+        Domain.AI: [
+            "ai", "artificial intelligence", "machine learning", "deep learning",
+            "neural network", "llm", "transformer", "gpt"
+        ],
+        Domain.FINANCE: [
+            "finance", "fintech", "trading", "investment", "stock", "quantitative"
+        ],
+        Domain.CRYPTO: [
+            "crypto", "blockchain", "web3", "defi", "solana", "ethereum", "nft"
+        ],
+        Domain.ACADEMIA: [
+            "research", "phd", "postdoc", "university", "academic", "professor"
+        ],
+    }
+    def classify_by_source(self, source_type: str, source_name: str = "") -> OpportunityCategory | None:
+        """
+        Classify primarily by source type.
+        Returns category or None if source doesn't determine category.
+        """
+        source_lower = (source_type or "").lower()
+        source_name_lower = (source_name or "").lower()
+        # Check direct source mapping
+        if source_lower in self.SOURCE_CATEGORY_MAP:
+            return self.SOURCE_CATEGORY_MAP[source_lower]
+        # Check source name patterns
+        if "arxiv" in source_name_lower:
+            return OpportunityCategory.RESEARCH
+        if "github" in source_name_lower:
+            return OpportunityCategory.OPEN_SOURCE
+        if "profellow" in source_name_lower:
+            return OpportunityCategory.FELLOWSHIP
+        if "remoteok" in source_name_lower:
+            return OpportunityCategory.JOB
+        if "hacker news" in source_name_lower:
+            if "internship" in source_name_lower:
+                return OpportunityCategory.INTERNSHIP
+            if "robotics" in source_name_lower:
+                return OpportunityCategory.RESEARCH
+            if "jobs" in source_name_lower:
+                return OpportunityCategory.JOB
+        if "devfolio" in source_name_lower:
+            return OpportunityCategory.HACKATHON
+        return None
+    def classify_by_rules(self, text: str) -> tuple[OpportunityCategory, Domain, float]:
+        """
+        Classify using keyword matching.
+        Returns (category, domain, confidence)
+        """
+        if not text:
+            return OpportunityCategory.OTHER, Domain.MIXED, 0.0
+        text_lower = text.lower()
+        # Find the category with the most keyword matches
+        category = OpportunityCategory.OTHER
+        best_matches = 0
+        for cat, patterns in self.CATEGORY_PATTERNS.items():
+            matches = sum(1 for p in patterns if p in text_lower)
+            # Compare match counts with match counts, not with the scaled confidence
+            if matches > best_matches:
+                category = cat
+                best_matches = matches
+        cat_confidence = min(best_matches * 0.3, 0.9)
+        # Count keyword matches per domain
+        domain_counts = {
+            dom: sum(1 for p in patterns if p in text_lower)
+            for dom, patterns in self.DOMAIN_PATTERNS.items()
+        }
+        domain_matches = max(domain_counts.values(), default=0)
+        # A single clear winner sets the domain; ties or no matches stay mixed
+        top_domains = [d for d, c in domain_counts.items() if c == domain_matches and c > 0]
+        domain = top_domains[0] if len(top_domains) == 1 else Domain.MIXED
+        return category, domain, cat_confidence
+    def classify(
+        self,
+        text: str,
+        title: str = "",
+        source_type: str = "",
+        source_name: str = "",
+        use_llm: bool = False,
+        llm_client = None
+    ) -> dict:
+        """
+        Classify opportunity with optional LLM enhancement.
+        Returns dict with category, domain, confidence, method
+        """
+        full_text = f"{title} {text}".strip()
+        # PRIORITY 1: Source-based classification (most reliable)
+        source_category = self.classify_by_source(source_type, source_name)
+        # PRIORITY 2: Rule-based keyword matching
+        rule_category, domain, confidence = self.classify_by_rules(full_text)
+        # Use source category if available (overrides keyword matching)
+        if source_category:
+            category = source_category
+            confidence = 0.85  # High confidence for source-based
+            method = "source"
+        else:
+            category = rule_category
+            method = "rules"
+        # Use LLM for low-confidence or ambiguous cases (only if no source match)
+        if use_llm and llm_client and confidence < 0.5 and not source_category:
+            try:
+                llm_result = llm_client.classify(full_text)
+                if llm_result.get("confidence", 0) > confidence:
+                    return {
+                        "category": llm_result.get("category", category.value),
+                        "domain": llm_result.get("domain", domain.value),
+                        "confidence": llm_result.get("confidence", confidence),
+                        "method": "llm"
+                    }
+            except Exception as e:
+                print(f"LLM classification failed: {e}")
+        return {
+            "category": category.value,
+            "domain": domain.value,
+            "confidence": confidence,
+            "method": method
+        }

backend/intelligence/credibility.py ADDED

	@@ -0,0 +1,125 @@

+"""
+PIOE Credibility Scorer
+Evaluates trustworthiness of sources and authors.
+"""
+from ..models import SourceType
+class CredibilityScorer:
+    """
+    Scores credibility based on source type, author history, and content signals.
+    """
+    # Base credibility scores by source type
+    SOURCE_CREDIBILITY = {
+        SourceType.ARXIV: 0.95,      # Academic papers - highest trust
+        SourceType.GITHUB: 0.8,       # Open source - high trust
+        SourceType.RSS: 0.7,          # Varies by feed
+        SourceType.SUPERTEAM: 0.85,   # Official platform
+        SourceType.REDDIT: 0.5,       # Community - variable
+        SourceType.TWITTER: 0.4,      # Social - requires filtering
+        SourceType.LINKEDIN: 0.6,     # Professional but noisy
+        SourceType.WEB_SCRAPE: 0.5,   # Unknown quality
+    }
+    def __init__(self):
+        pass
+    def score_source(self, source_type: SourceType) -> float:
+        """Get base credibility score for source type."""
+        return self.SOURCE_CREDIBILITY.get(source_type, 0.5)
+    def score_content_signals(self, text: str, metadata: dict = None) -> dict:
+        """
+        Evaluate content signals that indicate credibility.
+        Returns individual signal scores.
+        """
+        metadata = metadata or {}
+        signals = {}
+        text_lower = text.lower() if text else ""
+        # Has deadline (official announcements usually have deadlines)
+        signals["has_deadline"] = 1.0 if metadata.get("deadline") or \
+            any(kw in text_lower for kw in ["deadline", "due date", "apply by", "closes"]) else 0.0
+        # Has organization/institution
+        signals["has_organization"] = 1.0 if metadata.get("organization") else 0.5
+        # Contains action URL
+        signals["has_action_url"] = 1.0 if metadata.get("url") or \
+            any(kw in text_lower for kw in ["apply here", "register at", "sign up"]) else 0.0
+        # Is first announcement (not a repost)
+        signals["is_original"] = 0.0 if any(kw in text_lower for kw in [
+            "repost", "sharing", "fyi", "icymi", "in case you missed"
+        ]) else 1.0
+        # Has specific requirements (detailed = more credible)
+        signals["has_requirements"] = 1.0 if metadata.get("requirements") or \
+            any(kw in text_lower for kw in ["requirements", "qualifications", "must have"]) else 0.0
+        return signals
+    def calculate_signal_strength(self, signals: dict) -> float:
+        """
+        Calculate overall signal strength from content signals.
+        High signal strength = actionable, official, time-sensitive.
+        """
+        weights = {
+            "has_deadline": 0.3,
+            "has_organization": 0.2,
+            "has_action_url": 0.2,
+            "is_original": 0.2,
+            "has_requirements": 0.1
+        }
+        total = sum(signals.get(k, 0) * w for k, w in weights.items())
+        return round(total, 3)
+    def score(
+        self,
+        source_type: SourceType,
+        text: str = "",
+        metadata: dict = None,
+        author_credibility: float = 0.5,
+        social_engagement: int = 0
+    ) -> dict:
+        """
+        Calculate comprehensive credibility score.
+        Returns dict with:
+        - source_score: Base source credibility
+        - signal_strength: Content actionability
+        - credibility_score: Combined score
+        """
+        source_score = self.score_source(source_type)
+        content_signals = self.score_content_signals(text, metadata)
+        signal_strength = self.calculate_signal_strength(content_signals)
+        # Social engagement boost (for social sources)
+        engagement_boost = 0.0
+        if source_type in [SourceType.REDDIT, SourceType.TWITTER]:
+            if social_engagement > 100:
+                engagement_boost = 0.15
+            elif social_engagement > 50:
+                engagement_boost = 0.1
+            elif social_engagement > 20:
+                engagement_boost = 0.05
+        # Combined credibility:
+        # 50% source, 30% signals, 10% author, 10% engagement
+        credibility_score = (
+            0.5 * source_score +
+            0.3 * signal_strength +
+            0.1 * author_credibility +
+            0.1 * min(engagement_boost + 0.5, 1.0)
+        )
+        return {
+            "source_score": round(source_score, 3),
+            "signal_strength": signal_strength,
+            "signals": content_signals,
+            "credibility_score": round(credibility_score, 3)
+        }
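As a worked illustration of the weighting in `CredibilityScorer.score` above, the sketch below restates the two formulas as standalone functions and evaluates them for a Reddit post with more than 100 engagements. The function names `signal_strength` and `combined_credibility` are hypothetical and not part of the commit; the weights and thresholds are copied from the diff.

```python
# Hypothetical standalone restatement of the weights used by CredibilityScorer.
SIGNAL_WEIGHTS = {
    "has_deadline": 0.3,
    "has_organization": 0.2,
    "has_action_url": 0.2,
    "is_original": 0.2,
    "has_requirements": 0.1,
}


def signal_strength(signals: dict) -> float:
    """Weighted sum of content signals; missing signals count as 0."""
    return round(sum(signals.get(k, 0) * w for k, w in SIGNAL_WEIGHTS.items()), 3)


def combined_credibility(
    source_score: float,
    strength: float,
    author_credibility: float = 0.5,
    engagement_boost: float = 0.0,
) -> float:
    """50% source, 30% signals, 10% author, 10% engagement (neutral base of 0.5)."""
    return round(
        0.5 * source_score
        + 0.3 * strength
        + 0.1 * author_credibility
        + 0.1 * min(engagement_boost + 0.5, 1.0),
        3,
    )


# Reddit post (base 0.5) with a deadline, an action URL, no named organization,
# no listed requirements, and more than 100 engagements (boost 0.15).
signals = {
    "has_deadline": 1.0,
    "has_organization": 0.5,
    "has_action_url": 1.0,
    "is_original": 1.0,
    "has_requirements": 0.0,
}
strength = signal_strength(signals)
score = combined_credibility(0.5, strength, engagement_boost=0.15)
print(strength, score)
```

With these inputs the signal strength is 0.8 and the combined score is 0.605, so even the maximum engagement boost moves the final score by only 0.015. The source type dominates.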

backend/intelligence/llm_client.py ADDED

	@@ -0,0 +1,352 @@

+"""
+PIOE LLM Client Abstraction Layer
+Supports Gemini (default) and OpenAI as providers.
+"""
+from abc import ABC, abstractmethod
+from typing import Optional
+import json
+from ..config import get_settings
+class BaseLLMClient(ABC):
+    """Abstract base class for LLM providers."""
+    @abstractmethod
+    def classify(self, text: str) -> dict:
+        """Classify opportunity text into category and domain."""
+        pass
+    @abstractmethod
+    def summarize(self, text: str, max_length: int = 150) -> str:
+        """Generate concise summary of opportunity."""
+        pass
+    @abstractmethod
+    def recommend_action(self, opportunity: dict) -> dict:
+        """Recommend action based on opportunity context."""
+        pass
+    @abstractmethod
+    def extract_metadata(self, text: str) -> dict:
+        """Extract structured metadata (deadline, location, reward, etc.)."""
+        pass
+class GeminiClient(BaseLLMClient):
+    """Google Gemini implementation."""
+    def __init__(self, api_key: str):
+        import google.generativeai as genai
+        genai.configure(api_key=api_key)
+        self.model = genai.GenerativeModel('gemini-1.5-flash')
+    def _generate(self, prompt: str, as_json: bool = False) -> str:
+        """Generate response from Gemini."""
+        response = self.model.generate_content(prompt)
+        return response.text
+    def classify(self, text: str) -> dict:
+        """Classify opportunity into category and domain."""
+        prompt = f"""Analyze this opportunity and classify it. Return JSON only.
+TEXT: {text[:2000]}
+Return this exact JSON structure:
+{{
+    "category": "one of: scholarship, fellowship, internship, job, research, hackathon, competition, grant, conference, open_source, investment, weak_signal, other",
+    "domain": "one of: ai, computer_vision, robotics, finance, crypto, academia, mixed",
+    "confidence": 0.0 to 1.0
+}}"""
+        try:
+            result = self._generate(prompt)
+            # Extract JSON from response
+            start = result.find('{')
+            end = result.rfind('}') + 1
+            if start != -1 and end > start:
+                return json.loads(result[start:end])
+        except Exception as e:
+            print(f"Classification error: {e}")
+        return {"category": "other", "domain": "mixed", "confidence": 0.0}
+    def summarize(self, text: str, max_length: int = 150) -> str:
+        """Generate concise summary."""
+        prompt = f"""Summarize this opportunity in {max_length} characters or less.
+Focus on: what it is, who it's for, and deadline if any.
+TEXT: {text[:2000]}
+Return only the summary, no quotes or labels."""
+        try:
+            return self._generate(prompt).strip()[:max_length]
+        except Exception as e:
+            print(f"Summary error: {e}")
+            return text[:max_length]
+    def recommend_action(self, opportunity: dict) -> dict:
+        """
+        PIOE 2.0 Enhanced Action Guidance.
+        Returns comprehensive recommendations for how to approach the opportunity.
+        """
+        prompt = f"""You are an expert career and opportunity advisor. Analyze this opportunity and provide detailed action guidance.
+OPPORTUNITY DETAILS:
+- Title: {opportunity.get('title', '')}
+- Category: {opportunity.get('category', '')}
+- Domain: {opportunity.get('domain', '')}
+- Deadline: {opportunity.get('deadline', 'No deadline specified')}
+- Description: {opportunity.get('raw_text', '')[:1500]}
+- ROI Score: {opportunity.get('roi_score', 'N/A')}
+- Competition Level: {opportunity.get('competition_level', 'N/A')}
+- Region: {opportunity.get('region', 'global')}
+USER CONTEXT:
+- Location: Nigeria, Africa
+- Interests: AI, Computer Vision, Robotics, Web3
+- Status: Student/Early Career
+Provide strategic action guidance. Return JSON only:
+{{
+    "primary_action": "one of: apply_now, apply_prepared, track, save_for_later, deep_research, network_first, skip",
+    "urgency": "one of: immediate, this_week, this_month, whenever, expired",
+    "timing_status": "one of: early, optimal, late, unknown",
+    "skills_to_highlight": ["skill1", "skill2", "skill3"],
+    "portfolio_pieces": ["project type 1", "project type 2"],
+    "preparation_steps": [
+        "step 1",
+        "step 2",
+        "step 3"
+    ],
+    "networking_tips": "who to contact or how to stand out (1 sentence)",
+    "differentiation_angle": "what unique angle to take (1 sentence)",
+    "success_probability": 0.0 to 1.0,
+    "time_investment_hours": estimated hours to apply well,
+    "risk_level": "low, medium, or high",
+    "why": "brief strategic reasoning (max 100 chars)",
+    "red_flags": ["any concerns"] or []
+}}"""
+        try:
+            result = self._generate(prompt)
+            start = result.find('{')
+            end = result.rfind('}') + 1
+            if start != -1 and end > start:
+                parsed = json.loads(result[start:end])
+                # Ensure required fields exist
+                return {
+                    "primary_action": parsed.get("primary_action", "save_for_later"),
+                    "urgency": parsed.get("urgency", "whenever"),
+                    "timing_status": parsed.get("timing_status", "unknown"),
+                    "skills_to_highlight": parsed.get("skills_to_highlight", []),
+                    "portfolio_pieces": parsed.get("portfolio_pieces", []),
+                    "preparation_steps": parsed.get("preparation_steps", []),
+                    "networking_tips": parsed.get("networking_tips", ""),
+                    "differentiation_angle": parsed.get("differentiation_angle", ""),
+                    "success_probability": parsed.get("success_probability", 0.3),
+                    "time_investment_hours": parsed.get("time_investment_hours", 10),
+                    "risk_level": parsed.get("risk_level", "medium"),
+                    "why": parsed.get("why", "Review and decide"),
+                    "red_flags": parsed.get("red_flags", []),
+                }
+        except Exception as e:
+            print(f"Action guidance error: {e}")
+        # Fallback response
+        return {
+            "primary_action": "save_for_later",
+            "urgency": "whenever",
+            "timing_status": "unknown",
+            "skills_to_highlight": [],
+            "portfolio_pieces": [],
+            "preparation_steps": ["Review the opportunity details", "Assess fit with your goals"],
+            "networking_tips": "",
+            "differentiation_angle": "",
+            "success_probability": 0.3,
+            "time_investment_hours": 10,
+            "risk_level": "medium",
+            "why": "Needs manual review",
+            "red_flags": [],
+        }
+    def extract_metadata(self, text: str) -> dict:
+        """Extract structured metadata from text."""
+        prompt = f"""Extract metadata from this opportunity text. Return JSON only.
+TEXT: {text[:2000]}
+Return this structure (use null for missing info):
+{{
+    "deadline": "YYYY-MM-DD or null",
+    "location": "location or 'remote' or null",
+    "reward": "amount or null",
+    "organization": "org name or null",
+    "requirements": ["skill1", "skill2"] or [],
+    "url": "application url or null"
+}}"""
+        try:
+            result = self._generate(prompt)
+            start = result.find('{')
+            end = result.rfind('}') + 1
+            if start != -1 and end > start:
+                return json.loads(result[start:end])
+        except Exception as e:
+            print(f"Metadata extraction error: {e}")
+        return {}
+class OpenAIClient(BaseLLMClient):
+    """OpenAI implementation (fallback)."""
+    def __init__(self, api_key: str):
+        from openai import OpenAI
+        self.client = OpenAI(api_key=api_key)
+        self.model = "gpt-3.5-turbo"
+    def _generate(self, prompt: str) -> str:
+        """Generate response from OpenAI."""
+        response = self.client.chat.completions.create(
+            model=self.model,
+            messages=[{"role": "user", "content": prompt}],
+            temperature=0.3
+        )
+        return response.choices[0].message.content
+    def classify(self, text: str) -> dict:
+        """Classify opportunity - same logic as Gemini."""
+        prompt = f"""Classify this opportunity. Return JSON only with keys: category, domain, confidence.
+Categories: scholarship, fellowship, internship, job, research, hackathon, competition, grant, conference, open_source, investment, weak_signal, other
+Domains: ai, computer_vision, robotics, finance, crypto, academia, mixed
+TEXT: {text[:2000]}"""
+        try:
+            result = self._generate(prompt)
+            start = result.find('{')
+            end = result.rfind('}') + 1
+            if start != -1 and end > start:
+                return json.loads(result[start:end])
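+        except Exception as e:
+            # Log instead of silently swallowing, matching GeminiClient.classify
+            print(f"Classification error: {e}")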
+        return {"category": "other", "domain": "mixed", "confidence": 0.0}
+    def summarize(self, text: str, max_length: int = 150) -> str:
+        prompt = f"Summarize in {max_length} chars: {text[:2000]}"
+        try:
+            return self._generate(prompt).strip()[:max_length]
+        except Exception:
+            return text[:max_length]
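+    def recommend_action(self, opportunity: dict) -> dict:
+        # No OpenAI-backed guidance yet. Return the same keys and values as
+        # GeminiClient's fallback so callers see a single response schema.
+        return {
+            "primary_action": "save_for_later",
+            "urgency": "whenever",
+            "timing_status": "unknown",
+            "skills_to_highlight": [],
+            "portfolio_pieces": [],
+            "preparation_steps": ["Review the opportunity details", "Assess fit with your goals"],
+            "networking_tips": "",
+            "differentiation_angle": "",
+            "success_probability": 0.3,
+            "time_investment_hours": 10,
+            "risk_level": "medium",
+            "why": "Needs manual review",
+            "red_flags": [],
+        }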
+    def extract_metadata(self, text: str) -> dict:
+        return {}
+class LLMClient:
+    """
+    Factory class that provides the configured LLM client.
+    Uses Gemini when ai_provider is "gemini" and a Gemini key is set, otherwise
+    OpenAI if an OpenAI key is set, otherwise the rule-based MockLLMClient.
+    """
+    _instance: Optional[BaseLLMClient] = None
+    @classmethod
+    def get_client(cls) -> BaseLLMClient:
+        """Get or create the LLM client instance."""
+        if cls._instance is None:
+            settings = get_settings()
+            if settings.ai_provider == "gemini" and settings.gemini_api_key:
+                cls._instance = GeminiClient(settings.gemini_api_key)
+            elif settings.openai_api_key:
+                cls._instance = OpenAIClient(settings.openai_api_key)
+            else:
+                # Return a mock client if no API keys configured
+                cls._instance = MockLLMClient()
+        return cls._instance
+class MockLLMClient(BaseLLMClient):
+    """Mock client for development without API keys. PIOE 2.0 compatible."""
+    def classify(self, text: str) -> dict:
+        # Basic rule-based classification
+        text_lower = text.lower()
+        if any(kw in text_lower for kw in ["scholarship", "fellowship", "grant"]):
+            return {"category": "scholarship", "domain": "academia", "confidence": 0.7}
+        elif any(kw in text_lower for kw in ["hackathon", "competition", "challenge"]):
+            return {"category": "hackathon", "domain": "ai", "confidence": 0.7}
+        elif any(kw in text_lower for kw in ["internship", "intern"]):
+            return {"category": "internship", "domain": "mixed", "confidence": 0.7}
+        elif any(kw in text_lower for kw in ["job", "hiring", "position"]):
+            return {"category": "job", "domain": "mixed", "confidence": 0.7}
+        elif any(kw in text_lower for kw in ["bounty", "ecosystem", "solana", "ethereum"]):
+            return {"category": "bounty", "domain": "crypto", "confidence": 0.7}
+        elif any(kw in text_lower for kw in ["pitch", "demo day", "accelerator"]):
+            return {"category": "pitch_event", "domain": "mixed", "confidence": 0.7}
+        elif any(kw in text_lower for kw in ["collaborat", "partner", "looking for"]):
+            return {"category": "collaboration", "domain": "mixed", "confidence": 0.6}
+        return {"category": "other", "domain": "mixed", "confidence": 0.3}
+    def summarize(self, text: str, max_length: int = 150) -> str:
+        return text[:max_length]
+    def recommend_action(self, opportunity: dict) -> dict:
+        """PIOE 2.0 action guidance - rule-based fallback."""
+        category = opportunity.get("category", "other")
+        # Category-based action mapping
+        action_map = {
+            "hackathon": ("apply_now", "this_week", ["Python", "ML/AI"], ["Previous hackathon project"]),
+            "grant": ("apply_prepared", "this_month", ["Technical writing", "Project planning"], ["Open source contributions"]),
+            "ecosystem_grant": ("apply_prepared", "this_month", ["Solidity/Rust", "Web3"], ["DApp or smart contract"]),
+            "internship": ("apply_now", "this_week", ["Relevant coursework", "Projects"], ["GitHub portfolio"]),
+            "scholarship": ("apply_prepared", "this_month", ["Academic excellence", "Leadership"], ["Research paper or thesis"]),
+            "bounty": ("apply_now", "immediate", ["Specific tech stack"], ["Related code samples"]),
+            "pitch_event": ("apply_prepared", "this_month", ["Presentation", "Business model"], ["Pitch deck", "Demo video"]),
+            "collaboration": ("network_first", "whenever", ["Domain expertise"], ["Relevant projects"]),
+        }
+        action, urgency, skills, portfolio = action_map.get(
+            category,
+            ("save_for_later", "whenever", [], [])
+        )
+        return {
+            "primary_action": action,
+            "urgency": urgency,
+            "timing_status": "unknown",
+            "skills_to_highlight": skills,
+            "portfolio_pieces": portfolio,
+            "preparation_steps": [
+                "Review the opportunity requirements",
+                "Prepare relevant materials",
+                "Submit before deadline"
+            ],
+            "networking_tips": "Research the organization and connect with past participants",
+            "differentiation_angle": "Highlight unique projects and Africa/Nigeria perspective",
+            "success_probability": 0.3,
+            "time_investment_hours": 10,
+            "risk_level": "medium",
+            "why": f"Standard approach for {category}",
+            "red_flags": [],
+        }
+    def extract_metadata(self, text: str) -> dict:
+        return {}
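The Gemini and OpenAI clients above repeat the same `find('{')` / `rfind('}')` slice before calling `json.loads`. A minimal sketch of how that step could be factored into one helper follows; `extract_json_object` is a hypothetical name, not something the commit defines, and returning `None` on failure is an assumption about how callers would want to fall back.

```python
import json
from typing import Optional


def extract_json_object(raw: str) -> Optional[dict]:
    """Return the outermost {...} span of an LLM reply as a dict, or None.

    Mirrors the slicing used in the clients: first '{' to last '}'.
    """
    start = raw.find("{")
    end = raw.rfind("}") + 1
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(raw[start:end])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


# Models often wrap JSON in a Markdown fence or add a sentence around it.
reply = 'Here you go:\n```json\n{"category": "grant", "confidence": 0.8}\n```'
print(extract_json_object(reply))
print(extract_json_object("Sorry, I cannot classify this."))
```

Each caller could then substitute its own default (for example the `{"category": "other", ...}` dict) whenever the helper returns `None`.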

backend/intelligence/novelty.py ADDED

	@@ -0,0 +1,118 @@

+"""
+PIOE Novelty Detector
+Detects if an opportunity is novel or a repeat of existing content.
+Uses embedding similarity against historical database.
+"""
+from typing import Optional
+import numpy as np
+from sqlalchemy.orm import Session
+from ..models import Opportunity
+class NoveltyDetector:
+    """
+    Detects novelty by comparing against historical opportunity embeddings.
+    High novelty = new and unseen topics/opportunities.
+    """
+    def __init__(self, similarity_threshold: float = 0.85):
+        """
+        Args:
+            similarity_threshold: If similarity > threshold, item is considered duplicate.
+        """
+        self.similarity_threshold = similarity_threshold
+    def cosine_similarity(self, vec1: list[float], vec2: list[float]) -> float:
+        """Calculate cosine similarity between two vectors."""
+        a = np.array(vec1)
+        b = np.array(vec2)
+        norm_a = np.linalg.norm(a)
+        norm_b = np.linalg.norm(b)
+        if norm_a == 0 or norm_b == 0:
+            return 0.0
+        return float(np.dot(a, b) / (norm_a * norm_b))
+    def calculate_novelty(
+        self,
+        embedding: list[float],
+        db: Session,
+        limit: int = 100
+    ) -> dict:
+        """
+        Calculate novelty score by comparing against recent opportunities.
+        Returns:
+            dict with novelty_score, is_duplicate, most_similar_id
+            (set only for duplicates), and max_similarity
+        """
+        if not embedding:
+            return {
+                "novelty_score": 1.0,
+                "is_duplicate": False,
+                "most_similar_id": None,
+                "max_similarity": 0.0
+            }
+        # Get recent opportunities with embeddings
+        recent = db.query(Opportunity).filter(
+            Opportunity.embedding.isnot(None)
+        ).order_by(
+            Opportunity.discovered_at.desc()
+        ).limit(limit).all()
+        if not recent:
+            return {
+                "novelty_score": 1.0,
+                "is_duplicate": False,
+                "most_similar_id": None,
+                "max_similarity": 0.0
+            }
+        max_similarity = 0.0
+        most_similar_id = None
+        for opp in recent:
+            if opp.embedding:
+                similarity = self.cosine_similarity(embedding, opp.embedding)
+                if similarity > max_similarity:
+                    max_similarity = similarity
+                    most_similar_id = opp.id
+        # Novelty is inverse of maximum similarity
+        novelty_score = 1.0 - max_similarity
+        is_duplicate = max_similarity > self.similarity_threshold
+        return {
+            "novelty_score": round(novelty_score, 3),
+            "is_duplicate": is_duplicate,
+            "most_similar_id": most_similar_id if is_duplicate else None,
+            "max_similarity": round(max_similarity, 3)
+        }
+    def is_recycled_content(self, text: str) -> bool:
+        """
+        Rule-based check for recycled/aggregated content.
+        Returns True if content appears to be recycled.
+        """
+        if not text:
+            return False
+        text_lower = text.lower()
+        # Patterns indicating recycled content
+        recycled_patterns = [
+            "top 10",
+            "top 5",
+            "best tools",
+            "complete guide",
+            "everything you need to know",
+            "roundup",
+            "weekly digest",
+            "news summary",
+            "in case you missed",
+            "trending this week"
+        ]
+        return any(pattern in text_lower for pattern in recycled_patterns)
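To make the duplicate rule in `calculate_novelty` concrete, here is a dependency-free sketch that swaps the SQLAlchemy query for an in-memory list of `(id, embedding)` pairs and NumPy for plain `math`. The function `novelty_against` and the toy 2-D vectors are assumptions for illustration only; the 0.85 threshold and the returned keys follow the diff.

```python
import math


def cosine_similarity(vec1: list, vec2: list) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(vec1, vec2))
    norm_a = math.sqrt(sum(x * x for x in vec1))
    norm_b = math.sqrt(sum(y * y for y in vec2))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def novelty_against(embedding: list, history: list, threshold: float = 0.85) -> dict:
    """Compare one embedding against (id, embedding) pairs from past items."""
    max_similarity = 0.0
    most_similar_id = None
    for opp_id, other in history:
        similarity = cosine_similarity(embedding, other)
        if similarity > max_similarity:
            max_similarity, most_similar_id = similarity, opp_id
    is_duplicate = max_similarity > threshold
    return {
        "novelty_score": round(1.0 - max_similarity, 3),
        "is_duplicate": is_duplicate,
        "most_similar_id": most_similar_id if is_duplicate else None,
        "max_similarity": round(max_similarity, 3),
    }


history = [(1, [1.0, 0.0]), (2, [0.6, 0.8])]
repeat = novelty_against([1.0, 0.0], history)  # identical to item 1
fresh = novelty_against([0.0, 1.0], history)   # closest is item 2, below threshold
print(repeat)
print(fresh)
```

The second query is fairly close to item 2 but stays under the threshold, so it is reported as novel and `most_similar_id` is left as `None`.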

backend/intelligence/roi_scorer.py ADDED

	@@ -0,0 +1,340 @@

+"""
+PIOE ROI Scorer - Version 2.0
+Calculates "Is this worth my time?" score.
+Key decision intelligence for prioritizing opportunities.
+"""
+from datetime import datetime
+from typing import Optional
+class ROIScorer:
+    """
+    Calculates ROI (Return on Investment) score for opportunities.
+    Considers:
+    - Time required
+    - Probability of success
+    - Financial/career upside
+    - Opportunity chain unlocks
+    - Competition level
+    - Regional accessibility
+    """
+    # Weights for ROI calculation
+    WEIGHTS = {
+        "time_efficiency": 0.15,
+        "success_probability": 0.25,
+        "upside_potential": 0.25,
+        "unlock_potential": 0.15,
+        "competition": 0.10,
+        "accessibility": 0.10,
+    }
+    # Category time requirements (hours)
+    CATEGORY_TIME = {
+        "hackathon": 40,
+        "grant": 20,
+        "micro_grant": 8,
+        "ecosystem_grant": 25,
+        "scholarship": 15,
+        "fellowship": 20,
+        "internship": 10,
+        "job": 5,
+        "research": 30,
+        "bounty": 15,
+        "pitch_event": 20,
+        "ambassador": 10,
+        "partnership": 5,
+    }
+    # Category upside potential (0-1)
+    CATEGORY_UPSIDE = {
+        "ecosystem_grant": 0.9,
+        "grant": 0.85,
+        "fellowship": 0.85,
+        "scholarship": 0.8,
+        "hackathon": 0.8,
+        "micro_grant": 0.6,
+        "pitch_event": 0.75,
+        "internship": 0.7,
+        "bounty": 0.5,
+        "job": 0.6,
+        "research": 0.65,
+        "ambassador": 0.4,
+        "partnership": 0.7,
+    }
+    # Category competition levels (0-1, higher = more competitive)
+    CATEGORY_COMPETITION = {
+        "scholarship": 0.9,
+        "fellowship": 0.85,
+        "job": 0.7,
+        "internship": 0.75,
+        "hackathon": 0.6,
+        "grant": 0.5,
+        "ecosystem_grant": 0.4,
+        "micro_grant": 0.3,
+        "bounty": 0.3,
+        "pitch_event": 0.5,
+        "ambassador": 0.35,
+        "partnership": 0.4,
+    }
+    # Chain unlock values (which categories open doors)
+    UNLOCK_VALUES = {
+        "hackathon": 0.8,  # Opens: grants, accelerators, jobs
+        "fellowship": 0.9,  # Opens: PhD, research, network
+        "ecosystem_grant": 0.85,  # Opens: ecosystem jobs, more grants
+        "internship": 0.7,  # Opens: full-time, network
+        "research": 0.75,  # Opens: PhD, conference, collaboration
+        "pitch_event": 0.7,  # Opens: investment, visibility
+        "bounty": 0.4,  # Opens: ecosystem roles
+        "ambassador": 0.5,  # Opens: community, ecosystem
+    }
+    def __init__(self, user_region: str = "nigeria"):
+        self.user_region = user_region.lower()
+    def calculate_roi(
+        self,
+        category: str,
+        deadline: Optional[datetime] = None,
+        grant_size: Optional[int] = None,
+        region: str = "global",
+        extra_data: dict = None
+    ) -> dict:
+        """
+        Calculate ROI score for an opportunity.
+        Returns dict with:
+        - roi_score: 0.0 to 1.0
+        - risk_level: low/medium/high
+        - unlock_potential: 0.0 to 1.0
+        - competition_level: 0.0 to 1.0
+        - time_hours: estimated hours for the category
+        - reasoning: explanation
+        """
+        extra_data = extra_data or {}
+        category = category.lower() if category else "other"
+        # Calculate component scores
+        time_efficiency = self._calculate_time_efficiency(category, deadline)
+        success_prob = self._calculate_success_probability(category, extra_data)
+        upside = self._calculate_upside(category, grant_size)
+        unlock = self._calculate_unlock_potential(category)
+        competition = self._calculate_competition(category)
+        accessibility = self._calculate_accessibility(region)
+        # Weighted ROI score
+        roi_score = (
+            self.WEIGHTS["time_efficiency"] * time_efficiency +
+            self.WEIGHTS["success_probability"] * success_prob +
+            self.WEIGHTS["upside_potential"] * upside +
+            self.WEIGHTS["unlock_potential"] * unlock +
+            self.WEIGHTS["competition"] * (1 - competition) +  # Invert competition
+            self.WEIGHTS["accessibility"] * accessibility
+        )
+        # Determine risk level
+        risk_level = self._determine_risk(category, competition, deadline)
+        # Generate reasoning
+        reasoning = self._generate_reasoning(
+            category, roi_score, risk_level,
+            time_efficiency, success_prob, upside, accessibility
+        )
+        return {
+            "roi_score": round(roi_score, 3),
+            "risk_level": risk_level,
+            "unlock_potential": round(unlock, 3),
+            "competition_level": round(competition, 3),
+            "time_hours": self.CATEGORY_TIME.get(category, 15),
+            "reasoning": reasoning,
+        }
+    def _calculate_time_efficiency(
+        self,
+        category: str,
+        deadline: Optional[datetime]
+    ) -> float:
+        """Score based on time required and deadline pressure."""
+        base_hours = self.CATEGORY_TIME.get(category, 15)
+        # Lower hours = higher efficiency
+        efficiency = 1.0 - (min(base_hours, 60) / 60)
+        # Deadline factor
+        if deadline:
+            # Handle timezone-aware datetimes
+            try:
+                if deadline.tzinfo is not None:
+                    deadline = deadline.replace(tzinfo=None)
+                days_left = (deadline - datetime.utcnow()).days
+            except Exception:
+                days_left = 30  # Default if comparison fails
+            if days_left < 3:
+                efficiency *= 0.5  # Too rushed
+            elif days_left < 7:
+                efficiency *= 0.8  # Tight
+            # 7 or more days left: no deadline penalty
+        return min(efficiency, 1.0)
+    def _calculate_success_probability(
+        self,
+        category: str,
+        extra_data: dict
+    ) -> float:
+        """Estimate probability of success."""
+        base_prob = {
+            "bounty": 0.7,
+            "micro_grant": 0.5,
+            "ambassador": 0.5,
+            "hackathon": 0.3,
+            "ecosystem_grant": 0.25,
+            "grant": 0.2,
+            "internship": 0.2,
+            "job": 0.15,
+            "fellowship": 0.1,
+            "scholarship": 0.1,
+        }.get(category, 0.2)
+        # Adjust based on extra data
+        if extra_data.get("technical_depth") == "beginner":
+            base_prob += 0.1
+        if extra_data.get("africa_focus") or extra_data.get("nigeria_specific"):
+            base_prob += 0.15  # Regional programs often less competitive
+        return min(base_prob, 1.0)
+    def _calculate_upside(
+        self,
+        category: str,
+        grant_size: Optional[int]
+    ) -> float:
+        """Calculate potential upside."""
+        base_upside = self.CATEGORY_UPSIDE.get(category, 0.5)
+        # Adjust for grant size
+        if grant_size:
+            if grant_size > 50000:
+                base_upside = min(base_upside + 0.2, 1.0)
+            elif grant_size > 10000:
+                base_upside = min(base_upside + 0.1, 1.0)
+        return base_upside
+    def _calculate_unlock_potential(self, category: str) -> float:
+        """Calculate what doors this opens."""
+        return self.UNLOCK_VALUES.get(category, 0.3)
+    def _calculate_competition(self, category: str) -> float:
+        """Estimate competition level."""
+        return self.CATEGORY_COMPETITION.get(category, 0.5)
+    def _calculate_accessibility(self, region: str) -> float:
+        """Calculate accessibility based on user region."""
+        region = (region or "global").lower()
+        # Perfect match
+        if region == self.user_region:
+            return 1.0
+        # Regional matches
+        if self.user_region == "nigeria":
+            if region in ["africa", "remote_africa"]:
+                return 0.9
+            elif region in ["global", "remote_global"]:
+                return 0.7
+            else:
+                return 0.3
+        # Global is accessible
+        if region in ["global", "remote_global"]:
+            return 0.8
+        return 0.5
+    def _determine_risk(
+        self,
+        category: str,
+        competition: float,
+        deadline: Optional[datetime]
+    ) -> str:
+        """Determine risk level (time sink risk)."""
+        risk_score = 0
+        # High time = high risk
+        time_hours = self.CATEGORY_TIME.get(category, 15)
+        if time_hours > 30:
+            risk_score += 2
+        elif time_hours > 15:
+            risk_score += 1
+        # High competition = high risk
+        if competition > 0.7:
+            risk_score += 2
+        elif competition > 0.5:
+            risk_score += 1
+        # Tight deadline = high risk
+        if deadline:
+            try:
+                if deadline.tzinfo is not None:
+                    deadline = deadline.replace(tzinfo=None)
+                days_left = (deadline - datetime.utcnow()).days
+            except Exception:
+                days_left = 30  # Default if comparison fails
+            if days_left < 5:
+                risk_score += 2
+        if risk_score >= 4:
+            return "high"
+        elif risk_score >= 2:
+            return "medium"
+        else:
+            return "low"
+    def _generate_reasoning(
+        self,
+        category: str,
+        roi_score: float,
+        risk_level: str,
+        time_eff: float,
+        success_prob: float,
+        upside: float,
+        accessibility: float
+    ) -> str:
+        """Generate human-readable reasoning."""
+        reasons = []
+        if roi_score > 0.7:
+            reasons.append("High-value opportunity")
+        elif roi_score > 0.5:
+            reasons.append("Moderate value")
+        else:
+            reasons.append("Consider carefully")
+        if time_eff > 0.7:
+            reasons.append("time-efficient")
+        elif time_eff < 0.4:
+            reasons.append("requires significant time")
+        if success_prob > 0.4:
+            reasons.append("good success odds")
+        elif success_prob < 0.15:
+            reasons.append("highly competitive")
+        if accessibility > 0.8:
+            reasons.append("region-accessible")
+        elif accessibility < 0.5:
+            reasons.append("may have access barriers")
+        if risk_level == "low":
+            reasons.append("low time-sink risk")
+        elif risk_level == "high":
+            reasons.append("high time investment")
+        return ". ".join(reasons) + "."
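The risk heuristic in `_determine_risk` adds points for time cost, competition, and deadline pressure, then buckets the total into three levels. A minimal standalone sketch of that arithmetic follows. The `CATEGORY_TIME` table and the explicit `now` parameter are assumptions for illustration; the real per-category hours are defined elsewhere in the class.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical hours-per-category table, for illustration only.
CATEGORY_TIME = {"hackathon": 40, "grant": 20, "bounty": 8}


def determine_risk(category: str, competition: float,
                   deadline: Optional[datetime], now: datetime) -> str:
    """Mirror the point-based bucketing used by _determine_risk."""
    score = 0
    hours = CATEGORY_TIME.get(category, 15)
    if hours > 30:          # large time investment
        score += 2
    elif hours > 15:
        score += 1
    if competition > 0.7:   # crowded field
        score += 2
    elif competition > 0.5:
        score += 1
    if deadline is not None and (deadline - now).days < 5:
        score += 2          # tight deadline
    if score >= 4:
        return "high"
    if score >= 2:
        return "medium"
    return "low"


now = datetime(2025, 1, 17)
far = now + timedelta(days=30)
print(determine_risk("hackathon", 0.8, None, now))  # 2 + 2 points
print(determine_risk("grant", 0.6, far, now))       # 1 + 1 points
print(determine_risk("bounty", 0.3, far, now))      # 0 points
```

Passing `now` in explicitly keeps the function deterministic, which makes the thresholds easy to unit test.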

backend/intelligence/scorer.py ADDED

	@@ -0,0 +1,101 @@

+"""
+PIOE Relevance Scorer
+Calculates relevance score based on keyword matching and semantic similarity.
+"""
+from typing import Optional
+import numpy as np
+from sentence_transformers import SentenceTransformer
+from ..config import get_settings
+class RelevanceScorer:
+    """
+    Scores opportunities based on relevance to user interests.
+    Uses both keyword matching and semantic similarity.
+    """
+    def __init__(self):
+        self.settings = get_settings()
+        self._model: Optional[SentenceTransformer] = None
+        self._interest_embedding: Optional[np.ndarray] = None
+        # Build interest text from keywords
+        self.interest_text = " ".join(self.settings.high_priority_keywords)
+    @property
+    def model(self) -> SentenceTransformer:
+        """Lazy load the embedding model."""
+        if self._model is None:
+            self._model = SentenceTransformer('all-MiniLM-L6-v2')
+        return self._model
+    @property
+    def interest_embedding(self) -> np.ndarray:
+        """Get cached interest vector embedding."""
+        if self._interest_embedding is None:
+            self._interest_embedding = self.model.encode(self.interest_text)
+        return self._interest_embedding
+    def get_embedding(self, text: str) -> list[float]:
+        """Generate embedding for text."""
+        embedding = self.model.encode(text)
+        return embedding.tolist()
+    def score_keywords(self, text: str) -> float:
+        """
+        Score based on keyword presence.
+        Returns 0.0 to 1.0
+        """
+        if not text:
+            return 0.0
+        text_lower = text.lower()
+        matches = sum(
+            1 for keyword in self.settings.high_priority_keywords
+            if keyword.lower() in text_lower
+        )
+        # Normalize: more matches = higher score, capped at 1.0
+        max_expected = 5  # Expect 5+ matches for full score
+        return min(matches / max_expected, 1.0)
+    def score_semantic(self, text: str) -> float:
+        """
+        Score based on semantic similarity to interest vector.
+        Returns 0.0 to 1.0
+        """
+        if not text:
+            return 0.0
+        try:
+            text_embedding = self.model.encode(text)
+            # Cosine similarity (guard against zero-length vectors, which would yield NaN)
+            denom = np.linalg.norm(text_embedding) * np.linalg.norm(self.interest_embedding)
+            if denom == 0:
+                return 0.5
+            similarity = np.dot(text_embedding, self.interest_embedding) / denom
+            # Normalize from [-1, 1] to [0, 1]
+            return float((similarity + 1) / 2)
+        except Exception as e:
+            print(f"Semantic scoring error: {e}")
+            return 0.5
+    def score(self, text: str, title: str = "") -> dict:
+        """
+        Calculate combined relevance score.
+        Returns dict with individual and combined scores.
+        """
+        full_text = f"{title} {text}".strip()
+        keyword_score = self.score_keywords(full_text)
+        semantic_score = self.score_semantic(full_text)
+        # Weighted average: keywords 40%, semantic 60%
+        combined = 0.4 * keyword_score + 0.6 * semantic_score
+        return {
+            "keyword_score": round(keyword_score, 3),
+            "semantic_score": round(semantic_score, 3),
+            "relevance_score": round(combined, 3)
+        }
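The combined relevance score above is a 40/60 blend of keyword hits and rescaled cosine similarity. The following sketch reproduces that arithmetic with the standard library only, using toy two-dimensional vectors in place of real sentence embeddings. The keyword list and vectors are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / norm


def keyword_score(text: str, keywords: list[str], max_expected: int = 5) -> float:
    """Fraction of expected keyword hits, capped at 1.0."""
    text_lower = text.lower()
    matches = sum(1 for kw in keywords if kw.lower() in text_lower)
    return min(matches / max_expected, 1.0)


def combined_score(kw: float, similarity: float) -> float:
    """Blend keyword score (40%) with similarity rescaled to [0, 1] (60%)."""
    semantic = (similarity + 1) / 2
    return round(0.4 * kw + 0.6 * semantic, 3)


keywords = ["robotics", "computer vision", "grant"]  # hypothetical interests
kw = keyword_score("Robotics grant for computer vision labs", keywords)
sim = cosine_similarity([1.0, 0.0], [1.0, 1.0])
print(kw, round(sim, 3), combined_score(kw, sim))
```

One consequence of the rescaling is worth noting: a text with no keyword hits and a cosine similarity of 0 still receives a combined score of 0.3, so thresholds such as `min_score` should be chosen with that floor in mind.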

backend/intelligence/silent_detector.py ADDED

	@@ -0,0 +1,313 @@

+"""
+PIOE Silent Opportunities Detector - Version 2.0
+Detects implicit/hidden opportunities that are never announced clearly.
+These appear in blog posts, tweets, Discord messages, research updates.
+Examples:
+- "We're exploring ideas around..."
+- "We're looking for collaborators..."
+- "If anyone is interested..."
+- "We're building something new..."
+"""
+import re
+from typing import Optional
+class SilentOpportunityDetector:
+    """
+    Detects implicit opportunities from content that doesn't
+    explicitly announce them as opportunities.
+    """
+    # Patterns for implicit opportunities
+    SIGNAL_PATTERNS = {
+        # Pre-hiring signals
+        "pre_hiring": [
+            r"we(?:'re| are) (?:actively )?(?:looking|searching) for",
+            r"we need (?:a |someone|people)",
+            r"hiring (?:soon|next|this)",
+            r"building (?:a |our |the )?team",
+            r"if you(?:'re| are) interested in joining",
+            r"open roles? (?:coming|soon)",
+            r"dm (?:me|us) if (?:you(?:'re| are)|interested)",
+            r"reach out if",
+        ],
+        # Pre-grant signals
+        "pre_grant": [
+            r"(?:we(?:'re| are)|we will be) (?:funding|supporting|backing)",
+            r"grants? (?:coming|opening|soon|next)",
+            r"ecosystem fund",
+            r"builder(?:s)? program",
+            r"retroactive (?:funding|rewards)",
+            r"announcing.{0,30}funding",
+            r"accepting applications",
+        ],
+        # Collaboration signals
+        "collaboration": [
+            r"looking for (?:collaborators?|partners?|co-founder)",
+            r"seeking (?:collaborat|partner)",
+            r"open to (?:collaborat|partner|work)",
+            r"anyone (?:want|interested).{0,30}(?:build|work|collaborat)",
+            r"let(?:'s| us) (?:build|work|create) together",
+            r"who wants to",
+            r"exploring.{0,30}partnership",
+        ],
+        # Project/research signals
+        "research": [
+            r"we(?:'re| are) (?:exploring|researching|investigating)",
+            r"new (?:research|project|initiative)",
+            r"call for (?:papers?|proposals?|abstracts?)",
+            r"(?:research|academic) (?:collaboration|partnership)",
+            r"phd (?:position|opportunity|student)",
+            r"postdoc",
+            r"looking for (?:interns?|students?)",
+        ],
+        # Community/ambassador signals
+        "ambassador": [
+            r"ambassador program",
+            r"community (?:lead|manager|role)",
+            r"help (?:us )?(?:grow|build|spread)",
+            r"join (?:our|the) (?:community|team|movement)",
+            r"early (?:adopter|supporter)",
+        ],
+        # Investment/demo signals
+        "investment": [
+            r"demo day",
+            r"pitch (?:competition|event|day)",
+            r"investor (?:meeting|demo|call)",
+            r"raising (?:a |our )?(?:seed|round|series)",
+            r"open to (?:investment|investors)",
+        ],
+    }
+    # Strength indicators (modifiers)
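+    # Word boundaries stop short tokens from matching inside other words
+    # (e.g. "now" inside "know" or "knowledge").
+    STRENGTH_BOOSTERS = [
+        r"\bimmediately\b",
+        r"\burgently\b",
+        r"\bactively\b",
+        r"\bnow\b",
+        r"\btoday\b",
+        r"\bthis week\b",
+        r"\basap\b",
+        r"\bserious",
+        r"\bexciting\b",
+    ]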
+    # Negative patterns (any match rejects the text as a silent opportunity)
+    NOISE_PATTERNS = [
+        r"not (?:looking|hiring|seeking)",
+        r"no longer",
+        r"was (?:looking|hiring)",
+        r"used to",
+        r"back in",
+        r"years? ago",
+        r"hypothetically",
+        r"if only",
+    ]
+    def detect(self, text: str, title: str = "") -> dict:
+        """
+        Analyze text for silent opportunity signals.
+        Returns:
+        - is_silent_opportunity: bool
+        - opportunity_type: str (pre_hiring, pre_grant, etc.)
+        - signal_strength: float (0.0 to 1.0)
+        - detected_patterns: list
+        - recommended_category: str
+        """
+        full_text = f"{title} {text}".lower()
+        # Check for noise patterns first
+        if self._has_noise(full_text):
+            return {
+                "is_silent_opportunity": False,
+                "opportunity_type": None,
+                "signal_strength": 0.0,
+                "detected_patterns": [],
+                "recommended_category": None,
+            }
+        # Detect patterns
+        detected = {}
+        for opp_type, patterns in self.SIGNAL_PATTERNS.items():
+            matches = self._find_matches(full_text, patterns)
+            if matches:
+                detected[opp_type] = matches
+        if not detected:
+            return {
+                "is_silent_opportunity": False,
+                "opportunity_type": None,
+                "signal_strength": 0.0,
+                "detected_patterns": [],
+                "recommended_category": None,
+            }
+        # Find primary opportunity type
+        primary_type = max(detected, key=lambda k: len(detected[k]))
+        # Calculate signal strength
+        signal_strength = self._calculate_strength(
+            full_text, detected, primary_type
+        )
+        # Map to category
+        category_map = {
+            "pre_hiring": "pre_hiring_signal",
+            "pre_grant": "pre_grant_signal",
+            "collaboration": "collaboration",
+            "research": "research",
+            "ambassador": "ambassador",
+            "investment": "pitch_event",
+        }
+        return {
+            "is_silent_opportunity": True,
+            "opportunity_type": primary_type,
+            "signal_strength": round(signal_strength, 3),
+            "detected_patterns": detected[primary_type],
+            "recommended_category": category_map.get(primary_type, "weak_signal"),
+        }
+    def _find_matches(self, text: str, patterns: list) -> list:
+        """Find all matching patterns in text."""
+        matches = []
+        for pattern in patterns:
+            match = re.search(pattern, text, re.IGNORECASE)
+            if match:
+                # Keep up to 20 characters of context on each side of the match
+                start = max(0, match.start() - 20)
+                end = min(len(text), match.end() + 20)
+                matches.append(text[start:end].strip())
+        return matches
+    def _has_noise(self, text: str) -> bool:
+        """Check if text contains noise patterns."""
+        for pattern in self.NOISE_PATTERNS:
+            if re.search(pattern, text, re.IGNORECASE):
+                return True
+        return False
+    def _calculate_strength(
+        self,
+        text: str,
+        detected: dict,
+        primary_type: str
+    ) -> float:
+        """Calculate signal strength."""
+        base_strength = 0.5
+        # More patterns = stronger signal
+        pattern_count = len(detected[primary_type])
+        base_strength += min(pattern_count * 0.1, 0.3)
+        # Check for strength boosters
+        for booster in self.STRENGTH_BOOSTERS:
+            if re.search(booster, text, re.IGNORECASE):
+                base_strength += 0.05
+        # Multiple types of signals = stronger
+        if len(detected) > 1:
+            base_strength += 0.1
+        # Cap at 1.0
+        return min(base_strength, 1.0)
+    def reclassify_opportunity(
+        self,
+        opportunity: dict
+    ) -> tuple[Optional[str], float]:
+        """
+        Re-evaluate an existing opportunity for silent signals.
+        Returns (new_category, confidence), or (None, 0.0) if no signal is found.
+        """
+        title = opportunity.get("title", "")
+        text = opportunity.get("raw_text", "")
+        result = self.detect(text, title)
+        if result["is_silent_opportunity"]:
+            return (
+                result["recommended_category"],
+                result["signal_strength"]
+            )
+        return (None, 0.0)
+class OpportunityLanguageDetector:
+    """
+    Detects the urgency, timing, and action language in opportunities.
+    """
+    TIMING_PATTERNS = {
+        "early": [
+            r"early (?:bird|access|application)",
+            r"just (?:launched|announced|opened)",
+            r"applications? (?:now )?open",
+            r"first (?:round|batch|cohort)",
+            r"founding",
+            r"new program",
+        ],
+        "optimal": [
+            r"applications? (?:open|accepted)",
+            r"deadline (?:is )?(?:soon|approaching)",
+            r"apply (?:now|today)",
+            r"last call",
+            r"extended deadline",
+        ],
+        "late": [
+            r"deadline (?:is )?(?:in )?(?:\d+ )?(?:days?|hours?)",
+            r"closes? (?:soon|tomorrow|today)",
+            r"final (?:day|hour|chance)",
+            r"last (?:day|chance)",
+        ],
+    }
+    def detect_timing(self, text: str) -> str:
+        """Detect application timing."""
+        text = text.lower()
+        for timing, patterns in self.TIMING_PATTERNS.items():
+            for pattern in patterns:
+                if re.search(pattern, text, re.IGNORECASE):
+                    return timing
+        return "unknown"
+    def extract_action_items(self, text: str) -> list:
+        """Extract actionable items from text."""
+        actions = []
+        # Common action patterns
+        action_patterns = [
+            r"apply (?:at|via|through|here)",
+            r"visit (?:our|the) (?:website|page|link)",
+            r"(?:fill|submit).{0,20}(?:form|application)",
+            r"send.{0,20}(?:email|resume|cv|portfolio)",
+            r"register (?:at|on|here)",
+            r"sign up",
+            r"join.{0,20}(?:discord|telegram|slack)",
+            r"dm (?:me|us)",
+            r"follow.{0,10}on",
+        ]
+        for pattern in action_patterns:
+            match = re.search(pattern, text, re.IGNORECASE)
+            if match:
+                start = max(0, match.start() - 10)
+                end = min(len(text), match.end() + 30)
+                actions.append(text[start:end].strip())
+        return actions[:5]  # Limit to 5 actions
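The detection flow in `SilentOpportunityDetector` can be summarized as: veto on any noise pattern, pick the signal type with the most pattern hits, then add bonuses for boosters and for multiple signal types. The sketch below is a condensed illustration of that flow, not the class itself. It uses a small subset of the patterns above and assumes boosters are matched on word boundaries.

```python
import re

# Small subset of the patterns, for illustration only.
SIGNALS = {
    "pre_hiring": [
        r"we(?:'re| are) (?:actively )?(?:looking|searching) for",
        r"building (?:a |our |the )?team",
    ],
    "collaboration": [r"looking for (?:collaborators?|partners?|co-founder)"],
}
NOISE = [r"not (?:looking|hiring|seeking)", r"no longer"]
BOOSTERS = [r"\bactively\b", r"\bnow\b"]


def signal_strength(text: str) -> float:
    """Return 0.0 for vetoed or signal-free text, else a strength in (0.5, 1.0]."""
    text = text.lower()
    if any(re.search(p, text) for p in NOISE):
        return 0.0
    hits = {kind: [p for p in pats if re.search(p, text)]
            for kind, pats in SIGNALS.items()}
    hits = {kind: found for kind, found in hits.items() if found}
    if not hits:
        return 0.0
    primary = max(hits, key=lambda kind: len(hits[kind]))
    strength = 0.5 + min(len(hits[primary]) * 0.1, 0.3)
    strength += 0.05 * sum(1 for b in BOOSTERS if re.search(b, text))
    if len(hits) > 1:
        strength += 0.1  # several signal types reinforce each other
    return round(min(strength, 1.0), 3)


print(signal_strength("We're actively looking for collaborators and building our team now."))
print(signal_strength("We are no longer looking for collaborators."))
```

The second call shows the veto at work: one noise phrase anywhere in the text suppresses every signal, which is simple but will also discard posts that mention a past search alongside a current one.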

backend/main.py ADDED

	@@ -0,0 +1,481 @@

+"""
+PIOE - Personal Intelligence & Opportunity Engine
+FastAPI Backend Application
+"""
+from fastapi import FastAPI, Depends, HTTPException, Query, BackgroundTasks
+from fastapi.staticfiles import StaticFiles
+from fastapi.responses import HTMLResponse, JSONResponse
+from fastapi.middleware.cors import CORSMiddleware
+from sqlalchemy.orm import Session
+from datetime import datetime
+from typing import Optional
+from pathlib import Path
+from .database import get_db, init_db
+from .models import Opportunity, OpportunityCategory, OpportunityStatus, Domain
+from .delivery import DigestGenerator
+from .ingestion import IngestionScheduler
+# Initialize app
+app = FastAPI(
+    title="PIOE - Personal Intelligence & Opportunity Engine",
+    description="Signal intelligence system for opportunities in AI, Robotics, and more",
+    version="2.0.0"
+)
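+# CORS middleware
+# Credentials are disabled because the CORS spec does not allow them
+# together with a wildcard origin; list explicit origins to enable them.
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],
+    allow_credentials=False,
+    allow_methods=["*"],
+    allow_headers=["*"],
+)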
+# Global scheduler instance
+scheduler: Optional[IngestionScheduler] = None
+@app.on_event("startup")
+async def startup():
+    """Initialize database and scheduler on startup."""
+    init_db()
+    global scheduler
+    scheduler = IngestionScheduler()
+    # Don't auto-start scheduler - let user trigger manually first
+    print("PIOE Backend started. Send POST /api/ingest/start to begin scheduled ingestion.")
+@app.on_event("shutdown")
+async def shutdown():
+    """Cleanup on shutdown."""
+    global scheduler
+    if scheduler:
+        scheduler.stop()
+# ============== API Routes ==============
+@app.get("/", response_class=HTMLResponse)
+async def serve_dashboard():
+    """Serve the frontend dashboard."""
+    frontend_path = Path(__file__).parent.parent / "frontend" / "index.html"
+    if frontend_path.exists():
+        return HTMLResponse(content=frontend_path.read_text(encoding="utf-8"), status_code=200)
+    return HTMLResponse(content="<h1>PIOE Dashboard - Frontend not found</h1>", status_code=200)
+# ---------- Opportunities ----------
+@app.get("/api/opportunities")
+async def get_opportunities(
+    db: Session = Depends(get_db),
+    category: Optional[str] = None,
+    domain: Optional[str] = None,
+    status: Optional[str] = None,
+    min_score: float = 0.0,
+    limit: int = Query(default=50, ge=1, le=200),
+    offset: int = Query(default=0, ge=0)
+):
+    """Get filtered list of opportunities."""
+    query = db.query(Opportunity).filter(
+        Opportunity.combined_score >= min_score
+    )
+    if category:
+        try:
+            query = query.filter(Opportunity.category == OpportunityCategory(category))
+        except ValueError:
+            pass
+    if domain:
+        try:
+            query = query.filter(Opportunity.domain == Domain(domain))
+        except ValueError:
+            pass
+    if status:
+        try:
+            query = query.filter(Opportunity.status == OpportunityStatus(status))
+        except ValueError:
+            pass
+    total = query.count()
+    opportunities = query.order_by(
+        Opportunity.combined_score.desc()
+    ).offset(offset).limit(limit).all()
+    return {
+        "total": total,
+        "limit": limit,
+        "offset": offset,
+        "opportunities": [
+            {
+                "id": o.id,
+                "title": o.title,
+                "category": o.category.value if o.category else None,
+                "domain": o.domain.value if o.domain else None,
+                "source_name": o.source_name,
+                "url": o.url,
+                "deadline": o.deadline.isoformat() if o.deadline else None,
+                "relevance_score": o.relevance_score,
+                "novelty_score": o.novelty_score,
+                "credibility_score": o.credibility_score,
+                "combined_score": o.combined_score,
+                # PIOE 2.0 fields
+                "roi_score": getattr(o, 'roi_score', None),
+                "risk_level": o.risk_level.value if hasattr(o, 'risk_level') and o.risk_level else "medium",
+                "region": o.region.value if hasattr(o, 'region') and o.region else "global",
+                "status": o.status.value if o.status else None,
+                "discovered_at": o.discovered_at.isoformat() if o.discovered_at else None,
+                "raw_text": o.raw_text[:500] if o.raw_text else None
+            }
+            for o in opportunities
+        ]
+    }
+@app.get("/api/opportunities/{opportunity_id}")
+async def get_opportunity(opportunity_id: str, db: Session = Depends(get_db)):
+    """Get single opportunity by ID with full PIOE 2.0 details."""
+    opp = db.query(Opportunity).filter(Opportunity.id == opportunity_id).first()
+    if not opp:
+        raise HTTPException(status_code=404, detail="Opportunity not found")
+    return {
+        "id": opp.id,
+        "title": opp.title,
+        "category": opp.category.value if opp.category else None,
+        "domain": opp.domain.value if opp.domain else None,
+        "source_name": opp.source_name,
+        "source_type": opp.source_type.value if opp.source_type else None,
+        "url": opp.url,
+        "deadline": opp.deadline.isoformat() if opp.deadline else None,
+        "published_at": opp.published_at.isoformat() if opp.published_at else None,
+        "discovered_at": opp.discovered_at.isoformat() if opp.discovered_at else None,
+        "raw_text": opp.raw_text,
+        # Core scores
+        "relevance_score": opp.relevance_score,
+        "novelty_score": opp.novelty_score,
+        "credibility_score": opp.credibility_score,
+        "signal_strength": opp.signal_strength,
+        "combined_score": opp.combined_score,
+        # PIOE 2.0: Decision intelligence
+        "roi_score": getattr(opp, 'roi_score', None),
+        "unlock_potential": getattr(opp, 'unlock_potential', None),
+        "risk_level": opp.risk_level.value if hasattr(opp, 'risk_level') and opp.risk_level else "medium",
+        "competition_level": getattr(opp, 'competition_level', None),
+        # PIOE 2.0: Regional
+        "region": opp.region.value if hasattr(opp, 'region') and opp.region else "global",
+        "region_weight": getattr(opp, 'region_weight', 1.0),
+        # Status and metadata
+        "status": opp.status.value if opp.status else None,
+        "metadata": opp.extra_data
+    }
+@app.get("/api/opportunities/{opportunity_id}/guidance")
+async def get_action_guidance(opportunity_id: str, db: Session = Depends(get_db)):
+    """PIOE 2.0: Get AI-powered action guidance for an opportunity."""
+    from .intelligence import LLMClient
+    opp = db.query(Opportunity).filter(Opportunity.id == opportunity_id).first()
+    if not opp:
+        raise HTTPException(status_code=404, detail="Opportunity not found")
+    # Build opportunity dict for LLM
+    opp_dict = {
+        "title": opp.title,
+        "category": opp.category.value if opp.category else "other",
+        "domain": opp.domain.value if opp.domain else "mixed",
+        "deadline": opp.deadline.isoformat() if opp.deadline else None,
+        "raw_text": opp.raw_text or "",
+        "roi_score": getattr(opp, 'roi_score', 0.5),
+        "competition_level": getattr(opp, 'competition_level', 0.5),
+        "region": opp.region.value if hasattr(opp, 'region') and opp.region else "global",
+    }
+    # Get action guidance from LLM
+    llm = LLMClient.get_client()
+    guidance = llm.recommend_action(opp_dict)
+    return {
+        "opportunity_id": opportunity_id,
+        "guidance": guidance
+    }
+@app.patch("/api/opportunities/{opportunity_id}/status")
+async def update_opportunity_status(
+    opportunity_id: str,
+    status: str,
+    db: Session = Depends(get_db)
+):
+    """Update opportunity status (save, apply, dismiss, etc.)."""
+    opp = db.query(Opportunity).filter(Opportunity.id == opportunity_id).first()
+    if not opp:
+        raise HTTPException(status_code=404, detail="Opportunity not found")
+    try:
+        opp.status = OpportunityStatus(status)
+        db.commit()
+        return {"success": True, "new_status": status}
+    except ValueError:
+        raise HTTPException(status_code=400, detail=f"Invalid status: {status}")
+# ---------- Digest ----------
+@app.get("/api/digest/daily")
+async def get_daily_digest(db: Session = Depends(get_db), limit: int = 10):
+    """Get today's opportunity digest."""
+    generator = DigestGenerator(db)
+    digest = generator.generate_daily(limit)
+    return {"digest": digest}
+@app.get("/api/digest/weekly")
+async def get_weekly_digest(db: Session = Depends(get_db), limit: int = 25):
+    """Get weekly opportunity digest."""
+    generator = DigestGenerator(db)
+    digest = generator.generate_weekly(limit)
+    return {"digest": digest}
+@app.get("/api/digest/urgent")
+async def get_urgent_digest(db: Session = Depends(get_db), limit: int = 10):
+    """Get urgent opportunities with approaching deadlines."""
+    generator = DigestGenerator(db)
+    digest = generator.generate_urgent(limit)
+    return {"digest": digest}
+@app.get("/api/digest/{category}")
+async def get_category_digest(
+    category: str,
+    db: Session = Depends(get_db),
+    limit: int = 10
+):
+    """Get digest for specific category."""
+    try:
+        cat = OpportunityCategory(category)
+    except ValueError:
+        raise HTTPException(status_code=400, detail=f"Invalid category: {category}")
+    generator = DigestGenerator(db)
+    digest = generator.generate_by_category(cat, limit)
+    return {"digest": digest}
+# ---------- Ingestion Control ----------
+@app.post("/api/ingest/run")
+async def run_ingestion(background_tasks: BackgroundTasks):
+    """Trigger full ingestion manually."""
+    global scheduler
+    if not scheduler:
+        scheduler = IngestionScheduler()
+    background_tasks.add_task(scheduler.run_full_ingestion)
+    return {"message": "Ingestion started in background"}
+@app.post("/api/ingest/source/{source_name}")
+async def run_source_ingestion(source_name: str, background_tasks: BackgroundTasks):
+    """Trigger ingestion for specific source."""
+    global scheduler
+    if not scheduler:
+        scheduler = IngestionScheduler()
+    background_tasks.add_task(scheduler.ingest_single_source, source_name)
+    return {"message": f"Ingestion started for {source_name}"}
+@app.post("/api/ingest/start")
+async def start_scheduler():
+    """Start the automatic ingestion scheduler."""
+    global scheduler
+    if not scheduler:
+        scheduler = IngestionScheduler()
+    scheduler.start()
+    return {"message": "Scheduler started"}
+@app.post("/api/ingest/stop")
+async def stop_scheduler():
+    """Stop the automatic ingestion scheduler."""
+    global scheduler
+    if scheduler:
+        scheduler.stop()
+    return {"message": "Scheduler stopped"}
+# ---------- Stats ----------
+@app.get("/api/stats")
+async def get_stats(db: Session = Depends(get_db)):
+    """Get overview statistics."""
+    from sqlalchemy import func
+    total = db.query(Opportunity).count()
+    new_count = db.query(Opportunity).filter(
+        Opportunity.status == OpportunityStatus.NEW
+    ).count()
+    # Category breakdown
+    categories = db.query(
+        Opportunity.category, func.count(Opportunity.id)
+    ).group_by(Opportunity.category).all()
+    # Domain breakdown
+    domains = db.query(
+        Opportunity.domain, func.count(Opportunity.id)
+    ).group_by(Opportunity.domain).all()
+    return {
+        "total_opportunities": total,
+        "new_opportunities": new_count,
+        "by_category": {
+            cat.value if cat else "unknown": count
+            for cat, count in categories
+        },
+        "by_domain": {
+            dom.value if dom else "unknown": count
+            for dom, count in domains
+        }
+    }
+# ---------- AI Chat ----------
+from pydantic import BaseModel
+class ChatMessage(BaseModel):
+    message: str
+@app.post("/api/chat")
+async def chat_with_opportunities(
+    chat: ChatMessage,
+    db: Session = Depends(get_db)
+):
+    """
+    PIOE 2.0: AI-powered chat to search and explore opportunities.
+    Ask questions like:
+    - "Find me hackathons in Nigeria"
+    - "What grants are available for AI projects?"
+    - "Show me high ROI opportunities with low competition"
+    """
+    from .intelligence import LLMClient
+    user_message = chat.message.strip()
+    if not user_message:
+        return {"response": "Please ask a question about opportunities.", "opportunities": []}
+    # Fetch the 100 highest-scoring opportunities; only the top 50 are sent to the LLM as context
+    opportunities = db.query(Opportunity).filter(
+        Opportunity.combined_score >= 0.3
+    ).order_by(Opportunity.combined_score.desc()).limit(100).all()
+    # Build context for LLM
+    opp_summaries = []
+    for o in opportunities:
+        category = o.category.value if o.category else 'other'
+        domain = o.domain.value if o.domain else 'mixed'
+        region = o.region.value if getattr(o, 'region', None) else 'global'
+        risk = o.risk_level.value if getattr(o, 'risk_level', None) else 'medium'
+        # roi_score may be None; formatting None with ':.0%' raises TypeError
+        roi = getattr(o, 'roi_score', None)
+        roi = 0.5 if roi is None else roi
+        summary = f"[{o.id}] {o.title} | Category: {category} | Domain: {domain} | Region: {region} | ROI: {roi:.0%} | Risk: {risk}"
+        opp_summaries.append(summary)
+    opp_context = "\n".join(opp_summaries[:50]) if opp_summaries else "No opportunities found in database."
+    # Create prompt for LLM
+    prompt = f"""You are PIOE, a Personal Intelligence & Opportunity Engine assistant.
+The user is from Nigeria and interested in AI, Computer Vision, Robotics, and Web3 opportunities.
+AVAILABLE OPPORTUNITIES:
+{opp_context}
+USER QUESTION: {user_message}
+Instructions:
+1. Answer the user's question based on the opportunities above
+2. If they're searching for specific types, list the most relevant opportunity IDs
+3. Provide actionable advice
+4. Be concise but helpful
+5. If no matching opportunities exist, suggest what to search for
+Return a JSON response:
+{{
+    "response": "Your helpful answer here",
+    "matched_ids": ["id1", "id2"] or [] if none match,
+    "suggested_action": "What the user should do next"
+}}"""
+    try:
+        llm = LLMClient.get_client()
+        result = llm._generate(prompt) if hasattr(llm, '_generate') else '{"response": "AI not configured", "matched_ids": [], "suggested_action": "Configure Gemini API key"}'
+        import json
+        # Try to parse JSON response
+        start = result.find('{')
+        end = result.rfind('}') + 1
+        if start != -1 and end > start:
+            parsed = json.loads(result[start:end])
+            response_text = parsed.get("response", result)
+            matched_ids = parsed.get("matched_ids", [])
+            suggested_action = parsed.get("suggested_action", "")
+        else:
+            response_text = result
+            matched_ids = []
+            suggested_action = ""
+        # Get the matched opportunities
+        matched_opps = []
+        if matched_ids:
+            for opp in opportunities:
+                if opp.id in matched_ids:
+                    matched_opps.append({
+                        "id": opp.id,
+                        "title": opp.title,
+                        "category": opp.category.value if opp.category else None,
+                        "domain": opp.domain.value if opp.domain else None,
+                        "url": opp.url,
+                        "roi_score": getattr(opp, 'roi_score', None),
+                        "risk_level": opp.risk_level.value if hasattr(opp, 'risk_level') and opp.risk_level else "medium",
+                        "region": opp.region.value if hasattr(opp, 'region') and opp.region else "global",
+                    })
+        return {
+            "response": response_text,
+            "opportunities": matched_opps[:10],
+            "suggested_action": suggested_action,
+            "total_searched": len(opportunities)
+        }
+    except Exception as e:
+        # Fallback: Simple keyword search
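+        # Skip very short words ("me", "in", "a") that would match almost every record
+        keywords = [kw for kw in user_message.lower().split() if len(kw) > 2]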
+        matched = []
+        for o in opportunities:
+            text = f"{o.title} {o.raw_text or ''}".lower()
+            if any(kw in text for kw in keywords):
+                matched.append({
+                    "id": o.id,
+                    "title": o.title,
+                    "category": o.category.value if o.category else None,
+                    "url": o.url,
+                    "roi_score": getattr(o, 'roi_score', None),
+                })
+        return {
+            "response": f"Found {len(matched)} opportunities matching your search. (AI unavailable: {str(e)[:50]})",
+            "opportunities": matched[:10],
+            "suggested_action": "Click on any opportunity for details",
+            "total_searched": len(opportunities)
+        }
+# Mount static files (frontend assets)
+frontend_dir = Path(__file__).parent.parent / "frontend"
+if frontend_dir.exists():
+    app.mount("/static", StaticFiles(directory=str(frontend_dir)), name="static")
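
The keyword fallback in the `except` branch above can be exercised in isolation. The following is a minimal sketch, assuming opportunities are plain dicts with `id`, `title`, and `raw_text` keys rather than the ORM objects used in the commit; `keyword_fallback` is a hypothetical helper name that does not appear in `backend/main.py`.

```python
from typing import Any, Dict, List


def keyword_fallback(user_message: str,
                     opportunities: List[Dict[str, Any]],
                     limit: int = 10) -> Dict[str, Any]:
    """Substring-match each whitespace-separated keyword against title + raw_text."""
    keywords = user_message.lower().split()
    matched = []
    for opp in opportunities:
        # raw_text may be None, so fall back to an empty string as the original does
        text = f"{opp['title']} {opp.get('raw_text') or ''}".lower()
        if any(kw in text for kw in keywords):
            matched.append({"id": opp["id"], "title": opp["title"]})
    return {
        "response": f"Found {len(matched)} opportunities matching your search.",
        "opportunities": matched[:limit],  # cap the payload, as the endpoint does
        "total_searched": len(opportunities),
    }
```

Because the match is a plain substring test, a very short token such as "a" or "in" will match almost any text, so a query containing common words can return nearly every opportunity.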

backend/models.py ADDED

	@@ -0,0 +1,237 @@

+"""
+PIOE Database Models - Version 2.0
+Personal Advantage Engine
+"""
+from sqlalchemy import Column, String, Float, DateTime, Text, Boolean, Integer, JSON, ForeignKey, Enum as SQLEnum
+from sqlalchemy.orm import relationship
+from datetime import datetime
+import uuid
+import enum
+from .database import Base
+class OpportunityCategory(str, enum.Enum):
+    """Categories for opportunity classification - PIOE 2.0 Extended."""
+    # Standard opportunities
+    SCHOLARSHIP = "scholarship"
+    FELLOWSHIP = "fellowship"
+    INTERNSHIP = "internship"
+    JOB = "job"
+    RESEARCH = "research"
+    HACKATHON = "hackathon"
+    COMPETITION = "competition"
+    CONFERENCE = "conference"
+    OPEN_SOURCE = "open_source"
+    # Grant types (PIOE 2.0)
+    GRANT = "grant"
+    MICRO_GRANT = "micro_grant"
+    ECOSYSTEM_GRANT = "ecosystem_grant"
+    INNOVATION_FUND = "innovation_fund"
+    # Partnership & Collaboration (PIOE 2.0)
+    PARTNERSHIP = "partnership"
+    COLLABORATION = "collaboration"
+    # Events & Showcases (PIOE 2.0)
+    PITCH_EVENT = "pitch_event"
+    DEMO_DAY = "demo_day"
+    TALENT_CALL = "talent_call"
+    # Web3/Crypto specific (PIOE 2.0)
+    BOUNTY = "bounty"
+    AMBASSADOR = "ambassador"
+    # Silent/Implicit opportunities (PIOE 2.0)
+    PRE_GRANT_SIGNAL = "pre_grant_signal"
+    PRE_HIRING_SIGNAL = "pre_hiring_signal"
+    WEAK_SIGNAL = "weak_signal"
+    # Other
+    INVESTMENT = "investment"
+    OTHER = "other"
+class OpportunityStatus(str, enum.Enum):
+    """User interaction status."""
+    NEW = "new"
+    SAVED = "saved"
+    APPLIED = "applied"
+    TRACKING = "tracking"
+    DISMISSED = "dismissed"
+    EXPIRED = "expired"
+class SourceType(str, enum.Enum):
+    """Types of data sources."""
+    ARXIV = "arxiv"
+    GITHUB = "github"
+    RSS = "rss"
+    REDDIT = "reddit"
+    TWITTER = "twitter"
+    LINKEDIN = "linkedin"
+    SUPERTEAM = "superteam"
+    WEB_SCRAPE = "web_scrape"
+    DISCORD = "discord"
+    GOV_PORTAL = "gov_portal"
+    GRANT_PLATFORM = "grant_platform"
+class Domain(str, enum.Enum):
+    """Domain classification."""
+    AI = "ai"
+    COMPUTER_VISION = "computer_vision"
+    ROBOTICS = "robotics"
+    FINANCE = "finance"
+    CRYPTO = "crypto"
+    ACADEMIA = "academia"
+    WEB3 = "web3"
+    MIXED = "mixed"
+class Region(str, enum.Enum):
+    """Regional accessibility - PIOE 2.0."""
+    NIGERIA = "nigeria"
+    AFRICA = "africa"
+    GLOBAL = "global"
+    REMOTE_AFRICA = "remote_africa"  # Remote but Africa-accessible
+    REMOTE_GLOBAL = "remote_global"
+class RiskLevel(str, enum.Enum):
+    """Time investment risk level."""
+    LOW = "low"
+    MEDIUM = "medium"
+    HIGH = "high"
+class Source(Base):
+    """Data source configuration."""
+    __tablename__ = "sources"
+    id = Column(String, primary_key=True, default=lambda: str(uuid.uuid4()))
+    name = Column(String, nullable=False)
+    type = Column(SQLEnum(SourceType), nullable=False)
+    url = Column(String)
+    config = Column(JSON, default=dict)  # callable default: each row gets its own dict
+    credibility_score = Column(Float, default=0.7)
+    last_fetch = Column(DateTime)
+    is_active = Column(Boolean, default=True)
+    created_at = Column(DateTime, default=datetime.utcnow)
+    opportunities = relationship("Opportunity", back_populates="source")
+class Opportunity(Base):
+    """Normalized opportunity item - PIOE 2.0 Enhanced."""
+    __tablename__ = "opportunities"
+    id = Column(String, primary_key=True, default=lambda: str(uuid.uuid4()))
+    title = Column(String, nullable=False)
+    source_id = Column(String, ForeignKey("sources.id"))
+    source_name = Column(String)
+    source_type = Column(SQLEnum(SourceType))
+    domain = Column(SQLEnum(Domain), default=Domain.MIXED)
+    category = Column(SQLEnum(OpportunityCategory), default=OpportunityCategory.OTHER)
+    # Regional accessibility (PIOE 2.0)
+    region = Column(SQLEnum(Region), default=Region.GLOBAL)
+    region_weight = Column(Float, default=1.0)  # 1.0 = perfect match for user
+    # Timestamps
+    discovered_at = Column(DateTime, default=datetime.utcnow)
+    published_at = Column(DateTime)
+    deadline = Column(DateTime)
+    # Content
+    raw_text = Column(Text)
+    summary = Column(Text)
+    url = Column(String)
+    # Core Scores (0.0 to 1.0)
+    relevance_score = Column(Float, default=0.0)
+    novelty_score = Column(Float, default=1.0)
+    credibility_score = Column(Float, default=0.5)
+    signal_strength = Column(Float, default=0.5)
+    combined_score = Column(Float, default=0.0)
+    # PIOE 2.0: Decision Intelligence Scores
+    roi_score = Column(Float, default=0.5)  # Is this worth my time?
+    unlock_potential = Column(Float, default=0.0)  # Opens doors to what?
+    risk_level = Column(SQLEnum(RiskLevel), default=RiskLevel.MEDIUM)
+    competition_level = Column(Float, default=0.5)  # Estimated competition
+    # Social engagement (from social sources)
+    social_engagement = Column(Integer, default=0)
+    # User status
+    status = Column(SQLEnum(OpportunityStatus), default=OpportunityStatus.NEW)
+    # Grant-specific metadata (PIOE 2.0)
+    # Stored in extra_data:
+    # - grant_size_min, grant_size_max
+    # - required_output (MVP, paper, OSS)
+    # - timeline_months
+    # - ecosystem (ethereum, solana, government)
+    # - eligibility_regions
+    # - technical_depth
+    # Action guidance (PIOE 2.0)
+    # Stored in extra_data:
+    # - recommended_action
+    # - skill_to_highlight
+    # - timing (early/optimal/late)
+    # - success_probability
+    # - preparation_steps
+    # Opportunity chaining (PIOE 2.0)
+    # - chain_next: list of potential next opportunity IDs
+    # - chain_unlocks: what this unlocks
+    extra_data = Column(JSON, default=dict)  # callable default avoids one dict shared across rows
+    # Embedding for novelty detection
+    embedding = Column(JSON)
+    source = relationship("Source", back_populates="opportunities")
+    interactions = relationship("UserInteraction", back_populates="opportunity")
+class UserInteraction(Base):
+    """Track user actions for personalization."""
+    __tablename__ = "user_interactions"
+    id = Column(String, primary_key=True, default=lambda: str(uuid.uuid4()))
+    opportunity_id = Column(String, ForeignKey("opportunities.id"))
+    action = Column(String)  # view, apply, save, dismiss, track
+    timestamp = Column(DateTime, default=datetime.utcnow)
+    opportunity = relationship("Opportunity", back_populates="interactions")
+class Author(Base):
+    """Track authors for credibility and social graph."""
+    __tablename__ = "authors"
+    id = Column(String, primary_key=True, default=lambda: str(uuid.uuid4()))
+    name = Column(String, nullable=False)
+    platform = Column(String)  # reddit, twitter, github, etc.
+    platform_id = Column(String)  # username or ID on platform
+    credibility_score = Column(Float, default=0.5)
+    opportunity_creator_score = Column(Float, default=0.0)  # Do they create opportunities?
+    first_seen = Column(DateTime, default=datetime.utcnow)
+    extra_data = Column(JSON, default=dict)  # callable default avoids one dict shared across rows
+class OpportunityChain(Base):
+    """Track opportunity sequences/paths - PIOE 2.0."""
+    __tablename__ = "opportunity_chains"
+    id = Column(String, primary_key=True, default=lambda: str(uuid.uuid4()))
+    name = Column(String)  # e.g., "Hackathon to Startup Path"
+    description = Column(Text)
+    steps = Column(JSON)  # Ordered list of opportunity categories/types
+    success_rate = Column(Float, default=0.0)
+    example_urls = Column(JSON, default=list)  # callable default: each row gets its own list
+    created_at = Column(DateTime, default=datetime.utcnow)
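
The enums in this file mix in `str`, which is why the chat endpoint can read `.value` to get a plain string and why stored values such as "high" map back to members. A stdlib-only sketch of that behaviour follows, using a trimmed copy of `RiskLevel`; `risk_to_json` is a hypothetical helper that mirrors the endpoint's "medium" default and is not part of the commit.

```python
import enum
import json
from typing import Optional


class RiskLevel(str, enum.Enum):
    """Trimmed copy of the model enum, for illustration only."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def risk_to_json(level: Optional[RiskLevel]) -> str:
    """Return the plain string for a risk level, defaulting to 'medium' when unset."""
    return level.value if level else "medium"


# Lookup by value returns the member, so database strings round-trip cleanly.
parsed = RiskLevel("high")

# The str mixin lets members compare equal to their value and pass through json.dumps.
payload = json.dumps({"risk": RiskLevel.LOW})
```

Since members compare equal to their string values, a check such as `parsed == "high"` holds without calling `.value` first.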

config/sources.yaml ADDED

	@@ -0,0 +1,135 @@

+# PIOE Default Sources Configuration
+# arXiv Categories
+arxiv:
+  enabled: true
+  categories:
+    - cs.CV   # Computer Vision
+    - cs.RO   # Robotics
+    - cs.AI   # Artificial Intelligence
+    - cs.LG   # Machine Learning
+    - cs.CL   # Natural Language Processing
+  max_results: 50
+  schedule: "daily"
+# GitHub Topics/Search
+github:
+  enabled: true
+  topics:
+    - computer-vision
+    - robotics
+    - machine-learning
+    - deep-learning
+    - ros
+    - pytorch
+    - transformers
+    - llm
+  min_stars: 50
+  schedule: "daily"
+# RSS Feeds
+rss:
+  enabled: true
+  feeds:
+    # AI Research Labs
+    - name: "Google AI Blog"
+      url: "https://blog.google/technology/ai/rss/"
+      type: blog
+    - name: "OpenAI Blog"
+      url: "https://openai.com/blog/rss/"
+      type: blog
+    - name: "DeepMind Blog"
+      url: "https://www.deepmind.com/blog/rss.xml"
+      type: blog
+    # Tech News
+    - name: "Hacker News - AI"
+      url: "https://hnrss.org/newest?q=ai+machine+learning"
+      type: news
+    - name: "Hacker News - Robotics"
+      url: "https://hnrss.org/newest?q=robotics"
+      type: news
+    - name: "TechCrunch AI"
+      url: "https://techcrunch.com/category/artificial-intelligence/feed/"
+      type: news
+# Reddit Subreddits
+reddit:
+  enabled: true
+  subreddits:
+    - computervision
+    - robotics
+    - MachineLearning
+    - artificial
+    - learnmachinelearning
+    - deeplearning
+    - hackathons
+    - scholarships
+    - cscareerquestions
+  min_score: 10
+  schedule: "every_6_hours"
+# Superteam (Web3/Crypto Opportunities)
+superteam:
+  enabled: true
+  focus:
+    - bounties
+    - grants
+    - hackathons
+  schedule: "daily"
+# Major Tech Company Careers
+careers:
+  enabled: true
+  companies:
+    - name: Microsoft
+      keywords: ["computer vision", "robotics", "AI", "machine learning", "intern"]
+    - name: NVIDIA
+      keywords: ["deep learning", "computer vision", "robotics", "intern"]
+    - name: Google
+      keywords: ["machine learning", "research", "robotics", "intern"]
+    - name: Meta
+      keywords: ["AI", "research", "robotics", "computer vision", "intern"]
+    - name: OpenAI
+      keywords: ["research", "engineering"]
+    - name: DeepMind
+      keywords: ["research", "robotics"]
+    - name: "Boston Dynamics"
+      keywords: ["robotics", "perception", "control"]
+    - name: "Tesla AI"
+      keywords: ["autopilot", "optimus", "robotics", "computer vision"]
+  schedule: "daily"
+# Web Scraping Targets
+scraper:
+  enabled: true
+  targets:
+    # Hackathons
+    - name: "Devpost Hackathons"
+      url: "https://devpost.com/hackathons"
+      type: hackathon
+    - name: "MLH Events"
+      url: "https://mlh.io/seasons/2024/events"
+      type: hackathon
+    # Scholarships
+    - name: "FindAPhD"
+      url: "https://www.findaphd.com/phds/?Keywords=computer+vision+robotics"
+      type: scholarship
+  schedule: "daily"
+# Scheduling
+schedule:
+  full_ingestion_hours: 6
+  priority_ingestion_hours: 2
+# Scoring Thresholds
+scoring:
+  min_relevance: 0.4
+  min_novelty: 0.3
+  min_credibility: 0.5
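
How the `scoring` thresholds are consumed is defined in the intelligence modules, not in this file. As a rough sketch only, assuming each minimum is applied independently and an item must clear all three, the filter could look like the following; `passes_thresholds` and the `THRESHOLDS` constant are hypothetical names, and the score keys are borrowed from the `Opportunity` model columns.

```python
from typing import Dict

# Values copied from the `scoring` section of config/sources.yaml
THRESHOLDS: Dict[str, float] = {
    "min_relevance": 0.4,
    "min_novelty": 0.3,
    "min_credibility": 0.5,
}


def passes_thresholds(scores: Dict[str, float],
                      thresholds: Dict[str, float] = THRESHOLDS) -> bool:
    """Return True only if every score meets its configured minimum."""
    return (
        scores.get("relevance_score", 0.0) >= thresholds["min_relevance"]
        and scores.get("novelty_score", 0.0) >= thresholds["min_novelty"]
        and scores.get("credibility_score", 0.0) >= thresholds["min_credibility"]
    )
```

Under this assumption, a missing score is treated as 0.0 and therefore fails its threshold.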

frontend/app.js ADDED

	@@ -0,0 +1,660 @@

+/**
+ * PIOE - Personal Intelligence & Opportunity Engine
+ * Frontend JavaScript Application
+ */
+class PIOEApp {
+    constructor() {
+        this.currentCategory = null;
+        this.currentDomain = null;
+        this.minScore = 0;
+        this.opportunities = [];
+        this.init();
+    }
+    init() {
+        this.bindEvents();
+        this.loadStats();
+        this.loadOpportunities();
+    }
+    bindEvents() {
+        // Navigation items
+        document.querySelectorAll('.nav-item[data-view]').forEach(item => {
+            item.addEventListener('click', (e) => {
+                e.preventDefault();
+                this.setActiveNav(item);
+                this.handleViewChange(item.dataset.view);
+            });
+        });
+        // Category filters
+        document.querySelectorAll('.nav-item[data-category]').forEach(item => {
+            item.addEventListener('click', (e) => {
+                e.preventDefault();
+                this.setActiveNav(item);
+                this.currentCategory = item.dataset.category;
+                this.loadOpportunities();
+                this.showFeedView();
+            });
+        });
+        // Domain filter
+        document.getElementById('domain-filter').addEventListener('change', (e) => {
+            this.currentDomain = e.target.value || null;
+            this.loadOpportunities();
+        });
+        // Score filter
+        document.getElementById('score-filter').addEventListener('change', (e) => {
+            this.minScore = parseFloat(e.target.value) || 0;
+            this.loadOpportunities();
+        });
+        // Run ingestion
+        document.getElementById('run-ingestion').addEventListener('click', (e) => {
+            e.preventDefault();
+            this.runIngestion();
+        });
+        // View stats
+        document.getElementById('view-stats').addEventListener('click', (e) => {
+            e.preventDefault();
+            this.showStatsModal();
+        });
+        // Modal close
+        document.querySelector('.modal-close').addEventListener('click', () => {
+            this.closeModal();
+        });
+        document.querySelector('.modal-backdrop').addEventListener('click', () => {
+            this.closeModal();
+        });
+        // PIOE 2.0: AI Chat
+        document.getElementById('open-chat')?.addEventListener('click', (e) => {
+            e.preventDefault();
+            this.toggleChat();
+        });
+    }
+    // PIOE 2.0: Chat Methods
+    toggleChat() {
+        const panel = document.getElementById('chat-panel');
+        panel.classList.toggle('active');
+    }
+    async sendChatMessage() {
+        const input = document.getElementById('chat-input');
+        const messagesContainer = document.getElementById('chat-messages');
+        const message = input.value.trim();
+        if (!message) return;
+        // Add user message to chat
+        messagesContainer.innerHTML += `
+            <div class="chat-message user">
+                <p>${this.escapeHtml(message)}</p>
+            </div>
+        `;
+        input.value = '';
+        messagesContainer.scrollTop = messagesContainer.scrollHeight;
+        // Add loading indicator
+        const loadingId = `loading-${Date.now()}`;
+        messagesContainer.innerHTML += `
+            <div class="chat-message bot" id="${loadingId}">
+                <p>[...] Searching opportunities...</p>
+            </div>
+        `;
+        messagesContainer.scrollTop = messagesContainer.scrollHeight;
+        try {
+            const response = await fetch('/api/chat', {
+                method: 'POST',
+                headers: { 'Content-Type': 'application/json' },
+                body: JSON.stringify({ message })
+            });
+            const data = await response.json();
+            // Remove loading indicator
+            document.getElementById(loadingId)?.remove();
+            // Build response HTML
+            let responseHtml = `<p>${this.escapeHtml(data.response || 'No response')}</p>`;
+            // Add matched opportunities if any
+            if (data.opportunities && data.opportunities.length > 0) {
+                responseHtml += `<div style="margin-top: 12px">`;
+                for (const opp of data.opportunities) {
+                    const roiDisplay = opp.roi_score ? `${Math.round(opp.roi_score * 100)}% ROI` : '';
+                    responseHtml += `
+                        <a href="${opp.url}" target="_blank" class="opp-link">
+                            ${this.getCategoryEmoji(opp.category)} ${this.escapeHtml(opp.title.slice(0, 60))}${opp.title.length > 60 ? '...' : ''}
+                            <span style="opacity: 0.7; margin-left: 8px">${roiDisplay}</span>
+                        </a>
+                    `;
+                }
+                responseHtml += `</div>`;
+            }
+            // Add suggested action if any
+            if (data.suggested_action) {
+                responseHtml += `<p style="margin-top: 12px; font-style: italic; opacity: 0.8">[TIP] ${this.escapeHtml(data.suggested_action)}</p>`;
+            }
+            messagesContainer.innerHTML += `
+                <div class="chat-message bot">
+                    ${responseHtml}
+                </div>
+            `;
+        } catch (error) {
+            document.getElementById(loadingId)?.remove();
+            messagesContainer.innerHTML += `
+                <div class="chat-message bot">
+                    <p style="color: var(--danger)">Error: ${this.escapeHtml(error.message)}</p>
+                </div>
+            `;
+        }
+        messagesContainer.scrollTop = messagesContainer.scrollHeight;
+    }
+    setActiveNav(activeItem) {
+        document.querySelectorAll('.nav-item').forEach(item => {
+            item.classList.remove('active');
+        });
+        activeItem.classList.add('active');
+    }
+    handleViewChange(view) {
+        if (view === 'feed') {
+            this.currentCategory = null;
+            this.loadOpportunities();
+            this.showFeedView();
+            this.updateHeader('Opportunity Feed', 'High-signal opportunities detected by PIOE');
+        } else if (view === 'digest') {
+            this.loadDigest('daily');
+            this.showDigestView();
+            this.updateHeader('Daily Brief', 'Your personalized intelligence report');
+        } else if (view === 'urgent') {
+            this.loadDigest('urgent');
+            this.showDigestView();
+            this.updateHeader('Urgent Opportunities', 'Deadlines approaching soon');
+        }
+    }
+    updateHeader(title, subtitle) {
+        document.getElementById('page-title').textContent = title;
+        document.getElementById('page-subtitle').textContent = subtitle;
+    }
+    showFeedView() {
+        document.getElementById('opportunity-feed').style.display = 'flex';
+        document.getElementById('digest-view').style.display = 'none';
+    }
+    showDigestView() {
+        document.getElementById('opportunity-feed').style.display = 'none';
+        document.getElementById('digest-view').style.display = 'block';
+    }
+    async loadStats() {
+        try {
+            const response = await fetch('/api/stats');
+            const stats = await response.json();
+            document.getElementById('total-count').textContent = stats.total_opportunities || 0;
+            document.getElementById('new-count').textContent = stats.new_opportunities || 0;
+            document.getElementById('hackathon-count').textContent = stats.by_category?.hackathon || 0;
+            document.getElementById('internship-count').textContent = stats.by_category?.internship || 0;
+        } catch (error) {
+            console.error('Failed to load stats:', error);
+        }
+    }
+    async loadOpportunities() {
+        const feed = document.getElementById('opportunity-feed');
+        feed.innerHTML = '<div class="loading">Loading opportunities...</div>';
+        try {
+            const params = new URLSearchParams();
+            if (this.currentCategory) params.set('category', this.currentCategory);
+            if (this.currentDomain) params.set('domain', this.currentDomain);
+            if (this.minScore) params.set('min_score', this.minScore);
+            params.set('limit', '50');
+            const response = await fetch(`/api/opportunities?${params}`);
+            const data = await response.json();
+            this.opportunities = data.opportunities || [];
+            this.renderOpportunities();
+        } catch (error) {
+            feed.innerHTML = `<div class="loading">Error loading opportunities: ${this.escapeHtml(error.message)}</div>`;
+        }
+    }
+    renderOpportunities() {
+        const feed = document.getElementById('opportunity-feed');
+        if (this.opportunities.length === 0) {
+            feed.innerHTML = `
+                <div class="loading">
+                    No opportunities found. Try running ingestion first!
+                </div>
+            `;
+            return;
+        }
+        feed.innerHTML = this.opportunities.map(opp => this.renderOpportunityCard(opp)).join('');
+        // Bind card click events
+        feed.querySelectorAll('.opportunity-card').forEach((card, index) => {
+            card.addEventListener('click', () => {
+                this.showOpportunityDetail(this.opportunities[index]);
+            });
+            // Action buttons
+            card.querySelector('.action-btn.primary')?.addEventListener('click', (e) => {
+                e.stopPropagation();
+                window.open(this.opportunities[index].url, '_blank');
+            });
+            card.querySelector('.action-btn.secondary')?.addEventListener('click', (e) => {
+                e.stopPropagation();
+                this.updateStatus(this.opportunities[index].id, 'saved');
+            });
+        });
+    }
+    renderOpportunityCard(opp) {
+        const category = opp.category || 'other';
+        const categoryEmoji = this.getCategoryEmoji(category);
+        const scorePercent = Math.round((opp.combined_score || 0) * 100);
+        const roiPercent = Math.round((opp.roi_score ?? 0.5) * 100);  // ?? keeps a genuine 0 score instead of showing 50%
+        const riskLevel = opp.risk_level || 'medium';
+        const region = opp.region || 'global';
+        let deadlineBadge = '';
+        if (opp.deadline) {
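+            const daysLeft = Math.ceil((new Date(opp.deadline) - new Date()) / (1000 * 60 * 60 * 24));
+            let urgency = 'ok';
+            if (daysLeft < 7) urgency = 'urgent';
+            else if (daysLeft < 14) urgency = 'soon';
+            // A past deadline gives a negative count; label it rather than printing "-3 days left"
+            const deadlineLabel = daysLeft < 0 ? 'Deadline passed' : `${daysLeft} days left`;
+            deadlineBadge = `
+                <span class="deadline-badge ${urgency}">
+                    [!] ${deadlineLabel}
+                </span>
+            `;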
+        }
+        // Risk level badge
+        const riskColors = { low: '#10b981', medium: '#f59e0b', high: '#ef4444' };
+        const riskLabels = { low: '[OK]', medium: '[!]', high: '[!!]' };
+        // Region badge
+        const regionLabels = { nigeria: 'NG', africa: 'AFR', global: 'GLB', remote_africa: 'AFR-R', remote_global: 'GLB-R' };
+        return `
+            <div class="opportunity-card">
+                <div class="card-header">
+                    <span class="card-category ${category}">
+                        ${categoryEmoji} ${category.replace(/_/g, ' ')}
+                    </span>
+                    <div class="card-score">
+                        <div class="score-bar">
+                            <div class="score-fill" style="width: ${scorePercent}%"></div>
+                        </div>
+                        <span>${scorePercent}%</span>
+                    </div>
+                </div>
+                <h3 class="card-title">${this.escapeHtml(opp.title)}</h3>
+                <div class="card-meta">
+                    <span>[SRC] ${this.escapeHtml(opp.source_name || 'Unknown')}</span>
+                    <span>[${regionLabels[region] || 'GLB'}] ${region.replace(/_/g, ' ')}</span>
+                    <span style="color: ${riskColors[riskLevel]}">${riskLabels[riskLevel]} ${riskLevel} risk</span>
+                </div>
+                <div class="card-meta" style="margin-top: 8px">
+                    <span title="ROI Score">[ROI] ${roiPercent}%</span>
+                    <span>[DATE] ${this.formatDate(opp.discovered_at)}</span>
+                </div>
+                <p class="card-summary">${this.escapeHtml(opp.raw_text?.slice(0, 200) || '')}</p>
+                <div class="card-footer">
+                    ${deadlineBadge}
+                    <div class="card-actions">
+                        <button class="action-btn secondary">Save</button>
+                        <button class="action-btn primary">Open</button>
+                    </div>
+                </div>
+            </div>
+        `;
+    }
+    getCategoryEmoji(category) {
+        const labels = {
+            scholarship: '[S]',
+            fellowship: '[F]',
+            internship: '[I]',
+            job: '[J]',
+            hackathon: '[H]',
+            competition: '[C]',
+            grant: '[G]',
+            micro_grant: '[MG]',
+            ecosystem_grant: '[EG]',
+            innovation_fund: '[IF]',
+            research: '[R]',
+            open_source: '[OS]',
+            conference: '[CF]',
+            investment: '[IV]',
+            partnership: '[P]',
+            collaboration: '[CO]',
+            pitch_event: '[PE]',
+            demo_day: '[DD]',
+            talent_call: '[TC]',
+            bounty: '[B]',
+            ambassador: '[A]',
+            pre_grant_signal: '[PG]',
+            pre_hiring_signal: '[PH]',
+            weak_signal: '[WS]',
+            other: '[?]'
+        };
+        return labels[category] || '[?]';
+    }
+    async loadDigest(type) {
+        const content = document.getElementById('digest-content');
+        content.innerHTML = '<div class="loading">Generating digest...</div>';
+        try {
+            const response = await fetch(`/api/digest/${type}`);
+            const data = await response.json();
+            // Convert markdown to HTML (simple conversion)
+            content.innerHTML = this.markdownToHtml(data.digest || 'No digest available.');
+        } catch (error) {
+            content.innerHTML = `<p>Error loading digest: ${this.escapeHtml(error.message)}</p>`;
+        }
+    }
+    markdownToHtml(md) {
+        return md
+            .replace(/^### (.*$)/gim, '<h3>$1</h3>')
+            .replace(/^## (.*$)/gim, '<h2>$1</h2>')
+            .replace(/^# (.*$)/gim, '<h1>$1</h1>')
+            .replace(/\*\*(.*?)\*\*/g, '<strong>$1</strong>')
+            .replace(/\*(.*?)\*/g, '<em>$1</em>')
+            .replace(/^> (.*$)/gim, '<blockquote>$1</blockquote>')
+            .replace(/\[(.*?)\]\((.*?)\)/g, '<a href="$2" target="_blank">$1</a>')
+            .replace(/^---$/gim, '<hr>')
+            .replace(/\n/g, '<br>');
+    }
+    showOpportunityDetail(opp) {
+        const modal = document.getElementById('detail-modal');
+        const body = document.getElementById('modal-body');
+        const roiPercent = Math.round((opp.roi_score ?? 0.5) * 100);  // ?? keeps a genuine 0 score instead of showing 50%
+        const riskLevel = opp.risk_level || 'medium';
+        const region = opp.region || 'global';
+        const riskColors = { low: '#10b981', medium: '#f59e0b', high: '#ef4444' };
+        body.innerHTML = `
+            <span class="card-category ${opp.category || 'other'}" style="margin-bottom: 16px">
+                ${this.getCategoryEmoji(opp.category)} ${(opp.category || 'other').replace(/_/g, ' ')}
+            </span>
+            <h2 style="margin: 16px 0">${this.escapeHtml(opp.title)}</h2>
+            <div class="card-meta" style="margin-bottom: 20px">
+                <span>📡 ${this.escapeHtml(opp.source_name || 'Unknown')}</span>
+                <span>🌐 ${region.replace(/_/g, ' ')}</span>
+                <span style="color: ${riskColors[riskLevel]}">${riskLevel} risk</span>
+            </div>
+            <div style="display: grid; grid-template-columns: repeat(4, 1fr); gap: 12px; margin-bottom: 24px">
+                <div class="stat-card">
+                    <span class="stat-value">${Math.round((opp.relevance_score || 0) * 100)}%</span>
+                    <span class="stat-label">Relevance</span>
+                </div>
+                <div class="stat-card">
+                    <span class="stat-value">${Math.round((opp.novelty_score || 0) * 100)}%</span>
+                    <span class="stat-label">Novelty</span>
+                </div>
+                <div class="stat-card">
+                    <span class="stat-value">${Math.round((opp.credibility_score || 0) * 100)}%</span>
+                    <span class="stat-label">Credibility</span>
+                </div>
+                <div class="stat-card highlight">
+                    <span class="stat-value">${roiPercent}%</span>
+                    <span class="stat-label">💎 ROI</span>
+                </div>
+            </div>
+            ${opp.deadline ? `<p style="color: var(--warning); margin-bottom: 16px">⏰ Deadline: ${new Date(opp.deadline).toLocaleDateString()}</p>` : ''}
+            <p style="color: var(--text-secondary); line-height: 1.8; margin-bottom: 24px">
+                ${this.escapeHtml(opp.raw_text || 'No description available.')}
+            </p>
+            <!-- Action Guidance Container -->
+            <div id="guidance-container" style="margin-bottom: 24px; padding: 16px; background: rgba(99, 102, 241, 0.1); border-radius: 12px; display: none;">
+                <h3 style="margin-bottom: 12px; color: var(--accent)">🎯 Action Guidance</h3>
+                <div id="guidance-content"></div>
+            </div>
+            <div style="display: flex; flex-wrap: wrap; gap: 12px">
+                <button class="action-btn primary" onclick="app.getGuidance('${opp.id}')" style="padding: 12px 24px; background: linear-gradient(135deg, #8b5cf6, #6366f1)">
+                    🧠 Get Guidance
+                </button>
+                <a href="${opp.url}" target="_blank" class="action-btn primary" style="text-decoration: none; padding: 12px 24px">
+                    🔗 View Original
+                </a>
+                <button class="action-btn secondary" onclick="app.updateStatus('${opp.id}', 'saved')" style="padding: 12px 24px">
+                    💾 Save
+                </button>
+                <button class="action-btn secondary" onclick="app.updateStatus('${opp.id}', 'applied')" style="padding: 12px 24px">
+                    ✅ Mark Applied
+                </button>
+            </div>
+        `;
+        modal.classList.add('active');
+    }
+    async getGuidance(opportunityId) {
+        const container = document.getElementById('guidance-container');
+        const content = document.getElementById('guidance-content');
+        container.style.display = 'block';
+        content.innerHTML = '<p>🔄 Analyzing opportunity...</p>';
+        try {
+            const response = await fetch(`/api/opportunities/${opportunityId}/guidance`);
+            const data = await response.json();
+            const g = data.guidance;
+            content.innerHTML = `
+                <div style="display: grid; gap: 16px">
+                    <div style="display: flex; gap: 16px; flex-wrap: wrap">
+                        <div class="stat-card" style="flex: 1; min-width: 120px">
+                            <span class="stat-value" style="font-size: 14px">${this.escapeHtml(g.primary_action?.replace(/_/g, ' ') || 'Review')}</span>
+                            <span class="stat-label">Action</span>
+                        </div>
+                        <div class="stat-card" style="flex: 1; min-width: 120px">
+                            <span class="stat-value" style="font-size: 14px">${g.urgency || 'whenever'}</span>
+                            <span class="stat-label">Urgency</span>
+                        </div>
+                        <div class="stat-card" style="flex: 1; min-width: 120px">
+                            <span class="stat-value" style="font-size: 14px">${Math.round((g.success_probability || 0.3) * 100)}%</span>
+                            <span class="stat-label">Success Odds</span>
+                        </div>
+                        <div class="stat-card" style="flex: 1; min-width: 120px">
+                            <span class="stat-value" style="font-size: 14px">${g.time_investment_hours || 10}h</span>
+                            <span class="stat-label">Time Needed</span>
+                        </div>
+                    </div>
+                    ${g.skills_to_highlight?.length ? `
+                        <div>
+                            <strong>Skills to Highlight:</strong>
+                            <div style="display: flex; gap: 8px; flex-wrap: wrap; margin-top: 8px">
+                                ${g.skills_to_highlight.map(s => `<span style="background: var(--accent-primary); padding: 4px 12px; border-radius: 20px; font-size: 12px">${this.escapeHtml(s)}</span>`).join('')}
+                            </div>
+                        </div>
+                    ` : ''}
+                    ${g.portfolio_pieces?.length ? `
+                        <div>
+                            <strong>Portfolio to Show:</strong>
+                            <div style="display: flex; gap: 8px; flex-wrap: wrap; margin-top: 8px">
+                                ${g.portfolio_pieces.map(p => `<span style="background: var(--success); padding: 4px 12px; border-radius: 20px; font-size: 12px">${this.escapeHtml(p)}</span>`).join('')}
+                            </div>
+                        </div>
+                    ` : ''}
+                    ${g.preparation_steps?.length ? `
+                        <div>
+                            <strong>Preparation Steps:</strong>
+                            <ol style="margin-top: 8px; padding-left: 20px">
+                                ${g.preparation_steps.map(s => `<li style="margin-bottom: 4px">${this.escapeHtml(s)}</li>`).join('')}
+                            </ol>
+                        </div>
+                    ` : ''}
+                    ${g.networking_tips ? `
+                        <div>
+                            <strong>💡 Networking Tip:</strong>
+                            <p style="margin-top: 4px; color: var(--text-secondary)">${this.escapeHtml(g.networking_tips)}</p>
+                        </div>
+                    ` : ''}
+                    ${g.differentiation_angle ? `
+                        <div>
+                            <strong>🎯 Your Angle:</strong>
+                            <p style="margin-top: 4px; color: var(--text-secondary)">${this.escapeHtml(g.differentiation_angle)}</p>
+                        </div>
+                    ` : ''}
+                    ${g.red_flags?.length ? `
+                        <div style="background: rgba(239, 68, 68, 0.1); padding: 12px; border-radius: 8px">
+                            <strong style="color: #ef4444">⚠️ Red Flags:</strong>
+                            <ul style="margin-top: 8px; padding-left: 20px">
+                                ${g.red_flags.map(f => `<li style="color: #ef4444">${this.escapeHtml(f)}</li>`).join('')}
+                            </ul>
+                        </div>
+                    ` : ''}
+                    <p style="font-style: italic; color: var(--text-secondary); font-size: 12px">
+                        ${this.escapeHtml(g.why) || 'Personalized guidance based on your profile'}
+                    </p>
+                </div>
+            `;
+        } catch (error) {
+            content.innerHTML = `<p style="color: var(--danger)">Failed to get guidance: ${this.escapeHtml(error.message)}</p>`;
+        }
+    }
+    closeModal() {
+        document.getElementById('detail-modal').classList.remove('active');
+    }
+    async updateStatus(id, status) {
+        try {
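+            const response = await fetch(`/api/opportunities/${id}/status`, {
+                method: 'PATCH',
+                headers: { 'Content-Type': 'application/json' },
+                body: JSON.stringify({ status })
+            });
+            // Only report success when the server accepted the update
+            if (!response.ok) throw new Error(`HTTP ${response.status}`);
+            // Visual feedback
+            this.showNotification(`Status updated to ${status}`);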
+        } catch (error) {
+            console.error('Failed to update status:', error);
+        }
+    }
+    async runIngestion() {
+        this.showNotification('Starting ingestion... This may take a few minutes.');
+        try {
+            const response = await fetch('/api/ingest/run', { method: 'POST' });
+            if (!response.ok) throw new Error(`HTTP ${response.status}`);
+            this.showNotification('Ingestion started! Refresh in a few minutes to see new opportunities.');
+        } catch (error) {
+            this.showNotification('Failed to start ingestion: ' + error.message);
+        }
+    }
+    async showStatsModal() {
+        try {
+            const response = await fetch('/api/stats');
+            if (!response.ok) throw new Error(`HTTP ${response.status}`);
+            const stats = await response.json();
+            const body = document.getElementById('modal-body');
+            body.innerHTML = `
+                <h2 style="margin-bottom: 24px">📊 System Statistics</h2>
+                <div style="display: grid; grid-template-columns: 1fr 1fr; gap: 16px; margin-bottom: 24px">
+                    <div class="stat-card highlight">
+                        <span class="stat-value">${stats.total_opportunities || 0}</span>
+                        <span class="stat-label">Total Opportunities</span>
+                    </div>
+                    <div class="stat-card">
+                        <span class="stat-value">${stats.new_opportunities || 0}</span>
+                        <span class="stat-label">New (Unread)</span>
+                    </div>
+                </div>
+                <h3 style="margin: 24px 0 16px">By Category</h3>
+                ${Object.entries(stats.by_category || {}).map(([cat, count]) => `
+                    <div style="display: flex; justify-content: space-between; padding: 8px 0; border-bottom: 1px solid var(--border-color)">
+                        <span>${this.getCategoryEmoji(cat)} ${cat.replace(/_/g, ' ')}</span>
+                        <span style="font-weight: 600">${count}</span>
+                    </div>
+                `).join('')}
+                <h3 style="margin: 24px 0 16px">By Domain</h3>
+                ${Object.entries(stats.by_domain || {}).map(([dom, count]) => `
+                    <div style="display: flex; justify-content: space-between; padding: 8px 0; border-bottom: 1px solid var(--border-color)">
+                        <span>${dom.replace(/_/g, ' ')}</span>
+                        <span style="font-weight: 600">${count}</span>
+                    </div>
+                `).join('')}
+            `;
+            document.getElementById('detail-modal').classList.add('active');
+        } catch (error) {
+            console.error('Failed to load stats:', error);
+        }
+    }
+    showNotification(message) {
+        // Simple notification - could be enhanced with toast UI
+        console.log('PIOE:', message);
+        alert(message);
+    }
+    formatDate(dateStr) {
+        if (!dateStr) return 'Unknown';
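+        const date = new Date(dateStr);
+        // Unparseable input would otherwise render as "Invalid Date"
+        if (Number.isNaN(date.getTime())) return 'Unknown';
+        return date.toLocaleDateString('en-US', { month: 'short', day: 'numeric' });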
+    }
+    escapeHtml(text) {
+        if (!text) return '';
+        const div = document.createElement('div');
+        div.textContent = text;
+        return div.innerHTML;
+    }
+}
+// Initialize app
+const app = new PIOEApp();
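
Review note on frontend/app.js: several methods above call `fetch` and then use the result without looking at the HTTP status, and category keys are turned into labels with a single-match `replace`. The sketch below shows one way both patterns could be centralised. The names `fetchJson` and `humanizeLabel` are hypothetical and do not exist in the commit; the third parameter `fetchImpl` is an assumption added only so the helper can be exercised without a network.

```javascript
// Hypothetical helper, not part of the commit: wraps fetch so that HTTP
// error statuses (4xx/5xx) reject instead of being parsed as a success.
async function fetchJson(url, options = {}, fetchImpl = globalThis.fetch) {
    const response = await fetchImpl(url, options);
    if (!response.ok) {
        throw new Error(`Request to ${url} failed with HTTP ${response.status}`);
    }
    return response.json();
}

// Hypothetical helper: turn a snake_case key such as "pre_grant_signal"
// into a display label by replacing every underscore, not just the first.
function humanizeLabel(key) {
    return String(key ?? '').replace(/_/g, ' ');
}
```

With helpers like these, a method such as `showStatsModal` could presumably reduce its request to `const stats = await fetchJson('/api/stats');` and let the existing `catch` block handle server errors as well as network failures.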

frontend/index.html ADDED

	@@ -0,0 +1,162 @@

+<!DOCTYPE html>
+<html lang="en">
+<head>
+    <meta charset="UTF-8">
+    <meta name="viewport" content="width=device-width, initial-scale=1.0">
+    <title>PIOE 2.0 - Personal Advantage Engine</title>
+    <link rel="stylesheet" href="/static/styles.css">
+    <link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&display=swap" rel="stylesheet">
+</head>
+<body>
+    <div class="app">
+        <!-- Sidebar -->
+        <nav class="sidebar">
+            <div class="logo">
+                <span class="logo-icon">[P]</span>
+                <span class="logo-text">PIOE 2.0</span>
+            </div>
+            <div class="nav-section">
+                <span class="nav-label">Dashboard</span>
+                <a href="#" class="nav-item active" data-view="feed">
+                    <span class="icon">[F]</span> Opportunity Feed
+                </a>
+                <a href="#" class="nav-item" data-view="digest">
+                    <span class="icon">[D]</span> Daily Brief
+                </a>
+                <a href="#" class="nav-item" data-view="urgent">
+                    <span class="icon">[!]</span> Urgent
+                </a>
+                <a href="#" class="nav-item" id="open-chat">
+                    <span class="icon">[AI]</span> AI Search
+                </a>
+            </div>
+            <div class="nav-section">
+                <span class="nav-label">Categories</span>
+                <a href="#" class="nav-item" data-category="hackathon">[H] Hackathons</a>
+                <a href="#" class="nav-item" data-category="internship">[I] Internships</a>
+                <a href="#" class="nav-item" data-category="scholarship">[S] Scholarships</a>
+                <a href="#" class="nav-item" data-category="research">[R] Research</a>
+                <a href="#" class="nav-item" data-category="job">[J] Jobs</a>
+                <a href="#" class="nav-item" data-category="grant">[G] Grants</a>
+                <a href="#" class="nav-item" data-category="ecosystem_grant">[E] Ecosystem Grants</a>
+                <a href="#" class="nav-item" data-category="bounty">[B] Bounties</a>
+                <a href="#" class="nav-item" data-category="open_source">[O] Open Source</a>
+            </div>
+            <div class="nav-section">
+                <span class="nav-label">System</span>
+                <a href="#" class="nav-item" id="run-ingestion">
+                    <span class="icon">[>]</span> Run Ingestion
+                </a>
+                <a href="#" class="nav-item" id="view-stats">
+                    <span class="icon">[#]</span> Statistics
+                </a>
+            </div>
+        </nav>
+        <!-- Main Content -->
+        <main class="main-content">
+            <header class="header">
+                <div class="header-title">
+                    <h1 id="page-title">Opportunity Feed</h1>
+                    <p class="subtitle" id="page-subtitle">High-signal opportunities detected by PIOE</p>
+                </div>
+                <div class="header-actions">
+                    <select id="domain-filter" class="filter-select">
+                        <option value="">All Domains</option>
+                        <option value="ai">AI</option>
+                        <option value="computer_vision">Computer Vision</option>
+                        <option value="robotics">Robotics</option>
+                        <option value="finance">Finance</option>
+                        <option value="crypto">Crypto</option>
+                        <option value="academia">Academia</option>
+                    </select>
+                    <select id="score-filter" class="filter-select">
+                        <option value="0">All Scores</option>
+                        <option value="0.5">Score > 0.5</option>
+                        <option value="0.7">Score > 0.7</option>
+                        <option value="0.8">Score > 0.8</option>
+                    </select>
+                </div>
+            </header>
+            <div class="content-area">
+                <!-- Stats Banner -->
+                <div class="stats-banner" id="stats-banner">
+                    <div class="stat-card">
+                        <span class="stat-value" id="total-count">-</span>
+                        <span class="stat-label">Total</span>
+                    </div>
+                    <div class="stat-card">
+                        <span class="stat-value" id="new-count">-</span>
+                        <span class="stat-label">New</span>
+                    </div>
+                    <div class="stat-card highlight">
+                        <span class="stat-value" id="hackathon-count">-</span>
+                        <span class="stat-label">Hackathons</span>
+                    </div>
+                    <div class="stat-card">
+                        <span class="stat-value" id="internship-count">-</span>
+                        <span class="stat-label">Internships</span>
+                    </div>
+                </div>
+                <!-- Opportunity Feed -->
+                <div class="feed" id="opportunity-feed">
+                    <div class="loading">Loading opportunities...</div>
+                </div>
+                <!-- Digest View (Hidden by default) -->
+                <div class="digest-view" id="digest-view" style="display: none;">
+                    <div class="digest-content" id="digest-content"></div>
+                </div>
+            </div>
+        </main>
+    </div>
+    <!-- Opportunity Detail Modal -->
+    <div class="modal" id="detail-modal">
+        <div class="modal-backdrop"></div>
+        <div class="modal-content">
+            <button class="modal-close">&times;</button>
+            <div id="modal-body"></div>
+        </div>
+    </div>
+    <!-- AI Chat Panel -->
+    <div class="chat-panel" id="chat-panel">
+        <div class="chat-header">
+            <span>PIOE AI Search</span>
+            <button class="chat-close" onclick="app.toggleChat()">&times;</button>
+        </div>
+        <div class="chat-messages" id="chat-messages">
+            <div class="chat-message bot">
+                <p>Hi! I'm PIOE AI. Ask me to find opportunities:</p>
+                <ul style="margin: 8px 0; padding-left: 20px; font-size: 12px; opacity: 0.8">
+                    <li>"Find hackathons in Nigeria"</li>
+                    <li>"What grants are available for AI?"</li>
+                    <li>"Show high ROI opportunities"</li>
+                    <li>"Internships in robotics"</li>
+                </ul>
+            </div>
+        </div>
+        <div class="chat-input-area">
+            <input type="text" id="chat-input" placeholder="Ask about opportunities..."
+                onkeydown="if(event.key==='Enter') app.sendChatMessage()">
+            <button onclick="app.sendChatMessage()">Send</button>
+        </div>
+    </div>
+    <!-- Floating Chat Button -->
+    <button class="chat-fab" id="chat-fab" onclick="app.toggleChat()">
+        AI
+    </button>
+    <script src="/static/app.js"></script>
+</body>
+</html>
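
Review note on frontend/index.html: the chat controls are wired with inline handler attributes that depend on a global `app`. A minimal sketch of the same behaviour attached from script is given below. `bindChatControls` is a hypothetical name, not something in the commit, and the `isComposing` guard is an added assumption intended to keep IME confirmation keystrokes from sending a message.

```javascript
// Hypothetical wiring helper, not part of the commit. `input` and `button`
// are any objects with addEventListener (real DOM elements in the browser).
function bindChatControls(input, button, onSend) {
    input.addEventListener('keydown', (event) => {
        // Ignore Enter presses that belong to an IME composition.
        if (event.key === 'Enter' && !event.isComposing) {
            onSend();
        }
    });
    button.addEventListener('click', () => onSend());
}

// Browser usage (assumes the ids from index.html and the global `app`):
// bindChatControls(
//     document.getElementById('chat-input'),
//     document.querySelector('.chat-input-area button'),
//     () => app.sendChatMessage()
// );
```

Keeping the handlers in script would also allow the page to be served under a Content-Security-Policy that forbids inline event attributes, should the deployment ever need one.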

frontend/styles.css ADDED

	@@ -0,0 +1,905 @@

+/* PIOE - Personal Intelligence & Opportunity Engine
+   Modern Dark Theme with Glassmorphism */
+:root {
+    /* Color Palette */
+    --bg-primary: #0a0a0f;
+    --bg-secondary: #12121a;
+    --bg-tertiary: #1a1a24;
+    --bg-card: rgba(26, 26, 36, 0.8);
+    --bg-glass: rgba(255, 255, 255, 0.03);
+    --accent-primary: #6366f1;
+    --accent-secondary: #8b5cf6;
+    --accent-gradient: linear-gradient(135deg, #6366f1, #8b5cf6);
+    --text-primary: #ffffff;
+    --text-secondary: #a1a1aa;
+    --text-muted: #71717a;
+    --border-color: rgba(255, 255, 255, 0.08);
+    --border-hover: rgba(255, 255, 255, 0.15);
+    /* Status Colors */
+    --success: #22c55e;
+    --warning: #f59e0b;
+    --danger: #ef4444;
+    --info: #3b82f6;
+    /* Category Colors */
+    --cat-hackathon: #f43f5e;
+    --cat-internship: #3b82f6;
+    --cat-scholarship: #22c55e;
+    --cat-research: #8b5cf6;
+    --cat-job: #f59e0b;
+    --cat-grant: #14b8a6;
+    --cat-opensource: #ec4899;
+    /* Spacing */
+    --sidebar-width: 260px;
+    --header-height: 70px;
+    --radius-sm: 8px;
+    --radius-md: 12px;
+    --radius-lg: 16px;
+}
+* {
+    margin: 0;
+    padding: 0;
+    box-sizing: border-box;
+}
+body {
+    font-family: 'Inter', -apple-system, BlinkMacSystemFont, sans-serif;
+    background: var(--bg-primary);
+    color: var(--text-primary);
+    line-height: 1.6;
+    min-height: 100vh;
+    overflow-x: hidden;
+}
+/* App Layout */
+.app {
+    display: flex;
+    min-height: 100vh;
+}
+/* Sidebar */
+.sidebar {
+    width: var(--sidebar-width);
+    background: var(--bg-secondary);
+    border-right: 1px solid var(--border-color);
+    padding: 24px 16px;
+    position: fixed;
+    height: 100vh;
+    overflow-y: auto;
+    z-index: 100;
+}
+.logo {
+    display: flex;
+    align-items: center;
+    gap: 12px;
+    padding: 8px 12px;
+    margin-bottom: 32px;
+}
+.logo-icon {
+    font-size: 28px;
+    filter: drop-shadow(0 0 8px rgba(99, 102, 241, 0.5));
+}
+.logo-text {
+    font-size: 24px;
+    font-weight: 700;
+    background: var(--accent-gradient);
+    -webkit-background-clip: text;
+    -webkit-text-fill-color: transparent;
+    background-clip: text;
+}
+.nav-section {
+    margin-bottom: 24px;
+}
+.nav-label {
+    display: block;
+    font-size: 11px;
+    font-weight: 600;
+    text-transform: uppercase;
+    letter-spacing: 0.05em;
+    color: var(--text-muted);
+    padding: 0 12px;
+    margin-bottom: 8px;
+}
+.nav-item {
+    display: flex;
+    align-items: center;
+    gap: 10px;
+    padding: 10px 12px;
+    border-radius: var(--radius-sm);
+    color: var(--text-secondary);
+    text-decoration: none;
+    font-size: 14px;
+    font-weight: 500;
+    transition: all 0.2s ease;
+    cursor: pointer;
+}
+.nav-item:hover {
+    background: var(--bg-glass);
+    color: var(--text-primary);
+}
+.nav-item.active {
+    background: var(--accent-gradient);
+    color: white;
+}
+.nav-item .icon {
+    font-size: 16px;
+}
+/* Main Content */
+.main-content {
+    flex: 1;
+    margin-left: var(--sidebar-width);
+    min-height: 100vh;
+    display: flex;
+    flex-direction: column;
+}
+/* Header */
+.header {
+    height: var(--header-height);
+    background: var(--bg-secondary);
+    border-bottom: 1px solid var(--border-color);
+    display: flex;
+    align-items: center;
+    justify-content: space-between;
+    padding: 0 32px;
+    position: sticky;
+    top: 0;
+    z-index: 50;
+    backdrop-filter: blur(12px);
+}
+.header-title h1 {
+    font-size: 20px;
+    font-weight: 600;
+}
+.subtitle {
+    font-size: 13px;
+    color: var(--text-muted);
+}
+.header-actions {
+    display: flex;
+    gap: 12px;
+}
+.filter-select {
+    background: var(--bg-tertiary);
+    border: 1px solid var(--border-color);
+    color: var(--text-primary);
+    padding: 8px 16px;
+    border-radius: var(--radius-sm);
+    font-size: 13px;
+    cursor: pointer;
+    transition: border-color 0.2s;
+}
+.filter-select:hover {
+    border-color: var(--border-hover);
+}
+.filter-select:focus {
+    outline: none;
+    border-color: var(--accent-primary);
+}
+/* Content Area */
+.content-area {
+    flex: 1;
+    padding: 24px 32px;
+    overflow-y: auto;
+}
+/* Stats Banner */
+.stats-banner {
+    display: grid;
+    grid-template-columns: repeat(4, 1fr);
+    gap: 16px;
+    margin-bottom: 24px;
+}
+.stat-card {
+    background: var(--bg-card);
+    border: 1px solid var(--border-color);
+    border-radius: var(--radius-md);
+    padding: 20px;
+    display: flex;
+    flex-direction: column;
+    gap: 4px;
+    backdrop-filter: blur(8px);
+}
+.stat-card.highlight {
+    background: var(--accent-gradient);
+    border: none;
+}
+.stat-value {
+    font-size: 28px;
+    font-weight: 700;
+}
+.stat-label {
+    font-size: 12px;
+    color: var(--text-secondary);
+    text-transform: uppercase;
+    letter-spacing: 0.05em;
+}
+.stat-card.highlight .stat-label {
+    color: rgba(255, 255, 255, 0.8);
+}
+/* Opportunity Feed */
+.feed {
+    display: flex;
+    flex-direction: column;
+    gap: 16px;
+}
+.loading {
+    text-align: center;
+    padding: 60px;
+    color: var(--text-muted);
+}
+/* Opportunity Card */
+.opportunity-card {
+    background: var(--bg-card);
+    border: 1px solid var(--border-color);
+    border-radius: var(--radius-md);
+    padding: 20px;
+    transition: all 0.2s ease;
+    cursor: pointer;
+    backdrop-filter: blur(8px);
+}
+.opportunity-card:hover {
+    border-color: var(--border-hover);
+    transform: translateY(-2px);
+    box-shadow: 0 8px 24px rgba(0, 0, 0, 0.3);
+}
+.card-header {
+    display: flex;
+    align-items: flex-start;
+    justify-content: space-between;
+    margin-bottom: 12px;
+}
+.card-category {
+    display: inline-flex;
+    align-items: center;
+    gap: 6px;
+    padding: 4px 10px;
+    border-radius: 20px;
+    font-size: 11px;
+    font-weight: 600;
+    text-transform: uppercase;
+    letter-spacing: 0.03em;
+}
+.card-category.hackathon {
+    background: rgba(244, 63, 94, 0.2);
+    color: var(--cat-hackathon);
+}
+.card-category.internship {
+    background: rgba(59, 130, 246, 0.2);
+    color: var(--cat-internship);
+}
+.card-category.scholarship {
+    background: rgba(34, 197, 94, 0.2);
+    color: var(--cat-scholarship);
+}
+.card-category.research {
+    background: rgba(139, 92, 246, 0.2);
+    color: var(--cat-research);
+}
+.card-category.job {
+    background: rgba(245, 158, 11, 0.2);
+    color: var(--cat-job);
+}
+.card-category.grant {
+    background: rgba(20, 184, 166, 0.2);
+    color: var(--cat-grant);
+}
+.card-category.open_source {
+    background: rgba(236, 72, 153, 0.2);
+    color: var(--cat-opensource);
+}
+.card-category.other {
+    background: rgba(161, 161, 170, 0.2);
+    color: var(--text-secondary);
+}
+.card-score {
+    display: flex;
+    align-items: center;
+    gap: 4px;
+    font-size: 13px;
+    color: var(--text-secondary);
+}
+.score-bar {
+    width: 60px;
+    height: 6px;
+    background: var(--bg-tertiary);
+    border-radius: 3px;
+    overflow: hidden;
+}
+.score-fill {
+    height: 100%;
+    background: var(--accent-gradient);
+    border-radius: 3px;
+    transition: width 0.3s ease;
+}
+.card-title {
+    font-size: 16px;
+    font-weight: 600;
+    margin-bottom: 8px;
+    line-height: 1.4;
+}
+.card-meta {
+    display: flex;
+    gap: 16px;
+    font-size: 12px;
+    color: var(--text-muted);
+    margin-bottom: 12px;
+}
+.card-meta span {
+    display: flex;
+    align-items: center;
+    gap: 4px;
+}
+.card-summary {
+    font-size: 14px;
+    color: var(--text-secondary);
+    line-height: 1.6;
+    display: -webkit-box;
+    -webkit-line-clamp: 2;
+    -webkit-box-orient: vertical;
+    overflow: hidden;
+}
+.card-footer {
+    display: flex;
+    align-items: center;
+    justify-content: space-between;
+    margin-top: 16px;
+    padding-top: 16px;
+    border-top: 1px solid var(--border-color);
+}
+.deadline-badge {
+    display: inline-flex;
+    align-items: center;
+    gap: 6px;
+    padding: 4px 10px;
+    border-radius: var(--radius-sm);
+    font-size: 12px;
+    font-weight: 500;
+}
+.deadline-badge.urgent {
+    background: rgba(239, 68, 68, 0.2);
+    color: var(--danger);
+}
+.deadline-badge.soon {
+    background: rgba(245, 158, 11, 0.2);
+    color: var(--warning);
+}
+.deadline-badge.ok {
+    background: rgba(34, 197, 94, 0.2);
+    color: var(--success);
+}
+.card-actions {
+    display: flex;
+    gap: 8px;
+}
+.action-btn {
+    padding: 6px 12px;
+    border-radius: var(--radius-sm);
+    font-size: 12px;
+    font-weight: 500;
+    border: none;
+    cursor: pointer;
+    transition: all 0.2s;
+}
+.action-btn.primary {
+    background: var(--accent-gradient);
+    color: white;
+}
+.action-btn.primary:hover {
+    transform: scale(1.05);
+}
+.action-btn.secondary {
+    background: var(--bg-tertiary);
+    color: var(--text-secondary);
+    border: 1px solid var(--border-color);
+}
+.action-btn.secondary:hover {
+    border-color: var(--border-hover);
+    color: var(--text-primary);
+}
+/* Digest View */
+.digest-view {
+    background: var(--bg-card);
+    border: 1px solid var(--border-color);
+    border-radius: var(--radius-md);
+    padding: 32px;
+    backdrop-filter: blur(8px);
+}
+.digest-content {
+    font-size: 14px;
+    line-height: 1.8;
+}
+.digest-content h1 {
+    font-size: 24px;
+    margin-bottom: 16px;
+}
+.digest-content h2 {
+    font-size: 18px;
+    margin: 24px 0 12px;
+}
+.digest-content h3 {
+    font-size: 16px;
+    margin: 20px 0 8px;
+}
+.digest-content p {
+    margin-bottom: 12px;
+    color: var(--text-secondary);
+}
+.digest-content blockquote {
+    border-left: 3px solid var(--accent-primary);
+    padding-left: 16px;
+    color: var(--text-secondary);
+    margin: 12px 0;
+}
+.digest-content a {
+    color: var(--accent-primary);
+}
+.digest-content hr {
+    border: none;
+    border-top: 1px solid var(--border-color);
+    margin: 24px 0;
+}
+.digest-content table {
+    width: 100%;
+    border-collapse: collapse;
+    margin: 16px 0;
+}
+.digest-content th,
+.digest-content td {
+    padding: 8px 12px;
+    border: 1px solid var(--border-color);
+    text-align: left;
+}
+.digest-content th {
+    background: var(--bg-tertiary);
+}
+/* Modal */
+.modal {
+    display: none;
+    position: fixed;
+    top: 0;
+    left: 0;
+    width: 100%;
+    height: 100%;
+    z-index: 1000;
+}
+.modal.active {
+    display: flex;
+    align-items: center;
+    justify-content: center;
+}
+.modal-backdrop {
+    position: absolute;
+    top: 0;
+    left: 0;
+    width: 100%;
+    height: 100%;
+    background: rgba(0, 0, 0, 0.7);
+    backdrop-filter: blur(4px);
+}
+.modal-content {
+    position: relative;
+    background: var(--bg-secondary);
+    border: 1px solid var(--border-color);
+    border-radius: var(--radius-lg);
+    width: 90%;
+    max-width: 700px;
+    max-height: 80vh;
+    overflow-y: auto;
+    padding: 32px;
+    z-index: 1001;
+}
+.modal-close {
+    position: absolute;
+    top: 16px;
+    right: 16px;
+    background: var(--bg-tertiary);
+    border: none;
+    color: var(--text-secondary);
+    width: 32px;
+    height: 32px;
+    border-radius: 50%;
+    font-size: 20px;
+    cursor: pointer;
+    display: flex;
+    align-items: center;
+    justify-content: center;
+    transition: all 0.2s;
+}
+.modal-close:hover {
+    background: var(--danger);
+    color: white;
+}
+/* Scrollbar */
+::-webkit-scrollbar {
+    width: 8px;
+    height: 8px;
+}
+::-webkit-scrollbar-track {
+    background: var(--bg-primary);
+}
+::-webkit-scrollbar-thumb {
+    background: var(--bg-tertiary);
+    border-radius: 4px;
+}
+::-webkit-scrollbar-thumb:hover {
+    background: var(--accent-primary);
+}
+/* Animations */
+@keyframes fadeIn {
+    from {
+        opacity: 0;
+        transform: translateY(10px);
+    }
+    to {
+        opacity: 1;
+        transform: translateY(0);
+    }
+}
+.opportunity-card {
+    animation: fadeIn 0.3s ease forwards;
+}
+.opportunity-card:nth-child(1) {
+    animation-delay: 0.05s;
+}
+.opportunity-card:nth-child(2) {
+    animation-delay: 0.1s;
+}
+.opportunity-card:nth-child(3) {
+    animation-delay: 0.15s;
+}
+.opportunity-card:nth-child(4) {
+    animation-delay: 0.2s;
+}
+.opportunity-card:nth-child(5) {
+    animation-delay: 0.25s;
+}
+/* Responsive */
+@media (max-width: 1024px) {
+    .sidebar {
+        width: 200px;
+    }
+    .main-content {
+        margin-left: 200px;
+    }
+    .stats-banner {
+        grid-template-columns: repeat(2, 1fr);
+    }
+}
+@media (max-width: 768px) {
+    .sidebar {
+        display: none;
+    }
+    .main-content {
+        margin-left: 0;
+    }
+    .header {
+        flex-direction: column;
+        height: auto;
+        padding: 16px;
+        gap: 12px;
+    }
+    .content-area {
+        padding: 16px;
+    }
+    .stats-banner {
+        grid-template-columns: 1fr 1fr;
+    }
+}
+/* PIOE 2.0: New Category Colors */
+.card-category.micro_grant {
+    background: rgba(16, 185, 129, 0.2);
+    color: #10b981;
+}
+.card-category.ecosystem_grant {
+    background: rgba(245, 158, 11, 0.2);
+    color: #f59e0b;
+}
+.card-category.innovation_fund {
+    background: rgba(59, 130, 246, 0.2);
+    color: #3b82f6;
+}
+.card-category.partnership {
+    background: rgba(139, 92, 246, 0.2);
+    color: #8b5cf6;
+}
+.card-category.collaboration {
+    background: rgba(236, 72, 153, 0.2);
+    color: #ec4899;
+}
+.card-category.pitch_event {
+    background: rgba(244, 63, 94, 0.2);
+    color: #f43f5e;
+}
+.card-category.demo_day {
+    background: rgba(99, 102, 241, 0.2);
+    color: #6366f1;
+}
+.card-category.bounty {
+    background: rgba(34, 197, 94, 0.2);
+    color: #22c55e;
+}
+.card-category.ambassador {
+    background: rgba(234, 179, 8, 0.2);
+    color: #eab308;
+}
+.card-category.pre_grant_signal {
+    background: rgba(168, 85, 247, 0.2);
+    color: #a855f7;
+}
+.card-category.pre_hiring_signal {
+    background: rgba(6, 182, 212, 0.2);
+    color: #06b6d4;
+}
+/* PIOE 2.0: Chat Panel */
+.chat-fab {
+    position: fixed;
+    bottom: 24px;
+    right: 24px;
+    width: 60px;
+    height: 60px;
+    border-radius: 50%;
+    background: var(--accent-gradient);
+    border: none;
+    box-shadow: 0 4px 20px rgba(99, 102, 241, 0.4);
+    font-size: 28px;
+    color: white;
+    cursor: pointer;
+    z-index: 999;
+    transition: all 0.3s ease;
+}
+.chat-fab:hover {
+    transform: scale(1.1);
+    box-shadow: 0 6px 30px rgba(99, 102, 241, 0.6);
+}
+.chat-panel {
+    position: fixed;
+    bottom: 100px;
+    right: 24px;
+    width: 380px;
+    height: 500px;
+    background: var(--bg-secondary);
+    border: 1px solid var(--border-color);
+    border-radius: var(--radius-lg);
+    display: none;
+    flex-direction: column;
+    z-index: 1000;
+    box-shadow: 0 8px 40px rgba(0, 0, 0, 0.4);
+}
+.chat-panel.active {
+    display: flex;
+}
+.chat-header {
+    display: flex;
+    align-items: center;
+    justify-content: space-between;
+    padding: 16px 20px;
+    background: var(--accent-gradient);
+    border-radius: var(--radius-lg) var(--radius-lg) 0 0;
+    font-weight: 600;
+}
+.chat-close {
+    background: none;
+    border: none;
+    color: white;
+    font-size: 24px;
+    cursor: pointer;
+    opacity: 0.8;
+    transition: opacity 0.2s;
+}
+.chat-close:hover {
+    opacity: 1;
+}
+.chat-messages {
+    flex: 1;
+    overflow-y: auto;
+    padding: 16px;
+    display: flex;
+    flex-direction: column;
+    gap: 12px;
+}
+.chat-message {
+    padding: 12px 16px;
+    border-radius: var(--radius-md);
+    max-width: 90%;
+    animation: fadeIn 0.3s ease;
+}
+.chat-message.user {
+    background: var(--accent-gradient);
+    color: white;
+    align-self: flex-end;
+}
+.chat-message.bot {
+    background: var(--bg-tertiary);
+    color: var(--text-secondary);
+    align-self: flex-start;
+}
+.chat-message p {
+    margin: 0;
+    font-size: 14px;
+    line-height: 1.5;
+}
+.chat-message .opp-link {
+    display: block;
+    background: var(--bg-card);
+    padding: 8px 12px;
+    border-radius: var(--radius-sm);
+    margin-top: 8px;
+    font-size: 12px;
+    color: var(--accent-primary);
+    text-decoration: none;
+    border: 1px solid var(--border-color);
+    transition: border-color 0.2s;
+}
+.chat-message .opp-link:hover {
+    border-color: var(--accent-primary);
+}
+.chat-input-area {
+    display: flex;
+    gap: 8px;
+    padding: 16px;
+    border-top: 1px solid var(--border-color);
+}
+.chat-input-area input {
+    flex: 1;
+    background: var(--bg-tertiary);
+    border: 1px solid var(--border-color);
+    color: var(--text-primary);
+    padding: 12px 16px;
+    border-radius: var(--radius-sm);
+    font-size: 14px;
+}
+.chat-input-area input:focus {
+    outline: none;
+    border-color: var(--accent-primary);
+}
+.chat-input-area button {
+    background: var(--accent-gradient);
+    border: none;
+    color: white;
+    padding: 12px 20px;
+    border-radius: var(--radius-sm);
+    font-weight: 500;
+    cursor: pointer;
+    transition: transform 0.2s;
+}
+.chat-input-area button:hover {
+    transform: scale(1.05);
+}
+@media (max-width: 480px) {
+    .chat-panel {
+        width: calc(100% - 32px);
+        right: 16px;
+        bottom: 90px;
+        height: 60vh;
+    }
+}
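
Review note on frontend/styles.css: the stylesheet defines a `.card-category.<name>` rule per category plus a neutral `.card-category.other`. The sketch below shows how a renderer could fall back to `other` for any category the stylesheet does not cover. The category list is copied from the selectors above; `STYLED_CATEGORIES` and `categoryClass` are hypothetical names, and nothing here is confirmed to match how app.js actually picks the class.

```javascript
// Category names copied from the .card-category.* selectors in styles.css.
// The helper itself is hypothetical and not part of the commit.
const STYLED_CATEGORIES = new Set([
    'hackathon', 'internship', 'scholarship', 'research', 'job', 'grant',
    'open_source', 'other', 'micro_grant', 'ecosystem_grant',
    'innovation_fund', 'partnership', 'collaboration', 'pitch_event',
    'demo_day', 'bounty', 'ambassador', 'pre_grant_signal',
    'pre_hiring_signal',
]);

// Return a class name that is guaranteed to have a matching CSS rule.
function categoryClass(category) {
    const key = String(category ?? '').toLowerCase();
    return STYLED_CATEGORIES.has(key) ? key : 'other';
}
```

A card template could then emit `class="card-category ${categoryClass(opp.category)}"`, where `opp.category` is an assumed field name, so unknown categories still get the grey `other` badge instead of an unstyled one.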

render.yaml ADDED

	@@ -0,0 +1,25 @@

+# render.yaml - Render Blueprint for one-click deploy
+services:
+  - type: web
+    name: pioe
+    runtime: python
+    buildCommand: pip install -r requirements.txt
+    startCommand: uvicorn backend.main:app --host 0.0.0.0 --port $PORT
+    envVars:
+      - key: GEMINI_API_KEY
+        sync: false
+      - key: ADZUNA_APP_ID
+        sync: false
+      - key: ADZUNA_API_KEY
+        sync: false
+      - key: JOOBLE_API_KEY
+        sync: false
+      - key: RAPIDAPI_KEY
+        sync: false
+      - key: GITHUB_TOKEN
+        sync: false
+      - key: DATABASE_URL
+        value: sqlite:///./pioe.db
+      - key: MIN_RELEVANCE_SCORE
+        value: "0.3"
+    healthCheckPath: /api/stats

requirements.txt ADDED

	@@ -0,0 +1,18 @@

+# PIOE - Personal Intelligence & Opportunity Engine
+fastapi
+uvicorn[standard]
+sqlalchemy
+httpx
+feedparser
+beautifulsoup4
+lxml
+apscheduler
+sentence-transformers
+python-dotenv
+pydantic
+pydantic-settings
+google-generativeai
+praw
+aiofiles
+PyYAML
+numpy