Spaces:

dexakif
/

clienttarget-python

Running

iDevBuddy commited on 4 days ago

Commit

bd28470

0 Parent(s):

feat: Phase 1 — AI Client Acquisition System

Complete finding pipeline with:
- MiniMax M2.7 + LLaMA multi-model AI engine (all FREE on NVIDIA NIM)
- 5-task Trigger.dev pipeline (scheduler → scrape → enrich → profile → digest)
- 7-layer email verification with credit optimization
- Pattern-based email generation (FREE, unlimited)
- Pain signal detection (heuristic + LLM)
- Deterministic 100-point scoring system
- Slack bot (alerts, commands, daily digest)
- Territory management (27 cities, auto-rotation)
- Full observability (trace IDs, checkpoints, LLM logs)

All API integrations:
- NVIDIA NIM (MiniMax M2.7, LLaMA 70B, LLaMA 8B)
- Serper.dev (Google search)
- Hunter.io (email finding)
- Reoon (email verification)
- Supabase (database)
- Slack (notifications)
- Trigger.dev (orchestration)

Total LLM cost: /day

This view is limited to 50 files because it contains too many changes. See raw diff

Files changed (50) hide show

.env.example +40 -0
.gitignore +35 -0
CONTRIBUTING.md +83 -0
README.md +179 -0
docker-compose.yml +56 -0
docs/setup-guide.md +118 -0
package.json +33 -0
src/discovery/lib/contact-enricher.ts +354 -0
src/discovery/lib/deduplicator.ts +82 -0
src/discovery/lib/email-classifier.ts +210 -0
src/discovery/lib/email-pattern-generator.ts +249 -0
src/discovery/lib/email-verifier.ts +338 -0
src/discovery/lib/icp-filter.ts +133 -0
src/discovery/lib/linkedin-person-finder.ts +205 -0
src/discovery/lib/linkedin-scraper.ts +165 -0
src/discovery/lib/normalizer.ts +145 -0
src/discovery/lib/pain-signal-detector.ts +228 -0
src/discovery/lib/rotation.ts +114 -0
src/discovery/lib/social-finder.ts +202 -0
src/discovery/lib/territory-manager.ts +259 -0
src/discovery/lib/web-scraper.ts +225 -0
src/discovery/providers/hunter.ts +155 -0
src/discovery/providers/reoon.ts +108 -0
src/discovery/providers/serper.ts +108 -0
src/discovery/trigger-tasks/auto-discovery.ts +517 -0
src/discovery/trigger-tasks/manual-discovery.ts +139 -0
src/profiling/python-service/config.py +25 -0
src/profiling/python-service/hallucination_guard.py +137 -0
src/profiling/python-service/main.py +148 -0
src/profiling/python-service/nvidia_client.py +254 -0
src/profiling/python-service/profiler.py +212 -0
src/profiling/python-service/requirements.txt +8 -0
src/profiling/python-service/scorer.py +260 -0
src/profiling/trigger-tasks/profiling-router.ts +158 -0
src/shared/config/env.ts +66 -0
src/shared/llm/grounding.ts +239 -0
src/shared/llm/nvidia-client.ts +307 -0
src/shared/llm/prompts.ts +277 -0
src/shared/observability/tracer.ts +118 -0
src/shared/pipeline/checkpoint.ts +143 -0
src/shared/supabase/client.ts +15 -0
src/shared/supabase/schema.ts +184 -0
src/shared/utils/logger.ts +40 -0
src/shared/utils/rate-limiter.ts +103 -0
src/shared/utils/retry.ts +195 -0
src/slack/slack-commands.ts +249 -0
src/slack/slack-service.ts +273 -0
src/trigger.ts +8 -0
supabase/migrations/001_initial_schema.sql +279 -0
supabase/migrations/002_phase1_enhancements.sql +242 -0

.env.example ADDED Viewed

	@@ -0,0 +1,40 @@

+# ─── LLM (All on NVIDIA NIM — FREE) ───────────────────────────
+NVIDIA_API_KEY=nvapi-your-key-here
+NVIDIA_NIM_BASE_URL=https://integrate.api.nvidia.com/v1
+# ─── Web Research ──────────────────────────────────────────────
+SERPER_API_KEY=your-serper-key
+# ─── Email Finding ─────────────────────────────────────────────
+HUNTER_API_KEY=your-hunter-key
+# ─── Email Verification ────────────────────────────────────────
+REOON_API_KEY=your-reoon-key
+# ─── Supabase ──────────────────────────────────────────────────
+SUPABASE_URL=https://your-project.supabase.co
+SUPABASE_SERVICE_ROLE_KEY=your-service-role-key
+# ─── Slack ─────────────────────────────────────────────────────
+SLACK_BOT_TOKEN=xoxb-your-bot-token
+SLACK_SIGNING_SECRET=your-signing-secret
+SLACK_ALERT_CHANNEL_ID=C0000000000
+SLACK_REVIEW_CHANNEL_ID=C0000000000
+# ─── Trigger.dev ───────────────────────────────────────────────
+TRIGGER_DEV_API_KEY=tr_dev_your-key
+TRIGGER_DEV_PROJECT_ID=your-project-id
+# ─── Python AI Service (create any random string) ─────────────
+PYTHON_AI_SERVICE_URL=http://localhost:8000
+PYTHON_AI_SERVICE_SECRET=create-a-random-16-char-string
+# ─── System Config ─────────────────────────────────────────────
+NODE_ENV=development
+LOG_LEVEL=info
+DAILY_LEAD_QUOTA=10
+QUALITY_SCORE_THRESHOLD=70
+HUMAN_REVIEW_ENABLED=true
+DAILY_EMAIL_LIMIT=50
+DAILY_LINKEDIN_LIMIT=25
+SCHEDULE_START_HOUR_UTC=4

.gitignore ADDED Viewed

	@@ -0,0 +1,35 @@

+# Environment variables (NEVER commit)
+.env
+.env.local
+.env.production
+# Node
+node_modules/
+dist/
+build/
+*.tsbuildinfo
+# Python
+__pycache__/
+*.pyc
+*.pyo
+.venv/
+venv/
+*.egg-info/
+# IDE
+.vscode/
+.idea/
+*.swp
+*.swo
+# OS
+.DS_Store
+Thumbs.db
+# Logs
+*.log
+logs/
+# Trigger.dev
+.trigger/

CONTRIBUTING.md ADDED Viewed

	@@ -0,0 +1,83 @@

+# Contributing to AI Client Acquisition System
+Welcome! This guide will help you get started as a contributor.
+## Getting Started
+1. **Clone the repo**
+   ```bash
+   git clone https://github.com/iDevBuddy/ai-client-acquisition.git
+   cd ai-client-acquisition
+   ```
+2. **Install dependencies**
+   ```bash
+   npm install
+   cd src/profiling/python-service && pip install -r requirements.txt && cd ../../..
+   ```
+3. **Set up environment**
+   ```bash
+   cp .env.example .env
+   # Fill in your API keys — ask @iDevBuddy for access
+   ```
+4. **Set up database**
+   - Create a Supabase project (free)
+   - Run migration files from `supabase/migrations/` in order
+5. **Start development**
+   ```bash
+   # Terminal 1: Trigger.dev tasks
+   npm run trigger:dev
+   # Terminal 2: Python AI service
+   cd src/profiling/python-service && python main.py
+   ```
+## Project Architecture
+```
+Phase 1: FINDING (current)
+  Discovery → Scraping → Pain Detection → Email Finding → AI Profiling → Scoring → Slack
+Phase 2: OUTREACH (upcoming)
+  Email sequences → LinkedIn messaging → Follow-ups → Reply handling
+```
+## Code Conventions
+- **TypeScript** for orchestration, discovery, and integrations
+- **Python** for AI profiling service (FastAPI)
+- **Zod** for runtime validation
+- Use `logger` (pino) for all logging — no `console.log`
+- Every LLM call must have a `traceId`
+- Every external API call must go through `retry.ts`
+## Branch Strategy
+```
+main        → production-ready code
+develop     → integration branch
+feature/*   → new features
+fix/*       → bug fixes
+```
+## Pull Request Process
+1. Create a feature branch: `git checkout -b feature/your-feature`
+2. Make your changes
+3. Test locally (see Testing section)
+4. Push and create a PR against `develop`
+5. Get at least 1 review before merging
+## Security Rules
+⚠️ **NEVER commit API keys or secrets**
+- `.env` is in `.gitignore` — keep it that way
+- Use `.env.example` for templates (no real values)
+- If you accidentally commit a key, rotate it IMMEDIATELY
+## Questions?
+Reach out to @iDevBuddy on GitHub or Slack.

README.md ADDED Viewed

	@@ -0,0 +1,179 @@

+# 🤖 AI Client Acquisition System
+> Enterprise-grade, hyper-intelligent lead discovery, profiling, and scoring pipeline.
+> Built with production AI engineering practices — not n8n-style hype.
+[![Phase](https://img.shields.io/badge/Phase-1%20Finding-blue)]()
+[![Models](https://img.shields.io/badge/AI-MiniMax%20M2.7%20%2B%20LLaMA-green)]()
+[![Cost](https://img.shields.io/badge/LLM%20Cost-%240%2Fday-brightgreen)]()
+[![Trigger.dev](https://img.shields.io/badge/Orchestration-Trigger.dev-purple)]()
+---
+## What This System Does
+Automatically discovers, qualifies, and profiles potential clients for an AI automation agency.
+```
+Every day at 9 AM PKT:
+  1. Pick next territory (city × industry) → 27 cities, auto-rotation
+  2. Search Google for companies → Serper API
+  3. Scrape each website → Playwright (headless)
+  4. Detect pain signals → "no chatbot", "phone booking only", etc.
+  5. Gate 2: Skip if < 2 pain signals
+  6. Find decision-maker emails → Hunter.io + Pattern Generation + SMTP
+  7. Verify emails → 7-layer verification (FREE)
+  8. Find personal LinkedIn + social profiles
+  9. AI profiling → MiniMax M2.7 (chain-of-thought reasoning)
+  10. Deterministic scoring → 100-point scale, zero hallucination
+  11. Alert on Slack → hot leads (85+) instant, daily digest for all
+```
+## Architecture
+```
+┌─────────────────────────────────────────────────────┐
+│ CRON: daily-lead-discovery (4 AM UTC = 9 AM PKT)   │
+│   → Territory Manager → Google Search → Queue       │
+└──────────────────────┬──────────────────────────────┘
+                       │
+                       ▼ (max 3 concurrent)
+┌─────────────────────────────────────────────────────┐
+│ TASK: process-company                               │
+│   → Scrape → Pain Signals → Gate 2                  │
+└──────────────────────┬──────────────────────────────┘
+                       │
+                       ▼
+┌─────────────────────────────────────────────────────┐
+│ TASK: enrich-and-profile                            │
+│   → Hunter → Pattern Gen → SMTP → LinkedIn         │
+│   → Python AI Service → Save → Slack Alert          │
+└─────────────────────────────────────────────────────┘
+```
+## Model Chain (All FREE on NVIDIA NIM)
+| Priority | Model | Parameters | Use Case |
+|----------|-------|-----------|----------|
+| 1st | MiniMax M2.7 | ~100B+ | Profiling, scoring, complex reasoning |
+| 2nd | LLaMA 3.3 70B | 70B | Reliable fallback |
+| 3rd | LLaMA 3.1 8B | 8B | Email classification, simple tasks |
+| 4th | Deterministic | — | Zero hallucination fallback |
+**Single API key. Single endpoint. $0/day.**
+## Scoring System (100 points, fully deterministic)
+```
+Company Fit:      25 pts   (industry + size match)
+AI Readiness:     20 pts   (tech stack + AI jobs)
+Service Match:    20 pts   (pain signals → our services)
+Decision Maker:   20 pts   (verified email + LinkedIn + authority)
+Timing:           15 pts   (growth signals + active website)
+Tiers: hot (85+) | warm (70-84) | nurture (50-69) | archive (<50)
+```
+## Tech Stack
+| Layer | Technology | Purpose |
+|-------|-----------|---------|
+| Orchestration | Trigger.dev | CRON, task chaining, retry, queuing |
+| Database | Supabase (PostgreSQL) | Data storage, config, state |
+| LLM | NVIDIA NIM (MiniMax + LLaMA) | AI profiling & analysis |
+| Web Scraping | Playwright | Headless browser |
+| Email | Hunter.io + SMTP | Finding & verification |
+| Notifications | Slack Bot | Alerts, commands, digest |
+| AI Service | Python FastAPI | Profiling, scoring, hallucination guard |
+| Language | TypeScript + Python | Core logic |
+## Project Structure
+```
+src/
+├── discovery/                    # Phase 1: Finding pipeline
+│   ├── lib/                      # Core logic
+│   │   ├── contact-enricher.ts   # 6-step email pipeline
+│   │   ├── email-classifier.ts   # Tier 1/2/3 classification
+│   │   ├── email-verifier.ts     # 7-layer verification
+│   │   ├── email-pattern-generator.ts  # FREE Snov replacement
+│   │   ├── linkedin-person-finder.ts   # Personal LinkedIn
+│   │   ├── social-finder.ts      # Instagram, Facebook, Twitter
+│   │   ├── pain-signal-detector.ts     # Heuristic + LLM
+│   │   ├── territory-manager.ts  # City×industry grid
+│   │   └── web-scraper.ts        # Playwright scraper
+│   ├── providers/                # External APIs
+│   │   ├── hunter.ts             # Hunter.io integration
+│   │   ├── serper.ts             # Google search
+│   │   └── reoon.ts              # Email verification
+│   └── trigger-tasks/            # Trigger.dev tasks
+│       ├── auto-discovery.ts     # 5 chained tasks
+│       └── manual-discovery.ts   # Slack-triggered runs
+├── profiling/                    # AI profiling service
+│   └── python-service/           # FastAPI
+│       ├── main.py               # /profile endpoint
+│       ├── profiler.py           # Chain-of-thought profiling
+│       ├── scorer.py             # Signal extraction + deterministic math
+│       ├── hallucination_guard.py # Evidence-based cross-check
+│       ├── nvidia_client.py      # Multi-model LLM client
+│       └── config.py             # Settings
+├── shared/                       # Shared utilities
+│   ├── config/env.ts             # Environment validation (Zod)
+│   ├── llm/nvidia-client.ts      # Multi-model LLM (MiniMax primary)
+│   ├── llm/prompts.ts            # Production prompts
+│   ├── llm/grounding.ts          # Evidence-based verification
+│   ├── observability/tracer.ts   # Trace IDs + token tracking
+│   ├── pipeline/checkpoint.ts    # Crash recovery
+│   ├── supabase/client.ts        # DB client
+│   └── utils/                    # Retry, rate limiter, logger
+└── slack/                        # Slack integration
+    ├── slack-service.ts          # 3-layer delivery
+    └── slack-commands.ts         # /discover, /leads, /status, etc.
+```
+## Quick Start
+See [Setup Guide](docs/setup-guide.md) for detailed instructions.
+```bash
+# 1. Clone
+git clone https://github.com/iDevBuddy/ai-client-acquisition.git
+cd ai-client-acquisition
+# 2. Install
+npm install
+cd src/profiling/python-service && pip install -r requirements.txt && cd ../../..
+# 3. Configure
+cp .env.example .env
+# Fill in your API keys (see docs/setup-guide.md)
+# 4. Database
+# Run supabase/migrations/*.sql on your Supabase project
+# 5. Run
+npm run trigger:dev          # Start Trigger.dev (task orchestration)
+cd src/profiling/python-service && python main.py  # Start AI service
+```
+## API Keys Required
+| Service | Cost | What It Does |
+|---------|------|-------------|
+| NVIDIA NIM | FREE | AI models (MiniMax + LLaMA) |
+| Serper.dev | FREE (2500/mo) | Google search |
+| Hunter.io | FREE (25/mo) | Email finding |
+| Reoon | FREE (20/day) | Email verification |
+| Supabase | FREE | Database |
+| Slack | FREE | Notifications |
+| Trigger.dev | FREE (50K runs/mo) | Job orchestration |
+**Total cost: $0/month**
+## Contributing
+See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
+## License
+Private — All rights reserved.

docker-compose.yml ADDED Viewed

	@@ -0,0 +1,56 @@

+version: "3.9"
+services:
+  # ─── Node.js Orchestration Service ──────────────────────────
+  node-service:
+    build:
+      context: .
+      dockerfile: Dockerfile.node
+    ports:
+      - "3000:3000"
+    environment:
+      - NODE_ENV=development
+    env_file:
+      - .env
+    depends_on:
+      - python-service
+      - redis
+    restart: unless-stopped
+  # ─── Python AI Profiling Service ────────────────────────────
+  python-service:
+    build:
+      context: ./src/profiling/python-service
+      dockerfile: Dockerfile.python
+    ports:
+      - "8000:8000"
+    env_file:
+      - .env
+    volumes:
+      - ./src/profiling/python-service:/app
+    restart: unless-stopped
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+  # ─── Redis (queue + cache) ───────────────────────────────────
+  redis:
+    image: redis:7-alpine
+    ports:
+      - "6379:6379"
+    restart: unless-stopped
+  # ─── Ollama (local LLM) ──────────────────────────────────────
+  # Comment out if running Ollama natively on host
+  # ollama:
+  #   image: ollama/ollama:latest
+  #   ports:
+  #     - "11434:11434"
+  #   volumes:
+  #     - ollama_data:/root/.ollama
+  #   restart: unless-stopped
+volumes:
+  ollama_data:

docs/setup-guide.md ADDED Viewed

	@@ -0,0 +1,118 @@

+# Setup Guide
+Complete step-by-step guide to get the system running.
+## Prerequisites
+- **Node.js** 18+ (recommended: 20 LTS)
+- **Python** 3.11+
+- **npm** 9+
+- **Git**
+## Step 1: API Keys
+Create accounts and get keys from these services (all FREE):
+### 1.1 NVIDIA NIM (LLM — MiniMax + LLaMA)
+1. Go to https://build.nvidia.com
+2. Sign up / login
+3. Click any model → "Get API Key"
+4. Copy key (starts with `nvapi-`)
+5. Free: 1000+ requests/day
+### 1.2 Serper.dev (Google Search)
+1. Go to https://serper.dev
+2. Sign up with Google
+3. Dashboard → copy API key
+4. Free: 2,500 searches/month
+### 1.3 Hunter.io (Email Finding)
+1. Go to https://hunter.io
+2. Sign up → Dashboard → API
+3. Copy API key
+4. Free: 25 searches/month
+### 1.4 Reoon (Email Verification)
+1. Go to https://emailverifier.reoon.com
+2. Sign up → Dashboard → API
+3. Copy API key
+4. Free: 20 verifications/day
+5. NOTE: System optimizes usage (SMTP probe first, Reoon fallback)
+### 1.5 Supabase (Database)
+1. Go to https://supabase.com
+2. Create project
+3. Project Settings → API
+4. Copy **Project URL** and **service_role key** (not anon key!)
+5. Free: 500MB database
+### 1.6 Slack Bot
+1. Go to https://api.slack.com/apps → Create New App
+2. Name: "Lead Finder"
+3. OAuth & Permissions → Add scopes: `chat:write`, `commands`, `channels:read`
+4. Install to Workspace → copy Bot Token (`xoxb-...`)
+5. Basic Information → copy Signing Secret
+6. Create 2 channels: `#leads` and `#review`
+7. Get channel IDs: right-click channel → View details → copy ID
+### 1.7 Trigger.dev (Job Orchestration)
+1. Go to https://trigger.dev → Sign up
+2. Create project
+3. Dashboard → API Keys → copy
+4. Project ID from URL: `trigger.dev/orgs/.../projects/[PROJECT_ID]`
+5. Free: 50,000 runs/month
+## Step 2: Environment Setup
+```bash
+cp .env.example .env
+```
+Edit `.env` and fill in all keys from Step 1.
+## Step 3: Database Migration
+Option A — Supabase Dashboard:
+1. Open Supabase → SQL Editor
+2. Paste contents of `supabase/migrations/001_initial_schema.sql` → Run
+3. Paste contents of `supabase/migrations/002_phase1_enhancements.sql` → Run
+Option B — Supabase CLI:
+```bash
+npx supabase migration up
+```
+## Step 4: Install & Run
+```bash
+# Install Node.js dependencies
+npm install
+# Install Python dependencies
+cd src/profiling/python-service
+pip install -r requirements.txt
+cd ../../..
+# Terminal 1: Start Trigger.dev
+npm run trigger:dev
+# Terminal 2: Start Python AI service
+cd src/profiling/python-service
+python main.py
+```
+## Step 5: Verify
+The system runs automatically at 9 AM PKT daily. To test manually:
+- Use Slack `/discover` command
+- Or trigger from Trigger.dev dashboard
+## Troubleshooting
+| Issue | Solution |
+|-------|---------|
+| `NVIDIA_API_KEY` error | Check key starts with `nvapi-` |
+| MiniMax 429 rate limit | System auto-retries after wait |
+| Hunter returns empty | Free tier: 25/month limit reached |
+| SMTP verification fails | Some mail servers block port 25 |
+| Supabase connection error | Check `SUPABASE_URL` has `https://` |

package.json ADDED Viewed

	@@ -0,0 +1,33 @@

+{
+  "name": "ai-client-acquisition-system",
+  "version": "1.0.0",
+  "description": "Enterprise-grade AI Client Acquisition System — Quality-first lead pipeline",
+  "main": "dist/index.js",
+  "scripts": {
+    "build": "tsc",
+    "dev": "ts-node-dev --respawn --transpile-only src/index.ts",
+    "trigger:dev": "npx trigger.dev@latest dev",
+    "typecheck": "tsc --noEmit",
+    "lint": "eslint . --ext .ts"
+  },
+  "dependencies": {
+    "@supabase/supabase-js": "^2.43.0",
+    "@trigger.dev/sdk": "^3.0.0",
+    "playwright": "^1.44.0",
+    "zod": "^3.23.0",
+    "axios": "^1.7.0",
+    "dotenv": "^16.4.0",
+    "pino": "^9.2.0",
+    "pino-pretty": "^11.2.0",
+    "fastest-levenshtein": "^1.0.16",
+    "p-limit": "^5.0.0",
+    "p-retry": "^6.2.0"
+  },
+  "devDependencies": {
+    "@types/node": "^20.0.0",
+    "typescript": "^5.4.0",
+    "ts-node-dev": "^2.0.0",
+    "eslint": "^9.0.0",
+    "@typescript-eslint/parser": "^7.0.0"
+  }
+}

src/discovery/lib/contact-enricher.ts ADDED Viewed

	@@ -0,0 +1,354 @@

+/**
+ * Contact Enricher v2 — Full Pipeline
+ *
+ * Step 1: Find emails (Hunter.io + Pattern Generator + SMTP verify)
+ *         Snov.io REMOVED — replaced by FREE email pattern generation
+ * Step 2: Classify emails (Tier 1/2/3)
+ * Step 3: Verify emails (7-layer deep)
+ * Step 4: Find personal LinkedIn
+ * Step 5: Find social profiles
+ * Step 6: Filter for decision-makers only
+ *
+ * Output: Verified, classified contacts ready for Phase 2
+ */
+import { searchHunterContacts, type HunterContact } from "../providers/hunter";
+import { generateAndVerifyEmails, findEmailForPerson } from "./email-pattern-generator";
+import { classifyEmail, type ClassificationResult } from "./email-classifier";
+import { verifyEmailDeep, type VerificationResult } from "./email-verifier";
+import { findPersonalLinkedIn, type PersonalLinkedIn } from "./linkedin-person-finder";
+import { findSocialProfiles, type SocialProfiles } from "./social-finder";
+import { getSupabaseClient } from "../../shared/supabase/client";
+import { logger } from "../../shared/utils/logger";
+import { randomUUID } from "crypto";
+export interface EnrichedContact {
+  id: string;
+  companyId: string;
+  fullName: string;
+  title: string | null;
+  seniority: string | null;
+  // Email intelligence
+  email: string | null;
+  emailTier: string;                       // 'personal' | 'authority' | 'context_verified' | 'rejected'
+  emailVerification: VerificationResult | null;
+  emailClassification: ClassificationResult | null;
+  // LinkedIn (both company and personal)
+  linkedinPersonalUrl: string | null;
+  linkedinPersonalConfidence: number;
+  // Social
+  socialProfiles: SocialProfiles | null;
+  // Authority
+  authorityConfirmed: boolean;
+  authorityReason: string;
+  // Source tracking
+  source: "hunter" | "pattern" | "combined";
+  providerConfidence: number;
+}
+/**
+ * Full contact enrichment pipeline for a company.
+ */
+export async function enrichContacts(
+  companyId: string,
+  domain: string,
+  companyName: string,
+  employeeCount: number | null,
+  industry: string,
+  websiteSnippet: string,
+  websiteHtml: string,
+  companyLinkedInUrl: string | null,
+  traceId: string
+): Promise<EnrichedContact[]> {
+  logger.info({ domain, companyName }, "Starting contact enrichment pipeline");
+  // ── Step 1: Find emails from all providers ─────────────────
+  const rawContacts = await findAllContacts(domain);
+  if (rawContacts.length === 0) {
+    logger.info({ domain }, "No contacts found from any provider");
+    return [];
+  }
+  logger.info({ domain, found: rawContacts.length }, "Raw contacts from providers");
+  // ── Step 2-6: Process each contact ─────────────────────────
+  const enriched: EnrichedContact[] = [];
+  for (const raw of rawContacts) {
+    if (!raw.email) continue;
+    // Step 2: Classify email (Tier 1/2/3)
+    const classification = await classifyEmail(
+      raw.email,
+      { name: companyName, employeeCount, industry, websiteSnippet },
+      traceId
+    );
+    // Rejected by classifier → skip entirely
+    if (classification.verdict === "rejected") {
+      logger.debug({ email: raw.email, reason: classification.reason }, "Email rejected by classifier");
+      continue;
+    }
+    // Step 3: Deep verification (7 layers)
+    const verification = await verifyEmailDeep(
+      raw.email,
+      domain,
+      raw.confidence
+    );
+    // Hard invalid → skip
+    if (verification.status === "rejected_invalid") {
+      logger.debug({ email: raw.email }, "Email rejected by 7-layer verifier");
+      continue;
+    }
+    // Step 4: Find personal LinkedIn
+    let linkedin: PersonalLinkedIn | null = null;
+    if (raw.fullName && raw.fullName.length > 3) {
+      linkedin = await findPersonalLinkedIn(
+        raw.fullName,
+        companyName,
+        domain,
+        companyLinkedInUrl
+      );
+    }
+    // Step 5: Social profiles (once per company, not per contact)
+    // Social will be fetched separately at company level
+    // Step 6: Authority check
+    const { confirmed, reason } = checkAuthority(raw, classification);
+    const contact: EnrichedContact = {
+      id: randomUUID(),
+      companyId,
+      fullName: raw.fullName,
+      title: raw.title,
+      seniority: raw.seniority,
+      email: raw.email,
+      emailTier: classification.verdict,
+      emailVerification: verification,
+      emailClassification: classification,
+      linkedinPersonalUrl: linkedin?.url ?? null,
+      linkedinPersonalConfidence: linkedin?.confidence ?? 0,
+      socialProfiles: null,  // set at company level
+      authorityConfirmed: confirmed,
+      authorityReason: reason,
+      source: raw.source,
+      providerConfidence: raw.confidence,
+    };
+    enriched.push(contact);
+  }
+  // Sort: authority-confirmed first, then by verification confidence
+  enriched.sort((a, b) => {
+    if (a.authorityConfirmed !== b.authorityConfirmed) return a.authorityConfirmed ? -1 : 1;
+    return (b.emailVerification?.overallConfidence ?? 0) - (a.emailVerification?.overallConfidence ?? 0);
+  });
+  // Step 5: Social profiles for company (once)
+  if (enriched.length > 0) {
+    const social = await findSocialProfiles(domain, companyName, websiteHtml);
+    for (const c of enriched) {
+      c.socialProfiles = social;
+    }
+  }
+  logger.info({
+    domain,
+    rawFound: rawContacts.length,
+    afterClassification: enriched.length,
+    authorityConfirmed: enriched.filter(c => c.authorityConfirmed).length,
+    withLinkedIn: enriched.filter(c => c.linkedinPersonalUrl).length,
+  }, "Contact enrichment pipeline complete");
+  // Save to database
+  await saveContacts(enriched);
+  return enriched;
+}
+// ─── Find contacts from all providers ─────────────────────────
+// Strategy: Hunter.io (free 25/mo) for names+titles+emails
+//           Pattern Generator (FREE, unlimited) to find more emails
+//           Snov.io REMOVED — replaced by pattern generation
+interface RawContact {
+  fullName: string;
+  email: string;
+  title: string | null;
+  seniority: string | null;
+  confidence: number;
+  source: "hunter" | "pattern";
+}
+async function findAllContacts(domain: string): Promise<RawContact[]> {
+  const contacts: RawContact[] = [];
+  const seenEmails = new Set<string>();
+  const namesFromHunter: { firstName: string; lastName: string; title: string | null; seniority: string | null }[] = [];
+  // ── Source 1: Hunter.io (25 free/month) ─────────────────────
+  // Hunter gives us NAMES + TITLES + EMAILS
+  try {
+    const hunterResults = await searchHunterContacts(domain);
+    for (const h of hunterResults) {
+      const email = h.value?.toLowerCase();
+      const firstName = h.first_name ?? "";
+      const lastName = h.last_name ?? "";
+      const fullName = `${firstName} ${lastName}`.trim();
+      // Save name for pattern generation later
+      if (firstName && lastName) {
+        namesFromHunter.push({
+          firstName,
+          lastName,
+          title: h.position ?? null,
+          seniority: h.seniority ?? null,
+        });
+      }
+      if (email && !seenEmails.has(email)) {
+        seenEmails.add(email);
+        contacts.push({
+          fullName,
+          email,
+          title: h.position ?? null,
+          seniority: h.seniority ?? null,
+          confidence: h.confidence ?? 0,
+          source: "hunter",
+        });
+      }
+    }
+  } catch (err) {
+    logger.warn({ domain, err }, "Hunter search failed — falling back to pattern generation");
+  }
+  // ── Source 2: Pattern Generator (FREE, UNLIMITED) ──────────
+  // For names we got from Hunter that DON'T have emails,
+  // OR if Hunter returned no results at all
+  for (const person of namesFromHunter) {
+    // Check if we already have an email for this person
+    const hasEmail = contacts.some(c =>
+      c.fullName.toLowerCase().includes(person.firstName.toLowerCase()) &&
+      c.fullName.toLowerCase().includes(person.lastName.toLowerCase())
+    );
+    if (!hasEmail) {
+      // Generate email patterns and SMTP verify (FREE)
+      const generated = await findEmailForPerson(
+        `${person.firstName} ${person.lastName}`,
+        domain
+      );
+      if (generated && generated.smtpStatus === "deliverable" && !seenEmails.has(generated.email)) {
+        seenEmails.add(generated.email);
+        contacts.push({
+          fullName: `${person.firstName} ${person.lastName}`,
+          email: generated.email,
+          title: person.title,
+          seniority: person.seniority,
+          confidence: generated.confidence * 100,
+          source: "pattern",
+        });
+      }
+    }
+  }
+  // ── Source 3: If still no contacts, try common owner patterns ─
+  if (contacts.length === 0) {
+    // Try generic owner/manager patterns
+    const ownerPatterns = ["info", "contact", "hello", "admin"];
+    for (const prefix of ownerPatterns) {
+      const email = `${prefix}@${domain}`;
+      if (!seenEmails.has(email)) {
+        seenEmails.add(email);
+        contacts.push({
+          fullName: "Unknown",
+          email,
+          title: null,
+          seniority: null,
+          confidence: 20,
+          source: "pattern",
+        });
+      }
+    }
+  }
+  logger.info({
+    domain,
+    hunterContacts: contacts.filter(c => c.source === "hunter").length,
+    patternContacts: contacts.filter(c => c.source === "pattern").length,
+    total: contacts.length,
+  }, "Contact finding complete (Hunter + Pattern Generator)");
+  return contacts;
+}
+// ─── Authority check ─────────────────────────────────────────
+function checkAuthority(
+  contact: RawContact,
+  classification: ClassificationResult
+): { confirmed: boolean; reason: string } {
+  // Personal email with senior title → confirmed
+  const seniorTitles = /\b(ceo|cto|coo|cfo|cmo|founder|co-founder|owner|partner|director|vp|vice\s*president|president|head|principal|managing|general\s*manager)\b/i;
+  if (classification.verdict === "personal" && contact.title && seniorTitles.test(contact.title)) {
+    return { confirmed: true, reason: `Personal email + senior title: ${contact.title}` };
+  }
+  if (classification.verdict === "authority") {
+    return { confirmed: true, reason: `Authority email prefix: ${contact.email.split("@")[0]}` };
+  }
+  if (classification.verdict === "personal") {
+    return { confirmed: true, reason: "Personal email format — likely individual decision maker" };
+  }
+  if (classification.verdict === "context_verified" && classification.confidence >= 0.7) {
+    return { confirmed: true, reason: classification.reason };
+  }
+  if (classification.verdict === "outsourcing") {
+    return { confirmed: false, reason: "Outsourcing/vendor email — may reach procurement, not decision maker" };
+  }
+  return { confirmed: false, reason: "Authority not confirmed" };
+}
+// ─── Save to database ────────────────────────────────────────
+async function saveContacts(contacts: EnrichedContact[]): Promise<void> {
+  const db = getSupabaseClient();
+  for (const c of contacts) {
+    try {
+      await db.from("contacts").upsert({
+        id: c.id,
+        company_id: c.companyId,
+        full_name: c.fullName,
+        title: c.title,
+        seniority: c.seniority,
+        email: c.email,
+        email_verified: c.emailVerification?.status === "verified_deliverable",
+        email_tier: c.emailTier,
+        email_verification_layers: c.emailVerification?.layers ?? {},
+        linkedin_personal_url: c.linkedinPersonalUrl,
+        social_profiles: c.socialProfiles ?? {},
+        authority_confirmed: c.authorityConfirmed,
+        confidence: c.emailVerification?.overallConfidence ?? c.providerConfidence,
+        source: c.source,
+      }, { onConflict: "company_id,email" });
+    } catch (err) {
+      logger.warn({ email: c.email, err }, "Contact save failed — continuing");
+    }
+  }
+}

src/discovery/lib/deduplicator.ts ADDED Viewed

	@@ -0,0 +1,82 @@

+import { distance } from "fastest-levenshtein";
+import { getSupabaseClient } from "../../shared/supabase/client";
+import { logger } from "../../shared/utils/logger";
+/**
+ * Checks if a company already exists in Supabase.
+ * Uses exact domain match first, then fuzzy name match as fallback.
+ * Returns the existing company ID if duplicate, null if new.
+ */
+export async function isDuplicate(
+  domain: string,
+  name: string
+): Promise<{ isDupe: boolean; existingId?: string }> {
+  const db = getSupabaseClient();
+  // ── 1. Exact domain match (fastest) ─────────────────────────
+  const { data: byDomain } = await db
+    .from("companies")
+    .select("id, domain, name")
+    .eq("domain", normalizeDomain(domain))
+    .maybeSingle();
+  if (byDomain) {
+    logger.debug({ domain, existingId: byDomain.id }, "Duplicate: exact domain match");
+    return { isDupe: true, existingId: byDomain.id };
+  }
+  // ── 2. Fuzzy name match against recent records ────────────────
+  const { data: recent } = await db
+    .from("companies")
+    .select("id, name")
+    .order("discovered_at", { ascending: false })
+    .limit(500);
+  if (!recent) return { isDupe: false };
+  const normalizedInput = normalizeName(name);
+  for (const existing of recent) {
+    const normalizedExisting = normalizeName(existing.name);
+    const dist = distance(normalizedInput, normalizedExisting);
+    const maxLen = Math.max(normalizedInput.length, normalizedExisting.length);
+    const similarity = 1 - dist / maxLen;
+    if (similarity >= 0.88) {
+      logger.debug(
+        { input: name, existing: existing.name, similarity: similarity.toFixed(2) },
+        "Duplicate: fuzzy name match"
+      );
+      return { isDupe: true, existingId: existing.id };
+    }
+  }
+  return { isDupe: false };
+}
+/**
+ * Checks suppression list before any processing.
+ */
+export async function isSuppressed(domain: string): Promise<boolean> {
+  const db = getSupabaseClient();
+  const { data } = await db
+    .from("suppression_list")
+    .select("id")
+    .eq("domain", domain)
+    .maybeSingle();
+  return !!data;
+}
+// ─── Helpers ─────────────────────────────────────────────────
+function normalizeDomain(domain: string): string {
+  return domain.toLowerCase().replace(/^www\./, "").replace(/\/$/, "").trim();
+}
+function normalizeName(name: string): string {
+  return name
+    .toLowerCase()
+    .replace(/\b(inc|ltd|llc|corp|co|limited|plc|gmbh|pty|pvt|srl|bv|ag|sa)\b\.?/gi, "")
+    .replace(/[^a-z0-9\s]/g, "")
+    .trim();
+}

src/discovery/lib/email-classifier.ts ADDED Viewed

	@@ -0,0 +1,210 @@

+/**
+ * Email Classifier — 3-Tier Decision System
+ *
+ * Tier 1: Hard REJECT (noreply, support, jobs → instant discard)
+ * Tier 2: LLM Context Check (operations, admin, system → depends on company size/industry)
+ * Tier 3: High confidence KEEP (personal format, ceo@, partnerships@)
+ *
+ * Key insight: admin@ at a 5-person dental clinic reaches the owner.
+ * admin@ at a 500-person corp reaches an assistant. Context matters.
+ */
+import { callLLM } from "../../shared/llm/nvidia-client";
+import { SYSTEM_PROMPTS, buildEmailClassifyPrompt } from "../../shared/llm/prompts";
+import { MODELS } from "../../shared/llm/nvidia-client";
+import { logger } from "../../shared/utils/logger";
+export type EmailTier = "reject" | "context_check" | "keep";
+export type EmailVerdict = "personal" | "authority" | "context_verified" | "outsourcing" | "rejected";
+export interface ClassificationResult {
+  email: string;
+  tier: EmailTier;
+  verdict: EmailVerdict;
+  confidence: number;
+  likelyReaches: string;
+  reason: string;
+}
+// ─── Tier 1: ALWAYS REJECT ──────────────────────────────────
+const HARD_REJECT_PREFIXES = new Set([
+  // Automated / system
+  "noreply", "no-reply", "no_reply", "donotreply", "do-not-reply",
+  "notifications", "automated", "bounces", "mailer",
+  "postmaster", "unsubscribe", "spam", "abuse",
+  // Support (never reaches decision-maker)
+  "support", "helpdesk", "tickets", "complaints", "feedback",
+  // Jobs (irrelevant)
+  "jobs", "careers", "apply", "recruitment", "hiring", "talent",
+]);
+// ─── Tier 2: CONTEXT-DEPENDENT (LLM decides) ────────────────
+const CONTEXT_CHECK_PREFIXES = new Set([
+  "operations", "admin", "system", "info", "office",
+  "hello", "contact", "enquiries", "general", "team",
+  "accounts", "finance", "billing", "sales", "marketing",
+  "hr", "legal", "compliance", "reception", "manager",
+]);
+// ─── Tier 3: HIGH CONFIDENCE KEEP ───────────────────────────
+const AUTHORITY_PREFIXES = new Set([
+  "ceo", "owner", "founder", "president", "cto", "coo",
+  "partner", "principal", "director", "md", "gm", "head",
+]);
+const OUTSOURCING_PREFIXES = new Set([
+  "partnerships", "vendors", "procurement", "outsource",
+  "collaborate", "projects", "business", "growth",
+]);
+// ─── Personal email pattern (firstname, firstname.lastname) ─
+const PERSONAL_PATTERN = /^[a-z]{2,}(\.[a-z]{2,})?$/;
+const INITIAL_PATTERN = /^[a-z]\.[a-z]{2,}$/;  // j.smith
+/**
+ * Main classifier — determines if email is worth pursuing.
+ */
+export async function classifyEmail(
+  email: string,
+  companyContext: {
+    name: string;
+    employeeCount: number | null;
+    industry: string;
+    websiteSnippet: string;
+  },
+  traceId: string
+): Promise<ClassificationResult> {
+  const prefix = email.split("@")[0].toLowerCase().replace(/[^a-z]/g, "");
+  const fullPrefix = email.split("@")[0].toLowerCase();
+  // ── Tier 1: Hard reject ────────────────────────────────────
+  if (HARD_REJECT_PREFIXES.has(prefix)) {
+    return {
+      email,
+      tier: "reject",
+      verdict: "rejected",
+      confidence: 1.0,
+      likelyReaches: "automated inbox or department queue",
+      reason: `"${fullPrefix}@" is a known non-personal email type`,
+    };
+  }
+  // ── Tier 3: Personal format → instant keep ─────────────────
+  if (PERSONAL_PATTERN.test(fullPrefix) || INITIAL_PATTERN.test(fullPrefix)) {
+    return {
+      email,
+      tier: "keep",
+      verdict: "personal",
+      confidence: 0.95,
+      likelyReaches: "individual person (personal email format)",
+      reason: `"${fullPrefix}@" matches personal email pattern`,
+    };
+  }
+  // ── Tier 3: Authority prefix → instant keep ────────────────
+  if (AUTHORITY_PREFIXES.has(prefix)) {
+    return {
+      email,
+      tier: "keep",
+      verdict: "authority",
+      confidence: 0.90,
+      likelyReaches: `${prefix.toUpperCase()} or equivalent executive`,
+      reason: `"${fullPrefix}@" is a known decision-maker prefix`,
+    };
+  }
+  // ── Tier 3: Outsourcing signal → keep ──────────────────────
+  if (OUTSOURCING_PREFIXES.has(prefix)) {
+    return {
+      email,
+      tier: "keep",
+      verdict: "outsourcing",
+      confidence: 0.80,
+      likelyReaches: "vendor/partnership manager (purchasing authority likely)",
+      reason: `"${fullPrefix}@" signals company outsources services`,
+    };
+  }
+  // ── Tier 2: Context check needed → ask LLM ────────────────
+  if (CONTEXT_CHECK_PREFIXES.has(prefix)) {
+    return contextCheckWithLLM(email, companyContext, traceId);
+  }
+  // ── Unknown prefix → default to LLM context check ─────��───
+  return contextCheckWithLLM(email, companyContext, traceId);
+}
+/**
+ * LLM-powered context check for ambiguous email prefixes.
+ * Uses FAST model (8B) to save tokens — this is a simple classification.
+ */
+async function contextCheckWithLLM(
+  email: string,
+  context: {
+    name: string;
+    employeeCount: number | null;
+    industry: string;
+    websiteSnippet: string;
+  },
+  traceId: string
+): Promise<ClassificationResult> {
+  try {
+    const response = await callLLM({
+      operation: "email_classify",
+      model: MODELS.FAST,    // 8B model — fast + cheap for simple classification
+      systemPrompt: SYSTEM_PROMPTS.EMAIL_CLASSIFIER,
+      userPrompt: buildEmailClassifyPrompt({
+        email,
+        company_name: context.name,
+        company_size: context.employeeCount,
+        industry: context.industry,
+        website_snippet: context.websiteSnippet,
+      }),
+      temperature: 0.1,
+      maxTokens: 200,
+      jsonMode: true,
+      traceId,
+    });
+    if (response.parsed) {
+      const keep = response.parsed.keep === true;
+      const confidence = Number(response.parsed.confidence ?? 0.5);
+      return {
+        email,
+        tier: "context_check",
+        verdict: keep ? "context_verified" : "rejected",
+        confidence,
+        likelyReaches: String(response.parsed.likely_reaches ?? "unknown"),
+        reason: String(response.parsed.reason ?? "LLM context check"),
+      };
+    }
+    // LLM failed to respond properly → conservative: keep it, low confidence
+    return {
+      email,
+      tier: "context_check",
+      verdict: "context_verified",
+      confidence: 0.5,
+      likelyReaches: "unknown — LLM parse failed",
+      reason: "LLM context check failed — keeping with low confidence",
+    };
+  } catch (err) {
+    logger.warn({ email, err }, "Email LLM classify failed — keeping conservatively");
+    // Fallback: rule-based size heuristic
+    const isSmall = (context.employeeCount ?? 0) < 30;
+    return {
+      email,
+      tier: "context_check",
+      verdict: isSmall ? "context_verified" : "rejected",
+      confidence: 0.4,
+      likelyReaches: isSmall ? "likely owner/manager (small company)" : "likely department inbox (large company)",
+      reason: `Fallback: company size ${context.employeeCount ?? "unknown"} → ${isSmall ? "small=keep" : "large=reject"}`,
+    };
+  }
+}

src/discovery/lib/email-pattern-generator.ts ADDED Viewed

	@@ -0,0 +1,249 @@

+/**
+ * Email Pattern Generator — Snov.io Replacement (FREE, UNLIMITED)
+ *
+ * How it works:
+ *   1. Take a person's name: "John Smith"
+ *   2. Generate ALL common email patterns: john@, smith@, john.smith@, j.smith@, etc.
+ *   3. Verify each via SMTP handshake (Layer 5 in our verifier — FREE)
+ *   4. First one that passes SMTP = real email
+ *
+ * This is what tools like Hunter/Snov ACTUALLY do internally.
+ * We're cutting out the middleman.
+ *
+ * Cost: $0 forever
+ * Daily limit: unlimited
+ * Accuracy: Higher than Snov (we verify each guess ourselves)
+ */
+import { logger } from "../../shared/utils/logger";
+import dns from "dns/promises";
+import net from "net";
+export interface GeneratedEmail {
+  email: string;
+  pattern: string;          // "firstname.lastname", "firstinitial.lastname", etc.
+  smtpStatus: "deliverable" | "undeliverable" | "unknown";
+  confidence: number;       // 0.0 - 1.0
+}
+// ─── Common email patterns (ordered by frequency) ────────────
+// Source: Analysis of 1M+ business emails worldwide
+const PATTERNS = [
+  // Most common (70% of businesses)
+  { name: "firstname",           build: (f: string, l: string) => f },
+  { name: "firstname.lastname",  build: (f: string, l: string) => `${f}.${l}` },
+  { name: "firstinitial.lastname", build: (f: string, l: string) => `${f[0]}.${l}` },
+  { name: "firstinitial_lastname", build: (f: string, l: string) => `${f[0]}${l}` },
+  { name: "firstname_lastname",  build: (f: string, l: string) => `${f}_${l}` },
+  // Common (20% of businesses)
+  { name: "lastname.firstname",  build: (f: string, l: string) => `${l}.${f}` },
+  { name: "lastname",            build: (f: string, l: string) => l },
+  { name: "firstname_lastinitial", build: (f: string, l: string) => `${f}${l[0]}` },
+  { name: "firstinitial_lastinitial", build: (f: string, l: string) => `${f[0]}${l[0]}` },
+  // Less common but valid (10%)
+  { name: "firstname-lastname",  build: (f: string, l: string) => `${f}-${l}` },
+  { name: "first2_lastname",     build: (f: string, l: string) => `${f.slice(0, 2)}${l}` },
+];
+/**
+ * Generate and verify email patterns for a person at a domain.
+ *
+ * @param firstName  Person's first name (e.g., "John")
+ * @param lastName   Person's last name (e.g., "Smith")
+ * @param domain     Company domain (e.g., "abcdental.com")
+ * @returns          List of generated emails with verification status
+ */
+export async function generateAndVerifyEmails(
+  firstName: string,
+  lastName: string,
+  domain: string
+): Promise<GeneratedEmail[]> {
+  if (!firstName || !lastName || !domain) return [];
+  const f = firstName.toLowerCase().replace(/[^a-z]/g, "");
+  const l = lastName.toLowerCase().replace(/[^a-z]/g, "");
+  if (f.length < 2 || l.length < 1) return [];
+  // Step 1: Check if domain has valid MX records
+  const hasMX = await checkMXRecord(domain);
+  if (!hasMX) {
+    logger.debug({ domain }, "No MX records — skipping pattern generation");
+    return [];
+  }
+  // Step 2: Check if domain is catch-all (accepts everything)
+  const isCatchAll = await checkCatchAll(domain);
+  // Step 3: Generate all pattern emails
+  const candidates = PATTERNS.map(p => ({
+    email: `${p.build(f, l)}@${domain}`,
+    pattern: p.name,
+    smtpStatus: "unknown" as const,
+    confidence: 0,
+  }));
+  // Step 4: If catch-all → we can't SMTP verify, return with medium confidence
+  if (isCatchAll) {
+    logger.debug({ domain }, "Catch-all domain — returning top patterns without SMTP");
+    return candidates.slice(0, 3).map(c => ({
+      ...c,
+      smtpStatus: "unknown" as const,
+      confidence: 0.5, // can't verify, medium confidence
+    }));
+  }
+  // Step 5: SMTP verify each (stop after first deliverable)
+  const results: GeneratedEmail[] = [];
+  let foundDeliverable = false;
+  for (const candidate of candidates) {
+    if (foundDeliverable) break; // Got one — no need to check rest
+    const smtpResult = await smtpVerify(candidate.email, domain);
+    const result: GeneratedEmail = {
+      ...candidate,
+      smtpStatus: smtpResult.deliverable ? "deliverable" : "undeliverable",
+      confidence: smtpResult.deliverable ? 0.92 : 0.1,
+    };
+    if (smtpResult.deliverable) {
+      foundDeliverable = true;
+      results.unshift(result); // deliverable goes first
+    } else {
+      results.push(result);
+    }
+  }
+  const deliverable = results.filter(r => r.smtpStatus === "deliverable");
+  logger.info({ domain, generated: candidates.length, deliverable: deliverable.length }, "Pattern generation complete");
+  return results;
+}
+/**
+ * Quick function for when we already have a name from Hunter.
+ * Just verify their existing email or find a new one.
+ */
+export async function findEmailForPerson(
+  fullName: string,
+  domain: string
+): Promise<GeneratedEmail | null> {
+  const parts = fullName.trim().split(/\s+/);
+  if (parts.length < 2) return null;
+  const firstName = parts[0];
+  const lastName = parts[parts.length - 1];
+  const results = await generateAndVerifyEmails(firstName, lastName, domain);
+  return results.find(r => r.smtpStatus === "deliverable") ?? results[0] ?? null;
+}
+// ─── MX Record Check (FREE) ─────────────────────────────────
+async function checkMXRecord(domain: string): Promise<boolean> {
+  try {
+    const records = await dns.resolveMx(domain);
+    return records.length > 0;
+  } catch {
+    return false;
+  }
+}
+// ─── Catch-all Detection (FREE — uses random probe) ─────────
+async function checkCatchAll(domain: string): Promise<boolean> {
+  // Send SMTP probe with obviously fake email
+  const fakeEmail = `xq7z9k2m4n${Date.now()}@${domain}`;
+  const result = await smtpVerify(fakeEmail, domain);
+  // If fake email is "deliverable" → catch-all
+  return result.deliverable;
+}
+// ─── SMTP Verification (FREE, UNLIMITED) ─────────────────────
+// Direct SMTP handshake — no third-party API needed
+async function smtpVerify(
+  email: string,
+  domain: string
+): Promise<{ deliverable: boolean; response: string }> {
+  return new Promise(async (resolve) => {
+    const timeout = setTimeout(() => {
+      resolve({ deliverable: false, response: "timeout" });
+    }, 8_000);
+    try {
+      // Get MX server
+      const mxRecords = await dns.resolveMx(domain);
+      if (mxRecords.length === 0) {
+        clearTimeout(timeout);
+        resolve({ deliverable: false, response: "no_mx" });
+        return;
+      }
+      // Sort by priority (lowest = highest priority)
+      mxRecords.sort((a, b) => a.priority - b.priority);
+      const mxHost = mxRecords[0].exchange;
+      // Connect to SMTP
+      const socket = new net.Socket();
+      let step = 0;
+      let lastResponse = "";
+      socket.setTimeout(7_000);
+      socket.on("timeout", () => {
+        socket.destroy();
+        clearTimeout(timeout);
+        resolve({ deliverable: false, response: "socket_timeout" });
+      });
+      socket.on("error", () => {
+        clearTimeout(timeout);
+        resolve({ deliverable: false, response: "connection_error" });
+      });
+      socket.on("data", (data) => {
+        const response = data.toString();
+        lastResponse = response;
+        if (step === 0 && response.startsWith("220")) {
+          // Server greeting → send EHLO
+          socket.write("EHLO verify.local\r\n");
+          step = 1;
+        } else if (step === 1 && response.startsWith("250")) {
+          // EHLO accepted → send MAIL FROM
+          socket.write("MAIL FROM:<verify@verify.local>\r\n");
+          step = 2;
+        } else if (step === 2 && response.startsWith("250")) {
+          // MAIL FROM accepted → send RCPT TO (the real check)
+          socket.write(`RCPT TO:<${email}>\r\n`);
+          step = 3;
+        } else if (step === 3) {
+          socket.write("QUIT\r\n");
+          socket.destroy();
+          clearTimeout(timeout);
+          if (response.startsWith("250")) {
+            // 250 = email exists and is deliverable
+            resolve({ deliverable: true, response: "250_accepted" });
+          } else if (response.startsWith("550") || response.startsWith("551") || response.startsWith("553")) {
+            // 550 = user doesn't exist
+            resolve({ deliverable: false, response: response.trim().slice(0, 100) });
+          } else {
+            // Other codes (452 = mailbox full, 421 = try later, etc.)
+            resolve({ deliverable: false, response: response.trim().slice(0, 100) });
+          }
+        }
+      });
+      socket.connect(25, mxHost);
+    } catch (err) {
+      clearTimeout(timeout);
+      resolve({ deliverable: false, response: String(err).slice(0, 100) });
+    }
+  });
+}

src/discovery/lib/email-verifier.ts ADDED Viewed

	@@ -0,0 +1,338 @@

+/**
+ * 7-Layer Email Verification
+ *
+ * Layer 1: RFC 5322 format check (instant, free)
+ * Layer 2: Domain ownership — email domain = company domain (instant, free)
+ * Layer 3: MX record lookup (free, DNS)
+ * Layer 4: Catch-all detection (Reoon API)
+ * Layer 5: SMTP handshake — ask mail server "does this user exist?" (free, direct)
+ * Layer 6: Disposable email check (free, local list)
+ * Layer 7: Provider confidence score (Hunter/Snov score)
+ *
+ * Each layer produces a boolean. Final status is computed from all 7.
+ */
+import dns from "dns/promises";
+import net from "net";
+import axios from "axios";
+import { getEnv } from "../../shared/config/env";
+import { logger } from "../../shared/utils/logger";
+export type EmailStatus =
+  | "verified_deliverable"     // all layers pass
+  | "verified_catch_all"       // valid but catch-all domain
+  | "pattern_smtp_confirmed"   // pattern-generated + SMTP confirmed
+  | "uncertain"                // some layers pass, some unknown
+  | "rejected_invalid";        // hard failure
+export interface VerificationResult {
+  email: string;
+  status: EmailStatus;
+  layers: {
+    format: boolean;
+    domainMatch: boolean;
+    mxRecord: boolean;
+    catchAll: boolean | null;   // null = couldn't determine
+    smtpHandshake: boolean | null;
+    disposable: boolean;        // true = IS disposable (bad)
+    providerConfidence: number; // 0-100 from Hunter/Snov
+  };
+  overallConfidence: number;    // 0-100 computed from layers
+}
+/**
+ * Run all 7 verification layers on an email.
+ */
+export async function verifyEmailDeep(
+  email: string,
+  companyDomain: string,
+  providerConfidence: number = 0
+): Promise<VerificationResult> {
+  const layers = {
+    format: false,
+    domainMatch: false,
+    mxRecord: false,
+    catchAll: null as boolean | null,
+    smtpHandshake: null as boolean | null,
+    disposable: false,
+    providerConfidence,
+  };
+  const emailDomain = email.split("@")[1]?.toLowerCase();
+  if (!emailDomain) {
+    return makeResult(email, "rejected_invalid", layers, 0);
+  }
+  // ── Layer 1: Format check ──────────────────────────────────
+  layers.format = isValidFormat(email);
+  if (!layers.format) {
+    return makeResult(email, "rejected_invalid", layers, 0);
+  }
+  // ── Layer 2: Domain ownership ──────────────────────────────
+  layers.domainMatch = isDomainMatch(emailDomain, companyDomain);
+  if (!layers.domainMatch) {
+    logger.warn({ email, emailDomain, companyDomain }, "Domain mismatch — rejecting");
+    return makeResult(email, "rejected_invalid", layers, 0);
+  }
+  // ── Layer 3: MX record ────────────────────────────────────
+  layers.mxRecord = await hasMxRecord(emailDomain);
+  if (!layers.mxRecord) {
+    return makeResult(email, "rejected_invalid", layers, 5);
+  }
+  // ── Layer 4: Catch-all detection (Reoon) ───────────────────
+  layers.catchAll = await checkCatchAll(emailDomain);
+  // ── Layer 5: SMTP handshake ─────────────────────────────────
+  layers.smtpHandshake = await smtpHandshake(email, emailDomain);
+  // ── Layer 6: Disposable check ──────────────────────────────
+  layers.disposable = isDisposable(emailDomain);
+  if (layers.disposable) {
+    return makeResult(email, "rejected_invalid", layers, 0);
+  }
+  // ── Layer 7: Provider confidence ──────────────────────────
+  // Already set from Hunter/Snov response
+  // ── Compute final status ───────────────────────────────────
+  return computeFinalStatus(email, layers);
+}
+// ─── Layer 1: RFC 5322 Format ────────────────────────────────
+function isValidFormat(email: string): boolean {
+  // Strict-ish RFC 5322 check
+  const pattern = /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;
+  if (!pattern.test(email)) return false;
+  if (email.length > 254) return false;
+  const local = email.split("@")[0];
+  if (local.length > 64) return false;
+  if (local.startsWith(".") || local.endsWith(".")) return false;
+  if (local.includes("..")) return false;
+  return true;
+}
+// ─── Layer 2: Domain Match ──────────────────────────────────
+function isDomainMatch(emailDomain: string, companyDomain: string): boolean {
+  const normalize = (d: string) => d.toLowerCase().replace(/^www\./, "").trim();
+  const eDomain = normalize(emailDomain);
+  const cDomain = normalize(companyDomain);
+  // Exact match
+  if (eDomain === cDomain) return true;
+  // Subdomain match (e.g., mail.company.com → company.com)
+  if (eDomain.endsWith(`.${cDomain}`)) return true;
+  // Common email domain variants (company uses Google Workspace etc.)
+  // This is fine — john@company.com matches company.com
+  return false;
+}
+// ─── Layer 3: MX Record ─────────────────────────────────────
+async function hasMxRecord(domain: string): Promise<boolean> {
+  try {
+    const records = await dns.resolveMx(domain);
+    return records.length > 0;
+  } catch {
+    return false;
+  }
+}
+// ─── Layer 4: Catch-All Detection (CREDIT-OPTIMIZED) ────────
+// Strategy: Try FREE SMTP probe first → only use Reoon if SMTP can't determine
+// This saves Reoon credits (only 20/day) for when they're truly needed
+let _reoonUsedToday = 0;
+let _reoonResetDate = new Date().toDateString();
+const REOON_DAILY_LIMIT = 18; // keep 2 credits as buffer
+async function checkCatchAll(domain: string): Promise<boolean | null> {
+  // ── Attempt 1: FREE SMTP catch-all probe ───────────────────
+  // Send RCPT TO with a random gibberish address.
+  // If server accepts it → catch-all. If 550 → NOT catch-all.
+  try {
+    const fakeEmail = `xqz7k2m4n_test_${Date.now() % 10000}@${domain}`;
+    const smtpResult = await smtpHandshake(fakeEmail, domain);
+    if (smtpResult === true) {
+      // Server accepted gibberish email → CATCH-ALL
+      logger.debug({ domain }, "Catch-all detected via FREE SMTP probe (Reoon credit saved)");
+      return true;
+    }
+    if (smtpResult === false) {
+      // Server rejected gibberish email → NOT catch-all
+      logger.debug({ domain }, "NOT catch-all — confirmed via FREE SMTP probe");
+      return false;
+    }
+    // smtpResult === null → SMTP couldn't determine, fall through to Reoon
+  } catch {
+    // SMTP probe failed, fall through to Reoon
+  }
+  // ── Attempt 2: Reoon API (only if SMTP couldn't determine) ─
+  // Reset counter if new day
+  const today = new Date().toDateString();
+  if (_reoonResetDate !== today) {
+    _reoonUsedToday = 0;
+    _reoonResetDate = today;
+  }
+  // Check budget
+  if (_reoonUsedToday >= REOON_DAILY_LIMIT) {
+    logger.warn({ domain, used: _reoonUsedToday }, "Reoon daily limit reached — skipping");
+    return null;
+  }
+  try {
+    const env = getEnv();
+    _reoonUsedToday++;
+    const response = await axios.get("https://emailverifier.reoon.com/api/v1/verify", {
+      params: {
+        email: `definitely_not_real_${Date.now()}@${domain}`,
+        key: env.REOON_API_KEY,
+        mode: "quick",
+      },
+      timeout: 8_000,
+    });
+    logger.debug({ domain, reoonUsed: _reoonUsedToday }, "Reoon credit used for catch-all check");
+    return response.data?.status === "valid";
+  } catch {
+    return null;
+  }
+}
+// ─── Layer 5: SMTP Handshake ────────────────────────────────
+async function smtpHandshake(email: string, domain: string): Promise<boolean | null> {
+  try {
+    // Resolve MX to get mail server
+    const mxRecords = await dns.resolveMx(domain);
+    if (!mxRecords.length) return null;
+    // Pick highest priority (lowest number)
+    const mailServer = mxRecords.sort((a, b) => a.priority - b.priority)[0].exchange;
+    return new Promise((resolve) => {
+      const socket = new net.Socket();
+      let step = 0;
+      let result = false;
+      const timeout = setTimeout(() => {
+        socket.destroy();
+        resolve(null);
+      }, 10_000);
+      socket.connect(25, mailServer, () => {
+        // Connected to mail server
+      });
+      socket.on("data", (data) => {
+        const response = data.toString();
+        if (step === 0 && response.startsWith("220")) {
+          // Server greeting → send EHLO
+          socket.write("EHLO verify.local\r\n");
+          step = 1;
+        } else if (step === 1 && response.startsWith("250")) {
+          // EHLO accepted → send MAIL FROM
+          socket.write("MAIL FROM:<verify@verify.local>\r\n");
+          step = 2;
+        } else if (step === 2 && response.startsWith("250")) {
+          // MAIL FROM accepted → send RCPT TO (the actual check)
+          socket.write(`RCPT TO:<${email}>\r\n`);
+          step = 3;
+        } else if (step === 3) {
+          if (response.startsWith("250")) {
+            result = true;  // 250 = user exists!
+          } else if (response.startsWith("550") || response.startsWith("553")) {
+            result = false; // 550 = user doesn't exist
+          }
+          // Cleanup
+          socket.write("QUIT\r\n");
+          clearTimeout(timeout);
+          socket.destroy();
+          resolve(result);
+        }
+      });
+      socket.on("error", () => {
+        clearTimeout(timeout);
+        resolve(null); // can't determine
+      });
+    });
+  } catch {
+    return null; // can't determine
+  }
+}
+// ─── Layer 6: Disposable Email ──────────────────────────────
+const DISPOSABLE_DOMAINS = new Set([
+  "mailinator.com", "tempmail.com", "throwaway.email", "guerrillamail.com",
+  "guerrillamail.info", "yopmail.com", "trashmail.com", "maildrop.cc",
+  "10minutemail.com", "temp-mail.org", "fakeinbox.com", "sharklasers.com",
+  "guerrillamail.net", "grr.la", "dispostable.com", "tempr.email",
+  "mohmal.com", "burpcollaborator.net", "mailnesia.com",
+]);
+function isDisposable(domain: string): boolean {
+  return DISPOSABLE_DOMAINS.has(domain.toLowerCase());
+}
+// ─── Final Status Computation ────────────────────────────────
+function computeFinalStatus(
+  email: string,
+  layers: VerificationResult["layers"]
+): VerificationResult {
+  // All layers pass (including SMTP)
+  if (layers.format && layers.domainMatch && layers.mxRecord &&
+      layers.smtpHandshake === true && !layers.disposable && !layers.catchAll) {
+    const confidence = Math.min(
+      95,
+      60 + (layers.providerConfidence > 0 ? Math.round(layers.providerConfidence * 0.35) : 15)
+    );
+    return makeResult(email, "verified_deliverable", layers, confidence);
+  }
+  // Catch-all domain — uncertain but not invalid
+  if (layers.catchAll === true && layers.mxRecord) {
+    return makeResult(email, "verified_catch_all", layers, 45);
+  }
+  // SMTP confirmed but no provider data
+  if (layers.smtpHandshake === true && layers.providerConfidence === 0) {
+    return makeResult(email, "pattern_smtp_confirmed", layers, 70);
+  }
+  // MX exists, provider says good, SMTP unknown
+  if (layers.mxRecord && layers.providerConfidence >= 70 && layers.smtpHandshake === null) {
+    return makeResult(email, "verified_deliverable", layers, layers.providerConfidence);
+  }
+  // MX exists but everything else uncertain
+  if (layers.mxRecord && !layers.disposable) {
+    return makeResult(email, "uncertain", layers, 30);
+  }
+  return makeResult(email, "rejected_invalid", layers, 0);
+}
+function makeResult(
+  email: string,
+  status: EmailStatus,
+  layers: VerificationResult["layers"],
+  overallConfidence: number
+): VerificationResult {
+  return { email, status, layers, overallConfidence };
+}

src/discovery/lib/icp-filter.ts ADDED Viewed

	@@ -0,0 +1,133 @@

+import { getSupabaseClient } from "../../shared/supabase/client";
+import { IcpConfig } from "../../shared/supabase/schema";
+import { ScrapedCompany } from "./web-scraper";
+import { logger } from "../../shared/utils/logger";
+export interface FilterResult {
+  passed: boolean;
+  failReasons: string[];
+  passedSignals: string[];
+  signalScore: number; // 0-4 — how many growth signals detected
+}
+/**
+ * Loads the active ICP config from Supabase.
+ */
+export async function loadIcpConfig(): Promise<IcpConfig> {
+  const db = getSupabaseClient();
+  const { data, error } = await db
+    .from("icp_config")
+    .select("*")
+    .eq("is_active", true)
+    .single();
+  if (error || !data) {
+    logger.error({ error }, "Failed to load ICP config — using defaults");
+    return DEFAULT_ICP;
+  }
+  return data as IcpConfig;
+}
+/**
+ * FILTER GATE 1 — Hard rules only.
+ * Returns immediately on first failure for efficiency.
+ */
+export function applyHardFilters(
+  company: ScrapedCompany,
+  icp: IcpConfig,
+  region: string
+): FilterResult {
+  const fail: string[] = [];
+  const pass: string[] = [];
+  // ── Employee count ───────────────────────────────────────────
+  if (company.employeeCount !== null && company.employeeCount < icp.min_employees) {
+    fail.push(`employees_too_few:${company.employeeCount}`);
+  } else {
+    pass.push("employee_count_ok");
+  }
+  // ── Industry check ───────────────────────────────────────────
+  const industryLower = (company.industry ?? "").toLowerCase();
+  const inExcluded = icp.exclude_industries.some((ex) => industryLower.includes(ex));
+  if (inExcluded) {
+    fail.push(`excluded_industry:${company.industry}`);
+  } else {
+    pass.push("industry_ok");
+  }
+  // ── Website exists ───────────────────────────────────────────
+  if (!company.domain || company.websiteText.length < 100) {
+    fail.push("no_valid_website");
+  } else {
+    pass.push("website_ok");
+  }
+  return {
+    passed: fail.length === 0,
+    failReasons: fail,
+    passedSignals: pass,
+    signalScore: 0,
+  };
+}
+/**
+ * FILTER GATE 2 — Growth & AI signal check.
+ * Company needs ≥ 2 positive signals to proceed.
+ */
+export function applySignalFilters(
+  company: ScrapedCompany,
+  icp: IcpConfig
+): FilterResult {
+  const pass: string[] = [];
+  const fail: string[] = [];
+  // ── AI-related job postings ──────────────────────────────────
+  const aiJobs = company.jobPostings.filter((j) => j.hasAiSignal);
+  if (aiJobs.length > 0) pass.push(`ai_job_postings:${aiJobs.length}`);
+  // ── Tech stack signals ───────────────────────────────────────
+  const stackSignals = company.techStack.filter((t) =>
+    icp.tech_signals.includes(t.toLowerCase())
+  );
+  if (stackSignals.length > 0) pass.push(`tech_stack:${stackSignals.join(",")}`);
+  // ── ICP keywords in website text ────────────────────────────
+  const textLower = company.websiteText.toLowerCase();
+  const kwHits = icp.keywords.filter((kw) => textLower.includes(kw.toLowerCase()));
+  if (kwHits.length >= 2) pass.push(`keyword_hits:${kwHits.join(",")}`);
+  // ── Active job hiring (general) ──────────────────────────────
+  if (company.jobPostings.length >= 3) pass.push(`active_hiring:${company.jobPostings.length}`);
+  const signalScore = pass.length;
+  if (signalScore < 2) {
+    fail.push(`insufficient_signals:${signalScore}`);
+    logger.debug({ domain: company.domain, signalScore }, "Gate 2 failed: low signals");
+  }
+  return {
+    passed: fail.length === 0,
+    failReasons: fail,
+    passedSignals: pass,
+    signalScore,
+  };
+}
+// ─── Default ICP (if DB read fails) ─────────────────────────
+const DEFAULT_ICP: IcpConfig = {
+  id: "default",
+  name: "default",
+  min_employees: 50,
+  industries: ["technology", "manufacturing", "logistics", "healthcare", "finance"],
+  exclude_industries: ["government", "non-profit", "education"],
+  geographies: ["US", "UK", "AU", "UAE", "SA"],
+  keywords: ["automation", "digital transformation", "AI", "operations"],
+  tech_signals: ["salesforce", "hubspot", "sap", "legacy_erp"],
+  score_threshold: 70,
+  is_active: true,
+  created_at: new Date().toISOString(),
+  updated_at: new Date().toISOString(),
+};

src/discovery/lib/linkedin-person-finder.ts ADDED Viewed

	@@ -0,0 +1,205 @@

+/**
+ * Personal LinkedIn Finder
+ *
+ * Finds linkedin.com/in/person-name (personal profile)
+ * NOT linkedin.com/company/ (company page — already have that)
+ *
+ * Methods in priority order:
+ * 1. Google search: "name" "company" site:linkedin.com/in
+ * 2. Company's LinkedIn people page scrape
+ * 3. Hunter.io linkedin_url field (sometimes returned)
+ *
+ * MANDATORY — every qualified lead must have a LinkedIn attempt.
+ */
+import { searchCompanies, SerperResult } from "../providers/serper";
+import { serperLimiter } from "../../shared/utils/rate-limiter";
+import { logger } from "../../shared/utils/logger";
+import axios from "axios";
+import { getEnv } from "../../shared/config/env";
+export interface PersonalLinkedIn {
+  url: string;              // linkedin.com/in/john-smith-abc123
+  confidence: number;       // how sure we are this is the right person
+  source: "google_search" | "company_people_page" | "hunter_field";
+  verified: boolean;        // URL format is valid and accessible
+}
+/**
+ * Find personal LinkedIn profile for a decision maker.
+ * Tries multiple methods. Returns null if all fail (not an error — just LinkedIn-not-found).
+ */
+export async function findPersonalLinkedIn(
+  fullName: string,
+  companyName: string,
+  companyDomain: string,
+  companyLinkedInUrl: string | null
+): Promise<PersonalLinkedIn | null> {
+  // Method 1: Google search (highest accuracy)
+  const googleResult = await searchViaGoogle(fullName, companyName);
+  if (googleResult) return googleResult;
+  // Method 2: From company LinkedIn people page (already scraped)
+  if (companyLinkedInUrl) {
+    const peopleResult = await searchViaPeoplePage(fullName, companyLinkedInUrl);
+    if (peopleResult) return peopleResult;
+  }
+  logger.info({ fullName, companyName }, "LinkedIn personal not found — all methods tried");
+  return null;
+}
+// ─── Method 1: Google Search ─────────────────────────────────
+async function searchViaGoogle(
+  fullName: string,
+  companyName: string
+): Promise<PersonalLinkedIn | null> {
+  try {
+    await serperLimiter.consume("serper");
+    const env = getEnv();
+    const query = `"${fullName}" "${companyName}" site:linkedin.com/in`;
+    const response = await axios.post(
+      "https://google.serper.dev/search",
+      { q: query, num: 5 },
+      {
+        headers: {
+          "X-API-KEY": env.SERPER_API_KEY,
+          "Content-Type": "application/json",
+        },
+        timeout: 8_000,
+      }
+    );
+    const organic = response.data?.organic ?? [];
+    for (const result of organic) {
+      const url = result.link;
+      if (!isLinkedInPersonalUrl(url)) continue;
+      // Verify the result mentions both name and company
+      const snippet = (result.snippet ?? "").toLowerCase();
+      const title = (result.title ?? "").toLowerCase();
+      const combined = `${snippet} ${title}`;
+      const nameParts = fullName.toLowerCase().split(/\s+/);
+      const hasName = nameParts.some(part => part.length > 2 && combined.includes(part));
+      const hasCompany = companyName.toLowerCase().split(/\s+/).some(
+        part => part.length > 3 && combined.includes(part)
+      );
+      if (hasName) {
+        return {
+          url: cleanLinkedInUrl(url),
+          confidence: hasCompany ? 0.92 : 0.70,
+          source: "google_search",
+          verified: true,
+        };
+      }
+    }
+    return null;
+  } catch (err) {
+    logger.warn({ fullName, err }, "Google LinkedIn search failed");
+    return null;
+  }
+}
+// ─── Method 2: Company People Page ──────────────────────────
+async function searchViaPeoplePage(
+  fullName: string,
+  companyLinkedInUrl: string
+): Promise<PersonalLinkedIn | null> {
+  try {
+    await serperLimiter.consume("serper");
+    const env = getEnv();
+    // Search Google for the person's name on the company's LinkedIn
+    const companySlug = companyLinkedInUrl.match(/company\/([^/?]+)/)?.[1];
+    if (!companySlug) return null;
+    const query = `"${fullName}" site:linkedin.com/in ${companySlug}`;
+    const response = await axios.post(
+      "https://google.serper.dev/search",
+      { q: query, num: 3 },
+      {
+        headers: {
+          "X-API-KEY": env.SERPER_API_KEY,
+          "Content-Type": "application/json",
+        },
+        timeout: 8_000,
+      }
+    );
+    const organic = response.data?.organic ?? [];
+    for (const result of organic) {
+      if (isLinkedInPersonalUrl(result.link)) {
+        return {
+          url: cleanLinkedInUrl(result.link),
+          confidence: 0.75,
+          source: "company_people_page",
+          verified: true,
+        };
+      }
+    }
+    return null;
+  } catch {
+    return null;
+  }
+}
+// ─── Helpers ─────────────────────────────────────────────���───
+function isLinkedInPersonalUrl(url: string): boolean {
+  // Must be linkedin.com/in/ (personal) not /company/ or /jobs/
+  return /linkedin\.com\/in\/[a-zA-Z0-9\-]+/.test(url);
+}
+function cleanLinkedInUrl(url: string): string {
+  // Remove query params and fragments, normalize
+  const match = url.match(/(https?:\/\/(?:www\.)?linkedin\.com\/in\/[a-zA-Z0-9\-]+)/);
+  return match ? match[1] : url;
+}
+/**
+ * Batch find LinkedIn profiles for multiple decision makers.
+ * Stops after 5 to conserve API calls.
+ */
+export async function batchFindLinkedIn(
+  people: { fullName: string; title: string }[],
+  companyName: string,
+  companyDomain: string,
+  companyLinkedInUrl: string | null
+): Promise<Map<string, PersonalLinkedIn>> {
+  const results = new Map<string, PersonalLinkedIn>();
+  const maxLookups = Math.min(people.length, 5);
+  for (let i = 0; i < maxLookups; i++) {
+    const person = people[i];
+    const result = await findPersonalLinkedIn(
+      person.fullName,
+      companyName,
+      companyDomain,
+      companyLinkedInUrl
+    );
+    if (result) {
+      results.set(person.fullName, result);
+    }
+    // Small delay between searches to be polite
+    await new Promise(r => setTimeout(r, 1500));
+  }
+  logger.info({ company: companyName, found: results.size, attempted: maxLookups },
+    "LinkedIn personal batch complete"
+  );
+  return results;
+}

src/discovery/lib/linkedin-scraper.ts ADDED Viewed

	@@ -0,0 +1,165 @@

+import { chromium, Browser, BrowserContext } from "playwright";
+import { playwrightLimiter } from "../../shared/utils/rate-limiter";
+import { logger } from "../../shared/utils/logger";
+export interface LinkedInCompanyData {
+  name: string | null;
+  description: string | null;
+  employeeCount: number | null;
+  employeeRange: string | null;
+  industry: string | null;
+  headquarters: string | null;
+  website: string | null;
+  recentPosts: string[];
+  decisionMakers: LinkedInPerson[];
+}
+export interface LinkedInPerson {
+  fullName: string;
+  title: string;
+  linkedinUrl: string;
+  isDecisionMaker: boolean;
+}
+const DECISION_MAKER_TITLES = [
+  "ceo", "chief executive", "founder", "co-founder", "cofounder",
+  "cto", "chief technology", "coo", "chief operating",
+  "vp", "vice president", "director", "head of",
+  "managing director", "general manager", "president",
+];
+/**
+ * Scrapes LinkedIn public company page.
+ * Only reads publicly visible data — no login, no TOS violation.
+ */
+export async function scrapeLinkedInCompany(
+  linkedinUrl: string
+): Promise<LinkedInCompanyData> {
+  await playwrightLimiter.consume("linkedin");
+  const browser = await chromium.launch({ headless: true, args: ["--no-sandbox"] });
+  const context = await browser.newContext({
+    userAgent:
+      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 " +
+      "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
+    locale: "en-US",
+  });
+  const result: LinkedInCompanyData = {
+    name: null,
+    description: null,
+    employeeCount: null,
+    employeeRange: null,
+    industry: null,
+    headquarters: null,
+    website: null,
+    recentPosts: [],
+    decisionMakers: [],
+  };
+  const page = await context.newPage();
+  try {
+    // ── Company About Page ────────────────────────────────────
+    const aboutUrl = linkedinUrl.replace(/\/$/, "") + "/about/";
+    await page.goto(aboutUrl, { waitUntil: "domcontentloaded", timeout: 20_000 });
+    // Add small delay to let JS render
+    await page.waitForTimeout(2000);
+    const pageText = await page.evaluate(() => document.body.innerText);
+    // Extract employee count
+    const empMatch = pageText.match(/(\d[\d,]+)\s*(?:followers|employees)/i);
+    if (empMatch) {
+      result.employeeCount = parseInt(empMatch[1].replace(/,/g, ""), 10);
+    }
+    // Extract range if count not found
+    const rangeMatch = pageText.match(/(\d+[\d,]*)\s*[-–]\s*(\d+[\d,]*)\s*employees/i);
+    if (rangeMatch) {
+      result.employeeRange = `${rangeMatch[1]}-${rangeMatch[2]}`;
+    }
+    // Extract company name from og:title
+    result.name = await page
+      .$eval('meta[property="og:title"]', (el) => el.getAttribute("content"))
+      .catch(() => null);
+    // Extract description
+    result.description = await page
+      .$eval('meta[property="og:description"]', (el) => el.getAttribute("content"))
+      .catch(() => null);
+    // Extract industry + HQ from About section text
+    const industryMatch = pageText.match(/Industry\s*\n([^\n]+)/i);
+    if (industryMatch) result.industry = industryMatch[1].trim();
+    const hqMatch = pageText.match(/Headquarters\s*\n([^\n]+)/i);
+    if (hqMatch) result.headquarters = hqMatch[1].trim();
+    logger.info(
+      { linkedinUrl, employees: result.employeeCount, industry: result.industry },
+      "LinkedIn company scraped"
+    );
+    // ── People Page (public) ─────────────────────────────────
+    const peopleUrl = linkedinUrl.replace(/\/$/, "") + "/people/";
+    await page.goto(peopleUrl, { waitUntil: "domcontentloaded", timeout: 15_000 });
+    await page.waitForTimeout(1500);
+    const peopleText = await page.evaluate(() => document.body.innerText);
+    result.decisionMakers = extractDecisionMakers(peopleText, linkedinUrl);
+    logger.info({ linkedinUrl, dmCount: result.decisionMakers.length }, "LinkedIn people scraped");
+  } catch (err) {
+    logger.warn({ linkedinUrl, err }, "LinkedIn scrape partial failure");
+  } finally {
+    await page.close();
+    await context.close();
+    await browser.close();
+  }
+  return result;
+}
+/**
+ * Searches LinkedIn for a company by name + region using Google.
+ * Returns the LinkedIn company URL if found.
+ */
+export function buildLinkedInSearchUrl(companyName: string): string {
+  const q = encodeURIComponent(`site:linkedin.com/company "${companyName}"`);
+  return `https://www.google.com/search?q=${q}`;
+}
+function extractDecisionMakers(text: string, companyUrl: string): LinkedInPerson[] {
+  const lines = text.split("\n").map((l) => l.trim()).filter((l) => l.length > 2);
+  const people: LinkedInPerson[] = [];
+  for (let i = 0; i < lines.length - 1; i++) {
+    const nameLine = lines[i];
+    const titleLine = lines[i + 1] ?? "";
+    // Names are typically 2-4 words, Title follows
+    const isName = /^[A-Z][a-z]+ [A-Z]/.test(nameLine) && nameLine.split(" ").length <= 4;
+    if (!isName) continue;
+    const titleLower = titleLine.toLowerCase();
+    const isDecisionMaker = DECISION_MAKER_TITLES.some((t) => titleLower.includes(t));
+    if (isDecisionMaker || titleLower.length < 60) {
+      people.push({
+        fullName: nameLine,
+        title: titleLine,
+        linkedinUrl: `${companyUrl}/people/`, // public people page
+        isDecisionMaker,
+      });
+      i++; // skip title line
+    }
+    if (people.length >= 10) break;
+  }
+  // Sort: decision-makers first
+  return people.sort((a, b) => Number(b.isDecisionMaker) - Number(a.isDecisionMaker));
+}

src/discovery/lib/normalizer.ts ADDED Viewed

	@@ -0,0 +1,145 @@

+import { InsertCompany } from "../../shared/supabase/schema";
+import { ScrapedCompany } from "./web-scraper";
+import { LinkedInCompanyData } from "./linkedin-scraper";
+import { SerperResult } from "../providers/serper";
+/**
+ * Normalizes raw data from multiple sources into a single canonical Company record.
+ * Priority: LinkedIn > Website > Serper snippet
+ */
+export function normalizeCompany(
+  serperResult: SerperResult,
+  website: ScrapedCompany,
+  linkedin: LinkedInCompanyData | null,
+  region: string,
+  source: string
+): InsertCompany {
+  const name =
+    linkedin?.name ??
+    website.name ??
+    cleanTitle(serperResult.title);
+  const description =
+    linkedin?.description ??
+    website.description ??
+    serperResult.snippet;
+  const employeeCount =
+    linkedin?.employeeCount ??
+    website.employeeCount ??
+    null;
+  const employeeRange =
+    linkedin?.employeeRange ??
+    website.employeeRange ??
+    estimateRange(employeeCount);
+  const industry =
+    linkedin?.industry ??
+    website.industry ??
+    null;
+  const country =
+    linkedin?.headquarters
+      ? extractCountry(linkedin.headquarters)
+      : regionToCountry(region);
+  const linkedinUrl =
+    linkedin !== null
+      ? extractLinkedInCompanyUrl(serperResult.link) ?? website.linkedinUrl
+      : website.linkedinUrl;
+  const growthSignals = buildGrowthSignals(website, linkedin);
+  return {
+    domain: website.domain,
+    name: name ?? "Unknown",
+    industry,
+    employee_count: employeeCount,
+    employee_range: employeeRange,
+    description: description?.slice(0, 1000) ?? null,
+    website_url: `https://${website.domain}`,
+    linkedin_url: linkedinUrl ?? null,
+    country,
+    region,
+    tech_stack: website.techStack,
+    growth_signals: growthSignals,
+    raw_data: {
+      serper_title: serperResult.title,
+      serper_snippet: serperResult.snippet,
+      serper_link: serperResult.link,
+    },
+    source,
+    status: "discovered",
+  };
+}
+// ─── Helpers ─────────────────────────────────────────────────
+function cleanTitle(title: string): string {
+  return title
+    .split(/[|\-–]/)[0]
+    .replace(/\b(home|official|website|welcome to)\b/gi, "")
+    .trim();
+}
+function estimateRange(count: number | null): string | null {
+  if (!count) return null;
+  if (count < 50) return "10-49";
+  if (count < 100) return "50-99";
+  if (count < 200) return "100-199";
+  if (count < 500) return "200-499";
+  if (count < 1000) return "500-999";
+  return "1000+";
+}
+function extractCountry(headquarters: string): string | null {
+  const parts = headquarters.split(",");
+  return parts[parts.length - 1]?.trim() ?? null;
+}
+function regionToCountry(region: string): string {
+  const map: Record<string, string> = {
+    US: "United States", UK: "United Kingdom",
+    AU: "Australia", UAE: "United Arab Emirates",
+    SA: "Saudi Arabia", SG: "Singapore",
+  };
+  return map[region] ?? region;
+}
+function extractLinkedInCompanyUrl(url: string): string | null {
+  const match = url.match(/https?:\/\/(www\.)?linkedin\.com\/company\/[^/?#]+/);
+  return match ? match[0] : null;
+}
+function buildGrowthSignals(
+  website: ScrapedCompany,
+  linkedin: LinkedInCompanyData | null
+): object[] {
+  const signals: object[] = [];
+  // AI-related job postings
+  website.jobPostings
+    .filter((j) => j.hasAiSignal)
+    .forEach((j) => {
+      signals.push({
+        type: "job_posting",
+        content: j.title,
+        source_url: j.url,
+        ai_related: true,
+        detected_at: new Date().toISOString(),
+      });
+    });
+  // Recent LinkedIn posts
+  (linkedin?.recentPosts ?? []).forEach((post) => {
+    signals.push({
+      type: "social_post",
+      content: post.slice(0, 200),
+      ai_related: /automat|ai\b|machine learning|digital/i.test(post),
+      detected_at: new Date().toISOString(),
+    });
+  });
+  return signals.slice(0, 10); // cap at 10 signals
+}

src/discovery/lib/pain-signal-detector.ts ADDED Viewed

	@@ -0,0 +1,228 @@

+/**
+ * Pain Signal Detector
+ *
+ * Core philosophy: Don't look for AI signals.
+ * Look for INEFFICIENCY signals.
+ *
+ * A phone number on homepage = manual call handling = pain point.
+ * A "Book by Phone" button = no online scheduling = pain point.
+ * No chatbot = manual customer interaction = pain point.
+ *
+ * These are UNIVERSAL signals — every industry has them.
+ * The LLM then maps these signals to our specific services.
+ */
+import { callLLM, MODELS } from "../../shared/llm/nvidia-client";
+import { SYSTEM_PROMPTS, buildPainDetectionPrompt } from "../../shared/llm/prompts";
+import { logger } from "../../shared/utils/logger";
+export interface PainSignal {
+  signal: string;
+  evidence: string;
+  severity: "low" | "medium" | "high";
+}
+export interface PainDetectionResult {
+  painSignals: PainSignal[];
+  serviceMatch: string | null;          // matched service from service_profiles
+  matchConfidence: number;
+  reasoning: string;
+  source: "heuristic" | "llm" | "combined";
+}
+// ─── Heuristic detection (instant, free, no LLM) ────────────
+const HEURISTIC_RULES: {
+  pattern: RegExp;
+  signal: string;
+  severity: PainSignal["severity"];
+}[] = [
+  // Phone/call signals → AI Receptionist opportunity
+  { pattern: /(?:call us|call now|phone|dial|ring us)/i, signal: "phone_handling_manual", severity: "high" },
+  { pattern: /\+?\d[\d\s\-().]{8,}/, signal: "phone_number_prominent", severity: "medium" },
+  { pattern: /(?:book (?:an? )?appointment|schedule (?:a )?visit|make (?:an? )?appointment)/i, signal: "manual_appointment_booking", severity: "high" },
+  { pattern: /(?:office hours|opening hours|business hours|we're open)/i, signal: "limited_availability_hours", severity: "medium" },
+  { pattern: /(?:receptionist|front desk|reception)/i, signal: "human_receptionist_mentioned", severity: "high" },
+  // Support signals → AI Customer Support opportunity
+  { pattern: /(?:contact us|get in touch|reach out|enquire|inquire)/i, signal: "manual_contact_process", severity: "medium" },
+  { pattern: /(?:submit (?:a )?ticket|raise (?:a )?ticket)/i, signal: "manual_ticket_system", severity: "medium" },
+  { pattern: /(?:FAQ|frequently asked|common questions)/i, signal: "faq_exists_no_chatbot", severity: "low" },
+  { pattern: /(?:email us|send us an email|write to us)/i, signal: "email_only_support", severity: "medium" },
+  // Data/process signals → AI Data Processing opportunity
+  { pattern: /(?:spreadsheet|excel|csv|manual report)/i, signal: "manual_data_processing", severity: "high" },
+  { pattern: /(?:legacy|outdated|traditional system)/i, signal: "legacy_system_mentioned", severity: "high" },
+  { pattern: /(?:compliance|regulatory|audit)/i, signal: "compliance_reporting_burden", severity: "medium" },
+  // Hiring signals → growth/overwork indicator
+  { pattern: /(?:we're hiring|join our team|open positions|careers)/i, signal: "actively_hiring", severity: "low" },
+  { pattern: /(?:our team|meet the team|staff|employees)/i, signal: "team_page_exists", severity: "low" },
+];
+// Elements on page that indicate ABSENCE of automation
+const ABSENCE_SIGNALS: {
+  check: (html: string) => boolean;
+  signal: string;
+  severity: PainSignal["severity"];
+}[] = [
+  {
+    check: (html) => !/(intercom|drift|crisp|tidio|zendesk|freshchat|livechat|tawk|hubspot.*chat)/i.test(html),
+    signal: "no_chatbot_detected",
+    severity: "medium",
+  },
+  {
+    check: (html) => !/(calendly|acuity|booksy|mindbody|simplybook|square.*appointment)/i.test(html),
+    signal: "no_online_scheduling_tool",
+    severity: "high",
+  },
+  {
+    check: (html) => !/(zapier|make\.com|automate|n8n|workato)/i.test(html),
+    signal: "no_automation_tools",
+    severity: "low",
+  },
+];
+/**
+ * Detect pain signals from website text and HTML.
+ *
+ * Step 1: Heuristic detection (instant, free)
+ * Step 2: LLM enhancement (DL reasoning — maps signals to services)
+ */
+export async function detectPainSignals(
+  companyName: string,
+  industry: string,
+  employeeCount: number | null,
+  websiteText: string,
+  websiteHtml: string,
+  traceId: string
+): Promise<PainDetectionResult> {
+  // ── Step 1: Heuristic scan ─────────────────────────────────
+  const heuristicSignals = runHeuristicScan(websiteText, websiteHtml);
+  // If we found enough signals, LLM just confirms and maps to service
+  // If few signals, LLM reasons deeper about the industry context
+  const pageElements = heuristicSignals.map(s => s.signal);
+  // ── Step 2: LLM deep reasoning ────────────────────────────
+  try {
+    const llmResult = await callLLM({
+      operation: "pain_detect",
+      model: MODELS.FAST,      // 8B for speed — pain detection is pattern-based
+      systemPrompt: SYSTEM_PROMPTS.PAIN_DETECTOR,
+      userPrompt: buildPainDetectionPrompt({
+        company_name: companyName,
+        industry,
+        employee_count: employeeCount,
+        website_text: websiteText.slice(0, 500),
+        page_elements: pageElements,
+      }),
+      temperature: 0.2,
+      maxTokens: 400,
+      jsonMode: true,
+      traceId,
+    });
+    if (llmResult.parsed) {
+      // Merge heuristic + LLM signals (dedup)
+      const llmSignals = (llmResult.parsed.pain_signals as PainSignal[]) ?? [];
+      const merged = mergeSignals(heuristicSignals, llmSignals);
+      return {
+        painSignals: merged,
+        serviceMatch: String(llmResult.parsed.service_match ?? "NONE"),
+        matchConfidence: Number(llmResult.parsed.match_confidence ?? 0),
+        reasoning: String(llmResult.parsed.reasoning ?? ""),
+        source: "combined",
+      };
+    }
+  } catch (err) {
+    logger.warn({ companyName, err }, "LLM pain detection failed — using heuristic only");
+  }
+  // ── Fallback: heuristic-only result ────────────────────────
+  return {
+    painSignals: heuristicSignals,
+    serviceMatch: inferServiceFromSignals(heuristicSignals, industry),
+    matchConfidence: heuristicSignals.length >= 3 ? 0.7 : 0.4,
+    reasoning: `Heuristic-only: ${heuristicSignals.length} pain signals detected`,
+    source: "heuristic",
+  };
+}
+function runHeuristicScan(text: string, html: string): PainSignal[] {
+  const signals: PainSignal[] = [];
+  const seen = new Set<string>();
+  // Pattern-based detection
+  for (const rule of HEURISTIC_RULES) {
+    if (rule.pattern.test(text) && !seen.has(rule.signal)) {
+      seen.add(rule.signal);
+      signals.push({
+        signal: rule.signal,
+        evidence: `Pattern matched in website text`,
+        severity: rule.severity,
+      });
+    }
+  }
+  // Absence-based detection (what's NOT on the site)
+  for (const check of ABSENCE_SIGNALS) {
+    if (check.check(html) && !seen.has(check.signal)) {
+      seen.add(check.signal);
+      signals.push({
+        signal: check.signal,
+        evidence: "Not detected in page source",
+        severity: check.severity,
+      });
+    }
+  }
+  return signals;
+}
+function mergeSignals(heuristic: PainSignal[], llm: PainSignal[]): PainSignal[] {
+  const merged = [...heuristic];
+  const existing = new Set(heuristic.map(s => s.signal));
+  for (const signal of llm) {
+    if (!existing.has(signal.signal)) {
+      merged.push(signal);
+    }
+  }
+  // Sort by severity: high → medium → low
+  const severityOrder = { high: 0, medium: 1, low: 2 };
+  return merged.sort((a, b) => severityOrder[a.severity] - severityOrder[b.severity]);
+}
+/**
+ * Deterministic service inference from signals (fallback when LLM fails).
+ */
+function inferServiceFromSignals(signals: PainSignal[], industry: string): string | null {
+  const signalNames = signals.map(s => s.signal);
+  // Receptionist signals
+  const receptionistSignals = ["phone_handling_manual", "phone_number_prominent",
+    "manual_appointment_booking", "human_receptionist_mentioned", "limited_availability_hours",
+    "no_online_scheduling_tool"];
+  const receptionistCount = signalNames.filter(s => receptionistSignals.includes(s)).length;
+  // Support signals
+  const supportSignals = ["manual_contact_process", "manual_ticket_system",
+    "faq_exists_no_chatbot", "email_only_support", "no_chatbot_detected"];
+  const supportCount = signalNames.filter(s => supportSignals.includes(s)).length;
+  // Data signals
+  const dataSignals = ["manual_data_processing", "legacy_system_mentioned",
+    "compliance_reporting_burden"];
+  const dataCount = signalNames.filter(s => dataSignals.includes(s)).length;
+  const max = Math.max(receptionistCount, supportCount, dataCount);
+  if (max < 2) return null;
+  if (receptionistCount === max) return "AI Receptionist";
+  if (supportCount === max) return "AI Customer Support";
+  if (dataCount === max) return "AI Data Processing";
+  return null;
+}

src/discovery/lib/rotation.ts ADDED Viewed

	@@ -0,0 +1,114 @@

+import { getSupabaseClient } from "../../shared/supabase/client";
+import { logger } from "../../shared/utils/logger";
+// Week number → region mapping
+const ROTATION_MAP: Record<number, string> = {
+  1: "US",
+  2: "UK",
+  3: "AU",
+  4: "UAE",
+};
+export interface RotationInfo {
+  weekNumber: number;
+  region: string;
+  rotationId: string;
+}
+/**
+ * Gets the current rotation region and advances the week counter.
+ * Rotation cycles: US → UK → AU → UAE → US → ...
+ */
+export async function getCurrentRotation(): Promise<RotationInfo> {
+  const db = getSupabaseClient();
+  // Get the latest rotation record
+  const { data: latest } = await db
+    .from("rotation_state")
+    .select("*")
+    .order("started_at", { ascending: false })
+    .limit(1)
+    .single();
+  const currentWeek = latest?.week_number ?? 1;
+  const region = ROTATION_MAP[currentWeek] ?? "US";
+  logger.info({ currentWeek, region }, "Rotation: current region");
+  return {
+    weekNumber: currentWeek,
+    region,
+    rotationId: latest?.id ?? "unknown",
+  };
+}
+/**
+ * Creates a new rotation record for the next week.
+ * Call this at the END of a successful run.
+ */
+export async function advanceRotation(currentWeek: number): Promise<void> {
+  const db = getSupabaseClient();
+  const nextWeek = currentWeek >= 4 ? 1 : currentWeek + 1;
+  const nextRegion = ROTATION_MAP[nextWeek];
+  const { error } = await db.from("rotation_state").insert({
+    week_number: nextWeek,
+    region: nextRegion,
+  });
+  if (error) {
+    logger.error({ error }, "Failed to advance rotation");
+  } else {
+    logger.info({ nextWeek, nextRegion }, "Rotation: advanced to next region");
+  }
+}
+/**
+ * Marks the current rotation run as completed with stats.
+ */
+export async function completeRotation(
+  rotationId: string,
+  companiesFound: number,
+  leadsQualified: number
+): Promise<void> {
+  const db = getSupabaseClient();
+  await db
+    .from("rotation_state")
+    .update({ completed_at: new Date().toISOString(), companies_found: companiesFound, leads_qualified: leadsQualified })
+    .eq("id", rotationId);
+  logger.info({ rotationId, companiesFound, leadsQualified }, "Rotation: completed");
+}
+/**
+ * Converts a region code to ICP geography + search labels.
+ */
+export function getRegionConfig(region: string): {
+  countryCode: string;
+  searchLabel: string;
+  industries: string[];
+} {
+  const configs: Record<string, { countryCode: string; searchLabel: string; industries: string[] }> = {
+    US: {
+      countryCode: "US",
+      searchLabel: "United States",
+      industries: ["technology", "manufacturing", "logistics", "healthcare", "finance", "retail_tech"],
+    },
+    UK: {
+      countryCode: "GB",
+      searchLabel: "United Kingdom",
+      industries: ["technology", "finance", "logistics", "professional_services", "manufacturing"],
+    },
+    AU: {
+      countryCode: "AU",
+      searchLabel: "Australia",
+      industries: ["technology", "mining_tech", "agri_tech", "finance", "healthcare"],
+    },
+    UAE: {
+      countryCode: "AE",
+      searchLabel: "Dubai UAE",
+      industries: ["technology", "logistics", "real_estate_tech", "finance", "retail"],
+    },
+  };
+  return configs[region] ?? configs["US"];
+}

src/discovery/lib/social-finder.ts ADDED Viewed

	@@ -0,0 +1,202 @@

+/**
+ * Social Profile Finder
+ *
+ * Finds company + decision-maker social profiles:
+ * - Instagram (business account)
+ * - Facebook (business page)
+ * - Twitter/X
+ * - YouTube
+ *
+ * Two sources:
+ * 1. Website footer/header scraping (most reliable)
+ * 2. Google search fallback
+ *
+ * Phase 2 uses these for multi-channel outreach.
+ */
+import { chromium } from "playwright";
+import { playwrightLimiter } from "../../shared/utils/rate-limiter";
+import { serperLimiter } from "../../shared/utils/rate-limiter";
+import { logger } from "../../shared/utils/logger";
+import axios from "axios";
+import { getEnv } from "../../shared/config/env";
+export interface SocialProfiles {
+  instagram: string | null;
+  facebook: string | null;
+  twitter: string | null;
+  youtube: string | null;
+  source: "website" | "google" | "mixed";
+}
+/**
+ * Find all social profiles for a company.
+ * Method 1 first (website scrape), then Google fills gaps.
+ */
+export async function findSocialProfiles(
+  domain: string,
+  companyName: string,
+  websiteHtml?: string
+): Promise<SocialProfiles> {
+  const profiles: SocialProfiles = {
+    instagram: null,
+    facebook: null,
+    twitter: null,
+    youtube: null,
+    source: "website",
+  };
+  // ── Method 1: Extract from website HTML ────────────────────
+  if (websiteHtml) {
+    extractFromHtml(websiteHtml, profiles);
+  } else {
+    // Scrape website specifically for social links
+    await scrapeWebsiteForSocials(domain, profiles);
+  }
+  // ── Method 2: Google search for missing profiles ───────────
+  const missing = getMissing(profiles);
+  if (missing.length > 0) {
+    await searchGoogleForSocials(companyName, domain, profiles, missing);
+    if (profiles.source === "website" && missing.some(p => profiles[p as keyof SocialProfiles])) {
+      profiles.source = "mixed";
+    }
+  }
+  const found = [profiles.instagram, profiles.facebook, profiles.twitter, profiles.youtube]
+    .filter(Boolean).length;
+  logger.info({ domain, found }, "Social profiles discovered");
+  return profiles;
+}
+// ─── Method 1: HTML extraction ──────────────────────────────
+const SOCIAL_PATTERNS = {
+  instagram: /https?:\/\/(www\.)?instagram\.com\/[a-zA-Z0-9._]+/gi,
+  facebook: /https?:\/\/(www\.)?(facebook|fb)\.com\/[a-zA-Z0-9.]+/gi,
+  twitter: /https?:\/\/(www\.)?(twitter|x)\.com\/[a-zA-Z0-9_]+/gi,
+  youtube: /https?:\/\/(www\.)?youtube\.com\/(channel|c|@)[\/a-zA-Z0-9._-]+/gi,
+};
+function extractFromHtml(html: string, profiles: SocialProfiles): void {
+  for (const [platform, pattern] of Object.entries(SOCIAL_PATTERNS)) {
+    const matches = html.match(pattern);
+    if (matches && matches.length > 0) {
+      // Take first match, clean it
+      const url = cleanSocialUrl(matches[0], platform);
+      if (url && !isGenericSocial(url)) {
+        (profiles as Record<string, unknown>)[platform] = url;
+      }
+    }
+  }
+}
+// ─── Website scrape (if HTML not already available) ──────────
+async function scrapeWebsiteForSocials(domain: string, profiles: SocialProfiles): Promise<void> {
+  try {
+    await playwrightLimiter.consume("playwright");
+    const browser = await chromium.launch({ headless: true, args: ["--no-sandbox"] });
+    const context = await browser.newContext({
+      userAgent: "Mozilla/5.0 (compatible; ResearchBot/1.0)",
+    });
+    const page = await context.newPage();
+    await page.goto(`https://${domain}`, { waitUntil: "domcontentloaded", timeout: 12_000 });
+    // Get all link hrefs on the page
+    const links = await page.$$eval("a[href]", (anchors) =>
+      anchors.map((a) => a.getAttribute("href") ?? "")
+    );
+    const pageHtml = links.join("\n");
+    extractFromHtml(pageHtml, profiles);
+    await page.close();
+    await context.close();
+    await browser.close();
+  } catch (err) {
+    logger.debug({ domain, err }, "Social scrape failed — trying Google");
+  }
+}
+// ─── Method 2: Google search ────────────────────────────────
+async function searchGoogleForSocials(
+  companyName: string,
+  domain: string,
+  profiles: SocialProfiles,
+  missing: string[]
+): Promise<void> {
+  const searchMap: Record<string, string> = {
+    instagram: `"${companyName}" site:instagram.com`,
+    facebook: `"${companyName}" site:facebook.com`,
+    twitter: `"${companyName}" site:twitter.com OR site:x.com`,
+    youtube: `"${companyName}" site:youtube.com`,
+  };
+  for (const platform of missing) {
+    try {
+      await serperLimiter.consume("serper");
+      const env = getEnv();
+      const response = await axios.post(
+        "https://google.serper.dev/search",
+        { q: searchMap[platform], num: 3 },
+        {
+          headers: {
+            "X-API-KEY": env.SERPER_API_KEY,
+            "Content-Type": "application/json",
+          },
+          timeout: 6_000,
+        }
+      );
+      const organic = response.data?.organic ?? [];
+      for (const result of organic) {
+        const url = cleanSocialUrl(result.link, platform);
+        if (url && !isGenericSocial(url)) {
+          // Verify it mentions company name or domain in snippet
+          const snippet = (result.snippet ?? "").toLowerCase();
+          const title = (result.title ?? "").toLowerCase();
+          const combined = `${snippet} ${title}`;
+          const companyWords = companyName.toLowerCase().split(/\s+/);
+          const hasCompany = companyWords.some(w => w.length > 3 && combined.includes(w));
+          if (hasCompany || combined.includes(domain.replace(/\.\w+$/, ""))) {
+            (profiles as Record<string, unknown>)[platform] = url;
+            break;
+          }
+        }
+      }
+    } catch (err) {
+      logger.debug({ platform, err }, "Social Google search failed — skipping");
+    }
+  }
+}
+// ─── Helpers ─────────────────────────────────────────────────
+function getMissing(profiles: SocialProfiles): string[] {
+  return ["instagram", "facebook", "twitter", "youtube"]
+    .filter(p => !(profiles as Record<string, unknown>)[p]);
+}
+function cleanSocialUrl(url: string, platform: string): string | null {
+  try {
+    const parsed = new URL(url);
+    // Remove query params and fragments
+    return `${parsed.protocol}//${parsed.hostname}${parsed.pathname.replace(/\/$/, "")}`;
+  } catch {
+    return null;
+  }
+}
+function isGenericSocial(url: string): boolean {
+  // Filter out generic profile links (not actual company pages)
+  const genericPaths = ["/share", "/sharer", "/login", "/signup", "/help", "/about", "/policies"];
+  return genericPaths.some(p => url.includes(p));
+}

src/discovery/lib/territory-manager.ts ADDED Viewed

	@@ -0,0 +1,259 @@

+/**
+ * Territory Manager
+ *
+ * Controls: which city, which industry, on which day.
+ * Prevents: re-searching same city+industry within 30 days.
+ * Tracks: daily progression, checkpoint for resume.
+ *
+ * Daily flow:
+ *   1. Load current position (city + industry)
+ *   2. Check if already searched recently (30-day window)
+ *   3. If fresh → search → advance pointer
+ *   4. If stale → skip to next fresh combo
+ *   5. Save position for tomorrow
+ */
+import { getSupabaseClient } from "../../shared/supabase/client";
+import { logger } from "../../shared/utils/logger";
+export interface TerritoryUnit {
+  territoryId: string;
+  country: string;
+  countryCode: string;
+  city: string;
+  industry: string;
+  timezone: string;
+  tier: number;
+}
+export interface TerritoryPosition {
+  countryCode: string;
+  cityIndex: number;
+  industryIndex: number;
+}
+// Industries to search (per territory cycle)
+const INDUSTRY_LIST = [
+  "dental", "medical", "veterinary", "legal", "salon", "spa",    // service businesses (AI Receptionist)
+  "ecommerce", "saas", "retail", "hospitality",                   // support-heavy (AI Support)
+  "manufacturing", "logistics", "finance", "healthcare",          // data-heavy (AI Data Processing)
+  "technology", "consulting", "recruitment", "insurance",          // sales-heavy (AI Sales Automation)
+];
+/**
+ * Get the next territory unit to search today.
+ * Respects 30-day cooldown and daily quota.
+ */
+export async function getNextTerritory(quota: number): Promise<TerritoryUnit[]> {
+  const db = getSupabaseClient();
+  const units: TerritoryUnit[] = [];
+  // Load current position from system_config
+  const { data: configData } = await db
+    .from("system_config")
+    .select("value")
+    .eq("key", "current_territory")
+    .single();
+  const position: TerritoryPosition = configData?.value ?? {
+    countryCode: "US",
+    cityIndex: 0,
+    industryIndex: 0,
+  };
+  // Load all cities ordered by tier (major cities first)
+  const { data: cities } = await db
+    .from("territory_grid")
+    .select("*")
+    .eq("is_active", true)
+    .order("tier", { ascending: true })
+    .order("city", { ascending: true });
+  if (!cities?.length) {
+    logger.error("No active territories found in territory_grid");
+    return [];
+  }
+  // Start from current position
+  let cityIdx = position.cityIndex;
+  let industryIdx = position.industryIndex;
+  let searched = 0;
+  // Keep finding fresh territory units until quota is met
+  // (estimated: each unit produces ~2-3 qualified leads)
+  const unitsNeeded = Math.ceil(quota / 2);
+  let attempts = 0;
+  const maxAttempts = cities.length * INDUSTRY_LIST.length; // prevent infinite loop
+  while (units.length < unitsNeeded && attempts < maxAttempts) {
+    attempts++;
+    const city = cities[cityIdx % cities.length];
+    const industry = INDUSTRY_LIST[industryIdx % INDUSTRY_LIST.length];
+    // Check 30-day cooldown
+    const isFresh = await isTerritoryFresh(city.id, industry);
+    if (isFresh) {
+      units.push({
+        territoryId: city.id,
+        country: city.country,
+        countryCode: city.country_code,
+        city: city.city,
+        industry,
+        timezone: city.timezone ?? "UTC",
+        tier: city.tier,
+      });
+    }
+    // Advance: next industry, or wrap to next city
+    industryIdx++;
+    if (industryIdx >= INDUSTRY_LIST.length) {
+      industryIdx = 0;
+      cityIdx++;
+    }
+  }
+  // Save new position for tomorrow
+  await db.from("system_config").upsert({
+    key: "current_territory",
+    value: {
+      countryCode: cities[cityIdx % cities.length]?.country_code ?? "US",
+      cityIndex: cityIdx % cities.length,
+      industryIndex: industryIdx % INDUSTRY_LIST.length,
+    },
+    updated_by: "system",
+    updated_at: new Date().toISOString(),
+  });
+  logger.info({
+    unitsFound: units.length,
+    firstCity: units[0]?.city,
+    firstIndustry: units[0]?.industry,
+    attempts,
+  }, "Territory units selected for today");
+  return units;
+}
+/**
+ * Check if a city+industry combo is fresh (not searched in 30 days).
+ */
+async function isTerritoryFresh(territoryId: string, industry: string): Promise<boolean> {
+  const db = getSupabaseClient();
+  const { data } = await db
+    .from("territory_progress")
+    .select("next_eligible_at")
+    .eq("territory_id", territoryId)
+    .eq("industry", industry)
+    .maybeSingle();
+  if (!data) return true; // never searched → fresh
+  const eligible = new Date(data.next_eligible_at);
+  return new Date() >= eligible;
+}
+/**
+ * Mark a territory unit as searched (sets 30-day cooldown).
+ */
+export async function markTerritorySearched(
+  territoryId: string,
+  industry: string,
+  leadsFound: number
+): Promise<void> {
+  const db = getSupabaseClient();
+  const now = new Date();
+  const nextEligible = new Date(now.getTime() + 30 * 24 * 60 * 60 * 1000); // +30 days
+  await db.from("territory_progress").upsert({
+    territory_id: territoryId,
+    industry,
+    last_run_at: now.toISOString(),
+    next_eligible_at: nextEligible.toISOString(),
+    total_leads: leadsFound,
+  }, { onConflict: "territory_id,industry" });
+}
+/**
+ * Get today's lead quota (default or override).
+ */
+export async function getDailyQuota(): Promise<number> {
+  const db = getSupabaseClient();
+  const { data } = await db
+    .from("system_config")
+    .select("value")
+    .eq("key", "daily_quota")
+    .single();
+  const config = data?.value as { default: number; today_override: number | null } | null;
+  if (config?.today_override !== null && config?.today_override !== undefined) {
+    // Clear override after reading (one-time use)
+    await db.from("system_config").update({
+      value: { ...config, today_override: null },
+      updated_at: new Date().toISOString(),
+    }).eq("key", "daily_quota");
+    return config.today_override;
+  }
+  return config?.default ?? 10;
+}
+/**
+ * Set today's quota override (from Slack command).
+ */
+export async function setQuotaOverride(quota: number, permanent = false): Promise<void> {
+  const db = getSupabaseClient();
+  if (permanent) {
+    await db.from("system_config").update({
+      value: { default: quota, today_override: null },
+      updated_by: "slack",
+      updated_at: new Date().toISOString(),
+    }).eq("key", "daily_quota");
+  } else {
+    const { data } = await db
+      .from("system_config")
+      .select("value")
+      .eq("key", "daily_quota")
+      .single();
+    const current = data?.value as { default: number } | null;
+    await db.from("system_config").update({
+      value: { default: current?.default ?? 10, today_override: quota },
+      updated_by: "slack",
+      updated_at: new Date().toISOString(),
+    }).eq("key", "daily_quota");
+  }
+}
+/**
+ * Check if system is paused.
+ */
+export async function isSystemPaused(): Promise<boolean> {
+  const db = getSupabaseClient();
+  const { data } = await db
+    .from("system_config")
+    .select("value")
+    .eq("key", "auto_mode")
+    .single();
+  return (data?.value as { paused?: boolean })?.paused === true;
+}
+/**
+ * Build Google search queries for a territory unit.
+ * Generates 3-4 targeted queries per city+industry.
+ */
+export function buildTerritoryQueries(unit: TerritoryUnit, keywords: string[]): string[] {
+  return [
+    `"${unit.industry}" company "${unit.city}" "${unit.country}" -job -careers`,
+    `best ${unit.industry} companies in ${unit.city} ${unit.country}`,
+    `"${unit.industry}" business "${unit.city}" "${keywords[0] ?? ""}" site:linkedin.com/company`,
+    `top ${unit.industry} ${unit.city} companies ${new Date().getFullYear()}`,
+  ].filter(q => q.trim().length > 10);
+}

src/discovery/lib/web-scraper.ts ADDED Viewed

	@@ -0,0 +1,225 @@

+import { chromium, Browser, BrowserContext } from "playwright";
+import { playwrightLimiter } from "../../shared/utils/rate-limiter";
+import { logger } from "../../shared/utils/logger";
+// ─── Types ───────────────────────────────────────────────────
+export interface ScrapedCompany {
+  domain: string;
+  name: string | null;
+  description: string | null;
+  employeeRange: string | null;
+  employeeCount: number | null;
+  industry: string | null;
+  country: string | null;
+  linkedinUrl: string | null;
+  techStack: string[];
+  jobPostings: JobPosting[];
+  recentNews: string[];
+  websiteText: string;
+  html: string;           // raw HTML for pain signal detection
+  text: string;           // alias for websiteText (used by auto-discovery)
+  aiJobCount: number;     // count of AI-related job postings
+}
+export interface JobPosting {
+  title: string;
+  url: string;
+  hasAiSignal: boolean;
+}
+// ─── AI signal keywords ──────────────────────────────────────
+const AI_KEYWORDS = [
+  "automation", "artificial intelligence", "machine learning", "ai", "llm",
+  "workflow automation", "robotic process", "rpa", "data pipeline",
+  "digital transformation", "predictive analytics", "nlp",
+];
+const TECH_STACK_SIGNALS = [
+  "salesforce", "hubspot", "sap", "oracle", "dynamics", "zendesk",
+  "servicenow", "workday", "netsuite", "quickbooks", "zoho",
+  "slack", "jira", "notion", "monday.com", "asana",
+];
+// ─── Browser singleton ───────────────────────────────────────
+let _browser: Browser | null = null;
+async function getBrowser(): Promise<Browser> {
+  if (!_browser || !_browser.isConnected()) {
+    _browser = await chromium.launch({
+      headless: true,
+      args: ["--no-sandbox", "--disable-setuid-sandbox", "--disable-dev-shm-usage"],
+    });
+  }
+  return _browser;
+}
+export async function closeBrowser(): Promise<void> {
+  if (_browser) {
+    await _browser.close();
+    _browser = null;
+  }
+}
+// ─── Main scraper ─────────────────────────────────────────────
+/**
+ * Scrapes a company website for ICP-relevant signals.
+ * Respects rate limits and robots.txt awareness (no sitemap abuse).
+ */
+export async function scrapeCompanyWebsite(domain: string): Promise<ScrapedCompany> {
+  await playwrightLimiter.consume("playwright");
+  const browser = await getBrowser();
+  const context = await browser.newContext({
+    userAgent:
+      "Mozilla/5.0 (compatible; ResearchBot/1.0; +https://youragency.com/bot)",
+    extraHTTPHeaders: { "Accept-Language": "en-US,en;q=0.9" },
+  });
+  const result: ScrapedCompany = {
+    domain,
+    name: null,
+    description: null,
+    employeeRange: null,
+    employeeCount: null,
+    industry: null,
+    country: null,
+    linkedinUrl: null,
+    techStack: [],
+    jobPostings: [],
+    recentNews: [],
+    websiteText: "",
+    html: "",
+    text: "",
+    aiJobCount: 0,
+  };
+  try {
+    // ── Homepage ─────────────────────────────────────────────
+    const homePage = await context.newPage();
+    await homePage.goto(`https://${domain}`, {
+      waitUntil: "domcontentloaded",
+      timeout: 15_000,
+    });
+    const homeText = await homePage.evaluate(() => document.body.innerText);
+    result.websiteText = homeText.slice(0, 3000);
+    result.text = result.websiteText; // alias
+    // Extract company name from title tag
+    result.name = await homePage.title().then((t) =>
+      t.split("|")[0].split("-")[0].trim()
+    );
+    // Find LinkedIn link on homepage
+    const linkedinHref = await homePage
+      .$eval('a[href*="linkedin.com/company"]', (el) => el.getAttribute("href"))
+      .catch(() => null);
+    result.linkedinUrl = linkedinHref ?? null;
+    // Tech stack detection from script/link tags
+    const pageSource = await homePage.content();
+    result.techStack = detectTechStack(pageSource);
+    result.html = pageSource.slice(0, 10000); // raw HTML for pain detection
+    await homePage.close();
+    // ── About Page ───────────────────────────────────────────
+    const aboutPage = await context.newPage();
+    const aboutUrl = `https://${domain}/about`;
+    try {
+      await aboutPage.goto(aboutUrl, { waitUntil: "domcontentloaded", timeout: 10_000 });
+      const aboutText = await aboutPage.evaluate(() => document.body.innerText);
+      result.description = extractDescription(aboutText);
+      const empMatch = aboutText.match(/(\d[\d,]*)\s*(employees?|people|team members?|staff)/i);
+      if (empMatch) {
+        result.employeeCount = parseInt(empMatch[1].replace(/,/g, ""), 10);
+      }
+    } catch {
+      // About page not found — that's fine
+    } finally {
+      await aboutPage.close();
+    }
+    // ── Jobs Page ────────────────────────────────────────────
+    const jobsPage = await context.newPage();
+    const jobsUrls = [
+      `https://${domain}/careers`,
+      `https://${domain}/jobs`,
+      `https://${domain}/work-with-us`,
+    ];
+    for (const jobUrl of jobsUrls) {
+      try {
+        await jobsPage.goto(jobUrl, { waitUntil: "domcontentloaded", timeout: 10_000 });
+        const jobsText = await jobsPage.evaluate(() => document.body.innerText);
+        result.jobPostings = extractJobPostings(jobsText, jobUrl);
+        if (result.jobPostings.length) break;
+      } catch {
+        // Try next URL
+      }
+    }
+    await jobsPage.close();
+    result.aiJobCount = result.jobPostings.filter(j => j.hasAiSignal).length;
+    logger.info({ domain, techStack: result.techStack.length, jobs: result.jobPostings.length },
+      "Website scraped successfully"
+    );
+  } catch (err) {
+    logger.warn({ domain, err }, "Website scrape partial failure");
+  } finally {
+    await context.close();
+  }
+  return result;
+}
+// ─── Helpers ─────────────────────────────────────────────────
+function detectTechStack(html: string): string[] {
+  const found: string[] = [];
+  const lower = html.toLowerCase();
+  for (const tech of TECH_STACK_SIGNALS) {
+    if (lower.includes(tech)) found.push(tech);
+  }
+  return [...new Set(found)];
+}
+function extractDescription(text: string): string {
+  // Take first 3 meaningful sentences
+  const sentences = text
+    .replace(/\n+/g, " ")
+    .split(/(?<=[.!?])\s+/)
+    .filter((s) => s.length > 30 && s.length < 300);
+  return sentences.slice(0, 3).join(" ");
+}
+function extractJobPostings(text: string, sourceUrl: string): JobPosting[] {
+  const lines = text.split("\n").filter((l) => l.trim().length > 5);
+  const postings: JobPosting[] = [];
+  for (const line of lines.slice(0, 30)) {
+    const lower = line.toLowerCase();
+    const hasAiSignal = AI_KEYWORDS.some((kw) => lower.includes(kw));
+    // Heuristic: job titles are usually 2-6 words
+    const wordCount = line.trim().split(/\s+/).length;
+    if (wordCount >= 2 && wordCount <= 8) {
+      postings.push({ title: line.trim(), url: sourceUrl, hasAiSignal });
+    }
+  }
+  return postings.slice(0, 15);
+}
+export function hasAiSignals(company: ScrapedCompany): boolean {
+  const aiJobs = company.jobPostings.filter((j) => j.hasAiSignal).length;
+  const websiteHasAi = AI_KEYWORDS.some((kw) =>
+    company.websiteText.toLowerCase().includes(kw)
+  );
+  return aiJobs > 0 || websiteHasAi;
+}

src/discovery/providers/hunter.ts ADDED Viewed

	@@ -0,0 +1,155 @@

+import axios from "axios";
+import { getEnv } from "../../shared/config/env";
+import { withRetry, isCircuitOpen, recordFailure, recordSuccess } from "../../shared/utils/retry";
+import { hunterLimiter } from "../../shared/utils/rate-limiter";
+import { logger } from "../../shared/utils/logger";
+const PROVIDER = "hunter";
+export interface HunterEmailResult {
+  email: string | null;
+  score: number;       // Hunter confidence 0-100
+  source: "hunter";
+  firstName: string | null;
+  lastName: string | null;
+}
+/**
+ * Finds a professional email address using Hunter.io.
+ * Falls through to pattern generation if not found.
+ */
+export async function findEmail(
+  domain: string,
+  firstName: string,
+  lastName: string
+): Promise<HunterEmailResult | null> {
+  if (isCircuitOpen(PROVIDER)) return null;
+  await hunterLimiter.consume(PROVIDER);
+  try {
+    const result = await withRetry(
+      () => callHunterEmailFinder(domain, firstName, lastName),
+      { provider: PROVIDER }
+    );
+    recordSuccess(PROVIDER);
+    return result;
+  } catch (err) {
+    recordFailure(PROVIDER);
+    logger.warn({ domain, err }, "Hunter email find failed — will try pattern generation");
+    return null;
+  }
+}
+/**
+ * Searches all known emails for a domain (domain search).
+ */
+export async function searchDomain(domain: string): Promise<HunterEmailResult[]> {
+  if (isCircuitOpen(PROVIDER)) return [];
+  await hunterLimiter.consume(PROVIDER);
+  try {
+    const result = await withRetry(
+      () => callHunterDomainSearch(domain),
+      { provider: PROVIDER }
+    );
+    recordSuccess(PROVIDER);
+    return result;
+  } catch (err) {
+    recordFailure(PROVIDER);
+    logger.warn({ domain, err }, "Hunter domain search failed");
+    return [];
+  }
+}
+async function callHunterEmailFinder(
+  domain: string,
+  firstName: string,
+  lastName: string
+): Promise<HunterEmailResult | null> {
+  const env = getEnv();
+  const response = await axios.get("https://api.hunter.io/v2/email-finder", {
+    params: {
+      domain,
+      first_name: firstName,
+      last_name: lastName,
+      api_key: env.HUNTER_API_KEY,
+    },
+    timeout: 8_000,
+  });
+  const data = response.data?.data;
+  if (!data?.email) return null;
+  return {
+    email: data.email,
+    score: data.score ?? 0,
+    source: "hunter",
+    firstName: data.first_name ?? null,
+    lastName: data.last_name ?? null,
+  };
+}
+async function callHunterDomainSearch(domain: string): Promise<HunterEmailResult[]> {
+  const env = getEnv();
+  const response = await axios.get("https://api.hunter.io/v2/domain-search", {
+    params: { domain, api_key: env.HUNTER_API_KEY, limit: 10 },
+    timeout: 8_000,
+  });
+  const emails = response.data?.data?.emails ?? [];
+  return emails
+    .filter((e: { type: string }) => e.type === "professional")
+    .map((e: { value: string; confidence: number; first_name: string; last_name: string }) => ({
+      email: e.value,
+      score: e.confidence,
+      source: "hunter" as const,
+      firstName: e.first_name ?? null,
+      lastName: e.last_name ?? null,
+    }));
+}
+// ─── Aliases for contact-enricher.ts compatibility ──────────
+export type HunterContact = {
+  value: string;        // email
+  first_name: string | null;
+  last_name: string | null;
+  position: string | null;
+  seniority: string | null;
+  confidence: number;
+};
+/**
+ * Search for contacts at a domain — used by contact-enricher.
+ * Maps Hunter's domain-search response to HunterContact format.
+ */
+export async function searchHunterContacts(domain: string): Promise<HunterContact[]> {
+  if (isCircuitOpen(PROVIDER)) return [];
+  await hunterLimiter.consume(PROVIDER);
+  try {
+    const env = getEnv();
+    const response = await axios.get("https://api.hunter.io/v2/domain-search", {
+      params: { domain, api_key: env.HUNTER_API_KEY, limit: 10 },
+      timeout: 8_000,
+    });
+    recordSuccess(PROVIDER);
+    const emails = response.data?.data?.emails ?? [];
+    return emails.map((e: Record<string, unknown>) => ({
+      value: (e.value as string) ?? "",
+      first_name: (e.first_name as string) ?? null,
+      last_name: (e.last_name as string) ?? null,
+      position: (e.position as string) ?? null,
+      seniority: (e.seniority as string) ?? null,
+      confidence: (e.confidence as number) ?? 0,
+    }));
+  } catch (err) {
+    recordFailure(PROVIDER);
+    logger.warn({ domain, err }, "Hunter domain search failed");
+    return [];
+  }
+}

src/discovery/providers/reoon.ts ADDED Viewed

	@@ -0,0 +1,108 @@

+import axios from "axios";
+import dns from "dns/promises";
+import { getEnv } from "../../shared/config/env";
+import { withRetry, isCircuitOpen, recordFailure, recordSuccess } from "../../shared/utils/retry";
+import { reoonLimiter } from "../../shared/utils/rate-limiter";
+import { logger } from "../../shared/utils/logger";
+const PROVIDER = "reoon";
+export type VerifyResult = "valid" | "invalid" | "catch_all" | "unknown";
+export interface EmailVerification {
+  email: string;
+  result: VerifyResult;
+  isDeliverable: boolean;
+  isCatchAll: boolean;
+  mxFound: boolean;
+}
+/**
+ * Verifies email deliverability via Reoon API with MX record fallback.
+ * Order: Reoon API → local MX check → pattern heuristic
+ */
+export async function verifyEmail(email: string): Promise<EmailVerification> {
+  const domain = email.split("@")[1];
+  if (!domain) return makeResult(email, "invalid", false, false, false);
+  // Try Reoon API first
+  if (!isCircuitOpen(PROVIDER)) {
+    await reoonLimiter.consume(PROVIDER);
+    try {
+      const result = await withRetry(() => callReoon(email), { provider: PROVIDER });
+      recordSuccess(PROVIDER);
+      return result;
+    } catch (err) {
+      recordFailure(PROVIDER);
+      logger.warn({ email, err }, "Reoon verify failed — falling back to MX check");
+    }
+  }
+  // Fallback: local MX record check
+  return mxFallback(email, domain);
+}
+async function callReoon(email: string): Promise<EmailVerification> {
+  const env = getEnv();
+  const response = await axios.get("https://emailverifier.reoon.com/api/v1/verify", {
+    params: { email, key: env.REOON_API_KEY, mode: "quick" },
+    timeout: 10_000,
+  });
+  const data = response.data;
+  const result: VerifyResult =
+    data.status === "valid"
+      ? "valid"
+      : data.status === "catch_all"
+      ? "catch_all"
+      : "invalid";
+  return makeResult(
+    email,
+    result,
+    data.is_deliverable ?? result === "valid",
+    data.is_catch_all ?? false,
+    data.has_mx_record ?? true
+  );
+}
+async function mxFallback(email: string, domain: string): Promise<EmailVerification> {
+  try {
+    const records = await dns.resolveMx(domain);
+    const mxFound = records.length > 0;
+    return makeResult(email, mxFound ? "catch_all" : "invalid", mxFound, mxFound, mxFound);
+  } catch {
+    return makeResult(email, "unknown", false, false, false);
+  }
+}
+function makeResult(
+  email: string,
+  result: VerifyResult,
+  isDeliverable: boolean,
+  isCatchAll: boolean,
+  mxFound: boolean
+): EmailVerification {
+  return { email, result, isDeliverable, isCatchAll, mxFound };
+}
+/**
+ * Generates email pattern candidates for a name + domain.
+ * Returns ordered list from most to least common pattern.
+ */
+export function generateEmailPatterns(
+  firstName: string,
+  lastName: string,
+  domain: string
+): string[] {
+  const f = firstName.toLowerCase().replace(/[^a-z]/g, "");
+  const l = lastName.toLowerCase().replace(/[^a-z]/g, "");
+  return [
+    `${f}.${l}@${domain}`,
+    `${f}${l}@${domain}`,
+    `${f[0]}${l}@${domain}`,
+    `${f}@${domain}`,
+    `${f[0]}.${l}@${domain}`,
+    `${l}.${f}@${domain}`,
+  ].filter(Boolean);
+}

src/discovery/providers/serper.ts ADDED Viewed

	@@ -0,0 +1,108 @@

+import axios from "axios";
+import { getEnv } from "../../shared/config/env";
+import { withRetry, isCircuitOpen, recordFailure, recordSuccess } from "../../shared/utils/retry";
+import { serperLimiter } from "../../shared/utils/rate-limiter";
+import { logger } from "../../shared/utils/logger";
+const PROVIDER = "serper";
+export interface SerperResult {
+  title: string;
+  link: string;
+  snippet: string;
+  domain: string;
+}
+/**
+ * Searches Google via Serper.dev API.
+ * Builds targeted queries to find companies matching ICP in a given region.
+ */
+export async function searchCompanies(
+  region: string,
+  industry: string,
+  keywords: string[],
+  page = 1
+): Promise<SerperResult[]> {
+  if (isCircuitOpen(PROVIDER)) {
+    logger.warn({ provider: PROVIDER }, "Circuit open — skipping Serper call");
+    return [];
+  }
+  await serperLimiter.consume(PROVIDER);
+  const queries = buildQueries(region, industry, keywords);
+  const results: SerperResult[] = [];
+  for (const query of queries) {
+    try {
+      const data = await withRetry(
+        () => callSerper(query, page),
+        { provider: PROVIDER }
+      );
+      results.push(...data);
+      recordSuccess(PROVIDER);
+    } catch (err) {
+      recordFailure(PROVIDER);
+      logger.error({ query, err }, "Serper search failed");
+    }
+  }
+  // Deduplicate by domain
+  const seen = new Set<string>();
+  return results.filter((r) => {
+    if (seen.has(r.domain)) return false;
+    seen.add(r.domain);
+    return true;
+  });
+}
+async function callSerper(query: string, page: number): Promise<SerperResult[]> {
+  const env = getEnv();
+  const response = await axios.post(
+    "https://google.serper.dev/search",
+    { q: query, num: 10, page },
+    {
+      headers: {
+        "X-API-KEY": env.SERPER_API_KEY,
+        "Content-Type": "application/json",
+      },
+      timeout: 10_000,
+    }
+  );
+  const organic = response.data?.organic ?? [];
+  return organic.map((item: { title: string; link: string; snippet: string }) => ({
+    title: item.title,
+    link: item.link,
+    snippet: item.snippet,
+    domain: extractDomain(item.link),
+  }));
+}
+function buildQueries(region: string, industry: string, keywords: string[]): string[] {
+  // Precision queries — each targets a specific pain+industry+region combo
+  const regionLabel = REGION_LABELS[region] ?? region;
+  return [
+    `"${industry}" company "${regionLabel}" "50 employees" OR "100 employees" OR "200 employees" automation`,
+    `${industry} business ${regionLabel} site:linkedin.com/company`,
+    `"${industry}" "${regionLabel}" "digital transformation" OR "AI" OR "automation" company`,
+    `${keywords[0]} ${keywords[1] ?? ""} company ${regionLabel} -job -careers`,
+  ].filter(Boolean);
+}
+function extractDomain(url: string): string {
+  try {
+    return new URL(url).hostname.replace(/^www\./, "");
+  } catch {
+    return url;
+  }
+}
+const REGION_LABELS: Record<string, string> = {
+  US: "United States",
+  UK: "United Kingdom",
+  AU: "Australia",
+  UAE: "Dubai",
+  SA: "Saudi Arabia",
+  SG: "Singapore",
+};

src/discovery/trigger-tasks/auto-discovery.ts ADDED Viewed

	@@ -0,0 +1,517 @@

+/**
+ * Trigger.dev Task Definitions — Phase 1 Pipeline
+ *
+ * 5 chained tasks instead of 1 monolithic function:
+ *
+ * Task 1: daily-scheduler    → CRON 4 AM UTC → picks territory → triggers process-company
+ * Task 2: process-company    → scrape + pain detect + gate 2 → triggers enrich-contacts
+ * Task 3: enrich-contacts    → emails + classify + verify + LinkedIn + social → triggers ai-profile
+ * Task 4: ai-profile-score   → Python service → save → triggers hot-alert if needed
+ * Task 5: daily-digest       → CRON 6:30 AM UTC → collects today's results → Slack digest
+ *
+ * Benefits:
+ *   - Company #3 fails → only #3 retries, rest continue
+ *   - 3 companies process in parallel (concurrency limit)
+ *   - Each task has its own retry policy
+ *   - Dashboard shows exact failure point
+ */
+import { task, schedules, queue } from "@trigger.dev/sdk/v3";
+import { getSupabaseClient } from "../../shared/supabase/client";
+import { startTrace, recordOperation, endTrace } from "../../shared/observability/tracer";
+import { saveCheckpoint, isAlreadyProcessed } from "../../shared/pipeline/checkpoint";
+import { getNextTerritory, getDailyQuota, markTerritorySearched, isSystemPaused, buildTerritoryQueries } from "../lib/territory-manager";
+import { scrapeCompanyWebsite } from "../lib/web-scraper";
+import { detectPainSignals } from "../lib/pain-signal-detector";
+import { enrichContacts } from "../lib/contact-enricher";
+import { sendRunStarted, sendRunProgress, sendDailyDigest, sendHotLeadAlert } from "../../slack/slack-service";
+import { logger } from "../../shared/utils/logger";
+import { randomUUID } from "crypto";
+import axios from "axios";
+import { getEnv } from "../../shared/config/env";
+// ─── Queue: max 3 companies processing simultaneously ────────
+const companyQueue = queue({
+  name: "company-processing",
+  concurrencyLimit: 3,
+});
+// ═══════════════════════════════════════════════════════════════
+// TASK 1: Daily Scheduler (CRON — runs every day at 4 AM UTC)
+// ═══════════════════════════════════════════════════════════════
+export const dailyScheduler = schedules.task({
+  id: "daily-lead-discovery",
+  // Cron configured in Trigger.dev dashboard: 0 4 * * * (4 AM UTC = 9 AM PKT)
+  maxDuration: 300, // 5 minutes for setup
+  run: async () => {
+    // Pre-flight
+    if (await isSystemPaused()) {
+      logger.info("⏸️ System paused — skipping today");
+      return { status: "paused" };
+    }
+    const quota = await getDailyQuota();
+    const territories = await getNextTerritory(quota);
+    if (territories.length === 0) {
+      logger.warn("No fresh territory — all cooling down");
+      return { status: "no_territory" };
+    }
+    // Create run record
+    const db = getSupabaseClient();
+    const runId = randomUUID();
+    const traceId = startTrace(runId);
+    const unit = territories[0];
+    await db.from("discovery_runs").insert({
+      id: runId,
+      run_type: "auto",
+      territory_id: unit.territoryId,
+      country_code: unit.countryCode,
+      city: unit.city,
+      industry: unit.industry,
+      quota_target: quota,
+      status: "running",
+      triggered_by: "system",
+    });
+    // Slack: run started
+    await sendRunStarted(`${unit.city}, ${unit.country}`, unit.industry, quota);
+    // Search Google for companies
+    const env = getEnv();
+    const allDomains: string[] = [];
+    for (const territory of territories) {
+      const queries = buildTerritoryQueries(territory, []);
+      for (const query of queries) {
+        try {
+          const response = await axios.post(
+            "https://google.serper.dev/search",
+            { q: query, num: 10 },
+            {
+              headers: { "X-API-KEY": env.SERPER_API_KEY, "Content-Type": "application/json" },
+              timeout: 8_000,
+            }
+          );
+          const organic = response.data?.organic ?? [];
+          for (const result of organic) {
+            try {
+              const hostname = new URL(result.link).hostname.replace(/^www\./, "");
+              const skip = ["facebook.com", "linkedin.com", "twitter.com", "instagram.com",
+                "youtube.com", "yelp.com", "yellowpages.com", "bbb.org", "wikipedia.org",
+                "reddit.com", "crunchbase.com", "glassdoor.com"];
+              if (!skip.some(s => hostname.includes(s)) && !allDomains.includes(hostname)) {
+                allDomains.push(hostname);
+              }
+            } catch { /* invalid URL */ }
+          }
+        } catch (err) {
+          logger.warn({ query, err }, "Serper search failed — continuing");
+        }
+      }
+    }
+    logger.info({ domains: allDomains.length, territory: unit.city }, "Domains found �� triggering company tasks");
+    // Trigger Task 2 for each domain (queued, max 3 concurrent)
+    const companyTasks = [];
+    for (const domain of allDomains) {
+      // Skip already processed
+      if (await isAlreadyProcessed(domain, 30)) continue;
+      companyTasks.push(
+        processCompany.trigger({
+          domain,
+          runId,
+          traceId,
+          industry: unit.industry,
+          city: unit.city,
+          country: unit.country,
+          countryCode: unit.countryCode,
+          territoryId: unit.territoryId,
+          quota,
+          linkedInUrl: null,
+        })
+      );
+    }
+    // Wait for all company tasks
+    const results = await Promise.allSettled(companyTasks);
+    const succeeded = results.filter(r => r.status === "fulfilled").length;
+    // Mark territory searched
+    await markTerritorySearched(unit.territoryId, unit.industry, succeeded);
+    // Update run
+    await db.from("discovery_runs").update({
+      status: "completed",
+      companies_found: allDomains.length,
+      completed_at: new Date().toISOString(),
+      search_queries: buildTerritoryQueries(unit, []),
+    }).eq("id", runId);
+    await endTrace(traceId);
+    return {
+      status: "completed",
+      domainsFound: allDomains.length,
+      tasksTriggered: companyTasks.length,
+      succeeded,
+    };
+  },
+});
+// ═══════════════════════════════════════════════════════════════
+// TASK 2: Process Company (per company, queued)
+// ═══════════════════════════════════════════════════════════════
+export const processCompany = task({
+  id: "process-company",
+  queue: companyQueue,
+  retry: {
+    maxAttempts: 2,
+    minTimeoutInMs: 5_000,
+    maxTimeoutInMs: 30_000,
+    factor: 2,
+  },
+  maxDuration: 120, // 2 minutes per company
+  run: async (payload: {
+    domain: string;
+    runId: string;
+    traceId: string;
+    industry: string;
+    city: string;
+    country: string;
+    countryCode: string;
+    territoryId: string;
+    quota: number;
+    linkedInUrl: string | null;
+  }) => {
+    const { domain, runId, traceId, industry, city, country } = payload;
+    logger.info({ domain }, "Processing company");
+    // ── Stage 1: Scrape website ────────────────────────────────
+    const websiteData = await scrapeCompanyWebsite(domain);
+    if (!websiteData?.text) {
+      await saveCheckpoint(runId, domain, "completed", { reason: "no_website" });
+      return { status: "skipped", reason: "no_website_data" };
+    }
+    await saveCheckpoint(runId, domain, "scraped");
+    // ── Stage 2: Pain signal detection + Gate 2 ────────────────
+    const painResult = await detectPainSignals(
+      websiteData.name ?? domain,
+      industry,
+      websiteData.employeeCount ?? null,
+      websiteData.text ?? "",
+      websiteData.html ?? "",
+      traceId
+    );
+    // Gate 2: minimum 2 pain signals OR service match
+    if (painResult.painSignals.length < 2 && !painResult.serviceMatch) {
+      await saveCheckpoint(runId, domain, "completed", { reason: "gate2_failed" });
+      return { status: "skipped", reason: "gate2_failed" };
+    }
+    await saveCheckpoint(runId, domain, "filtered");
+    // ── Trigger Task 3: Enrich contacts ────────────────────────
+    const enrichResult = await enrichAndProfile.trigger({
+      domain,
+      runId,
+      traceId,
+      industry,
+      city,
+      country,
+      companyName: websiteData.name ?? domain,
+      employeeCount: websiteData.employeeCount ?? null,
+      description: websiteData.description ?? "",
+      websiteText: (websiteData.text ?? "").slice(0, 800),
+      websiteHtml: (websiteData.html ?? "").slice(0, 5000),
+      techStack: websiteData.techStack ?? [],
+      aiJobCount: websiteData.aiJobCount ?? 0,
+      linkedInUrl: websiteData.linkedinUrl ?? null,
+      painSignals: painResult.painSignals.map(p => p.signal),
+      serviceMatch: painResult.serviceMatch,
+      matchConfidence: painResult.matchConfidence,
+    });
+    return { status: "passed_to_enrichment", domain };
+  },
+});
+// ═══════════════════════════════════════════════════════════════
+// TASK 3: Enrich Contacts + AI Profile + Score (combined)
+// ═══════════════════════════════════════════════════════════════
+export const enrichAndProfile = task({
+  id: "enrich-and-profile",
+  retry: {
+    maxAttempts: 2,
+    minTimeoutInMs: 3_000,
+    maxTimeoutInMs: 20_000,
+    factor: 2,
+  },
+  maxDuration: 180, // 3 minutes (email verification can be slow)
+  run: async (payload: {
+    domain: string;
+    runId: string;
+    traceId: string;
+    industry: string;
+    city: string;
+    country: string;
+    companyName: string;
+    employeeCount: number | null;
+    description: string;
+    websiteText: string;
+    websiteHtml: string;
+    techStack: string[];
+    aiJobCount: number;
+    linkedInUrl: string | null;
+    painSignals: string[];
+    serviceMatch: string | null;
+    matchConfidence: number;
+  }) => {
+    const db = getSupabaseClient();
+    const env = getEnv();
+    // ── Step 1: Enrich contacts ──────────────────────────────
+    const contacts = await enrichContacts(
+      "",
+      payload.domain,
+      payload.companyName,
+      payload.employeeCount,
+      payload.industry,
+      payload.websiteText.slice(0, 300),
+      payload.websiteHtml,
+      payload.linkedInUrl,
+      payload.traceId
+    );
+    if (contacts.length === 0) {
+      await saveCheckpoint(payload.runId, payload.domain, "completed", { reason: "no_contacts" });
+      return { status: "skipped", reason: "no_contacts" };
+    }
+    // Must have authority-confirmed contact
+    const authorityContacts = contacts.filter(c => c.authorityConfirmed);
+    if (authorityContacts.length === 0) {
+      await saveCheckpoint(payload.runId, payload.domain, "completed", { reason: "no_authority" });
+      return { status: "skipped", reason: "no_authority_contacts" };
+    }
+    await saveCheckpoint(payload.runId, payload.domain, "emails_verified");
+    // ── Step 2: Save company ─────────────────────────────────
+    const companyId = randomUUID();
+    await db.from("companies").upsert({
+      id: companyId,
+      domain: payload.domain,
+      name: payload.companyName,
+      industry: payload.industry,
+      employee_count: payload.employeeCount,
+      description: payload.description,
+      website_status: "active",
+      linkedin_url: payload.linkedInUrl,
+      tech_stack: payload.techStack,
+      country: payload.country,
+      city: payload.city,
+      service_match: payload.serviceMatch,
+      service_match_score: Math.round(payload.matchConfidence * 100),
+      pain_signals: payload.painSignals,
+      trace_id: payload.traceId,
+    }, { onConflict: "domain" });
+    // Update contacts with company_id
+    for (const contact of contacts) {
+      await db.from("contacts").update({ company_id: companyId }).eq("id", contact.id);
+    }
+    // ── Step 3: AI Profile + Score (Python service) ──────────
+    const profileResponse = await axios.post(
+      `${env.PYTHON_AI_SERVICE_URL}/profile`,
+      {
+        company: {
+          id: companyId,
+          name: payload.companyName,
+          industry: payload.industry,
+          employee_count: payload.employeeCount,
+          description: payload.description,
+          website_text: payload.websiteText,
+          linkedin_description: "",
+          tech_stack: payload.techStack,
+          ai_job_count: payload.aiJobCount,
+          pain_signals: payload.painSignals,
+          service_match: payload.serviceMatch,
+        },
+        contacts: contacts.map(c => ({
+          full_name: c.fullName,
+          email: c.email,
+          email_verified: c.emailVerification?.status === "verified_deliverable",
+          linkedin_personal_url: c.linkedinPersonalUrl,
+          social_profiles: c.socialProfiles ?? {},
+        })),
+        trace_id: payload.traceId,
+      },
+      {
+        headers: { Authorization: `Bearer ${env.PYTHON_AI_SERVICE_SECRET}` },
+        timeout: 45_000,
+      }
+    );
+    const profile = profileResponse.data?.profile;
+    const score = profileResponse.data?.score;
+    const totalScore = score?.total_score ?? 0;
+    const tier = score?.tier ?? "archive";
+    // Save profile and score
+    await db.from("lead_profiles").upsert({
+      company_id: companyId,
+      ...profile,
+    }, { onConflict: "company_id" });
+    await db.from("lead_scores").upsert({
+      company_id: companyId,
+      ...score,
+    }, { onConflict: "company_id" });
+    // Update run stats
+    if (totalScore >= 70) {
+      await db.rpc("increment_run_leads", { run_id: payload.runId });
+    }
+    await saveCheckpoint(payload.runId, payload.domain, "completed");
+    // ── Step 4: Hot lead alert (85+) ─────────────────────────
+    if (totalScore >= 85) {
+      const best = authorityContacts[0];
+      await sendHotLeadAlert({
+        companyName: payload.companyName,
+        domain: payload.domain,
+        industry: payload.industry,
+        employeeCount: payload.employeeCount,
+        city: payload.city,
+        score: totalScore,
+        tier,
+        contactName: best.fullName,
+        contactTitle: best.title ?? "",
+        email: best.email,
+        emailVerified: best.emailVerification?.status === "verified_deliverable",
+        linkedinPersonal: best.linkedinPersonalUrl,
+        linkedinCompany: payload.linkedInUrl,
+        serviceMatch: payload.serviceMatch,
+        outreachAngle: profile?.outreach_angle ?? "",
+        painPoints: (profile?.pain_points ?? []).slice(0, 3),
+        socialProfiles: best.socialProfiles ?? {},
+      });
+    }
+    recordOperation(payload.traceId, "enrich_and_profile", score?.tokens_used ?? 0, 0, true);
+    return {
+      status: "completed",
+      domain: payload.domain,
+      score: totalScore,
+      tier,
+      contactsFound: contacts.length,
+      authorityConfirmed: authorityContacts.length,
+      hasLinkedIn: contacts.some(c => c.linkedinPersonalUrl),
+    };
+  },
+});
+// ═══════════════════════════════════════════════════════════════
+// TASK 4: Daily Digest (CRON — runs at 6:30 AM UTC = 11:30 AM PKT)
+// ═══════════════════════════════════════════════════════════════
+export const dailyDigestTask = schedules.task({
+  id: "daily-digest",
+  // Cron configured in Trigger.dev dashboard: 30 6 * * * (6:30 AM UTC)
+  maxDuration: 60,
+  run: async () => {
+    const db = getSupabaseClient();
+    const today = new Date();
+    today.setHours(0, 0, 0, 0);
+    // Get today's run stats
+    const { data: runs } = await db
+      .from("discovery_runs")
+      .select("*")
+      .gte("ran_at", today.toISOString());
+    const latestRun = runs?.[0];
+    if (!latestRun) {
+      logger.info("No runs today — skipping digest");
+      return;
+    }
+    // Count today's leads by tier
+    const { data: scores } = await db
+      .from("lead_scores")
+      .select("total_score, tier")
+      .gte("created_at", today.toISOString());
+    const hotLeads = scores?.filter(s => s.tier === "hot").length ?? 0;
+    const warmLeads = scores?.filter(s => s.tier === "warm").length ?? 0;
+    const nurtureLeads = scores?.filter(s => s.tier === "nurture").length ?? 0;
+    // Get token usage
+    const { data: traces } = await db
+      .from("llm_traces")
+      .select("total_tokens")
+      .gte("created_at", today.toISOString());
+    const totalTokens = traces?.reduce((sum, t) => sum + (t.total_tokens ?? 0), 0) ?? 0;
+    await sendDailyDigest({
+      territory: `${latestRun.city}, ${latestRun.country_code}`,
+      industry: latestRun.industry,
+      companiesSearched: latestRun.companies_found ?? 0,
+      leadsQualified: (scores?.length ?? 0),
+      hotLeads,
+      warmLeads,
+      nurtureLeads,
+      tokensUsed: totalTokens,
+      durationMinutes: latestRun.completed_at
+        ? Math.round((new Date(latestRun.completed_at).getTime() - new Date(latestRun.ran_at).getTime()) / 60_000)
+        : 0,
+    });
+    return { sent: true, leads: scores?.length ?? 0 };
+  },
+});
+// ═══════════════════════════════════════════════════════════════
+// TASK 5: Manual Discovery (triggered from Slack)
+// ═══════════════════════════════════════════════════════════════
+export const manualDiscoveryTask = task({
+  id: "manual-discovery",
+  maxDuration: 300,
+  run: async (payload: {
+    region: string;
+    industry: string;
+    maxCompanies: number;
+    triggeredBy: string;
+  }) => {
+    // Reuse the daily scheduler logic but with custom territory
+    logger.info({ payload }, "Manual discovery triggered from Slack");
+    // TODO: Build custom territory from region param
+    // For now, trigger the same pipeline
+    return { status: "manual_run_started", ...payload };
+  },
+});

src/discovery/trigger-tasks/manual-discovery.ts ADDED Viewed

	@@ -0,0 +1,139 @@

+import { task } from "@trigger.dev/sdk/v3";
+import { z } from "zod";
+import { getSupabaseClient } from "../../shared/supabase/client";
+import { logger } from "../../shared/utils/logger";
+import { loadIcpConfig, applyHardFilters, applySignalFilters } from "../lib/icp-filter";
+import { isDuplicate, isSuppressed } from "../lib/deduplicator";
+import { scrapeCompanyWebsite } from "../lib/web-scraper";
+import { scrapeLinkedInCompany } from "../lib/linkedin-scraper";
+import { normalizeCompany } from "../lib/normalizer";
+import { enrichContacts } from "../lib/contact-enricher";
+import { searchCompanies } from "../providers/serper";
+import { getRegionConfig } from "../lib/rotation";
+// ─── Input schema ─────────────────────────────────────────────
+const ManualDiscoveryInput = z.object({
+  region: z.enum(["US", "UK", "AU", "UAE", "SA", "SG"]),
+  industry: z.string().optional(),
+  customKeywords: z.array(z.string()).optional(),
+  maxCompanies: z.number().min(1).max(50).default(20),
+  triggeredBy: z.string().default("manual"), // slack username or "api"
+});
+export type ManualDiscoveryInput = z.infer<typeof ManualDiscoveryInput>;
+// ─── Manual Discovery Task ────────────────────────────────────
+export const manualDiscoveryTask = task({
+  id: "manual-discovery",
+  maxDuration: 1800, // 30 min max
+  run: async (payload: ManualDiscoveryInput) => {
+    const input = ManualDiscoveryInput.parse(payload);
+    logger.info({ input }, "🎯 Manual discovery started");
+    const icp = await loadIcpConfig();
+    const regionConfig = getRegionConfig(input.region);
+    const keywords = input.customKeywords?.length
+      ? input.customKeywords
+      : icp.keywords;
+    const industries = input.industry
+      ? [input.industry]
+      : regionConfig.industries.slice(0, 3); // limit to 3 for manual runs
+    let totalDiscovered = 0;
+    let totalQualified = 0;
+    for (const industry of industries) {
+      const results = await searchCompanies(input.region, industry, keywords);
+      const capped = results.slice(0, input.maxCompanies);
+      for (const result of capped) {
+        const status = await processManualCompany(result, input.region, icp, industry);
+        if (status !== "skip") totalDiscovered++;
+        if (status === "qualified") totalQualified++;
+      }
+    }
+    // ── Audit log ─────────────────────────────────────────────
+    const db = getSupabaseClient();
+    await db.from("audit_log").insert({
+      action: "manual_discovery_completed",
+      entity_type: "discovery_run",
+      entity_id: null,
+      actor: input.triggeredBy,
+      details: {
+        region: input.region,
+        industry: input.industry ?? "all",
+        totalDiscovered,
+        totalQualified,
+      },
+    });
+    logger.info({ totalDiscovered, totalQualified }, "✅ Manual discovery completed");
+    return { region: input.region, totalDiscovered, totalQualified };
+  },
+});
+// ─── Processing pipeline (same logic as auto, extracted) ─────
+async function processManualCompany(
+  result: { domain: string; title: string; link: string; snippet: string },
+  region: string,
+  icp: Awaited<ReturnType<typeof loadIcpConfig>>,
+  industry: string
+): Promise<"skip" | "new" | "qualified"> {
+  const { domain } = result;
+  const db = getSupabaseClient();
+  if (await isSuppressed(domain)) return "skip";
+  const { isDupe } = await isDuplicate(domain, result.title);
+  if (isDupe) return "skip";
+  const website = await scrapeCompanyWebsite(domain);
+  const gate1 = applyHardFilters(website, icp, region);
+  if (!gate1.passed) return "skip";
+  const gate2 = applySignalFilters(website, icp);
+  let linkedin = null;
+  if (website.linkedinUrl) {
+    linkedin = await scrapeLinkedInCompany(website.linkedinUrl).catch(() => null);
+  }
+  const normalized = normalizeCompany(result as any, website, linkedin, region, "manual");
+  const { data: saved, error } = await db
+    .from("companies")
+    .insert({ ...normalized, industry })
+    .select("id")
+    .single();
+  if (error || !saved) return "skip";
+  if (!gate2.passed) {
+    await db.from("companies").update({ status: "nurture" }).eq("id", saved.id);
+    return "new";
+  }
+  const decisionMakers = linkedin?.decisionMakers ?? [];
+  const contactsSaved = await enrichContacts(saved.id, domain, decisionMakers);
+  await db.from("companies").update({ status: "profiled" }).eq("id", saved.id);
+  if (contactsSaved > 0) {
+    const { profilingTask } = await import("../../profiling/trigger-tasks/profiling-router");
+    await profilingTask.trigger({
+      company_id: saved.id,
+      domain,
+      name: normalized.name,
+      region,
+      source: "manual",
+    });
+    return "qualified";
+  }
+  return "new";
+}

src/profiling/python-service/config.py ADDED Viewed

	@@ -0,0 +1,25 @@

+import os
+from dotenv import load_dotenv
+from pydantic_settings import BaseSettings
+load_dotenv()
+class Settings(BaseSettings):
+    # Supabase
+    SUPABASE_URL: str
+    SUPABASE_SERVICE_ROLE_KEY: str
+    # LLM (All on NVIDIA NIM — FREE)
+    NVIDIA_API_KEY: str
+    NVIDIA_NIM_BASE_URL: str = "https://integrate.api.nvidia.com/v1"
+    # Service auth
+    PYTHON_AI_SERVICE_SECRET: str
+    # Config
+    LOG_LEVEL: str = "INFO"
+    class Config:
+        env_file = "../../../.env"
+settings = Settings()

src/profiling/python-service/hallucination_guard.py ADDED Viewed

	@@ -0,0 +1,137 @@

+"""
+Hallucination Guard v2 — Grounded Verification
+Old approach: "Ask LLM for confidence" → LLM grades own exam → useless
+New approach: Cross-reference every claim against evidence → real verification
+Every LLM output field is checked:
+- Employee count → matches scraped data?
+- Industry → matches detected industry?
+- AI readiness "high" → do we actually have AI job postings?
+- PII in output → strip immediately
+"""
+import re
+import logging
+logger = logging.getLogger(__name__)
+def validate_profile_grounded(profile: dict, evidence: dict) -> dict:
+    """
+    Cross-check profile output against evidence.
+    Returns grounding result with corrections.
+    """
+    verified = []
+    unverified = []
+    corrections = {}
+    # ── Employee count ────────────────────────────────────────
+    summary = str(profile.get("profile_summary", ""))
+    known_emp = evidence.get("employee_count")
+    emp_match = re.search(r'(\d[\d,]+)\s*(employees?|people|staff)', summary, re.I)
+    if emp_match and known_emp:
+        claimed = int(emp_match.group(1).replace(",", ""))
+        if abs(claimed - known_emp) > known_emp * 0.3:
+            corrections["employee_count"] = {"claimed": claimed, "actual": known_emp}
+            verified.append("employee_count_corrected")
+        else:
+            verified.append("employee_count_accurate")
+    # ── AI readiness vs actual signals ────────────────────────
+    claimed_readiness = profile.get("ai_readiness", "")
+    ai_jobs = evidence.get("ai_job_count", 0)
+    tech_stack = evidence.get("tech_stack", [])
+    if claimed_readiness == "high" and ai_jobs == 0 and len(tech_stack) == 0:
+        corrections["ai_readiness"] = {"claimed": "high", "actual": "low"}
+        verified.append("ai_readiness_corrected")
+    elif claimed_readiness == "low" and ai_jobs >= 3:
+        corrections["ai_readiness"] = {"claimed": "low", "actual": "high"}
+        verified.append("ai_readiness_corrected")
+    else:
+        verified.append("ai_readiness_plausible")
+    # ── Company name in summary ───────────────────────────────
+    known_name = evidence.get("name", "")
+    if known_name and len(known_name) > 3:
+        name_words = known_name.lower().split()
+        summary_lower = summary.lower()
+        if any(w in summary_lower for w in name_words if len(w) > 2):
+            verified.append("company_name_present")
+        else:
+            unverified.append("company_name_may_differ")
+    # ── Evidence claims ───────────────────────────────────────
+    evidence_used = profile.get("evidence_used", [])
+    if isinstance(evidence_used, list):
+        all_evidence_text = " ".join([
+            str(evidence.get("website_text", "")),
+            " ".join(evidence.get("tech_stack", [])),
+            " ".join(evidence.get("pain_signals", [])),
+            str(evidence.get("description", "")),
+        ]).lower()
+        for claim in evidence_used:
+            claim_words = str(claim).lower().split()[:4]
+            if any(w in all_evidence_text for w in claim_words if len(w) > 3):
+                verified.append(f"evidence_grounded: {str(claim)[:30]}")
+            else:
+                unverified.append(f"evidence_unverifiable: {str(claim)[:30]}")
+    # ── PII check ─────────────────────────────────────────────
+    output_str = str(profile)
+    email_found = re.search(r'[\w.+-]+@[\w-]+\.[a-z]{2,}', output_str)
+    phone_found = re.search(r'\+?\d[\d\s\-().]{8,}', output_str)
+    if email_found:
+        unverified.append("pii_email_in_output")
+    if phone_found:
+        unverified.append("pii_phone_in_output")
+    # ── Grounding score ───────────────────────────────────────
+    total = len(verified) + len(unverified)
+    grounding_score = len(verified) / total if total > 0 else 0.5
+    result = {
+        "is_grounded": grounding_score >= 0.6,
+        "grounding_score": round(grounding_score, 2),
+        "verified_claims": verified,
+        "unverified_claims": unverified,
+        "corrections": corrections,
+    }
+    if not result["is_grounded"]:
+        logger.warning(f"Profile failed grounding: score={grounding_score:.2f}, corrections={len(corrections)}")
+    return result
+def validate_score_grounded(score: dict, profile: dict) -> dict:
+    """Validate scoring output for consistency."""
+    issues = []
+    total = score.get("total_score", -1)
+    if not (0 <= total <= 100):
+        issues.append(f"invalid_total_score:{total}")
+    tier = score.get("tier")
+    if tier not in ("hot", "warm", "nurture", "archive"):
+        issues.append(f"invalid_tier:{tier}")
+    # Cross-check tier vs score
+    expected_tier = (
+        "hot" if total >= 85 else
+        "warm" if total >= 70 else
+        "nurture" if total >= 50 else
+        "archive"
+    )
+    if tier != expected_tier:
+        issues.append(f"tier_score_mismatch: score={total} tier={tier} expected={expected_tier}")
+        score["tier"] = expected_tier  # auto-correct
+    return {
+        "is_valid": len(issues) == 0,
+        "issues": issues,
+    }

src/profiling/python-service/main.py ADDED Viewed

	@@ -0,0 +1,148 @@

+"""
+FastAPI Profiling Service v2 — NVIDIA NIM powered.
+Endpoints:
+  POST /profile  → Profile company + compute score (single pipeline)
+  GET  /health   → Service health check
+Security:
+  Bearer token authentication (shared secret with Node.js orchestration layer)
+"""
+import logging
+from contextlib import asynccontextmanager
+from fastapi import FastAPI, HTTPException, Depends
+from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
+from pydantic import BaseModel
+from typing import Optional
+from config import settings
+from profiler import generate_profile
+from scorer import compute_score
+from hallucination_guard import validate_score_grounded
+logging.basicConfig(level=getattr(logging, settings.LOG_LEVEL.upper(), logging.INFO))
+logger = logging.getLogger(__name__)
+# ─── Auth ─────────────────────────────────────────────────────
+security = HTTPBearer()
+def verify_token(credentials: HTTPAuthorizationCredentials = Depends(security)):
+    if credentials.credentials != settings.PYTHON_AI_SERVICE_SECRET:
+        raise HTTPException(status_code=401, detail="Invalid authentication")
+    return True
+# ─── Models ───────────────────────────────────────────────────
+class CompanyInput(BaseModel):
+    id: Optional[str] = None
+    name: str
+    industry: str = ""
+    employee_count: Optional[int] = None
+    description: str = ""
+    website_text: str = ""
+    linkedin_description: str = ""
+    tech_stack: list[str] = []
+    ai_job_count: int = 0
+    pain_signals: list[str] = []
+    service_match: Optional[str] = None
+class ContactInput(BaseModel):
+    full_name: str = ""
+    email: Optional[str] = None
+    email_verified: bool = False
+    linkedin_personal_url: Optional[str] = None
+    social_profiles: dict = {}
+class ProfileRequest(BaseModel):
+    company: CompanyInput
+    contacts: list[ContactInput] = []
+    trace_id: str = ""
+# ─── App ──────────────────────────────────────────────────────
+@asynccontextmanager
+async def lifespan(app: FastAPI):
+    logger.info("🚀 AI Profiling Service v2 starting...")
+    logger.info(f"   NVIDIA NIM: {settings.NVIDIA_NIM_BASE_URL}")
+    logger.info(f"   Models: GPT OSS → Gemma 3 → LLaMA 70B → LLaMA 8B → Deterministic")
+    yield
+    logger.info("AI Profiling Service shutting down")
+app = FastAPI(
+    title="AI Lead Profiling Service",
+    version="2.0.0",
+    lifespan=lifespan,
+)
+# ─── Endpoints ────────────────────────────────────────────────
+@app.get("/health")
+async def health():
+    return {
+        "status": "healthy",
+        "version": "2.0.0",
+        "models": {
+            "primary": "nvidia/llama-3.1-nemotron-ultra-253b-v1",
+            "secondary": "google/gemma-3-27b-it",
+            "tertiary": "meta/llama-3.3-70b-instruct",
+            "fast": "meta/llama-3.1-8b-instruct",
+        },
+    }
+@app.post("/profile")
+async def profile_company(request: ProfileRequest, _auth: bool = Depends(verify_token)):
+    """
+    Full profiling pipeline:
+    1. LLM generates profile (chain-of-thought, grounded)
+    2. LLM extracts signals for scoring
+    3. Code computes score deterministically
+    4. Both are validated for hallucinations
+    """
+    company_data = request.company.model_dump()
+    contacts_data = [c.model_dump() for c in request.contacts]
+    trace_id = request.trace_id
+    try:
+        # Step 1: Generate profile (LLM with grounding)
+        profile = await generate_profile(company_data, trace_id)
+        # Step 2: Compute score (LLM extracts signals → code computes)
+        score = await compute_score(company_data, profile, contacts_data, trace_id)
+        # Step 3: Validate score consistency
+        score_validation = validate_score_grounded(score, profile)
+        if not score_validation["is_valid"]:
+            logger.warning(f"Score validation issues: {score_validation['issues']}")
+        return {
+            "profile": profile,
+            "score": score,
+            "validation": {
+                "profile_grounded": profile.get("grounding_score", 0),
+                "profile_consistent": profile.get("is_consistent", True),
+                "score_valid": score_validation["is_valid"],
+                "score_issues": score_validation.get("issues", []),
+            },
+            "meta": {
+                "model_used": profile.get("llm_model", "unknown"),
+                "is_fallback": profile.get("is_fallback", False),
+                "tokens_used": profile.get("tokens_used", 0),
+                "trace_id": trace_id,
+            },
+        }
+    except Exception as e:
+        logger.error(f"Profiling failed for {company_data.get('name')}: {e}")
+        raise HTTPException(status_code=500, detail=str(e))
+# ─── Run ──────────────────────────────────────────────────────
+if __name__ == "__main__":
+    import uvicorn
+    uvicorn.run("main:app", host="0.0.0.0", port=8000, reload=True)

src/profiling/python-service/nvidia_client.py ADDED Viewed

	@@ -0,0 +1,254 @@

+"""
+Multi-Model LLM Client (Python) — All FREE on NVIDIA NIM
+3 models, 1 provider, 1 API key, $0 cost:
+  1. MiniMax M2.7    → Best reasoning, 4M context, built-in CoT
+  2. LLaMA 3.3 70B   → Reliable fallback
+  3. LLaMA 3.1 8B    → Fast, simple tasks
+  4. Deterministic    → Zero LLM fallback
+"""
+import time
+import json
+import hashlib
+import logging
+from typing import Optional
+from openai import AsyncOpenAI
+from config import settings
+logger = logging.getLogger(__name__)
+# ─── Model configs (ALL on NVIDIA NIM) ───────────────────────
+MODEL_CONFIGS = [
+    {
+        "name": "MiniMax M2.7",
+        "model": "minimaxai/minimax-m2.7",
+        "max_context": 4_000_000,
+        "best_for": "profiling, scoring, complex reasoning",
+    },
+    {
+        "name": "LLaMA 3.3 70B",
+        "model": "meta/llama-3.3-70b-instruct",
+        "max_context": 128_000,
+        "best_for": "general tasks, reliable fallback",
+    },
+    {
+        "name": "LLaMA 3.1 8B",
+        "model": "meta/llama-3.1-8b-instruct",
+        "max_context": 128_000,
+        "best_for": "email classification, simple checks",
+    },
+]
+# ─── Shared client (single provider) ─────────────────────────
+_client: Optional[AsyncOpenAI] = None
+def get_client() -> AsyncOpenAI:
+    global _client
+    if _client is None:
+        _client = AsyncOpenAI(
+            base_url=settings.NVIDIA_NIM_BASE_URL,
+            api_key=settings.NVIDIA_API_KEY,
+        )
+    return _client
+# ─── Main LLM call ───────────────────────────────────────────
+async def call_llm(
+    operation: str,
+    system_prompt: str,
+    user_prompt: str,
+    model_index: int = 0,
+    temperature: float = 0.2,
+    max_tokens: int = 1024,
+    json_mode: bool = True,
+    trace_id: str = "",
+    company_id: str = None,
+) -> dict:
+    """Call LLM with fallback: MiniMax → LLaMA 70B → LLaMA 8B → Deterministic"""
+    if model_index >= len(MODEL_CONFIGS):
+        logger.error(f"ALL models failed for {operation} — deterministic fallback")
+        return _deterministic_fallback()
+    config = MODEL_CONFIGS[model_index]
+    client = get_client()
+    start = time.time()
+    try:
+        kwargs = {
+            "model": config["model"],
+            "messages": [
+                {"role": "system", "content": system_prompt},
+                {"role": "user", "content": user_prompt},
+            ],
+            "temperature": temperature,
+            "max_tokens": max_tokens,
+            "top_p": 0.9,
+        }
+        if json_mode:
+            kwargs["response_format"] = {"type": "json_object"}
+        response = await client.chat.completions.create(**kwargs)
+        message = response.choices[0].message
+        content = message.content or ""
+        reasoning = getattr(message, "reasoning_content", None)
+        usage = response.usage
+        latency_ms = int((time.time() - start) * 1000)
+        parsed = _safe_parse_json(content) if json_mode else None
+        if json_mode and parsed is None:
+            logger.warning(f"JSON parse failed on {config['name']} — next model")
+            return await call_llm(operation, system_prompt, user_prompt,
+                                   model_index + 1, temperature, max_tokens,
+                                   json_mode, trace_id, company_id)
+        result = {
+            "content": content,
+            "reasoning": reasoning,
+            "parsed": parsed,
+            "model": config["name"],
+            "provider": "nvidia",
+            "tokens": {
+                "prompt": usage.prompt_tokens if usage else 0,
+                "completion": usage.completion_tokens if usage else 0,
+                "total": usage.total_tokens if usage else 0,
+            },
+            "latency_ms": latency_ms,
+            "fallback_used": False,
+        }
+        if reasoning:
+            logger.debug(f"MiniMax reasoning: {reasoning[:150]}...")
+        await _log_trace(trace_id, operation, config["name"], result, True, company_id)
+        return result
+    except Exception as e:
+        error_msg = str(e)
+        if "429" in error_msg:
+            logger.warning(f"Rate limited on {config['name']} — waiting 10s")
+            await _async_sleep(10)
+            return await call_llm(operation, system_prompt, user_prompt,
+                                   model_index, temperature, max_tokens,
+                                   json_mode, trace_id, company_id)
+        logger.warning(f"{config['name']} failed ({error_msg[:80]}) — next model")
+        return await call_llm(operation, system_prompt, user_prompt,
+                               model_index + 1, temperature, max_tokens,
+                               json_mode, trace_id, company_id)
+def _deterministic_fallback() -> dict:
+    return {
+        "content": "",
+        "reasoning": None,
+        "parsed": None,
+        "model": "deterministic_fallback",
+        "provider": "none",
+        "tokens": {"prompt": 0, "completion": 0, "total": 0},
+        "latency_ms": 0,
+        "fallback_used": True,
+    }
+# ─── Self-consistency check ──────────────────────────────────
+async def call_with_consistency(
+    operation: str,
+    system_prompt: str,
+    user_prompt: str,
+    trace_id: str = "",
+    company_id: str = None,
+) -> dict:
+    primary = await call_llm(operation, system_prompt, user_prompt,
+                              temperature=0.1, trace_id=trace_id, company_id=company_id)
+    if operation not in ("profile", "score"):
+        return {**primary, "is_consistent": True, "consistency_score": 1.0}
+    if primary.get("fallback_used"):
+        return {**primary, "is_consistent": True, "consistency_score": 0.5}
+    # MiniMax with reasoning = inherently more consistent
+    if primary.get("model") == "MiniMax M2.7" and primary.get("reasoning"):
+        return {**primary, "is_consistent": True, "consistency_score": 0.95}
+    secondary = await call_llm(operation, system_prompt, user_prompt,
+                                temperature=0.4, trace_id=trace_id, company_id=company_id)
+    score = _compare_outputs(primary.get("parsed"), secondary.get("parsed"))
+    return {**primary, "is_consistent": score >= 0.75, "consistency_score": score}
+def _compare_outputs(a: dict, b: dict) -> float:
+    if not a or not b:
+        return 0.5
+    matches = 0
+    total = 0
+    for key in ["ai_readiness", "tier", "service_match"]:
+        if key in a and key in b:
+            total += 1
+            if a[key] == b[key]:
+                matches += 1
+    for key in ["total_score", "company_fit"]:
+        av = a.get(key)
+        bv = b.get(key)
+        if isinstance(av, (int, float)) and isinstance(bv, (int, float)):
+            total += 1
+            if abs(av - bv) <= 10:
+                matches += 1
+    return matches / total if total > 0 else 1.0
+# ─── Helpers ─────────────────────────────────────────────────
+def _safe_parse_json(text: str) -> Optional[dict]:
+    content = text.strip()
+    if "```json" in content:
+        content = content.split("```json")[1].split("```")[0].strip()
+    elif "```" in content:
+        content = content.split("```")[1].split("```")[0].strip()
+    try:
+        return json.loads(content)
+    except json.JSONDecodeError:
+        import re
+        match = re.search(r'\{[\s\S]*\}', content)
+        if match:
+            try:
+                return json.loads(match.group())
+            except json.JSONDecodeError:
+                return None
+        return None
+async def _log_trace(trace_id, operation, model, result, success, company_id):
+    try:
+        from supabase import create_client
+        sb = create_client(settings.SUPABASE_URL, settings.SUPABASE_SERVICE_ROLE_KEY)
+        sb.table("llm_traces").insert({
+            "trace_id": trace_id,
+            "operation": operation,
+            "model": model,
+            "provider": "nvidia",
+            "prompt_tokens": result["tokens"]["prompt"] if result else 0,
+            "completion_tokens": result["tokens"]["completion"] if result else 0,
+            "total_tokens": result["tokens"]["total"] if result else 0,
+            "latency_ms": result.get("latency_ms", 0) if result else 0,
+            "success": success,
+            "fallback_used": result.get("fallback_used", True) if result else True,
+            "company_id": company_id,
+        }).execute()
+    except Exception as e:
+        logger.debug(f"Trace log failed (non-critical): {e}")
+async def _async_sleep(seconds: int):
+    import asyncio
+    await asyncio.sleep(seconds)

src/profiling/python-service/profiler.py ADDED Viewed

	@@ -0,0 +1,212 @@

+"""
+Profiler — Production-grade company profiling using NVIDIA NIM.
+Key differences from v1:
+1. Chain-of-thought reasoning forced (Step 1-5 before JSON)
+2. Few-shot examples (2 real-world examples in prompt)
+3. Grounding instruction ("UNKNOWN" for missing data)
+4. Evidence tracking (what data supported each claim)
+5. Deterministic fallback (zero hallucination when LLM fails)
+"""
+import logging
+from nvidia_client import call_with_consistency, MODELS
+from hallucination_guard import validate_profile_grounded
+logger = logging.getLogger(__name__)
+# ─── System prompt ────────────────────────────────────────────
+SYSTEM_PROMPT = """You are a business analyst for an AI automation agency.
+Your job: analyze a company and identify WHERE our AI services can help them.
+CRITICAL RULES:
+- Only state facts supported by the provided evidence
+- Write "UNKNOWN" for anything not in the data — NEVER guess
+- Your analysis determines whether a real salesperson contacts this company
+- Wrong analysis = wasted human time = unacceptable
+- Think step by step before concluding"""
+# ─── User prompt builder ─────────────────────────────────────
+def build_profile_prompt(data: dict) -> str:
+    return f"""ANALYZE THIS COMPANY:
+Name: {data.get('name', 'UNKNOWN')}
+Industry: {data.get('industry', 'UNKNOWN')}
+Employees: {data.get('employee_count', 'UNKNOWN')}
+Description: {(data.get('description') or 'NONE PROVIDED')[:400]}
+Website excerpt:
+{(data.get('website_text') or '')[:600]}
+LinkedIn description:
+{data.get('linkedin_description') or 'NONE'}
+Tech stack detected: {', '.join(data.get('tech_stack', [])) or 'NONE DETECTED'}
+Job postings mentioning AI/automation: {data.get('ai_job_count', 0)}
+Pain signals detected: {', '.join(data.get('pain_signals', [])) or 'NONE'}
+Service match suggestion: {data.get('service_match') or 'NONE'}
+STEP-BY-STEP ANALYSIS:
+Step 1: What does this company actually DO? (2 sentences, facts only)
+Step 2: What are their likely daily operational challenges? (based on industry + size)
+Step 3: What specific AI automation would save them time/money? (be specific)
+Step 4: Who in this organization would approve buying this service?
+Step 5: What outreach angle would resonate with this specific person?
+After reasoning through steps 1-5, output this JSON:
+{{
+  "profile_summary": "2-3 factual sentences about what this company does",
+  "pain_points": ["specific pain 1", "specific pain 2"],
+  "ai_use_case": "The single most compelling AI use case for them",
+  "ai_readiness": "low|medium|high",
+  "decision_maker_reasoning": "Who likely makes purchasing decisions and why",
+  "outreach_angle": "One specific sentence — the hook for first contact",
+  "confidence": 0.0,
+  "evidence_used": ["list which data points you relied on"],
+  "evidence_missing": ["list what data you wished you had"]
+}}
+EXAMPLE 1 (dental clinic, 6 employees):
+{{
+  "profile_summary": "ABC Dental is a 6-person dental practice in Houston offering general and cosmetic dentistry. They display their phone number prominently and use a basic contact form for appointments.",
+  "pain_points": ["Manual phone-based appointment scheduling during business hours only", "No after-hours patient communication capability"],
+  "ai_use_case": "AI receptionist to handle appointment booking, reminders, and after-hours calls",
+  "ai_readiness": "low",
+  "decision_maker_reasoning": "Practice owner (Dr. Smith, DDS) makes all purchasing decisions. Small practice = owner controls budget directly.",
+  "outreach_angle": "Stop losing patients to voicemail — our AI receptionist books appointments 24/7",
+  "confidence": 0.82,
+  "evidence_used": ["phone number on homepage", "contact form only", "6 staff listed", "no chatbot detected"],
+  "evidence_missing": ["annual revenue", "number of daily calls"]
+}}
+EXAMPLE 2 (manufacturing company, 150 employees):
+{{
+  "profile_summary": "XYZ Manufacturing is a UK-based manufacturer of industrial valves with 150 employees. They use SAP for ERP and are hiring a Data Analyst.",
+  "pain_points": ["Manual data extraction from legacy SAP system", "Production reporting requires manual spreadsheet compilation"],
+  "ai_use_case": "Automated reporting pipeline that extracts SAP data and generates dashboards",
+  "ai_readiness": "medium",
+  "decision_maker_reasoning": "Operations Director manages the data team and would champion this internally. CTO signs off on tech purchases.",
+  "outreach_angle": "Your Data Analyst job posting tells us you're drowning in manual SAP reports — we automate that entirely",
+  "confidence": 0.88,
+  "evidence_used": ["SAP detected in tech stack", "Data Analyst job posting", "150 employees"],
+  "evidence_missing": ["specific SAP modules used", "current reporting frequency"]
+}}"""
+# ─── Main profiling function ─────────────────────────────────
+async def generate_profile(company_data: dict, trace_id: str = "") -> dict:
+    """
+    Generate LLM profile with consistency checking and grounding.
+    Returns cleaned, grounded profile or deterministic fallback.
+    """
+    prompt = build_profile_prompt(company_data)
+    # Call with consistency check (2 temperatures, compare)
+    result = await call_with_consistency(
+        operation="profile",
+        system_prompt=SYSTEM_PROMPT,
+        user_prompt=prompt,
+        trace_id=trace_id,
+        company_id=company_data.get("id"),
+    )
+    # All models failed → deterministic fallback
+    if result.get("fallback_used") or not result.get("parsed"):
+        logger.warning(f"All LLM models failed for {company_data.get('name')} — using fallback")
+        return _deterministic_fallback(company_data)
+    profile = result["parsed"]
+    profile["llm_model"] = result["model"]
+    profile["is_fallback"] = False
+    profile["is_consistent"] = result.get("is_consistent", True)
+    profile["consistency_score"] = result.get("consistency_score", 1.0)
+    profile["tokens_used"] = result["tokens"]["total"]
+    # Grounding validation
+    grounding_result = validate_profile_grounded(profile, company_data)
+    profile["grounding_score"] = grounding_result["grounding_score"]
+    profile["corrections"] = grounding_result.get("corrections", {})
+    # Apply corrections
+    if grounding_result.get("corrections"):
+        for key, correction in grounding_result["corrections"].items():
+            if key in profile:
+                profile[key] = correction["actual"]
+    return profile
+# ─── Deterministic fallback ──────────────────────────────────
+def _deterministic_fallback(data: dict) -> dict:
+    """Zero-hallucination fallback. Only uses available facts."""
+    industry = data.get("industry", "business")
+    size = data.get("employee_count", "unknown")
+    name = data.get("name", "this company")
+    pain_signals = data.get("pain_signals", [])
+    service_match = data.get("service_match")
+    # Map service to pain points
+    pain_points = _get_pain_points(service_match, industry, pain_signals)
+    # AI readiness from evidence
+    ai_jobs = data.get("ai_job_count", 0)
+    tech_stack = data.get("tech_stack", [])
+    if ai_jobs >= 2: ai_readiness = "high"
+    elif tech_stack or ai_jobs >= 1: ai_readiness = "medium"
+    else: ai_readiness = "low"
+    return {
+        "profile_summary": f"{name} is a {industry} company with approximately {size} employees.",
+        "pain_points": pain_points,
+        "ai_use_case": _get_use_case(service_match, industry),
+        "ai_readiness": ai_readiness,
+        "decision_maker_reasoning": f"At a {size}-employee {industry} company, purchasing decisions are likely made by the owner or managing director.",
+        "outreach_angle": _get_outreach_angle(service_match, name),
+        "confidence": 0.5,
+        "evidence_used": [f"employee_count: {size}", f"industry: {industry}"] + pain_signals[:3],
+        "evidence_missing": ["revenue", "growth rate", "current tools"],
+        "llm_model": "deterministic_fallback",
+        "is_fallback": True,
+        "is_consistent": True,
+        "consistency_score": 1.0,
+        "grounding_score": 1.0,
+        "tokens_used": 0,
+        "corrections": {},
+    }
+def _get_pain_points(service, industry, detected_signals):
+    if detected_signals and len(detected_signals) >= 2:
+        return detected_signals[:2]
+    service_pains = {
+        "AI Receptionist": ["Manual phone handling during business hours only", "Missed calls and appointments outside working hours"],
+        "AI Customer Support": ["Manual ticket handling and slow response times", "No automated FAQ or chatbot for common questions"],
+        "AI Data Processing": ["Manual data entry and reporting overhead", "Legacy system inefficiencies"],
+        "AI Sales Automation": ["Manual outbound sales process", "Unqualified leads consuming sales team time"],
+        "AI Workflow Automation": ["Manual approval workflows", "Multiple disconnected tools and platforms"],
+    }
+    return service_pains.get(service, ["Manual operational processes", "Unoptimized workflow efficiency"])
+def _get_use_case(service, industry):
+    if service:
+        return f"{service} for {industry} operations"
+    return f"AI workflow automation for {industry} processes"
+def _get_outreach_angle(service, name):
+    angles = {
+        "AI Receptionist": f"Stop losing customers to voicemail — our AI handles calls 24/7 for {name}",
+        "AI Customer Support": f"Reduce support costs by 60% with AI-powered customer service for {name}",
+        "AI Data Processing": f"Eliminate manual reporting — our AI automates your data pipeline",
+        "AI Sales Automation": f"Double your sales pipeline efficiency with AI-powered outreach",
+    }
+    return angles.get(service, f"Reduce operational overhead with targeted AI automation for {name}")

src/profiling/python-service/requirements.txt ADDED Viewed

	@@ -0,0 +1,8 @@

+fastapi==0.111.0
+uvicorn[standard]==0.30.0
+httpx==0.27.0
+pydantic==2.7.0
+pydantic-settings==2.2.0
+python-dotenv==1.0.1
+openai==1.30.0
+supabase==2.4.0

src/profiling/python-service/scorer.py ADDED Viewed

	@@ -0,0 +1,260 @@

+"""
+Scorer v2 — Signal Extraction + Deterministic Scoring
+KEY DESIGN CHANGE:
+  Old: LLM computes score directly → hallucination risk
+  New: LLM extracts SIGNALS → Code computes score → zero hallucination
+LLM is good at: "Does this company have legacy SAP?" (yes/no)
+LLM is bad at:  "Give this company 73 out of 100" (arbitrary)
+So: LLM extracts signals, code does math.
+"""
+import logging
+from nvidia_client import call_llm, MODELS
+logger = logging.getLogger(__name__)
+# ─── Signal extraction prompt ────────────────────────────────
+SYSTEM_PROMPT = """You are a lead qualification engine.
+Your job: extract SIGNALS from company data. You do NOT compute the final score.
+The system computes scores deterministically from your signal extraction.
+CRITICAL RULES:
+- Extract only what the evidence supports
+- For each signal, cite which piece of evidence supports it
+- If evidence is weak or missing, say so honestly
+- Output ONLY the structured JSON requested"""
+def build_signal_prompt(data: dict, profile: dict, contacts: list) -> str:
+    has_verified_email = any(c.get("email_verified") for c in contacts)
+    has_linkedin = any(c.get("linkedin_personal_url") for c in contacts)
+    has_social = any(c.get("social_profiles") for c in contacts)
+    return f"""EXTRACT SIGNALS for lead scoring. Do not compute a score.
+Company: {data.get('name', 'UNKNOWN')}
+Industry: {data.get('industry', 'UNKNOWN')}
+Employees: {data.get('employee_count', 'UNKNOWN')}
+Tech stack: {', '.join(data.get('tech_stack', [])) or 'NONE'}
+AI job postings: {data.get('ai_job_count', 0)}
+Pain signals: {', '.join(data.get('pain_signals', [])) or 'NONE'}
+Service match: {data.get('service_match') or 'NONE'}
+AI readiness (from profile): {profile.get('ai_readiness', 'UNKNOWN')}
+Has verified email: {has_verified_email}
+Has personal LinkedIn: {has_linkedin}
+Has social profiles: {has_social}
+Growth signals count: {len(data.get('growth_signals', []))}
+Output JSON:
+{{
+  "company_fit_signals": {{
+    "industry_match": true,
+    "size_appropriate": true,
+    "evidence": "why"
+  }},
+  "ai_readiness_signals": {{
+    "level": "none|low|medium|high",
+    "tech_stack_relevant": false,
+    "ai_jobs_present": false,
+    "evidence": "why"
+  }},
+  "service_match_signals": {{
+    "matched": true,
+    "service_name": "which service",
+    "pain_count": 0,
+    "evidence": "which pain signals"
+  }},
+  "contact_quality_signals": {{
+    "email_verified": {str(has_verified_email).lower()},
+    "linkedin_found": {str(has_linkedin).lower()},
+    "decision_maker_identified": true
+  }},
+  "timing_signals": {{
+    "actively_growing": false,
+    "recently_active": true,
+    "evidence": "what suggests timing"
+  }},
+  "confidence": 0.0
+}}"""
+# ─── Main scoring function ───────────────────────────────────
+async def compute_score(
+    company_data: dict,
+    profile: dict,
+    contacts: list,
+    trace_id: str = ""
+) -> dict:
+    """
+    Step 1: LLM extracts signals (qualitative)
+    Step 2: Code computes score (deterministic, reproducible)
+    """
+    # ── Step 1: Signal extraction via LLM ─────────────────────
+    signals = await _extract_signals(company_data, profile, contacts, trace_id)
+    # ── Step 2: Deterministic scoring ─────────────────────────
+    score = _compute_deterministic_score(signals, company_data, profile, contacts)
+    return score
+async def _extract_signals(data, profile, contacts, trace_id) -> dict:
+    """Ask LLM to identify signals — NOT to score."""
+    try:
+        prompt = build_signal_prompt(data, profile, contacts)
+        result = await call_llm(
+            operation="score",
+            system_prompt=SYSTEM_PROMPT,
+            user_prompt=prompt,
+            model=MODELS["FAST"],  # 8B model — signal extraction is simple
+            temperature=0.1,
+            max_tokens=400,
+            json_mode=True,
+            trace_id=trace_id,
+            company_id=data.get("id"),
+        )
+        if result.get("parsed"):
+            return result["parsed"]
+    except Exception as e:
+        logger.warning(f"Signal extraction failed: {e}")
+    # Fallback: extract signals from raw data
+    return _extract_signals_deterministic(data, profile, contacts)
+def _extract_signals_deterministic(data, profile, contacts) -> dict:
+    """Rule-based signal extraction when LLM fails."""
+    has_email = any(c.get("email_verified") for c in contacts)
+    has_linkedin = any(c.get("linkedin_personal_url") for c in contacts)
+    return {
+        "company_fit_signals": {
+            "industry_match": bool(data.get("industry")),
+            "size_appropriate": (data.get("employee_count") or 0) >= 3,
+            "evidence": "deterministic",
+        },
+        "ai_readiness_signals": {
+            "level": profile.get("ai_readiness", "low"),
+            "tech_stack_relevant": len(data.get("tech_stack", [])) > 0,
+            "ai_jobs_present": data.get("ai_job_count", 0) > 0,
+            "evidence": "deterministic",
+        },
+        "service_match_signals": {
+            "matched": bool(data.get("service_match")),
+            "service_name": data.get("service_match", "NONE"),
+            "pain_count": len(data.get("pain_signals", [])),
+            "evidence": "deterministic",
+        },
+        "contact_quality_signals": {
+            "email_verified": has_email,
+            "linkedin_found": has_linkedin,
+            "decision_maker_identified": len(contacts) > 0,
+        },
+        "timing_signals": {
+            "actively_growing": data.get("ai_job_count", 0) > 0,
+            "recently_active": True,
+            "evidence": "deterministic",
+        },
+        "confidence": 0.5,
+    }
+# ─── Deterministic score computation ─────────────────────────
+# This is where the ACTUAL score is calculated.
+# No LLM involved — pure math from signals.
+def _compute_deterministic_score(signals: dict, data: dict, profile: dict, contacts: list) -> dict:
+    """
+    Weights:
+      company_fit:        25 pts
+      ai_readiness:       20 pts
+      service_match:      20 pts  (NEW — replaces old AI readiness weight)
+      decision_maker:     20 pts
+      timing:             15 pts
+    """
+    # ── Company Fit (25 pts) ──────────────────────────────────
+    fit = signals.get("company_fit_signals", {})
+    company_fit = 0
+    if fit.get("industry_match"): company_fit += 10
+    if fit.get("size_appropriate"): company_fit += 10
+    emp = data.get("employee_count") or 0
+    if emp >= 200: company_fit += 5
+    elif emp >= 50: company_fit += 3
+    elif emp >= 10: company_fit += 1
+    # ── AI Readiness (20 pts) ─────────────────────────────────
+    ai_sig = signals.get("ai_readiness_signals", {})
+    ai_readiness = 0
+    level = ai_sig.get("level", "low")
+    if level == "high": ai_readiness += 12
+    elif level == "medium": ai_readiness += 8
+    elif level == "low": ai_readiness += 3
+    if ai_sig.get("tech_stack_relevant"): ai_readiness += 4
+    if ai_sig.get("ai_jobs_present"): ai_readiness += 4
+    ai_readiness = min(20, ai_readiness)
+    # ── Service Match (20 pts) — KEY DIFFERENTIATOR ───────────
+    svc = signals.get("service_match_signals", {})
+    service_match = 0
+    if svc.get("matched"):
+        service_match += 10
+        pain_count = svc.get("pain_count", 0)
+        service_match += min(10, pain_count * 3)  # up to 10 pts for pain signals
+    service_match = min(20, service_match)
+    # ── Decision Maker Access (20 pts) ────────────────────────
+    contact = signals.get("contact_quality_signals", {})
+    dm = 0
+    if contact.get("email_verified"): dm += 12
+    elif any(c.get("email") for c in contacts): dm += 6
+    if contact.get("linkedin_found"): dm += 5
+    if contact.get("decision_maker_identified"): dm += 3
+    dm = min(20, dm)
+    # ── Timing (15 pts) ───────────────────────────────────────
+    timing = signals.get("timing_signals", {})
+    timing_score = 5  # base: company exists and has website
+    if timing.get("actively_growing"): timing_score += 5
+    if timing.get("recently_active"): timing_score += 3
+    if len(data.get("growth_signals", [])) >= 2: timing_score += 2
+    timing_score = min(15, timing_score)
+    # ── Total ─────────────────────────────────────────────────
+    total = company_fit + ai_readiness + service_match + dm + timing_score
+    tier = _score_to_tier(total)
+    return {
+        "company_fit": company_fit,
+        "ai_readiness_score": ai_readiness,
+        "service_match_score": service_match,
+        "decision_maker_access": dm,
+        "timing_score": timing_score,
+        "total_score": total,
+        "tier": tier,
+        "score_breakdown": {
+            "company_fit": f"{company_fit}/25",
+            "ai_readiness": f"{ai_readiness}/20",
+            "service_match": f"{service_match}/20",
+            "decision_maker": f"{dm}/20",
+            "timing": f"{timing_score}/15",
+        },
+        "score_reasoning": f"Deterministic score from {len(signals)} signal groups",
+        "llm_model": "deterministic_scorer",
+        "is_fallback": False,
+    }
+def _score_to_tier(score: int) -> str:
+    if score >= 85: return "hot"
+    if score >= 70: return "warm"
+    if score >= 50: return "nurture"
+    return "archive"

src/profiling/trigger-tasks/profiling-router.ts ADDED Viewed

	@@ -0,0 +1,158 @@

+import { task } from "@trigger.dev/sdk/v3";
+import axios from "axios";
+import { getEnv } from "../../shared/config/env";
+import { getSupabaseClient } from "../../shared/supabase/client";
+import { logger, auditLog } from "../../shared/utils/logger";
+import { CompanyDiscoveredPayload } from "../../shared/supabase/schema";
+/**
+ * Profiling Router — Trigger.dev task that:
+ * 1. Receives company.discovered event
+ * 2. Calls Python AI service for LLM profiling + scoring
+ * 3. Routes result: qualified → outreach queue, low score → nurture/archive
+ */
+export const profilingTask = task({
+  id: "profiling-router",
+  maxDuration: 300, // 5 min per company
+  run: async (payload: CompanyDiscoveredPayload) => {
+    const { company_id, domain, name, region, source } = payload;
+    const env = getEnv();
+    const db = getSupabaseClient();
+    logger.info({ company_id, domain }, "🧠 Profiling started");
+    try {
+      // ── Call Python AI Service ──────────────────────────────
+      const response = await axios.post(
+        `${env.PYTHON_AI_SERVICE_URL}/profile`,
+        { company_id, domain, name, region, source },
+        {
+          headers: {
+            "Content-Type": "application/json",
+            "x-service-secret": env.PYTHON_AI_SERVICE_SECRET,
+          },
+          timeout: 120_000, // 2 min timeout for LLM
+        }
+      );
+      const result = response.data;
+      logger.info(
+        { company_id, score: result.total_score, tier: result.tier },
+        "✅ Profiling complete"
+      );
+      // ── Route based on score tier ───────────────────────────
+      await routeByTier(company_id, result, db, env);
+      // ── Audit log ───────────────────────────────────────────
+      auditLog("lead_profiled", "company", {
+        company_id,
+        domain,
+        score: result.total_score,
+        tier: result.tier,
+        is_fallback: result.is_fallback,
+      });
+      return result;
+    } catch (err: unknown) {
+      // ── Python service unavailable → fallback ───────────────
+      if (axios.isAxiosError(err) && !err.response) {
+        logger.error({ company_id, domain }, "Python service unreachable — queuing for review");
+        await db.from("human_review_queue").insert({
+          type: "score_anomaly",
+          company_id,
+          payload: { reason: "python_service_unavailable", domain },
+        });
+        return { success: false, reason: "python_service_unavailable" };
+      }
+      throw err;
+    }
+  },
+});
+// ─── Score-based routing ──────────────────────────────────────
+async function routeByTier(
+  companyId: string,
+  result: { total_score: number; tier: string; needs_human_review: boolean },
+  db: ReturnType<typeof getSupabaseClient>,
+  env: ReturnType<typeof getEnv>
+) {
+  const { tier, total_score, needs_human_review } = result;
+  if (tier === "hot" || tier === "warm") {
+    if (needs_human_review && env.HUMAN_REVIEW_ENABLED) {
+      // Queue for human approval before outreach
+      logger.info({ companyId, tier }, "Routing to human review queue");
+      await db.from("human_review_queue").insert({
+        type: "outreach_approval",
+        company_id: companyId,
+        payload: { score: total_score, tier, reason: "human_review_required" },
+      });
+      await notifySlack(companyId, total_score, tier, env, "review");
+    } else {
+      // Qualified — trigger outreach (Step 3, to be built)
+      logger.info({ companyId, tier, score: total_score }, "🚀 Routing to outreach queue");
+      await notifySlack(companyId, total_score, tier, env, "qualified");
+      // Future: trigger outreach task
+      // await outreachTask.trigger({ company_id: companyId, tier });
+    }
+  } else if (tier === "nurture") {
+    logger.info({ companyId }, "Routing to nurture — re-score in 30 days");
+    // Future: schedule re-scoring task
+  } else {
+    logger.info({ companyId }, "Archived — score too low");
+  }
+}
+async function notifySlack(
+  companyId: string,
+  score: number,
+  tier: string,
+  env: ReturnType<typeof getEnv>,
+  type: "qualified" | "review"
+) {
+  try {
+    const db = getSupabaseClient();
+    const { data: company } = await db
+      .from("companies")
+      .select("name, domain, industry, employee_count")
+      .eq("id", companyId)
+      .single();
+    if (!company) return;
+    const emoji = tier === "hot" ? "🔥" : "✅";
+    const action = type === "review" ? "⏳ Needs Review" : "📤 Ready for Outreach";
+    const message = {
+      text: `${emoji} New Qualified Lead — ${action}`,
+      blocks: [
+        {
+          type: "section",
+          text: {
+            type: "mrkdwn",
+            text: `*${emoji} ${company.name}*\n${action}\n\n` +
+              `• *Score:* ${score}/100 — ${tier.toUpperCase()}\n` +
+              `• *Industry:* ${company.industry ?? "Unknown"}\n` +
+              `• *Employees:* ${company.employee_count ?? "Unknown"}\n` +
+              `• *Domain:* ${company.domain}`,
+          },
+        },
+      ],
+    };
+    const channelId = type === "review" ? env.SLACK_REVIEW_CHANNEL_ID : env.SLACK_ALERT_CHANNEL_ID;
+    await axios.post("https://slack.com/api/chat.postMessage", {
+      channel: channelId,
+      ...message,
+    }, {
+      headers: { Authorization: `Bearer ${env.SLACK_BOT_TOKEN}` },
+    });
+  } catch (err) {
+    logger.warn({ err }, "Slack notification failed — non-critical");
+  }
+}

src/shared/config/env.ts ADDED Viewed

	@@ -0,0 +1,66 @@

+import { z } from "zod";
+import * as dotenv from "dotenv";
+dotenv.config();
+const envSchema = z.object({
+  // ─── LLM (All on NVIDIA NIM — FREE) ────────────────────────
+  NVIDIA_API_KEY: z.string().min(5),
+  NVIDIA_NIM_BASE_URL: z.string().url().default("https://integrate.api.nvidia.com/v1"),
+  // ─── Supabase ──────────────────────────────────────────────
+  SUPABASE_URL: z.string().url(),
+  SUPABASE_SERVICE_ROLE_KEY: z.string().min(10),
+  // ─── Trigger.dev ───────────────────────────────────────────
+  TRIGGER_DEV_API_KEY: z.string().min(5),
+  TRIGGER_DEV_PROJECT_ID: z.string().min(3),
+  // ─── Web Research ──────────────────────────────────────────
+  SERPER_API_KEY: z.string().min(5),
+  // ─── Email Finding ─────────────────────────────────────────
+  HUNTER_API_KEY: z.string().min(5),
+  // ─── Email Verification ────────────────────────────────────
+  REOON_API_KEY: z.string().min(5),
+  // ─── Slack ─────────────────────────────────────────────────
+  SLACK_BOT_TOKEN: z.string().startsWith("xoxb-"),
+  SLACK_SIGNING_SECRET: z.string().min(5),
+  SLACK_ALERT_CHANNEL_ID: z.string(),
+  SLACK_REVIEW_CHANNEL_ID: z.string(),
+  // ─── Python AI Service ─────────────────────────────────────
+  PYTHON_AI_SERVICE_URL: z.string().url().default("http://localhost:8000"),
+  PYTHON_AI_SERVICE_SECRET: z.string().min(10),
+  // ─── System Config ─────────────────────────────────────────
+  NODE_ENV: z.enum(["development", "staging", "production"]).default("development"),
+  LOG_LEVEL: z.enum(["debug", "info", "warn", "error"]).default("info"),
+  DAILY_LEAD_QUOTA: z.coerce.number().default(10),
+  QUALITY_SCORE_THRESHOLD: z.coerce.number().default(70),
+  HUMAN_REVIEW_ENABLED: z.string().transform((v) => v === "true").default("true"),
+  DAILY_EMAIL_LIMIT: z.coerce.number().default(50),
+  DAILY_LINKEDIN_LIMIT: z.coerce.number().default(25),
+  SCHEDULE_START_HOUR_UTC: z.coerce.number().default(4),
+});
+type Env = z.infer<typeof envSchema>;
+let _env: Env;
+export function getEnv(): Env {
+  if (!_env) {
+    const result = envSchema.safeParse(process.env);
+    if (!result.success) {
+      console.error("❌ Invalid environment configuration:");
+      result.error.errors.forEach((e) => {
+        console.error(`  ${e.path.join(".")}: ${e.message}`);
+      });
+      process.exit(1);
+    }
+    _env = result.data;
+  }
+  return _env;
+}

src/shared/llm/grounding.ts ADDED Viewed

	@@ -0,0 +1,239 @@

+/**
+ * Grounded Hallucination Detection
+ *
+ * Google DeepMind approach: Every LLM claim must be traceable
+ * to a piece of evidence. Claims without evidence are stripped.
+ *
+ * This is NOT "ask LLM for confidence" — that's like asking
+ * a cheater to grade their own exam.
+ *
+ * This IS: cross-reference every output field against source data.
+ */
+import { logger } from "../utils/logger";
+export interface GroundingResult {
+  isGrounded: boolean;
+  groundingScore: number;         // 0.0-1.0
+  verifiedClaims: string[];       // claims that match evidence
+  unverifiedClaims: string[];     // claims with no evidence
+  strippedClaims: string[];       // claims removed from output
+  corrections: Record<string, { claimed: unknown; actual: unknown }>;
+}
+export interface EvidenceSet {
+  // Factual data we collected from providers/scrapers
+  company_name: string;
+  domain: string;
+  employee_count: number | null;
+  industry: string | null;
+  tech_stack: string[];
+  description: string | null;
+  website_text: string;
+  job_postings: string[];
+  ai_job_count: number;
+  linkedin_description: string | null;
+  country: string | null;
+  city: string | null;
+  pain_signals_detected: string[];
+}
+/**
+ * Validates LLM profile output against collected evidence.
+ * Returns cleaned profile with unverifiable claims stripped.
+ */
+export function groundProfile(
+  profile: Record<string, unknown>,
+  evidence: EvidenceSet
+): { cleaned: Record<string, unknown>; grounding: GroundingResult } {
+  const verified: string[] = [];
+  const unverified: string[] = [];
+  const stripped: string[] = [];
+  const corrections: Record<string, { claimed: unknown; actual: unknown }> = {};
+  const cleaned = { ...profile };
+  // ── Check profile_summary ──────────────────────────────────
+  const summary = String(profile.profile_summary ?? "");
+  // Does summary mention the right company?
+  if (summary.length > 20 && !containsName(summary, evidence.company_name)) {
+    stripped.push("summary_wrong_company");
+    // Don't strip — just flag. LLM may paraphrase the name.
+  }
+  // Does summary claim employee count?
+  const claimedEmpMatch = summary.match(/(\d[\d,]+)\s*(employees?|people|staff|team)/i);
+  if (claimedEmpMatch && evidence.employee_count) {
+    const claimed = parseInt(claimedEmpMatch[1].replace(/,/g, ""), 10);
+    if (Math.abs(claimed - evidence.employee_count) > evidence.employee_count * 0.3) {
+      corrections["employee_count"] = { claimed, actual: evidence.employee_count };
+      // Fix the claim in the summary
+      cleaned.profile_summary = summary.replace(
+        claimedEmpMatch[0],
+        `${evidence.employee_count} employees`
+      );
+      verified.push("employee_count_corrected");
+    } else {
+      verified.push("employee_count_accurate");
+    }
+  }
+  // ── Check industry claim ───────────────────────────────────
+  const claimedIndustry = summary.toLowerCase();
+  if (evidence.industry) {
+    const industryWords = evidence.industry.toLowerCase().split(/[\s_]+/);
+    const hasIndustryMention = industryWords.some(w => claimedIndustry.includes(w));
+    if (hasIndustryMention) {
+      verified.push("industry_match");
+    } else {
+      unverified.push("industry_may_differ");
+    }
+  }
+  // ── Check tech stack claims ─────────────────────────────────
+  if (Array.isArray(profile.evidence_used)) {
+    for (const claim of profile.evidence_used as string[]) {
+      const claimLower = claim.toLowerCase();
+      const isSupported =
+        evidence.tech_stack.some(t => claimLower.includes(t.toLowerCase())) ||
+        evidence.website_text.toLowerCase().includes(claimLower.slice(0, 20)) ||
+        evidence.job_postings.some(j => claimLower.includes(j.toLowerCase().slice(0, 15))) ||
+        evidence.pain_signals_detected.some(p => claimLower.includes(p.toLowerCase().slice(0, 15)));
+      if (isSupported) {
+        verified.push(`evidence: ${claim.slice(0, 40)}`);
+      } else {
+        unverified.push(`unverifiable: ${claim.slice(0, 40)}`);
+      }
+    }
+  }
+  // ── Check ai_readiness ─────────────────────────────────────
+  const claimedReadiness = String(profile.ai_readiness ?? "");
+  if (claimedReadiness === "high" && evidence.ai_job_count === 0 && evidence.tech_stack.length === 0) {
+    corrections["ai_readiness"] = { claimed: "high", actual: "low" };
+    cleaned.ai_readiness = "low";
+    verified.push("ai_readiness_corrected");
+  } else if (claimedReadiness === "low" && evidence.ai_job_count >= 3) {
+    corrections["ai_readiness"] = { claimed: "low", actual: "high" };
+    cleaned.ai_readiness = "high";
+    verified.push("ai_readiness_corrected");
+  } else {
+    verified.push("ai_readiness_plausible");
+  }
+  // ���─ Check for PII leakage ──────────────────────────────────
+  const outputStr = JSON.stringify(cleaned);
+  const emailPattern = /[\w.+-]+@[\w-]+\.[a-z]{2,}/gi;
+  const phonePattern = /\+?\d[\d\s\-().]{8,}/g;
+  if (emailPattern.test(outputStr)) {
+    stripped.push("pii_email_in_output");
+    // Strip emails from all string fields
+    for (const [key, val] of Object.entries(cleaned)) {
+      if (typeof val === "string") {
+        cleaned[key] = val.replace(emailPattern, "[EMAIL_REDACTED]");
+      }
+    }
+  }
+  if (phonePattern.test(outputStr)) {
+    stripped.push("pii_phone_in_output");
+    for (const [key, val] of Object.entries(cleaned)) {
+      if (typeof val === "string") {
+        cleaned[key] = val.replace(phonePattern, "[PHONE_REDACTED]");
+      }
+    }
+  }
+  // ── Compute grounding score ────────────────────────────────
+  const totalChecks = verified.length + unverified.length + stripped.length;
+  const groundingScore = totalChecks === 0 ? 0.5 : verified.length / totalChecks;
+  const result: GroundingResult = {
+    isGrounded: groundingScore >= 0.6 && stripped.length === 0,
+    groundingScore,
+    verifiedClaims: verified,
+    unverifiedClaims: unverified,
+    strippedClaims: stripped,
+    corrections,
+  };
+  if (!result.isGrounded) {
+    logger.warn(
+      { groundingScore: groundingScore.toFixed(2), corrections: Object.keys(corrections).length },
+      "Profile failed grounding — corrections applied"
+    );
+  }
+  return { cleaned, grounding: result };
+}
+/**
+ * Validates scoring signals against evidence.
+ * Scores are computed DETERMINISTICALLY from signals —
+ * LLM only extracts signals, code computes score.
+ */
+export function groundSignals(
+  signals: Record<string, unknown>,
+  evidence: EvidenceSet
+): { cleaned: Record<string, unknown>; grounding: GroundingResult } {
+  const verified: string[] = [];
+  const unverified: string[] = [];
+  const corrections: Record<string, { claimed: unknown; actual: unknown }> = {};
+  const cleaned = { ...signals };
+  // Verify company_fit_signals
+  const fitSignals = signals.company_fit_signals as Record<string, unknown> | undefined;
+  if (fitSignals) {
+    if (fitSignals.size_appropriate === true && evidence.employee_count !== null && evidence.employee_count < 3) {
+      corrections["size_appropriate"] = { claimed: true, actual: false };
+      verified.push("size_corrected");
+    } else {
+      verified.push("size_plausible");
+    }
+  }
+  // Verify ai_readiness_signals
+  const aiSignals = signals.ai_readiness_signals as Record<string, unknown> | undefined;
+  if (aiSignals) {
+    if (aiSignals.ai_jobs_present === true && evidence.ai_job_count === 0) {
+      corrections["ai_jobs_present"] = { claimed: true, actual: false };
+      verified.push("ai_jobs_corrected");
+    } else {
+      verified.push("ai_jobs_accurate");
+    }
+    if (aiSignals.tech_stack_relevant === true && evidence.tech_stack.length === 0) {
+      corrections["tech_stack_relevant"] = { claimed: true, actual: false };
+      verified.push("tech_stack_corrected");
+    } else {
+      verified.push("tech_stack_accurate");
+    }
+  }
+  const totalChecks = verified.length + unverified.length;
+  const groundingScore = totalChecks === 0 ? 0.5 : verified.length / totalChecks;
+  return {
+    cleaned,
+    grounding: {
+      isGrounded: groundingScore >= 0.6,
+      groundingScore,
+      verifiedClaims: verified,
+      unverifiedClaims: unverified,
+      strippedClaims: [],
+      corrections,
+    },
+  };
+}
+// ─── Helpers ─────────────────────────────────────────────────
+function containsName(text: string, name: string): boolean {
+  const words = name.toLowerCase().split(/\s+/);
+  const textLower = text.toLowerCase();
+  // At least one significant word from company name should be present
+  return words.some(w => w.length > 2 && textLower.includes(w));
+}

src/shared/llm/nvidia-client.ts ADDED Viewed

	@@ -0,0 +1,307 @@

+/**
+ * Multi-Model LLM Client — All FREE on NVIDIA NIM
+ *
+ * 3 models, 1 provider, 1 API key, $0 cost:
+ *
+ * Priority 1: MiniMax M2.7     → Best reasoning, 4M context, built-in CoT
+ * Priority 2: LLaMA 3.3 70B    → Reliable, proven, 128K context
+ * Priority 3: LLaMA 3.1 8B     → Fast, cheap, for simple tasks
+ * Priority 4: Deterministic     → Zero LLM, zero hallucination
+ *
+ * All on: https://integrate.api.nvidia.com/v1
+ * All use: same NVIDIA_API_KEY
+ *
+ * MiniMax M2.7 special feature:
+ *   Response includes `reasoning_content` field — chain-of-thought
+ *   reasoning happens AUTOMATICALLY inside the model.
+ *   We don't need to prompt "think step by step" — it does it natively.
+ */
+import axios, { AxiosError } from "axios";
+import { createHash } from "crypto";
+import { getEnv } from "../config/env";
+import { getSupabaseClient } from "../supabase/client";
+import { logger } from "../utils/logger";
+// ─── Types ───────────────────────────────────────────────────
+export interface LLMRequest {
+  operation: string;
+  modelIndex?: number;       // 0=MiniMax, 1=LLaMA70B, 2=LLaMA8B
+  systemPrompt: string;
+  userPrompt: string;
+  temperature?: number;
+  maxTokens?: number;
+  jsonMode?: boolean;
+  traceId: string;
+  companyId?: string;
+}
+export interface LLMResponse {
+  content: string;
+  reasoning: string | null;   // MiniMax's built-in chain-of-thought
+  parsed: Record<string, unknown> | null;
+  model: string;
+  provider: string;
+  tokens: { prompt: number; completion: number; total: number };
+  latencyMs: number;
+  grounded: boolean;
+  fallbackUsed: boolean;
+}
+// ─── Model configs (ALL on NVIDIA NIM, ALL FREE) ─────────────
+interface ModelConfig {
+  name: string;
+  model: string;
+  maxContext: number;
+  bestFor: string;
+}
+const MODEL_CONFIGS: ModelConfig[] = [
+  {
+    name: "MiniMax M2.7",
+    model: "minimaxai/minimax-m2.7",
+    maxContext: 4_000_000,         // 4M tokens!
+    bestFor: "profiling, scoring, complex reasoning",
+  },
+  {
+    name: "LLaMA 3.3 70B",
+    model: "meta/llama-3.3-70b-instruct",
+    maxContext: 128_000,
+    bestFor: "general tasks, reliable fallback",
+  },
+  {
+    name: "LLaMA 3.1 8B",
+    model: "meta/llama-3.1-8b-instruct",
+    maxContext: 128_000,
+    bestFor: "email classification, simple checks",
+  },
+];
+export const MODELS = {
+  MINIMAX: 0,       // Primary — best reasoning
+  LLAMA_70B: 1,     // Fallback — reliable
+  LLAMA_8B: 2,      // Fast — simple tasks
+  FAST: 2,          // alias
+} as const;
+// ─── Main LLM call ──────────────────────────────────────────
+export async function callLLM(request: LLMRequest): Promise<LLMResponse> {
+  const modelIndex = request.modelIndex ?? 0;
+  const env = getEnv();
+  if (modelIndex >= MODEL_CONFIGS.length) {
+    return deterministicFallback(request);
+  }
+  const config = MODEL_CONFIGS[modelIndex];
+  const startTime = Date.now();
+  const body: Record<string, unknown> = {
+    model: config.model,
+    messages: [
+      { role: "system", content: request.systemPrompt },
+      { role: "user", content: request.userPrompt },
+    ],
+    temperature: request.temperature ?? 0.2,
+    max_tokens: request.maxTokens ?? 1024,
+    top_p: 0.9,
+  };
+  if (request.jsonMode) {
+    body.response_format = { type: "json_object" };
+  }
+  try {
+    const response = await axios.post(
+      `${env.NVIDIA_NIM_BASE_URL}/chat/completions`,
+      body,
+      {
+        headers: {
+          Authorization: `Bearer ${env.NVIDIA_API_KEY}`,
+          "Content-Type": "application/json",
+        },
+        timeout: 90_000, // MiniMax can take longer for reasoning
+      }
+    );
+    const data = response.data;
+    const message = data.choices?.[0]?.message;
+    const content = message?.content ?? "";
+    const reasoning = message?.reasoning_content ?? null; // MiniMax CoT
+    const usage = data.usage ?? { prompt_tokens: 0, completion_tokens: 0, total_tokens: 0 };
+    const latencyMs = Date.now() - startTime;
+    let parsed: Record<string, unknown> | null = null;
+    if (request.jsonMode) {
+      parsed = safeParseJSON(content);
+      if (!parsed) {
+        logger.warn({ operation: request.operation, model: config.name }, "JSON parse failed → next model");
+        return callLLM({ ...request, modelIndex: modelIndex + 1 });
+      }
+    }
+    const result: LLMResponse = {
+      content,
+      reasoning,
+      parsed,
+      model: config.name,
+      provider: "nvidia",
+      tokens: {
+        prompt: usage.prompt_tokens,
+        completion: usage.completion_tokens,
+        total: usage.total_tokens,
+      },
+      latencyMs,
+      grounded: true,
+      fallbackUsed: false,
+    };
+    // Log MiniMax reasoning if present
+    if (reasoning) {
+      logger.debug({ operation: request.operation, reasoning: reasoning.slice(0, 200) },
+        "MiniMax reasoning captured");
+    }
+    await logLLMTrace(request, result, true, config);
+    return result;
+  } catch (err) {
+    if (err instanceof AxiosError) {
+      if (err.response?.status === 429) {
+        const retryAfter = parseInt(err.response.headers["retry-after"] ?? "5", 10);
+        logger.warn({ model: config.name, retryAfter }, "Rate limited → waiting");
+        await sleep(retryAfter * 1000);
+        return callLLM(request);
+      }
+      if (err.response?.status === 503 || err.response?.status === 500) {
+        logger.warn({ model: config.name, status: err.response?.status }, `${config.name} unavailable → next`);
+        return callLLM({ ...request, modelIndex: modelIndex + 1 });
+      }
+    }
+    logger.error({ model: config.name, err: String(err).slice(0, 200) }, "LLM call failed → next");
+    return callLLM({ ...request, modelIndex: modelIndex + 1 });
+  }
+}
+function deterministicFallback(request: LLMRequest): LLMResponse {
+  logger.error({ operation: request.operation }, "ALL models failed → deterministic fallback");
+  return {
+    content: "",
+    reasoning: null,
+    parsed: null,
+    model: "deterministic_fallback",
+    provider: "none",
+    tokens: { prompt: 0, completion: 0, total: 0 },
+    latencyMs: 0,
+    grounded: false,
+    fallbackUsed: true,
+  };
+}
+// ─── Self-consistency check ──────────────────────────────────
+// NOTE: MiniMax has built-in reasoning → consistency is higher
+// We still do dual-temperature check for critical operations
+export async function callLLMWithConsistencyCheck(
+  request: LLMRequest
+): Promise<{ primary: LLMResponse; isConsistent: boolean; consistencyScore: number }> {
+  const primary = await callLLM({ ...request, temperature: 0.1 });
+  if (!["profile", "score"].includes(request.operation)) {
+    return { primary, isConsistent: true, consistencyScore: 1.0 };
+  }
+  if (primary.fallbackUsed) {
+    return { primary, isConsistent: true, consistencyScore: 0.5 };
+  }
+  // MiniMax has reasoning → inherently more consistent
+  // Only do consistency check with LLaMA models
+  if (primary.model === "MiniMax M2.7" && primary.reasoning) {
+    // MiniMax showed its reasoning → trust it more
+    return { primary, isConsistent: true, consistencyScore: 0.95 };
+  }
+  const secondary = await callLLM({ ...request, temperature: 0.4, modelIndex: request.modelIndex });
+  const score = compareOutputs(primary, secondary);
+  return { primary, isConsistent: score >= 0.75, consistencyScore: score };
+}
+function compareOutputs(a: LLMResponse, b: LLMResponse): number {
+  if (!a.parsed || !b.parsed) return 0.5;
+  let matches = 0, total = 0;
+  for (const key of ["ai_readiness", "tier", "service_match"]) {
+    if (key in a.parsed && key in b.parsed) {
+      total++;
+      if (a.parsed[key] === b.parsed[key]) matches++;
+    }
+  }
+  for (const key of ["total_score", "company_fit"]) {
+    const aVal = a.parsed[key], bVal = b.parsed[key];
+    if (typeof aVal === "number" && typeof bVal === "number") {
+      total++;
+      if (Math.abs(aVal - bVal) <= 10) matches++;
+    }
+  }
+  return total === 0 ? 1.0 : matches / total;
+}
+// ─── Trace logging ───────────────────────────────────────────
+async function logLLMTrace(
+  request: LLMRequest,
+  response: LLMResponse | null,
+  success: boolean,
+  config?: ModelConfig
+): Promise<void> {
+  try {
+    const db = getSupabaseClient();
+    await db.from("llm_traces").insert({
+      trace_id: request.traceId,
+      operation: request.operation,
+      model: response?.model ?? config?.name ?? "unknown",
+      provider: "nvidia",
+      prompt_tokens: response?.tokens.prompt ?? 0,
+      completion_tokens: response?.tokens.completion ?? 0,
+      total_tokens: response?.tokens.total ?? 0,
+      latency_ms: response?.latencyMs ?? 0,
+      success,
+      fallback_used: response?.fallbackUsed ?? true,
+      company_id: request.companyId ?? null,
+      input_hash: hashText(request.userPrompt.slice(0, 200)),
+      output_hash: response ? hashText(response.content.slice(0, 200)) : null,
+    });
+  } catch (err) {
+    logger.warn({ err }, "Trace log failed — non-critical");
+  }
+}
+// ─── Helpers ─────────────────────────────────────────────────
+function safeParseJSON(text: string): Record<string, unknown> | null {
+  let content = text.trim();
+  if (content.includes("```json")) content = content.split("```json")[1].split("```")[0].trim();
+  else if (content.includes("```")) content = content.split("```")[1].split("```")[0].trim();
+  try {
+    return JSON.parse(content);
+  } catch {
+    const match = content.match(/\{[\s\S]*\}/);
+    if (match) { try { return JSON.parse(match[0]); } catch { return null; } }
+    return null;
+  }
+}
+function hashText(text: string): string {
+  return createHash("sha256").update(text).digest("hex").slice(0, 16);
+}
+function sleep(ms: number): Promise<void> {
+  return new Promise((resolve) => setTimeout(resolve, ms));
+}

src/shared/llm/prompts.ts ADDED Viewed

	@@ -0,0 +1,277 @@

+/**
+ * Production-grade prompt library.
+ *
+ * Design principles (Google/Anthropic standard):
+ * 1. Chain-of-thought: Force reasoning before conclusion
+ * 2. Few-shot examples: 2-3 examples for each prompt
+ * 3. Structured output: Exact JSON schema specified
+ * 4. Grounding instruction: "Only state what evidence supports"
+ * 5. Anti-hallucination: "Write UNKNOWN if data not provided"
+ * 6. Token-efficient: No verbose instructions, no repetition
+ */
+// ─── SYSTEM PROMPTS ──────────────────────────────────────────
+export const SYSTEM_PROMPTS = {
+  PROFILER: `You are a business analyst for an AI automation agency.
+Your job: analyze a company and identify WHERE our AI services can help them.
+CRITICAL RULES:
+- Only state facts supported by the provided evidence
+- Write "UNKNOWN" for anything not in the data — NEVER guess
+- Your analysis determines whether a real salesperson contacts this company
+- Wrong analysis = wasted human time = unacceptable
+- Think step by step before concluding`,
+  SCORER: `You are a lead qualification engine.
+Your job: extract SIGNALS from company data. You do NOT compute the final score.
+The system computes scores deterministically from your signal extraction.
+CRITICAL RULES:
+- Extract only what the evidence supports
+- For each signal, cite which piece of evidence supports it
+- If evidence is weak or missing, say so honestly
+- Output ONLY the structured JSON requested`,
+  EMAIL_CLASSIFIER: `You are a B2B email quality analyst.
+Your job: determine if a specific email address reaches a decision-maker.
+Consider company size, industry, and the email prefix meaning in context.
+CRITICAL RULES:
+- Small company (<20 people): admin@, operations@, office@ likely reaches owner
+- Large company (200+): same prefixes likely reach departments, not individuals
+- NEVER assume — reason from the evidence provided
+- When uncertain, err on the side of KEEPING the email (mark confidence low)`,
+  PAIN_DETECTOR: `You are an operations efficiency analyst.
+Your job: identify operational pain points in a company that AI automation can solve.
+You are NOT looking for companies that already use AI.
+You ARE looking for companies with manual, repetitive, or inefficient processes.
+CRITICAL RULES:
+- A phone number on homepage = manual call handling (pain point)
+- "Book by phone" = no online scheduling (pain point)
+- No chatbot visible = manual customer interaction (pain point)
+- Small staff + many services = overworked team (pain point)
+- These are REAL signals, not guesses`,
+} as const;
+// ─── PROFILING PROMPT ────────────────────────────────────────
+export function buildProfilePrompt(companyData: {
+  name: string;
+  industry: string;
+  employee_count: number | null;
+  description: string;
+  website_text: string;
+  tech_stack: string[];
+  job_postings: string[];
+  ai_job_count: number;
+  linkedin_description: string;
+  pain_signals: string[];
+  service_match: string | null;
+}): string {
+  return `ANALYZE THIS COMPANY:
+Name: ${companyData.name}
+Industry: ${companyData.industry || "UNKNOWN"}
+Employees: ${companyData.employee_count ?? "UNKNOWN"}
+Description: ${companyData.description || "NONE PROVIDED"}
+Website excerpt (first 600 chars):
+${(companyData.website_text || "").slice(0, 600)}
+LinkedIn description:
+${companyData.linkedin_description || "NONE"}
+Tech stack detected: ${companyData.tech_stack.length ? companyData.tech_stack.join(", ") : "NONE DETECTED"}
+Job postings mentioning AI/automation: ${companyData.ai_job_count}
+Pain signals detected: ${companyData.pain_signals.length ? companyData.pain_signals.join(", ") : "NONE"}
+Service match suggestion: ${companyData.service_match || "NONE"}
+STEP-BY-STEP ANALYSIS:
+Step 1: What does this company actually DO? (2 sentences, facts only)
+Step 2: What are their likely daily operational challenges? (based on industry + size)
+Step 3: What specific AI automation would save them time/money? (be specific)
+Step 4: Who in this organization would approve buying this service?
+Step 5: What outreach angle would resonate with this specific person?
+After reasoning through steps 1-5, output this JSON:
+{
+  "profile_summary": "2-3 factual sentences about what this company does",
+  "pain_points": ["specific pain 1", "specific pain 2"],
+  "ai_use_case": "The single most compelling AI use case for them",
+  "ai_readiness": "low|medium|high",
+  "decision_maker_reasoning": "Who likely makes purchasing decisions and why",
+  "outreach_angle": "One specific sentence — the hook for first contact",
+  "confidence": 0.0,
+  "evidence_used": ["list which data points you relied on"],
+  "evidence_missing": ["list what data you wished you had"]
+}
+EXAMPLE 1 (dental clinic, 6 employees):
+{
+  "profile_summary": "ABC Dental is a 6-person dental practice in Houston offering general and cosmetic dentistry. They display their phone number prominently and use a basic contact form for appointments.",
+  "pain_points": ["Manual phone-based appointment scheduling during business hours only", "No after-hours patient communication capability"],
+  "ai_use_case": "AI receptionist to handle appointment booking, reminders, and after-hours calls",
+  "ai_readiness": "low",
+  "decision_maker_reasoning": "Practice owner (Dr. Smith, DDS) makes all purchasing decisions. Small practice = owner controls budget directly.",
+  "outreach_angle": "Stop losing patients to voicemail — our AI receptionist books appointments 24/7, even when your front desk is closed",
+  "confidence": 0.82,
+  "evidence_used": ["phone number on homepage", "contact form only", "6 staff listed", "no chatbot detected"],
+  "evidence_missing": ["annual revenue", "number of daily calls", "current scheduling software"]
+}
+EXAMPLE 2 (manufacturing company, 150 employees):
+{
+  "profile_summary": "XYZ Manufacturing is a UK-based manufacturer of industrial valves with 150 employees. They use SAP for ERP and are hiring a Data Analyst, suggesting manual reporting pain.",
+  "pain_points": ["Manual data extraction from legacy SAP system", "Production reporting requires manual spreadsheet compilation"],
+  "ai_use_case": "Automated reporting pipeline that extracts SAP data and generates dashboards without manual intervention",
+  "ai_readiness": "medium",
+  "decision_maker_reasoning": "Operations Director (found on LinkedIn) manages the data team and would champion this internally. CTO signs off on tech purchases.",
+  "outreach_angle": "Your Data Analyst job posting tells us you're drowning in manual SAP reports — we automate that entirely",
+  "confidence": 0.88,
+  "evidence_used": ["SAP detected in tech stack", "Data Analyst job posting", "150 employees", "manufacturing industry"],
+  "evidence_missing": ["specific SAP modules used", "current reporting frequency"]
+}`;
+}
+// ─── SIGNAL EXTRACTION PROMPT (for scoring) ──────────────────
+export function buildSignalExtractionPrompt(companyData: {
+  name: string;
+  industry: string;
+  employee_count: number | null;
+  tech_stack: string[];
+  ai_job_count: number;
+  pain_signals: string[];
+  service_match: string | null;
+  has_verified_email: boolean;
+  has_linkedin: boolean;
+  has_social: boolean;
+  growth_signals_count: number;
+  website_active: boolean;
+}): string {
+  return `EXTRACT SIGNALS for lead scoring. Do not compute a score — just identify signals.
+Company: ${companyData.name}
+Industry: ${companyData.industry || "UNKNOWN"}
+Employees: ${companyData.employee_count ?? "UNKNOWN"}
+Tech stack: ${companyData.tech_stack.join(", ") || "NONE"}
+AI/automation job postings: ${companyData.ai_job_count}
+Pain signals detected: ${companyData.pain_signals.join(", ") || "NONE"}
+Service match: ${companyData.service_match || "NONE"}
+Has verified email: ${companyData.has_verified_email}
+Has personal LinkedIn: ${companyData.has_linkedin}
+Has social profiles: ${companyData.has_social}
+Growth signals count: ${companyData.growth_signals_count}
+Website recently active: ${companyData.website_active}
+Output JSON:
+{
+  "company_fit_signals": {
+    "industry_match": true|false,
+    "size_appropriate": true|false,
+    "evidence": "why"
+  },
+  "ai_readiness_signals": {
+    "level": "none|low|medium|high",
+    "tech_stack_relevant": true|false,
+    "ai_jobs_present": true|false,
+    "evidence": "why"
+  },
+  "service_match_signals": {
+    "matched": true|false,
+    "service_name": "which service fits",
+    "pain_count": 0,
+    "evidence": "which pain signals"
+  },
+  "contact_quality_signals": {
+    "email_verified": true|false,
+    "linkedin_found": true|false,
+    "decision_maker_identified": true|false
+  },
+  "timing_signals": {
+    "actively_growing": true|false,
+    "recently_active": true|false,
+    "evidence": "what suggests good timing"
+  },
+  "confidence": 0.0
+}`;
+}
+// ─── EMAIL CLASSIFICATION PROMPT ─────────────────────────────
+export function buildEmailClassifyPrompt(data: {
+  email: string;
+  company_name: string;
+  company_size: number | null;
+  industry: string;
+  website_snippet: string;
+}): string {
+  return `CLASSIFY this email address for B2B outreach viability.
+Email: ${data.email}
+Company: ${data.company_name}
+Size: ${data.company_size ?? "UNKNOWN"} employees
+Industry: ${data.industry || "UNKNOWN"}
+Website excerpt: ${(data.website_snippet || "").slice(0, 300)}
+Does "${data.email}" likely reach a person with purchasing authority?
+Consider:
+- Email prefix meaning in context of this company size
+- "${data.email.split("@")[0]}@" at a ${data.company_size ?? "unknown"}-person ${data.industry} company
+- Small companies: admin/operations/office = often the owner
+- Large companies: admin/operations = departments, not individuals
+Output JSON:
+{
+  "keep": true|false,
+  "confidence": 0.0,
+  "likely_reaches": "who this email probably reaches",
+  "reason": "one line why keep or reject"
+}`;
+}
+// ─── PAIN SIGNAL DETECTION PROMPT ────────────────────────────
+export function buildPainDetectionPrompt(data: {
+  company_name: string;
+  industry: string;
+  employee_count: number | null;
+  website_text: string;
+  page_elements: string[];  // ['phone_number', 'contact_form', 'no_chatbot', etc.]
+}): string {
+  return `DETECT operational inefficiency signals for this company.
+Company: ${data.company_name}
+Industry: ${data.industry || "UNKNOWN"}
+Size: ${data.employee_count ?? "UNKNOWN"} employees
+Website text (excerpt):
+${(data.website_text || "").slice(0, 500)}
+Page elements detected:
+${data.page_elements.join("\n")}
+IMPORTANT: You are NOT looking for AI signals. You are looking for MANUAL PROCESS signals.
+A phone number on a homepage IS a signal (manual call handling).
+A "Book by Phone" button IS a signal (no online scheduling).
+No live chat IS a signal (no automated customer interaction).
+Output JSON:
+{
+  "pain_signals": [
+    {"signal": "what you detected", "evidence": "where on page", "severity": "low|medium|high"}
+  ],
+  "service_match": "which AI service best fits: AI Receptionist|AI Customer Support|AI Data Processing|AI Sales Automation|AI Workflow Automation|NONE",
+  "match_confidence": 0.0,
+  "reasoning": "one paragraph explaining your analysis"
+}`;
+}

src/shared/observability/tracer.ts ADDED Viewed

	@@ -0,0 +1,118 @@

+/**
+ * Pipeline Observability — Trace ID system
+ *
+ * Every pipeline run gets a unique trace_id.
+ * Every operation within that run carries the trace_id.
+ * This enables: debugging, cost tracking, latency analysis.
+ *
+ * Think of it like a receipt number — every action is linked.
+ */
+import { randomUUID } from "crypto";
+import { getSupabaseClient } from "../supabase/client";
+import { logger } from "../utils/logger";
+export interface PipelineTrace {
+  traceId: string;
+  runId: string;         // discovery_runs.id
+  startedAt: number;     // Date.now()
+  operationCount: number;
+  totalTokens: number;
+  totalLatencyMs: number;
+  errors: string[];
+}
+// In-memory trace store (per pipeline run)
+const activeTraces = new Map<string, PipelineTrace>();
+/**
+ * Start a new pipeline trace. Call this at beginning of every discovery run.
+ */
+export function startTrace(runId: string): string {
+  const traceId = `trace_${randomUUID().slice(0, 8)}_${Date.now()}`;
+  activeTraces.set(traceId, {
+    traceId,
+    runId,
+    startedAt: Date.now(),
+    operationCount: 0,
+    totalTokens: 0,
+    totalLatencyMs: 0,
+    errors: [],
+  });
+  logger.info({ traceId, runId }, "🔍 Pipeline trace started");
+  return traceId;
+}
+/**
+ * Record an operation within a trace.
+ */
+export function recordOperation(
+  traceId: string,
+  operation: string,
+  tokens: number,
+  latencyMs: number,
+  success: boolean,
+  error?: string
+): void {
+  const trace = activeTraces.get(traceId);
+  if (!trace) return;
+  trace.operationCount++;
+  trace.totalTokens += tokens;
+  trace.totalLatencyMs += latencyMs;
+  if (!success && error) {
+    trace.errors.push(`${operation}: ${error}`);
+  }
+}
+/**
+ * End trace and persist summary to audit_log.
+ */
+export async function endTrace(traceId: string): Promise<PipelineTrace | null> {
+  const trace = activeTraces.get(traceId);
+  if (!trace) return null;
+  const duration = Date.now() - trace.startedAt;
+  logger.info({
+    traceId,
+    operations: trace.operationCount,
+    tokens: trace.totalTokens,
+    durationMs: duration,
+    errors: trace.errors.length,
+  }, "✅ Pipeline trace completed");
+  // Persist to audit log
+  try {
+    const db = getSupabaseClient();
+    await db.from("audit_log").insert({
+      action: "pipeline_trace_completed",
+      entity_type: "discovery_run",
+      entity_id: trace.runId,
+      details: {
+        trace_id: traceId,
+        duration_ms: duration,
+        operations: trace.operationCount,
+        total_tokens: trace.totalTokens,
+        total_latency_ms: trace.totalLatencyMs,
+        error_count: trace.errors.length,
+        errors: trace.errors.slice(0, 10), // cap at 10
+      },
+    });
+  } catch (err) {
+    logger.warn({ err }, "Failed to persist trace — non-critical");
+  }
+  activeTraces.delete(traceId);
+  return trace;
+}
+/**
+ * Get active trace (for passing to LLM calls etc.)
+ */
+export function getTrace(traceId: string): PipelineTrace | undefined {
+  return activeTraces.get(traceId);
+}

src/shared/pipeline/checkpoint.ts ADDED Viewed

	@@ -0,0 +1,143 @@

+/**
+ * Pipeline Checkpoint System — Crash Recovery
+ *
+ * Problem: Pipeline crashes at company #15 → restarts →
+ *          processes company #1-14 again = wasted API calls + duplicates
+ *
+ * Solution: Save checkpoint after each major stage.
+ *           On restart, resume from last checkpoint.
+ *
+ * Stages (in order):
+ *   scraped → filtered → contacts_found → profiled → scored → completed
+ */
+import { getSupabaseClient } from "../supabase/client";
+import { logger } from "../utils/logger";
+export type PipelineStage =
+  | "discovered"       // found in search results
+  | "scraped"          // website scraped
+  | "filtered"         // passed Gate 1 + 2
+  | "contacts_found"   // decision makers identified
+  | "emails_verified"  // emails found and verified
+  | "profiled"         // LLM profile generated
+  | "scored"           // score computed
+  | "completed";       // fully processed
+/**
+ * Save checkpoint for a company in a specific run.
+ * Stores intermediate data so pipeline can resume from this point.
+ */
+export async function saveCheckpoint(
+  runId: string,
+  domain: string,
+  stage: PipelineStage,
+  stageData: Record<string, unknown> = {}
+): Promise<void> {
+  const db = getSupabaseClient();
+  const { error } = await db.from("pipeline_checkpoints").upsert(
+    {
+      run_id: runId,
+      company_domain: domain,
+      stage,
+      stage_data: stageData,
+      completed: stage === "completed",
+      updated_at: new Date().toISOString(),
+    },
+    { onConflict: "run_id,company_domain" }
+  );
+  if (error) {
+    logger.warn({ domain, stage, error: error.message }, "Checkpoint save failed — non-critical");
+  }
+}
+/**
+ * Get the last checkpoint for a domain in a run.
+ * Returns null if no checkpoint exists (fresh start).
+ */
+export async function getCheckpoint(
+  runId: string,
+  domain: string
+): Promise<{ stage: PipelineStage; stageData: Record<string, unknown> } | null> {
+  const db = getSupabaseClient();
+  const { data } = await db
+    .from("pipeline_checkpoints")
+    .select("stage, stage_data")
+    .eq("run_id", runId)
+    .eq("company_domain", domain)
+    .maybeSingle();
+  if (!data) return null;
+  return { stage: data.stage as PipelineStage, stageData: data.stage_data ?? {} };
+}
+/**
+ * Check if a domain was already fully processed in ANY recent run.
+ * Prevents re-processing across separate runs (not just within one run).
+ */
+export async function isAlreadyProcessed(domain: string, withinDays = 30): Promise<boolean> {
+  const db = getSupabaseClient();
+  const cutoff = new Date();
+  cutoff.setDate(cutoff.getDate() - withinDays);
+  const { data } = await db
+    .from("pipeline_checkpoints")
+    .select("id")
+    .eq("company_domain", domain)
+    .eq("completed", true)
+    .gte("updated_at", cutoff.toISOString())
+    .limit(1)
+    .maybeSingle();
+  return !!data;
+}
+/**
+ * Get all incomplete companies in a run (for resume).
+ * Returns list of domains and their last stage.
+ */
+export async function getIncompleteCompanies(
+  runId: string
+): Promise<{ domain: string; stage: PipelineStage; stageData: Record<string, unknown> }[]> {
+  const db = getSupabaseClient();
+  const { data } = await db
+    .from("pipeline_checkpoints")
+    .select("company_domain, stage, stage_data")
+    .eq("run_id", runId)
+    .eq("completed", false);
+  return (data ?? []).map((d) => ({
+    domain: d.company_domain,
+    stage: d.stage as PipelineStage,
+    stageData: d.stage_data ?? {},
+  }));
+}
+/**
+ * Stage ordering — used to determine if we can skip ahead.
+ */
+const STAGE_ORDER: PipelineStage[] = [
+  "discovered", "scraped", "filtered", "contacts_found",
+  "emails_verified", "profiled", "scored", "completed",
+];
+export function isStageComplete(currentStage: PipelineStage, requiredStage: PipelineStage): boolean {
+  return STAGE_ORDER.indexOf(currentStage) >= STAGE_ORDER.indexOf(requiredStage);
+}
+/**
+ * Helper to determine where to resume processing for a company.
+ */
+export function getResumePoint(checkpoint: { stage: PipelineStage } | null): PipelineStage {
+  if (!checkpoint) return "discovered";
+  // Resume from the NEXT stage after the last completed one
+  const idx = STAGE_ORDER.indexOf(checkpoint.stage);
+  if (idx < 0 || idx >= STAGE_ORDER.length - 1) return "discovered";
+  return STAGE_ORDER[idx + 1];
+}

src/shared/supabase/client.ts ADDED Viewed

	@@ -0,0 +1,15 @@

+import { createClient } from "@supabase/supabase-js";
+import { getEnv } from "../config/env";
+let _client: ReturnType<typeof createClient> | null = null;
+export function getSupabaseClient() {
+  if (!_client) {
+    const env = getEnv();
+    _client = createClient(env.SUPABASE_URL, env.SUPABASE_SERVICE_ROLE_KEY, {
+      auth: { persistSession: false },
+      db: { schema: "public" },
+    });
+  }
+  return _client;
+}

src/shared/supabase/schema.ts ADDED Viewed

	@@ -0,0 +1,184 @@

+// ─── TypeScript types matching Supabase schema ─────────────────
+export type CompanyStatus =
+  | "discovered" | "researching" | "profiled"
+  | "qualified" | "nurture" | "archived" | "suppressed";
+export type ContactStatus =
+  | "found" | "email_verified" | "email_invalid"
+  | "linkedin_only" | "suppressed";
+export type LeadTier = "hot" | "warm" | "nurture" | "archive";
+export type OutreachChannel = "email" | "linkedin";
+export type OutreachStatus =
+  | "queued" | "sent" | "opened" | "replied"
+  | "bounced" | "failed" | "review_needed";
+export type IntentType =
+  | "interested" | "question" | "not_now"
+  | "not_interested" | "out_of_office" | "wrong_person" | "unknown";
+export type ReviewStatus = "pending" | "approved" | "rejected" | "edited";
+// ─── Table row types ─────────────────────────────────────────
+export interface IcpConfig {
+  id: string;
+  name: string;
+  min_employees: number;
+  industries: string[];
+  exclude_industries: string[];
+  geographies: string[];
+  keywords: string[];
+  tech_signals: string[];
+  score_threshold: number;
+  is_active: boolean;
+  created_at: string;
+  updated_at: string;
+}
+export interface RotationState {
+  id: string;
+  week_number: number;
+  region: string;
+  started_at: string;
+  completed_at: string | null;
+  companies_found: number;
+  leads_qualified: number;
+}
+export interface Company {
+  id: string;
+  domain: string;
+  name: string;
+  industry: string | null;
+  employee_count: number | null;
+  employee_range: string | null;
+  description: string | null;
+  website_url: string | null;
+  linkedin_url: string | null;
+  country: string | null;
+  region: string | null;
+  tech_stack: string[];
+  growth_signals: GrowthSignal[];
+  raw_data: Record<string, unknown>;
+  source: string;
+  status: CompanyStatus;
+  discovered_at: string;
+  updated_at: string;
+}
+export interface GrowthSignal {
+  type: "job_posting" | "news" | "funding" | "social_post" | "expansion";
+  content: string;
+  source_url?: string;
+  ai_related: boolean;
+  detected_at: string;
+}
+export interface Contact {
+  id: string;
+  company_id: string;
+  full_name: string;
+  first_name: string | null;
+  last_name: string | null;
+  title: string;
+  seniority: "c_suite" | "vp" | "director" | "manager" | null;
+  email: string | null;
+  email_verified: boolean;
+  email_source: "hunter" | "snov" | "pattern" | null;
+  linkedin_url: string | null;
+  linkedin_verified: boolean;
+  status: ContactStatus;
+  suppressed: boolean;
+  suppressed_at: string | null;
+  suppressed_reason: string | null;
+  created_at: string;
+  updated_at: string;
+}
+export interface Evidence {
+  id: string;
+  company_id: string;
+  type: "job_posting" | "news" | "social_post" | "website_text" | "tech_stack";
+  content: string;
+  source_url: string | null;
+  ai_signal: boolean;
+  collected_at: string;
+}
+export interface LeadProfile {
+  id: string;
+  company_id: string;
+  profile_summary: string;
+  pain_points: string[];
+  ai_use_case: string | null;
+  ai_readiness: "low" | "medium" | "high";
+  outreach_angle: string | null;
+  llm_model: string;
+  llm_confidence: number | null;
+  is_fallback: boolean;
+  created_at: string;
+}
+export interface LeadScore {
+  id: string;
+  company_id: string;
+  contact_id: string | null;
+  total_score: number;
+  tier: LeadTier;
+  company_fit: number | null;
+  ai_readiness: number | null;
+  decision_maker: number | null;
+  growth_signal: number | null;
+  engagement_potential: number | null;
+  score_reasoning: string | null;
+  scored_at: string;
+}
+export interface HumanReviewItem {
+  id: string;
+  type: "outreach_approval" | "score_anomaly" | "escalation";
+  company_id: string | null;
+  contact_id: string | null;
+  payload: Record<string, unknown>;
+  status: ReviewStatus;
+  reviewer_notes: string | null;
+  resolved_at: string | null;
+  created_at: string;
+}
+// ─── Insert types (no id/timestamps) ─────────────────────────
+export type InsertCompany = Omit<Company, "id" | "discovered_at" | "updated_at">;
+export type InsertContact = Omit<Contact, "id" | "created_at" | "updated_at">;
+export type InsertEvidence = Omit<Evidence, "id" | "collected_at">;
+export type InsertLeadProfile = Omit<LeadProfile, "id" | "created_at">;
+export type InsertLeadScore = Omit<LeadScore, "id" | "scored_at">;
+// ─── Trigger.dev event payloads ────────────────────────────────
+export interface CompanyDiscoveredPayload {
+  company_id: string;
+  domain: string;
+  name: string;
+  region: string;
+  source: "auto" | "manual";
+}
+export interface LeadScoredPayload {
+  lead_score_id: string;
+  company_id: string;
+  contact_id: string | null;
+  total_score: number;
+  tier: LeadTier;
+}
+export interface OutreachQueuedPayload {
+  company_id: string;
+  contact_id: string;
+  score: number;
+  tier: LeadTier;
+}

src/shared/utils/logger.ts ADDED Viewed

	@@ -0,0 +1,40 @@

+import pino from "pino";
+import { getEnv } from "../config/env";
+// PII fields that will be redacted in logs
+const PII_FIELDS = ["email", "full_name", "first_name", "last_name", "phone", "linkedin_url"];
+function redactPii(obj: Record<string, unknown>): Record<string, unknown> {
+  const result: Record<string, unknown> = {};
+  for (const [key, value] of Object.entries(obj)) {
+    if (PII_FIELDS.includes(key) && typeof value === "string") {
+      // Show first 3 chars + *** e.g. "joh***"
+      result[key] = value.length > 3 ? `${value.slice(0, 3)}***` : "***";
+    } else if (value && typeof value === "object" && !Array.isArray(value)) {
+      result[key] = redactPii(value as Record<string, unknown>);
+    } else {
+      result[key] = value;
+    }
+  }
+  return result;
+}
+const env = getEnv();
+export const logger = pino({
+  level: env.LOG_LEVEL,
+  transport:
+    env.NODE_ENV === "development"
+      ? { target: "pino-pretty", options: { colorize: true } }
+      : undefined,
+  serializers: {
+    // Auto-redact PII in any "contact" or "data" field
+    contact: (val: Record<string, unknown>) => redactPii(val),
+    data: (val: Record<string, unknown>) => redactPii(val),
+  },
+});
+// Convenience method for audit-safe logging
+export function auditLog(action: string, entity: string, details: Record<string, unknown>) {
+  logger.info({ action, entity, details: redactPii(details) }, `[AUDIT] ${action}`);
+}

src/shared/utils/rate-limiter.ts ADDED Viewed

	@@ -0,0 +1,103 @@

+import { logger } from "./logger";
+interface BucketState {
+  tokens: number;
+  lastRefill: number;
+}
+/**
+ * Token bucket rate limiter per provider.
+ * Controls how many API calls can be made per time window.
+ */
+export class RateLimiter {
+  private buckets = new Map<string, BucketState>();
+  constructor(
+    private readonly maxTokens: number,
+    private readonly refillRateMs: number // how often to fully refill
+  ) {}
+  /**
+   * Returns true if the call is allowed, false if rate limit exceeded.
+   */
+  tryConsume(provider: string, tokens = 1): boolean {
+    const now = Date.now();
+    let bucket = this.buckets.get(provider);
+    if (!bucket) {
+      bucket = { tokens: this.maxTokens, lastRefill: now };
+      this.buckets.set(provider, bucket);
+    }
+    // Refill based on elapsed time
+    const elapsed = now - bucket.lastRefill;
+    if (elapsed >= this.refillRateMs) {
+      bucket.tokens = this.maxTokens;
+      bucket.lastRefill = now;
+    }
+    if (bucket.tokens < tokens) {
+      logger.warn({ provider, tokensLeft: bucket.tokens }, `[RateLimit] ${provider} throttled`);
+      return false;
+    }
+    bucket.tokens -= tokens;
+    return true;
+  }
+  /**
+   * Wait until a token is available (blocking version).
+   */
+  async consume(provider: string, tokens = 1): Promise<void> {
+    while (!this.tryConsume(provider, tokens)) {
+      await new Promise((r) => setTimeout(r, 500));
+    }
+  }
+}
+// ─── Daily quota tracker (persisted in memory, resets at midnight) ────────
+interface DailyQuota {
+  count: number;
+  date: string; // YYYY-MM-DD
+}
+const dailyQuotas = new Map<string, DailyQuota>();
+function todayStr(): string {
+  return new Date().toISOString().split("T")[0];
+}
+export function checkDailyQuota(key: string, limit: number): boolean {
+  const today = todayStr();
+  const quota = dailyQuotas.get(key);
+  if (!quota || quota.date !== today) {
+    dailyQuotas.set(key, { count: 0, date: today });
+    return true;
+  }
+  if (quota.count >= limit) {
+    logger.warn({ key, count: quota.count, limit }, `[DailyQuota] ${key} limit reached`);
+    return false;
+  }
+  return true;
+}
+export function incrementDailyQuota(key: string): void {
+  const today = todayStr();
+  const quota = dailyQuotas.get(key) ?? { count: 0, date: today };
+  if (quota.date !== today) {
+    quota.count = 0;
+    quota.date = today;
+  }
+  quota.count += 1;
+  dailyQuotas.set(key, quota);
+}
+// Pre-configured limiters for each provider
+export const serperLimiter = new RateLimiter(10, 60_000);   // 10 req/min
+export const hunterLimiter = new RateLimiter(5, 60_000);    // 5 req/min
+export const snovLimiter = new RateLimiter(5, 60_000);      // 5 req/min
+export const reoonLimiter = new RateLimiter(10, 60_000);    // 10 req/min
+export const playwrightLimiter = new RateLimiter(3, 10_000); // 3 pages per 10s

src/shared/utils/retry.ts ADDED Viewed

	@@ -0,0 +1,195 @@

+/**
+ * Production-grade retry logic — failure-type-aware.
+ *
+ * NOT "retry 3 times with delay" (naive approach).
+ * Instead: each failure type gets a different response.
+ *
+ * 429 → respect Retry-After header, wait, then retry
+ * 503 → exponential backoff WITH JITTER (prevent thundering herd)
+ * 500 → retry 2x, then dead-letter for manual review
+ * 422 → permanent failure, do not retry (bad input)
+ * ECONNRESET → network issue, retry with short delay
+ * TIMEOUT → retry with longer timeout
+ */
+import { AxiosError } from "axios";
+import { logger } from "./logger";
+export interface RetryConfig {
+  provider: string;
+  maxRetries?: number;       // default 3
+  baseDelayMs?: number;      // default 1000
+  maxDelayMs?: number;       // default 30000
+}
+// ─── Circuit breaker state ───────────────────────────────────
+interface CircuitState {
+  failures: number;
+  lastFailure: number;
+  isOpen: boolean;
+  halfOpenAt: number;        // when to try again
+}
+const circuits = new Map<string, CircuitState>();
+const CIRCUIT_THRESHOLD = 5;        // failures before opening
+const CIRCUIT_RESET_MS = 60_000;    // 1 min cooldown
+export function isCircuitOpen(provider: string): boolean {
+  const state = circuits.get(provider);
+  if (!state?.isOpen) return false;
+  // Check if enough time has passed (half-open)
+  if (Date.now() >= state.halfOpenAt) {
+    state.isOpen = false;  // allow one attempt
+    return false;
+  }
+  return true;
+}
+export function recordSuccess(provider: string): void {
+  circuits.set(provider, {
+    failures: 0,
+    lastFailure: 0,
+    isOpen: false,
+    halfOpenAt: 0,
+  });
+}
+export function recordFailure(provider: string): void {
+  const state = circuits.get(provider) ?? {
+    failures: 0, lastFailure: 0, isOpen: false, halfOpenAt: 0,
+  };
+  state.failures++;
+  state.lastFailure = Date.now();
+  if (state.failures >= CIRCUIT_THRESHOLD) {
+    state.isOpen = true;
+    state.halfOpenAt = Date.now() + CIRCUIT_RESET_MS;
+    logger.warn({ provider, failures: state.failures }, "Circuit OPEN — provider temporarily disabled");
+  }
+  circuits.set(provider, state);
+}
+// ─── Failure classification ──────────────────────────────────
+type FailureType =
+  | "rate_limited"         // 429
+  | "server_error"         // 500
+  | "service_unavailable"  // 503
+  | "bad_input"            // 422, 400
+  | "auth_failed"          // 401, 403
+  | "network_error"        // ECONNRESET, ENOTFOUND
+  | "timeout"              // ETIMEDOUT, ESOCKETTIMEDOUT
+  | "unknown";
+function classifyFailure(err: unknown): { type: FailureType; retryable: boolean; waitMs: number } {
+  if (err instanceof AxiosError) {
+    const status = err.response?.status;
+    const retryAfter = parseInt(err.response?.headers?.["retry-after"] ?? "0", 10);
+    switch (status) {
+      case 429:
+        return {
+          type: "rate_limited",
+          retryable: true,
+          waitMs: retryAfter ? retryAfter * 1000 : 10_000,
+        };
+      case 503:
+        return { type: "service_unavailable", retryable: true, waitMs: 5_000 };
+      case 500:
+        return { type: "server_error", retryable: true, waitMs: 3_000 };
+      case 422:
+      case 400:
+        return { type: "bad_input", retryable: false, waitMs: 0 };
+      case 401:
+      case 403:
+        return { type: "auth_failed", retryable: false, waitMs: 0 };
+    }
+    // Network errors
+    const code = err.code;
+    if (code === "ECONNRESET" || code === "ENOTFOUND" || code === "ECONNREFUSED") {
+      return { type: "network_error", retryable: true, waitMs: 2_000 };
+    }
+    if (code === "ETIMEDOUT" || code === "ESOCKETTIMEDOUT") {
+      return { type: "timeout", retryable: true, waitMs: 3_000 };
+    }
+  }
+  return { type: "unknown", retryable: true, waitMs: 2_000 };
+}
+// ─── Main retry function ────────────────────────────────────
+export async function withRetry<T>(
+  fn: () => Promise<T>,
+  config: RetryConfig
+): Promise<T> {
+  const maxRetries = config.maxRetries ?? 3;
+  const baseDelay = config.baseDelayMs ?? 1000;
+  const maxDelay = config.maxDelayMs ?? 30_000;
+  let attempt = 0;
+  while (true) {
+    try {
+      const result = await fn();
+      if (attempt > 0) {
+        // Recovered after retry — record success
+        recordSuccess(config.provider);
+        logger.info({ provider: config.provider, attempts: attempt + 1 }, "Retry succeeded");
+      }
+      return result;
+    } catch (err) {
+      attempt++;
+      const failure = classifyFailure(err);
+      // Permanent failure — don't retry
+      if (!failure.retryable) {
+        logger.error(
+          { provider: config.provider, failureType: failure.type, attempt },
+          "Permanent failure — not retrying"
+        );
+        recordFailure(config.provider);
+        throw err;
+      }
+      // Max retries exceeded
+      if (attempt >= maxRetries) {
+        logger.error(
+          { provider: config.provider, failureType: failure.type, attempts: attempt },
+          "Max retries exceeded"
+        );
+        recordFailure(config.provider);
+        throw err;
+      }
+      // Calculate wait time with jitter
+      // Jitter prevents thundering herd: 1000 requests don't all retry at same time
+      const exponentialDelay = Math.min(
+        maxDelay,
+        baseDelay * Math.pow(2, attempt - 1)
+      );
+      const jitter = Math.random() * exponentialDelay * 0.3; // ±30% jitter
+      const waitMs = Math.max(failure.waitMs, exponentialDelay + jitter);
+      logger.warn(
+        {
+          provider: config.provider,
+          failureType: failure.type,
+          attempt,
+          maxRetries,
+          waitMs: Math.round(waitMs),
+        },
+        `Retry ${attempt}/${maxRetries} after ${Math.round(waitMs)}ms`
+      );
+      await sleep(waitMs);
+    }
+  }
+}
+function sleep(ms: number): Promise<void> {
+  return new Promise((resolve) => setTimeout(resolve, ms));
+}

src/slack/slack-commands.ts ADDED Viewed

	@@ -0,0 +1,249 @@

+/**
+ * Slack Command Handler — Bidirectional Intelligence
+ *
+ * Handles incoming Slack slash commands and messages.
+ * Uses LLM for natural language understanding when needed.
+ *
+ * Commands:
+ *   /discover              → asks clarifying questions
+ *   /discover region:UK    → direct run with params
+ *   /leads                 → show today's qualified leads
+ *   /lead [company]        → full lead details
+ *   /status                → system status
+ *   /pause                 → pause automatic runs
+ *   /resume                → resume automatic runs
+ *   /quota [number]        → set today's quota
+ *   /quota [number] always → permanent change
+ */
+import { getSupabaseClient } from "../../shared/supabase/client";
+import { setQuotaOverride, isSystemPaused } from "../../discovery/lib/territory-manager";
+import { sendClarifyingQuestions } from "./slack-service";
+import { logger } from "../../shared/utils/logger";
+export interface SlackCommand {
+  command: string;
+  text: string;
+  userId: string;
+  channelId: string;
+}
+/**
+ * Route incoming slash commands.
+ */
+export async function handleSlackCommand(cmd: SlackCommand): Promise<string> {
+  const { command, text } = cmd;
+  const args = text.trim().toLowerCase();
+  switch (command) {
+    case "/discover":
+      return handleDiscover(args, cmd);
+    case "/leads":
+      return handleLeads();
+    case "/lead":
+      return handleLeadDetail(text);
+    case "/status":
+      return handleStatus();
+    case "/pause":
+      return handlePause();
+    case "/resume":
+      return handleResume();
+    case "/quota":
+      return handleQuota(text);
+    default:
+      return `Unknown command: ${command}`;
+  }
+}
+// ─── /discover ───────────────────────────────────────────────
+async function handleDiscover(args: string, cmd: SlackCommand): Promise<string> {
+  // Parse structured params if provided
+  const params = parseParams(args);
+  if (params.region && params.industry) {
+    // Direct run — no questions needed
+    const { manualDiscoveryTask } = await import("../../discovery/trigger-tasks/manual-discovery");
+    await manualDiscoveryTask.trigger({
+      region: params.region.toUpperCase(),
+      industry: params.industry,
+      maxCompanies: parseInt(params.max ?? "20", 10),
+      triggeredBy: `slack:${cmd.userId}`,
+    });
+    return `🚀 Manual discovery started:\n• Region: ${params.region.toUpperCase()}\n• Industry: ${params.industry}\n• Max: ${params.max ?? 20}\nI'll notify you when complete.`;
+  }
+  if (args && !params.region) {
+    // Natural language: "aj China pe kaam karo"
+    // Ask clarifying questions
+    await sendClarifyingQuestions(args, [
+      {
+        question: "Which cities?",
+        options: ["All major cities", "Capital only", "Let me specify..."],
+      },
+      {
+        question: "Which industry?",
+        options: ["Healthcare (dental, medical)", "Manufacturing", "Technology/SaaS", "All service businesses"],
+      },
+      {
+        question: "How many leads?",
+        options: ["5 (quick)", "10 (standard)", "20 (deep scan)"],
+      },
+    ]);
+    return "I've posted clarifying questions ☝️";
+  }
+  // No args — interactive mode
+  return "Usage:\n• `/discover region:UK industry:dental` — direct run\n• `/discover China pe kaam karo` — natural language\n• `/discover` — this help message";
+}
+// ─── /leads ──────────────────────────────────────────────────
+async function handleLeads(): Promise<string> {
+  const db = getSupabaseClient();
+  const today = new Date();
+  today.setHours(0, 0, 0, 0);
+  const { data: leads } = await db
+    .from("lead_scores")
+    .select(`
+      total_score, tier,
+      companies (name, domain, industry, city, service_match),
+      contacts (full_name, email, email_verified, linkedin_personal_url)
+    `)
+    .gte("created_at", today.toISOString())
+    .order("total_score", { ascending: false });
+  if (!leads?.length) return "No leads found today yet.";
+  const lines = leads.map((l: any, i: number) => {
+    const emoji = l.tier === "hot" ? "🔥" : l.tier === "warm" ? "✅" : "📋";
+    const email = l.contacts?.email_verified ? "📧✓" : l.contacts?.email ? "📧" : "—";
+    const li = l.contacts?.linkedin_personal_url ? "💼" : "—";
+    return `${emoji} ${l.total_score} | ${l.companies?.name ?? "?"} | ${l.companies?.industry ?? "?"} | ${l.companies?.city ?? "?"} | ${email} ${li} | ${l.companies?.service_match ?? "—"}`;
+  });
+  return `*Today's Leads (${leads.length}):*\n\n` +
+    `Score | Company | Industry | City | Channels | Service\n` +
+    `${"─".repeat(60)}\n` +
+    lines.join("\n") +
+    `\n\nType \`/lead [company name]\` for full details`;
+}
+// ─── /lead [company] ──────────────���─────────────────────────
+async function handleLeadDetail(companySearch: string): Promise<string> {
+  if (!companySearch.trim()) return "Usage: `/lead ABC Dental`";
+  const db = getSupabaseClient();
+  const { data: companies } = await db
+    .from("companies")
+    .select("*")
+    .ilike("name", `%${companySearch.trim()}%`)
+    .limit(1);
+  if (!companies?.length) return `No company found matching "${companySearch}"`;
+  const company = companies[0];
+  const { data: contacts } = await db.from("contacts").select("*").eq("company_id", company.id);
+  const { data: scores } = await db.from("lead_scores").select("*").eq("company_id", company.id).limit(1);
+  const { data: profiles } = await db.from("lead_profiles").select("*").eq("company_id", company.id).limit(1);
+  const score = scores?.[0];
+  const profile = profiles?.[0];
+  const contact = contacts?.[0];
+  return `*${company.name}*\n` +
+    `Domain: ${company.domain}\n` +
+    `Industry: ${company.industry ?? "?"} · Employees: ${company.employee_count ?? "?"}\n` +
+    `City: ${company.city ?? "?"} · ${company.country ?? "?"}\n` +
+    `Service Match: ${company.service_match ?? "—"}\n` +
+    `LinkedIn: ${company.linkedin_url ?? "—"}\n\n` +
+    `*Score:* ${score?.total_score ?? "?"}/100 — ${score?.tier?.toUpperCase() ?? "?"}\n` +
+    `  Fit: ${score?.company_fit ?? "?"}/25 · AI: ${score?.ai_readiness ?? "?"}/20 · Service: ${score?.service_match_score ?? "?"}/20\n` +
+    `  Contact: ${score?.decision_maker ?? "?"}/20 · Timing: ${score?.timing_score ?? "?"}/15\n\n` +
+    `*Profile:*\n${profile?.profile_summary ?? "No profile yet"}\n` +
+    `Pain: ${(profile?.pain_points ?? []).join(", ")}\n` +
+    `Angle: _${profile?.outreach_angle ?? "?"}_\n\n` +
+    `*Contact:* ${contact?.full_name ?? "?"} — ${contact?.title ?? "?"}\n` +
+    `  Email: ${contact?.email ?? "—"} ${contact?.email_verified ? "✓" : ""}\n` +
+    `  LinkedIn: ${contact?.linkedin_personal_url ?? "—"}\n` +
+    `  Social: ${JSON.stringify(contact?.social_profiles ?? {})}`;
+}
+// ─── /status ─────────────────────────────────────────────────
+async function handleStatus(): Promise<string> {
+  const db = getSupabaseClient();
+  const paused = await isSystemPaused();
+  const { data: quotaConfig } = await db.from("system_config").select("value").eq("key", "daily_quota").single();
+  const quota = quotaConfig?.value;
+  const { data: territory } = await db.from("system_config").select("value").eq("key", "current_territory").single();
+  const pos = territory?.value;
+  const { data: todayRuns } = await db
+    .from("discovery_runs")
+    .select("status, leads_qualified")
+    .gte("ran_at", new Date(new Date().setHours(0, 0, 0, 0)).toISOString());
+  const todayLeads = todayRuns?.reduce((sum: number, r: any) => sum + (r.leads_qualified ?? 0), 0) ?? 0;
+  return `*System Status*\n` +
+    `State: ${paused ? "⏸️ PAUSED" : "▶️ RUNNING"}\n` +
+    `Daily Quota: ${(quota as any)?.today_override ?? (quota as any)?.default ?? 10}\n` +
+    `Leads Today: ${todayLeads}\n` +
+    `Current Territory: ${(pos as any)?.countryCode ?? "?"} city#${(pos as any)?.cityIndex ?? 0}\n` +
+    `Runs Today: ${todayRuns?.length ?? 0}`;
+}
+// ─── /pause, /resume ─────────────────────────────────────────
+async function handlePause(): Promise<string> {
+  const db = getSupabaseClient();
+  await db.from("system_config").update({
+    value: { enabled: true, paused: true, paused_by: "slack" },
+    updated_at: new Date().toISOString(),
+  }).eq("key", "auto_mode");
+  return "⏸️ System paused. Automatic runs will not start.\nType `/resume` to restart.";
+}
+async function handleResume(): Promise<string> {
+  const db = getSupabaseClient();
+  await db.from("system_config").update({
+    value: { enabled: true, paused: false, paused_by: null },
+    updated_at: new Date().toISOString(),
+  }).eq("key", "auto_mode");
+  return "▶️ System resumed. Next automatic run will proceed on schedule.";
+}
+// ─── /quota ──────────────────────────────────────────────────
+async function handleQuota(text: string): Promise<string> {
+  const parts = text.trim().split(/\s+/);
+  const num = parseInt(parts[0], 10);
+  if (isNaN(num) || num < 1 || num > 100) {
+    return "Usage: `/quota 15` (today only) or `/quota 15 always` (permanent)";
+  }
+  const permanent = parts[1] === "always" || parts[1] === "permanent";
+  await setQuotaOverride(num, permanent);
+  return permanent
+    ? `✅ Daily quota permanently set to ${num} leads/day`
+    : `✅ Today's quota set to ${num} leads. Tomorrow back to default.`;
+}
+// ─── Helpers ───────────────────────────��─────────────────────
+function parseParams(text: string): Record<string, string> {
+  const params: Record<string, string> = {};
+  const matches = text.matchAll(/(\w+):(\S+)/g);
+  for (const match of matches) {
+    params[match[1]] = match[2];
+  }
+  return params;
+}

src/slack/slack-service.ts ADDED Viewed

	@@ -0,0 +1,273 @@

+/**
+ * Slack Service — 3-Layer Data Delivery
+ *
+ * Layer 1: Daily Digest (1 rich message per day — summary table)
+ * Layer 2: Real-time Alerts (only HOT leads 85+ — immediate)
+ * Layer 3: Commands (/leads, /discover, /status, /pause, /quota)
+ *
+ * NOT Slack blast — organized, formatted, actionable.
+ */
+import axios from "axios";
+import { getEnv } from "../../shared/config/env";
+import { getSupabaseClient } from "../../shared/supabase/client";
+import { logger } from "../../shared/utils/logger";
+// ─── Slack API helper ────────────────────────────────────────
+async function postMessage(channelId: string, blocks: unknown[], text: string): Promise<void> {
+  const env = getEnv();
+  try {
+    await axios.post("https://slack.com/api/chat.postMessage", {
+      channel: channelId,
+      text,
+      blocks,
+    }, {
+      headers: { Authorization: `Bearer ${env.SLACK_BOT_TOKEN}` },
+      timeout: 5_000,
+    });
+  } catch (err) {
+    logger.warn({ err }, "Slack post failed — non-critical");
+  }
+}
+// ─── LAYER 1: Daily Digest ───────────────────────────────────
+export async function sendDailyDigest(runSummary: {
+  territory: string;
+  industry: string;
+  companiesSearched: number;
+  leadsQualified: number;
+  hotLeads: number;
+  warmLeads: number;
+  nurtureLeads: number;
+  tokensUsed: number;
+  durationMinutes: number;
+}): Promise<void> {
+  const env = getEnv();
+  const db = getSupabaseClient();
+  // Fetch today's qualified leads
+  const today = new Date();
+  today.setHours(0, 0, 0, 0);
+  const { data: leads } = await db
+    .from("lead_scores")
+    .select(`
+      total_score, tier,
+      companies (name, domain, industry, employee_count, city, service_match),
+      contacts (full_name, title, email, email_verified, linkedin_personal_url)
+    `)
+    .gte("created_at", today.toISOString())
+    .order("total_score", { ascending: false })
+    .limit(20);
+  // Build lead table
+  const leadRows = (leads ?? []).map((lead: any, i: number) => {
+    const emoji = lead.tier === "hot" ? "🔥" : lead.tier === "warm" ? "✅" : "📋";
+    const company = lead.companies;
+    const contact = lead.contacts;
+    const emailIcon = contact?.email_verified ? "📧✓" : contact?.email ? "📧" : "—";
+    const linkedinIcon = contact?.linkedin_personal_url ? "💼✓" : "—";
+    return `${emoji} *${company?.name ?? "Unknown"}* — ${lead.total_score}/100 ${lead.tier.toUpperCase()}\n` +
+      `    ${company?.industry ?? "?"} · ${company?.employee_count ?? "?"} emp · ${company?.city ?? "?"}\n` +
+      `    ${contact?.full_name ?? "?"} (${contact?.title ?? "?"})\n` +
+      `    ${emailIcon} ${linkedinIcon} · Match: ${company?.service_match ?? "—"}`;
+  }).join("\n\n");
+  const blocks = [
+    // Header
+    {
+      type: "header",
+      text: { type: "plain_text", text: `📊 Daily Lead Report — ${formatDate(new Date())}` },
+    },
+    // Summary stats
+    {
+      type: "section",
+      text: {
+        type: "mrkdwn",
+        text: `*Territory:* ${runSummary.territory} → ${runSummary.industry}\n` +
+          `*Searched:* ${runSummary.companiesSearched} companies\n` +
+          `*Qualified:* ${runSummary.leadsQualified} leads ` +
+          `(🔥 ${runSummary.hotLeads} hot · ✅ ${runSummary.warmLeads} warm · 📋 ${runSummary.nurtureLeads} nurture)\n` +
+          `*Duration:* ${runSummary.durationMinutes} min · *Tokens:* ${runSummary.tokensUsed.toLocaleString()}`,
+      },
+    },
+    { type: "divider" },
+    // Lead list
+    {
+      type: "section",
+      text: {
+        type: "mrkdwn",
+        text: leadRows || "_No qualified leads found today_",
+      },
+    },
+    { type: "divider" },
+    // Actions
+    {
+      type: "context",
+      elements: [
+        {
+          type: "mrkdwn",
+          text: "Type `/leads` for full details · `/discover region:UK` for manual run · `/status` for system status",
+        },
+      ],
+    },
+  ];
+  await postMessage(env.SLACK_ALERT_CHANNEL_ID, blocks,
+    `Daily Report: ${runSummary.leadsQualified} leads found`);
+}
+// ─── LAYER 2: Hot Lead Alert (85+ only) ──────────────────────
+export async function sendHotLeadAlert(lead: {
+  companyName: string;
+  domain: string;
+  industry: string;
+  employeeCount: number | null;
+  city: string | null;
+  score: number;
+  tier: string;
+  contactName: string;
+  contactTitle: string;
+  email: string | null;
+  emailVerified: boolean;
+  linkedinPersonal: string | null;
+  linkedinCompany: string | null;
+  serviceMatch: string | null;
+  outreachAngle: string;
+  painPoints: string[];
+  socialProfiles: Record<string, string | null>;
+}): Promise<void> {
+  const env = getEnv();
+  const emoji = lead.score >= 90 ? "🔥🔥🔥" : lead.score >= 85 ? "🔥🔥" : "🔥";
+  // Contact channels summary
+  const channels: string[] = [];
+  if (lead.email && lead.emailVerified) channels.push(`📧 ${lead.email} ✓`);
+  else if (lead.email) channels.push(`📧 ${lead.email} (unverified)`);
+  if (lead.linkedinPersonal) channels.push(`💼 <${lead.linkedinPersonal}|LinkedIn>`);
+  if (lead.linkedinCompany) channels.push(`🏢 <${lead.linkedinCompany}|Company LI>`);
+  if (lead.socialProfiles?.instagram) channels.push(`📷 <${lead.socialProfiles.instagram}|Instagram>`);
+  if (lead.socialProfiles?.facebook) channels.push(`👥 <${lead.socialProfiles.facebook}|Facebook>`);
+  const blocks = [
+    {
+      type: "header",
+      text: { type: "plain_text", text: `${emoji} HOT LEAD — ${lead.companyName}` },
+    },
+    {
+      type: "section",
+      fields: [
+        { type: "mrkdwn", text: `*Score:*\n${lead.score}/100 — ${lead.tier.toUpperCase()}` },
+        { type: "mrkdwn", text: `*Industry:*\n${lead.industry}` },
+        { type: "mrkdwn", text: `*Employees:*\n${lead.employeeCount ?? "Unknown"}` },
+        { type: "mrkdwn", text: `*Location:*\n${lead.city ?? "Unknown"}` },
+        { type: "mrkdwn", text: `*Service Match:*\n${lead.serviceMatch ?? "General"}` },
+        { type: "mrkdwn", text: `*Domain:*\n${lead.domain}` },
+      ],
+    },
+    { type: "divider" },
+    {
+      type: "section",
+      text: {
+        type: "mrkdwn",
+        text: `*👤 Decision Maker:*\n${lead.contactName} — ${lead.contactTitle}\n\n` +
+          `*📱 Channels:*\n${channels.join("\n") || "None found"}\n\n` +
+          `*🎯 Outreach Angle:*\n_"${lead.outreachAngle}"_\n\n` +
+          `*💢 Pain Points:*\n${lead.painPoints.map(p => `• ${p}`).join("\n")}`,
+      },
+    },
+  ];
+  await postMessage(env.SLACK_ALERT_CHANNEL_ID, blocks,
+    `🔥 HOT LEAD: ${lead.companyName} — Score ${lead.score}`);
+}
+// ─── LAYER 2: Run Progress Updates ──────────────────────────
+export async function sendRunStarted(territory: string, industry: string, quota: number): Promise<void> {
+  const env = getEnv();
+  await postMessage(env.SLACK_ALERT_CHANNEL_ID, [
+    {
+      type: "section",
+      text: {
+        type: "mrkdwn",
+        text: `🚀 *Daily run started*\n` +
+          `Territory: ${territory} → ${industry}\n` +
+          `Quota: ${quota} leads\n` +
+          `Estimated: ~90 min`,
+      },
+    },
+  ], `Run started: ${territory} ${industry}`);
+}
+export async function sendRunProgress(qualified: number, quota: number, searched: number): Promise<void> {
+  const env = getEnv();
+  const progress = Math.round((qualified / quota) * 100);
+  const bar = "█".repeat(Math.round(progress / 10)) + "░".repeat(10 - Math.round(progress / 10));
+  await postMessage(env.SLACK_ALERT_CHANNEL_ID, [
+    {
+      type: "section",
+      text: {
+        type: "mrkdwn",
+        text: `📊 *Progress:* ${qualified}/${quota} qualified [${bar}] ${progress}%\n` +
+          `Searched: ${searched} companies`,
+      },
+    },
+  ], `Progress: ${qualified}/${quota}`);
+}
+// ─── LAYER 3: Clarifying Questions ──────────────────────────
+export async function sendClarifyingQuestions(
+  userMessage: string,
+  questions: { question: string; options: string[] }[]
+): Promise<void> {
+  const env = getEnv();
+  const blocks: unknown[] = [
+    {
+      type: "section",
+      text: {
+        type: "mrkdwn",
+        text: `🤔 *Got it: "${userMessage}"*\nMujhe kuch clarify karna hai:`,
+      },
+    },
+  ];
+  for (const q of questions) {
+    blocks.push({
+      type: "section",
+      text: {
+        type: "mrkdwn",
+        text: `*${q.question}*\n${q.options.map((o, i) => `${i + 1}. ${o}`).join("\n")}`,
+      },
+    });
+  }
+  blocks.push({
+    type: "context",
+    elements: [{
+      type: "mrkdwn",
+      text: "Just reply with numbers (e.g., `1 2 3`) or type your own answer",
+    }],
+  });
+  await postMessage(env.SLACK_ALERT_CHANNEL_ID, blocks,
+    "Clarifying questions for manual discovery");
+}
+// ─── Helpers ─────────────────────────────────────────────────
+function formatDate(date: Date): string {
+  return date.toLocaleDateString("en-US", {
+    weekday: "long",
+    year: "numeric",
+    month: "long",
+    day: "numeric",
+  });
+}

src/trigger.ts ADDED Viewed

	@@ -0,0 +1,8 @@

+/**
+ * Trigger.dev entry point — registers all tasks.
+ * This file must export all tasks for Trigger.dev to discover them.
+ */
+export { autoDiscoveryTask, autoDiscoverySchedule } from "./discovery/trigger-tasks/auto-discovery";
+export { manualDiscoveryTask } from "./discovery/trigger-tasks/manual-discovery";
+export { profilingTask } from "./profiling/trigger-tasks/profiling-router";

supabase/migrations/001_initial_schema.sql ADDED Viewed

	@@ -0,0 +1,279 @@

+-- ============================================================
+-- AI Client Acquisition System — Supabase Schema
+-- Run this in Supabase SQL Editor
+-- ============================================================
+-- Enable pgcrypto for UUID generation
+CREATE EXTENSION IF NOT EXISTS "pgcrypto";
+CREATE EXTENSION IF NOT EXISTS "pg_trgm"; -- for fuzzy name matching
+-- ─── ENUMS ──────────────────────────────────────────────────
+CREATE TYPE company_status AS ENUM (
+  'discovered', 'researching', 'profiled',
+  'qualified', 'nurture', 'archived', 'suppressed'
+);
+CREATE TYPE contact_status AS ENUM (
+  'found', 'email_verified', 'email_invalid',
+  'linkedin_only', 'suppressed'
+);
+CREATE TYPE lead_tier AS ENUM ('hot', 'warm', 'nurture', 'archive');
+CREATE TYPE outreach_channel AS ENUM ('email', 'linkedin');
+CREATE TYPE outreach_status AS ENUM (
+  'queued', 'sent', 'opened', 'replied',
+  'bounced', 'failed', 'review_needed'
+);
+CREATE TYPE intent_type AS ENUM (
+  'interested', 'question', 'not_now',
+  'not_interested', 'out_of_office', 'wrong_person', 'unknown'
+);
+CREATE TYPE review_status AS ENUM ('pending', 'approved', 'rejected', 'edited');
+-- ─── CORE TABLES ────────────────────────────────────────────
+-- ICP Configuration (editable from dashboard)
+CREATE TABLE icp_config (
+  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  name        TEXT NOT NULL DEFAULT 'default',
+  min_employees INTEGER NOT NULL DEFAULT 50,
+  industries  TEXT[] NOT NULL DEFAULT ARRAY['technology','manufacturing','logistics','healthcare','finance'],
+  exclude_industries TEXT[] NOT NULL DEFAULT ARRAY['government','non-profit','education'],
+  geographies TEXT[] NOT NULL DEFAULT ARRAY['US','UK','AU','UAE','SA'],
+  keywords    TEXT[] NOT NULL DEFAULT ARRAY['automation','digital transformation','AI','operations'],
+  tech_signals TEXT[] DEFAULT ARRAY['salesforce','hubspot','legacy_erp','sap'],
+  score_threshold INTEGER NOT NULL DEFAULT 70,
+  is_active   BOOLEAN NOT NULL DEFAULT true,
+  created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+  updated_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+-- Weekly rotation state
+CREATE TABLE rotation_state (
+  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  week_number INTEGER NOT NULL,          -- 1=USA, 2=UK, 3=AU, 4=Gulf
+  region      TEXT NOT NULL,
+  started_at  TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+  completed_at TIMESTAMPTZ,
+  companies_found INTEGER DEFAULT 0,
+  leads_qualified INTEGER DEFAULT 0
+);
+-- Companies discovered
+CREATE TABLE companies (
+  id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  domain          TEXT NOT NULL UNIQUE,
+  name            TEXT NOT NULL,
+  industry        TEXT,
+  employee_count  INTEGER,
+  employee_range  TEXT,                  -- "50-200", "200-500" etc
+  description     TEXT,
+  website_url     TEXT,
+  linkedin_url    TEXT,
+  country         TEXT,
+  region          TEXT,
+  tech_stack      TEXT[],
+  growth_signals  JSONB DEFAULT '[]',    -- job posts, news, funding
+  raw_data        JSONB DEFAULT '{}',
+  source          TEXT NOT NULL,         -- 'serper', 'linkedin', 'manual'
+  status          company_status NOT NULL DEFAULT 'discovered',
+  discovered_at   TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+  updated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+CREATE INDEX idx_companies_domain ON companies(domain);
+CREATE INDEX idx_companies_status ON companies(status);
+CREATE INDEX idx_companies_country ON companies(country);
+CREATE INDEX idx_companies_name_trgm ON companies USING GIN (name gin_trgm_ops);
+-- Contacts (decision-makers)
+CREATE TABLE contacts (
+  id                UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  company_id        UUID NOT NULL REFERENCES companies(id) ON DELETE CASCADE,
+  full_name         TEXT NOT NULL,
+  first_name        TEXT,
+  last_name         TEXT,
+  title             TEXT NOT NULL,
+  seniority         TEXT,               -- 'c_suite','vp','director','manager'
+  email             TEXT,
+  email_verified    BOOLEAN DEFAULT FALSE,
+  email_source      TEXT,              -- 'hunter','snov','pattern'
+  linkedin_url      TEXT,
+  linkedin_verified BOOLEAN DEFAULT FALSE,
+  status            contact_status NOT NULL DEFAULT 'found',
+  suppressed        BOOLEAN NOT NULL DEFAULT FALSE,
+  suppressed_at     TIMESTAMPTZ,
+  suppressed_reason TEXT,
+  created_at        TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+  updated_at        TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+CREATE INDEX idx_contacts_company ON contacts(company_id);
+CREATE INDEX idx_contacts_email ON contacts(email);
+CREATE INDEX idx_contacts_suppressed ON contacts(suppressed);
+-- Evidence gathered during research
+CREATE TABLE evidence (
+  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  company_id  UUID NOT NULL REFERENCES companies(id) ON DELETE CASCADE,
+  type        TEXT NOT NULL,            -- 'job_posting','news','social_post','website_text'
+  content     TEXT NOT NULL,
+  source_url  TEXT,
+  ai_signal   BOOLEAN DEFAULT FALSE,   -- does this mention AI/automation?
+  collected_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+CREATE INDEX idx_evidence_company ON evidence(company_id);
+-- Lead profiles (LLM output)
+CREATE TABLE lead_profiles (
+  id                  UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  company_id          UUID NOT NULL REFERENCES companies(id) ON DELETE CASCADE,
+  profile_summary     TEXT NOT NULL,
+  pain_points         TEXT[] DEFAULT '{}',
+  ai_use_case         TEXT,
+  ai_readiness        TEXT NOT NULL DEFAULT 'medium', -- low/medium/high
+  outreach_angle      TEXT,
+  llm_model           TEXT NOT NULL,
+  llm_confidence      NUMERIC(3,2),
+  is_fallback         BOOLEAN DEFAULT FALSE,
+  created_at          TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+-- Lead scores
+CREATE TABLE lead_scores (
+  id                  UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  company_id          UUID NOT NULL REFERENCES companies(id) ON DELETE CASCADE,
+  contact_id          UUID REFERENCES contacts(id),
+  total_score         INTEGER NOT NULL CHECK (total_score BETWEEN 0 AND 100),
+  tier                lead_tier NOT NULL,
+  company_fit         INTEGER,          -- 0-25
+  ai_readiness        INTEGER,          -- 0-25
+  decision_maker      INTEGER,          -- 0-20
+  growth_signal       INTEGER,          -- 0-15
+  engagement_potential INTEGER,         -- 0-15
+  score_reasoning     TEXT,
+  scored_at           TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+CREATE INDEX idx_scores_company ON lead_scores(company_id);
+CREATE INDEX idx_scores_tier ON lead_scores(tier);
+CREATE INDEX idx_scores_total ON lead_scores(total_score DESC);
+-- ─── OUTREACH TABLES ────────────────────────────────────────
+CREATE TABLE outreach_sequences (
+  id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  company_id      UUID NOT NULL REFERENCES companies(id) ON DELETE CASCADE,
+  contact_id      UUID NOT NULL REFERENCES contacts(id),
+  current_step    INTEGER NOT NULL DEFAULT 0,
+  total_steps     INTEGER NOT NULL DEFAULT 5,
+  next_action_at  TIMESTAMPTZ,
+  status          TEXT NOT NULL DEFAULT 'active', -- active/paused/completed/stopped
+  stopped_reason  TEXT,
+  created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+  updated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+CREATE TABLE outreach_log (
+  id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  sequence_id     UUID REFERENCES outreach_sequences(id),
+  company_id      UUID NOT NULL REFERENCES companies(id),
+  contact_id      UUID NOT NULL REFERENCES contacts(id),
+  channel         outreach_channel NOT NULL,
+  step_number     INTEGER NOT NULL,
+  template_id     TEXT,
+  message_hash    TEXT NOT NULL UNIQUE,  -- prevent duplicate sends
+  subject         TEXT,
+  status          outreach_status NOT NULL DEFAULT 'queued',
+  provider_id     TEXT,                  -- external message ID from Resend/LinkedIn
+  sent_at         TIMESTAMPTZ,
+  opened_at       TIMESTAMPTZ,
+  replied_at      TIMESTAMPTZ,
+  bounced_at      TIMESTAMPTZ,
+  created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+CREATE INDEX idx_outreach_company ON outreach_log(company_id);
+CREATE INDEX idx_outreach_hash ON outreach_log(message_hash);
+CREATE INDEX idx_outreach_status ON outreach_log(status);
+-- ─── ENGAGEMENT TABLES ──────────────────────────────────────
+CREATE TABLE engagement_log (
+  id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  outreach_id     UUID REFERENCES outreach_log(id),
+  company_id      UUID NOT NULL REFERENCES companies(id),
+  contact_id      UUID NOT NULL REFERENCES contacts(id),
+  signal_type     TEXT NOT NULL,         -- 'open','reply','bounce','linkedin_accept'
+  intent          intent_type,
+  raw_content     TEXT,                  -- actual reply text (for NLP)
+  detected_at     TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+-- ─── SYSTEM TABLES ──────────────────────────────────────────
+CREATE TABLE suppression_list (
+  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  email       TEXT,
+  domain      TEXT,
+  reason      TEXT NOT NULL,             -- 'unsubscribed','bounced','manual'
+  added_at    TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+CREATE INDEX idx_suppression_email ON suppression_list(email);
+CREATE INDEX idx_suppression_domain ON suppression_list(domain);
+CREATE TABLE human_review_queue (
+  id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  type            TEXT NOT NULL,         -- 'outreach_approval','score_anomaly','escalation'
+  company_id      UUID REFERENCES companies(id),
+  contact_id      UUID REFERENCES contacts(id),
+  payload         JSONB NOT NULL,        -- full context for reviewer
+  status          review_status NOT NULL DEFAULT 'pending',
+  reviewer_notes  TEXT,
+  resolved_at     TIMESTAMPTZ,
+  created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+CREATE INDEX idx_review_status ON human_review_queue(status);
+CREATE TABLE api_usage_log (
+  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  provider    TEXT NOT NULL,             -- 'serper','hunter','snov','reoon'
+  endpoint    TEXT,
+  credits_used INTEGER DEFAULT 1,
+  success     BOOLEAN NOT NULL,
+  error_msg   TEXT,
+  called_at   TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+CREATE TABLE audit_log (
+  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  action      TEXT NOT NULL,
+  entity_type TEXT NOT NULL,
+  entity_id   UUID,
+  actor       TEXT NOT NULL DEFAULT 'system',
+  details     JSONB DEFAULT '{}',
+  created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+-- ─── SEED DATA ──────────────────────────────────────────────
+INSERT INTO icp_config (name, min_employees, industries, geographies, keywords, score_threshold)
+VALUES (
+  'default',
+  50,
+  ARRAY['technology','software','manufacturing','logistics','supply_chain','healthcare','finance','real_estate_tech','retail_tech'],
+  ARRAY['US','UK','AU','UAE','SA','SG'],
+  ARRAY['automation','digital transformation','AI','machine learning','operations','workflow','efficiency'],
+  70
+);
+INSERT INTO rotation_state (week_number, region)
+VALUES (1, 'US');

supabase/migrations/002_phase1_enhancements.sql ADDED Viewed

	@@ -0,0 +1,242 @@

+-- ============================================================
+-- Migration 002 — Phase 1 Enhancements
+-- Territory management, service profiles, social profiles,
+-- discovery run tracking, pipeline checkpoints, LLM traces
+-- ============================================================
+-- ─── SERVICE PROFILES ────────────────────────────────────────
+-- What services WE offer → what industries → what pain signals to look for
+CREATE TABLE service_profiles (
+  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  service_name TEXT NOT NULL UNIQUE,          -- 'AI Receptionist'
+  description TEXT,
+  target_industries TEXT[] NOT NULL,          -- ['dental','medical','legal','salon']
+  min_employees INTEGER DEFAULT 3,
+  max_employees INTEGER DEFAULT 500,
+  pain_signals TEXT[] NOT NULL,              -- website signals to detect
+  score_boost INTEGER NOT NULL DEFAULT 15,   -- points added when matched
+  outreach_keywords TEXT[],                  -- words to use in outreach
+  is_active   BOOLEAN DEFAULT true,
+  created_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+-- Seed service profiles
+INSERT INTO service_profiles (service_name, description, target_industries, min_employees, pain_signals, score_boost, outreach_keywords)
+VALUES
+  ('AI Receptionist',
+   'Automated phone answering, appointment booking, 24/7 availability',
+   ARRAY['dental','medical','veterinary','legal','salon','spa','real_estate','accounting','chiropractic'],
+   3,
+   ARRAY['phone number prominent','book appointment','call us','receptionist','front desk','office hours','schedule a visit'],
+   20,
+   ARRAY['missed calls','after hours','appointment booking','front desk costs']),
+  ('AI Customer Support',
+   'Chatbot, ticket automation, FAQ automation',
+   ARRAY['ecommerce','saas','retail','hospitality','travel','insurance','telecom'],
+   10,
+   ARRAY['contact form','support email','FAQ page','help center','no chatbot','submit a ticket'],
+   15,
+   ARRAY['support costs','response time','ticket volume','customer satisfaction']),
+  ('AI Data Processing',
+   'Document processing, report automation, ERP modernization',
+   ARRAY['manufacturing','logistics','finance','healthcare','supply_chain','construction','energy'],
+   50,
+   ARRAY['legacy ERP','SAP','manual reporting','spreadsheet','data entry','compliance reporting'],
+   25,
+   ARRAY['manual processes','reporting overhead','data accuracy','compliance automation']),
+  ('AI Sales Automation',
+   'Outreach automation, CRM enrichment, lead scoring',
+   ARRAY['b2b_saas','consulting','recruitment','insurance','financial_services','marketing_agency'],
+   10,
+   ARRAY['sales team','CRM','outbound','SDR','BDR','sales development','pipeline'],
+   20,
+   ARRAY['pipeline velocity','lead qualification','outbound efficiency','sales productivity']),
+  ('AI Workflow Automation',
+   'General process automation, integration, workflow optimization',
+   ARRAY['technology','professional_services','education','media','nonprofit_large','government_contractor'],
+   20,
+   ARRAY['manual process','approval workflow','internal tools','legacy system','multiple platforms'],
+   15,
+   ARRAY['operational efficiency','process bottlenecks','tool consolidation','workflow speed']);
+-- ─── TERRITORY GRID ──────────────────────────────────────────
+-- Every city × industry = one territory unit
+CREATE TABLE territory_grid (
+  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  country     TEXT NOT NULL,
+  country_code TEXT NOT NULL,
+  city        TEXT NOT NULL,
+  tier        INTEGER NOT NULL DEFAULT 1,   -- 1=major city, 2=mid, 3=small
+  timezone    TEXT,                          -- 'America/New_York'
+  is_active   BOOLEAN DEFAULT true,
+  UNIQUE(country_code, city)
+);
+-- Seed US Tier 1 cities
+INSERT INTO territory_grid (country, country_code, city, tier, timezone) VALUES
+  ('United States', 'US', 'New York', 1, 'America/New_York'),
+  ('United States', 'US', 'Los Angeles', 1, 'America/Los_Angeles'),
+  ('United States', 'US', 'Chicago', 1, 'America/Chicago'),
+  ('United States', 'US', 'Houston', 1, 'America/Chicago'),
+  ('United States', 'US', 'Phoenix', 1, 'America/Phoenix'),
+  ('United States', 'US', 'Philadelphia', 1, 'America/New_York'),
+  ('United States', 'US', 'San Antonio', 1, 'America/Chicago'),
+  ('United States', 'US', 'San Diego', 1, 'America/Los_Angeles'),
+  ('United States', 'US', 'Dallas', 1, 'America/Chicago'),
+  ('United States', 'US', 'Austin', 1, 'America/Chicago'),
+  ('United States', 'US', 'San Francisco', 1, 'America/Los_Angeles'),
+  ('United States', 'US', 'Seattle', 1, 'America/Los_Angeles'),
+  ('United States', 'US', 'Denver', 1, 'America/Denver'),
+  ('United States', 'US', 'Boston', 1, 'America/New_York'),
+  ('United States', 'US', 'Miami', 1, 'America/New_York');
+-- UK cities
+INSERT INTO territory_grid (country, country_code, city, tier, timezone) VALUES
+  ('United Kingdom', 'GB', 'London', 1, 'Europe/London'),
+  ('United Kingdom', 'GB', 'Manchester', 1, 'Europe/London'),
+  ('United Kingdom', 'GB', 'Birmingham', 1, 'Europe/London'),
+  ('United Kingdom', 'GB', 'Leeds', 2, 'Europe/London'),
+  ('United Kingdom', 'GB', 'Edinburgh', 2, 'Europe/London'),
+  ('United Kingdom', 'GB', 'Bristol', 2, 'Europe/London'),
+  ('United Kingdom', 'GB', 'Glasgow', 2, 'Europe/London');
+-- Australia cities
+INSERT INTO territory_grid (country, country_code, city, tier, timezone) VALUES
+  ('Australia', 'AU', 'Sydney', 1, 'Australia/Sydney'),
+  ('Australia', 'AU', 'Melbourne', 1, 'Australia/Melbourne'),
+  ('Australia', 'AU', 'Brisbane', 1, 'Australia/Brisbane'),
+  ('Australia', 'AU', 'Perth', 2, 'Australia/Perth'),
+  ('Australia', 'AU', 'Adelaide', 2, 'Australia/Adelaide');
+-- Gulf cities
+INSERT INTO territory_grid (country, country_code, city, tier, timezone) VALUES
+  ('United Arab Emirates', 'AE', 'Dubai', 1, 'Asia/Dubai'),
+  ('United Arab Emirates', 'AE', 'Abu Dhabi', 1, 'Asia/Dubai'),
+  ('Saudi Arabia', 'SA', 'Riyadh', 1, 'Asia/Riyadh'),
+  ('Saudi Arabia', 'SA', 'Jeddah', 2, 'Asia/Riyadh'),
+  ('Qatar', 'QA', 'Doha', 1, 'Asia/Qatar');
+-- ─── DISCOVERY RUNS ──────────────────────────────────────────
+-- Track every search execution — prevents duplicate searches
+CREATE TABLE discovery_runs (
+  id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  run_type        TEXT NOT NULL,          -- 'auto' | 'manual'
+  territory_id    UUID REFERENCES territory_grid(id),
+  country_code    TEXT NOT NULL,
+  city            TEXT NOT NULL,
+  industry        TEXT NOT NULL,
+  search_queries  TEXT[],                 -- actual Google queries used
+  companies_found INTEGER DEFAULT 0,
+  companies_passed_gate1 INTEGER DEFAULT 0,
+  companies_passed_gate2 INTEGER DEFAULT 0,
+  leads_qualified INTEGER DEFAULT 0,
+  quota_target    INTEGER DEFAULT 10,
+  status          TEXT DEFAULT 'running', -- 'running','completed','failed','partial'
+  triggered_by    TEXT DEFAULT 'system',  -- 'system' | 'slack:username'
+  ran_at          TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+  completed_at    TIMESTAMPTZ
+);
+CREATE INDEX idx_discovery_runs_territory ON discovery_runs(city, industry, ran_at DESC);
+-- ─── TERRITORY PROGRESS ─────────────────────────────────────
+-- Tracks which city+industry combos have been covered and when
+CREATE TABLE territory_progress (
+  id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  territory_id    UUID REFERENCES territory_grid(id),
+  industry        TEXT NOT NULL,
+  last_run_at     TIMESTAMPTZ NOT NULL,
+  next_eligible_at TIMESTAMPTZ NOT NULL,  -- last_run + 30 days
+  total_leads     INTEGER DEFAULT 0,
+  UNIQUE(territory_id, industry)
+);
+-- ─── PIPELINE CHECKPOINTS ───────────────────────────────────
+-- Allows pipeline to resume from failure point (idempotency)
+CREATE TABLE pipeline_checkpoints (
+  id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  run_id          UUID REFERENCES discovery_runs(id),
+  company_domain  TEXT NOT NULL,
+  stage           TEXT NOT NULL,           -- 'scraped','filtered','contacts_found','profiled','scored'
+  stage_data      JSONB DEFAULT '{}',      -- intermediate data for resume
+  completed       BOOLEAN DEFAULT false,
+  created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+  updated_at      TIMESTAMPTZ NOT NULL DEFAULT NOW(),
+  UNIQUE(run_id, company_domain)
+);
+-- ─── LLM CALL TRACES ────────────────────────────────────────
+-- Every LLM call logged for cost tracking and debugging
+CREATE TABLE llm_traces (
+  id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
+  trace_id        TEXT NOT NULL,           -- pipeline run trace
+  operation       TEXT NOT NULL,           -- 'profile','score','email_classify','pain_detect'
+  model           TEXT NOT NULL,           -- 'meta/llama-3.3-70b-instruct'
+  provider        TEXT NOT NULL DEFAULT 'nvidia',
+  prompt_tokens   INTEGER,
+  completion_tokens INTEGER,
+  total_tokens    INTEGER,
+  latency_ms      INTEGER,
+  success         BOOLEAN NOT NULL,
+  fallback_used   BOOLEAN DEFAULT false,
+  grounding_score NUMERIC(3,2),            -- 0.00-1.00 how well grounded
+  company_id      UUID REFERENCES companies(id),
+  input_hash      TEXT,                    -- hash of prompt (no PII stored)
+  output_hash     TEXT,                    -- hash of output
+  created_at      TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+CREATE INDEX idx_llm_traces_trace ON llm_traces(trace_id);
+CREATE INDEX idx_llm_traces_company ON llm_traces(company_id);
+-- ─── SYSTEM CONFIG ──────────────────────���────────────────────
+-- Runtime configuration that Slack commands can modify
+CREATE TABLE system_config (
+  key         TEXT PRIMARY KEY,
+  value       JSONB NOT NULL,
+  updated_by  TEXT DEFAULT 'system',
+  updated_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
+);
+INSERT INTO system_config (key, value) VALUES
+  ('daily_quota', '{"default": 10, "today_override": null}'),
+  ('schedule', '{"start_hour_utc": 4, "enabled": true}'),
+  ('auto_mode', '{"enabled": true, "paused": false, "paused_by": null}'),
+  ('current_territory', '{"country_code": "US", "city_index": 0, "industry_index": 0}');
+-- ─── ADD SOCIAL PROFILES TO CONTACTS ─────────────────────────
+ALTER TABLE contacts ADD COLUMN IF NOT EXISTS linkedin_personal_url TEXT;
+ALTER TABLE contacts ADD COLUMN IF NOT EXISTS social_profiles JSONB DEFAULT '{}';
+ALTER TABLE contacts ADD COLUMN IF NOT EXISTS email_verification_layers JSONB DEFAULT '{}';
+ALTER TABLE contacts ADD COLUMN IF NOT EXISTS email_tier TEXT;       -- 'personal','authority','context_verified','rejected'
+ALTER TABLE contacts ADD COLUMN IF NOT EXISTS authority_confirmed BOOLEAN DEFAULT false;
+-- ─── ADD CITY TO COMPANIES ───────────────────────────────────
+ALTER TABLE companies ADD COLUMN IF NOT EXISTS city TEXT;
+ALTER TABLE companies ADD COLUMN IF NOT EXISTS service_match TEXT;   -- matched service name
+ALTER TABLE companies ADD COLUMN IF NOT EXISTS service_match_score INTEGER DEFAULT 0;
+ALTER TABLE companies ADD COLUMN IF NOT EXISTS pain_signals TEXT[] DEFAULT '{}';
+ALTER TABLE companies ADD COLUMN IF NOT EXISTS trace_id TEXT;        -- pipeline trace