---
title: Omnichannel Fact & Hallucination Intelligence System
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: mit
app_port: 7860
---
# Omnichannel Fact & Hallucination Intelligence System
**Near-zero-latency real-time fact-checking and AI hallucination detection, deployed universally via a browser extension across X/Twitter, YouTube, Instagram, news sites, and AI chat interfaces.**
---
## Architecture
```
Browser Extension (WXT + React 19 + Framer Motion)
        │  WebSocket (wss://)
        ▼
FastAPI Backend ──► Redis Stack (cache, 6h/15min TTL)
        │
        ├──► Gatekeeper: Groq llama3-8b-8192 (<120ms p95)
        │       └── noise → drop | fact → continue
        │
        ├──► RAG Pipeline (concurrent)
        │       ├── FastEmbed BGE-M3 embeddings (CPU, multilingual)
        │       ├── Qdrant ANN search (HNSW ef=128, top-8, 72h window)
        │       └── Memgraph trust graph traversal (in-memory Cypher)
        │
        ├──► Grok Sensor (concurrent)
        │       └── X API v2 velocity + Community Notes
        │
        └──► Prefect Flow (multi-agent evaluation)
                ├── misinformation_task: Groq mixtral-8x7b-32768
                └── hallucination_task: Claude Haiku (AI platforms only)
                        │
                        ▼
AnalysisResult → WebSocket → Extension → DOM highlight + hover card
```
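The control flow in the diagram can be sketched with `asyncio`. This is a minimal illustration, not the project's code: the four coroutines are hypothetical stubs standing in for `gatekeeper.py`, `rag_pipeline.py`, `grok_sensor.py`, and `agents.py`, and their return shapes are assumptions.

```python
import asyncio
from typing import Optional


async def gatekeeper(text: str) -> str:
    """Hypothetical stub for the Groq classifier: returns 'fact' or 'noise'."""
    # Toy heuristic for illustration only: text containing a digit counts as a claim.
    return "fact" if any(ch.isdigit() for ch in text) else "noise"


async def rag_pipeline(text: str) -> dict:
    """Hypothetical stub for embedding + Qdrant search + Memgraph traversal."""
    return {"sources": [], "trust_score": 0.5}


async def grok_sensor(text: str) -> dict:
    """Hypothetical stub for X API v2 velocity + Community Notes lookup."""
    return {"velocity": 0, "community_note": False}


async def evaluate(text: str, rag: dict, sensor: dict) -> dict:
    """Hypothetical stub for the multi-agent evaluation flow."""
    return {"text": text, "color": "yellow", "rag": rag, "sensor": sensor}


async def analyze(text: str) -> Optional[dict]:
    # Step 1: the gatekeeper drops noise before any expensive work starts.
    if await gatekeeper(text) == "noise":
        return None
    # Step 2: RAG retrieval and the Grok sensor run concurrently.
    rag, sensor = await asyncio.gather(rag_pipeline(text), grok_sensor(text))
    # Step 3: evaluation consumes both results and produces the final verdict.
    return await evaluate(text, rag, sensor)
```

Calling `asyncio.run(analyze("nice weather today"))` returns `None` with these stubs, because the toy gatekeeper treats the text as noise.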
---
## Stack
| Layer | Technology | Why |
|-------|-----------|-----|
| Extension framework | WXT v0.19 + React 19 | HMR, multi-browser, TypeScript-first, Vite |
| Extension state | Zustand + chrome.storage.sync | Persistent, reactive, cross-context |
| LLM gatekeeper | Groq llama3-8b-8192 | 800+ tok/s, <100ms, no GPU needed |
| LLM evaluation | LiteLLM → Groq mixtral-8x7b / llama3-70b | All free via Groq; swap providers without code changes |
| Embeddings | BGE-M3 via FastEmbed | 100+ languages, 1024-dim, CPU-native, free |
| Vector DB | Qdrant (self-hosted) | Sub-ms HNSW search, no vendor lock-in |
| Graph DB | Memgraph (in-memory) | 10–100x faster than Neo4j for trust scoring |
| Message queue | Redpanda | Kafka-compatible, no JVM, 10x lower latency |
| Orchestration | Prefect | Native async, DAG flows, built-in retry |
| Cache | Redis Stack (RedisJSON) | Structured claim cache, TTL per verdict color |
| Package manager | uv | 10–100x faster than pip, lockfiles |
| Hashing | xxhash (client + server) | Sub-microsecond content deduplication |
| Edge tunnel | Cloudflare Tunnel | Zero-config TLS, no exposed ports |
| Observability | structlog + rich | Structured JSON logs, colorized dev output |
---
## Quick Start (HuggingFace Spaces)
This Space runs the **backend + demo UI** via Docker. The browser extension is a separate build.
### Required Secrets (set in Space settings → Secrets)
| Secret | Required | Description |
|--------|----------|-------------|
| `GROQ_API_KEY` | Recommended | Groq API key; powers all 3 LLM agents (gatekeeper, misinformation, hallucination). Free tier: 30 req/min |
| `X_BEARER_TOKEN` | Optional | X API v2 bearer token for tweet velocity + Community Notes |
**Without any API keys**: The system runs in `DEMO_MODE=true` and returns deterministic mock results, which is enough to explore the UI and architecture without credentials.
Get a free key:
- Groq: https://console.groq.com (free tier: 30 req/min, covers all 3 LLM agents)
### Run Locally
```bash
git clone <repo>
cd omnichannel-fact-intelligence
# Copy env template
cp .env.example .env
# Edit .env with your API keys
# Start all services (Qdrant, Memgraph, Redpanda, Redis, FastAPI)
docker compose up
# Visit http://localhost:7860 for the demo UI
```
### Run Backend Only (no Docker for infra)
```bash
cd backend
# Install uv (if not installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install dependencies
uv sync
# Set env vars
export GROQ_API_KEY=your_key
export DEMO_MODE=true # Skip infrastructure deps for quick testing
# Start FastAPI
uv run uvicorn main:app --host 0.0.0.0 --port 7860 --reload
```
---
## Browser Extension Setup
### Prerequisites
```bash
cd extension
npm install # or: bun install
```
### Development (Chrome)
```bash
# Set your backend URL (or use cloudflared tunnel)
WS_URL=ws://localhost:7860/ws npx wxt dev --browser chrome
```
### Production Build
```bash
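# Build for Chrome (WXT's default target)
WS_URL=wss://fact-engine.your-domain.com/ws npx wxt build
# Build for Firefox (WXT targets MV2 on Firefox unless --mv3 is passed)
WS_URL=wss://fact-engine.your-domain.com/ws npx wxt build --browser firefox --mv3
# Chrome: .output/chrome-mv3/
# Firefox: .output/firefox-mv3/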
```
### Load in Chrome
1. Navigate to `chrome://extensions`
2. Enable **Developer mode** (top right)
3. Click **Load unpacked** → select `.output/chrome-mv3/`
4. Visit X/Twitter, YouTube, or any news site; detected claims will start to be highlighted
---
## Highlight Color Semantics
| Color | Hex | Meaning |
|-------|-----|---------|
| 🟒 Green | `#22c55e` | Fact-checked β€” corroborated by β‰₯2 sources, trust score β‰₯ 0.65 |
| 🟑 Yellow | `#eab308` | Unverified β€” breaking news, weak corroboration, high velocity |
| πŸ”΄ Red | `#ef4444` | Debunked β€” refuted by β‰₯2 independent sources or Community Note active |
| 🟣 Purple | `#a855f7` | AI hallucination β€” fabricated citation, impossibility, contradiction |
---
## Trust Score Algorithm
```
score = 0.5 (baseline)
+ 0.30 if Author.verified AND account_type IN ['government', 'official_news']
+ 0.05 per corroborating Source node (capped at +0.25, i.e. 5 sources)
- 0.40 if any Source has an active Community Note
= clamp(score, 0.0, 1.0)
```
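
The formula above can be sketched as a small Python function. The `Author` dataclass and the argument names are illustrative assumptions, not the project's actual models in `core/models.py`.

```python
from dataclasses import dataclass

# Account types that earn the verified-author bonus, as listed in the formula.
TRUSTED_ACCOUNT_TYPES = {"government", "official_news"}


@dataclass
class Author:
    """Hypothetical author record; field names are assumptions."""
    verified: bool
    account_type: str


def trust_score(
    author: Author,
    corroborating_sources: int,
    has_active_community_note: bool,
) -> float:
    score = 0.5  # baseline
    if author.verified and author.account_type in TRUSTED_ACCOUNT_TYPES:
        score += 0.30
    # +0.05 per corroborating source, capped at +0.25 (five sources).
    score += min(0.05 * corroborating_sources, 0.25)
    if has_active_community_note:
        score -= 0.40
    # Clamp to the [0.0, 1.0] range.
    return max(0.0, min(1.0, score))
```

A verified government account with five corroborating sources reaches 0.5 + 0.30 + 0.25 = 1.05 before clamping, so the function returns 1.0.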
---
## Data Pipeline
Three async Redpanda producers simulate the omnichannel firehose:
| Producer | Topic | Rate | Source |
|----------|-------|------|--------|
| twitter_producer | `raw.twitter` | 50 events/s | Mock X posts |
| instagram_producer | `raw.instagram` | 20 events/s | Mock story text (OCR-extracted) |
| youtube_producer | `raw.youtube` | 10 events/s | Mock VTT transcript chunks |
A single async consumer aggregates all three, deduplicates by `content_hash`, and upserts into Qdrant + Memgraph.
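
The deduplication step might look like the sketch below. It is illustrative only: the event shape (`{"text": ...}`) and the whitespace and case normalization are assumptions, and `hashlib.sha256` is swapped in for xxhash so the example runs with the standard library alone.

```python
import hashlib
from typing import Iterable


def content_hash(text: str) -> str:
    """Hash normalized text; the normalization rule here is an assumption."""
    normalized = " ".join(text.lower().split())
    # The project uses xxhash; sha256 is a stdlib stand-in for this sketch.
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(events: Iterable[dict]) -> list[dict]:
    """Keep the first event seen for each content hash, preserving order."""
    seen: set[str] = set()
    unique: list[dict] = []
    for event in events:
        digest = content_hash(event["text"])
        if digest in seen:
            continue  # same content already arrived from another channel
        seen.add(digest)
        unique.append({**event, "content_hash": digest})
    return unique
```

Only the events returned by `deduplicate` would then be upserted, so a post mirrored across channels is stored once.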
---
## Extension Modes
| Mode | Shows |
|------|-------|
| Minimal | Red + Purple only |
| Normal (default) | Red + Purple + Yellow |
| Advanced | All colors including Green |
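
The mode table amounts to a per-mode set of visible colors. The sketch below expresses that in Python for consistency with the backend examples; the extension itself is TypeScript, and the names `MODE_COLORS` and `is_visible` are hypothetical.

```python
# Colors shown in each mode, taken from the table above.
MODE_COLORS = {
    "minimal": {"red", "purple"},
    "normal": {"red", "purple", "yellow"},
    "advanced": {"red", "purple", "yellow", "green"},
}


def is_visible(color: str, mode: str = "normal") -> bool:
    """Return True if a highlight of this color should render in the given mode."""
    return color in MODE_COLORS[mode]
```

With the default mode, `is_visible("green")` is `False`, so corroborated claims stay unhighlighted until the user switches to Advanced.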
---
## File Structure
```
omnichannel-fact-intelligence/
├── docker-compose.yml          # All services in one command
├── .env.example                # Environment template
│
├── backend/
│   ├── Dockerfile              # uv + Python 3.12
│   ├── pyproject.toml          # All deps pinned (uv-compatible)
│   ├── main.py                 # FastAPI app, WebSocket, Redis cache
│   ├── gatekeeper.py           # Groq fact/noise classifier (<120ms p95)
│   ├── rag_pipeline.py         # BGE-M3 + Qdrant + Memgraph trust graph
│   ├── grok_sensor.py          # X API v2 + Community Notes
│   ├── agents.py               # Prefect flow + LiteLLM multi-agent eval
│   ├── core/
│   │   ├── config.py           # Pydantic-settings centralized config
│   │   └── models.py           # All Pydantic v2 models
│   ├── producers/
│   │   └── producers.py        # Twitter + Instagram + YouTube + consumer
│   └── static/
│       └── index.html          # Demo UI (served at /)
│
├── extension/
│   ├── wxt.config.ts           # WXT framework config
│   ├── stores/
│   │   └── extensionStore.ts   # Zustand + chrome.storage.sync
│   └── entrypoints/
│       ├── background.ts       # Persistent WS connection + message routing
│       ├── content.tsx         # MutationObserver + highlight + hover card
│       └── popup.tsx           # Master toggle + mode selector + badge
│
└── infra/
    └── tunnel_setup.sh         # Cloudflare Tunnel setup script
```
---
## License
MIT. See LICENSE for details.