	---
	title: Omnichannel Fact & Hallucination Intelligence System
	emoji: 🔍
	colorFrom: blue
	colorTo: purple
	sdk: docker
	pinned: false
	license: mit
	app_port: 7860
	---

	# Omnichannel Fact & Hallucination Intelligence System

	Low-latency, real-time fact-checking and AI hallucination detection, delivered through a browser extension that works across X/Twitter, YouTube, Instagram, news sites, and AI chat interfaces.

	---

	## Architecture

	```
	Browser Extension (WXT + React 19 + Framer Motion)
	│ WebSocket (wss://)
	▼
	FastAPI Backend ──► Redis Stack (cache, 6h/15min TTL)
	│
	├──► Gatekeeper: Groq llama3-8b-8192 (<120ms p95)
	│    └── noise → drop | fact → continue
	│
	├──► RAG Pipeline (concurrent)
	│    ├── FastEmbed BGE-M3 embeddings (CPU, multilingual)
	│    ├── Qdrant ANN search (HNSW ef=128, top-8, 72h window)
	│    └── Memgraph trust graph traversal (in-memory Cypher)
	│
	├──► Grok Sensor (concurrent)
	│    └── X API v2 velocity + Community Notes
	│
	└──► Prefect Flow (multi-agent evaluation)
	     ├── misinformation_task: Groq mixtral-8x7b-32768
	     └── hallucination_task: Claude Haiku (AI platforms only)
	     │
	     ▼
	     AnalysisResult → WebSocket → Extension → DOM highlight + hover card
	```

	---

	## Stack

	| Layer | Technology | Why |
	|-------|------------|-----|
	| Extension framework | WXT v0.19 + React 19 | HMR, multi-browser, TypeScript-first, Vite |
	| Extension state | Zustand + chrome.storage.sync | Persistent, reactive, cross-context |
	| LLM gatekeeper | Groq llama3-8b-8192 | 800+ tok/s, <100ms, no local GPU needed |
	| LLM evaluation | LiteLLM → Groq mixtral-8x7b / llama3-70b | Runs on Groq's free tier; LiteLLM lets you swap providers without code changes |
	| Embeddings | BGE-M3 via FastEmbed | 100+ languages, 1024-dim, CPU-native, free |
	| Vector DB | Qdrant (self-hosted) | Sub-ms HNSW search, no vendor lock-in |
	| Graph DB | Memgraph (in-memory) | 10–100x faster than Neo4j for trust scoring |
	| Message queue | Redpanda | Kafka-compatible, no JVM, 10x lower latency |
	| Orchestration | Prefect | Native async, DAG flows, built-in retry |
	| Cache | Redis Stack (RedisJSON) | Structured claim cache, TTL per verdict color |
	| Package manager | uv | 10–100x faster than pip, lockfiles |
	| Hashing | xxhash (client + server) | Sub-microsecond content deduplication |
	| Edge tunnel | Cloudflare Tunnel | Zero-config TLS, no exposed ports |
	| Observability | structlog + rich | Structured JSON logs, colorized dev output |

	---

	## Quick Start (Hugging Face Spaces)

	This Space runs the backend + demo UI via Docker. The browser extension is a separate build.

	### Required Secrets (set in Space settings → Secrets)

	| Secret | Required | Description |
	|--------|----------|-------------|
	| `GROQ_API_KEY` | Recommended | Groq API key; powers all 3 LLM agents (gatekeeper, misinformation, hallucination). Free tier: 30 req/min |
	| `X_BEARER_TOKEN` | Optional | X API v2 bearer token for tweet velocity + Community Notes |

	Without any API keys, the system runs in `DEMO_MODE=true` and returns deterministic mock results, which is enough to explore the UI and architecture without credentials.

	Get a free key:
	- Groq: https://console.groq.com (free tier: 30 req/min, which covers all 3 LLM agents)

	### Run Locally

	```bash
	git clone <repo>
	cd omnichannel-fact-intelligence

	# Copy env template
	cp .env.example .env
	# Edit .env with your API keys

	# Start all services (Qdrant, Memgraph, Redpanda, Redis, FastAPI)
	docker compose up

	# Visit http://localhost:7860 for the demo UI
	```

	### Run Backend Only (no Docker for infra)

	```bash
	cd backend

	# Install uv (if not installed)
	curl -LsSf https://astral.sh/uv/install.sh | sh

	# Install dependencies
	uv sync

	# Set env vars
	export GROQ_API_KEY=your_key
	export DEMO_MODE=true # Skip infrastructure deps for quick testing

	# Start FastAPI
	uv run uvicorn main:app --host 0.0.0.0 --port 7860 --reload
	```

	---

	## Browser Extension Setup

	### Prerequisites
	```bash
	cd extension
	npm install # or: bun install
	```

	### Development (Chrome)
	```bash
	# Set your backend URL (or use cloudflared tunnel)
	WS_URL=ws://localhost:7860/ws npx wxt dev --browser chrome
	```

	### Production Build
	```bash
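	# Chrome (WXT builds for Chrome by default)
	WS_URL=wss://fact-engine.your-domain.com/ws npx wxt build

	# Firefox (--mv3 so the output matches the directory below)
	WS_URL=wss://fact-engine.your-domain.com/ws npx wxt build --browser firefox --mv3

	# Chrome: .output/chrome-mv3/
	# Firefox: .output/firefox-mv3/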
	```

	### Load in Chrome
	1. Navigate to `chrome://extensions`
	2. Enable Developer mode (top right)
	3. Click Load unpacked → select `.output/chrome-mv3/`
	4. Visit X/Twitter, YouTube, or any news site; detected claims will start highlighting

	---

	## Highlight Color Semantics

	| Color | Hex | Meaning |
	|-------|-----|---------|
	| 🟢 Green | `#22c55e` | Fact-checked: corroborated by ≥2 sources, trust score ≥ 0.65 |
	| 🟡 Yellow | `#eab308` | Unverified: breaking news, weak corroboration, high velocity |
	| 🔴 Red | `#ef4444` | Debunked: refuted by ≥2 independent sources or Community Note active |
	| 🟣 Purple | `#a855f7` | AI hallucination: fabricated citation, impossibility, contradiction |
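The color rules can be read as a decision function. A sketch, assuming the checks are applied in the order purple, red, green, yellow (the table does not state a precedence) and that callers have already counted independent refuting and corroborating sources; `Verdict`, `classify`, and its parameters are hypothetical.

```python
from enum import Enum

class Verdict(str, Enum):
    """Highlight colors from the table, keyed by their hex values."""
    GREEN = "#22c55e"   # fact-checked
    YELLOW = "#eab308"  # unverified
    RED = "#ef4444"     # debunked
    PURPLE = "#a855f7"  # AI hallucination

def classify(
    *,
    hallucination: bool,
    refuting_sources: int,
    community_note: bool,
    corroborating_sources: int,
    trust_score: float,
) -> Verdict:
    # Precedence (purple > red > green > yellow) is an assumption.
    if hallucination:
        return Verdict.PURPLE
    if refuting_sources >= 2 or community_note:
        return Verdict.RED
    if corroborating_sources >= 2 and trust_score >= 0.65:
        return Verdict.GREEN
    # Everything else stays unverified.
    return Verdict.YELLOW
```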

	---

	## Trust Score Algorithm

	```
	score  = 0.5                                  # baseline
	score += 0.30  if Author.verified AND account_type IN ['government', 'official_news']
	score += 0.05 per corroborating Source node   # capped at +0.25, i.e. 5 sources
	score -= 0.40  if any Source has an active Community Note
	score  = clamp(score, 0.0, 1.0)
	```
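A direct Python translation of the scoring rules above. The function name and parameters are illustrative assumptions; rounding to two decimals is an added choice so that threshold checks such as the green cutoff (≥ 0.65) are not skewed by floating-point error. Note the bounds: the highest raw score is 0.5 + 0.30 + 0.25 = 1.05 and the lowest is 0.5 - 0.40 = 0.10, so under these rules only the upper clamp ever applies.

```python
def trust_score(
    author_verified: bool,
    account_type: str,
    corroborating_sources: int,
    any_community_note: bool,
) -> float:
    score = 0.5  # baseline
    if author_verified and account_type in ("government", "official_news"):
        score += 0.30
    # +0.05 per corroborating source, capped at +0.25 (i.e. 5 sources).
    score += min(0.05 * corroborating_sources, 0.25)
    if any_community_note:
        score -= 0.40
    # Clamp to [0, 1]; rounding is an added choice (see lead-in).
    return round(max(0.0, min(score, 1.0)), 2)
```

For example, an unverified author with three corroborating sources and no Community Note scores 0.5 + 0.15 = 0.65, exactly the green threshold; an active Community Note on the same post drops it to 0.25.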

	---

	## Data Pipeline

	Three async Redpanda producers simulate the omnichannel firehose:

	| Producer | Topic | Rate (events/s) | Source |
	|----------|-------|-----------------|--------|
	| `twitter_producer` | `raw.twitter` | 50 | Mock X posts |
	| `instagram_producer` | `raw.instagram` | 20 | Mock story text (OCR-extracted) |
	| `youtube_producer` | `raw.youtube` | 10 | Mock VTT transcript chunks |

	A single async consumer aggregates all three, deduplicates by `content_hash`, and upserts into Qdrant + Memgraph.
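The aggregate-and-deduplicate step can be sketched in-process, with one `asyncio.Queue` per topic standing in for Redpanda, `hashlib.blake2b` standing in for xxhash, and a list append standing in for the Qdrant + Memgraph upsert. All function names are hypothetical; only the topic names come from the table above.

```python
import asyncio
import hashlib

# In-process stand-in for the Redpanda topics. A real consumer would
# subscribe to the three topics with a Kafka-compatible client instead.

def content_hash(text: str) -> str:
    # Stand-in for xxhash; blake2b is in the standard library.
    return hashlib.blake2b(text.encode("utf-8"), digest_size=8).hexdigest()

async def produce(queue: asyncio.Queue, items: list[str]) -> None:
    for item in items:
        await queue.put(item)
    await queue.put(None)  # sentinel: this producer is done

async def consume(topics: dict[str, asyncio.Queue]) -> list[tuple[str, str]]:
    seen: set[str] = set()
    upserted: list[tuple[str, str]] = []

    async def drain(topic: str, queue: asyncio.Queue) -> None:
        while (text := await queue.get()) is not None:
            h = content_hash(text)
            if h in seen:
                continue  # duplicate within or across topics: skip
            seen.add(h)
            upserted.append((topic, text))  # placeholder for the DB upsert

    await asyncio.gather(*(drain(t, q) for t, q in topics.items()))
    return upserted

async def main() -> list[tuple[str, str]]:
    names = ("raw.twitter", "raw.instagram", "raw.youtube")
    topics = {name: asyncio.Queue() for name in names}
    await asyncio.gather(
        produce(topics["raw.twitter"], ["claim A", "claim B"]),
        produce(topics["raw.instagram"], ["claim A"]),
        produce(topics["raw.youtube"], ["claim C", "claim B"]),
    )
    return await consume(topics)
```

Because the membership check and `seen.add` run with no `await` between them, concurrent drains on the single event loop cannot both upsert the same claim.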

	---

	## Extension Modes

	| Mode | Shows |
	|------|-------|
	| Minimal | Red + Purple only |
	| Normal (default) | Red + Purple + Yellow |
	| Advanced | All colors including Green |
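The mode table is a nested lookup: each mode shows a superset of the previous one. The real filter lives in the TypeScript extension; this Python sketch, with hypothetical names, only illustrates the logic.

```python
# Colors shown per extension mode, copied from the table above.
VISIBLE_COLORS: dict[str, frozenset[str]] = {
    "minimal": frozenset({"red", "purple"}),
    "normal": frozenset({"red", "purple", "yellow"}),
    "advanced": frozenset({"red", "purple", "yellow", "green"}),
}

def should_highlight(verdict: str, mode: str = "normal") -> bool:
    """Return True if a claim with this verdict color is shown in this mode."""
    return verdict in VISIBLE_COLORS[mode]
```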

	---

	## File Structure

	```
	omnichannel-fact-intelligence/
	├── docker-compose.yml         # All services in one command
	├── .env.example               # Environment template
	│
	├── backend/
	│   ├── Dockerfile             # uv + Python 3.12
	│   ├── pyproject.toml         # All deps pinned (uv-compatible)
	│   ├── main.py                # FastAPI app, WebSocket, Redis cache
	│   ├── gatekeeper.py          # Groq fact/noise classifier (<120ms p95)
	│   ├── rag_pipeline.py        # BGE-M3 + Qdrant + Memgraph trust graph
	│   ├── grok_sensor.py         # X API v2 + Community Notes
	│   ├── agents.py              # Prefect flow + LiteLLM multi-agent eval
	│   ├── core/
	│   │   ├── config.py          # Pydantic-settings centralized config
	│   │   └── models.py          # All Pydantic v2 models
	│   ├── producers/
	│   │   └── producers.py       # Twitter + Instagram + YouTube + consumer
	│   └── static/
	│       └── index.html         # Demo UI (served at /)
	│
	├── extension/
	│   ├── wxt.config.ts          # WXT framework config
	│   ├── stores/
	│   │   └── extensionStore.ts  # Zustand + chrome.storage.sync
	│   └── entrypoints/
	│       ├── background.ts      # Persistent WS connection + message routing
	│       ├── content.tsx        # MutationObserver + highlight + hover card
	│       └── popup.tsx          # Master toggle + mode selector + badge
	│
	└── infra/
	    └── tunnel_setup.sh        # Cloudflare Tunnel setup script
	```

	---

	## License

	MIT — see LICENSE for details.