---
title: Proofly API
emoji: 🛡️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
app_port: 7860
---

# Proofly

An AI-powered claim verification system that gathers evidence from 7 sources (one local knowledge base plus six live feeds and APIs), builds a semantic vector index, and uses Natural Language Inference (NLI) to produce a **True / False / Mixture/Uncertain** verdict, with full user authentication, history tracking, and a premium responsive UI.

---

## Features

- **JWT Authentication**: Register, login, logout with bcrypt-peppered passwords and HttpOnly cookie tokens
- **Per-User History**: Every fact-check is saved to MongoDB Atlas; view, delete, or clear your history
- **7 Evidence Sources**
  - Static Knowledge Base (local, instant, no network needed)
  - Wikidata (free entity facts, no API key)
  - 12 RSS Feeds (BBC, CNN, Al Jazeera, NYT, The Hindu, NDTV, …)
  - GDELT Project (global news events, no API key)
  - NewsAPI (quality English headlines, requires free API key)
  - Wikipedia REST API (encyclopedic summaries)
  - DuckDuckGo HTML scrape (automatic fallback)
- **AI Pipeline**: `all-MiniLM-L6-v2` for semantic embeddings + FAISS vector search + `facebook/bart-large-mnli` for NLI
- **KB Short-Circuit**: Skips slow live fetches when the knowledge base already has a strong match (≥ 0.65 similarity)
- **Image OCR**: Upload an image → EasyOCR extracts text → auto-fills the claim field
- **Security**: Flask-Talisman security headers, Flask-Limiter rate limiting, JWT blocklist on logout
- **Responsive UI**: Premium dark/light theme, permanent sidebar on all screen sizes

---

## Setup

### Prerequisites

- Python 3.8+
- MongoDB Atlas account (free tier works)
- (Optional) NewsAPI key: https://newsapi.org

### 1. Clone

```bash
git clone https://github.com/yourusername/proofly.git
cd proofly
```

### 2. Install dependencies

```bash
pip install -r requirements.txt
```

> PyTorch + Transformers models (~1–2 GB) download automatically on first run.

### 3. Configure `.env`

Copy `.env.example` to `.env` and fill in:

```env
# MongoDB Atlas
MONGO_URI=mongodb+srv://:@.mongodb.net/?appName=
MONGO_DB_NAME=factcheck

# FAISS index file path
FAISS_FILE=faiss.index

# NewsAPI (free key at newsapi.org)
NEWS_API_KEY=your_key_here

# Flask
FLASK_SECRET_KEY=your_long_random_secret_key

# JWT
JWT_SECRET_KEY=your_jwt_secret
JWT_ACCESS_TOKEN_MINS=15
JWT_REFRESH_TOKEN_DAYS=7

# Password pepper: keep secret, never commit
BCRYPT_PEPPER=your_pepper_string

# Bot identity header
USER_AGENT=ProoflyBot/1.0
```

### 4. Initialise MongoDB collections & indexes

```bash
python setup_db.py
```

Creates all 4 collections (`users`, `history`, `evidence`, `revoked_tokens`) with validators and indexes on Atlas.

### 5. Pre-populate evidence index *(recommended before first use)*

```bash
python update_data.py
```

Fetches from all sources across 24 broad topics and builds the FAISS index. Re-run weekly to keep evidence fresh.

### 6. Run

```bash
python app.py
```

Open `http://localhost:5000`, then register an account and start fact-checking.
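The settings from step 3 can be read at startup in a few lines. The sketch below is one possible shape for this, not the project's actual `project/config.py`: the `Settings` class, the `load_settings` helper, the choice of which keys are required, and the fallback defaults (copied from the sample `.env` above) are all assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from datetime import timedelta

# Hypothetical: which keys must be present is an assumption, not taken from the project.
REQUIRED_KEYS = ("MONGO_URI", "JWT_SECRET_KEY", "BCRYPT_PEPPER")


@dataclass(frozen=True)
class Settings:
    """Illustrative container for a subset of the values in .env."""
    mongo_uri: str
    mongo_db_name: str
    faiss_file: str
    jwt_access_expires: timedelta
    jwt_refresh_expires: timedelta
    user_agent: str


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping of environment variables.

    Defaults mirror the sample .env; they are assumptions, not project behaviour.
    """
    env = os.environ if env is None else env

    # Fail early with a clear message instead of crashing later on a missing secret.
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))

    return Settings(
        mongo_uri=env["MONGO_URI"],
        mongo_db_name=env.get("MONGO_DB_NAME", "factcheck"),
        faiss_file=env.get("FAISS_FILE", "faiss.index"),
        # Token lifetimes arrive as strings, so convert them to timedeltas once here.
        jwt_access_expires=timedelta(minutes=int(env.get("JWT_ACCESS_TOKEN_MINS", "15"))),
        jwt_refresh_expires=timedelta(days=int(env.get("JWT_REFRESH_TOKEN_DAYS", "7"))),
        user_agent=env.get("USER_AGENT", "ProoflyBot/1.0"),
    )
```

Calling `load_settings()` with no argument reads from the process environment, which `python-dotenv` can populate from `.env` beforehand. Passing a plain dict instead makes the loader easy to test without touching real secrets.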
---

## Project Structure

```
newsXX/
├── app.py               # Flask app: routes, JWT config, security middleware
├── auth.py              # Auth Blueprint: register / login / logout / refresh
├── api_wrapper.py       # Per-request pipeline: evidence → FAISS → NLI → verdict
├── model.py             # AI models + 7 evidence fetchers
├── update_data.py       # Offline bulk evidence updater + FAISS index builder
├── knowledge_base.py    # ~80 curated static facts (no network required)
├── setup_db.py          # One-time MongoDB Atlas collection + index setup
├── project/
│   ├── config.py        # All settings from .env (single source of truth)
│   └── database.py      # MongoDB helpers (Borg singleton, CRUD, TTL)
├── templates/
│   ├── index.html       # Dashboard / claim submission
│   ├── results.html     # Verdict + evidence + NLI breakdown
│   ├── history.html     # User claim history
│   ├── login.html       # Login page
│   └── register.html    # Register page
├── static/
│   └── style.css        # Full design system (dark/light theme, responsive)
├── .env                 # Local secrets (never commit)
├── .env.example         # Template
├── requirements.txt     # Python dependencies
└── faiss.index          # Vector index (built by update_data.py)
```

---

## How the Verdict Works

```
Claim → Embed (MiniLM) → Knowledge Base check
            ↓  if score ≥ 0.65 → skip live fetches
Wikidata + RSS + GDELT + NewsAPI + Wikipedia
            ↓  if < 3 items → DuckDuckGo fallback
Build FAISS index
            ↓
Top-5 most similar evidence items
            ↓
NLI (BART-MNLI) on each piece
            ↓
Majority vote → True / False / Mixture/Uncertain
```

| Condition | Verdict |
|---|---|
| More entailment results than contradiction | ✅ **True** |
| More contradiction results than entailment | ❌ **False** |
| Tied or average scores below 0.4 | ⚠️ **Mixture/Uncertain** |

---

## MongoDB Collections

| Collection | Purpose | Auto-cleanup |
|---|---|---|
| `users` | Accounts with hashed passwords | None |
| `history` | Per-user fact-check records | None |
| `evidence` | Scraped text for FAISS | TTL 30 days |
| `revoked_tokens` | JWT logout blocklist | TTL at token expiry |

---

## Dependencies
| Package | Purpose |
|---|---|
| `flask` | Web framework |
| `flask-jwt-extended` | JWT access + refresh tokens via cookies |
| `flask-bcrypt` | Password hashing |
| `flask-limiter` | Rate limiting on auth endpoints |
| `flask-talisman` | HTTP security headers |
| `pymongo` | MongoDB Atlas driver |
| `python-dotenv` | `.env` loading |
| `sentence-transformers` | MiniLM-L6 embeddings |
| `transformers` | BART-MNLI NLI pipeline |
| `faiss-cpu` | Vector similarity search |
| `requests` | HTTP calls to APIs |
| `beautifulsoup4` | DuckDuckGo HTML scraping |
| `feedparser` | RSS feed parsing |
| `numpy` | Numerical operations |
| `torch` | Deep learning backend |
| `easyocr` | Image OCR |
| `Pillow` | Image processing |

---

## Security Notes

- Passwords are hashed with **bcrypt** plus a server-side **pepper**, so a leaked database alone is not enough to crack them
- JWT tokens are stored in **HttpOnly** cookies, which JavaScript cannot read (this limits token theft via XSS)
- `SameSite=Strict` cookie policy mitigates CSRF
- Rate limiting: 5 login attempts / minute, 3 register attempts / minute per IP
- HTTP security headers are enforced by Flask-Talisman

---

## Contributing

Pull requests welcome. Please open an issue first for major changes.

---

## License

Open-source. See repository for license details.