title: Proofly API
emoji: 🛡️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
app_port: 7860

Proofly

An AI-powered claim verification system that gathers evidence from 7 sources, builds a semantic vector index, and uses Natural Language Inference (NLI) to produce a True / False / Mixture/Uncertain verdict, with full user authentication, history tracking, and a premium responsive UI.


Features

  • JWT Authentication: Register, log in, and log out with bcrypt-hashed, peppered passwords and HttpOnly cookie tokens
  • Per-User History: Every fact-check is saved to MongoDB Atlas; view, delete, or clear your history
  • 7 Evidence Sources
    • Static Knowledge Base (local, instant; no network needed)
    • Wikidata (free entity facts, no API key)
    • 12 RSS Feeds (BBC, CNN, Al Jazeera, NYT, The Hindu, NDTV, …)
    • GDELT Project (global news events, no API key)
    • NewsAPI (quality English headlines, requires free API key)
    • Wikipedia REST API (encyclopedic summaries)
    • DuckDuckGo HTML scrape (automatic fallback)
  • AI Pipeline: all-MiniLM-L6-v2 for semantic embeddings + FAISS vector search + facebook/bart-large-mnli for NLI
  • KB Short-Circuit: Skips slow live fetches when the knowledge base already has a strong match (≥ 0.65 similarity)
  • Image OCR: Upload an image → EasyOCR extracts text → auto-fills the claim field
  • Security: Flask-Talisman security headers, Flask-Limiter rate limiting, JWT blocklist on logout
  • Responsive UI: Premium dark/light theme, permanent sidebar on all screen sizes

Setup

Prerequisites

  • Python 3.8+
  • MongoDB Atlas account (free tier works)
  • (Optional) NewsAPI key: https://newsapi.org

1. Clone

git clone https://github.com/yourusername/proofly.git
cd proofly

2. Install dependencies

pip install -r requirements.txt

PyTorch + Transformers models (~1–2 GB) download automatically on first run.

3. Configure .env

Copy .env.example to .env and fill in:

# MongoDB Atlas
MONGO_URI=mongodb+srv://<user>:<password>@<cluster>.mongodb.net/?appName=<app>
MONGO_DB_NAME=factcheck

# FAISS index file path
FAISS_FILE=faiss.index

# NewsAPI (free key at newsapi.org)
NEWS_API_KEY=your_key_here

# Flask
FLASK_SECRET_KEY=your_long_random_secret_key

# JWT
JWT_SECRET_KEY=your_jwt_secret
JWT_ACCESS_TOKEN_MINS=15
JWT_REFRESH_TOKEN_DAYS=7

# Password pepper: keep secret, never commit
BCRYPT_PEPPER=your_pepper_string

# Bot identity header
USER_AGENT=ProoflyBot/1.0
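
The variables above could be read into typed settings roughly as follows. This is a minimal sketch, not the project's actual project/config.py: the `Settings` class and `load_settings` helper are hypothetical names, and the choice of which variables are required is an assumption. Defaults mirror the example values shown above.

```python
import os
from dataclasses import dataclass
from datetime import timedelta
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container; field names are illustrative."""
    mongo_uri: str
    mongo_db_name: str
    faiss_file: str
    news_api_key: str  # empty string means NewsAPI is not configured
    jwt_secret_key: str
    bcrypt_pepper: str
    access_token_ttl: timedelta
    refresh_token_ttl: timedelta


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, failing fast on missing secrets."""
    # Assumption: these three values have no safe default, so refuse to start without them.
    required = ("MONGO_URI", "JWT_SECRET_KEY", "BCRYPT_PEPPER")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))

    return Settings(
        mongo_uri=env["MONGO_URI"],
        mongo_db_name=env.get("MONGO_DB_NAME", "factcheck"),
        faiss_file=env.get("FAISS_FILE", "faiss.index"),
        news_api_key=env.get("NEWS_API_KEY", ""),
        jwt_secret_key=env["JWT_SECRET_KEY"],
        bcrypt_pepper=env["BCRYPT_PEPPER"],
        access_token_ttl=timedelta(minutes=int(env.get("JWT_ACCESS_TOKEN_MINS", "15"))),
        refresh_token_ttl=timedelta(days=int(env.get("JWT_REFRESH_TOKEN_DAYS", "7"))),
    )
```

Because the helper accepts any mapping, it can be exercised with a plain dict, or with `os.environ` after python-dotenv has loaded the .env file.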

4. Initialise MongoDB collections & indexes

python setup_db.py

Creates all 4 collections (users, history, evidence, revoked_tokens) with validators and indexes on Atlas.

5. Pre-populate evidence index (recommended before first use)

python update_data.py

Fetches from all sources across 24 broad topics and builds the FAISS index. Re-run weekly to keep evidence fresh.

6. Run

python app.py

Open http://localhost:5000, register an account, and start fact-checking.


Project Structure

newsXX/
├── app.py                  # Flask app: routes, JWT config, security middleware
├── auth.py                 # Auth Blueprint: register / login / logout / refresh
├── api_wrapper.py          # Per-request pipeline: evidence → FAISS → NLI → verdict
├── model.py                # AI models + 7 evidence fetchers
├── update_data.py          # Offline bulk evidence updater + FAISS index builder
├── knowledge_base.py       # ~80 curated static facts (no network required)
├── setup_db.py             # One-time MongoDB Atlas collection + index setup
├── project/
│   ├── config.py           # All settings from .env (single source of truth)
│   └── database.py         # MongoDB helpers (Borg singleton, CRUD, TTL)
├── templates/
│   ├── index.html          # Dashboard / claim submission
│   ├── results.html        # Verdict + evidence + NLI breakdown
│   ├── history.html        # User claim history
│   ├── login.html          # Login page
│   └── register.html       # Register page
├── static/
│   └── style.css           # Full design system (dark/light theme, responsive)
├── .env                    # Local secrets (never commit)
├── .env.example            # Template
├── requirements.txt        # Python dependencies
└── faiss.index             # Vector index (built by update_data.py)

How the Verdict Works

Claim → Embed (MiniLM) → Knowledge Base check
                              ↓ if score ≥ 0.65 → skip live fetches
                         Wikidata + RSS + GDELT + NewsAPI + Wikipedia
                              ↓ if < 3 items → DuckDuckGo fallback
                         Build FAISS index
                              ↓
                         Top-5 most similar evidence items
                              ↓
                         NLI (BART-MNLI) on each piece
                              ↓
                    Majority vote → True / False / Mixture/Uncertain
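
The evidence-gathering part of this flow (knowledge base short-circuit, live fetches, fallback) can be sketched as below. This is an illustration under stated assumptions rather than the code in api_wrapper.py: `gather_evidence` and its callable parameters are hypothetical, weak knowledge-base matches are assumed to be kept alongside live results, and the "fewer than 3 items" check is assumed to apply to the combined list.

```python
from typing import Callable, List, Sequence, Tuple

KB_THRESHOLD = 0.65  # similarity at which the knowledge base short-circuits live fetches
MIN_ITEMS = 3        # below this many items, the fallback source is tried


def gather_evidence(
    claim: str,
    kb_lookup: Callable[[str], Tuple[List[str], float]],
    live_sources: Sequence[Callable[[str], List[str]]],
    fallback: Callable[[str], List[str]],
) -> List[str]:
    """Collect evidence texts for a claim, following the flow in the diagram."""
    kb_items, kb_score = kb_lookup(claim)
    if kb_score >= KB_THRESHOLD:
        return list(kb_items)  # strong local match: skip all network calls

    items = list(kb_items)  # assumption: weak KB matches are still kept
    for fetch in live_sources:
        try:
            items.extend(fetch(claim))
        except Exception:
            # Assumption: one failing source should not abort the whole check.
            continue

    if len(items) < MIN_ITEMS:
        items.extend(fallback(claim))  # e.g. the DuckDuckGo scrape
    return items
```

Passing the sources in as callables keeps the control flow testable without network access; each real fetcher would be wrapped to take a claim and return a list of text snippets.
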
| Condition | Verdict |
| --- | --- |
| More entailment results than contradiction | ✅ True |
| More contradiction results than entailment | ❌ False |
| Tied or average scores below 0.4 | ⚠️ Mixture/Uncertain |
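
The voting rules above could be expressed as a small function like the one below. This is a sketch with assumptions: `decide_verdict` is a hypothetical name, each NLI result is assumed to be a (label, score) pair using the labels "entailment", "contradiction", and "neutral", and "average scores" is read as the mean confidence across all evidence items.

```python
from typing import Iterable, Tuple

MIN_AVG_SCORE = 0.4  # threshold taken from the rules above


def decide_verdict(results: Iterable[Tuple[str, float]]) -> str:
    """Majority vote over (label, score) pairs produced by the NLI model."""
    results = list(results)
    if not results:
        return "Mixture/Uncertain"  # assumption: no evidence means no verdict

    entail = sum(1 for label, _ in results if label == "entailment")
    contra = sum(1 for label, _ in results if label == "contradiction")
    avg_score = sum(score for _, score in results) / len(results)

    # A tie or low overall confidence yields the uncertain verdict.
    if entail == contra or avg_score < MIN_AVG_SCORE:
        return "Mixture/Uncertain"
    return "True" if entail > contra else "False"
```

Neutral results do not count toward either side here, so a single confident contradiction among otherwise neutral evidence is enough to produce "False".
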

MongoDB Collections

| Collection | Purpose | Auto-cleanup |
| --- | --- | --- |
| users | Accounts with hashed passwords | None |
| history | Per-user fact-check records | None |
| evidence | Scraped text for FAISS | TTL 30 days |
| revoked_tokens | JWT logout blocklist | TTL at token expiry |
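
The two auto-cleanup rules map onto MongoDB TTL indexes. The sketch below shows one way to create them with PyMongo's `create_index`; the `ensure_ttl_indexes` helper and the field names `created_at` and `expires_at` are assumptions, since the actual schema lives in setup_db.py.

```python
THIRTY_DAYS = 30 * 24 * 60 * 60  # TTL for evidence documents, in seconds


def ensure_ttl_indexes(db) -> None:
    """Create TTL indexes on a pymongo Database (field names are assumptions)."""
    # Evidence expires 30 days after its (assumed) 'created_at' timestamp.
    db["evidence"].create_index("created_at", expireAfterSeconds=THIRTY_DAYS)
    # Revoked tokens carry their own expiry time; expireAfterSeconds=0 makes
    # MongoDB delete each document once its 'expires_at' date has passed.
    db["revoked_tokens"].create_index("expires_at", expireAfterSeconds=0)
```

TTL indexes only act on fields stored as BSON dates, and MongoDB removes expired documents in a background task, so deletion can lag the expiry time slightly.
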

Dependencies

| Package | Purpose |
| --- | --- |
| flask | Web framework |
| flask-jwt-extended | JWT access + refresh tokens via cookies |
| flask-bcrypt | Password hashing |
| flask-limiter | Rate limiting on auth endpoints |
| flask-talisman | HTTP security headers |
| pymongo | MongoDB Atlas driver |
| python-dotenv | .env loading |
| sentence-transformers | MiniLM-L6 embeddings |
| transformers | BART-MNLI NLI pipeline |
| faiss-cpu | Vector similarity search |
| requests | HTTP calls to APIs |
| beautifulsoup4 | DuckDuckGo HTML scraping |
| feedparser | RSS feed parsing |
| numpy | Numerical operations |
| torch | Deep learning backend |
| easyocr | Image OCR |
| Pillow | Image processing |

Security Notes

  • Passwords are hashed with bcrypt plus a server-side pepper, so a leaked database alone is not enough to crack them
  • JWTs are stored in HttpOnly cookies, which JavaScript cannot read; this limits token theft through XSS
  • SameSite=Strict cookie policy mitigates CSRF
  • Rate limiting: 5 login attempts / minute, 3 register attempts / minute per IP
  • Security headers are enforced by Flask-Talisman
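
To make the cookie flags concrete, the sketch below builds a Set-Cookie header value using only Python's standard library. It is an illustration of what the resulting header looks like, not how the app sets it (that is presumably handled through Flask-JWT-Extended configuration); `build_token_cookie` is a hypothetical helper, and the `Secure` flag assumes HTTPS in production.

```python
from http.cookies import SimpleCookie


def build_token_cookie(name: str, token: str, max_age_seconds: int) -> str:
    """Return a Set-Cookie header value carrying the hardening flags listed above."""
    cookie = SimpleCookie()
    cookie[name] = token
    morsel = cookie[name]
    morsel["httponly"] = True      # not readable from document.cookie
    morsel["secure"] = True        # sent over HTTPS only (assumption)
    morsel["samesite"] = "Strict"  # never attached to cross-site requests
    morsel["path"] = "/"
    morsel["max-age"] = max_age_seconds
    return morsel.OutputString()
```

For a 15-minute access token, `build_token_cookie("access_token", token, 15 * 60)` yields a value containing `HttpOnly`, `Secure`, `SameSite=Strict`, and `Max-Age=900`.
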

Contributing

Pull requests welcome. Please open an issue first for major changes.


License

Open-source. See repository for license details.