title: Proofly API
emoji: 🛡️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
app_port: 7860
Proofly
An AI-powered claim verification system that gathers evidence from 7 sources, builds a semantic vector index, and uses Natural Language Inference (NLI) to produce a True / False / Mixture/Uncertain verdict. It includes full user authentication, history tracking, and a premium responsive UI.
Features
- JWT Authentication: Register, login, and logout with bcrypt-peppered passwords and HttpOnly cookie tokens
- Per-User History: Every fact-check is saved to MongoDB Atlas; view, delete, or clear your history
- 7 Evidence Sources
  - Static Knowledge Base (local, instant, no network needed)
  - Wikidata (free entity facts, no API key)
  - 12 RSS Feeds (BBC, CNN, Al Jazeera, NYT, The Hindu, NDTV, …)
  - GDELT Project (global news events, no API key)
  - NewsAPI (quality English headlines, requires free API key)
  - Wikipedia REST API (encyclopedic summaries)
  - DuckDuckGo HTML scrape (automatic fallback)
- AI Pipeline: all-MiniLM-L6-v2 for semantic embeddings + FAISS vector search + facebook/bart-large-mnli for NLI
- KB Short-Circuit: Skips slow live fetches when the knowledge base already has a strong match (≥ 0.65 similarity)
- Image OCR: Upload an image → EasyOCR extracts text → auto-fills the claim field
- Security: Flask-Talisman security headers, Flask-Limiter rate limiting, JWT blocklist on logout
- Responsive UI: Premium dark/light theme, permanent sidebar on all screen sizes
Setup
Prerequisites
- Python 3.8+
- MongoDB Atlas account (free tier works)
- (Optional) NewsAPI key: https://newsapi.org
1. Clone
git clone https://github.com/yourusername/proofly.git
cd proofly
2. Install dependencies
pip install -r requirements.txt
PyTorch + Transformers models (~1–2 GB) download automatically on first run.
3. Configure .env
Copy .env.example to .env and fill in:
# MongoDB Atlas
MONGO_URI=mongodb+srv://<user>:<password>@<cluster>.mongodb.net/?appName=<app>
MONGO_DB_NAME=factcheck
# FAISS index file path
FAISS_FILE=faiss.index
# NewsAPI (free key at newsapi.org)
NEWS_API_KEY=your_key_here
# Flask
FLASK_SECRET_KEY=your_long_random_secret_key
# JWT
JWT_SECRET_KEY=your_jwt_secret
JWT_ACCESS_TOKEN_MINS=15
JWT_REFRESH_TOKEN_DAYS=7
# Password pepper: keep secret, never commit
BCRYPT_PEPPER=your_pepper_string
# Bot identity header
USER_AGENT=ProoflyBot/1.0
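As a rough illustration of how these variables might be consumed, the sketch below reads them into a typed settings object. The `Settings` class and `load_settings` function are hypothetical names, not taken from project/config.py, and the defaults simply mirror the example values above. It reads from any mapping so it can run without a real .env file; in the app, python-dotenv would normally populate `os.environ` first.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings container; fields mirror the .env keys above."""
    mongo_uri: str
    mongo_db_name: str
    faiss_file: str
    jwt_access_token_mins: int
    jwt_refresh_token_days: int
    user_agent: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment-like mapping.

    MONGO_URI is treated as required (an assumption), so a missing value
    raises KeyError at startup instead of failing on the first request.
    """
    return Settings(
        mongo_uri=env["MONGO_URI"],
        mongo_db_name=env.get("MONGO_DB_NAME", "factcheck"),
        faiss_file=env.get("FAISS_FILE", "faiss.index"),
        jwt_access_token_mins=int(env.get("JWT_ACCESS_TOKEN_MINS", "15")),
        jwt_refresh_token_days=int(env.get("JWT_REFRESH_TOKEN_DAYS", "7")),
        user_agent=env.get("USER_AGENT", "ProoflyBot/1.0"),
    )


# Usage with an in-memory mapping instead of the real environment.
example = load_settings({"MONGO_URI": "mongodb+srv://example", "JWT_ACCESS_TOKEN_MINS": "30"})
print(example.jwt_access_token_mins, example.mongo_db_name)
```

Converting the numeric values once, at load time, means a malformed entry such as `JWT_ACCESS_TOKEN_MINS=abc` surfaces as a `ValueError` before the server starts handling requests.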
4. Initialise MongoDB collections & indexes
python setup_db.py
Creates all 4 collections (users, history, evidence, revoked_tokens) with validators and indexes on Atlas.
5. Pre-populate evidence index (recommended before first use)
python update_data.py
Fetches from all sources across 24 broad topics and builds the FAISS index. Re-run weekly to keep evidence fresh.
6. Run
python app.py
Open http://localhost:5000, register an account, and start fact-checking.
Project Structure
newsXX/
├── app.py              # Flask app: routes, JWT config, security middleware
├── auth.py             # Auth Blueprint: register / login / logout / refresh
├── api_wrapper.py      # Per-request pipeline: evidence → FAISS → NLI → verdict
├── model.py            # AI models + 7 evidence fetchers
├── update_data.py      # Offline bulk evidence updater + FAISS index builder
├── knowledge_base.py   # ~80 curated static facts (no network required)
├── setup_db.py         # One-time MongoDB Atlas collection + index setup
├── project/
│   ├── config.py       # All settings from .env (single source of truth)
│   └── database.py     # MongoDB helpers (Borg singleton, CRUD, TTL)
├── templates/
│   ├── index.html      # Dashboard / claim submission
│   ├── results.html    # Verdict + evidence + NLI breakdown
│   ├── history.html    # User claim history
│   ├── login.html      # Login page
│   └── register.html   # Register page
├── static/
│   └── style.css       # Full design system (dark/light theme, responsive)
├── .env                # Local secrets (never commit)
├── .env.example        # Template
├── requirements.txt    # Python dependencies
└── faiss.index         # Vector index (built by update_data.py)
How the Verdict Works
Claim → Embed (MiniLM) → Knowledge Base check
    ↓ if score ≥ 0.65 → skip live fetches
Wikidata + RSS + GDELT + NewsAPI + Wikipedia
    ↓ if < 3 items → DuckDuckGo fallback
Build FAISS index
    ↓
Top-5 most similar evidence items
    ↓
NLI (BART-MNLI) on each piece
    ↓
Majority vote → True / False / Mixture/Uncertain
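The retrieval half of this flow can be sketched in plain Python. This is a minimal illustration, not the project's code: it uses brute-force cosine similarity over small lists in place of FAISS, takes precomputed vectors instead of calling MiniLM, and the function names (`cosine`, `top_k`, `should_skip_live_fetch`) are hypothetical. Only the 0.65 threshold and the top-5 cut-off come from the flow above.

```python
import math
from typing import List, Sequence, Tuple

KB_THRESHOLD = 0.65  # similarity at which live fetches are skipped
TOP_K = 5            # number of evidence items passed on to NLI


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float],
          vectors: List[Sequence[float]],
          k: int = TOP_K) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k vectors most similar to the query."""
    scored = [(i, cosine(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def should_skip_live_fetch(query: Sequence[float],
                           kb_vectors: List[Sequence[float]]) -> bool:
    """True when the best knowledge-base match reaches the threshold."""
    best = top_k(query, kb_vectors, k=1)
    return bool(best) and best[0][1] >= KB_THRESHOLD


# Toy 2-D vectors standing in for sentence embeddings (assumed data).
claim = [1.0, 0.0]
knowledge_base = [[0.9, 0.1], [0.0, 1.0]]
print(should_skip_live_fetch(claim, knowledge_base))
```

A brute-force scan like this is linear in the number of evidence items, which is one reason a dedicated index such as FAISS is the better fit once the evidence collection grows.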
| Condition | Verdict |
|---|---|
| More entailment results than contradiction | ✅ True |
| More contradiction results than entailment | ❌ False |
| Tied or average scores below 0.4 | ⚠️ Mixture/Uncertain |
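The rules in the table can be expressed as a small function. This is a sketch under stated assumptions rather than the project's implementation: `decide_verdict` is a hypothetical name, each NLI result is assumed to arrive as a `(label, score)` pair with lowercase labels, and "average scores" is interpreted here as the mean score across all evidence items.

```python
from collections import Counter
from typing import List, Tuple

MIN_AVG_SCORE = 0.4  # below this average the verdict is Mixture/Uncertain


def decide_verdict(results: List[Tuple[str, float]]) -> str:
    """Combine per-evidence NLI results into one verdict.

    Each result is (label, score), with label one of "entailment",
    "contradiction", or "neutral" and score in [0, 1].
    """
    if not results:
        # No evidence at all: nothing to vote on.
        return "Mixture/Uncertain"

    counts = Counter(label for label, _ in results)
    entail = counts["entailment"]
    contra = counts["contradiction"]

    # Assumption: the average is taken over every evidence item's score.
    average = sum(score for _, score in results) / len(results)

    if entail == contra or average < MIN_AVG_SCORE:
        return "Mixture/Uncertain"
    return "True" if entail > contra else "False"


print(decide_verdict([("entailment", 0.9), ("entailment", 0.8), ("contradiction", 0.7)]))
```

Neutral results never win the vote in this sketch; they only pull the average up or down, so a pile of low-confidence neutral evidence can still push the outcome to Mixture/Uncertain.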
MongoDB Collections
| Collection | Purpose | Auto-cleanup |
|---|---|---|
| users | Accounts with hashed passwords | None |
| history | Per-user fact-check records | None |
| evidence | Scraped text for FAISS | TTL 30 days |
| revoked_tokens | JWT logout blocklist | TTL at token expiry |
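The two auto-cleanup policies map naturally onto MongoDB TTL indexes. The sketch below is an assumption about how setup_db.py might declare them, not a copy of it: the field names `created_at` and `expires_at` and the function name `ensure_ttl_indexes` are hypothetical. It relies only on pymongo's `Collection.create_index` with the `expireAfterSeconds` option, and takes the database handle as an argument so it can be exercised without a live cluster.

```python
from typing import Any

THIRTY_DAYS = 30 * 24 * 60 * 60  # TTL for evidence documents, in seconds


def ensure_ttl_indexes(db: Any) -> None:
    """Create TTL indexes on a pymongo Database (or compatible) handle.

    Field names are assumptions made for illustration.
    """
    # evidence: a document expires 30 days after its created_at timestamp.
    db["evidence"].create_index("created_at", expireAfterSeconds=THIRTY_DAYS)

    # revoked_tokens: expireAfterSeconds=0 makes each document expire at the
    # datetime stored in expires_at, i.e. when the JWT itself would expire.
    db["revoked_tokens"].create_index("expires_at", expireAfterSeconds=0)
```

With a real connection this would be called as `ensure_ttl_indexes(MongoClient(uri)[db_name])`. TTL indexes only act on fields holding BSON dates, and MongoDB removes expired documents with a background task that runs periodically, so deletion is not instantaneous.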
Dependencies
| Package | Purpose |
|---|---|
| flask | Web framework |
| flask-jwt-extended | JWT access + refresh tokens via cookies |
| flask-bcrypt | Password hashing |
| flask-limiter | Rate limiting on auth endpoints |
| flask-talisman | HTTP security headers |
| pymongo | MongoDB Atlas driver |
| python-dotenv | .env loading |
| sentence-transformers | MiniLM-L6 embeddings |
| transformers | BART-MNLI NLI pipeline |
| faiss-cpu | Vector similarity search |
| requests | HTTP calls to APIs |
| beautifulsoup4 | DuckDuckGo HTML scraping |
| feedparser | RSS feed parsing |
| numpy | Numerical operations |
| torch | Deep learning backend |
| easyocr | Image OCR |
| Pillow | Image processing |
Security Notes
- Passwords are hashed with bcrypt plus a server-side pepper, so a leaked database alone is not enough to crack them
- JWT tokens are stored in HttpOnly cookies, which JavaScript cannot read; this protects them from theft via XSS
- SameSite=Strict cookie policy protects against CSRF
- Rate limiting: 5 login attempts / minute and 3 register attempts / minute per IP
- All security headers are enforced by Flask-Talisman
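To make the pepper idea concrete, here is a minimal sketch under stated assumptions. It is not the project's implementation: to stay within the standard library it uses PBKDF2-HMAC-SHA256 as a stand-in for bcrypt, and mixing the pepper in via HMAC-SHA256 is one common approach, not necessarily what Proofly does. The names `hash_password` and `verify_password` and the iteration count are illustrative.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor for the stand-in KDF
SALT_BYTES = 16


def _apply_pepper(password: str, pepper: bytes) -> bytes:
    """Mix the server-side pepper into the password with HMAC-SHA256."""
    return hmac.new(pepper, password.encode("utf-8"), hashlib.sha256).digest()


def hash_password(password: str, pepper: bytes) -> bytes:
    """Return salt + derived key. PBKDF2 stands in for bcrypt here."""
    salt = os.urandom(SALT_BYTES)
    key = hashlib.pbkdf2_hmac("sha256", _apply_pepper(password, pepper), salt, ITERATIONS)
    return salt + key


def verify_password(password: str, pepper: bytes, stored: bytes) -> bool:
    """Recompute the key with the stored salt and compare in constant time."""
    salt, key = stored[:SALT_BYTES], stored[SALT_BYTES:]
    candidate = hashlib.pbkdf2_hmac("sha256", _apply_pepper(password, pepper), salt, ITERATIONS)
    return hmac.compare_digest(candidate, key)


# The pepper would come from BCRYPT_PEPPER in the environment, never the database.
record = hash_password("correct horse", b"example-pepper")
print(verify_password("correct horse", b"example-pepper", record))
```

Because the pepper lives in the server's environment and only the salted hash is stored, an attacker holding the database alone cannot test password guesses against the stored records.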
Contributing
Pull requests welcome. Please open an issue first for major changes.
License
Open-source. See repository for license details.