---
title: Proofly API
emoji: 🛡️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
app_port: 7860
---

# Proofly

An AI-powered claim verification system that gathers evidence from 7 sources (a local knowledge base plus live feeds and APIs), builds a semantic vector index, and uses Natural Language Inference (NLI) to produce a **True / False / Mixture/Uncertain** verdict. It includes full user authentication, history tracking, and a premium responsive UI.

---

## Features

- **JWT Authentication**: Register, login, and logout with bcrypt-hashed, peppered passwords and HttpOnly cookie tokens
- **Per-User History**: Every fact-check is saved to MongoDB Atlas; view, delete, or clear your history
- **7 Evidence Sources**
  - Static Knowledge Base (local, instant, no network needed)
  - Wikidata (free entity facts, no API key)
  - 12 RSS Feeds (BBC, CNN, Al Jazeera, NYT, The Hindu, NDTV, …)
  - GDELT Project (global news events, no API key)
  - NewsAPI (quality English headlines, requires free API key)
  - Wikipedia REST API (encyclopedic summaries)
  - DuckDuckGo HTML scrape (automatic fallback)
- **AI Pipeline**: `all-MiniLM-L6-v2` for semantic embeddings + FAISS vector search + `facebook/bart-large-mnli` for NLI
- **KB Short-Circuit**: Skips slow live fetches when the knowledge base already has a strong match (≥ 0.65 similarity)
- **Image OCR**: Upload an image → EasyOCR extracts text → auto-fills the claim field
- **Security**: Flask-Talisman security headers, Flask-Limiter rate limiting, JWT blocklist on logout
- **Responsive UI**: Premium dark/light theme, permanent sidebar on all screen sizes

---

## Setup

### Prerequisites

- Python 3.8+
- MongoDB Atlas account (free tier works)
- (Optional) NewsAPI key: https://newsapi.org

### 1. Clone

```bash
git clone https://github.com/yourusername/proofly.git
cd proofly
```

### 2. Install dependencies

```bash
pip install -r requirements.txt
```

> PyTorch + Transformers models (~1-2 GB) download automatically on first run.

### 3. Configure `.env`

Copy `.env.example` to `.env` and fill in:

```env
# MongoDB Atlas
MONGO_URI=mongodb+srv://<user>:<password>@<cluster>.mongodb.net/?appName=<app>
MONGO_DB_NAME=factcheck

# FAISS index file path
FAISS_FILE=faiss.index

# NewsAPI (free key at newsapi.org)
NEWS_API_KEY=your_key_here

# Flask
FLASK_SECRET_KEY=your_long_random_secret_key

# JWT
JWT_SECRET_KEY=your_jwt_secret
JWT_ACCESS_TOKEN_MINS=15
JWT_REFRESH_TOKEN_DAYS=7

# Password pepper: keep secret, never commit
BCRYPT_PEPPER=your_pepper_string

# Bot identity header
USER_AGENT=ProoflyBot/1.0
```
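
As a rough illustration of how these variables might be consumed, here is a minimal sketch that reads them from the environment and converts the two JWT lifetimes into `timedelta` values. The `Settings` class and `load_settings` function are hypothetical names, not taken from `project/config.py`, and the choice of which variables count as mandatory is an assumption.

```python
import os
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for a few of the values in .env."""
    mongo_uri: str
    mongo_db_name: str
    faiss_file: str
    access_token_ttl: timedelta
    refresh_token_ttl: timedelta


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping of environment variables."""
    env = os.environ if env is None else env
    # Assumption: these three must be set; the real app may require more.
    required = ("MONGO_URI", "JWT_SECRET_KEY", "BCRYPT_PEPPER")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return Settings(
        mongo_uri=env["MONGO_URI"],
        mongo_db_name=env.get("MONGO_DB_NAME", "factcheck"),
        faiss_file=env.get("FAISS_FILE", "faiss.index"),
        access_token_ttl=timedelta(minutes=int(env.get("JWT_ACCESS_TOKEN_MINS", "15"))),
        refresh_token_ttl=timedelta(days=int(env.get("JWT_REFRESH_TOKEN_DAYS", "7"))),
    )
```

Failing fast on a missing secret at startup tends to be easier to debug than a failure on the first login request.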

### 4. Initialise MongoDB collections & indexes

```bash
python setup_db.py
```

Creates all 4 collections (`users`, `history`, `evidence`, `revoked_tokens`) on Atlas, with validators and indexes.

### 5. Pre-populate evidence index *(recommended before first use)*

```bash
python update_data.py
```

Fetches from all sources across 24 broad topics and builds the FAISS index. Re-run weekly to keep evidence fresh.

### 6. Run

```bash
python app.py
```

Open `http://localhost:5000`, register an account, and start fact-checking.

---

## Project Structure

```
newsXX/
├── app.py              # Flask app: routes, JWT config, security middleware
├── auth.py             # Auth Blueprint: register / login / logout / refresh
├── api_wrapper.py      # Per-request pipeline: evidence → FAISS → NLI → verdict
├── model.py            # AI models + 7 evidence fetchers
├── update_data.py      # Offline bulk evidence updater + FAISS index builder
├── knowledge_base.py   # ~80 curated static facts (no network required)
├── setup_db.py         # One-time MongoDB Atlas collection + index setup
├── project/
│   ├── config.py       # All settings from .env (single source of truth)
│   └── database.py     # MongoDB helpers (Borg singleton, CRUD, TTL)
├── templates/
│   ├── index.html      # Dashboard / claim submission
│   ├── results.html    # Verdict + evidence + NLI breakdown
│   ├── history.html    # User claim history
│   ├── login.html      # Login page
│   └── register.html   # Register page
├── static/
│   └── style.css       # Full design system (dark/light theme, responsive)
├── .env                # Local secrets (never commit)
├── .env.example        # Template
├── requirements.txt    # Python dependencies
└── faiss.index         # Vector index (built by update_data.py)
```

---

## How the Verdict Works

```
Claim → Embed (MiniLM) → Knowledge Base check
        ↓ if score ≥ 0.65 → skip live fetches
Wikidata + RSS + GDELT + NewsAPI + Wikipedia
        ↓ if < 3 items → DuckDuckGo fallback
Build FAISS index
        ↓
Top-5 most similar evidence items
        ↓
NLI (BART-MNLI) on each piece
        ↓
Majority vote → True / False / Mixture/Uncertain
```
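
The branching in the diagram can be sketched in plain Python. This is only an illustration under stated assumptions: a brute-force cosine ranking stands in for the FAISS index, embeddings are ordinary lists of floats, and `gather_evidence`, `top_k`, `fetch_live`, and `fetch_fallback` are hypothetical names. It also assumes that the "< 3 items" check counts live results only and that knowledge-base entries are kept alongside them.

```python
import math

KB_THRESHOLD = 0.65   # similarity cut-off quoted in this README
MIN_LIVE_ITEMS = 3    # below this, the diagram falls back to DuckDuckGo
TOP_K = 5             # evidence items passed on to NLI


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(claim_vec, evidence, k=TOP_K):
    """Return the k (score, text) pairs most similar to the claim."""
    scored = [(cosine(claim_vec, vec), text) for text, vec in evidence]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def gather_evidence(claim_vec, kb, fetch_live, fetch_fallback):
    """Return (text, vector) pairs, skipping live fetches on a strong KB match."""
    best = max((cosine(claim_vec, vec) for _, vec in kb), default=0.0)
    if best >= KB_THRESHOLD:
        return list(kb)                   # short-circuit: no network calls
    live = list(fetch_live())
    if len(live) < MIN_LIVE_ITEMS:
        live += list(fetch_fallback())    # too little evidence: use the fallback
    return list(kb) + live
```

Passing the fetchers in as callables keeps the network calls lazy, so the short-circuit path never triggers them.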

| Condition | Verdict |
|---|---|
| More entailment results than contradiction | ✅ **True** |
| More contradiction results than entailment | ❌ **False** |
| Tied or average scores below 0.4 | ⚠️ **Mixture/Uncertain** |
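
The voting rule in the table could be written roughly as follows. This is a sketch, not the code in `api_wrapper.py`: `decide_verdict` is a hypothetical name, the labels are assumed to be the lowercase strings `entailment`, `contradiction`, and `neutral`, and "average scores" is read as the mean confidence of the winning label across all evidence items. Returning the uncertain verdict for an empty evidence list is also an assumption.

```python
from statistics import mean

UNCERTAIN = "Mixture/Uncertain"
MIN_AVG_SCORE = 0.4   # threshold from the table above


def decide_verdict(results):
    """Map NLI outputs to a verdict.

    results: list of (label, score) pairs, one per evidence item, where
    score is the model's confidence in that label (0.0 to 1.0).
    """
    if not results:
        return UNCERTAIN
    entail = sum(1 for label, _ in results if label == "entailment")
    contra = sum(1 for label, _ in results if label == "contradiction")
    average = mean(score for _, score in results)
    # A tie or weak overall confidence yields the uncertain verdict.
    if entail == contra or average < MIN_AVG_SCORE:
        return UNCERTAIN
    return "True" if entail > contra else "False"
```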

---

## MongoDB Collections

| Collection | Purpose | Auto-cleanup |
|---|---|---|
| `users` | Accounts with hashed passwords | None |
| `history` | Per-user fact-check records | None |
| `evidence` | Scraped text for FAISS | TTL 30 days |
| `revoked_tokens` | JWT logout blocklist | TTL at token expiry |
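
The two auto-cleanup rules map naturally onto MongoDB TTL indexes. The sketch below shows one way they might be declared; it is not taken from `setup_db.py`. Every field name (`email`, `user_id`, `created_at`, `expires_at`) is a guess, as are the `users` and `history` indexes. `apply_indexes` is a hypothetical helper that accepts a pymongo `Database`, or any object exposing `db[name].create_index(keys, **options)`.

```python
THIRTY_DAYS = 30 * 24 * 60 * 60  # 2,592,000 seconds

# (collection, keys, options). Field names are assumptions for illustration.
INDEX_PLAN = [
    ("users", [("email", 1)], {"unique": True}),
    ("history", [("user_id", 1), ("created_at", -1)], {}),
    # TTL index: documents are removed about 30 days after created_at.
    ("evidence", [("created_at", 1)], {"expireAfterSeconds": THIRTY_DAYS}),
    # expireAfterSeconds=0 removes each document once its expires_at date passes,
    # which matches "TTL at token expiry" when expires_at holds the JWT expiry.
    ("revoked_tokens", [("expires_at", 1)], {"expireAfterSeconds": 0}),
]


def apply_indexes(db):
    """Create every planned index and return the values create_index gives back."""
    return [
        db[name].create_index(keys, **options)
        for name, keys, options in INDEX_PLAN
    ]
```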

---

## Dependencies

| Package | Purpose |
|---|---|
| `flask` | Web framework |
| `flask-jwt-extended` | JWT access + refresh tokens via cookies |
| `flask-bcrypt` | Password hashing |
| `flask-limiter` | Rate limiting on auth endpoints |
| `flask-talisman` | HTTP security headers |
| `pymongo` | MongoDB Atlas driver |
| `python-dotenv` | `.env` loading |
| `sentence-transformers` | MiniLM-L6 embeddings |
| `transformers` | BART-MNLI NLI pipeline |
| `faiss-cpu` | Vector similarity search |
| `requests` | HTTP calls to APIs |
| `beautifulsoup4` | DuckDuckGo HTML scraping |
| `feedparser` | RSS feed parsing |
| `numpy` | Numerical operations |
| `torch` | Deep learning backend |
| `easyocr` | Image OCR |
| `Pillow` | Image processing |

---

## Security Notes

- Passwords are hashed with **bcrypt** plus a server-side **pepper**, so a leaked database alone is not enough to crack them
- JWT tokens are stored in **HttpOnly** cookies, which JavaScript cannot read (this limits token theft via XSS)
- `SameSite=Strict` cookie policy mitigates CSRF
- Rate limiting: 5 login attempts / minute, 3 register attempts / minute per IP
- All security headers are enforced by Flask-Talisman
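
To make the pepper idea concrete, here is a standard-library sketch. It is not Proofly's implementation: how the project combines the pepper with the password is not documented here, so HMAC-SHA256 is used as one common approach, and PBKDF2 stands in for bcrypt so the example runs without extra packages. The function names are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor for the PBKDF2 stand-in
SALT_BYTES = 16


def _peppered(password: str, pepper: str) -> bytes:
    # Mix the server-side pepper into the password before the slow hash.
    return hmac.new(pepper.encode(), password.encode(), hashlib.sha256).digest()


def hash_password(password: str, pepper: str) -> bytes:
    """Return salt + digest. PBKDF2 is a stand-in for the bcrypt step."""
    salt = os.urandom(SALT_BYTES)
    digest = hashlib.pbkdf2_hmac("sha256", _peppered(password, pepper), salt, ITERATIONS)
    return salt + digest


def verify_password(password: str, pepper: str, stored: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    salt, digest = stored[:SALT_BYTES], stored[SALT_BYTES:]
    candidate = hashlib.pbkdf2_hmac("sha256", _peppered(password, pepper), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```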

---

## Contributing

Pull requests welcome. Please open an issue first for major changes.

---

## License

Open-source. See repository for license details.