---
title: Proofly API
emoji: 🛡️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
app_port: 7860
---
# Proofly
An AI-powered claim verification system that gathers evidence from 7 sources (six live, one local), builds a semantic vector index, and uses Natural Language Inference (NLI) to produce a **True / False / Mixture/Uncertain** verdict, with full user authentication, history tracking, and a premium responsive UI.
---
## Features
- **JWT Authentication**: Register, log in, and log out with bcrypt-hashed, peppered passwords and HttpOnly cookie tokens
- **Per-User History**: Every fact-check is saved to MongoDB Atlas; view, delete, or clear your history
- **7 Evidence Sources**
  - Static Knowledge Base (local, instant, no network needed)
  - Wikidata (free entity facts, no API key)
  - 12 RSS Feeds (BBC, CNN, Al Jazeera, NYT, The Hindu, NDTV, …)
  - GDELT Project (global news events, no API key)
  - NewsAPI (quality English headlines, requires free API key)
  - Wikipedia REST API (encyclopedic summaries)
  - DuckDuckGo HTML scrape (automatic fallback)
- **AI Pipeline**: `all-MiniLM-L6-v2` for semantic embeddings + FAISS vector search + `facebook/bart-large-mnli` for NLI
- **KB Short-Circuit**: Skips slow live fetches when the knowledge base already has a strong match (≥ 0.65 similarity)
- **Image OCR**: Upload an image → EasyOCR extracts text → auto-fills the claim field
- **Security**: Flask-Talisman security headers, Flask-Limiter rate limiting, JWT blocklist on logout
- **Responsive UI**: Premium dark/light theme, permanent sidebar on all screen sizes
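The retrieval step named in the AI Pipeline bullet comes down to nearest-neighbour search over embedding vectors. The sketch below illustrates the idea with a brute-force cosine search in plain Python. It is a stand-in for illustration only: the project uses FAISS for this step, and the function names `cosine` and `top_k` are hypothetical rather than taken from the codebase.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float],
    vectors: List[Sequence[float]],
    k: int = 5,
) -> List[Tuple[int, float]]:
    """Return (index, similarity) pairs for the k vectors most similar to query.

    Brute-force stand-in for a vector index; hypothetical helper, not project code.
    """
    scored = [(i, cosine(query, v)) for i, v in enumerate(vectors)]
    # Highest similarity first; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```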
---
## Setup
### Prerequisites
- Python 3.8+
- MongoDB Atlas account (free tier works)
- (Optional) NewsAPI key from https://newsapi.org
### 1. Clone
```bash
git clone https://github.com/yourusername/proofly.git
cd proofly
```
### 2. Install dependencies
```bash
pip install -r requirements.txt
```
> PyTorch + Transformers models (~1-2 GB) download automatically on first run.
### 3. Configure `.env`
Copy `.env.example` to `.env` and fill in:
```env
# MongoDB Atlas
MONGO_URI=mongodb+srv://<user>:<password>@<cluster>.mongodb.net/?appName=<app>
MONGO_DB_NAME=factcheck
# FAISS index file path
FAISS_FILE=faiss.index
# NewsAPI (free key at newsapi.org)
NEWS_API_KEY=your_key_here
# Flask
FLASK_SECRET_KEY=your_long_random_secret_key
# JWT
JWT_SECRET_KEY=your_jwt_secret
JWT_ACCESS_TOKEN_MINS=15
JWT_REFRESH_TOKEN_DAYS=7
# Password pepper: keep secret, never commit
BCRYPT_PEPPER=your_pepper_string
# Bot identity header
USER_AGENT=ProoflyBot/1.0
```
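As a rough illustration of how these variables might be consumed, the sketch below reads a few of them into a typed settings object. The `Settings` class, the `load_settings` helper, and the fallback defaults (copied from the sample values above) are assumptions made for this example; the actual `project/config.py` may be organised differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    mongo_uri: str
    mongo_db_name: str
    faiss_file: str
    jwt_access_token_mins: int
    jwt_refresh_token_days: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from a mapping of environment variables.

    Hypothetical helper; defaults mirror the sample .env and are assumptions.
    """
    # MONGO_URI has no sensible default, so fail early with a clear message.
    mongo_uri = env.get("MONGO_URI")
    if not mongo_uri:
        raise RuntimeError("MONGO_URI is not set; copy .env.example to .env")
    return Settings(
        mongo_uri=mongo_uri,
        mongo_db_name=env.get("MONGO_DB_NAME", "factcheck"),
        faiss_file=env.get("FAISS_FILE", "faiss.index"),
        # Environment values are strings, so numeric settings are converted here.
        jwt_access_token_mins=int(env.get("JWT_ACCESS_TOKEN_MINS", "15")),
        jwt_refresh_token_days=int(env.get("JWT_REFRESH_TOKEN_DAYS", "7")),
    )
```

In an application, calling `load_dotenv()` from `python-dotenv` before `load_settings()` would copy the `.env` entries into `os.environ` first.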
### 4. Initialise MongoDB collections & indexes
```bash
python setup_db.py
```
Creates all 4 collections (`users`, `history`, `evidence`, `revoked_tokens`) with validators and indexes on Atlas.
### 5. Pre-populate evidence index *(recommended before first use)*
```bash
python update_data.py
```
Fetches from all sources across 24 broad topics and builds the FAISS index. Re-run weekly to keep evidence fresh.
### 6. Run
```bash
python app.py
```
Open `http://localhost:5000`, register an account, and start fact-checking.
---
## Project Structure
```
newsXX/
├── app.py              # Flask app: routes, JWT config, security middleware
├── auth.py             # Auth Blueprint: register / login / logout / refresh
├── api_wrapper.py      # Per-request pipeline: evidence → FAISS → NLI → verdict
├── model.py            # AI models + 7 evidence fetchers
├── update_data.py      # Offline bulk evidence updater + FAISS index builder
├── knowledge_base.py   # ~80 curated static facts (no network required)
├── setup_db.py         # One-time MongoDB Atlas collection + index setup
├── project/
│   ├── config.py       # All settings from .env (single source of truth)
│   └── database.py     # MongoDB helpers (Borg singleton, CRUD, TTL)
├── templates/
│   ├── index.html      # Dashboard / claim submission
│   ├── results.html    # Verdict + evidence + NLI breakdown
│   ├── history.html    # User claim history
│   ├── login.html      # Login page
│   └── register.html   # Register page
├── static/
│   └── style.css       # Full design system (dark/light theme, responsive)
├── .env                # Local secrets (never commit)
├── .env.example        # Template
├── requirements.txt    # Python dependencies
└── faiss.index         # Vector index (built by update_data.py)
```
---
## How the Verdict Works
```
Claim → Embed (MiniLM) → Knowledge Base check
↓ if score ≥ 0.65 → skip live fetches
Wikidata + RSS + GDELT + NewsAPI + Wikipedia
↓ if < 3 items → DuckDuckGo fallback
Build FAISS index
↓
Top-5 most similar evidence items
↓
NLI (BART-MNLI) on each piece
↓
Majority vote → True / False / Mixture/Uncertain
```
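The branching in the first half of this flow (knowledge-base short-circuit, live fetches, DuckDuckGo fallback) can be sketched as a small control function. This is a minimal sketch under stated assumptions, not the project's implementation: `gather_evidence` and its callable parameters are hypothetical, weak knowledge-base matches are assumed to be kept alongside live results, and the "< 3 items" check is assumed to count all evidence collected so far.

```python
from typing import Callable, List, Sequence, Tuple

KB_THRESHOLD = 0.65  # similarity at which a knowledge-base match is trusted
MIN_ITEMS = 3        # below this many items, the fallback source is tried

Fetcher = Callable[[str], List[str]]


def gather_evidence(
    claim: str,
    kb_lookup: Callable[[str], Tuple[List[str], float]],
    live_fetchers: Sequence[Fetcher],
    fallback: Fetcher,
) -> List[str]:
    """Collect evidence texts for a claim (hypothetical helper).

    kb_lookup returns (matching facts, best similarity score).
    """
    kb_items, kb_score = kb_lookup(claim)
    if kb_score >= KB_THRESHOLD:
        # Strong local match: skip every network call.
        return list(kb_items)

    evidence = list(kb_items)
    for fetch in live_fetchers:
        try:
            evidence.extend(fetch(claim))
        except Exception:
            # Assumption: one failing source should not abort the whole check.
            continue

    if len(evidence) < MIN_ITEMS:
        evidence.extend(fallback(claim))
    return evidence
```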
| Condition | Verdict |
|---|---|
| More entailment results than contradiction | ✅ **True** |
| More contradiction results than entailment | ❌ **False** |
| Tied or average scores below 0.4 | ⚠️ **Mixture/Uncertain** |
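The table can be read as a small decision function. The sketch below is one possible interpretation, not the project's code: `decide_verdict` is a hypothetical name, each NLI result is assumed to be a `(label, score)` pair, and "average scores" is assumed to mean the mean confidence across all evaluated evidence items.

```python
from collections import Counter
from typing import Iterable, Tuple

MIN_AVG_SCORE = 0.4  # threshold from the table above


def decide_verdict(results: Iterable[Tuple[str, float]]) -> str:
    """Map per-evidence NLI results to a verdict (hypothetical helper).

    Each result is (label, score) with label in
    {"entailment", "contradiction", "neutral"}, case-insensitive.
    """
    results = list(results)
    if not results:
        # Assumption: with no evidence at all, nothing can be concluded.
        return "Mixture/Uncertain"

    counts = Counter(label.lower() for label, _ in results)
    avg_score = sum(score for _, score in results) / len(results)
    entail, contra = counts["entailment"], counts["contradiction"]

    # Low overall confidence or a tie both yield the uncertain verdict.
    if avg_score < MIN_AVG_SCORE or entail == contra:
        return "Mixture/Uncertain"
    return "True" if entail > contra else "False"
```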
---
## MongoDB Collections
| Collection | Purpose | Auto-cleanup |
|---|---|---|
| `users` | Accounts with hashed passwords | None |
| `history` | Per-user fact-check records | None |
| `evidence` | Scraped text for FAISS | TTL 30 days |
| `revoked_tokens` | JWT logout blocklist | TTL at token expiry |
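The two auto-cleanup rules map naturally onto MongoDB TTL indexes. The sketch below shows how such indexes could be declared with PyMongo's `create_index`. The helper name `ensure_ttl_indexes` and the field names `created_at` and `expires_at` are assumptions for illustration; `setup_db.py` may use different names. Both fields would need to hold BSON date values for the TTL monitor to act on them. Passing `MongoClient(MONGO_URI)[MONGO_DB_NAME]` as `db` would apply the indexes to a real deployment.

```python
THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60


def ensure_ttl_indexes(db) -> None:
    """Create TTL indexes on a PyMongo-style database object (hypothetical helper).

    Field names are assumed, not taken from the project.
    """
    # Evidence documents are removed about 30 days after their creation time.
    db["evidence"].create_index("created_at", expireAfterSeconds=THIRTY_DAYS_SECONDS)
    # With expireAfterSeconds=0, each document expires at the time stored in
    # the indexed field, here the token's own expiry timestamp.
    db["revoked_tokens"].create_index("expires_at", expireAfterSeconds=0)
```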
---
## Dependencies
| Package | Purpose |
|---|---|
| `flask` | Web framework |
| `flask-jwt-extended` | JWT access + refresh tokens via cookies |
| `flask-bcrypt` | Password hashing |
| `flask-limiter` | Rate limiting on auth endpoints |
| `flask-talisman` | HTTP security headers |
| `pymongo` | MongoDB Atlas driver |
| `python-dotenv` | `.env` loading |
| `sentence-transformers` | MiniLM-L6 embeddings |
| `transformers` | BART-MNLI NLI pipeline |
| `faiss-cpu` | Vector similarity search |
| `requests` | HTTP calls to APIs |
| `beautifulsoup4` | DuckDuckGo HTML scraping |
| `feedparser` | RSS feed parsing |
| `numpy` | Numerical operations |
| `torch` | Deep learning backend |
| `easyocr` | Image OCR |
| `Pillow` | Image processing |
---
## Security Notes
- Passwords are hashed with **bcrypt** plus a server-side **pepper**, so a leaked database alone is not enough to crack them
- JWT tokens are stored in **HttpOnly** cookies, which JavaScript cannot read (this limits token theft via XSS)
- The `SameSite=Strict` cookie policy keeps cross-site requests from carrying the cookies, which mitigates CSRF
- Rate limiting: 5 login attempts / minute, 3 register attempts / minute per IP
- All security headers enforced by Flask-Talisman
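To make the pepper idea concrete, the sketch below mixes a server-side secret into the password before the slow hash, so a stored hash cannot be verified without the pepper. It is an illustration under stated assumptions: the function names are hypothetical, the source does not say how the pepper is combined (HMAC-SHA256 is assumed here), and PBKDF2 from the standard library stands in for bcrypt so the example runs without extra dependencies.

```python
import hashlib
import hmac
import os
from typing import Optional

SALT_LEN = 16
ITERATIONS = 200_000  # illustrative work factor for the PBKDF2 stand-in


def pepper_password(password: str, pepper: str) -> bytes:
    """Mix the server-side pepper into the password with HMAC-SHA256."""
    return hmac.new(pepper.encode(), password.encode(), hashlib.sha256).digest()


def hash_password(password: str, pepper: str, salt: Optional[bytes] = None) -> bytes:
    """Return salt + digest. PBKDF2 is a stand-in; the project uses bcrypt."""
    salt = salt or os.urandom(SALT_LEN)
    digest = hashlib.pbkdf2_hmac(
        "sha256", pepper_password(password, pepper), salt, ITERATIONS
    )
    return salt + digest


def verify_password(password: str, pepper: str, stored: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    salt = stored[:SALT_LEN]
    return hmac.compare_digest(hash_password(password, pepper, salt), stored)
```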
---
## Contributing
Pull requests welcome. Please open an issue first for major changes.
---
## License
Open-source. See repository for license details.