---
title: Proofly API
emoji: 🛡️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
app_port: 7860
---

# Proofly

An AI-powered claim verification system that gathers evidence from 7 live sources, builds a semantic vector index, and uses Natural Language Inference (NLI) to produce a **True / False / Mixture/Uncertain** verdict — with full user authentication, history tracking, and a premium responsive UI.

---

## Features

- **JWT Authentication** — Register, login, logout with bcrypt-peppered passwords and HttpOnly cookie tokens
- **Per-User History** — Every fact-check is saved to MongoDB Atlas; view, delete, or clear your history
- **7 Evidence Sources**
  - Static Knowledge Base (local, instant — no network needed)
  - Wikidata (free entity facts, no API key)
  - 12 RSS Feeds (BBC, CNN, Al Jazeera, NYT, The Hindu, NDTV, …)
  - GDELT Project (global news events, no API key)
  - NewsAPI (quality English headlines, requires free API key)
  - Wikipedia REST API (encyclopedic summaries)
  - DuckDuckGo HTML scrape (automatic fallback)
- **AI Pipeline** — `all-MiniLM-L6-v2` for semantic embeddings + FAISS vector search + `facebook/bart-large-mnli` for NLI
- **KB Short-Circuit** — Skips slow live fetches when the knowledge base already has a strong match (≥ 0.65 similarity)
- **Image OCR** — Upload an image → EasyOCR extracts text → auto-fills the claim field
- **Security** — Flask-Talisman security headers, Flask-Limiter rate limiting, JWT blocklist on logout
- **Responsive UI** — Premium dark/light theme, permanent sidebar on all screen sizes

---

## Setup

### Prerequisites
- Python 3.8+
- MongoDB Atlas account (free tier works)
- (Optional) NewsAPI key — https://newsapi.org

### 1. Clone
```bash
git clone https://github.com/yourusername/proofly.git
cd proofly
```

### 2. Install dependencies
```bash
pip install -r requirements.txt
```
> The pretrained model weights (MiniLM, BART-MNLI and EasyOCR; ~1–2 GB in total) are downloaded automatically on first use.

### 3. Configure `.env`
Copy `.env.example` to `.env` and fill in:

```env
# MongoDB Atlas
MONGO_URI=mongodb+srv://<user>:<password>@<cluster>.mongodb.net/?appName=<app>
MONGO_DB_NAME=factcheck

# FAISS index file path
FAISS_FILE=faiss.index

# NewsAPI (free key at newsapi.org)
NEWS_API_KEY=your_key_here

# Flask
FLASK_SECRET_KEY=your_long_random_secret_key

# JWT
JWT_SECRET_KEY=your_jwt_secret
JWT_ACCESS_TOKEN_MINS=15
JWT_REFRESH_TOKEN_DAYS=7

# Password pepper — keep secret, never commit
BCRYPT_PEPPER=your_pepper_string

# Bot identity header
USER_AGENT=ProoflyBot/1.0
```
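
The project structure lists `project/config.py` as the single source of truth for these settings. A minimal sketch of such a module, assuming the `.env` values are already in the process environment (for example after python-dotenv's `load_dotenv()`). The `Settings` dataclass, `load_settings`, the choice of required keys, and the defaults (copied from the example values above) are all assumptions:

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Keys this sketch treats as mandatory (assumption; NEWS_API_KEY is optional).
REQUIRED = ("MONGO_URI", "FLASK_SECRET_KEY", "JWT_SECRET_KEY", "BCRYPT_PEPPER")


@dataclass(frozen=True)
class Settings:
    mongo_uri: str
    mongo_db_name: str
    faiss_file: str
    news_api_key: str  # optional; may be an empty string
    flask_secret_key: str
    jwt_secret_key: str
    jwt_access_token_mins: int
    jwt_refresh_token_days: int
    bcrypt_pepper: str
    user_agent: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, failing fast on missing secrets."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return Settings(
        mongo_uri=env["MONGO_URI"],
        mongo_db_name=env.get("MONGO_DB_NAME", "factcheck"),
        faiss_file=env.get("FAISS_FILE", "faiss.index"),
        news_api_key=env.get("NEWS_API_KEY", ""),
        flask_secret_key=env["FLASK_SECRET_KEY"],
        jwt_secret_key=env["JWT_SECRET_KEY"],
        jwt_access_token_mins=int(env.get("JWT_ACCESS_TOKEN_MINS", "15")),
        jwt_refresh_token_days=int(env.get("JWT_REFRESH_TOKEN_DAYS", "7")),
        bcrypt_pepper=env["BCRYPT_PEPPER"],
        user_agent=env.get("USER_AGENT", "ProoflyBot/1.0"),
    )
```

Accepting a plain mapping instead of always reading `os.environ` keeps the function easy to test. The minute and day counts would then be converted to `timedelta` values for flask-jwt-extended's `JWT_ACCESS_TOKEN_EXPIRES` and `JWT_REFRESH_TOKEN_EXPIRES` settings.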

### 4. Initialise MongoDB collections & indexes
```bash
python setup_db.py
```
Creates all 4 collections (`users`, `history`, `evidence`, `revoked_tokens`) with validators and indexes on Atlas.

### 5. Pre-populate evidence index *(recommended before first use)*
```bash
python update_data.py
```
Fetches from all sources across 24 broad topics and builds the FAISS index. Re-run weekly to keep evidence fresh.

### 6. Run
```bash
python app.py
```
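Open `http://localhost:5000`, register an account, and start fact-checking. When deployed as a Docker Space, the app is served on port 7860 instead (`app_port` in the front matter).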

---

## Project Structure

```
proofly/
├── app.py                  # Flask app — routes, JWT config, security middleware
├── auth.py                 # Auth Blueprint — register / login / logout / refresh
├── api_wrapper.py          # Per-request pipeline: evidence → FAISS → NLI → verdict
├── model.py                # AI models + 7 evidence fetchers
├── update_data.py          # Offline bulk evidence updater + FAISS index builder
├── knowledge_base.py       # ~80 curated static facts (no network required)
├── setup_db.py             # One-time MongoDB Atlas collection + index setup
├── project/
│   ├── config.py           # All settings from .env (single source of truth)
│   └── database.py         # MongoDB helpers (Borg singleton, CRUD, TTL)
├── templates/
│   ├── index.html          # Dashboard / claim submission
│   ├── results.html        # Verdict + evidence + NLI breakdown
│   ├── history.html        # User claim history
│   ├── login.html          # Login page
│   └── register.html       # Register page
├── static/
│   └── style.css           # Full design system (dark/light theme, responsive)
├── .env                    # Local secrets (never commit)
├── .env.example            # Template
├── requirements.txt        # Python dependencies
└── faiss.index             # Vector index (built by update_data.py)
```

---

## How the Verdict Works

```
Claim → Embed (MiniLM) → Knowledge Base check
                              ↓ if score ≥ 0.65 → skip live fetches
                         Wikidata + RSS + GDELT + NewsAPI + Wikipedia
                              ↓ if < 3 items → DuckDuckGo fallback
                         Build FAISS index
                              ↓
                         Top-5 most similar evidence items
                              ↓
                         NLI (BART-MNLI) on each piece
                              ↓
                    Majority vote → True / False / Mixture/Uncertain
```
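
The control flow in the diagram can be sketched in plain Python. This is an illustration under stated assumptions, not the code in `api_wrapper.py`: embeddings are passed in as ready-made vectors instead of coming from MiniLM, a brute-force cosine scan stands in for the FAISS search, KB items are assumed to stay in the evidence pool, the fallback check counts only live items, and `retrieve_evidence`, `kb`, `live_fetchers` and `fallback` are hypothetical names. Only the thresholds (0.65, fewer than 3 items, top 5) come from the diagram.

```python
import math
from typing import Callable, List, Sequence, Tuple

Item = Tuple[str, Sequence[float]]  # (evidence text, embedding vector)

KB_THRESHOLD = 0.65  # KB short-circuit similarity
MIN_LIVE_ITEMS = 3   # fewer live items than this triggers the fallback
TOP_K = 5            # evidence items passed on to NLI


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_evidence(
    claim_vec: Sequence[float],
    kb: List[Item],
    live_fetchers: List[Callable[[], List[Item]]],
    fallback: Callable[[], List[Item]],
) -> List[Tuple[str, float]]:
    """Return up to TOP_K (text, similarity) pairs, most similar first."""
    pool = list(kb)
    best_kb = max((cosine(claim_vec, vec) for _, vec in kb), default=0.0)
    if best_kb < KB_THRESHOLD:
        # No strong KB match: query the live sources.
        live: List[Item] = []
        for fetch in live_fetchers:
            live.extend(fetch())
        if len(live) < MIN_LIVE_ITEMS:
            live.extend(fallback())  # e.g. the DuckDuckGo scrape
        pool.extend(live)
    # Brute-force stand-in for building a FAISS index and searching it.
    scored = [(text, cosine(claim_vec, vec)) for text, vec in pool]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:TOP_K]
```

With real MiniLM embeddings normalised to unit length, the cosine scan is equivalent to an inner-product search, which is what a flat FAISS index computes much faster on large pools.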

| Condition | Verdict |
|---|---|
| More entailment results than contradiction | ✅ **True** |
| More contradiction results than entailment | ❌ **False** |
| Tie, or average NLI score below 0.4 | ⚠️ **Mixture/Uncertain** |
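
The decision rule in the table can be sketched as follows. The input format (one `(label, score)` pair per evidence item), the label strings, checking the 0.4 average before the vote, and treating neutral results as non-voting are assumptions, since the table does not spell them out; `decide_verdict` is a hypothetical name.

```python
from typing import List, Tuple

UNCERTAIN = "Mixture/Uncertain"


def decide_verdict(results: List[Tuple[str, float]], min_avg: float = 0.4) -> str:
    """Majority vote over NLI results.

    `results` holds one (label, score) pair per evidence item, where label is
    "entailment", "contradiction" or "neutral" and score is the model's
    probability for that label.
    """
    if not results:
        return UNCERTAIN
    avg = sum(score for _, score in results) / len(results)
    if avg < min_avg:
        return UNCERTAIN  # evidence too weak to vote on
    entail = sum(1 for label, _ in results if label == "entailment")
    contra = sum(1 for label, _ in results if label == "contradiction")
    if entail > contra:
        return "True"
    if contra > entail:
        return "False"
    return UNCERTAIN  # tie; neutral results do not vote
```

For example, two entailments and one contradiction with scores 0.9, 0.8 and 0.7 average 0.8, clear the threshold, and yield "True".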

---

## MongoDB Collections

| Collection | Purpose | Auto-cleanup |
|---|---|---|
| `users` | Accounts with hashed passwords | — |
| `history` | Per-user fact-check records | — |
| `evidence` | Scraped text for FAISS | TTL 30 days |
| `revoked_tokens` | JWT logout blocklist | TTL at token expiry |
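
The auto-cleanup column maps onto MongoDB TTL indexes. A sketch of index specs that `setup_db.py` might pass to PyMongo's `Collection.create_index(keys, **kwargs)`, assuming hypothetical date fields `created_at` (evidence) and `expires_at` (revoked tokens) stored as BSON dates. With `expireAfterSeconds=0`, MongoDB deletes each document once the date stored in the indexed field has passed, which matches "TTL at token expiry". The unique `jti` index and `apply_indexes` are illustrative assumptions.

```python
from typing import Any, Dict, List, Tuple

THIRTY_DAYS = 30 * 24 * 60 * 60  # seconds

# collection -> list of (keys, options) for Collection.create_index(keys, **options).
# Field names are hypothetical; TTL fields must hold BSON dates (datetime objects).
INDEX_SPECS: Dict[str, List[Tuple[List[Tuple[str, int]], Dict[str, Any]]]] = {
    "evidence": [
        # Delete evidence 30 days after it was stored.
        ([("created_at", 1)], {"expireAfterSeconds": THIRTY_DAYS}),
    ],
    "revoked_tokens": [
        # Delete each entry at the time stored in `expires_at`.
        ([("expires_at", 1)], {"expireAfterSeconds": 0}),
        # One blocklist entry per token id.
        ([("jti", 1)], {"unique": True}),
    ],
}


def apply_indexes(db) -> None:
    """Create the indexes on a pymongo Database (needs a live connection)."""
    for name, specs in INDEX_SPECS.items():
        for keys, options in specs:
            db[name].create_index(keys, **options)
```

With a real connection this would be called as `apply_indexes(MongoClient(MONGO_URI)[MONGO_DB_NAME])`. MongoDB's TTL monitor runs roughly once a minute, so expired documents disappear shortly after their deadline rather than instantly.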

---

## Dependencies

| Package | Purpose |
|---|---|
| `flask` | Web framework |
| `flask-jwt-extended` | JWT access + refresh tokens via cookies |
| `flask-bcrypt` | Password hashing |
| `flask-limiter` | Rate limiting on auth endpoints |
| `flask-talisman` | HTTP security headers |
| `pymongo` | MongoDB Atlas driver |
| `python-dotenv` | `.env` loading |
| `sentence-transformers` | MiniLM-L6 embeddings |
| `transformers` | BART-MNLI NLI pipeline |
| `faiss-cpu` | Vector similarity search |
| `requests` | HTTP calls to APIs |
| `beautifulsoup4` | DuckDuckGo HTML scraping |
| `feedparser` | RSS feed parsing |
| `numpy` | Numerical operations |
| `torch` | Deep learning backend |
| `easyocr` | Image OCR |
| `Pillow` | Image processing |

---

## Security Notes

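- Passwords are hashed with **bcrypt** plus a server-side **pepper**, so a leaked database alone is not enough to crack them
- JWT tokens are stored in **HttpOnly** cookies, so injected JavaScript cannot read them (this limits the impact of XSS but does not prevent it)
- The `SameSite=Strict` cookie policy mitigates CSRF by keeping the auth cookies off cross-site requests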
- Rate limiting: 5 login attempts / minute, 3 register attempts / minute per IP
- All security headers enforced by Flask-Talisman
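
The pepper helps because it never reaches the database: it is mixed into the password before hashing, so stolen hashes cannot be tested offline without also stealing the server's `BCRYPT_PEPPER`. The sketch below shows one common pattern, HMAC-SHA256 keyed with the pepper, followed by a salted slow hash. To stay standard-library only it uses `hashlib.scrypt` as a stand-in for bcrypt; Proofly itself uses Flask-Bcrypt, and this README does not state exactly how the pepper is combined. `hash_password` and `verify_password` are hypothetical names.

```python
import base64
import hashlib
import hmac
import os


def _pepper(password: str, pepper: bytes) -> bytes:
    # HMAC-SHA256 keyed with the server-side pepper; base64 keeps the result
    # printable and 44 bytes long (under bcrypt's 72-byte input limit).
    digest = hmac.new(pepper, password.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest)


def hash_password(password: str, pepper: bytes) -> str:
    salt = os.urandom(16)
    # scrypt stands in for bcrypt so the sketch runs on the standard library.
    key = hashlib.scrypt(_pepper(password, pepper), salt=salt, n=2**14, r=8, p=1)
    return salt.hex() + "$" + key.hex()


def verify_password(password: str, stored: str, pepper: bytes) -> bool:
    salt_hex, key_hex = stored.split("$")
    key = hashlib.scrypt(
        _pepper(password, pepper), salt=bytes.fromhex(salt_hex), n=2**14, r=8, p=1
    )
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(key.hex(), key_hex)
```

The pepper would come from the environment as a string, e.g. `os.environ["BCRYPT_PEPPER"].encode()`. Rotating it invalidates every stored hash, which is why it must be kept stable and secret.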

---

## Contributing

Pull requests welcome. Please open an issue first for major changes.

---

## License

Open-source. See repository for license details.