# TruthLens: BERT-Based Fake News Detector
A full-stack web application that detects fake news using a large language model (LLM) as the primary classifier, backed by a fine-tuned BERT transformer model, real-time Google News RSS validation, image OCR analysis, API rate limiting, and a fully animated React interface with MongoDB-backed user authentication.
## Live Demo
| Service | Link |
|---|---|
| Frontend (React App) | https://truth-lens-bert-based-fake-news-and.vercel.app |
| Backend API | https://suryakf-truthlens-backend.hf.space |
| Swagger / API Docs | https://suryakf-truthlens-backend.hf.space/docs |

The backend runs on Hugging Face Spaces (CPU Basic: 2 vCPU, 16 GB RAM). The frontend is deployed on Vercel with a global CDN. The database is MongoDB Atlas (M0 free cluster).
## Features
### Core Detection Pipeline
- Fine-tuned BERT (Fallback): PyTorch BERT model (~95% accuracy), used only when every LLM tier fails.
- Three-label output: `REAL` / `FAKE` / `UNVERIFIED`. The LLM outputs `UNVERIFIED` when evidence is inconclusive, which avoids flagging genuine recent news as fake.
- Confidence Scoring: Per-prediction probability distribution visualised as a live pie chart.
- Batch Analysis: Submit up to 10 news texts in one request.
### News Source Validation
- Google News RSS: Free real-time headline search (no API key required). Retrieves title, source, publish date, and article description.
- NewsAPI Integration: Extended article lookup with source attribution.
- SerpAPI Integration: Fallback search-engine news verification.
- Live context injection: All retrieved articles (headline + summary + URL + publish date) are passed directly into the LLM's prompt so it can cross-reference the claim against real-world evidence.
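
The retrieval and context-injection steps above can be sketched with the standard library alone. This is an illustration under assumptions, not the project's `news_validator.py`: the function names, the sample feed, and the layout of the evidence text are hypothetical, and the project itself lists requests + beautifulsoup4 for RSS handling. The search URL follows the commonly used Google News RSS pattern.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Assumed endpoint: the commonly used Google News RSS search URL.
GOOGLE_NEWS_RSS = "https://news.google.com/rss/search"

# Tiny hand-written feed so the parser can be exercised offline.
SAMPLE_RSS = """<rss version="2.0"><channel>
<item>
  <title>Example headline</title>
  <link>https://example.com/a</link>
  <pubDate>Mon, 01 Jan 2024 10:00:00 GMT</pubDate>
  <description>Short summary.</description>
  <source url="https://example.com">Example News</source>
</item>
</channel></rss>"""


def build_search_url(query: str) -> str:
    """Return the RSS search URL for a free-text claim or headline."""
    return f"{GOOGLE_NEWS_RSS}?{urllib.parse.urlencode({'q': query})}"


def parse_articles(rss_xml: str, limit: int = 5) -> list:
    """Extract title, source, publish date, description and URL per <item>."""
    root = ET.fromstring(rss_xml)
    articles = []
    for item in root.iter("item"):
        if len(articles) >= limit:
            break
        articles.append({
            "title": item.findtext("title", default="").strip(),
            "source": item.findtext("source", default="").strip(),
            "published": item.findtext("pubDate", default="").strip(),
            # Real feeds may put HTML here; it is left unprocessed in this sketch.
            "description": item.findtext("description", default="").strip(),
            "url": item.findtext("link", default="").strip(),
        })
    return articles


def format_context(articles: list) -> str:
    """Render articles as a numbered evidence list for an LLM prompt."""
    if not articles:
        return "No related articles were found."
    lines = []
    for index, art in enumerate(articles, start=1):
        lines.append(f"[{index}] {art['title']} ({art['source']}, {art['published']})")
        lines.append(f"    {art['description']}")
        lines.append(f"    {art['url']}")
    return "\n".join(lines)
```

Calling `format_context(parse_articles(SAMPLE_RSS))` yields a numbered block that can be appended to the claim before it is sent to the LLM.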
### Image & OCR
- Screenshot Upload: Paste or upload a screenshot of a news headline or article.
- Mistral OCR: Extracts title, body text, source, and date from the image.
- Same pipeline as text: After OCR, the extracted headline goes through the same LLM-primary flow (news search → LLM with context → BERT fallback).
### Rate Limiting
API rate limits are enforced per client IP via slowapi:

| Endpoint | Limit |
|---|---|
| `POST /api/predict` | 30 / minute |
| `POST /api/batch-predict` | 5 / minute |
| `POST /api/image-predict` | 10 / minute |
| `POST /api/extract-image-text` | 10 / minute |
| `POST /api/auth/login` | 5 / minute |
| `POST /api/auth/register` | 3 / minute |
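
To make the "N per minute per client IP" semantics concrete, here is a minimal standard-library sketch of a sliding-window limiter. It is a stand-in for illustration only: the project uses slowapi, whose internal strategy may differ, and the class and method names below are hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window_seconds` for each client key."""

    def __init__(self, limit: int, window_seconds: float = 60.0) -> None:
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # client key -> timestamps of accepted requests

    def allow(self, client_ip: str, now: Optional[float] = None) -> bool:
        """Return True and record the request if the client is under its limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[client_ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


# Limits taken from the table above; the mapping itself is an illustrative assumption.
LIMITERS = {
    "/api/predict": SlidingWindowLimiter(30),
    "/api/batch-predict": SlidingWindowLimiter(5),
    "/api/auth/register": SlidingWindowLimiter(3),
}
```

A request handler would call `LIMITERS[path].allow(client_ip)` and respond with HTTP 429 when it returns `False`.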
### Authentication & History
- JWT Authentication: 24-hour access tokens, bcrypt-hashed passwords.
- Prediction History: Every analysis is stored in MongoDB with its timestamp and label.
- User Dashboard: Live stats, streak counter, accuracy breakdown.
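
The 24-hour token mentioned above can be illustrated with a minimal HS256 JWT written against the standard library. This is a sketch for understanding the token format, not the project's `auth.py` (which the stack table says relies on python-jose); the function names and the `sub` claim choice are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def create_token(subject: str, secret: str, expires_minutes: int = 1440,
                 now: Optional[int] = None) -> str:
    """Sign a token whose `exp` claim lies `expires_minutes` in the future."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "exp": now + expires_minutes * 60}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"),
                         hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, secret: str, now: Optional[int] = None) -> Optional[dict]:
    """Return the payload if the signature matches and the token is unexpired."""
    try:
        head, body, sig = token.split(".")
        expected = hmac.new(secret.encode("utf-8"), f"{head}.{body}".encode("ascii"),
                            hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(sig)):
            return None
        payload = json.loads(_b64url_decode(body))
    except ValueError:  # wrong segment count, bad base64, or bad JSON
        return None
    now = int(time.time()) if now is None else now
    if not isinstance(payload, dict) or payload.get("exp", 0) <= now:
        return None
    return payload
```

With the default of 1440 minutes, a token issued at time `t` stops verifying at `t + 86400` seconds.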
### Developer Experience
- Rotating Log Files: All API activity is written to `logs/app.log` (10 MB cap, 5 backups).
- Swagger / ReDoc: Auto-generated interactive API docs at `/docs` and `/redoc`.
- Environment-Driven Config: All secrets are supplied via `.env`.
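
The rotating log described above maps directly onto `logging.handlers.RotatingFileHandler`. A minimal sketch of such a logger factory follows; the function name, logger name, and log format are assumptions, while the 10 MB cap and 5 backups come from the feature list.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path

MAX_BYTES = 10 * 1024 * 1024  # 10 MB cap per file
BACKUP_COUNT = 5              # app.log.1 ... app.log.5 are kept


def get_logger(name: str = "truthlens", log_dir: str = "logs") -> logging.Logger:
    """Return a logger that writes to <log_dir>/app.log with size-based rotation."""
    logger = logging.getLogger(name)
    if logger.handlers:  # already configured; avoid duplicate handlers
        return logger
    Path(log_dir).mkdir(parents=True, exist_ok=True)  # auto-create the directory
    handler = RotatingFileHandler(
        Path(log_dir) / "app.log",
        maxBytes=MAX_BYTES,
        backupCount=BACKUP_COUNT,
        encoding="utf-8",
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Guarding on `logger.handlers` keeps repeated imports from attaching several handlers to the same logger, which would otherwise duplicate every line in the file.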
## Architecture

    ┌─────────────────────────────────────────────────────────────────┐
    │                     FRONTEND (React + Vite)                     │
    │   Home → Login → Register → Dashboard                           │
    │   GSAP ScrollTrigger · Framer Motion · TailwindCSS · Recharts   │
    └────────────────────────────┬────────────────────────────────────┘
                                 │  HTTPS / JWT
    ┌────────────────────────────▼────────────────────────────────────┐
    │                        BACKEND (FastAPI)                        │
    │   Rate Limiting (slowapi) → Logging Middleware → logs/app.log   │
    │   /api/predict   /api/batch-predict   /api/image-predict        │
    └──────┬──────────────────────────────────────┬───────────────────┘
           │                                      │
           ▼ STEP 1                               ▼ STEP 1 (image)
    ┌─────────────────┐                 ┌─────────────────────┐
    │ News Validator  │                 │   Mistral OCR       │
    │ Google News RSS │                 │   Extracts title +  │
    │ NewsAPI         │                 │   text from image   │
    │ SerpAPI         │                 └─────────┬───────────┘
    └────────┬────────┘                           │
             │ articles (title+desc+date+url)     │ extracted headline
             ▼ STEP 2 (PRIMARY)                   ▼ STEP 2 (PRIMARY)
    ┌─────────────────────────────────────────────────────────────────┐
    │                        LLM Fact-Checker                         │
    │   Primary model → Fallback 1 → Fallback 2                       │
    │   Output: REAL / FAKE / UNVERIFIED + confidence + reasoning     │
    └───────────────────────────────┬─────────────────────────────────┘
                                    │ (only if ALL Gemini models fail)
                                    ▼ STEP 3 (FALLBACK)
                        ┌────────────────────────┐
                        │  Fine-tuned BERT       │
                        │  PyTorch + HF ~95% acc │
                        └───────────┬────────────┘
                                    │
    ┌───────────────────────────────▼─────────────────────────────────┐
    │                   MongoDB Atlas (Motor async)                   │
    │   users collection · predictions collection                     │
    └─────────────────────────────────────────────────────────────────┘

### Hybrid Model Architecture (Mermaid)

    flowchart TD
        A[Input Text] --> B[Tokenizer<br/>bert-base-uncased]
        B --> C[input_ids, attention_mask]
        C --> D[BERT Encoder<br/>Hidden Size: 768]
        D --> E[Dropout]
        E --> F[BiLSTM<br/>2 layers, hidden=256, bidirectional]
        F --> G[LayerNorm<br/>Output dim: 512]
        G --> H[Multi-Head Self-Attention<br/>8 heads]
        H --> I[Global Max Pooling<br/>across sequence]
        I --> J[MLP Classifier]
        J --> J1[Linear 512->256 + ReLU + Dropout]
        J1 --> J2[Linear 256->128 + ReLU + Dropout]
        J2 --> J3[Linear 128->2]
        J3 --> K[Logits: Real vs Fake]
        K --> L[Softmax / Argmax Prediction]
        subgraph Training
            M[CrossEntropyLoss<br/>class weights + label smoothing]
            N[AdamW + LR Scheduler<br/>Warmup + Weight Decay]
            O[Early Stopping<br/>monitor val F1]
        end
        J3 --> M
        M --> N
        N --> O

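
A quick way to sanity-check the dimensions in the flowchart above is to trace tensor shapes layer by layer. The sketch below is plain Python arithmetic, not the training code. It assumes the attention layer keeps the 512-wide embedding produced by the BiLSTM, so each of the 8 heads works on 64 dimensions; the function name is hypothetical.

```python
BERT_HIDDEN = 768        # bert-base-uncased hidden size
LSTM_HIDDEN = 256        # per direction
ATTENTION_HEADS = 8


def trace_shapes(batch: int, seq_len: int) -> list:
    """Return (layer, output shape) pairs for one forward pass."""
    lstm_out = 2 * LSTM_HIDDEN  # bidirectional: forward and backward states are concatenated
    if lstm_out % ATTENTION_HEADS != 0:
        raise ValueError("embedding width must be divisible by the number of heads")
    head_dim = lstm_out // ATTENTION_HEADS
    return [
        ("input_ids", (batch, seq_len)),
        ("BERT encoder", (batch, seq_len, BERT_HIDDEN)),
        ("BiLSTM", (batch, seq_len, lstm_out)),
        ("LayerNorm", (batch, seq_len, lstm_out)),
        (f"Self-attention ({ATTENTION_HEADS} heads x {head_dim})", (batch, seq_len, lstm_out)),
        ("Global max pooling", (batch, lstm_out)),  # max over the sequence axis
        ("Linear 512->256", (batch, 256)),
        ("Linear 256->128", (batch, 128)),
        ("Linear 128->2 (logits)", (batch, 2)),
    ]


if __name__ == "__main__":
    for layer, shape in trace_shapes(batch=4, seq_len=128):
        print(f"{layer:<32} {shape}")
```

The trace shows why the LayerNorm and the first MLP layer both expect 512 features: the two LSTM directions contribute 256 each.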
## Project Structure

    FinalYearProject/
    ├── app/
    │   ├── main.py                    # FastAPI app, CORS, rate limiter, logging middleware
    │   ├── auth.py                    # JWT token logic, bcrypt helpers
    │   ├── database.py                # Motor async MongoDB client
    │   ├── limiter.py                 # Shared slowapi Limiter instance
    │   ├── api/
    │   │   ├── routes.py              # Prediction endpoints (/api/predict, /api/batch-predict, /api/image-predict)
    │   │   └── auth_routes.py         # Auth endpoints (/api/auth/*)
    │   ├── models/
    │   │   └── bert_model.py          # BERT inference wrapper (fallback only)
    │   ├── schemas/
    │   │   ├── prediction.py          # Pydantic request/response models
    │   │   └── auth.py                # User & token schemas
    │   └── utils/
    │       ├── ai_verification.py     # LLM fact-checker (primary classifier)
    │       ├── news_validator.py      # Multi-source news validation + RSS parser
    │       ├── image_ocr.py           # Mistral OCR: image upload + text extraction
    │       └── logger.py              # RotatingFileHandler logger factory
    ├── enhanced_bert_liar_model/      # BERT fine-tuned on LIAR dataset (fallback)
    ├── enhanced_bert_welfake_model/   # BERT fine-tuned on WELFake dataset (fallback)
    ├── frontend/
    │   └── src/
    │       ├── App.jsx
    │       ├── api/index.js
    │       ├── context/AuthContext.jsx
    │       ├── motion/                # GSAP + Framer Motion helpers
    │       └── pages/                 # Home, Dashboard, Login, Register
    ├── logs/                          # Auto-created: rotating app.log
    ├── Data/WELFake_Dataset.csv
    ├── Notebook/
    │   ├── bert_finetune_notebook.ipynb
    │   └── wel-fakebert-finetune-notebook.ipynb
    ├── run_api.py
    ├── pyproject.toml
    └── README.md

## Production Deployment

    Browser
      └──▶ Vercel (React/Vite frontend)
             └── VITE_API_URL ──▶ Hugging Face Spaces (FastAPI + BERT + LLM)
                                    └── MONGODB_URL ──▶ MongoDB Atlas

| Layer | Platform | Plan |
|---|---|---|
| Frontend | Vercel | Free |
| Backend | Hugging Face Spaces | CPU Basic (Free) |
| Database | MongoDB Atlas | M0 Free |
### Deploy your own copy
#### Backend (HF Spaces)
- Fork this repo and create a new Space (SDK: Docker).
- Copy `app/`, `enhanced_bert_*/`, `run_api.py`, and `Dockerfile.huggingface` (rename to `Dockerfile`).
- Add secrets in Space Settings:

| Secret | Description |
|---|---|
| `MONGODB_URL` | MongoDB Atlas connection string |
| `SECRET_KEY` | JWT signing secret |
| `AI_API_KEY` | LLM API key for the primary fact-checker |
| `MISTRAL_API_KEY` | Mistral API key (for image OCR) |
| `ALLOWED_ORIGINS` | Comma-separated frontend URLs |

#### Frontend (Vercel)
- Import your GitHub repo and set Root Directory to `frontend`.
- Add the env var `VITE_API_URL=https://YOUR_HF_USER-your-space.hf.space/api`.
## Local Development
### Prerequisites
- Python 3.11+, Node.js 18+
- UV package manager
- MongoDB Atlas account
- LLM API key (for the primary fact-checker)
- Mistral API key (free at mistral.ai) for image OCR
### 1. Install Backend

    git clone <your-repo-url>
    cd FinalYearProject
    pip install uv
    uv sync

### 2. Configure Environment
Create `.env` in the project root:

    # MongoDB Atlas
    MONGODB_URL=mongodb+srv://username:password@cluster.mongodb.net/?retryWrites=true&w=majority
    DATABASE_NAME=fake_news_detector
    # JWT
    SECRET_KEY=your-super-secret-jwt-key-change-in-production
    ACCESS_TOKEN_EXPIRE_MINUTES=1440
    # LLM API key (primary fact-checker)
    AI_API_KEY=your_api_key_here
    # Mistral AI (image OCR)
    MISTRAL_API_KEY=your_mistral_api_key_here
    # News Validation (optional; Google News RSS is free)
    NEWSAPI_KEY=your_newsapi_key
    SERPAPI_KEY=your_serpapi_key
    # CORS
    ALLOWED_ORIGINS=http://localhost:5173,http://localhost:3000

### 3. Start the Backend

    python run_api.py

- API: http://localhost:8000
- Swagger: http://localhost:8000/docs
### 4. Start the Frontend

    cd frontend
    npm install
    npm run dev

Frontend: http://localhost:5173
## API Reference
### Authentication
| Method | Endpoint | Rate Limit | Description |
|---|---|---|---|
| POST | `/api/auth/register` | 3/min | Create a new user account |
| POST | `/api/auth/login` | 5/min | Log in and receive a JWT token |
| GET | `/api/auth/me` | None | Get the current authenticated user |
| GET | `/api/auth/history` | None | Retrieve prediction history |
| GET | `/api/auth/stats` | None | Get total/real/fake counts |
| POST | `/api/auth/logout` | None | Log out |

### Predictions (JWT required)
| Method | Endpoint | Rate Limit | Description |
|---|---|---|---|
| POST | `/api/predict` | 30/min | Analyse a single news headline |
| POST | `/api/batch-predict` | 5/min | Analyse up to 10 texts in one call |
| POST | `/api/image-predict` | 10/min | OCR + analyse a news screenshot |
| POST | `/api/extract-image-text` | 10/min | OCR only (no prediction) |
### Example: Single Prediction
Request:

    curl -X POST http://localhost:8000/api/predict \
      -H "Authorization: Bearer YOUR_JWT_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"title": "Scientists discover new planet in solar system"}'

Response:

    {
      "text": "Scientists discover new planet in solar system",
      "prediction": "unverified",
      "confidence": 0.62,
      "probabilities": { "real": 0.62, "fake": 0.38 },
      "is_fake": false,
      "prediction_source": "llm_primary",
      "context_articles_used": 2,
      "news_insight": "ℹ️ Limited related news coverage found."
    }

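
The same call can be made from Python. The sketch below uses only the standard library and mirrors the curl request above; the helper names are hypothetical, and the response fields it reads are the ones shown in the sample response.

```python
import json
import urllib.request


def build_predict_request(base_url: str, token: str, title: str) -> urllib.request.Request:
    """Build (but do not send) the POST /api/predict request."""
    body = json.dumps({"title": title}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/predict",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def summarise(response: dict) -> str:
    """Turn a prediction response into a one-line summary."""
    label = response["prediction"].upper()
    return f"{label} ({response['confidence']:.0%}, via {response['prediction_source']})"


if __name__ == "__main__":
    request = build_predict_request(
        "http://localhost:8000",
        "YOUR_JWT_TOKEN",
        "Scientists discover new planet in solar system",
    )
    # Requires a running backend and a valid token.
    with urllib.request.urlopen(request, timeout=30) as reply:
        print(summarise(json.load(reply)))
```

Separating request construction from sending keeps the payload and headers easy to inspect without a live server.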
## Technology Stack
### Backend
| Library | Purpose |
|---|---|
| FastAPI | Async REST API framework |
| Uvicorn | ASGI server |
| google-genai | LLM SDK for the primary fact-checker |
| mistralai | Mistral OCR for image text extraction |
| slowapi | Per-IP API rate limiting |
| PyTorch | BERT model inference (fallback) |
| Transformers (HuggingFace) | Tokeniser + BERT model architecture |
| Motor | Async MongoDB driver |
| python-jose | JWT token generation & validation |
| passlib[bcrypt] | Password hashing |
| requests + beautifulsoup4 | News RSS scraping |
| newsapi-python | NewsAPI client |
| serpapi | SerpAPI client |
### Frontend
| Library | Purpose |
|---|---|
| React 18 | UI component library |
| Vite | Build tool & dev server |
| TailwindCSS 3 | Utility-first styling |
| GSAP + ScrollTrigger | Scroll-driven animations |
| Framer Motion | Page transition system |
| Recharts | Pie chart visualisation |
| Axios | HTTP client with interceptors |
## Classification Details
### LLM Fact-Checker (Primary)
| Property | Value |
|---|---|
| Input | User claim + live news articles (headline, summary, date, URL) |
| Output labels | REAL / FAKE / UNVERIFIED |
| Fallback chain | Multiple model tiers tried automatically on quota errors |
| Context | Receives live Google News articles before deciding |

`UNVERIFIED` is returned when the LLM cannot confirm or deny the claim from the available evidence (e.g. very recent events not yet widely reported). It maps to `is_fake: false` with capped confidence (≤ 68%).

`FAKE` is returned only when retrieved articles directly contradict the specific factual assertion, not merely because the claim is surprising or uses dramatic language.
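
The label rules above can be expressed as a small mapping from the LLM's verdict to response fields. This is a sketch under assumptions: the function name and the clamping of out-of-range confidences are illustrative choices, while the 0.68 cap and the `is_fake` mapping come from the description above.

```python
UNVERIFIED_CONFIDENCE_CAP = 0.68
VALID_LABELS = {"REAL", "FAKE", "UNVERIFIED"}


def to_response_fields(label: str, confidence: float) -> dict:
    """Map an LLM verdict to the `prediction`, `confidence` and `is_fake` fields."""
    normalised = label.strip().upper()
    if normalised not in VALID_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    # Assumption: confidences outside [0, 1] are clamped rather than rejected.
    confidence = min(max(confidence, 0.0), 1.0)
    if normalised == "UNVERIFIED":
        confidence = min(confidence, UNVERIFIED_CONFIDENCE_CAP)
    return {
        "prediction": normalised.lower(),
        "confidence": confidence,
        "is_fake": normalised == "FAKE",  # UNVERIFIED is never reported as fake
    }
```

Because only `FAKE` sets `is_fake`, an inconclusive result on breaking news is surfaced as uncertainty rather than as an accusation.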
### BERT (Fallback)
| Property | Value |
|---|---|
| Architecture | BERT (bert-base-uncased) |
| Training | LIAR dataset (binarised) |
| Max token length | 512 |
| Accuracy | ~95% |
| When used | Only when all Gemini models fail |
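
The tiered fallback (LLM models first, BERT only when all of them fail) can be sketched as a simple loop. Everything named here is hypothetical: `QuotaExceeded` stands in for whatever error the LLM SDK raises on quota exhaustion, the callables stand in for real model clients, and the `bert_fallback` source string is illustrative.

```python
from typing import Callable, Sequence, Tuple

Classifier = Callable[[str], str]  # takes a claim, returns a label


class QuotaExceeded(Exception):
    """Hypothetical stand-in for a provider quota or rate-limit error."""


def classify_with_fallback(
    claim: str,
    llm_tiers: Sequence[Tuple[str, Classifier]],
    bert: Classifier,
) -> Tuple[str, str]:
    """Try each LLM tier in order; use BERT only if every tier hits its quota."""
    for name, call in llm_tiers:
        try:
            return call(claim), f"llm:{name}"
        except QuotaExceeded:
            continue  # move on to the next tier
    return bert(claim), "bert_fallback"
```

Returning the source alongside the label lets the API report which stage produced the verdict, in the spirit of the `prediction_source` field in the example response.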
## Security
- JWT tokens with configurable expiry (default 24 hours)
- Bcrypt password hashing
- Per-IP rate limiting on the prediction, OCR, login, and registration endpoints
- CORS middleware (configurable via `ALLOWED_ORIGINS`)
- Pydantic input validation on all endpoints
- Environment-variable-driven secrets
## Environment Variables Reference
| Variable | Required | Description |
|---|---|---|
| `MONGODB_URL` | Yes | MongoDB Atlas connection string |
| `DATABASE_NAME` | Yes | Target database name |
| `SECRET_KEY` | Yes | Secret used to sign JWT tokens |
| `AI_API_KEY` | Yes | LLM API key (primary fact-checker) |
| `MISTRAL_API_KEY` | Yes | Mistral API key (image OCR) |
| `ACCESS_TOKEN_EXPIRE_MINUTES` | No | Token TTL (default: 1440) |
| `NEWSAPI_KEY` | No | NewsAPI key |
| `SERPAPI_KEY` | No | SerpAPI key |
| `ALLOWED_ORIGINS` | No | Comma-separated CORS origins |
| `ENABLE_AI_CHECK` | No | Set `false` to force BERT-only mode |
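
One way to consume these variables is a small typed loader that applies defaults and fails fast on missing secrets. The sketch below is illustrative, not the project's configuration code: the `Settings` class, the choice of which variables are treated as mandatory, and the `fake_news_detector` default (borrowed from the sample `.env`) are assumptions.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class Settings:
    mongodb_url: str
    secret_key: str
    database_name: str
    access_token_expire_minutes: int
    allowed_origins: List[str]
    enable_ai_check: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read configuration from `env` (defaults to the process environment)."""
    env = os.environ if env is None else env
    # Assumption: these two have no safe default, so fail fast when absent.
    missing = [name for name in ("MONGODB_URL", "SECRET_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    origins = [o.strip() for o in env.get("ALLOWED_ORIGINS", "").split(",") if o.strip()]
    return Settings(
        mongodb_url=env["MONGODB_URL"],
        secret_key=env["SECRET_KEY"],
        database_name=env.get("DATABASE_NAME", "fake_news_detector"),
        access_token_expire_minutes=int(env.get("ACCESS_TOKEN_EXPIRE_MINUTES", "1440")),
        allowed_origins=origins,
        # Any value other than "false" keeps the LLM check enabled.
        enable_ai_check=env.get("ENABLE_AI_CHECK", "true").strip().lower() != "false",
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.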
## Datasets
### LIAR Dataset
| Property | Detail |
|---|---|
| Source | W. Wang, 2017 (UCSB) |
| Size | ~12,800 labelled statements |
| Labels | 6-class, binarised to fake / real |
| Domain | Political statements (PolitiFact) |
| License | Public domain |
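
Binarising the six LIAR labels requires choosing a split point. The sketch below uses a common convention (the three least truthful labels become fake, the rest real); this split is an assumption, since the exact mapping used in the training notebook is not stated here, and the function name is hypothetical.

```python
# The six LIAR label strings, ordered from least to most truthful.
LIAR_LABELS = ["pants-fire", "false", "barely-true", "half-true", "mostly-true", "true"]

# Assumed split: lower half -> fake, upper half -> real.
FAKE_LABELS = set(LIAR_LABELS[:3])
REAL_LABELS = set(LIAR_LABELS[3:])


def binarise(label: str) -> str:
    """Collapse a six-way LIAR label into 'fake' or 'real'."""
    key = label.strip().lower()
    if key in FAKE_LABELS:
        return "fake"
    if key in REAL_LABELS:
        return "real"
    raise ValueError(f"unknown LIAR label: {label!r}")
```

Moving `half-true` to the other side changes the class balance noticeably, so the split is worth recording alongside any reported accuracy.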
### WELFake Dataset
| Property | Detail |
|---|---|
| Source | Verma et al., 2021 |
| Size | 72,134 articles (35,028 fake · 37,106 real) |
| Domain | Mixed: Kaggle, Reuters, BuzzFeed |
| License | CC BY 4.0 |
## Training Notebooks
| Notebook | Description |
|---|---|
| `Notebook/bert_finetune_notebook.ipynb` | BERT fine-tuning on LIAR dataset |
| `Notebook/wel-fakebert-finetune-notebook.ipynb` | BERT fine-tuning on WELFake dataset |
## Contributing
- Fork the repository
- Create a feature branch: `git checkout -b feature/my-feature`
- Commit: `git commit -m "feat: add my feature"`
- Push: `git push origin feature/my-feature`
- Open a Pull Request
## License
MIT License
## Acknowledgements
- LIAR Dataset: W. Wang, 2017
- WELFake Dataset: Verma et al., 2021
- Hugging Face Transformers: BERT tokeniser and model utilities
- Primary LLM fact-checker: contextual claim verification against live news
- Mistral AI: Image OCR

TruthLens: built to fight misinformation.