cd FinalYearProject
pip install uv
uv sync
```
### 2. Configure Environment
Create `.env` in the project root:
```env
# MongoDB Atlas
MONGODB_URL=mongodb+srv://username:password@cluster.mongodb.net/?retryWrites=true&w=majority
DATABASE_NAME=fake_news_detector
# JWT
SECRET_KEY=your-super-secret-jwt-key-change-in-production
ACCESS_TOKEN_EXPIRE_MINUTES=1440
# LLM API key (primary fact-checker)
AI_API_KEY=your_api_key_here
# Mistral AI (image OCR)
MISTRAL_API_KEY=your_mistral_api_key_here
# News Validation (optional — Google News RSS is free)
NEWSAPI_KEY=your_newsapi_key
SERPAPI_KEY=your_serpapi_key
# CORS
ALLOWED_ORIGINS=http://localhost:5173,http://localhost:3000
```
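
Before starting the backend, it can help to confirm that the required settings are present. The sketch below is a hypothetical helper, not part of the project: `missing_required` and `parse_origins` are illustrative names, and it assumes the `.env` values have already been loaded into a mapping such as `os.environ`. The required names match the ones this README marks as required.

```python
# Names taken from the .env example; the helper functions are hypothetical.
REQUIRED_VARS = (
    "MONGODB_URL",
    "DATABASE_NAME",
    "SECRET_KEY",
    "AI_API_KEY",
    "MISTRAL_API_KEY",
)


def missing_required(env):
    """Return the required variable names that are absent or blank in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def parse_origins(raw):
    """Split a comma-separated ALLOWED_ORIGINS value into a clean list."""
    return [origin.strip() for origin in raw.split(",") if origin.strip()]
```

Calling `missing_required(os.environ)` at startup and aborting on a non-empty result is one way to fail early instead of at the first database or LLM call.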
### 3. Start the Backend
```bash
python run_api.py
```
- API: **http://localhost:8000**
- Swagger: **http://localhost:8000/docs**
### 4. Start the Frontend
```bash
cd frontend
npm install
npm run dev
```
Frontend: **http://localhost:5173**
---
## 🔐 API Reference
### Authentication
| Method | Endpoint | Rate Limit | Description |
|--------|----------|------------|-------------|
| `POST` | `/api/auth/register` | 3/min | Create a new user account |
| `POST` | `/api/auth/login` | 5/min | Log in and receive a JWT |
| `GET` | `/api/auth/me` | — | Get current authenticated user |
| `GET` | `/api/auth/history` | — | Retrieve prediction history |
| `GET` | `/api/auth/stats` | — | Get total, real, and fake prediction counts |
| `POST` | `/api/auth/logout` | — | Log out |
### Predictions (JWT required)
| Method | Endpoint | Rate Limit | Description |
|--------|----------|------------|-------------|
| `POST` | `/api/predict` | 30/min | Analyse a single news headline |
| `POST` | `/api/batch-predict` | 5/min | Analyse up to 10 texts in one call |
| `POST` | `/api/image-predict` | 10/min | OCR + analyse a news screenshot |
| `POST` | `/api/extract-image-text` | 10/min | OCR only (no prediction) |
### Example — Single Prediction
**Request:**
```bash
curl -X POST http://localhost:8000/api/predict \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"title": "Scientists discover new planet in solar system"}'
```
**Response:**
```json
{
  "text": "Scientists discover new planet in solar system",
  "prediction": "unverified",
  "confidence": 0.62,
  "probabilities": { "real": 0.62, "fake": 0.38 },
  "is_fake": false,
  "prediction_source": "llm_primary",
  "context_articles_used": 2,
  "news_insight": "ℹ️ Limited related news coverage found."
}
```
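
The same call can be made from Python. This is a minimal standard-library sketch mirroring the curl example; `build_predict_request` and `predict` are hypothetical helper names, and the token is assumed to come from `/api/auth/login`.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # backend address from the setup steps


def build_predict_request(title, token):
    """Build the same POST /api/predict call as the curl example."""
    body = json.dumps({"title": title}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/predict",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def predict(title, token):
    """Send the request and decode the JSON reply (needs a running backend)."""
    request = build_predict_request(title, token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

With the backend running, `predict("Scientists discover new planet in solar system", token)` should return a dictionary shaped like the response above.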
---
## 🔧 Technology Stack
### Backend
| Library | Purpose |
|---------|---------|
| FastAPI | Async REST API framework |
| Uvicorn | ASGI server |
| **google-genai** | **LLM SDK — primary fact-checker** |
| **mistralai** | **Mistral OCR — image text extraction** |
| **slowapi** | **Per-IP API rate limiting** |
| PyTorch | BERT model inference (fallback) |
| Transformers (HuggingFace) | Tokeniser + BERT model architecture |
| Motor | Async MongoDB driver |
| python-jose | JWT token generation & validation |
| passlib[bcrypt] | Password hashing |
| requests + beautifulsoup4 | News RSS scraping |
| newsapi-python | NewsAPI client |
| serpapi | SerpAPI client |
### Frontend
| Library | Purpose |
|---------|---------|
| React 18 | UI component library |
| Vite | Build tool & dev server |
| TailwindCSS 3 | Utility-first styling |
| GSAP + ScrollTrigger | Scroll-driven animations |
| Framer Motion | Page transition system |
| Recharts | Pie chart visualisation |
| Axios | HTTP client with interceptors |
---
## 🤖 Classification Details
### LLM Fact-Checker (Primary)
| Property | Value |
|----------|-------|
| Input | User claim + live news articles (headline, summary, date, URL) |
| Output labels | `REAL` / `FAKE` / `UNVERIFIED` |
| Fallback chain | Multiple model tiers tried automatically on quota errors |
| Context | Receives live Google News articles before deciding |

**UNVERIFIED** is returned when the LLM cannot confirm or deny the claim from the available evidence (e.g. very recent events not yet widely reported). It maps to `is_fake: false` with capped confidence (≤ 68%).

**FAKE** is only returned when retrieved articles **directly contradict** the specific factual assertion, not merely because the claim is surprising or uses dramatic language.
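
The label rules above can be sketched as a small mapping function. This is an illustration under stated assumptions rather than the project's code: `to_response_fields` is a hypothetical name, and lower-casing the label and rounding to two decimals are inferred from the example response.

```python
UNVERIFIED_CONFIDENCE_CAP = 0.68  # cap described in this section
VALID_LABELS = {"REAL", "FAKE", "UNVERIFIED"}


def to_response_fields(label, confidence):
    """Map an LLM label and raw confidence to the API response fields."""
    label = label.upper()
    if label not in VALID_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    if label == "UNVERIFIED":
        # An unverified claim never reports more than the capped confidence.
        confidence = min(confidence, UNVERIFIED_CONFIDENCE_CAP)
    return {
        "prediction": label.lower(),
        "confidence": round(confidence, 2),
        "is_fake": label == "FAKE",  # UNVERIFIED and REAL both map to False
    }
```

Keeping `is_fake` false for `UNVERIFIED` means a claim is flagged as fake only on contradicting evidence, never on missing evidence.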
### BERT (Fallback)
| Property | Value |
|----------|-------|
| Architecture | BERT (bert-base-uncased) |
| Training | LIAR dataset (binarised) |
| Max token length | 512 |
| Accuracy | ~95% |
| When used | When every LLM tier fails, or when `ENABLE_AI_CHECK` is set to `false` |
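
The fallback order can be sketched as follows. This is a simplified illustration, not the project's implementation: `QuotaExceeded`, `classify_with_fallback`, and the `"bert_fallback"` source string are hypothetical, while `"llm_primary"` matches the `prediction_source` in the example response. Each tier is assumed to be a callable that takes the claim and returns a label.

```python
class QuotaExceeded(Exception):
    """Hypothetical error standing in for an LLM quota failure."""


def classify_with_fallback(claim, llm_tiers, bert_classifier, enable_ai_check=True):
    """Try each LLM tier in order, then fall back to the BERT classifier."""
    if enable_ai_check:
        for tier in llm_tiers:
            try:
                return tier(claim), "llm_primary"
            except QuotaExceeded:
                continue  # this tier is exhausted, move to the next one
    # Reached when every tier failed or the LLM check is disabled.
    return bert_classifier(claim), "bert_fallback"
```

Only quota errors trigger the next tier in this sketch; other exceptions propagate so that real bugs are not silently masked by the fallback.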
---
## 🔒 Security
- JWT tokens with configurable expiry (default 24 hours)
- Bcrypt password hashing
- Per-IP rate limiting on the registration, login, and prediction endpoints
- CORS middleware (configurable via `ALLOWED_ORIGINS`)
- Pydantic input validation on all endpoints
- Environment-variable-driven secrets
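
To make the per-IP limits concrete, here is a standard-library sketch of a sliding-window limiter. The project lists slowapi for this job; the class below is not slowapi's API or algorithm, only an illustration of the idea, and `SlidingWindowLimiter` is a hypothetical name.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window` seconds for each client IP."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the behaviour can be tested
        self._hits = defaultdict(deque)

    def allow(self, client_ip):
        """Record the call and return True, or return False if over the limit."""
        now = self.clock()
        hits = self._hits[client_ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

A limit of `5/min`, as on `/api/auth/login`, corresponds to `SlidingWindowLimiter(limit=5, window=60.0)`; a rejected call would typically be answered with HTTP 429.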
---
## 🔧 Environment Variables Reference
| Variable | Required | Description |
|----------|----------|-------------|
| `MONGODB_URL` | ✅ | MongoDB Atlas connection string |
| `DATABASE_NAME` | ✅ | Target database name |
| `SECRET_KEY` | ✅ | Secret used to sign JWT tokens |
| `AI_API_KEY` | ✅ | LLM API key (primary fact-checker) |
| `MISTRAL_API_KEY` | ✅ | Mistral API key (image OCR) |
| `ACCESS_TOKEN_EXPIRE_MINUTES` | ❌ | Token TTL in minutes (default: 1440, i.e. 24 hours) |
| `NEWSAPI_KEY` | ❌ | NewsAPI key |
| `SERPAPI_KEY` | ❌ | SerpAPI key |
| `ALLOWED_ORIGINS` | ❌ | Comma-separated CORS origins |
| `ENABLE_AI_CHECK` | ❌ | Set `false` to force BERT-only mode |
---
## 📂 Datasets
### LIAR Dataset
| Property | Detail |
|----------|--------|
| **Source** | [W. Wang, 2017](https://aclanthology.org/P17-2067/) — UCSB |
| **Size** | ~12,800 labelled statements |
| **Labels** | 6-class → binarised to fake / real |
| **Domain** | Political statements (PolitiFact) |
| **License** | Public domain |
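
The six-to-two label reduction can be sketched as below. The six label strings are the ones used in the LIAR data files; where to draw the fake/real boundary is an assumption made for illustration (the three least truthful labels count as fake), and the training notebook may use a different split.

```python
# The six LIAR labels, ordered from least to most truthful.
LIAR_LABELS = ["pants-fire", "false", "barely-true", "half-true", "mostly-true", "true"]

# Assumption: the three least truthful labels are treated as fake.
FAKE_LABELS = set(LIAR_LABELS[:3])


def binarise(label):
    """Collapse a six-class LIAR label into 'fake' or 'real'."""
    label = label.strip().lower()
    if label not in LIAR_LABELS:
        raise ValueError(f"unknown LIAR label: {label!r}")
    return "fake" if label in FAKE_LABELS else "real"
```

Moving `half-true` to the fake side only requires changing the slice to `LIAR_LABELS[:4]`, which makes the threshold easy to compare across training runs.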
### WELFake Dataset
| Property | Detail |
|----------|--------|
| **Source** | [Verma et al., 2021](https://doi.org/10.1109/TCSS.2021.3068519) |
| **Size** | 72,134 articles (35,028 real · 37,106 fake) |
| **Domain** | Mixed: Kaggle, Reuters, BuzzFeed |
| **License** | CC BY 4.0 |
---
## 🧪 Training Notebooks
| Notebook | Description |
|----------|-------------|
| `Notebook/bert_finetune_notebook.ipynb` | BERT fine-tuning on LIAR dataset |
| `Notebook/wel-fakebert-finetune-notebook.ipynb` | BERT fine-tuning on WELFake dataset |
---
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch: `git checkout -b feature/my-feature`
3. Commit: `git commit -m "feat: add my feature"`
4. Push: `git push origin feature/my-feature`
5. Open a Pull Request
---
## 📄 License
MIT License
---
## 🙏 Acknowledgements
- [LIAR Dataset](https://www.cs.ucsb.edu/~william/data/liar_dataset.zip) — W. Wang, 2017
- [WELFake Dataset](https://zenodo.org/record/4561253) — Verma et al., 2021
- [Hugging Face Transformers](https://huggingface.co/) — BERT tokeniser and model utilities
- Primary LLM fact-checker — contextual claim verification against live news
- [Mistral AI](https://mistral.ai/) — Image OCR
---
🛡️ Built to fight misinformation — TruthLens