Ryan Christian D. Deniega
---
title: PhilVerify API
emoji: 🔍
colorFrom: red
colorTo: blue
sdk: docker
app_port: 7860
pinned: false
---
<p align="center">
<img src="frontend/public/logo.svg" alt="PhilVerify Logo" width="150">
</p>
<p align="center">
<em>Multimodal fake news detection for Philippine social media.</em>
</p>
<p align="center">
<img src="https://img.shields.io/badge/Machine_Learning_2-Final_Project-blue?style=flat-square" alt="Project Status">
<img src="https://img.shields.io/badge/Python-3.12-blue?style=flat-square&logo=python" alt="Python">
<img src="https://img.shields.io/badge/FastAPI-0.115-009688?style=flat-square&logo=fastapi" alt="FastAPI">
<img src="https://img.shields.io/badge/React-18-61DAFB?style=flat-square&logo=react" alt="React">
<img src="https://img.shields.io/badge/License-MIT-yellow?style=flat-square" alt="License">
</p>
---
## ✨ Features
- **🎤 Multimodal Detection**: Verifies raw text, news URLs, images (via Tesseract OCR), and video/audio (via Whisper ASR)
- **🇵🇭 Language-Aware**: Handles Tagalog, English, and Taglish (code-switched Tagalog-English) content
- **🧠 Advanced NLP Pipeline**: Real-time entity recognition, sentiment/emotion analysis, and clickbait detection
- **⚖️ Two-Layer Scoring**: Combines ML classification (TF-IDF/RoBERTa) with NewsAPI evidence retrieval
- **🛡️ PH-Domain Verification**: Built-in database of Philippine news domain credibility tiers
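
One way to picture the two-layer scoring described above is as a weighted blend of the Layer 1 classifier output and the Layer 2 evidence signal. The following is a minimal sketch under stated assumptions: the `LayerScores` type, the `combine_scores` and `verdict` functions, the 0.4/0.6 weighting, and the label cut-offs are all hypothetical illustrations and are not taken from `scoring/engine.py`. A weighted average is used here only because it keeps each layer's contribution easy to inspect; the project may combine the layers differently.

```python
from dataclasses import dataclass


@dataclass
class LayerScores:
    """Hypothetical container for the two layer outputs, both in [0, 1]."""
    ml_fake_probability: float  # Layer 1: classifier's estimated P(fake)
    evidence_support: float     # Layer 2: how strongly retrieved news supports the claim


def combine_scores(scores: LayerScores, ml_weight: float = 0.4) -> float:
    """Blend both layers into a credibility score in [0, 100] (higher = more credible).

    The default 0.4 / 0.6 split is an illustrative assumption.
    """
    if not 0.0 <= ml_weight <= 1.0:
        raise ValueError("ml_weight must be between 0 and 1")
    ml_credibility = 1.0 - scores.ml_fake_probability
    blended = ml_weight * ml_credibility + (1.0 - ml_weight) * scores.evidence_support
    return round(100.0 * blended, 1)


def verdict(score: float) -> str:
    """Map a score to a coarse label; the cut-offs are hypothetical."""
    if score >= 70:
        return "Likely credible"
    if score >= 40:
        return "Unverified"
    return "Likely fake"


if __name__ == "__main__":
    score = combine_scores(LayerScores(ml_fake_probability=0.9, evidence_support=0.1))
    print(score, verdict(score))
```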
---
## 🚀 Quick Start
### Prerequisites
1. **Python 3.12+**
2. **Tesseract OCR** (`brew install tesseract` on macOS, or `sudo apt install tesseract-ocr` on Debian/Ubuntu)
3. **Node.js** (for frontend development)
### Installation
```bash
# Clone the repository
git clone https://github.com/SemiAutomat1c/philverify.git
cd philverify
# Set up Backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Set up Frontend
cd frontend
npm install
```
### Run
```bash
# Backend (from project root)
uvicorn main:app --reload --port 8000
# Frontend
cd frontend
npm run dev
```
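Once the backend is running, the `POST /verify/text` route listed under Project Structure can be exercised from Python. The sketch below uses only the standard library. The `{"text": ...}` request body and the helper names `build_verify_text_request` and `verify_text` are assumptions made for illustration; the actual schema lives in `api/schemas.py`, and FastAPI serves interactive docs at `/docs` by default where it can be checked. Building the request is kept separate from sending it so the payload can be inspected without a live server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # matches the --port 8000 used in the Run step


def build_verify_text_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /verify/text request.

    The {"text": ...} body shape is an assumption; check api/schemas.py.
    """
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/verify/text",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def verify_text(text: str, base_url: str = BASE_URL, timeout: float = 30.0) -> dict:
    """Send the request to a running backend and decode the JSON response."""
    request = build_verify_text_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires the uvicorn server from the Run step to be up.
    print(verify_text("Libreng bigas para sa lahat simula bukas!"))
```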
---
## 🛠️ Tech Stack
| Component | Technology |
|-----------|------------|
| **Core Backend** | Python 3.12, FastAPI, Pydantic v2 |
| **NLP Engine** | spaCy, HuggingFace Transformers, langdetect |
| **ML Classification** | scikit-learn (TF-IDF + LogReg), XLM-RoBERTa |
| **OCR / ASR** | Tesseract (PH+EN support), OpenAI Whisper |
| **Frontend** | React, TailwindCSS, Chart.js, Vite |
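
To make the TF-IDF representation named in the table more concrete, here is a dependency-free sketch of TF-IDF weighting followed by cosine similarity, the comparison the project structure associates with evidence retrieval. It is not the project's code: the backend uses scikit-learn, and the `tokenize`, `tfidf_vectors`, and `cosine_similarity` helpers below are hypothetical names. The IDF term follows the smoothed form `ln((1 + n) / (1 + df)) + 1`, and the tokenizer is deliberately naive (lowercase ASCII letters and digits only).

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Naive tokenizer: lowercase runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents: list[str]) -> list[dict[str, float]]:
    """Return one sparse {term: weight} TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    claim = "senate passes national budget bill"
    headlines = ["Senate passes budget bill on final reading", "Typhoon nears northern Luzon"]
    claim_vec, *headline_vecs = tfidf_vectors([claim, *headlines])
    for headline, vec in zip(headlines, headline_vecs):
        print(f"{cosine_similarity(claim_vec, vec):.3f}  {headline}")
```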
---
## 📁 Project Structure
```
PhilVerify/
├── main.py                     # FastAPI app entry point
├── config.py                   # Settings (pydantic-settings)
├── requirements.txt
├── .env.example
├── domain_credibility.json     # PH domain tier database
│
├── api/
│   ├── schemas.py              # Pydantic request/response models
│   └── routes/
│       ├── verify.py           # POST /verify/text|url|image|video
│       ├── history.py          # GET /history
│       └── trends.py           # GET /trends
│
├── nlp/                        # NLP preprocessing pipeline
│   ├── preprocessor.py         # Clean, tokenize, remove stopwords (EN+TL)
│   ├── language_detector.py    # Tagalog / English / Taglish detection
│   ├── ner.py                  # Named entity recognition + PH entity hints
│   ├── sentiment.py            # Sentiment + emotion analysis
│   ├── clickbait.py            # Clickbait pattern detection
│   └── claim_extractor.py      # Extract falsifiable claim for evidence search
│
├── ml/
│   └── tfidf_classifier.py     # Layer 1: TF-IDF baseline classifier
│
├── evidence/
│   └── news_fetcher.py         # Layer 2: NewsAPI + cosine similarity
│
├── scoring/
│   └── engine.py               # Orchestrates full pipeline + final score
│
├── inputs/
│   ├── url_scraper.py          # BeautifulSoup article extractor
│   ├── ocr.py                  # Tesseract OCR
│   └── asr.py                  # Whisper ASR
│
└── tests/
    └── test_philverify.py      # 23 unit + integration tests
```
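The `domain_credibility.json` file in the tree above is described only as a tier database, so its schema is not shown here. As a sketch of how a URL might be mapped to a tier, the example below assumes a hypothetical `{tier: [domains]}` layout with placeholder `.example` domains; the `load_domain_tiers` and `credibility_tier` helpers are likewise illustrative names, not functions from the repository. The lookup walks up the subdomain chain so that `www.` or section subdomains resolve to the registered domain's tier.

```python
import json
from urllib.parse import urlparse

# Hypothetical file contents; the real domain_credibility.json may use another schema.
SAMPLE_TIERS_JSON = """
{
  "tier1": ["trusted-news.example"],
  "tier4": ["satire-site.example"]
}
"""


def load_domain_tiers(raw_json: str) -> dict[str, str]:
    """Invert {tier: [domains]} into {domain: tier} for direct lookup."""
    tiers = json.loads(raw_json)
    return {domain.lower(): tier for tier, domains in tiers.items() for domain in domains}


def credibility_tier(url: str, domain_tiers: dict[str, str], default: str = "unknown") -> str:
    """Return the tier for a URL's host, or `default` if no suffix of the host is listed."""
    host = (urlparse(url).hostname or "").lower()
    # Walk up the subdomain chain: www.site.example -> site.example -> example
    while host:
        if host in domain_tiers:
            return domain_tiers[host]
        _, _, host = host.partition(".")
    return default


if __name__ == "__main__":
    tiers = load_domain_tiers(SAMPLE_TIERS_JSON)
    print(credibility_tier("https://www.trusted-news.example/nation/story-1", tiers))
    print(credibility_tier("https://random-blog.example/post", tiers))
```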
---
## 📅 Roadmap
- [x] Phase 1: FastAPI backend skeleton
- [x] Phase 2: NLP preprocessing pipeline
- [x] Phase 3: TF-IDF baseline classifier
- [ ] Phase 4: NewsAPI evidence retrieval (in progress)
- [ ] Phase 5: Scoring engine refinement (stance detection)
- [ ] Phase 6: React web dashboard
- [ ] Phase 7: Chrome Extension (Manifest V3)
- [ ] Phase 8: Fine-tune XLM-RoBERTa / TLUnified-RoBERTa
---
## 🀝 Contributing
Contributions welcome! Please feel free to submit a Pull Request.
---
<p align="center">
<strong>⚠️ Disclaimer</strong><br>
<em>This tool is meant for research and educational purposes. Use responsibly and ethically when verifying information on social media.</em>
</p>
## 📝 License
MIT