Ryan Christian D. Deniega
---
title: PhilVerify API
emoji: 🔍
colorFrom: red
colorTo: blue
sdk: docker
app_port: 7860
pinned: false
---

PhilVerify

Multimodal fake news detection for Philippine social media.


✨ Features

  • 🎤 Multimodal Detection: Verify raw text, news URLs, images (Tesseract OCR), and video/audio (Whisper ASR)
  • 🇵🇭 Language-Aware: Handles Tagalog, English, and Taglish content
  • 🧠 Advanced NLP Pipeline: Real-time entity recognition, sentiment/emotion analysis, and clickbait detection
  • ⚖️ Two-Layer Scoring: Combines ML classification (TF-IDF/RoBERTa) with NewsAPI evidence retrieval
  • 🛡️ PH-Domain Verification: Integrated database of Philippine news domain credibility tiers
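
To make the domain credibility idea concrete, here is a minimal sketch of a tier lookup. The tier names, the sample domains, and the helper names (`normalize_domain`, `lookup_tier`) are hypothetical, and the actual schema of `domain_credibility.json` may look quite different.

```python
from urllib.parse import urlparse

# Hypothetical tier table; the real domain_credibility.json schema may differ.
SAMPLE_TIERS = {
    "tier1": ["trusted-news.example"],
    "tier4": ["known-hoax.example"],
}


def normalize_domain(url: str) -> str:
    """Extract a lowercase hostname without a leading 'www.'."""
    # urlparse only fills in the host when a scheme or '//' is present.
    parsed = urlparse(url if "//" in url else f"//{url}")
    host = (parsed.hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def lookup_tier(url: str, tiers: dict[str, list[str]] = SAMPLE_TIERS) -> str:
    """Return the tier whose domain equals the URL's host or is a parent of it."""
    host = normalize_domain(url)
    for tier, domains in tiers.items():
        for domain in domains:
            if host == domain or host.endswith("." + domain):
                return tier
    return "unknown"
```

Matching on parent domains lets a mobile or regional subdomain inherit the tier of its main site, which is one plausible design choice rather than a confirmed behavior of PhilVerify.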

🚀 Quick Start

Prerequisites

  1. Python 3.12+
  2. Tesseract OCR (macOS: brew install tesseract; Debian/Ubuntu: sudo apt install tesseract-ocr)
  3. Node.js (for frontend development)
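
A quick way to confirm the prerequisites before installing is a small check script. This is only a sketch using the Python standard library; the function name `check_prerequisites` is made up for illustration, and it only checks that the executables are on `PATH`, not their versions.

```python
import shutil
import sys

# Minimum interpreter version taken from the prerequisites list.
MIN_PYTHON = (3, 12)


def check_prerequisites() -> dict[str, bool]:
    """Report which prerequisites appear to be available on this machine."""
    return {
        "python": sys.version_info >= MIN_PYTHON,
        # shutil.which returns None when the executable is not on PATH.
        "tesseract": shutil.which("tesseract") is not None,
        "node": shutil.which("node") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```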

Installation

# Clone the repository
git clone https://github.com/SemiAutomat1c/philverify.git
cd philverify

# Set up Backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Set up Frontend
cd frontend
npm install

Run

# Backend (from project root)
uvicorn main:app --reload --port 8000

# Frontend (in a second terminal, from project root)
cd frontend
npm run dev
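
Once the backend is running, a request against the text endpoint might look like the sketch below. The `POST /verify/text` route comes from this README, but the `{"text": ...}` request body and the shape of the JSON response are assumptions; check `api/schemas.py` for the real models. If the app keeps FastAPI's defaults, the interactive docs at `/docs` should also show the actual schema.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # matches the --port 8000 flag above


def build_text_request(text: str, base_url: str = BASE_URL) -> Request:
    """Build a POST /verify/text request; the {"text": ...} body is an assumption."""
    body = json.dumps({"text": text}).encode("utf-8")
    return Request(
        f"{base_url}/verify/text",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def verify_text(text: str) -> dict:
    """Send the request and decode the JSON response (needs a running backend)."""
    with urlopen(build_text_request(text), timeout=30) as response:
        return json.load(response)
```

Calling `verify_text("Totoo ba ito?")` with the server up should return whatever JSON the endpoint produces; building the request separately keeps the payload easy to inspect without a live server.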

🛠️ Tech Stack

| Component | Technology |
|---|---|
| Core Backend | Python 3.12, FastAPI, Pydantic v2 |
| NLP Engine | spaCy, HuggingFace Transformers, langdetect |
| ML Classification | scikit-learn (TF-IDF + LogReg), XLM-RoBERTa |
| OCR / ASR | Tesseract (PH+EN support), OpenAI Whisper |
| Frontend | React, TailwindCSS, Chart.js, Vite |

📁 Project Structure

PhilVerify/
├── main.py                  # FastAPI app entry point
├── config.py                # Settings (pydantic-settings)
├── requirements.txt
├── .env.example
├── domain_credibility.json  # PH domain tier database
│
├── api/
│   ├── schemas.py           # Pydantic request/response models
│   └── routes/
│       ├── verify.py        # POST /verify/text|url|image|video
│       ├── history.py       # GET /history
│       └── trends.py        # GET /trends
│
├── nlp/                     # NLP preprocessing pipeline
│   ├── preprocessor.py      # Clean, tokenize, remove stopwords (EN+TL)
│   ├── language_detector.py # Tagalog / English / Taglish detection
│   ├── ner.py               # Named entity recognition + PH entity hints
│   ├── sentiment.py         # Sentiment + emotion analysis
│   ├── clickbait.py         # Clickbait pattern detection
│   └── claim_extractor.py   # Extract falsifiable claim for evidence search
│
├── ml/
│   └── tfidf_classifier.py  # Layer 1: TF-IDF baseline classifier
│
├── evidence/
│   └── news_fetcher.py      # Layer 2: NewsAPI + cosine similarity
│
├── scoring/
│   └── engine.py            # Orchestrates full pipeline + final score
│
├── inputs/
│   ├── url_scraper.py       # BeautifulSoup article extractor
│   ├── ocr.py               # Tesseract OCR
│   └── asr.py               # Whisper ASR
│
└── tests/
    └── test_philverify.py   # 23 unit + integration tests
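
The tree mentions cosine similarity for Layer 2 and a final score that combines both layers. The sketch below illustrates that idea with the standard library only. It uses raw term counts instead of TF-IDF weights to stay dependency-free, and every function name and the 0.6 / 0.4 weighting are illustrative assumptions, not the logic in `evidence/news_fetcher.py` or `scoring/engine.py`.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of letters (no stopword removal here)."""
    return re.findall(r"[a-zñ]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the term-count vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def evidence_score(claim: str, headlines: list[str]) -> float:
    """Best similarity between the claim and any retrieved headline."""
    return max((cosine_similarity(claim, h) for h in headlines), default=0.0)


def final_score(ml_credibility: float, evidence: float, ml_weight: float = 0.6) -> float:
    """Blend both layers; the 0.6 / 0.4 split is illustrative only."""
    return ml_weight * ml_credibility + (1.0 - ml_weight) * evidence
```

Similarity alone only says that related coverage exists, not whether it supports or refutes the claim, which is presumably why stance detection appears as a later roadmap phase.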

📅 Roadmap

  • Phase 1: FastAPI backend skeleton
  • Phase 2: NLP preprocessing pipeline
  • Phase 3: TF-IDF baseline classifier
  • Phase 4: NewsAPI evidence retrieval (in progress)
  • Phase 5: Scoring engine refinement (stance detection)
  • Phase 6: React web dashboard
  • Phase 7: Chrome Extension (Manifest V3)
  • Phase 8: Fine-tune XLM-RoBERTa / TLUnified-RoBERTa

🤝 Contributing

Contributions welcome! Please feel free to submit a Pull Request.


⚠️ Disclaimer
This tool is meant for research and educational purposes. Use responsibly and ethically when verifying information on social media.

📝 License

MIT