title: PhilVerify API
emoji: π
colorFrom: red
colorTo: blue
sdk: docker
app_port: 7860
pinned: false
Multimodal fake news detection for Philippine social media.
Features
- Multimodal Detection: Verify raw text, news URLs, images (Tesseract OCR), and video/audio (Whisper ASR)
- Language-Aware: Seamlessly handles Tagalog, English, and Taglish content
- Advanced NLP Pipeline: Real-time entity recognition, sentiment/emotion analysis, and clickbait detection
- Two-Layer Scoring: Combines ML classification (TF-IDF/RoBERTa) with NewsAPI evidence retrieval
- PH-Domain Verification: Integrated database of Philippine news domain credibility tiers
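As a rough illustration of the two-layer idea, the sketch below blends a Layer 1 classifier probability with a Layer 2 evidence score into a single 0-100 credibility value. The names `LayerScores` and `credibility_score` and the default weight of 0.4 are assumptions made for this example; the project's actual scoring engine may combine the layers differently.

```python
from dataclasses import dataclass


@dataclass
class LayerScores:
    # Layer 1: classifier's probability that the claim is fake (0.0 to 1.0).
    ml_fake_probability: float
    # Layer 2: how strongly retrieved articles support the claim (0.0 to 1.0).
    evidence_support: float


def credibility_score(scores: LayerScores, ml_weight: float = 0.4) -> float:
    """Blend both layers into a 0-100 score (higher means more credible).

    ml_weight is a hypothetical tuning knob, not a value taken from PhilVerify.
    """
    if not 0.0 <= ml_weight <= 1.0:
        raise ValueError("ml_weight must be between 0 and 1")
    # Convert the "fake" probability into a credibility signal first.
    ml_credibility = 1.0 - scores.ml_fake_probability
    blended = ml_weight * ml_credibility + (1.0 - ml_weight) * scores.evidence_support
    return round(100.0 * blended, 1)
```

For example, `credibility_score(LayerScores(ml_fake_probability=0.9, evidence_support=0.1))` yields a low score because both layers agree the claim looks unreliable.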
Quick Start
Prerequisites
- Python 3.12+
- Tesseract OCR (`brew install tesseract`)
- Node.js (for frontend development)
Installation
# Clone the repository
git clone https://github.com/SemiAutomat1c/philverify.git
cd philverify
# Set up Backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Set up Frontend
cd frontend
npm install
Run
# Backend (from project root)
uvicorn main:app --reload --port 8000
# Frontend
cd frontend
npm run dev
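Once the backend is running on port 8000, a text claim can be submitted to the `POST /verify/text` route. The sketch below uses only the Python standard library. The JSON body shape (`{"text": ...}`) and the helper names are assumptions for illustration; check `api/schemas.py` for the real request model.

```python
import json
import urllib.request


def build_verify_text_request(
    text: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /verify/text route.

    The {"text": ...} payload is an assumed schema, not confirmed by the project.
    """
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/verify/text",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def verify_text(text: str, base_url: str = "http://localhost:8000") -> dict:
    """Send the request and decode the JSON response from a running backend."""
    request = build_verify_text_request(text, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server started as shown above, `verify_text("Some claim to check")` should return the parsed JSON response.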
Tech Stack
| Component | Technology |
|---|---|
| Core Backend | Python 3.12, FastAPI, Pydantic v2 |
| NLP Engine | spaCy, HuggingFace Transformers, langdetect |
| ML Classification | scikit-learn (TF-IDF + LogReg), XLM-RoBERTa |
| OCR / ASR | Tesseract (PH+EN support), OpenAI Whisper |
| Frontend | React, TailwindCSS, Chart.js, Vite |
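Language identifiers such as langdetect report standard language codes, so code-switched Taglish needs some extra logic on top. The toy heuristic below is a stand-in for illustration, not the project's detector and not langdetect itself: it counts a few common function words per language and labels the text Taglish when both appear in meaningful proportion. The word lists, the threshold, and the function name are all assumptions.

```python
import re

# Small, hand-picked function-word lists (illustrative only, far from complete).
TAGALOG_MARKERS = {
    "ang", "ng", "mga", "sa", "na", "ay", "si", "hindi", "ito", "po",
    "ko", "mo", "niya", "kami", "daw", "raw", "lang", "pa", "ba", "kung",
}
ENGLISH_MARKERS = {
    "the", "is", "are", "and", "of", "to", "in", "that", "this", "was",
    "for", "with", "not", "have", "has", "will", "be", "on", "it",
}


def classify_language(text: str, mixed_threshold: float = 0.25) -> str:
    """Return 'tagalog', 'english', 'taglish', or 'unknown' for the given text."""
    tokens = re.findall(r"[a-zñ]+", text.lower())
    tagalog_hits = sum(token in TAGALOG_MARKERS for token in tokens)
    english_hits = sum(token in ENGLISH_MARKERS for token in tokens)
    total = tagalog_hits + english_hits
    if total == 0:
        return "unknown"
    # If the less frequent language still holds a sizable share, call it mixed.
    minority_share = min(tagalog_hits, english_hits) / total
    if minority_share >= mixed_threshold:
        return "taglish"
    return "tagalog" if tagalog_hits > english_hits else "english"
```

A real pipeline would likely combine a statistical detector with token-level checks like this one, since short social media posts give either approach little to work with.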
Project Structure
PhilVerify/
├── main.py                      # FastAPI app entry point
├── config.py                    # Settings (pydantic-settings)
├── requirements.txt
├── .env.example
├── domain_credibility.json      # PH domain tier database
│
├── api/
│   ├── schemas.py               # Pydantic request/response models
│   └── routes/
│       ├── verify.py            # POST /verify/text|url|image|video
│       ├── history.py           # GET /history
│       └── trends.py            # GET /trends
│
├── nlp/                         # NLP preprocessing pipeline
│   ├── preprocessor.py          # Clean, tokenize, remove stopwords (EN+TL)
│   ├── language_detector.py     # Tagalog / English / Taglish detection
│   ├── ner.py                   # Named entity recognition + PH entity hints
│   ├── sentiment.py             # Sentiment + emotion analysis
│   ├── clickbait.py             # Clickbait pattern detection
│   └── claim_extractor.py       # Extract falsifiable claim for evidence search
│
├── ml/
│   └── tfidf_classifier.py      # Layer 1: TF-IDF baseline classifier
│
├── evidence/
│   └── news_fetcher.py          # Layer 2: NewsAPI + cosine similarity
│
├── scoring/
│   └── engine.py                # Orchestrates full pipeline + final score
│
├── inputs/
│   ├── url_scraper.py           # BeautifulSoup article extractor
│   ├── ocr.py                   # Tesseract OCR
│   └── asr.py                   # Whisper ASR
│
└── tests/
    └── test_philverify.py       # 23 unit + integration tests
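The Layer 2 evidence step pairs NewsAPI results with cosine similarity. As a minimal sketch of that matching idea, the code below ranks candidate headlines against a claim using plain bag-of-words term counts. The function names are hypothetical, and the real `news_fetcher.py` may use TF-IDF weights or embeddings instead of raw counts.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Lowercase the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between two texts' term-count vectors (0.0 to 1.0)."""
    counts_a, counts_b = term_counts(text_a), term_counts(text_b)
    shared_terms = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[term] * counts_b[term] for term in shared_terms)
    norm_a = math.sqrt(sum(count * count for count in counts_a.values()))
    norm_b = math.sqrt(sum(count * count for count in counts_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # An empty text has no direction to compare.
    return dot / (norm_a * norm_b)


def rank_evidence(claim: str, headlines: list[str]) -> list[tuple[str, float]]:
    """Sort headlines from most to least similar to the claim."""
    scored = [(headline, cosine_similarity(claim, headline)) for headline in headlines]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Headlines that share more vocabulary with the extracted claim rise to the top, which gives the scoring engine a simple signal for whether credible outlets are reporting the same story.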
Roadmap
- Phase 1: FastAPI backend skeleton
- Phase 2: NLP preprocessing pipeline
- Phase 3: TF-IDF baseline classifier
- [/] Phase 4: NewsAPI evidence retrieval
- Phase 5: Scoring engine refinement (stance detection)
- Phase 6: React web dashboard
- Phase 7: Chrome Extension (Manifest V3)
- Phase 8: Fine-tune XLM-RoBERTa / TLUnified-RoBERTa
Contributing
Contributions are welcome. Feel free to submit a pull request.
Disclaimer
This tool is intended for research and educational purposes. Use it responsibly and ethically when verifying information on social media.
License
MIT