Ryan Christian D. Deniega

title: PhilVerify API
emoji: 🔍
colorFrom: red
colorTo: blue
sdk: docker
app_port: 7860
pinned: false

Multimodal fake news detection for Philippine social media.

✨ Features

🎤 Multimodal Detection — Verify raw text, news URLs, images (Tesseract OCR), and video/audio (Whisper ASR)
🇵🇭 Language-Aware — Seamlessly handles Tagalog, English, and Taglish content
🧠 Advanced NLP Pipeline — Real-time entity recognition, sentiment/emotion analysis, and clickbait detection
⚖️ Two-Layer Scoring — Combines ML classification (TF-IDF/RoBERTa) with NewsAPI evidence retrieval
🛡️ PH-Domain Verification — Integrated database of Philippine news domain credibility tiers
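
As a rough illustration of the two-layer idea, the sketch below blends a Layer 1 classifier probability with a Layer 2 evidence score into a single 0-100 value. The `LayerScores` type, the `ml_weight` default, the cut-offs, and the verdict labels are all illustrative assumptions; the actual logic lives in `scoring/engine.py` and may differ.

```python
from dataclasses import dataclass


@dataclass
class LayerScores:
    """Hypothetical container for the two layer outputs, each in [0, 1]."""
    ml_credibility: float    # Layer 1: classifier probability that the text is credible
    evidence_support: float  # Layer 2: how strongly retrieved articles support the claim


def combine_scores(scores: LayerScores, ml_weight: float = 0.4) -> float:
    """Weighted blend of both layers on a 0-100 scale.

    ml_weight is an assumed default, not a value taken from PhilVerify.
    """
    if not 0.0 <= ml_weight <= 1.0:
        raise ValueError("ml_weight must be in [0, 1]")
    blended = (ml_weight * scores.ml_credibility
               + (1.0 - ml_weight) * scores.evidence_support)
    return round(100.0 * blended, 1)


def verdict(score: float) -> str:
    """Map a 0-100 score to a label using hypothetical cut-offs."""
    if score >= 70:
        return "likely credible"
    if score >= 40:
        return "unverified"
    return "likely fake"
```

A weighted blend keeps each layer's contribution easy to inspect, which may help when tuning against labelled examples.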

🚀 Quick Start

Prerequisites

Python 3.12+
Tesseract OCR (macOS: brew install tesseract; Debian/Ubuntu: sudo apt install tesseract-ocr)
Node.js (for frontend development)

Installation

# Clone the repository
git clone https://github.com/SemiAutomat1c/philverify.git
cd philverify

# Set up Backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Set up Frontend
cd frontend
npm install

Run

# Backend (from project root)
uvicorn main:app --reload --port 8000

# Frontend
cd frontend
npm run dev
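
Once the backend is running, a request against the text endpoint might look like the sketch below. It uses only the standard library. The `/verify/text` path comes from the project structure; the `{"text": ...}` request body and the helper names are assumptions, so check `api/schemas.py` for the real request model.

```python
import json
import urllib.request

# Matches the `uvicorn main:app --reload --port 8000` command above.
BASE_URL = "http://localhost:8000"


def build_verify_text_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for /verify/text.

    ASSUMPTION: the endpoint accepts a JSON body shaped like {"text": "..."}.
    """
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/verify/text",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def verify_text(text: str, base_url: str = BASE_URL) -> dict:
    """Send the request and decode the JSON response (requires a running backend)."""
    request = build_verify_text_request(text, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `verify_text("some claim to check")` with the server up should return the parsed JSON response. FastAPI serves interactive documentation at `/docs` by default, which is a convenient place to confirm the actual schema.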

🛠️ Tech Stack

Component	Technology
Core Backend	Python 3.12, FastAPI, Pydantic v2
NLP Engine	spaCy, HuggingFace Transformers, langdetect
ML Classification	scikit-learn (TF-IDF + LogReg), XLM-RoBERTa
OCR / ASR	Tesseract (Tagalog + English), OpenAI Whisper
Frontend	React, TailwindCSS, Chart.js, Vite

📁 Project Structure

PhilVerify/
├── main.py                  # FastAPI app entry point
├── config.py                # Settings (pydantic-settings)
├── requirements.txt
├── .env.example
├── domain_credibility.json  # PH domain tier database
│
├── api/
│   ├── schemas.py           # Pydantic request/response models
│   └── routes/
│       ├── verify.py        # POST /verify/text|url|image|video
│       ├── history.py       # GET /history
│       └── trends.py        # GET /trends
│
├── nlp/                     # NLP preprocessing pipeline
│   ├── preprocessor.py      # Clean, tokenize, remove stopwords (EN+TL)
│   ├── language_detector.py # Tagalog / English / Taglish detection
│   ├── ner.py               # Named entity recognition + PH entity hints
│   ├── sentiment.py         # Sentiment + emotion analysis
│   ├── clickbait.py         # Clickbait pattern detection
│   └── claim_extractor.py   # Extract falsifiable claim for evidence search
│
├── ml/
│   └── tfidf_classifier.py  # Layer 1 — TF-IDF baseline classifier
│
├── evidence/
│   └── news_fetcher.py      # Layer 2 — NewsAPI + cosine similarity
│
├── scoring/
│   └── engine.py            # Orchestrates full pipeline + final score
│
├── inputs/
│   ├── url_scraper.py       # BeautifulSoup article extractor
│   ├── ocr.py               # Tesseract OCR
│   └── asr.py               # Whisper ASR
│
└── tests/
    └── test_philverify.py   # 23 unit + integration tests
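
The Layer 2 step listed under `evidence/` compares a claim against retrieved articles using cosine similarity. The sketch below shows that comparison on plain bag-of-words counts, with no NewsAPI call. The function names and the tokenizer are hypothetical, and the real `news_fetcher.py` may use a different vectorization such as TF-IDF.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the term-count vectors of two texts."""
    vec_a, vec_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(vec_a[term] * vec_b[term] for term in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def rank_evidence(claim: str, headlines: list[str]) -> list[tuple[float, str]]:
    """Return (similarity, headline) pairs, most similar first."""
    scored = [(cosine_similarity(claim, headline), headline) for headline in headlines]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Headlines that share more terms with the claim rank higher. Note that similarity alone does not say whether an article supports or contradicts the claim.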

📅 Roadmap

Phase 1 — FastAPI backend skeleton
Phase 2 — NLP preprocessing pipeline
Phase 3 — TF-IDF baseline classifier
Phase 4 — NewsAPI evidence retrieval (in progress)
Phase 5 — Scoring engine refinement (stance detection)
Phase 6 — React web dashboard
Phase 7 — Chrome Extension (Manifest V3)
Phase 8 — Fine-tune XLM-RoBERTa / TLUnified-RoBERTa

🤝 Contributing

Contributions welcome! Please feel free to submit a Pull Request.

⚠️ Disclaimer

This tool is meant for research and educational purposes. Use responsibly and ethically when verifying information on social media.

📝 License

MIT