Ryan Christian D. Deniega
docs: update README with current deployment, features, and project structure
70fdb2e
title: PhilVerify API
emoji: 🔍
colorFrom: red
colorTo: blue
sdk: docker
app_port: 7860
pinned: false

PhilVerify Logo

Multimodal fake news detection for Philippine social media.

🌐 Live Demo  β€’  πŸ“– API Docs


✨ Features

  • 🎤 Multimodal Detection: Verify raw text, news URLs, images, and video/audio
  • 🖼️ Image OCR: Extract and analyze text from screenshots and images (Tesseract fil+eng)
  • 🎬 Video Frame OCR: Extract on-screen text from video frames alongside Whisper speech transcription
  • 🔊 Speech Transcription: Transcribe audio/video content using OpenAI Whisper
  • 🇵🇭 Language-Aware: Seamlessly handles Tagalog, English, and Taglish content
  • 🧠 Advanced NLP Pipeline: Real-time entity recognition, sentiment/emotion analysis, and clickbait detection
  • ⚖️ Two-Layer Scoring: Combines ML classification (TF-IDF) with NewsAPI evidence retrieval
  • 🛡️ PH-Domain Verification: Integrated database of Philippine news domain credibility tiers
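
The two-layer scoring idea can be sketched roughly as follows. This is a minimal illustration, not the project's scoring engine: the name `combine_scores`, the 0.6/0.4 weighting, and the verdict thresholds are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    score: float  # 0.0 = likely fake, 1.0 = likely credible
    label: str


def combine_scores(ml_credibility: float, evidence_support: float,
                   ml_weight: float = 0.6) -> Verdict:
    """Blend a Layer 1 classifier score with a Layer 2 evidence score.

    Both inputs are expected in [0, 1]. The weight and thresholds are
    hypothetical; the real engine may combine the layers differently.
    """
    for name, value in (("ml_credibility", ml_credibility),
                        ("evidence_support", evidence_support)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")

    # Weighted average of the two layers.
    score = ml_weight * ml_credibility + (1.0 - ml_weight) * evidence_support
    if score >= 0.7:
        label = "Credible"
    elif score >= 0.4:
        label = "Unverified"
    else:
        label = "Likely Fake"
    return Verdict(score=round(score, 3), label=label)
```

A weighted average keeps the example easy to follow. A production engine would likely also account for cases where no evidence is found at all.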

🚀 Deployment

| Service | Platform | URL |
| --- | --- | --- |
| Frontend | Firebase Hosting | https://philverify.web.app |
| Backend API | Hugging Face Spaces (Docker) | https://semiautomat1c-philverify-api.hf.space |
| API Docs | Swagger UI (auto-generated) | https://semiautomat1c-philverify-api.hf.space/docs |
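
For a quick check against the deployed backend, a client call might look like the sketch below. It uses only the standard library. The JSON body shape `{"text": ...}` and the helper names `build_verify_request` and `verify_text` are assumptions for illustration; the Swagger UI at `/docs` is the authority on the actual request schema.

```python
import json
import urllib.request

BASE_URL = "https://semiautomat1c-philverify-api.hf.space"


def build_verify_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request to /api/verify.

    The body shape {"text": ...} is a guess made for this example;
    check /docs for the real schema before relying on it.
    """
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/verify",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def verify_text(text: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_verify_request(text), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating request construction from sending makes the URL and payload easy to inspect without touching the network.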

🖥️ Local Development

Prerequisites

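  1. Python 3.12+
  2. Tesseract OCR: brew install tesseract tesseract-lang (Homebrew on macOS; tesseract-lang adds the extra language data, including fil)
  3. ffmpeg: brew install ffmpeg (required for video frame extraction)
  4. Node.js 20.19+ (for the frontend; Vite 7 no longer supports Node.js 18)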

Installation

# Clone the repository
git clone https://github.com/SemiAutomat1c/philverify.git
cd philverify

# Set up backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Set up frontend
cd frontend
npm install

Run

# Backend (from project root, with venv active)
uvicorn main:app --reload --port 8000

# Frontend (in a separate terminal)
cd frontend
npm run dev

The frontend dev server proxies /api requests to http://localhost:8000 automatically.

Environment Variables

Copy .env.example to .env and fill in your keys:

NEWS_API_KEY=your_newsapi_key
FIREBASE_PROJECT_ID=your_project_id
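
According to the project structure, config.py loads these values with pydantic-settings. The standard-library sketch below shows the same fail-early idea without that dependency. `Settings` and `load_settings` are hypothetical names, and the sketch assumes the variables are already exported in the environment (it does not parse the .env file itself).

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    news_api_key: str
    firebase_project_id: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the required keys, failing early if any is missing or empty."""
    required = ("NEWS_API_KEY", "FIREBASE_PROJECT_ID")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return Settings(
        news_api_key=env["NEWS_API_KEY"],
        firebase_project_id=env["FIREBASE_PROJECT_ID"],
    )
```

Failing at startup with the names of the missing keys is usually easier to debug than a NewsAPI request that fails later with an authentication error.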

For frontend production builds, set VITE_API_BASE_URL in frontend/.env.production:

VITE_API_BASE_URL=https://your-hf-space.hf.space/api

🛠️ Tech Stack

| Component | Technology |
| --- | --- |
| Core Backend | Python 3.12, FastAPI, Pydantic v2 |
| NLP Engine | spaCy, HuggingFace Transformers, langdetect |
| ML Classification | scikit-learn (TF-IDF + Logistic Regression) |
| OCR | Tesseract (fil+eng), pytesseract, Pillow |
| ASR | OpenAI Whisper (base model) |
| Video Processing | ffmpeg (frame extraction), asyncio parallel pipeline |
| Frontend | React 18, TailwindCSS, Chart.js, Vite 7 |
| Backend Hosting | Hugging Face Spaces (Docker SDK, port 7860) |
| Frontend Hosting | Firebase Hosting |
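
The "asyncio parallel pipeline" for video can be pictured as running speech transcription and frame OCR concurrently, then merging the text. The sketch below uses stub coroutines in place of Whisper and ffmpeg/Tesseract. All function names here are hypothetical and the stubs only simulate work.

```python
import asyncio


async def transcribe_audio(path: str) -> str:
    """Stand-in for Whisper ASR (hypothetical stub)."""
    await asyncio.sleep(0.01)  # simulate work
    return f"speech from {path}"


async def ocr_video_frames(path: str) -> str:
    """Stand-in for ffmpeg frame extraction + Tesseract OCR (hypothetical stub)."""
    await asyncio.sleep(0.01)  # simulate work
    return f"on-screen text from {path}"


async def extract_video_text(path: str) -> str:
    """Run both extractors concurrently and merge their output."""
    speech, screen_text = await asyncio.gather(
        transcribe_audio(path),
        ocr_video_frames(path),
    )
    return f"{speech}\n{screen_text}"
```

Calling `asyncio.run(extract_video_text("clip.mp4"))` returns both pieces of text joined by a newline. Real transcription and OCR calls are blocking, so in practice they would need to be wrapped with something like `asyncio.to_thread` for the two branches to actually overlap.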

📁 Project Structure

PhilVerify/
├── main.py                  # FastAPI app entry point + health endpoints
├── config.py                # Settings (pydantic-settings)
├── requirements.txt
├── Dockerfile               # Docker image for HF Spaces (port 7860)
├── domain_credibility.json  # PH news domain credibility tier database
│
├── api/
│   ├── schemas.py           # Pydantic request/response models
│   └── routes/
│       ├── verify.py        # POST /api/verify: handles text/url/image/video
│       ├── history.py       # GET /api/history
│       └── trends.py        # GET /api/trends
│
├── nlp/                     # NLP preprocessing pipeline
│   ├── preprocessor.py      # Clean, tokenize, remove stopwords (EN+TL)
│   ├── language_detector.py # Tagalog / English / Taglish detection
│   ├── ner.py               # Named entity recognition + PH entity hints
│   ├── sentiment.py         # Sentiment + emotion analysis
│   ├── clickbait.py         # Clickbait pattern detection
│   └── claim_extractor.py   # Extract falsifiable claim for evidence search
│
├── ml/
│   └── tfidf_classifier.py  # Layer 1: TF-IDF baseline classifier
│
├── evidence/
│   └── news_fetcher.py      # Layer 2: NewsAPI + cosine similarity
│
├── scoring/
│   └── engine.py            # Orchestrates full pipeline + final score
│
├── inputs/
│   ├── url_scraper.py       # BeautifulSoup article extractor
│   ├── ocr.py               # Tesseract OCR for images
│   ├── asr.py               # Whisper ASR + combined video transcription
│   └── video_ocr.py         # ffmpeg frame extraction + Tesseract OCR for video
│
├── frontend/                # React + Vite frontend
│   ├── src/
│   │   ├── pages/
│   │   │   └── VerifyPage.jsx   # Main fact-check UI (tabs, results, chips)
│   │   └── api.js               # API client (supports VITE_API_BASE_URL)
│   └── .env.production          # Production API base URL
│
└── tests/
    └── test_philverify.py   # Unit + integration tests

📅 Roadmap

  • Phase 1: FastAPI backend skeleton
  • Phase 2: NLP preprocessing pipeline
  • Phase 3: TF-IDF baseline classifier
  • Phase 4: NewsAPI evidence retrieval
  • Phase 5: React web dashboard with multimodal input
  • Phase 6: Deploy to Hugging Face Spaces (backend) + Firebase (frontend)
  • Phase 7: Video frame OCR (ffmpeg + Tesseract alongside Whisper ASR)
  • Phase 8: Scoring engine refinement (stance detection)
  • Phase 9: Chrome Extension (Manifest V3)
  • Phase 10: Fine-tune XLM-RoBERTa / TLUnified-RoBERTa
🀝 Contributing

Contributions welcome! Please feel free to submit a Pull Request.


⚠️ Disclaimer
This tool is meant for research and educational purposes. Use responsibly and ethically when verifying information on social media.

📝 License

MIT