---
title: PhilVerify API
emoji: 🔍
colorFrom: red
colorTo: blue
sdk: docker
app_port: 7860
pinned: false
---

Multimodal fake news detection for Philippine social media.

---

## ✨ Features

- **🎤 Multimodal Detection**: Verify raw text, news URLs, images (Tesseract OCR), and video/audio (Whisper ASR)
- **🇵🇭 Language-Aware**: Seamlessly handles Tagalog, English, and Taglish content
- **🧠 Advanced NLP Pipeline**: Real-time entity recognition, sentiment/emotion analysis, and clickbait detection
- **⚖️ Two-Layer Scoring**: Combines ML classification (TF-IDF/RoBERTa) with NewsAPI evidence retrieval
- **🛡️ PH-Domain Verification**: Integrated database of Philippine news domain credibility tiers

---

## 🚀 Quick Start

### Prerequisites

1. **Python 3.12+**
2. **Tesseract OCR** (`brew install tesseract`)
3. **Node.js** (for frontend development)

### Installation

```bash
# Clone the repository
git clone https://github.com/SemiAutomat1c/philverify.git
cd philverify

# Set up Backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Set up Frontend
cd frontend
npm install
```

### Run

```bash
# Backend (from project root)
uvicorn main:app --reload --port 8000

# Frontend
cd frontend
npm run dev
```

---

## 🛠️ Tech Stack

| Component | Technology |
|-----------|------------|
| **Core Backend** | Python 3.12, FastAPI, Pydantic v2 |
| **NLP Engine** | spaCy, HuggingFace Transformers, langdetect |
| **ML Classification** | scikit-learn (TF-IDF + LogReg), XLM-RoBERTa |
| **OCR / ASR** | Tesseract (PH+EN support), OpenAI Whisper |
| **Frontend** | React, TailwindCSS, Chart.js, Vite |

---

## 📁 Project Structure

```
PhilVerify/
├── main.py                     # FastAPI app entry point
├── config.py                   # Settings (pydantic-settings)
├── requirements.txt
├── .env.example
├── domain_credibility.json     # PH domain tier database
│
├── api/
│   ├── schemas.py              # Pydantic request/response models
│   └── routes/
│       ├── verify.py           # POST /verify/text|url|image|video
│       ├── history.py          # GET /history
│       └── trends.py           # GET /trends
│
├── nlp/                        # NLP preprocessing pipeline
│   ├── preprocessor.py         # Clean, tokenize, remove stopwords (EN+TL)
│   ├── language_detector.py    # Tagalog / English / Taglish detection
│   ├── ner.py                  # Named entity recognition + PH entity hints
│   ├── sentiment.py            # Sentiment + emotion analysis
│   ├── clickbait.py            # Clickbait pattern detection
│   └── claim_extractor.py      # Extract falsifiable claim for evidence search
│
├── ml/
│   └── tfidf_classifier.py     # Layer 1: TF-IDF baseline classifier
│
├── evidence/
│   └── news_fetcher.py         # Layer 2: NewsAPI + cosine similarity
│
├── scoring/
│   └── engine.py               # Orchestrates full pipeline + final score
│
├── inputs/
│   ├── url_scraper.py          # BeautifulSoup article extractor
│   ├── ocr.py                  # Tesseract OCR
│   └── asr.py                  # Whisper ASR
│
└── tests/
    └── test_philverify.py      # 23 unit + integration tests
```

---

## 📅 Roadmap

- [x] Phase 1: FastAPI backend skeleton
- [x] Phase 2: NLP preprocessing pipeline
- [x] Phase 3: TF-IDF baseline classifier
- [/] Phase 4: NewsAPI evidence retrieval
- [ ] Phase 5: Scoring engine refinement (stance detection)
- [ ] Phase 6: React web dashboard
- [ ] Phase 7: Chrome Extension (Manifest V3)
- [ ] Phase 8: Fine-tune XLM-RoBERTa / TLUnified-RoBERTa

---

## 🤝 Contributing

Contributions welcome! Please feel free to submit a Pull Request.

---
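
The two-layer scoring listed under Features (an ML classifier combined with NewsAPI evidence compared by cosine similarity) can be sketched roughly as follows. This is a minimal illustration, not the code in `scoring/engine.py`: the function names, the raw term-count vectors (standing in for TF-IDF), and the `ml_weight` default of 0.4 are all assumptions made for this example.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between two texts using raw term counts.

    Assumption: whitespace tokenization and plain counts stand in for
    the project's real preprocessing and TF-IDF weighting.
    """
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text matches nothing
    return dot / (norm_a * norm_b)


def evidence_score(claim: str, articles: list[str]) -> float:
    """Layer 2 (hypothetical): best similarity between the claim and any article."""
    return max((cosine_similarity(claim, art) for art in articles), default=0.0)


def final_score(ml_credibility: float, evidence: float, ml_weight: float = 0.4) -> float:
    """Weighted blend of Layer 1 and Layer 2 scores, both expected in [0, 1].

    ml_weight is an illustrative value, not taken from the project.
    """
    if not 0.0 <= ml_weight <= 1.0:
        raise ValueError("ml_weight must be in [0, 1]")
    return ml_weight * ml_credibility + (1.0 - ml_weight) * evidence
```

For instance, `final_score(0.8, evidence_score(claim, articles))` blends a classifier output of 0.8 with the best article match. Similarity alone does not say whether an article supports or refutes a claim, which is the gap that stance detection (Phase 5 on the roadmap) would address.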

## ⚠️ Disclaimer

This tool is meant for research and educational purposes. Use responsibly and ethically when verifying information on social media.

## 📝 License

MIT