---
title: PhilVerify API
emoji: π
colorFrom: red
colorTo: blue
sdk: docker
app_port: 7860
pinned: false
---
<p align="center">
  <img src="frontend/public/logo.svg" alt="PhilVerify Logo" width="150">
</p>

<p align="center">
  <em>Multimodal fake news detection for Philippine social media.</em>
</p>

<p align="center">
  <img src="https://img.shields.io/badge/Machine_Learning_2-Final_Project-blue?style=flat-square" alt="Project Status">
  <img src="https://img.shields.io/badge/Python-3.12-blue?style=flat-square&logo=python" alt="Python">
  <img src="https://img.shields.io/badge/FastAPI-0.115-009688?style=flat-square&logo=fastapi" alt="FastAPI">
  <img src="https://img.shields.io/badge/React-18-61DAFB?style=flat-square&logo=react" alt="React">
  <img src="https://img.shields.io/badge/License-MIT-yellow?style=flat-square" alt="License">
</p>

---
## ✨ Features

- **Multimodal Detection**: Verify raw text, news URLs, images (Tesseract OCR), and video/audio (Whisper ASR)
- **🇵🇭 Language-Aware**: Handles Tagalog, English, and Taglish (mixed Tagalog-English) content
- **🧠 Advanced NLP Pipeline**: Named entity recognition, sentiment/emotion analysis, and clickbait detection
- **Two-Layer Scoring**: Combines ML classification (TF-IDF/RoBERTa) with NewsAPI evidence retrieval
- **🛡️ PH-Domain Verification**: Built-in database of credibility tiers for Philippine news domains
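
The two-layer scoring idea can be illustrated with a small sketch. This is not the project's actual scoring engine: the `LayerScores` container, the `combine_scores` and `label` helpers, the 0.4 weight, and the verdict thresholds are all hypothetical, chosen only to show how a classifier score and an evidence score might be merged into one verdict.

```python
from dataclasses import dataclass


@dataclass
class LayerScores:
    """Scores from the two layers, each in [0, 1], where 1 means 'likely credible'."""

    ml_credibility: float    # Layer 1: classifier probability that the text is credible
    evidence_support: float  # Layer 2: how strongly retrieved articles support the claim


def combine_scores(scores: LayerScores, ml_weight: float = 0.4) -> float:
    """Weighted average of the two layers (the 0.4 default is an illustrative assumption)."""
    if not 0.0 <= ml_weight <= 1.0:
        raise ValueError("ml_weight must be between 0 and 1")
    evidence_weight = 1.0 - ml_weight
    return ml_weight * scores.ml_credibility + evidence_weight * scores.evidence_support


def label(score: float) -> str:
    """Map a combined score to a coarse verdict (thresholds are hypothetical)."""
    if score >= 0.7:
        return "Credible"
    if score >= 0.4:
        return "Unverified"
    return "Likely Fake"


final = combine_scores(LayerScores(ml_credibility=0.8, evidence_support=0.5))
print(round(final, 2), label(final))
```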
---

## Quick Start

### Prerequisites

1. **Python 3.12+**
2. **Tesseract OCR** (on macOS: `brew install tesseract`)
3. **Node.js** (for frontend development)
### Installation

```bash
# Clone the repository
git clone https://github.com/SemiAutomat1c/philverify.git
cd philverify

# Set up the backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Set up the frontend
cd frontend
npm install
```
### Run

```bash
# Backend (from the project root)
uvicorn main:app --reload --port 8000

# Frontend (in a second terminal, from the project root)
cd frontend
npm run dev
```
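Once the backend is running, a request to the text verification route might look like the sketch below. The `/verify/text` path and port 8000 come from this README, but the `{"text": ...}` body shape and the absence of a route prefix are assumptions, and `build_verify_text_request` is a hypothetical helper; check `api/schemas.py` for the real request model. The network call is left commented out so the snippet runs without a server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # matches the --port 8000 used in the Run step


def build_verify_text_request(text: str) -> urllib.request.Request:
    """Build a POST /verify/text request; the {"text": ...} body is an assumed shape."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/verify/text",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_verify_text_request("Libreng bigas para sa lahat simula bukas!")
print(request.get_method(), request.full_url)

# Uncomment once the backend is running; the response schema is defined by the API.
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```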
---

## 🛠️ Tech Stack

| Component | Technology |
|-----------|------------|
| **Core Backend** | Python 3.12, FastAPI, Pydantic v2 |
| **NLP Engine** | spaCy, HuggingFace Transformers, langdetect |
| **ML Classification** | scikit-learn (TF-IDF + LogReg), XLM-RoBERTa |
| **OCR / ASR** | Tesseract (PH+EN support), OpenAI Whisper |
| **Frontend** | React, TailwindCSS, Chart.js, Vite |

---
## Project Structure

```
PhilVerify/
├── main.py                      # FastAPI app entry point
├── config.py                    # Settings (pydantic-settings)
├── requirements.txt
├── .env.example
├── domain_credibility.json      # PH domain tier database
│
├── api/
│   ├── schemas.py               # Pydantic request/response models
│   └── routes/
│       ├── verify.py            # POST /verify/text|url|image|video
│       ├── history.py           # GET /history
│       └── trends.py            # GET /trends
│
├── nlp/                         # NLP preprocessing pipeline
│   ├── preprocessor.py          # Clean, tokenize, remove stopwords (EN+TL)
│   ├── language_detector.py     # Tagalog / English / Taglish detection
│   ├── ner.py                   # Named entity recognition + PH entity hints
│   ├── sentiment.py             # Sentiment + emotion analysis
│   ├── clickbait.py             # Clickbait pattern detection
│   └── claim_extractor.py       # Extract falsifiable claim for evidence search
│
├── ml/
│   └── tfidf_classifier.py      # Layer 1: TF-IDF baseline classifier
│
├── evidence/
│   └── news_fetcher.py          # Layer 2: NewsAPI + cosine similarity
│
├── scoring/
│   └── engine.py                # Orchestrates full pipeline + final score
│
├── inputs/
│   ├── url_scraper.py           # BeautifulSoup article extractor
│   ├── ocr.py                   # Tesseract OCR
│   └── asr.py                   # Whisper ASR
│
└── tests/
    └── test_philverify.py       # 23 unit + integration tests
```
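The Layer 2 comment above mentions cosine similarity between a claim and retrieved news. As a rough illustration of that idea only, the sketch below ranks candidate headlines against a claim using raw term counts. It is not taken from `news_fetcher.py`: the `tokenize`, `cosine_similarity`, and `rank_evidence` helpers are hypothetical, the sample claim and headlines are made up, and the real module may well use TF-IDF or embedding vectors instead.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the term-count vectors of two texts."""
    counts_a, counts_b = Counter(tokenize(a)), Counter(tokenize(b))
    shared_terms = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[term] * counts_b[term] for term in shared_terms)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def rank_evidence(claim: str, headlines: list[str]) -> list[tuple[float, str]]:
    """Sort candidate headlines by similarity to the claim, best match first."""
    return sorted(((cosine_similarity(claim, h), h) for h in headlines), reverse=True)


# Made-up sample data, for illustration only.
claim = "Government announces free rice program"
headlines = [
    "Free rice program announced by government",
    "Typhoon expected to make landfall on Friday",
]
ranked = rank_evidence(claim, headlines)
for score, headline in ranked:
    print(f"{score:.2f}  {headline}")
```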
---

## Roadmap

- [x] Phase 1: FastAPI backend skeleton
- [x] Phase 2: NLP preprocessing pipeline
- [x] Phase 3: TF-IDF baseline classifier
- [/] Phase 4: NewsAPI evidence retrieval (in progress)
- [ ] Phase 5: Scoring engine refinement (stance detection)
- [ ] Phase 6: React web dashboard
- [ ] Phase 7: Chrome Extension (Manifest V3)
- [ ] Phase 8: Fine-tune XLM-RoBERTa / TLUnified-RoBERTa

---
## Contributing

Contributions are welcome. Feel free to submit a pull request.

---
<p align="center">
  <strong>⚠️ Disclaimer</strong><br>
  <em>This tool is intended for research and educational purposes. Use it responsibly and ethically when verifying information on social media.</em>
</p>

## License

MIT