---
title: PhilVerify API
emoji: π
colorFrom: red
colorTo: blue
sdk: docker
app_port: 7860
pinned: false
---
<p align="center">
<img src="frontend/public/logo.svg" alt="PhilVerify Logo" width="150">
</p>
<p align="center">
<em>Multimodal fake news detection for Philippine social media.</em>
</p>
<p align="center">
<img src="https://img.shields.io/badge/Machine_Learning_2-Final_Project-blue?style=flat-square" alt="Project Status">
<img src="https://img.shields.io/badge/Python-3.12-blue?style=flat-square&logo=python" alt="Python">
<img src="https://img.shields.io/badge/FastAPI-0.115-009688?style=flat-square&logo=fastapi" alt="FastAPI">
<img src="https://img.shields.io/badge/React-18-61DAFB?style=flat-square&logo=react" alt="React">
<img src="https://img.shields.io/badge/License-MIT-yellow?style=flat-square" alt="License">
</p>
---
## ✨ Features
- **🤖 Multimodal Detection**: Verify raw text, news URLs, images (Tesseract OCR), and video/audio (Whisper ASR)
- **🇵🇭 Language-Aware**: Handles Tagalog, English, and Taglish (code-switched) content
- **🧠 Advanced NLP Pipeline**: Real-time entity recognition, sentiment/emotion analysis, and clickbait detection
- **⚖️ Two-Layer Scoring**: Combines ML classification (TF-IDF/RoBERTa) with NewsAPI evidence retrieval
- **🛡️ PH-Domain Verification**: Integrated database of Philippine news domain credibility tiers
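
To make the two-layer idea concrete, here is a minimal sketch of how a classifier probability and an evidence score could be blended into one credibility score. The names `LayerScores` and `combine_scores`, the 0.4 default weight, and the 0 to 100 scale are all assumptions made for illustration; the actual logic lives in `scoring/engine.py` and may differ.

```python
from dataclasses import dataclass


@dataclass
class LayerScores:
    """Hypothetical container for the two layer outputs (not the project's schema)."""
    ml_fake_probability: float  # Layer 1: classifier's P(fake), in [0, 1]
    evidence_support: float     # Layer 2: support from retrieved articles, in [0, 1]


def combine_scores(scores: LayerScores, ml_weight: float = 0.4) -> float:
    """Blend both layers into a credibility score in [0, 100] (higher = more credible).

    The weighting scheme is an assumption for illustration only.
    """
    if not 0.0 <= ml_weight <= 1.0:
        raise ValueError("ml_weight must be between 0 and 1")
    # Convert P(fake) into a credibility value so both terms point the same way.
    ml_credibility = 1.0 - scores.ml_fake_probability
    blended = ml_weight * ml_credibility + (1.0 - ml_weight) * scores.evidence_support
    return round(100.0 * blended, 1)
```

With these assumed weights, a claim the classifier considers unlikely to be fake and that is well supported by retrieved articles ends up near the top of the scale, while weak evidence pulls the score down even when the classifier is confident.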
---
## Quick Start
### Prerequisites
1. **Python 3.12+**
2. **Tesseract OCR** (on macOS: `brew install tesseract`)
3. **Node.js** (for frontend development)
### Installation
```bash
# Clone the repository
git clone https://github.com/SemiAutomat1c/philverify.git
cd philverify
# Set up Backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Set up Frontend
cd frontend
npm install
```
### Run
```bash
# Backend (from project root)
uvicorn main:app --reload --port 8000
# Frontend
cd frontend
npm run dev
```
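
Once the backend is running, a quick way to exercise it is a small client script. The sketch below uses only the Python standard library and targets the `POST /verify/text` route listed in the project structure. The JSON body shape (`{"text": ...}`) and the helper names are assumptions; check `api/schemas.py` for the real request model, and adjust the path if the router is mounted under a prefix.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # port used by the uvicorn command above


def build_verify_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the text verification route.

    The {"text": ...} payload is an assumed shape, not confirmed by the project.
    """
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/verify/text",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def verify_text(text: str, timeout: float = 30.0) -> dict:
    """Send the request to a running backend and return the decoded JSON response."""
    with urllib.request.urlopen(build_verify_request(text), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `verify_text("some claim to check")` with the server running should return whatever response model the API defines. FastAPI also serves interactive docs at `/docs` by default, which is the easiest place to confirm the real field names.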
---
## 🛠️ Tech Stack
| Component | Technology |
|-----------|------------|
| **Core Backend** | Python 3.12, FastAPI, Pydantic v2 |
| **NLP Engine** | spaCy, HuggingFace Transformers, langdetect |
| **ML Classification** | scikit-learn (TF-IDF + LogReg), XLM-RoBERTa |
| **OCR / ASR** | Tesseract (PH+EN support), OpenAI Whisper |
| **Frontend** | React, TailwindCSS, Chart.js, Vite |
---
## Project Structure
```
PhilVerify/
├── main.py                      # FastAPI app entry point
├── config.py                    # Settings (pydantic-settings)
├── requirements.txt
├── .env.example
├── domain_credibility.json      # PH domain tier database
│
├── api/
│   ├── schemas.py               # Pydantic request/response models
│   └── routes/
│       ├── verify.py            # POST /verify/text|url|image|video
│       ├── history.py           # GET /history
│       └── trends.py            # GET /trends
│
├── nlp/                         # NLP preprocessing pipeline
│   ├── preprocessor.py          # Clean, tokenize, remove stopwords (EN+TL)
│   ├── language_detector.py     # Tagalog / English / Taglish detection
│   ├── ner.py                   # Named entity recognition + PH entity hints
│   ├── sentiment.py             # Sentiment + emotion analysis
│   ├── clickbait.py             # Clickbait pattern detection
│   └── claim_extractor.py       # Extract falsifiable claim for evidence search
│
├── ml/
│   └── tfidf_classifier.py      # Layer 1: TF-IDF baseline classifier
│
├── evidence/
│   └── news_fetcher.py          # Layer 2: NewsAPI + cosine similarity
│
├── scoring/
│   └── engine.py                # Orchestrates full pipeline + final score
│
├── inputs/
│   ├── url_scraper.py           # BeautifulSoup article extractor
│   ├── ocr.py                   # Tesseract OCR
│   └── asr.py                   # Whisper ASR
│
└── tests/
    └── test_philverify.py       # 23 unit + integration tests
```
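
The Layer 2 step in `evidence/news_fetcher.py` is described as NewsAPI retrieval plus cosine similarity. As a rough illustration of the similarity part only, the sketch below ranks candidate headlines against a claim using bag-of-words cosine similarity in pure Python. The function names are hypothetical, and the real module may well use TF-IDF or embedding vectors instead of raw term counts.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on word characters (Unicode-aware, so 'ñ' survives)."""
    return re.findall(r"\w+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the term-count vectors of two texts, in [0, 1]."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


def rank_articles(claim: str, headlines: list[str]) -> list[tuple[float, str]]:
    """Return (similarity, headline) pairs, most similar first."""
    scored = [(cosine_similarity(claim, h), h) for h in headlines]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Raw term counts are enough to show the ranking idea, but they treat every word as equally informative. Weighting terms by inverse document frequency, as the TF-IDF classifier in Layer 1 does, would reduce the influence of common words.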
---
## 📅 Roadmap
- [x] Phase 1: FastAPI backend skeleton
- [x] Phase 2: NLP preprocessing pipeline
- [x] Phase 3: TF-IDF baseline classifier
- [/] Phase 4: NewsAPI evidence retrieval
- [ ] Phase 5: Scoring engine refinement (stance detection)
- [ ] Phase 6: React web dashboard
- [ ] Phase 7: Chrome Extension (Manifest V3)
- [ ] Phase 8: Fine-tune XLM-RoBERTa / TLUnified-RoBERTa
---
## 🤝 Contributing
Contributions welcome! Please feel free to submit a Pull Request.
---
<p align="center">
<strong>⚠️ Disclaimer</strong><br>
<em>This tool is meant for research and educational purposes. Use responsibly and ethically when verifying information on social media.</em>
</p>
## License
MIT