---
title: PhilVerify API
emoji: 🔍
colorFrom: red
colorTo: blue
sdk: docker
app_port: 7860
pinned: false
---

<p align="center">
  <img src="frontend/public/logo.svg" alt="PhilVerify Logo" width="150">
</p>
<p align="center">
  <em>Multimodal fake news detection for Philippine social media.</em>
</p>
<p align="center">
  <img src="https://img.shields.io/badge/Machine_Learning_2-Final_Project-blue?style=flat-square" alt="Project Status">
  <img src="https://img.shields.io/badge/Python-3.12-blue?style=flat-square&logo=python" alt="Python">
  <img src="https://img.shields.io/badge/FastAPI-0.115-009688?style=flat-square&logo=fastapi" alt="FastAPI">
  <img src="https://img.shields.io/badge/React-18-61DAFB?style=flat-square&logo=react" alt="React">
  <img src="https://img.shields.io/badge/License-MIT-yellow?style=flat-square" alt="License">
</p>

---

## ✨ Features

- **🎤 Multimodal Detection**: Verify raw text, news URLs, images (Tesseract OCR), and video/audio (Whisper ASR)
- **🇵🇭 Language-Aware**: Seamlessly handles Tagalog, English, and Taglish content
- **🧠 Advanced NLP Pipeline**: Real-time entity recognition, sentiment/emotion analysis, and clickbait detection
- **⚖️ Two-Layer Scoring**: Combines ML classification (TF-IDF/RoBERTa) with NewsAPI evidence retrieval
- **🛡️ PH-Domain Verification**: Integrated database of Philippine news domain credibility tiers
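
As a rough illustration of the two-layer scoring idea, the sketch below blends a classifier probability with an evidence-support score. The function name `combine_scores`, the 0-to-1 input ranges, the 0-to-100 output scale, and the 0.4/0.6 weighting are all assumptions made for this example; they are not taken from the PhilVerify code.

```python
def combine_scores(ml_credibility: float, evidence_support: float,
                   ml_weight: float = 0.4) -> float:
    """Blend two scores in [0, 1] into one credibility score in [0, 100].

    Hypothetical helper: the weight and the output scale are illustrative only.
    """
    for name, value in (("ml_credibility", ml_credibility),
                        ("evidence_support", evidence_support),
                        ("ml_weight", ml_weight)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    # Weighted average: the remaining weight goes to the evidence layer.
    blended = ml_weight * ml_credibility + (1.0 - ml_weight) * evidence_support
    return round(blended * 100, 1)


if __name__ == "__main__":
    # Classifier leans "credible", but little supporting evidence was found.
    print(combine_scores(ml_credibility=0.8, evidence_support=0.3))
```

Weighting the evidence layer more heavily than the classifier is one possible design choice; the real scoring engine may combine the layers differently.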

---

## 🚀 Quick Start

### Prerequisites

1. **Python 3.12+**
2. **Tesseract OCR** (macOS: `brew install tesseract`; Debian/Ubuntu: `sudo apt install tesseract-ocr`)
3. **Node.js** (for frontend development)

### Installation

```bash
# Clone the repository
git clone https://github.com/SemiAutomat1c/philverify.git
cd philverify

# Set up Backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Set up Frontend
cd frontend
npm install
```

### Run

```bash
# Backend (from project root)
uvicorn main:app --reload --port 8000

# Frontend
cd frontend
npm run dev
```
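
Once the backend is running, a text check could be sent from a small standard-library client such as the sketch below. The `/verify/text` route and port 8000 come from this README; the JSON field name `text` and the shape of the response are assumptions, so check `api/schemas.py` for the actual request model.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # matches the uvicorn command above


def build_verify_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for /verify/text.

    Assumption: the endpoint accepts a JSON body with a "text" field.
    """
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/verify/text",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def verify_text(text: str) -> dict:
    """Send the request and decode the JSON response (needs a running backend)."""
    with urllib.request.urlopen(build_verify_request(text), timeout=30) as response:
        return json.load(response)
```

With the server up, `verify_text("Libreng bigas para sa lahat simula bukas!")` would return whatever JSON the endpoint produces.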

---

## 🛠️ Tech Stack

| Component | Technology |
|-----------|------------|
| **Core Backend** | Python 3.12, FastAPI, Pydantic v2 |
| **NLP Engine** | spaCy, HuggingFace Transformers, langdetect |
| **ML Classification** | scikit-learn (TF-IDF + LogReg), XLM-RoBERTa |
| **OCR / ASR** | Tesseract (PH+EN support), OpenAI Whisper |
| **Frontend** | React, TailwindCSS, Chart.js, Vite |

---

## 📁 Project Structure

```
PhilVerify/
├── main.py                  # FastAPI app entry point
├── config.py                # Settings (pydantic-settings)
├── requirements.txt
├── .env.example
├── domain_credibility.json  # PH domain tier database
│
├── api/
│   ├── schemas.py           # Pydantic request/response models
│   └── routes/
│       ├── verify.py        # POST /verify/text|url|image|video
│       ├── history.py       # GET /history
│       └── trends.py        # GET /trends
│
├── nlp/                     # NLP preprocessing pipeline
│   ├── preprocessor.py      # Clean, tokenize, remove stopwords (EN+TL)
│   ├── language_detector.py # Tagalog / English / Taglish detection
│   ├── ner.py               # Named entity recognition + PH entity hints
│   ├── sentiment.py         # Sentiment + emotion analysis
│   ├── clickbait.py         # Clickbait pattern detection
│   └── claim_extractor.py   # Extract falsifiable claim for evidence search
│
├── ml/
│   └── tfidf_classifier.py  # Layer 1: TF-IDF baseline classifier
│
├── evidence/
│   └── news_fetcher.py      # Layer 2: NewsAPI + cosine similarity
│
├── scoring/
│   └── engine.py            # Orchestrates full pipeline + final score
│
├── inputs/
│   ├── url_scraper.py       # BeautifulSoup article extractor
│   ├── ocr.py               # Tesseract OCR
│   └── asr.py               # Whisper ASR
│
└── tests/
    └── test_philverify.py   # 23 unit + integration tests
```
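
The evidence layer is described as NewsAPI plus cosine similarity. The sketch below shows only the similarity step, using plain bag-of-words counts and no third-party packages. The function names and the simple regex tokenizer are assumptions for illustration; the actual `news_fetcher.py` may well use TF-IDF vectors or embeddings instead.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of letters (a deliberately simple tokenizer)."""
    return re.findall(r"[a-zñ]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words count vectors of two texts."""
    counts_a, counts_b = Counter(tokenize(a)), Counter(tokenize(b))
    # Only tokens present in both texts contribute to the dot product.
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_articles(claim: str, articles: list[str]) -> list[tuple[float, str]]:
    """Sort candidate article texts by similarity to the claim, best first."""
    return sorted(((cosine_similarity(claim, art), art) for art in articles),
                  reverse=True)


if __name__ == "__main__":
    claim = "Tumaas ang presyo ng bigas"
    candidates = ["Presyo ng bigas tumaas ngayong linggo", "Basketball finals recap"]
    for score, article in rank_articles(claim, candidates):
        print(f"{score:.2f}  {article}")
```

Raw counts are the simplest possible representation; they ignore word order and give common words as much weight as rare ones, which is why a TF-IDF weighting is a common next step.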

---

## 📅 Roadmap

- [x] Phase 1: FastAPI backend skeleton
- [x] Phase 2: NLP preprocessing pipeline
- [x] Phase 3: TF-IDF baseline classifier
- [/] Phase 4: NewsAPI evidence retrieval
- [ ] Phase 5: Scoring engine refinement (stance detection)
- [ ] Phase 6: React web dashboard
- [ ] Phase 7: Chrome Extension (Manifest V3)
- [ ] Phase 8: Fine-tune XLM-RoBERTa / TLUnified-RoBERTa

---

## 🤝 Contributing

Contributions welcome! Please feel free to submit a Pull Request.

---

<p align="center">
  <strong>⚠️ Disclaimer</strong><br>
  <em>This tool is meant for research and educational purposes. Use responsibly and ethically when verifying information on social media.</em>
</p>

## 📝 License

MIT