Ryan Christian D. Deniega committed on
Commit f934e95 · 1 Parent(s): 3fac962

docs: redesign README with premium style

Files changed (1)
  1. README.md +77 -105
README.md CHANGED
@@ -1,42 +1,80 @@
- # PhilVerify 🇵🇭🔍

- **Machine learning 2 final project**

- **Multimodal fake news detection for Philippine social media.**

- PhilVerify combines ML-based text classification with evidence retrieval to detect misinformation in Tagalog, English, and Taglish content. It supports text, URL, image (OCR), and video (ASR) inputs.

  ---

- ## Features

- - **4 Input Types** — raw text, news URL, image (Tesseract OCR), video/audio (Whisper ASR)
- - **Language-Aware** — detects Tagalog / English / Taglish automatically
- - **NLP Pipeline** — NER, sentiment, emotion, clickbait detection, claim extraction
- - **Two-Layer Scoring**
-   - Layer 1: TF-IDF + Logistic Regression classifier (→ fine-tuned XLM-RoBERTa)
-   - Layer 2: NewsAPI evidence retrieval + cosine similarity + stance detection
- - **Final Score** = `(ML × 0.40) + (Evidence × 0.60)` → Credible / Unverified / Likely Fake
- - **Philippine Domain Credibility DB** — 4-tier system (Rappler Tier 1 → known fake sites Tier 4)

  ---

- ## Tech Stack

- | Layer | Tech |
- |---|---|
- | Backend | FastAPI, Python 3.12, Pydantic v2 |
- | NLP | spaCy, HuggingFace Transformers, langdetect |
- | ML Classifier | scikit-learn (TF-IDF + LogReg → XLM-RoBERTa) |
- | OCR | Tesseract (`fil+eng`) |
- | ASR | OpenAI Whisper |
- | Evidence | NewsAPI, sentence-transformers |
- | Frontend *(planned)* | React, TailwindCSS, Chart.js |
- | Extension *(planned)* | Chrome Manifest V3 |

  ---

- ## Project Structure

  ```
  PhilVerify/
@@ -81,89 +119,12 @@ PhilVerify/

  ---

- ## Getting Started
-
- ### 1. Clone & set up environment
-
- ```bash
- git clone https://github.com/SemiAutomat1c/philverify.git
- cd philverify
- python3 -m venv venv
- source venv/bin/activate
- pip install -r requirements.txt
- ```
-
- ### 2. Configure environment variables
-
- ```bash
- cp .env.example .env
- # Edit .env and add your NEWS_API_KEY (optional but recommended)
- ```
-
- ### 3. Run the API
-
- ```bash
- uvicorn main:app --reload --port 8000
- ```
-
- ### 4. Explore the docs
-
- Open **http://localhost:8000/docs** for the interactive Swagger UI.
-
- ---
-
- ## API Endpoints
-
- | Method | Endpoint | Description |
- |---|---|---|
- | `POST` | `/verify/text` | Verify raw text |
- | `POST` | `/verify/url` | Verify a news URL |
- | `POST` | `/verify/image` | Verify an image (OCR) |
- | `POST` | `/verify/video` | Verify audio/video (Whisper ASR) |
- | `GET` | `/history` | Verification history (paginated) |
- | `GET` | `/trends` | Trending fake-news entities & topics |
-
- ### Example request
-
- ```bash
- curl -X POST http://localhost:8000/verify/text \
-   -H "Content-Type: application/json" \
-   -d '{"text": "GRABE! Namatay daw ang tatlong tao sa bagong sakit na kumakalat sa Pilipinas!"}'
- ```
-
- ### Example response
-
- ```json
- {
-   "verdict": "Likely Fake",
-   "confidence": 82.4,
-   "final_score": 34.2,
-   "layer1": { "verdict": "Likely Fake", "confidence": 82.4, "triggered_features": ["namatay", "sakit", "kumakalat"] },
-   "layer2": { "verdict": "Unverified", "evidence_score": 50.0, "sources": [] },
-   "entities": { "persons": [], "organizations": [], "locations": ["Pilipinas"], "dates": [] },
-   "sentiment": "high negative",
-   "emotion": "fear",
-   "language": "Tagalog"
- }
- ```
-
- ---
-
- ## Running Tests
-
- ```bash
- pytest tests/ -v
- # 23 passed in ~1s
- ```
-
- ---
-
- ## Roadmap

  - [x] Phase 1 — FastAPI backend skeleton
  - [x] Phase 2 — NLP preprocessing pipeline
  - [x] Phase 3 — TF-IDF baseline classifier
- - [ ] Phase 4 — NewsAPI evidence retrieval
  - [ ] Phase 5 — Scoring engine refinement (stance detection)
  - [ ] Phase 6 — React web dashboard
  - [ ] Phase 7 — Chrome Extension (Manifest V3)
@@ -171,6 +132,17 @@ pytest tests/ -v

  ---

- ## License

  MIT
 
+ <p align="center">
+ <img src="frontend/public/logo.svg" alt="PhilVerify Logo" width="150">
+ </p>
+ <p align="center">
+ <em>Multimodal fake news detection for Philippine social media.</em>
+ </p>
+ <p align="center">
+ <img src="https://img.shields.io/badge/Machine_Learning_2-Final_Project-blue?style=flat-square" alt="Project Status">
+ <img src="https://img.shields.io/badge/Python-3.12-blue?style=flat-square&logo=python" alt="Python">
+ <img src="https://img.shields.io/badge/FastAPI-0.115-009688?style=flat-square&logo=fastapi" alt="FastAPI">
+ <img src="https://img.shields.io/badge/React-18-61DAFB?style=flat-square&logo=react" alt="React">
+ <img src="https://img.shields.io/badge/License-MIT-yellow?style=flat-square" alt="License">
+ </p>

+ ---

+ ## ✨ Features

+ - **🎤 Multimodal Detection** — Verify raw text, news URLs, images (Tesseract OCR), and video/audio (Whisper ASR)
+ - **🇵🇭 Language-Aware** — Seamlessly handles Tagalog, English, and Taglish content
+ - **🧠 Advanced NLP Pipeline** — Real-time entity recognition, sentiment/emotion analysis, and clickbait detection
+ - **⚖️ Two-Layer Scoring** — Combines ML classification (TF-IDF/RoBERTa) with NewsAPI evidence retrieval
+ - **🛡️ PH-Domain Verification** — Integrated database of Philippine news domain credibility tiers

  ---

+ ## 🚀 Quick Start
+
+ ### Prerequisites
+
+ 1. **Python 3.12+**
+ 2. **Tesseract OCR** (`brew install tesseract`)
+ 3. **Node.js** (for frontend development)
+
+ ### Installation
+
+ ```bash
+ # Clone the repository
+ git clone https://github.com/SemiAutomat1c/philverify.git
+ cd philverify
+
+ # Set up Backend
+ python3 -m venv venv
+ source venv/bin/activate
+ pip install -r requirements.txt
+
+ # Set up Frontend
+ cd frontend
+ npm install
+ ```
+
+ ### Run
+
+ ```bash
+ # Backend (from project root)
+ uvicorn main:app --reload --port 8000

+ # Frontend
+ cd frontend
+ npm run dev
+ ```

  ---

+ ## 🛠️ Tech Stack

+ | Component | Technology |
+ |-----------|------------|
+ | **Core Backend** | Python 3.12, FastAPI, Pydantic v2 |
+ | **NLP Engine** | spaCy, HuggingFace Transformers, langdetect |
+ | **ML Classification** | scikit-learn (TF-IDF + LogReg), XLM-RoBERTa |
+ | **OCR / ASR** | Tesseract (PH+EN support), OpenAI Whisper |
+ | **Frontend** | React, TailwindCSS, Chart.js, Vite |

  ---

+ ## 📁 Project Structure

  ```
  PhilVerify/
 

  ---

+ ## 📅 Roadmap

  - [x] Phase 1 — FastAPI backend skeleton
  - [x] Phase 2 — NLP preprocessing pipeline
  - [x] Phase 3 — TF-IDF baseline classifier
+ - [/] Phase 4 — NewsAPI evidence retrieval
  - [ ] Phase 5 — Scoring engine refinement (stance detection)
  - [ ] Phase 6 — React web dashboard
  - [ ] Phase 7 — Chrome Extension (Manifest V3)
 

  ---

+ ## 🤝 Contributing
+
+ Contributions welcome! Please feel free to submit a Pull Request.
+
+ ---
+
+ <p align="center">
+ <strong>⚠️ Disclaimer</strong><br>
+ <em>This tool is meant for research and educational purposes. Use responsibly and ethically when verifying information on social media.</em>
+ </p>
+
+ ## 📝 License

  MIT
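
The README text removed by this commit gave the final score as `(ML × 0.40) + (Evidence × 0.60)`, mapped to Credible / Unverified / Likely Fake. A minimal Python sketch of that blend might look like the following. Only the two weights and the three verdict labels come from the diff; the 0-100 scale is inferred from the example response, and the function names and the cut-offs (70 and 40) are hypothetical, since the diff does not state them.

```python
# Illustrative sketch only: the weights are taken from the removed README text,
# everything else (names, 0-100 scale, thresholds) is an assumption.
ML_WEIGHT = 0.40
EVIDENCE_WEIGHT = 0.60


def final_score(ml_score: float, evidence_score: float) -> float:
    """Blend the Layer 1 (ML) and Layer 2 (evidence) scores, each assumed to be 0-100."""
    for name, value in (("ml_score", ml_score), ("evidence_score", evidence_score)):
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} must be within 0-100, got {value}")
    return ml_score * ML_WEIGHT + evidence_score * EVIDENCE_WEIGHT


def verdict(score: float, credible_at: float = 70.0, fake_below: float = 40.0) -> str:
    """Map a blended score to a label; both thresholds are hypothetical placeholders."""
    if score >= credible_at:
        return "Credible"
    if score < fake_below:
        return "Likely Fake"
    return "Unverified"


if __name__ == "__main__":
    score = final_score(ml_score=20.0, evidence_score=50.0)
    print(round(score, 1), verdict(score))
```

Keeping the weights and thresholds as named constants and keyword arguments makes it easy to adjust them once the project's actual scoring engine (Phase 5 in the roadmap) settles on real values.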