Ryan Christian D. Deniega committed on
Commit
70fdb2e
·
1 Parent(s): aef3106

docs: update README with current deployment, features, and project structure

Files changed (1)
  1. README.md +74 -26
README.md CHANGED
@@ -21,26 +21,44 @@ pinned: false
  <img src="https://img.shields.io/badge/React-18-61DAFB?style=flat-square&logo=react" alt="React">
  <img src="https://img.shields.io/badge/License-MIT-yellow?style=flat-square" alt="License">
  </p>

  ---

  ## ✨ Features

- - **🎤 Multimodal Detection** - Verify raw text, news URLs, images (Tesseract OCR), and video/audio (Whisper ASR)
  - **🇵🇭 Language-Aware** - Seamlessly handles Tagalog, English, and Taglish content
  - **🧠 Advanced NLP Pipeline** - Real-time entity recognition, sentiment/emotion analysis, and clickbait detection
- - **⚖️ Two-Layer Scoring** - Combines ML classification (TF-IDF/RoBERTa) with NewsAPI evidence retrieval
  - **🛡️ PH-Domain Verification** - Integrated database of Philippine news domain credibility tiers

  ---

- ## 🚀 Quick Start

  ### Prerequisites

  1. **Python 3.12+**
- 2. **Tesseract OCR** (`brew install tesseract`)
- 3. **Node.js** (for frontend development)

  ### Installation

@@ -49,12 +67,12 @@ pinned: false
  git clone https://github.com/SemiAutomat1c/philverify.git
  cd philverify

- # Set up Backend
  python3 -m venv venv
  source venv/bin/activate
  pip install -r requirements.txt

- # Set up Frontend
  cd frontend
  npm install
  ```
@@ -62,14 +80,30 @@ npm install
  ### Run

  ```bash
- # Backend (from project root)
  uvicorn main:app --reload --port 8000

- # Frontend
  cd frontend
  npm run dev
  ```

  ---

  ## 🛠️ Tech Stack
@@ -78,9 +112,13 @@ npm run dev
  |-----------|------------|
  | **Core Backend** | Python 3.12, FastAPI, Pydantic v2 |
  | **NLP Engine** | spaCy, HuggingFace Transformers, langdetect |
- | **ML Classification** | scikit-learn (TF-IDF + LogReg), XLM-RoBERTa |
- | **OCR / ASR** | Tesseract (PH+EN support), OpenAI Whisper |
- | **Frontend** | React, TailwindCSS, Chart.js, Vite |

  ---

@@ -88,18 +126,18 @@ npm run dev

  ```
  PhilVerify/
- ├── main.py  # FastAPI app entry point
  ├── config.py  # Settings (pydantic-settings)
  ├── requirements.txt
- ├── .env.example
- ├── domain_credibility.json  # PH domain tier database
  │
  ├── api/
  │   ├── schemas.py  # Pydantic request/response models
  │   └── routes/
- │       ├── verify.py  # POST /verify/text|url|image|video
- │       ├── history.py  # GET /history
- │       └── trends.py  # GET /trends
  │
  ├── nlp/  # NLP preprocessing pipeline
  │   ├── preprocessor.py  # Clean, tokenize, remove stopwords (EN+TL)
@@ -120,11 +158,19 @@ PhilVerify/
  │
  ├── inputs/
  │   ├── url_scraper.py  # BeautifulSoup article extractor
- │   ├── ocr.py  # Tesseract OCR
- │   └── asr.py  # Whisper ASR
  │
  └── tests/
-     └── test_philverify.py  # 23 unit + integration tests
  ```

  ---
@@ -134,11 +180,13 @@ PhilVerify/
  - [x] Phase 1 - FastAPI backend skeleton
  - [x] Phase 2 - NLP preprocessing pipeline
  - [x] Phase 3 - TF-IDF baseline classifier
- - [/] Phase 4 - NewsAPI evidence retrieval
- - [ ] Phase 5 - Scoring engine refinement (stance detection)
- - [ ] Phase 6 - React web dashboard
- - [ ] Phase 7 - Chrome Extension (Manifest V3)
- - [ ] Phase 8 - Fine-tune XLM-RoBERTa / TLUnified-RoBERTa

  ---

 
  <img src="https://img.shields.io/badge/React-18-61DAFB?style=flat-square&logo=react" alt="React">
  <img src="https://img.shields.io/badge/License-MIT-yellow?style=flat-square" alt="License">
  </p>
+ <p align="center">
+ <a href="https://philverify.web.app"><strong>🌐 Live Demo</strong></a> &nbsp;•&nbsp;
+ <a href="https://semiautomat1c-philverify-api.hf.space/docs"><strong>📖 API Docs</strong></a>
+ </p>

  ---

  ## ✨ Features

+ - **🎤 Multimodal Detection** - Verify raw text, news URLs, images, and video/audio
+ - **🖼️ Image OCR** - Extract and analyze text from screenshots and images (Tesseract fil+eng)
+ - **🎬 Video Frame OCR** - Extract on-screen text from video frames alongside Whisper speech transcription
+ - **🔊 Speech Transcription** - Transcribe audio/video content using OpenAI Whisper
  - **🇵🇭 Language-Aware** - Seamlessly handles Tagalog, English, and Taglish content
  - **🧠 Advanced NLP Pipeline** - Real-time entity recognition, sentiment/emotion analysis, and clickbait detection
+ - **⚖️ Two-Layer Scoring** - Combines ML classification (TF-IDF) with NewsAPI evidence retrieval
  - **🛡️ PH-Domain Verification** - Integrated database of Philippine news domain credibility tiers

  ---

+ ## 🚀 Deployment
+
+ | Service | Platform | URL |
+ |---------|----------|-----|
+ | **Frontend** | Firebase Hosting | https://philverify.web.app |
+ | **Backend API** | Hugging Face Spaces (Docker) | https://semiautomat1c-philverify-api.hf.space |
+ | **API Docs** | Swagger UI (auto-generated) | https://semiautomat1c-philverify-api.hf.space/docs |
+
+ ---
+
+ ## 🖥️ Local Development

  ### Prerequisites

  1. **Python 3.12+**
+ 2. **Tesseract OCR** - `brew install tesseract tesseract-lang`
+ 3. **ffmpeg** - `brew install ffmpeg` (required for video frame extraction)
+ 4. **Node.js 18+** (for frontend)

  ### Installation

 
  git clone https://github.com/SemiAutomat1c/philverify.git
  cd philverify

+ # Set up backend
  python3 -m venv venv
  source venv/bin/activate
  pip install -r requirements.txt

+ # Set up frontend
  cd frontend
  npm install
  ```
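
Before running the installation steps, the prerequisites listed in the README can be checked from Python. The following is a minimal sketch, not part of the repository: the executable names (`tesseract`, `ffmpeg`, `node`) are assumptions about how the tools appear on `PATH`, and the helper names `python_ok` and `missing_tools` are hypothetical.

```python
import shutil
import sys

# Tools named in the Prerequisites list. The executable names are assumptions
# (for example, Node.js is assumed to be installed as `node`).
REQUIRED_TOOLS = ("tesseract", "ffmpeg", "node")
MIN_PYTHON = (3, 12)


def python_ok(version=None):
    """Return True if the interpreter meets the documented minimum (3.12)."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= MIN_PYTHON


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    if not python_ok():
        print("Python 3.12+ is required")
    for tool in missing_tools():
        print(f"Missing prerequisite: {tool}")
```

The check only confirms that each executable is discoverable; it does not verify versions (such as Node.js 18+) or that the Tesseract language packs are installed.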
 
  ### Run

  ```bash
+ # Backend (from project root, with venv active)
  uvicorn main:app --reload --port 8000

+ # Frontend (in a separate terminal)
  cd frontend
  npm run dev
  ```

+ The frontend dev server proxies `/api` requests to `http://localhost:8000` automatically.
+
+ ### Environment Variables
+
+ Copy `.env.example` to `.env` and fill in your keys:
+
+ ```
+ NEWS_API_KEY=your_newsapi_key
+ FIREBASE_PROJECT_ID=your_project_id
+ ```
+
+ For frontend production builds, set `VITE_API_BASE_URL` in `frontend/.env.production`:
+ ```
+ VITE_API_BASE_URL=https://your-hf-space.hf.space/api
+ ```
+
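
The two backend variables above could be validated at startup along these lines. This is a standard-library sketch only: the project tree indicates that `config.py` uses pydantic-settings, so the `Settings` dataclass and `load_settings` helper here are hypothetical stand-ins, and only the variable names `NEWS_API_KEY` and `FIREBASE_PROJECT_ID` come from the README.

```python
import os
from dataclasses import dataclass

# Variable names taken from the README's .env example.
REQUIRED_VARS = ("NEWS_API_KEY", "FIREBASE_PROJECT_ID")


@dataclass(frozen=True)
class Settings:
    """Hypothetical container; the real project uses pydantic-settings."""
    news_api_key: str
    firebase_project_id: str


def load_settings(env=None):
    """Read the required variables, failing fast if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        news_api_key=env["NEWS_API_KEY"],
        firebase_project_id=env["FIREBASE_PROJECT_ID"],
    )
```

Passing a plain dictionary as `env` makes the function easy to exercise in tests without touching the real process environment.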
  ---

  ## 🛠️ Tech Stack
 
  |-----------|------------|
  | **Core Backend** | Python 3.12, FastAPI, Pydantic v2 |
  | **NLP Engine** | spaCy, HuggingFace Transformers, langdetect |
+ | **ML Classification** | scikit-learn (TF-IDF + Logistic Regression) |
+ | **OCR** | Tesseract (fil+eng), pytesseract, Pillow |
+ | **ASR** | OpenAI Whisper (base model) |
+ | **Video Processing** | ffmpeg (frame extraction), asyncio parallel pipeline |
+ | **Frontend** | React 18, TailwindCSS, Chart.js, Vite 7 |
+ | **Backend Hosting** | Hugging Face Spaces (Docker SDK, port 7860) |
+ | **Frontend Hosting** | Firebase Hosting |

  ---
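
The "Video Processing" row describes frame extraction with ffmpeg and a parallel asyncio pipeline. One way this might look is sketched below; it is not the repository's code. The function names, the output file pattern, and the idea of passing the OCR and ASR steps in as callables are all assumptions made so the example runs without ffmpeg, Tesseract, or Whisper installed. The ffmpeg arguments use the standard `fps` video filter.

```python
import asyncio


def ffmpeg_frame_command(video_path, out_dir, fps=1.0):
    """Build an ffmpeg command that samples `fps` frames per second as PNGs.

    The output pattern is a hypothetical choice, not taken from the project.
    """
    return [
        "ffmpeg", "-i", video_path,
        "-vf", f"fps={fps}",
        f"{out_dir}/frame_%04d.png",
    ]


async def transcribe_video(video_path, ocr_frames, transcribe_audio):
    """Run frame OCR and speech transcription concurrently.

    `ocr_frames` and `transcribe_audio` are blocking callables supplied by the
    caller (stand-ins for Tesseract and Whisper); each runs in a worker thread.
    """
    ocr_text, speech_text = await asyncio.gather(
        asyncio.to_thread(ocr_frames, video_path),
        asyncio.to_thread(transcribe_audio, video_path),
    )
    combined = f"{speech_text}\n{ocr_text}".strip()
    return {"ocr": ocr_text, "speech": speech_text, "combined": combined}
```

Running the two steps with `asyncio.gather` means the total latency is roughly that of the slower step rather than the sum of both, which is the usual motivation for a parallel pipeline of this kind.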
 
 

  ```
  PhilVerify/
+ ├── main.py  # FastAPI app entry point + health endpoints
  ├── config.py  # Settings (pydantic-settings)
  ├── requirements.txt
+ ├── Dockerfile  # Docker image for HF Spaces (port 7860)
+ ├── domain_credibility.json  # PH news domain credibility tier database
  │
  ├── api/
  │   ├── schemas.py  # Pydantic request/response models
  │   └── routes/
+ │       ├── verify.py  # POST /api/verify - handles text/url/image/video
+ │       ├── history.py  # GET /api/history
+ │       └── trends.py  # GET /api/trends
  │
  ├── nlp/  # NLP preprocessing pipeline
  │   ├── preprocessor.py  # Clean, tokenize, remove stopwords (EN+TL)
 
  │
  ├── inputs/
  │   ├── url_scraper.py  # BeautifulSoup article extractor
+ │   ├── ocr.py  # Tesseract OCR for images
+ │   ├── asr.py  # Whisper ASR + combined video transcription
+ │   └── video_ocr.py  # ffmpeg frame extraction + Tesseract OCR for video
+ │
+ ├── frontend/  # React + Vite frontend
+ │   ├── src/
+ │   │   ├── pages/
+ │   │   │   └── VerifyPage.jsx  # Main fact-check UI (tabs, results, chips)
+ │   │   └── api.js  # API client (supports VITE_API_BASE_URL)
+ │   └── .env.production  # Production API base URL
  │
  └── tests/
+     └── test_philverify.py  # Unit + integration tests
  ```

  ---
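
The tree lists `GET /api/history` and `GET /api/trends`, and the frontend client reads its base URL from `VITE_API_BASE_URL`. A Python counterpart for scripting against the API might be sketched as follows. The environment variable name `PHILVERIFY_API_BASE_URL`, the local default, and both helper names are hypothetical; the response schema is not documented in this diff, so the parsed JSON is returned as-is.

```python
import json
import os
import urllib.request

# Assumption: a local uvicorn server on port 8000 serving routes under /api.
DEFAULT_BASE_URL = "http://localhost:8000/api"


def api_url(route, base_url=None):
    """Join the API base URL and a route without doubling slashes."""
    base = base_url or os.environ.get("PHILVERIFY_API_BASE_URL", DEFAULT_BASE_URL)
    return base.rstrip("/") + "/" + route.lstrip("/")


def get_json(route, base_url=None, timeout=10.0):
    """GET a route such as 'history' or 'trends' and return the parsed JSON."""
    with urllib.request.urlopen(api_url(route, base_url), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `get_json("history", "https://semiautomat1c-philverify-api.hf.space/api")` would target the deployed backend from the Deployment table, assuming the `/api` prefix applies there as it does in the `VITE_API_BASE_URL` example.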
 
  - [x] Phase 1 - FastAPI backend skeleton
  - [x] Phase 2 - NLP preprocessing pipeline
  - [x] Phase 3 - TF-IDF baseline classifier
+ - [x] Phase 4 - NewsAPI evidence retrieval
+ - [x] Phase 5 - React web dashboard with multimodal input
+ - [x] Phase 6 - Deploy to Hugging Face Spaces (backend) + Firebase (frontend)
+ - [x] Phase 7 - Video frame OCR (ffmpeg + Tesseract alongside Whisper ASR)
+ - [ ] Phase 8 - Scoring engine refinement (stance detection)
+ - [ ] Phase 9 - Chrome Extension (Manifest V3)
+ - [ ] Phase 10 - Fine-tune XLM-RoBERTa / TLUnified-RoBERTa

  ---