Space: SemiAutomat1c/philverify-api


Ryan Christian D. Deniega committed on Mar 2

Commit 70fdb2e (1 parent: aef3106)

docs: update README with current deployment, features, and project structure


Files changed (1)

README.md +74 -26

README.md CHANGED

@@ -21,26 +21,44 @@ pinned: false
   <img src="https://img.shields.io/badge/React-18-61DAFB?style=flat-square&logo=react" alt="React">
   <img src="https://img.shields.io/badge/License-MIT-yellow?style=flat-square" alt="License">
 </p>
 ---
 ## ✨ Features
-- **🎤 Multimodal Detection** — Verify raw text, news URLs, images (Tesseract OCR), and video/audio (Whisper ASR)
 - **🇵🇭 Language-Aware** — Seamlessly handles Tagalog, English, and Taglish content
 - **🧠 Advanced NLP Pipeline** — Real-time entity recognition, sentiment/emotion analysis, and clickbait detection
-- **⚖️ Two-Layer Scoring** — Combines ML classification (TF-IDF/RoBERTa) with NewsAPI evidence retrieval
 - **🛡️ PH-Domain Verification** — Integrated database of Philippine news domain credibility tiers
 ---
-## 🚀 Quick Start
 ### Prerequisites
 1. **Python 3.12+**
-2. **Tesseract OCR** (`brew install tesseract`)
-3. **Node.js** (for frontend development)
 ### Installation
@@ -49,12 +67,12 @@ pinned: false
 git clone https://github.com/SemiAutomat1c/philverify.git
 cd philverify
-# Set up Backend
 python3 -m venv venv
 source venv/bin/activate
 pip install -r requirements.txt
-# Set up Frontend
 cd frontend
 npm install
 ```
@@ -62,14 +80,30 @@ npm install
 ### Run
 ```bash
-# Backend (from project root)
 uvicorn main:app --reload --port 8000
-# Frontend
 cd frontend
 npm run dev
 ```
 ---
 ## 🛠️ Tech Stack
@@ -78,9 +112,13 @@ npm run dev
 |-----------|------------|
 | **Core Backend** | Python 3.12, FastAPI, Pydantic v2 |
 | **NLP Engine** | spaCy, HuggingFace Transformers, langdetect |
-| **ML Classification** | scikit-learn (TF-IDF + LogReg), XLM-RoBERTa |
-| **OCR / ASR** | Tesseract (PH+EN support), OpenAI Whisper |
-| **Frontend** | React, TailwindCSS, Chart.js, Vite |
 ---
@@ -88,18 +126,18 @@ npm run dev
 ```
 PhilVerify/
-├── main.py                  # FastAPI app entry point
 ├── config.py                # Settings (pydantic-settings)
 ├── requirements.txt
-├── .env.example
-├── domain_credibility.json  # PH domain tier database
 │
 ├── api/
 │   ├── schemas.py           # Pydantic request/response models
 │   └── routes/
-│       ├── verify.py        # POST /verify/text|url|image|video
-│       ├── history.py       # GET /history
-│       └── trends.py        # GET /trends
 │
 ├── nlp/                     # NLP preprocessing pipeline
 │   ├── preprocessor.py      # Clean, tokenize, remove stopwords (EN+TL)
@@ -120,11 +158,19 @@ PhilVerify/
 │
 ├── inputs/
 │   ├── url_scraper.py       # BeautifulSoup article extractor
-│   ├── ocr.py               # Tesseract OCR
-│   └── asr.py               # Whisper ASR
 │
 └── tests/
-    └── test_philverify.py   # 23 unit + integration tests
 ```
 ---
@@ -134,11 +180,13 @@ PhilVerify/
 - [x] Phase 1 — FastAPI backend skeleton
 - [x] Phase 2 — NLP preprocessing pipeline
 - [x] Phase 3 — TF-IDF baseline classifier
-- [/] Phase 4 — NewsAPI evidence retrieval
-- [ ] Phase 5 — Scoring engine refinement (stance detection)
-- [ ] Phase 6 — React web dashboard
-- [ ] Phase 7 — Chrome Extension (Manifest V3)
-- [ ] Phase 8 — Fine-tune XLM-RoBERTa / TLUnified-RoBERTa
 ---

   <img src="https://img.shields.io/badge/React-18-61DAFB?style=flat-square&logo=react" alt="React">
   <img src="https://img.shields.io/badge/License-MIT-yellow?style=flat-square" alt="License">
 </p>
+<p align="center">
+  <a href="https://philverify.web.app"><strong>🌐 Live Demo</strong></a> &nbsp;•&nbsp;
+  <a href="https://semiautomat1c-philverify-api.hf.space/docs"><strong>📖 API Docs</strong></a>
+</p>
 ---
 ## ✨ Features
+- **🎤 Multimodal Detection** — Verify raw text, news URLs, images, and video/audio
+- **🖼️ Image OCR** — Extract and analyze text from screenshots and images (Tesseract fil+eng)
+- **🎬 Video Frame OCR** — Extract on-screen text from video frames alongside Whisper speech transcription
+- **🔊 Speech Transcription** — Transcribe audio/video content using OpenAI Whisper
 - **🇵🇭 Language-Aware** — Seamlessly handles Tagalog, English, and Taglish content
 - **🧠 Advanced NLP Pipeline** — Real-time entity recognition, sentiment/emotion analysis, and clickbait detection
+- **⚖️ Two-Layer Scoring** — Combines ML classification (TF-IDF) with NewsAPI evidence retrieval
 - **🛡️ PH-Domain Verification** — Integrated database of Philippine news domain credibility tiers
 ---
+## 🚀 Deployment
+| Service | Platform | URL |
+|---------|----------|-----|
+| **Frontend** | Firebase Hosting | https://philverify.web.app |
+| **Backend API** | Hugging Face Spaces (Docker) | https://semiautomat1c-philverify-api.hf.space |
+| **API Docs** | Swagger UI (auto-generated) | https://semiautomat1c-philverify-api.hf.space/docs |
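To call the deployed backend from a script instead of through Swagger UI, a minimal Python sketch follows. It assumes the single `POST /api/verify` route listed in the project structure and the base URL from the table above. The JSON body keys (`input_type`, `content`) and the helper name `build_verify_request` are hypothetical, so check the request schema at `/docs` before relying on them.

```python
import json
import urllib.request

# Deployed backend base URL (from the Deployment table); routes live under /api.
API_BASE = "https://semiautomat1c-philverify-api.hf.space/api"


def build_verify_request(text: str, base: str = API_BASE) -> urllib.request.Request:
    """Build (but do not send) a POST request to the verify endpoint.

    The payload keys are a hypothetical schema used for illustration only.
    """
    payload = {"input_type": "text", "content": text}  # hypothetical field names
    return urllib.request.Request(
        url=f"{base.rstrip('/')}/verify",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_verify_request("Sample headline to fact-check")
    print(req.get_method(), req.full_url)
    # To actually send it (the Space may take a while to wake up):
    # with urllib.request.urlopen(req, timeout=60) as resp:
    #     print(json.load(resp))
```

For local development, pass `base="http://localhost:8000/api"` to target the uvicorn server instead of the Space.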
+---
+## 🖥️ Local Development
 ### Prerequisites
 1. **Python 3.12+**
+2. **Tesseract OCR** — `brew install tesseract tesseract-lang`
+3. **ffmpeg** — `brew install ffmpeg` (required for video frame extraction)
+4. **Node.js 18+** (for frontend)
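A quick way to confirm these prerequisites before installing is sketched below. The helpers are hypothetical and not part of the repository; they rely on `shutil.which` for the executables and on `tesseract --list-langs` printing one installed language code per line. `fil` is Tesseract's code for Filipino, which the `tesseract-lang` formula supplies.

```python
import shutil
import subprocess


def missing_tools(tools=("tesseract", "ffmpeg", "node")) -> list[str]:
    """Return the prerequisite executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def missing_tesseract_langs(list_langs_output: str, needed=("fil", "eng")) -> list[str]:
    """Parse `tesseract --list-langs` output and report absent language packs."""
    # The header line ("List of available languages ...") never matches a code.
    available = {line.strip() for line in list_langs_output.splitlines()}
    return [lang for lang in needed if lang not in available]


if __name__ == "__main__":
    print("missing executables:", missing_tools())
    if shutil.which("tesseract"):
        # Some Tesseract versions print the list to stderr, so check both streams.
        proc = subprocess.run(["tesseract", "--list-langs"], capture_output=True, text=True)
        print("missing language packs:", missing_tesseract_langs(proc.stdout + proc.stderr))
```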
 ### Installation
 git clone https://github.com/SemiAutomat1c/philverify.git
 cd philverify
+# Set up backend
 python3 -m venv venv
 source venv/bin/activate
 pip install -r requirements.txt
+# Set up frontend
 cd frontend
 npm install
 ```
 ### Run
 ```bash
+# Backend (from project root, with venv active)
 uvicorn main:app --reload --port 8000
+# Frontend (in a separate terminal)
 cd frontend
 npm run dev
 ```
+The frontend dev server proxies `/api` requests to `http://localhost:8000` automatically.
+### Environment Variables
+Copy `.env.example` to `.env` and fill in your keys:
+```
+NEWS_API_KEY=your_newsapi_key
+FIREBASE_PROJECT_ID=your_project_id
+```
+For frontend production builds, set `VITE_API_BASE_URL` in `frontend/.env.production`:
+```
+VITE_API_BASE_URL=https://your-hf-space.hf.space/api
+```
 ---
 ## 🛠️ Tech Stack
 |-----------|------------|
 | **Core Backend** | Python 3.12, FastAPI, Pydantic v2 |
 | **NLP Engine** | spaCy, HuggingFace Transformers, langdetect |
+| **ML Classification** | scikit-learn (TF-IDF + Logistic Regression) |
+| **OCR** | Tesseract (fil+eng), pytesseract, Pillow |
+| **ASR** | OpenAI Whisper (base model) |
+| **Video Processing** | ffmpeg (frame extraction), asyncio parallel pipeline |
+| **Frontend** | React 18, TailwindCSS, Chart.js, Vite 7 |
+| **Backend Hosting** | Hugging Face Spaces (Docker SDK, port 7860) |
+| **Frontend Hosting** | Firebase Hosting |
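The Video Processing row pairs ffmpeg frame extraction with Whisper transcription in a parallel asyncio pipeline. The sketch below shows one plausible shape for that, under stated assumptions: `ffmpeg_frame_args` builds a standard `ffmpeg -vf fps=RATE` frame-dump command, while `transcribe_audio` and `ocr_frames` are hypothetical stubs (not the actual functions in `inputs/asr.py` or `inputs/video_ocr.py`) standing in for Whisper and Tesseract.

```python
import asyncio
from pathlib import Path


def ffmpeg_frame_args(video: str, out_dir: str, rate: str = "1") -> list[str]:
    """ffmpeg argv that samples `rate` frames per second (e.g. "1/2") into PNGs."""
    pattern = str(Path(out_dir) / "frame_%04d.png")
    return ["ffmpeg", "-hide_banner", "-loglevel", "error",
            "-i", video, "-vf", f"fps={rate}", pattern]


async def transcribe_audio(video: str) -> str:
    # Hypothetical stub for Whisper ASR; a real version would run the blocking
    # model call in a worker thread (e.g. asyncio.to_thread).
    await asyncio.sleep(0.01)
    return "spoken transcript"


async def ocr_frames(video: str) -> str:
    # Hypothetical stub for frame OCR; a real version would launch
    # ffmpeg_frame_args(...) with asyncio.create_subprocess_exec and then
    # run Tesseract on each extracted frame.
    await asyncio.sleep(0.01)
    return "on-screen text"


async def extract_video_text(video: str) -> str:
    # Run both branches concurrently; gather returns results in argument order.
    transcript, frame_text = await asyncio.gather(
        transcribe_audio(video), ocr_frames(video)
    )
    return "\n".join(part for part in (transcript, frame_text) if part)


if __name__ == "__main__":
    print(ffmpeg_frame_args("clip.mp4", "frames"))
    print(asyncio.run(extract_video_text("clip.mp4")))
```

Running the two branches with `asyncio.gather` means total latency is roughly that of the slower branch rather than the sum of both.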
 ---
 ```
 PhilVerify/
+├── main.py                  # FastAPI app entry point + health endpoints
 ├── config.py                # Settings (pydantic-settings)
 ├── requirements.txt
+├── Dockerfile               # Docker image for HF Spaces (port 7860)
+├── domain_credibility.json  # PH news domain credibility tier database
 │
 ├── api/
 │   ├── schemas.py           # Pydantic request/response models
 │   └── routes/
+│       ├── verify.py        # POST /api/verify — handles text/url/image/video
+│       ├── history.py       # GET /api/history
+│       └── trends.py        # GET /api/trends
 │
 ├── nlp/                     # NLP preprocessing pipeline
 │   ├── preprocessor.py      # Clean, tokenize, remove stopwords (EN+TL)
 │
 ├── inputs/
 │   ├── url_scraper.py       # BeautifulSoup article extractor
+│   ├── ocr.py               # Tesseract OCR for images
+│   ├── asr.py               # Whisper ASR + combined video transcription
+│   └── video_ocr.py         # ffmpeg frame extraction + Tesseract OCR for video
+│
+├── frontend/                # React + Vite frontend
+│   ├── src/
+│   │   ├── pages/
+│   │   │   └── VerifyPage.jsx   # Main fact-check UI (tabs, results, chips)
+│   │   └── api.js               # API client (supports VITE_API_BASE_URL)
+│   └── .env.production          # Production API base URL
 │
 └── tests/
+    └── test_philverify.py   # Unit + integration tests
 ```
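`domain_credibility.json` is described only as a tier database of Philippine news domains. A lookup against it might resemble the sketch below; the JSON layout (tier name mapped to a list of domains), the tier labels, and the placeholder domains are all hypothetical, so adapt the parsing to the file's real structure. Hostnames are lowercased, a leading `www.` is stripped, and parent domains are tried so a subdomain inherits its parent's tier.

```python
import json
from typing import Optional
from urllib.parse import urlparse

# Hypothetical layout with placeholder domains; the real file may differ.
SAMPLE_DB = json.loads("""
{
  "tier1": ["example-news.ph"],
  "tier3": ["example-satire.ph"]
}
""")


def build_index(db: dict[str, list[str]]) -> dict[str, str]:
    """Invert {tier: [domains]} into {domain: tier} for O(1) lookups."""
    return {domain.lower(): tier for tier, domains in db.items() for domain in domains}


def credibility_tier(url: str, index: dict[str, str]) -> Optional[str]:
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    labels = host.split(".")
    # Try the full host first, then each parent domain (news.x.ph -> x.ph),
    # stopping before the bare top-level domain.
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in index:
            return index[candidate]
    return None  # domain not in the database
```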
 ---
 - [x] Phase 1 — FastAPI backend skeleton
 - [x] Phase 2 — NLP preprocessing pipeline
 - [x] Phase 3 — TF-IDF baseline classifier
+- [x] Phase 4 — NewsAPI evidence retrieval
+- [x] Phase 5 — React web dashboard with multimodal input
+- [x] Phase 6 — Deploy to Hugging Face Spaces (backend) + Firebase (frontend)
+- [x] Phase 7 — Video frame OCR (ffmpeg + Tesseract alongside Whisper ASR)
+- [ ] Phase 8 — Scoring engine refinement (stance detection)
+- [ ] Phase 9 — Chrome Extension (Manifest V3)
+- [ ] Phase 10 — Fine-tune XLM-RoBERTa / TLUnified-RoBERTa
 ---