Ryan Christian D. Deniega
docs: update README with current deployment, features, and project structure
70fdb2e metadata
title: PhilVerify API
emoji: π
colorFrom: red
colorTo: blue
sdk: docker
app_port: 7860
pinned: false
Multimodal fake news detection for Philippine social media.
Live Demo • API Docs
Features
- Multimodal Detection: Verify raw text, news URLs, images, and video/audio
- Image OCR: Extract and analyze text from screenshots and images (Tesseract fil+eng)
- Video Frame OCR: Extract on-screen text from video frames alongside Whisper speech transcription
- Speech Transcription: Transcribe audio/video content using OpenAI Whisper
- Language-Aware: Seamlessly handles Tagalog, English, and Taglish content
- Advanced NLP Pipeline: Real-time entity recognition, sentiment/emotion analysis, and clickbait detection
- Two-Layer Scoring: Combines ML classification (TF-IDF) with NewsAPI evidence retrieval
- PH-Domain Verification: Integrated database of Philippine news domain credibility tiers
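The two-layer scoring idea can be pictured as a weighted blend of the classifier output and the evidence score. The sketch below is illustrative only: the function name `combine_scores`, the 0.4/0.6 weighting, and the verdict thresholds are assumptions, not values taken from PhilVerify's scoring engine.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    score: float  # 0 (likely fake) to 100 (likely credible)
    label: str


def combine_scores(ml_credible_prob: float, evidence_score: float,
                   ml_weight: float = 0.4) -> Verdict:
    """Blend a classifier probability with an evidence score (both in [0, 1]).

    The default weight and the thresholds below are hypothetical.
    """
    for name, value in (("ml_credible_prob", ml_credible_prob),
                        ("evidence_score", evidence_score)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")

    # Layer 1 (ML) and Layer 2 (evidence) contribute according to ml_weight.
    blended = ml_weight * ml_credible_prob + (1.0 - ml_weight) * evidence_score
    score = round(blended * 100, 1)

    if score >= 70:
        label = "Credible"
    elif score >= 40:
        label = "Unverified"
    else:
        label = "Likely Fake"
    return Verdict(score=score, label=label)
```

Giving the evidence layer the larger weight reflects one plausible design choice: retrieved news coverage is harder to fool than surface text features. The real engine may weigh the layers differently.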
Deployment
| Service | Platform | URL |
|---|---|---|
| Frontend | Firebase Hosting | https://philverify.web.app |
| Backend API | Hugging Face Spaces (Docker) | https://semiautomat1c-philverify-api.hf.space |
| API Docs | Swagger UI (auto-generated) | https://semiautomat1c-philverify-api.hf.space/docs |
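As a rough illustration of calling the hosted backend, the sketch below builds a JSON `POST` to `/api/verify` using only the standard library. The payload shape `{"text": ...}` and the helper name `build_verify_request` are assumptions; the Swagger UI at `/docs` is the authoritative source for the actual request schema.

```python
import json
import urllib.request

BASE_URL = "https://semiautomat1c-philverify-api.hf.space"


def build_verify_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to /api/verify.

    The {"text": ...} payload is a guess at the schema, not the documented one.
    """
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/verify",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(req, timeout=30)` and the response body decoded with `json.load`.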
Local Development
Prerequisites
- Python 3.12+
- Tesseract OCR: `brew install tesseract tesseract-lang`
- ffmpeg: `brew install ffmpeg` (required for video frame extraction)
- Node.js 20.19+ (for the frontend; Vite 7 no longer supports Node.js 18)
Installation
# Clone the repository
git clone https://github.com/SemiAutomat1c/philverify.git
cd philverify
# Set up backend
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Set up frontend
cd frontend
npm install
Run
# Backend (from project root, with venv active)
uvicorn main:app --reload --port 8000
# Frontend (in a separate terminal)
cd frontend
npm run dev
The frontend dev server proxies /api requests to http://localhost:8000 automatically.
Environment Variables
Copy .env.example to .env and fill in your keys:
NEWS_API_KEY=your_newsapi_key
FIREBASE_PROJECT_ID=your_project_id
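Before starting the backend, it can help to confirm that both keys are actually set. The following is a minimal standard-library sketch; the project itself loads settings through pydantic-settings in config.py, so the helper names `parse_env_file` and `missing_keys` are hypothetical and not part of the codebase.

```python
REQUIRED_KEYS = ("NEWS_API_KEY", "FIREBASE_PROJECT_ID")


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str], required=REQUIRED_KEYS) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

For example, `missing_keys(parse_env_file(open(".env").read()))` returns an empty list when both variables are filled in.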
For frontend production builds, set VITE_API_BASE_URL in frontend/.env.production:
VITE_API_BASE_URL=https://your-hf-space.hf.space/api
Tech Stack
| Component | Technology |
|---|---|
| Core Backend | Python 3.12, FastAPI, Pydantic v2 |
| NLP Engine | spaCy, HuggingFace Transformers, langdetect |
| ML Classification | scikit-learn (TF-IDF + Logistic Regression) |
| OCR | Tesseract (fil+eng), pytesseract, Pillow |
| ASR | OpenAI Whisper (base model) |
| Video Processing | ffmpeg (frame extraction), asyncio parallel pipeline |
| Frontend | React 18, TailwindCSS, Chart.js, Vite 7 |
| Backend Hosting | Hugging Face Spaces (Docker SDK, port 7860) |
| Frontend Hosting | Firebase Hosting |
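The "asyncio parallel pipeline" entry for video processing can be sketched as two blocking steps run concurrently in worker threads. The two functions below are hypothetical stubs standing in for Whisper transcription and ffmpeg + Tesseract frame OCR; the names and the way results are joined are assumptions, not the project's actual code.

```python
import asyncio


def transcribe_speech(video_path: str) -> str:
    """Stand-in for Whisper ASR (hypothetical stub)."""
    return f"[speech from {video_path}]"


def ocr_video_frames(video_path: str) -> str:
    """Stand-in for ffmpeg frame extraction + Tesseract OCR (hypothetical stub)."""
    return f"[on-screen text from {video_path}]"


async def extract_video_text(video_path: str) -> str:
    """Run both blocking steps in worker threads concurrently, then join the text."""
    speech, on_screen = await asyncio.gather(
        asyncio.to_thread(transcribe_speech, video_path),
        asyncio.to_thread(ocr_video_frames, video_path),
    )
    # Skip empty results so a silent or text-free video still yields clean output.
    return "\n".join(part for part in (speech, on_screen) if part)
```

Running the steps through `asyncio.to_thread` keeps the event loop free while they block, so total latency is roughly that of the slower step rather than the sum of both.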
Project Structure
PhilVerify/
├── main.py                      # FastAPI app entry point + health endpoints
├── config.py                    # Settings (pydantic-settings)
├── requirements.txt
├── Dockerfile                   # Docker image for HF Spaces (port 7860)
├── domain_credibility.json      # PH news domain credibility tier database
│
├── api/
│   ├── schemas.py               # Pydantic request/response models
│   └── routes/
│       ├── verify.py            # POST /api/verify: handles text/url/image/video
│       ├── history.py           # GET /api/history
│       └── trends.py            # GET /api/trends
│
├── nlp/                         # NLP preprocessing pipeline
│   ├── preprocessor.py          # Clean, tokenize, remove stopwords (EN+TL)
│   ├── language_detector.py     # Tagalog / English / Taglish detection
│   ├── ner.py                   # Named entity recognition + PH entity hints
│   ├── sentiment.py             # Sentiment + emotion analysis
│   ├── clickbait.py             # Clickbait pattern detection
│   └── claim_extractor.py       # Extract falsifiable claim for evidence search
│
├── ml/
│   └── tfidf_classifier.py      # Layer 1: TF-IDF baseline classifier
│
├── evidence/
│   └── news_fetcher.py          # Layer 2: NewsAPI + cosine similarity
│
├── scoring/
│   └── engine.py                # Orchestrates full pipeline + final score
│
├── inputs/
│   ├── url_scraper.py           # BeautifulSoup article extractor
│   ├── ocr.py                   # Tesseract OCR for images
│   ├── asr.py                   # Whisper ASR + combined video transcription
│   └── video_ocr.py             # ffmpeg frame extraction + Tesseract OCR for video
│
├── frontend/                    # React + Vite frontend
│   ├── src/
│   │   ├── pages/
│   │   │   └── VerifyPage.jsx   # Main fact-check UI (tabs, results, chips)
│   │   └── api.js               # API client (supports VITE_API_BASE_URL)
│   └── .env.production          # Production API base URL
│
└── tests/
    └── test_philverify.py       # Unit + integration tests
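To illustrate how a file like domain_credibility.json might be consulted, here is a small lookup sketch. The schema (tier name mapped to a list of domains), the placeholder domains, and the function names are all assumptions; the real file's layout and contents may differ.

```python
import json
from urllib.parse import urlparse

# Hypothetical schema and placeholder domains, not the project's real data.
SAMPLE_DB = json.loads("""
{
  "tier1": ["example-news.ph"],
  "tier4": ["example-hoax.com"]
}
""")


def domain_of(url: str) -> str:
    """Return the lowercased host without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def credibility_tier(url: str, db: dict[str, list[str]] = SAMPLE_DB) -> str:
    """Look up a URL's tier, also matching subdomains of listed domains."""
    host = domain_of(url)
    for tier, domains in db.items():
        if any(host == d or host.endswith("." + d) for d in domains):
            return tier
    return "unknown"
```

Matching on the dot-prefixed suffix rather than a plain substring avoids treating a lookalike such as `notexample-news.ph` as the listed domain.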
Roadmap
- Phase 1: FastAPI backend skeleton
- Phase 2: NLP preprocessing pipeline
- Phase 3: TF-IDF baseline classifier
- Phase 4: NewsAPI evidence retrieval
- Phase 5: React web dashboard with multimodal input
- Phase 6: Deploy to Hugging Face Spaces (backend) + Firebase (frontend)
- Phase 7: Video frame OCR (ffmpeg + Tesseract alongside Whisper ASR)
- Phase 8: Scoring engine refinement (stance detection)
- Phase 9: Chrome Extension (Manifest V3)
- Phase 10: Fine-tune XLM-RoBERTa / TLUnified-RoBERTa
Contributing
Contributions welcome! Please feel free to submit a Pull Request.
Disclaimer
This tool is meant for research and educational purposes. Use responsibly and ethically when verifying information on social media.
License
MIT