title: VFacts
emoji: 😀
colorFrom: gray
colorTo: gray
sdk: docker
pinned: false

VFacts - Video Veracity & Analysis Platform

Research Overview

The liarMP4 project investigates how well Generative AI (GenAI) systems detect "contextual malformation" in video content, in contrast to traditional Predictive AI (PredAI), which focuses on metadata and engagement velocity.

While traditional content moderation relies on scalar probabilities derived from tabular data (account age, keyword triggers), this research proposes a Fractal Chain-of-Thought (FCoT) methodology. This approach uses Multimodal Large Language Models to analyze the semantic dissonance between visual evidence, audio waveforms, and textual claims.

The system generates Veracity Vectors: multi-dimensional scores representing Visual Integrity, Audio Integrity, and Cross-Modal Alignment. Results are output in a strict Token-Oriented Object Notation (TOON) schema.
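
As a rough illustration of what a Veracity Vector could look like in code, the sketch below models the three named scores as a typed, validated record. The class name `VeracityVector`, the field names, the assumed [0, 1] score range, and the `to_flat_text` helper are all hypothetical and are not taken from the project's source. The flat `key: value` rendering is only a simplified stand-in and does not reproduce the project's actual TOON schema.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class VeracityVector:
    """Hypothetical container for the three scores named in the overview."""

    visual_integrity: float
    audio_integrity: float
    cross_modal_alignment: float

    def __post_init__(self) -> None:
        # Assumption: every score is normalized to [0, 1].
        # The README does not state the real range.
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise TypeError(f"{f.name} must be a number, got {type(value).__name__}")
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{f.name} must be in [0, 1], got {value}")

    def to_flat_text(self) -> str:
        # Simplified "key: value" rendering, one field per line.
        # This is an illustrative stand-in, not the project's TOON schema.
        return "\n".join(f"{f.name}: {getattr(self, f.name)}" for f in fields(self))


if __name__ == "__main__":
    print(VeracityVector(0.9, 0.5, 0.25).to_flat_text())
```

Rejecting out-of-range or non-numeric values at construction time is one way to get the strict type adherence that a database-facing schema needs. The real pipeline may enforce this differently.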

Key Features

  • Predictive Benchmarking: Comparison against AutoGluon/Gradient Boosting models trained on engagement metadata.
  • Fractal Chain-of-Thought (FCoT): A recursive inference strategy that hypothesizes intent at the macro scale and verifies pixel/audio artifacts at the meso scale.
  • TOON Schema: A standardized output format ensuring strict type adherence for database integration.
  • Human-in-the-Loop (HITL) Protocol: A browser-based grounding workflow to calibrate AI "reasoning" against human authorial intent.

Project Resources

Repository Structure

  • src/: Core inference logic for the Generative AI pipeline and FCoT implementation.
  • preprocessing_tools/: Scripts for training Predictive AI models on tabular datasets.
  • extension/: Browser extension source code for the Human-in-the-Loop labeling workflow.
  • data/: Benchmark datasets containing engagement metadata and manual veracity labels.

Installation and Usage

This project is containerized to ensure reproducibility across different environments. The entire pipeline, including the inference logic and database connections, can be deployed using Docker.

Prerequisites

  • Docker Engine
  • Docker Compose
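
Before deploying, it can help to confirm that both prerequisites are on the `PATH`. The following is a minimal preflight sketch and is not part of the repository. The function names `find_compose_command` and `check_prerequisites` are hypothetical. It accepts either the standalone `docker-compose` binary or the `docker compose` plugin form of Docker Compose.

```python
import shutil
import subprocess


def find_compose_command():
    """Return the argv prefix for Docker Compose, or None if it is unavailable."""
    # Standalone binary, invoked as `docker-compose`.
    if shutil.which("docker-compose"):
        return ["docker-compose"]
    # Compose v2 ships as a Docker CLI plugin, invoked as `docker compose`.
    if shutil.which("docker"):
        try:
            probe = subprocess.run(
                ["docker", "compose", "version"],
                capture_output=True,
                timeout=10,
            )
        except (OSError, subprocess.TimeoutExpired):
            return None
        if probe.returncode == 0:
            return ["docker", "compose"]
    return None


def check_prerequisites():
    """Report whether Docker Engine's CLI and Docker Compose can be found."""
    return {
        "docker": shutil.which("docker") is not None,
        "compose": find_compose_command() is not None,
    }


if __name__ == "__main__":
    for tool, ok in check_prerequisites().items():
        print(f"{tool}: {'found' if ok else 'MISSING'}")
```

The check only confirms that the commands can be located. It does not verify that the Docker daemon is running.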

Deployment Instructions

  1. Clone the repository:

    git clone https://github.com/DevKlim/LiarMP4.git
    
  2. Navigate to the project directory:

    cd LiarMP4/liarMP4
    
  3. Build and run the containerized environment:

    docker-compose up --build
    

The system will initialize the backend services and expose the necessary endpoints for the analysis pipeline.

License

This research project is open-source. Please refer to the LICENSE file in the repository for specific terms regarding usage and distribution.

Authors

Kliment Ho, Shiwei Yang, Keqing Li