🔍 Multimodal Fake Review Detector

Detect AI-generated fake reviews using text-image contrastive learning.

Model Description

This model analyzes both review text and associated product images to detect fake reviews. It leverages:

BERT for text encoding (contextual semantics)
CLIP Vision for image encoding (visual features)
Contrastive Learning for text-image alignment scoring
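
The card does not say which contrastive objective is used for training. As a rough illustration only, the sketch below assumes a CLIP-style symmetric InfoNCE loss over a batch of matched text-image pairs; the function names and the temperature of 0.07 are hypothetical and not taken from this model's code.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def symmetric_contrastive_loss(text_embs, image_embs, temperature=0.07):
    """Assumed CLIP-style loss: pair k's text should match pair k's image, and vice versa."""
    texts = [l2_normalize(v) for v in text_embs]
    images = [l2_normalize(v) for v in image_embs]
    # logits[a][b] = cosine similarity of text a and image b, divided by temperature
    logits = [
        [sum(x * y for x, y in zip(t, i)) / temperature for i in images]
        for t in texts
    ]

    def cross_entropy_on_diagonal(rows):
        # Mean cross-entropy where the correct "class" for row k is column k
        total = 0.0
        for k, row in enumerate(rows):
            peak = max(row)
            log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
            total += log_z - row[k]
        return total / len(rows)

    text_to_image = cross_entropy_on_diagonal(logits)
    image_to_text = cross_entropy_on_diagonal([list(col) for col in zip(*logits)])
    return 0.5 * (text_to_image + image_to_text)

# Toy batch: correctly paired embeddings vs. the same embeddings with images swapped
aligned = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

Under this assumption, minimizing the loss pulls matched pairs together and pushes mismatched pairs apart, which is what makes the alignment score informative at inference time.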

Key Insight

Authentic reviews tend to show higher semantic consistency between the review text and the product image. AI-generated fake reviews often show lower text-image alignment because the text and the image are generated independently.
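
To make the idea concrete, here is a minimal sketch of a consistency score computed as the cosine similarity between a text embedding and an image embedding. The three-dimensional vectors are made-up toy values, not real BERT or CLIP outputs, and `consistency_score` is a hypothetical name.

```python
import math

def consistency_score(text_emb, image_emb):
    """Cosine similarity between two embeddings of equal length."""
    dot = sum(t * i for t, i in zip(text_emb, image_emb))
    norm_text = math.sqrt(sum(t * t for t in text_emb))
    norm_image = math.sqrt(sum(i * i for i in image_emb))
    return dot / (norm_text * norm_image)

# Toy embeddings (illustrative values only)
pizza_text = [0.9, 0.1, 0.0]
pizza_image = [0.8, 0.2, 0.1]   # points in roughly the same direction as the text
sushi_image = [0.1, 0.2, 0.9]   # points in a different direction

matched = consistency_score(pizza_text, pizza_image)
mismatched = consistency_score(pizza_text, sushi_image)
print(matched > mismatched)
```

A pair whose text and image describe the same thing scores close to 1, while an unrelated pair scores much lower.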

Performance

Metric	Score
Accuracy	91.2%
F1 Score	91.0%
Precision	91.5%
Recall	91.2%
AUC-ROC	96.2%
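
For readers unfamiliar with these metrics, the sketch below shows how accuracy, precision, recall, and F1 follow from confusion-matrix counts. The card does not state which class is treated as positive or how scores are averaged, so the code assumes "Fake" is the positive class, and the counts are invented for illustration. AUC-ROC is omitted because it needs per-example scores rather than counts.

```python
def classification_metrics(tp, fp, fn, tn):
    """Binary metrics from counts; 'positive' is assumed to mean Fake."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # of reviews flagged Fake, the share that are Fake
    recall = tp / (tp + fn)      # of Fake reviews, the share that were flagged
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical counts, not this model's actual confusion matrix
metrics = classification_metrics(tp=90, fp=10, fn=10, tn=90)
print(metrics)
```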

Dataset

Trained on the AiGen-FoodReview dataset:

20,144 review-image pairs
50% authentic (real Yelp/TripAdvisor reviews)
50% fake (GPT-4-Turbo text + DALL-E-2 images)

Usage

Web Interface

Upload an image and enter the review text to get:

Prediction: Authentic or Fake
Confidence Score: Model certainty
Consistency Score: Text-image alignment measure
Interpretation: Human-readable explanation
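
The card does not document how these fields are derived. One plausible reading, sketched below, is that the prediction and confidence come from a softmax over the classifier's two output logits. The `summarize` helper and the dictionary keys are hypothetical; only the label order [Authentic, Fake] is taken from the architecture diagram.

```python
import math

LABELS = ["Authentic", "Fake"]  # order as shown in the architecture's output line

def summarize(logits, consistency):
    """Assumed post-processing: softmax, then pick the most probable label."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {
        "prediction": LABELS[best],
        "confidence": probs[best],    # probability of the predicted class
        "consistency": consistency,   # text-image alignment, passed through unchanged
    }

example = summarize([2.0, 0.0], consistency=0.8)
print(example["prediction"], round(example["confidence"], 3))
```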

Python API

from app import load_model, predict
from PIL import Image

# Load model
load_model()

# Predict
image = Image.open("food.jpg")
result = predict("Amazing pizza with perfect crust!", image)
print(result)

Architecture

Input: (Review Text, Product Image)
         │              │
         ▼              ▼
    ┌─────────┐   ┌──────────┐
    │  BERT   │   │  CLIP    │
    │ Encoder │   │ Vision   │
    └────┬────┘   └────┬─────┘
         │              │
    ┌────┴────┐   ┌────┴─────┐
    │ 768-dim │   │ 768-dim  │
    └────┬────┘   └────┬─────┘
         │              │
    ┌────┴────┐   ┌────┴─────┐
    │Text Proj│   │Image Proj│
    │  Head   │   │  Head    │
    └────┬────┘   └────┬─────┘
         │              │
         ▼              ▼
    ┌─────────────────────┐
    │  Consistency Score  │ ← Cosine Similarity
    │  (Contrastive Loss) │
    └─────────────────────┘
         │
         ▼
    ┌─────────────────────┐
    │   Concatenation     │
    │   (1536-dim)        │
    └──────────┬──────────┘
               │
    ┌──────────▼──────────┐
    │   MLP Classifier    │
    │  512 → 256 → 2      │
    └──────────┬──────────┘
               │
               ▼
    Output: [Authentic, Fake]
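
The data flow in the diagram can be traced with a small pure-Python sketch. This is not the model's implementation: random weights stand in for trained parameters, random vectors stand in for the BERT and CLIP Vision outputs, and all function names are hypothetical. It further assumes that the projection heads keep 768 dimensions (so that concatenation gives 1536), that ReLU sits between MLP layers, and that the consistency score is reported alongside the logits rather than appended to the classifier input.

```python
import math
import random

def init_linear(rng, n_in, n_out):
    """Random weights standing in for trained parameters."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def linear(x, layer):
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def build_model(rng, embed_dim=768, hidden=(512, 256), n_classes=2):
    dims = [2 * embed_dim, *hidden, n_classes]  # 1536 -> 512 -> 256 -> 2 by default
    return {
        "text_proj": init_linear(rng, embed_dim, embed_dim),
        "image_proj": init_linear(rng, embed_dim, embed_dim),
        "mlp": [init_linear(rng, a, b) for a, b in zip(dims, dims[1:])],
    }

def forward(model, text_emb, image_emb):
    t = linear(text_emb, model["text_proj"])     # text projection head
    v = linear(image_emb, model["image_proj"])   # image projection head
    consistency = cosine(t, v)                   # text-image alignment score
    h = t + v                                    # list concatenation: 2 * embed_dim values
    last = len(model["mlp"]) - 1
    for k, layer in enumerate(model["mlp"]):
        h = linear(h, layer)
        if k < last:
            h = [max(0.0, x) for x in h]         # ReLU between layers (assumed)
    return h, consistency                        # logits for [Authentic, Fake], plus score

if __name__ == "__main__":
    rng = random.Random(0)
    model = build_model(rng)
    text_emb = [rng.gauss(0, 1) for _ in range(768)]   # stand-in for BERT output
    image_emb = [rng.gauss(0, 1) for _ in range(768)]  # stand-in for CLIP Vision output
    logits, consistency = forward(model, text_emb, image_emb)
    print(len(logits), round(consistency, 3))
```

Passing a smaller `embed_dim` and `hidden` to `build_model` keeps the same wiring while running much faster, which is handy for checking shapes.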

Citation

If you use this model, please cite:

@article{multimodal_fake_review_2026,
  title={Multimodal Fake Review Detection Using Contrastive Learning:
         Leveraging Text-Image Alignment for AI-Generated Content Identification},
  author={Your Name},
  year={2026}
}

License

MIT License

Acknowledgments

AiGen-FoodReview Dataset: Hugging Face
BERT: Google Research
CLIP: OpenAI
