🔍 Multimodal Fake Review Detector

Detect AI-generated fake reviews using text-image contrastive learning.

Model Description

This model analyzes both review text and associated product images to detect fake reviews. It leverages:

  • BERT for text encoding (contextual semantics)
  • CLIP Vision for image encoding (visual features)
  • Contrastive Learning for text-image alignment scoring
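The model card does not say which contrastive objective is used. A common choice for text-image alignment is the symmetric InfoNCE loss popularized by CLIP, so the sketch below assumes that loss. The function names, the temperature of 0.07, and the toy 2-D embeddings are illustrative assumptions, not values from this model.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cross_entropy_row(logits, target):
    """Numerically stable -log(softmax(logits)[target])."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def symmetric_info_nce(text_embs, image_embs, temperature=0.07):
    """Assumed CLIP-style loss: pair i of text and image is the only positive."""
    text = [l2_normalize(t) for t in text_embs]
    image = [l2_normalize(i) for i in image_embs]
    n = len(text)
    # sim[i][j]: temperature-scaled cosine similarity of text i and image j
    sim = [[sum(a * b for a, b in zip(t, im)) / temperature for im in image]
           for t in text]
    # Text-to-image direction: the matching image sits on the diagonal
    t2i = sum(cross_entropy_row(sim[i], i) for i in range(n)) / n
    # Image-to-text direction: same computation on the transposed matrix
    sim_t = [list(col) for col in zip(*sim)]
    i2t = sum(cross_entropy_row(sim_t[i], i) for i in range(n)) / n
    return (t2i + i2t) / 2

texts = [[1.0, 0.0], [0.0, 1.0]]
aligned = symmetric_info_nce(texts, [[0.9, 0.1], [0.1, 0.9]])
shuffled = symmetric_info_nce(texts, [[0.1, 0.9], [0.9, 0.1]])
print(aligned < shuffled)
```

Under this assumed loss, correctly paired batches yield a much smaller value than batches whose images have been swapped, which is what pushes matching text and image embeddings together during training.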

Key Insight

Authentic reviews tend to show higher semantic consistency between the text description and the product image. AI-generated fake reviews often show lower text-image alignment because their text and images are generated independently.
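As a rough illustration of this idea, the consistency score can be read as a cosine similarity between the projected text and image embeddings (the architecture diagram labels it as such). The sketch below uses plain Python and made-up 3-D vectors in place of real embeddings; `consistency_score` is a hypothetical helper name, not part of this repository's API.

```python
import math

def consistency_score(text_vec, image_vec):
    """Cosine similarity in [-1, 1]; higher means closer text-image alignment."""
    dot = sum(t * i for t, i in zip(text_vec, image_vec))
    norm_t = math.sqrt(sum(t * t for t in text_vec))
    norm_i = math.sqrt(sum(i * i for i in image_vec))
    if norm_t == 0.0 or norm_i == 0.0:
        return 0.0  # Treat a zero vector as carrying no alignment signal
    return dot / (norm_t * norm_i)

# Toy vectors standing in for projected embeddings (illustrative only)
review_text = [0.8, 0.1, 0.3]
matching_image = [0.7, 0.2, 0.4]
unrelated_image = [-0.2, 0.9, -0.1]

high = consistency_score(review_text, matching_image)
low = consistency_score(review_text, unrelated_image)
print(f"matching: {high:.2f}, unrelated: {low:.2f}")
```

A review whose text and image point in similar directions scores near 1, while an unrelated pair scores near or below 0.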

Performance

Metric      Score
Accuracy    91.2%
F1 Score    91.0%
Precision   91.5%
Recall      91.2%
AUC-ROC     96.2%
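For reference, accuracy, precision, recall, and F1 follow the standard binary-classification definitions. The sketch below computes them from confusion-matrix counts. The counts are invented for illustration and are not this model's evaluation results, and treating Fake as the positive class is an assumption. AUC-ROC is omitted because it needs per-example scores rather than counts.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard binary metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts with Fake as the positive class (illustrative only)
metrics = binary_metrics(tp=90, fp=10, fn=10, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```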

Dataset

Trained on the AiGen-FoodReview dataset:

  • 20,144 review-image pairs
  • 50% authentic (real Yelp/TripAdvisor reviews)
  • 50% fake (GPT-4-Turbo text + DALL-E-2 images)

Usage

Web Interface

Simply upload an image and enter review text to get:

  • Prediction: Authentic or Fake
  • Confidence Score: Model certainty
  • Consistency Score: Text-image alignment measure
  • Interpretation: Human-readable explanation
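One plausible way these four outputs could be derived from the classifier's two logits and the consistency score is sketched below. The `ReviewResult` type, the `build_result` helper, and the wording of the interpretation string are hypothetical; the actual structure returned by the app may differ. Only the label order (Authentic, Fake) is taken from the architecture's output.

```python
import math
from dataclasses import dataclass

LABELS = ("Authentic", "Fake")  # Order follows the model's output list

@dataclass
class ReviewResult:
    prediction: str
    confidence: float
    consistency_score: float
    interpretation: str

def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def build_result(logits, consistency_score):
    """Hypothetical helper: pick the top class and describe it in words."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    label = LABELS[idx]
    interpretation = (
        f"Predicted {label} with {probs[idx]:.1%} confidence; "
        f"text-image consistency is {consistency_score:.2f}."
    )
    return ReviewResult(label, probs[idx], consistency_score, interpretation)

result = build_result(logits=[0.2, 2.1], consistency_score=0.31)
print(result.interpretation)
```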

Python API

from app import load_model, predict
from PIL import Image

# Load model
load_model()

# Predict
image = Image.open("food.jpg")
result = predict("Amazing pizza with perfect crust!", image)
print(result)

Architecture

Input: (Review Text, Product Image)
         │             │
         ▼             ▼
    ┌─────────┐   ┌──────────┐
    │  BERT   │   │  CLIP    │
    │ Encoder │   │ Vision   │
    └────┬────┘   └────┬─────┘
         │             │
    ┌────┴────┐   ┌────┴─────┐
    │ 768-dim │   │ 768-dim  │
    └────┬────┘   └────┬─────┘
         │             │
    ┌────┴────┐   ┌────┴─────┐
    │Text Proj│   │Image Proj│
    │  Head   │   │  Head    │
    └────┬────┘   └────┬─────┘
         │             │
         ▼             ▼
    ┌─────────────────────┐
    │  Consistency Score  │ ← Cosine Similarity
    │  (Contrastive Loss) │
    └─────────────────────┘
         │
         ▼
    ┌─────────────────────┐
    │   Concatenation     │
    │   (1536-dim)        │
    └──────────┬──────────┘
               │
    ┌──────────▼──────────┐
    │   MLP Classifier    │
    │  512 → 256 → 2      │
    └──────────┬──────────┘
               │
               ▼
    Output: [Authentic, Fake]
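To make the classifier head's dimensions concrete, the sketch below traces the layer sizes and counts trainable parameters. It assumes the 1536-dim input is the concatenation of the two 768-dim embeddings (the consistency score itself is not appended), and that the MLP consists of plain fully connected layers with biases. Neither detail is confirmed by the diagram. The encoders and projection heads are not counted.

```python
def linear_params(in_features, out_features, bias=True):
    """Parameter count of one fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)

TEXT_DIM = 768
IMAGE_DIM = 768
concat_dim = TEXT_DIM + IMAGE_DIM  # 1536, matching the Concatenation box

# Assumed layer widths: 1536 -> 512 -> 256 -> 2
layer_sizes = [concat_dim, 512, 256, 2]
per_layer = [linear_params(i, o) for i, o in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)

for (i, o), count in zip(zip(layer_sizes, layer_sizes[1:]), per_layer):
    print(f"{i} -> {o}: {count:,} parameters")
print(f"total: {total_params:,}")
```

Under these assumptions the head has fewer than one million parameters, which is small compared with the BERT and CLIP encoders feeding it.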

Citation

If you use this model, please cite:

@article{multimodal_fake_review_2026,
  title={Multimodal Fake Review Detection Using Contrastive Learning:
         Leveraging Text-Image Alignment for AI-Generated Content Identification},
  author={Your Name},
  year={2026}
}

License

MIT License

Acknowledgments

  • AiGen-FoodReview Dataset: Hugging Face
  • BERT: Google Research
  • CLIP: OpenAI