Energy-Based Constraint Networks

Pretrained weights for "Energy-Based Constraint Networks: Learning Structural Coherence Across Modalities"

Paper: Zenodo | Code: GitHub

Checkpoints

Text Domain (frozen BERT-base-uncased encoder)

File	Params	CLI flag	Result
`nl_bert_constraint_best.pt`	7.4M	—	93.4% on 6 trained corruption types, 87.2% on 9 unseen types

Vision Domain (frozen DINOv2 ViT-B/14 encoder)

File	Params	CLI flag	Role
`pretrained_paired_best.pt`	3.6M	`--struct_checkpoint`	Structural branch. Corruption pretrained + FF++ paired fine-tuning. Detects face swaps, expression transfers, identity manipulations
`freq_only_best.pt`	3.6M	`--freq_checkpoint`	Frequency branch. Feeds a frequency heatmap of the face image through DINOv2. Detects GAN smoothing and texture loss
`local_only_best.pt`	3.6M	`--local_checkpoint`	Local texture branch. Trained on local corruptions + NeuralTextures. Detects neural rendering, localized inconsistencies
`vision_constraint_celeb.pt`	3.6M	`--pretrained`	Corruption-only pretrained model (zero deepfake training data). Used as initialization for structural branch. Achieves 0.850 AUC on FF++ Deepfakes without seeing any deepfakes
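The frequency branch runs DINOv2 on a frequency heatmap rather than on the RGB face, but this card does not say how that heatmap is built. As an illustration only, the sketch below assumes one common choice: the centered log-magnitude spectrum of a 2D FFT over a grayscale crop, scaled to [0, 1] and replicated to three channels so it can pass through an RGB backbone. `frequency_heatmap` and `to_three_channels` are hypothetical helpers, not functions from the repository.

```python
import numpy as np


def frequency_heatmap(gray):
    """Hypothetical helper: centered log-magnitude 2D FFT spectrum scaled to [0, 1]."""
    gray = np.asarray(gray, dtype=np.float64)
    if gray.ndim != 2:
        raise ValueError("expected a 2D grayscale image")
    # Shift the zero-frequency (DC) component to the center of the map
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    # log1p compresses the large dynamic range of the magnitudes
    log_mag = np.log1p(np.abs(spectrum))
    lo, hi = log_mag.min(), log_mag.max()
    if hi == lo:
        return np.zeros_like(log_mag)
    return (log_mag - lo) / (hi - lo)


def to_three_channels(heatmap):
    """Replicate a single-channel map to (H, W, 3) for an RGB backbone."""
    return np.repeat(heatmap[..., None], 3, axis=-1)


face_gray = np.random.default_rng(0).random((224, 224))  # stand-in for a face crop
heatmap = to_three_channels(frequency_heatmap(face_gray))
print(heatmap.shape)  # (224, 224, 3)
```

Any resizing or normalization that the DINOv2 preprocessing expects is left out of this sketch.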

Combined Vision Results (structural + frequency + local texture)

Benchmark	AUC (mean ± std, 5 seeds)
FF++ Deepfakes	0.959 ± 0.001
FF++ Face2Face	0.909 ± 0.003
FF++ FaceSwap	0.919 ± 0.003
FF++ NeuralTextures	0.880 ± 0.005
FF++ FaceShifter	0.897 ± 0.005
Celeb-DF (cross-dataset, no Celeb-DF training data)	0.870 ± 0.019
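Each AUC above is the probability that a randomly chosen fake image scores higher than a randomly chosen real one. The sketch below computes it straight from that definition (Mann-Whitney form, ties counted as half) and formats per-seed results as mean ± std. It assumes that a higher energy means "more likely fake", consistent with the text example where lower energy means more coherent, and it uses the sample standard deviation; the card does not state which one was reported. `roc_auc` and `summarize` are hypothetical helpers.

```python
from statistics import mean, stdev


def roc_auc(real_scores, fake_scores):
    """ROC AUC as P(fake score > real score), counting ties as 0.5.

    Hypothetical helper; assumes a higher score (energy) means "more likely fake".
    """
    if not real_scores or not fake_scores:
        raise ValueError("need at least one real and one fake score")
    wins = 0.0
    for f in fake_scores:
        for r in real_scores:
            if f > r:
                wins += 1.0
            elif f == r:
                wins += 0.5
    return wins / (len(real_scores) * len(fake_scores))


def summarize(aucs):
    """Format per-seed AUCs as 'mean ± std' using the sample std (needs >= 2 seeds)."""
    return f"{mean(aucs):.3f} ± {stdev(aucs):.3f}"


# Toy energies, not results from the paper
real = [0.12, -0.30, 0.05]
fake = [0.40, 0.10, 0.33]
print(f"AUC = {roc_auc(real, fake):.3f}")  # AUC = 0.889
```

The double loop is O(n·m); with 5,000 images per class, `sklearn.metrics.roc_auc_score(labels, scores)` gives the same value much faster.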

Usage

Text

import torch
import torch.nn as nn
from model import ConstraintNetwork
from bert_encoder import BERTWindowEncoder

MAX_WINDOWS = 32  # kept equal to max_seq_len below

# Load encoder and model
encoder = BERTWindowEncoder("bert-base-uncased", "cuda", layer=-2, window_size=8)
model = ConstraintNetwork(d_model=384, d_state=64, vocab_size=None,
                          max_seq_len=MAX_WINDOWS, dropout=0.15, alpha=0.3).cuda()
# Attach a 768 -> 384 input projection (BERT-base hidden size -> d_model)
# before loading the checkpoint
model.input_proj = nn.Linear(768, 384).cuda()
model.load_state_dict(torch.load("nl_bert_constraint_best.pt", map_location="cuda"))
model.eval()

# Evaluate a paragraph
text = "Marie Curie was born in Warsaw. She studied physics in Paris."
embs = encoder.encode_paragraph(text, max_windows=MAX_WINDOWS).cuda()  # (n_windows, 768)
# Zero-pad to exactly MAX_WINDOWS windows, matching the embeddings' device and dtype
if embs.shape[0] < MAX_WINDOWS:
    pad = torch.zeros(MAX_WINDOWS - embs.shape[0], embs.shape[1],
                      device=embs.device, dtype=embs.dtype)
    embs = torch.cat([embs, pad])
embs = embs.unsqueeze(0)  # add batch dimension: (1, MAX_WINDOWS, 768)
with torch.no_grad():
    energy = model(embs)
print(f"Energy: {energy.item():+.4f}")  # lower = more coherent

Vision (three-branch evaluation)

python eval_combined_final.py \
    --struct_checkpoint pretrained_paired_best.pt \
    --freq_checkpoint freq_only_best.pt \
    --local_checkpoint local_only_best.pt \
    --real_dir ff_c23_faces/original \
    --fake_dirs ff_c23_faces/Deepfakes ff_c23_faces/Face2Face \
                ff_c23_faces/FaceSwap ff_c23_faces/NeuralTextures \
                ff_c23_faces/FaceShifter \
    --max_images 5000

Training from scratch

# Text: train on WikiText-103 (embeddings cached after first run)
python train_bert.py

# Vision: train three branches independently
python train_paired.py --real_dir ... --fake_dirs ... --pretrained vision_constraint_celeb.pt
python train_freq_only.py --real_dir ... --fake_dirs ...
python train_local_only.py --real_dir ... --fake_dirs ...

Architecture

Input → SSM (×6) → Dual-head attention (×2) → Energy head → E(x)
        linear       causal + bidirectional     mean + α·max
        cost         single W per head
                     no Q/K/V, no KV cache
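The "linear cost" note on the SSM stage refers to the recurrence at the core of a state-space layer: each step updates a fixed-size hidden state, so processing L positions costs O(L · d_state) instead of the O(L²) of pairwise attention (the text model above uses d_state=64). The snippet below is a toy, non-selective diagonal linear SSM in plain Python, for intuition only; `diagonal_ssm_scan` and its parameters are hypothetical. The actual layers presumably come from mamba-ssm (listed under Requirements), whose selective scan makes the discretized transition and the input/output projections depend on the input and runs as a fused CUDA kernel.

```python
def diagonal_ssm_scan(x, a, b, c):
    """Toy diagonal linear SSM over one input channel (hypothetical helper).

    h_t = a * h_{t-1} + b * x_t   (elementwise over d_state)
    y_t = sum(c * h_t)
    """
    d_state = len(a)
    if len(b) != d_state or len(c) != d_state:
        raise ValueError("a, b and c must all have length d_state")
    h = [0.0] * d_state
    ys = []
    for x_t in x:  # a single left-to-right pass: O(len(x) * d_state) work
        h = [a_i * h_i + b_i * x_t for a_i, h_i, b_i in zip(a, h, b)]
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return ys


# An impulse decays geometrically through a state with a = 0.5
print(diagonal_ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]))  # [1.0, 0.5, 0.25]
```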

Same architecture for both text and vision: apart from the frozen encoder (BERT or DINOv2), only the input projection layer changes
Each forward pass is stateless (no KV cache)
Per-position energy decomposition enables violation localization
α = 0.3 (results are insensitive to α in [0.2, 1.0] across both modalities)
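One way to read the energy head's "mean + α·max" annotation together with the per-position decomposition: the network assigns an energy e_i to each position (a BERT window for text), the scalar E(x) pools them as mean(e) + α·max(e), and the highest-energy positions mark where a violation sits. The sketch below implements that reading in plain Python with α = 0.3 as stated above. It is an interpretation, not code from the repository: whether pooling acts on per-position energies or on hidden states, and whether zero-padded windows are masked out, are assumptions, and `pool_energy` and `localize` are hypothetical names.

```python
from statistics import mean

ALPHA = 0.3  # alpha value reported in this card


def pool_energy(per_position, alpha=ALPHA, mask=None):
    """Hypothetical pooling: E(x) = mean(e) + alpha * max(e) over unmasked positions."""
    if mask is None:
        values = list(per_position)
    else:
        # Keep only positions whose mask entry is truthy (e.g. drop zero-padded windows)
        values = [e for e, keep in zip(per_position, mask) if keep]
    if not values:
        raise ValueError("no positions left to pool")
    return mean(values) + alpha * max(values)


def localize(per_position, k=1, mask=None):
    """Hypothetical helper: indices of the k highest-energy (most suspicious) positions."""
    candidates = [i for i in range(len(per_position)) if mask is None or mask[i]]
    return sorted(candidates, key=lambda i: per_position[i], reverse=True)[:k]


# Toy per-window energies: window 2 is the outlier
energies = [0.05, -0.10, 1.20, 0.00]
print(f"E(x) = {pool_energy(energies):+.4f}")
print(localize(energies, k=1))  # [2]
```

Under this reading, the max term presumably keeps a single strongly violated position from being averaged away in a long, otherwise coherent input, while the mean term keeps the score sensitive to diffuse incoherence.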

Requirements

Python 3.10+
PyTorch 2.0+
CUDA GPU
transformers (BERT, DINOv2)
datasets (WikiText-103)
mamba-ssm + causal-conv1d

Citation

@article{shinde2026energy,
  title={Energy-Based Constraint Networks: Learning Structural Coherence Across Modalities},
  author={Shinde, Chirag},
  year={2026},
  url={https://github.com/cs-cmyk/energy-constraint-networks}
}
