SciPeerAI-7B

The author describes SciPeerAI-7B as the world's first LLM fine-tuned specifically for scientific fraud detection.

Built by Sameer Nadeem, BS Data Science student, Bahawalpur, Pakistan.


What This Model Does

SciPeerAI-7B analyzes a scientific paper and outputs a structured JSON report that scores 14 fraud dimensions in a single pass. According to the author, no other model covers all of these dimensions at once.

Given a paper's title, authors, abstract, and text, it returns:

  • Overall fraud confidence score (0.0 to 1.0)
  • A risk score for each of the 14 modules
  • A fraud-type classification drawn from a 20-type fraud taxonomy
  • Final verdict: FRAUD DETECTED or CLEAN PAPER
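The exact JSON schema is not documented in this card. The sketch below assumes the report is a single JSON object embedded somewhere in the generated text. It also assumes a top-level `fraud_confidence` number, the field name used in the Performance table, and a hypothetical `verdict` key. Under those assumptions, the report could be pulled out of raw model output like this:

```python
import json
from typing import Any, Optional


def extract_report(generated: str) -> Optional[dict[str, Any]]:
    """Return the first JSON object that can be decoded from the model output.

    Assumption: the model emits its report as one JSON object, possibly
    surrounded by extra prose. The real schema is not documented here.
    """
    decoder = json.JSONDecoder()
    start = generated.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(generated, start)
            return obj
        except json.JSONDecodeError:
            pass  # no valid object starts at this brace; try the next one
        start = generated.find("{", start + 1)
    return None  # e.g. the report was truncated by max_new_tokens


def summarize(report: dict[str, Any]) -> str:
    # "fraud_confidence" appears in the card's Performance table;
    # "verdict" is a hypothetical key name used only for illustration.
    score = float(report.get("fraud_confidence", float("nan")))
    verdict = report.get("verdict", "UNKNOWN")
    return f"{verdict} (fraud_confidence={score:.2f})"


if __name__ == "__main__":
    raw = 'Analysis: {"fraud_confidence": 0.99, "verdict": "FRAUD DETECTED"}'
    print(summarize(extract_report(raw)))
```

A truncated report (for example one cut off by a small `max_new_tokens`) makes `extract_report` return `None` instead of raising.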

Model Details

Field                 Value
Base model            mistralai/Mistral-7B-Instruct-v0.3
Fine-tuning method    QLoRA (4-bit, NF4)
LoRA rank / alpha     r=16, alpha=32
Target modules        q_proj, k_proj, v_proj, o_proj
Training steps        500
Final training loss   0.2352
Training hardware     Kaggle T4 x2 (32GB RAM)
Training time         147.7 minutes
Dataset               SciPeerBench v1.1 (644 papers)
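As a rough sanity check on adapter size, the LoRA settings above can be turned into a parameter count. On a linear layer with input size d_in and output size d_out, LoRA adds r·(d_in + d_out) weights for its A and B matrices. This sketch assumes Mistral-7B's published dimensions: hidden size 4096, 32 layers, and 8 key/value heads of size 128, so k_proj and v_proj project to 1024. It counts only the four target modules and is not read from the adapter files.

```python
# LoRA settings from the Model Details table.
R = 16
ALPHA = 32
TARGET_MODULES = ("q_proj", "k_proj", "v_proj", "o_proj")

# Assumed Mistral-7B architecture values (from the base model's config).
HIDDEN = 4096
N_LAYERS = 32
N_HEADS = 32
N_KV_HEADS = 8
HEAD_DIM = HIDDEN // N_HEADS     # 128
KV_DIM = N_KV_HEADS * HEAD_DIM   # 1024, because of grouped-query attention

# (in_features, out_features) of each attention projection.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
}


def lora_params(d_in: int, d_out: int, r: int) -> int:
    # A is r x d_in and B is d_out x r, so r * (d_in + d_out) weights in total.
    return r * (d_in + d_out)


def total_lora_params() -> int:
    per_layer = sum(lora_params(*SHAPES[m], R) for m in TARGET_MODULES)
    return per_layer * N_LAYERS


if __name__ == "__main__":
    print(f"LoRA scaling alpha/r = {ALPHA / R}")
    print(f"Trainable LoRA parameters: {total_lora_params():,}")
```

Under these assumptions the adapter has 13,631,488 trainable weights (scaling factor alpha/r = 2.0). That is well under 1% of the base model's roughly 7B parameters, which is what makes QLoRA training on two T4s feasible.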

Performance

Test paper                                                       Expected  Model output
Wakefield 1998 (Lancet), retracted vaccine-autism fraud          FRAUD     fraud_confidence=0.99 ✅
NDM-1 resistance paper (Lancet ID, 2010), landmark clean paper   CLEAN     fraud_confidence=0.05 ✅

Quick Start

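from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base = "mistralai/Mistral-7B-Instruct-v0.3"
adapter = "Abu-Sameer-66/SciPeerAI-7B"

# Load the base model in float16, then attach the SciPeerAI LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)

# Mistral [INST] prompt. It already starts with the <s> (BOS) token.
prompt = """<s>[INST] Analyze this scientific paper for fraud and integrity issues:

Title: Your Paper Title
Authors: Author Names
Year: 2024
Journal: Journal Name
Abstract: Your abstract text here...

Provide a detailed JSON analysis with fraud scores across all 14 dimensions. [/INST]
"""

# add_special_tokens=False stops the tokenizer from prepending a second BOS token.
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)

# Greedy decoding. Increase max_new_tokens if the JSON report is cut off.
output = model.generate(**inputs, max_new_tokens=300, do_sample=False)

# Decode only the newly generated tokens, skipping the echoed prompt.
new_tokens = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))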
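To run the model over many papers, the Quick Start prompt can be generated from metadata. The sketch below uses `PaperMeta` and `build_prompt`, which are hypothetical helpers and not part of the released code. They reproduce the Quick Start template verbatim, including the leading `<s>`.

```python
from dataclasses import dataclass


@dataclass
class PaperMeta:
    # Hypothetical container for the fields used in the Quick Start prompt.
    title: str
    authors: str
    year: int
    journal: str
    abstract: str


def build_prompt(paper: PaperMeta) -> str:
    """Fill the Quick Start [INST] template with one paper's metadata."""
    return (
        "<s>[INST] Analyze this scientific paper for fraud and integrity issues:\n"
        "\n"
        f"Title: {paper.title}\n"
        f"Authors: {paper.authors}\n"
        f"Year: {paper.year}\n"
        f"Journal: {paper.journal}\n"
        f"Abstract: {paper.abstract}\n"
        "\n"
        "Provide a detailed JSON analysis with fraud scores across all 14 dimensions. [/INST]\n"
    )


if __name__ == "__main__":
    demo = PaperMeta(
        title="Example Title",
        authors="A. Author, B. Author",
        year=2024,
        journal="Example Journal",
        abstract="Example abstract text.",
    )
    print(build_prompt(demo))
```

Because the string already begins with `<s>`, tokenize it with `tokenizer(build_prompt(paper), return_tensors="pt", add_special_tokens=False)` so the BOS token is not added twice.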

14 Fraud Detection Dimensions

Module                     What It Detects
Statistical Audit          p-hacking, sample-size issues, round numbers
Figure Forensics           pHash, error level analysis (ELA), brightness manipulation
Methodology Checker        causation claims, missing control groups
Citation Analyzer          self-citation rings, unsupported claims
Reproducibility Scanner    availability of code, data, ethics approval, and preregistration
Novelty Scorer             structural signals, Semantic Scholar API
GRIM Test                  mathematically impossible means
SPRITE Test                impossible distributions, standard deviation (SD) verification
Granularity Analyzer       digit preference, Benford's law
P-Curve Analyzer           publication bias, p-value clustering
Effect Size Validator      Cohen's d, power analysis, inflated effects
Retraction Checker         citations of retracted papers, via the live Crossref API
Citation Cartel Detector   citation rings, network manipulation
LLM Paper Detector         burstiness, type-token ratio (TTR), uniformity patterns
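Two of these checks are simple enough to sketch directly. The code below illustrates the textbook GRIM test (Granularity-Related Inconsistency of Means) and Benford's first-digit law. It is not SciPeerAI's own module code, and the function names are hypothetical. The GRIM part assumes single-item integer responses and round-half-up reporting.

```python
import math
from collections import Counter
from decimal import Decimal, ROUND_HALF_UP


def grim_consistent(mean_str: str, n: int) -> bool:
    """GRIM test: can a mean reported as `mean_str` come from n integer scores?

    Assumes single-item integer data and round-half-up reporting.
    """
    decimals = len(mean_str.split(".")[1]) if "." in mean_str else 0
    if n >= 10 ** decimals:
        return True  # granularity too fine: GRIM cannot flag anything
    reported = Decimal(mean_str)
    quantum = Decimal(1).scaleb(-decimals)  # e.g. 0.01 for two decimals
    center = int(reported * n)  # any valid integer total lies next to this
    for total in range(center - 1, center + 2):
        mean = (Decimal(total) / n).quantize(quantum, rounding=ROUND_HALF_UP)
        if mean == reported:
            return True
    return False


def benford_expected() -> dict[int, float]:
    """Benford's law: P(first digit = d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_frequencies(values: list[float]) -> dict[int, float]:
    """Observed share of each leading digit 1..9, ignoring zeros."""
    digits = [int(f"{abs(v):.15e}"[0]) for v in values if v != 0]
    counts = Counter(digits)
    total = len(digits)
    return {d: (counts[d] / total if total else 0.0) for d in range(1, 10)}


if __name__ == "__main__":
    print(grim_consistent("5.19", 28))  # False: no integer total / 28 rounds to 5.19
    print(grim_consistent("5.18", 28))  # True: 145 / 28 = 5.1786 rounds to 5.18
    print(round(benford_expected()[1], 3))  # 0.301
```

Practical GRIM checks also need to handle multi-item scales, where the effective sample size is n times the number of items, and authors who round half-to-even. A Benford comparison is only meaningful for data spanning several orders of magnitude.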

This model is part of the SciPeerAI system.


Citation

@misc{nadeem2026scipeerai,
  title={SciPeerAI: Multi-dimensional Automated Scientific Integrity Analysis System},
  author={Sameer Nadeem},
  year={2026},
  url={https://huggingface.co/Abu-Sameer-66/SciPeerAI-7B}
}