abacus-cheat-tell-v3

Binary classifier for detecting anachronisms in mathematical and scientific texts from pre-modern traditions. Part of the ABACUS project - AGI verification via pre-modern mathematical reasoning.

Model Description

abacus-cheat-tell-v3 detects whether a passage from pre-modern mathematical texts contains anachronistic language or concepts (i.e., ideas that could not have existed at the time of writing). This is a core component of the ABACUS "no-cheating" protocol: any model trained on pre-modern corpora that produces post-1930 concepts is flagged.

Architecture: ModernBertForSequenceClassification (answerdotai/ModernBERT-base, 149M params)
Labels: 0 = authentic, 1 = anachronism
Warm-started from: idirectships/abacus-cheat-tell-v2

v2 vs v3 Metrics

Metric	v2 (baseline)	v3	Delta
F1	0.6522	0.7368	+0.0846
Accuracy	0.5429	0.7143	+0.1714
Precision	n/a	0.6667	-
Recall	n/a	0.8235	-
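
The v3 column can be cross-checked arithmetically. The sketch below assumes the 35-example eval set noted under Limitations and uses confusion counts inferred from the rounded figures; the counts themselves are not reported in this card.

```python
# Inferred (not reported) confusion counts for a 35-example eval split.
# TP/FP/FN/TN are assumptions chosen to reproduce the published v3 metrics.
TP, FP, FN, TN = 14, 7, 3, 11

precision = TP / (TP + FP)          # 14 / 21
recall = TP / (TP + FN)             # 14 / 17
f1 = 2 * precision * recall / (precision + recall)
accuracy = (TP + TN) / (TP + FP + FN + TN)  # 25 / 35

metrics = {
    "precision": round(precision, 4),
    "recall": round(recall, 4),
    "f1": round(f1, 4),
    "accuracy": round(accuracy, 4),
}
print(metrics)
# -> {'precision': 0.6667, 'recall': 0.8235, 'f1': 0.7368, 'accuracy': 0.7143}
```

Under these counts the model misses 3 of 17 anachronisms and raises 7 false alarms on 18 authentic passages, which matches the high-recall, moderate-precision profile in the table.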

v3 was trained on the abacus-cheat-tell-eval-v3 train split (140 balanced examples) and uses explicit anachronism metadata with insertion position. v2 was trained on an unknown dataset of similar size over 4 epochs.

Training Details

Dataset: idirectships/abacus-cheat-tell-eval-v3 train split

140 examples, perfectly balanced: 70 authentic / 70 anachronism
Traditions: Greek, Chinese, Japanese, Indian, Islamic, Babylonian, Egyptian, Mayan

Hyperparameters:

Base model: idirectships/abacus-cheat-tell-v2 (warm-start from v2 weights)
Learning rate: 1e-5
Batch size: 8 (train), 16 (eval)
Max epochs: 10 (best at epoch 7, early stopping patience=3)
Warmup steps: 30
Weight decay: 0.05
Optimizer: AdamW fused
Training hardware: MacBook M4 Max (Apple MPS backend)
Wall clock: 13.0 minutes (782s)
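
As a rough sketch, the hyperparameters above could be written as keyword arguments named after `transformers.TrainingArguments` fields. Whether the authors used the Trainer API is an assumption, and `stop_epoch` is a hypothetical helper that only illustrates how patience-based early stopping interacts with the 10-epoch cap. The score list is synthetic, not the run's actual validation curve.

```python
# Key names follow transformers.TrainingArguments; using the Trainer API
# at all is an assumption, not something the card states.
training_kwargs = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 16,
    "num_train_epochs": 10,
    "warmup_steps": 30,
    "weight_decay": 0.05,
    "optim": "adamw_torch_fused",
}
EARLY_STOPPING_PATIENCE = 3


def stop_epoch(scores, patience):
    """Return (best_epoch, last_epoch_run), both 1-indexed.

    Training stops once `patience` consecutive epochs fail to beat the best score.
    """
    best_epoch, best_score = 0, float("-inf")
    for epoch, score in enumerate(scores, start=1):
        if score > best_score:
            best_epoch, best_score = epoch, score
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    return best_epoch, len(scores)


# Synthetic validation scores that peak at epoch 7 (illustrative only).
synthetic_scores = [1, 2, 3, 3, 4, 5, 6, 5, 5, 5]
print(stop_epoch(synthetic_scores, EARLY_STOPPING_PATIENCE))
# -> (7, 10)
```

With the best checkpoint at epoch 7 and patience 3, the stopping condition is first met at epoch 10, so early stopping and the epoch cap coincide in this configuration.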

Intended Use

Primary use: anachronism filter in the ABACUS provenance pipeline for pre-modern mathematical TEAs (Translation-Era Artifacts). A text classified as anachronism (label = 1) likely contains post-1930 mathematical terminology embedded in a historical context, which indicates contaminated training data.
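
A minimal sketch of such a filter step is shown below. It assumes a classifier callable that returns a dict with `label` and `score` keys, as the `classify` function in Production Usage does. `split_by_anachronism`, `min_score`, and `stub_classify` are hypothetical names introduced here for illustration; the stub stands in for the real model so the example runs without downloading weights.

```python
def split_by_anachronism(passages, classify_fn, min_score=0.5):
    """Partition passages into (clean, flagged) lists.

    A passage is flagged when classify_fn labels it "anachronism" with a
    score of at least min_score. Raising min_score trades recall for precision.
    """
    clean, flagged = [], []
    for text in passages:
        result = classify_fn(text)
        if result["label"] == "anachronism" and result["score"] >= min_score:
            flagged.append((text, result["score"]))
        else:
            clean.append(text)
    return clean, flagged


def stub_classify(text):
    # Hypothetical keyword stand-in for the real model, for demonstration only.
    if "quantum" in text.lower():
        return {"label": "anachronism", "score": 0.9}
    return {"label": "authentic", "score": 0.9}


passages = [
    "Archimedes calculated pi using a polygon approximation method.",
    "Newton's discovery of quantum entanglement in 1687 led to...",
]
clean, flagged = split_by_anachronism(passages, stub_classify, min_score=0.8)
print(len(clean), len(flagged))
# -> 1 1
```

Because the label comes from an argmax over two classes, the score is never below 0.5, so the default `min_score` flags every predicted anachronism. A higher value may be worth considering given the reported precision of 0.6667.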

Limitations

Small eval set (35 examples) - F1 has high variance at this scale
May flag legitimate retrospective commentary (e.g., "this anticipates Fermat's Last Theorem") as anachronism
Specialized on mathematical/scientific domains - not intended for general anachronism detection
Limited coverage of Indigenous mathematical traditions in training data
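
To give a sense of the variance at this scale, the sketch below computes a 95% Wilson score interval for the reported accuracy. It assumes 25 correct predictions out of 35, a count derived from the rounded accuracy of 0.7143 rather than stated in this card. `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# 25 of 35 correct is inferred from accuracy 0.7143 on a 35-example eval set.
low, high = wilson_interval(25, 35)
print(round(low, 2), round(high, 2))
```

The interval spans roughly 0.55 to 0.84, so the point estimate of 0.7143 is compatible with a fairly wide range of true accuracies.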

License

Apache-2.0 (matching base model and v2)

Production Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tok = AutoTokenizer.from_pretrained("idirectships/abacus-cheat-tell-v3")
mdl = AutoModelForSequenceClassification.from_pretrained("idirectships/abacus-cheat-tell-v3")
mdl.eval()  # inference mode: disables dropout

def classify(text):
    # Passages longer than 512 tokens are truncated.
    inputs = tok(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = mdl(**inputs).logits
    label_id = logits.argmax(-1).item()  # 0 = authentic, 1 = anachronism
    # Softmax probability of the predicted label.
    score = torch.softmax(logits, dim=-1)[0][label_id].item()
    return {"label": ["authentic", "anachronism"][label_id], "score": round(score, 4)}

print(classify("Archimedes calculated pi using a polygon approximation method."))
# -> {'label': 'authentic', 'score': ...}

print(classify("Newton's discovery of quantum entanglement in 1687 led to..."))
# -> {'label': 'anachronism', 'score': ...}
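
The `classify` function above truncates input at 512 tokens, so an anachronism late in a long passage would go unseen. One possible workaround, sketched here under assumptions, is to split the token ids into overlapping windows and classify each window separately. `sliding_windows` is a hypothetical helper, and the default size of 510 assumes the tokenizer adds two special tokens per sequence.

```python
def sliding_windows(token_ids, size=510, stride=255):
    """Return overlapping windows of at most `size` token ids.

    Consecutive windows start `stride` tokens apart; the last window
    always reaches the end of the sequence.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    if len(token_ids) <= size:
        return [list(token_ids)]
    windows = []
    start = 0
    while True:
        windows.append(list(token_ids[start:start + size]))
        if start + size >= len(token_ids):
            break
        start += stride
    return windows


# Toy ids stand in for tok(text, add_special_tokens=False)["input_ids"].
demo = sliding_windows(list(range(10)), size=4, stride=2)
print(demo)
# -> [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```

Each window could then be decoded with `tok.decode(window)` and passed to `classify`. Flagging the whole passage when any window is labeled anachronism is one reasonable aggregation rule, though it will tend to raise the false-positive rate on long texts.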
