abacus-cheat-tell-v3

Binary classifier for detecting anachronisms in mathematical and scientific texts from pre-modern traditions. Part of the ABACUS project - AGI verification via pre-modern mathematical reasoning.

Model Description

abacus-cheat-tell-v3 detects whether a passage from pre-modern mathematical texts contains anachronistic language or concepts (i.e., ideas that could not have existed at the time of writing). This is a core component of the ABACUS "no-cheating" protocol: any model trained on pre-modern corpora that produces post-1930 concepts is flagged.

  • Architecture: ModernBertForSequenceClassification (answerdotai/ModernBERT-base, 149M params)
  • Labels: 0 = authentic, 1 = anachronism
  • Warm-started from: idirectships/abacus-cheat-tell-v2

v2 vs v3 Metrics

Metric      v2 (baseline)   v3       Delta
F1          0.6522          0.7368   +0.0846
Accuracy    0.5429          0.7143   +0.1714
Precision   n/a             0.6667   -
Recall      n/a             0.8235   -

v3 was trained on the abacus-cheat-tell-eval-v3 train split (140 balanced examples) and uses explicit anachronism metadata with insertion position. v2 was trained on an unknown dataset of similar size over 4 epochs.
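To show how the four v3 figures in the table relate to each other, here is a minimal sketch that recomputes them from confusion-matrix counts. The card does not publish a confusion matrix; the counts below (`tp`, `fp`, `fn`, `tn`) are an inference from the reported metrics on a 35-example eval set and should be read as an assumption, not a reported result.

```python
# Confusion counts inferred from the reported v3 metrics on 35 eval examples.
# They are NOT stated in the model card; treat them as an assumption.
tp, fp, fn, tn = 14, 7, 3, 11


def binary_metrics(tp, fp, fn, tn):
    """Return precision, recall, F1 and accuracy for the positive class (label 1)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


v3 = binary_metrics(tp, fp, fn, tn)
print({name: round(value, 4) for name, value in v3.items()})
```

Rounded to four decimals, these counts reproduce the v3 column (precision 0.6667, recall 0.8235, F1 0.7368, accuracy 0.7143). Under this reading the classifier favors recall: it misses few anachronisms but raises a fair number of false alarms on authentic passages.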

Training Details

Dataset: idirectships/abacus-cheat-tell-eval-v3 train split

  • 140 examples, perfectly balanced: 70 authentic / 70 anachronism
  • Traditions: Greek, Chinese, Japanese, Indian, Islamic, Babylonian, Egyptian, Mayan

Hyperparameters:

  • Base model: idirectships/abacus-cheat-tell-v2 (warm-start from v2 weights)
  • Learning rate: 1e-5
  • Batch size: 8 (train), 16 (eval)
  • Max epochs: 10 (best at epoch 7, early stopping patience=3)
  • Warmup steps: 30
  • Weight decay: 0.05
  • Optimizer: AdamW fused
  • Training hardware: MacBook M4 Max (Apple MPS backend)
  • Wall clock: 13.0 minutes (782s)
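The hyperparameters above can be collected into a configuration sketch. The training script itself is not part of this card, so the mapping below is an assumption: the keys follow the parameter names of `transformers.TrainingArguments`, "AdamW fused" is read as the `adamw_torch_fused` optimizer option, and the evaluation and checkpoint settings are guesses at what per-epoch early stopping would need. The step arithmetic further assumes no gradient accumulation.

```python
import math

# Values copied from the hyperparameter list above.
training_kwargs = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 16,
    "num_train_epochs": 10,
    "warmup_steps": 30,
    "weight_decay": 0.05,
    "optim": "adamw_torch_fused",  # assumed mapping for "AdamW fused"
    # Assumptions, not stated in the card: evaluate and checkpoint once per epoch
    # so that the best epoch can be restored after early stopping.
    "eval_strategy": "epoch",
    "save_strategy": "epoch",
    "load_best_model_at_end": True,
}
early_stopping_patience = 3  # from the card

train_examples = 140
# Assumes no gradient accumulation and that the last partial batch is kept.
steps_per_epoch = math.ceil(train_examples / training_kwargs["per_device_train_batch_size"])
max_steps = steps_per_epoch * training_kwargs["num_train_epochs"]
warmup_epochs = training_kwargs["warmup_steps"] / steps_per_epoch

print(steps_per_epoch, max_steps, round(warmup_epochs, 2))
```

Under these assumptions an epoch is 18 optimizer steps, so the 30 warmup steps cover a little under the first two epochs of at most 180 steps. A dictionary like this could be passed to `TrainingArguments(output_dir=..., **training_kwargs)` together with `EarlyStoppingCallback(early_stopping_patience=3)`; older transformers releases name the first assumed key `evaluation_strategy`.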

Intended Use

Primary use: ABACUS provenance pipeline - anachronism filter for pre-modern mathematical TEAs (Translation-Era Artifacts). A text classified as anachronism (label=1) likely contains post-1930 mathematical terminology embedded in a historical context, which indicates contaminated training data.
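As an illustration of the filtering step, the sketch below splits a batch of passages into kept and flagged lists. It is not the ABACUS pipeline code: `split_by_anachronism`, `min_score` and `toy_classifier` are hypothetical names, and the only thing assumed about the real classifier is that it returns a dict with `label` and `score` keys, as the `classify` function in Production Usage does.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def split_by_anachronism(
    passages: Iterable[str],
    classify_fn: Callable[[str], Dict[str, object]],
    min_score: float = 0.5,
) -> Tuple[List[str], List[str]]:
    """Split passages into (kept, flagged) using a classifier callable.

    classify_fn must return {"label": ..., "score": ...}, where score is the
    probability of the predicted label. A passage is flagged only when the
    label is "anachronism" and the score is at least min_score.
    """
    kept, flagged = [], []
    for text in passages:
        result = classify_fn(text)
        is_flagged = result["label"] == "anachronism" and float(result["score"]) >= min_score
        (flagged if is_flagged else kept).append(text)
    return kept, flagged


# Stand-in classifier for demonstration only; it is a keyword check, not the model.
def toy_classifier(text: str) -> Dict[str, object]:
    hit = "quantum" in text.lower()
    return {"label": "anachronism" if hit else "authentic", "score": 0.9}


kept, flagged = split_by_anachronism(
    [
        "Archimedes bounded pi with inscribed polygons.",
        "The scribe applied quantum field theory to the grain tally.",
    ],
    toy_classifier,
)
print(len(kept), len(flagged))
```

Raising `min_score` trades recall for precision: fewer authentic passages are discarded, at the cost of letting more low-confidence anachronisms through.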

Limitations

  • Small eval set (35 examples) - F1 has high variance at this scale
  • May flag legitimate retrospective commentary (e.g., "this anticipates Fermat's Last Theorem") as anachronism
  • Specialized on mathematical/scientific domains - not intended for general anachronism detection
  • Limited coverage of Indigenous mathematical traditions in training data
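To give a sense of scale for the first limitation, the sketch below computes a Wilson score interval for the reported accuracy. The count of 25 correct predictions out of 35 is inferred from the reported accuracy of 0.7143 rather than stated in the card, and accuracy is used here because it is a simple proportion; F1 is not, so this is only a rough proxy for its uncertainty.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# 25 correct out of 35 is inferred from the reported accuracy (0.7143 * 35 is about 25).
low, high = wilson_interval(25, 35)
print(f"accuracy 95% interval: {low:.2f} to {high:.2f}")
```

The interval spans roughly 0.55 to 0.84, so differences of a few points between model versions on this eval set should be read with caution.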

License

Apache-2.0 (matching base model and v2)

Production Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the fine-tuned classifier from the Hub.
tok = AutoTokenizer.from_pretrained("idirectships/abacus-cheat-tell-v3")
mdl = AutoModelForSequenceClassification.from_pretrained("idirectships/abacus-cheat-tell-v3")
mdl.eval()  # inference mode (disables dropout)

def classify(text):
    """Classify one passage; returns the predicted label and its softmax probability."""
    # Inputs longer than 512 tokens are truncated.
    inputs = tok(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = mdl(**inputs).logits
    label_id = logits.argmax(-1).item()  # 0 = authentic, 1 = anachronism
    score = torch.softmax(logits, dim=-1)[0][label_id].item()
    return {"label": ["authentic", "anachronism"][label_id], "score": round(score, 4)}

print(classify("Archimedes calculated pi using a polygon approximation method."))
# -> {'label': 'authentic', 'score': ...}

print(classify("Newton's discovery of quantum entanglement in 1687 led to..."))
# -> {'label': 'anachronism', 'score': ...}

Evaluation results

  • F1 on idirectships/abacus-cheat-tell-eval-v3 (self-reported): 0.737
  • Accuracy on idirectships/abacus-cheat-tell-eval-v3 (self-reported): 0.714
  • Precision on idirectships/abacus-cheat-tell-eval-v3 (self-reported): 0.667
  • Recall on idirectships/abacus-cheat-tell-eval-v3 (self-reported): 0.824