lmprobe: Linear Probe on bitnet-b1.58-2B-4T

Truth probe for negated statements of the form 'The city of X is not in Y'. Exploratory: weak signal (80.7% accuracy). Semantic/factual knowledge partially degrades under ternary quantization.

Classes

  • 0: false_statement
  • 1: true_statement

Usage

from lmprobe import LinearProbe

probe = LinearProbe.from_hub("latent-lab/neg-cities-truth-bitnet-2b", trust_classifier=True)
predictions = probe.predict(["your text here"])
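
To make the integer predictions easier to read, a small wrapper can format city-country pairs into the negated template and map outputs to the class names listed above. This is a sketch under assumptions: `format_negated` and `label_pairs` are hypothetical helpers, the exact sentence template (including the trailing period) may differ from the training data, and `predict` is assumed to return one integer class id per input.

```python
CLASS_NAMES = {0: "false_statement", 1: "true_statement"}

def format_negated(city, country):
    """Hypothetical helper: build a statement in the card's negated template."""
    return f"The city of {city} is not in {country}."

def label_pairs(probe, pairs):
    """Run any object exposing predict(list[str]) and name its outputs."""
    statements = [format_negated(city, country) for city, country in pairs]
    predictions = probe.predict(statements)  # assumed: one 0/1 id per statement
    return [(s, CLASS_NAMES[int(p)]) for s, p in zip(statements, predictions)]
```

With the probe loaded as in the snippet above, `label_pairs(probe, [("Paris", "France")])` would return the formatted statement paired with its predicted class name.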

Probe Details

  • Base model: microsoft/bitnet-b1.58-2B-4T
  • Model revision: 04c3b9ad9361b824064a1f25ea60a8be9599b127
  • Layers: all (0–29, 30 layers)
  • Pooling: last_token
  • Classifier: logistic_regression
  • Task: classification
  • Random state: 42
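
The settings above describe a common probing recipe: take each layer's hidden state at the last token, then apply a logistic regression to those vectors. The sketch below illustrates only the inference arithmetic, using plain Python lists. It assumes the per-layer vectors are concatenated into one feature vector, which is a guess about how "Layers: all" is handled and not something this card states. All names, weights, and numbers are illustrative.

```python
import math

def last_token_pool(layer_states):
    """layer_states: list of tokens, each a list of floats. Keep the final token."""
    return layer_states[-1]

def probe_features(all_layers):
    """Concatenate last-token vectors across layers (ASSUMED combination rule)."""
    features = []
    for layer_states in all_layers:
        features.extend(last_token_pool(layer_states))
    return features

def predict_proba(features, weights, bias):
    """Logistic regression: sigmoid of a linear score."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))

# Toy input: 2 layers, 3 tokens, hidden size 2 (the real model has 30 layers).
toy = [
    [[0.1, 0.2], [0.3, 0.4], [1.0, -1.0]],
    [[0.0, 0.0], [0.5, 0.5], [0.5, 2.0]],
]
x = probe_features(toy)  # last tokens of both layers, concatenated
p_true = predict_proba(x, weights=[1.0, 0.0, 0.0, 1.0], bias=-3.0)
label = int(p_true >= 0.5)  # 1 means true_statement, 0 means false_statement
```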

Evaluation

Metric      Value
accuracy    0.8067
auroc       0.8807
f1          0.8153
precision   0.7805
recall      0.8533
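
The reported accuracy, F1, precision, and recall can be checked against a single confusion matrix. The sketch below assumes the 300 evaluation samples are split evenly (150 per class), which this card does not state, and searches for the counts that reproduce the rounded metrics. The recovered counts are an inference, not reported results.

```python
def metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1, precision, recall

reported = (0.8067, 0.8153, 0.7805, 0.8533)  # accuracy, f1, precision, recall
positives = negatives = 150                  # ASSUMPTION: balanced evaluation set

candidates = []
for tp in range(1, positives + 1):           # tp >= 1 avoids division by zero
    for tn in range(negatives + 1):
        fn, fp = positives - tp, negatives - tn
        got = metrics(tp, fp, fn, tn)
        if all(round(g, 4) == r for g, r in zip(got, reported)):
            candidates.append({"tp": tp, "fp": fp, "fn": fn, "tn": tn})

print(candidates)
```

Under that assumption the search yields a single match: 128 true positives, 36 false positives, 22 false negatives, and 114 true negatives. The 58 errors agree with the count given in the error analysis, and false positives outnumber false negatives.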

Training Data

  • Positive examples: 598

  • Negative examples: 598

  • Positive hash: sha256:d56c622bb238b4fc7fe6af316ea83bda26ddbafa8b2abd69d12339578e3ddce3

  • Negative hash: sha256:1e025516c05fc715dd18c40041035caee2e30fe91596e7e04422963e5b56f46a

  • Evaluation samples: 300

  • Evaluation hash: sha256:d9cce3adc1ba4e9c7401399afb3e403c6dd3f9fca232d6fbb927c63cd2f079e4
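
The hashes above allow a local copy of the data to be compared with what the probe was trained and evaluated on. How lmprobe serializes examples before hashing is not documented in this card, so the sketch below assumes the simplest scheme (UTF-8 text, one example per line, joined with a newline). `fingerprint` and `same_data` are hypothetical helpers and may not reproduce the values listed here.

```python
import hashlib

def fingerprint(examples):
    """Hash a list of strings. The newline-joined serialization is an ASSUMPTION."""
    payload = "\n".join(examples).encode("utf-8")
    return "sha256:" + hashlib.sha256(payload).hexdigest()

def same_data(examples, expected):
    """Compare a local dataset against a hash string copied from the card."""
    return fingerprint(examples) == expected
```

Because the hash covers the joined text, reordering the examples changes the fingerprint under this scheme.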

Error Analysis

This probe exhibits negation blindness: it ignores the word "not" and classifies based solely on whether the city-country pairing is correct.

Pattern in misclassifications (58/300 test errors):

  • True negations misclassified as false: "The city of X is not in Y" where X is indeed not in Y. The ground truth is label 1 (true), but a probe that ignores "not" sees an incorrect city-country pair and would predict "false statement" (label 0).
  • False negations misclassified as true (the systematic failure): "The city of X is not in Y" where X is in Y. The statement is false (label 0), but the probe sees the correct pairing and predicts "true statement" (label 1).
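
The failure mode can be illustrated with a toy predictor that scores only the city-country pairing and never looks at "not". Everything here is hypothetical: the two-entry fact table and the `negation_blind_predict` helper are for illustration and are not how the probe works internally. The toy is also an extreme case: it gets every negated statement wrong, whereas the real probe is right 80.7% of the time.

```python
CITY_COUNTRY = {"Paris": "France", "Tokyo": "Japan"}  # toy fact table

def ground_truth(city, country):
    """Label for 'The city of {city} is not in {country}': 1 if the negation holds."""
    return int(CITY_COUNTRY[city] != country)

def negation_blind_predict(city, country):
    """Ignores 'not': predicts 1 whenever the pairing itself is correct."""
    return int(CITY_COUNTRY[city] == country)

cases = [("Paris", "France"), ("Paris", "Japan")]
for city, country in cases:
    y = ground_truth(city, country)
    y_hat = negation_blind_predict(city, country)
    print(f"The city of {city} is not in {country}: label={y} predicted={y_hat}")
```

The first case is the false positive described above: a correct pairing inside a false negated statement is predicted as label 1.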

This is consistent with the Geometry of Truth paper's finding that negation comprehension requires larger models. At BitNet 2B's effective capacity, truth representations encode factual associations but not logical operators.

Limitations

This is an exploratory probe with weak signal (80.7% accuracy). The probe's truth direction does not account for negation: it detects city-country factual associations regardless of logical structure. Not suitable for applications requiring negation understanding.

Reproducibility

  • lmprobe version: 0.5.8
  • Python: 3.12.3
  • PyTorch: 2.10.0+cu128
  • scikit-learn: 1.8.0
  • transformers: 5.3.0