lmprobe: Linear Probe on bitnet-b1.58-2B-4T

Truth probe for negated statements of the form 'The city of X is not in Y'. Exploratory: weak signal (80.7% accuracy). The result suggests that semantic/factual knowledge partially degrades under ternary quantization.

Classes

0: false_statement
1: true_statement

Usage

from lmprobe import LinearProbe

probe = LinearProbe.from_hub("latent-lab/neg-cities-truth-bitnet-2b", trust_classifier=True)
predictions = probe.predict(["your text here"])
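
The class ids listed under Classes can be mapped back to readable names. This is a minimal sketch, assuming `probe.predict` returns one integer class id per input (this card does not state the return type). `ID_TO_LABEL` and `label_names` are illustrative names, not part of lmprobe.

```python
# Class ids as listed in the "Classes" section of this card.
ID_TO_LABEL = {0: "false_statement", 1: "true_statement"}


def label_names(predictions):
    """Map an iterable of integer class ids to class names.

    Assumption: predictions holds one integer-like id per input text.
    """
    return [ID_TO_LABEL[int(p)] for p in predictions]


# Stand-in ids used here instead of a real probe.predict(...) call.
print(label_names([1, 0]))
```

An unknown id raises a `KeyError` instead of being silently mapped, which seems the safer default for a two-class probe.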

Probe Details

Base model: microsoft/bitnet-b1.58-2B-4T
Model revision: 04c3b9ad9361b824064a1f25ea60a8be9599b127
Layers: all (0–29, 30 layers)
Pooling: last_token
Classifier: logistic_regression
Task: classification
Random state: 42
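
For readers unfamiliar with last_token pooling, the sketch below shows the general idea: for each input, keep only the hidden state at the final non-padding position. It is a generic illustration. `last_token_pool` is a hypothetical helper, and it reflects neither lmprobe's internal code nor how lmprobe combines the 30 layers.

```python
def last_token_pool(hidden_states, attention_mask):
    """Return the hidden vector at the last non-padding token of each sequence.

    hidden_states: list of sequences, each a list of per-token vectors.
    attention_mask: list of sequences of 0/1 flags (1 = real token).
    Assumption: every sequence contains at least one real token.
    """
    pooled = []
    for vectors, mask in zip(hidden_states, attention_mask):
        # Index of the last position whose mask flag is 1.
        last = max(i for i, flag in enumerate(mask) if flag == 1)
        pooled.append(vectors[last])
    return pooled


# Two toy sequences with 2-dimensional hidden states; the second is padded.
states = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    [[1.0, 1.1], [1.2, 1.3], [0.0, 0.0]],
]
mask = [[1, 1, 1], [1, 1, 0]]
print(last_token_pool(states, mask))
```

Because the index is taken from the mask and not from the sequence length, the padded position in the second sequence is skipped.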

Evaluation

Metric	Value
accuracy	0.8067
auroc	0.8807
f1	0.8153
precision	0.7805
recall	0.8533
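
The summary metrics above pin down a specific confusion matrix. The sketch below recovers it, assuming the 300 evaluation samples listed under Training Data and assuming the reported values are rounded to four decimals. `confusion_from_metrics` is an illustrative helper, not part of lmprobe.

```python
def confusion_from_metrics(n_samples, accuracy, precision, recall):
    """Recover integer confusion counts from rounded summary metrics."""
    errors = round(n_samples * (1.0 - accuracy))  # FP + FN
    # FN = TP * (1/recall - 1) and FP = TP * (1/precision - 1),
    # so errors = TP * (1/recall + 1/precision - 2).
    tp = round(errors / (1.0 / recall + 1.0 / precision - 2.0))
    fn = round(tp * (1.0 / recall - 1.0))
    fp = errors - fn
    tn = n_samples - tp - fn - fp
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


counts = confusion_from_metrics(300, 0.8067, 0.7805, 0.8533)
print(counts)

# Cross-check against the reported F1 score.
f1 = 2 * counts["tp"] / (2 * counts["tp"] + counts["fp"] + counts["fn"])
print(round(f1, 4))
```

Under these assumptions the counts come out to 128 true positives, 36 false positives, 22 false negatives, and 114 true negatives. That gives 58 errors in total and an F1 of 0.8153, matching the reported values. False positives (false statements predicted as true) outnumber false negatives.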

Training Data

Positive examples: 598
Negative examples: 598
Positive hash: sha256:d56c622bb238b4fc7fe6af316ea83bda26ddbafa8b2abd69d12339578e3ddce3
Negative hash: sha256:1e025516c05fc715dd18c40041035caee2e30fe91596e7e04422963e5b56f46a
Evaluation samples: 300
Evaluation hash: sha256:d9cce3adc1ba4e9c7401399afb3e403c6dd3f9fca232d6fbb927c63cd2f079e4
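
The `sha256:` fields make it possible to fingerprint a dataset and compare it against a reference. This card does not say how the examples are serialized before hashing, so the sketch below uses an assumed scheme (UTF-8, one example per line, order preserved) purely for illustration. It is not expected to reproduce the hashes above, and `fingerprint` is a hypothetical helper.

```python
import hashlib


def fingerprint(texts):
    """Hash a list of strings into a 'sha256:<hex>' fingerprint.

    Assumed serialization: examples joined with newlines, UTF-8 encoded.
    """
    payload = "\n".join(texts).encode("utf-8")
    return "sha256:" + hashlib.sha256(payload).hexdigest()


examples = [
    "The city of Paris is not in France.",
    "The city of Tokyo is not in France.",
]
print(fingerprint(examples))
```

With this scheme, reordering or editing any example changes the fingerprint, so two runs can be checked for identical inputs.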

Error Analysis

This probe exhibits negation blindness: it ignores the word "not" and classifies based solely on whether the city-country pairing is correct.

Pattern in misclassifications (58/300 test errors):

True negations ("The city of X is not in Y" where X is indeed not in Y, ground truth label 1): the probe predicts "true statement" (label 1), which matches the ground truth. These cases come out right, but seemingly by accident rather than because the probe processes the negation.
The systematic failure: for "The city of X is not in Y" where X is in Y (a false statement, label 0), the probe sees the correct city-country pairing and predicts "true" (label 1) instead of "false" (label 0).

This is consistent with the Geometry of Truth paper's finding that negation comprehension requires larger models. At BitNet 2B's effective capacity, truth representations encode factual associations but not logical operators.
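
One way to quantify the negation blindness described above is to score minimal pairs that differ only in "not" and count how often the prediction flips. This is a sketch under assumptions: `predict` is any callable returning one integer class id per text (for example a thin wrapper around `probe.predict`), and the exact statement template, including the trailing period, is a guess. `flip_rate` and `blind_stub` are hypothetical names used only here.

```python
def flip_rate(predict, pairs):
    """Fraction of (city, country) pairs whose predicted label changes
    between the affirmative and the negated statement.

    A classifier that handles negation should flip on every pair (1.0);
    a fully negation-blind classifier never flips (0.0).
    """
    affirmative = [f"The city of {c} is in {k}." for c, k in pairs]
    negated = [f"The city of {c} is not in {k}." for c, k in pairs]
    pred_aff = list(predict(affirmative))
    pred_neg = list(predict(negated))
    flips = sum(int(a) != int(n) for a, n in zip(pred_aff, pred_neg))
    return flips / len(pairs)


# Stub that mimics negation blindness: it only checks the pairing.
KNOWN = {("Paris", "France"), ("Tokyo", "Japan")}


def blind_stub(texts):
    out = []
    for t in texts:
        words = t.rstrip(".").split()
        city, country = words[3], words[-1]
        out.append(1 if (city, country) in KNOWN else 0)
    return out


pairs = [("Paris", "France"), ("Tokyo", "France")]
print(flip_rate(blind_stub, pairs))  # 0.0 for this negation-blind stub
```

A flip rate near 0 on real probe outputs would support the negation-blindness reading, while a rate near 1 would argue against it. Affirmative statements may fall outside this probe's training distribution, so such a result should be read with caution.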

Limitations

This is an exploratory probe with weak signal (80.7% accuracy). The probe's truth direction does not account for negation — it detects city-country factual associations regardless of logical structure. Not suitable for applications requiring negation understanding.

Reproducibility

lmprobe version: 0.5.8
Python: 3.12.3
PyTorch: 2.10.0+cu128
scikit-learn: 1.8.0
transformers: 5.3.0
