lmprobe: Linear Probe on bitnet-b1.58-2B-4T
Truth probe for 'The city of X is not in Y' (negated) statements. Exploratory: weak signal (80.7% accuracy). Semantic/factual knowledge partially degrades under ternary quantization.
Classes
- 0: false_statement
- 1: true_statement
Usage
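```python
from lmprobe import LinearProbe
# Load the probe from the Hub repository named in this card.
probe = LinearProbe.from_hub("latent-lab/neg-cities-truth-bitnet-2b", trust_classifier=True)
# Run the probe on a list of input strings.
predictions = probe.predict(["your text here"])
```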
Probe Details
- Base model: microsoft/bitnet-b1.58-2B-4T
- Model revision: 04c3b9ad9361b824064a1f25ea60a8be9599b127
- Layers: all (0-29, 30 layers)
- Pooling: last_token
- Classifier: logistic_regression
- Task: classification
- Random state: 42
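For readers unfamiliar with the `last_token` pooling setting, the sketch below shows what it commonly means: each sequence is represented by the hidden vector at its last non-padding position. This is an illustration only. The function name `last_token_pool` and the nested-list layout are hypothetical, and lmprobe's actual implementation (including how it combines the 30 layers) is not described in this card.

```python
from typing import List

Vector = List[float]


def last_token_pool(
    hidden_states: List[List[Vector]],
    attention_mask: List[List[int]],
) -> List[Vector]:
    """Pick, for each sequence, the hidden vector of its last real token.

    hidden_states: one list of per-token vectors per sequence.
    attention_mask: 1 for real tokens, 0 for padding, same shape as the tokens.
    """
    pooled = []
    for vectors, mask in zip(hidden_states, attention_mask):
        # Positions that hold real (non-padding) tokens.
        real_positions = [i for i, m in enumerate(mask) if m == 1]
        if not real_positions:
            raise ValueError("sequence contains no real tokens")
        pooled.append(vectors[real_positions[-1]])
    return pooled


# Two toy sequences with 2-dimensional hidden vectors; the first is padded.
toy_hidden = [
    [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]],
    [[5.0, 6.0], [7.0, 8.0], [9.0, 10.0]],
]
toy_mask = [[1, 1, 0], [1, 1, 1]]
toy_pooled = last_token_pool(toy_hidden, toy_mask)
```

The pooled vectors would then be the features passed to the logistic regression classifier.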
Evaluation
| Metric | Value |
|---|---|
| accuracy | 0.8067 |
| auroc | 0.8807 |
| f1 | 0.8153 |
| precision | 0.7805 |
| recall | 0.8533 |
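The four threshold-based metrics above are mutually consistent with a single confusion matrix. The counts used below (128 true positives, 36 false positives, 22 false negatives, 114 true negatives, i.e. 150 examples per class) are not reported in this card; they are inferred as the integer counts that reproduce the table on 300 evaluation samples, so treat them as an assumption. AUROC depends on the ranking of scores and cannot be recovered this way.

```python
def metrics_from_counts(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Inferred counts (assumption, not stated in the card).
inferred = metrics_from_counts(tp=128, fp=36, fn=22, tn=114)
rounded = {name: round(value, 4) for name, value in inferred.items()}
errors = 36 + 22  # false positives + false negatives
```

Under these inferred counts, the 58 errors split into 36 false statements predicted true and 22 true statements predicted false.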
Training Data
- Positive examples: 598
- Negative examples: 598
- Positive hash: sha256:d56c622bb238b4fc7fe6af316ea83bda26ddbafa8b2abd69d12339578e3ddce3
- Negative hash: sha256:1e025516c05fc715dd18c40041035caee2e30fe91596e7e04422963e5b56f46a
- Evaluation samples: 300
- Evaluation hash: sha256:d9cce3adc1ba4e9c7401399afb3e403c6dd3f9fca232d6fbb927c63cd2f079e4
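The hashes above use a `sha256:<hex digest>` format. The sketch below shows one way to produce a digest in that format for a list of example strings using Python's standard `hashlib`. The card does not say how lmprobe serializes the examples before hashing, so the newline-joined UTF-8 encoding and the function name `dataset_digest` are assumptions, and this will not necessarily reproduce the values listed here.

```python
import hashlib
from typing import List


def dataset_digest(examples: List[str]) -> str:
    """Return a 'sha256:<hex>' digest over newline-joined UTF-8 examples.

    The serialization is an assumption made for illustration; lmprobe may
    hash its training data differently.
    """
    payload = "\n".join(examples).encode("utf-8")
    return "sha256:" + hashlib.sha256(payload).hexdigest()


# Hypothetical examples in the template this probe targets.
digest = dataset_digest([
    "The city of Paris is not in Japan.",
    "The city of Tokyo is not in Japan.",
])
```

A digest like this changes if any example is edited, reordered, added, or removed, which is what makes it useful for checking that two runs used the same data.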
Error Analysis
This probe exhibits negation blindness: it ignores the word "not" and classifies based solely on whether the city-country pairing is correct.
Pattern in misclassifications (58/300 test errors):
- True negations: "The city of X is not in Y" where X is indeed not in Y. The ground truth is label 1 (true_statement) and the probe also predicts label 1, so these cases come out right, but by accident rather than because the probe processes "not".
- The systematic failure: "The city of X is not in Y" where X is in fact in Y (a false statement). The probe sees the correct city-country pairing and predicts label 1 (true_statement) instead of label 0 (false_statement).
This is consistent with the Geometry of Truth paper's finding that negation comprehension requires larger models. At BitNet 2B's effective capacity, truth representations encode factual associations but not logical operators.
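To check whether errors concentrate on the systematic failure described above (false statements predicted as true), misclassifications can be split by direction. The sketch below is a minimal helper for that; the function name `tally_errors` and the toy labels and predictions are hypothetical and are not taken from this probe's evaluation set.

```python
from collections import Counter
from typing import Sequence


def tally_errors(labels: Sequence[int], predictions: Sequence[int]) -> Counter:
    """Count correct predictions and errors split by direction.

    Labels follow this card: 0 = false_statement, 1 = true_statement.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    counts = Counter()
    for label, prediction in zip(labels, predictions):
        if label == prediction:
            counts["correct"] += 1
        elif prediction == 1:
            # Ground truth is false_statement, probe said true_statement.
            counts["false_statement_predicted_true"] += 1
        else:
            # Ground truth is true_statement, probe said false_statement.
            counts["true_statement_predicted_false"] += 1
    return counts


# Toy data for illustration only.
toy_labels = [0, 0, 0, 1, 1, 1]
toy_predictions = [1, 1, 0, 1, 1, 0]
toy_counts = tally_errors(toy_labels, toy_predictions)
```

If the negation-blindness explanation holds, the `false_statement_predicted_true` bucket should dominate the errors on real evaluation data.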
Limitations
This is an exploratory probe with weak signal (80.7% accuracy). The probe's truth direction does not account for negation: it detects city-country factual associations regardless of logical structure. Not suitable for applications requiring negation understanding.
Reproducibility
- lmprobe version: 0.5.8
- Python: 3.12.3
- PyTorch: 2.10.0+cu128
- scikit-learn: 1.8.0
- transformers: 5.3.0