PoolBench – BERT Scorers
Fine-tuned bert-base-uncased classifiers for automatic concept scoring of steered LLM outputs. One classifier per concept, trained on the PoolBench corpus.
These are Classifier B in the PoolBench evaluation pipeline: they score whether a steered generation exhibits the target concept, enabling the D2 SCP metric.
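As an illustration of how a scorer feeds that metric, here is a minimal sketch that scores a batch of steered generations and returns the fraction labeled concept-present. It assumes a tokenizer and model loaded as in the Loading section below; the helper name and the exact aggregation are assumptions, and the official SCP computation may differ.

import torch

def concept_present_rate(texts, tokenizer, model):
    # Fraction of generations the scorer labels as exhibiting the concept.
    # Illustrative only; PoolBench's exact SCP aggregation may differ.
    enc = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        preds = model(**enc).logits.argmax(-1)
    return (preds == 1).float().mean().item()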
Concepts (17)
academic_tone, bureaucratic, causation, code_docs, conditionality, contrast, deference, depression, frustration, hedging, imdb_sentiment, legal_formality, narrative, negation_density, numerical_precision, planning, toxicity
File structure
One subdirectory per concept, each a standard HuggingFace AutoModelForSequenceClassification checkpoint:
{concept}/config.json
{concept}/model.safetensors
{concept}/tokenizer files...
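For offline use, a single concept's subdirectory can be fetched ahead of time with huggingface_hub's snapshot_download. This is a sketch assuming the repository layout above; the allow_patterns filter restricts the download to one concept.

from huggingface_hub import snapshot_download

concept = "causation"
# Download only the chosen concept's subdirectory from the Hub
local_dir = snapshot_download(
    repo_id="nips234678/poolbench-bert-scorers",
    allow_patterns=[f"{concept}/*"],
)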
Loading
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
concept = "causation"
# Each concept lives in a subdirectory of the repo, so pass it via subfolder=
tokenizer = AutoTokenizer.from_pretrained("nips234678/poolbench-bert-scorers", subfolder=concept)
model = AutoModelForSequenceClassification.from_pretrained("nips234678/poolbench-bert-scorers", subfolder=concept)
inputs = tokenizer("The result was caused by the earlier event.", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
pred = logits.argmax(-1).item()  # 1 = concept present, 0 = absent
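If a soft score is preferred over a hard label (e.g. for thresholding or averaging across generations), one option is to take the softmax probability of the positive class. This is an illustrative variant, not part of the documented PoolBench pipeline:

prob_present = logits.softmax(-1)[0, 1].item()  # P(concept present)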
Training details
- Base model: bert-base-uncased
- Training split: 700 passages per class per concept
- Evaluation split: 300 passages per class per concept
- Labels: 1 = concept present, 0 = concept absent
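For reference, a classifier with this shape could be reproduced roughly as follows. This is a hedged sketch: the hyperparameters are assumptions (the actual PoolBench training script is not published here), and the two toy examples stand in for the real 700/300-per-class splits.

from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Toy stand-ins for the real per-concept splits.
texts = ["The heat caused the metal to expand.", "The sky is blue today."]
labels = [1, 0]  # 1 = concept present, 0 = absent
enc = tokenizer(texts, truncation=True, padding=True)
train_ds = [{"input_ids": enc["input_ids"][i],
             "attention_mask": enc["attention_mask"][i],
             "labels": labels[i]} for i in range(len(texts))]

# Hyperparameters below are illustrative assumptions, not PoolBench's.
args = TrainingArguments(output_dir="scorer-causation",
                         per_device_train_batch_size=16,
                         num_train_epochs=3)
Trainer(model=model, args=args, train_dataset=train_ds).train()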
Citation
@misc{poolbench2026,
title={PoolBench: Evaluating Pooling Strategies for Activation Steering Vectors},
author={Anonymous},
year={2026},
}