PoolBench – BERT Scorers
Fine-tuned bert-base-uncased classifiers for automatic concept scoring of steered LLM outputs. One classifier per concept, trained on the PoolBench corpus.
These are Classifier B in the PoolBench evaluation pipeline: they score whether a steered generation exhibits the target concept, enabling the D2 SCP metric.
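As an illustration of how a scorer feeds that metric, here is a minimal sketch that scores a batch of steered generations and returns the fraction labeled concept-present. It assumes a tokenizer and model loaded as in the Loading section below; the helper name and the exact aggregation are assumptions, and the official SCP computation may differ.

import torch

def concept_present_rate(texts, tokenizer, model):
    # Fraction of generations the scorer labels as exhibiting the concept.
    # Illustrative only; PoolBench's exact SCP aggregation may differ.
    enc = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        preds = model(**enc).logits.argmax(-1)
    return (preds == 1).float().mean().item()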
Concepts (17)
academic_tone, bureaucratic, causation, code_docs, conditionality, contrast, deference, depression, frustration, hedging, imdb_sentiment, legal_formality, narrative, negation_density, numerical_precision, planning, toxicity
File structure
One subdirectory per concept, each a standard HuggingFace AutoModelForSequenceClassification checkpoint:
{concept}/config.json
{concept}/model.safetensors
{concept}/tokenizer files...
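For offline use, a single concept's subdirectory can be fetched ahead of time with huggingface_hub's snapshot_download. This is a sketch assuming the repository layout above; the allow_patterns filter restricts the download to one concept.

from huggingface_hub import snapshot_download

concept = "causation"
# Download only the chosen concept's subdirectory from the Hub
local_dir = snapshot_download(
    repo_id="nips234678/poolbench-bert-scorers",
    allow_patterns=[f"{concept}/*"],
)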
Loading
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
concept = "causation"
# Each concept lives in a subdirectory of the repo, so pass it via subfolder=
tokenizer = AutoTokenizer.from_pretrained("nips234678/poolbench-bert-scorers", subfolder=concept)
model = AutoModelForSequenceClassification.from_pretrained("nips234678/poolbench-bert-scorers", subfolder=concept)
inputs = tokenizer("The result was caused by the earlier event.", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
pred = logits.argmax(-1).item()  # 1 = concept present, 0 = absent
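If a soft score is preferred over a hard label (e.g. for thresholding or averaging across generations), one option is to take the softmax probability of the positive class. This is an illustrative variant, not part of the documented PoolBench pipeline:

prob_present = logits.softmax(-1)[0, 1].item()  # P(concept present)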
Training details
- Base model: bert-base-uncased
- Training split: 700 passages per class per concept
- Evaluation split: 300 passages per class per concept
- Labels: 1 = concept present, 0 = concept absent
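For reference, a classifier with this shape could be reproduced roughly as follows. This is a hedged sketch: the hyperparameters are assumptions (the actual PoolBench training script is not published here), and the two toy examples stand in for the real 700/300-per-class splits.

from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Toy stand-ins for the real per-concept splits.
texts = ["The heat caused the metal to expand.", "The sky is blue today."]
labels = [1, 0]  # 1 = concept present, 0 = absent
enc = tokenizer(texts, truncation=True, padding=True)
train_ds = [{"input_ids": enc["input_ids"][i],
             "attention_mask": enc["attention_mask"][i],
             "labels": labels[i]} for i in range(len(texts))]

# Hyperparameters below are illustrative assumptions, not PoolBench's.
args = TrainingArguments(output_dir="scorer-causation",
                         per_device_train_batch_size=16,
                         num_train_epochs=3)
Trainer(model=model, args=args, train_dataset=train_ds).train()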
Citation
@misc{poolbench2026,
title={PoolBench: Evaluating Pooling Strategies for Activation Steering Vectors},
author={Anonymous},
year={2026},
}