SetFit with jhu-clsp/mmBERT-small

This is a SetFit model for text classification. It uses jhu-clsp/mmBERT-small as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.

The model has been trained using an efficient few-shot learning technique that involves:

1. Fine-tuning a Sentence Transformer with contrastive learning.
2. Training a classification head with features from the fine-tuned Sentence Transformer.
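
The contrastive step works on pairs of texts rather than on single examples. The following is a minimal sketch of how such pairs could be built from labeled data, assuming one same-label and one different-label partner is drawn per text on each pass. The function name `sample_contrastive_pairs` is hypothetical and this is not SetFit's actual sampler; the example texts are taken from the label table in this card.

```python
import random
from typing import List, Tuple


def sample_contrastive_pairs(
    texts: List[str],
    labels: List[str],
    num_iterations: int,
    seed: int = 42,
) -> List[Tuple[str, str, float]]:
    """Return (text_a, text_b, target) triples.

    target is 1.0 when both texts share a label and 0.0 otherwise.
    Hypothetical helper for illustration, not part of the SetFit API.
    """
    rng = random.Random(seed)
    pairs: List[Tuple[str, str, float]] = []
    for _ in range(num_iterations):
        for text, label in zip(texts, labels):
            # Candidates with the same label (excluding the text itself).
            same = [t for t, l in zip(texts, labels) if l == label and t != text]
            # Candidates with any other label.
            diff = [t for t, l in zip(texts, labels) if l != label]
            if same:
                pairs.append((text, rng.choice(same), 1.0))
            if diff:
                pairs.append((text, rng.choice(diff), 0.0))
    return pairs


texts = [
    "appreciate ya",
    "the stars are extra bright tonight",
    "go die in a fire screaming",
    "send them back in cages",
]
labels = ["not toxic", "not toxic", "toxic", "toxic"]
pairs = sample_contrastive_pairs(texts, labels, num_iterations=2)
```

Fine-tuning on such pairs pulls same-label embeddings together and pushes different-label embeddings apart, so that a simple classifier can separate them in the second step.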

Model Details

Model Description

Model Type: SetFit
Sentence Transformer body: jhu-clsp/mmBERT-small
Classification head: a LogisticRegression instance
Maximum Sequence Length: 8192 tokens
Number of Classes: 2

Model Sources

Repository: SetFit on GitHub
Paper: Efficient Few-Shot Learning Without Prompts
Blogpost: SetFit: Efficient Few-Shot Learning Without Prompts

Model Labels

Label	Examples
not toxic	'tom holland surprised kids at the hospital in his spider-man suit' 'appreciate ya' 'the stars are extra bright tonight'
toxic	'go die in a fire screaming' 'desantis was right send migrants to marthas vineyard' 'send them back in cages'

Evaluation

Metrics

Label	Accuracy
all	0.9785

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("johnpaulbin/toxicity-setfit-1")
# Run inference
preds = model("habits")
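
To classify several texts at once, the loaded model can be called with a list of strings. The sketch below wraps that call in a hypothetical helper, `classify_batch`, that pairs each input with its prediction. It accepts any callable so it can be tried without downloading weights; `stub_model` is a stand-in for illustration only and is not the trained classifier. Whether predictions come back as label strings or integer ids depends on how the model was saved, so the helper converts them to strings.

```python
from typing import Callable, Dict, List, Sequence


def classify_batch(
    model: Callable[[List[str]], Sequence],
    texts: List[str],
) -> Dict[str, str]:
    """Run the model on a batch and map each text to its prediction."""
    preds = model(texts)
    return {text: str(pred) for text, pred in zip(texts, preds)}


def stub_model(batch: List[str]) -> List[str]:
    """Hypothetical keyword stub standing in for the real SetFitModel."""
    return ["toxic" if "die" in text else "not toxic" for text in batch]


results = classify_batch(
    stub_model,
    ["appreciate ya", "go die in a fire screaming"],
)

# With the real model (requires `pip install setfit` and network access):
# from setfit import SetFitModel
# model = SetFitModel.from_pretrained("johnpaulbin/toxicity-setfit-1")
# results = classify_batch(model, ["appreciate ya", "habits"])
```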

Training Details

Training Set Metrics

Training set	Min	Median	Max
Word count	1	4.8995	81

Label	Training Sample Count
not toxic	8770
toxic	6322

Training Hyperparameters

batch_size: (64, 64)
num_epochs: (1, 1)
max_steps: -1
sampling_strategy: num_iterations
num_iterations: 2
body_learning_rate: (2e-05, 1e-05)
head_learning_rate: 0.01
loss: CosineSimilarityLoss
distance_metric: cosine_distance
margin: 0.25
end_to_end: False
use_amp: True
warmup_proportion: 0.1
l2_weight: 0.01
seed: 42
eval_max_steps: -1
load_best_model_at_end: True
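
The `loss: CosineSimilarityLoss` setting means the embedding body is trained so that the cosine similarity of a text pair approaches its target (1.0 for same-label pairs, 0.0 for different-label pairs), with the squared difference as the penalty. The following is a minimal single-pair sketch in plain Python. The function names are illustrative, and the library version operates on batched tensors and averages over the batch.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_similarity_loss(
    u: Sequence[float], v: Sequence[float], target: float
) -> float:
    """Squared error between the pair's cosine similarity and its target."""
    return (cosine_similarity(u, v) - target) ** 2


# Toy 2-d "embeddings" (illustrative values, not model outputs).
same_direction = cosine_similarity_loss([3.0, 4.0], [3.0, 4.0], target=1.0)
orthogonal = cosine_similarity_loss([1.0, 0.0], [0.0, 1.0], target=0.0)
mislabeled = cosine_similarity_loss([3.0, 4.0], [3.0, 4.0], target=0.0)
```

The loss is near zero when the geometry already agrees with the target and grows as the two disagree, which is consistent with the training loss in the results table falling as fine-tuning proceeds.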

Training Results

Epoch	Step	Training Loss	Validation Loss
0.0000	1	0.2577	-
0.0000	50	0.3826	-
0.0000	100	0.4065	-
0.0000	150	0.3851	-
0.0000	200	0.4038	-
0.0011	1	0.3989	-
0.0530	50	0.3241	-
0.1059	100	0.1508	-
0.1589	150	0.0692	-
0.2119	200	0.0523	-
0.2648	250	0.0351	-
0.3178	300	0.0267	-
0.3708	350	0.0184	-
0.4237	400	0.0187	-
0.4767	450	0.0143	-
0.5297	500	0.0154	-
0.5826	550	0.0117	-
0.6356	600	0.0103	-
0.6886	650	0.0081	-
0.7415	700	0.0075	-
0.7945	750	0.0076	-
0.8475	800	0.0057	-
0.9004	850	0.0041	-
0.9534	900	0.0058	-
1.0	944	-	0.0380
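
The final step count can be cross-checked against the training sample counts and hyperparameters listed above. The sketch below assumes that the `num_iterations` sampling strategy produces one positive and one negative pair per training sample per iteration, and that a trailing partial batch counts as a step. Under those assumptions the arithmetic reproduces the 944 steps shown in the table.

```python
import math

# Values copied from the tables and hyperparameters in this card.
label_counts = {"not toxic": 8770, "toxic": 6322}
num_iterations = 2
batch_size = 64

num_samples = sum(label_counts.values())
# Assumption: 2 pairs (one positive, one negative) per sample per iteration.
num_pairs = 2 * num_iterations * num_samples
steps_per_epoch = math.ceil(num_pairs / batch_size)


def epoch_at(step: int) -> float:
    """Fraction of the epoch completed at a given step, rounded to 4 places."""
    return round(step / steps_per_epoch, 4)


print(num_samples, num_pairs, steps_per_epoch)
print(epoch_at(1), epoch_at(50), epoch_at(100))
```

The same ratio gives the values in the Epoch column for the rows from 0.0011 onward, for example step 50 at 0.0530 and step 100 at 0.1059.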

Framework Versions

Python: 3.12.12
SetFit: 1.2.0.dev0
Sentence Transformers: 5.2.0
Transformers: 4.57.3
PyTorch: 2.9.0+cu126
Datasets: 4.0.0
Tokenizers: 0.22.2

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}


Model size: 0.1B params
Tensor type: F32 (Safetensors)
