SetFit with jhu-clsp/mmBERT-small

This is a SetFit model for binary text classification of toxicity (labels: not toxic, toxic). It uses jhu-clsp/mmBERT-small as the Sentence Transformer embedding model and a LogisticRegression instance as the classification head.

The model has been trained using an efficient few-shot learning technique that involves:

Fine-tuning a Sentence Transformer with contrastive learning.
Training a classification head with features from the fine-tuned Sentence Transformer.
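The first step trains on pairs of sentences rather than on single labeled examples. The sketch below is a simplified illustration of how such pairs can be built. `sample_pairs` is a hypothetical helper, not part of the SetFit API, and SetFit's own sampler differs in detail. It assumes every label has at least two examples, and it marks same-label pairs with target 1.0 and cross-label pairs with target 0.0. The example strings are placeholders.

```python
import random


def sample_pairs(texts, labels, num_iterations, seed=42):
    """Hypothetical helper: build (text_a, text_b, target) pairs for contrastive training.

    For every sentence, draw `num_iterations` positive partners (same label,
    target 1.0) and `num_iterations` negative partners (different label,
    target 0.0). Assumes each label has at least two examples.
    """
    rng = random.Random(seed)
    pairs = []
    for i, (text, label) in enumerate(zip(texts, labels)):
        # Candidates that share the label, excluding the sentence itself
        same = [t for j, (t, lab) in enumerate(zip(texts, labels)) if lab == label and j != i]
        # Candidates with any other label
        other = [t for t, lab in zip(texts, labels) if lab != label]
        for _ in range(num_iterations):
            pairs.append((text, rng.choice(same), 1.0))   # positive pair
            pairs.append((text, rng.choice(other), 0.0))  # negative pair
    return pairs


# Placeholder data: two sentences per label
texts = ["appreciate ya", "the stars are extra bright tonight", "toxic example a", "toxic example b"]
labels = ["not toxic", "not toxic", "toxic", "toxic"]
pairs = sample_pairs(texts, labels, num_iterations=8)
print(len(pairs))  # 64 = 4 sentences x 8 iterations x 2 pair types
```

The Sentence Transformer is then fine-tuned so that embeddings of positive pairs move closer together and those of negative pairs move apart; the classification head is trained afterwards on the resulting embeddings.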

Model Details

Model Description

Model Type: SetFit
Sentence Transformer body: jhu-clsp/mmBERT-small
Classification head: a LogisticRegression instance
Maximum Sequence Length: 8192 tokens
Number of Classes: 2 classes
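At inference time the two components are chained: the Sentence Transformer body maps a text to an embedding, and the LogisticRegression head turns that embedding into one of the 2 classes. The minimal sketch below shows only the head's arithmetic on a toy 2-dimensional vector. The weights, bias, and label order (index 0 = not toxic, index 1 = toxic) are assumptions for illustration, not values read from this model.

```python
import math

# Assumed label order for illustration: index 0 = "not toxic", index 1 = "toxic"
LABELS = ["not toxic", "toxic"]


def head_predict(embedding, weights, bias):
    """Binary logistic regression: P(toxic) = sigmoid(w . x + b), thresholded at 0.5."""
    logit = sum(w * x for w, x in zip(weights, embedding)) + bias
    p_toxic = 1.0 / (1.0 + math.exp(-logit))
    label = LABELS[1] if logit > 0 else LABELS[0]  # logit > 0 is equivalent to p_toxic > 0.5
    return label, p_toxic


# Toy stand-ins for a Sentence Transformer embedding and trained head parameters
print(head_predict([1.0, 0.0], weights=[2.0, -1.0], bias=0.0))  # label "toxic", probability about 0.88
```

In the real model the embedding has the body's full dimensionality and the head parameters are learned in the second training step; the decision rule is the same.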

Model Sources

Repository: SetFit on GitHub
Paper: Efficient Few-Shot Learning Without Prompts
Blogpost: SetFit: Efficient Few-Shot Learning Without Prompts

Model Labels

Label	Examples
not toxic	'tom holland surprised kids at the hospital in his spider-man suit' 'appreciate ya' 'the stars are extra bright tonight'
toxic	'go die in a fire screaming' 'desantis was right send migrants to marthas vineyard' 'send them back in cages'

Evaluation

Metrics

Label	Accuracy
all	0.9785

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("johnpaulbin/toxicity-setfit-2")
# Run inference
preds = model("habits")

Training Details

Training Set Metrics

Training set	Min	Mean	Max
Word count	1	4.8995	81

Label	Training Sample Count
not toxic	8770
toxic	6322
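These statistics are easy to recompute for any dataset. The sketch below assumes words are counted by splitting on single spaces; `word_count_stats` and `label_shares` are hypothetical helpers. Applied to the sample counts above, the label shares show a mild imbalance: about 58.1% of training examples are not toxic, so a classifier that always answered "not toxic" would score roughly 0.58 accuracy on data with this balance.

```python
from statistics import mean


def word_count_stats(texts):
    """Return (min, mean, max) word counts, splitting each text on single spaces."""
    counts = [len(text.split(" ")) for text in texts]
    return min(counts), mean(counts), max(counts)


def label_shares(counts_by_label):
    """Convert per-label sample counts into fractions of the whole training set."""
    total = sum(counts_by_label.values())
    return {label: n / total for label, n in counts_by_label.items()}


# Sample counts from the table above
shares = label_shares({"not toxic": 8770, "toxic": 6322})
print({label: round(share, 4) for label, share in shares.items()})  # {'not toxic': 0.5811, 'toxic': 0.4189}
```

For example, `word_count_stats(["appreciate ya", "habits"])` returns `(1, 1.5, 2)`.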

Training Hyperparameters

batch_size: (128, 128)
num_epochs: (1, 1)
max_steps: -1
sampling_strategy: num_iterations
num_iterations: 8
body_learning_rate: (2e-05, 1e-05)
head_learning_rate: 0.01
loss: CosineSimilarityLoss
distance_metric: cosine_distance
margin: 0.25
end_to_end: False
use_amp: True
warmup_proportion: 0.1
l2_weight: 0.01
seed: 42
eval_max_steps: -1
load_best_model_at_end: True
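With `loss: CosineSimilarityLoss`, each training pair is scored by the cosine similarity of its two embeddings, and the loss is the mean squared error between that similarity and the pair's target (1.0 for a same-label pair, 0.0 otherwise), following the Sentence Transformers loss of that name. The pure-Python sketch below reproduces only that arithmetic on toy vectors. As far as we understand SetFit, `distance_metric` and `margin` configure the triplet-style losses and are not used by this loss.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cosine_similarity_loss(pairs):
    """Mean squared error between cos(u, v) and each pair's target (1.0 same label, 0.0 different)."""
    errors = [(cosine_similarity(u, v) - target) ** 2 for u, v, target in pairs]
    return sum(errors) / len(errors)


# Toy embeddings: one well-placed pair and one badly placed pair
pairs = [
    ([1.0, 0.0], [1.0, 0.0], 1.0),  # same label, identical direction: error 0
    ([1.0, 0.0], [1.0, 0.0], 0.0),  # different labels, yet identical direction: error 1
]
print(cosine_similarity_loss(pairs))  # 0.5
```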

Training Results

Epoch	Step	Training Loss	Validation Loss
0.0005	1	0.4526	-
0.0265	50	0.3981	-
0.0530	100	0.3785	-
0.0795	150	0.3517	-
0.1060	200	0.313	-
0.0005	1	0.2697	-
0.0265	50	0.2356	-
0.0530	100	0.1318	-
0.0795	150	0.0683	-
0.1060	200	0.0393	-
0.1325	250	0.0229	-
0.1590	300	0.0237	-
0.1855	350	0.0146	-
0.2120	400	0.0128	-
0.2385	450	0.0132	-
0.2650	500	0.0063	-
0.2915	550	0.0078	-
0.3180	600	0.0036	-
0.3445	650	0.0038	-
0.3710	700	0.0047	-
0.3975	750	0.0044	-
0.4240	800	0.0028	-
0.4505	850	0.0022	-
0.4769	900	0.0013	-
0.5034	950	0.0019	-
0.5299	1000	0.0018	-
0.5564	1050	0.0012	-
0.5829	1100	0.0016	-
0.6094	1150	0.0011	-
0.6359	1200	0.0011	-
0.6624	1250	0.0009	-
0.6889	1300	0.0009	-
0.7154	1350	0.0009	-
0.7419	1400	0.0011	-
0.7684	1450	0.0011	-
0.7949	1500	0.0006	-
0.8214	1550	0.0011	-
0.8479	1600	0.0011	-
0.8744	1650	0.0017	-
0.9009	1700	0.0005	-
0.9274	1750	0.0006	-
0.9539	1800	0.0006	-
0.9804	1850	0.0008	-
1.0	1887	-	0.0368
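The step count in the final row follows from the hyperparameters and the training set size. Assuming the `num_iterations` sampling strategy draws one positive and one negative pair per sentence per iteration, 15,092 sentences × 8 iterations × 2 pairs give 241,472 pairs, or 1,887 batches of 128 (the last batch only partly filled). The Epoch column is then the step divided by 1,887, rounded to four decimals, as this sketch checks.

```python
import math

num_samples = 8770 + 6322  # training sentences: not toxic + toxic
num_iterations = 8         # from the hyperparameters above
batch_size = 128           # embedding-phase batch size

# Assumption: one positive and one negative pair per sentence per iteration
num_pairs = 2 * num_iterations * num_samples
steps_per_epoch = math.ceil(num_pairs / batch_size)
print(num_pairs, steps_per_epoch)  # 241472 1887


def epoch_at(step, total_steps=steps_per_epoch):
    """Fraction of the single training epoch completed at a given logging step."""
    return round(step / total_steps, 4)


print([epoch_at(s) for s in (1, 50, 900, 1850)])  # [0.0005, 0.0265, 0.4769, 0.9804]
```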

Framework Versions

Python: 3.12.12
SetFit: 1.2.0.dev0
Sentence Transformers: 5.2.0
Transformers: 4.57.3
PyTorch: 2.9.0+cu126
Datasets: 4.0.0
Tokenizers: 0.22.2

Citation

BibTeX

@article{https://doi.org/10.48550/arxiv.2209.11055,
    doi = {10.48550/ARXIV.2209.11055},
    url = {https://arxiv.org/abs/2209.11055},
    author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
    title = {Efficient Few-Shot Learning Without Prompts},
    publisher = {arXiv},
    year = {2022},
    copyright = {Creative Commons Attribution 4.0 International}
}


Model size: 0.1B params (Safetensors)
Tensor type: F32


