paraphrase-MiniLM-L3-v2_immig

This SetFit model was trained on 48 title-abstract samples (24 per class) to differentiate between published studies related to immigration/migration research and those that are not.

Model Type: SetFit
Sentence Transformer body: sentence-transformers/paraphrase-MiniLM-L3-v2
Classification head: a LogisticRegression instance
Train data/script repository: SetFit on GitHub

Evaluation

Metrics

Label	Accuracy	Precision	Recall	F1
all	0.9812	0.9934	0.9868	0.9901
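
As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the other two columns. The sketch below does only that arithmetic. The function name `f1_from` is illustrative, and the card does not say which label is treated as the positive class.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the metrics table above.
f1 = f1_from(0.9934, 0.9868)
print(round(f1, 4))  # 0.9901, matching the reported F1
```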

Uses

Direct Use for Inference

First install the SetFit library:

pip install setfit

Then you can load this model and run inference.

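from setfit import SetFitModel

# Download the fine-tuned model from the Hugging Face Hub
model = SetFitModel.from_pretrained("mmarbach/paraphrase-MiniLM-L3-v2_immig")
# Classify a single title-abstract string
preds = model("TITLE: ...  ABSTRACT: ....")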
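
To classify several records at once, it can help to build the input strings programmatically and pass a list to the model. The following is a minimal sketch, not the author's pipeline: `format_input` and `classify` are hypothetical helpers, and the exact `TITLE: ...  ABSTRACT: ...` template (including the double space) is assumed from the placeholder string above rather than confirmed by the training script.

```python
from typing import Iterable, Tuple

MODEL_ID = "mmarbach/paraphrase-MiniLM-L3-v2_immig"  # id as given in this card


def format_input(title: str, abstract: str) -> str:
    """Join a title and abstract using the template assumed from the card."""
    return f"TITLE: {title.strip()}  ABSTRACT: {abstract.strip()}"


def classify(records: Iterable[Tuple[str, str]], model_id: str = MODEL_ID):
    """Predict a label for each (title, abstract) pair."""
    # Imported lazily so the formatting helper works without setfit installed.
    from setfit import SetFitModel

    model = SetFitModel.from_pretrained(model_id)
    texts = [format_input(title, abstract) for title, abstract in records]
    return model.predict(texts)
```

Calling `classify([(title, abstract), ...])` should return one prediction per record, in input order.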

Training Details

Training Set Metrics

Training set	Min	Median	Max
Word count	97	155.6458	262

Label	Training Sample Count
immigration_topic	24
other_topic	24
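
A balanced few-shot set like the one above can be drawn from a larger labeled pool with a few lines of Python. This sketch is an assumption about how such a sample might be built, not a description of how the author selected the 48 examples; `sample_balanced` and its arguments are hypothetical.

```python
import random
from collections import defaultdict


def sample_balanced(records, n_per_class=24, seed=42):
    """Draw n_per_class (text, label) pairs for every label in records."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in records:
        by_label[label].append(text)

    sample = []
    for label, texts in sorted(by_label.items()):
        if len(texts) < n_per_class:
            raise ValueError(f"only {len(texts)} examples for {label!r}")
        # Sample without replacement within each class.
        sample.extend((text, label) for text in rng.sample(texts, n_per_class))
    return sample
```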

Training Hyperparameters

batch_size: (16, 16)
num_epochs: (4, 4)
max_steps: -1
sampling_strategy: oversampling
body_learning_rate: (2e-05, 1e-05)
head_learning_rate: 0.01
loss: CosineSimilarityLoss
distance_metric: cosine_distance
margin: 0.25
end_to_end: False
use_amp: False
warmup_proportion: 0.1
l2_weight: 0.01
seed: 42
eval_max_steps: -1
load_best_model_at_end: False
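
For anyone trying to reproduce a similar run, the scalar settings above map onto SetFit's `TrainingArguments`. The sketch below is hedged: it collects the listed values in a plain dictionary and leaves `loss` and `distance_metric` unset, on the assumption that the library defaults correspond to the `CosineSimilarityLoss` and `cosine_distance` entries above. `HYPERPARAMS` and `build_training_arguments` are illustrative names.

```python
# Values copied from the hyperparameter list in this card.
HYPERPARAMS = {
    "batch_size": (16, 16),            # (embedding phase, classifier phase)
    "num_epochs": (4, 4),
    "max_steps": -1,
    "sampling_strategy": "oversampling",
    "body_learning_rate": (2e-05, 1e-05),
    "head_learning_rate": 0.01,
    "margin": 0.25,
    "end_to_end": False,
    "use_amp": False,
    "warmup_proportion": 0.1,
    "l2_weight": 0.01,
    "seed": 42,
    "eval_max_steps": -1,
    "load_best_model_at_end": False,
}


def build_training_arguments():
    """Create a setfit.TrainingArguments object from the values above."""
    # Imported lazily so the dictionary can be inspected without setfit.
    from setfit import TrainingArguments

    return TrainingArguments(**HYPERPARAMS)
```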

Training Results

Epoch	Step	Training Loss	Validation Loss
0.0133	1	0.288	-
0.6667	50	0.1935	-
1.0	75	-	0.0980
1.3333	100	0.0472	-
2.0	150	0.0118	0.0767
2.6667	200	0.0057	-
3.0	225	-	0.0719
3.3333	250	0.0047	-
4.0	300	0.0039	0.0718
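
The fractional epoch values in this table follow from the step count: epoch 1.0 falls at step 75, so each epoch spans 75 optimizer steps. The short sketch below reproduces the Epoch column from the Step column. `epoch_at_step` is an illustrative helper, and the steps-per-epoch figure is read off the table rather than taken from the training script.

```python
STEPS_PER_EPOCH = 75  # epoch 1.0 is logged at step 75 in the table above


def epoch_at_step(step: int, steps_per_epoch: int = STEPS_PER_EPOCH) -> float:
    """Fractional epoch reached after a given optimizer step."""
    return round(step / steps_per_epoch, 4)


for step in (1, 50, 75, 100, 150, 200, 225, 250, 300):
    print(step, epoch_at_step(step))
```

With a batch size of 16, 75 steps would correspond to roughly 1,200 sentence pairs per epoch, assuming every batch is full.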

Framework Versions

Python: 3.12.11
SetFit: 1.1.2
Sentence Transformers: 5.0.0
Transformers: 4.53.0
PyTorch: 2.7.1
Datasets: 3.6.0
Tokenizers: 0.21.2
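
To see how a local environment compares with the versions listed above, the standard library's `importlib.metadata` can report what is installed. This is a minimal sketch: the PyPI distribution names in `EXPECTED` (for example `torch` for PyTorch) are assumptions about how the packages were installed, and `compare_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from this card; keys are assumed PyPI distribution names.
EXPECTED = {
    "setfit": "1.1.2",
    "sentence-transformers": "5.0.0",
    "transformers": "4.53.0",
    "torch": "2.7.1",
    "datasets": "3.6.0",
    "tokenizers": "0.21.2",
}


def compare_versions(expected):
    """Map each package to (version in the card, installed version or None)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed locally
        report[package] = (wanted, installed)
    return report


for package, (wanted, installed) in compare_versions(EXPECTED).items():
    print(f"{package}: card={wanted} installed={installed}")
```

Installed PyTorch builds often carry a local suffix such as `+cpu`, so an exact string match is not always expected.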

Model Size

Parameters: 17.4M
Tensor type: F32
Format: Safetensors
