paraphrase-MiniLM-L3-v2_immig
This SetFit model was trained on 48 title-abstract samples (24 per class) to differentiate between published studies
related to immigration/migration research and those that are not.
Evaluation
Metrics
| Label | Accuracy | Precision | Recall | F1 |
|:------|:---------|:----------|:-------|:-------|
| all | 0.9812 | 0.9934 | 0.9868 | 0.9901 |
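As a quick consistency check on the table above, F1 is the harmonic mean of precision and recall, so it can be recomputed from the two reported values. This is a minimal sketch; the helper name `f1_score` is illustrative, and the inputs are the rounded figures from the table, so the result only matches to about four decimals.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values copied from the Metrics table.
precision, recall = 0.9934, 0.9868
f1 = f1_score(precision, recall)
print(round(f1, 4))  # 0.9901
```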
Uses
Direct Use for Inference
First install the SetFit library:
pip install setfit
Then you can load this model and run inference.
from setfit import SetFitModel
model = SetFitModel.from_pretrained("mmarbach/paraphrase-MiniLM-L3-v2_immig")
preds = model("TITLE: ... ABSTRACT: ....")
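To classify several studies at once, the snippet above could be wrapped as follows. This is a sketch under assumptions: the helper names `format_input` and `classify` are hypothetical, and the single-space separator between the `TITLE:` and `ABSTRACT:` fields is a guess, since the exact layout used during training is not documented here.

```python
from typing import Dict, List


def format_input(title: str, abstract: str) -> str:
    """Build one input string in the 'TITLE: ... ABSTRACT: ...' layout.

    Assumption: one space separates the two fields.
    """
    return f"TITLE: {title.strip()} ABSTRACT: {abstract.strip()}"


def classify(records: List[Dict[str, str]]) -> list:
    """Predict a label for each {'title': ..., 'abstract': ...} record."""
    # Imported lazily so format_input() works without setfit installed.
    from setfit import SetFitModel

    model = SetFitModel.from_pretrained("mmarbach/paraphrase-MiniLM-L3-v2_immig")
    texts = [format_input(r["title"], r["abstract"]) for r in records]
    # predict() accepts a list of strings and returns one prediction per text.
    return list(model.predict(texts))
```

Calling `classify([{"title": "...", "abstract": "..."}])` downloads the model on first use and returns one prediction per record.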
Training Details
Training Set Metrics
| Training set | Min | Median | Max |
|:-------------|:----|:---------|:----|
| Word count | 97 | 155.6458 | 262 |
| Label | Training Sample Count |
|:------------------|:----------------------|
| immigration_topic | 24 |
| other_topic | 24 |
Training Hyperparameters
- batch_size: (16, 16)
- num_epochs: (4, 4)
- max_steps: -1
- sampling_strategy: oversampling
- body_learning_rate: (2e-05, 1e-05)
- head_learning_rate: 0.01
- loss: CosineSimilarityLoss
- distance_metric: cosine_distance
- margin: 0.25
- end_to_end: False
- use_amp: False
- warmup_proportion: 0.1
- l2_weight: 0.01
- seed: 42
- eval_max_steps: -1
- load_best_model_at_end: False
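The settings listed above could be passed to `setfit.TrainingArguments` roughly as sketched below. This is an illustration, not the training script that produced the model: the names `HYPERPARAMETERS` and `build_training_arguments` are hypothetical, and the keyword arguments are assumed to match the SetFit version listed under Framework Versions. Tuple values hold the setting for the embedding fine-tuning phase and the classifier training phase, in that order.

```python
# Values copied from the Training Hyperparameters list.
# Tuples are (embedding phase, classifier phase).
HYPERPARAMETERS = {
    "batch_size": (16, 16),
    "num_epochs": (4, 4),
    "max_steps": -1,
    "sampling_strategy": "oversampling",
    "body_learning_rate": (2e-05, 1e-05),
    "head_learning_rate": 0.01,
    "margin": 0.25,
    "end_to_end": False,
    "use_amp": False,
    "warmup_proportion": 0.1,
    "l2_weight": 0.01,
    "seed": 42,
    "eval_max_steps": -1,
    "load_best_model_at_end": False,
}


def build_training_arguments():
    """Create a setfit.TrainingArguments object from the values above."""
    # Lazy imports so the dictionary can be inspected without the libraries.
    from sentence_transformers.losses import (
        BatchHardTripletLossDistanceFunction,
        CosineSimilarityLoss,
    )
    from setfit import TrainingArguments

    # The loss and distance metric are passed as objects rather than strings.
    return TrainingArguments(
        loss=CosineSimilarityLoss,
        distance_metric=BatchHardTripletLossDistanceFunction.cosine_distance,
        **HYPERPARAMETERS,
    )
```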
Training Results
| Epoch | Step | Training Loss | Validation Loss |
|:-------|:-----|:--------------|:----------------|
| 0.0133 | 1 | 0.288 | - |
| 0.6667 | 50 | 0.1935 | - |
| 1.0 | 75 | - | 0.0980 |
| 1.3333 | 100 | 0.0472 | - |
| 2.0 | 150 | 0.0118 | 0.0767 |
| 2.6667 | 200 | 0.0057 | - |
| 3.0 | 225 | - | 0.0719 |
| 3.3333 | 250 | 0.0047 | - |
| 4.0 | 300 | 0.0039 | 0.0718 |
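The Epoch column follows from the Step column: 300 steps over 4 epochs gives 75 steps per epoch, so the logged epoch is simply `step / 75`. The sketch below checks this against the rows in the table; the names `LOGGED` and `epoch_at` are illustrative. With a batch size of 16, 75 steps would correspond to about 1,200 sentence pairs per epoch, assuming every batch is full.

```python
# (step, logged epoch) pairs copied from the Training Results table.
LOGGED = [
    (1, 0.0133), (50, 0.6667), (75, 1.0), (100, 1.3333), (150, 2.0),
    (200, 2.6667), (225, 3.0), (250, 3.3333), (300, 4.0),
]

TOTAL_STEPS = 300
NUM_EPOCHS = 4
STEPS_PER_EPOCH = TOTAL_STEPS // NUM_EPOCHS  # 75
BATCH_SIZE = 16


def epoch_at(step: int) -> float:
    """Convert a global step count into a fractional epoch."""
    return step / STEPS_PER_EPOCH


# Every logged epoch matches step / 75 up to four-decimal rounding.
mismatches = [
    (step, logged) for step, logged in LOGGED
    if abs(epoch_at(step) - logged) > 1e-4
]
pairs_per_epoch = STEPS_PER_EPOCH * BATCH_SIZE  # assumes full batches
print(STEPS_PER_EPOCH, pairs_per_epoch, mismatches)
```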
Framework Versions
- Python: 3.12.11
- SetFit: 1.1.2
- Sentence Transformers: 5.0.0
- Transformers: 4.53.0
- PyTorch: 2.7.1
- Datasets: 3.6.0
- Tokenizers: 0.21.2