---
language:
  - en
license: mit
library_name: transformers
pipeline_tag: zero-shot-classification
tags:
  - zero-shot
  - multi-label
  - text-classification
  - pytorch
metrics:
  - precision
  - recall
  - f1
base_model: bert-base-uncased
datasets:
  - polodealvarado/zeroshot-classification
---

# Zero-Shot Text Classification — biencoder

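A bi-encoder for zero-shot classification: a single shared BERT encoder embeds both texts and candidate labels, each text-label pair is scored by dot-product similarity, and a sigmoid maps that score to a value between 0 and 1.

Because labels are encoded as text into the same embedding space as the inputs, the model can classify into arbitrary categories without retraining for new labels. Each label is scored independently, so several labels can apply to the same text.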
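
The scoring step can be sketched in a few lines. This is a minimal illustration, not the model's actual code: the three-dimensional vectors are made-up stand-ins for BERT embeddings, and `score_labels` is a hypothetical helper name. Plain Python is used so the sketch runs without dependencies.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw similarity score to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def score_labels(text_vec, label_vecs):
    """Score one text embedding against each label embedding.

    Hypothetical helper: dot product followed by a sigmoid, applied
    to each label independently (so scores need not sum to 1).
    """
    scores = {}
    for label, vec in label_vecs.items():
        dot = sum(t * v for t, v in zip(text_vec, vec))
        scores[label] = sigmoid(dot)
    return scores


# Toy embeddings (assumed values, not produced by BERT).
text_vec = [1.0, 2.0, 0.0]
label_vecs = {
    "Finance": [1.0, 1.0, 0.0],    # dot product = 3.0
    "Sports": [-1.0, -1.0, 0.0],   # dot product = -3.0
    "Biology": [0.0, 0.0, 1.0],    # dot product = 0.0
}

scores = score_labels(text_vec, label_vecs)
print(scores)
```

A dot product of zero maps to a score of exactly 0.5, large positive similarities approach 1, and large negative ones approach 0.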

## Training Details

| Parameter | Value |
|-----------|-------|
| Base model | `bert-base-uncased` |
| Model variant | `biencoder` |
| Training steps | 1000 |
| Batch size | 2 |
| Learning rate | 2e-05 |
| Trainable params | 109,482,240 |
| Training time | 345.0 s |

## Dataset

Trained on [polodealvarado/zeroshot-classification](https://huggingface.co/datasets/polodealvarado/zeroshot-classification).

## Evaluation Results

| Metric | Score |
|--------|-------|
| Precision | 0.9486 |
| Recall | 0.9660 |
| F1 Score | 0.9572 |
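
F1 is the harmonic mean of precision and recall, and the reported values are consistent with that definition. The check below is a small sketch; the card does not state whether the metrics are micro- or macro-averaged, and the function name `f1_from` is illustrative.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall taken from the table above.
f1 = f1_from(0.9486, 0.9660)
print(round(f1, 4))  # 0.9572
```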

## Usage

```python
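# BiEncoderModel comes from this project's own `models` package (not from
# the transformers library), so the project source must be importable.
from models.base import BiEncoderModel

# Load the trained checkpoint by its repository id.
model = BiEncoderModel.from_pretrained("polodealvarado/biencoder")

# `labels` holds one list of candidate labels per input text.
predictions = model.predict(
    texts=["The stock market crashed yesterday."],
    labels=[["Finance", "Sports", "Biology", "Economy"]],
)
print(predictions)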
# [{"text": "...", "scores": {"Finance": 0.98, "Economy": 0.85, ...}}]
```
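
Since every label receives its own score, a common follow-up is to keep each label whose score clears a threshold. The sketch below assumes the output shape shown in the comment above (a list of dicts with `text` and `scores` keys). The `select_labels` helper, the 0.5 threshold, and the sample scores for Sports and Biology are illustrative assumptions, not part of the model's API.

```python
def select_labels(predictions, threshold=0.5):
    """Keep labels scoring at or above `threshold`, highest score first.

    Hypothetical post-processing helper; expects items shaped like
    {"text": str, "scores": {label: float}}.
    """
    selected = []
    for pred in predictions:
        scores = pred["scores"]
        kept = [label for label, score in scores.items() if score >= threshold]
        kept.sort(key=lambda label: scores[label], reverse=True)
        selected.append({"text": pred["text"], "labels": kept})
    return selected


# Sample input in the assumed output format (scores are made up).
sample = [
    {
        "text": "The stock market crashed yesterday.",
        "scores": {"Finance": 0.98, "Sports": 0.02, "Biology": 0.01, "Economy": 0.85},
    }
]

selected = select_labels(sample)
print(selected)
```

The threshold trades precision against recall: raising it returns fewer, more confident labels, while lowering it returns more labels at the cost of more false positives.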