polodealvarado committed
Commit 1ff2a71 · verified · Parent(s): 25ccff4

Upload folder using huggingface_hub

Browse files
Files changed (6)
  1. README.md +65 -0
  2. config.json +6 -0
  3. model.safetensors +3 -0
  4. tokenizer.json +0 -0
  5. tokenizer_config.json +14 -0
  6. training_meta.json +13 -0
README.md ADDED
@@ -0,0 +1,65 @@
+ ---
+ language:
+ - en
+ license: mit
+ library_name: transformers
+ pipeline_tag: zero-shot-classification
+ tags:
+ - zero-shot
+ - multi-label
+ - text-classification
+ - pytorch
+ metrics:
+ - precision
+ - recall
+ - f1
+ base_model: bert-base-uncased
+ datasets:
+ - polodealvarado/zeroshot-classification
+ ---
+
+ # Zero-Shot Text Classification — spanclass
+
+ GLiNER-inspired span-attentive classification with top-K span selection.
+
+ This model encodes texts and candidate labels into a shared embedding space using BERT,
+ enabling classification into arbitrary categories without retraining for new labels.
+
+ ## Training Details
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Base model | `bert-base-uncased` |
+ | Model variant | `spanclass` |
+ | Training steps | 1000 |
+ | Batch size | 2 |
+ | Learning rate | 2e-05 |
+ | Trainable params | 111,254,017 |
+ | Training time | 374.1s |
+
+ ## Dataset
+
+ Trained on [polodealvarado/zeroshot-classification](https://huggingface.co/datasets/polodealvarado/zeroshot-classification).
+
+ ## Evaluation Results
+
+ | Metric | Score |
+ |--------|-------|
+ | Precision | 0.9277 |
+ | Recall | 0.9503 |
+ | F1 Score | 0.9388 |
+
+ ## Usage
+
+ ```python
+ from models.spanclass import SpanClassModel
+
+ model = SpanClassModel.from_pretrained("polodealvarado/spanclass")
+
+ predictions = model.predict(
+     texts=["The stock market crashed yesterday."],
+     labels=[["Finance", "Sports", "Biology", "Economy"]],
+ )
+ print(predictions)
+ # [{"text": "...", "scores": {"Finance": 0.98, "Economy": 0.85, ...}}]
+ ```
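The "span-attentive classification with top-K span selection" the README describes can be illustrated with a minimal NumPy sketch: embed text tokens and labels in one space, enumerate spans up to a maximum width, keep the K best-scoring spans, and max-pool their label scores. The function name, shapes, and mean-pooled span embeddings here are illustrative assumptions, not the repo's actual API.

```python
import numpy as np

def score_labels(token_emb, label_emb, max_span_width=5, top_k_spans=8):
    """Hypothetical sketch; token_emb: (seq_len, dim), label_emb: (num_labels, dim)."""
    seq_len, _ = token_emb.shape
    # Enumerate candidate spans [i, j) of width <= max_span_width,
    # embedding each span as the mean of its token embeddings.
    spans = []
    for i in range(seq_len):
        for j in range(i + 1, min(i + max_span_width, seq_len) + 1):
            spans.append(token_emb[i:j].mean(axis=0))
    spans = np.stack(spans)               # (num_spans, dim)
    scores = spans @ label_emb.T          # (num_spans, num_labels)
    # Keep the top-K spans ranked by their best label score.
    order = np.argsort(scores.max(axis=1))[::-1][:top_k_spans]
    # Multi-label output: max over selected spans, squashed with a sigmoid.
    return 1.0 / (1.0 + np.exp(-scores[order].max(axis=0)))

rng = np.random.default_rng(0)
probs = score_labels(rng.standard_normal((12, 16)), rng.standard_normal((4, 16)))
print(probs.shape)  # one independent probability per candidate label: (4,)
```

Because each label gets an independent sigmoid score rather than a softmax share, the scheme supports the multi-label tag the model card advertises.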
config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "max_num_labels": 13,
+   "max_span_width": 5,
+   "model_name": "bert-base-uncased",
+   "top_k_spans": 8
+ }
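The two span parameters in config.json bound the model's search: assuming spans are enumerated over contiguous token windows (as in GLiNER-style models), a sequence of n tokens yields n·w − w(w−1)/2 candidate spans of width at most w, of which `top_k_spans` are kept. The helper below is a hypothetical illustration of that count, not repo code.

```python
def num_candidate_spans(n, max_span_width=5):
    # Widths 1..w contribute n, n-1, ..., n-w+1 spans each (for n >= w),
    # summing to n*w - w*(w-1)/2.
    w = min(max_span_width, n)
    return n * w - w * (w - 1) // 2

print(num_candidate_spans(12))  # 12*5 - 10 = 50 candidates; top_k_spans=8 keeps 8
```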
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6f86de12cb5dbe2875e166b813232be74cf257168009d6c59aaa0eecf13f8650
+ size 445041916
tokenizer.json ADDED
The diff for this file is too large to render.
tokenizer_config.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "backend": "tokenizers",
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "is_local": false,
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
training_meta.json ADDED
@@ -0,0 +1,13 @@
+ {
+   "model_type": "spanclass",
+   "encoder_name": "bert-base-uncased",
+   "param_count": 111254017,
+   "num_steps": 1000,
+   "best_step": 875,
+   "batch_size": 2,
+   "learning_rate": 2e-05,
+   "train_time_s": 374.11,
+   "precision": 0.9277,
+   "recall": 0.9503,
+   "f1": 0.9388
+ }