Commit f2ad5b4 · semioticrobotic, BioMike · 0 parent(s)

Duplicate from knowledgator/gliclass-base-v3.0

Co-authored-by: Mykhailo Shtopko <BioMike@users.noreply.huggingface.co>

.gitattributes ADDED
@@ -0,0 +1,35 @@
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
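The patterns above decide which files are stored through Git LFS instead of regular Git. A minimal sketch of that lookup follows; it uses Python's `fnmatch` as an approximation of gitattributes glob matching (the two are not identical, for example for `saved_model/**/*`), and the helper name `is_lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatchcase

# A subset of the patterns listed in the .gitattributes above.
LFS_PATTERNS = ["*.bin", "*.model", "*.onnx", "*.safetensors", "*tfevents*"]

def is_lfs_tracked(filename: str, patterns=LFS_PATTERNS) -> bool:
    """Return True if the bare file name matches any LFS glob pattern."""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)

for name in ["model.safetensors", "spm.model", "config.json"]:
    print(name, "->", "LFS" if is_lfs_tracked(name) else "regular Git")
```

This agrees with the rest of the commit, where `model.safetensors` and `spm.model` appear as LFS pointer files while `config.json` is shown in full.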
README.md ADDED
@@ -0,0 +1,168 @@
1
+ ---
2
+ license: apache-2.0
3
+ datasets:
4
+ - BioMike/formal-logic-reasoning-gliclass-2k
5
+ - knowledgator/gliclass-v3-logic-dataset
6
+ - tau/commonsense_qa
7
+ metrics:
8
+ - f1
9
+ tags:
10
+ - text classification
11
+ - nli
12
+ - sentiment analysis
13
+ pipeline_tag: text-classification
14
+ ---
15
+
16
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6405f62ba577649430be5124/I9RAQol7giilBHbbf2T7M.png)
17
+
18
+ # GLiClass: Generalist and Lightweight Model for Sequence Classification
19
+
20
+ This is an efficient zero-shot classifier inspired by the [GLiNER](https://github.com/urchade/GLiNER/tree/main) work. It achieves performance comparable to a cross-encoder while being more compute-efficient, because all labels are classified in a single forward pass.
21
+
22
+ It can be used for `topic classification`, `sentiment analysis`, and as a reranker in `RAG` pipelines.
23
+
24
+ The model was trained on logical tasks to induce reasoning. LoRA adapters were used to fine-tune the model without destroying its previously acquired knowledge.
25
+
26
+ LoRA parameters:
27
+ | | [gliclass‑modern‑base‑v3.0](https://huggingface.co/knowledgator/gliclass-modern-base-v3.0) | [gliclass‑modern‑large‑v3.0](https://huggingface.co/knowledgator/gliclass-modern-large-v3.0) | [gliclass‑base‑v3.0](https://huggingface.co/knowledgator/gliclass-base-v3.0) | [gliclass‑large‑v3.0](https://huggingface.co/knowledgator/gliclass-large-v3.0) |
28
+ |----------------------|---------------------------------|----------------------------------|--------------------------------|---------------------------------|
29
+ | LoRA r | 512 | 768 | 384 | 384 |
30
+ | LoRA α | 1024 | 1536 | 768 | 768 |
31
+ | focal loss α | 0.7 | 0.7 | 0.7 | 0.7 |
32
+ | Target modules | "Wqkv", "Wo", "Wi", "linear_1", "linear_2" | "Wqkv", "Wo", "Wi", "linear_1", "linear_2" | "query_proj", "key_proj", "value_proj", "dense", "linear_1", "linear_2", "mlp.0", "mlp.2", "mlp.4" | "query_proj", "key_proj", "value_proj", "dense", "linear_1", "linear_2", "mlp.0", "mlp.2", "mlp.4" |
33
+
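In standard LoRA the low-rank update is multiplied by α / r, so the table implies the same effective scaling for every model. The sketch below works this out from the table values; it assumes the standard (non rank-stabilized) scaling rule, which the README does not state explicitly.

```python
# (r, alpha) pairs copied from the LoRA parameters table.
lora_settings = {
    "gliclass-modern-base-v3.0": (512, 1024),
    "gliclass-modern-large-v3.0": (768, 1536),
    "gliclass-base-v3.0": (384, 768),
    "gliclass-large-v3.0": (384, 768),
}

def lora_scaling(r: int, alpha: int) -> float:
    """Standard LoRA scales the low-rank update B @ A by alpha / r."""
    return alpha / r

scalings = {name: lora_scaling(r, alpha) for name, (r, alpha) in lora_settings.items()}
for name, scaling in scalings.items():
    print(f"{name}: scaling = {scaling}")
```

Under that assumption all four configurations use a scaling factor of 2, so they differ only in adapter rank and target modules.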
34
+ GLiClass-V3 Models:
35
+ | Model name | Size | Params | Average Benchmark | Average Inference Speed (batch size = 1, A6000, examples/s) |
36
+ |----------|------|--------|-------------------|---------------------------------------------------------|
37
+ | [gliclass‑edge‑v3.0](https://huggingface.co/knowledgator/gliclass-edge-v3.0) | 131 MB | 32.7M | 0.4873 | 97.29 |
38
+ | [gliclass‑modern‑base‑v3.0](https://huggingface.co/knowledgator/gliclass-modern-base-v3.0) | 606 MB | 151M | 0.5571 | 54.46 |
39
+ | [gliclass‑modern‑large‑v3.0](https://huggingface.co/knowledgator/gliclass-modern-large-v3.0) | 1.6 GB | 399M | 0.6082 | 43.80 |
40
+ | [gliclass‑base‑v3.0](https://huggingface.co/knowledgator/gliclass-base-v3.0) | 746 MB | 187M | 0.6556 | 51.61 |
41
+ | [gliclass‑large‑v3.0](https://huggingface.co/knowledgator/gliclass-large-v3.0) | 1.75 GB | 439M | 0.7001 | 25.22 |
42
+
43
+
44
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6405f62ba577649430be5124/MvfWyOdG824KWWB4Hy-dG.png)
45
+
46
+ ### How to use:
47
+ First, install the GLiClass library:
48
+ ```bash
49
+ pip install gliclass
50
+ pip install -U "transformers>=4.48.0"
51
+ ```
52
+
53
+ Then initialize a model and a pipeline:
54
+ ```python
55
+ from gliclass import GLiClassModel, ZeroShotClassificationPipeline
56
+ from transformers import AutoTokenizer
57
+
58
+ model = GLiClassModel.from_pretrained("knowledgator/gliclass-base-v3.0")
59
+ tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-base-v3.0")
60
+ pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')
61
+
62
+ text = "One day I will see the world!"
63
+ labels = ["travel", "dreams", "sport", "science", "politics"]
64
+ results = pipeline(text, labels, threshold=0.5)[0]  # [0] selects the results for the single input text
65
+ for result in results:
66
+ print(result["label"], "=>", result["score"])
67
+ ```
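Each entry in `results` is a dictionary with a `label` and a `score`, as the loop above shows. A small post-processing sketch that keeps only the top-scoring labels is given below; the helper `top_labels` is hypothetical and the scores are made-up placeholders, not real model output.

```python
from typing import Dict, List

def top_labels(results: List[Dict], k: int = 1) -> List[str]:
    """Return the k highest-scoring labels from one text's results."""
    ranked = sorted(results, key=lambda item: item["score"], reverse=True)
    return [item["label"] for item in ranked[:k]]

# Hypothetical scores, for illustration only (not real model output).
example_results = [
    {"label": "dreams", "score": 0.71},
    {"label": "travel", "score": 0.93},
    {"label": "science", "score": 0.55},
]
print(top_labels(example_results, k=2))
```

In multi-label mode several labels can pass the threshold, so ranking by score is one simple way to reduce the output to a single prediction when that is needed.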
68
+
69
+ For NLI-style tasks, we recommend passing the premise as the text and the hypothesis as the label. You can pass several hypotheses at once, but the model works best with a single hypothesis per input.
70
+ ```python
71
+ # Reuses the multi-label `pipeline` initialized in the previous example
72
+ text = "The cat slept on the windowsill all afternoon"
73
+ labels = ["The cat was awake and playing outside."]
74
+ results = pipeline(text, labels, threshold=0.0)[0]
75
+ print(results)
76
+ ```
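Since the model reportedly works best with one hypothesis per input, several hypotheses can be scored in separate calls. The sketch below assumes a `classify` callable that behaves like the `pipeline` object above (same arguments, same return shape); `score_hypotheses` and `fake_classify` are hypothetical names, and the stub exists only so the example runs without downloading a model.

```python
from typing import Callable, Dict, List

def score_hypotheses(classify: Callable, premise: str, hypotheses: List[str]) -> Dict[str, float]:
    """Score each hypothesis in its own call, one label at a time."""
    scores = {}
    for hypothesis in hypotheses:
        # threshold=0.0 keeps the score even when it is low.
        results = classify(premise, [hypothesis], threshold=0.0)[0]
        scores[hypothesis] = results[0]["score"] if results else 0.0
    return scores

def fake_classify(text, labels, threshold=0.5):
    """Stand-in for the real pipeline; returns a constant placeholder score."""
    return [[{"label": label, "score": 0.5} for label in labels]]

print(score_hypotheses(fake_classify, "The cat slept all afternoon", ["The cat was awake.", "The cat rested."]))
```

With the real pipeline, `fake_classify` would be replaced by the `pipeline` object; the trade-off is one forward pass per hypothesis instead of one per text.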
77
+
78
+ ### Benchmarks:
79
+ The tables below report F1 scores on several text classification datasets. None of the tested models were fine-tuned on these datasets; all were evaluated in a zero-shot setting.
80
+
81
+ GLiClass-V3:
82
+ | Dataset | [gliclass‑large‑v3.0](https://huggingface.co/knowledgator/gliclass-large-v3.0) | [gliclass‑base‑v3.0](https://huggingface.co/knowledgator/gliclass-base-v3.0) | [gliclass‑modern‑large‑v3.0](https://huggingface.co/knowledgator/gliclass-modern-large-v3.0) | [gliclass‑modern‑base‑v3.0](https://huggingface.co/knowledgator/gliclass-modern-base-v3.0) | [gliclass‑edge‑v3.0](https://huggingface.co/knowledgator/gliclass-edge-v3.0) |
83
+ |----------------------------|---------|---------|---------|---------|---------|
84
+ | CR | 0.9398 | 0.9127 | 0.8952 | 0.8902 | 0.8215 |
85
+ | sst2 | 0.9192 | 0.8959 | 0.9330 | 0.8959 | 0.8199 |
86
+ | sst5 | 0.4606 | 0.3376 | 0.4619 | 0.2756 | 0.2823 |
87
+ | 20_news_<br>groups | 0.5958 | 0.4759 | 0.3905 | 0.3433 | 0.2217 |
88
+ | spam | 0.7584 | 0.6760 | 0.5813 | 0.6398 | 0.5623 |
89
+ | financial_<br>phrasebank | 0.9000 | 0.8971 | 0.5929 | 0.4200 | 0.5004 |
90
+ | imdb | 0.9366 | 0.9251 | 0.9402 | 0.9158 | 0.8485 |
91
+ | ag_news | 0.7181 | 0.7279 | 0.7269 | 0.6663 | 0.6645 |
92
+ | emotion | 0.4506 | 0.4447 | 0.4517 | 0.4254 | 0.3851 |
93
+ | cap_sotu | 0.4589 | 0.4614 | 0.4072 | 0.3625 | 0.2583 |
94
+ | rotten_<br>tomatoes | 0.8411 | 0.7943 | 0.7664 | 0.7070 | 0.7024 |
95
+ | massive | 0.5649 | 0.5040 | 0.3905 | 0.3442 | 0.2414 |
96
+ | banking | 0.5574 | 0.4698 | 0.3683 | 0.3561 | 0.0272 |
97
+ | snips | 0.9692 | 0.9474 | 0.7707 | 0.5663 | 0.5257 |
98
+ | **AVERAGE** | **0.7193** | **0.6764** | **0.6197** | **0.5577** | **0.4900** |
99
+
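The AVERAGE row appears to be the unweighted mean of the per-dataset F1 scores. The sketch below recomputes it for the gliclass-base-v3.0 column only, using values copied from the table; the other columns are not checked here.

```python
# Per-dataset F1 for gliclass-base-v3.0, copied from the GLiClass-V3 table.
base_v3_f1 = {
    "CR": 0.9127, "sst2": 0.8959, "sst5": 0.3376, "20_news_groups": 0.4759,
    "spam": 0.6760, "financial_phrasebank": 0.8971, "imdb": 0.9251,
    "ag_news": 0.7279, "emotion": 0.4447, "cap_sotu": 0.4614,
    "rotten_tomatoes": 0.7943, "massive": 0.5040, "banking": 0.4698,
    "snips": 0.9474,
}

# Unweighted (macro) average: every dataset counts equally.
average_f1 = sum(base_v3_f1.values()) / len(base_v3_f1)
print(f"{average_f1:.4f}")
```

Because every dataset carries equal weight, low-scoring tasks such as sst5 pull the average down as much as high-scoring ones such as snips pull it up.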
100
+ Previous GLiClass models:
101
+ | Dataset | [gliclass‑large‑v1.0‑lw](https://huggingface.co/knowledgator/gliclass-large-v1.0-lw) | [gliclass‑base‑v1.0‑lw](https://huggingface.co/knowledgator/gliclass-base-v1.0-lw) | [gliclass‑modern‑large‑v2.0](https://huggingface.co/knowledgator/gliclass-modern-large-v2.0) | [gliclass‑modern‑base‑v2.0](https://huggingface.co/knowledgator/gliclass-modern-base-v2.0) |
102
+ |----------------------------|---------------------------------|--------------------------------|----------------------------------|---------------------------------|
103
+ | CR | 0.9226 | 0.9097 | 0.9154 | 0.8977 |
104
+ | sst2 | 0.9247 | 0.8987 | 0.9308 | 0.8524 |
105
+ | sst5 | 0.2891 | 0.3779 | 0.2152 | 0.2346 |
106
+ | 20_news_<br>groups | 0.4083 | 0.3953 | 0.3813 | 0.3857 |
107
+ | spam | 0.3642 | 0.5126 | 0.6603 | 0.4608 |
108
+ | financial_<br>phrasebank | 0.9044 | 0.8880 | 0.3152 | 0.3465 |
109
+ | imdb | 0.9429 | 0.9351 | 0.9449 | 0.9188 |
110
+ | ag_news | 0.7559 | 0.6985 | 0.6999 | 0.6836 |
111
+ | emotion | 0.3951 | 0.3516 | 0.4341 | 0.3926 |
112
+ | cap_sotu | 0.4749 | 0.4643 | 0.4095 | 0.3588 |
113
+ | rotten_<br>tomatoes | 0.8807 | 0.8429 | 0.7386 | 0.6066 |
114
+ | massive | 0.5606 | 0.4635 | 0.2394 | 0.3458 |
115
+ | banking | 0.3317 | 0.4396 | 0.1355 | 0.2907 |
116
+ | snips | 0.9707 | 0.9572 | 0.8468 | 0.7378 |
117
+ | **AVERAGE** | **0.6518** | **0.6525** | **0.5619** | **0.5366** |
118
+
119
+
120
+ Cross-Encoders:
121
+ | Dataset | [deberta‑v3‑large‑zeroshot‑v2.0](https://huggingface.co/MoritzLaurer/deberta-v3-large-zeroshot-v2.0) | [deberta‑v3‑base‑zeroshot‑v2.0](https://huggingface.co/MoritzLaurer/deberta-v3-base-zeroshot-v2.0) | [roberta‑large‑zeroshot‑v2.0‑c](https://huggingface.co/MoritzLaurer/roberta-large-zeroshot-v2.0-c) | [comprehend_it‑base](https://huggingface.co/knowledgator/comprehend_it-base) |
122
+ |------------------------------------|--------|--------|--------|--------|
123
+ | CR | 0.9134 | 0.9051 | 0.9141 | 0.8936 |
124
+ | sst2 | 0.9272 | 0.9176 | 0.8573 | 0.9006 |
125
+ | sst5 | 0.3861 | 0.3848 | 0.4159 | 0.4140 |
126
+ | enron_<br>spam | 0.5970 | 0.4640 | 0.5040 | 0.3637 |
127
+ | financial_<br>phrasebank | 0.5820 | 0.6690 | 0.4550 | 0.4695 |
128
+ | imdb | 0.9180 | 0.8990 | 0.9040 | 0.4644 |
129
+ | ag_news | 0.7710 | 0.7420 | 0.7450 | 0.6016 |
130
+ | emotion | 0.4840 | 0.4950 | 0.4860 | 0.4165 |
131
+ | cap_sotu | 0.5020 | 0.4770 | 0.5230 | 0.3823 |
132
+ | rotten_<br>tomatoes | 0.8680 | 0.8600 | 0.8410 | 0.4728 |
133
+ | massive | 0.5180 | 0.5200 | 0.5200 | 0.3314 |
134
+ | banking77 | 0.5670 | 0.4460 | 0.2900 | 0.4972 |
135
+ | snips | 0.8340 | 0.7477 | 0.5430 | 0.7227 |
136
+ | **AVERAGE** | **0.6821** | **0.6559** | **0.6152** | **0.5331** |
137
+
138
+
139
+ Inference Speed:
140
+
141
+ Each model was tested on texts of 64, 256, and 512 tokens with 1, 2, 4, 8, 16, 32, 64, and 128 labels on an A6000 GPU. Throughput was then averaged across the three text lengths.
142
+
143
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6405f62ba577649430be5124/YipDUMZuIqL4f8mWl7IHt.png)
144
+
145
+ | Model name / samples per second for *m* labels | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 | **Average** |
146
+ |---------------------|---|---|---|---|----|----|----|-----|---------|
147
+ | [gliclass‑edge‑v3.0](https://huggingface.co/knowledgator/gliclass-edge-v3.0) | 103.81 | 101.01 | 103.50 | 103.50 | 98.36 | 96.77 | 88.76 | 82.64 | **97.29** |
148
+ | [gliclass‑modern‑base‑v3.0](https://huggingface.co/knowledgator/gliclass-modern-base-v3.0) | 56.00 | 55.46 | 54.95 | 55.66 | 54.73 | 54.95 | 53.48 | 50.34 | **54.46** |
149
+ | [gliclass‑modern‑large‑v3.0](https://huggingface.co/knowledgator/gliclass-modern-large-v3.0) | 46.30 | 46.82 | 46.66 | 46.30 | 43.93 | 44.73 | 42.77 | 32.89 | **43.80** |
150
+ | [gliclass‑base‑v3.0](https://huggingface.co/knowledgator/gliclass-base-v3.0) | 49.42 | 50.25 | 40.05 | 57.69 | 57.14 | 56.39 | 55.97 | 45.94 | **51.61** |
151
+ | [gliclass‑large‑v3.0](https://huggingface.co/knowledgator/gliclass-large-v3.0) | 19.05 | 26.86 | 23.64 | 29.27 | 29.04 | 28.79 | 27.55 | 17.60 | **25.22** |
152
+ | [deberta‑v3‑base‑zeroshot‑v2.0](https://huggingface.co/MoritzLaurer/deberta-v3-base-zeroshot-v2.0) | 24.55 | 30.40 | 15.38 | 7.62 | 3.77 | 1.87 | 0.94 | 0.47 | **10.63** |
153
+ | [deberta‑v3‑large‑zeroshot‑v2.0](https://huggingface.co/MoritzLaurer/deberta-v3-large-zeroshot-v2.0) | 16.82 | 15.82 | 7.93 | 3.98 | 1.99 | 0.99 | 0.49 | 0.25 | **6.03** |
154
+ | [roberta‑large‑zeroshot‑v2.0‑c](https://huggingface.co/MoritzLaurer/roberta-large-zeroshot-v2.0-c) | 50.42 | 39.27 | 19.95 | 9.95 | 5.01 | 2.48 | 1.25 | 0.64 | **16.12** |
155
+ | [comprehend_it‑base](https://huggingface.co/knowledgator/comprehend_it-base) | 21.79 | 27.32 | 13.60 | 7.58 | 3.80 | 1.90 | 0.97 | 0.49 | **9.72** |
156
+
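The table is consistent with the single-forward-pass design: GLiClass throughput stays roughly flat as labels are added, while cross-encoder throughput roughly halves each time the label count doubles. A short sketch that quantifies this for two rows, using values copied from the table (the helper name `retained_throughput` is hypothetical):

```python
label_counts = [1, 2, 4, 8, 16, 32, 64, 128]

# Samples per second, copied from the inference speed table.
throughput = {
    "gliclass-base-v3.0": [49.42, 50.25, 40.05, 57.69, 57.14, 56.39, 55.97, 45.94],
    "deberta-v3-base-zeroshot-v2.0": [24.55, 30.40, 15.38, 7.62, 3.77, 1.87, 0.94, 0.47],
}

def retained_throughput(samples_per_s):
    """Fraction of the 1-label throughput still available at 128 labels."""
    return samples_per_s[-1] / samples_per_s[0]

for name, speeds in throughput.items():
    mean_speed = sum(speeds) / len(speeds)
    print(f"{name}: mean {mean_speed:.2f} samples/s, "
          f"{retained_throughput(speeds):.1%} of 1-label speed at 128 labels")
```

The mean over the eight label counts is the figure reported in the **Average** column.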
157
+ ## Citation
158
+ ```bibtex
159
+ @misc{stepanov2025gliclassgeneralistlightweightmodel,
160
+ title={GLiClass: Generalist Lightweight Model for Sequence Classification Tasks},
161
+ author={Ihor Stepanov and Mykhailo Shtopko and Dmytro Vodianytskyi and Oleksandr Lukashov and Alexander Yavorskyi and Mykyta Yaroshenko},
162
+ year={2025},
163
+ eprint={2508.07662},
164
+ archivePrefix={arXiv},
165
+ primaryClass={cs.LG},
166
+ url={https://arxiv.org/abs/2508.07662},
167
+ }
168
+ ```
added_tokens.json ADDED
@@ -0,0 +1,5 @@
1
+ {
2
+ "<<LABEL>>": 128001,
3
+ "<<SEP>>": 128002,
4
+ "[MASK]": 128000
5
+ }
config.json ADDED
@@ -0,0 +1,127 @@
1
+ {
2
+ "_name_or_path": "knowledgator/gliclass-base-v3.0",
3
+ "architecture_type": "uni-encoder",
4
+ "architectures": [
5
+ "GLiClassModel"
6
+ ],
7
+ "class_token_index": 128001,
8
+ "contrastive_loss_coef": 0.0,
9
+ "cross_encoder_config": null,
10
+ "embed_class_token": true,
11
+ "encoder_config": {
12
+ "_attn_implementation_autoset": false,
13
+ "_name_or_path": "microsoft/deberta-v3-base",
14
+ "add_cross_attention": false,
15
+ "architectures": null,
16
+ "attention_probs_dropout_prob": 0.1,
17
+ "bad_words_ids": null,
18
+ "begin_suppress_tokens": null,
19
+ "bos_token_id": null,
20
+ "chunk_size_feed_forward": 0,
21
+ "cross_attention_hidden_size": null,
22
+ "decoder_start_token_id": null,
23
+ "diversity_penalty": 0.0,
24
+ "do_sample": false,
25
+ "early_stopping": false,
26
+ "encoder_no_repeat_ngram_size": 0,
27
+ "eos_token_id": null,
28
+ "exponential_decay_length_penalty": null,
29
+ "finetuning_task": null,
30
+ "forced_bos_token_id": null,
31
+ "forced_eos_token_id": null,
32
+ "hidden_act": "gelu",
33
+ "hidden_dropout_prob": 0.1,
34
+ "hidden_size": 768,
35
+ "id2label": {
36
+ "0": "LABEL_0",
37
+ "1": "LABEL_1"
38
+ },
39
+ "initializer_range": 0.02,
40
+ "intermediate_size": 3072,
41
+ "is_decoder": false,
42
+ "is_encoder_decoder": false,
43
+ "label2id": {
44
+ "LABEL_0": 0,
45
+ "LABEL_1": 1
46
+ },
47
+ "layer_norm_eps": 1e-07,
48
+ "legacy": true,
49
+ "length_penalty": 1.0,
50
+ "max_length": 20,
51
+ "max_position_embeddings": 512,
52
+ "max_relative_positions": -1,
53
+ "min_length": 0,
54
+ "model_type": "deberta-v2",
55
+ "no_repeat_ngram_size": 0,
56
+ "norm_rel_ebd": "layer_norm",
57
+ "num_attention_heads": 12,
58
+ "num_beam_groups": 1,
59
+ "num_beams": 1,
60
+ "num_hidden_layers": 12,
61
+ "num_return_sequences": 1,
62
+ "output_attentions": false,
63
+ "output_hidden_states": false,
64
+ "output_scores": false,
65
+ "pad_token_id": 0,
66
+ "pooler_dropout": 0,
67
+ "pooler_hidden_act": "gelu",
68
+ "pooler_hidden_size": 768,
69
+ "pos_att_type": [
70
+ "p2c",
71
+ "c2p"
72
+ ],
73
+ "position_biased_input": false,
74
+ "position_buckets": 256,
75
+ "prefix": null,
76
+ "problem_type": null,
77
+ "pruned_heads": {},
78
+ "relative_attention": true,
79
+ "remove_invalid_values": false,
80
+ "repetition_penalty": 1.0,
81
+ "return_dict": true,
82
+ "return_dict_in_generate": false,
83
+ "sep_token_id": null,
84
+ "share_att_key": true,
85
+ "suppress_tokens": null,
86
+ "task_specific_params": null,
87
+ "temperature": 1.0,
88
+ "tf_legacy_loss": false,
89
+ "tie_encoder_decoder": false,
90
+ "tie_word_embeddings": true,
91
+ "tokenizer_class": null,
92
+ "top_k": 50,
93
+ "top_p": 1.0,
94
+ "torch_dtype": null,
95
+ "torchscript": false,
96
+ "type_vocab_size": 0,
97
+ "typical_p": 1.0,
98
+ "use_bfloat16": false,
99
+ "vocab_size": 128003
100
+ },
101
+ "encoder_layer_id": -1,
102
+ "encoder_model_name": "microsoft/deberta-v3-base",
103
+ "extract_text_features": false,
104
+ "focal_loss_alpha": -1,
105
+ "focal_loss_gamma": -1,
106
+ "focal_loss_reduction": null,
107
+ "hidden_size": 768,
108
+ "ignore_index": -100,
109
+ "initializer_range": 0.03,
110
+ "label_model_config": null,
111
+ "label_model_name": null,
112
+ "layer_wise": false,
113
+ "logit_scale_init_value": 2.6592,
114
+ "max_num_classes": 25,
115
+ "model_type": "GLiClass",
116
+ "normalize_features": false,
117
+ "pooling_strategy": "first",
118
+ "projector_hidden_act": "gelu",
119
+ "prompt_first": true,
120
+ "scorer_type": "mlp",
121
+ "squeeze_layers": false,
122
+ "text_token_index": 128002,
123
+ "torch_dtype": "float32",
124
+ "transformers_version": "4.48.2",
125
+ "use_lstm": false,
126
+ "vocab_size": 128003
127
+ }
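The config ties the added tokens to their roles: `class_token_index` 128001 is `<<LABEL>>`, `text_token_index` 128002 is `<<SEP>>`, and `prompt_first` is true. The sketch below shows one plausible way a uni-encoder input could be assembled from these settings. The exact layout is an assumption made for illustration and is not confirmed by this commit; `build_prompt` is a hypothetical helper, not the library's API.

```python
LABEL_TOKEN = "<<LABEL>>"  # id 128001, class_token_index in config.json
SEP_TOKEN = "<<SEP>>"      # id 128002, text_token_index in config.json

def build_prompt(text: str, labels: list, prompt_first: bool = True) -> str:
    """Assumed layout: each label prefixed by LABEL_TOKEN, then SEP_TOKEN, then the text."""
    label_part = "".join(f"{LABEL_TOKEN}{label}" for label in labels) + SEP_TOKEN
    # prompt_first=True places the label prompt before the text (assumption).
    return label_part + text if prompt_first else text + label_part

print(build_prompt("One day I will see the world!", ["travel", "dreams"]))
```

Under this assumed layout, labels and text share one sequence, which is what would let a single encoder pass score every label at once.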
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4f1043b82812a8f5ec7ff9f97c6744758f47836f9872ba0dbeb3a8e2b0b92a26
3
+ size 746211800
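The three lines above are a Git LFS pointer, not the weights themselves. The sketch below parses the pointer and relates its `size` field to the figures in the README; the parameter estimate assumes 4 bytes per weight (the config lists `torch_dtype` as `float32`) and ignores the file's header overhead.

```python
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:4f1043b82812a8f5ec7ff9f97c6744758f47836f9872ba0dbeb3a8e2b0b92a26
size 746211800
"""

def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    return dict(line.split(" ", 1) for line in text.strip().splitlines())

pointer = parse_lfs_pointer(pointer_text)
size_bytes = int(pointer["size"])

# Rough estimate: float32 weights take 4 bytes each; header overhead is ignored.
approx_params = size_bytes / 4
print(f"{size_bytes / 1e6:.0f} MB, ~{approx_params / 1e6:.0f}M parameters")
```

Both numbers line up with the 746 MB size and 187M parameter count listed for gliclass-base-v3.0 in the model table.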
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "[CLS]",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "cls_token": {
10
+ "content": "[CLS]",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "eos_token": {
17
+ "content": "[SEP]",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "mask_token": {
24
+ "content": "[MASK]",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "pad_token": {
31
+ "content": "[PAD]",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ },
37
+ "sep_token": {
38
+ "content": "[SEP]",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false
43
+ },
44
+ "unk_token": {
45
+ "content": "[UNK]",
46
+ "lstrip": false,
47
+ "normalized": true,
48
+ "rstrip": false,
49
+ "single_word": false
50
+ }
51
+ }
spm.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c679fbf93643d19aab7ee10c0b99e460bdbc02fedf34b92b05af343b4af586fd
3
+ size 2464616
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,76 @@
1
+ {
2
+ "add_prefix_space": true,
3
+ "added_tokens_decoder": {
4
+ "0": {
5
+ "content": "[PAD]",
6
+ "lstrip": false,
7
+ "normalized": false,
8
+ "rstrip": false,
9
+ "single_word": false,
10
+ "special": true
11
+ },
12
+ "1": {
13
+ "content": "[CLS]",
14
+ "lstrip": false,
15
+ "normalized": false,
16
+ "rstrip": false,
17
+ "single_word": false,
18
+ "special": true
19
+ },
20
+ "2": {
21
+ "content": "[SEP]",
22
+ "lstrip": false,
23
+ "normalized": false,
24
+ "rstrip": false,
25
+ "single_word": false,
26
+ "special": true
27
+ },
28
+ "3": {
29
+ "content": "[UNK]",
30
+ "lstrip": false,
31
+ "normalized": true,
32
+ "rstrip": false,
33
+ "single_word": false,
34
+ "special": true
35
+ },
36
+ "128000": {
37
+ "content": "[MASK]",
38
+ "lstrip": false,
39
+ "normalized": false,
40
+ "rstrip": false,
41
+ "single_word": false,
42
+ "special": true
43
+ },
44
+ "128001": {
45
+ "content": "<<LABEL>>",
46
+ "lstrip": false,
47
+ "normalized": true,
48
+ "rstrip": false,
49
+ "single_word": false,
50
+ "special": false
51
+ },
52
+ "128002": {
53
+ "content": "<<SEP>>",
54
+ "lstrip": false,
55
+ "normalized": true,
56
+ "rstrip": false,
57
+ "single_word": false,
58
+ "special": false
59
+ }
60
+ },
61
+ "bos_token": "[CLS]",
62
+ "clean_up_tokenization_spaces": true,
63
+ "cls_token": "[CLS]",
64
+ "do_lower_case": false,
65
+ "eos_token": "[SEP]",
66
+ "extra_special_tokens": {},
67
+ "mask_token": "[MASK]",
68
+ "model_max_length": 1000000000000000019884624838656,
69
+ "pad_token": "[PAD]",
70
+ "sep_token": "[SEP]",
71
+ "sp_model_kwargs": {},
72
+ "split_by_punct": false,
73
+ "tokenizer_class": "DebertaV2Tokenizer",
74
+ "unk_token": "[UNK]",
75
+ "vocab_type": "spm"
76
+ }