AlexBayer committed on
Commit 793cd10 · verified · 1 Parent(s): dbe1568

Add SetFit model

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
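The pooling config above enables only `pooling_mode_cls_token`, so the sentence embedding is taken from the first (`[CLS]`) token's hidden state rather than from an average over tokens. The following is a minimal sketch in plain Python, using made-up 3-dimensional vectors in place of the real 768-dimensional ones. The helper names `active_pooling_modes`, `cls_pool`, and `mean_pool` are illustrative and not part of Sentence Transformers.

```python
import json

# The pooling config from this commit, verbatim.
POOLING_CONFIG = json.loads("""
{
  "word_embedding_dimension": 768,
  "pooling_mode_cls_token": true,
  "pooling_mode_mean_tokens": false,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false,
  "pooling_mode_weightedmean_tokens": false,
  "pooling_mode_lasttoken": false,
  "include_prompt": true
}
""")


def active_pooling_modes(config):
    """Return the names of all pooling modes switched on in the config."""
    return [key for key, value in config.items()
            if key.startswith("pooling_mode_") and value is True]


def cls_pool(token_embeddings):
    """CLS pooling: use the first token's vector as the sentence embedding."""
    return list(token_embeddings[0])


def mean_pool(token_embeddings):
    """Mean pooling, shown only for contrast; it is disabled in this config."""
    n = len(token_embeddings)
    return [sum(column) / n for column in zip(*token_embeddings)]


# Toy hidden states for "[CLS] flood [SEP]" (hypothetical values).
tokens = [[1.0, 0.0, 2.0], [3.0, 4.0, 0.0], [2.0, 2.0, 1.0]]
print(active_pooling_modes(POOLING_CONFIG))
print(cls_pool(tokens))
print(mean_pool(tokens))
```

With CLS pooling the other token vectors do not enter the embedding directly; they influence it only through attention inside the encoder.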
README.md ADDED
@@ -0,0 +1,302 @@
+ ---
+ tags:
+ - setfit
+ - sentence-transformers
+ - text-classification
+ - generated_from_setfit_trainer
+ widget:
+ - text: ethiopia flood jul 2010 flood event lasted unknown ercs branch located north
+   east country reported 4 000 family affected flood 2 221 displaced temporarily
+   sheltered public school building 3 206 family reported affected flood 1 565 displaced
+   amhara region report indicated 800 family affected displaced flood afar region
+   total number affected family reported field 9 000 however number affected people
+   increasing due continuous torrential rain part country recently 5 000 family reported
+   displaced amhara tigrey afar region far due flooding occurred 22 24 august 2010
+   ercs icrc joint assessment tigrey amhara report ambasel tewlerda woredas south
+   wollo approx 1 368 hectare land crop flooded damaged hail storm based assessment
+   report approximately 3 745 hectare agricultural land flooded last week several
+   landslide reported field including 22 august 2010 mersa worgessa word north wollo
+   causing injury 19 death 5 people ifrc sep 2010 ethiopia
+ - text: malaysia flood nov 2024 flood event lasted unknown end november 2024 malaysia
+   experienced heavy rainfall attributed northeast monsoon resulting escalating flooding
+   across nine state kelantan terengganu kedah pahang negeri sembilan johor perak
+   melaka perlis heavy rain caused significant damage livelihood house livestock
+   severely impacting affected community 2 december 2024 national disaster management
+   agency nadma reported approximately 137 410 people affected ongoing flood across
+   multiple area malaysia deputy prime minister informed medium year flooding worst
+   since 2014 kelantan terengganu particularly badly affected since 27 november total
+   633 temporary shelter center opened accommodate 40 922 family displaced flood
+   disaster claimed five life kelantan terengganu confirmed department social welfare
+   jkm ministry agriculture food security reported malaysia suffered approximately
+   chf 1 79 million loss due destruction rice paddy plantation caused flood significant
+   damage forced country increase reliance imported rice meet domestic need overall
+   malaysian agriculture sector face total estimated loss chf 3 77 million due disaster
+   malaysian meteorological department met malaysia forecasted continued adverse
+   weather condition including thunderstorm heavy rain strong wind across peninsular
+   malaysia 6 9 december 2024 condition expected exacerbate ongoing flooding increasing
+   number affected individual intensifying challenge emergency response recovery
+   effort persistent heavy rainfall already caused river water level surpass designated
+   danger threshold posing severe risk river overflow could inundate surrounding
+   area relentless rainfall caused extensive damage home also critical infrastructure
+   road airport railway particularly east coast state severely affected cutting intercity
+   connection complicating relief effort combined impact flood landslide underscore
+   urgent need enhanced mitigation measure coordinated response strategy ifrc 08
+   dec 2024 peninsular malaysia including johor kelantan pahang perak terengganu
+   state continues experience heavy rainfall consequent flood resulted displacement
+   damage according asean disaster information network adinet past day 6 517 people
+   displaced 44 evacuation centre across aforementioned state echo 12 dec 2024 4
+   january 2025 malaysia still grappling severe flooding caused ongoing northeast
+   monsoon began november 2024 expected persist march 2025 eastern coastal state
+   kelantan terengganu pahang johor hardest hit heavy rainfall leading widespread
+   flooding displacement significant disruption daily life metmalaysia forecast additional
+   five seven episode heavy rainfall monsoon season signalling situation may continue
+   several month flood caused substantial damage home infrastructure livelihood road
+   airport railway particularly affected east coast state disrupted intercity connectivity
+   hampered relief effort landslide compounded crisis underscoring need stronger
+   disaster mitigation response strategy additionally ministry agriculture food security
+   reported approximately chf 1 79 million loss due destruction rice paddy plantation
+   exacerbating economic impact affected community flood affected nine state across
+   malaysia including kelantan terengganu kedah pahang negeri sembilan johor perak
+   melaka perlis satellite imagery unosat show terengganu kelantan kedah severely
+   impacted floodwaters initially covering approximately 11 000 km terengganu kelantan
+   affecting 120 000 people kedah flood impacted 1 3 million people across 268 km
+   significant damage cropland persists even water begin recede ifrc 9 jan 2025 heavy
+   rainfall affecting peninsular malaysia since 10 january causing flood resulted
+   population displacement damage according asean disaster information network adinet
+   report 12 january 3 844 people displaced 38 evacuation centre 3 779 people johor
+   34 perak 31 terengganu state southern peninsular malaysia echo 13 jan 2025 past
+   day sabah sarawak state located malaysian borneo experiencing heavy rainfall flood
+   resulted casualty damage according medium least five people died 7 500 people
+   evacuated 5 385 sarawak 2 240 people affected sabah state echo 30 jan 2025 severe
+   monsoon flood continue devastate sabah sarawak displacing thousand causing widespread
+   disruption since 28 january 2025 continuous heavy rainfall compounded high tide
+   northeast monsoon led rising water level road inundation landslide sarawak situation
+   worsened due collision extreme monsoon rain high tide triggering large scale evacuation
+   activation multiple relief center 31 january 2025 12 486 evacuee 3 648 family
+   relocated 62 temporary relief center pps sarawak bintulu remains severely impacted
+   district sheltering 5 885 evacuee 1 649 family followed serian 2 307 evacuee 709
+   family samarahan 2 005 evacuee 670 family significantly affected district include
+   sibu 1 163 evacuee 293 family miri 650 evacuee 172 family kuching 475 evacuee
+   153 family single evacuee recorded mukah miri continuous heavy rainfall triggered
+   major landslide resulting tragic loss five life ifrc 1 feb 2025 according nadma
+   flooding landslide sabah sarawak resulted 5 fatality miri district report 3 february
+   1500 hr utc 7 2 9k family 9 7k person remain displaced across 50 evacuation center
+   sarawak bintulu serian miri sibu samarahan mukah sabah tongod kinabatangan aha
+   centre 3 feb 2025 heavy rainfall continued affect eastern malaysia malaysian part
+   borneo island since 29 january causing flood landslide resulted casualty damage
+   according international federation red cross ifrc 4 february death toll stand
+   five fatality ifrc also report nearly 12 500 evacuated people 62 temporary relief
+   center across sarawak state addition around 5 200 evacuated people 33 temporary
+   relief center reported across sabah state ifrc 4 feb 2025 malaysia
+ - text: paraguay flood apr 2015 flood event lasted unknown 4 apr 2015 severe storm
+   hit several town department concepcin northern paraguay affecting house crop farm
+   animal authority estimate 5 000 people affected begun response providing roofing
+   material food medical attention ocha processing application emergency fund support
+   authority response ocha 13 apr 2015 per request paraguayan government usaid channeled
+   50 000 adra support response govt 15 apr 2015 may 2015 heavy rain caused overflowing
+   several river affected community asuncion central department according weather
+   expert amount rain atypical although intensity volume short time 3 000 family
+   affected district ypan villeta ypacara luque mariano roque alonso villa hayes
+   capiat limpio yaguarn ocha 11 may 2015 june 2015 national emergency agency sen
+   reported around 9 602 family 48 000 people affected flooding paraguay river asuncion
+   6 000 family received assistance sen coordinate action asuncion municipal council
+   emergency disaster paho 16 jun 2015 early july number affected family 32 000 23
+   000 received assistance department hit flood alto paraguay boquern presidente
+   hayes concepcin san pedro cordillera central guair caazap misiones eembuc government
+   paraguay 6 jul 2015 end july 2015 nearly 35 000 people affected flooding heavy
+   rain week stay shelter total 6 987 family asuncion shelter paho 24 jul 2015 last
+   week august heavy rain strong wind hail left 900 house affected department paraguar
+   san pedro cordillera central gov paraguay 28 aug 2015 paraguay
+ - text: viet nam storm rai storm surge viet nam storm rai event lasted unknown afternoon
+   december 16 storm rai got stronger became super typhoon 19h 16 12 center super
+   typhoon central philippine wind level 16 gust level 17 moving northwest direction
+   speed 25 30km h past 6 hour intensity storm decreased one level longer level super
+   typhoon 01 17 12 center storm right central philippine wind level 15 gust level
+   17 philippine mobilized 54 response team evacuate 198 000 people prepared 26 million
+   414 000 food package respond storm currently human damage recorded viet nam
+ - text: occupied palestinian territory cold wave dec 2013 cold wave event lasted unknown
+   announced heavy rain fall snow storm hit west bank gaza 10 december 2013 still
+   affecting palestinian population west bank palestine heavy rain snow generated
+   flood several part palestine thousand family evacuated house extreme weather condition
+   also caused several death including baby gaza reported dead family home inundated
+   ifrc 16 dec 2013 useful link ocha opt winter storm online system palestinian red
+   crescent society occupied palestinian territory
+ metrics:
+ - accuracy
+ pipeline_tag: text-classification
+ library_name: setfit
+ inference: false
+ base_model: avsolatorio/GIST-Embedding-v0
+ model-index:
+ - name: SetFit with avsolatorio/GIST-Embedding-v0
+   results:
+   - task:
+       type: text-classification
+       name: Text Classification
+     dataset:
+       name: Unknown
+       type: unknown
+       split: test
+     metrics:
+     - type: accuracy
+       value: 0.5272727272727272
+       name: Accuracy
+ ---
+
+ # SetFit with avsolatorio/GIST-Embedding-v0
+
+ This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [avsolatorio/GIST-Embedding-v0](https://huggingface.co/avsolatorio/GIST-Embedding-v0) as the Sentence Transformer embedding model. A [SetFitHead](https://huggingface.co/docs/setfit/reference/main#setfit.SetFitHead) instance is used for classification.
+
+ The model has been trained using an efficient few-shot learning technique that involves:
+
+ 1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
+ 2. Training a classification head with features from the fine-tuned Sentence Transformer.
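The contrastive step in item 1 works on pairs of training texts rather than on single examples. The sketch below enumerates every pair from a tiny labelled set and assigns a target of 1.0 when the labels match and 0.0 otherwise. It is an illustration only: SetFit's own sampler applies a sampling strategy instead of listing all pairs, and the texts and label names here are invented because the README does not state this model's classes.

```python
from itertools import combinations


def make_contrastive_pairs(texts, labels):
    """Build (text_a, text_b, target) triples: 1.0 if labels match, else 0.0."""
    pairs = []
    for (text_a, label_a), (text_b, label_b) in combinations(zip(texts, labels), 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


# Hypothetical few-shot examples; the label names are invented for illustration.
texts = ["river overflow displaced families", "heavy rain flooded roads",
         "snow storm hit the region", "cold wave caused deaths"]
labels = ["flood", "flood", "cold_wave", "cold_wave"]

pairs = make_contrastive_pairs(texts, labels)
positives = [p for p in pairs if p[2] == 1.0]
negatives = [p for p in pairs if p[2] == 0.0]
print(len(pairs), len(positives), len(negatives))
```

Four examples already give six pairs, which is why a few labelled texts per class can provide enough signal to fine-tune the embedding body.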
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** SetFit
+ - **Sentence Transformer body:** [avsolatorio/GIST-Embedding-v0](https://huggingface.co/avsolatorio/GIST-Embedding-v0)
+ - **Classification head:** a [SetFitHead](https://huggingface.co/docs/setfit/reference/main#setfit.SetFitHead) instance
+ - **Maximum Sequence Length:** 512 tokens
+ <!-- - **Number of Classes:** Unknown -->
+ <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
+ - **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
+ - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
+
+ ## Evaluation
+
+ ### Metrics
+ | Label | Accuracy |
+ |:--------|:---------|
+ | **all** | 0.5273 |
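The unrounded accuracy in the front matter, 0.5272727272727272, is a repeating decimal, which hints at a small test set. A quick check with the standard library recovers the simplest fraction that matches it. The denominator bound of 1000 is an assumption, and the result only fixes the ratio: the real test set could be any multiple of the denominator.

```python
from fractions import Fraction

reported_accuracy = 0.5272727272727272  # value from the model-index metadata

# Closest fraction with a small denominator (the bound of 1000 is an assumption).
ratio = Fraction(reported_accuracy).limit_denominator(1000)
print(ratio)
```

The ratio is 29/55, so the figure is consistent with 29 correct predictions out of 55 test examples (or 58 out of 110, and so on).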
+
+ ## Uses
+
+ ### Direct Use for Inference
+
+ First install the SetFit library:
+
+ ```bash
+ pip install setfit
+ ```
+
+ Then you can load this model and run inference.
+
+ ```python
+ from setfit import SetFitModel
+
+ # Download from the 🤗 Hub
+ model = SetFitModel.from_pretrained("AlexBayer/GIST_SetFit_HIPs_v1")
+ # Run inference
+ preds = model("occupied palestinian territory cold wave dec 2013 cold wave event lasted unknown announced heavy rain fall snow storm hit west bank gaza 10 december 2013 still affecting palestinian population west bank palestine heavy rain snow generated flood several part palestine thousand family evacuated house extreme weather condition also caused several death including baby gaza reported dead family home inundated ifrc 16 dec 2013 useful link ocha opt winter storm online system palestinian red crescent society occupied palestinian territory")
+ ```
+
+ <!--
+ ### Downstream Use
+
+ *List how someone could finetune this model on their own dataset.*
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Set Metrics
+ | Training set | Min | Median | Max |
+ |:-------------|:----|:---------|:-----|
+ | Word count | 34 | 319.4125 | 2470 |
+
+ ### Training Hyperparameters
+ - batch_size: (16, 2)
+ - num_epochs: (1, 16)
+ - max_steps: -1
+ - sampling_strategy: undersampling
+ - body_learning_rate: (2.658312040445757e-05, 2.395494892138971e-05)
+ - head_learning_rate: 0.01770304761261279
+ - loss: CosineSimilarityLoss
+ - distance_metric: cosine_distance
+ - margin: 0.25
+ - end_to_end: True
+ - use_amp: True
+ - warmup_proportion: 0.1
+ - l2_weight: 0.05
+ - max_length: 512
+ - seed: 42
+ - eval_max_steps: -1
+ - load_best_model_at_end: False
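The hyperparameters name `CosineSimilarityLoss` as the objective for the embedding phase. In Sentence Transformers that loss is the mean squared error between the cosine similarity of the two embeddings in a pair and the pair's target value. The following is a plain-Python sketch of that computation on toy 2-D vectors; the vectors and the function names are illustrative, and the real loss operates on batched tensors.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_similarity_loss(pairs):
    """Mean squared error between cos(u, v) and the target similarity."""
    errors = [(cosine_similarity(u, v) - target) ** 2 for u, v, target in pairs]
    return sum(errors) / len(errors)


batch = [
    ([1.0, 0.0], [1.0, 0.0], 1.0),  # same class, identical direction: error 0
    ([1.0, 0.0], [0.0, 1.0], 0.0),  # different class, orthogonal: error 0
    ([1.0, 0.0], [1.0, 0.0], 0.0),  # different class but identical: error 1
]
loss = cosine_similarity_loss(batch)
print(loss)
```

Only the third pair contributes, so the batch loss is one third. Training pushes such mismatched pairs apart while keeping same-class pairs aligned.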
+
+ ### Training Results
+ | Epoch | Step | Training Loss | Validation Loss |
+ |:------:|:----:|:-------------:|:---------------:|
+ | 0.1534 | 25 | 0.2424 | - |
+ | 0.3067 | 50 | 0.1673 | - |
+ | 0.4601 | 75 | 0.1422 | - |
+ | 0.6135 | 100 | 0.1242 | - |
+ | 0.7669 | 125 | 0.1148 | - |
+ | 0.9202 | 150 | 0.096 | - |
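The Epoch and Step columns together reveal how many optimisation steps make up one epoch of the embedding phase. The sketch below derives that figure from the logged rows. It then multiplies by 16 on the assumption that the first element of `batch_size: (16, 2)` is the embedding-phase batch size, which gives an upper bound on the number of training pairs seen per epoch (the last batch may be smaller).

```python
# (epoch fraction, step) pairs copied from the Training Results table.
log = [(0.1534, 25), (0.3067, 50), (0.4601, 75),
       (0.6135, 100), (0.7669, 125), (0.9202, 150)]

# Each row gives an estimate of steps per epoch; average them and round.
estimates = [step / epoch for epoch, step in log]
steps_per_epoch = round(sum(estimates) / len(estimates))

embedding_batch_size = 16  # assumed: first element of batch_size (16, 2)
pairs_per_epoch_upper_bound = steps_per_epoch * embedding_batch_size

print(steps_per_epoch, pairs_per_epoch_upper_bound)
```

The rows agree on 163 steps per epoch, so at most 2608 pairs were drawn in the single embedding epoch listed under `num_epochs`.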
+
+ ### Framework Versions
+ - Python: 3.11.12
+ - SetFit: 1.1.2
+ - Sentence Transformers: 3.4.1
+ - Transformers: 4.51.3
+ - PyTorch: 2.6.0+cu124
+ - Datasets: 3.5.1
+ - Tokenizers: 0.21.1
+
+ ## Citation
+
+ ### BibTeX
+ ```bibtex
+ @article{https://doi.org/10.48550/arxiv.2209.11055,
+     doi = {10.48550/ARXIV.2209.11055},
+     url = {https://arxiv.org/abs/2209.11055},
+     author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
+     keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
+     title = {Efficient Few-Shot Learning Without Prompts},
+     publisher = {arXiv},
+     year = {2022},
+     copyright = {Creative Commons Attribution 4.0 International}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,31 @@
+ {
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.51.3",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.4.1",
+     "transformers": "4.51.3",
+     "pytorch": "2.6.0+cu124"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
config_setfit.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "labels": null,
+   "normalize_embeddings": false
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:642c5e314d0044d60685836bc30b40cfe77c43b4d0c979dd6a00861c2d02cbdc
+ size 437951328
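The size in the `model.safetensors` pointer can be cross-checked against the architecture in `config.json`. The sketch below counts the weights of a standard BERT encoder from the configured dimensions and multiplies by 4 bytes for `float32`. It assumes the file stores exactly the encoder plus its pooler layer and nothing else; the helper `bert_parameter_count` is illustrative, not a library function.

```python
def bert_parameter_count(hidden, layers, intermediate, vocab, max_pos, type_vocab):
    """Count weights in a standard BERT encoder with a pooler layer."""
    layer_norm = 2 * hidden                       # scale + bias
    embeddings = (vocab + max_pos + type_vocab) * hidden + layer_norm
    attention = 4 * (hidden * hidden + hidden)    # Q, K, V and output projections
    feed_forward = (hidden * intermediate + intermediate) \
        + (intermediate * hidden + hidden)
    per_layer = attention + feed_forward + 2 * layer_norm
    pooler = hidden * hidden + hidden
    return embeddings + layers * per_layer + pooler


params = bert_parameter_count(hidden=768, layers=12, intermediate=3072,
                              vocab=30522, max_pos=512, type_vocab=2)
expected_bytes = params * 4        # torch_dtype is float32: 4 bytes per weight
lfs_size = 437951328               # "size" from the model.safetensors pointer
print(params, expected_bytes, lfs_size - expected_bytes)
```

Under these assumptions the count is about 109.5 million parameters, and the pointer size exceeds the raw weight bytes by roughly 22 KB, which is plausibly the safetensors header and metadata.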
model_head.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3fb62f243a9f75c09e6777f9cdaf63a577d0a6d9ead34fac6e46fbb7f9fe830f
+ size 44235
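Both binary files are committed as Git LFS pointers: three `key value` lines giving the spec version, a SHA-256 object id, and the byte size. The sketch below parses such a pointer and checks a downloaded blob against it. The function names are hypothetical, and `matches_pointer` reads the whole blob into memory, which is fine for the 44 KB head but would be better done in streamed chunks for the 438 MB weights file.

```python
import hashlib


def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into its key/value fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algorithm": algorithm,
            "digest": digest, "size": int(fields["size"])}


def matches_pointer(data, pointer):
    """Check a downloaded blob against the pointer's size and SHA-256 digest."""
    return (len(data) == pointer["size"]
            and hashlib.sha256(data).hexdigest() == pointer["digest"])


pointer = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:3fb62f243a9f75c09e6777f9cdaf63a577d0a6d9ead34fac6e46fbb7f9fe830f
size 44235
""")
print(pointer["algorithm"], pointer["size"])
```

Because `model_head.pkl` is a pickle, verifying the digest before loading it is a sensible precaution: unpickling executes code from the file.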
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
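`modules.json` chains three stages in `idx` order: the Transformer encoder, the pooling layer, and a final Normalize step. Normalising to unit length means the cosine similarity named in `config_sentence_transformers.json` reduces to a plain dot product. A small sketch on toy 2-D vectors follows; the values and the helper names are illustrative, not the library's implementation.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length, as a Normalize stage would."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Toy pooled embeddings (hypothetical values, 2-D instead of 768-D).
a = l2_normalize([3.0, 4.0])
b = l2_normalize([4.0, 3.0])

print(a, dot(a, a), dot(a, b))
```

Since the body already normalises its output, the `"normalize_embeddings": false` setting in `config_setfit.json` would not leave the embeddings unnormalised; it most likely just avoids doing the same work twice.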
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": true
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
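The tokenizer config fixes the special-token ids (`[PAD]` 0, `[CLS]` 101, `[SEP]` 102) and a `model_max_length` of 512. The sketch below shows how a single sequence is framed under those settings, assuming truncation from the right. The word-piece ids are invented and `build_inputs` is a hypothetical helper, not the `BertTokenizer` API.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0   # ids from added_tokens_decoder
MODEL_MAX_LENGTH = 512                 # model_max_length


def build_inputs(token_ids, max_length=MODEL_MAX_LENGTH, pad_to=None):
    """Wrap token ids as [CLS] ... [SEP], truncating and optionally padding."""
    body = token_ids[: max_length - 2]          # leave room for [CLS] and [SEP]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    if pad_to is not None and pad_to > len(input_ids):
        padding = pad_to - len(input_ids)
        input_ids += [PAD_ID] * padding
        attention_mask += [0] * padding
    return input_ids, attention_mask


# Hypothetical word-piece ids for a short text and an over-long one.
short_ids, short_mask = build_inputs([7000, 7001, 7002], pad_to=8)
long_ids, _ = build_inputs(list(range(1000, 3470)))
print(short_ids, short_mask, len(long_ids))
```

The over-long example has 2470 ids, matching the largest word count in the README's training set metrics. Everything beyond the first 510 ids is dropped, so the tail of a long report never reaches the encoder.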
vocab.txt ADDED
The diff for this file is too large to render. See raw diff