lgsilvaesilva committed · Commit c3caebc · verified · 1 Parent(s): bf381fd

Push model using huggingface_hub.

.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ unigram.json filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 384,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
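Only `pooling_mode_mean_tokens` is true in this config, so the sentence embedding is the average of the token vectors. A minimal pure-Python sketch of that idea follows, assuming padded positions are excluded through an attention mask. The function name `mean_pool` and the toy 3-dimensional vectors are illustrative (the config uses 384 dimensions), and this is not the sentence-transformers implementation.

```python
from typing import List


def mean_pool(token_vectors: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    if len(token_vectors) != len(attention_mask):
        raise ValueError("one mask entry is required per token vector")
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("at least one unmasked token is required")
    dim = len(kept[0])
    # Component-wise mean over the unmasked tokens only.
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy input: two real tokens and one padding token (mask 0).
tokens = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0, 4.0]
```

The padding vector is ignored, so the result depends only on the real tokens.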
README.md ADDED
@@ -0,0 +1,320 @@
+ ---
+ tags:
+ - setfit
+ - sentence-transformers
+ - text-classification
+ - generated_from_setfit_trainer
+ widget:
+ - text: To monitor market dynamics and inform policy responses, the government will
+     track the retail value of ultra-processed foods and analyze shifts in consumption
+     in relation to labeling and advertising reforms. Data from these analyses will
+     feed annual dashboards that link labeling density, promotional intensity, and
+     dietary outcomes to guide targeted interventions and budget planning.
+ - text: the national agricultural plan is a national sectoral plan of grenada of 2015-2030.
+     its main goal is to stimulate economic growth in the agriculture sector through
+     the development of a well-coordinated planning and implementation framework that
+     is interactive and effective, and involve the full participation of the stakeholders,
+     and which promotes food security, income generation and poverty alleviation. in
+     the area of food security, the document aims to reduce dependence on food imports
+     and imported staples in particular and increase availability of local fresh and
+     fresh processed products; increase economic access to food by vulnerable persons
+     and their capacity to address their food and nutrition needs; and to improve the
+     health status and wellbeing of the grenadians through the consumption of nutritious
+     and safe foods. the plan also seeks to make agriculture, forestry and fisheries
+     more productive and sustainable. specifically, it envisions to build climate resilience
+     to avoid, prevent, or minimize climate change impacts on agriculture (including
+     forestry and fisheries), the environment and biodiversity; improve preparedness
+     for climate change impacts and extreme events; enhance the country’s response
+     capacity in case of extremes; facilitate recovery from impacts and extremes; and
+     reduce the impact of land based agriculture on climate change and the environment;
+     and preserve and optimize resources (land, sea, genetic). moreover, the document
+     aims to reduce rural poverty. in particular, it provides for making additional
+     investments in economic infrastructure for increased contribution of the agricultural
+     sector to economic growth, poverty alleviation and environmental sustainability.
+     further, the plan targets to increase exports of traditional crops, fish, fruits,
+     vegetables, root crops, minor spices, and value added products to international
+     and regional markets; increase production of targeted fruits, vegetables, root
+     crops, herbs and minor spices for targeted domestic markets; make additional investments
+     in institutional and human resource capacity development in the agricultural sector
+     to improve governance and efficiency; achieve greater collaboration in regional
+     and international trade for agricultural products; create framework for donor
+     and development partner coordination in providing support for the agriculture
+     sector; leverage opportunities in the tourism sector to strengthen the linkage
+     between agriculture and tourism; and invest in upgrading agricultural research
+     and development capacity. institutional responsibility for the implementation
+     of the plan is with the ministry of agriculture, lands, forestry, fisheries and
+     the environment. the minister will be obligated to report to the cabinet and parliament
+     on progress in the implementation of the plan. it is expected that the plan will
+     be incorporated into the national sustainable development plan 2030 (nsdp2030).
+     the ministry through the permanent secretary will be expected to report to the
+     monitoring committee of the nsdp2030 on a monthly basis on progress in implementation.
+     the reports to the cabinet will be submitted biannually.
+ - text: 'the seven key objectives are: 1. improve coordination in the sector to successfully
+     implement the fruit and vegetable strategy 2. improve market intelligence, promotion
+     and dissemination across the whole value chain 3. build a supply sub sector that
+     can guarantee consistent quality and supply of fresh fruit and vegetables 4. build
+     a sector that is well trained and supported by a comprehensive and properly executed
+     capability plan 5. improve financial situation of sector farmers and enterprises
+     6. promote integrated management of resources to ensure sustainability of the
+     fruit and vegetable sector 7. strengthen samoa association for manufacturers and
+     exporters (same) to provide services that will increase returns and overall value
+     addition for sector'
+ - text: Trade facilitation should be aligned with nutrition security and rural development
+     by prioritizing critical food and input imports, harmonizing rules of origin with
+     neighboring economies, and strengthening transit corridors to support small producers.
+     Progress indicators include the ratio of food imports to merchandise imports and
+     the share of agricultural raw materials imports, alongside the incidence of firms
+     naming customs and trade regulations as top obstacles (6.6.3.3).
+ - text: 1. general objectives striving to be a developing country with modern industry
+     and high middle income by 2030; have a modern, competitive, effective and effective
+     management institution; the economy develops dynamically, quickly and sustainably,
+     independently and autonomously on the basis of science, technology and innovation
+     in association with improving efficiency in external activities and international
+     integration; arousing the aspiration to develop the country, promoting the creativity,
+     will and strength of the whole nation, building a prosperous, democratic, fair,
+     civilized, orderly, disciplined and safe society, ensuring a peaceful and happy
+     life of the people; constantly improve all aspects of people's lives; firmly protect
+     the fatherland, a peaceful and stable environment for national development; improve
+     vietnam's position and prestige in the international arena. striving to become
+     a developed and high-income country by 2045. 2. principal indicators a) regarding
+     the economy - the average growth rate of gross domestic product (gdp) is about
+     7%/year; gdp per capita at current prices by 2030 will reach about 7,500 usd3.
+     - the proportion of the processing and manufacturing industry will reach about
+     30% of gdp, and the digital economy will reach about 30% of gdp. - the urbanization
+     rate will reach over 50%. - the average total social investment will reach 33-35%
+     of gdp; public debt does not exceed 60% of gdp. - the contribution of total factor
+     productivity (tfp) to growth reached 50%. - the average growth rate of social
+     labor productivity will reach over 6.5%/year. - reduce energy consumption per
+     unit of gdp at 1-1.5%/year. b) regarding social - the human development index
+     (hdi) remained above 0.74. - the average life expectancy is 75 years, of which
+     the healthy life span is at least 68 years. - the percentage of trained workers
+     with degrees and certificates reaches 35-40%. - the proportion of agricultural
+     labor in the total social labor force will decrease to less than 20%. c) regarding
+     the environment - the forest cover rate is stable at 42%. - the rate of treatment
+     and reuse of wastewater into the river basin environment will reach over 70%.
+     - reduce greenhouse gas emissions by 9%5. - 100% of production and business establishments
+     meet environmental standards. - to increase the area of marine and coastal protected
+     areas to 3-5% of the natural area of national waters.
+ metrics:
+ - accuracy
+ pipeline_tag: text-classification
+ library_name: setfit
+ inference: false
+ base_model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
+ ---
+
+ # SetFit with sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
+
+ This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) as the Sentence Transformer embedding model. A MultiOutputClassifier instance is used for classification.
+
+ The model has been trained using an efficient few-shot learning technique that involves:
+
+ 1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
+ 2. Training a classification head with features from the fine-tuned Sentence Transformer.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** SetFit
+ - **Sentence Transformer body:** [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2)
+ - **Classification head:** a MultiOutputClassifier instance
+ - **Maximum Sequence Length:** 128 tokens
+ <!-- - **Number of Classes:** Unknown -->
+ <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
+ - **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
+ - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
+
+ ## Uses
+
+ ### Direct Use for Inference
+
+ First install the SetFit library:
+
+ ```bash
+ pip install setfit
+ ```
+
+ Then you can load this model and run inference.
+
+ ```python
+ from setfit import SetFitModel
+
+ # Download from the 🤗 Hub
+ model = SetFitModel.from_pretrained("faodl/model_cca_multilabel_MiniLM-L12-v03")
+ # Run inference
+ preds = model("To monitor market dynamics and inform policy responses, the government will track the retail value of ultra-processed foods and analyze shifts in consumption in relation to labeling and advertising reforms. Data from these analyses will feed annual dashboards that link labeling density, promotional intensity, and dietary outcomes to guide targeted interventions and budget planning.")
+ ```
+
+ <!--
+ ### Downstream Use
+
+ *List how someone could finetune this model on their own dataset.*
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Set Metrics
+ | Training set | Min | Median | Max |
+ |:-------------|:----|:---------|:----|
+ | Word count | 1 | 123.6200 | 951 |
+
+ ### Training Hyperparameters
+ - batch_size: (32, 32)
+ - num_epochs: (2, 2)
+ - max_steps: -1
+ - sampling_strategy: oversampling
+ - num_iterations: 20
+ - body_learning_rate: (2e-05, 2e-05)
+ - head_learning_rate: 2e-05
+ - loss: CosineSimilarityLoss
+ - distance_metric: cosine_distance
+ - margin: 0.25
+ - end_to_end: False
+ - use_amp: False
+ - warmup_proportion: 0.1
+ - l2_weight: 0.01
+ - seed: 42
+ - eval_max_steps: -1
+ - load_best_model_at_end: False
+
+ ### Training Results
+ | Epoch | Step | Training Loss | Validation Loss |
+ |:------:|:----:|:-------------:|:---------------:|
+ | 0.0006 | 1 | 0.1914 | - |
+ | 0.0283 | 50 | 0.1948 | - |
+ | 0.0566 | 100 | 0.1824 | - |
+ | 0.0849 | 150 | 0.1661 | - |
+ | 0.1132 | 200 | 0.1523 | - |
+ | 0.1415 | 250 | 0.1383 | - |
+ | 0.1698 | 300 | 0.1368 | - |
+ | 0.1981 | 350 | 0.1267 | - |
+ | 0.2264 | 400 | 0.124 | - |
+ | 0.2547 | 450 | 0.127 | - |
+ | 0.2830 | 500 | 0.1201 | - |
+ | 0.3113 | 550 | 0.1206 | - |
+ | 0.3396 | 600 | 0.1153 | - |
+ | 0.3679 | 650 | 0.1105 | - |
+ | 0.3962 | 700 | 0.1071 | - |
+ | 0.4244 | 750 | 0.1067 | - |
+ | 0.4527 | 800 | 0.1037 | - |
+ | 0.4810 | 850 | 0.1072 | - |
+ | 0.5093 | 900 | 0.1076 | - |
+ | 0.5376 | 950 | 0.1072 | - |
+ | 0.5659 | 1000 | 0.0984 | - |
+ | 0.5942 | 1050 | 0.0972 | - |
+ | 0.6225 | 1100 | 0.1023 | - |
+ | 0.6508 | 1150 | 0.0993 | - |
+ | 0.6791 | 1200 | 0.0959 | - |
+ | 0.7074 | 1250 | 0.0989 | - |
+ | 0.7357 | 1300 | 0.0918 | - |
+ | 0.7640 | 1350 | 0.099 | - |
+ | 0.7923 | 1400 | 0.0924 | - |
+ | 0.8206 | 1450 | 0.0889 | - |
+ | 0.8489 | 1500 | 0.092 | - |
+ | 0.8772 | 1550 | 0.0908 | - |
+ | 0.9055 | 1600 | 0.0891 | - |
+ | 0.9338 | 1650 | 0.0876 | - |
+ | 0.9621 | 1700 | 0.0931 | - |
+ | 0.9904 | 1750 | 0.0798 | - |
+ | 1.0187 | 1800 | 0.0811 | - |
+ | 1.0470 | 1850 | 0.0785 | - |
+ | 1.0753 | 1900 | 0.0796 | - |
+ | 1.1036 | 1950 | 0.0849 | - |
+ | 1.1319 | 2000 | 0.0805 | - |
+ | 1.1602 | 2050 | 0.08 | - |
+ | 1.1885 | 2100 | 0.0776 | - |
+ | 1.2168 | 2150 | 0.0837 | - |
+ | 1.2450 | 2200 | 0.0793 | - |
+ | 1.2733 | 2250 | 0.0754 | - |
+ | 1.3016 | 2300 | 0.078 | - |
+ | 1.3299 | 2350 | 0.0796 | - |
+ | 1.3582 | 2400 | 0.0777 | - |
+ | 1.3865 | 2450 | 0.0787 | - |
+ | 1.4148 | 2500 | 0.0752 | - |
+ | 1.4431 | 2550 | 0.0775 | - |
+ | 1.4714 | 2600 | 0.0749 | - |
+ | 1.4997 | 2650 | 0.0722 | - |
+ | 1.5280 | 2700 | 0.0832 | - |
+ | 1.5563 | 2750 | 0.0738 | - |
+ | 1.5846 | 2800 | 0.0863 | - |
+ | 1.6129 | 2850 | 0.0754 | - |
+ | 1.6412 | 2900 | 0.0855 | - |
+ | 1.6695 | 2950 | 0.0767 | - |
+ | 1.6978 | 3000 | 0.081 | - |
+ | 1.7261 | 3050 | 0.075 | - |
+ | 1.7544 | 3100 | 0.0754 | - |
+ | 1.7827 | 3150 | 0.0689 | - |
+ | 1.8110 | 3200 | 0.0758 | - |
+ | 1.8393 | 3250 | 0.0734 | - |
+ | 1.8676 | 3300 | 0.0718 | - |
+ | 1.8959 | 3350 | 0.0784 | - |
+ | 1.9242 | 3400 | 0.0776 | - |
+ | 1.9525 | 3450 | 0.0773 | - |
+ | 1.9808 | 3500 | 0.071 | - |
+
+ ### Framework Versions
+ - Python: 3.12.12
+ - SetFit: 1.1.3
+ - Sentence Transformers: 5.1.1
+ - Transformers: 4.57.1
+ - PyTorch: 2.8.0+cu126
+ - Datasets: 4.0.0
+ - Tokenizers: 0.22.1
+
+ ## Citation
+
+ ### BibTeX
+ ```bibtex
+ @article{https://doi.org/10.48550/arxiv.2209.11055,
+     doi = {10.48550/ARXIV.2209.11055},
+     url = {https://arxiv.org/abs/2209.11055},
+     author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
+     keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
+     title = {Efficient Few-Shot Learning Without Prompts},
+     publisher = {arXiv},
+     year = {2022},
+     copyright = {Creative Commons Attribution 4.0 International}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
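The Training Results table in the README logs fractional epochs against step numbers, which is enough to recover the number of optimizer steps per epoch. The sketch below does that arithmetic on three rows copied from the table. The last part rests on an assumption that the card does not state: that contrastive pairs were sampled as `2 * num_iterations` pairs per training text, which is how SetFit is understood to behave when `num_iterations` is set. Under that assumption the step count would imply roughly 1,413 training texts; treat that figure as an estimate only.

```python
import math

# (epoch, step) pairs copied from the Training Results table.
rows = [(0.0283, 50), (0.9904, 1750), (1.9808, 3500)]

# Each row gives an estimate of steps per epoch; average and round them.
estimates = [step / epoch for epoch, step in rows]
steps_per_epoch = round(sum(estimates) / len(estimates))
print(steps_per_epoch)  # 1767

# Hyperparameters listed in the README.
batch_size = 32
num_iterations = 20

# ASSUMPTION: pairs per epoch = 2 * num_iterations * number_of_texts.
max_pairs = steps_per_epoch * batch_size
estimated_texts = max_pairs // (2 * num_iterations)
print(estimated_texts)  # 1413

# Consistency check: that many texts reproduces the same step count.
print(math.ceil(2 * num_iterations * estimated_texts / batch_size))  # 1767
```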
config.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "dtype": "float32",
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 384,
+   "initializer_range": 0.02,
+   "intermediate_size": 1536,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "transformers_version": "4.57.1",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 250037
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "__version__": {
+     "sentence_transformers": "5.1.1",
+     "transformers": "4.57.1",
+     "pytorch": "2.8.0+cu126"
+   },
+   "model_type": "SentenceTransformer",
+   "prompts": {
+     "query": "",
+     "document": ""
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
config_setfit.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "normalize_embeddings": false,
+   "labels": null
+ }
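With `"labels": null`, no label names are stored alongside the model, so predictions from a multi-output head would presumably come back as one 0/1 indicator per output column, not as names. A minimal sketch of mapping such a row back to names follows, assuming the caller knows the column order used in training. `LABEL_NAMES` and the example row are hypothetical and do not come from this repository.

```python
from typing import List, Sequence

# Hypothetical label names, in the column order used at training time.
LABEL_NAMES = ["label_a", "label_b", "label_c"]


def decode_row(row: Sequence[int], names: Sequence[str]) -> List[str]:
    """Return the names whose indicator in a multi-hot row equals 1."""
    if len(row) != len(names):
        raise ValueError("row length must match the number of label names")
    return [name for flag, name in zip(row, names) if flag == 1]


print(decode_row([1, 0, 1], LABEL_NAMES))  # ['label_a', 'label_c']
```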
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:865063db583e09a346250f1702afb0569a1efdbaaaf76724adb040817b655e71
+ size 470637416
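The 470,637,416-byte size recorded in this pointer can be sanity-checked against `config.json`. The sketch below counts the parameters of a standard BERT encoder with those dimensions (embeddings, 12 encoder layers, and the pooler layer that `BertModel` includes by default) and multiplies by 4 bytes for float32. Reading the small remainder as the safetensors header and metadata is an assumption, not something the commit states.

```python
# Dimensions taken from config.json in this commit.
hidden, layers, intermediate = 384, 12, 1536
vocab, max_positions, type_vocab = 250037, 512, 2


def linear(n_in: int, n_out: int) -> int:
    """Parameter count of a dense layer: weight matrix plus bias."""
    return n_in * n_out + n_out


layer_norm = 2 * hidden  # scale and shift vectors

# Word, position and token-type embeddings, followed by a LayerNorm.
embeddings = (vocab + max_positions + type_vocab) * hidden + layer_norm
# Query, key, value and output projections, followed by a LayerNorm.
attention = 4 * linear(hidden, hidden) + layer_norm
# Two-layer feed-forward block, followed by a LayerNorm.
feed_forward = linear(hidden, intermediate) + linear(intermediate, hidden) + layer_norm
pooler = linear(hidden, hidden)

total_params = embeddings + layers * (attention + feed_forward) + pooler
weight_bytes = total_params * 4  # float32
file_bytes = 470637416  # size from the LFS pointer

print(total_params)               # 117653760
print(file_bytes - weight_bytes)  # 22376
```

About 96 million of the roughly 118 million parameters sit in the 250,037-entry word embedding table, which is why a 12-layer, 384-dimensional model still produces a file of this size.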
model_head.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:90b3c4d1fba6930aed4ac2a931713f4d86fbcad1a2a30b56a01e2b5e7e00840e
+ size 329057
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 128,
+   "do_lower_case": false
+ }
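A `max_seq_length` of 128 means longer inputs are cut off before encoding. The training-set statistics in the README (median about 124 words, maximum 951) suggest this affects a good share of the texts, since a word usually maps to one or more subword tokens. A rough sketch of right-side truncation follows. Whitespace splitting stands in for the real subword tokenizer, and reserving two positions for `<s>` and `</s>` is an assumption about how special tokens are counted.

```python
from typing import List

MAX_SEQ_LENGTH = 128  # from sentence_bert_config.json


def truncate(text: str, max_len: int = MAX_SEQ_LENGTH) -> List[str]:
    """Keep the leading tokens so the sequence, with specials, fits max_len."""
    words = text.split()      # stand-in for subword tokenization
    kept = words[: max_len - 2]  # reserve room for <s> and </s>
    return ["<s>"] + kept + ["</s>"]


# A 951-word input, matching the longest training text by word count.
long_text = " ".join(f"w{i}" for i in range(951))
encoded = truncate(long_text)
print(len(encoded))  # 128
print(encoded[-2])   # w125
```

Everything after the 126th word is dropped in this sketch, so content near the end of a long passage would not influence the embedding.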
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cad551d5600a84242d0973327029452a1e3672ba6313c2a3c3d69c4310e12719
+ size 17082987
tokenizer_config.json ADDED
@@ -0,0 +1,65 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "250001": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "cls_token": "<s>",
+   "do_lower_case": true,
+   "eos_token": "</s>",
+   "extra_special_tokens": {},
+   "mask_token": "<mask>",
+   "max_length": 128,
+   "model_max_length": 128,
+   "pad_to_multiple_of": null,
+   "pad_token": "<pad>",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "</s>",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "<unk>"
+ }
unigram.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:da145b5e7700ae40f16691ec32a0b1fdc1ee3298db22a31ea55f57a966c4a65d
+ size 14763260