loganh274 committed on
Commit c6eaaf8 · verified · 1 Parent(s): b56d4b5

Upload folder using huggingface_hub
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
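Of the modes in this config, only `pooling_mode_mean_tokens` is enabled, so each sentence embedding is the attention-mask-weighted mean of the 768-dimensional token embeddings. A minimal numpy sketch of that operation (illustrative only, not the sentence-transformers implementation):

```python
import numpy as np

def mean_pooling(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token embeddings, ignoring padding positions.

    token_embeddings: (batch, seq_len, dim)
    attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)                    # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                    # avoid divide-by-zero
    return summed / counts

# Toy example: one sentence, three token slots, last slot is padding
emb = np.array([[[1.0, 2.0], [3.0, 4.0], [99.0, 99.0]]])
mask = np.array([[1, 1, 0]])
print(mean_pooling(emb, mask))  # [[2. 3.]]
```

Because the padding slot is masked out, the `[99.0, 99.0]` vector contributes nothing to the average.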
README.md ADDED
@@ -0,0 +1,248 @@
+ ---
+ tags:
+ - setfit
+ - sentence-transformers
+ - text-classification
+ - generated_from_setfit_trainer
+ widget:
+ - text: "MARCH 2023 last update, how about another one!\r\nSo viewing this thread\
+     \ you have been looking at introducing this for well over a year, why is not ready\
+     \ yet? With customers suggesting the update over ten years ago.\r\nThis is a critical\
+     \ addition needed in the software. I am still in my trial period and will not\
+     \ be purchasing. Unfortunately I have invested many hours of time uploading debtors\
+     \ and creditors and customer contacts etc, and would never have bothered if I\
+     \ knew that you couldn't add multiple delivery addresses.\r\nI have a lot of customers\
+     \ who use multiple delivery addresses and need this information on their invoices.\
+     \ What's even worse than not being able to store multiple delivery addresses,\
+     \ is you allow the ability to edit a delivery address in a new invoice, BUT!!\
+     \ then you apply to all other previous invoices, WHY??. Every other accountancy\
+     \ software package has this functionality, its a basic must!!! why did I listen\
+     \ to my accountant, when he recommended this product."
+ - text: Thanks for the heads up. This saves us from having to use the reference field
+     for street addresses.
+ - text: An idea that has 1100 votes and was posted 2 years ago. it takes this long
+     to implement this?
+ - text: "Xero's attitude to this \"issue\" is nothing short of contemptuous to its\
+     \ clients. The feature (to add different delivery addresses) exists in the Purchase\
+     \ Order screen, so the code is already written - why can't this be applied to\
+     \ the invoice page too? \r\nWe all had an e-mail from the new CEO (Sukhinder Singh\
+     \ Cassidy) saying she wanted to be open to clients so I wrote to her, only to\
+     \ get a reply from minions with the usual platitudes fobbing off the questions\
+     \ - so much for a \"new broom\". \r\nXero spend a fortune on slick TV adverts,\
+     \ but can't even be bothered to look after the genuine concerns of its existing\
+     \ users. \r\nTime is running out for Xero as far as many users are concerned.\
+     \ "
+ - text: Thanks for the support Peter Kerly. I will at todays Xero roadshow in London
+     where I am going to find a Xero person to talk about this face to face.
+ metrics:
+ - accuracy
+ pipeline_tag: text-classification
+ library_name: setfit
+ inference: true
+ base_model: sentence-transformers/all-mpnet-base-v2
+ model-index:
+ - name: SetFit with sentence-transformers/all-mpnet-base-v2
+   results:
+   - task:
+       type: text-classification
+       name: Text Classification
+     dataset:
+       name: Unknown
+       type: unknown
+       split: test
+     metrics:
+     - type: accuracy
+       value: 0.6132075471698113
+       name: Accuracy
+ ---
+ 
+ # SetFit with sentence-transformers/all-mpnet-base-v2
+ 
+ This is a [SetFit](https://github.com/huggingface/setfit) model that can be used for Text Classification. This SetFit model uses [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2) as the Sentence Transformer embedding model. A [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance is used for classification.
+ 
+ The model has been trained using an efficient few-shot learning technique that involves:
+ 
+ 1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
+ 2. Training a classification head with features from the fine-tuned Sentence Transformer.
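A toy sketch of how step 1's contrastive training data is typically formed: every two labeled examples become a pair whose target records whether they share a class. This is illustrative only (the helper name is ours); SetFit's actual samplers additionally handle oversampling, shuffling, and the `num_iterations` budget.

```python
from itertools import combinations

def make_contrastive_pairs(texts, labels):
    """Pair every two examples; target 1.0 if labels match (pull their
    embeddings together), else 0.0 (push them apart)."""
    pairs = []
    for (t1, y1), (t2, y2) in combinations(zip(texts, labels), 2):
        pairs.append((t1, t2, 1.0 if y1 == y2 else 0.0))
    return pairs

texts = ["great idea", "works perfectly", "still not implemented", "why so slow"]
labels = [4, 4, 0, 0]
pairs = make_contrastive_pairs(texts, labels)
# C(4, 2) = 6 pairs: two positive (one per class), four negative across classes
```

The point of this construction is that even a handful of labeled sentences yields a quadratic number of training pairs, which is what makes the few-shot fine-tuning effective.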
+ 
+ ## Model Details
+ 
+ ### Model Description
+ - **Model Type:** SetFit
+ - **Sentence Transformer body:** [sentence-transformers/all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)
+ - **Classification head:** a [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) instance
+ - **Maximum Sequence Length:** 384 tokens
+ - **Number of Classes:** 5 classes
+ <!-- - **Training Dataset:** [Unknown](https://huggingface.co/datasets/unknown) -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+ 
+ ### Model Sources
+ 
+ - **Repository:** [SetFit on GitHub](https://github.com/huggingface/setfit)
+ - **Paper:** [Efficient Few-Shot Learning Without Prompts](https://arxiv.org/abs/2209.11055)
+ - **Blogpost:** [SetFit: Efficient Few-Shot Learning Without Prompts](https://huggingface.co/blog/setfit)
+ 
+ ### Model Labels
+ | Label | Examples |
+ |:------|:---------|
+ | 2 | <ul><li>"Hi Kelly,\r\n\r\nOur company has just moved over from Quickbooks Online as it was so limited, and I'm a little disappointed to see how limited Xero is with this feature.\r\n\r\nWe work with retailers such as Home Hardware and Ace Hardware, which have a vast amount of stores, but are billed centrally through their head office. Not being able to add different delivery addresses or even manually change address every time (as we had to do with QuickBooks) is slowing down our order processing process and is essential for us.\r\n\r\nPlease update us as soon as possible."</li><li>'We also deal with multiple contacts within a single company so we need to be able to send quotes to different people, invoices go again to different contacts and then there are multiple delivery addresses/contacts as well.\r\n\r\nFor us it is important to be able to manage multiple contacts for different functions within the same company.'</li><li>'Hi - can you tell me if this is possible in WorkFlow Max also? Thanks'</li></ul> |
+ | 4 | <ul><li>'Great Idea'</li><li>'So grateful this was implemented, works perfectly'</li><li>'Outstanding feature, exactly what the community asked for'</li></ul> |
+ | 0 | <ul><li>'This is a vital feature. Please add it urgently. I have several large companies with multiple addresses. A nightmare.'</li><li>'WAY TO GO XERO......\r\n\r\nInstead of fixing this issue, you have changed the way that you update client details. Instead of amending client postal address, physical address and phone number etc on the one screen, lets put in a process that takes three times as long as you have to go into three different screens to update this information.\r\n\r\nWhy do your programmers insist on creating more problems instead of fixing that ones that we have been asking for for over 10 years......\r\n\r\nSeriously, where is the thought process here??'</li><li>'Sorry it\'s all platitudes as we\'ve been waiting about 5 years for one simple idea to be implemented (but we need about 15 things - including this issue to be actually dealt with rather than just commented on that they "are looking at it").\r\n\r\nPlease do not waste your time and energy on Xero - unless you have a very simple business with single customers and simple sales, Xero is not up to it in any form.\r\n\r\nWe have started moving all 3 of our businesses over (all over £1m t/o) to QB as they offer the right balance. Yes there\'s a few niggles - but not basic ones like Xero.\r\n\r\nIf they stopped creating such stupid ads and stopped being Aus/NZ centric in their development processes, maybe they\'d take their customers in Europe a bit more seriously.\r\n\r\nBye bye Xero I won\'t miss you.'</li></ul> |
+ | 1 | <ul><li>"It's very basic and much needed features!"</li><li>"Need to be able to select from the different email addresses in contacts for sending quotes and invoices without them being ticked as 'include this email with emails'. Also, the ability to edit email addresses in quotes has disappeared.\r\nCurrently, the whole email needs to be deleted and re-typed. The old format used to allow you to simply edit the email address.\r\nie, the email address is Name@southerncrosscarecommercialgroup.com then you could simply edit the Name. We have clients with 100s of staff with high rotation therefore unable to add all emails as a contact."</li><li>'As a work around we set up our client with fourteen different delivery addresses for the fourteen different branches that we supply - they are now asking us to send them one statement. Really, really need this to happen ... '</li></ul> |
+ | 3 | <ul><li>"Much appreciated update. It makes the statement run much cleaner now that we don't have split accounts."</li><li>'Appreciate the transparency on the roadmap'</li><li>'This is helpful. We can finally differentiate between the billing address and the site location on the quote.'</li></ul> |
+ 
+ ## Evaluation
+ 
+ ### Metrics
+ | Label | Accuracy |
+ |:--------|:---------|
+ | **all** | 0.6132 |
+ 
+ ## Uses
+ 
+ ### Direct Use for Inference
+ 
+ First install the SetFit library:
+ 
+ ```bash
+ pip install setfit
+ ```
+ 
+ Then you can load this model and run inference.
+ 
+ ```python
+ from setfit import SetFitModel
+ 
+ # Download from the 🤗 Hub
+ model = SetFitModel.from_pretrained("setfit_model_id")
+ # Run inference
+ preds = model("An idea that has 1100 votes and was posted 2 years ago. it takes this long to implement this?")
+ ```
+ 
+ <!--
+ ### Downstream Use
+ 
+ *List how someone could finetune this model on their own dataset.*
+ -->
+ 
+ <!--
+ ### Out-of-Scope Use
+ 
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+ 
+ <!--
+ ## Bias, Risks and Limitations
+ 
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+ 
+ <!--
+ ### Recommendations
+ 
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+ 
+ ## Training Details
+ 
+ ### Training Set Metrics
+ | Training set | Min | Median | Max |
+ |:-------------|:----|:-------|:----|
+ | Word count | 2 | 48.3 | 687 |
+ 
+ | Label | Training Sample Count |
+ |:------|:----------------------|
+ | 0 | 76 |
+ | 1 | 143 |
+ | 2 | 113 |
+ | 3 | 46 |
+ | 4 | 42 |
+ 
+ ### Training Hyperparameters
+ - batch_size: (16, 16)
+ - num_epochs: (1, 1)
+ - max_steps: -1
+ - sampling_strategy: oversampling
+ - num_iterations: 20
+ - body_learning_rate: (2e-05, 1e-05)
+ - head_learning_rate: 0.01
+ - loss: CosineSimilarityLoss
+ - distance_metric: cosine_distance
+ - margin: 0.25
+ - end_to_end: False
+ - use_amp: False
+ - warmup_proportion: 0.1
+ - l2_weight: 0.01
+ - seed: 42
+ - eval_max_steps: -1
+ - load_best_model_at_end: True
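`loss: CosineSimilarityLoss` in the settings above means the embedding body is optimized so that the cosine similarity of each sentence pair matches its 0/1 pair target, under a mean-squared-error criterion. A minimal numpy version of that objective (illustrative; the real loss in sentence-transformers operates on torch tensors):

```python
import numpy as np

def cosine_similarity_loss(emb_a, emb_b, targets):
    """MSE between pairwise cosine similarity and the pair targets (0 or 1)."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    cos = (a * b).sum(axis=1)          # per-pair cosine similarity
    return float(np.mean((cos - targets) ** 2))

# Identical vectors with target 1.0 and orthogonal vectors with target 0.0
# are both "correct", so the loss is zero:
a = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([[1.0, 0.0], [1.0, 0.0]])
t = np.array([1.0, 0.0])
print(cosine_similarity_loss(a, b, t))  # 0.0
```

Driving this loss to zero pushes same-label pairs toward cosine similarity 1 and different-label pairs toward 0, which is what separates the classes before the logistic-regression head is fit.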
+ 
+ ### Training Results
+ | Epoch | Step | Training Loss | Validation Loss |
+ |:------:|:----:|:-------------:|:---------------:|
+ | 0.0010 | 1 | 0.3746 | - |
+ | 0.0476 | 50 | 0.2653 | - |
+ | 0.0952 | 100 | 0.2246 | - |
+ | 0.1429 | 150 | 0.1862 | - |
+ | 0.1905 | 200 | 0.1329 | - |
+ | 0.2381 | 250 | 0.0926 | - |
+ | 0.2857 | 300 | 0.0906 | - |
+ | 0.3333 | 350 | 0.0912 | - |
+ | 0.3810 | 400 | 0.0622 | - |
+ | 0.4286 | 450 | 0.0448 | - |
+ | 0.4762 | 500 | 0.0252 | - |
+ | 0.5238 | 550 | 0.0166 | - |
+ | 0.5714 | 600 | 0.013 | - |
+ | 0.6190 | 650 | 0.0063 | - |
+ | 0.6667 | 700 | 0.0085 | - |
+ | 0.7143 | 750 | 0.0038 | - |
+ | 0.7619 | 800 | 0.0044 | - |
+ | 0.8095 | 850 | 0.003 | - |
+ | 0.8571 | 900 | 0.0046 | - |
+ | 0.9048 | 950 | 0.0018 | - |
+ | 0.9524 | 1000 | 0.0013 | - |
+ | 1.0 | 1050 | 0.0012 | 0.2680 |
+ 
+ ### Framework Versions
+ - Python: 3.11.9
+ - SetFit: 1.1.3
+ - Sentence Transformers: 5.2.0
+ - Transformers: 4.57.3
+ - PyTorch: 2.7.1+cu118
+ - Datasets: 4.4.2
+ - Tokenizers: 0.22.2
+ 
+ ## Citation
+ 
+ ### BibTeX
+ ```bibtex
+ @article{https://doi.org/10.48550/arxiv.2209.11055,
+     doi = {10.48550/ARXIV.2209.11055},
+     url = {https://arxiv.org/abs/2209.11055},
+     author = {Tunstall, Lewis and Reimers, Nils and Jo, Unso Eun Seo and Bates, Luke and Korat, Daniel and Wasserblat, Moshe and Pereg, Oren},
+     keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
+     title = {Efficient Few-Shot Learning Without Prompts},
+     publisher = {arXiv},
+     year = {2022},
+     copyright = {Creative Commons Attribution 4.0 International}
+ }
+ ```
+ 
+ <!--
+ ## Glossary
+ 
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+ 
+ <!--
+ ## Model Card Authors
+ 
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+ 
+ <!--
+ ## Model Card Contact
+ 
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,23 @@
+ {
+   "architectures": [
+     "MPNetModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "dtype": "float32",
+   "eos_token_id": 2,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 514,
+   "model_type": "mpnet",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 1,
+   "relative_attention_num_buckets": 32,
+   "transformers_version": "4.57.3",
+   "vocab_size": 30527
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "__version__": {
+     "sentence_transformers": "5.2.0",
+     "transformers": "4.57.3",
+     "pytorch": "2.7.1+cu118"
+   },
+   "model_type": "SentenceTransformer",
+   "prompts": {
+     "query": "",
+     "document": ""
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
config_setfit.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "labels": [
+     0,
+     1,
+     2,
+     3,
+     4
+   ],
+   "normalize_embeddings": false
+ }
confusion_matrix.png ADDED
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a8b47f7443c9a5404891b97f2a35ff5ef9f67b2523cda9ac07962b6cc2774399
+ size 437967672
model_head.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ad738b1706cf01b4bbddaa1db1170a89be0e425cb16a9e10b7bd2e1f27771424
+ size 31647
model_metadata.json ADDED
@@ -0,0 +1,17 @@
+ {
+   "labels": [
+     0,
+     1,
+     2,
+     3,
+     4
+   ],
+   "environment": {
+     "python": "3.11.9 (tags/v3.11.9:de54cf5, Apr 2 2024, 10:12:12) [MSC v.1938 64 bit (AMD64)]",
+     "scikit-learn": "1.8.0",
+     "setfit": "1.1.3",
+     "torch": "2.7.1+cu118"
+   },
+   "base_model": "sentence-transformers/all-mpnet-base-v2",
+   "serialization": "safetensors"
+ }
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
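These three modules run in order: the MPNet transformer yields token embeddings, `1_Pooling` averages them into one vector, and `2_Normalize` rescales that vector to unit L2 norm, so a dot product between two sentence embeddings equals their cosine similarity. A sketch of the final normalization stage (illustrative, not the library code):

```python
import numpy as np

def l2_normalize(embeddings: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit L2 norm; eps guards against zero vectors."""
    norms = np.maximum(np.linalg.norm(embeddings, axis=1, keepdims=True), eps)
    return embeddings / norms

vecs = np.array([[3.0, 4.0], [0.0, 2.0]])
unit = l2_normalize(vecs)
# rows become [[0.6, 0.8], [0.0, 1.0]], each with norm 1
```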
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 384,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,73 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "104": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "30526": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "cls_token": "<s>",
+   "do_lower_case": true,
+   "eos_token": "</s>",
+   "extra_special_tokens": {},
+   "mask_token": "<mask>",
+   "max_length": 128,
+   "model_max_length": 384,
+   "pad_to_multiple_of": null,
+   "pad_token": "<pad>",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "</s>",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "MPNetTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff