Upload folder using huggingface_hub
- .gitattributes +9 -27
- README.md +130 -32
- config.json +1 -5
- model.safetensors +2 -2
- pytorch_model.bin +3 -0
- special_tokens_map.json +1 -110
- tf_model.h5 +3 -0
- tokenizer_config.json +1 -966
.gitattributes
CHANGED
|
@@ -1,35 +1,17 @@
|
|
| 1 |
-
*.
|
| 2 |
-
*.
|
| 3 |
*.bin filter=lfs diff=lfs merge=lfs -text
|
| 4 |
-
*.bz2 filter=lfs diff=lfs merge=lfs -text
|
| 5 |
-
*.ckpt filter=lfs diff=lfs merge=lfs -text
|
| 6 |
-
*.ftz filter=lfs diff=lfs merge=lfs -text
|
| 7 |
-
*.gz filter=lfs diff=lfs merge=lfs -text
|
| 8 |
*.h5 filter=lfs diff=lfs merge=lfs -text
|
| 9 |
*.joblib filter=lfs diff=lfs merge=lfs -text
|
| 10 |
-
*.lfs.* filter=lfs diff=lfs merge=lfs -text
|
| 11 |
-
*.mlmodel filter=lfs diff=lfs merge=lfs -text
|
| 12 |
*.model filter=lfs diff=lfs merge=lfs -text
|
| 13 |
*.msgpack filter=lfs diff=lfs merge=lfs -text
|
| 14 |
-
*.npy filter=lfs diff=lfs merge=lfs -text
|
| 15 |
-
*.npz filter=lfs diff=lfs merge=lfs -text
|
| 16 |
-
*.onnx filter=lfs diff=lfs merge=lfs -text
|
| 17 |
-
*.ot filter=lfs diff=lfs merge=lfs -text
|
| 18 |
-
*.parquet filter=lfs diff=lfs merge=lfs -text
|
| 19 |
*.pb filter=lfs diff=lfs merge=lfs -text
|
| 20 |
-
*.pickle filter=lfs diff=lfs merge=lfs -text
|
| 21 |
-
*.pkl filter=lfs diff=lfs merge=lfs -text
|
| 22 |
*.pt filter=lfs diff=lfs merge=lfs -text
|
| 23 |
*.pth filter=lfs diff=lfs merge=lfs -text
|
| 24 |
-
|
| 25 |
-
*.safetensors filter=lfs diff=lfs merge=lfs -text
|
| 26 |
-
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
| 27 |
-
*.tar.* filter=lfs diff=lfs merge=lfs -text
|
| 28 |
-
*.tar filter=lfs diff=lfs merge=lfs -text
|
| 29 |
-
*.tflite filter=lfs diff=lfs merge=lfs -text
|
| 30 |
-
*.tgz filter=lfs diff=lfs merge=lfs -text
|
| 31 |
-
*.wasm filter=lfs diff=lfs merge=lfs -text
|
| 32 |
-
*.xz filter=lfs diff=lfs merge=lfs -text
|
| 33 |
-
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
-
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
-
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
|
|
|
| 1 |
+
*.bin.* filter=lfs diff=lfs merge=lfs -text
|
| 2 |
+
*.lfs.* filter=lfs diff=lfs merge=lfs -text
|
| 3 |
*.bin filter=lfs diff=lfs merge=lfs -text
|
| 4 |
*.h5 filter=lfs diff=lfs merge=lfs -text
|
| 5 |
+
*.tflite filter=lfs diff=lfs merge=lfs -text
|
| 6 |
+
*.tar.gz filter=lfs diff=lfs merge=lfs -text
|
| 7 |
+
*.ot filter=lfs diff=lfs merge=lfs -text
|
| 8 |
+
*.onnx filter=lfs diff=lfs merge=lfs -text
|
| 9 |
+
*.arrow filter=lfs diff=lfs merge=lfs -text
|
| 10 |
+
*.ftz filter=lfs diff=lfs merge=lfs -text
|
| 11 |
*.joblib filter=lfs diff=lfs merge=lfs -text
|
|
|
|
|
|
|
| 12 |
*.model filter=lfs diff=lfs merge=lfs -text
|
| 13 |
*.msgpack filter=lfs diff=lfs merge=lfs -text
|
| 14 |
*.pb filter=lfs diff=lfs merge=lfs -text
|
|
|
|
|
|
|
| 15 |
*.pt filter=lfs diff=lfs merge=lfs -text
|
| 16 |
*.pth filter=lfs diff=lfs merge=lfs -text
|
| 17 |
+
model.safetensors filter=lfs diff=lfs merge=lfs -text
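A rough way to see what the new pattern list covers is to test file names against it. The sketch below is an approximation: it uses Python's `fnmatchcase` on the base name only, which is not a full implementation of gitattributes matching, and the helper name `is_lfs_tracked` is hypothetical. It does show one visible consequence of the change: the generic `*.safetensors` rule is gone, so only a file named exactly `model.safetensors` matches.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns copied from the new side of the .gitattributes diff above.
NEW_LFS_PATTERNS = [
    "*.bin.*", "*.lfs.*", "*.bin", "*.h5", "*.tflite", "*.tar.gz",
    "*.ot", "*.onnx", "*.arrow", "*.ftz", "*.joblib", "*.model",
    "*.msgpack", "*.pb", "*.pt", "*.pth", "model.safetensors",
]


def is_lfs_tracked(path, patterns=NEW_LFS_PATTERNS):
    """Approximate check: match the file's base name against each pattern.

    Assumption: slash-free patterns apply to the base name, which is
    how these simple patterns behave; real gitattributes rules are richer.
    """
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


for candidate in ["pytorch_model.bin", "tf_model.h5", "model.safetensors",
                  "other.safetensors", "config.json"]:
    print(candidate, is_lfs_tracked(candidate))
```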
|
README.md
CHANGED
|
@@ -1,51 +1,149 @@
|
|
| 1 |
---
|
| 2 |
-
|
|
|
|
| 3 |
tags:
|
| 4 |
- summarization
|
| 5 |
-
|
| 6 |
model-index:
|
| 7 |
-
- name:
|
| 8 |
-
results:
|
| 9 |
---
|
| 10 |
|
| 11 |
-
|
| 12 |
-
should probably proofread and complete it, then remove this comment. -->
|
| 13 |
|
| 14 |
-
|
| 15 |
|
| 16 |
-
|
| 17 |
|
| 18 |
-
##
|
|
|
|
| 19 |
|
| 20 |
-
|
|
|
|
| 21 |
|
| 22 |
-
#
|
| 23 |
|
| 24 |
-
More information needed
|
| 25 |
|
| 26 |
-
#
|
|
|
|
| 27 |
|
| 28 |
-
|
|
|
|
|
|
|
| 29 |
|
| 30 |
-
#
|
| 31 |
|
| 32 |
-
#
|
|
|
|
|
|
|
|
|
|
| 33 |
|
| 34 |
-
|
| 35 |
-
|
| 36 |
-
|
| 37 |
-
|
| 38 |
-
-
|
| 39 |
-
-
|
| 40 |
-
|
| 41 |
-
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
|
| 45 |
|
| 46 |
-
### Framework versions
|
| 47 |
|
| 48 |
-
- Transformers 4.39.3
|
| 49 |
-
- Pytorch 2.2.2+cu121
|
| 50 |
-
- Datasets 2.18.0
|
| 51 |
-
- Tokenizers 0.15.2
|
|
|
|
| 1 |
---
|
| 2 |
+
language:
|
| 3 |
+
- en
|
| 4 |
tags:
|
| 5 |
- summarization
|
| 6 |
+
datasets:
|
| 7 |
+
- xsum
|
| 8 |
+
metrics:
|
| 9 |
+
- rouge
|
| 10 |
+
widget:
|
| 11 |
+
- text: National Commercial Bank (NCB), Saudi Arabia’s largest lender by assets, agreed
|
| 12 |
+
to buy rival Samba Financial Group for $15 billion in the biggest banking takeover
|
| 13 |
+
this year.NCB will pay 28.45 riyals ($7.58) for each Samba share, according to
|
| 14 |
+
a statement on Sunday, valuing it at about 55.7 billion riyals. NCB will offer
|
| 15 |
+
0.739 new shares for each Samba share, at the lower end of the 0.736-0.787 ratio
|
| 16 |
+
the banks set when they signed an initial framework agreement in June.The offer
|
| 17 |
+
is a 3.5% premium to Samba’s Oct. 8 closing price of 27.50 riyals and about 24%
|
| 18 |
+
higher than the level the shares traded at before the talks were made public.
|
| 19 |
+
Bloomberg News first reported the merger discussions.The new bank will have total
|
| 20 |
+
assets of more than $220 billion, creating the Gulf region’s third-largest lender.
|
| 21 |
+
The entity’s $46 billion market capitalization nearly matches that of Qatar National
|
| 22 |
+
Bank QPSC, which is still the Middle East’s biggest lender with about $268 billion
|
| 23 |
+
of assets.
|
| 24 |
model-index:
|
| 25 |
+
- name: human-centered-summarization/financial-summarization-pegasus
|
| 26 |
+
results:
|
| 27 |
+
- task:
|
| 28 |
+
type: summarization
|
| 29 |
+
name: Summarization
|
| 30 |
+
dataset:
|
| 31 |
+
name: xsum
|
| 32 |
+
type: xsum
|
| 33 |
+
config: default
|
| 34 |
+
split: test
|
| 35 |
+
metrics:
|
| 36 |
+
- type: rouge
|
| 37 |
+
value: 35.2055
|
| 38 |
+
name: ROUGE-1
|
| 39 |
+
verified: true
|
| 40 |
+
verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMTA5OTZkY2YxMDU1YzE3NGJlMmE1OTg1NjlmNzcxOTg4YzY2OThlOTlkNGFhMGFjZWY4YjdiMjU5NDdmMWYzNSIsInZlcnNpb24iOjF9.ufBRoV2JoX4UlEfAUOYq7F3tZougwngdpKlnaC37tYXJU3omsR5hTsWM69hSdYO-k0cKUbAWCAMzjmoGwIaPAw
|
| 41 |
+
- type: rouge
|
| 42 |
+
value: 16.5689
|
| 43 |
+
name: ROUGE-2
|
| 44 |
+
verified: true
|
| 45 |
+
verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOWQwMmM2NjJjNzM1N2Y3NjZmMmE5NzNlNjRjNjEwNzNhNjcyZTRiMGRlODY3NWUyMGQ0YzZmMGFhODYzOTRmOSIsInZlcnNpb24iOjF9.AZZkbaYBZG6rw6-QHYjRlSl-p0gBT2EtJxwjIP7QYH5XIQjeoiQsTnDPIq25dSMDbmQLSZnpHC104ZctX0f_Dg
|
| 46 |
+
- type: rouge
|
| 47 |
+
value: 30.1285
|
| 48 |
+
name: ROUGE-L
|
| 49 |
+
verified: true
|
| 50 |
+
verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOTRjYThlMTllZjI4MGFiMDZhZTVkYmRjMTNhZDUzNTQ0OWQyNDQxMmQ5ODJiMmJiNGI3OTAzYjhiMzc2MTI4NCIsInZlcnNpb24iOjF9.zTHd3F4ZlgS-azl-ZVjOckcTrtrJmDOGWVaC3qQsvvn2UW9TnseNkmo7KBc3DJU7_NmlxWZArl1BdSetED0NCg
|
| 51 |
+
- type: rouge
|
| 52 |
+
value: 30.1706
|
| 53 |
+
name: ROUGE-LSUM
|
| 54 |
+
verified: true
|
| 55 |
+
verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZGMzZGFjNzVkYWI0NTJkMmZjZDQ0YjhiYjIxN2VkNmJjMTgwZTk1NjFlOGU2NjNjM2VjYTNlYTBhNTQ5MGZkNSIsInZlcnNpb24iOjF9.xQ2LoI3PwlEiXo1OT2o4Pq9o2thYCd9lSCKCWlLmZdxI5GxdsjcASBKmHKopzUcwCGBPR7zF95MHSAPyszOODA
|
| 56 |
+
- type: loss
|
| 57 |
+
value: 2.7092134952545166
|
| 58 |
+
name: loss
|
| 59 |
+
verified: true
|
| 60 |
+
verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMzQzODE0NDc5YTYzYjJlMWU2YTVjOGRjN2JmYWVkOWNkNTRlMTZlOWIyN2NiODJkMDljMjI3YzZmYzM3N2JjYSIsInZlcnNpb24iOjF9.Vv_pdeFuRMoKK3cPr5P6n7D6_18ChJX-2qcT0y4is3XX3mS98fk3U1AYEuy9nBHOwYR3o0U8WBgQ-Ya_FqefBg
|
| 61 |
+
- type: gen_len
|
| 62 |
+
value: 15.1414
|
| 63 |
+
name: gen_len
|
| 64 |
+
verified: true
|
| 65 |
+
verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYjk5OTk3NWRiNjZlZmQzMmYwOTU2MmQwOWE1MDNlNTg3YWVkOTgwOTc2ZTQ0MTBiZjliOWMyZTYwMDI2MDUzYiIsInZlcnNpb24iOjF9.Zvj84JzIhM50rWTQ2GrEeOU7HrS8KsILH-8ApTcSWSI6kVnucY0MyW2ODxvRAa_zHeCygFW6Q13TFGrT5kLNAA
|
| 66 |
---
|
| 67 |
|
| 68 |
+
### PEGASUS for Financial Summarization
|
|
|
|
| 69 |
|
| 70 |
+
This model was fine-tuned on a novel financial news dataset consisting of 2K articles from [Bloomberg](https://www.bloomberg.com/europe) on topics such as stocks, markets, currencies, rates, and cryptocurrencies.
|
| 71 |
|
| 72 |
+
It is based on the [PEGASUS](https://huggingface.co/transformers/model_doc/pegasus.html) model, specifically the PEGASUS checkpoint fine-tuned on the Extreme Summarization (XSum) dataset: [google/pegasus-xsum](https://huggingface.co/google/pegasus-xsum). PEGASUS was originally proposed by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu in [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/pdf/1912.08777.pdf).
|
| 73 |
|
| 74 |
+
### How to use
|
| 75 |
+
The following snippet shows how to use this model for financial summarization in PyTorch.
|
| 76 |
|
| 77 |
+
```Python
|
| 78 |
+
from transformers import PegasusTokenizer, PegasusForConditionalGeneration, TFPegasusForConditionalGeneration
|
| 79 |
|
| 80 |
+
# Let's load the model and the tokenizer
|
| 81 |
+
model_name = "human-centered-summarization/financial-summarization-pegasus"
|
| 82 |
+
tokenizer = PegasusTokenizer.from_pretrained(model_name)
|
| 83 |
+
model = PegasusForConditionalGeneration.from_pretrained(model_name) # If you want to use the Tensorflow model
|
| 84 |
+
# just replace with TFPegasusForConditionalGeneration
|
| 85 |
|
|
|
|
| 86 |
|
| 87 |
+
# Some text to summarize here
|
| 88 |
+
text_to_summarize = "National Commercial Bank (NCB), Saudi Arabia’s largest lender by assets, agreed to buy rival Samba Financial Group for $15 billion in the biggest banking takeover this year.NCB will pay 28.45 riyals ($7.58) for each Samba share, according to a statement on Sunday, valuing it at about 55.7 billion riyals. NCB will offer 0.739 new shares for each Samba share, at the lower end of the 0.736-0.787 ratio the banks set when they signed an initial framework agreement in June.The offer is a 3.5% premium to Samba’s Oct. 8 closing price of 27.50 riyals and about 24% higher than the level the shares traded at before the talks were made public. Bloomberg News first reported the merger discussions.The new bank will have total assets of more than $220 billion, creating the Gulf region’s third-largest lender. The entity’s $46 billion market capitalization nearly matches that of Qatar National Bank QPSC, which is still the Middle East’s biggest lender with about $268 billion of assets."
|
| 89 |
|
| 90 |
+
# Tokenize our text
|
| 91 |
+
# To run the code in TensorFlow, pass return_tensors="tf" to the tokenizer instead of "pt"
|
| 92 |
+
input_ids = tokenizer(text_to_summarize, return_tensors="pt").input_ids
|
| 93 |
|
| 94 |
+
# Generate the output (Here, we use beam search but you can also use any other strategy you like)
|
| 95 |
+
output = model.generate(
|
| 96 |
+
input_ids,
|
| 97 |
+
max_length=32,
|
| 98 |
+
num_beams=5,
|
| 99 |
+
early_stopping=True
|
| 100 |
+
)
|
| 101 |
|
| 102 |
+
# Finally, we can print the generated summary
|
| 103 |
+
print(tokenizer.decode(output[0], skip_special_tokens=True))
|
| 104 |
+
# Generated Output: Saudi bank to pay a 3.5% premium to Samba share price. Gulf region’s third-largest lender will have total assets of $220 billion
|
| 105 |
+
```
|
| 106 |
|
| 107 |
+
## Evaluation Results
|
| 108 |
+
The results before and after the fine-tuning on our dataset are shown below:
|
| 109 |
+
|
| 110 |
+
|
| 111 |
+
| Fine-tuning | R-1 | R-2 | R-L | R-S |
|
| 112 |
+
|:-----------:|:-----:|:-----:|:------:|:-----:|
|
| 113 |
+
| Yes | 23.55 | 6.99 | 18.14 | 21.36 |
|
| 114 |
+
| No | 13.8 | 2.4 | 10.63 | 12.03 |
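To make the size of the gain easier to read, the absolute and relative improvements can be derived from the table. This is a small arithmetic sketch using only the four score pairs reported above; the `improvement` helper and the dictionary layout are illustrative choices, not part of the original model card.

```python
# ROUGE scores copied from the evaluation table above
# ("before" = row "No", "after" = row "Yes").
scores = {
    "R-1": {"before": 13.8, "after": 23.55},
    "R-2": {"before": 2.4, "after": 6.99},
    "R-L": {"before": 10.63, "after": 18.14},
    "R-S": {"before": 12.03, "after": 21.36},
}


def improvement(before, after):
    """Return (absolute gain in points, relative gain in percent)."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return absolute, relative


gains = {metric: improvement(v["before"], v["after"]) for metric, v in scores.items()}

for metric, (absolute, relative) in gains.items():
    print(f"{metric}: +{absolute:.2f} points ({relative:.1f}% relative)")
```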
|
| 115 |
+
|
| 116 |
+
|
| 117 |
+
## Citation
|
| 118 |
+
|
| 119 |
+
You can find more details about this work in the following workshop paper. If you use our model in your research, please consider citing our paper:
|
| 120 |
+
|
| 121 |
+
> T. Passali, A. Gidiotis, E. Chatzikyriakidis and G. Tsoumakas. 2021.
|
| 122 |
+
> Towards Human-Centered Summarization: A Case Study on Financial News.
|
| 123 |
+
> In Proceedings of the First Workshop on Bridging Human-Computer Interaction and Natural Language Processing (pp. 21–27). Association for Computational Linguistics.
|
| 124 |
+
|
| 125 |
+
BibTeX entry:
|
| 126 |
+
|
| 127 |
+
```
|
| 128 |
+
@inproceedings{passali-etal-2021-towards,
|
| 129 |
+
title = "Towards Human-Centered Summarization: A Case Study on Financial News",
|
| 130 |
+
author = "Passali, Tatiana and Gidiotis, Alexios and Chatzikyriakidis, Efstathios and Tsoumakas, Grigorios",
|
| 131 |
+
booktitle = "Proceedings of the First Workshop on Bridging Human{--}Computer Interaction and Natural Language Processing",
|
| 132 |
+
month = apr,
|
| 133 |
+
year = "2021",
|
| 134 |
+
address = "Online",
|
| 135 |
+
publisher = "Association for Computational Linguistics",
|
| 136 |
+
url = "https://www.aclweb.org/anthology/2021.hcinlp-1.4",
|
| 137 |
+
pages = "21--27",
|
| 138 |
+
}
|
| 139 |
+
```
|
| 140 |
+
|
| 141 |
+
## Support
|
| 142 |
+
|
| 143 |
+
Contact us at [info@medoid.ai](mailto:info@medoid.ai) if you are interested in a more sophisticated version of the model, trained on more articles and adapted to your needs!
|
| 144 |
+
|
| 145 |
+
More information about Medoid AI:
|
| 146 |
+
- Website: [https://www.medoid.ai](https://www.medoid.ai)
|
| 147 |
+
- LinkedIn: [https://www.linkedin.com/company/medoid-ai/](https://www.linkedin.com/company/medoid-ai/)
|
| 148 |
|
|
|
|
| 149 |
|
config.json
CHANGED
|
@@ -1,5 +1,5 @@
|
|
| 1 |
{
|
| 2 |
-
"_name_or_path": "
|
| 3 |
"activation_dropout": 0.1,
|
| 4 |
"activation_function": "relu",
|
| 5 |
"add_bias_logits": false,
|
|
@@ -16,7 +16,6 @@
|
|
| 16 |
"decoder_ffn_dim": 4096,
|
| 17 |
"decoder_layerdrop": 0.0,
|
| 18 |
"decoder_layers": 16,
|
| 19 |
-
"decoder_start_token_id": 0,
|
| 20 |
"do_blenderbot_90_layernorm": false,
|
| 21 |
"dropout": 0.1,
|
| 22 |
"encoder_attention_heads": 16,
|
|
@@ -26,7 +25,6 @@
|
|
| 26 |
"eos_token_id": 1,
|
| 27 |
"extra_pos_embeddings": 1,
|
| 28 |
"force_bos_token_to_be_generated": false,
|
| 29 |
-
"forced_eos_token_id": 1,
|
| 30 |
"id2label": {
|
| 31 |
"0": "LABEL_0",
|
| 32 |
"1": "LABEL_1",
|
|
@@ -50,8 +48,6 @@
|
|
| 50 |
"pad_token_id": 0,
|
| 51 |
"scale_embedding": true,
|
| 52 |
"static_position_embeddings": true,
|
| 53 |
-
"torch_dtype": "float32",
|
| 54 |
-
"transformers_version": "4.39.3",
|
| 55 |
"use_cache": true,
|
| 56 |
"vocab_size": 96103
|
| 57 |
}
|
|
|
|
| 1 |
{
|
| 2 |
+
"_name_or_path": "google/pegasus-xsum",
|
| 3 |
"activation_dropout": 0.1,
|
| 4 |
"activation_function": "relu",
|
| 5 |
"add_bias_logits": false,
|
|
|
|
| 16 |
"decoder_ffn_dim": 4096,
|
| 17 |
"decoder_layerdrop": 0.0,
|
| 18 |
"decoder_layers": 16,
|
|
|
|
| 19 |
"do_blenderbot_90_layernorm": false,
|
| 20 |
"dropout": 0.1,
|
| 21 |
"encoder_attention_heads": 16,
|
|
|
|
| 25 |
"eos_token_id": 1,
|
| 26 |
"extra_pos_embeddings": 1,
|
| 27 |
"force_bos_token_to_be_generated": false,
|
|
|
|
| 28 |
"id2label": {
|
| 29 |
"0": "LABEL_0",
|
| 30 |
"1": "LABEL_1",
|
|
|
|
| 48 |
"pad_token_id": 0,
|
| 49 |
"scale_embedding": true,
|
| 50 |
"static_position_embeddings": true,
|
|
|
|
|
|
|
| 51 |
"use_cache": true,
|
| 52 |
"vocab_size": 96103
|
| 53 |
}
|
model.safetensors
CHANGED
|
@@ -1,3 +1,3 @@
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
-
oid sha256:
|
| 3 |
-
size
|
|
|
|
| 1 |
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:21a2edae4c836a34d445afbcd78c644d5f5e78653424d229c44021b580cf875c
|
| 3 |
+
size 2275264008
|
pytorch_model.bin
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:118a4679c18c62b4b16518941b19ca5debaf9e8676b299f4d62c8bd450fdcb30
|
| 3 |
+
size 2275419259
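The three added lines are a Git LFS pointer, not the weights themselves: the repository stores only the spec version, the SHA-256 object id, and the byte size. As a minimal sketch, assuming the simple `key value` layout shown above, such a pointer could be parsed as follows. The function name `parse_lfs_pointer` is hypothetical.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash, and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid is written as "<hash algorithm>:<hex digest>".
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer content copied from the pytorch_model.bin diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:118a4679c18c62b4b16518941b19ca5debaf9e8676b299f4d62c8bd450fdcb30
size 2275419259
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], f"{info['size'] / 1e9:.2f} GB")
```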
|
special_tokens_map.json
CHANGED
|
@@ -1,110 +1 @@
|
|
| 1 |
-
{
|
| 2 |
-
"additional_special_tokens": [
|
| 3 |
-
"<mask_1>",
|
| 4 |
-
"<unk_2>",
|
| 5 |
-
"<unk_3>",
|
| 6 |
-
"<unk_4>",
|
| 7 |
-
"<unk_5>",
|
| 8 |
-
"<unk_6>",
|
| 9 |
-
"<unk_7>",
|
| 10 |
-
"<unk_8>",
|
| 11 |
-
"<unk_9>",
|
| 12 |
-
"<unk_10>",
|
| 13 |
-
"<unk_11>",
|
| 14 |
-
"<unk_12>",
|
| 15 |
-
"<unk_13>",
|
| 16 |
-
"<unk_14>",
|
| 17 |
-
"<unk_15>",
|
| 18 |
-
"<unk_16>",
|
| 19 |
-
"<unk_17>",
|
| 20 |
-
"<unk_18>",
|
| 21 |
-
"<unk_19>",
|
| 22 |
-
"<unk_20>",
|
| 23 |
-
"<unk_21>",
|
| 24 |
-
"<unk_22>",
|
| 25 |
-
"<unk_23>",
|
| 26 |
-
"<unk_24>",
|
| 27 |
-
"<unk_25>",
|
| 28 |
-
"<unk_26>",
|
| 29 |
-
"<unk_27>",
|
| 30 |
-
"<unk_28>",
|
| 31 |
-
"<unk_29>",
|
| 32 |
-
"<unk_30>",
|
| 33 |
-
"<unk_31>",
|
| 34 |
-
"<unk_32>",
|
| 35 |
-
"<unk_33>",
|
| 36 |
-
"<unk_34>",
|
| 37 |
-
"<unk_35>",
|
| 38 |
-
"<unk_36>",
|
| 39 |
-
"<unk_37>",
|
| 40 |
-
"<unk_38>",
|
| 41 |
-
"<unk_39>",
|
| 42 |
-
"<unk_40>",
|
| 43 |
-
"<unk_41>",
|
| 44 |
-
"<unk_42>",
|
| 45 |
-
"<unk_43>",
|
| 46 |
-
"<unk_44>",
|
| 47 |
-
"<unk_45>",
|
| 48 |
-
"<unk_46>",
|
| 49 |
-
"<unk_47>",
|
| 50 |
-
"<unk_48>",
|
| 51 |
-
"<unk_49>",
|
| 52 |
-
"<unk_50>",
|
| 53 |
-
"<unk_51>",
|
| 54 |
-
"<unk_52>",
|
| 55 |
-
"<unk_53>",
|
| 56 |
-
"<unk_54>",
|
| 57 |
-
"<unk_55>",
|
| 58 |
-
"<unk_56>",
|
| 59 |
-
"<unk_57>",
|
| 60 |
-
"<unk_58>",
|
| 61 |
-
"<unk_59>",
|
| 62 |
-
"<unk_60>",
|
| 63 |
-
"<unk_61>",
|
| 64 |
-
"<unk_62>",
|
| 65 |
-
"<unk_63>",
|
| 66 |
-
"<unk_64>",
|
| 67 |
-
"<unk_65>",
|
| 68 |
-
"<unk_66>",
|
| 69 |
-
"<unk_67>",
|
| 70 |
-
"<unk_68>",
|
| 71 |
-
"<unk_69>",
|
| 72 |
-
"<unk_70>",
|
| 73 |
-
"<unk_71>",
|
| 74 |
-
"<unk_72>",
|
| 75 |
-
"<unk_73>",
|
| 76 |
-
"<unk_74>",
|
| 77 |
-
"<unk_75>",
|
| 78 |
-
"<unk_76>",
|
| 79 |
-
"<unk_77>",
|
| 80 |
-
"<unk_78>",
|
| 81 |
-
"<unk_79>",
|
| 82 |
-
"<unk_80>",
|
| 83 |
-
"<unk_81>",
|
| 84 |
-
"<unk_82>",
|
| 85 |
-
"<unk_83>",
|
| 86 |
-
"<unk_84>",
|
| 87 |
-
"<unk_85>",
|
| 88 |
-
"<unk_86>",
|
| 89 |
-
"<unk_87>",
|
| 90 |
-
"<unk_88>",
|
| 91 |
-
"<unk_89>",
|
| 92 |
-
"<unk_90>",
|
| 93 |
-
"<unk_91>",
|
| 94 |
-
"<unk_92>",
|
| 95 |
-
"<unk_93>",
|
| 96 |
-
"<unk_94>",
|
| 97 |
-
"<unk_95>",
|
| 98 |
-
"<unk_96>",
|
| 99 |
-
"<unk_97>",
|
| 100 |
-
"<unk_98>",
|
| 101 |
-
"<unk_99>",
|
| 102 |
-
"<unk_100>",
|
| 103 |
-
"<unk_101>",
|
| 104 |
-
"<unk_102>"
|
| 105 |
-
],
|
| 106 |
-
"eos_token": "</s>",
|
| 107 |
-
"mask_token": "<mask_2>",
|
| 108 |
-
"pad_token": "<pad>",
|
| 109 |
-
"unk_token": "<unk>"
|
| 110 |
-
}
|
|
|
|
| 1 |
+
{"eos_token": "</s>", "unk_token": "<unk>", "pad_token": "<pad>", "mask_token": "<mask_2>", "additional_special_tokens": ["<mask_1>", "<unk_2>", "<unk_3>", "<unk_4>", "<unk_5>", "<unk_6>", "<unk_7>", "<unk_8>", "<unk_9>", "<unk_10>", "<unk_11>", "<unk_12>", "<unk_13>", "<unk_14>", "<unk_15>", "<unk_16>", "<unk_17>", "<unk_18>", "<unk_19>", "<unk_20>", "<unk_21>", "<unk_22>", "<unk_23>", "<unk_24>", "<unk_25>", "<unk_26>", "<unk_27>", "<unk_28>", "<unk_29>", "<unk_30>", "<unk_31>", "<unk_32>", "<unk_33>", "<unk_34>", "<unk_35>", "<unk_36>", "<unk_37>", "<unk_38>", "<unk_39>", "<unk_40>", "<unk_41>", "<unk_42>", "<unk_43>", "<unk_44>", "<unk_45>", "<unk_46>", "<unk_47>", "<unk_48>", "<unk_49>", "<unk_50>", "<unk_51>", "<unk_52>", "<unk_53>", "<unk_54>", "<unk_55>", "<unk_56>", "<unk_57>", "<unk_58>", "<unk_59>", "<unk_60>", "<unk_61>", "<unk_62>", "<unk_63>", "<unk_64>", "<unk_65>", "<unk_66>", "<unk_67>", "<unk_68>", "<unk_69>", "<unk_70>", "<unk_71>", "<unk_72>", "<unk_73>", "<unk_74>", "<unk_75>", "<unk_76>", "<unk_77>", "<unk_78>", "<unk_79>", "<unk_80>", "<unk_81>", "<unk_82>", "<unk_83>", "<unk_84>", "<unk_85>", "<unk_86>", "<unk_87>", "<unk_88>", "<unk_89>", "<unk_90>", "<unk_91>", "<unk_92>", "<unk_93>", "<unk_94>", "<unk_95>", "<unk_96>", "<unk_97>", "<unk_98>", "<unk_99>", "<unk_100>", "<unk_101>", "<unk_102>"]}
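The change to this file appears to be purely a matter of layout: the 110-line pretty-printed JSON becomes a single line with the same tokens in a different key order. As a hedged illustration, the sketch below rebuilds the token list programmatically and serializes it compactly. It assumes the list is exactly `<mask_1>` followed by `<unk_2>` through `<unk_102>`, as shown in the diff.

```python
import json

# "<mask_1>" followed by "<unk_2>" ... "<unk_102>", as listed in the diff.
additional = ["<mask_1>"] + [f"<unk_{i}>" for i in range(2, 103)]

special_tokens_map = {
    "eos_token": "</s>",
    "unk_token": "<unk>",
    "pad_token": "<pad>",
    "mask_token": "<mask_2>",
    "additional_special_tokens": additional,
}

# json.dumps without an indent argument produces the single-line form.
compact = json.dumps(special_tokens_map)

print(len(additional), "additional special tokens")
print(compact[:60] + "...")
```

Parsing either layout with `json.loads` yields the same dictionary, so the reformatting alone should not affect how the tokenizer reads the file.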
|
tf_model.h5
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:c3c3f44f93dcb69dd15024cd8ff2819f42d77818bc71538c770dfae0f91519b7
|
| 3 |
+
size 2279696568
|
tokenizer_config.json
CHANGED
|
@@ -1,966 +1 @@
|
|
| 1 |
-
{
|
| 2 |
-
"added_tokens_decoder": {
|
| 3 |
-
"0": {
|
| 4 |
-
"content": "<pad>",
|
| 5 |
-
"lstrip": false,
|
| 6 |
-
"normalized": false,
|
| 7 |
-
"rstrip": false,
|
| 8 |
-
"single_word": false,
|
| 9 |
-
"special": true
|
| 10 |
-
},
|
| 11 |
-
"1": {
|
| 12 |
-
"content": "</s>",
|
| 13 |
-
"lstrip": false,
|
| 14 |
-
"normalized": false,
|
| 15 |
-
"rstrip": false,
|
| 16 |
-
"single_word": false,
|
| 17 |
-
"special": true
|
| 18 |
-
},
|
| 19 |
-
"2": {
|
| 20 |
-
"content": "<mask_1>",
|
| 21 |
-
"lstrip": false,
|
| 22 |
-
"normalized": false,
|
| 23 |
-
"rstrip": false,
|
| 24 |
-
"single_word": false,
|
| 25 |
-
"special": true
|
| 26 |
-
},
|
| 27 |
-
"3": {
|
| 28 |
-
"content": "<mask_2>",
|
| 29 |
-
"lstrip": false,
|
| 30 |
-
"normalized": false,
|
| 31 |
-
"rstrip": false,
|
| 32 |
-
"single_word": false,
|
| 33 |
-
"special": true
|
| 34 |
-
},
|
| 35 |
-
"4": {
|
| 36 |
-
"content": "<unk_2>",
|
| 37 |
-
"lstrip": false,
|
| 38 |
-
"normalized": false,
|
| 39 |
-
"rstrip": false,
|
| 40 |
-
"single_word": false,
|
| 41 |
-
"special": true
|
| 42 |
-
},
|
| 43 |
-
"5": {
|
| 44 |
-
"content": "<unk_3>",
|
| 45 |
-
"lstrip": false,
|
| 46 |
-
"normalized": false,
|
| 47 |
-
"rstrip": false,
|
| 48 |
-
"single_word": false,
|
| 49 |
-
"special": true
|
| 50 |
-
},
|
| 51 |
-
"6": {
|
| 52 |
-
"content": "<unk_4>",
|
| 53 |
-
"lstrip": false,
|
| 54 |
-
"normalized": false,
|
| 55 |
-
"rstrip": false,
|
| 56 |
-
"single_word": false,
|
| 57 |
-
"special": true
|
| 58 |
-
},
|
| 59 |
-
"7": {
|
| 60 |
-
"content": "<unk_5>",
|
| 61 |
-
"lstrip": false,
|
| 62 |
-
"normalized": false,
|
| 63 |
-
"rstrip": false,
|
| 64 |
-
"single_word": false,
|
| 65 |
-
"special": true
|
| 66 |
-
},
|
| 67 |
-
"8": {
|
| 68 |
-
"content": "<unk_6>",
|
| 69 |
-
"lstrip": false,
|
| 70 |
-
"normalized": false,
|
| 71 |
-
"rstrip": false,
|
| 72 |
-
"single_word": false,
|
| 73 |
-
"special": true
|
| 74 |
-
},
|
| 75 |
-
"9": {
|
| 76 |
-
"content": "<unk_7>",
|
| 77 |
-
"lstrip": false,
|
| 78 |
-
"normalized": false,
|
| 79 |
-
"rstrip": false,
|
| 80 |
-
"single_word": false,
|
| 81 |
-
"special": true
|
| 82 |
-
},
|
| 83 |
-
"10": {
|
| 84 |
-
"content": "<unk_8>",
|
| 85 |
-
"lstrip": false,
|
| 86 |
-
"normalized": false,
|
| 87 |
-
"rstrip": false,
|
| 88 |
-
"single_word": false,
|
| 89 |
-
"special": true
|
| 90 |
-
},
|
| 91 |
-
"11": {
|
| 92 |
-
"content": "<unk_9>",
|
| 93 |
-
"lstrip": false,
|
| 94 |
-
"normalized": false,
|
| 95 |
-
"rstrip": false,
|
| 96 |
-
"single_word": false,
|
| 97 |
-
"special": true
|
| 98 |
-
},
|
| 99 |
-
"12": {
|
| 100 |
-
"content": "<unk_10>",
|
| 101 |
-
"lstrip": false,
|
| 102 |
-
"normalized": false,
|
| 103 |
-
"rstrip": false,
|
| 104 |
-
"single_word": false,
|
| 105 |
-
"special": true
|
| 106 |
-
},
|
| 107 |
-
"13": {
|
| 108 |
-
"content": "<unk_11>",
|
| 109 |
-
"lstrip": false,
|
| 110 |
-
"normalized": false,
|
| 111 |
-
"rstrip": false,
|
| 112 |
-
"single_word": false,
|
| 113 |
-
"special": true
|
| 114 |
-
},
|
| 115 |
-
"14": {
|
| 116 |
-
"content": "<unk_12>",
|
| 117 |
-
"lstrip": false,
|
| 118 |
-
"normalized": false,
|
| 119 |
-
"rstrip": false,
|
| 120 |
-
"single_word": false,
|
| 121 |
-
"special": true
|
| 122 |
-
},
|
| 123 |
-
"15": {
|
| 124 |
-
"content": "<unk_13>",
|
| 125 |
-
"lstrip": false,
|
| 126 |
-
"normalized": false,
|
| 127 |
-
"rstrip": false,
|
| 128 |
-
"single_word": false,
|
| 129 |
-
"special": true
|
| 130 |
-
},
|
| 131 |
-
"16": {
|
| 132 |
-
"content": "<unk_14>",
|
| 133 |
-
"lstrip": false,
|
| 134 |
-
"normalized": false,
|
| 135 |
-
"rstrip": false,
|
| 136 |
-
"single_word": false,
|
| 137 |
-
"special": true
|
| 138 |
-
},
|
| 139 |
-
"17": {
|
| 140 |
-
"content": "<unk_15>",
|
| 141 |
-
"lstrip": false,
|
| 142 |
-
"normalized": false,
|
| 143 |
-
"rstrip": false,
|
| 144 |
-
"single_word": false,
|
| 145 |
-
"special": true
|
| 146 |
-
},
|
| 147 |
-
"18": {
|
| 148 |
-
"content": "<unk_16>",
|
| 149 |
-
"lstrip": false,
|
| 150 |
-
"normalized": false,
|
| 151 |
-
"rstrip": false,
|
| 152 |
-
"single_word": false,
|
| 153 |
-
"special": true
|
| 154 |
-
},
|
| 155 |
-
"19": {
|
| 156 |
-
"content": "<unk_17>",
|
| 157 |
-
"lstrip": false,
|
| 158 |
-
"normalized": false,
|
| 159 |
-
"rstrip": false,
|
| 160 |
-
"single_word": false,
|
| 161 |
-
"special": true
|
| 162 |
-
},
|
| 163 |
-
"20": {
|
| 164 |
-
"content": "<unk_18>",
|
| 165 |
-
"lstrip": false,
|
| 166 |
-
"normalized": false,
|
| 167 |
-
"rstrip": false,
|
| 168 |
-
"single_word": false,
|
| 169 |
-
"special": true
|
| 170 |
-
},
|
| 171 |
-
"21": {
|
| 172 |
-
"content": "<unk_19>",
|
| 173 |
-
"lstrip": false,
|
| 174 |
-
"normalized": false,
|
| 175 |
-
"rstrip": false,
|
| 176 |
-
"single_word": false,
|
| 177 |
-
"special": true
|
| 178 |
-
},
|
| 179 |
-
"22": {
|
| 180 |
-
"content": "<unk_20>",
|
| 181 |
-
"lstrip": false,
|
| 182 |
-
"normalized": false,
|
| 183 |
-
"rstrip": false,
|
| 184 |
-
"single_word": false,
|
| 185 |
-
"special": true
|
| 186 |
-
},
|
| 187 |
-
"23": {
|
| 188 |
-
"content": "<unk_21>",
|
| 189 |
-
"lstrip": false,
|
| 190 |
-
"normalized": false,
|
| 191 |
-
"rstrip": false,
|
| 192 |
-
"single_word": false,
|
| 193 |
-
"special": true
|
| 194 |
-
},
|
| 195 |
-
"24": {
|
| 196 |
-
"content": "<unk_22>",
|
| 197 |
-
"lstrip": false,
|
| 198 |
-
"normalized": false,
|
| 199 |
-
"rstrip": false,
|
| 200 |
-
"single_word": false,
|
| 201 |
-
"special": true
|
| 202 |
-
},
|
| 203 |
-
"25": {
|
| 204 |
-
"content": "<unk_23>",
|
| 205 |
-
"lstrip": false,
|
| 206 |
-
"normalized": false,
|
| 207 |
-
"rstrip": false,
|
| 208 |
-
"single_word": false,
|
| 209 |
-
"special": true
|
| 210 |
-
},
|
| 211 |
-
"26": {
|
| 212 |
-
"content": "<unk_24>",
|
| 213 |
-
"lstrip": false,
|
| 214 |
-
"normalized": false,
|
| 215 |
-
"rstrip": false,
|
| 216 |
-
"single_word": false,
|
| 217 |
-
"special": true
|
| 218 |
-
},
|
| 219 |
-
"27": {
|
| 220 |
-
"content": "<unk_25>",
|
| 221 |
-
"lstrip": false,
|
| 222 |
-
"normalized": false,
|
| 223 |
-
"rstrip": false,
|
| 224 |
-
"single_word": false,
|
| 225 |
-
"special": true
|
| 226 |
-
},
|
| 227 |
-
"28": {
|
| 228 |
-
"content": "<unk_26>",
|
| 229 |
-
"lstrip": false,
|
| 230 |
-
"normalized": false,
|
| 231 |
-
"rstrip": false,
|
| 232 |
-
"single_word": false,
|
| 233 |
-
"special": true
|
| 234 |
-
},
|
| 235 |
-
"29": {
|
| 236 |
-
"content": "<unk_27>",
|
| 237 |
-
"lstrip": false,
|
| 238 |
-
"normalized": false,
|
| 239 |
-
"rstrip": false,
|
| 240 |
-
"single_word": false,
|
| 241 |
-
"special": true
|
| 242 |
-
},
|
| 243 |
-
"30": {
|
| 244 |
-
"content": "<unk_28>",
|
| 245 |
-
"lstrip": false,
|
| 246 |
-
"normalized": false,
|
| 247 |
-
"rstrip": false,
|
| 248 |
-
"single_word": false,
|
| 249 |
-
"special": true
|
| 250 |
-
},
|
| 251 |
-
"31": {
|
| 252 |
-
"content": "<unk_29>",
|
| 253 |
-
"lstrip": false,
|
| 254 |
-
"normalized": false,
|
| 255 |
-
"rstrip": false,
|
| 256 |
-
"single_word": false,
|
| 257 |
-
"special": true
|
| 258 |
-
},
|
| 259 |
-
"32": {
|
| 260 |
-
"content": "<unk_30>",
|
| 261 |
-
"lstrip": false,
|
| 262 |
-
"normalized": false,
|
| 263 |
-
"rstrip": false,
|
| 264 |
-
"single_word": false,
|
| 265 |
-
"special": true
|
| 266 |
-
},
|
| 267 |
-
"33": {
|
| 268 |
-
"content": "<unk_31>",
|
| 269 |
-
"lstrip": false,
|
| 270 |
-
"normalized": false,
|
| 271 |
-
"rstrip": false,
|
| 272 |
-
"single_word": false,
|
| 273 |
-
"special": true
|
| 274 |
-
},
|
| 275 |
-
"34": {
|
| 276 |
-
"content": "<unk_32>",
|
| 277 |
-
"lstrip": false,
|
| 278 |
-
"normalized": false,
|
| 279 |
-
"rstrip": false,
|
| 280 |
-
"single_word": false,
|
| 281 |
-
"special": true
|
| 282 |
-
},
|
| 283 |
-
"35": {
|
| 284 |
-
"content": "<unk_33>",
|
| 285 |
-
"lstrip": false,
|
| 286 |
-
"normalized": false,
|
| 287 |
-
"rstrip": false,
|
| 288 |
-
"single_word": false,
|
| 289 |
-
"special": true
|
| 290 |
-
},
|
| 291 |
-
"36": {
|
| 292 |
-
"content": "<unk_34>",
|
| 293 |
-
"lstrip": false,
|
| 294 |
-
"normalized": false,
|
| 295 |
-
"rstrip": false,
|
| 296 |
-
"single_word": false,
|
| 297 |
-
"special": true
|
| 298 |
-
},
|
| 299 |
-
"37": {
|
| 300 |
-
"content": "<unk_35>",
|
| 301 |
-
"lstrip": false,
|
| 302 |
-
"normalized": false,
|
| 303 |
-
"rstrip": false,
|
| 304 |
-
"single_word": false,
|
| 305 |
-
"special": true
|
| 306 |
-
},
|
| 307 |
-
"38": {
|
| 308 |
-
"content": "<unk_36>",
|
| 309 |
-
"lstrip": false,
|
| 310 |
-
"normalized": false,
|
| 311 |
-
"rstrip": false,
|
| 312 |
-
"single_word": false,
|
| 313 |
-
"special": true
|
| 314 |
-
},
|
| 315 |
-
"39": {
|
| 316 |
-
"content": "<unk_37>",
|
| 317 |
-
"lstrip": false,
|
| 318 |
-
"normalized": false,
|
| 319 |
-
"rstrip": false,
|
| 320 |
-
"single_word": false,
|
| 321 |
-
"special": true
|
| 322 |
-
},
|
| 323 |
-
"40": {
|
| 324 |
-
"content": "<unk_38>",
|
| 325 |
-
"lstrip": false,
|
| 326 |
-
"normalized": false,
|
| 327 |
-
"rstrip": false,
|
| 328 |
-
"single_word": false,
|
| 329 |
-
"special": true
|
| 330 |
-
},
|
| 331 |
-
"41": {
|
| 332 |
-
"content": "<unk_39>",
|
| 333 |
-
"lstrip": false,
|
| 334 |
-
"normalized": false,
|
| 335 |
-
"rstrip": false,
|
| 336 |
-
"single_word": false,
|
| 337 |
-
"special": true
|
| 338 |
-
},
|
| 339 |
-
"42": {
|
| 340 |
-
"content": "<unk_40>",
|
| 341 |
-
"lstrip": false,
|
| 342 |
-
"normalized": false,
|
| 343 |
-
"rstrip": false,
|
| 344 |
-
"single_word": false,
|
| 345 |
-
"special": true
|
| 346 |
-
},
|
| 347 |
-
"43": {
|
| 348 |
-
"content": "<unk_41>",
|
| 349 |
-
"lstrip": false,
|
| 350 |
-
"normalized": false,
|
| 351 |
-
"rstrip": false,
|
| 352 |
-
"single_word": false,
|
| 353 |
-
"special": true
|
| 354 |
-
},
|
| 355 |
-
"44": {
|
| 356 |
-
"content": "<unk_42>",
|
| 357 |
-
"lstrip": false,
|
| 358 |
-
"normalized": false,
|
| 359 |
-
"rstrip": false,
|
| 360 |
-
"single_word": false,
|
| 361 |
-
"special": true
|
| 362 |
-
},
|
| 363 |
-
"45": {
|
| 364 |
-
"content": "<unk_43>",
|
| 365 |
-
"lstrip": false,
|
| 366 |
-
"normalized": false,
|
| 367 |
-
"rstrip": false,
|
| 368 |
-
"single_word": false,
|
| 369 |
-
"special": true
|
| 370 |
-
},
|
| 371 |
-
"46": {
|
| 372 |
-
"content": "<unk_44>",
|
| 373 |
-
"lstrip": false,
|
| 374 |
-
"normalized": false,
|
| 375 |
-
"rstrip": false,
|
| 376 |
-
"single_word": false,
|
| 377 |
-
"special": true
|
| 378 |
-
},
|
| 379 |
-
"47": {
|
| 380 |
-
"content": "<unk_45>",
|
| 381 |
-
"lstrip": false,
|
| 382 |
-
"normalized": false,
|
| 383 |
-
"rstrip": false,
|
| 384 |
-
"single_word": false,
|
| 385 |
-
"special": true
|
| 386 |
-
},
|
| 387 |
-
"48": {
|
| 388 |
-
"content": "<unk_46>",
|
| 389 |
-
"lstrip": false,
|
| 390 |
-
"normalized": false,
|
| 391 |
-
"rstrip": false,
|
| 392 |
-
"single_word": false,
|
| 393 |
-
"special": true
|
| 394 |
-
},
|
| 395 |
-
"49": {
|
| 396 |
-
"content": "<unk_47>",
|
| 397 |
-
"lstrip": false,
|
| 398 |
-
"normalized": false,
|
| 399 |
-
"rstrip": false,
|
| 400 |
-
"single_word": false,
|
| 401 |
-
"special": true
|
| 402 |
-
},
|
| 403 |
-
"50": {
|
| 404 |
-
"content": "<unk_48>",
|
| 405 |
-
"lstrip": false,
|
| 406 |
-
"normalized": false,
|
| 407 |
-
"rstrip": false,
|
| 408 |
-
"single_word": false,
|
| 409 |
-
"special": true
|
| 410 |
-
},
|
| 411 |
-
"51": {
|
| 412 |
-
"content": "<unk_49>",
|
| 413 |
-
"lstrip": false,
|
| 414 |
-
"normalized": false,
|
| 415 |
-
"rstrip": false,
|
| 416 |
-
"single_word": false,
|
| 417 |
-
"special": true
|
| 418 |
-
},
|
| 419 |
-
"52": {
|
| 420 |
-
"content": "<unk_50>",
|
| 421 |
-
"lstrip": false,
|
| 422 |
-
"normalized": false,
|
| 423 |
-
"rstrip": false,
|
| 424 |
-
"single_word": false,
|
| 425 |
-
"special": true
|
| 426 |
-
},
|
| 427 |
-
"53": {
|
| 428 |
-
"content": "<unk_51>",
|
| 429 |
-
"lstrip": false,
|
| 430 |
-
"normalized": false,
|
| 431 |
-
"rstrip": false,
|
| 432 |
-
"single_word": false,
|
| 433 |
-
"special": true
|
| 434 |
-
},
|
| 435 |
-
"54": {
|
| 436 |
-
"content": "<unk_52>",
|
| 437 |
-
"lstrip": false,
|
| 438 |
-
"normalized": false,
|
| 439 |
-
"rstrip": false,
|
| 440 |
-
"single_word": false,
|
| 441 |
-
"special": true
|
| 442 |
-
},
|
| 443 |
-
"55": {
|
| 444 |
-
"content": "<unk_53>",
|
| 445 |
-
"lstrip": false,
|
| 446 |
-
"normalized": false,
|
| 447 |
-
"rstrip": false,
|
| 448 |
-
"single_word": false,
|
| 449 |
-
"special": true
|
| 450 |
-
},
|
| 451 |
-
"56": {
|
| 452 |
-
"content": "<unk_54>",
|
| 453 |
-
"lstrip": false,
|
| 454 |
-
"normalized": false,
|
| 455 |
-
"rstrip": false,
|
| 456 |
-
"single_word": false,
|
| 457 |
-
"special": true
|
| 458 |
-
},
|
| 459 |
-
"57": {
|
| 460 |
-
"content": "<unk_55>",
|
| 461 |
-
"lstrip": false,
|
| 462 |
-
"normalized": false,
|
| 463 |
-
"rstrip": false,
|
| 464 |
-
"single_word": false,
|
| 465 |
-
"special": true
|
| 466 |
-
},
|
| 467 |
-
"58": {
|
| 468 |
-
"content": "<unk_56>",
|
| 469 |
-
"lstrip": false,
|
| 470 |
-
"normalized": false,
|
| 471 |
-
"rstrip": false,
|
| 472 |
-
"single_word": false,
|
| 473 |
-
"special": true
|
| 474 |
-
},
|
| 475 |
-
"59": {
|
| 476 |
-
"content": "<unk_57>",
|
| 477 |
-
"lstrip": false,
|
| 478 |
-
"normalized": false,
|
| 479 |
-
"rstrip": false,
|
| 480 |
-
"single_word": false,
|
| 481 |
-
"special": true
|
| 482 |
-
},
|
| 483 |
-
"60": {
|
| 484 |
-
"content": "<unk_58>",
|
| 485 |
-
"lstrip": false,
|
| 486 |
-
"normalized": false,
|
| 487 |
-
"rstrip": false,
|
| 488 |
-
"single_word": false,
|
| 489 |
-
"special": true
|
| 490 |
-
},
|
| 491 |
-
"61": {
|
| 492 |
-
"content": "<unk_59>",
|
| 493 |
-
"lstrip": false,
|
| 494 |
-
"normalized": false,
|
| 495 |
-
"rstrip": false,
|
| 496 |
-
"single_word": false,
|
| 497 |
-
"special": true
|
| 498 |
-
},
|
| 499 |
-
"62": {
|
| 500 |
-
"content": "<unk_60>",
|
| 501 |
-
"lstrip": false,
|
| 502 |
-
"normalized": false,
|
| 503 |
-
"rstrip": false,
|
| 504 |
-
"single_word": false,
|
| 505 |
-
"special": true
|
| 506 |
-
},
|
| 507 |
-
"63": {
|
| 508 |
-
"content": "<unk_61>",
|
| 509 |
-
"lstrip": false,
|
| 510 |
-
"normalized": false,
|
| 511 |
-
"rstrip": false,
|
| 512 |
-
"single_word": false,
|
| 513 |
-
"special": true
|
| 514 |
-
},
|
| 515 |
-
"64": {
|
| 516 |
-
"content": "<unk_62>",
|
| 517 |
-
"lstrip": false,
|
| 518 |
-
"normalized": false,
|
| 519 |
-
"rstrip": false,
|
| 520 |
-
"single_word": false,
|
| 521 |
-
"special": true
|
| 522 |
-
},
|
| 523 |
-
"65": {
|
| 524 |
-
"content": "<unk_63>",
|
| 525 |
-
"lstrip": false,
|
| 526 |
-
"normalized": false,
|
| 527 |
-
"rstrip": false,
|
| 528 |
-
"single_word": false,
|
| 529 |
-
"special": true
|
| 530 |
-
},
|
| 531 |
-
"66": {
|
| 532 |
-
"content": "<unk_64>",
|
| 533 |
-
"lstrip": false,
|
| 534 |
-
"normalized": false,
|
| 535 |
-
"rstrip": false,
|
| 536 |
-
"single_word": false,
|
| 537 |
-
"special": true
|
| 538 |
-
},
|
| 539 |
-
"67": {
|
| 540 |
-
"content": "<unk_65>",
|
| 541 |
-
"lstrip": false,
|
| 542 |
-
"normalized": false,
|
| 543 |
-
"rstrip": false,
|
| 544 |
-
"single_word": false,
|
| 545 |
-
"special": true
|
| 546 |
-
},
|
| 547 |
-
"68": {
|
| 548 |
-
"content": "<unk_66>",
|
| 549 |
-
"lstrip": false,
|
| 550 |
-
"normalized": false,
|
| 551 |
-
"rstrip": false,
|
| 552 |
-
"single_word": false,
|
| 553 |
-
"special": true
|
| 554 |
-
},
|
| 555 |
-
"69": {
|
| 556 |
-
"content": "<unk_67>",
|
| 557 |
-
"lstrip": false,
|
| 558 |
-
"normalized": false,
|
| 559 |
-
"rstrip": false,
|
| 560 |
-
"single_word": false,
|
| 561 |
-
"special": true
|
| 562 |
-
},
|
| 563 |
-
"70": {
|
| 564 |
-
"content": "<unk_68>",
|
| 565 |
-
"lstrip": false,
|
| 566 |
-
"normalized": false,
|
| 567 |
-
"rstrip": false,
|
| 568 |
-
"single_word": false,
|
| 569 |
-
"special": true
|
| 570 |
-
},
|
| 571 |
-
"71": {
|
| 572 |
-
"content": "<unk_69>",
|
| 573 |
-
"lstrip": false,
|
| 574 |
-
"normalized": false,
|
| 575 |
-
"rstrip": false,
|
| 576 |
-
"single_word": false,
|
| 577 |
-
"special": true
|
| 578 |
-
},
|
| 579 |
-
"72": {
|
| 580 |
-
"content": "<unk_70>",
|
| 581 |
-
"lstrip": false,
|
| 582 |
-
"normalized": false,
|
| 583 |
-
"rstrip": false,
|
| 584 |
-
"single_word": false,
|
| 585 |
-
"special": true
|
| 586 |
-
},
|
| 587 |
-
"73": {
|
| 588 |
-
"content": "<unk_71>",
|
| 589 |
-
"lstrip": false,
|
| 590 |
-
"normalized": false,
|
| 591 |
-
"rstrip": false,
|
| 592 |
-
"single_word": false,
|
| 593 |
-
"special": true
|
| 594 |
-
},
|
| 595 |
-
"74": {
|
| 596 |
-
"content": "<unk_72>",
|
| 597 |
-
"lstrip": false,
|
| 598 |
-
"normalized": false,
|
| 599 |
-
"rstrip": false,
|
| 600 |
-
"single_word": false,
|
| 601 |
-
"special": true
|
| 602 |
-
},
|
| 603 |
-
"75": {
|
| 604 |
-
"content": "<unk_73>",
|
| 605 |
-
"lstrip": false,
|
| 606 |
-
"normalized": false,
|
| 607 |
-
"rstrip": false,
|
| 608 |
-
"single_word": false,
|
| 609 |
-
"special": true
|
| 610 |
-
},
|
| 611 |
-
"76": {
|
| 612 |
-
"content": "<unk_74>",
|
| 613 |
-
"lstrip": false,
|
| 614 |
-
"normalized": false,
|
| 615 |
-
"rstrip": false,
|
| 616 |
-
"single_word": false,
|
| 617 |
-
"special": true
|
| 618 |
-
},
|
| 619 |
-
"77": {
|
| 620 |
-
"content": "<unk_75>",
|
| 621 |
-
"lstrip": false,
|
| 622 |
-
"normalized": false,
|
| 623 |
-
"rstrip": false,
|
| 624 |
-
"single_word": false,
|
| 625 |
-
"special": true
|
| 626 |
-
},
|
| 627 |
-
"78": {
|
| 628 |
-
"content": "<unk_76>",
|
| 629 |
-
"lstrip": false,
|
| 630 |
-
"normalized": false,
|
| 631 |
-
"rstrip": false,
|
| 632 |
-
"single_word": false,
|
| 633 |
-
"special": true
|
| 634 |
-
},
|
| 635 |
-
"79": {
|
| 636 |
-
"content": "<unk_77>",
|
| 637 |
-
"lstrip": false,
|
| 638 |
-
"normalized": false,
|
| 639 |
-
"rstrip": false,
|
| 640 |
-
"single_word": false,
|
| 641 |
-
"special": true
|
| 642 |
-
},
|
| 643 |
-
"80": {
|
| 644 |
-
"content": "<unk_78>",
|
| 645 |
-
"lstrip": false,
|
| 646 |
-
"normalized": false,
|
| 647 |
-
"rstrip": false,
|
| 648 |
-
"single_word": false,
|
| 649 |
-
"special": true
|
| 650 |
-
},
|
| 651 |
-
"81": {
|
| 652 |
-
"content": "<unk_79>",
|
| 653 |
-
"lstrip": false,
|
| 654 |
-
"normalized": false,
|
| 655 |
-
"rstrip": false,
|
| 656 |
-
"single_word": false,
|
| 657 |
-
"special": true
|
| 658 |
-
},
|
| 659 |
-
"82": {
|
| 660 |
-
"content": "<unk_80>",
|
| 661 |
-
"lstrip": false,
|
| 662 |
-
"normalized": false,
|
| 663 |
-
"rstrip": false,
|
| 664 |
-
"single_word": false,
|
| 665 |
-
"special": true
|
| 666 |
-
},
|
| 667 |
-
"83": {
|
| 668 |
-
"content": "<unk_81>",
|
| 669 |
-
"lstrip": false,
|
| 670 |
-
"normalized": false,
|
| 671 |
-
"rstrip": false,
|
| 672 |
-
"single_word": false,
|
| 673 |
-
"special": true
|
| 674 |
-
},
|
| 675 |
-
"84": {
|
| 676 |
-
"content": "<unk_82>",
|
| 677 |
-
"lstrip": false,
|
| 678 |
-
"normalized": false,
|
| 679 |
-
"rstrip": false,
|
| 680 |
-
"single_word": false,
|
| 681 |
-
"special": true
|
| 682 |
-
},
|
| 683 |
-
"85": {
|
| 684 |
-
"content": "<unk_83>",
|
| 685 |
-
"lstrip": false,
|
| 686 |
-
"normalized": false,
|
| 687 |
-
"rstrip": false,
|
| 688 |
-
"single_word": false,
|
| 689 |
-
"special": true
|
| 690 |
-
},
|
| 691 |
-
"86": {
|
| 692 |
-
"content": "<unk_84>",
|
| 693 |
-
"lstrip": false,
|
| 694 |
-
"normalized": false,
|
| 695 |
-
"rstrip": false,
|
| 696 |
-
"single_word": false,
|
| 697 |
-
"special": true
|
| 698 |
-
},
|
| 699 |
-
"87": {
|
| 700 |
-
"content": "<unk_85>",
|
| 701 |
-
"lstrip": false,
|
| 702 |
-
"normalized": false,
|
| 703 |
-
"rstrip": false,
|
| 704 |
-
"single_word": false,
|
| 705 |
-
"special": true
|
| 706 |
-
},
|
| 707 |
-
"88": {
|
| 708 |
-
"content": "<unk_86>",
|
| 709 |
-
"lstrip": false,
|
| 710 |
-
"normalized": false,
|
| 711 |
-
"rstrip": false,
|
| 712 |
-
"single_word": false,
|
| 713 |
-
"special": true
|
| 714 |
-
},
|
| 715 |
-
"89": {
|
| 716 |
-
"content": "<unk_87>",
|
| 717 |
-
"lstrip": false,
|
| 718 |
-
"normalized": false,
|
| 719 |
-
"rstrip": false,
|
| 720 |
-
"single_word": false,
|
| 721 |
-
"special": true
|
| 722 |
-
},
|
| 723 |
-
"90": {
|
| 724 |
-
"content": "<unk_88>",
|
| 725 |
-
"lstrip": false,
|
| 726 |
-
"normalized": false,
|
| 727 |
-
"rstrip": false,
|
| 728 |
-
"single_word": false,
|
| 729 |
-
"special": true
|
| 730 |
-
},
|
| 731 |
-
"91": {
|
| 732 |
-
"content": "<unk_89>",
|
| 733 |
-
"lstrip": false,
|
| 734 |
-
"normalized": false,
|
| 735 |
-
"rstrip": false,
|
| 736 |
-
"single_word": false,
|
| 737 |
-
"special": true
|
| 738 |
-
},
|
| 739 |
-
"92": {
|
| 740 |
-
"content": "<unk_90>",
|
| 741 |
-
"lstrip": false,
|
| 742 |
-
"normalized": false,
|
| 743 |
-
"rstrip": false,
|
| 744 |
-
"single_word": false,
|
| 745 |
-
"special": true
|
| 746 |
-
},
|
| 747 |
-
"93": {
|
| 748 |
-
"content": "<unk_91>",
|
| 749 |
-
"lstrip": false,
|
| 750 |
-
"normalized": false,
|
| 751 |
-
"rstrip": false,
|
| 752 |
-
"single_word": false,
|
| 753 |
-
"special": true
|
| 754 |
-
},
|
| 755 |
-
"94": {
|
| 756 |
-
"content": "<unk_92>",
|
| 757 |
-
"lstrip": false,
|
| 758 |
-
"normalized": false,
|
| 759 |
-
"rstrip": false,
|
| 760 |
-
"single_word": false,
|
| 761 |
-
"special": true
|
| 762 |
-
},
|
| 763 |
-
"95": {
|
| 764 |
-
"content": "<unk_93>",
|
| 765 |
-
"lstrip": false,
|
| 766 |
-
"normalized": false,
|
| 767 |
-
"rstrip": false,
|
| 768 |
-
"single_word": false,
|
| 769 |
-
"special": true
|
| 770 |
-
},
|
| 771 |
-
"96": {
|
| 772 |
-
"content": "<unk_94>",
|
| 773 |
-
"lstrip": false,
|
| 774 |
-
"normalized": false,
|
| 775 |
-
"rstrip": false,
|
| 776 |
-
"single_word": false,
|
| 777 |
-
"special": true
|
| 778 |
-
},
|
| 779 |
-
"97": {
|
| 780 |
-
"content": "<unk_95>",
|
| 781 |
-
"lstrip": false,
|
| 782 |
-
"normalized": false,
|
| 783 |
-
"rstrip": false,
|
| 784 |
-
"single_word": false,
|
| 785 |
-
"special": true
|
| 786 |
-
},
|
| 787 |
-
"98": {
|
| 788 |
-
"content": "<unk_96>",
|
| 789 |
-
"lstrip": false,
|
| 790 |
-
"normalized": false,
|
| 791 |
-
"rstrip": false,
|
| 792 |
-
"single_word": false,
|
| 793 |
-
"special": true
|
| 794 |
-
},
|
| 795 |
-
"99": {
|
| 796 |
-
"content": "<unk_97>",
|
| 797 |
-
"lstrip": false,
|
| 798 |
-
"normalized": false,
|
| 799 |
-
"rstrip": false,
|
| 800 |
-
"single_word": false,
|
| 801 |
-
"special": true
|
| 802 |
-
},
|
| 803 |
-
"100": {
|
| 804 |
-
"content": "<unk_98>",
|
| 805 |
-
"lstrip": false,
|
| 806 |
-
"normalized": false,
|
| 807 |
-
"rstrip": false,
|
| 808 |
-
"single_word": false,
|
| 809 |
-
"special": true
|
| 810 |
-
},
|
| 811 |
-
"101": {
|
| 812 |
-
"content": "<unk_99>",
|
| 813 |
-
"lstrip": false,
|
| 814 |
-
"normalized": false,
|
| 815 |
-
"rstrip": false,
|
| 816 |
-
"single_word": false,
|
| 817 |
-
"special": true
|
| 818 |
-
},
|
| 819 |
-
"102": {
|
| 820 |
-
"content": "<unk_100>",
|
| 821 |
-
"lstrip": false,
|
| 822 |
-
"normalized": false,
|
| 823 |
-
"rstrip": false,
|
| 824 |
-
"single_word": false,
|
| 825 |
-
"special": true
|
| 826 |
-
},
|
| 827 |
-
"103": {
|
| 828 |
-
"content": "<unk_101>",
|
| 829 |
-
"lstrip": false,
|
| 830 |
-
"normalized": false,
|
| 831 |
-
"rstrip": false,
|
| 832 |
-
"single_word": false,
|
| 833 |
-
"special": true
|
| 834 |
-
},
|
| 835 |
-
"104": {
|
| 836 |
-
"content": "<unk_102>",
|
| 837 |
-
"lstrip": false,
|
| 838 |
-
"normalized": false,
|
| 839 |
-
"rstrip": false,
|
| 840 |
-
"single_word": false,
|
| 841 |
-
"special": true
|
| 842 |
-
},
|
| 843 |
-
"105": {
|
| 844 |
-
"content": "<unk>",
|
| 845 |
-
"lstrip": false,
|
| 846 |
-
"normalized": false,
|
| 847 |
-
"rstrip": false,
|
| 848 |
-
"single_word": false,
|
| 849 |
-
"special": true
|
| 850 |
-
}
|
| 851 |
-
},
|
| 852 |
-
"additional_special_tokens": [
|
| 853 |
-
"<mask_1>",
|
| 854 |
-
"<unk_2>",
|
| 855 |
-
"<unk_3>",
|
| 856 |
-
"<unk_4>",
|
| 857 |
-
"<unk_5>",
|
| 858 |
-
"<unk_6>",
|
| 859 |
-
"<unk_7>",
|
| 860 |
-
"<unk_8>",
|
| 861 |
-
"<unk_9>",
|
| 862 |
-
"<unk_10>",
|
| 863 |
-
"<unk_11>",
|
| 864 |
-
"<unk_12>",
|
| 865 |
-
"<unk_13>",
|
| 866 |
-
"<unk_14>",
|
| 867 |
-
"<unk_15>",
|
| 868 |
-
"<unk_16>",
|
| 869 |
-
"<unk_17>",
|
| 870 |
-
"<unk_18>",
|
| 871 |
-
"<unk_19>",
|
| 872 |
-
"<unk_20>",
|
| 873 |
-
"<unk_21>",
|
| 874 |
-
"<unk_22>",
|
| 875 |
-
"<unk_23>",
|
| 876 |
-
"<unk_24>",
|
| 877 |
-
"<unk_25>",
|
| 878 |
-
"<unk_26>",
|
| 879 |
-
"<unk_27>",
|
| 880 |
-
"<unk_28>",
|
| 881 |
-
"<unk_29>",
|
| 882 |
-
"<unk_30>",
|
| 883 |
-
"<unk_31>",
|
| 884 |
-
"<unk_32>",
|
| 885 |
-
"<unk_33>",
|
| 886 |
-
"<unk_34>",
|
| 887 |
-
"<unk_35>",
|
| 888 |
-
"<unk_36>",
|
| 889 |
-
"<unk_37>",
|
| 890 |
-
"<unk_38>",
|
| 891 |
-
"<unk_39>",
|
| 892 |
-
"<unk_40>",
|
| 893 |
-
"<unk_41>",
|
| 894 |
-
"<unk_42>",
|
| 895 |
-
"<unk_43>",
|
| 896 |
-
"<unk_44>",
|
| 897 |
-
"<unk_45>",
|
| 898 |
-
"<unk_46>",
|
| 899 |
-
"<unk_47>",
|
| 900 |
-
"<unk_48>",
|
| 901 |
-
"<unk_49>",
|
| 902 |
-
"<unk_50>",
|
| 903 |
-
"<unk_51>",
|
| 904 |
-
"<unk_52>",
|
| 905 |
-
"<unk_53>",
|
| 906 |
-
"<unk_54>",
|
| 907 |
-
"<unk_55>",
|
| 908 |
-
"<unk_56>",
|
| 909 |
-
"<unk_57>",
|
| 910 |
-
"<unk_58>",
|
| 911 |
-
"<unk_59>",
|
| 912 |
-
"<unk_60>",
|
| 913 |
-
"<unk_61>",
|
| 914 |
-
"<unk_62>",
|
| 915 |
-
"<unk_63>",
|
| 916 |
-
"<unk_64>",
|
| 917 |
-
"<unk_65>",
|
| 918 |
-
"<unk_66>",
|
| 919 |
-
"<unk_67>",
|
| 920 |
-
"<unk_68>",
|
| 921 |
-
"<unk_69>",
|
| 922 |
-
"<unk_70>",
|
| 923 |
-
"<unk_71>",
|
| 924 |
-
"<unk_72>",
|
| 925 |
-
"<unk_73>",
|
| 926 |
-
"<unk_74>",
|
| 927 |
-
"<unk_75>",
|
| 928 |
-
"<unk_76>",
|
| 929 |
-
"<unk_77>",
|
| 930 |
-
"<unk_78>",
|
| 931 |
-
"<unk_79>",
|
| 932 |
-
"<unk_80>",
|
| 933 |
-
"<unk_81>",
|
| 934 |
-
"<unk_82>",
|
| 935 |
-
"<unk_83>",
|
| 936 |
-
"<unk_84>",
|
| 937 |
-
"<unk_85>",
|
| 938 |
-
"<unk_86>",
|
| 939 |
-
"<unk_87>",
|
| 940 |
-
"<unk_88>",
|
| 941 |
-
"<unk_89>",
|
| 942 |
-
"<unk_90>",
|
| 943 |
-
"<unk_91>",
|
| 944 |
-
"<unk_92>",
|
| 945 |
-
"<unk_93>",
|
| 946 |
-
"<unk_94>",
|
| 947 |
-
"<unk_95>",
|
| 948 |
-
"<unk_96>",
|
| 949 |
-
"<unk_97>",
|
| 950 |
-
"<unk_98>",
|
| 951 |
-
"<unk_99>",
|
| 952 |
-
"<unk_100>",
|
| 953 |
-
"<unk_101>",
|
| 954 |
-
"<unk_102>"
|
| 955 |
-
],
|
| 956 |
-
"clean_up_tokenization_spaces": true,
|
| 957 |
-
"eos_token": "</s>",
|
| 958 |
-
"mask_token": "<mask_2>",
|
| 959 |
-
"mask_token_sent": "<mask_1>",
|
| 960 |
-
"model_max_length": 512,
|
| 961 |
-
"offset": 103,
|
| 962 |
-
"pad_token": "<pad>",
|
| 963 |
-
"sp_model_kwargs": {},
|
| 964 |
-
"tokenizer_class": "PegasusTokenizer",
|
| 965 |
-
"unk_token": "<unk>"
|
| 966 |
-
}
|
|
|
|
| 1 |
+
{"pad_token": "<pad>", "eos_token": "</s>", "unk_token": "<unk>", "mask_token": "<mask_2>", "mask_token_sent": "<mask_1>", "additional_special_tokens": ["<mask_1>", "<unk_2>", "<unk_3>", "<unk_4>", "<unk_5>", "<unk_6>", "<unk_7>", "<unk_8>", "<unk_9>", "<unk_10>", "<unk_11>", "<unk_12>", "<unk_13>", "<unk_14>", "<unk_15>", "<unk_16>", "<unk_17>", "<unk_18>", "<unk_19>", "<unk_20>", "<unk_21>", "<unk_22>", "<unk_23>", "<unk_24>", "<unk_25>", "<unk_26>", "<unk_27>", "<unk_28>", "<unk_29>", "<unk_30>", "<unk_31>", "<unk_32>", "<unk_33>", "<unk_34>", "<unk_35>", "<unk_36>", "<unk_37>", "<unk_38>", "<unk_39>", "<unk_40>", "<unk_41>", "<unk_42>", "<unk_43>", "<unk_44>", "<unk_45>", "<unk_46>", "<unk_47>", "<unk_48>", "<unk_49>", "<unk_50>", "<unk_51>", "<unk_52>", "<unk_53>", "<unk_54>", "<unk_55>", "<unk_56>", "<unk_57>", "<unk_58>", "<unk_59>", "<unk_60>", "<unk_61>", "<unk_62>", "<unk_63>", "<unk_64>", "<unk_65>", "<unk_66>", "<unk_67>", "<unk_68>", "<unk_69>", "<unk_70>", "<unk_71>", "<unk_72>", "<unk_73>", "<unk_74>", "<unk_75>", "<unk_76>", "<unk_77>", "<unk_78>", "<unk_79>", "<unk_80>", "<unk_81>", "<unk_82>", "<unk_83>", "<unk_84>", "<unk_85>", "<unk_86>", "<unk_87>", "<unk_88>", "<unk_89>", "<unk_90>", "<unk_91>", "<unk_92>", "<unk_93>", "<unk_94>", "<unk_95>", "<unk_96>", "<unk_97>", "<unk_98>", "<unk_99>", "<unk_100>", "<unk_101>", "<unk_102>"], "model_max_length": 512, "name_or_path": "google/pegasus-xsum"}
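The added one-line config carries the same special-token list as the removed expanded form: `<mask_1>` followed by `<unk_2>` through `<unk_102>`. As a rough sanity check, the list can be regenerated and compared against the values in the diff. This is a minimal sketch using only the standard library; the helper name `build_additional_special_tokens` is hypothetical, and tying the list length to the `offset` value of 103 from the removed config is an assumption based on the pattern visible above, not something the commit states.

```python
import json


def build_additional_special_tokens(offset: int = 103) -> list:
    """Regenerate the list seen in the diff (hypothetical helper).

    Assumption: the list is "<mask_1>" followed by "<unk_2>" up to
    "<unk_{offset - 1}>", which matches the entries shown in the diff.
    """
    return ["<mask_1>"] + [f"<unk_{i}>" for i in range(2, offset)]


# Compact config mirroring the keys of the added line in the diff.
config = {
    "pad_token": "<pad>",
    "eos_token": "</s>",
    "unk_token": "<unk>",
    "mask_token": "<mask_2>",
    "mask_token_sent": "<mask_1>",
    "additional_special_tokens": build_additional_special_tokens(),
    "model_max_length": 512,
    "name_or_path": "google/pegasus-xsum",
}

# Serialize to a single line, as in the new tokenizer_config.json.
compact = json.dumps(config)

# Round-trip to confirm the JSON is well-formed and the list is intact.
reloaded = json.loads(compact)
tokens = reloaded["additional_special_tokens"]
print(len(tokens), tokens[0], tokens[-1])
```

Comparing `compact` with the added line is a quick way to spot a dropped or duplicated placeholder token after a re-upload.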