Commit 4867f26 (verified) · 0 parent(s)

Super-squash branch 'main' using huggingface_hub

Files changed:
- .gitattributes +35 -0
- README.md +119 -0
- added_tokens.json +3 -0
- config.json +48 -0
- merges.txt +0 -0
- model.safetensors +3 -0
- special_tokens_map.json +51 -0
- tokenizer.json +0 -0
- tokenizer_config.json +72 -0
- vocab.json +0 -0
.gitattributes
ADDED
@@ -0,0 +1,35 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
README.md
ADDED
@@ -0,0 +1,119 @@
---
license: artistic-2.0
language:
- en
tags:
- '16384'
- 16k
---

# mega-encoder-small-16k-v1

This is a "huggingface-native" pretrained encoder-only model with a 16384-token context length. The model architecture is [MEGA](https://arxiv.org/abs/2209.10655).

## Numbers

Despite being a long-context model evaluated on a short-context benchmark (GLUE), MEGA holds up decently:

| Model | Size | CTX | Avg |
| :------------------------ | :---- | ----: | -----: |
| mega-encoder-small-16k-v1 | 122M | 16384 | 0.777 |
| bert-base-uncased | 110M | 512 | 0.7905 |
| roberta-base | 125M | 514 | 0.86 |
| [bert-plus-L8-4096-v1.0](https://huggingface.co/BEE-spoke-data/bert-plus-L8-4096-v1.0) | 88.1M | 4096 | 0.8278 |
| [mega-wikitext103](https://huggingface.co/mnaylor/mega-base-wikitext) | 7.0M | 10000 | 0.48 |

<details>
<summary><strong>GLUE Details</strong></summary>

| Model | Size | CTX | Avg | CoLA | SST2 | MRPC | STSB | QQP | MNLI | QNLI | RTE |
| :------------------------ | :---- | ----: | -----: | -----: | ----: | -----: | -----: | ----: | ----: | ----: | -----: |
| mega-encoder-small-16k-v1 | 122M | 16384 | 0.777 | 0.454 | 0.914 | 0.8404 | 0.906 | 0.894 | 0.806 | 0.842 | 0.556 |
| bert-base-uncased | 110M | 512 | 0.7905 | 0.521 | 0.935 | 0.889 | 0.858 | 0.712 | 0.84 | 0.905 | 0.664 |
| roberta-base | 125M | 514 | 0.86 | 0.64 | 0.95 | 0.9 | 0.91 | 0.92 | 0.88 | 0.93 | 0.79 |
| bert-plus-L8-4096-v1.0 | 88.1M | 4096 | 0.8278 | 0.6272 | 0.906 | 0.8659 | 0.9207 | 0.906 | 0.832 | 0.9 | 0.6643 |
| mega-wikitext103 | 7M | 10000 | 0.480 | 0.00 | 0.732 | 0.748 | -0.087 | 0.701 | 0.54 | 0.598 | 0.513 |

The evals for MEGA and bert-plus can be found in [this open wandb project](https://wandb.ai/pszemraj/glue-benchmarking) and are taken as the maximum observed values on the validation sets. The values for the other models are taken as reported in their papers.
</details>

## Design

### Architecture

This encoder model has 8 layers, hidden size 768, and a feedforward ratio of 3x. The resulting total size is 122M params.
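As a rough sanity check on these numbers, something like the sketch below should work; it assumes a `transformers` version that still ships the MEGA architecture (the bundled `config.json` pins 4.38.2) and is illustrative rather than part of the original training or evaluation code.

```python
from transformers import AutoConfig, AutoModelForMaskedLM

model_id = "BEE-spoke-data/mega-encoder-small-16k-v1"

# architecture hyperparameters described above, read straight from config.json
cfg = AutoConfig.from_pretrained(model_id)
print(cfg.num_hidden_layers, cfg.hidden_size, cfg.nffn_hidden_size)  # 8, 768, 2304 (3x ratio)
print(cfg.max_positions, cfg.chunk_size)                             # 16384, 1024

# total parameter count should come out around 122M
model = AutoModelForMaskedLM.from_pretrained(model_id)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M params")
```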

<details>
<summary><strong>Architecture Details</strong></summary>

1. We use a hidden size of 768 and a 3x hidden:feedforward ratio.
   - This contrasts with the 2x ratio used in the paper.
2. To handle the long context, we use MEGA's chunking mechanism with a chunk length of 1024. As a result, VRAM usage grows linearly with each multiple of this chunk length past 1024.
3. EMA dimension: we use an EMA projection dimension of 32 in the interest of modeling long and (potentially) complex sequences.
4. We use 8 layers and a context length of 16384 tokens.
5. We use `"simple"` relative positional embeddings instead of the rotary embeddings touted in the paper.
   - This choice came from examining [the detailed logs of models](https://github.com/facebookresearch/mega/blob/aeaa4b44592cd1d60a9a34554e359eda2a62b03b/examples/mega/README.lra.md) trained/evaluated on [the LRA benchmark](https://paperswithcode.com/sota/long-range-modeling-on-lra): the models geared towards encoder-type tasks all use the simple relative positional embeddings.
   - We observed poor performance and inexplicable 'walls' in previous experiments using rotary positional embeddings with MEGA as an encoder.
6. BART tokenizer: we use the tokenizer from `facebook/bart-large`.
   - This choice was motivated mostly by the desire to use the MEGA encoder in combination with a decoder model in the [HF EncoderDecoderModel class](https://huggingface.co/docs/transformers/model_doc/encoder-decoder) in a "huggingface-native" way. BART is supported as a decoder for this class, **and** BART's tokenizer has the necessary preprocessing for encoder training. A sketch of this pairing is shown right after this details block.
   - Example usage of MEGA+BART to create an encoder-decoder is [here](https://colab.research.google.com/gist/pszemraj/4bac8635361543b66207d73e4b25a13a/mega-encoder-small-16k-v1-for-text2text.ipynb).
   - The tokenizer's vocab is **exactly** the same as RoBERTa's.
</details>
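As referenced in item 6 above, a minimal sketch of the MEGA+BART pairing is below. This is an illustrative outline only: the decoder checkpoint `facebook/bart-base`, the token-id wiring, and the example strings are assumptions on our part; see the linked notebook for a working end-to-end example.

```python
from transformers import AutoTokenizer, EncoderDecoderModel

encoder_id = "BEE-spoke-data/mega-encoder-small-16k-v1"
decoder_id = "facebook/bart-base"  # assumed decoder; any causal-LM-capable decoder should work

tokenizer = AutoTokenizer.from_pretrained(encoder_id)
model = EncoderDecoderModel.from_encoder_decoder_pretrained(encoder_id, decoder_id)

# the combined model needs these set before seq2seq fine-tuning
model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# one training-style forward pass (the cross-attention is untrained at this point)
enc = tokenizer("a long input document ...", return_tensors="pt")
labels = tokenizer("a short target summary", return_tensors="pt").input_ids
loss = model(input_ids=enc.input_ids, attention_mask=enc.attention_mask, labels=labels).loss
print(loss)
```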

### Training

This model was trained with the `transformers` package. You can find (mostly unorganized) [training runs on wandb here](https://wandb.ai/pszemraj/mega-tuning-longctx).

<details>
<summary><strong>Training Details</strong></summary>

1. **Multi-task training:** the majority of training is "standard" MLM, with no next-sentence prediction, etc. However, in the interest of pretraining a _useful_ encoder for fine-tuning on various tasks, we mix in such tasks between several of the MLM phases, carrying over the model's backbone to the next training phase.
   - An example would be multiple-choice tuning on the [swag](https://huggingface.co/datasets/swag) dataset.
2. **MLM mask ratio of 40% by default:** we use 40% for the MLM ratio, following [Wettig et al. 2022](https://arxiv.org/abs/2202.08005). This is decreased slightly for training at longer sequences (8192+) to encourage the model to learn/leverage the available context in its predictions (see the collator sketch after this details block).
3. **AMP with bf16.**
4. **Gradient checkpointing implementation:** training this (or similar) models at a context length of 8192 or longer becomes quite VRAM-intensive despite the linear increase in memory usage, hence the gradient checkpointing implementation described under Usage below.
</details>
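For reference, the 40% masking ratio from item 2 corresponds to a standard MLM data collator configured roughly as follows; this is a sketch, not the original training script, and the example text is a placeholder.

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("BEE-spoke-data/mega-encoder-small-16k-v1")

# mask 40% of tokens instead of the usual 15% (Wettig et al. 2022)
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=True,
    mlm_probability=0.4,
)

features = [tokenizer("some pretraining text", truncation=True, max_length=16384)]
batch = collator(features)
print(batch["input_ids"].shape, batch["labels"].shape)  # masked inputs + MLM labels
```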

## Usage

This is a pretrained model intended to be [fine-tuned on various encoder-compatible tasks](https://github.com/huggingface/transformers/tree/831bc25d8fdb85768402f772cf65cc3d7872b211/examples/pytorch). However, if you are interested in testing inference with this model or have a deep passion for predicting mask tokens, you can use the following code:

```python
import json
from transformers import pipeline

pipe = pipeline("fill-mask", model="BEE-spoke-data/mega-encoder-small-16k-v1")
text = "I love to <mask> memes."
result = pipe(text)
print(json.dumps(result, indent=2))
```

### Gradient checkpointing implementation

If fine-tuning this model on `<task>`, using gradient checkpointing makes training at the full 16384-token context quite feasible. By installing the transformers fork below and passing `gradient_checkpointing=True` in the training args, you should be able to fine-tune at batch size 1 with VRAM to spare on a single 3090/4090.

```sh
pip uninstall -y transformers
pip install -U git+https://github.com/pszemraj/transformers.git@mega-gradient-checkpointing
pip install -U huggingface-hub
```
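With the fork installed, enabling it is just a flag in the training arguments. The sketch below is illustrative only; apart from the settings discussed above (batch size 1, bf16, `gradient_checkpointing=True`), every value is a placeholder to tune for your task.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="mega-encoder-small-16k-ft",  # placeholder path
    per_device_train_batch_size=1,           # fits 16384-token sequences on a 24 GB card
    gradient_accumulation_steps=16,          # illustrative; adjust for your effective batch size
    gradient_checkpointing=True,             # requires the fork above for MEGA
    bf16=True,
    learning_rate=2e-5,                      # illustrative
    num_train_epochs=1,
    logging_steps=10,
)
```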

If there is sufficient interest, we can look at making a PR into the official repo.

## Citation

If you find this useful, please consider citing this DOI; it would make us happy.

```
@misc{beespoke_data_2024,
  author    = {Peter Szemraj and Vincent Haines and {BEEspoke Data}},
  title     = {mega-encoder-small-16k-v1 (Revision 1476bcf)},
  year      = 2024,
  url       = {https://huggingface.co/BEE-spoke-data/mega-encoder-small-16k-v1},
  doi       = {10.57967/hf/1837},
  publisher = {Hugging Face}
}
```
added_tokens.json
ADDED
@@ -0,0 +1,3 @@
{
  "<SEP>": 50265
}
config.json
ADDED
@@ -0,0 +1,48 @@
{
  "_name_or_path": "BEE-spoke-data/mega-enc-MKVs-L8-v0.8-dolma-xlong_16384",
  "activation": "silu",
  "add_lm_hidden_dense_layer": false,
  "add_token_type_embeddings": true,
  "architectures": [
    "MegaForMaskedLM"
  ],
  "attention_activation": "softmax",
  "attention_probs_dropout_prob": 0,
  "bidirectional": true,
  "bos_token_id": 0,
  "chunk_size": 1024,
  "classifier_dropout": null,
  "dropout_prob": 0.05,
  "ema_beta_range": 0.02,
  "ema_delta_alpha_range": 0.2,
  "ema_gamma_omega_range": 1.0,
  "ema_projection_size": 32,
  "eos_token_id": 2,
  "hidden_dropout_prob": 0,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 2304,
  "max_positions": 16384,
  "model_type": "mega",
  "nffn_activation_dropout_prob": 0,
  "nffn_hidden_size": 2304,
  "norm_affine": true,
  "normalization_type": "scalenorm",
  "normalize_before_ffn": false,
  "normalize_before_mega": false,
  "num_attention_heads": 1,
  "num_hidden_layers": 8,
  "pad_token_id": 1,
  "relative_positional_bias": "simple",
  "sep_token_id": 2,
  "shared_representation_size": 192,
  "torch_dtype": "float32",
  "transformers_version": "4.38.2",
  "truncation": null,
  "type_vocab_size": 2,
  "use_cache": true,
  "use_chunking": true,
  "use_feature_dropout": false,
  "use_normalized_ffn": true,
  "vocab_size": 50304
}
merges.txt
ADDED
The diff for this file is too large to render. See raw diff.
model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d1acb34fbb0279191844c0fd0ca8c0dfd86f1633760b01a831f2a68a11515501
size 488057176
special_tokens_map.json
ADDED
@@ -0,0 +1,51 @@
{
  "bos_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "cls_token": {
    "content": "<s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "mask_token": {
    "content": "<mask>",
    "lstrip": true,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "pad_token": {
    "content": "<pad>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "sep_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  },
  "unk_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
tokenizer.json
ADDED
The diff for this file is too large to render. See raw diff.
tokenizer_config.json
ADDED
@@ -0,0 +1,72 @@
{
  "add_prefix_space": false,
  "added_tokens_decoder": {
    "0": {
      "content": "<s>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<pad>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "50264": {
      "content": "<mask>",
      "lstrip": true,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "50265": {
      "content": "<SEP>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": true,
  "cls_token": "<s>",
  "eos_token": "</s>",
  "errors": "replace",
  "mask_token": "<mask>",
  "max_length": 16384,
  "model_max_length": 16384,
  "pad_to_multiple_of": 1024,
  "pad_token": "<pad>",
  "pad_token_type_id": 0,
  "padding_side": "right",
  "sep_token": "</s>",
  "stride": 0,
  "tokenizer_class": "BartTokenizer",
  "trim_offsets": true,
  "truncation_side": "right",
  "truncation_strategy": "longest_first",
  "unk_token": "</s>"
}
vocab.json
ADDED
The diff for this file is too large to render. See raw diff.