kiddothe2b/longformer-mini-1024

@@ -2,6 +2,7 @@
 license: cc-by-nc-sa-4.0
 pipeline_tag: fill-mask
 language: en
 tags:
 - long_documents
 datasets:
@@ -15,9 +16,9 @@ model-index:
 ## Model description
-[Longformer](https://arxiv.org/abs/2004.05150) is a transformer model for long documents.  This version of Longformer is presented in [An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification (Chalkidis et al., 2022)](https://arxiv.org/abs/xxx).
-The model has been warm-started re-using the weights of miniature BERT [(Turc et al., 2019)](https://arxiv.org/abs/1908.08962), and continued pre-trained for MLM following the paradigm of Longformer released by [Beltagy et al. (2020)](](https://arxiv.org/abs/1908.08962)). It supports sequences of length up to 1,024.
 Longformer uses a combination of a sliding window (local) attention and global attention. Global attention is user-configured based on the task to allow the model to learn task-specific representations.
@@ -27,7 +28,7 @@ You can use the raw model for masked language modeling, but it's mostly intended
 See the [model hub](https://huggingface.co/models?filter=longformer) to look for fine-tuned versions on a task that
 interests you.
-Note that this model is primarily aimed at being fine-tuned on tasks that use the whole document to make decisions, such as document classification, sequential sentence classification or question answering.
 ## How to use
@@ -39,12 +40,12 @@ mlm_model = pipeline('fill-mask', model='kiddothe2b/longformer-mini-1024', trust
 mlm_model("Hello I'm a <mask> model.")
 ```
-You can also fine-tun it for SequenceClassification, SequentialSentenceClassification, and MultipleChoice down-stream tasks:
 ```python
 from transformers import AutoTokenizer, AutoModelforSequenceClassification
 tokenizer = AutoTokenizer.from_pretrained("kiddothe2b/longformer-mini-1024", trust_remote_code=True)
-doc_classifier = AutoModelforSequenceClassification(model='kiddothe2b/longformer-mini-1024', trust_remote_code=True)
 ```
 ## Limitations and bias
@@ -93,19 +94,24 @@ The following hyperparameters were used during training:
 - Tokenizers 0.11.6
-## Citing
-If you use this Longformer model in your research, please cite [An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification](https://arxiv.org/abs/xxx), alongside [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150).
 ```
 @misc{chalkidis-etal-2022-hat,
-  url = {https://arxiv.org/abs/xxx},
   author = {Chalkidis, Ilias and Dai, Xiang and Fergadiotis, Manos and Malakasiotis, Prodromos and Elliott, Desmond},
   title = {An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification},
   publisher = {arXiv},
   year = {2022},
 }
 @article{Beltagy2020Longformer,
   title={Longformer: The Long-Document Transformer},
   author={Iz Beltagy and Matthew E. Peters and Arman Cohan},

 license: cc-by-nc-sa-4.0
 pipeline_tag: fill-mask
 language: en
+arxiv:
 tags:
 - long_documents
 datasets:
 ## Model description
+[Longformer](https://arxiv.org/abs/2004.05150) is a transformer model for long documents. This version of Longformer is presented in [An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification (Chalkidis et al., 2022)](https://arxiv.org/abs/2210.05529).
+The model was warm-started from the weights of miniature BERT (Turc et al., 2019) and then further pre-trained with masked language modeling (MLM), following the Longformer paradigm of [Beltagy et al. (2020)](https://arxiv.org/abs/2004.05150). It supports sequences of up to 1,024 tokens.
 Longformer uses a combination of a sliding window (local) attention and global attention. Global attention is user-configured based on the task to allow the model to learn task-specific representations.
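The interplay of local and global attention described above can be illustrated with a small, dependency-free sketch. It is not this model's implementation: the function name `attention_pattern`, the window size, and the choice of position 0 as the only global token are assumptions made for illustration. It only shows which token pairs are allowed to attend to each other when a sliding window is combined with symmetric global attention.

```python
def attention_pattern(seq_len, window, global_positions):
    """Return a seq_len x seq_len matrix of booleans.

    Entry [i][j] is True if token i may attend to token j.
    `window` is the total sliding-window width, so each token sees
    window // 2 neighbours on each side (plus itself).
    """
    half = window // 2
    is_global = set(global_positions)
    allowed = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(seq_len):
            # Local attention: j lies inside the window centred on i.
            local = abs(i - j) <= half
            # Global attention is symmetric: a global token attends to
            # every position, and every position attends to it.
            glob = i in is_global or j in is_global
            allowed[i][j] = local or glob
    return allowed


# Illustrative values only: 8 tokens, window of 2, first token global
# (for example, a classification token in a document classifier).
pattern = attention_pattern(seq_len=8, window=2, global_positions=[0])
for row in pattern:
    print("".join("x" if ok else "." for ok in row))
```

For a classification task, the global position would typically be the token whose representation feeds the classifier; for question answering, the question tokens are the usual choice.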
 See the [model hub](https://huggingface.co/models?filter=longformer) to look for fine-tuned versions on a task that
 interests you.
+Note that this model is primarily aimed at being fine-tuned on tasks that use the whole document to make decisions, such as document classification, sequential sentence classification, or question answering.
 ## How to use
 mlm_model("Hello I'm a <mask> model.")
 ```
+You can also fine-tune it for SequenceClassification, SequentialSentenceClassification, and MultipleChoice downstream tasks:
 ```python
 from transformers import AutoTokenizer, AutoModelForSequenceClassification
 tokenizer = AutoTokenizer.from_pretrained("kiddothe2b/longformer-mini-1024", trust_remote_code=True)
+doc_classifier = AutoModelForSequenceClassification.from_pretrained("kiddothe2b/longformer-mini-1024", trust_remote_code=True)
 ```
 ## Limitations and bias
 - Tokenizers 0.11.6
+## Citing
+If you use HAT in your research, please cite:
+[An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification](https://arxiv.org/abs/2210.05529). Ilias Chalkidis, Xiang Dai, Manos Fergadiotis, Prodromos Malakasiotis, and Desmond Elliott. 2022. arXiv:2210.05529 (Preprint).
 ```
 @misc{chalkidis-etal-2022-hat,
+  url = {https://arxiv.org/abs/2210.05529},
   author = {Chalkidis, Ilias and Dai, Xiang and Fergadiotis, Manos and Malakasiotis, Prodromos and Elliott, Desmond},
   title = {An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification},
   publisher = {arXiv},
   year = {2022},
 }
+```
+Also cite the original work: [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150).
+```
 @article{Beltagy2020Longformer,
   title={Longformer: The Long-Document Transformer},
   author={Iz Beltagy and Matthew E. Peters and Arman Cohan},