---
library_name: transformers
license: apache-2.0
language:
- en
datasets:
- howey/unarXive
- howey/wiki_en
- howey/hupd
---
# Model Weights Coming Soon!

## Using HDT

To use the model pre-trained with the [UL2](https://arxiv.org/abs/2205.05131) objective, use the following snippet:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# See the `HDT` collection page on the Hub for the list of available models.
model_name = 'howey/HDT-ED'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
```
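Once loaded, inference follows the standard `transformers` seq2seq API. The sketch below continues from the snippet above; the input text and generation settings are illustrative, and the exact UL2 prompt format may differ (see the GitHub repository):

```python
# Hypothetical usage: encode a document and generate a completion.
text = "Hierarchical attention lets transformers process long documents efficiently."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```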
For more details, please see our GitHub repository: [HDT](https://github.com/autonomousvision/hdt).

## Model Details

The model has a context length of `8192` and is comparable in size to BERT, with approximately `110M` parameters. It was trained on the standard UL2 task with a Transformer-based architecture using our proposed hierarchical attention. Training took 72 hours on the ArXiv+Wikipedia+HUPD corpus and processed a total of `2.6 billion` tokens.
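As a quick sanity check once the weights are released, the parameter count can be verified directly (a sketch, assuming the checkpoint loads through the standard `transformers` API):

```python
from transformers import AutoModelForSeq2SeqLM

# Load the checkpoint and count its weights; roughly 110M is expected.
model = AutoModelForSeq2SeqLM.from_pretrained('howey/HDT-ED')
n_params = sum(p.numel() for p in model.parameters())
print(f"~{n_params / 1e6:.0f}M parameters")
```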
For more details, please see our paper: [HDT: Hierarchical Document Transformer](https://arxiv.org/pdf/2407.08330).

## Citation

Please cite our work using the BibTeX entry below:

```bibtex
@inproceedings{He2024COLM,
  title={HDT: Hierarchical Document Transformer},
  author={Haoyu He and Markus Flicke and Jan Buchmann and Iryna Gurevych and Andreas Geiger},
  year={2024},
  booktitle={Conference on Language Modeling}
}
```

## Model Card Contact

Haoyu He (haoyu.he@uni-tuebingen.de)