---
datasets:
- Skylion007/openwebtext
language:
- en
library_name: transformers
license: apache-2.0
metrics:
- perplexity
pipeline_tag: text-generation
---

## Using DUO
To load the pre-trained model for masked language modeling, use the following snippet:
```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# See the `DUO` collection page on the Hub for a list of available models.
tokenizer = AutoTokenizer.from_pretrained('gpt2')
model = AutoModelForMaskedLM.from_pretrained('s-sahoo/duo-distilled')
```
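As a quick sanity check, the sketch below runs a single forward pass over a tokenized prompt. It assumes the checkpoint follows the standard Hugging Face masked-LM call convention (`input_ids` in, `.logits` out); the actual diffusion sampling loop is provided in the DUO repository.
```python
import torch

# Minimal sanity check (assumes the standard masked-LM forward signature;
# see the DUO repository for the full diffusion sampling loop).
inputs = tokenizer("The theory of diffusion models", return_tensors="pt")

model.eval()
with torch.no_grad():
    outputs = model(input_ids=inputs["input_ids"])

# Per-token logits over the vocabulary: (batch, sequence_length, vocab_size).
print(outputs.logits.shape)
```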
For a hands-on example, check out this [Colab notebook](https://colab.research.google.com/drive/1Sf7R-dqdR6gq-H8nyZ9E3ZkyvqMTqcwq?usp=sharing).
For more information and implementation details, visit our GitHub repository: [DUO](https://github.com/s-sahoo/duo).
|
|
## Model Details
The model has a context length of `1024` and is similar in size to GPT2-medium, with approximately `130 million` non-embedding parameters.
It was trained for 1M steps on the OpenWebText corpus.
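As a rough check of the size quoted above, the parameter count can be read off the loaded checkpoint. The filter below assumes embedding weights can be identified by the substring `embed` in their parameter names, which may not match this checkpoint's exact module naming.
```python
# Rough size check; assumes embedding parameters contain 'embed' in their names,
# which may not match this checkpoint's exact module naming.
total = sum(p.numel() for p in model.parameters())
non_embed = sum(p.numel() for name, p in model.named_parameters() if "embed" not in name.lower())
print(f"total: {total / 1e6:.0f}M parameters, non-embedding: {non_embed / 1e6:.0f}M")
```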
|
|
For more details, please see our paper: [The Diffusion Duality](https://huggingface.co/papers/2506.10892).
|
|
Project page: https://s-sahoo.com/duo
|
|
## Citation
|
|
Please cite our work using the BibTeX entry below:
|
|
**BibTeX:**
|
|
```bibtex
@inproceedings{sahoo2025the,
  title={The Diffusion Duality},
  author={Subham Sekhar Sahoo and Justin Deschenaux and Aaron Gokaslan and Guanghan Wang and Justin T Chiu and Volodymyr Kuleshov},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025},
  url={https://openreview.net/forum?id=9P9Y8FOSOk}
}
```
|
|
## Model Card Contact
Subham Sekhar Sahoo (ssahoo@cs.cornell.edu)