Improve model card metadata and add paper link
Hi! I'm Niels from the Hugging Face community team.
I've opened this PR to improve the discoverability of your model by adding `library_name` and `pipeline_tag` metadata. This enables features like the "Use in Transformers" button and ensures the model appears in relevant searches on the Hub.
I've also added a link to your paper [D3LM: A Discrete DNA Diffusion Language Model for Bidirectional DNA Understanding and Generation](https://huggingface.co/papers/2603.01780) at the top of the model card.
**README.md** (changed)

````diff
@@ -1,14 +1,18 @@
 ---
 license: apache-2.0
+library_name: transformers
+pipeline_tag: text-generation
 tags:
-
-
-
-
+- biology
+- genomics
+- dna
+- diffusion
 ---
 
 # D3LM: A Discrete DNA Diffusion Language Model for Bidirectional DNA Understanding and Generation
 
+This repository contains the weights for **D3LM-scratch**, presented in the paper [D3LM: A Discrete DNA Diffusion Language Model for Bidirectional DNA Understanding and Generation](https://huggingface.co/papers/2603.01780).
+
 A masked diffusion language model for **unconditional mammalian DNA sequence generation**, built on the ESM encoder with Rotary Positional Embeddings.
 
 > **Initialization**: trained **from scratch** (random initialization) with masked diffusion objective on mammalian DNA.
@@ -54,7 +58,8 @@ with torch.no_grad():
 outputs = model.diffusion_generate(inputs=input_ids, generation_config=config)
 
 for i, seq in enumerate(outputs.sequences):
-    print(f">{i}
+    print(f">{i}
+{tokenizer.decode(seq, skip_special_tokens=True).replace(' ', '')}")
 ```
 
 ## Generation Parameters
@@ -83,4 +88,4 @@ for i, seq in enumerate(outputs.sequences):
 
 ## License
 
-Apache 2.0
+Apache 2.0
````
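For reference, the `print` line in the generation loop emits FASTA-style records: a `>{index}` header line followed by the decoded sequence with inter-token spaces stripped. A minimal, self-contained sketch of that formatting, where `to_fasta` and the sample `decoded` strings are hypothetical stand-ins for `tokenizer.decode(...)` output and are not part of the model card:

```python
def to_fasta(records):
    """Format decoded sequences as FASTA records.

    Each record becomes a '>' header carrying the sequence index,
    followed by the sequence on the next line. Spaces between tokens
    (as tokenizer.decode often emits) are stripped, mirroring the
    .replace(' ', '') call in the model card snippet.
    """
    lines = []
    for i, decoded in enumerate(records):
        lines.append(f">{i}\n{decoded.replace(' ', '')}")
    return "\n".join(lines)

# Hypothetical decoded outputs standing in for tokenizer.decode(...).
decoded = ["A C G T A C", "G G C C T A"]
print(to_fasta(decoded))
```

Writing the header and sequence in a single `f-string` with an embedded `\n` keeps each record atomic, which matters if the loop is ever parallelized or the output is piped into downstream FASTA tooling.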