Update README.md
README.md
CHANGED
@@ -1,7 +1,7 @@
 ---
 license: apache-2.0
 datasets:
-- jheuschkel/cds-dataset
+- jheuschkel/clustered-cds-dataset
 language:
 - en
 pipeline_tag: fill-mask
@@ -27,7 +27,7 @@ misc:
 
 - This repository contains code to utilize the model, and reproduce results of the preprint [**Advancing Codon Language Modeling with Synonymous Codon Constrained Masking**](https://doi.org/10.1101/2025.08.19.671089).
 - Unlike other Codon Language Models, SynCodonLM was trained with logit-level control, masking logits for non-synonymous codons. This allowed the model to learn codon-specific patterns disentangled from protein-level semantics.
-- [Pre-training dataset of
+- [Pre-training dataset of 43 Million CDS is available on Hugging Face here.](https://huggingface.co/datasets/jheuschkel/clustered-cds-dataset)
 ---
 ## Installation
 
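
The README context lines in this diff mention "masking logits for non-synonymous codons". As a rough illustration of that idea only, the sketch below takes a vector of 64 codon logits, sets every codon that encodes a different amino acid than the true codon to negative infinity, and applies a softmax so that probability mass falls only on synonymous codons. This is not the SynCodonLM training code: the function names `mask_non_synonymous` and `softmax`, the plain-list representation, and the uniform example logits are all assumptions made for illustration. The codon table is the standard genetic code (NCBI translation table 1).

```python
import math
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]
CODON_TO_AA = dict(zip(CODONS, AMINO_ACIDS))


def mask_non_synonymous(logits, true_codon):
    """Hypothetical helper: keep logits of codons synonymous with true_codon,
    set all other codon logits to -inf."""
    target_aa = CODON_TO_AA[true_codon]
    return [
        logit if CODON_TO_AA[codon] == target_aa else -math.inf
        for codon, logit in zip(CODONS, logits)
    ]


def softmax(logits):
    """Numerically stable softmax; entries of -inf receive probability 0."""
    highest = max(logits)
    exps = [math.exp(value - highest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


# Assumed example input: uniform logits standing in for a model's output
# at one masked position whose true codon is GCT (alanine).
logits = [0.0] * len(CODONS)
probs = softmax(mask_non_synonymous(logits, "GCT"))

# Only the codons that share the true codon's amino acid keep nonzero probability.
synonymous = [codon for codon, p in zip(CODONS, probs) if p > 0]
print(synonymous)
```

With masking of this kind, a prediction can only differ from the true codon by a synonymous substitution, which is one way to read the README's claim that codon-level patterns are separated from protein-level information.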