jheuschkel committed
Commit e6ff2b5 · verified · 1 Parent(s): d46d7ec

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -1,7 +1,7 @@
  ---
  license: apache-2.0
  datasets:
- - jheuschkel/cds-dataset
+ - jheuschkel/clustered-cds-dataset
  language:
  - en
  pipeline_tag: fill-mask
@@ -27,7 +27,7 @@ misc:

  - This repository contains code to utilize the model, and reproduce results of the preprint [**Advancing Codon Language Modeling with Synonymous Codon Constrained Masking**](https://doi.org/10.1101/2025.08.19.671089).
  - Unlike other Codon Language Models, SynCodonLM was trained with logit-level control, masking logits for non-synonymous codons. This allowed the model to learn codon-specific patterns disentangled from protein-level semantics.
- - [Pre-training dataset of 66 Million CDS is available on Hugging Face here.](https://huggingface.co/datasets/jheuschkel/cds-dataset)
+ - [Pre-training dataset of 43 Million CDS is available on Hugging Face here.](https://huggingface.co/datasets/jheuschkel/clustered-cds-dataset)
  ---
  ## Installation
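
Note on the unchanged context line about logit-level control: the README says non-synonymous codon logits are masked, but this diff does not show how. The following is a minimal sketch of what such masking could look like, not the repository's implementation. It uses only the Python standard library. The names `CODON_TO_AA`, `VOCAB`, `mask_non_synonymous`, and `softmax` are hypothetical, the codon table is a small excerpt of the standard genetic code, and the logits are made up.

```python
import math

# Small excerpt of the standard genetic code (codon -> amino acid).
# Assumption: a real model would cover all 64 codons; this subset is illustrative.
CODON_TO_AA = {
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",  # alanine
    "AAA": "K", "AAG": "K",                          # lysine
    "GAT": "D", "GAC": "D",                          # aspartate
    "ATG": "M",                                      # methionine
}
VOCAB = sorted(CODON_TO_AA)  # hypothetical vocabulary order


def mask_non_synonymous(logits, true_codon):
    """Set the logit of every codon that encodes a different amino acid to -inf."""
    target_aa = CODON_TO_AA[true_codon]
    return [
        logit if CODON_TO_AA[codon] == target_aa else -math.inf
        for codon, logit in zip(VOCAB, logits)
    ]


def softmax(logits):
    """Numerically stable softmax; entries at -inf receive probability 0."""
    peak = max(logits)  # finite, because the true codon is never masked
    exps = [math.exp(x - peak) for x in logits]  # math.exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


# Example: the masked position actually holds "AAA" (lysine).
raw_logits = [0.5 * i for i in range(len(VOCAB))]  # made-up model output
probs = softmax(mask_non_synonymous(raw_logits, "AAA"))
allowed = {codon: p for codon, p in zip(VOCAB, probs) if p > 0.0}
```

In this sketch only the lysine codons `AAA` and `AAG` keep non-zero probability, so the prediction is a choice among synonymous codons rather than among amino acids.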