alegendaryfish committed on Apr 8

Commit

1dbb59f

verified

1 Parent(s): 75c84f0

Publish final model only and clean public repo

Files changed (26)

CodonTranslator/__pycache__/__init__.cpython-312.pyc +0 -0
CodonTranslator/__pycache__/layers.cpython-312.pyc +0 -0
CodonTranslator/__pycache__/models.cpython-312.pyc +0 -0
CodonTranslator/__pycache__/tokenizer.cpython-312.pyc +0 -0
CodonTranslator/__pycache__/translator.cpython-312.pyc +0 -0
README.md +1 -2
__pycache__/precompute_embeddings.cpython-312.pyc +0 -0
__pycache__/resplit_data_v3.cpython-312.pyc +0 -0
__pycache__/sampling.cpython-312.pyc +0 -0
__pycache__/train.cpython-312.pyc +0 -0
final_model/model.safetensors +1 -1
final_model/trainer_state.json +1 -1
src/__pycache__/__init__.cpython-312.pyc +0 -0
src/__pycache__/dataset.cpython-312.pyc +0 -0
src/__pycache__/layers.cpython-312.pyc +0 -0
src/__pycache__/models.cpython-312.pyc +0 -0
src/__pycache__/sampler.cpython-312.pyc +0 -0
src/__pycache__/tokenizer.cpython-312.pyc +0 -0
src/__pycache__/trainer.cpython-312.pyc +0 -0
training_checkpoints/checkpoint-71000/config.json +0 -17
training_checkpoints/checkpoint-71000/model.safetensors +0 -3
training_checkpoints/checkpoint-71000/optimizer.pt +0 -3
training_checkpoints/checkpoint-71000/scheduler.pt +0 -3
training_checkpoints/checkpoint-71000/trainer_config.json +0 -17
training_checkpoints/checkpoint-71000/trainer_state.json +0 -4
training_checkpoints/checkpoint-71000/vocab.json +0 -78
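
Most of the files listed above are compiled bytecode under `__pycache__/`. As a minimal sketch of how such artifacts could be located before publishing, the helper below walks a working tree and reports them. The function name `find_bytecode_artifacts` is hypothetical and is not part of this repository.

```python
from pathlib import Path


def find_bytecode_artifacts(root: str) -> list[str]:
    """Return sorted, root-relative POSIX paths of __pycache__ dirs and .pyc files."""
    root_path = Path(root)
    hits = []
    for path in root_path.rglob("*"):
        rel = path.relative_to(root_path).as_posix()
        if path.is_dir() and path.name == "__pycache__":
            # Trailing slash marks a directory, mirroring .gitignore notation.
            hits.append(rel + "/")
        elif path.is_file() and path.suffix == ".pyc":
            hits.append(rel)
    return sorted(hits)


if __name__ == "__main__":
    for hit in find_bytecode_artifacts("."):
        print(hit)
```

Adding `__pycache__/` and `*.pyc` to `.gitignore` is the usual way to keep these files from being committed again.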

CodonTranslator/__pycache__/__init__.cpython-312.pyc DELETED

Binary file (222 Bytes)

CodonTranslator/__pycache__/layers.cpython-312.pyc DELETED

Binary file (17.4 kB)

CodonTranslator/__pycache__/models.cpython-312.pyc DELETED

Binary file (20.1 kB)

CodonTranslator/__pycache__/tokenizer.cpython-312.pyc DELETED

Binary file (11.2 kB)

CodonTranslator/__pycache__/translator.cpython-312.pyc DELETED

Binary file (29.9 kB)

README.md CHANGED

@@ -18,7 +18,6 @@ CodonTranslator is a protein-conditioned codon sequence generation model trained
 This repository is the public model and training-code release. It contains:
 - `final_model/`: inference-ready weights
-- `training_checkpoints/checkpoint-71000/`: a resumable training checkpoint
 - `src/`, `train.py`, `sampling.py`: training and inference code
 - `resplit_data_v3.py`: the `data_v3` reconstruction pipeline
 - `slurm/`: the single-node H200 training and data rebuild submission scripts
@@ -104,7 +103,7 @@ python sampling.py \
 - Training uses precomputed `embeddings_v2` for species conditioning.
 - The data split is built in protein space with MMseqs clustering and binomial-species test holdout.
-- `checkpoint-71000` is included for training resumption; `final_model/` is the recommended inference entrypoint.
+- `final_model/` is the published inference entrypoint.
 - For compatibility, released model directories contain both `trainer_config.json` and `config.json`.
 ## Sampling arguments

__pycache__/precompute_embeddings.cpython-312.pyc DELETED

Binary file (24.1 kB)

__pycache__/resplit_data_v3.cpython-312.pyc DELETED

Binary file (57.6 kB)

__pycache__/sampling.cpython-312.pyc DELETED

Binary file (17.7 kB)

__pycache__/train.cpython-312.pyc DELETED

Binary file (21.8 kB)

final_model/model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5af6fe27a93e8a5edf622131b8fff74240f90db036a95697cfe4f28af1d23ef9
+oid sha256:8e2a8594fa00b268f493f3779f704f81a8bda9501480ba95a263c2479816d951
 size 1284544520
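
The `model.safetensors` entry above is a Git LFS pointer (a `version` line, an `oid sha256:<hex>` line, and a `size` line in bytes), not the weights themselves. As a rough sketch, a downloaded file could be checked against such a pointer as follows. The helper names `parse_lfs_pointer` and `matches_pointer` are illustrative and do not come from this repository.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict) -> bool:
    """Return True if the file at `path` has the size and SHA-256 named by the pointer."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so multi-gigabyte weights do not need to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and digest.hexdigest() == pointer["oid"]


NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:8e2a8594fa00b268f493f3779f704f81a8bda9501480ba95a263c2479816d951
size 1284544520
"""

if __name__ == "__main__":
    print(parse_lfs_pointer(NEW_POINTER))
```

Because `size` is identical before and after this commit, only the `oid` distinguishes the old weights from the new ones.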

final_model/trainer_state.json CHANGED

@@ -1,4 +1,4 @@
 {
   "epoch": 2,
-  "global_step": 120513
+  "global_step": 72049
 }

src/__pycache__/__init__.cpython-312.pyc DELETED

Binary file (637 Bytes)

src/__pycache__/dataset.cpython-312.pyc DELETED

Binary file (41 kB)

src/__pycache__/layers.cpython-312.pyc DELETED

Binary file (21.5 kB)

src/__pycache__/models.cpython-312.pyc DELETED

Binary file (25.8 kB)

src/__pycache__/sampler.cpython-312.pyc DELETED

Binary file (36.1 kB)

src/__pycache__/tokenizer.cpython-312.pyc DELETED

Binary file (17.1 kB)

src/__pycache__/trainer.cpython-312.pyc DELETED

Binary file (65.7 kB)

training_checkpoints/checkpoint-71000/config.json DELETED

@@ -1,17 +0,0 @@
-{
-  "max_length": 2048,
-  "max_species_prefix": 0,
-  "max_protein_prefix": 1024,
-  "hidden_size": 750,
-  "num_hidden_layers": 20,
-  "num_attention_heads": 15,
-  "mlp_ratio": 3.2,
-  "prepend_species": true,
-  "prepend_protein": true,
-  "species_embedding_dim": 1024,
-  "esm_model_name": "esmc_300m",
-  "esm_device": "cuda:0",
-  "esm_dtype": "bf16",
-  "attn_impl": "mha",
-  "num_kv_groups": 5
-}
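
The removed config implies a few derived sizes that are easy to sanity-check. The sketch below uses only values from the hunk above. The derived names (`head_dim`, `heads_per_kv_group`, `mlp_hidden`) and the way they are computed are assumptions about a typical transformer layout, not something the diff confirms; in particular, how `num_kv_groups` interacts with `"attn_impl": "mha"` is not shown here.

```python
# Values copied from the deleted config.json hunk.
config = {
    "hidden_size": 750,
    "num_attention_heads": 15,
    "num_kv_groups": 5,
    "mlp_ratio": 3.2,
}


def derived_dims(cfg: dict) -> dict:
    """Compute per-head and MLP widths under conventional (assumed) definitions."""
    hidden = cfg["hidden_size"]
    heads = cfg["num_attention_heads"]
    groups = cfg["num_kv_groups"]
    if hidden % heads != 0:
        raise ValueError("hidden_size must be divisible by num_attention_heads")
    if heads % groups != 0:
        raise ValueError("num_attention_heads must be divisible by num_kv_groups")
    return {
        "head_dim": hidden // heads,
        "heads_per_kv_group": heads // groups,
        # Assumption: the MLP inner width is hidden_size * mlp_ratio, rounded.
        "mlp_hidden": round(hidden * cfg["mlp_ratio"]),
    }


if __name__ == "__main__":
    print(derived_dims(config))
```

Under these assumptions the per-head width is 50, each key/value group would serve 3 query heads, and the MLP inner width is 2400.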

training_checkpoints/checkpoint-71000/model.safetensors DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:07bc223f4d934e2baff5a8085a78348766b6a8324aa091a1459fce2b2c6d3837
-size 1284544520

training_checkpoints/checkpoint-71000/optimizer.pt DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:751570fed64f000a53218f2c9a7e47a4503a302760f1c0d6b52b63ce4a25cec8
-size 1237115851

training_checkpoints/checkpoint-71000/scheduler.pt DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:bdca58db103d9ad6aba34334e8a03e08e780b7fe95ef0677f2519e7b16023ff8
-size 1465

training_checkpoints/checkpoint-71000/trainer_config.json DELETED

@@ -1,17 +0,0 @@
-{
-  "max_length": 2048,
-  "max_species_prefix": 0,
-  "max_protein_prefix": 1024,
-  "hidden_size": 750,
-  "num_hidden_layers": 20,
-  "num_attention_heads": 15,
-  "mlp_ratio": 3.2,
-  "prepend_species": true,
-  "prepend_protein": true,
-  "species_embedding_dim": 1024,
-  "esm_model_name": "esmc_300m",
-  "esm_device": "cuda:0",
-  "esm_dtype": "bf16",
-  "attn_impl": "mha",
-  "num_kv_groups": 5
-}

training_checkpoints/checkpoint-71000/trainer_state.json DELETED

@@ -1,4 +0,0 @@
-{
-  "epoch": 2,
-  "global_step": 71000
-}

training_checkpoints/checkpoint-71000/vocab.json DELETED

@@ -1,78 +0,0 @@
-{
-  "special_token_str": {
-    "bos": "<bos>",
-    "eos": "<stop>",
-    "pad": "<pad>",
-    "unk": "<unk>"
-  },
-  "vocab": {
-    "<bos>": 2,
-    "<pad>": 0,
-    "<stop>": 3,
-    "<unk>": 1,
-    "AAA": 4,
-    "AAC": 5,
-    "AAG": 6,
-    "AAT": 7,
-    "ACA": 8,
-    "ACC": 9,
-    "ACG": 10,
-    "ACT": 11,
-    "AGA": 12,
-    "AGC": 13,
-    "AGG": 14,
-    "AGT": 15,
-    "ATA": 16,
-    "ATC": 17,
-    "ATG": 18,
-    "ATT": 19,
-    "CAA": 20,
-    "CAC": 21,
-    "CAG": 22,
-    "CAT": 23,
-    "CCA": 24,
-    "CCC": 25,
-    "CCG": 26,
-    "CCT": 27,
-    "CGA": 28,
-    "CGC": 29,
-    "CGG": 30,
-    "CGT": 31,
-    "CTA": 32,
-    "CTC": 33,
-    "CTG": 34,
-    "CTT": 35,
-    "GAA": 36,
-    "GAC": 37,
-    "GAG": 38,
-    "GAT": 39,
-    "GCA": 40,
-    "GCC": 41,
-    "GCG": 42,
-    "GCT": 43,
-    "GGA": 44,
-    "GGC": 45,
-    "GGG": 46,
-    "GGT": 47,
-    "GTA": 48,
-    "GTC": 49,
-    "GTG": 50,
-    "GTT": 51,
-    "TAA": 52,
-    "TAC": 53,
-    "TAG": 54,
-    "TAT": 55,
-    "TCA": 56,
-    "TCC": 57,
-    "TCG": 58,
-    "TCT": 59,
-    "TGA": 60,
-    "TGC": 61,
-    "TGG": 62,
-    "TGT": 63,
-    "TTA": 64,
-    "TTC": 65,
-    "TTG": 66,
-    "TTT": 67
-  }
-}
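
The deleted `vocab.json` follows a regular pattern: four special tokens at IDs 0 to 3, then the 64 codons in lexicographic order over `A`, `C`, `G`, `T`. The sketch below reproduces that mapping. It is only an illustration of the layout seen in the hunk; the repository's tokenizer may build its vocabulary differently, and `build_codon_vocab` is a hypothetical name.

```python
from itertools import product

# Special tokens in ID order, as listed in the deleted vocab.json.
SPECIAL_TOKENS = ["<pad>", "<unk>", "<bos>", "<stop>"]


def build_codon_vocab() -> dict[str, int]:
    """Map special tokens to IDs 0-3 and the 64 codons to IDs 4-67."""
    vocab = {token: index for index, token in enumerate(SPECIAL_TOKENS)}
    # product("ACGT", repeat=3) yields AAA, AAC, AAG, AAT, ACA, ... in sorted order.
    for index, bases in enumerate(product("ACGT", repeat=3), start=len(SPECIAL_TOKENS)):
        vocab["".join(bases)] = index
    return vocab


if __name__ == "__main__":
    vocab = build_codon_vocab()
    print(len(vocab), vocab["ATG"], vocab["TTT"])
```

Note that the stop codons `TAA`, `TAG`, and `TGA` are ordinary vocabulary entries (IDs 52, 54, 60), separate from the `<stop>` end-of-sequence token (ID 3).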