JuIm committed on
Commit 561c8ed · verified · 1 Parent(s): 7b82a37

End of training

README.md CHANGED
@@ -5,41 +5,42 @@ tags:
  model-index:
  - name: ProGemma
    results: []
- pipeline_tag: text-generation
  ---


  # ProGemma

- This is a custom configuration of Google's Gemma 2 model that is being pre-trained on amino acid sequences of lengths 0 to 512.
- I used the free version of Google Colab to train this model, so updates are made regularly as the model reaches new checkpoints.
- As of 07.28.2024, the model has been trained on about 5% of the dataset.

- The model generates amino acids on a letter-by-letter basis.

- The current training loss is about 2.7. Preliminary evaluation of generated sequences with AlphaFold 3 shows pTM scores of ~0.4 and
- average pLDDT scores of ~60. After training is complete, a proper evaluation will be done to see whether the sequences fold into proteins with
- a low free energy. Perplexity scores will also be calculated.

- The purpose of this model was to see whether I could develop an alternative to ProtGPT2. ProGemma also serves as a stepping stone
- to a new model that will also use control tags to generate proteins based on function.

- To use this model yourself with the pipeline from the Transformers package, see the code below:

- from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM

- model = AutoModelForCausalLM.from_pretrained("JuIm/ProGemma")
- tokenizer = AutoTokenizer.from_pretrained("JuIm/Amino-Acid-Sequence-Tokenizer")

- progemma = pipeline("text-generation", model=model, tokenizer=tokenizer)

- sequence = progemma("bosM", top_k=950, max_length=100, num_return_sequences=1, do_sample=True, repetition_penalty=1.2, eos_token_id=21, pad_token_id=22, bos_token_id=20)

- print(sequence)

@@ -47,4 +48,4 @@ print(sequence)

  - Transformers 4.42.4
  - Pytorch 2.3.1+cu121
- - Tokenizers 0.19.1
  model-index:
  - name: ProGemma
    results: []
  ---

+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->

  # ProGemma

+ This model is a fine-tuned version of [JuIm/ProGemma](https://huggingface.co/JuIm/ProGemma) on an unknown dataset.

+ ## Model description

+ More information needed

+ ## Intended uses & limitations

+ More information needed

+ ## Training and evaluation data

+ More information needed

+ ## Training procedure

+ ### Training hyperparameters

+ The following hyperparameters were used during training:
+ - learning_rate: 0.001
+ - train_batch_size: 1
+ - eval_batch_size: 8
+ - seed: 42
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: linear
+ - lr_scheduler_warmup_ratio: 0.4
+ - training_steps: 5000

+ ### Training results

  - Transformers 4.42.4
  - Pytorch 2.3.1+cu121
+ - Tokenizers 0.19.1
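The hyperparameter list above maps onto the Transformers `TrainingArguments` API roughly as follows. This is a sketch, not the author's actual training script: `output_dir` is an assumption (the commit does not record it), and the Adam settings are passed via the `adam_beta1`/`adam_beta2`/`adam_epsilon` fields.

```python
from transformers import TrainingArguments

# Sketch of the configuration implied by the hyperparameters listed above.
args = TrainingArguments(
    output_dir="progemma-checkpoints",  # assumed; not stated in the commit
    learning_rate=1e-3,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,                     # "Adam with betas=(0.9,0.999)"
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.4,                   # lr_scheduler_warmup_ratio
    max_steps=5000,                     # training_steps
)
```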
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:de411c75a546aef900000ac61d174346ec545938769b91a3ec8348167fef00f3
+ oid sha256:421e8d2b9d187571d7f33a061b0159ef8506e62e39b2bfd7d6e4d49ffdc0faeb
  size 1101271208
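The `model.safetensors` entry above is not the weights themselves but a Git LFS pointer file: three `key value` lines naming the spec version, the content hash, and the byte size, while the ~1.1 GB payload lives in LFS storage. A minimal sketch of reading that format:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file (lines of 'key value') into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# The pointer committed above for model.safetensors.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:421e8d2b9d187571d7f33a061b0159ef8506e62e39b2bfd7d6e4d49ffdc0faeb
size 1101271208
"""

info = parse_lfs_pointer(pointer)
print(info["oid"], info["size"])
```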
runs/Jul29_16-37-14_cc53d3056a16/events.out.tfevents.1722271039.cc53d3056a16.633.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9dd47a96dc0f57392c50d7a5ba59a0a614dc5f1ea1be9b1a204bb7034c588be4
+ size 1059827
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:b0de069b7900ed2b617b27a9995235f3d9f9f1e0de81051884a5f99ad7076ad6
+ oid sha256:7f1cb5b7c407fa5b732f4672cda3d3aa6944cf6855dc291b8aa8908c79e458e9
  size 5112