code block syntax highlighting (#16)
- code block syntax highlighting (6aa0c656e757587245ebe238e8f53228983e69a7)
Co-authored-by: Abdellah <drHt@users.noreply.huggingface.co>
README.md CHANGED
@@ -75,7 +75,7 @@ so those that finish in *_0.fasta and *_1.fasta will be the best ones per batch.
 **Given that generation runs so fast, we recommend generating hundreds or thousands and then only picking the best 5% or less.
 With the script below, that would mean picking only those that finish in '_0.fasta'. Good perplexity values for this model should be below 1.75-1.5.**
 
-```
+```python
 import torch
 from transformers import GPT2LMHeadModel, AutoTokenizer
 import os
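The perplexity threshold quoted in this hunk (below roughly 1.75-1.5) is what the selection step checks. As a companion, a minimal sketch of how such a perplexity can be computed for a single generated sequence; the model and tokenizer names come from the README, while the example sequence and the exact scoring convention of the repository's script are assumptions, since the script body is outside this diff:

```python
import math

import torch
from transformers import GPT2LMHeadModel, AutoTokenizer

# Names taken from the README; everything below is a sketch, not the
# repository's own selection script.
tokenizer = AutoTokenizer.from_pretrained("AI4PD/ZymCTRL")
model = GPT2LMHeadModel.from_pretrained("AI4PD/ZymCTRL")
model.eval()

sequence = "MAVK..."  # hypothetical generated sequence (placeholder)
inputs = tokenizer(sequence, return_tensors="pt")
with torch.no_grad():
    # labels=input_ids yields the mean cross-entropy over the sequence
    loss = model(input_ids=inputs.input_ids, labels=inputs.input_ids).loss
perplexity = math.exp(loss.item())
print(f"perplexity: {perplexity:.3f}")  # keep only sequences below ~1.75
```

Lower is better here because perplexity is the exponential of the average cross-entropy loss, so values near 1 mean the model found the sequence highly likely.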
@@ -179,7 +179,7 @@ We recommend using at least 200 sequences to obtain the best results. But we've
 that many, still give it a go.
 
 
-```
+```python
 import random
 from transformers import AutoTokenizer
 
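Only the imports of the fine-tuning preparation script are visible in this hunk. Below is a minimal sketch of the shuffle-and-split step that `import random` points at, under loud assumptions: the input file name `sequences.txt`, the one-sequence-per-line format, and the 90/10 split are all illustrative, not taken from the README:

```python
import random

# Assumptions (not from the README): plain-text input, one sequence per
# line, and a 90/10 train/validation split.
with open("sequences.txt") as f:  # hypothetical file name
    sequences = [line.strip() for line in f if line.strip()]

random.shuffle(sequences)  # randomize order before splitting
cut = int(0.9 * len(sequences))
train, validation = sequences[:cut], sequences[cut:]
print(f"{len(train)} train / {len(validation)} validation sequences")
```

As the hunk notes, 200+ sequences is the recommended starting point, but smaller sets are still worth trying.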
@@ -350,7 +350,7 @@ To do that, you can take the trainer file that we provide in this repository (5.
 The command below shows an example at a specific learning rate,
 but you could try other hyperparameters to obtain the best training and evaluation losses.
 
-```
+```bash
 python 5.run_clm-post.py --tokenizer_name AI4PD/ZymCTRL
 --do_train --do_eval --output_dir output --eval_strategy steps --eval_steps 10
 --logging_steps 5 --save_steps 500 --num_train_epochs 28 --per_device_train_batch_size 1
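The hunk context ends before the learning-rate flag that the prose mentions, so the command is left truncated above. For readers who want to paste it into a shell, here is a hedged sketch with explicit line continuations; `--learning_rate` is the standard Hugging Face `TrainingArguments` flag, and the value shown is a placeholder rather than the one the README uses:

```bash
# Sketch only: backslashes make the multi-line command shell-runnable.
# The learning-rate value is a placeholder, not the README's setting.
python 5.run_clm-post.py --tokenizer_name AI4PD/ZymCTRL \
  --do_train --do_eval --output_dir output \
  --eval_strategy steps --eval_steps 10 \
  --logging_steps 5 --save_steps 500 \
  --num_train_epochs 28 --per_device_train_batch_size 1 \
  --learning_rate 1e-05
```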