AmitMY committed
Commit 2d70bae · verified · 1 Parent(s): fb6b7fa

Update README.md

Files changed (1)
  1. README.md +40 -26
README.md CHANGED
@@ -9,43 +9,57 @@ datasets:
  model-index:
  - name: output-tiny-lm-fineweb
    results: []
+ language:
+ - en
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

- # output-tiny-lm-fineweb
+ # UTF8-LM-tiny

  This model is a fine-tuned version of [sbintuitions/tiny-lm](https://huggingface.co/sbintuitions/tiny-lm) on the HuggingFaceFW/fineweb dataset.

- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
+ Trained using [this](https://github.com/sign/utf8-tokenizer/blob/main/experiments/language-modelling/run_clm.py) training script from [utf8-tokenizer](https://github.com/sign/utf8-tokenizer/tree/main).
+
+ The repository includes the joined model for ease of use, and the standalone [bit_projection_weights.pt](https://huggingface.co/sign/utf8-lm-tiny/blob/main/bit_projection_weights.pt) for further analysis.
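
A minimal usage sketch, assuming the joined model in `sign/utf8-lm-tiny` (the repo id in the `bit_projection_weights.pt` link above) loads through the standard `transformers` auto classes; the layout of the projection-weights file is not documented here, so the sketch only loads it with `torch.load`:

```python
# Sketch: load the joined model, assuming a standard transformers layout
# under sign/utf8-lm-tiny (repo id taken from the bit_projection_weights.pt link).
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "sign/utf8-lm-tiny"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

# Generate a short continuation from a plain-text prompt.
inputs = tokenizer("Hello, world", return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))

# Fetch the standalone projection weights for analysis; whether this is a
# single tensor or a state dict is not documented in this commit.
weights_path = hf_hub_download(repo, "bit_projection_weights.pt")
bit_weights = torch.load(weights_path, map_location="cpu")
print(type(bit_weights))
```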

  ## Training procedure

- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 0.0003
- - train_batch_size: 128
- - eval_batch_size: 8
- - seed: 42
- - optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.95) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.01
- - training_steps: 20000
-
- ### Training results
-
+ ```shell
+ python run_clm.py \
+   --use_bit_embeddings True \
+   --output_dir ./output-tiny-lm-fineweb \
+   --dataset_name HuggingFaceFW/fineweb \
+   --streaming True \
+   --dataloader_num_workers 1 \
+   --dataloader_prefetch_factor 4 \
+   --dataloader_pin_memory True \
+   --dataloader_persistent_workers True \
+   --do_train True \
+   --save_strategy steps \
+   --max_steps 20000 \
+   --save_steps 1000 \
+   --save_total_limit 2 \
+   --logging_steps 100 \
+   --logging_strategy steps \
+   --model_name_or_path sbintuitions/tiny-lm \
+   --per_device_train_batch_size 128 \
+   --block_size 256 \
+   --optim adamw_torch_fused \
+   --learning_rate 3e-4 \
+   --lr_scheduler_type cosine \
+   --warmup_ratio 0.01 \
+   --weight_decay 0.1 \
+   --adam_beta1 0.9 \
+   --adam_beta2 0.95 \
+   --max_grad_norm 1.0 \
+   --gradient_checkpointing True \
+   --bf16 True \
+   --seed 42 \
+   --report_to wandb \
+   --include_num_input_tokens_seen True
+ ```
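
For reference, the trainer-level flags above correspond roughly to the following `transformers.TrainingArguments`; this is a sketch, since `--use_bit_embeddings`, `--model_name_or_path`, `--dataset_name`, `--streaming`, and `--block_size` are parsed by `run_clm.py` itself rather than by `TrainingArguments`:

```python
# Sketch: the TrainingArguments equivalent of the trainer flags in the
# command above; script-specific flags are omitted (see lead-in).
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./output-tiny-lm-fineweb",
    do_train=True,
    max_steps=20_000,
    per_device_train_batch_size=128,
    optim="adamw_torch_fused",
    learning_rate=3e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,
    weight_decay=0.1,
    adam_beta1=0.9,
    adam_beta2=0.95,
    max_grad_norm=1.0,
    gradient_checkpointing=True,
    bf16=True,
    seed=42,
    save_strategy="steps",
    save_steps=1000,
    save_total_limit=2,
    logging_strategy="steps",
    logging_steps=100,
    dataloader_num_workers=1,
    dataloader_prefetch_factor=4,
    dataloader_pin_memory=True,
    dataloader_persistent_workers=True,
    report_to="wandb",
    include_num_input_tokens_seen=True,
)
```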


  ### Framework versions
@@ -53,4 +67,4 @@ The following hyperparameters were used during training:
  - Transformers 4.57.3
  - Pytorch 2.9.1+cu130
  - Datasets 4.4.1
- - Tokenizers 0.22.1
+ - Tokenizers 0.22.1