Muhammad Farrukh Mehmood committed on Dec 23, 2024

Commit

4deee95

verified

1 Parent(s): e17baa8

End of training

Files changed (29)

README.md +60 -0
config.json +33 -0
generation_config.json +7 -0
merges.txt +0 -0
model.safetensors +3 -0
runs/Dec23_06-10-57_f534e42ab4c2/events.out.tfevents.1734934368.f534e42ab4c2.632.0 +3 -0
smollm2-sft-test1/config.json +33 -0
smollm2-sft-test1/generation_config.json +7 -0
smollm2-sft-test1/merges.txt +0 -0
smollm2-sft-test1/model.safetensors +3 -0
smollm2-sft-test1/special_tokens_map.json +28 -0
smollm2-sft-test1/tokenizer.json +0 -0
smollm2-sft-test1/tokenizer_config.json +155 -0
smollm2-sft-test1/training_args.bin +3 -0
smollm2-sft-test1/vocab.json +0 -0
special_tokens_map.json +28 -0
tokenizer.json +0 -0
tokenizer_config.json +155 -0
training_args.bin +3 -0
vocab.json +0 -0
wandb/debug-internal.log +14 -0
wandb/debug.log +36 -0
wandb/run-20241223_062433-kk1dm8nx/files/output.log +2 -0
wandb/run-20241223_062433-kk1dm8nx/files/requirements.txt +579 -0
wandb/run-20241223_062433-kk1dm8nx/files/wandb-metadata.json +41 -0
wandb/run-20241223_062433-kk1dm8nx/logs/debug-core.log +7 -0
wandb/run-20241223_062433-kk1dm8nx/logs/debug-internal.log +14 -0
wandb/run-20241223_062433-kk1dm8nx/logs/debug.log +36 -0
wandb/run-20241223_062433-kk1dm8nx/run-kk1dm8nx.wandb +0 -0

README.md ADDED

	@@ -0,0 +1,60 @@

+---
+base_model: HuggingFaceTB/SmolLM2-135M
+library_name: transformers
+model_name: smollm2-sft-test1
+tags:
+- generated_from_trainer
+- smol-course
+- module_1
+- trl
+- sft
+licence: license
+---
+# Model Card for smollm2-sft-test1
+This model is a fine-tuned version of [HuggingFaceTB/SmolLM2-135M](https://huggingface.co/HuggingFaceTB/SmolLM2-135M).
+It has been trained using [TRL](https://github.com/huggingface/trl).
+## Quick start
+```python
+from transformers import pipeline
+question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+generator = pipeline("text-generation", model="sfarrukh/smollm2-sft-test1", device="cuda")
+output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+print(output["generated_text"])
+```
+## Training procedure
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/farrukhmehmood-nts-the-islamia-university-of-bahawalpur/huggingface/runs/kk1dm8nx)
+This model was trained with SFT.
+### Framework versions
+- TRL: 0.13.0
+- Transformers: 4.47.1
+- Pytorch: 2.5.1+cu121
+- Datasets: 3.2.0
+- Tokenizers: 0.21.0
+## Citations
+Cite TRL as:
+```bibtex
+@misc{vonwerra2022trl,
+	title        = {{TRL: Transformer Reinforcement Learning}},
+	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
+	year         = 2020,
+	journal      = {GitHub repository},
+	publisher    = {GitHub},
+	howpublished = {\url{https://github.com/huggingface/trl}}
+}
+```

config.json ADDED

	@@ -0,0 +1,33 @@

+{
+  "_name_or_path": "HuggingFaceTB/SmolLM2-135M",
+  "architectures": [
+    "LlamaForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "head_dim": 64,
+  "hidden_act": "silu",
+  "hidden_size": 576,
+  "initializer_range": 0.041666666666666664,
+  "intermediate_size": 1536,
+  "is_llama_config": true,
+  "max_position_embeddings": 8192,
+  "mlp_bias": false,
+  "model_type": "llama",
+  "num_attention_heads": 9,
+  "num_hidden_layers": 30,
+  "num_key_value_heads": 3,
+  "pad_token_id": 2,
+  "pretraining_tp": 1,
+  "rms_norm_eps": 1e-05,
+  "rope_interleaved": false,
+  "rope_scaling": null,
+  "rope_theta": 100000,
+  "tie_word_embeddings": true,
+  "torch_dtype": "float32",
+  "transformers_version": "4.47.1",
+  "use_cache": true,
+  "vocab_size": 49152
+}
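
As a sanity check on the config above, the parameter count can be recomputed from its fields. This is a minimal sketch that assumes the usual Llama layout in Transformers: no bias terms (consistent with `attention_bias: false` and `mlp_bias: false`), two RMSNorm weight vectors per layer plus a final norm, and an output head that shares the embedding matrix (`tie_word_embeddings: true`). The helper name `llama_param_count` is hypothetical, and only the fields needed for the count are copied into `config`.

```python
def llama_param_count(cfg: dict) -> int:
    """Count weights of a bias-free Llama-style decoder (assumed layout)."""
    hidden = cfg["hidden_size"]
    q_dim = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_dim = cfg["num_key_value_heads"] * cfg["head_dim"]

    embedding = cfg["vocab_size"] * hidden  # also serves as the LM head when tied
    # q and o projections are hidden x q_dim; k and v are hidden x kv_dim (grouped-query attention)
    attention = 2 * hidden * q_dim + 2 * hidden * kv_dim
    mlp = 3 * hidden * cfg["intermediate_size"]  # gate, up and down projections
    norms = 2 * hidden  # input norm and post-attention norm
    per_layer = attention + mlp + norms

    total = embedding + cfg["num_hidden_layers"] * per_layer + hidden  # + final norm
    if not cfg["tie_word_embeddings"]:
        total += cfg["vocab_size"] * hidden  # separate LM head
    return total


# Values copied from config.json in this commit.
config = {
    "vocab_size": 49152,
    "hidden_size": 576,
    "intermediate_size": 1536,
    "num_hidden_layers": 30,
    "num_attention_heads": 9,
    "num_key_value_heads": 3,
    "head_dim": 64,
    "tie_word_embeddings": True,
}

n_params = llama_param_count(config)
float32_bytes = n_params * 4
print(n_params)       # 134515008
print(float32_bytes)  # 538060032
```

The count agrees with the `model/num_parameters = 134515008` entry in `wandb/debug.log`, and the float32 byte total sits just under the 538090408-byte `model.safetensors` size, which would be consistent with a small file header on top of the raw weights.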

generation_config.json ADDED

	@@ -0,0 +1,7 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "pad_token_id": 2,
+  "transformers_version": "4.47.1"
+}
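
The three ids above should line up with the token strings in `special_tokens_map.json` and the `added_tokens_decoder` table in `tokenizer_config.json`. The sketch below cross-checks them using values copied from those files; the function name `check_special_token_ids` is hypothetical, and the check is only an illustration, not something the training script is known to run.

```python
def check_special_token_ids(gen_cfg: dict, id_to_token: dict, named_tokens: dict) -> list:
    """Return a description of every special token whose id and string disagree."""
    mismatches = []
    for name, token in named_tokens.items():
        token_id = gen_cfg[f"{name}_id"]
        found = id_to_token.get(token_id)
        if found != token:
            mismatches.append(f"{name}: id {token_id} maps to {found!r}, expected {token!r}")
    return mismatches


# Copied from generation_config.json.
generation_config = {"bos_token_id": 1, "eos_token_id": 2, "pad_token_id": 2}
# First three entries of added_tokens_decoder in tokenizer_config.json.
added_tokens = {0: "<|endoftext|>", 1: "<|im_start|>", 2: "<|im_end|>"}
# Copied from special_tokens_map.json.
special_tokens = {
    "bos_token": "<|im_start|>",
    "eos_token": "<|im_end|>",
    "pad_token": "<|im_end|>",
}

problems = check_special_token_ids(generation_config, added_tokens, special_tokens)
print(problems)  # []
```

Note that the pad and end-of-sequence tokens share id 2 in this commit, so padding positions cannot be told apart from end-of-sequence tokens by id alone.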

merges.txt ADDED

The diff for this file is too large to render.

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fcffccc97b1ea08e794e534190b323e296d32c5de7077f75e7012b9148ca47b4
+size 538090408
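
The three lines above are a Git LFS pointer rather than the weights themselves: a spec version, a SHA-256 object id, and the size in bytes. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper, and it assumes the simple `key value` line layout shown in this diff.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its fields."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer text copied from model.safetensors in this commit.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:fcffccc97b1ea08e794e534190b323e296d32c5de7077f75e7012b9148ca47b4
size 538090408
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algorithm"], info["size"])
print(f"{info['size'] / 2**20:.1f} MiB")
```

The same oid and size appear for `smollm2-sft-test1/model.safetensors`, so both paths point at one stored object.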

runs/Dec23_06-10-57_f534e42ab4c2/events.out.tfevents.1734934368.f534e42ab4c2.632.0 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c9bf40d539cc705b2fd631b9e3e59983d6c13a1bf88c4823551a0901049fe0c3
+size 32517

smollm2-sft-test1/config.json ADDED

	@@ -0,0 +1,33 @@

+{
+  "_name_or_path": "HuggingFaceTB/SmolLM2-135M",
+  "architectures": [
+    "LlamaForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "head_dim": 64,
+  "hidden_act": "silu",
+  "hidden_size": 576,
+  "initializer_range": 0.041666666666666664,
+  "intermediate_size": 1536,
+  "is_llama_config": true,
+  "max_position_embeddings": 8192,
+  "mlp_bias": false,
+  "model_type": "llama",
+  "num_attention_heads": 9,
+  "num_hidden_layers": 30,
+  "num_key_value_heads": 3,
+  "pad_token_id": 2,
+  "pretraining_tp": 1,
+  "rms_norm_eps": 1e-05,
+  "rope_interleaved": false,
+  "rope_scaling": null,
+  "rope_theta": 100000,
+  "tie_word_embeddings": true,
+  "torch_dtype": "float32",
+  "transformers_version": "4.47.1",
+  "use_cache": true,
+  "vocab_size": 49152
+}

smollm2-sft-test1/generation_config.json ADDED

	@@ -0,0 +1,7 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "pad_token_id": 2,
+  "transformers_version": "4.47.1"
+}

smollm2-sft-test1/merges.txt ADDED

The diff for this file is too large to render.

smollm2-sft-test1/model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fcffccc97b1ea08e794e534190b323e296d32c5de7077f75e7012b9148ca47b4
+size 538090408

smollm2-sft-test1/special_tokens_map.json ADDED

	@@ -0,0 +1,28 @@

+{
+  "additional_special_tokens": [
+    {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false
+    },
+    {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false
+    }
+  ],
+  "bos_token": "<|im_start|>",
+  "eos_token": "<|im_end|>",
+  "pad_token": "<|im_end|>",
+  "unk_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

smollm2-sft-test1/tokenizer.json ADDED

The diff for this file is too large to render.

smollm2-sft-test1/tokenizer_config.json ADDED

	@@ -0,0 +1,155 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<repo_name>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "4": {
+      "content": "<reponame>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "5": {
+      "content": "<file_sep>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "6": {
+      "content": "<filename>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "7": {
+      "content": "<gh_stars>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "8": {
+      "content": "<issue_start>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "9": {
+      "content": "<issue_comment>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "10": {
+      "content": "<issue_closed>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "11": {
+      "content": "<jupyter_start>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "12": {
+      "content": "<jupyter_text>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "13": {
+      "content": "<jupyter_code>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "14": {
+      "content": "<jupyter_output>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "15": {
+      "content": "<jupyter_script>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "16": {
+      "content": "<empty_output>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>"
+  ],
+  "bos_token": "<|im_start|>",
+  "chat_template": "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "extra_special_tokens": {},
+  "model_max_length": 8192,
+  "pad_token": "<|im_end|>",
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "<|endoftext|>",
+  "vocab_size": 49152
+}

smollm2-sft-test1/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:35459363358047d4a2f46e71bc01618c53808c4628467f77f88dbf99f150f477
+size 5688

smollm2-sft-test1/vocab.json ADDED

The diff for this file is too large to render.

special_tokens_map.json ADDED

	@@ -0,0 +1,28 @@

+{
+  "additional_special_tokens": [
+    {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false
+    },
+    {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false
+    }
+  ],
+  "bos_token": "<|im_start|>",
+  "eos_token": "<|im_end|>",
+  "pad_token": "<|im_end|>",
+  "unk_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer_config.json ADDED

	@@ -0,0 +1,155 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<repo_name>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "4": {
+      "content": "<reponame>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "5": {
+      "content": "<file_sep>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "6": {
+      "content": "<filename>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "7": {
+      "content": "<gh_stars>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "8": {
+      "content": "<issue_start>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "9": {
+      "content": "<issue_comment>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "10": {
+      "content": "<issue_closed>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "11": {
+      "content": "<jupyter_start>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "12": {
+      "content": "<jupyter_text>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "13": {
+      "content": "<jupyter_code>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "14": {
+      "content": "<jupyter_output>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "15": {
+      "content": "<jupyter_script>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "16": {
+      "content": "<empty_output>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>"
+  ],
+  "bos_token": "<|im_start|>",
+  "chat_template": "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "extra_special_tokens": {},
+  "model_max_length": 8192,
+  "pad_token": "<|im_end|>",
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "<|endoftext|>",
+  "vocab_size": 49152
+}
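
The `chat_template` above is a Jinja template in the ChatML style: each message becomes `<|im_start|>role`, a newline, the content, `<|im_end|>` and a newline, with an optional open assistant turn at the end. The plain-Python sketch below mirrors that logic so the output format is easy to see; `render_chatml` is a hypothetical stand-in, not the code path Transformers uses (in practice `tokenizer.apply_chat_template` renders the template itself).

```python
def render_chatml(messages: list, add_generation_prompt: bool = False) -> str:
    """Mimic the chat_template from tokenizer_config.json in plain Python."""
    text = ""
    for message in messages:
        text += "<|im_start|>" + message["role"] + "\n" + message["content"] + "<|im_end|>" + "\n"
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        text += "<|im_start|>assistant\n"
    return text


messages = [{"role": "user", "content": "Hi"}]
prompt = render_chatml(messages, add_generation_prompt=True)
print(prompt)
# <|im_start|>user
# Hi<|im_end|>
# <|im_start|>assistant
```

Because the template already writes `<|im_start|>` and `<|im_end|>` itself, the training config's `dataset_kwargs: {'add_special_tokens': False}` (visible in `wandb/debug.log`) plausibly exists to avoid adding those markers a second time during tokenization; that reading is an inference, not something the commit states.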

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:35459363358047d4a2f46e71bc01618c53808c4628467f77f88dbf99f150f477
+size 5688

vocab.json ADDED

The diff for this file is too large to render.

wandb/debug-internal.log ADDED

	@@ -0,0 +1,14 @@

+{"time":"2024-12-23T06:24:33.403481622Z","level":"INFO","msg":"using version","core version":"0.19.1"}
+{"time":"2024-12-23T06:24:33.403525688Z","level":"INFO","msg":"created symlink","path":"/content/smollm-fine-tuning/wandb/run-20241223_062433-kk1dm8nx/logs/debug-core.log"}
+{"time":"2024-12-23T06:24:33.515469919Z","level":"INFO","msg":"created new stream","id":"kk1dm8nx"}
+{"time":"2024-12-23T06:24:33.515527717Z","level":"INFO","msg":"stream: started","id":"kk1dm8nx"}
+{"time":"2024-12-23T06:24:33.515537628Z","level":"INFO","msg":"writer: Do: started","stream_id":"kk1dm8nx"}
+{"time":"2024-12-23T06:24:33.515733694Z","level":"INFO","msg":"handler: started","stream_id":"kk1dm8nx"}
+{"time":"2024-12-23T06:24:33.515756662Z","level":"INFO","msg":"sender: started","stream_id":"kk1dm8nx"}
+{"time":"2024-12-23T06:24:36.718018301Z","level":"INFO","msg":"Starting system monitor"}
+{"time":"2024-12-23T06:42:14.432322766Z","level":"INFO","msg":"Pausing system monitor"}
+{"time":"2024-12-23T06:58:34.001396346Z","level":"INFO","msg":"Resuming system monitor"}
+{"time":"2024-12-23T06:58:34.115544844Z","level":"INFO","msg":"Pausing system monitor"}
+{"time":"2024-12-23T06:59:50.035860315Z","level":"INFO","msg":"Resuming system monitor"}
+{"time":"2024-12-23T06:59:57.29236364Z","level":"INFO","msg":"Pausing system monitor"}
+{"time":"2024-12-23T07:00:55.04796864Z","level":"INFO","msg":"Resuming system monitor"}

wandb/debug.log ADDED

	@@ -0,0 +1,36 @@

+2024-12-23 06:24:33,393 INFO    MainThread:632 [wandb_setup.py:_flush():68] Current SDK version is 0.19.1
+2024-12-23 06:24:33,393 INFO    MainThread:632 [wandb_setup.py:_flush():68] Configure stats pid to 632
+2024-12-23 06:24:33,393 INFO    MainThread:632 [wandb_setup.py:_flush():68] Loading settings from /root/.config/wandb/settings
+2024-12-23 06:24:33,393 INFO    MainThread:632 [wandb_setup.py:_flush():68] Loading settings from /content/smollm-fine-tuning/wandb/settings
+2024-12-23 06:24:33,393 INFO    MainThread:632 [wandb_setup.py:_flush():68] Loading settings from environment variables
+2024-12-23 06:24:33,393 INFO    MainThread:632 [wandb_init.py:_log_setup():528] Logging user logs to /content/smollm-fine-tuning/wandb/run-20241223_062433-kk1dm8nx/logs/debug.log
+2024-12-23 06:24:33,393 INFO    MainThread:632 [wandb_init.py:_log_setup():529] Logging internal logs to /content/smollm-fine-tuning/wandb/run-20241223_062433-kk1dm8nx/logs/debug-internal.log
+2024-12-23 06:24:33,393 INFO    MainThread:632 [wandb_init.py:_jupyter_setup():474] configuring jupyter hooks <wandb.sdk.wandb_init._WandbInit object at 0x78e9883258d0>
+2024-12-23 06:24:33,394 INFO    MainThread:632 [wandb_init.py:init():644] calling init triggers
+2024-12-23 06:24:33,394 INFO    MainThread:632 [wandb_init.py:init():650] wandb.init called with sweep_config: {}
+config: {}
+2024-12-23 06:24:33,394 INFO    MainThread:632 [wandb_init.py:init():680] starting backend
+2024-12-23 06:24:33,394 INFO    MainThread:632 [wandb_init.py:init():684] sending inform_init request
+2024-12-23 06:24:33,400 INFO    MainThread:632 [backend.py:_multiprocessing_setup():104] multiprocessing start_methods=fork,spawn,forkserver, using: spawn
+2024-12-23 06:24:33,400 INFO    MainThread:632 [wandb_init.py:init():697] backend started and connected
+2024-12-23 06:24:33,414 INFO    MainThread:632 [wandb_run.py:_label_probe_notebook():1222] probe notebook
+2024-12-23 06:24:36,588 INFO    MainThread:632 [wandb_init.py:init():790] updated telemetry
+2024-12-23 06:24:36,594 INFO    MainThread:632 [wandb_init.py:init():822] communicating run to backend with 90.0 second timeout
+2024-12-23 06:24:36,712 INFO    MainThread:632 [wandb_init.py:init():874] starting run threads in backend
+2024-12-23 06:24:37,146 INFO    MainThread:632 [wandb_run.py:_console_start():2374] atexit reg
+2024-12-23 06:24:37,147 INFO    MainThread:632 [wandb_run.py:_redirect():2224] redirect: wrap_raw
+2024-12-23 06:24:37,147 INFO    MainThread:632 [wandb_run.py:_redirect():2289] Wrapping output streams.
+2024-12-23 06:24:37,147 INFO    MainThread:632 [wandb_run.py:_redirect():2314] Redirects installed.
+2024-12-23 06:24:37,152 INFO    MainThread:632 [wandb_init.py:init():916] run started, returning control to user process
+2024-12-23 06:24:37,156 INFO    MainThread:632 [wandb_run.py:_config_callback():1279] config_cb None None {'vocab_size': 49152, 'max_position_embeddings': 8192, 'hidden_size': 576, 'intermediate_size': 1536, 'num_hidden_layers': 30, 'num_attention_heads': 9, 'num_key_value_heads': 3, 'hidden_act': 'silu', 'initializer_range': 0.041666666666666664, 'rms_norm_eps': 1e-05, 'pretraining_tp': 1, 'use_cache': True, 'rope_theta': 100000, 'rope_scaling': None, 'attention_bias': False, 'attention_dropout': 0.0, 'mlp_bias': False, 'head_dim': 64, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'bfloat16', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': True, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'architectures': ['LlamaForCausalLM'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 1, 'pad_token_id': 2, 'eos_token_id': 2, 'sep_token_id': None, 'decoder_start_token_id': None, 'task_specific_params': None, 'problem_type': None, '_name_or_path': 'HuggingFaceTB/SmolLM2-135M', '_attn_implementation_autoset': 
True, 'transformers_version': '4.47.1', 'is_llama_config': True, 'model_type': 'llama', 'rope_interleaved': False, 'output_dir': '/content/drive/MyDrive/smollm-fine-tuning/trained_models', 'overwrite_output_dir': False, 'do_train': False, 'do_eval': True, 'do_predict': False, 'eval_strategy': 'steps', 'prediction_loss_only': False, 'per_device_train_batch_size': 10, 'per_device_eval_batch_size': 8, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'eval_delay': 0, 'torch_empty_cache_steps': None, 'learning_rate': 5e-05, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 3.0, 'max_steps': 1000, 'lr_scheduler_type': 'linear', 'lr_scheduler_kwargs': {}, 'warmup_ratio': 0.0, 'warmup_steps': 0, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': '/content/drive/MyDrive/smollm-fine-tuning/trained_models/runs/Dec23_06-10-57_f534e42ab4c2', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 10, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 100, 'save_total_limit': None, 'save_safetensors': True, 'save_on_each_node': False, 'save_only_model': False, 'restore_callback_states_from_checkpoint': False, 'no_cuda': False, 'use_cpu': False, 'use_mps_device': False, 'seed': 42, 'data_seed': None, 'jit_mode_eval': False, 'use_ipex': False, 'bf16': False, 'fp16': False, 'fp16_opt_level': 'O1', 'half_precision_backend': 'auto', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 'local_rank': 0, 'ddp_backend': None, 'tpu_num_cores': None, 'tpu_metrics_debug': False, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': 50, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'past_index': -1, 'run_name': '/content/drive/MyDrive/smollm-fine-tuning/trained_models', 'disable_tqdm': False, 
'remove_unused_columns': True, 'label_names': None, 'load_best_model_at_end': False, 'metric_for_best_model': None, 'greater_is_better': None, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_min_num_params': 0, 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'fsdp_transformer_layer_cls_to_wrap': None, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'adafactor': False, 'group_by_length': False, 'length_column_name': 'length', 'report_to': ['tensorboard', 'wandb'], 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': False, 'resume_from_checkpoint': None, 'hub_model_id': 'smollm2-sft-test1', 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': None, 'hub_always_push': False, 'gradient_checkpointing': False, 'gradient_checkpointing_kwargs': None, 'include_inputs_for_metrics': False, 'include_for_metrics': [], 'eval_do_concat_batches': True, 'fp16_backend': 'auto', 'evaluation_strategy': 'steps', 'push_to_hub_model_id': None, 'push_to_hub_organization': None, 'push_to_hub_token': '<PUSH_TO_HUB_TOKEN>', 'mp_parameters': '', 'auto_find_batch_size': False, 'full_determinism': False, 'torchdynamo': None, 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'dispatch_batches': None, 'split_batches': None, 'include_tokens_per_second': False, 'include_num_input_tokens_seen': False, 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False, 'eval_on_start': False, 'use_liger_kernel': 
False, 'eval_use_gather_object': False, 'average_tokens_across_devices': False, 'dataset_text_field': 'text', 'packing': False, 'max_seq_length': 1024, 'dataset_num_proc': None, 'dataset_batch_size': 1000, 'model_init_kwargs': None, 'dataset_kwargs': {'add_special_tokens': False}, 'eval_packing': None, 'num_of_sequences': 1024, 'chars_per_token': '<CHARS_PER_TOKEN>', 'use_liger': False}
+2024-12-23 06:24:37,159 INFO    MainThread:632 [wandb_config.py:__setitem__():154] config set model/num_parameters = 134515008 - <bound method Run._config_callback of <wandb.sdk.wandb_run.Run object at 0x78e95840efe0>>
+2024-12-23 06:24:37,159 INFO    MainThread:632 [wandb_run.py:_config_callback():1279] config_cb model/num_parameters 134515008 None
+2024-12-23 06:42:14,430 INFO    MainThread:632 [jupyter.py:save_ipynb():386] not saving jupyter notebook
+2024-12-23 06:42:14,431 INFO    MainThread:632 [wandb_init.py:_pause_backend():439] pausing backend
+2024-12-23 06:58:34,000 INFO    MainThread:632 [wandb_init.py:_resume_backend():444] resuming backend
+2024-12-23 06:58:34,115 INFO    MainThread:632 [jupyter.py:save_ipynb():386] not saving jupyter notebook
+2024-12-23 06:58:34,115 INFO    MainThread:632 [wandb_init.py:_pause_backend():439] pausing backend
+2024-12-23 06:59:50,031 INFO    MainThread:632 [wandb_init.py:_resume_backend():444] resuming backend
+2024-12-23 06:59:57,291 INFO    MainThread:632 [jupyter.py:save_ipynb():386] not saving jupyter notebook
+2024-12-23 06:59:57,291 INFO    MainThread:632 [wandb_init.py:_pause_backend():439] pausing backend
+2024-12-23 07:00:55,043 INFO    MainThread:632 [wandb_init.py:_resume_backend():444] resuming backend
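
The `config_cb` entry above records the training arguments, and a few derived quantities follow from them by arithmetic. The sketch below works them out under two assumptions that the log does not confirm: training ran on a single device, and it ran for the full `max_steps` (in the Hugging Face `Trainer`, a positive `max_steps` takes precedence over `num_train_epochs`). The `linear_lr` helper is a hypothetical re-implementation of a linear decay with optional warmup, not the library's scheduler object.

```python
# Values copied from the config_cb entry in wandb/debug.log.
args = {
    "max_steps": 1000,
    "per_device_train_batch_size": 10,
    "gradient_accumulation_steps": 1,
    "logging_steps": 10,
    "eval_steps": 50,
    "save_steps": 100,
    "learning_rate": 5e-05,
    "warmup_steps": 0,
}


def linear_lr(step: int, base_lr: float, total_steps: int, warmup_steps: int = 0) -> float:
    """Linear warmup to base_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return base_lr * remaining


num_devices = 1  # assumption: single GPU
samples_seen = (
    args["max_steps"]
    * args["per_device_train_batch_size"]
    * args["gradient_accumulation_steps"]
    * num_devices
)
num_evals = args["max_steps"] // args["eval_steps"]
num_checkpoints = args["max_steps"] // args["save_steps"]
num_log_points = args["max_steps"] // args["logging_steps"]
lr_midway = linear_lr(500, args["learning_rate"], args["max_steps"], args["warmup_steps"])

print(samples_seen)      # 10000
print(num_evals)         # 20
print(num_checkpoints)   # 10
print(num_log_points)    # 100
print(lr_midway)
```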

wandb/run-20241223_062433-kk1dm8nx/files/output.log ADDED

	@@ -0,0 +1,2 @@


+Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
+cp: -r not specified; omitting directory '/content/smollm-fine-tuning/smollm2-sft-test1'
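
The second log line shows `cp` skipping the `smollm2-sft-test1` directory because `-r` was not given, so that copy did not happen. One way to make the same copy from Python is `shutil.copytree`, sketched below on temporary directories. The `copy_model_dir` helper and the `backup` destination are hypothetical; the real destination path is not recoverable from this log.

```python
import shutil
import tempfile
from pathlib import Path


def copy_model_dir(src: Path, dst: Path) -> Path:
    """Recursively copy a saved-model directory, similar to `cp -r src dst`."""
    # dirs_exist_ok=True (Python 3.8+) lets the copy merge into an existing destination.
    return Path(shutil.copytree(src, dst, dirs_exist_ok=True))


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the directory named in the log.
    src = Path(tmp) / "smollm2-sft-test1"
    src.mkdir()
    (src / "config.json").write_text("{}")

    dst = copy_model_dir(src, Path(tmp) / "backup" / "smollm2-sft-test1")
    copied = sorted(p.name for p in dst.iterdir())
    print(copied)  # ['config.json']
```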

wandb/run-20241223_062433-kk1dm8nx/files/requirements.txt ADDED

	@@ -0,0 +1,579 @@

+multiprocess==0.70.16
+fsspec==2024.9.0
+dill==0.3.8
+xxhash==3.5.0
+datasets==3.2.0
+trl==0.13.0
+google-colab==1.0.0
+colour==0.1.5
+httpimport==1.4.0
+ipyfilechooser==0.6.0
+miniKanren==1.0.3
+protobuf==4.25.5
+pycocotools==2.0.8
+cudf-cu12==24.10.1
+yarl==1.18.3
+safetensors==0.4.5
+en-core-web-sm==3.7.1
+nest-asyncio==1.6.0
+pandas-gbq==0.25.0
+Cython==3.0.11
+torchsummary==1.5.1
+weasel==0.4.1
+markdown-it-py==3.0.0
+pydantic==2.10.3
+cvxpy==1.6.0
+tables==3.10.1
+optree==0.13.1
+backcall==0.2.0
+ipykernel==5.5.6
+google-cloud-resource-manager==1.14.0
+ipyparallel==8.8.0
+pickleshare==0.7.5
+scipy==1.13.1
+prettytable==3.12.0
+ml-dtypes==0.4.1
+multidict==6.1.0
+grpcio-status==1.62.3
+nx-cugraph-cu12==24.10.0
+cloudpickle==3.1.0
+websocket-client==1.8.0
+pillow==11.0.0
+GitPython==3.1.43
+exceptiongroup==1.2.2
+propcache==0.2.1
+tensorflow-probability==0.24.0
+patsy==1.0.1
+traitlets==5.7.1
+gitdb==4.0.11
+mistune==3.0.2
+nltk==3.9.1
+alabaster==1.0.0
+h5netcdf==1.4.1
+pytest==8.3.4
+blinker==1.9.0
+language_data==1.3.0
+cupy-cuda12x==12.2.0
+librosa==0.10.2.post1
+entrypoints==0.4
+tornado==6.3.3
+traittypes==0.2.1
+ipyleaflet==0.19.2
+iniconfig==2.0.0
+blis==0.7.11
+cmake==3.31.2
+datascience==0.17.6
+xyzservices==2024.9.0
+Markdown==3.7
+pymc==5.19.1
+async-timeout==4.0.3
+decorator==4.4.2
+google-api-core==2.19.2
+argon2-cffi-bindings==21.2.0
+fastcore==1.7.27
+h11==0.14.0
+more-itertools==10.5.0
+terminado==0.18.1
+fastprogress==1.0.3
+sphinxcontrib-htmlhelp==2.1.0
+tensorflow==2.17.1
+blosc2==2.7.1
+jupyter_core==5.7.2
+jaxlib==0.4.33
+python-box==7.3.0
+itsdangerous==2.2.0
+community==1.0.0b1
+imageio==2.36.1
+joblib==1.4.2
+platformdirs==4.3.6
+pandas-stubs==2.2.2.240909
+pygit2==1.16.0
+mlxtend==0.23.3
+lightgbm==4.5.0
+tbb==2022.0.0
+wasabi==1.1.3
+jsonpatch==1.33
+packaging==24.2
+nvidia-nvjitlink-cu12==12.6.85
+Bottleneck==1.4.2
+peewee==3.17.8
+pyparsing==3.2.0
+httplib2==0.22.0
+moviepy==1.0.3
+sniffio==1.3.1
+pyviz_comms==3.0.3
+jiter==0.8.2
+array_record==0.5.1
+google-cloud-core==2.4.1
+attrs==24.3.0
+progressbar2==4.5.0
+colorcet==3.1.0
+PyYAML==6.0.2
+pydantic_core==2.27.1
+google-cloud-aiplatform==1.74.0
+sphinxcontrib-devhelp==2.0.0
+google-cloud-bigquery-storage==2.27.0
+opentelemetry-sdk==1.29.0
+gast==0.6.0
+pyzmq==24.0.1
+uritemplate==4.1.1
+pytz==2024.2
+pydotplus==2.0.2
+nvtx==0.2.10
+tensorboard==2.17.1
+missingno==0.5.2
+flatbuffers==24.3.25
+cons==0.4.6
+libclang==18.1.1
+defusedxml==0.7.1
+jellyfish==1.1.0
+grpcio==1.68.1
+argon2-cffi==23.1.0
+linkify-it-py==2.0.3
+rsa==4.9
+psycopg2==2.9.10
+MarkupSafe==3.0.2
+uc-micro-py==1.0.3
+typing_extensions==4.12.2
+ipython==7.34.0
+pyarrow==17.0.0
+toolz==0.12.1
+jupyterlab_pygments==0.3.0
+osqp==0.6.7.post3
+huggingface-hub==0.27.0
+opentelemetry-semantic-conventions==0.50b0
+cloudpathlib==0.20.0
+msgpack==1.1.0
+google-cloud-bigtable==2.27.0
+rich==13.9.4
+cachetools==5.5.0
+editdistance==0.8.1
+regex==2024.11.6
+param==2.2.0
+cffi==1.17.1
+google==2.0.3
+promise==2.3
+hyperopt==0.2.7
+python-slugify==8.0.4
+astunparse==1.6.3
+pyperclip==1.9.0
+tensorstore==0.1.71
+aiohttp==3.11.10
+altair==5.5.0
+catalogue==2.0.10
+astropy==6.1.7
+sphinxcontrib-qthelp==2.0.0
+GDAL==3.6.4
+google-generativeai==0.8.3
+networkx==3.4.2
+pyasn1==0.6.1
+sqlglot==25.1.0
+gensim==4.3.3
+albumentations==1.4.20
+CacheControl==0.14.1
+ipywidgets==7.7.1
+toml==0.10.2
+annotated-types==0.7.0
+yfinance==0.2.50
+googledrivedownloader==0.4
+srsly==2.5.0
+proto-plus==1.25.0
+tabulate==0.9.0
+fastai==2.7.18
+Send2Trash==1.8.3
+dask==2024.10.0
+jsonpickle==4.0.1
+seaborn==0.13.2
+setproctitle==1.3.4
+referencing==0.35.1
+music21==9.3.0
+xarray-einstats==0.8.0
+astropy-iers-data==0.2024.12.16.0.35.48
+chardet==5.2.0
+wrapt==1.17.0
+mdurl==0.1.2
+openai==1.57.4
+google-resumable-media==2.7.2
+geopy==2.4.1
+plotnine==0.14.4
+statsmodels==0.14.4
+google-crc32c==1.6.0
+scs==3.2.7
+Pyomo==6.8.2
+keras==3.5.0
+gspread-dataframe==3.3.1
+notebook_shim==0.2.4
+langchain-text-splitters==0.3.3
+oauthlib==3.2.2
+tcmlib==1.2.0
+tifffile==2024.12.12
+cmdstanpy==1.2.5
+diffusers==0.31.0
+aiohappyeyeballs==2.4.4
+autograd==1.7.0
+lazy_loader==0.4
+graphviz==0.20.3
+nvidia-nccl-cu12==2.23.4
+pydot==3.0.3
+tf_keras==2.17.0
+bqplot==0.12.43
+torchaudio==2.5.1+cu121
+kagglehub==0.3.5
+imgaug==0.4.0
+nvidia-curand-cu12==10.3.7.77
+cymem==2.0.10
+glob2==0.7
+eerepr==0.0.4
+yellowbrick==1.5
+umf==0.9.1
+PyDrive==1.3.1
+langsmith==0.2.3
+ratelim==0.1.6
+importlib_resources==6.4.5
+einops==0.8.0
+peft==0.14.0
+langchain-core==0.3.25
+cycler==0.12.1
+html5lib==1.1
+smart-open==7.1.0
+ply==3.11
+sphinxcontrib-serializinghtml==2.0.0
+simple-parsing==0.1.6
+smmap==5.0.1
+tzdata==2024.2
+libcudf-cu12==24.10.1
+dopamine_rl==4.1.0
+zipp==3.21.0
+imageio-ffmpeg==0.5.1
+wcwidth==0.2.13
+text-unidecode==1.3
+orbax-checkpoint==0.6.4
+et_xmlfile==2.0.0
+frozenlist==1.5.0
+google-cloud-pubsub==2.27.1
+marisa-trie==1.2.1
+db-dtypes==1.3.1
+nvidia-cuda-cupti-cu12==12.6.80
+pexpect==4.9.0
+psutil==5.9.5
+google-cloud-language==2.16.0
+opentelemetry-api==1.29.0
+SQLAlchemy==2.0.36
+soupsieve==2.6
+Sphinx==8.1.3
+pyogrio==0.10.0
+qdldl==0.1.7.post4
+branca==0.8.1
+oauth2client==4.1.3
+google-auth==2.27.0
+google-cloud-functions==1.19.0
+tinycss2==1.4.0
+jupyter-server==1.24.0
+scikit-learn==1.6.0
+jsonschema-specifications==2024.10.1
+ndindex==1.9.2
+geographiclib==2.0
+Jinja2==3.1.4
+googleapis-common-protos==1.66.0
+urllib3==2.2.3
+opencv-python-headless==4.10.0.84
+google-cloud-bigquery==3.25.0
+transformers==4.47.1
+wandb==0.19.1
+torch==2.5.1+cu121
+pymystem3==0.2.0
+pyOpenSSL==24.2.1
+stringzilla==3.11.1
+numba==0.60.0
+intel-cmplr-lib-ur==2025.0.4
+polars==1.9.0
+tweepy==4.14.0
+plotly==5.24.1
+nvidia-cuda-runtime-cu12==12.6.77
+narwhals==1.18.4
+widgetsnbextension==3.6.10
+PyJWT==2.10.1
+etils==1.11.0
+proglog==0.1.10
+tomli==2.2.1
+rpds-py==0.22.3
+google-cloud-iam==2.17.0
+immutabledict==4.2.1
+portpicker==1.5.2
+nvidia-cuda-nvcc-cu12==12.6.85
+nbclassic==1.1.0
+llvmlite==0.43.0
+importlib_metadata==8.5.0
+rmm-cu12==24.10.0
+tzlocal==5.2
+nbconvert==7.16.4
+mizani==0.13.1
+shapely==2.0.6
+python-louvain==0.16
+dm-tree==0.1.8
+opencv-contrib-python==4.10.0.84
+ipython-sql==0.5.0
+jsonschema==4.23.0
+google-auth-oauthlib==1.2.1
+webencodings==0.5.1
+spacy==3.7.5
+prompt_toolkit==3.0.48
+slicer==0.0.8
+fonttools==4.55.3
+filelock==3.16.1
+etuples==0.3.9
+google-ai-generativelanguage==0.6.10
+nvidia-cusolver-cu12==11.7.1.2
+mpmath==1.3.0
+prophet==1.1.6
+click==8.1.7
+pyasn1_modules==0.4.1
+bleach==6.2.0
+pandas-datareader==0.10.0
+confection==0.1.5
+namex==0.0.8
+websockets==14.1
+duckdb==1.1.3
+typeguard==4.4.1
+httpx==0.28.1
+notebook==6.5.5
+pygame==2.6.1
+google-cloud-datastore==2.20.2
+requests-oauthlib==1.3.1
+nibabel==5.3.2
+timm==1.0.12
+prometheus_client==0.21.1
+debugpy==1.8.0
+natsort==8.4.0
+ipytree==0.2.2
+partd==1.4.2
+sentry-sdk==2.19.2
+future==1.0.0
+tokenizers==0.21.0
+jsonpointer==3.0.0
+accelerate==1.2.1
+jupyter-console==6.1.0
+kiwisolver==1.4.7
+geopandas==1.0.1
+easydict==1.13
+StrEnum==0.4.15
+absl-py==1.4.0
+lxml==5.3.0
+tqdm==4.67.1
+jupyter-leaflet==0.19.2
+spacy-legacy==3.0.12
+requests-toolbelt==1.0.0
+multipledispatch==1.0.0
+gcsfs==2024.10.0
+docstring_parser==0.16
+sentence-transformers==3.3.1
+ipython-genutils==0.2.0
+spacy-loggers==1.0.5
+python-dateutil==2.8.2
+thinc==8.2.5
+gspread==6.0.2
+mkl==2025.0.1
+google-cloud-translate==3.19.0
+Deprecated==1.2.15
+aiosignal==1.3.2
+bigframes==1.29.0
+opencv-python==4.10.0.84
+intel-openmp==2025.0.4
+google-auth-httplib2==0.2.0
+vega-datasets==0.9.0
+orjson==3.10.12
+dlib==19.24.2
+tf-slim==1.1.0
+py4j==0.10.9.7
+locket==1.0.0
+charset-normalizer==3.4.0
+beautifulsoup4==4.12.3
+parso==0.8.4
+sphinxcontrib-applehelp==2.0.0
+pyspark==3.5.3
+textblob==0.17.1
+pynvjitlink-cu12==0.4.0
+sqlparse==0.5.3
+langchain==0.3.12
+sphinxcontrib-jsmath==1.0.1
+holidays==0.63
+jieba==0.42.1
+rpy2==3.4.2
+google-genai==0.3.0
+PySocks==1.7.1
+pyshp==2.3.1
+tensorflow-metadata==1.13.1
+pydata-google-auth==1.9.0
+logical-unification==0.4.6
+sklearn-pandas==2.2.0
+soundfile==0.12.1
+holoviews==1.20.0
+parsy==2.1
+geocoder==1.38.1
+matplotlib==3.8.0
+cuda-python==12.2.1
+imbalanced-learn==0.12.4
+ibis-framework==9.2.0
+numexpr==2.10.2
+nbformat==5.10.4
+multitasking==0.0.11
+openpyxl==3.1.5
+nvidia-cufft-cu12==11.3.0.4
+nbclient==0.10.1
+nvidia-cusparse-cu12==12.5.4.2
+colorlover==0.3.0
+shellingham==1.5.4
+jax==0.4.33
+tensorflow-io-gcs-filesystem==0.37.1
+types-pytz==2024.2.0.20241003
+python-utils==3.9.1
+xlrd==2.0.1
+mdit-py-plugins==0.4.2
+google-cloud-storage==2.19.0
+google-cloud-firestore==2.19.0
+certifi==2024.12.14
+Pygments==2.18.0
+fastjsonschema==2.21.1
+httpcore==1.0.7
+snowballstemmer==2.2.0
+cryptography==43.0.3
+tensorflow-hub==0.16.1
+pandas==2.2.2
+optax==0.2.4
+scikit-image==0.25.0
+imutils==0.5.4
+tenacity==9.0.0
+langcodes==3.5.0
+gym==0.25.2
+termcolor==2.5.0
+kaggle==1.6.17
+pyproj==3.7.0
+preshed==3.0.9
+nvidia-cublas-cu12==12.6.4.1
+fastrlock==0.8.3
+anyio==3.7.1
+gdown==5.2.0
+jax-cuda12-plugin==0.4.33
+clarabel==0.9.0
+matplotlib-inline==0.1.7
+torchvision==0.20.1+cu121
+gym-notices==0.0.8
+jax-cuda12-pjrt==0.4.33
+wheel==0.45.1
+tensorboard-data-server==0.7.2
+pooch==1.8.2
+imagesize==1.4.1
+pandocfilters==1.5.1
+h5py==3.12.1
+geemap==0.35.1
+pycparser==2.22
+contourpy==1.3.1
+babel==2.16.0
+matplotlib-venn==1.1.1
+PyOpenGL==3.1.7
+xarray==2024.11.0
+numpy==1.26.4
+grpc-google-iam-v1==0.13.1
+jupyterlab_widgets==3.0.13
+typer==0.15.1
+idna==3.10
+google-pasta==0.2.0
+greenlet==3.1.1
+cufflinks==0.17.3
+bigquery-magics==0.4.0
+sentencepiece==0.2.0
+wordcloud==1.9.4
+docker-pycreds==0.4.0
+murmurhash==1.0.11
+atpublic==4.1.0
+docutils==0.21.2
+earthengine-api==1.4.3
+pyerfa==2.0.1.5
+gin-config==0.5.0
+google-cloud-bigquery-connection==1.17.0
+six==1.17.0
+arviz==0.20.0
+jupyter-client==6.1.12
+folium==0.19.2
+webcolors==24.11.1
+pluggy==1.5.0
+eval_type_backport==0.2.0
+chex==0.1.88
+xgboost==2.1.3
+bokeh==3.6.2
+soxr==0.5.0.post1
+ipyevents==2.0.2
+Flask==3.1.0
+albucore==0.0.19
+distro==1.9.0
+threadpoolctl==3.5.0
+frozendict==2.4.6
+inflect==7.4.0
+fastdownload==0.0.7
+nvidia-cudnn-cu12==9.6.0.74
+sympy==1.13.1
+pylibraft-cu12==24.10.0
+requests==2.32.3
+opt_einsum==3.4.0
+google-api-python-client==2.155.0
+tensorflow-datasets==4.9.7
+Werkzeug==3.1.3
+pylibcudf-cu12==24.10.1
+py-cpuinfo==9.0.0
+pylibcugraph-cu12==24.10.0
+pytensor==2.26.4
+PyDrive2==1.21.3
+audioread==3.0.1
+pathlib==1.0.1
+stanio==0.5.1
+firebase-admin==6.6.0
+cvxopt==1.3.2
+shap==0.46.0
+humanize==4.11.0
+ptyprocess==0.7.0
+panel==1.5.4
+flax==0.8.5
+scooby==0.10.0
+python-apt==0.0.0
+requirements-parser==0.9.0
+pip==24.1.2
+setuptools==75.1.0
+types-setuptools==75.6.0.20241126
+cryptography==3.4.8
+lazr.uri==1.0.6
+importlib-metadata==4.6.4
+distro==1.7.0
+pyparsing==2.4.7
+wadllib==1.3.6
+python-apt==2.4.0+ubuntu4
+httplib2==0.20.2
+PyGObject==3.42.1
+blinker==1.4
+oauthlib==3.2.0
+more-itertools==8.10.0
+SecretStorage==3.3.1
+PyJWT==2.3.0
+lazr.restfulclient==0.14.4
+six==1.16.0
+jeepney==0.7.1
+dbus-python==1.2.18
+keyring==23.5.0
+zipp==1.0.0
+launchpadlib==1.10.16
+wheel==0.43.0
+inflect==7.3.1
+backports.tarfile==1.2.0
+jaraco.context==5.3.0
+typing_extensions==4.12.2
+importlib_resources==6.4.0
+zipp==3.19.2
+jaraco.text==3.12.1
+jaraco.collections==5.1.0
+typeguard==4.3.0
+jaraco.functools==4.0.1
+platformdirs==4.2.2
+autocommand==2.2.2
+more-itertools==10.3.0
+importlib_metadata==8.0.0
+packaging==24.1
+tomli==2.0.1

wandb/run-20241223_062433-kk1dm8nx/files/wandb-metadata.json ADDED

	@@ -0,0 +1,41 @@

+{
+  "os":  "Linux-6.1.85+-x86_64-with-glibc2.35",
+  "python":  "CPython 3.10.12",
+  "startedAt":  "2024-12-23T06:24:33.401066Z",
+  "program":  "sft_smollm.ipynb",
+  "git":  {
+    "remote":  "https://github.com/farrukh602/smollm-fine-tuning",
+    "commit":  "f060c88b98a1447662f698d1c81d063f9f2e3b9a"
+  },
+  "email":  "farrukhmehmood.nts@iub.edu.pk",
+  "root":  "/content/smollm-fine-tuning",
+  "host":  "f534e42ab4c2",
+  "executable":  "/usr/bin/python3",
+  "colab":  "https://colab.research.google.com/notebook#fileId=1Q8h1M6Pw6nkQ1buMh1fXAdKupG98exLe",
+  "cpu_count":  1,
+  "cpu_count_logical":  2,
+  "gpu":  "Tesla T4",
+  "gpu_count":  1,
+  "disk":  {
+    "/":  {
+      "total":  "120942624768",
+      "used":  "35434274816"
+    }
+  },
+  "memory":  {
+    "total":  "13609431040"
+  },
+  "cpu":  {
+    "count":  1,
+    "countLogical":  2
+  },
+  "gpu_nvidia":  [
+    {
+      "name":  "Tesla T4",
+      "memoryTotal":  "16106127360",
+      "cudaCores":  2560,
+      "architecture":  "Turing"
+    }
+  ],
+  "cudaVersion":  "12.2"
+}
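
The byte counts in this metadata file are recorded as decimal strings rather than numbers. The following is a minimal sketch, not part of the commit, of how they could be converted to GiB for easier reading. The helper name `to_gib` is hypothetical, and the `metadata` dict is a hand-copied subset of the values shown above.

```python
# Hypothetical helper for illustration; it is not from the repository.
GIB = 1024 ** 3


def to_gib(byte_count: str) -> float:
    """Convert a byte count stored as a decimal string to GiB."""
    return int(byte_count) / GIB


# Subset of wandb-metadata.json, copied from the diff above.
metadata = {
    "disk": {"/": {"total": "120942624768", "used": "35434274816"}},
    "memory": {"total": "13609431040"},
    "gpu_nvidia": [{"name": "Tesla T4", "memoryTotal": "16106127360"}],
}

gpu = metadata["gpu_nvidia"][0]
root_disk = metadata["disk"]["/"]

print(f"{gpu['name']} memory: {to_gib(gpu['memoryTotal']):.1f} GiB")
print(f"Host RAM: {to_gib(metadata['memory']['total']):.1f} GiB")
print(f"Disk used: {to_gib(root_disk['used']):.1f} of {to_gib(root_disk['total']):.1f} GiB")
```

Reading the values with `int()` first keeps the conversion exact until the final division.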

wandb/run-20241223_062433-kk1dm8nx/logs/debug-core.log ADDED

	@@ -0,0 +1,7 @@

+{"time":"2024-12-23T06:12:50.975330129Z","level":"INFO","msg":"started logging, with flags","port-filename":"/tmp/tmpuo6nfebj/port-632.txt","pid":632,"debug":false,"disable-analytics":false}
+{"time":"2024-12-23T06:12:50.975390635Z","level":"INFO","msg":"FeatureState","shutdownOnParentExitEnabled":false}
+{"time":"2024-12-23T06:12:50.983185882Z","level":"INFO","msg":"Will exit if parent process dies.","ppid":632}
+{"time":"2024-12-23T06:12:50.983291852Z","level":"INFO","msg":"server is running","addr":{"IP":"127.0.0.1","Port":39815,"Zone":""}}
+{"time":"2024-12-23T06:12:51.175371695Z","level":"INFO","msg":"connection: ManageConnectionData: new connection created","id":"127.0.0.1:43168"}
+{"time":"2024-12-23T06:24:33.40327228Z","level":"INFO","msg":"handleInformInit: received","streamId":"kk1dm8nx","id":"127.0.0.1:43168"}
+{"time":"2024-12-23T06:24:33.515538151Z","level":"INFO","msg":"handleInformInit: stream started","streamId":"kk1dm8nx","id":"127.0.0.1:43168"}

wandb/run-20241223_062433-kk1dm8nx/logs/debug-internal.log ADDED

	@@ -0,0 +1,14 @@

+{"time":"2024-12-23T06:24:33.403481622Z","level":"INFO","msg":"using version","core version":"0.19.1"}
+{"time":"2024-12-23T06:24:33.403525688Z","level":"INFO","msg":"created symlink","path":"/content/smollm-fine-tuning/wandb/run-20241223_062433-kk1dm8nx/logs/debug-core.log"}
+{"time":"2024-12-23T06:24:33.515469919Z","level":"INFO","msg":"created new stream","id":"kk1dm8nx"}
+{"time":"2024-12-23T06:24:33.515527717Z","level":"INFO","msg":"stream: started","id":"kk1dm8nx"}
+{"time":"2024-12-23T06:24:33.515537628Z","level":"INFO","msg":"writer: Do: started","stream_id":"kk1dm8nx"}
+{"time":"2024-12-23T06:24:33.515733694Z","level":"INFO","msg":"handler: started","stream_id":"kk1dm8nx"}
+{"time":"2024-12-23T06:24:33.515756662Z","level":"INFO","msg":"sender: started","stream_id":"kk1dm8nx"}
+{"time":"2024-12-23T06:24:36.718018301Z","level":"INFO","msg":"Starting system monitor"}
+{"time":"2024-12-23T06:42:14.432322766Z","level":"INFO","msg":"Pausing system monitor"}
+{"time":"2024-12-23T06:58:34.001396346Z","level":"INFO","msg":"Resuming system monitor"}
+{"time":"2024-12-23T06:58:34.115544844Z","level":"INFO","msg":"Pausing system monitor"}
+{"time":"2024-12-23T06:59:50.035860315Z","level":"INFO","msg":"Resuming system monitor"}
+{"time":"2024-12-23T06:59:57.29236364Z","level":"INFO","msg":"Pausing system monitor"}
+{"time":"2024-12-23T07:00:55.04796864Z","level":"INFO","msg":"Resuming system monitor"}
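
The pause and resume entries in this log show how long the system monitor was idle between notebook cell executions. The following is a rough sketch, not part of the commit, of how those gaps could be measured. The function names `parse_ts` and `paused_seconds` are hypothetical, and the log lines are copied from the diff above without the leading `+`. The timestamps carry up to nine fractional digits, so the sketch truncates them to microseconds before parsing.

```python
import json
from datetime import datetime, timezone


def parse_ts(ts: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp, truncating fractions to microseconds."""
    base, _, frac = ts.rstrip("Z").partition(".")
    frac = (frac + "000000")[:6]  # pad or truncate to exactly six digits
    parsed = datetime.strptime(f"{base}.{frac}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


def paused_seconds(lines):
    """Return the length in seconds of each pause/resume interval."""
    durations = []
    paused_at = None
    for line in lines:
        record = json.loads(line)
        when = parse_ts(record["time"])
        if record["msg"] == "Pausing system monitor":
            paused_at = when
        elif record["msg"] == "Resuming system monitor" and paused_at is not None:
            durations.append((when - paused_at).total_seconds())
            paused_at = None
    return durations


# Entries copied from debug-internal.log above.
log_lines = [
    '{"time":"2024-12-23T06:42:14.432322766Z","level":"INFO","msg":"Pausing system monitor"}',
    '{"time":"2024-12-23T06:58:34.001396346Z","level":"INFO","msg":"Resuming system monitor"}',
    '{"time":"2024-12-23T06:58:34.115544844Z","level":"INFO","msg":"Pausing system monitor"}',
    '{"time":"2024-12-23T06:59:50.035860315Z","level":"INFO","msg":"Resuming system monitor"}',
    '{"time":"2024-12-23T06:59:57.29236364Z","level":"INFO","msg":"Pausing system monitor"}',
    '{"time":"2024-12-23T07:00:55.04796864Z","level":"INFO","msg":"Resuming system monitor"}',
]

for seconds in paused_seconds(log_lines):
    print(f"paused for {seconds:.1f} s")
```

Truncating rather than rounding the fractional part loses at most one microsecond per timestamp, which is negligible for intervals measured in seconds.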

wandb/run-20241223_062433-kk1dm8nx/logs/debug.log ADDED

	@@ -0,0 +1,36 @@

+2024-12-23 06:24:33,393 INFO    MainThread:632 [wandb_setup.py:_flush():68] Current SDK version is 0.19.1
+2024-12-23 06:24:33,393 INFO    MainThread:632 [wandb_setup.py:_flush():68] Configure stats pid to 632
+2024-12-23 06:24:33,393 INFO    MainThread:632 [wandb_setup.py:_flush():68] Loading settings from /root/.config/wandb/settings
+2024-12-23 06:24:33,393 INFO    MainThread:632 [wandb_setup.py:_flush():68] Loading settings from /content/smollm-fine-tuning/wandb/settings
+2024-12-23 06:24:33,393 INFO    MainThread:632 [wandb_setup.py:_flush():68] Loading settings from environment variables
+2024-12-23 06:24:33,393 INFO    MainThread:632 [wandb_init.py:_log_setup():528] Logging user logs to /content/smollm-fine-tuning/wandb/run-20241223_062433-kk1dm8nx/logs/debug.log
+2024-12-23 06:24:33,393 INFO    MainThread:632 [wandb_init.py:_log_setup():529] Logging internal logs to /content/smollm-fine-tuning/wandb/run-20241223_062433-kk1dm8nx/logs/debug-internal.log
+2024-12-23 06:24:33,393 INFO    MainThread:632 [wandb_init.py:_jupyter_setup():474] configuring jupyter hooks <wandb.sdk.wandb_init._WandbInit object at 0x78e9883258d0>
+2024-12-23 06:24:33,394 INFO    MainThread:632 [wandb_init.py:init():644] calling init triggers
+2024-12-23 06:24:33,394 INFO    MainThread:632 [wandb_init.py:init():650] wandb.init called with sweep_config: {}
+config: {}
+2024-12-23 06:24:33,394 INFO    MainThread:632 [wandb_init.py:init():680] starting backend
+2024-12-23 06:24:33,394 INFO    MainThread:632 [wandb_init.py:init():684] sending inform_init request
+2024-12-23 06:24:33,400 INFO    MainThread:632 [backend.py:_multiprocessing_setup():104] multiprocessing start_methods=fork,spawn,forkserver, using: spawn
+2024-12-23 06:24:33,400 INFO    MainThread:632 [wandb_init.py:init():697] backend started and connected
+2024-12-23 06:24:33,414 INFO    MainThread:632 [wandb_run.py:_label_probe_notebook():1222] probe notebook
+2024-12-23 06:24:36,588 INFO    MainThread:632 [wandb_init.py:init():790] updated telemetry
+2024-12-23 06:24:36,594 INFO    MainThread:632 [wandb_init.py:init():822] communicating run to backend with 90.0 second timeout
+2024-12-23 06:24:36,712 INFO    MainThread:632 [wandb_init.py:init():874] starting run threads in backend
+2024-12-23 06:24:37,146 INFO    MainThread:632 [wandb_run.py:_console_start():2374] atexit reg
+2024-12-23 06:24:37,147 INFO    MainThread:632 [wandb_run.py:_redirect():2224] redirect: wrap_raw
+2024-12-23 06:24:37,147 INFO    MainThread:632 [wandb_run.py:_redirect():2289] Wrapping output streams.
+2024-12-23 06:24:37,147 INFO    MainThread:632 [wandb_run.py:_redirect():2314] Redirects installed.
+2024-12-23 06:24:37,152 INFO    MainThread:632 [wandb_init.py:init():916] run started, returning control to user process
+2024-12-23 06:24:37,156 INFO    MainThread:632 [wandb_run.py:_config_callback():1279] config_cb None None {'vocab_size': 49152, 'max_position_embeddings': 8192, 'hidden_size': 576, 'intermediate_size': 1536, 'num_hidden_layers': 30, 'num_attention_heads': 9, 'num_key_value_heads': 3, 'hidden_act': 'silu', 'initializer_range': 0.041666666666666664, 'rms_norm_eps': 1e-05, 'pretraining_tp': 1, 'use_cache': True, 'rope_theta': 100000, 'rope_scaling': None, 'attention_bias': False, 'attention_dropout': 0.0, 'mlp_bias': False, 'head_dim': 64, 'return_dict': True, 'output_hidden_states': False, 'output_attentions': False, 'torchscript': False, 'torch_dtype': 'bfloat16', 'use_bfloat16': False, 'tf_legacy_loss': False, 'pruned_heads': {}, 'tie_word_embeddings': True, 'chunk_size_feed_forward': 0, 'is_encoder_decoder': False, 'is_decoder': False, 'cross_attention_hidden_size': None, 'add_cross_attention': False, 'tie_encoder_decoder': False, 'max_length': 20, 'min_length': 0, 'do_sample': False, 'early_stopping': False, 'num_beams': 1, 'num_beam_groups': 1, 'diversity_penalty': 0.0, 'temperature': 1.0, 'top_k': 50, 'top_p': 1.0, 'typical_p': 1.0, 'repetition_penalty': 1.0, 'length_penalty': 1.0, 'no_repeat_ngram_size': 0, 'encoder_no_repeat_ngram_size': 0, 'bad_words_ids': None, 'num_return_sequences': 1, 'output_scores': False, 'return_dict_in_generate': False, 'forced_bos_token_id': None, 'forced_eos_token_id': None, 'remove_invalid_values': False, 'exponential_decay_length_penalty': None, 'suppress_tokens': None, 'begin_suppress_tokens': None, 'architectures': ['LlamaForCausalLM'], 'finetuning_task': None, 'id2label': {0: 'LABEL_0', 1: 'LABEL_1'}, 'label2id': {'LABEL_0': 0, 'LABEL_1': 1}, 'tokenizer_class': None, 'prefix': None, 'bos_token_id': 1, 'pad_token_id': 2, 'eos_token_id': 2, 'sep_token_id': None, 'decoder_start_token_id': None, 'task_specific_params': None, 'problem_type': None, '_name_or_path': 'HuggingFaceTB/SmolLM2-135M', '_attn_implementation_autoset': 
True, 'transformers_version': '4.47.1', 'is_llama_config': True, 'model_type': 'llama', 'rope_interleaved': False, 'output_dir': '/content/drive/MyDrive/smollm-fine-tuning/trained_models', 'overwrite_output_dir': False, 'do_train': False, 'do_eval': True, 'do_predict': False, 'eval_strategy': 'steps', 'prediction_loss_only': False, 'per_device_train_batch_size': 10, 'per_device_eval_batch_size': 8, 'per_gpu_train_batch_size': None, 'per_gpu_eval_batch_size': None, 'gradient_accumulation_steps': 1, 'eval_accumulation_steps': None, 'eval_delay': 0, 'torch_empty_cache_steps': None, 'learning_rate': 5e-05, 'weight_decay': 0.0, 'adam_beta1': 0.9, 'adam_beta2': 0.999, 'adam_epsilon': 1e-08, 'max_grad_norm': 1.0, 'num_train_epochs': 3.0, 'max_steps': 1000, 'lr_scheduler_type': 'linear', 'lr_scheduler_kwargs': {}, 'warmup_ratio': 0.0, 'warmup_steps': 0, 'log_level': 'passive', 'log_level_replica': 'warning', 'log_on_each_node': True, 'logging_dir': '/content/drive/MyDrive/smollm-fine-tuning/trained_models/runs/Dec23_06-10-57_f534e42ab4c2', 'logging_strategy': 'steps', 'logging_first_step': False, 'logging_steps': 10, 'logging_nan_inf_filter': True, 'save_strategy': 'steps', 'save_steps': 100, 'save_total_limit': None, 'save_safetensors': True, 'save_on_each_node': False, 'save_only_model': False, 'restore_callback_states_from_checkpoint': False, 'no_cuda': False, 'use_cpu': False, 'use_mps_device': False, 'seed': 42, 'data_seed': None, 'jit_mode_eval': False, 'use_ipex': False, 'bf16': False, 'fp16': False, 'fp16_opt_level': 'O1', 'half_precision_backend': 'auto', 'bf16_full_eval': False, 'fp16_full_eval': False, 'tf32': None, 'local_rank': 0, 'ddp_backend': None, 'tpu_num_cores': None, 'tpu_metrics_debug': False, 'debug': [], 'dataloader_drop_last': False, 'eval_steps': 50, 'dataloader_num_workers': 0, 'dataloader_prefetch_factor': None, 'past_index': -1, 'run_name': '/content/drive/MyDrive/smollm-fine-tuning/trained_models', 'disable_tqdm': False, 
'remove_unused_columns': True, 'label_names': None, 'load_best_model_at_end': False, 'metric_for_best_model': None, 'greater_is_better': None, 'ignore_data_skip': False, 'fsdp': [], 'fsdp_min_num_params': 0, 'fsdp_config': {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, 'fsdp_transformer_layer_cls_to_wrap': None, 'accelerator_config': {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}, 'deepspeed': None, 'label_smoothing_factor': 0.0, 'optim': 'adamw_torch', 'optim_args': None, 'adafactor': False, 'group_by_length': False, 'length_column_name': 'length', 'report_to': ['tensorboard', 'wandb'], 'ddp_find_unused_parameters': None, 'ddp_bucket_cap_mb': None, 'ddp_broadcast_buffers': None, 'dataloader_pin_memory': True, 'dataloader_persistent_workers': False, 'skip_memory_metrics': True, 'use_legacy_prediction_loop': False, 'push_to_hub': False, 'resume_from_checkpoint': None, 'hub_model_id': 'smollm2-sft-test1', 'hub_strategy': 'every_save', 'hub_token': '<HUB_TOKEN>', 'hub_private_repo': None, 'hub_always_push': False, 'gradient_checkpointing': False, 'gradient_checkpointing_kwargs': None, 'include_inputs_for_metrics': False, 'include_for_metrics': [], 'eval_do_concat_batches': True, 'fp16_backend': 'auto', 'evaluation_strategy': 'steps', 'push_to_hub_model_id': None, 'push_to_hub_organization': None, 'push_to_hub_token': '<PUSH_TO_HUB_TOKEN>', 'mp_parameters': '', 'auto_find_batch_size': False, 'full_determinism': False, 'torchdynamo': None, 'ray_scope': 'last', 'ddp_timeout': 1800, 'torch_compile': False, 'torch_compile_backend': None, 'torch_compile_mode': None, 'dispatch_batches': None, 'split_batches': None, 'include_tokens_per_second': False, 'include_num_input_tokens_seen': False, 'neftune_noise_alpha': None, 'optim_target_modules': None, 'batch_eval_metrics': False, 'eval_on_start': False, 'use_liger_kernel': 
False, 'eval_use_gather_object': False, 'average_tokens_across_devices': False, 'dataset_text_field': 'text', 'packing': False, 'max_seq_length': 1024, 'dataset_num_proc': None, 'dataset_batch_size': 1000, 'model_init_kwargs': None, 'dataset_kwargs': {'add_special_tokens': False}, 'eval_packing': None, 'num_of_sequences': 1024, 'chars_per_token': '<CHARS_PER_TOKEN>', 'use_liger': False}
+2024-12-23 06:24:37,159 INFO    MainThread:632 [wandb_config.py:__setitem__():154] config set model/num_parameters = 134515008 - <bound method Run._config_callback of <wandb.sdk.wandb_run.Run object at 0x78e95840efe0>>
+2024-12-23 06:24:37,159 INFO    MainThread:632 [wandb_run.py:_config_callback():1279] config_cb model/num_parameters 134515008 None
+2024-12-23 06:42:14,430 INFO    MainThread:632 [jupyter.py:save_ipynb():386] not saving jupyter notebook
+2024-12-23 06:42:14,431 INFO    MainThread:632 [wandb_init.py:_pause_backend():439] pausing backend
+2024-12-23 06:58:34,000 INFO    MainThread:632 [wandb_init.py:_resume_backend():444] resuming backend
+2024-12-23 06:58:34,115 INFO    MainThread:632 [jupyter.py:save_ipynb():386] not saving jupyter notebook
+2024-12-23 06:58:34,115 INFO    MainThread:632 [wandb_init.py:_pause_backend():439] pausing backend
+2024-12-23 06:59:50,031 INFO    MainThread:632 [wandb_init.py:_resume_backend():444] resuming backend
+2024-12-23 06:59:57,291 INFO    MainThread:632 [jupyter.py:save_ipynb():386] not saving jupyter notebook
+2024-12-23 06:59:57,291 INFO    MainThread:632 [wandb_init.py:_pause_backend():439] pausing backend
+2024-12-23 07:00:55,043 INFO    MainThread:632 [wandb_init.py:_resume_backend():444] resuming backend

wandb/run-20241223_062433-kk1dm8nx/run-kk1dm8nx.wandb ADDED

Binary file (164 kB).