Instructions to use CMU-AIR2/code-ctrl-gh-mixture-240419 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use CMU-AIR2/code-ctrl-gh-mixture-240419 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="CMU-AIR2/code-ctrl-gh-mixture-240419")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("CMU-AIR2/code-ctrl-gh-mixture-240419")
model = AutoModelForCausalLM.from_pretrained("CMU-AIR2/code-ctrl-gh-mixture-240419")

Local Apps

vLLM

How to use CMU-AIR2/code-ctrl-gh-mixture-240419 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "CMU-AIR2/code-ctrl-gh-mixture-240419"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CMU-AIR2/code-ctrl-gh-mixture-240419",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
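
The same request can also be issued from Python. What follows is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on localhost:8000. The helper names `build_completion_request` and `complete` are hypothetical; the URL, model name, and sampling parameters are copied from the curl call.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=512, temperature=0.5):
    """Build a POST request for an OpenAI-compatible /v1/completions endpoint.

    Hypothetical helper: it mirrors the curl call and does not contact the server.
    """
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(request, timeout=60):
    """Send the request and return the first generated text (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


request = build_completion_request(
    "http://localhost:8000",
    "CMU-AIR2/code-ctrl-gh-mixture-240419",
    "Once upon a time,",
)
# Uncomment once `vllm serve` is running:
# print(complete(request))
```

Building the request separately from sending it makes the payload easy to inspect before any network call is made.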

Use Docker

docker model run hf.co/CMU-AIR2/code-ctrl-gh-mixture-240419

SGLang

How to use CMU-AIR2/code-ctrl-gh-mixture-240419 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "CMU-AIR2/code-ctrl-gh-mixture-240419" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CMU-AIR2/code-ctrl-gh-mixture-240419",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "CMU-AIR2/code-ctrl-gh-mixture-240419" \
        --host 0.0.0.0 \
        --port 30000

cterdam committed on Apr 18, 2024

Commit 9f14619 (verified)

1 parent: a61b90a

Upload folder using huggingface_hub

Files changed (13)

command.txt +1 -0
config.json +31 -0
generation_config.json +6 -0
model-00001-of-00002.safetensors +3 -0
model-00002-of-00002.safetensors +3 -0
model.safetensors.index.json +226 -0
optimizer.pt +3 -0
rng_state.pth +3 -0
run_config.json +602 -0
scheduler.pt +3 -0
trainer_state.json +387 -0
training_args.bin +3 -0
training_logs.txt +124 -0

command.txt ADDED

	@@ -0,0 +1 @@


+ main.py deepseek None runs/deepseek-ctrl-gh/checkpoint-15000

config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "_name_or_path": "runs/deepseek-ctrl-gh/checkpoint-15000",
+  "architectures": [
+    "LlamaForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 32013,
+  "eos_token_id": 32021,
+  "hidden_act": "silu",
+  "hidden_size": 2048,
+  "initializer_range": 0.02,
+  "intermediate_size": 5504,
+  "max_position_embeddings": 16384,
+  "model_type": "llama",
+  "num_attention_heads": 16,
+  "num_hidden_layers": 24,
+  "num_key_value_heads": 16,
+  "pretraining_tp": 1,
+  "rms_norm_eps": 1e-06,
+  "rope_scaling": {
+    "factor": 4.0,
+    "type": "linear"
+  },
+  "rope_theta": 100000,
+  "tie_word_embeddings": false,
+  "torch_dtype": "float32",
+  "transformers_version": "4.38.2",
+  "use_cache": true,
+  "vocab_size": 32256
+}
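
As a sanity check on this config, the parameter count of a Llama-style decoder can be estimated from the dimensions above. The sketch below assumes the standard `LlamaForCausalLM` layout: bias-free projections (per `attention_bias: false`), untied input and output embeddings (per `tie_word_embeddings: false`), and two RMSNorm weights per layer plus a final norm. The function name is illustrative only.

```python
def llama_param_count(vocab_size, hidden_size, intermediate_size,
                      num_hidden_layers, num_attention_heads, num_key_value_heads,
                      tie_word_embeddings=False):
    """Estimate the number of weights in a bias-free Llama-style decoder."""
    head_dim = hidden_size // num_attention_heads
    kv_dim = head_dim * num_key_value_heads

    # q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim.
    attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    # gate_proj, up_proj and down_proj each hold hidden x intermediate weights.
    mlp = 3 * hidden_size * intermediate_size
    # input_layernorm and post_attention_layernorm.
    norms = 2 * hidden_size
    per_layer = attention + mlp + norms

    embeddings = vocab_size * hidden_size
    lm_head = 0 if tie_word_embeddings else vocab_size * hidden_size
    final_norm = hidden_size
    return embeddings + lm_head + num_hidden_layers * per_layer + final_norm


# Values copied from config.json above.
params = llama_param_count(
    vocab_size=32256,
    hidden_size=2048,
    intermediate_size=5504,
    num_hidden_layers=24,
    num_attention_heads=16,
    num_key_value_heads=16,
)
print(params)      # 1346471936
print(params * 4)  # 5385887744 bytes at 4 bytes per float32 weight
```

The float32 byte count equals the `total_size` recorded in model.safetensors.index.json (5385887744), which is consistent with `"torch_dtype": "float32"` in this config.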

generation_config.json ADDED

	@@ -0,0 +1,6 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 32013,
+  "eos_token_id": 32021,
+  "transformers_version": "4.38.2"
+}

model-00001-of-00002.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f6d19cff5cedf87734ca26cc1433c8416fc0acccb1d6e0a72416902d5f93efb9
+size 4986380064

model-00002-of-00002.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:47f52fb9188137c93237bc2d29f5bad8c88cc585613e0b2dbad3afb23a3c2f22
+size 399532808
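
The two shard entries above are Git LFS pointer files rather than the weights themselves: each records the spec version, a SHA-256 object ID, and the byte size of the real file. A minimal sketch of reading one, assuming the three-line `key value` layout shown here (the helper name `parse_lfs_pointer` is hypothetical):

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict with version, oid and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the two shard entries above.
shard_1 = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:f6d19cff5cedf87734ca26cc1433c8416fc0acccb1d6e0a72416902d5f93efb9\n"
    "size 4986380064\n"
)
shard_2 = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:47f52fb9188137c93237bc2d29f5bad8c88cc585613e0b2dbad3afb23a3c2f22\n"
    "size 399532808\n"
)
total_bytes = shard_1["size"] + shard_2["size"]
print(total_bytes)  # 5385912872
```

The sum is about 25 KB larger than the `total_size` of 5385887744 listed in model.safetensors.index.json. Each safetensors file begins with a JSON header, which likely accounts for the difference.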

model.safetensors.index.json ADDED

	@@ -0,0 +1,226 @@

+{
+  "metadata": {
+    "total_size": 5385887744
+  },
+  "weight_map": {
+    "lm_head.weight": "model-00002-of-00002.safetensors",
+    "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.10.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.11.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.12.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.13.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.14.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.15.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.16.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.17.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.18.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.19.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.20.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.21.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.22.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.23.input_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.23.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.23.mlp.gate_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.23.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
+    "model.layers.23.post_attention_layernorm.weight": "model-00002-of-00002.safetensors",
+    "model.layers.23.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.23.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.23.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.23.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.input_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.mlp.down_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.post_attention_layernorm.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
+    "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00002.safetensors",
+    "model.norm.weight": "model-00002-of-00002.safetensors"
+  }
+}
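
The `weight_map` above tells a loader which shard holds each tensor. Below is a small sketch of summarising such a map with the standard library. The sample dict is a hand-picked excerpt of the entries listed above, and both helper functions are illustrative rather than part of any library.

```python
from collections import Counter, defaultdict


def tensors_per_shard(weight_map):
    """Count how many tensors each shard file contains."""
    return Counter(weight_map.values())


def layers_split_across_shards(weight_map):
    """Return indices of decoder layers whose tensors live in more than one shard."""
    shards_by_layer = defaultdict(set)
    for name, shard in weight_map.items():
        parts = name.split(".")
        if parts[:2] == ["model", "layers"]:
            shards_by_layer[int(parts[2])].add(shard)
    return sorted(layer for layer, shards in shards_by_layer.items() if len(shards) > 1)


# Excerpt of the index above; a real script would json.load the whole file.
sample = {
    "lm_head.weight": "model-00002-of-00002.safetensors",
    "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
    "model.layers.22.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.23.mlp.up_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.23.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.norm.weight": "model-00002-of-00002.safetensors",
}
print(tensors_per_shard(sample))
print(layers_split_across_shards(sample))  # [23]
```

In the full index, layer 23 is the only layer that straddles both files: its attention projections sit in the first shard, while its MLP and layer-norm weights sit in the second.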

optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e8b617ab3f5d2199c0f65179969c8a22980f3543a4d0a52b05caea65094020bb
+size 2699039674

rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c52295d3807e2e59216b7e1ea6b0ab41cccc585ef6a638edbc526508897829a6
+size 14180

run_config.json ADDED

	@@ -0,0 +1,602 @@

+{
+    "model": {
+        "codellama": {
+            "base_model_id": "codellama/CodeLlama-7b-hf",
+            "quantitize": "int8",
+            "dataset": "Arithmetic_Simple",
+            "data_collator": "DataCollatorForSeq2Seq",
+            "lora_config": {
+                "r": 16,
+                "lora_alpha": 16,
+                "target_modules": [
+                    "q_proj",
+                    "k_proj",
+                    "v_proj",
+                    "o_proj",
+                    "gate_proj",
+                    "up_proj",
+                    "down_proj"
+                ],
+                "lora_dropout": 0.05,
+                "bias": "none",
+                "task_type": "CAUSAL_LM"
+            },
+            "training_args": {
+                "output_dir": "codellama-output",
+                "warmup_steps": 100,
+                "per_device_train_batch_size": 1,
+                "per_device_eval_batch_size": 1,
+                "gradient_accumulation_steps": 4,
+                "max_steps": 10000,
+                "learning_rate": 0.0003,
+                "optim": "adamw_torch",
+                "logging_dir": "codellama-output-logs",
+                "logging_steps": 10,
+                "save_strategy": "steps",
+                "save_steps": 500,
+                "load_best_model_at_end": false,
+                "group_by_length": true,
+                "fp16": true,
+                "evaluation_strategy": "steps",
+                "eval_steps": 1000
+            },
+            "tokenizer": {
+                "tokenize_config": {
+                    "truncation": true,
+                    "max_length": 192,
+                    "padding": "max_length"
+                },
+                "prompt_template": "config/qa_template.txt"
+            }
+        },
+        "phi-2": {
+            "base_model_id": "microsoft/phi-2",
+            "quantitize": "fp16",
+            "dataset": "Arithmetic_Simple",
+            "data_collator": "DataCollatorForLanguageModeling",
+            "lora_config": {
+                "r": 32,
+                "lora_alpha": 64,
+                "target_modules": [
+                    "q_proj",
+                    "k_proj",
+                    "v_proj",
+                    "dense",
+                    "fc1",
+                    "fc2"
+                ],
+                "bias": "none",
+                "lora_dropout": 0.05,
+                "task_type": "CAUSAL_LM"
+            },
+            "training_args": {
+                "output_dir": "phi2-output",
+                "warmup_steps": 500,
+                "per_device_train_batch_size": 1,
+                "per_device_eval_batch_size": 1,
+                "gradient_accumulation_steps": 4,
+                "max_steps": 100000,
+                "learning_rate": 0.0003,
+                "optim": "paged_adamw_8bit",
+                "logging_dir": "phi2-output-logs",
+                "logging_steps": 100,
+                "save_strategy": "steps",
+                "save_steps": 500,
+                "evaluation_strategy": "steps",
+                "eval_steps": 500,
+                "fp16": true
+            },
+            "tokenizer": {
+                "tokenize_config": {
+                    "truncation": true,
+                    "max_length": 512,
+                    "padding": "max_length"
+                },
+                "prompt_template": "config/qa_template.txt"
+            }
+        },
+        "deepseek": {
+            "base_model_id": "deepseek-ai/deepseek-coder-1.3b-instruct",
+            "quantitize": "fp16",
+            "dataset": "mixture_codegen",
+            "data_collator": "DataCollatorForLanguageModeling",
+            "lora_config": {
+                "r": 32,
+                "lora_alpha": 64,
+                "target_modules": [
+                    "q_proj",
+                    "k_proj",
+                    "v_proj",
+                    "o_proj",
+                    "gate_proj",
+                    "up_proj",
+                    "down_proj"
+                ],
+                "bias": "none",
+                "lora_dropout": 0.05,
+                "task_type": "CAUSAL_LM"
+            },
+            "lora_large_config": {
+                "r": 128,
+                "lora_alpha": 256,
+                "target_modules": [
+                    "q_proj",
+                    "k_proj",
+                    "v_proj",
+                    "o_proj",
+                    "gate_proj",
+                    "up_proj",
+                    "down_proj"
+                ],
+                "bias": "none",
+                "lora_dropout": 0.05,
+                "task_type": "CAUSAL_LM"
+            },
+            "p_tuning_config": {
+                "num_virtual_tokens": 16,
+                "num_transformer_submodules": 1,
+                "token_dim": 2048,
+                "encoder_hidden_size": 2048,
+                "task_type": "CAUSAL_LM"
+            },
+            "training_args": {
+                "output_dir": "runs/deepseek-ctrl-gh-mixture",
+                "warmup_steps": 0,
+                "per_device_train_batch_size": 4,
+                "per_device_eval_batch_size": 4,
+                "gradient_accumulation_steps": 8,
+                "max_steps": 5000,
+                "learning_rate": 2e-05,
+                "optim": "paged_adamw_8bit",
+                "logging_dir": "runs/deepseek-ctrl-gh-mixture/logs",
+                "logging_steps": 100,
+                "save_strategy": "steps",
+                "save_steps": 2500,
+                "evaluation_strategy": "steps",
+                "eval_steps": 2500,
+                "weight_decay": 0.01,
+                "fp16": true
+            },
+            "tokenizer": {
+                "tokenize_config": {
+                    "truncation": true,
+                    "max_length": 1024,
+                    "padding": "max_length"
+                },
+                "prompt_template": "config/qa_template.txt"
+            }
+        }
+    },
+    "dataset": {
+        "simple_dataset": {
+            "type": "huggingface",
+            "dataset_purpose": "downstream",
+            "name": "b-mc2/sql-create-context",
+            "train_split": 0.9,
+            "max_train_size": 100,
+            "filling_field": [
+                "question",
+                "context",
+                "answer"
+            ]
+        },
+        "testdset": {
+            "type": "local",
+            "dataset_purpose": "downstream",
+            "train_file": "data/Test/TestDataset.json",
+            "val_file": "data/Test/TestDataset.json",
+            "test_file": "data/Test/TestDataset.json",
+            "filling_field": [
+                "prompted_question",
+                "answer"
+            ]
+        },
+        "mixture_codegen": {
+            "filling_field": [
+                "Question",
+                "Answer"
+            ],
+            "dataset_purpose": "downstream"
+        },
+        "MathQA_Python_loader": {
+            "type": "list-like",
+            "dataset_purpose": "downstream",
+            "train": "data/MathQA_Python_processed/mathqa_python_train_clean_final.json",
+            "val": "data/MathQA_Python_processed/mathqa_python_dev_clean_final.json",
+            "test": "data/MathQA_Python_processed/mathqa_python_test_clean_final.json",
+            "filling_field": [
+                "Question",
+                "Answer"
+            ]
+        },
+        "APPS_loader": {
+            "type": "list-like",
+            "dataset_purpose": "downstream",
+            "train": "data/APPS/apps_train.json",
+            "val": "data/APPS/apps_dev.json",
+            "test": "data/APPS/test/apps_test_75.json",
+            "filling_field": [
+                "Question",
+                "Answer"
+            ]
+        },
+        "MBPP_loader": {
+            "type": "list-like",
+            "dataset_purpose": "downstream",
+            "train": "data/MBPP/mbpp_train.json",
+            "val": "data/MBPP/mbpp_dev.json",
+            "test": "data/MBPP/mbpp_test.json",
+            "filling_field": [
+                "Question",
+                "Answer"
+            ]
+        },
+        "Arithmetic_Simple": {
+            "type": "list-like",
+            "dataset_purpose": "downstream",
+            "attributes": {
+                "subjects": [
+                    1,
+                    2,
+                    3,
+                    4,
+                    5,
+                    6,
+                    7,
+                    8,
+                    9
+                ],
+                "lessons": [
+                    "Max_Ops1_Bounds0_100",
+                    "Max_Ops1_Bounds0_1000",
+                    "Max_Ops2_Bounds0_100",
+                    "Max_Ops2_Bounds0_1000",
+                    "Max_Ops3_Bounds0_100",
+                    "Max_Ops3_Bounds0_1000",
+                    "Max_Ops4_Bounds0_100",
+                    "Max_Ops4_Bounds0_1000",
+                    "Max_Ops5_Bounds0_100",
+                    "Max_Ops5_Bounds0_1000"
+                ]
+            },
+            "train": "data/Arithmetic/Curriculum_Simple",
+            "val": "data/Arithmetic/Curriculum_Simple",
+            "test": "data/Arithmetic/Curriculum_Simple",
+            "filling_field": [
+                "Question",
+                "Answer"
+            ]
+        },
+        "Arithmetic_Hard": {
+            "type": "list-like",
+            "dataset_purpose": "downstream",
+            "attributes": {
+                "subjects": [
+                    1,
+                    2,
+                    3,
+                    4,
+                    5,
+                    6,
+                    7,
+                    8,
+                    9
+                ],
+                "lessons": [
+                    "Max_Ops1_Bounds-1000_1000",
+                    "Max_Ops1_Bounds-100_100",
+                    "Max_Ops1_Bounds0_100",
+                    "Max_Ops1_Bounds0_1000",
+                    "Max_Ops2_Bounds-1000_1000",
+                    "Max_Ops2_Bounds-100_100",
+                    "Max_Ops2_Bounds0_100",
+                    "Max_Ops2_Bounds0_1000",
+                    "Max_Ops3_Bounds-1000_1000",
+                    "Max_Ops3_Bounds-100_100",
+                    "Max_Ops3_Bounds0_100",
+                    "Max_Ops3_Bounds0_1000",
+                    "Max_Ops4_Bounds-1000_1000",
+                    "Max_Ops4_Bounds-100_100",
+                    "Max_Ops4_Bounds0_100",
+                    "Max_Ops4_Bounds0_1000",
+                    "Max_Ops5_Bounds-1000_1000",
+                    "Max_Ops5_Bounds-100_100",
+                    "Max_Ops5_Bounds0_100",
+                    "Max_Ops5_Bounds0_1000",
+                    "Max_Ops6_Bounds-1000_1000",
+                    "Max_Ops6_Bounds-100_100",
+                    "Max_Ops6_Bounds0_100",
+                    "Max_Ops6_Bounds0_1000",
+                    "Max_Ops7_Bounds-1000_1000",
+                    "Max_Ops7_Bounds-100_100",
+                    "Max_Ops7_Bounds0_100",
+                    "Max_Ops7_Bounds0_1000",
+                    "Max_Ops8_Bounds-1000_1000",
+                    "Max_Ops8_Bounds-100_100",
+                    "Max_Ops8_Bounds0_100",
+                    "Max_Ops8_Bounds0_1000",
+                    "Max_Ops9_Bounds-1000_1000",
+                    "Max_Ops9_Bounds-100_100",
+                    "Max_Ops9_Bounds0_100",
+                    "Max_Ops9_Bounds0_1000",
+                    "Max_Ops10_Bounds-1000_1000",
+                    "Max_Ops10_Bounds-100_100",
+                    "Max_Ops10_Bounds0_100",
+                    "Max_Ops10_Bounds0_1000"
+                ]
+            },
+            "train": "data/Arithmetic/Curriculum_Hard",
+            "val": "data/Arithmetic/Curriculum_Hard",
+            "test": "data/Arithmetic/Curriculum_Hard",
+            "filling_field": [
+                "Question",
+                "Answer"
+            ]
+        },
+        "Arithmetic_Hard_prompt_C11": {
+            "type": "list-like",
+            "dataset_purpose": "downstream",
+            "attributes": {
+                "subjects": [
+                    1,
+                    2,
+                    3,
+                    4,
+                    5,
+                    6,
+                    7,
+                    8,
+                    9
+                ],
+                "lessons": [
+                    "Max_Ops1_Bounds-1000_1000",
+                    "Max_Ops1_Bounds-100_100",
+                    "Max_Ops1_Bounds0_100",
+                    "Max_Ops1_Bounds0_1000",
+                    "Max_Ops2_Bounds-1000_1000",
+                    "Max_Ops2_Bounds-100_100",
+                    "Max_Ops2_Bounds0_100",
+                    "Max_Ops2_Bounds0_1000",
+                    "Max_Ops3_Bounds-1000_1000",
+                    "Max_Ops3_Bounds-100_100",
+                    "Max_Ops3_Bounds0_100",
+                    "Max_Ops3_Bounds0_1000",
+                    "Max_Ops4_Bounds-1000_1000",
+                    "Max_Ops4_Bounds-100_100",
+                    "Max_Ops4_Bounds0_100",
+                    "Max_Ops4_Bounds0_1000",
+                    "Max_Ops5_Bounds-1000_1000",
+                    "Max_Ops5_Bounds-100_100",
+                    "Max_Ops5_Bounds0_100",
+                    "Max_Ops5_Bounds0_1000",
+                    "Max_Ops6_Bounds-1000_1000",
+                    "Max_Ops6_Bounds-100_100",
+                    "Max_Ops6_Bounds0_100",
+                    "Max_Ops6_Bounds0_1000",
+                    "Max_Ops7_Bounds-1000_1000",
+                    "Max_Ops7_Bounds-100_100",
+                    "Max_Ops7_Bounds0_100",
+                    "Max_Ops7_Bounds0_1000",
+                    "Max_Ops8_Bounds-1000_1000",
+                    "Max_Ops8_Bounds-100_100",
+                    "Max_Ops8_Bounds0_100",
+                    "Max_Ops8_Bounds0_1000",
+                    "Max_Ops9_Bounds-1000_1000",
+                    "Max_Ops9_Bounds-100_100",
+                    "Max_Ops9_Bounds0_100",
+                    "Max_Ops9_Bounds0_1000",
+                    "Max_Ops10_Bounds-1000_1000",
+                    "Max_Ops10_Bounds-100_100",
+                    "Max_Ops10_Bounds0_100",
+                    "Max_Ops10_Bounds0_1000"
+                ]
+            },
+            "train": "data/Arithmetic/Curriculum_Hard",
+            "val": "data/Arithmetic/Curriculum_Hard",
+            "test": "data/Arithmetic/Curriculum_Hard",
+            "filling_field": [
+                "Question",
+                "Answer"
+            ]
+        },
+        "Arithmetic_Hard_prompt_C12": {
+            "type": "list-like",
+            "dataset_purpose": "downstream",
+            "attributes": {
+                "subjects": [
+                    7,
+                    9
+                ],
+                "lessons": [
+                    "Max_Ops1_Bounds-1000_1000",
+                    "Max_Ops1_Bounds-100_100",
+                    "Max_Ops1_Bounds0_100",
+                    "Max_Ops1_Bounds0_1000",
+                    "Max_Ops2_Bounds-1000_1000",
+                    "Max_Ops2_Bounds-100_100",
+                    "Max_Ops2_Bounds0_100",
+                    "Max_Ops2_Bounds0_1000",
+                    "Max_Ops3_Bounds-1000_1000",
+                    "Max_Ops3_Bounds-100_100",
+                    "Max_Ops3_Bounds0_100",
+                    "Max_Ops3_Bounds0_1000",
+                    "Max_Ops4_Bounds-1000_1000",
+                    "Max_Ops4_Bounds-100_100",
+                    "Max_Ops4_Bounds0_100",
+                    "Max_Ops4_Bounds0_1000",
+                    "Max_Ops5_Bounds-1000_1000",
+                    "Max_Ops5_Bounds-100_100",
+                    "Max_Ops5_Bounds0_100",
+                    "Max_Ops5_Bounds0_1000",
+                    "Max_Ops6_Bounds-1000_1000",
+                    "Max_Ops6_Bounds-100_100",
+                    "Max_Ops6_Bounds0_100",
+                    "Max_Ops6_Bounds0_1000",
+                    "Max_Ops7_Bounds-1000_1000",
+                    "Max_Ops7_Bounds-100_100",
+                    "Max_Ops7_Bounds0_100",
+                    "Max_Ops7_Bounds0_1000",
+                    "Max_Ops8_Bounds-1000_1000",
+                    "Max_Ops8_Bounds-100_100",
+                    "Max_Ops8_Bounds0_100",
+                    "Max_Ops8_Bounds0_1000",
+                    "Max_Ops9_Bounds-1000_1000",
+                    "Max_Ops9_Bounds-100_100",
+                    "Max_Ops9_Bounds0_100",
+                    "Max_Ops9_Bounds0_1000",
+                    "Max_Ops10_Bounds-1000_1000",
+                    "Max_Ops10_Bounds-100_100",
+                    "Max_Ops10_Bounds0_100",
+                    "Max_Ops10_Bounds0_1000"
+                ]
+            },
+            "train": "data/Arithmetic/Curriculum_Hard",
+            "val": "data/Arithmetic/Curriculum_Hard",
+            "test": "data/Arithmetic/Curriculum_Hard",
+            "filling_field": [
+                "Question",
+                "Answer"
+            ]
+        },
+        "Arithmetic_XHard": {
+            "type": "list-like",
+            "dataset_purpose": "downstream",
+            "attributes": {
+                "subjects": [
+                    1,
+                    2,
+                    3,
+                    4,
+                    5,
+                    6,
+                    7,
+                    8,
+                    9
+                ],
+                "lessons": [
+                    "Max_Ops10_Bounds0_10000.json",
+                    "Max_Ops10_Bounds0_1000.json",
+                    "Max_Ops10_Bounds-10000_10000.json",
+                    "Max_Ops10_Bounds-1000_1000.json",
+                    "Max_Ops11_Bounds0_10000.json",
+                    "Max_Ops11_Bounds0_1000.json",
+                    "Max_Ops11_Bounds-10000_10000.json",
+                    "Max_Ops11_Bounds-1000_1000.json",
+                    "Max_Ops12_Bounds0_10000.json",
+                    "Max_Ops12_Bounds0_1000.json",
+                    "Max_Ops12_Bounds-10000_10000.json",
+                    "Max_Ops12_Bounds-1000_1000.json",
+                    "Max_Ops13_Bounds0_10000.json",
+                    "Max_Ops13_Bounds0_1000.json",
+                    "Max_Ops13_Bounds-10000_10000.json",
+                    "Max_Ops13_Bounds-1000_1000.json",
+                    "Max_Ops14_Bounds0_10000.json",
+                    "Max_Ops14_Bounds0_1000.json",
+                    "Max_Ops14_Bounds-10000_10000.json",
+                    "Max_Ops14_Bounds-1000_1000.json",
+                    "Max_Ops15_Bounds0_10000.json",
+                    "Max_Ops15_Bounds0_1000.json",
+                    "Max_Ops15_Bounds-10000_10000.json",
+                    "Max_Ops15_Bounds-1000_1000.json",
+                    "Max_Ops16_Bounds0_10000.json",
+                    "Max_Ops16_Bounds0_1000.json",
+                    "Max_Ops16_Bounds-10000_10000.json",
+                    "Max_Ops16_Bounds-1000_1000.json",
+                    "Max_Ops17_Bounds0_10000.json",
+                    "Max_Ops17_Bounds0_1000.json",
+                    "Max_Ops17_Bounds-10000_10000.json",
+                    "Max_Ops17_Bounds-1000_1000.json",
+                    "Max_Ops18_Bounds0_10000.json",
+                    "Max_Ops18_Bounds0_1000.json",
+                    "Max_Ops18_Bounds-10000_10000.json",
+                    "Max_Ops18_Bounds-1000_1000.json",
+                    "Max_Ops19_Bounds0_10000.json",
+                    "Max_Ops19_Bounds0_1000.json",
+                    "Max_Ops19_Bounds-10000_10000.json",
+                    "Max_Ops19_Bounds-1000_1000.json",
+                    "Max_Ops1_Bounds0_10000.json",
+                    "Max_Ops1_Bounds0_1000.json",
+                    "Max_Ops1_Bounds-10000_10000.json",
+                    "Max_Ops1_Bounds-1000_1000.json",
+                    "Max_Ops20_Bounds0_10000.json",
+                    "Max_Ops20_Bounds0_1000.json",
+                    "Max_Ops20_Bounds-10000_10000.json",
+                    "Max_Ops20_Bounds-1000_1000.json",
+                    "Max_Ops2_Bounds0_10000.json",
+                    "Max_Ops2_Bounds0_1000.json",
+                    "Max_Ops2_Bounds-10000_10000.json",
+                    "Max_Ops2_Bounds-1000_1000.json",
+                    "Max_Ops3_Bounds0_10000.json",
+                    "Max_Ops3_Bounds0_1000.json",
+                    "Max_Ops3_Bounds-10000_10000.json",
+                    "Max_Ops3_Bounds-1000_1000.json",
+                    "Max_Ops4_Bounds0_10000.json",
+                    "Max_Ops4_Bounds0_1000.json",
+                    "Max_Ops4_Bounds-10000_10000.json",
+                    "Max_Ops4_Bounds-1000_1000.json",
+                    "Max_Ops5_Bounds0_10000.json",
+                    "Max_Ops5_Bounds0_1000.json",
+                    "Max_Ops5_Bounds-10000_10000.json",
+                    "Max_Ops5_Bounds-1000_1000.json",
+                    "Max_Ops6_Bounds0_10000.json",
+                    "Max_Ops6_Bounds0_1000.json",
+                    "Max_Ops6_Bounds-10000_10000.json",
+                    "Max_Ops6_Bounds-1000_1000.json",
+                    "Max_Ops7_Bounds0_10000.json",
+                    "Max_Ops7_Bounds0_1000.json",
+                    "Max_Ops7_Bounds-10000_10000.json",
+                    "Max_Ops7_Bounds-1000_1000.json",
+                    "Max_Ops8_Bounds0_10000.json",
+                    "Max_Ops8_Bounds0_1000.json",
+                    "Max_Ops8_Bounds-10000_10000.json",
+                    "Max_Ops8_Bounds-1000_1000.json",
+                    "Max_Ops9_Bounds0_10000.json",
+                    "Max_Ops9_Bounds0_1000.json",
+                    "Max_Ops9_Bounds-10000_10000.json",
+                    "Max_Ops9_Bounds-1000_1000.json"
+                ]
+            },
+            "train": "data/Arithmetic/Curriculum_XHard",
+            "val": "data/Arithmetic/Curriculum_XHard",
+            "test": "data/Arithmetic/Curriculum_XHard",
+            "filling_field": [
+                "Question",
+                "Answer"
+            ]
+        },
+        "GSM8K": {
+            "type": "local",
+            "dataset_purpose": "downstream",
+            "train_file": "data/GSM8K/GSM8K_train.json",
+            "val_file": "data/GSM8K/GSM8K_test.json",
+            "test_file": "data/GSM8K/GSM8K_dev.json",
+            "filling_field": [
+                "Body",
+                "Question",
+                "Answer"
+            ]
+        },
+        "APPS": {
+            "type": "local",
+            "dataset_purpose": "downstream",
+            "train_file": "data/APPS/apps_train.json",
+            "val_file": "data/APPS/apps_test.json",
+            "test_file": "data/APPS/apps_dev.json",
+            "filling_field": [
+                "Body",
+                "Question",
+                "Answer"
+            ]
+        },
+        "ghcode_python": {
+            "type": "huggingface",
+            "dataset_purpose": "pretrain",
+            "name": "slseanwu/ghcode_python_split_700k",
+            "max_eval_size": 1000,
+            "max_train_size": 160000,
+            "filling_field": [
+                "code"
+            ]
+        }
+    }
+}
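
In the config above, the `GSM8K` and `APPS` entries point `val_file` at a `*_test.json` file and `test_file` at a `*_dev.json` file, while the `*_loader` entries pair `val` with `*_dev.json`. Whether that swap is intentional cannot be determined from this diff. A minimal sketch of a check that flags such entries, assuming the convention that `dev` in a filename means validation (the `split_mismatches` helper and the inline excerpt are hypothetical, not part of the repository):

```python
import re

# Hypothetical excerpt mirroring two entries from the config above.
datasets = {
    "MBPP_loader": {
        "train": "data/MBPP/mbpp_train.json",
        "val": "data/MBPP/mbpp_dev.json",
        "test": "data/MBPP/mbpp_test.json",
    },
    "GSM8K": {
        "train_file": "data/GSM8K/GSM8K_train.json",
        "val_file": "data/GSM8K/GSM8K_test.json",
        "test_file": "data/GSM8K/GSM8K_dev.json",
    },
}

# Assumed naming convention: "dev" in a filename means the validation split.
SPLIT_TOKENS = {"train": "train", "val": "dev", "test": "test"}


def split_mismatches(datasets):
    """Return (dataset, key, path) triples whose filename names a different split."""
    found = []
    for name, entry in datasets.items():
        for key, path in entry.items():
            split = key.removesuffix("_file")  # "val_file" -> "val" (Python 3.9+)
            if split not in SPLIT_TOKENS or not isinstance(path, str):
                continue
            stem = path.rsplit("/", 1)[-1].lower()
            tokens = set(re.split(r"[^a-z]+", stem))
            expected = SPLIT_TOKENS[split]
            others = set(SPLIT_TOKENS.values()) - {expected}
            # Flag only when the filename names another split and not the expected one.
            if expected not in tokens and others & tokens:
                found.append((name, key, path))
    return found


mismatches = split_mismatches(datasets)
for item in mismatches:
    print(item)
```

Directory-style paths such as `data/Arithmetic/Curriculum_Hard` carry no split token and are skipped rather than flagged.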

scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2be67f6aac8e482bb2022409709d8774ffb125292c0c9cf025c0ae747f3a6d57
+size 1064

trainer_state.json ADDED

	@@ -0,0 +1,387 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 8.605851979345955,
+  "eval_steps": 2500,
+  "global_step": 5000,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.17,
+      "grad_norm": 1.0365092754364014,
+      "learning_rate": 1.9600000000000002e-05,
+      "loss": 0.4163,
+      "step": 100
+    },
+    {
+      "epoch": 0.34,
+      "grad_norm": 1.212004542350769,
+      "learning_rate": 1.9200000000000003e-05,
+      "loss": 0.3793,
+      "step": 200
+    },
+    {
+      "epoch": 0.52,
+      "grad_norm": 1.066266417503357,
+      "learning_rate": 1.88e-05,
+      "loss": 0.3682,
+      "step": 300
+    },
+    {
+      "epoch": 0.69,
+      "grad_norm": 1.322099208831787,
+      "learning_rate": 1.8400000000000003e-05,
+      "loss": 0.3536,
+      "step": 400
+    },
+    {
+      "epoch": 0.86,
+      "grad_norm": 0.998599648475647,
+      "learning_rate": 1.8e-05,
+      "loss": 0.3282,
+      "step": 500
+    },
+    {
+      "epoch": 1.03,
+      "grad_norm": 1.5098826885223389,
+      "learning_rate": 1.76e-05,
+      "loss": 0.3092,
+      "step": 600
+    },
+    {
+      "epoch": 1.2,
+      "grad_norm": 1.05723237991333,
+      "learning_rate": 1.72e-05,
+      "loss": 0.2248,
+      "step": 700
+    },
+    {
+      "epoch": 1.38,
+      "grad_norm": 1.0882526636123657,
+      "learning_rate": 1.6800000000000002e-05,
+      "loss": 0.2239,
+      "step": 800
+    },
+    {
+      "epoch": 1.55,
+      "grad_norm": 1.1547547578811646,
+      "learning_rate": 1.64e-05,
+      "loss": 0.2342,
+      "step": 900
+    },
+    {
+      "epoch": 1.72,
+      "grad_norm": 1.1294739246368408,
+      "learning_rate": 1.6000000000000003e-05,
+      "loss": 0.2158,
+      "step": 1000
+    },
+    {
+      "epoch": 1.89,
+      "grad_norm": 0.9624162912368774,
+      "learning_rate": 1.5600000000000003e-05,
+      "loss": 0.2096,
+      "step": 1100
+    },
+    {
+      "epoch": 2.07,
+      "grad_norm": 1.1864293813705444,
+      "learning_rate": 1.5200000000000002e-05,
+      "loss": 0.1787,
+      "step": 1200
+    },
+    {
+      "epoch": 2.24,
+      "grad_norm": 1.1997874975204468,
+      "learning_rate": 1.48e-05,
+      "loss": 0.1246,
+      "step": 1300
+    },
+    {
+      "epoch": 2.41,
+      "grad_norm": 1.2120954990386963,
+      "learning_rate": 1.4400000000000001e-05,
+      "loss": 0.1197,
+      "step": 1400
+    },
+    {
+      "epoch": 2.58,
+      "grad_norm": 0.6992385983467102,
+      "learning_rate": 1.4e-05,
+      "loss": 0.1185,
+      "step": 1500
+    },
+    {
+      "epoch": 2.75,
+      "grad_norm": 1.0601509809494019,
+      "learning_rate": 1.3600000000000002e-05,
+      "loss": 0.1241,
+      "step": 1600
+    },
+    {
+      "epoch": 2.93,
+      "grad_norm": 1.1058382987976074,
+      "learning_rate": 1.3200000000000002e-05,
+      "loss": 0.1282,
+      "step": 1700
+    },
+    {
+      "epoch": 3.1,
+      "grad_norm": 1.1598687171936035,
+      "learning_rate": 1.2800000000000001e-05,
+      "loss": 0.0847,
+      "step": 1800
+    },
+    {
+      "epoch": 3.27,
+      "grad_norm": 1.2096168994903564,
+      "learning_rate": 1.2400000000000002e-05,
+      "loss": 0.0616,
+      "step": 1900
+    },
+    {
+      "epoch": 3.44,
+      "grad_norm": 1.5343897342681885,
+      "learning_rate": 1.2e-05,
+      "loss": 0.0645,
+      "step": 2000
+    },
+    {
+      "epoch": 3.61,
+      "grad_norm": 1.165819764137268,
+      "learning_rate": 1.16e-05,
+      "loss": 0.0652,
+      "step": 2100
+    },
+    {
+      "epoch": 3.79,
+      "grad_norm": 1.3763171434402466,
+      "learning_rate": 1.1200000000000001e-05,
+      "loss": 0.0619,
+      "step": 2200
+    },
+    {
+      "epoch": 3.96,
+      "grad_norm": 0.9929534792900085,
+      "learning_rate": 1.0800000000000002e-05,
+      "loss": 0.0612,
+      "step": 2300
+    },
+    {
+      "epoch": 4.13,
+      "grad_norm": 1.1144566535949707,
+      "learning_rate": 1.04e-05,
+      "loss": 0.038,
+      "step": 2400
+    },
+    {
+      "epoch": 4.3,
+      "grad_norm": 1.150139570236206,
+      "learning_rate": 1e-05,
+      "loss": 0.0311,
+      "step": 2500
+    },
+    {
+      "epoch": 4.3,
+      "eval_loss": 0.35374003648757935,
+      "eval_runtime": 84.8171,
+      "eval_samples_per_second": 11.79,
+      "eval_steps_per_second": 2.948,
+      "step": 2500
+    },
+    {
+      "epoch": 4.48,
+      "grad_norm": 1.4293252229690552,
+      "learning_rate": 9.600000000000001e-06,
+      "loss": 0.0308,
+      "step": 2600
+    },
+    {
+      "epoch": 4.65,
+      "grad_norm": 1.1352962255477905,
+      "learning_rate": 9.200000000000002e-06,
+      "loss": 0.0308,
+      "step": 2700
+    },
+    {
+      "epoch": 4.82,
+      "grad_norm": 1.0544779300689697,
+      "learning_rate": 8.8e-06,
+      "loss": 0.033,
+      "step": 2800
+    },
+    {
+      "epoch": 4.99,
+      "grad_norm": 1.110599160194397,
+      "learning_rate": 8.400000000000001e-06,
+      "loss": 0.0318,
+      "step": 2900
+    },
+    {
+      "epoch": 5.16,
+      "grad_norm": 0.7125316262245178,
+      "learning_rate": 8.000000000000001e-06,
+      "loss": 0.0147,
+      "step": 3000
+    },
+    {
+      "epoch": 5.34,
+      "grad_norm": 0.9172051548957825,
+      "learning_rate": 7.600000000000001e-06,
+      "loss": 0.0156,
+      "step": 3100
+    },
+    {
+      "epoch": 5.51,
+      "grad_norm": 0.9805625081062317,
+      "learning_rate": 7.2000000000000005e-06,
+      "loss": 0.0145,
+      "step": 3200
+    },
+    {
+      "epoch": 5.68,
+      "grad_norm": 0.5053761601448059,
+      "learning_rate": 6.800000000000001e-06,
+      "loss": 0.0149,
+      "step": 3300
+    },
+    {
+      "epoch": 5.85,
+      "grad_norm": 1.1218398809432983,
+      "learning_rate": 6.4000000000000006e-06,
+      "loss": 0.0168,
+      "step": 3400
+    },
+    {
+      "epoch": 6.02,
+      "grad_norm": 0.3119220733642578,
+      "learning_rate": 6e-06,
+      "loss": 0.0154,
+      "step": 3500
+    },
+    {
+      "epoch": 6.2,
+      "grad_norm": 0.23416651785373688,
+      "learning_rate": 5.600000000000001e-06,
+      "loss": 0.0065,
+      "step": 3600
+    },
+    {
+      "epoch": 6.37,
+      "grad_norm": 0.6167200803756714,
+      "learning_rate": 5.2e-06,
+      "loss": 0.0079,
+      "step": 3700
+    },
+    {
+      "epoch": 6.54,
+      "grad_norm": 1.1704833507537842,
+      "learning_rate": 4.800000000000001e-06,
+      "loss": 0.0067,
+      "step": 3800
+    },
+    {
+      "epoch": 6.71,
+      "grad_norm": 0.8806678056716919,
+      "learning_rate": 4.4e-06,
+      "loss": 0.0093,
+      "step": 3900
+    },
+    {
+      "epoch": 6.88,
+      "grad_norm": 0.30924132466316223,
+      "learning_rate": 4.000000000000001e-06,
+      "loss": 0.007,
+      "step": 4000
+    },
+    {
+      "epoch": 7.06,
+      "grad_norm": 0.46306928992271423,
+      "learning_rate": 3.6000000000000003e-06,
+      "loss": 0.0052,
+      "step": 4100
+    },
+    {
+      "epoch": 7.23,
+      "grad_norm": 0.46887511014938354,
+      "learning_rate": 3.2000000000000003e-06,
+      "loss": 0.0042,
+      "step": 4200
+    },
+    {
+      "epoch": 7.4,
+      "grad_norm": 0.902063250541687,
+      "learning_rate": 2.8000000000000003e-06,
+      "loss": 0.0031,
+      "step": 4300
+    },
+    {
+      "epoch": 7.57,
+      "grad_norm": 0.1910380870103836,
+      "learning_rate": 2.4000000000000003e-06,
+      "loss": 0.0029,
+      "step": 4400
+    },
+    {
+      "epoch": 7.75,
+      "grad_norm": 0.6202380657196045,
+      "learning_rate": 2.0000000000000003e-06,
+      "loss": 0.0032,
+      "step": 4500
+    },
+    {
+      "epoch": 7.92,
+      "grad_norm": 0.5730396509170532,
+      "learning_rate": 1.6000000000000001e-06,
+      "loss": 0.0034,
+      "step": 4600
+    },
+    {
+      "epoch": 8.09,
+      "grad_norm": 0.10635427385568619,
+      "learning_rate": 1.2000000000000002e-06,
+      "loss": 0.0034,
+      "step": 4700
+    },
+    {
+      "epoch": 8.26,
+      "grad_norm": 0.1567939668893814,
+      "learning_rate": 8.000000000000001e-07,
+      "loss": 0.0027,
+      "step": 4800
+    },
+    {
+      "epoch": 8.43,
+      "grad_norm": 0.11498889327049255,
+      "learning_rate": 4.0000000000000003e-07,
+      "loss": 0.0015,
+      "step": 4900
+    },
+    {
+      "epoch": 8.61,
+      "grad_norm": 0.09903218597173691,
+      "learning_rate": 0.0,
+      "loss": 0.0017,
+      "step": 5000
+    },
+    {
+      "epoch": 8.61,
+      "eval_loss": 0.4493824243545532,
+      "eval_runtime": 84.77,
+      "eval_samples_per_second": 11.797,
+      "eval_steps_per_second": 2.949,
+      "step": 5000
+    }
+  ],
+  "logging_steps": 100,
+  "max_steps": 5000,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 9,
+  "save_steps": 2500,
+  "total_flos": 1.258569996863275e+18,
+  "train_batch_size": 4,
+  "trial_name": null,
+  "trial_params": null
+}
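
The logged learning rates in `log_history` appear consistent with a linear decay to zero over `max_steps` with no warmup, and the two evaluation entries show `eval_loss` rising between steps 2500 and 5000 while the training loss keeps falling. A minimal sketch that checks both observations on a few entries copied from the file; the peak rate of `2e-5` is an assumption inferred from the logged values, not something stated in `trainer_state.json`:

```python
# A few entries copied from the log_history above (grad_norm and eval timing omitted).
log_history = [
    {"epoch": 0.17, "learning_rate": 1.9600000000000002e-05, "loss": 0.4163, "step": 100},
    {"epoch": 4.3, "learning_rate": 1e-05, "loss": 0.0311, "step": 2500},
    {"epoch": 4.3, "eval_loss": 0.35374003648757935, "step": 2500},
    {"epoch": 8.61, "learning_rate": 0.0, "loss": 0.0017, "step": 5000},
    {"epoch": 8.61, "eval_loss": 0.4493824243545532, "step": 5000},
]

MAX_STEPS = 5000  # "max_steps" in trainer_state.json
PEAK_LR = 2e-5    # assumption: inferred from the logged values


def linear_lr(step, peak=PEAK_LR, max_steps=MAX_STEPS):
    """Learning rate under linear decay from `peak` to 0 with no warmup."""
    return peak * (1 - step / max_steps)


# Training entries carry "loss"; evaluation entries carry "eval_loss".
train = {e["step"]: e for e in log_history if "loss" in e}
evals = {e["step"]: e for e in log_history if "eval_loss" in e}

# Every logged learning rate should match the assumed schedule.
for step, entry in train.items():
    assert abs(entry["learning_rate"] - linear_lr(step)) < 1e-12

# Print training loss next to evaluation loss at each evaluation step.
for step, entry in evals.items():
    print(step, train[step]["loss"], round(entry["eval_loss"], 4))
```

With `best_metric` and `best_model_checkpoint` both `null`, nothing in this file indicates which of the two saved steps was preferred.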

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:42f576a53716a0435df09b5afe87ec075766f880338fac95b5faf5bd7ec56a3c
+size 4856

training_logs.txt ADDED

	@@ -0,0 +1,124 @@

+[train dset len] 18590
+[valid dset len] 1000
+/usr0/home/liangzel/anaconda3/envs/air2/lib/python3.11/site-packages/accelerate/accelerator.py:432:
+FutureWarning: Passing the following arguments to `Accelerator` is deprecated
+and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches',
+'split_batches', 'even_batches', 'use_seedable_sampler']). Please pass an
+`accelerate.DataLoaderConfiguration` instead:
+dataloader_config = DataLoaderConfiguration(dispatch_batches=None,
+split_batches=False, even_batches=True, use_seedable_sampler=True)
+  warnings.warn(
+  {'loss': 0.4163, 'grad_norm': 1.0365092754364014, 'learning_rate':
+  1.9600000000000002e-05, 'epoch': 0.17}
+    3%|██▍
+    | 164/5000 [19:31<9:34:34,  7.13s/it]Token indices sequence length is longer
+    than the specified maximum sequence length for this model (24682 > 16384).
+    Running this sequence through the model will result in indexing errors
+    {'loss': 0.3793, 'grad_norm': 1.212004542350769, 'learning_rate':
+    1.9200000000000003e-05, 'epoch': 0.34}
+    {'loss': 0.3682, 'grad_norm': 1.066266417503357, 'learning_rate': 1.88e-05,
+    'epoch': 0.52}
+    {'loss': 0.3536, 'grad_norm': 1.322099208831787, 'learning_rate':
+    1.8400000000000003e-05, 'epoch': 0.69}
+    {'loss': 0.3282, 'grad_norm': 0.998599648475647, 'learning_rate': 1.8e-05,
+    'epoch': 0.86}
+    {'loss': 0.3092, 'grad_norm': 1.5098826885223389, 'learning_rate': 1.76e-05,
+    'epoch': 1.03}
+    {'loss': 0.2248, 'grad_norm': 1.05723237991333, 'learning_rate': 1.72e-05,
+    'epoch': 1.2}
+    {'loss': 0.2239, 'grad_norm': 1.0882526636123657, 'learning_rate':
+    1.6800000000000002e-05, 'epoch': 1.38}
+    {'loss': 0.2342, 'grad_norm': 1.1547547578811646, 'learning_rate': 1.64e-05,
+    'epoch': 1.55}
+    {'loss': 0.2158, 'grad_norm': 1.1294739246368408, 'learning_rate':
+    1.6000000000000003e-05, 'epoch': 1.72}
+    {'loss': 0.2096, 'grad_norm': 0.9624162912368774, 'learning_rate':
+    1.5600000000000003e-05, 'epoch': 1.89}
+    {'loss': 0.1787, 'grad_norm': 1.1864293813705444, 'learning_rate':
+    1.5200000000000002e-05, 'epoch': 2.07}
+    {'loss': 0.1246, 'grad_norm': 1.1997874975204468, 'learning_rate': 1.48e-05,
+    'epoch': 2.24}
+    {'loss': 0.1197, 'grad_norm': 1.2120954990386963, 'learning_rate':
+    1.4400000000000001e-05, 'epoch': 2.41}
+    {'loss': 0.1185, 'grad_norm': 0.6992385983467102, 'learning_rate': 1.4e-05,
+    'epoch': 2.58}
+    {'loss': 0.1241, 'grad_norm': 1.0601509809494019, 'learning_rate':
+    1.3600000000000002e-05, 'epoch': 2.75}
+    {'loss': 0.1282, 'grad_norm': 1.1058382987976074, 'learning_rate':
+    1.3200000000000002e-05, 'epoch': 2.93}
+    {'loss': 0.0847, 'grad_norm': 1.1598687171936035, 'learning_rate':
+    1.2800000000000001e-05, 'epoch': 3.1}
+    {'loss': 0.0616, 'grad_norm': 1.2096168994903564, 'learning_rate':
+    1.2400000000000002e-05, 'epoch': 3.27}
+    {'loss': 0.0645, 'grad_norm': 1.5343897342681885, 'learning_rate': 1.2e-05,
+    'epoch': 3.44}
+    {'loss': 0.0652, 'grad_norm': 1.165819764137268, 'learning_rate': 1.16e-05,
+    'epoch': 3.61}
+    {'loss': 0.0619, 'grad_norm': 1.3763171434402466, 'learning_rate':
+    1.1200000000000001e-05, 'epoch': 3.79}
+    {'loss': 0.0612, 'grad_norm': 0.9929534792900085, 'learning_rate':
+    1.0800000000000002e-05, 'epoch': 3.96}
+    {'loss': 0.038, 'grad_norm': 1.1144566535949707, 'learning_rate': 1.04e-05,
+    'epoch': 4.13}
+    {'loss': 0.0311, 'grad_norm': 1.150139570236206, 'learning_rate': 1e-05,
+    'epoch': 4.3}
+    {'eval_loss': 0.35374003648757935, 'eval_runtime': 84.8171,
+    'eval_samples_per_second': 11.79, 'eval_steps_per_second': 2.948, 'epoch':
+    4.3}
+    {'loss': 0.0308, 'grad_norm': 1.4293252229690552, 'learning_rate':
+    9.600000000000001e-06, 'epoch': 4.48}
+    {'loss': 0.0308, 'grad_norm': 1.1352962255477905, 'learning_rate':
+    9.200000000000002e-06, 'epoch': 4.65}
+    {'loss': 0.033, 'grad_norm': 1.0544779300689697, 'learning_rate': 8.8e-06,
+    'epoch': 4.82}
+    {'loss': 0.0318, 'grad_norm': 1.110599160194397, 'learning_rate':
+    8.400000000000001e-06, 'epoch': 4.99}
+    {'loss': 0.0147, 'grad_norm': 0.7125316262245178, 'learning_rate':
+    8.000000000000001e-06, 'epoch': 5.16}
+    {'loss': 0.0156, 'grad_norm': 0.9172051548957825, 'learning_rate':
+    7.600000000000001e-06, 'epoch': 5.34}
+    {'loss': 0.0145, 'grad_norm': 0.9805625081062317, 'learning_rate':
+    7.2000000000000005e-06, 'epoch': 5.51}
+    {'loss': 0.0149, 'grad_norm': 0.5053761601448059, 'learning_rate':
+    6.800000000000001e-06, 'epoch': 5.68}
+    {'loss': 0.0168, 'grad_norm': 1.1218398809432983, 'learning_rate':
+    6.4000000000000006e-06, 'epoch': 5.85}
+    {'loss': 0.0154, 'grad_norm': 0.3119220733642578, 'learning_rate': 6e-06,
+    'epoch': 6.02}
+    {'loss': 0.0065, 'grad_norm': 0.23416651785373688, 'learning_rate':
+    5.600000000000001e-06, 'epoch': 6.2}
+    {'loss': 0.0079, 'grad_norm': 0.6167200803756714, 'learning_rate': 5.2e-06,
+    'epoch': 6.37}
+    {'loss': 0.0067, 'grad_norm': 1.1704833507537842, 'learning_rate':
+    4.800000000000001e-06, 'epoch': 6.54}
+    {'loss': 0.0093, 'grad_norm': 0.8806678056716919, 'learning_rate': 4.4e-06,
+    'epoch': 6.71}
+    {'loss': 0.007, 'grad_norm': 0.30924132466316223, 'learning_rate':
+    4.000000000000001e-06, 'epoch': 6.88}
+    {'loss': 0.0052, 'grad_norm': 0.46306928992271423, 'learning_rate':
+    3.6000000000000003e-06, 'epoch': 7.06}
+    {'loss': 0.0042, 'grad_norm': 0.46887511014938354, 'learning_rate':
+    3.2000000000000003e-06, 'epoch': 7.23}
+    {'loss': 0.0031, 'grad_norm': 0.902063250541687, 'learning_rate':
+    2.8000000000000003e-06, 'epoch': 7.4}
+    {'loss': 0.0029, 'grad_norm': 0.1910380870103836, 'learning_rate':
+    2.4000000000000003e-06, 'epoch': 7.57}
+    {'loss': 0.0032, 'grad_norm': 0.6202380657196045, 'learning_rate':
+    2.0000000000000003e-06, 'epoch': 7.75}
+    {'loss': 0.0034, 'grad_norm': 0.5730396509170532, 'learning_rate':
+    1.6000000000000001e-06, 'epoch': 7.92}
+    {'loss': 0.0034, 'grad_norm': 0.10635427385568619, 'learning_rate':
+    1.2000000000000002e-06, 'epoch': 8.09}
+    {'loss': 0.0027, 'grad_norm': 0.1567939668893814, 'learning_rate':
+    8.000000000000001e-07, 'epoch': 8.26}
+    {'loss': 0.0015, 'grad_norm': 0.11498889327049255, 'learning_rate':
+    4.0000000000000003e-07, 'epoch': 8.43}
+    {'loss': 0.0017, 'grad_norm': 0.09903218597173691, 'learning_rate': 0.0,
+    'epoch': 8.61}
+    {'eval_loss': 0.4493824243545532, 'eval_runtime': 84.77,
+    'eval_samples_per_second': 11.797, 'eval_steps_per_second': 2.949, 'epoch':
+    8.61}
+    {'train_runtime': 35516.5263, 'train_samples_per_second': 4.505,
+    'train_steps_per_second': 0.141, 'train_loss': 0.09624048218131065, 'epoch':
+    8.61}
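
The numbers in this log and in `trainer_state.json` are enough to estimate how many samples each optimizer step consumed. The sketch below assumes the reported epoch equals `global_step` divided by steps per epoch. The resulting factor of about 8 over `train_batch_size` would then be the product of gradient accumulation steps and device count, a split these files do not reveal.

```python
train_len = 18590                 # "[train dset len]" in training_logs.txt
max_steps = 5000                  # "max_steps" in trainer_state.json
final_epoch = 8.605851979345955   # "epoch" in trainer_state.json
per_device_batch = 4              # "train_batch_size" in trainer_state.json
train_runtime = 35516.5263        # seconds, from the final log line
train_samples_per_second = 4.505  # from the final log line

# Assumption: epoch = global_step / steps_per_epoch.
steps_per_epoch = max_steps / final_epoch
samples_per_step = train_len / steps_per_epoch
multiplier = samples_per_step / per_device_batch

# Cross-check against the throughput the trainer reported.
samples_from_throughput = train_runtime * train_samples_per_second

print(round(steps_per_epoch), round(samples_per_step), round(multiplier))
print(round(samples_from_throughput), max_steps * round(samples_per_step))
```

Both routes land near 160,000 samples in total, which supports the estimate of roughly 32 samples per step.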