Ba2han committed on Feb 7

Commit 42fff4d (verified) · 1 parent: 3b72c12

Training in progress, step 375, checkpoint

Files changed (11)

last-checkpoint/chat_template.jinja +4 -0
last-checkpoint/config.json +66 -0
last-checkpoint/generation_config.json +10 -0
last-checkpoint/model.safetensors +3 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/tokenizer.json +0 -0
last-checkpoint/tokenizer_config.json +22 -0
last-checkpoint/trainer_state.json +2659 -0
last-checkpoint/training_args.bin +3 -0

last-checkpoint/chat_template.jinja ADDED

	@@ -0,0 +1,4 @@

+{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '
+' + message['content'] + '<|im_end|>' + '
+'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
+' }}{% endif %}
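
The four template lines above implement a ChatML-style layout: each message is rendered as `<|im_start|>{role}\n{content}<|im_end|>\n`, and an open assistant header is appended when `add_generation_prompt` is true. A minimal pure-Python sketch of the same string assembly follows. `render_chat` is a hypothetical helper name used only for illustration; it is not part of the repository, and it covers only what the template text itself shows.

```python
def render_chat(messages, add_generation_prompt=False):
    """Assemble the prompt the same way the Jinja template does."""
    parts = []
    for message in messages:
        # One block per message: start marker, role, newline, content, end marker, newline.
        parts.append(
            "<|im_start|>" + message["role"] + "\n"
            + message["content"] + "<|im_end|>" + "\n"
        )
    if add_generation_prompt:
        # Leave an open assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chat(
    [{"role": "user", "content": "Who are you?"}],
    add_generation_prompt=True,
)
print(prompt)
```

Note that the template defaults `add_generation_prompt` to false when the caller does not define it, which the keyword default above mirrors.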

last-checkpoint/config.json ADDED

	@@ -0,0 +1,66 @@

+{
+  "architectures": [
+    "Lfm2MoeForCausalLM"
+  ],
+  "auto_map": {
+    "AutoConfig": "configuration_lfm2_moe.Lfm2MoeConfig",
+    "AutoModelForCausalLM": "modeling_lfm2_moe.Lfm2MoeForCausalLM"
+  },
+  "bos_token_id": 1,
+  "conv_L_cache": 3,
+  "conv_bias": false,
+  "dtype": "bfloat16",
+  "eos_token_id": 7,
+  "hidden_size": 2048,
+  "initializer_range": 0.02,
+  "intermediate_size": 7168,
+  "layer_types": [
+    "conv",
+    "conv",
+    "full_attention",
+    "conv",
+    "conv",
+    "conv",
+    "full_attention",
+    "conv",
+    "conv",
+    "conv",
+    "full_attention",
+    "conv",
+    "conv",
+    "conv",
+    "full_attention",
+    "conv",
+    "conv",
+    "conv",
+    "full_attention",
+    "conv",
+    "conv",
+    "full_attention",
+    "conv",
+    "conv"
+  ],
+  "max_position_embeddings": 128000,
+  "model_type": "lfm2_moe",
+  "moe_intermediate_size": 1792,
+  "norm_eps": 1e-05,
+  "norm_topk_prob": true,
+  "num_attention_heads": 32,
+  "num_dense_layers": 2,
+  "num_experts": 32,
+  "num_experts_per_tok": 4,
+  "num_hidden_layers": 24,
+  "num_key_value_heads": 8,
+  "pad_token_id": 0,
+  "rope_parameters": {
+    "rope_theta": 1000000.0,
+    "rope_type": "default"
+  },
+  "routed_scaling_factor": 1.0,
+  "tie_word_embeddings": true,
+  "transformers_version": "5.0.0",
+  "unsloth_version": "2026.1.4",
+  "use_cache": false,
+  "use_expert_bias": true,
+  "vocab_size": 65536
+}
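
A few structural facts can be read directly off this config. The sketch below copies the relevant values by hand and derives the layer mix, the grouped-query attention ratio, and the fraction of experts active per token. The per-head dimension is computed as `hidden_size / num_attention_heads`, which is an assumption, since the config does not state a head dimension explicitly.

```python
from collections import Counter

# Values copied from the config.json diff above.
layer_types = (
    ["conv", "conv", "full_attention"]
    + ["conv", "conv", "conv", "full_attention"] * 4
    + ["conv", "conv", "full_attention", "conv", "conv"]
)
hidden_size = 2048
num_attention_heads = 32
num_key_value_heads = 8
num_experts = 32
num_experts_per_tok = 4
num_hidden_layers = 24

counts = Counter(layer_types)
attention_layers = [i for i, kind in enumerate(layer_types) if kind == "full_attention"]

# Assumption: head_dim is hidden_size divided evenly across attention heads.
head_dim = hidden_size // num_attention_heads
queries_per_kv_head = num_attention_heads // num_key_value_heads
active_expert_fraction = num_experts_per_tok / num_experts

print("layers:", len(layer_types), dict(counts))
print("full-attention layer indices:", attention_layers)
print("assumed head_dim:", head_dim)
print("query heads per KV head:", queries_per_kv_head)
print("experts active per token:", active_expert_fraction)
```

The length of `layer_types` matches `num_hidden_layers`, so the list describes every layer of the stack.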

last-checkpoint/generation_config.json ADDED

	@@ -0,0 +1,10 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 1,
+  "eos_token_id": [
+    7
+  ],
+  "max_length": 128000,
+  "pad_token_id": 0,
+  "transformers_version": "5.0.0"
+}

last-checkpoint/model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b6b72cf77930f6e49871825a7bc1f922793bcfcb9c02699bc735b610817fdcc3
+size 16680154224
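
The three lines above are a Git LFS pointer, not the weights themselves: they record the spec version, the SHA-256 of the real file, and its size in bytes. The sketch below parses that pointer and turns the size into a rough parameter count. The estimate assumes every tensor is stored as 2-byte `bfloat16` (the `dtype` in config.json) and ignores the safetensors header, so treat it as an approximation only. `parse_lfs_pointer` is a hypothetical helper name.

```python
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:b6b72cf77930f6e49871825a7bc1f922793bcfcb9c02699bc735b610817fdcc3
size 16680154224
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


info = parse_lfs_pointer(POINTER_TEXT)

# Assumption: all weights are bfloat16 (2 bytes each); header bytes are ignored.
BYTES_PER_PARAM = 2
approx_params = info["size"] // BYTES_PER_PARAM

print(info["hash_algorithm"], info["digest"][:12], info["size"])
print(f"approx. parameters: {approx_params / 1e9:.2f}B")
```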

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f9acc139e8aea2be68ada96539cfea414f4a12fcb36183ae93da05ec2f5de6f8
+size 16957053431

last-checkpoint/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1b7c72b81b0d822b6aa4a72bce960bb779e90a2897b5fc0a4b59dcb48b9b3ae5
+size 14645

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:363d5d1b00939e1cc84fa67cef40f428e133b876ac240cbbacddf195b8308ea8
+size 1465

last-checkpoint/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

last-checkpoint/tokenizer_config.json ADDED

	@@ -0,0 +1,22 @@

+{
+  "backend": "tokenizers",
+  "bos_token": "<|startoftext|>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "is_local": false,
+  "legacy": false,
+  "model_input_names": [
+    "input_ids",
+    "attention_mask"
+  ],
+  "model_max_length": 1000000000000000019884624838656,
+  "model_specific_special_tokens": {},
+  "pad_token": "<|pad|>",
+  "padding_side": "right",
+  "sp_model_kwargs": {},
+  "spaces_between_special_tokens": false,
+  "tokenizer_class": "TokenizersBackend",
+  "unk_token": null,
+  "use_default_system_prompt": false,
+  "use_fast": true
+}
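
The unusual `model_max_length` in tokenizer_config.json is exactly `int(1e30)`, which appears to be the placeholder written when no explicit tokenizer length limit is configured, rather than a real context size. The short check below confirms the arithmetic and shows one way a caller might fall back to the model's `max_position_embeddings` of 128000 from config.json. `effective_max_length` is a hypothetical helper, not a library function.

```python
MODEL_MAX_LENGTH = 1000000000000000019884624838656  # from tokenizer_config.json
MAX_POSITION_EMBEDDINGS = 128000                    # from config.json

# The float 1e30 is not exactly representable, which explains the odd trailing digits.
SENTINEL = int(1e30)


def effective_max_length(tokenizer_limit, model_limit):
    """Hypothetical helper: use the model limit when the tokenizer limit is the sentinel."""
    if tokenizer_limit >= SENTINEL:
        return model_limit
    return min(tokenizer_limit, model_limit)


limit = effective_max_length(MODEL_MAX_LENGTH, MAX_POSITION_EMBEDDINGS)
print(MODEL_MAX_LENGTH == SENTINEL, limit)
```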

last-checkpoint/trainer_state.json ADDED
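
The `log_history` entries in this file are consistent with a linear learning-rate warmup from 0 to a plateau of 0.02, and with a fixed number of optimizer steps per epoch. The sketch below reproduces the logged values. The constants `WARMUP_STEPS = 87` and `STEPS_PER_EPOCH = 3952` are inferred from the logged numbers (0.02 / 87 matches the step-2 learning rate, and 1 / 3952 matches the step-1 epoch); they are not stated anywhere in the commit, and the plateau is only confirmed for the steps visible here.

```python
import math

PEAK_LR = 0.02          # plateau value seen from step 88 onward
WARMUP_STEPS = 87       # inferred, not stated in the commit
STEPS_PER_EPOCH = 3952  # inferred, not stated in the commit


def logged_lr(step):
    """Learning rate as logged at a given step, assuming linear warmup then a plateau."""
    return PEAK_LR * min(step - 1, WARMUP_STEPS) / WARMUP_STEPS


def logged_epoch(step):
    """Fractional epoch as logged at a given step."""
    return step / STEPS_PER_EPOCH


# Compare against a few values copied from log_history.
samples = {1: 0.0, 2: 0.00022988505747126436, 10: 0.0020689655172413794, 88: 0.02, 144: 0.02}
for step, expected in samples.items():
    ok = math.isclose(logged_lr(step), expected, rel_tol=1e-9, abs_tol=1e-15)
    print(step, logged_lr(step), ok)

print(logged_epoch(375))
```

The logged learning rate at step 1 is 0.0, so the first update in this run has no effect on the weights under this schedule.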

	@@ -0,0 +1,2659 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.09488866396761134,
+  "eval_steps": 500,
+  "global_step": 375,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.00025303643724696357,
+      "grad_norm": 20.5,
+      "learning_rate": 0.0,
+      "loss": 2.7386393547058105,
+      "step": 1
+    },
+    {
+      "epoch": 0.0005060728744939271,
+      "grad_norm": 16.0,
+      "learning_rate": 0.00022988505747126436,
+      "loss": 2.6319832801818848,
+      "step": 2
+    },
+    {
+      "epoch": 0.0007591093117408907,
+      "grad_norm": 23.625,
+      "learning_rate": 0.0004597701149425287,
+      "loss": 2.7548398971557617,
+      "step": 3
+    },
+    {
+      "epoch": 0.0010121457489878543,
+      "grad_norm": 13.125,
+      "learning_rate": 0.0006896551724137932,
+      "loss": 2.396958589553833,
+      "step": 4
+    },
+    {
+      "epoch": 0.0012651821862348178,
+      "grad_norm": 9.9375,
+      "learning_rate": 0.0009195402298850574,
+      "loss": 2.2186975479125977,
+      "step": 5
+    },
+    {
+      "epoch": 0.0015182186234817814,
+      "grad_norm": 7.59375,
+      "learning_rate": 0.0011494252873563218,
+      "loss": 2.168830394744873,
+      "step": 6
+    },
+    {
+      "epoch": 0.001771255060728745,
+      "grad_norm": 4.84375,
+      "learning_rate": 0.0013793103448275863,
+      "loss": 1.9716055393218994,
+      "step": 7
+    },
+    {
+      "epoch": 0.0020242914979757085,
+      "grad_norm": 2.546875,
+      "learning_rate": 0.0016091954022988506,
+      "loss": 1.8040101528167725,
+      "step": 8
+    },
+    {
+      "epoch": 0.002277327935222672,
+      "grad_norm": 1.484375,
+      "learning_rate": 0.0018390804597701149,
+      "loss": 1.6331110000610352,
+      "step": 9
+    },
+    {
+      "epoch": 0.0025303643724696357,
+      "grad_norm": 1.4296875,
+      "learning_rate": 0.0020689655172413794,
+      "loss": 1.559295892715454,
+      "step": 10
+    },
+    {
+      "epoch": 0.002783400809716599,
+      "grad_norm": 1.2265625,
+      "learning_rate": 0.0022988505747126436,
+      "loss": 1.4502407312393188,
+      "step": 11
+    },
+    {
+      "epoch": 0.003036437246963563,
+      "grad_norm": 0.90625,
+      "learning_rate": 0.0025287356321839084,
+      "loss": 1.6704494953155518,
+      "step": 12
+    },
+    {
+      "epoch": 0.003289473684210526,
+      "grad_norm": 0.84765625,
+      "learning_rate": 0.0027586206896551726,
+      "loss": 1.3471019268035889,
+      "step": 13
+    },
+    {
+      "epoch": 0.00354251012145749,
+      "grad_norm": 0.9375,
+      "learning_rate": 0.002988505747126437,
+      "loss": 1.6374365091323853,
+      "step": 14
+    },
+    {
+      "epoch": 0.0037955465587044533,
+      "grad_norm": 0.92578125,
+      "learning_rate": 0.003218390804597701,
+      "loss": 1.5217013359069824,
+      "step": 15
+    },
+    {
+      "epoch": 0.004048582995951417,
+      "grad_norm": 0.828125,
+      "learning_rate": 0.003448275862068966,
+      "loss": 1.4502065181732178,
+      "step": 16
+    },
+    {
+      "epoch": 0.00430161943319838,
+      "grad_norm": 0.75,
+      "learning_rate": 0.0036781609195402297,
+      "loss": 1.5312974452972412,
+      "step": 17
+    },
+    {
+      "epoch": 0.004554655870445344,
+      "grad_norm": 0.77734375,
+      "learning_rate": 0.003908045977011495,
+      "loss": 1.4265103340148926,
+      "step": 18
+    },
+    {
+      "epoch": 0.004807692307692308,
+      "grad_norm": 0.86328125,
+      "learning_rate": 0.004137931034482759,
+      "loss": 1.4511473178863525,
+      "step": 19
+    },
+    {
+      "epoch": 0.005060728744939271,
+      "grad_norm": 0.9296875,
+      "learning_rate": 0.0043678160919540234,
+      "loss": 1.4595309495925903,
+      "step": 20
+    },
+    {
+      "epoch": 0.005313765182186235,
+      "grad_norm": 1.0703125,
+      "learning_rate": 0.004597701149425287,
+      "loss": 1.4979453086853027,
+      "step": 21
+    },
+    {
+      "epoch": 0.005566801619433198,
+      "grad_norm": 1.171875,
+      "learning_rate": 0.004827586206896552,
+      "loss": 1.4711323976516724,
+      "step": 22
+    },
+    {
+      "epoch": 0.005819838056680162,
+      "grad_norm": 0.98046875,
+      "learning_rate": 0.005057471264367817,
+      "loss": 1.4307830333709717,
+      "step": 23
+    },
+    {
+      "epoch": 0.006072874493927126,
+      "grad_norm": 1.1015625,
+      "learning_rate": 0.0052873563218390806,
+      "loss": 1.4139049053192139,
+      "step": 24
+    },
+    {
+      "epoch": 0.006325910931174089,
+      "grad_norm": 1.2578125,
+      "learning_rate": 0.005517241379310345,
+      "loss": 1.5090482234954834,
+      "step": 25
+    },
+    {
+      "epoch": 0.006578947368421052,
+      "grad_norm": 0.87109375,
+      "learning_rate": 0.005747126436781609,
+      "loss": 1.4550621509552002,
+      "step": 26
+    },
+    {
+      "epoch": 0.0068319838056680165,
+      "grad_norm": 0.9609375,
+      "learning_rate": 0.005977011494252874,
+      "loss": 1.4417916536331177,
+      "step": 27
+    },
+    {
+      "epoch": 0.00708502024291498,
+      "grad_norm": 0.8515625,
+      "learning_rate": 0.0062068965517241385,
+      "loss": 1.3825995922088623,
+      "step": 28
+    },
+    {
+      "epoch": 0.007338056680161943,
+      "grad_norm": 0.9453125,
+      "learning_rate": 0.006436781609195402,
+      "loss": 1.369423270225525,
+      "step": 29
+    },
+    {
+      "epoch": 0.0075910931174089065,
+      "grad_norm": 0.8203125,
+      "learning_rate": 0.006666666666666666,
+      "loss": 1.3526304960250854,
+      "step": 30
+    },
+    {
+      "epoch": 0.00784412955465587,
+      "grad_norm": 0.6015625,
+      "learning_rate": 0.006896551724137932,
+      "loss": 1.3979132175445557,
+      "step": 31
+    },
+    {
+      "epoch": 0.008097165991902834,
+      "grad_norm": 0.640625,
+      "learning_rate": 0.007126436781609196,
+      "loss": 1.3506033420562744,
+      "step": 32
+    },
+    {
+      "epoch": 0.008350202429149798,
+      "grad_norm": 0.76171875,
+      "learning_rate": 0.0073563218390804595,
+      "loss": 1.4074552059173584,
+      "step": 33
+    },
+    {
+      "epoch": 0.00860323886639676,
+      "grad_norm": 0.75390625,
+      "learning_rate": 0.007586206896551724,
+      "loss": 1.4538687467575073,
+      "step": 34
+    },
+    {
+      "epoch": 0.008856275303643725,
+      "grad_norm": 0.80078125,
+      "learning_rate": 0.00781609195402299,
+      "loss": 1.4242024421691895,
+      "step": 35
+    },
+    {
+      "epoch": 0.009109311740890687,
+      "grad_norm": 0.95703125,
+      "learning_rate": 0.008045977011494253,
+      "loss": 1.5446974039077759,
+      "step": 36
+    },
+    {
+      "epoch": 0.009362348178137652,
+      "grad_norm": 0.765625,
+      "learning_rate": 0.008275862068965517,
+      "loss": 1.4819495677947998,
+      "step": 37
+    },
+    {
+      "epoch": 0.009615384615384616,
+      "grad_norm": 0.59765625,
+      "learning_rate": 0.008505747126436782,
+      "loss": 1.470993161201477,
+      "step": 38
+    },
+    {
+      "epoch": 0.009868421052631578,
+      "grad_norm": 0.609375,
+      "learning_rate": 0.008735632183908047,
+      "loss": 1.4080774784088135,
+      "step": 39
+    },
+    {
+      "epoch": 0.010121457489878543,
+      "grad_norm": 0.6484375,
+      "learning_rate": 0.00896551724137931,
+      "loss": 1.4043101072311401,
+      "step": 40
+    },
+    {
+      "epoch": 0.010374493927125507,
+      "grad_norm": 0.8125,
+      "learning_rate": 0.009195402298850575,
+      "loss": 1.4553110599517822,
+      "step": 41
+    },
+    {
+      "epoch": 0.01062753036437247,
+      "grad_norm": 0.6640625,
+      "learning_rate": 0.00942528735632184,
+      "loss": 1.4775452613830566,
+      "step": 42
+    },
+    {
+      "epoch": 0.010880566801619434,
+      "grad_norm": 0.6953125,
+      "learning_rate": 0.009655172413793104,
+      "loss": 1.4560859203338623,
+      "step": 43
+    },
+    {
+      "epoch": 0.011133603238866396,
+      "grad_norm": 0.62890625,
+      "learning_rate": 0.009885057471264369,
+      "loss": 1.3985958099365234,
+      "step": 44
+    },
+    {
+      "epoch": 0.01138663967611336,
+      "grad_norm": 0.5390625,
+      "learning_rate": 0.010114942528735633,
+      "loss": 1.4953653812408447,
+      "step": 45
+    },
+    {
+      "epoch": 0.011639676113360324,
+      "grad_norm": 0.578125,
+      "learning_rate": 0.010344827586206898,
+      "loss": 1.4428997039794922,
+      "step": 46
+    },
+    {
+      "epoch": 0.011892712550607287,
+      "grad_norm": 0.58984375,
+      "learning_rate": 0.010574712643678161,
+      "loss": 1.4267104864120483,
+      "step": 47
+    },
+    {
+      "epoch": 0.012145748987854251,
+      "grad_norm": 0.875,
+      "learning_rate": 0.010804597701149426,
+      "loss": 1.3390274047851562,
+      "step": 48
+    },
+    {
+      "epoch": 0.012398785425101215,
+      "grad_norm": 0.59375,
+      "learning_rate": 0.01103448275862069,
+      "loss": 1.2774426937103271,
+      "step": 49
+    },
+    {
+      "epoch": 0.012651821862348178,
+      "grad_norm": 0.546875,
+      "learning_rate": 0.011264367816091954,
+      "loss": 1.448158621788025,
+      "step": 50
+    },
+    {
+      "epoch": 0.012904858299595142,
+      "grad_norm": 0.5703125,
+      "learning_rate": 0.011494252873563218,
+      "loss": 1.552297830581665,
+      "step": 51
+    },
+    {
+      "epoch": 0.013157894736842105,
+      "grad_norm": 0.55859375,
+      "learning_rate": 0.011724137931034481,
+      "loss": 1.3143055438995361,
+      "step": 52
+    },
+    {
+      "epoch": 0.013410931174089069,
+      "grad_norm": 0.49609375,
+      "learning_rate": 0.011954022988505748,
+      "loss": 1.5644612312316895,
+      "step": 53
+    },
+    {
+      "epoch": 0.013663967611336033,
+      "grad_norm": 0.56640625,
+      "learning_rate": 0.012183908045977012,
+      "loss": 1.3711479902267456,
+      "step": 54
+    },
+    {
+      "epoch": 0.013917004048582995,
+      "grad_norm": 0.90625,
+      "learning_rate": 0.012413793103448277,
+      "loss": 1.5378882884979248,
+      "step": 55
+    },
+    {
+      "epoch": 0.01417004048582996,
+      "grad_norm": 0.6875,
+      "learning_rate": 0.01264367816091954,
+      "loss": 1.5556209087371826,
+      "step": 56
+    },
+    {
+      "epoch": 0.014423076923076924,
+      "grad_norm": 0.58203125,
+      "learning_rate": 0.012873563218390805,
+      "loss": 1.537410020828247,
+      "step": 57
+    },
+    {
+      "epoch": 0.014676113360323886,
+      "grad_norm": 0.60546875,
+      "learning_rate": 0.01310344827586207,
+      "loss": 1.5693731307983398,
+      "step": 58
+    },
+    {
+      "epoch": 0.01492914979757085,
+      "grad_norm": 0.59765625,
+      "learning_rate": 0.013333333333333332,
+      "loss": 1.6816911697387695,
+      "step": 59
+    },
+    {
+      "epoch": 0.015182186234817813,
+      "grad_norm": 0.61328125,
+      "learning_rate": 0.013563218390804597,
+      "loss": 1.6039944887161255,
+      "step": 60
+    },
+    {
+      "epoch": 0.015435222672064777,
+      "grad_norm": 0.66796875,
+      "learning_rate": 0.013793103448275864,
+      "loss": 1.4081966876983643,
+      "step": 61
+    },
+    {
+      "epoch": 0.01568825910931174,
+      "grad_norm": 0.6328125,
+      "learning_rate": 0.014022988505747127,
+      "loss": 1.6515611410140991,
+      "step": 62
+    },
+    {
+      "epoch": 0.015941295546558706,
+      "grad_norm": 0.64453125,
+      "learning_rate": 0.014252873563218391,
+      "loss": 1.519553780555725,
+      "step": 63
+    },
+    {
+      "epoch": 0.016194331983805668,
+      "grad_norm": 0.55078125,
+      "learning_rate": 0.014482758620689656,
+      "loss": 1.5124565362930298,
+      "step": 64
+    },
+    {
+      "epoch": 0.01644736842105263,
+      "grad_norm": 0.470703125,
+      "learning_rate": 0.014712643678160919,
+      "loss": 1.6325674057006836,
+      "step": 65
+    },
+    {
+      "epoch": 0.016700404858299597,
+      "grad_norm": 0.59765625,
+      "learning_rate": 0.014942528735632184,
+      "loss": 1.6184418201446533,
+      "step": 66
+    },
+    {
+      "epoch": 0.01695344129554656,
+      "grad_norm": 0.671875,
+      "learning_rate": 0.015172413793103448,
+      "loss": 1.662506341934204,
+      "step": 67
+    },
+    {
+      "epoch": 0.01720647773279352,
+      "grad_norm": 0.58984375,
+      "learning_rate": 0.015402298850574711,
+      "loss": 1.6800591945648193,
+      "step": 68
+    },
+    {
+      "epoch": 0.017459514170040488,
+      "grad_norm": 0.6640625,
+      "learning_rate": 0.01563218390804598,
+      "loss": 1.5753778219223022,
+      "step": 69
+    },
+    {
+      "epoch": 0.01771255060728745,
+      "grad_norm": 0.58984375,
+      "learning_rate": 0.015862068965517243,
+      "loss": 1.6762198209762573,
+      "step": 70
+    },
+    {
+      "epoch": 0.017965587044534412,
+      "grad_norm": 0.474609375,
+      "learning_rate": 0.016091954022988506,
+      "loss": 1.5733668804168701,
+      "step": 71
+    },
+    {
+      "epoch": 0.018218623481781375,
+      "grad_norm": 0.4921875,
+      "learning_rate": 0.016321839080459772,
+      "loss": 1.7303202152252197,
+      "step": 72
+    },
+    {
+      "epoch": 0.01847165991902834,
+      "grad_norm": 0.796875,
+      "learning_rate": 0.016551724137931035,
+      "loss": 1.7469077110290527,
+      "step": 73
+    },
+    {
+      "epoch": 0.018724696356275303,
+      "grad_norm": 0.5390625,
+      "learning_rate": 0.016781609195402298,
+      "loss": 1.6346166133880615,
+      "step": 74
+    },
+    {
+      "epoch": 0.018977732793522266,
+      "grad_norm": 0.546875,
+      "learning_rate": 0.017011494252873564,
+      "loss": 1.8797675371170044,
+      "step": 75
+    },
+    {
+      "epoch": 0.019230769230769232,
+      "grad_norm": 0.51953125,
+      "learning_rate": 0.017241379310344827,
+      "loss": 1.6815588474273682,
+      "step": 76
+    },
+    {
+      "epoch": 0.019483805668016194,
+      "grad_norm": 0.59375,
+      "learning_rate": 0.017471264367816094,
+      "loss": 1.5040360689163208,
+      "step": 77
+    },
+    {
+      "epoch": 0.019736842105263157,
+      "grad_norm": 0.51171875,
+      "learning_rate": 0.017701149425287357,
+      "loss": 1.80476975440979,
+      "step": 78
+    },
+    {
+      "epoch": 0.019989878542510123,
+      "grad_norm": 2.390625,
+      "learning_rate": 0.01793103448275862,
+      "loss": 1.929945945739746,
+      "step": 79
+    },
+    {
+      "epoch": 0.020242914979757085,
+      "grad_norm": 0.400390625,
+      "learning_rate": 0.018160919540229886,
+      "loss": 1.7132759094238281,
+      "step": 80
+    },
+    {
+      "epoch": 0.020495951417004048,
+      "grad_norm": 1.015625,
+      "learning_rate": 0.01839080459770115,
+      "loss": 1.8109431266784668,
+      "step": 81
+    },
+    {
+      "epoch": 0.020748987854251014,
+      "grad_norm": 0.5625,
+      "learning_rate": 0.018620689655172412,
+      "loss": 1.807767629623413,
+      "step": 82
+    },
+    {
+      "epoch": 0.021002024291497976,
+      "grad_norm": 0.6328125,
+      "learning_rate": 0.01885057471264368,
+      "loss": 1.6236469745635986,
+      "step": 83
+    },
+    {
+      "epoch": 0.02125506072874494,
+      "grad_norm": 0.5234375,
+      "learning_rate": 0.01908045977011494,
+      "loss": 1.6389708518981934,
+      "step": 84
+    },
+    {
+      "epoch": 0.0215080971659919,
+      "grad_norm": 0.52734375,
+      "learning_rate": 0.019310344827586208,
+      "loss": 1.7953280210494995,
+      "step": 85
+    },
+    {
+      "epoch": 0.021761133603238867,
+      "grad_norm": 0.515625,
+      "learning_rate": 0.01954022988505747,
+      "loss": 1.8677895069122314,
+      "step": 86
+    },
+    {
+      "epoch": 0.02201417004048583,
+      "grad_norm": 0.50390625,
+      "learning_rate": 0.019770114942528737,
+      "loss": 1.8045387268066406,
+      "step": 87
+    },
+    {
+      "epoch": 0.022267206477732792,
+      "grad_norm": 0.58984375,
+      "learning_rate": 0.02,
+      "loss": 1.9091248512268066,
+      "step": 88
+    },
+    {
+      "epoch": 0.022520242914979758,
+      "grad_norm": 0.609375,
+      "learning_rate": 0.02,
+      "loss": 1.808335304260254,
+      "step": 89
+    },
+    {
+      "epoch": 0.02277327935222672,
+      "grad_norm": 0.5546875,
+      "learning_rate": 0.02,
+      "loss": 1.8608583211898804,
+      "step": 90
+    },
+    {
+      "epoch": 0.023026315789473683,
+      "grad_norm": 0.54296875,
+      "learning_rate": 0.02,
+      "loss": 1.9061346054077148,
+      "step": 91
+    },
+    {
+      "epoch": 0.02327935222672065,
+      "grad_norm": 0.421875,
+      "learning_rate": 0.02,
+      "loss": 1.9825823307037354,
+      "step": 92
+    },
+    {
+      "epoch": 0.02353238866396761,
+      "grad_norm": 0.4609375,
+      "learning_rate": 0.02,
+      "loss": 1.8379669189453125,
+      "step": 93
+    },
+    {
+      "epoch": 0.023785425101214574,
+      "grad_norm": 0.494140625,
+      "learning_rate": 0.02,
+      "loss": 1.6752934455871582,
+      "step": 94
+    },
+    {
+      "epoch": 0.02403846153846154,
+      "grad_norm": 0.46484375,
+      "learning_rate": 0.02,
+      "loss": 1.7464494705200195,
+      "step": 95
+    },
+    {
+      "epoch": 0.024291497975708502,
+      "grad_norm": 0.453125,
+      "learning_rate": 0.02,
+      "loss": 1.8795886039733887,
+      "step": 96
+    },
+    {
+      "epoch": 0.024544534412955465,
+      "grad_norm": 0.419921875,
+      "learning_rate": 0.02,
+      "loss": 1.8445587158203125,
+      "step": 97
+    },
+    {
+      "epoch": 0.02479757085020243,
+      "grad_norm": 0.40625,
+      "learning_rate": 0.02,
+      "loss": 1.842057228088379,
+      "step": 98
+    },
+    {
+      "epoch": 0.025050607287449393,
+      "grad_norm": 0.421875,
+      "learning_rate": 0.02,
+      "loss": 1.6107308864593506,
+      "step": 99
+    },
+    {
+      "epoch": 0.025303643724696356,
+      "grad_norm": 0.392578125,
+      "learning_rate": 0.02,
+      "loss": 1.6814723014831543,
+      "step": 100
+    },
+    {
+      "epoch": 0.025556680161943318,
+      "grad_norm": 0.404296875,
+      "learning_rate": 0.02,
+      "loss": 1.706100583076477,
+      "step": 101
+    },
+    {
+      "epoch": 0.025809716599190284,
+      "grad_norm": 0.443359375,
+      "learning_rate": 0.02,
+      "loss": 1.8581061363220215,
+      "step": 102
+    },
+    {
+      "epoch": 0.026062753036437247,
+      "grad_norm": 0.419921875,
+      "learning_rate": 0.02,
+      "loss": 1.7351648807525635,
+      "step": 103
+    },
+    {
+      "epoch": 0.02631578947368421,
+      "grad_norm": 0.373046875,
+      "learning_rate": 0.02,
+      "loss": 1.9900397062301636,
+      "step": 104
+    },
+    {
+      "epoch": 0.026568825910931175,
+      "grad_norm": 0.38671875,
+      "learning_rate": 0.02,
+      "loss": 1.871990442276001,
+      "step": 105
+    },
+    {
+      "epoch": 0.026821862348178137,
+      "grad_norm": 0.400390625,
+      "learning_rate": 0.02,
+      "loss": 1.862628698348999,
+      "step": 106
+    },
+    {
+      "epoch": 0.0270748987854251,
+      "grad_norm": 0.421875,
+      "learning_rate": 0.02,
+      "loss": 1.8539869785308838,
+      "step": 107
+    },
+    {
+      "epoch": 0.027327935222672066,
+      "grad_norm": 0.408203125,
+      "learning_rate": 0.02,
+      "loss": 1.8793188333511353,
+      "step": 108
+    },
+    {
+      "epoch": 0.02758097165991903,
+      "grad_norm": 0.3984375,
+      "learning_rate": 0.02,
+      "loss": 1.8705213069915771,
+      "step": 109
+    },
+    {
+      "epoch": 0.02783400809716599,
+      "grad_norm": 0.388671875,
+      "learning_rate": 0.02,
+      "loss": 1.911798357963562,
+      "step": 110
+    },
+    {
+      "epoch": 0.028087044534412957,
+      "grad_norm": 0.609375,
+      "learning_rate": 0.02,
+      "loss": 1.9883649349212646,
+      "step": 111
+    },
+    {
+      "epoch": 0.02834008097165992,
+      "grad_norm": 0.44921875,
+      "learning_rate": 0.02,
+      "loss": 1.8723022937774658,
+      "step": 112
+    },
+    {
+      "epoch": 0.028593117408906882,
+      "grad_norm": 0.392578125,
+      "learning_rate": 0.02,
+      "loss": 1.8594697713851929,
+      "step": 113
+    },
+    {
+      "epoch": 0.028846153846153848,
+      "grad_norm": 0.34765625,
+      "learning_rate": 0.02,
+      "loss": 1.5962891578674316,
+      "step": 114
+    },
+    {
+      "epoch": 0.02909919028340081,
+      "grad_norm": 0.384765625,
+      "learning_rate": 0.02,
+      "loss": 1.8171108961105347,
+      "step": 115
+    },
+    {
+      "epoch": 0.029352226720647773,
+      "grad_norm": 0.404296875,
+      "learning_rate": 0.02,
+      "loss": 1.7834141254425049,
+      "step": 116
+    },
+    {
+      "epoch": 0.029605263157894735,
+      "grad_norm": 0.53125,
+      "learning_rate": 0.02,
+      "loss": 1.9656782150268555,
+      "step": 117
+    },
+    {
+      "epoch": 0.0298582995951417,
+      "grad_norm": 0.361328125,
+      "learning_rate": 0.02,
+      "loss": 1.8432538509368896,
+      "step": 118
+    },
+    {
+      "epoch": 0.030111336032388664,
+      "grad_norm": 0.365234375,
+      "learning_rate": 0.02,
+      "loss": 1.8127596378326416,
+      "step": 119
+    },
+    {
+      "epoch": 0.030364372469635626,
+      "grad_norm": 0.6953125,
+      "learning_rate": 0.02,
+      "loss": 1.923449158668518,
+      "step": 120
+    },
+    {
+      "epoch": 0.030617408906882592,
+      "grad_norm": 0.36328125,
+      "learning_rate": 0.02,
+      "loss": 1.9775664806365967,
+      "step": 121
+    },
+    {
+      "epoch": 0.030870445344129555,
+      "grad_norm": 0.349609375,
+      "learning_rate": 0.02,
+      "loss": 1.9287586212158203,
+      "step": 122
+    },
+    {
+      "epoch": 0.031123481781376517,
+      "grad_norm": 0.365234375,
+      "learning_rate": 0.02,
+      "loss": 1.9073728322982788,
+      "step": 123
+    },
+    {
+      "epoch": 0.03137651821862348,
+      "grad_norm": 0.3671875,
+      "learning_rate": 0.02,
+      "loss": 1.912489891052246,
+      "step": 124
+    },
+    {
+      "epoch": 0.031629554655870445,
+      "grad_norm": 0.337890625,
+      "learning_rate": 0.02,
+      "loss": 1.906019926071167,
+      "step": 125
+    },
+    {
+      "epoch": 0.03188259109311741,
+      "grad_norm": 0.373046875,
+      "learning_rate": 0.02,
+      "loss": 1.8980028629302979,
+      "step": 126
+    },
+    {
+      "epoch": 0.03213562753036437,
+      "grad_norm": 0.3515625,
+      "learning_rate": 0.02,
+      "loss": 1.8099416494369507,
+      "step": 127
+    },
+    {
+      "epoch": 0.032388663967611336,
+      "grad_norm": 0.34765625,
+      "learning_rate": 0.02,
+      "loss": 1.753847360610962,
+      "step": 128
+    },
+    {
+      "epoch": 0.0326417004048583,
+      "grad_norm": 0.373046875,
+      "learning_rate": 0.02,
+      "loss": 1.887335181236267,
+      "step": 129
+    },
+    {
+      "epoch": 0.03289473684210526,
+      "grad_norm": 0.35546875,
+      "learning_rate": 0.02,
+      "loss": 1.8118953704833984,
+      "step": 130
+    },
+    {
+      "epoch": 0.03314777327935223,
+      "grad_norm": 0.3515625,
+      "learning_rate": 0.02,
+      "loss": 1.905268907546997,
+      "step": 131
+    },
+    {
+      "epoch": 0.03340080971659919,
+      "grad_norm": 0.302734375,
+      "learning_rate": 0.02,
+      "loss": 1.6896657943725586,
+      "step": 132
+    },
+    {
+      "epoch": 0.03365384615384615,
+      "grad_norm": 0.36328125,
+      "learning_rate": 0.02,
+      "loss": 1.8001630306243896,
+      "step": 133
+    },
+    {
+      "epoch": 0.03390688259109312,
+      "grad_norm": 0.3828125,
+      "learning_rate": 0.02,
+      "loss": 1.7924871444702148,
+      "step": 134
+    },
+    {
+      "epoch": 0.034159919028340084,
+      "grad_norm": 0.384765625,
+      "learning_rate": 0.02,
+      "loss": 1.8576805591583252,
+      "step": 135
+    },
+    {
+      "epoch": 0.03441295546558704,
+      "grad_norm": 0.388671875,
+      "learning_rate": 0.02,
+      "loss": 1.9136528968811035,
+      "step": 136
+    },
+    {
+      "epoch": 0.03466599190283401,
+      "grad_norm": 0.388671875,
+      "learning_rate": 0.02,
+      "loss": 1.8513039350509644,
+      "step": 137
+    },
+    {
+      "epoch": 0.034919028340080975,
+      "grad_norm": 0.32421875,
+      "learning_rate": 0.02,
+      "loss": 1.6870956420898438,
+      "step": 138
+    },
+    {
+      "epoch": 0.035172064777327934,
+      "grad_norm": 0.33203125,
+      "learning_rate": 0.02,
+      "loss": 1.8061823844909668,
+      "step": 139
+    },
+    {
+      "epoch": 0.0354251012145749,
+      "grad_norm": 0.44921875,
+      "learning_rate": 0.02,
+      "loss": 1.7925238609313965,
+      "step": 140
+    },
+    {
+      "epoch": 0.03567813765182186,
+      "grad_norm": 0.34765625,
+      "learning_rate": 0.02,
+      "loss": 1.9564220905303955,
+      "step": 141
+    },
+    {
+      "epoch": 0.035931174089068825,
+      "grad_norm": 0.353515625,
+      "learning_rate": 0.02,
+      "loss": 1.8622322082519531,
+      "step": 142
+    },
+    {
+      "epoch": 0.03618421052631579,
+      "grad_norm": 0.404296875,
+      "learning_rate": 0.02,
+      "loss": 1.839081883430481,
+      "step": 143
+    },
+    {
+      "epoch": 0.03643724696356275,
+      "grad_norm": 0.52734375,
+      "learning_rate": 0.02,
+      "loss": 1.8134467601776123,
+      "step": 144
+    },
+    {
+      "epoch": 0.036690283400809716,
+      "grad_norm": 0.298828125,
+      "learning_rate": 0.02,
+      "loss": 1.6601486206054688,
+      "step": 145
+    },
+    {
+      "epoch": 0.03694331983805668,
+      "grad_norm": 0.310546875,
+      "learning_rate": 0.02,
+      "loss": 1.8179130554199219,
+      "step": 146
+    },
+    {
+      "epoch": 0.03719635627530364,
+      "grad_norm": 0.302734375,
+      "learning_rate": 0.02,
+      "loss": 1.8195692300796509,
+      "step": 147
+    },
+    {
+      "epoch": 0.03744939271255061,
+      "grad_norm": 0.322265625,
+      "learning_rate": 0.02,
+      "loss": 1.8244118690490723,
+      "step": 148
+    },
+    {
+      "epoch": 0.03770242914979757,
+      "grad_norm": 0.36328125,
+      "learning_rate": 0.02,
+      "loss": 1.9877636432647705,
+      "step": 149
+    },
+    {
+      "epoch": 0.03795546558704453,
+      "grad_norm": 0.326171875,
+      "learning_rate": 0.02,
+      "loss": 1.8807326555252075,
+      "step": 150
+    },
+    {
+      "epoch": 0.0382085020242915,
+      "grad_norm": 0.341796875,
+      "learning_rate": 0.02,
+      "loss": 1.799276351928711,
+      "step": 151
+    },
+    {
+      "epoch": 0.038461538461538464,
+      "grad_norm": 0.35546875,
+      "learning_rate": 0.02,
+      "loss": 1.8706992864608765,
+      "step": 152
+    },
+    {
+      "epoch": 0.03871457489878542,
+      "grad_norm": 0.380859375,
+      "learning_rate": 0.02,
+      "loss": 1.806701898574829,
+      "step": 153
+    },
+    {
+      "epoch": 0.03896761133603239,
+      "grad_norm": 0.3828125,
+      "learning_rate": 0.02,
+      "loss": 1.9343551397323608,
+      "step": 154
+    },
+    {
+      "epoch": 0.039220647773279355,
+      "grad_norm": 0.3515625,
+      "learning_rate": 0.02,
+      "loss": 1.8837814331054688,
+      "step": 155
+    },
+    {
+      "epoch": 0.039473684210526314,
+      "grad_norm": 0.275390625,
+      "learning_rate": 0.02,
+      "loss": 1.7536430358886719,
+      "step": 156
+    },
+    {
+      "epoch": 0.03972672064777328,
+      "grad_norm": 0.3203125,
+      "learning_rate": 0.02,
+      "loss": 2.000185966491699,
+      "step": 157
+    },
+    {
+      "epoch": 0.039979757085020245,
+      "grad_norm": 0.3125,
+      "learning_rate": 0.02,
+      "loss": 1.8336604833602905,
+      "step": 158
+    },
+    {
+      "epoch": 0.040232793522267205,
+      "grad_norm": 0.298828125,
+      "learning_rate": 0.02,
+      "loss": 1.8002318143844604,
+      "step": 159
+    },
+    {
+      "epoch": 0.04048582995951417,
+      "grad_norm": 0.291015625,
+      "learning_rate": 0.02,
+      "loss": 1.9328553676605225,
+      "step": 160
+    },
+    {
+      "epoch": 0.040738866396761136,
+      "grad_norm": 0.271484375,
+      "learning_rate": 0.02,
+      "loss": 1.7373687028884888,
+      "step": 161
+    },
+    {
+      "epoch": 0.040991902834008095,
+      "grad_norm": 0.283203125,
+      "learning_rate": 0.02,
+      "loss": 1.6631128787994385,
+      "step": 162
+    },
+    {
+      "epoch": 0.04124493927125506,
+      "grad_norm": 0.275390625,
+      "learning_rate": 0.02,
+      "loss": 1.5757675170898438,
+      "step": 163
+    },
+    {
+      "epoch": 0.04149797570850203,
+      "grad_norm": 0.287109375,
+      "learning_rate": 0.02,
+      "loss": 1.7366318702697754,
+      "step": 164
+    },
+    {
+      "epoch": 0.041751012145748986,
+      "grad_norm": 0.29296875,
+      "learning_rate": 0.02,
+      "loss": 1.8514606952667236,
+      "step": 165
+    },
+    {
+      "epoch": 0.04200404858299595,
+      "grad_norm": 0.322265625,
+      "learning_rate": 0.02,
+      "loss": 1.810927391052246,
+      "step": 166
+    },
+    {
+      "epoch": 0.04225708502024292,
+      "grad_norm": 0.3125,
+      "learning_rate": 0.02,
+      "loss": 1.7555034160614014,
+      "step": 167
+    },
+    {
+      "epoch": 0.04251012145748988,
+      "grad_norm": 0.310546875,
+      "learning_rate": 0.02,
+      "loss": 1.804673433303833,
+      "step": 168
+    },
+    {
+      "epoch": 0.04276315789473684,
+      "grad_norm": 0.283203125,
+      "learning_rate": 0.02,
+      "loss": 1.8926290273666382,
+      "step": 169
+    },
+    {
+      "epoch": 0.0430161943319838,
+      "grad_norm": 0.302734375,
+      "learning_rate": 0.02,
+      "loss": 1.928168773651123,
+      "step": 170
+    },
+    {
+      "epoch": 0.04326923076923077,
+      "grad_norm": 0.28125,
+      "learning_rate": 0.02,
+      "loss": 1.9081146717071533,
+      "step": 171
+    },
+    {
+      "epoch": 0.043522267206477734,
+      "grad_norm": 0.296875,
+      "learning_rate": 0.02,
+      "loss": 1.8865680694580078,
+      "step": 172
+    },
+    {
+      "epoch": 0.04377530364372469,
+      "grad_norm": 0.29296875,
+      "learning_rate": 0.02,
+      "loss": 1.677335500717163,
+      "step": 173
+    },
+    {
+      "epoch": 0.04402834008097166,
+      "grad_norm": 0.265625,
+      "learning_rate": 0.02,
+      "loss": 1.5733604431152344,
+      "step": 174
+    },
+    {
+      "epoch": 0.044281376518218625,
+      "grad_norm": 0.265625,
+      "learning_rate": 0.02,
+      "loss": 1.7141188383102417,
+      "step": 175
+    },
+    {
+      "epoch": 0.044534412955465584,
+      "grad_norm": 0.26171875,
+      "learning_rate": 0.02,
+      "loss": 1.7283015251159668,
+      "step": 176
+    },
+    {
+      "epoch": 0.04478744939271255,
+      "grad_norm": 0.26953125,
+      "learning_rate": 0.02,
+      "loss": 1.6554383039474487,
+      "step": 177
+    },
+    {
+      "epoch": 0.045040485829959516,
+      "grad_norm": 0.326171875,
+      "learning_rate": 0.02,
+      "loss": 1.8230128288269043,
+      "step": 178
+    },
+    {
+      "epoch": 0.045293522267206475,
+      "grad_norm": 0.357421875,
+      "learning_rate": 0.02,
+      "loss": 1.8195432424545288,
+      "step": 179
+    },
+    {
+      "epoch": 0.04554655870445344,
+      "grad_norm": 0.37109375,
+      "learning_rate": 0.02,
+      "loss": 1.7151989936828613,
+      "step": 180
+    },
+    {
+      "epoch": 0.04579959514170041,
+      "grad_norm": 0.314453125,
+      "learning_rate": 0.02,
+      "loss": 1.8021490573883057,
+      "step": 181
+    },
+    {
+      "epoch": 0.046052631578947366,
+      "grad_norm": 0.50390625,
+      "learning_rate": 0.02,
+      "loss": 1.7367699146270752,
+      "step": 182
+    },
+    {
+      "epoch": 0.04630566801619433,
+      "grad_norm": 0.275390625,
+      "learning_rate": 0.02,
+      "loss": 1.862687349319458,
+      "step": 183
+    },
+    {
+      "epoch": 0.0465587044534413,
+      "grad_norm": 0.25390625,
+      "learning_rate": 0.02,
+      "loss": 1.7368934154510498,
+      "step": 184
+    },
+    {
+      "epoch": 0.04681174089068826,
+      "grad_norm": 0.28515625,
+      "learning_rate": 0.02,
+      "loss": 1.767867922782898,
+      "step": 185
+    },
+    {
+      "epoch": 0.04706477732793522,
+      "grad_norm": 0.314453125,
+      "learning_rate": 0.02,
+      "loss": 1.7828075885772705,
+      "step": 186
+    },
+    {
+      "epoch": 0.04731781376518219,
+      "grad_norm": 0.310546875,
+      "learning_rate": 0.02,
+      "loss": 1.8469303846359253,
+      "step": 187
+    },
+    {
+      "epoch": 0.04757085020242915,
+      "grad_norm": 0.4765625,
+      "learning_rate": 0.02,
+      "loss": 1.7417302131652832,
+      "step": 188
+    },
+    {
+      "epoch": 0.047823886639676114,
+      "grad_norm": 0.255859375,
+      "learning_rate": 0.02,
+      "loss": 1.767709732055664,
+      "step": 189
+    },
+    {
+      "epoch": 0.04807692307692308,
+      "grad_norm": 0.255859375,
+      "learning_rate": 0.02,
+      "loss": 1.6383213996887207,
+      "step": 190
+    },
+    {
+      "epoch": 0.04832995951417004,
+      "grad_norm": 0.263671875,
+      "learning_rate": 0.02,
+      "loss": 1.7237427234649658,
+      "step": 191
+    },
+    {
+      "epoch": 0.048582995951417005,
+      "grad_norm": 0.275390625,
+      "learning_rate": 0.02,
+      "loss": 1.851733684539795,
+      "step": 192
+    },
+    {
+      "epoch": 0.04883603238866397,
+      "grad_norm": 0.310546875,
+      "learning_rate": 0.02,
+      "loss": 1.6895689964294434,
+      "step": 193
+    },
+    {
+      "epoch": 0.04908906882591093,
+      "grad_norm": 0.390625,
+      "learning_rate": 0.02,
+      "loss": 1.9249845743179321,
+      "step": 194
+    },
+    {
+      "epoch": 0.049342105263157895,
+      "grad_norm": 0.296875,
+      "learning_rate": 0.02,
+      "loss": 1.8771311044692993,
+      "step": 195
+    },
+    {
+      "epoch": 0.04959514170040486,
+      "grad_norm": 0.2734375,
+      "learning_rate": 0.02,
+      "loss": 1.9580897092819214,
+      "step": 196
+    },
+    {
+      "epoch": 0.04984817813765182,
+      "grad_norm": 0.30859375,
+      "learning_rate": 0.02,
+      "loss": 1.920396327972412,
+      "step": 197
+    },
+    {
+      "epoch": 0.050101214574898786,
+      "grad_norm": 0.310546875,
+      "learning_rate": 0.02,
+      "loss": 1.84848952293396,
+      "step": 198
+    },
+    {
+      "epoch": 0.05035425101214575,
+      "grad_norm": 0.298828125,
+      "learning_rate": 0.02,
+      "loss": 1.8590221405029297,
+      "step": 199
+    },
+    {
+      "epoch": 0.05060728744939271,
+      "grad_norm": 0.54296875,
+      "learning_rate": 0.02,
+      "loss": 1.7710858583450317,
+      "step": 200
+    },
+    {
+      "epoch": 0.05086032388663968,
+      "grad_norm": 0.298828125,
+      "learning_rate": 0.02,
+      "loss": 1.8579610586166382,
+      "step": 201
+    },
+    {
+      "epoch": 0.051113360323886636,
+      "grad_norm": 0.27734375,
+      "learning_rate": 0.02,
+      "loss": 1.7782834768295288,
+      "step": 202
+    },
+    {
+      "epoch": 0.0513663967611336,
+      "grad_norm": 0.28125,
+      "learning_rate": 0.02,
+      "loss": 1.8796536922454834,
+      "step": 203
+    },
+    {
+      "epoch": 0.05161943319838057,
+      "grad_norm": 0.375,
+      "learning_rate": 0.02,
+      "loss": 1.708918571472168,
+      "step": 204
+    },
+    {
+      "epoch": 0.05187246963562753,
+      "grad_norm": 0.294921875,
+      "learning_rate": 0.02,
+      "loss": 1.7594828605651855,
+      "step": 205
+    },
+    {
+      "epoch": 0.05212550607287449,
+      "grad_norm": 0.25390625,
+      "learning_rate": 0.02,
+      "loss": 1.5761895179748535,
+      "step": 206
+    },
+    {
+      "epoch": 0.05237854251012146,
+      "grad_norm": 0.283203125,
+      "learning_rate": 0.02,
+      "loss": 1.765464425086975,
+      "step": 207
+    },
+    {
+      "epoch": 0.05263157894736842,
+      "grad_norm": 0.287109375,
+      "learning_rate": 0.02,
+      "loss": 1.6899892091751099,
+      "step": 208
+    },
+    {
+      "epoch": 0.052884615384615384,
+      "grad_norm": 0.255859375,
+      "learning_rate": 0.02,
+      "loss": 1.9187116622924805,
+      "step": 209
+    },
+    {
+      "epoch": 0.05313765182186235,
+      "grad_norm": 0.265625,
+      "learning_rate": 0.02,
+      "loss": 1.7201926708221436,
+      "step": 210
+    },
+    {
+      "epoch": 0.05339068825910931,
+      "grad_norm": 0.28125,
+      "learning_rate": 0.02,
+      "loss": 1.8093514442443848,
+      "step": 211
+    },
+    {
+      "epoch": 0.053643724696356275,
+      "grad_norm": 0.267578125,
+      "learning_rate": 0.02,
+      "loss": 1.644389271736145,
+      "step": 212
+    },
+    {
+      "epoch": 0.05389676113360324,
+      "grad_norm": 0.31640625,
+      "learning_rate": 0.02,
+      "loss": 1.7600808143615723,
+      "step": 213
+    },
+    {
+      "epoch": 0.0541497975708502,
+      "grad_norm": 0.28125,
+      "learning_rate": 0.02,
+      "loss": 1.8550102710723877,
+      "step": 214
+    },
+    {
+      "epoch": 0.054402834008097166,
+      "grad_norm": 0.26171875,
+      "learning_rate": 0.02,
+      "loss": 1.7401726245880127,
+      "step": 215
+    },
+    {
+      "epoch": 0.05465587044534413,
+      "grad_norm": 0.283203125,
+      "learning_rate": 0.02,
+      "loss": 1.9065459966659546,
+      "step": 216
+    },
+    {
+      "epoch": 0.05490890688259109,
+      "grad_norm": 0.275390625,
+      "learning_rate": 0.02,
+      "loss": 1.569657802581787,
+      "step": 217
+    },
+    {
+      "epoch": 0.05516194331983806,
+      "grad_norm": 0.2470703125,
+      "learning_rate": 0.02,
+      "loss": 1.883676528930664,
+      "step": 218
+    },
+    {
+      "epoch": 0.05541497975708502,
+      "grad_norm": 0.2578125,
+      "learning_rate": 0.02,
+      "loss": 1.9213581085205078,
+      "step": 219
+    },
+    {
+      "epoch": 0.05566801619433198,
+      "grad_norm": 0.302734375,
+      "learning_rate": 0.02,
+      "loss": 1.7225048542022705,
+      "step": 220
+    },
+    {
+      "epoch": 0.05592105263157895,
+      "grad_norm": 0.27734375,
+      "learning_rate": 0.02,
+      "loss": 1.687207579612732,
+      "step": 221
+    },
+    {
+      "epoch": 0.056174089068825914,
+      "grad_norm": 0.2890625,
+      "learning_rate": 0.02,
+      "loss": 1.7699179649353027,
+      "step": 222
+    },
+    {
+      "epoch": 0.05642712550607287,
+      "grad_norm": 0.28125,
+      "learning_rate": 0.02,
+      "loss": 1.8854470252990723,
+      "step": 223
+    },
+    {
+      "epoch": 0.05668016194331984,
+      "grad_norm": 0.294921875,
+      "learning_rate": 0.02,
+      "loss": 1.8999043703079224,
+      "step": 224
+    },
+    {
+      "epoch": 0.056933198380566805,
+      "grad_norm": 0.244140625,
+      "learning_rate": 0.02,
+      "loss": 1.863234043121338,
+      "step": 225
+    },
+    {
+      "epoch": 0.057186234817813764,
+      "grad_norm": 0.2265625,
+      "learning_rate": 0.02,
+      "loss": 1.7562830448150635,
+      "step": 226
+    },
+    {
+      "epoch": 0.05743927125506073,
+      "grad_norm": 0.244140625,
+      "learning_rate": 0.02,
+      "loss": 1.8581035137176514,
+      "step": 227
+    },
+    {
+      "epoch": 0.057692307692307696,
+      "grad_norm": 0.28515625,
+      "learning_rate": 0.02,
+      "loss": 1.910494089126587,
+      "step": 228
+    },
+    {
+      "epoch": 0.057945344129554655,
+      "grad_norm": 0.287109375,
+      "learning_rate": 0.02,
+      "loss": 1.5930771827697754,
+      "step": 229
+    },
+    {
+      "epoch": 0.05819838056680162,
+      "grad_norm": 0.322265625,
+      "learning_rate": 0.02,
+      "loss": 1.9999498128890991,
+      "step": 230
+    },
+    {
+      "epoch": 0.058451417004048586,
+      "grad_norm": 0.263671875,
+      "learning_rate": 0.02,
+      "loss": 1.8463115692138672,
+      "step": 231
+    },
+    {
+      "epoch": 0.058704453441295545,
+      "grad_norm": 0.255859375,
+      "learning_rate": 0.02,
+      "loss": 1.6839101314544678,
+      "step": 232
+    },
+    {
+      "epoch": 0.05895748987854251,
+      "grad_norm": 0.25390625,
+      "learning_rate": 0.02,
+      "loss": 1.6041113138198853,
+      "step": 233
+    },
+    {
+      "epoch": 0.05921052631578947,
+      "grad_norm": 0.255859375,
+      "learning_rate": 0.02,
+      "loss": 1.7488685846328735,
+      "step": 234
+    },
+    {
+      "epoch": 0.059463562753036436,
+      "grad_norm": 0.2412109375,
+      "learning_rate": 0.02,
+      "loss": 1.7088991403579712,
+      "step": 235
+    },
+    {
+      "epoch": 0.0597165991902834,
+      "grad_norm": 0.244140625,
+      "learning_rate": 0.02,
+      "loss": 1.789707899093628,
+      "step": 236
+    },
+    {
+      "epoch": 0.05996963562753036,
+      "grad_norm": 0.2470703125,
+      "learning_rate": 0.02,
+      "loss": 1.6540621519088745,
+      "step": 237
+    },
+    {
+      "epoch": 0.06022267206477733,
+      "grad_norm": 0.337890625,
+      "learning_rate": 0.02,
+      "loss": 1.831263780593872,
+      "step": 238
+    },
+    {
+      "epoch": 0.06047570850202429,
+      "grad_norm": 0.25,
+      "learning_rate": 0.02,
+      "loss": 1.7118489742279053,
+      "step": 239
+    },
+    {
+      "epoch": 0.06072874493927125,
+      "grad_norm": 0.2314453125,
+      "learning_rate": 0.02,
+      "loss": 1.7510230541229248,
+      "step": 240
+    },
+    {
+      "epoch": 0.06098178137651822,
+      "grad_norm": 0.2216796875,
+      "learning_rate": 0.02,
+      "loss": 1.8188490867614746,
+      "step": 241
+    },
+    {
+      "epoch": 0.061234817813765184,
+      "grad_norm": 0.2333984375,
+      "learning_rate": 0.02,
+      "loss": 1.7565772533416748,
+      "step": 242
+    },
+    {
+      "epoch": 0.06148785425101214,
+      "grad_norm": 0.2431640625,
+      "learning_rate": 0.02,
+      "loss": 1.8934447765350342,
+      "step": 243
+    },
+    {
+      "epoch": 0.06174089068825911,
+      "grad_norm": 0.224609375,
+      "learning_rate": 0.02,
+      "loss": 1.5972774028778076,
+      "step": 244
+    },
+    {
+      "epoch": 0.061993927125506075,
+      "grad_norm": 0.24609375,
+      "learning_rate": 0.02,
+      "loss": 1.6532171964645386,
+      "step": 245
+    },
+    {
+      "epoch": 0.062246963562753034,
+      "grad_norm": 0.236328125,
+      "learning_rate": 0.02,
+      "loss": 1.8103532791137695,
+      "step": 246
+    },
+    {
+      "epoch": 0.0625,
+      "grad_norm": 1.1015625,
+      "learning_rate": 0.02,
+      "loss": 1.623213768005371,
+      "step": 247
+    },
+    {
+      "epoch": 0.06275303643724696,
+      "grad_norm": 0.228515625,
+      "learning_rate": 0.02,
+      "loss": 1.5337111949920654,
+      "step": 248
+    },
+    {
+      "epoch": 0.06300607287449393,
+      "grad_norm": 0.26171875,
+      "learning_rate": 0.02,
+      "loss": 1.6231749057769775,
+      "step": 249
+    },
+    {
+      "epoch": 0.06325910931174089,
+      "grad_norm": 0.255859375,
+      "learning_rate": 0.02,
+      "loss": 1.5969362258911133,
+      "step": 250
+    },
+    {
+      "epoch": 0.06351214574898785,
+      "grad_norm": 0.27734375,
+      "learning_rate": 0.02,
+      "loss": 1.8024649620056152,
+      "step": 251
+    },
+    {
+      "epoch": 0.06376518218623482,
+      "grad_norm": 0.287109375,
+      "learning_rate": 0.02,
+      "loss": 1.7619102001190186,
+      "step": 252
+    },
+    {
+      "epoch": 0.06401821862348178,
+      "grad_norm": 0.25,
+      "learning_rate": 0.02,
+      "loss": 1.812684416770935,
+      "step": 253
+    },
+    {
+      "epoch": 0.06427125506072874,
+      "grad_norm": 0.388671875,
+      "learning_rate": 0.02,
+      "loss": 1.6886664628982544,
+      "step": 254
+    },
+    {
+      "epoch": 0.06452429149797571,
+      "grad_norm": 0.283203125,
+      "learning_rate": 0.02,
+      "loss": 1.746760606765747,
+      "step": 255
+    },
+    {
+      "epoch": 0.06477732793522267,
+      "grad_norm": 0.2470703125,
+      "learning_rate": 0.02,
+      "loss": 1.7671563625335693,
+      "step": 256
+    },
+    {
+      "epoch": 0.06503036437246963,
+      "grad_norm": 0.240234375,
+      "learning_rate": 0.02,
+      "loss": 1.9502842426300049,
+      "step": 257
+    },
+    {
+      "epoch": 0.0652834008097166,
+      "grad_norm": 2.53125,
+      "learning_rate": 0.02,
+      "loss": 1.60953688621521,
+      "step": 258
+    },
+    {
+      "epoch": 0.06553643724696356,
+      "grad_norm": 0.3203125,
+      "learning_rate": 0.02,
+      "loss": 1.987889289855957,
+      "step": 259
+    },
+    {
+      "epoch": 0.06578947368421052,
+      "grad_norm": 0.251953125,
+      "learning_rate": 0.02,
+      "loss": 1.867415189743042,
+      "step": 260
+    },
+    {
+      "epoch": 0.0660425101214575,
+      "grad_norm": 0.287109375,
+      "learning_rate": 0.02,
+      "loss": 1.8441483974456787,
+      "step": 261
+    },
+    {
+      "epoch": 0.06629554655870445,
+      "grad_norm": 0.259765625,
+      "learning_rate": 0.02,
+      "loss": 1.7997292280197144,
+      "step": 262
+    },
+    {
+      "epoch": 0.06654858299595141,
+      "grad_norm": 0.2490234375,
+      "learning_rate": 0.02,
+      "loss": 1.7468944787979126,
+      "step": 263
+    },
+    {
+      "epoch": 0.06680161943319839,
+      "grad_norm": 0.2197265625,
+      "learning_rate": 0.02,
+      "loss": 1.8009114265441895,
+      "step": 264
+    },
+    {
+      "epoch": 0.06705465587044535,
+      "grad_norm": 0.232421875,
+      "learning_rate": 0.02,
+      "loss": 1.5228102207183838,
+      "step": 265
+    },
+    {
+      "epoch": 0.0673076923076923,
+      "grad_norm": 0.232421875,
+      "learning_rate": 0.02,
+      "loss": 1.817479133605957,
+      "step": 266
+    },
+    {
+      "epoch": 0.06756072874493928,
+      "grad_norm": 0.2373046875,
+      "learning_rate": 0.02,
+      "loss": 1.6446462869644165,
+      "step": 267
+    },
+    {
+      "epoch": 0.06781376518218624,
+      "grad_norm": 0.25,
+      "learning_rate": 0.02,
+      "loss": 1.8258877992630005,
+      "step": 268
+    },
+    {
+      "epoch": 0.0680668016194332,
+      "grad_norm": 0.21484375,
+      "learning_rate": 0.02,
+      "loss": 1.6597766876220703,
+      "step": 269
+    },
+    {
+      "epoch": 0.06831983805668017,
+      "grad_norm": 0.2265625,
+      "learning_rate": 0.02,
+      "loss": 1.6289044618606567,
+      "step": 270
+    },
+    {
+      "epoch": 0.06857287449392713,
+      "grad_norm": 0.232421875,
+      "learning_rate": 0.02,
+      "loss": 1.7609938383102417,
+      "step": 271
+    },
+    {
+      "epoch": 0.06882591093117409,
+      "grad_norm": 0.23046875,
+      "learning_rate": 0.02,
+      "loss": 1.7204265594482422,
+      "step": 272
+    },
+    {
+      "epoch": 0.06907894736842106,
+      "grad_norm": 0.2275390625,
+      "learning_rate": 0.02,
+      "loss": 1.6945526599884033,
+      "step": 273
+    },
+    {
+      "epoch": 0.06933198380566802,
+      "grad_norm": 0.23046875,
+      "learning_rate": 0.02,
+      "loss": 1.5438995361328125,
+      "step": 274
+    },
+    {
+      "epoch": 0.06958502024291498,
+      "grad_norm": 0.28515625,
+      "learning_rate": 0.02,
+      "loss": 1.7701501846313477,
+      "step": 275
+    },
+    {
+      "epoch": 0.06983805668016195,
+      "grad_norm": 0.259765625,
+      "learning_rate": 0.02,
+      "loss": 1.6865530014038086,
+      "step": 276
+    },
+    {
+      "epoch": 0.07009109311740891,
+      "grad_norm": 0.21875,
+      "learning_rate": 0.02,
+      "loss": 1.6084356307983398,
+      "step": 277
+    },
+    {
+      "epoch": 0.07034412955465587,
+      "grad_norm": 0.224609375,
+      "learning_rate": 0.02,
+      "loss": 1.6889640092849731,
+      "step": 278
+    },
+    {
+      "epoch": 0.07059716599190283,
+      "grad_norm": 0.2333984375,
+      "learning_rate": 0.02,
+      "loss": 1.555455207824707,
+      "step": 279
+    },
+    {
+      "epoch": 0.0708502024291498,
+      "grad_norm": 0.2373046875,
+      "learning_rate": 0.02,
+      "loss": 1.728455662727356,
+      "step": 280
+    },
+    {
+      "epoch": 0.07110323886639676,
+      "grad_norm": 0.2216796875,
+      "learning_rate": 0.02,
+      "loss": 1.5139386653900146,
+      "step": 281
+    },
+    {
+      "epoch": 0.07135627530364372,
+      "grad_norm": 0.2373046875,
+      "learning_rate": 0.02,
+      "loss": 1.8783767223358154,
+      "step": 282
+    },
+    {
+      "epoch": 0.07160931174089069,
+      "grad_norm": 0.2275390625,
+      "learning_rate": 0.02,
+      "loss": 1.6216429471969604,
+      "step": 283
+    },
+    {
+      "epoch": 0.07186234817813765,
+      "grad_norm": 0.224609375,
+      "learning_rate": 0.02,
+      "loss": 1.5281383991241455,
+      "step": 284
+    },
+    {
+      "epoch": 0.07211538461538461,
+      "grad_norm": 0.2314453125,
+      "learning_rate": 0.02,
+      "loss": 1.6406452655792236,
+      "step": 285
+    },
+    {
+      "epoch": 0.07236842105263158,
+      "grad_norm": 0.25,
+      "learning_rate": 0.02,
+      "loss": 1.6633228063583374,
+      "step": 286
+    },
+    {
+      "epoch": 0.07262145748987854,
+      "grad_norm": 0.232421875,
+      "learning_rate": 0.02,
+      "loss": 1.5398434400558472,
+      "step": 287
+    },
+    {
+      "epoch": 0.0728744939271255,
+      "grad_norm": 0.24609375,
+      "learning_rate": 0.02,
+      "loss": 1.7598520517349243,
+      "step": 288
+    },
+    {
+      "epoch": 0.07312753036437247,
+      "grad_norm": 0.26171875,
+      "learning_rate": 0.02,
+      "loss": 1.6095614433288574,
+      "step": 289
+    },
+    {
+      "epoch": 0.07338056680161943,
+      "grad_norm": 0.57421875,
+      "learning_rate": 0.02,
+      "loss": 1.6392581462860107,
+      "step": 290
+    },
+    {
+      "epoch": 0.07363360323886639,
+      "grad_norm": 0.287109375,
+      "learning_rate": 0.02,
+      "loss": 1.8370375633239746,
+      "step": 291
+    },
+    {
+      "epoch": 0.07388663967611336,
+      "grad_norm": 0.216796875,
+      "learning_rate": 0.02,
+      "loss": 1.629447102546692,
+      "step": 292
+    },
+    {
+      "epoch": 0.07413967611336032,
+      "grad_norm": 0.2138671875,
+      "learning_rate": 0.02,
+      "loss": 1.8073177337646484,
+      "step": 293
+    },
+    {
+      "epoch": 0.07439271255060728,
+      "grad_norm": 0.2177734375,
+      "learning_rate": 0.02,
+      "loss": 1.7319437265396118,
+      "step": 294
+    },
+    {
+      "epoch": 0.07464574898785425,
+      "grad_norm": 0.2265625,
+      "learning_rate": 0.02,
+      "loss": 1.8226912021636963,
+      "step": 295
+    },
+    {
+      "epoch": 0.07489878542510121,
+      "grad_norm": 0.2119140625,
+      "learning_rate": 0.02,
+      "loss": 1.4903874397277832,
+      "step": 296
+    },
+    {
+      "epoch": 0.07515182186234817,
+      "grad_norm": 0.2421875,
+      "learning_rate": 0.02,
+      "loss": 1.5817227363586426,
+      "step": 297
+    },
+    {
+      "epoch": 0.07540485829959515,
+      "grad_norm": 0.2421875,
+      "learning_rate": 0.02,
+      "loss": 1.5707061290740967,
+      "step": 298
+    },
+    {
+      "epoch": 0.0756578947368421,
+      "grad_norm": 0.2275390625,
+      "learning_rate": 0.02,
+      "loss": 1.8374524116516113,
+      "step": 299
+    },
+    {
+      "epoch": 0.07591093117408906,
+      "grad_norm": 0.234375,
+      "learning_rate": 0.02,
+      "loss": 1.8860702514648438,
+      "step": 300
+    },
+    {
+      "epoch": 0.07616396761133604,
+      "grad_norm": 0.275390625,
+      "learning_rate": 0.02,
+      "loss": 1.7141436338424683,
+      "step": 301
+    },
+    {
+      "epoch": 0.076417004048583,
+      "grad_norm": 0.2412109375,
+      "learning_rate": 0.02,
+      "loss": 1.7473678588867188,
+      "step": 302
+    },
+    {
+      "epoch": 0.07667004048582995,
+      "grad_norm": 0.2177734375,
+      "learning_rate": 0.02,
+      "loss": 1.5174469947814941,
+      "step": 303
+    },
+    {
+      "epoch": 0.07692307692307693,
+      "grad_norm": 0.2158203125,
+      "learning_rate": 0.02,
+      "loss": 1.7078137397766113,
+      "step": 304
+    },
+    {
+      "epoch": 0.07717611336032389,
+      "grad_norm": 0.2236328125,
+      "learning_rate": 0.02,
+      "loss": 1.7914185523986816,
+      "step": 305
+    },
+    {
+      "epoch": 0.07742914979757085,
+      "grad_norm": 0.2197265625,
+      "learning_rate": 0.02,
+      "loss": 1.6762712001800537,
+      "step": 306
+    },
+    {
+      "epoch": 0.07768218623481782,
+      "grad_norm": 0.248046875,
+      "learning_rate": 0.02,
+      "loss": 1.8586490154266357,
+      "step": 307
+    },
+    {
+      "epoch": 0.07793522267206478,
+      "grad_norm": 0.2333984375,
+      "learning_rate": 0.02,
+      "loss": 1.640291690826416,
+      "step": 308
+    },
+    {
+      "epoch": 0.07818825910931174,
+      "grad_norm": 0.2451171875,
+      "learning_rate": 0.02,
+      "loss": 1.5084575414657593,
+      "step": 309
+    },
+    {
+      "epoch": 0.07844129554655871,
+      "grad_norm": 0.240234375,
+      "learning_rate": 0.02,
+      "loss": 1.6774191856384277,
+      "step": 310
+    },
+    {
+      "epoch": 0.07869433198380567,
+      "grad_norm": 0.2255859375,
+      "learning_rate": 0.02,
+      "loss": 1.682057499885559,
+      "step": 311
+    },
+    {
+      "epoch": 0.07894736842105263,
+      "grad_norm": 0.2265625,
+      "learning_rate": 0.02,
+      "loss": 1.8212378025054932,
+      "step": 312
+    },
+    {
+      "epoch": 0.0792004048582996,
+      "grad_norm": 0.349609375,
+      "learning_rate": 0.02,
+      "loss": 1.6554744243621826,
+      "step": 313
+    },
+    {
+      "epoch": 0.07945344129554656,
+      "grad_norm": 0.5078125,
+      "learning_rate": 0.02,
+      "loss": 1.537276268005371,
+      "step": 314
+    },
+    {
+      "epoch": 0.07970647773279352,
+      "grad_norm": 0.353515625,
+      "learning_rate": 0.02,
+      "loss": 1.8137785196304321,
+      "step": 315
+    },
+    {
+      "epoch": 0.07995951417004049,
+      "grad_norm": 0.2041015625,
+      "learning_rate": 0.02,
+      "loss": 1.6575486660003662,
+      "step": 316
+    },
+    {
+      "epoch": 0.08021255060728745,
+      "grad_norm": 0.2119140625,
+      "learning_rate": 0.02,
+      "loss": 1.6336121559143066,
+      "step": 317
+    },
+    {
+      "epoch": 0.08046558704453441,
+      "grad_norm": 0.2138671875,
+      "learning_rate": 0.02,
+      "loss": 1.642393946647644,
+      "step": 318
+    },
+    {
+      "epoch": 0.08071862348178138,
+      "grad_norm": 0.2431640625,
+      "learning_rate": 0.02,
+      "loss": 1.5971171855926514,
+      "step": 319
+    },
+    {
+      "epoch": 0.08097165991902834,
+      "grad_norm": 0.236328125,
+      "learning_rate": 0.02,
+      "loss": 1.6344141960144043,
+      "step": 320
+    },
+    {
+      "epoch": 0.0812246963562753,
+      "grad_norm": 0.224609375,
+      "learning_rate": 0.02,
+      "loss": 1.7557191848754883,
+      "step": 321
+    },
+    {
+      "epoch": 0.08147773279352227,
+      "grad_norm": 0.2314453125,
+      "learning_rate": 0.02,
+      "loss": 1.662795901298523,
+      "step": 322
+    },
+    {
+      "epoch": 0.08173076923076923,
+      "grad_norm": 0.4296875,
+      "learning_rate": 0.02,
+      "loss": 1.8351266384124756,
+      "step": 323
+    },
+    {
+      "epoch": 0.08198380566801619,
+      "grad_norm": 0.2255859375,
+      "learning_rate": 0.02,
+      "loss": 1.5431427955627441,
+      "step": 324
+    },
+    {
+      "epoch": 0.08223684210526316,
+      "grad_norm": 0.2236328125,
+      "learning_rate": 0.02,
+      "loss": 1.8382834196090698,
+      "step": 325
+    },
+    {
+      "epoch": 0.08248987854251012,
+      "grad_norm": 0.2177734375,
+      "learning_rate": 0.02,
+      "loss": 1.6902275085449219,
+      "step": 326
+    },
+    {
+      "epoch": 0.08274291497975708,
+      "grad_norm": 0.232421875,
+      "learning_rate": 0.02,
+      "loss": 1.6618757247924805,
+      "step": 327
+    },
+    {
+      "epoch": 0.08299595141700405,
+      "grad_norm": 0.2080078125,
+      "learning_rate": 0.02,
+      "loss": 1.5447195768356323,
+      "step": 328
+    },
+    {
+      "epoch": 0.08324898785425101,
+      "grad_norm": 0.205078125,
+      "learning_rate": 0.02,
+      "loss": 1.6230026483535767,
+      "step": 329
+    },
+    {
+      "epoch": 0.08350202429149797,
+      "grad_norm": 0.203125,
+      "learning_rate": 0.02,
+      "loss": 1.5954384803771973,
+      "step": 330
+    },
+    {
+      "epoch": 0.08375506072874495,
+      "grad_norm": 0.2197265625,
+      "learning_rate": 0.02,
+      "loss": 1.5904014110565186,
+      "step": 331
+    },
+    {
+      "epoch": 0.0840080971659919,
+      "grad_norm": 0.2392578125,
+      "learning_rate": 0.02,
+      "loss": 1.835031509399414,
+      "step": 332
+    },
+    {
+      "epoch": 0.08426113360323886,
+      "grad_norm": 0.21484375,
+      "learning_rate": 0.02,
+      "loss": 1.7414573431015015,
+      "step": 333
+    },
+    {
+      "epoch": 0.08451417004048584,
+      "grad_norm": 0.2314453125,
+      "learning_rate": 0.02,
+      "loss": 1.690129280090332,
+      "step": 334
+    },
+    {
+      "epoch": 0.0847672064777328,
+      "grad_norm": 0.2470703125,
+      "learning_rate": 0.02,
+      "loss": 1.8188143968582153,
+      "step": 335
+    },
+    {
+      "epoch": 0.08502024291497975,
+      "grad_norm": 0.2451171875,
+      "learning_rate": 0.02,
+      "loss": 1.6343677043914795,
+      "step": 336
+    },
+    {
+      "epoch": 0.08527327935222673,
+      "grad_norm": 0.2294921875,
+      "learning_rate": 0.02,
+      "loss": 1.6848071813583374,
+      "step": 337
+    },
+    {
+      "epoch": 0.08552631578947369,
+      "grad_norm": 0.19921875,
+      "learning_rate": 0.02,
+      "loss": 1.5910496711730957,
+      "step": 338
+    },
+    {
+      "epoch": 0.08577935222672065,
+      "grad_norm": 0.2099609375,
+      "learning_rate": 0.02,
+      "loss": 1.6693202257156372,
+      "step": 339
+    },
+    {
+      "epoch": 0.0860323886639676,
+      "grad_norm": 0.2265625,
+      "learning_rate": 0.02,
+      "loss": 1.6638545989990234,
+      "step": 340
+    },
+    {
+      "epoch": 0.08628542510121458,
+      "grad_norm": 0.2197265625,
+      "learning_rate": 0.02,
+      "loss": 1.6470742225646973,
+      "step": 341
+    },
+    {
+      "epoch": 0.08653846153846154,
+      "grad_norm": 0.216796875,
+      "learning_rate": 0.02,
+      "loss": 1.6216919422149658,
+      "step": 342
+    },
+    {
+      "epoch": 0.0867914979757085,
+      "grad_norm": 0.2255859375,
+      "learning_rate": 0.02,
+      "loss": 1.6436307430267334,
+      "step": 343
+    },
+    {
+      "epoch": 0.08704453441295547,
+      "grad_norm": 0.2265625,
+      "learning_rate": 0.02,
+      "loss": 1.5941998958587646,
+      "step": 344
+    },
+    {
+      "epoch": 0.08729757085020243,
+      "grad_norm": 0.2060546875,
+      "learning_rate": 0.02,
+      "loss": 1.6785557270050049,
+      "step": 345
+    },
+    {
+      "epoch": 0.08755060728744939,
+      "grad_norm": 0.2060546875,
+      "learning_rate": 0.02,
+      "loss": 1.6645257472991943,
+      "step": 346
+    },
+    {
+      "epoch": 0.08780364372469636,
+      "grad_norm": 0.2001953125,
+      "learning_rate": 0.02,
+      "loss": 1.625213861465454,
+      "step": 347
+    },
+    {
+      "epoch": 0.08805668016194332,
+      "grad_norm": 0.2119140625,
+      "learning_rate": 0.02,
+      "loss": 1.570495367050171,
+      "step": 348
+    },
+    {
+      "epoch": 0.08830971659919028,
+      "grad_norm": 2.484375,
+      "learning_rate": 0.02,
+      "loss": 1.7367191314697266,
+      "step": 349
+    },
+    {
+      "epoch": 0.08856275303643725,
+      "grad_norm": 0.224609375,
+      "learning_rate": 0.02,
+      "loss": 1.5887837409973145,
+      "step": 350
+    },
+    {
+      "epoch": 0.08881578947368421,
+      "grad_norm": 0.224609375,
+      "learning_rate": 0.02,
+      "loss": 1.63081693649292,
+      "step": 351
+    },
+    {
+      "epoch": 0.08906882591093117,
+      "grad_norm": 0.2421875,
+      "learning_rate": 0.02,
+      "loss": 1.5376256704330444,
+      "step": 352
+    },
+    {
+      "epoch": 0.08932186234817814,
+      "grad_norm": 0.2255859375,
+      "learning_rate": 0.02,
+      "loss": 1.5460972785949707,
+      "step": 353
+    },
+    {
+      "epoch": 0.0895748987854251,
+      "grad_norm": 0.2236328125,
+      "learning_rate": 0.02,
+      "loss": 1.5327677726745605,
+      "step": 354
+    },
+    {
+      "epoch": 0.08982793522267206,
+      "grad_norm": 0.2333984375,
+      "learning_rate": 0.02,
+      "loss": 1.70501708984375,
+      "step": 355
+    },
+    {
+      "epoch": 0.09008097165991903,
+      "grad_norm": 0.26171875,
+      "learning_rate": 0.02,
+      "loss": 1.8134093284606934,
+      "step": 356
+    },
+    {
+      "epoch": 0.09033400809716599,
+      "grad_norm": 0.234375,
+      "learning_rate": 0.02,
+      "loss": 1.509171485900879,
+      "step": 357
+    },
+    {
+      "epoch": 0.09058704453441295,
+      "grad_norm": 0.2138671875,
+      "learning_rate": 0.02,
+      "loss": 1.6045968532562256,
+      "step": 358
+    },
+    {
+      "epoch": 0.09084008097165992,
+      "grad_norm": 0.2197265625,
+      "learning_rate": 0.02,
+      "loss": 1.6716241836547852,
+      "step": 359
+    },
+    {
+      "epoch": 0.09109311740890688,
+      "grad_norm": 0.2412109375,
+      "learning_rate": 0.02,
+      "loss": 1.8341147899627686,
+      "step": 360
+    },
+    {
+      "epoch": 0.09134615384615384,
+      "grad_norm": 0.22265625,
+      "learning_rate": 0.02,
+      "loss": 1.7935357093811035,
+      "step": 361
+    },
+    {
+      "epoch": 0.09159919028340081,
+      "grad_norm": 0.2080078125,
+      "learning_rate": 0.02,
+      "loss": 1.6881043910980225,
+      "step": 362
+    },
+    {
+      "epoch": 0.09185222672064777,
+      "grad_norm": 0.21484375,
+      "learning_rate": 0.02,
+      "loss": 1.6678187847137451,
+      "step": 363
+    },
+    {
+      "epoch": 0.09210526315789473,
+      "grad_norm": 0.2265625,
+      "learning_rate": 0.02,
+      "loss": 1.6596988439559937,
+      "step": 364
+    },
+    {
+      "epoch": 0.0923582995951417,
+      "grad_norm": 0.20703125,
+      "learning_rate": 0.02,
+      "loss": 1.6384483575820923,
+      "step": 365
+    },
+    {
+      "epoch": 0.09261133603238866,
+      "grad_norm": 0.451171875,
+      "learning_rate": 0.02,
+      "loss": 1.7804160118103027,
+      "step": 366
+    },
+    {
+      "epoch": 0.09286437246963562,
+      "grad_norm": 0.2197265625,
+      "learning_rate": 0.02,
+      "loss": 1.7853267192840576,
+      "step": 367
+    },
+    {
+      "epoch": 0.0931174089068826,
+      "grad_norm": 0.251953125,
+      "learning_rate": 0.02,
+      "loss": 1.6451454162597656,
+      "step": 368
+    },
+    {
+      "epoch": 0.09337044534412955,
+      "grad_norm": 0.234375,
+      "learning_rate": 0.02,
+      "loss": 1.8762222528457642,
+      "step": 369
+    },
+    {
+      "epoch": 0.09362348178137651,
+      "grad_norm": 0.22265625,
+      "learning_rate": 0.02,
+      "loss": 1.587763786315918,
+      "step": 370
+    },
+    {
+      "epoch": 0.09387651821862349,
+      "grad_norm": 0.23828125,
+      "learning_rate": 0.02,
+      "loss": 1.9008150100708008,
+      "step": 371
+    },
+    {
+      "epoch": 0.09412955465587045,
+      "grad_norm": 0.2099609375,
+      "learning_rate": 0.02,
+      "loss": 1.3649919033050537,
+      "step": 372
+    },
+    {
+      "epoch": 0.0943825910931174,
+      "grad_norm": 0.2255859375,
+      "learning_rate": 0.02,
+      "loss": 1.6461126804351807,
+      "step": 373
+    },
+    {
+      "epoch": 0.09463562753036438,
+      "grad_norm": 0.240234375,
+      "learning_rate": 0.02,
+      "loss": 1.7606909275054932,
+      "step": 374
+    },
+    {
+      "epoch": 0.09488866396761134,
+      "grad_norm": 0.22265625,
+      "learning_rate": 0.02,
+      "loss": 1.5905332565307617,
+      "step": 375
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 4348,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 870,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": false,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 2.598754036421998e+18,
+  "train_batch_size": 14,
+  "trial_name": null,
+  "trial_params": null
+}

last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ff0e4d19f148cbb295eb1d6818e5f6a52d84db174c8eb93358b808d726037330
+size 5713