How to use Ba2han/experimental2 with libraries, notebooks, and local apps.

Libraries

How to use Ba2han/experimental2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Ba2han/experimental2", trust_remote_code=True)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Ba2han/experimental2", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Ba2han/experimental2", trust_remote_code=True)
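
`trust_remote_code=True` is needed because the checkpoint's `config.json` maps `AutoModelForCausalLM` to `patch.Qwen3CanonForCausalLM`, a class defined in the repository's own code rather than in Transformers. Transformers therefore has to download and execute that module. The stdlib-only helper below is a hypothetical illustration, not a Transformers API. It shows how such an `auto_map` entry splits into a module file and a class name, and it assumes a single-level module name such as `patch`.

```python
from typing import Optional, Tuple


def resolve_auto_map(config: dict, auto_class: str) -> Optional[Tuple[str, str]]:
    """Split an auto_map entry like "patch.Qwen3CanonForCausalLM" into (file, class).

    Returns None when the config has no custom mapping for `auto_class`,
    i.e. the model class ships with Transformers itself.
    """
    ref = config.get("auto_map", {}).get(auto_class)
    if ref is None:
        return None
    module, _, class_name = ref.rpartition(".")
    return f"{module}.py", class_name


# Values copied from last-checkpoint/config.json in this commit.
config = {"auto_map": {"AutoModelForCausalLM": "patch.Qwen3CanonForCausalLM"}}
print(resolve_auto_map(config, "AutoModelForCausalLM"))  # ('patch.py', 'Qwen3CanonForCausalLM')
print(resolve_auto_map(config, "AutoTokenizer"))          # None
```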

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use Ba2han/experimental2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Ba2han/experimental2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Ba2han/experimental2",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
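
The same completion request can also be sent from Python without extra dependencies. Below is a minimal sketch that uses only the standard library. `build_completion_request` and `complete` are illustrative names, and the base URL assumes vLLM's default port 8000, as in the `curl` call above. For the SGLang server below, use `http://localhost:30000` instead.

```python
import json
import urllib.request


def build_completion_request(base_url: str, model: str, prompt: str,
                             max_tokens: int = 512,
                             temperature: float = 0.5) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url: str, model: str, prompt: str, **kwargs) -> dict:
    """Send the request to a running server and return the decoded JSON body."""
    req = build_completion_request(base_url, model, prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage, once `vllm serve "Ba2han/experimental2"` is running:
# result = complete("http://localhost:8000", "Ba2han/experimental2", "Once upon a time,")
# print(result["choices"][0]["text"])
```

Building the request separately from sending it keeps the payload easy to inspect or test without a running server.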

Use Docker

docker model run hf.co/Ba2han/experimental2

SGLang

How to use Ba2han/experimental2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Ba2han/experimental2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Ba2han/experimental2",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Ba2han/experimental2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Ba2han/experimental2",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
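
Both servers answer these `curl` calls with an OpenAI-style completion object. Each entry in its `choices` list carries the generated `text`, an `index`, and a `finish_reason`. Below is a small parser sketched with the standard library under that assumption; `parse_completion` is a hypothetical helper.

```python
import json
from typing import List, Optional, Tuple


def parse_completion(body: str) -> List[Tuple[str, Optional[str]]]:
    """Return (text, finish_reason) for each choice, ordered by choice index."""
    data = json.loads(body)
    if "choices" not in data:
        # Error payloads differ between servers, so just surface the raw body.
        raise ValueError(f"no 'choices' in response: {body[:200]}")
    choices = sorted(data["choices"], key=lambda c: c.get("index", 0))
    return [(c.get("text", ""), c.get("finish_reason")) for c in choices]


# Usage: `body` is the raw JSON string returned by either server.
# for text, reason in parse_completion(body):
#     print(reason, text)
```

A `finish_reason` of `"length"` means generation stopped because it hit `max_tokens` (512 in the examples above), not because of a stop condition such as the end-of-text token.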

Unsloth Studio

How to use Ba2han/experimental2 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Ba2han/experimental2 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Ba2han/experimental2 to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Ba2han/experimental2 to start chatting

Load model with FastModel

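# Install Unsloth from pip:
pip install unsloth

# Then, in Python:
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="Ba2han/experimental2",
    max_seq_length=2048,  # shorter than the checkpoint's max_position_embeddings of 8192
)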

Docker Model Runner

How to use Ba2han/experimental2 with Docker Model Runner:

```
docker model run hf.co/Ba2han/experimental2
```

Ba2han committed on Apr 28

Commit 4429c11 (1 parent: b5cacc7)

Training in progress, step 300, checkpoint

Files changed (5)

last-checkpoint/config.json +90 -0
last-checkpoint/generation_config.json +13 -0
last-checkpoint/tokenizer.json +0 -0
last-checkpoint/tokenizer_config.json +163 -0
last-checkpoint/trainer_state.json +1084 -0
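
The `trainer_state.json` added in this commit holds the Trainer's `log_history`, with one entry every 2 steps recording `loss`, `learning_rate`, and `grad_norm`. Below is a minimal stdlib sketch that summarizes those entries; the helper is hypothetical, not part of Transformers. It compares the first logged loss with ln(vocab_size), which is the cross-entropy of a uniform guess over the 50,048-token vocabulary. The sample entries are copied from the diff further down.

```python
import math
from typing import Any, Dict


def summarize_log_history(state: Dict[str, Any], vocab_size: int) -> Dict[str, Any]:
    """Summarize training-loss entries from a Trainer trainer_state.json dict."""
    # Keep only training-loss entries (eval and final-summary entries use other keys).
    logs = [e for e in state["log_history"] if "loss" in e]
    peak_lr = max(e["learning_rate"] for e in logs)
    return {
        "uniform_baseline": math.log(vocab_size),  # cross-entropy of a uniform guess
        "first_loss": logs[0]["loss"],
        "last_loss": logs[-1]["loss"],
        "peak_lr": peak_lr,
        # First logged step at which the learning rate reaches its maximum.
        "peak_step": next(e["step"] for e in logs if e["learning_rate"] == peak_lr),
    }


# A few entries copied from last-checkpoint/trainer_state.json in this commit;
# for the full file, load the dict with json.load instead.
state = {"log_history": [
    {"step": 2, "learning_rate": 2.090126243625115e-05, "loss": 10.960652351379395},
    {"step": 240, "learning_rate": 0.004995401722264025, "loss": 3.782071590423584},
    {"step": 242, "learning_rate": 0.005, "loss": 3.7924273014068604},
    {"step": 252, "learning_rate": 0.005, "loss": 3.7392055988311768},
]}
print(summarize_log_history(state, vocab_size=50048))
```

On these entries, the first logged loss (about 10.96) sits just above ln(50048) ≈ 10.82, which is typical of a randomly initialized model. The learning rate first reaches its 0.005 plateau at step 242.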

last-checkpoint/config.json ADDED

	@@ -0,0 +1,90 @@

+{
+  "architectures": [
+    "Qwen3CanonForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "auto_map": {
+    "AutoModelForCausalLM": "patch.Qwen3CanonForCausalLM"
+  },
+  "bos_token_id": 50030,
+  "dtype": "bfloat16",
+  "eos_token_id": 50031,
+  "head_dim": 64,
+  "hidden_act": "silu",
+  "hidden_size": 1024,
+  "initializer_range": 0.02,
+  "intermediate_size": 4096,
+  "layer_types": [
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention",
+    "full_attention"
+  ],
+  "max_position_embeddings": 8192,
+  "max_window_layers": 42,
+  "model_name": "test_checkpoint",
+  "model_type": "qwen3",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 42,
+  "num_key_value_heads": 4,
+  "pad_token_id": 50034,
+  "qk_norm_freeze_affine": true,
+  "resid_lambda_init": 1.0,
+  "rms_norm_eps": 1e-06,
+  "rope_parameters": {
+    "rope_theta": 50000,
+    "rope_type": "default"
+  },
+  "sliding_window": null,
+  "softcap_divisor": 7.5,
+  "softcap_logits": true,
+  "softcap_scale": 23.0,
+  "softcap_shift": 5.0,
+  "tie_word_embeddings": true,
+  "transformers_version": "5.6.2",
+  "unsloth_version": "2026.4.8",
+  "use_cache": false,
+  "use_qk_norm_patch": true,
+  "use_sliding_window": false,
+  "vocab_size": 50048,
+  "x0_lambda_init": 0.1
+}
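
This config describes a 42-layer decoder with grouped-query attention, in which 12 query heads share 4 key/value heads, and with tied input/output embeddings. Note that 12 × 64 = 768 is narrower than `hidden_size` 1024, which the explicit `head_dim` allows. The sketch below estimates parameter count and KV-cache size from these fields. It assumes the standard Qwen3 layout: q/k/v/o projections without bias, per-head q/k RMSNorm, and a gated MLP. Any extra weights added by the custom `Qwen3CanonForCausalLM` patch are not counted, including whatever `resid_lambda_init` and `x0_lambda_init` initialize.

```python
from typing import Any, Dict


def estimate_params(cfg: Dict[str, Any]) -> int:
    """Parameter count for a standard Qwen3-style decoder without attention bias."""
    h, n_layers = cfg["hidden_size"], cfg["num_hidden_layers"]
    d, n_q, n_kv = cfg["head_dim"], cfg["num_attention_heads"], cfg["num_key_value_heads"]

    attn = h * n_q * d + 2 * h * n_kv * d + n_q * d * h  # q, k, v, o projections
    qk_norm = 2 * d                                      # RMSNorm weights on q and k heads
    mlp = 3 * h * cfg["intermediate_size"]               # gate, up, down projections
    layer_norms = 2 * h                                  # input + post-attention RMSNorm
    per_layer = attn + qk_norm + mlp + layer_norms

    embed = cfg["vocab_size"] * h
    lm_head = 0 if cfg["tie_word_embeddings"] else embed  # tied: reuses the embedding
    return embed + n_layers * per_layer + h + lm_head     # + h for the final RMSNorm


def kv_cache_bytes(cfg: Dict[str, Any], seq_len: int, bytes_per_value: int = 2) -> int:
    """Key + value cache size for one sequence (2 bytes per value in bfloat16)."""
    return (2 * cfg["num_hidden_layers"] * cfg["num_key_value_heads"]
            * cfg["head_dim"] * seq_len * bytes_per_value)


# Values copied from last-checkpoint/config.json above.
cfg = {
    "hidden_size": 1024, "num_hidden_layers": 42, "head_dim": 64,
    "num_attention_heads": 12, "num_key_value_heads": 4,
    "intermediate_size": 4096, "vocab_size": 50048, "tie_word_embeddings": True,
}
print(estimate_params(cfg))                       # 667904256
print(kv_cache_bytes(cfg, 8192) / 2**20, "MiB")   # 336.0 MiB
```

With these values the estimate is 667,904,256 parameters (about 668M), of which 51,249,152 belong to the shared embedding matrix. A full 8,192-token KV cache in bfloat16 takes 352,321,536 bytes (336 MiB) per sequence.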

last-checkpoint/generation_config.json ADDED

	@@ -0,0 +1,13 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 50030,
+  "eos_token_id": [
+    50031
+  ],
+  "max_length": 8192,
+  "output_attentions": false,
+  "output_hidden_states": false,
+  "pad_token_id": 50034,
+  "transformers_version": "5.6.2",
+  "use_cache": false
+}

last-checkpoint/tokenizer.json ADDED

The diff for this file is too large to render.

last-checkpoint/tokenizer_config.json ADDED

	@@ -0,0 +1,163 @@

+{
+  "backend": "tokenizers",
+  "bos_token": "<|begin_of_text|>",
+  "clean_up_tokenization_spaces": true,
+  "eos_token": "<|end_of_text|>",
+  "is_local": true,
+  "local_files_only": false,
+  "model_input_names": [
+    "input_ids",
+    "attention_mask"
+  ],
+  "model_max_length": 8192,
+  "pad_token": "<|finetune_right_pad_id|>",
+  "padding_side": "right",
+  "tokenizer_class": "TokenizersBackend",
+  "unk_token": null,
+  "added_tokens_decoder": {
+    "50030": {
+      "content": "<|begin_of_text|>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    "50031": {
+      "content": "<|end_of_text|>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    "50032": {
+      "content": "<|reserved_special_token_0|>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    "50033": {
+      "content": "<|reserved_special_token_1|>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    "50034": {
+      "content": "<|finetune_right_pad_id|>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    "50035": {
+      "content": "<|reserved_special_token_2|>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    "50036": {
+      "content": "<|start_header_id|>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    "50037": {
+      "content": "<|end_header_id|>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    "50038": {
+      "content": "<|eom_id|>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    "50039": {
+      "content": "<|eot_id|>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    "50040": {
+      "content": "<|python_tag|>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    "50041": {
+      "content": "<|reserved_special_token_3|>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    "50042": {
+      "content": "<|reserved_special_token_4|>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    "50043": {
+      "content": "<|reserved_special_token_5|>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    "50044": {
+      "content": "<|reserved_special_token_6|>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    "50045": {
+      "content": "<|reserved_special_token_7|>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    "50046": {
+      "content": "<|reserved_special_token_8|>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
+    "50047": {
+      "content": "<|reserved_special_token_9|>",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    }
+  }
+}
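
Taken together, the files above agree on the special tokens:

- `config.json` and `generation_config.json` use IDs 50030 (bos), 50031 (eos), and 50034 (pad).
- `tokenizer_config.json` maps those IDs to `<|begin_of_text|>`, `<|end_of_text|>`, and `<|finetune_right_pad_id|>`.
- The highest added ID, 50047, is the last slot of `vocab_size` 50048.

Below is a hypothetical stdlib check for that consistency, run on values excerpted from the diffs.

```python
from typing import Any, Dict, List


def check_special_tokens(model_cfg: Dict[str, Any], gen_cfg: Dict[str, Any],
                         tok_cfg: Dict[str, Any]) -> List[str]:
    """Return inconsistencies in bos/eos/pad between the three configs (empty if none)."""
    id_to_text = {int(k): v["content"] for k, v in tok_cfg.get("added_tokens_decoder", {}).items()}
    problems = []
    for role in ("bos", "eos", "pad"):
        model_id = model_cfg.get(f"{role}_token_id")
        gen_ids = gen_cfg.get(f"{role}_token_id")
        gen_ids = gen_ids if isinstance(gen_ids, list) else [gen_ids]  # eos may be a list
        if model_id not in gen_ids:
            problems.append(f"{role}: config id {model_id} not in generation_config {gen_ids}")
        expected = tok_cfg.get(f"{role}_token")
        if id_to_text.get(model_id) != expected:
            problems.append(f"{role}: id {model_id} is {id_to_text.get(model_id)!r}, "
                            f"tokenizer expects {expected!r}")
    if id_to_text and max(id_to_text) >= model_cfg["vocab_size"]:
        problems.append(f"added token id {max(id_to_text)} >= vocab_size {model_cfg['vocab_size']}")
    return problems


# Values excerpted from the three files above.
model_cfg = {"bos_token_id": 50030, "eos_token_id": 50031, "pad_token_id": 50034, "vocab_size": 50048}
gen_cfg = {"bos_token_id": 50030, "eos_token_id": [50031], "pad_token_id": 50034}
tok_cfg = {
    "bos_token": "<|begin_of_text|>", "eos_token": "<|end_of_text|>",
    "pad_token": "<|finetune_right_pad_id|>",
    "added_tokens_decoder": {
        "50030": {"content": "<|begin_of_text|>"},
        "50031": {"content": "<|end_of_text|>"},
        "50034": {"content": "<|finetune_right_pad_id|>"},
        "50047": {"content": "<|reserved_special_token_9|>"},
    },
}
print(check_special_tokens(model_cfg, gen_cfg, tok_cfg))  # []
```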

last-checkpoint/trainer_state.json ADDED

	@@ -0,0 +1,1084 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.02633678020516626,
+  "eval_steps": 957,
+  "global_step": 300,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.00017557853470110841,
+      "grad_norm": 0.30078125,
+      "learning_rate": 2.090126243625115e-05,
+      "loss": 10.960652351379395,
+      "step": 2
+    },
+    {
+      "epoch": 0.00035115706940221683,
+      "grad_norm": 0.314453125,
+      "learning_rate": 6.270378730875344e-05,
+      "loss": 10.96611499786377,
+      "step": 4
+    },
+    {
+      "epoch": 0.0005267356041033253,
+      "grad_norm": 0.337890625,
+      "learning_rate": 0.00010450631218125576,
+      "loss": 10.96183967590332,
+      "step": 6
+    },
+    {
+      "epoch": 0.0007023141388044337,
+      "grad_norm": 0.380859375,
+      "learning_rate": 0.00014630883705375804,
+      "loss": 10.948229789733887,
+      "step": 8
+    },
+    {
+      "epoch": 0.000877892673505542,
+      "grad_norm": 0.400390625,
+      "learning_rate": 0.00018811136192626034,
+      "loss": 10.914310455322266,
+      "step": 10
+    },
+    {
+      "epoch": 0.0010534712082066505,
+      "grad_norm": 0.498046875,
+      "learning_rate": 0.00022991388679876267,
+      "loss": 10.865043640136719,
+      "step": 12
+    },
+    {
+      "epoch": 0.0012290497429077588,
+      "grad_norm": 0.6171875,
+      "learning_rate": 0.0002717164116712649,
+      "loss": 10.783700942993164,
+      "step": 14
+    },
+    {
+      "epoch": 0.0014046282776088673,
+      "grad_norm": 0.8359375,
+      "learning_rate": 0.0003135189365437672,
+      "loss": 10.673444747924805,
+      "step": 16
+    },
+    {
+      "epoch": 0.0015802068123099756,
+      "grad_norm": 0.81640625,
+      "learning_rate": 0.0003553214614162695,
+      "loss": 10.533811569213867,
+      "step": 18
+    },
+    {
+      "epoch": 0.001755785347011084,
+      "grad_norm": 0.890625,
+      "learning_rate": 0.00039712398628877183,
+      "loss": 10.36827564239502,
+      "step": 20
+    },
+    {
+      "epoch": 0.0019313638817121926,
+      "grad_norm": 1.0,
+      "learning_rate": 0.0004389265111612742,
+      "loss": 10.181196212768555,
+      "step": 22
+    },
+    {
+      "epoch": 0.002106942416413301,
+      "grad_norm": 0.90625,
+      "learning_rate": 0.0004807290360337765,
+      "loss": 9.960575103759766,
+      "step": 24
+    },
+    {
+      "epoch": 0.002282520951114409,
+      "grad_norm": 0.97265625,
+      "learning_rate": 0.0005225315609062787,
+      "loss": 9.699069023132324,
+      "step": 26
+    },
+    {
+      "epoch": 0.0024580994858155176,
+      "grad_norm": 1.1328125,
+      "learning_rate": 0.000564334085778781,
+      "loss": 9.424476623535156,
+      "step": 28
+    },
+    {
+      "epoch": 0.002633678020516626,
+      "grad_norm": 1.03125,
+      "learning_rate": 0.0006061366106512833,
+      "loss": 9.146112442016602,
+      "step": 30
+    },
+    {
+      "epoch": 0.0028092565552177346,
+      "grad_norm": 0.93359375,
+      "learning_rate": 0.0006479391355237856,
+      "loss": 8.853119850158691,
+      "step": 32
+    },
+    {
+      "epoch": 0.002984835089918843,
+      "grad_norm": 0.828125,
+      "learning_rate": 0.0006897416603962879,
+      "loss": 8.57155704498291,
+      "step": 34
+    },
+    {
+      "epoch": 0.003160413624619951,
+      "grad_norm": 0.625,
+      "learning_rate": 0.0007315441852687902,
+      "loss": 8.293695449829102,
+      "step": 36
+    },
+    {
+      "epoch": 0.0033359921593210597,
+      "grad_norm": 0.5078125,
+      "learning_rate": 0.0007733467101412925,
+      "loss": 8.079232215881348,
+      "step": 38
+    },
+    {
+      "epoch": 0.003511570694022168,
+      "grad_norm": 0.52734375,
+      "learning_rate": 0.0008151492350137948,
+      "loss": 7.869166851043701,
+      "step": 40
+    },
+    {
+      "epoch": 0.0036871492287232767,
+      "grad_norm": 0.486328125,
+      "learning_rate": 0.0008569517598862971,
+      "loss": 7.719941139221191,
+      "step": 42
+    },
+    {
+      "epoch": 0.003862727763424385,
+      "grad_norm": 0.33984375,
+      "learning_rate": 0.0008987542847587994,
+      "loss": 7.580569744110107,
+      "step": 44
+    },
+    {
+      "epoch": 0.004038306298125493,
+      "grad_norm": 0.29296875,
+      "learning_rate": 0.0009405568096313017,
+      "loss": 7.442490100860596,
+      "step": 46
+    },
+    {
+      "epoch": 0.004213884832826602,
+      "grad_norm": 0.33203125,
+      "learning_rate": 0.0009823593345038041,
+      "loss": 7.3093085289001465,
+      "step": 48
+    },
+    {
+      "epoch": 0.00438946336752771,
+      "grad_norm": 0.2353515625,
+      "learning_rate": 0.0010241618593763062,
+      "loss": 7.1847639083862305,
+      "step": 50
+    },
+    {
+      "epoch": 0.004565041902228818,
+      "grad_norm": 0.2275390625,
+      "learning_rate": 0.0010659643842488085,
+      "loss": 7.069242000579834,
+      "step": 52
+    },
+    {
+      "epoch": 0.004740620436929927,
+      "grad_norm": 0.30859375,
+      "learning_rate": 0.0011077669091213108,
+      "loss": 6.970668792724609,
+      "step": 54
+    },
+    {
+      "epoch": 0.004916198971631035,
+      "grad_norm": 0.291015625,
+      "learning_rate": 0.0011495694339938131,
+      "loss": 6.86820125579834,
+      "step": 56
+    },
+    {
+      "epoch": 0.005091777506332144,
+      "grad_norm": 0.34375,
+      "learning_rate": 0.0011913719588663154,
+      "loss": 6.78927755355835,
+      "step": 58
+    },
+    {
+      "epoch": 0.005267356041033252,
+      "grad_norm": 0.318359375,
+      "learning_rate": 0.0012331744837388177,
+      "loss": 6.671013832092285,
+      "step": 60
+    },
+    {
+      "epoch": 0.00544293457573436,
+      "grad_norm": 0.328125,
+      "learning_rate": 0.00127497700861132,
+      "loss": 6.608604431152344,
+      "step": 62
+    },
+    {
+      "epoch": 0.005618513110435469,
+      "grad_norm": 0.359375,
+      "learning_rate": 0.0013167795334838223,
+      "loss": 6.491241455078125,
+      "step": 64
+    },
+    {
+      "epoch": 0.005794091645136577,
+      "grad_norm": 0.38671875,
+      "learning_rate": 0.0013585820583563247,
+      "loss": 6.4354963302612305,
+      "step": 66
+    },
+    {
+      "epoch": 0.005969670179837686,
+      "grad_norm": 0.34375,
+      "learning_rate": 0.001400384583228827,
+      "loss": 6.354619979858398,
+      "step": 68
+    },
+    {
+      "epoch": 0.006145248714538794,
+      "grad_norm": 0.373046875,
+      "learning_rate": 0.0014421871081013293,
+      "loss": 6.282101631164551,
+      "step": 70
+    },
+    {
+      "epoch": 0.006320827249239902,
+      "grad_norm": 0.37109375,
+      "learning_rate": 0.0014839896329738316,
+      "loss": 6.176161766052246,
+      "step": 72
+    },
+    {
+      "epoch": 0.006496405783941011,
+      "grad_norm": 0.408203125,
+      "learning_rate": 0.0015257921578463339,
+      "loss": 6.122098922729492,
+      "step": 74
+    },
+    {
+      "epoch": 0.006671984318642119,
+      "grad_norm": 0.37109375,
+      "learning_rate": 0.001567594682718836,
+      "loss": 6.045535564422607,
+      "step": 76
+    },
+    {
+      "epoch": 0.006847562853343228,
+      "grad_norm": 0.4765625,
+      "learning_rate": 0.0016093972075913385,
+      "loss": 5.991270542144775,
+      "step": 78
+    },
+    {
+      "epoch": 0.007023141388044336,
+      "grad_norm": 0.57421875,
+      "learning_rate": 0.0016511997324638406,
+      "loss": 5.9186930656433105,
+      "step": 80
+    },
+    {
+      "epoch": 0.007198719922745444,
+      "grad_norm": 0.40234375,
+      "learning_rate": 0.001693002257336343,
+      "loss": 5.854170322418213,
+      "step": 82
+    },
+    {
+      "epoch": 0.007374298457446553,
+      "grad_norm": 0.333984375,
+      "learning_rate": 0.0017348047822088452,
+      "loss": 5.790707111358643,
+      "step": 84
+    },
+    {
+      "epoch": 0.007549876992147661,
+      "grad_norm": 0.54296875,
+      "learning_rate": 0.0017766073070813477,
+      "loss": 5.740701675415039,
+      "step": 86
+    },
+    {
+      "epoch": 0.00772545552684877,
+      "grad_norm": 0.66796875,
+      "learning_rate": 0.0018184098319538498,
+      "loss": 5.699957370758057,
+      "step": 88
+    },
+    {
+      "epoch": 0.007901034061549878,
+      "grad_norm": 0.404296875,
+      "learning_rate": 0.0018602123568263523,
+      "loss": 5.671308994293213,
+      "step": 90
+    },
+    {
+      "epoch": 0.008076612596250986,
+      "grad_norm": 0.40625,
+      "learning_rate": 0.0019020148816988546,
+      "loss": 5.610586166381836,
+      "step": 92
+    },
+    {
+      "epoch": 0.008252191130952095,
+      "grad_norm": 0.39453125,
+      "learning_rate": 0.001943817406571357,
+      "loss": 5.554894924163818,
+      "step": 94
+    },
+    {
+      "epoch": 0.008427769665653204,
+      "grad_norm": 0.3359375,
+      "learning_rate": 0.001985619931443859,
+      "loss": 5.456742763519287,
+      "step": 96
+    },
+    {
+      "epoch": 0.008603348200354312,
+      "grad_norm": 0.404296875,
+      "learning_rate": 0.0020274224563163615,
+      "loss": 5.479247093200684,
+      "step": 98
+    },
+    {
+      "epoch": 0.00877892673505542,
+      "grad_norm": 0.3828125,
+      "learning_rate": 0.002069224981188864,
+      "loss": 5.427374839782715,
+      "step": 100
+    },
+    {
+      "epoch": 0.00895450526975653,
+      "grad_norm": 0.59375,
+      "learning_rate": 0.002111027506061366,
+      "loss": 5.365617752075195,
+      "step": 102
+    },
+    {
+      "epoch": 0.009130083804457637,
+      "grad_norm": 0.71875,
+      "learning_rate": 0.0021528300309338684,
+      "loss": 5.361892223358154,
+      "step": 104
+    },
+    {
+      "epoch": 0.009305662339158745,
+      "grad_norm": 0.431640625,
+      "learning_rate": 0.0021946325558063707,
+      "loss": 5.33181095123291,
+      "step": 106
+    },
+    {
+      "epoch": 0.009481240873859854,
+      "grad_norm": 0.35546875,
+      "learning_rate": 0.002236435080678873,
+      "loss": 5.2017741203308105,
+      "step": 108
+    },
+    {
+      "epoch": 0.009656819408560962,
+      "grad_norm": 0.4375,
+      "learning_rate": 0.0022782376055513753,
+      "loss": 5.1847686767578125,
+      "step": 110
+    },
+    {
+      "epoch": 0.00983239794326207,
+      "grad_norm": 0.51953125,
+      "learning_rate": 0.0023200401304238776,
+      "loss": 5.185354232788086,
+      "step": 112
+    },
+    {
+      "epoch": 0.01000797647796318,
+      "grad_norm": 0.51171875,
+      "learning_rate": 0.00236184265529638,
+      "loss": 5.12120246887207,
+      "step": 114
+    },
+    {
+      "epoch": 0.010183555012664288,
+      "grad_norm": 0.5078125,
+      "learning_rate": 0.0024036451801688822,
+      "loss": 5.126992225646973,
+      "step": 116
+    },
+    {
+      "epoch": 0.010359133547365396,
+      "grad_norm": 0.361328125,
+      "learning_rate": 0.0024454477050413845,
+      "loss": 5.073342800140381,
+      "step": 118
+    },
+    {
+      "epoch": 0.010534712082066505,
+      "grad_norm": 0.431640625,
+      "learning_rate": 0.002487250229913887,
+      "loss": 4.999391555786133,
+      "step": 120
+    },
+    {
+      "epoch": 0.010710290616767613,
+      "grad_norm": 0.546875,
+      "learning_rate": 0.002529052754786389,
+      "loss": 5.031741619110107,
+      "step": 122
+    },
+    {
+      "epoch": 0.01088586915146872,
+      "grad_norm": 0.4453125,
+      "learning_rate": 0.0025708552796588915,
+      "loss": 4.9870171546936035,
+      "step": 124
+    },
+    {
+      "epoch": 0.01106144768616983,
+      "grad_norm": 0.470703125,
+      "learning_rate": 0.0026126578045313938,
+      "loss": 4.929328441619873,
+      "step": 126
+    },
+    {
+      "epoch": 0.011237026220870939,
+      "grad_norm": 0.447265625,
+      "learning_rate": 0.002654460329403896,
+      "loss": 4.910821914672852,
+      "step": 128
+    },
+    {
+      "epoch": 0.011412604755572046,
+      "grad_norm": 0.515625,
+      "learning_rate": 0.0026962628542763984,
+      "loss": 4.8920087814331055,
+      "step": 130
+    },
+    {
+      "epoch": 0.011588183290273155,
+      "grad_norm": 0.9140625,
+      "learning_rate": 0.0027380653791489007,
+      "loss": 4.849368095397949,
+      "step": 132
+    },
+    {
+      "epoch": 0.011763761824974264,
+      "grad_norm": 0.6015625,
+      "learning_rate": 0.002779867904021403,
+      "loss": 4.815767765045166,
+      "step": 134
+    },
+    {
+      "epoch": 0.011939340359675372,
+      "grad_norm": 0.62890625,
+      "learning_rate": 0.0028216704288939053,
+      "loss": 4.809019565582275,
+      "step": 136
+    },
+    {
+      "epoch": 0.01211491889437648,
+      "grad_norm": 0.5234375,
+      "learning_rate": 0.0028634729537664076,
+      "loss": 4.747541427612305,
+      "step": 138
+    },
+    {
+      "epoch": 0.012290497429077589,
+      "grad_norm": 0.4765625,
+      "learning_rate": 0.00290527547863891,
+      "loss": 4.7378010749816895,
+      "step": 140
+    },
+    {
+      "epoch": 0.012466075963778698,
+      "grad_norm": 0.44921875,
+      "learning_rate": 0.002947078003511412,
+      "loss": 4.686991214752197,
+      "step": 142
+    },
+    {
+      "epoch": 0.012641654498479805,
+      "grad_norm": 0.52734375,
+      "learning_rate": 0.0029888805283839145,
+      "loss": 4.707539081573486,
+      "step": 144
+    },
+    {
+      "epoch": 0.012817233033180914,
+      "grad_norm": 0.49609375,
+      "learning_rate": 0.003030683053256417,
+      "loss": 4.634755611419678,
+      "step": 146
+    },
+    {
+      "epoch": 0.012992811567882023,
+      "grad_norm": 0.56640625,
+      "learning_rate": 0.0030724855781289187,
+      "loss": 4.611801624298096,
+      "step": 148
+    },
+    {
+      "epoch": 0.01316839010258313,
+      "grad_norm": 0.5703125,
+      "learning_rate": 0.003114288103001422,
+      "loss": 4.616387844085693,
+      "step": 150
+    },
+    {
+      "epoch": 0.013343968637284239,
+      "grad_norm": 0.474609375,
+      "learning_rate": 0.0031560906278739237,
+      "loss": 4.566751956939697,
+      "step": 152
+    },
+    {
+      "epoch": 0.013519547171985348,
+      "grad_norm": 0.859375,
+      "learning_rate": 0.003197893152746426,
+      "loss": 4.5823516845703125,
+      "step": 154
+    },
+    {
+      "epoch": 0.013695125706686457,
+      "grad_norm": 0.66015625,
+      "learning_rate": 0.003239695677618928,
+      "loss": 4.515060901641846,
+      "step": 156
+    },
+    {
+      "epoch": 0.013870704241387564,
+      "grad_norm": 0.58203125,
+      "learning_rate": 0.003281498202491431,
+      "loss": 4.500759124755859,
+      "step": 158
+    },
+    {
+      "epoch": 0.014046282776088673,
+      "grad_norm": 0.494140625,
+      "learning_rate": 0.003323300727363933,
+      "loss": 4.467392444610596,
+      "step": 160
+    },
+    {
+      "epoch": 0.014221861310789782,
+      "grad_norm": 0.431640625,
+      "learning_rate": 0.0033651032522364352,
+      "loss": 4.431788921356201,
+      "step": 162
+    },
+    {
+      "epoch": 0.014397439845490889,
+      "grad_norm": 0.4921875,
+      "learning_rate": 0.003406905777108937,
+      "loss": 4.449859142303467,
+      "step": 164
+    },
+    {
+      "epoch": 0.014573018380191998,
+      "grad_norm": 0.494140625,
+      "learning_rate": 0.0034487083019814403,
+      "loss": 4.380939960479736,
+      "step": 166
+    },
+    {
+      "epoch": 0.014748596914893107,
+      "grad_norm": 0.5234375,
+      "learning_rate": 0.003490510826853942,
+      "loss": 4.370035171508789,
+      "step": 168
+    },
+    {
+      "epoch": 0.014924175449594214,
+      "grad_norm": 0.51171875,
+      "learning_rate": 0.0035323133517264444,
+      "loss": 4.3310227394104,
+      "step": 170
+    },
+    {
+      "epoch": 0.015099753984295323,
+      "grad_norm": 0.51171875,
+      "learning_rate": 0.0035741158765989467,
+      "loss": 4.363381862640381,
+      "step": 172
+    },
+    {
+      "epoch": 0.015275332518996432,
+      "grad_norm": 0.404296875,
+      "learning_rate": 0.0036159184014714495,
+      "loss": 4.319387912750244,
+      "step": 174
+    },
+    {
+      "epoch": 0.01545091105369754,
+      "grad_norm": 0.46875,
+      "learning_rate": 0.0036577209263439514,
+      "loss": 4.258921146392822,
+      "step": 176
+    },
+    {
+      "epoch": 0.01562648958839865,
+      "grad_norm": 0.48828125,
+      "learning_rate": 0.0036995234512164537,
+      "loss": 4.2898406982421875,
+      "step": 178
+    },
+    {
+      "epoch": 0.015802068123099755,
+      "grad_norm": 0.671875,
+      "learning_rate": 0.003741325976088956,
+      "loss": 4.256139278411865,
+      "step": 180
+    },
+    {
+      "epoch": 0.015977646657800864,
+      "grad_norm": 0.388671875,
+      "learning_rate": 0.0037831285009614587,
+      "loss": 4.20559024810791,
+      "step": 182
+    },
+    {
+      "epoch": 0.016153225192501973,
+      "grad_norm": 0.392578125,
+      "learning_rate": 0.0038249310258339606,
+      "loss": 4.260622024536133,
+      "step": 184
+    },
+    {
+      "epoch": 0.016328803727203082,
+      "grad_norm": 0.474609375,
+      "learning_rate": 0.003866733550706463,
+      "loss": 4.154942989349365,
+      "step": 186
+    },
+    {
+      "epoch": 0.01650438226190419,
+      "grad_norm": 0.4453125,
+      "learning_rate": 0.003908536075578965,
+      "loss": 4.221053123474121,
+      "step": 188
+    },
+    {
+      "epoch": 0.0166799607966053,
+      "grad_norm": 0.39453125,
+      "learning_rate": 0.0039503386004514675,
+      "loss": 4.153572082519531,
+      "step": 190
+    },
+    {
+      "epoch": 0.01685553933130641,
+      "grad_norm": 0.3359375,
+      "learning_rate": 0.003992141125323969,
+      "loss": 4.154505729675293,
+      "step": 192
+    },
+    {
+      "epoch": 0.017031117866007514,
+      "grad_norm": 0.369140625,
+      "learning_rate": 0.004033943650196472,
+      "loss": 4.094570636749268,
+      "step": 194
+    },
+    {
+      "epoch": 0.017206696400708623,
+      "grad_norm": 0.3515625,
+      "learning_rate": 0.004075746175068974,
+      "loss": 4.124532222747803,
+      "step": 196
+    },
+    {
+      "epoch": 0.017382274935409732,
+      "grad_norm": 0.431640625,
+      "learning_rate": 0.004117548699941477,
+      "loss": 4.099862575531006,
+      "step": 198
+    },
+    {
+      "epoch": 0.01755785347011084,
+      "grad_norm": 0.40625,
+      "learning_rate": 0.0041593512248139786,
+      "loss": 4.086720943450928,
+      "step": 200
+    },
+    {
+      "epoch": 0.01773343200481195,
+      "grad_norm": 0.38671875,
+      "learning_rate": 0.004201153749686481,
+      "loss": 4.042497158050537,
+      "step": 202
+    },
+    {
+      "epoch": 0.01790901053951306,
+      "grad_norm": 0.36328125,
+      "learning_rate": 0.004242956274558983,
+      "loss": 4.051867961883545,
+      "step": 204
+    },
+    {
+      "epoch": 0.018084589074214168,
+      "grad_norm": 0.365234375,
+      "learning_rate": 0.004284758799431486,
+      "loss": 4.011574745178223,
+      "step": 206
+    },
+    {
+      "epoch": 0.018260167608915273,
+      "grad_norm": 0.37890625,
+      "learning_rate": 0.004326561324303988,
+      "loss": 4.033310890197754,
+      "step": 208
+    },
+    {
+      "epoch": 0.018435746143616382,
+      "grad_norm": 0.384765625,
+      "learning_rate": 0.0043683638491764905,
+      "loss": 4.015084743499756,
+      "step": 210
+    },
+    {
+      "epoch": 0.01861132467831749,
+      "grad_norm": 0.3125,
+      "learning_rate": 0.004410166374048992,
+      "loss": 3.971015691757202,
+      "step": 212
+    },
+    {
+      "epoch": 0.0187869032130186,
+      "grad_norm": 0.298828125,
+      "learning_rate": 0.004451968898921495,
+      "loss": 4.00208044052124,
+      "step": 214
+    },
+    {
+      "epoch": 0.01896248174771971,
+      "grad_norm": 0.375,
+      "learning_rate": 0.004493771423793997,
+      "loss": 3.9292211532592773,
+      "step": 216
+    },
+    {
+      "epoch": 0.019138060282420818,
+      "grad_norm": 0.349609375,
+      "learning_rate": 0.0045355739486665,
+      "loss": 3.9596006870269775,
+      "step": 218
+    },
+    {
+      "epoch": 0.019313638817121923,
+      "grad_norm": 0.291015625,
+      "learning_rate": 0.004577376473539002,
+      "loss": 3.935854911804199,
+      "step": 220
+    },
+    {
+      "epoch": 0.019489217351823032,
+      "grad_norm": 0.349609375,
+      "learning_rate": 0.004619178998411504,
+      "loss": 3.8975400924682617,
+      "step": 222
+    },
+    {
+      "epoch": 0.01966479588652414,
+      "grad_norm": 0.31640625,
+      "learning_rate": 0.004660981523284006,
+      "loss": 3.903998613357544,
+      "step": 224
+    },
+    {
+      "epoch": 0.01984037442122525,
+      "grad_norm": 0.39453125,
+      "learning_rate": 0.004702784048156509,
+      "loss": 3.9066410064697266,
+      "step": 226
+    },
+    {
+      "epoch": 0.02001595295592636,
+      "grad_norm": 0.36328125,
+      "learning_rate": 0.004744586573029011,
+      "loss": 3.874302864074707,
+      "step": 228
+    },
+    {
+      "epoch": 0.020191531490627468,
+      "grad_norm": 0.2734375,
+      "learning_rate": 0.0047863890979015136,
+      "loss": 3.894779920578003,
+      "step": 230
+    },
+    {
+      "epoch": 0.020367110025328577,
+      "grad_norm": 0.33203125,
+      "learning_rate": 0.004828191622774015,
+      "loss": 3.8634965419769287,
+      "step": 232
+    },
+    {
+      "epoch": 0.020542688560029682,
+      "grad_norm": 0.3125,
+      "learning_rate": 0.004869994147646518,
+      "loss": 3.846660852432251,
+      "step": 234
+    },
+    {
+      "epoch": 0.02071826709473079,
+      "grad_norm": 0.267578125,
+      "learning_rate": 0.00491179667251902,
+      "loss": 3.856271505355835,
+      "step": 236
+    },
+    {
+      "epoch": 0.0208938456294319,
+      "grad_norm": 0.30078125,
+      "learning_rate": 0.004953599197391523,
+      "loss": 3.823467969894409,
+      "step": 238
+    },
+    {
+      "epoch": 0.02106942416413301,
+      "grad_norm": 0.330078125,
+      "learning_rate": 0.004995401722264025,
+      "loss": 3.782071590423584,
+      "step": 240
+    },
+    {
+      "epoch": 0.021245002698834118,
+      "grad_norm": 0.302734375,
+      "learning_rate": 0.005,
+      "loss": 3.7924273014068604,
+      "step": 242
+    },
+    {
+      "epoch": 0.021420581233535227,
+      "grad_norm": 0.296875,
+      "learning_rate": 0.005,
+      "loss": 3.7755682468414307,
+      "step": 244
+    },
+    {
+      "epoch": 0.021596159768236336,
+      "grad_norm": 0.294921875,
+      "learning_rate": 0.005,
+      "loss": 3.7866594791412354,
+      "step": 246
+    },
+    {
+      "epoch": 0.02177173830293744,
+      "grad_norm": 0.28515625,
+      "learning_rate": 0.005,
+      "loss": 3.7925446033477783,
+      "step": 248
+    },
+    {
+      "epoch": 0.02194731683763855,
+      "grad_norm": 0.306640625,
+      "learning_rate": 0.005,
+      "loss": 3.7477009296417236,
+      "step": 250
+    },
+    {
+      "epoch": 0.02212289537233966,
+      "grad_norm": 0.271484375,
+      "learning_rate": 0.005,
+      "loss": 3.7392055988311768,
+      "step": 252
+    },
+    {
+      "epoch": 0.022298473907040768,
+      "grad_norm": 0.25,
+      "learning_rate": 0.005,
+      "loss": 3.7321717739105225,
+      "step": 254
+    },
+    {
+      "epoch": 0.022474052441741877,
+      "grad_norm": 0.28515625,
+      "learning_rate": 0.005,
+      "loss": 3.714862823486328,
+      "step": 256
+    },
+    {
+      "epoch": 0.022649630976442986,
+      "grad_norm": 0.2265625,
+      "learning_rate": 0.005,
+      "loss": 3.6854515075683594,
+      "step": 258
+    },
+    {
+      "epoch": 0.02282520951114409,
+      "grad_norm": 0.23828125,
+      "learning_rate": 0.005,
+      "loss": 3.6973862648010254,
+      "step": 260
+    },
+    {
+      "epoch": 0.0230007880458452,
+      "grad_norm": 0.255859375,
+      "learning_rate": 0.005,
+      "loss": 3.674755096435547,
+      "step": 262
+    },
+    {
+      "epoch": 0.02317636658054631,
+      "grad_norm": 0.232421875,
+      "learning_rate": 0.005,
+      "loss": 3.6436076164245605,
+      "step": 264
+    },
+    {
+      "epoch": 0.023351945115247418,
+      "grad_norm": 0.2275390625,
+      "learning_rate": 0.005,
+      "loss": 3.640627384185791,
+      "step": 266
+    },
+    {
+      "epoch": 0.023527523649948527,
+      "grad_norm": 0.2421875,
+      "learning_rate": 0.005,
+      "loss": 3.653189182281494,
+      "step": 268
+    },
+    {
+      "epoch": 0.023703102184649636,
+      "grad_norm": 0.2412109375,
+      "learning_rate": 0.005,
+      "loss": 3.6322195529937744,
+      "step": 270
+    },
+    {
+      "epoch": 0.023878680719350745,
+      "grad_norm": 0.201171875,
+      "learning_rate": 0.005,
+      "loss": 3.608694553375244,
+      "step": 272
+    },
+    {
+      "epoch": 0.02405425925405185,
+      "grad_norm": 0.224609375,
+      "learning_rate": 0.005,
+      "loss": 3.6067698001861572,
+      "step": 274
+    },
+    {
+      "epoch": 0.02422983778875296,
+      "grad_norm": 0.208984375,
+      "learning_rate": 0.005,
+      "loss": 3.6058900356292725,
+      "step": 276
+    },
+    {
+      "epoch": 0.02440541632345407,
+      "grad_norm": 0.2197265625,
+      "learning_rate": 0.005,
+      "loss": 3.6085643768310547,
+      "step": 278
+    },
+    {
+      "epoch": 0.024580994858155177,
+      "grad_norm": 0.1806640625,
+      "learning_rate": 0.005,
+      "loss": 3.5690369606018066,
+      "step": 280
+    },
+    {
+      "epoch": 0.024756573392856286,
+      "grad_norm": 0.21875,
+      "learning_rate": 0.005,
+      "loss": 3.5159733295440674,
+      "step": 282
+    },
+    {
+      "epoch": 0.024932151927557395,
+      "grad_norm": 0.2041015625,
+      "learning_rate": 0.005,
+      "loss": 3.557368755340576,
+      "step": 284
+    },
+    {
+      "epoch": 0.025107730462258504,
+      "grad_norm": 0.2197265625,
+      "learning_rate": 0.005,
+      "loss": 3.53389310836792,
+      "step": 286
+    },
+    {
+      "epoch": 0.02528330899695961,
+      "grad_norm": 0.23046875,
+      "learning_rate": 0.005,
+      "loss": 3.520721912384033,
+      "step": 288
+    },
+    {
+      "epoch": 0.02545888753166072,
+      "grad_norm": 0.2119140625,
+      "learning_rate": 0.005,
+      "loss": 3.5246963500976562,
+      "step": 290
+    },
+    {
+      "epoch": 0.025634466066361827,
+      "grad_norm": 0.193359375,
+      "learning_rate": 0.005,
+      "loss": 3.473104238510132,
+      "step": 292
+    },
+    {
+      "epoch": 0.025810044601062936,
+      "grad_norm": 0.1875,
+      "learning_rate": 0.005,
+      "loss": 3.4929323196411133,
+      "step": 294
+    },
+    {
+      "epoch": 0.025985623135764045,
+      "grad_norm": 0.2060546875,
+      "learning_rate": 0.005,
+      "loss": 3.433466911315918,
+      "step": 296
+    },
+    {
+      "epoch": 0.026161201670465154,
+      "grad_norm": 0.2373046875,
+      "learning_rate": 0.005,
+      "loss": 3.480882406234741,
+      "step": 298
+    },
+    {
+      "epoch": 0.02633678020516626,
+      "grad_norm": 0.2333984375,
+      "learning_rate": 0.005,
+      "loss": 3.4181106090545654,
+      "step": 300
+    }
+  ],
+  "logging_steps": 2,
+  "max_steps": 11961,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 300,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 5.071178624049867e+17,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}