Instructions to use k1h0/OpenCoder-8B-Instruct-query_nlx_under8 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use k1h0/OpenCoder-8B-Instruct-query_nlx_under8 with Transformers:

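```python
# Use a pipeline as a high-level helper
from transformers import pipeline

# trust_remote_code: the repo ships a custom tokenizer (tokenization_inflm.py) that may require it.
pipe = pipeline(
    "text-generation",
    model="k1h0/OpenCoder-8B-Instruct-query_nlx_under8",
    trust_remote_code=True,
)
messages = [
    {"role": "user", "content": "Who are you?"},
]
print(pipe(messages))
```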
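The pipeline call returns the whole conversation rather than just the reply. Below is a minimal sketch for extracting only the assistant's text. It assumes the chat-style output that recent Transformers versions return when given a list of messages: a list holding one dict whose `generated_text` is the conversation plus the new assistant turn. `last_assistant_reply` is a hypothetical helper name, and the example data is a placeholder, not real model output.

```python
from typing import Any


def last_assistant_reply(result: list[dict[str, Any]]) -> str:
    """Return the content of the final assistant message in a chat pipeline result.

    Assumes the shape [{"generated_text": [{"role": ..., "content": ...}, ...]}].
    """
    conversation = result[0]["generated_text"]
    if isinstance(conversation, str):
        # Plain-string prompts come back as a single string; nothing to unpack.
        return conversation
    # Walk backwards so the newest assistant turn wins.
    for message in reversed(conversation):
        if message.get("role") == "assistant":
            return message["content"]
    raise ValueError("no assistant message found in pipeline output")


# Placeholder result with the assumed shape; with a real pipeline use:
# print(last_assistant_reply(pipe(messages)))
example = [{"generated_text": [
    {"role": "user", "content": "Who are you?"},
    {"role": "assistant", "content": "<model reply>"},
]}]
print(last_assistant_reply(example))
```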

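```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "k1h0/OpenCoder-8B-Instruct-query_nlx_under8"
# The repo ships a custom tokenizer (tokenization_inflm.py), so remote code may be needed to load it.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
# torch_dtype="auto" keeps the checkpoint dtype (bfloat16 per config.json) instead of loading float32.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (everything after the prompt).
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```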

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use k1h0/OpenCoder-8B-Instruct-query_nlx_under8 with vLLM:

Install from pip and serve model

```shell
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "k1h0/OpenCoder-8B-Instruct-query_nlx_under8"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "k1h0/OpenCoder-8B-Instruct-query_nlx_under8",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
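You can send the same request from Python. Below is a minimal standard-library sketch that assumes the server is listening on `localhost:8000`. `build_chat_request` and `send` are hypothetical helper names. The response is read using the OpenAI chat-completions shape (`choices[0].message.content`). For the SGLang server, only the base URL changes (port 30000).

```python
import json
import urllib.request

MODEL_ID = "k1h0/OpenCoder-8B-Instruct-query_nlx_under8"


def build_chat_request(prompt: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/chat/completions route."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the first choice's message content."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Building the request needs no server; send() does.
req = build_chat_request("What is the capital of France?")
print(req.get_method(), req.full_url)
# print(send(req))  # uncomment once the server is running
```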

Use Docker

```shell
docker model run hf.co/k1h0/OpenCoder-8B-Instruct-query_nlx_under8
```

SGLang

How to use k1h0/OpenCoder-8B-Instruct-query_nlx_under8 with SGLang:

Install from pip and serve model

```shell
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "k1h0/OpenCoder-8B-Instruct-query_nlx_under8" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "k1h0/OpenCoder-8B-Instruct-query_nlx_under8",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "k1h0/OpenCoder-8B-Instruct-query_nlx_under8" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "k1h0/OpenCoder-8B-Instruct-query_nlx_under8",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use k1h0/OpenCoder-8B-Instruct-query_nlx_under8 with Docker Model Runner:
```
docker model run hf.co/k1h0/OpenCoder-8B-Instruct-query_nlx_under8
```

k1h0 committed on Jul 10, 2025

Commit a63fab9 (verified) · 1 parent: 2f98cc6

Upload folder using huggingface_hub

Files changed (19)

README.md +60 -0
added_tokens.json +44 -0
all_results.json +9 -0
config.json +36 -0
generation_config.json +6 -0
model-00001-of-00004.safetensors +3 -0
model-00002-of-00004.safetensors +3 -0
model-00003-of-00004.safetensors +3 -0
model-00004-of-00004.safetensors +3 -0
model.safetensors.index.json +298 -0
special_tokens_map.json +34 -0
tokenization_inflm.py +292 -0
tokenizer.model +3 -0
tokenizer_config.json +396 -0
train_results.json +9 -0
trainer_log.jsonl +48 -0
trainer_state.json +419 -0
training_args.bin +3 -0
training_loss.png +0 -0

README.md ADDED

	@@ -0,0 +1,60 @@

+---
+library_name: transformers
+license: other
+base_model: infly/OpenCoder-8B-Instruct
+tags:
+- llama-factory
+- freeze
+- generated_from_trainer
+model-index:
+- name: opencoder_under8_nlx
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# opencoder_under8_nlx
+This model is a fine-tuned version of [infly/OpenCoder-8B-Instruct](https://huggingface.co/infly/OpenCoder-8B-Instruct) on the codes_nlx_under8 dataset.
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 5e-05
+- train_batch_size: 16
+- eval_batch_size: 8
+- seed: 42
+- distributed_type: multi-GPU
+- num_devices: 4
+- gradient_accumulation_steps: 8
+- total_train_batch_size: 512
+- total_eval_batch_size: 32
+- optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+- lr_scheduler_type: cosine
+- num_epochs: 1.0
+### Training results
+### Framework versions
+- Transformers 4.48.2
+- Pytorch 2.5.1+cu124
+- Datasets 3.2.0
+- Tokenizers 0.21.0
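The batch-size entries in the model card depend on each other. The total training batch is the per-device batch times the number of devices times the gradient-accumulation steps. The total eval batch is the per-device eval batch times the number of devices. Here is a quick check using the card's values:

```python
# Values copied from the README hyperparameter list.
train_batch_size = 16            # per device
eval_batch_size = 8              # per device
num_devices = 4
gradient_accumulation_steps = 8

# One optimizer step sees every device's micro-batches across all accumulation steps.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
# Evaluation does not accumulate gradients, so only devices multiply.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)  # 512 32
```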

added_tokens.json ADDED

	@@ -0,0 +1,44 @@

+{
+  "<code_to_intermediate>": 96521,
+  "<empty_output>": 96520,
+  "<file_sep>": 96511,
+  "<fim_middle>": 96508,
+  "<fim_prefix>": 96507,
+  "<fim_suffix>": 96509,
+  "<intermediate_to_code>": 96522,
+  "<issue_closed>": 96514,
+  "<issue_comment>": 96513,
+  "<issue_start>": 96512,
+  "<jupyter_code>": 96517,
+  "<jupyter_output>": 96518,
+  "<jupyter_script>": 96519,
+  "<jupyter_start>": 96515,
+  "<jupyter_text>": 96516,
+  "<pr>": 96523,
+  "<pr_base>": 96526,
+  "<pr_base_code>": 96528,
+  "<pr_comment>": 96531,
+  "<pr_diff>": 96529,
+  "<pr_diff_hunk>": 96530,
+  "<pr_diff_hunk_comment_line>": 96538,
+  "<pr_event_id>": 96532,
+  "<pr_file>": 96527,
+  "<pr_in_reply_to_comment_id>": 96537,
+  "<pr_in_reply_to_review_id>": 96536,
+  "<pr_is_merged>": 96525,
+  "<pr_review>": 96533,
+  "<pr_review_comment>": 96535,
+  "<pr_review_state>": 96534,
+  "<pr_status>": 96524,
+  "<repo_name>": 96510,
+  "<|endoftext|>": 96506,
+  "<|end|>": 96500,
+  "<|im_end|>": 96539,
+  "<|im_start|>": 96540,
+  "<|message|>": 96501,
+  "<|pad|>": 96505,
+  "<|start|>": 96499,
+  "<|tool_end|>": 96504,
+  "<|tool_excute|>": 96503,
+  "<|tool_start|>": 96502
+}

all_results.json ADDED

	@@ -0,0 +1,9 @@

+{
+    "epoch": 0.9868766404199475,
+    "num_input_tokens_seen": 98566144,
+    "total_flos": 4.361803532655919e+18,
+    "train_loss": 0.6996991406095788,
+    "train_runtime": 8190.5414,
+    "train_samples_per_second": 2.973,
+    "train_steps_per_second": 0.006
+}
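The throughput figures in `all_results.json` can be cross-checked against each other. The rough sketch below assumes each optimizer step consumes the 512-sample total batch listed in the model card. The reported rates are rounded, so agreement is only approximate.

```python
# Values copied from all_results.json.
train_runtime_s = 8190.5414
train_samples_per_second = 2.973
train_steps_per_second = 0.006
num_input_tokens_seen = 98_566_144

total_train_batch_size = 512  # from the model card

# Steps per second implied by the sample rate and the global batch size.
implied_steps_per_second = train_samples_per_second / total_train_batch_size
# Approximate number of training samples processed (rounded rate times runtime).
approx_samples = train_samples_per_second * train_runtime_s
# Average token throughput over the whole run.
tokens_per_second = num_input_tokens_seen / train_runtime_s

print(round(implied_steps_per_second, 3))  # 0.006, matching train_steps_per_second
print(round(train_runtime_s / 3600, 2))    # runtime in hours: 2.28
print(round(approx_samples), round(tokens_per_second))
```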

config.json ADDED

	@@ -0,0 +1,36 @@

+{
+  "_name_or_path": "infly/OpenCoder-8B-Instruct",
+  "architectures": [
+    "LlamaForCausalLM"
+  ],
+  "attention_bias": false,
+  "attention_dropout": 0.0,
+  "bos_token_id": 96540,
+  "eos_token_id": 96539,
+  "head_dim": 128,
+  "hidden_act": "silu",
+  "hidden_size": 4096,
+  "initializer_range": 0.02,
+  "intermediate_size": 14336,
+  "max_position_embeddings": 8192,
+  "mlp_bias": false,
+  "model_type": "llama",
+  "num_attention_heads": 32,
+  "num_hidden_layers": 32,
+  "num_key_value_heads": 8,
+  "pretraining_tp": 1,
+  "rms_norm_eps": 1e-05,
+  "rope_scaling": {
+    "factor": 1.0,
+    "high_freq_factor": 4.0,
+    "low_freq_factor": 1.0,
+    "original_max_position_embeddings": 8192,
+    "rope_type": "llama3"
+  },
+  "rope_theta": 500000.0,
+  "tie_word_embeddings": false,
+  "torch_dtype": "bfloat16",
+  "transformers_version": "4.48.2",
+  "use_cache": false,
+  "vocab_size": 96640
+}
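The architecture fields in `config.json` determine the parameter count. The back-of-the-envelope sketch below applies to this Llama-style layout and makes four assumptions:

- no bias terms (`attention_bias` and `mlp_bias` are false);
- grouped-query attention with `num_key_value_heads` K/V heads;
- RMSNorm weights only;
- separate input-embedding and `lm_head` matrices (`tie_word_embeddings` is false).

Note that `metadata.total_size` in `model.safetensors.index.json` is a byte count that depends on the dtype each tensor was saved in. It therefore need not equal two bytes times this count.

```python
# Values copied from config.json.
config = {
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "head_dim": 128,
    "num_hidden_layers": 32,
    "vocab_size": 96640,
    "tie_word_embeddings": False,
}


def count_llama_params(c: dict) -> int:
    """Count weights of a bias-free Llama decoder with GQA and RMSNorm."""
    h, i = c["hidden_size"], c["intermediate_size"]
    q_out = c["num_attention_heads"] * c["head_dim"]
    kv_out = c["num_key_value_heads"] * c["head_dim"]

    attention = h * q_out + 2 * h * kv_out + q_out * h  # q, k, v, o projections
    mlp = 3 * h * i                                     # gate, up, down projections
    norms = 2 * h                                       # input + post-attention RMSNorm
    per_layer = attention + mlp + norms

    embeddings = c["vocab_size"] * h
    lm_head = 0 if c["tie_word_embeddings"] else c["vocab_size"] * h
    final_norm = h
    return c["num_hidden_layers"] * per_layer + embeddings + lm_head + final_norm


total = count_llama_params(config)
print(f"{total:,}")  # 7,771,262,976 (about 7.8B)
```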

generation_config.json ADDED

	@@ -0,0 +1,6 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 96540,
+  "eos_token_id": 96539,
+  "transformers_version": "4.48.2"
+}
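`config.json` and `generation_config.json` refer to special tokens only by ID. The full `added_tokens.json` assigns the 42 contiguous IDs 96499 through 96540. The small sketch below maps the configured IDs back to strings using an excerpt of that file (only a few entries are copied). It confirms that `bos_token_id` 96540 is `<|im_start|>` and `eos_token_id` 96539 is `<|im_end|>`, matching `special_tokens_map.json`. It also checks that the added IDs fit inside `vocab_size`.

```python
import json

# Excerpt of added_tokens.json (token string -> id).
added_tokens = json.loads("""{
  "<|endoftext|>": 96506,
  "<|im_end|>": 96539,
  "<|im_start|>": 96540,
  "<|pad|>": 96505,
  "<|start|>": 96499
}""")
# Values copied from generation_config.json and config.json.
generation_config = {"bos_token_id": 96540, "eos_token_id": 96539}
vocab_size = 96640


def invert(token_to_id: dict[str, int]) -> dict[int, str]:
    """Build an id -> token map, refusing duplicate ids."""
    id_to_token: dict[int, str] = {}
    for token, idx in token_to_id.items():
        if idx in id_to_token:
            raise ValueError(f"id {idx} used by both {id_to_token[idx]} and {token}")
        id_to_token[idx] = token
    return id_to_token


id_to_token = invert(added_tokens)
bos = id_to_token[generation_config["bos_token_id"]]
eos = id_to_token[generation_config["eos_token_id"]]
print(bos, eos)                                 # <|im_start|> <|im_end|>
print(max(added_tokens.values()) < vocab_size)  # True
```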

model-00001-of-00004.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:182be20097743536eb8333475d94fdb0eeea45e2d7f20103b65a61c2771ee9bf
+size 4919027568

model-00002-of-00004.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7eaf7fc3833d4f0150a3d3a19d11133d4769d35dc82e0f301824097de6a12f89
+size 4915915128

model-00003-of-00004.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b0a3bb567b07eca75c691d5b9832073c084ebf7b958bc01f75f306dfb5bd7875
+size 4999819112

model-00004-of-00004.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:580b1ceaa651f743f213580934e6260e830e0039bb7fa6fe409a913ad1ccc6fe
+size 1580246000

model.safetensors.index.json ADDED

	@@ -0,0 +1,298 @@

+{
+  "metadata": {
+    "total_size": 16414973952
+  },
+  "weight_map": {
+    "lm_head.weight": "model-00004-of-00004.safetensors",
+    "model.embed_tokens.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.19.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.19.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.19.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.19.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.20.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.30.input_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.input_layernorm.weight": "model-00004-of-00004.safetensors",
+    "model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.31.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.31.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.31.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
+    "model.layers.31.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.31.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
+    "model.layers.31.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
+    "model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.input_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.8.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
+    "model.layers.9.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.9.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
+    "model.layers.9.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
+    "model.norm.weight": "model-00004-of-00004.safetensors"
+  }
+}
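The weight map above assigns every tensor to one of four shards, and a single decoder layer need not live in one shard: layer 9's `gate_proj` and attention projections are in shard 1, while its layernorms, `down_proj` and `up_proj` are in shard 2. A minimal sketch for inspecting such an index. It assumes the standard Hugging Face layout, where this mapping sits under a top-level `weight_map` key. The helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def tensors_by_shard(index: dict) -> Dict[str, List[str]]:
    """Group tensor names by the shard file that stores them (shards and names sorted)."""
    shards: Dict[str, List[str]] = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        shards[shard_file].append(tensor_name)
    return {shard: sorted(names) for shard, names in sorted(shards.items())}


def shards_for_layer(index: dict, layer: int) -> Set[str]:
    """Return every shard file holding at least one tensor of the given decoder layer."""
    # The trailing dot keeps layer 9 from also matching layers 90, 91, ...
    prefix = f"model.layers.{layer}."
    return {shard for name, shard in index["weight_map"].items() if name.startswith(prefix)}


# Excerpt of the index above (part of layer 9 plus the final norm).
index = {
    "weight_map": {
        "model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
        "model.layers.9.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
        "model.layers.9.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
        "model.norm.weight": "model-00004-of-00004.safetensors",
    }
}

print(sorted(shards_for_layer(index, 9)))
# ['model-00001-of-00004.safetensors', 'model-00002-of-00004.safetensors']
```

With the full index file, the dictionary would come from `json.load` instead of being written inline.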

special_tokens_map.json ADDED

	@@ -0,0 +1,34 @@

+{
+  "additional_special_tokens": [
+    "<|im_end|>",
+    "<|im_start|>"
+  ],
+  "bos_token": {
+    "content": "<|im_start|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
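special_tokens_map.json identifies tokens only by their string content. The sketch below resolves those strings to ids through the `added_tokens_decoder` table in the tokenizer_config.json added by this same commit. The ids in the example are copied from there, and the function name is hypothetical. Note that `INFLMTokenizer` overrides `bos_token_id` and `eos_token_id` to return the SentencePiece model's own BOS/EOS ids, so those properties may not agree with the ids resolved here.

```python
from typing import Dict, List, Union


def resolve_special_token_ids(
    special_map: dict, added_tokens_decoder: Dict[str, dict]
) -> Dict[str, Union[int, List[int]]]:
    """Map each entry of special_tokens_map.json to its id in added_tokens_decoder."""
    content_to_id = {entry["content"]: int(token_id) for token_id, entry in added_tokens_decoder.items()}
    resolved: Dict[str, Union[int, List[int]]] = {}
    for key, value in special_map.items():
        if isinstance(value, list):
            # additional_special_tokens is a plain list of token strings
            resolved[key] = [content_to_id[token] for token in value]
        else:
            # bos/eos/pad/unk entries are dicts carrying a "content" field
            resolved[key] = content_to_id[value["content"]]
    return resolved


special_map = {
    "additional_special_tokens": ["<|im_end|>", "<|im_start|>"],
    "bos_token": {"content": "<|im_start|>"},
    "eos_token": {"content": "<|im_end|>"},
    "pad_token": {"content": "<pad>"},
    "unk_token": {"content": "<unk>"},
}
# Subset of added_tokens_decoder from tokenizer_config.json (other fields omitted).
added_tokens_decoder = {
    "0": {"content": "<unk>"},
    "3": {"content": "<pad>"},
    "96539": {"content": "<|im_end|>"},
    "96540": {"content": "<|im_start|>"},
}

print(resolve_special_token_ids(special_map, added_tokens_decoder))
# {'additional_special_tokens': [96539, 96540], 'bos_token': 96540, 'eos_token': 96539, 'pad_token': 3, 'unk_token': 0}
```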

tokenization_inflm.py ADDED

	@@ -0,0 +1,292 @@

+# coding=utf-8
+# Copyright 2022 EleutherAI and the HuggingFace Inc. team. All rights reserved.
+#
+# This code is based on EleutherAI's GPT-NeoX library and the GPT-NeoX
+# and OPT implementations in this library. It has been modified from its
+# original forms to accommodate minor architectural differences compared
+# to GPT-NeoX and OPT used by the Meta AI team that trained the model.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Tokenization classes for INFLMTokenizer."""
+import os
+from shutil import copyfile
+from typing import Any, Dict, List, Optional, Tuple
+import sentencepiece as spm
+from transformers.tokenization_utils import PreTrainedTokenizer
+from transformers.utils import logging
+from tokenizers import pre_tokenizers, Regex, decoders
+from tokenizers.pre_tokenizers import Digits, Split, ByteLevel
+# Split pattern adapted from GPT-4's cl100k_base regex; the last alternative differs.
+# Raw string so that \p, \s and \S reach the regex engine unchanged.
+PATTERN = Regex(r"(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+\s+(\S)+")
+logger = logging.get_logger(__name__)
+VOCAB_FILES_NAMES = {"vocab_file": "tokenizer.model"}
+PRETRAINED_VOCAB_FILES_MAP = {}
+class INFLMTokenizer(PreTrainedTokenizer):
+    """
+    Construct an INFLMTokenizer: text is first split with `PATTERN`, then each piece is
+    encoded with a SentencePiece model.
+    Args:
+        vocab_file (`str`):
+            Path to the vocabulary file.
+    """
+    vocab_files_names = VOCAB_FILES_NAMES
+    pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
+    model_input_names = ["input_ids", "attention_mask"]
+    _auto_class = "AutoTokenizer"
+    def __init__(
+        self,
+        vocab_file,
+        unk_token="<unk>",
+        bos_token="<s>",
+        eos_token="</s>",
+        pad_token="<pad>",
+        sp_model_kwargs: Optional[Dict[str, Any]] = None,
+        add_bos_token=False,
+        add_eos_token=False,
+        decode_with_prefix_space=False,
+        clean_up_tokenization_spaces=False,
+        spaces_between_special_tokens=False,
+        **kwargs,
+    ):
+        self.sp_model_kwargs = {} if sp_model_kwargs is None else sp_model_kwargs
+        self.vocab_file = vocab_file
+        self.add_bos_token = add_bos_token
+        self.add_eos_token = add_eos_token
+        self.decode_with_prefix_space = decode_with_prefix_space
+        self.sp_model = spm.SentencePieceProcessor(**self.sp_model_kwargs)
+        self.sp_model.Load(vocab_file)
+        self._no_prefix_space_tokens = None
+        self.pre_tokenizer = pre_tokenizers.Sequence([Split(pattern=PATTERN, behavior="isolated", invert=False)])
+        super().__init__(
+            bos_token=bos_token,
+            eos_token=eos_token,
+            unk_token=unk_token,
+            pad_token=pad_token,
+            clean_up_tokenization_spaces=clean_up_tokenization_spaces,
+            spaces_between_special_tokens=spaces_between_special_tokens,
+            **kwargs,
+        )
+    @property
+    def no_prefix_space_tokens(self):
+        if self._no_prefix_space_tokens is None:
+            vocab = self.convert_ids_to_tokens(list(range(self.vocab_size)))
+            self._no_prefix_space_tokens = {i for i, tok in enumerate(vocab) if not tok.startswith("▁")}
+        return self._no_prefix_space_tokens
+    @property
+    def vocab_size(self):
+        """Returns vocab size"""
+        return self.sp_model.get_piece_size()
+    @property
+    def bos_token_id(self) -> Optional[int]:
+        return self.sp_model.bos_id()
+    @property
+    def eos_token_id(self) -> Optional[int]:
+        return self.sp_model.eos_id()
+    def get_vocab(self):
+        """Returns vocab as a dict"""
+        vocab = {self.convert_ids_to_tokens(i): i for i in range(self.vocab_size)}
+        vocab.update(self.added_tokens_encoder)
+        return vocab
+    def _tokenize(self, text):
+        """Split text with PATTERN, then SentencePiece-encode each piece; returns a list of token strings."""
+        splits = self.pre_tokenizer.pre_tokenize_str(text)
+        texts = []
+        for split in splits:
+            texts.extend(self.sp_model.encode(split[0], out_type=str))
+        return texts
+    def _convert_token_to_id(self, token):
+        """Converts a token (str) to an id using the vocab."""
+        return self.sp_model.piece_to_id(token)
+    def _convert_id_to_token(self, index):
+        """Converts an index (integer) to a token (str) using the vocab."""
+        token = self.sp_model.IdToPiece(index)
+        return token
+    def _maybe_add_prefix_space(self, tokens, decoded):
+        if tokens and tokens[0] not in self.no_prefix_space_tokens:
+            return " " + decoded
+        else:
+            return decoded
+    def convert_tokens_to_string(self, tokens):
+        """Converts a sequence of tokens (str) into a single string."""
+        current_sub_tokens = []
+        out_string = ""
+        for token in tokens:
+            # make sure that special tokens are not decoded using sentencepiece model
+            if token in self.all_special_tokens:
+                out_string += self.sp_model.decode(current_sub_tokens) + token
+                current_sub_tokens = []
+            else:
+                current_sub_tokens.append(token)
+        out_string += self.sp_model.decode(current_sub_tokens)
+        return out_string
+    def save_vocabulary(self, save_directory, filename_prefix: Optional[str] = None) -> Tuple[str]:
+        """
+        Save the vocabulary and special tokens file to a directory.
+        Args:
+            save_directory (`str`):
+                The directory in which to save the vocabulary.
+        Returns:
+            `Tuple(str)`: Paths to the files saved.
+        """
+        if not os.path.isdir(save_directory):
+            logger.error(f"Vocabulary path ({save_directory}) should be a directory")
+            return
+        out_vocab_file = os.path.join(
+            save_directory, (filename_prefix + "-" if filename_prefix else "") + VOCAB_FILES_NAMES["vocab_file"]
+        )
+        if os.path.abspath(self.vocab_file) != os.path.abspath(out_vocab_file) and os.path.isfile(self.vocab_file):
+            copyfile(self.vocab_file, out_vocab_file)
+        elif not os.path.isfile(self.vocab_file):
+            with open(out_vocab_file, "wb") as fi:
+                content_spiece_model = self.sp_model.serialized_model_proto()
+                fi.write(content_spiece_model)
+        return (out_vocab_file,)
+    def build_inputs_with_special_tokens(self, token_ids_0, token_ids_1=None):
+        if self.add_bos_token:
+            bos_token_ids = [self.bos_token_id]
+        else:
+            bos_token_ids = []
+        output = bos_token_ids + token_ids_0
+        if token_ids_1 is not None:
+            output = output + token_ids_1
+        if self.add_eos_token:
+            output = output + [self.eos_token_id]
+        return output
+    def get_special_tokens_mask(
+        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None, already_has_special_tokens: bool = False
+    ) -> List[int]:
+        """
+        Retrieve sequence ids from a token list that has no special tokens added. This method is called when adding
+        special tokens using the tokenizer `prepare_for_model` method.
+        Args:
+            token_ids_0 (`List[int]`):
+                List of IDs.
+            token_ids_1 (`List[int]`, *optional*):
+                Optional second list of IDs for sequence pairs.
+            already_has_special_tokens (`bool`, *optional*, defaults to `False`):
+                Whether or not the token list is already formatted with special tokens for the model.
+        Returns:
+            `List[int]`: A list of integers in the range [0, 1]: 1 for a special token, 0 for a sequence token.
+        """
+        if already_has_special_tokens:
+            return super().get_special_tokens_mask(
+                token_ids_0=token_ids_0, token_ids_1=token_ids_1, already_has_special_tokens=True
+            )
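+        # Mirror build_inputs_with_special_tokens: optional BOS first, then both
+        # sequences back to back, then an optional EOS once at the end.
+        bos_mask = [1] if self.add_bos_token else []
+        eos_mask = [1] if self.add_eos_token else []
+        if token_ids_1 is None:
+            return bos_mask + ([0] * len(token_ids_0)) + eos_mask
+        return bos_mask + ([0] * len(token_ids_0)) + ([0] * len(token_ids_1)) + eos_mask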
+    def create_token_type_ids_from_sequences(
+        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
+    ) -> List[int]:
+        """
+        Create token type IDs for a single sequence or a sequence pair. This tokenizer does not use token
+        types, so the result is all zeros and exists only for backward compatibility:
+        ```
+        0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
+        | first sequence    | second sequence |
+        ```
+        The length matches the output of `build_inputs_with_special_tokens`.
+        Args:
+            token_ids_0 (`List[int]`):
+                List of ids.
+            token_ids_1 (`List[int]`, *optional*):
+                Optional second list of IDs for sequence pairs.
+        Returns:
+            `List[int]`: List of zeros.
+        """
+        return [0] * len(self.build_inputs_with_special_tokens(token_ids_0, token_ids_1))
+    @property
+    def default_chat_template(self):
+        return None
+    def decode(
+        self,
+        token_ids,
+        skip_special_tokens: bool = False,
+        clean_up_tokenization_spaces: Optional[bool] = False,
+        spaces_between_special_tokens: bool = False,
+        **kwargs,
+    ) -> str:
+        # default spaces_between_special_tokens should be false.
+        if spaces_between_special_tokens:
+            logger.warning_once(
+                "spaces_between_special_tokens is set. "
+                "It has no effect for bos, eos, pad, unk when transformers<=4.38."
+            )
+        return super().decode(
+            token_ids,
+            skip_special_tokens=skip_special_tokens,
+            clean_up_tokenization_spaces=clean_up_tokenization_spaces,
+            spaces_between_special_tokens=spaces_between_special_tokens,
+            **kwargs,
+        )
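`_tokenize` first splits the text with `PATTERN` through `Split(..., behavior="isolated")`, then runs SentencePiece on each piece separately. The sketch below mimics only that splitting step, with Python's built-in `re`. Assumptions: `re` has no `\p{L}`/`\p{N}` classes, so the pattern is an ASCII-only simplification of `PATTERN`, with its last alternative dropped. The `isolated` behavior is emulated by keeping each match as its own piece, and also keeping any unmatched text between matches. This is not the tokenizers library's implementation.

```python
import re
from typing import List, Tuple

# ASCII-only simplification of PATTERN (assumption): Python's `re` has no
# \p{L} / \p{N}, so letters become [a-zA-Z] and digits [0-9]; PATTERN's last
# alternative is dropped.
SIMPLE_PATTERN = re.compile(
    r"(?i:'s|'t|'re|'ve|'m|'ll|'d)"  # English contractions
    r"|[^\r\na-zA-Z0-9]?[a-zA-Z]+"   # optional leading non-letter, then letters
    r"|[0-9]{1,3}"                   # digits in runs of at most three
    r"| ?[^\sa-zA-Z0-9]+[\r\n]*"     # punctuation, with an optional leading space
    r"|\s*[\r\n]+"                   # line breaks
    r"|\s+(?!\S)"                    # whitespace not followed by a non-space
)


def split_isolated(text: str, pattern: "re.Pattern[str]" = SIMPLE_PATTERN) -> List[Tuple[str, Tuple[int, int]]]:
    """Emulate Split(behavior="isolated"): each match is its own piece, and
    unmatched text between matches is kept as a piece as well."""
    pieces: List[Tuple[str, Tuple[int, int]]] = []
    pos = 0
    for m in pattern.finditer(text):
        if m.start() > pos:  # text the pattern skipped over
            pieces.append((text[pos:m.start()], (pos, m.start())))
        pieces.append((m.group(), m.span()))
        pos = m.end()
    if pos < len(text):  # trailing unmatched text
        pieces.append((text[pos:], (pos, len(text))))
    return pieces


print([piece for piece, _ in split_isolated("don't stop 12345")])
# ['don', "'t", ' stop', ' ', '123', '45']
```

The output has the same `(substring, (start, end))` shape as `pre_tokenize_str`. In `_tokenize`, each piece's string (`split[0]`) is then passed to `sp_model.encode(..., out_type=str)`, and the resulting token lists are concatenated.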

tokenizer.model ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:76d43d618fc0c5a7c79dc4e72579f9f29bb803b36e4a4d709d1233626fd8fe2a
+size 1535725
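tokenizer.model is committed as a Git LFS pointer. The repository stores only the SHA-256 `oid` and byte `size` of the real SentencePiece file. The sketch below checks a downloaded copy against such a pointer using only the standard library. The function names are hypothetical.

```python
import hashlib
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(blob: bytes, pointer: Dict[str, str]) -> bool:
    """True if the blob has the size and SHA-256 digest recorded in the pointer."""
    algo, _, expected_digest = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return len(blob) == int(pointer["size"]) and hashlib.sha256(blob).hexdigest() == expected_digest


pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:76d43d618fc0c5a7c79dc4e72579f9f29bb803b36e4a4d709d1233626fd8fe2a\n"
    "size 1535725\n"
)
print(pointer["size"])  # 1535725
```

To check a local download, read the file in binary mode and pass its bytes to `matches_pointer` together with the parsed pointer. The size check is cheap and runs before hashing.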

tokenizer_config.json ADDED

	@@ -0,0 +1,396 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96499": {
+      "content": "<|start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96500": {
+      "content": "<|end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96501": {
+      "content": "<|message|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96502": {
+      "content": "<|tool_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96503": {
+      "content": "<|tool_excute|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96504": {
+      "content": "<|tool_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96505": {
+      "content": "<|pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96506": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96507": {
+      "content": "<fim_prefix>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96508": {
+      "content": "<fim_middle>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96509": {
+      "content": "<fim_suffix>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96510": {
+      "content": "<repo_name>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96511": {
+      "content": "<file_sep>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96512": {
+      "content": "<issue_start>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96513": {
+      "content": "<issue_comment>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96514": {
+      "content": "<issue_closed>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96515": {
+      "content": "<jupyter_start>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96516": {
+      "content": "<jupyter_text>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96517": {
+      "content": "<jupyter_code>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96518": {
+      "content": "<jupyter_output>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96519": {
+      "content": "<jupyter_script>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96520": {
+      "content": "<empty_output>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96521": {
+      "content": "<code_to_intermediate>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96522": {
+      "content": "<intermediate_to_code>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96523": {
+      "content": "<pr>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96524": {
+      "content": "<pr_status>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96525": {
+      "content": "<pr_is_merged>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96526": {
+      "content": "<pr_base>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96527": {
+      "content": "<pr_file>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96528": {
+      "content": "<pr_base_code>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96529": {
+      "content": "<pr_diff>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96530": {
+      "content": "<pr_diff_hunk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96531": {
+      "content": "<pr_comment>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96532": {
+      "content": "<pr_event_id>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96533": {
+      "content": "<pr_review>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96534": {
+      "content": "<pr_review_state>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96535": {
+      "content": "<pr_review_comment>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96536": {
+      "content": "<pr_in_reply_to_review_id>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96537": {
+      "content": "<pr_in_reply_to_comment_id>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96538": {
+      "content": "<pr_diff_hunk_comment_line>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96539": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "96540": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_end|>",
+    "<|im_start|>"
+  ],
+  "auto_map": {
+    "AutoTokenizer": [
+      "tokenization_inflm.INFLMTokenizer",
+      null
+    ]
+  },
+  "bos_token": "<|im_start|>",
+  "chat_template": "{% for message in messages %}{% if loop.first and messages[0]['role'] != 'system' %}{{ '<|im_start|>system\nYou are OpenCoder, created by OpenCoder Team.<|im_end|>\n' }}{% endif %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "extra_special_tokens": {},
+  "model_max_length": 4096,
+  "pad_token": "<pad>",
+  "padding_side": "right",
+  "return_tensors": true,
+  "spaces_between_special_tokens": false,
+  "split_special_tokens": false,
+  "tokenizer_class": "INFLMTokenizer",
+  "unk_token": "<unk>"
+}
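The `chat_template` above wraps every message in `<|im_start|>{role}\n...<|im_end|>\n`. If the conversation does not start with a system message, it prepends a default OpenCoder one. When `add_generation_prompt` is set, it ends with `<|im_start|>assistant\n`. The following is a hand-written Python equivalent, useful for checking prompts without Jinja. It is a sketch of the template's logic, not the code `apply_chat_template` actually runs.

```python
from typing import Dict, List

DEFAULT_SYSTEM = "You are OpenCoder, created by OpenCoder Team."


def render_chat(messages: List[Dict[str, str]], add_generation_prompt: bool = False) -> str:
    """Mirror the tokenizer_config.json chat_template in plain Python."""
    parts = []
    for i, message in enumerate(messages):
        # Jinja's loop.first: the default system prompt is only considered once.
        if i == 0 and messages[0]["role"] != "system":
            parts.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(render_chat([{"role": "user", "content": "Who are you?"}], add_generation_prompt=True))
```

An empty message list renders nothing at all, which matches the template: the system-prompt check lives inside the loop.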

train_results.json ADDED

	@@ -0,0 +1,9 @@

+{
+    "epoch": 0.9868766404199475,
+    "num_input_tokens_seen": 98566144,
+    "total_flos": 4.361803532655919e+18,
+    "train_loss": 0.6996991406095788,
+    "train_runtime": 8190.5414,
+    "train_samples_per_second": 2.973,
+    "train_steps_per_second": 0.006
+}
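The totals in train_results.json are consistent with the per-step log that follows. Every step consumed 2,097,152 (2^21) tokens, and 98,566,144 / 2,097,152 = 47, the `total_steps` in trainer_log.jsonl. Dividing tokens by runtime gives an average throughput close to the `throughput` values logged per step. A short check, using only numbers copied from these two files:

```python
num_input_tokens_seen = 98_566_144  # train_results.json
train_runtime_s = 8190.5414         # train_results.json
tokens_per_step = 2_097_152         # total_tokens after step 1 in trainer_log.jsonl

steps = num_input_tokens_seen / tokens_per_step
avg_tokens_per_s = num_input_tokens_seen / train_runtime_s

print(steps)                                # 47.0
print(round(avg_tokens_per_s))              # 12034
print(round(steps / train_runtime_s, 3))    # 0.006, the reported train_steps_per_second
```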

trainer_log.jsonl ADDED

	@@ -0,0 +1,48 @@

+{"current_steps": 1, "total_steps": 47, "loss": 1.099, "lr": 4.9944171965578836e-05, "epoch": 0.02099737532808399, "percentage": 2.13, "elapsed_time": "0:03:01", "remaining_time": "2:18:47", "throughput": 11584.36, "total_tokens": 2097152}
+{"current_steps": 2, "total_steps": 47, "loss": 1.0954, "lr": 4.97769372038695e-05, "epoch": 0.04199475065616798, "percentage": 4.26, "elapsed_time": "0:05:54", "remaining_time": "2:13:07", "throughput": 11815.38, "total_tokens": 4194304}
+{"current_steps": 3, "total_steps": 47, "loss": 1.0273, "lr": 4.9499042625914674e-05, "epoch": 0.06299212598425197, "percentage": 6.38, "elapsed_time": "0:08:49", "remaining_time": "2:09:21", "throughput": 11889.36, "total_tokens": 6291456}
+{"current_steps": 4, "total_steps": 47, "loss": 0.9301, "lr": 4.911172937635942e-05, "epoch": 0.08398950131233596, "percentage": 8.51, "elapsed_time": "0:11:43", "remaining_time": "2:05:59", "throughput": 11929.11, "total_tokens": 8388608}
+{"current_steps": 5, "total_steps": 47, "loss": 0.8696, "lr": 4.861672729019797e-05, "epoch": 0.10498687664041995, "percentage": 10.64, "elapsed_time": "0:14:37", "remaining_time": "2:02:50", "throughput": 11950.39, "total_tokens": 10485760}
+{"current_steps": 6, "total_steps": 47, "loss": 0.8379, "lr": 4.801624716691072e-05, "epoch": 0.12598425196850394, "percentage": 12.77, "elapsed_time": "0:17:31", "remaining_time": "1:59:47", "throughput": 11963.64, "total_tokens": 12582912}
+{"current_steps": 7, "total_steps": 47, "loss": 0.8087, "lr": 4.731297089649703e-05, "epoch": 0.14698162729658792, "percentage": 14.89, "elapsed_time": "0:20:25", "remaining_time": "1:56:45", "throughput": 11974.16, "total_tokens": 14680064}
+{"current_steps": 8, "total_steps": 47, "loss": 0.8071, "lr": 4.651003948150349e-05, "epoch": 0.1679790026246719, "percentage": 17.02, "elapsed_time": "0:23:20", "remaining_time": "1:53:45", "throughput": 11982.48, "total_tokens": 16777216}
+{"current_steps": 9, "total_steps": 47, "loss": 0.7521, "lr": 4.561103900854401e-05, "epoch": 0.1889763779527559, "percentage": 19.15, "elapsed_time": "0:26:14", "remaining_time": "1:50:47", "throughput": 11987.35, "total_tokens": 18874368}
+{"current_steps": 10, "total_steps": 47, "loss": 0.7666, "lr": 4.4619984631966524e-05, "epoch": 0.2099737532808399, "percentage": 21.28, "elapsed_time": "0:29:09", "remaining_time": "1:47:51", "throughput": 11990.25, "total_tokens": 20971520}
+{"current_steps": 11, "total_steps": 47, "loss": 0.7226, "lr": 4.354130264119894e-05, "epoch": 0.23097112860892388, "percentage": 23.4, "elapsed_time": "0:32:03", "remaining_time": "1:44:54", "throughput": 11994.52, "total_tokens": 23068672}
+{"current_steps": 12, "total_steps": 47, "loss": 0.719, "lr": 4.2379810691866064e-05, "epoch": 0.25196850393700787, "percentage": 25.53, "elapsed_time": "0:34:57", "remaining_time": "1:41:57", "throughput": 11998.58, "total_tokens": 25165824}
+{"current_steps": 13, "total_steps": 47, "loss": 0.7129, "lr": 4.114069628897006e-05, "epoch": 0.27296587926509186, "percentage": 27.66, "elapsed_time": "0:37:51", "remaining_time": "1:39:01", "throughput": 12001.09, "total_tokens": 27262976}
+{"current_steps": 14, "total_steps": 47, "loss": 0.7029, "lr": 3.982949361823388e-05, "epoch": 0.29396325459317585, "percentage": 29.79, "elapsed_time": "0:40:45", "remaining_time": "1:36:05", "throughput": 12003.39, "total_tokens": 29360128}
+{"current_steps": 15, "total_steps": 47, "loss": 0.693, "lr": 3.845205882908432e-05, "epoch": 0.31496062992125984, "percentage": 31.91, "elapsed_time": "0:43:40", "remaining_time": "1:33:09", "throughput": 12005.3, "total_tokens": 31457280}
+{"current_steps": 16, "total_steps": 47, "loss": 0.689, "lr": 3.7014543879667094e-05, "epoch": 0.3359580052493438, "percentage": 34.04, "elapsed_time": "0:46:34", "remaining_time": "1:30:14", "throughput": 12006.81, "total_tokens": 33554432}
+{"current_steps": 17, "total_steps": 47, "loss": 0.6898, "lr": 3.552336906070838e-05, "epoch": 0.3569553805774278, "percentage": 36.17, "elapsed_time": "0:49:28", "remaining_time": "1:27:18", "throughput": 12008.94, "total_tokens": 35651584}
+{"current_steps": 18, "total_steps": 47, "loss": 0.6596, "lr": 3.398519432093782e-05, "epoch": 0.3779527559055118, "percentage": 38.3, "elapsed_time": "0:52:22", "remaining_time": "1:24:23", "throughput": 12010.5, "total_tokens": 37748736}
+{"current_steps": 19, "total_steps": 47, "loss": 0.6796, "lr": 3.2406889522140856e-05, "epoch": 0.3989501312335958, "percentage": 40.43, "elapsed_time": "0:55:16", "remaining_time": "1:21:28", "throughput": 12012.77, "total_tokens": 39845888}
+{"current_steps": 20, "total_steps": 47, "loss": 0.6595, "lr": 3.079550375668821e-05, "epoch": 0.4199475065616798, "percentage": 42.55, "elapsed_time": "0:58:10", "remaining_time": "1:18:32", "throughput": 12016.11, "total_tokens": 41943040}
+{"current_steps": 21, "total_steps": 47, "loss": 0.655, "lr": 2.9158233864578254e-05, "epoch": 0.4409448818897638, "percentage": 44.68, "elapsed_time": "1:01:03", "remaining_time": "1:15:35", "throughput": 12021.16, "total_tokens": 44040192}
+{"current_steps": 22, "total_steps": 47, "loss": 0.6566, "lr": 2.7502392290602463e-05, "epoch": 0.46194225721784776, "percentage": 46.81, "elapsed_time": "1:03:56", "remaining_time": "1:12:39", "throughput": 12025.76, "total_tokens": 46137344}
+{"current_steps": 23, "total_steps": 47, "loss": 0.6483, "lr": 2.5835374425191866e-05, "epoch": 0.48293963254593175, "percentage": 48.94, "elapsed_time": "1:06:49", "remaining_time": "1:09:44", "throughput": 12029.04, "total_tokens": 48234496}
+{"current_steps": 24, "total_steps": 47, "loss": 0.6319, "lr": 2.4164625574808146e-05, "epoch": 0.5039370078740157, "percentage": 51.06, "elapsed_time": "1:09:42", "remaining_time": "1:06:48", "throughput": 12032.55, "total_tokens": 50331648}
+{"current_steps": 25, "total_steps": 47, "loss": 0.6479, "lr": 2.2497607709397543e-05, "epoch": 0.5249343832020997, "percentage": 53.19, "elapsed_time": "1:12:36", "remaining_time": "1:03:53", "throughput": 12035.5, "total_tokens": 52428800}
+{"current_steps": 26, "total_steps": 47, "loss": 0.6253, "lr": 2.0841766135421752e-05, "epoch": 0.5459317585301837, "percentage": 55.32, "elapsed_time": "1:15:29", "remaining_time": "1:00:58", "throughput": 12037.82, "total_tokens": 54525952}
+{"current_steps": 27, "total_steps": 47, "loss": 0.6248, "lr": 1.920449624331179e-05, "epoch": 0.5669291338582677, "percentage": 57.45, "elapsed_time": "1:18:23", "remaining_time": "0:58:03", "throughput": 12039.18, "total_tokens": 56623104}
+{"current_steps": 28, "total_steps": 47, "loss": 0.6237, "lr": 1.7593110477859153e-05, "epoch": 0.5879265091863517, "percentage": 59.57, "elapsed_time": "1:21:16", "remaining_time": "0:55:09", "throughput": 12040.88, "total_tokens": 58720256}
+{"current_steps": 29, "total_steps": 47, "loss": 0.63, "lr": 1.6014805679062185e-05, "epoch": 0.6089238845144357, "percentage": 61.7, "elapsed_time": "1:24:10", "remaining_time": "0:52:14", "throughput": 12042.42, "total_tokens": 60817408}
+{"current_steps": 30, "total_steps": 47, "loss": 0.6241, "lr": 1.447663093929163e-05, "epoch": 0.6299212598425197, "percentage": 63.83, "elapsed_time": "1:27:03", "remaining_time": "0:49:20", "throughput": 12043.81, "total_tokens": 62914560}
+{"current_steps": 31, "total_steps": 47, "loss": 0.6062, "lr": 1.2985456120332906e-05, "epoch": 0.6509186351706037, "percentage": 65.96, "elapsed_time": "1:29:57", "remaining_time": "0:46:25", "throughput": 12044.95, "total_tokens": 65011712}
+{"current_steps": 32, "total_steps": 47, "loss": 0.6356, "lr": 1.1547941170915686e-05, "epoch": 0.6719160104986877, "percentage": 68.09, "elapsed_time": "1:32:51", "remaining_time": "0:43:31", "throughput": 12046.11, "total_tokens": 67108864}
+{"current_steps": 33, "total_steps": 47, "loss": 0.6151, "lr": 1.0170506381766121e-05, "epoch": 0.6929133858267716, "percentage": 70.21, "elapsed_time": "1:35:44", "remaining_time": "0:40:37", "throughput": 12046.85, "total_tokens": 69206016}
+{"current_steps": 34, "total_steps": 47, "loss": 0.6198, "lr": 8.85930371102994e-06, "epoch": 0.7139107611548556, "percentage": 72.34, "elapsed_time": "1:38:38", "remaining_time": "0:37:43", "throughput": 12047.18, "total_tokens": 71303168}
+{"current_steps": 35, "total_steps": 47, "loss": 0.5952, "lr": 7.620189308133943e-06, "epoch": 0.7349081364829396, "percentage": 74.47, "elapsed_time": "1:41:32", "remaining_time": "0:34:48", "throughput": 12047.45, "total_tokens": 73400320}
+{"current_steps": 36, "total_steps": 47, "loss": 0.6126, "lr": 6.458697358801061e-06, "epoch": 0.7559055118110236, "percentage": 76.6, "elapsed_time": "1:44:26", "remaining_time": "0:31:54", "throughput": 12047.72, "total_tokens": 75497472}
+{"current_steps": 37, "total_steps": 47, "loss": 0.637, "lr": 5.380015368033476e-06, "epoch": 0.7769028871391076, "percentage": 78.72, "elapsed_time": "1:47:20", "remaining_time": "0:29:00", "throughput": 12047.79, "total_tokens": 77594624}
+{"current_steps": 38, "total_steps": 47, "loss": 0.6432, "lr": 4.388960991455998e-06, "epoch": 0.7979002624671916, "percentage": 80.85, "elapsed_time": "1:50:15", "remaining_time": "0:26:06", "throughput": 12047.09, "total_tokens": 79691776}
+{"current_steps": 39, "total_steps": 47, "loss": 0.6329, "lr": 3.489960518496521e-06, "epoch": 0.8188976377952756, "percentage": 82.98, "elapsed_time": "1:53:08", "remaining_time": "0:23:12", "throughput": 12047.43, "total_tokens": 81788928}
+{"current_steps": 40, "total_steps": 47, "loss": 0.629, "lr": 2.687029103502972e-06, "epoch": 0.8398950131233596, "percentage": 85.11, "elapsed_time": "1:56:03", "remaining_time": "0:20:18", "throughput": 12047.22, "total_tokens": 83886080}
+{"current_steps": 41, "total_steps": 47, "loss": 0.6277, "lr": 1.983752833089278e-06, "epoch": 0.8608923884514436, "percentage": 87.23, "elapsed_time": "1:58:57", "remaining_time": "0:17:24", "throughput": 12046.86, "total_tokens": 85983232}
+{"current_steps": 42, "total_steps": 47, "loss": 0.6192, "lr": 1.3832727098020332e-06, "epoch": 0.8818897637795275, "percentage": 89.36, "elapsed_time": "2:01:52", "remaining_time": "0:14:30", "throughput": 12045.85, "total_tokens": 88080384}
+{"current_steps": 43, "total_steps": 47, "loss": 0.6203, "lr": 8.882706236405886e-07, "epoch": 0.9028871391076115, "percentage": 91.49, "elapsed_time": "2:04:47", "remaining_time": "0:11:36", "throughput": 12043.63, "total_tokens": 90177536}
+{"current_steps": 44, "total_steps": 47, "loss": 0.6155, "lr": 5.009573740853313e-07, "epoch": 0.9238845144356955, "percentage": 93.62, "elapsed_time": "2:07:37", "remaining_time": "0:08:42", "throughput": 12050.55, "total_tokens": 92274688}
+{"current_steps": 45, "total_steps": 47, "loss": 0.6273, "lr": 2.230627961304993e-07, "epoch": 0.9448818897637795, "percentage": 95.74, "elapsed_time": "2:10:26", "remaining_time": "0:05:47", "throughput": 12057.85, "total_tokens": 94371840}
+{"current_steps": 46, "total_steps": 47, "loss": 0.624, "lr": 5.5828034421170907e-08, "epoch": 0.9658792650918635, "percentage": 97.87, "elapsed_time": "2:13:15", "remaining_time": "0:02:53", "throughput": 12064.68, "total_tokens": 96468992}
+{"current_steps": 47, "total_steps": 47, "loss": 0.6356, "lr": 0.0, "epoch": 0.9868766404199475, "percentage": 100.0, "elapsed_time": "2:16:05", "remaining_time": "0:00:00", "throughput": 12071.55, "total_tokens": 98566144}
+{"current_steps": 47, "total_steps": 47, "epoch": 0.9868766404199475, "percentage": 100.0, "elapsed_time": "2:16:29", "remaining_time": "0:00:00", "throughput": 12035.81, "total_tokens": 98566144}
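
Two fields in trainer_log.jsonl can be cross-checked. `throughput` appears to be `total_tokens` divided by elapsed wall-clock seconds, and every optimizer step appears to add a constant 2,097,152 (2^21) tokens. The Python sketch below tests both readings against three records copied from the log (steps 29, 30 and 47). The helper names (`hms_to_seconds`, `approx_throughput`, `tokens_per_step`) are illustrative. Both interpretations are assumptions, not documented behavior of whatever logger wrote the file. Because `elapsed_time` is rounded to whole seconds, the recomputed throughput only matches to within a fraction of a percent.

```python
import json

# Three records copied verbatim from trainer_log.jsonl above (steps 29, 30 and 47).
LOG_LINES = [
    '{"current_steps": 29, "total_steps": 47, "loss": 0.63, "lr": 1.6014805679062185e-05, "epoch": 0.6089238845144357, "percentage": 61.7, "elapsed_time": "1:24:10", "remaining_time": "0:52:14", "throughput": 12042.42, "total_tokens": 60817408}',
    '{"current_steps": 30, "total_steps": 47, "loss": 0.6241, "lr": 1.447663093929163e-05, "epoch": 0.6299212598425197, "percentage": 63.83, "elapsed_time": "1:27:03", "remaining_time": "0:49:20", "throughput": 12043.81, "total_tokens": 62914560}',
    '{"current_steps": 47, "total_steps": 47, "loss": 0.6356, "lr": 0.0, "epoch": 0.9868766404199475, "percentage": 100.0, "elapsed_time": "2:16:05", "remaining_time": "0:00:00", "throughput": 12071.55, "total_tokens": 98566144}',
]


def hms_to_seconds(text: str) -> int:
    """Convert an 'H:MM:SS' string (elapsed_time / remaining_time) to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def approx_throughput(record: dict) -> float:
    """Tokens per second, assuming throughput = total_tokens / elapsed wall-clock time."""
    return record["total_tokens"] / hms_to_seconds(record["elapsed_time"])


def tokens_per_step(a: dict, b: dict) -> int:
    """Average number of tokens consumed per optimizer step between two records."""
    token_delta = b["total_tokens"] - a["total_tokens"]
    step_delta = b["current_steps"] - a["current_steps"]
    return token_delta // step_delta


records = [json.loads(line) for line in LOG_LINES]
for rec in records:
    print(rec["current_steps"], rec["throughput"], round(approx_throughput(rec), 2))
print("tokens per step:", tokens_per_step(records[0], records[2]))
```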

trainer_state.json ADDED

	@@ -0,0 +1,419 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.9868766404199475,
+  "eval_steps": 500,
+  "global_step": 47,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.02099737532808399,
+      "grad_norm": 1.512850046157837,
+      "learning_rate": 4.9944171965578836e-05,
+      "loss": 1.099,
+      "num_input_tokens_seen": 2097152,
+      "step": 1
+    },
+    {
+      "epoch": 0.04199475065616798,
+      "grad_norm": 1.480190396308899,
+      "learning_rate": 4.97769372038695e-05,
+      "loss": 1.0954,
+      "num_input_tokens_seen": 4194304,
+      "step": 2
+    },
+    {
+      "epoch": 0.06299212598425197,
+      "grad_norm": 1.2695276737213135,
+      "learning_rate": 4.9499042625914674e-05,
+      "loss": 1.0273,
+      "num_input_tokens_seen": 6291456,
+      "step": 3
+    },
+    {
+      "epoch": 0.08398950131233596,
+      "grad_norm": 1.0559132099151611,
+      "learning_rate": 4.911172937635942e-05,
+      "loss": 0.9301,
+      "num_input_tokens_seen": 8388608,
+      "step": 4
+    },
+    {
+      "epoch": 0.10498687664041995,
+      "grad_norm": 0.8047496676445007,
+      "learning_rate": 4.861672729019797e-05,
+      "loss": 0.8696,
+      "num_input_tokens_seen": 10485760,
+      "step": 5
+    },
+    {
+      "epoch": 0.12598425196850394,
+      "grad_norm": 0.6325356364250183,
+      "learning_rate": 4.801624716691072e-05,
+      "loss": 0.8379,
+      "num_input_tokens_seen": 12582912,
+      "step": 6
+    },
+    {
+      "epoch": 0.14698162729658792,
+      "grad_norm": 0.504206120967865,
+      "learning_rate": 4.731297089649703e-05,
+      "loss": 0.8087,
+      "num_input_tokens_seen": 14680064,
+      "step": 7
+    },
+    {
+      "epoch": 0.1679790026246719,
+      "grad_norm": 0.4543064534664154,
+      "learning_rate": 4.651003948150349e-05,
+      "loss": 0.8071,
+      "num_input_tokens_seen": 16777216,
+      "step": 8
+    },
+    {
+      "epoch": 0.1889763779527559,
+      "grad_norm": 0.35835036635398865,
+      "learning_rate": 4.561103900854401e-05,
+      "loss": 0.7521,
+      "num_input_tokens_seen": 18874368,
+      "step": 9
+    },
+    {
+      "epoch": 0.2099737532808399,
+      "grad_norm": 0.33651018142700195,
+      "learning_rate": 4.4619984631966524e-05,
+      "loss": 0.7666,
+      "num_input_tokens_seen": 20971520,
+      "step": 10
+    },
+    {
+      "epoch": 0.23097112860892388,
+      "grad_norm": 0.2762841284275055,
+      "learning_rate": 4.354130264119894e-05,
+      "loss": 0.7226,
+      "num_input_tokens_seen": 23068672,
+      "step": 11
+    },
+    {
+      "epoch": 0.25196850393700787,
+      "grad_norm": 0.2664271891117096,
+      "learning_rate": 4.2379810691866064e-05,
+      "loss": 0.719,
+      "num_input_tokens_seen": 25165824,
+      "step": 12
+    },
+    {
+      "epoch": 0.27296587926509186,
+      "grad_norm": 0.21707402169704437,
+      "learning_rate": 4.114069628897006e-05,
+      "loss": 0.7129,
+      "num_input_tokens_seen": 27262976,
+      "step": 13
+    },
+    {
+      "epoch": 0.29396325459317585,
+      "grad_norm": 0.19864299893379211,
+      "learning_rate": 3.982949361823388e-05,
+      "loss": 0.7029,
+      "num_input_tokens_seen": 29360128,
+      "step": 14
+    },
+    {
+      "epoch": 0.31496062992125984,
+      "grad_norm": 0.20525172352790833,
+      "learning_rate": 3.845205882908432e-05,
+      "loss": 0.693,
+      "num_input_tokens_seen": 31457280,
+      "step": 15
+    },
+    {
+      "epoch": 0.3359580052493438,
+      "grad_norm": 0.18027806282043457,
+      "learning_rate": 3.7014543879667094e-05,
+      "loss": 0.689,
+      "num_input_tokens_seen": 33554432,
+      "step": 16
+    },
+    {
+      "epoch": 0.3569553805774278,
+      "grad_norm": 0.16686110198497772,
+      "learning_rate": 3.552336906070838e-05,
+      "loss": 0.6898,
+      "num_input_tokens_seen": 35651584,
+      "step": 17
+    },
+    {
+      "epoch": 0.3779527559055118,
+      "grad_norm": 0.15808887779712677,
+      "learning_rate": 3.398519432093782e-05,
+      "loss": 0.6596,
+      "num_input_tokens_seen": 37748736,
+      "step": 18
+    },
+    {
+      "epoch": 0.3989501312335958,
+      "grad_norm": 0.15036313235759735,
+      "learning_rate": 3.2406889522140856e-05,
+      "loss": 0.6796,
+      "num_input_tokens_seen": 39845888,
+      "step": 19
+    },
+    {
+      "epoch": 0.4199475065616798,
+      "grad_norm": 0.13290229439735413,
+      "learning_rate": 3.079550375668821e-05,
+      "loss": 0.6595,
+      "num_input_tokens_seen": 41943040,
+      "step": 20
+    },
+    {
+      "epoch": 0.4409448818897638,
+      "grad_norm": 0.12948521971702576,
+      "learning_rate": 2.9158233864578254e-05,
+      "loss": 0.655,
+      "num_input_tokens_seen": 44040192,
+      "step": 21
+    },
+    {
+      "epoch": 0.46194225721784776,
+      "grad_norm": 0.12121973931789398,
+      "learning_rate": 2.7502392290602463e-05,
+      "loss": 0.6566,
+      "num_input_tokens_seen": 46137344,
+      "step": 22
+    },
+    {
+      "epoch": 0.48293963254593175,
+      "grad_norm": 0.11144606024026871,
+      "learning_rate": 2.5835374425191866e-05,
+      "loss": 0.6483,
+      "num_input_tokens_seen": 48234496,
+      "step": 23
+    },
+    {
+      "epoch": 0.5039370078740157,
+      "grad_norm": 0.09845685958862305,
+      "learning_rate": 2.4164625574808146e-05,
+      "loss": 0.6319,
+      "num_input_tokens_seen": 50331648,
+      "step": 24
+    },
+    {
+      "epoch": 0.5249343832020997,
+      "grad_norm": 0.10502230376005173,
+      "learning_rate": 2.2497607709397543e-05,
+      "loss": 0.6479,
+      "num_input_tokens_seen": 52428800,
+      "step": 25
+    },
+    {
+      "epoch": 0.5459317585301837,
+      "grad_norm": 0.103838250041008,
+      "learning_rate": 2.0841766135421752e-05,
+      "loss": 0.6253,
+      "num_input_tokens_seen": 54525952,
+      "step": 26
+    },
+    {
+      "epoch": 0.5669291338582677,
+      "grad_norm": 0.09724501520395279,
+      "learning_rate": 1.920449624331179e-05,
+      "loss": 0.6248,
+      "num_input_tokens_seen": 56623104,
+      "step": 27
+    },
+    {
+      "epoch": 0.5879265091863517,
+      "grad_norm": 0.09455479681491852,
+      "learning_rate": 1.7593110477859153e-05,
+      "loss": 0.6237,
+      "num_input_tokens_seen": 58720256,
+      "step": 28
+    },
+    {
+      "epoch": 0.6089238845144357,
+      "grad_norm": 0.09271286427974701,
+      "learning_rate": 1.6014805679062185e-05,
+      "loss": 0.63,
+      "num_input_tokens_seen": 60817408,
+      "step": 29
+    },
+    {
+      "epoch": 0.6299212598425197,
+      "grad_norm": 0.09387161582708359,
+      "learning_rate": 1.447663093929163e-05,
+      "loss": 0.6241,
+      "num_input_tokens_seen": 62914560,
+      "step": 30
+    },
+    {
+      "epoch": 0.6509186351706037,
+      "grad_norm": 0.09017566591501236,
+      "learning_rate": 1.2985456120332906e-05,
+      "loss": 0.6062,
+      "num_input_tokens_seen": 65011712,
+      "step": 31
+    },
+    {
+      "epoch": 0.6719160104986877,
+      "grad_norm": 0.09072048217058182,
+      "learning_rate": 1.1547941170915686e-05,
+      "loss": 0.6356,
+      "num_input_tokens_seen": 67108864,
+      "step": 32
+    },
+    {
+      "epoch": 0.6929133858267716,
+      "grad_norm": 0.09561938792467117,
+      "learning_rate": 1.0170506381766121e-05,
+      "loss": 0.6151,
+      "num_input_tokens_seen": 69206016,
+      "step": 33
+    },
+    {
+      "epoch": 0.7139107611548556,
+      "grad_norm": 0.0859409049153328,
+      "learning_rate": 8.85930371102994e-06,
+      "loss": 0.6198,
+      "num_input_tokens_seen": 71303168,
+      "step": 34
+    },
+    {
+      "epoch": 0.7349081364829396,
+      "grad_norm": 0.08907345682382584,
+      "learning_rate": 7.620189308133943e-06,
+      "loss": 0.5952,
+      "num_input_tokens_seen": 73400320,
+      "step": 35
+    },
+    {
+      "epoch": 0.7559055118110236,
+      "grad_norm": 0.08716382831335068,
+      "learning_rate": 6.458697358801061e-06,
+      "loss": 0.6126,
+      "num_input_tokens_seen": 75497472,
+      "step": 36
+    },
+    {
+      "epoch": 0.7769028871391076,
+      "grad_norm": 0.08723893016576767,
+      "learning_rate": 5.380015368033476e-06,
+      "loss": 0.637,
+      "num_input_tokens_seen": 77594624,
+      "step": 37
+    },
+    {
+      "epoch": 0.7979002624671916,
+      "grad_norm": 0.09199921041727066,
+      "learning_rate": 4.388960991455998e-06,
+      "loss": 0.6432,
+      "num_input_tokens_seen": 79691776,
+      "step": 38
+    },
+    {
+      "epoch": 0.8188976377952756,
+      "grad_norm": 0.08754907548427582,
+      "learning_rate": 3.489960518496521e-06,
+      "loss": 0.6329,
+      "num_input_tokens_seen": 81788928,
+      "step": 39
+    },
+    {
+      "epoch": 0.8398950131233596,
+      "grad_norm": 0.08782719820737839,
+      "learning_rate": 2.687029103502972e-06,
+      "loss": 0.629,
+      "num_input_tokens_seen": 83886080,
+      "step": 40
+    },
+    {
+      "epoch": 0.8608923884514436,
+      "grad_norm": 0.08878407627344131,
+      "learning_rate": 1.983752833089278e-06,
+      "loss": 0.6277,
+      "num_input_tokens_seen": 85983232,
+      "step": 41
+    },
+    {
+      "epoch": 0.8818897637795275,
+      "grad_norm": 0.08673521876335144,
+      "learning_rate": 1.3832727098020332e-06,
+      "loss": 0.6192,
+      "num_input_tokens_seen": 88080384,
+      "step": 42
+    },
+    {
+      "epoch": 0.9028871391076115,
+      "grad_norm": 0.09247046709060669,
+      "learning_rate": 8.882706236405886e-07,
+      "loss": 0.6203,
+      "num_input_tokens_seen": 90177536,
+      "step": 43
+    },
+    {
+      "epoch": 0.9238845144356955,
+      "grad_norm": 0.08419251441955566,
+      "learning_rate": 5.009573740853313e-07,
+      "loss": 0.6155,
+      "num_input_tokens_seen": 92274688,
+      "step": 44
+    },
+    {
+      "epoch": 0.9448818897637795,
+      "grad_norm": 0.08720867335796356,
+      "learning_rate": 2.230627961304993e-07,
+      "loss": 0.6273,
+      "num_input_tokens_seen": 94371840,
+      "step": 45
+    },
+    {
+      "epoch": 0.9658792650918635,
+      "grad_norm": 0.08411751687526703,
+      "learning_rate": 5.5828034421170907e-08,
+      "loss": 0.624,
+      "num_input_tokens_seen": 96468992,
+      "step": 46
+    },
+    {
+      "epoch": 0.9868766404199475,
+      "grad_norm": 0.08487113565206528,
+      "learning_rate": 0.0,
+      "loss": 0.6356,
+      "num_input_tokens_seen": 98566144,
+      "step": 47
+    },
+    {
+      "epoch": 0.9868766404199475,
+      "num_input_tokens_seen": 98566144,
+      "step": 47,
+      "total_flos": 4.361803532655919e+18,
+      "train_loss": 0.6996991406095788,
+      "train_runtime": 8190.5414,
+      "train_samples_per_second": 2.973,
+      "train_steps_per_second": 0.006
+    }
+  ],
+  "logging_steps": 1.0,
+  "max_steps": 47,
+  "num_input_tokens_seen": 98566144,
+  "num_train_epochs": 1,
+  "save_steps": 1000,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 4.361803532655919e+18,
+  "train_batch_size": 16,
+  "trial_name": null,
+  "trial_params": null
+}
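
Two patterns in `log_history` can be checked numerically. The `learning_rate` values fit a cosine decay from 5e-5 down to 0 over the 47 steps with no warmup. The summary `train_loss` fits the mean of the 47 per-step `loss` values, which are logged rounded to four decimals. The Python sketch below tests both. The peak learning rate of 5e-5 and the zero-warmup cosine shape are inferred from the logged numbers. They were not read from any configuration file in this commit.

```python
import math

MAX_STEPS = 47  # "max_steps" in trainer_state.json
PEAK_LR = 5e-5  # assumption: inferred from the logged values, not read from a config

# A few (step, learning_rate) pairs copied from log_history above.
LOGGED_LR = [
    (1, 4.9944171965578836e-05),
    (24, 2.4164625574808146e-05),
    (46, 5.5828034421170907e-08),
    (47, 0.0),
]

# Per-step "loss" values for steps 1..47, copied from log_history above.
STEP_LOSSES = [
    1.099, 1.0954, 1.0273, 0.9301, 0.8696, 0.8379, 0.8087, 0.8071, 0.7521, 0.7666,
    0.7226, 0.719, 0.7129, 0.7029, 0.693, 0.689, 0.6898, 0.6596, 0.6796, 0.6595,
    0.655, 0.6566, 0.6483, 0.6319, 0.6479, 0.6253, 0.6248, 0.6237, 0.63, 0.6241,
    0.6062, 0.6356, 0.6151, 0.6198, 0.5952, 0.6126, 0.637, 0.6432, 0.6329, 0.629,
    0.6277, 0.6192, 0.6203, 0.6155, 0.6273, 0.624, 0.6356,
]
TRAIN_LOSS = 0.6996991406095788  # "train_loss" in the final summary entry


def cosine_lr(step: int, peak: float = PEAK_LR, total: int = MAX_STEPS) -> float:
    """Cosine decay from `peak` at step 0 to 0 at step `total`, with no warmup (assumed)."""
    progress = step / total
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))


for step, logged in LOGGED_LR:
    print(step, logged, cosine_lr(step))

mean_loss = sum(STEP_LOSSES) / len(STEP_LOSSES)
print(f"mean step loss {mean_loss:.6f} vs train_loss {TRAIN_LOSS:.6f}")
```

The small gap between the mean and `train_loss` is expected if the summary is computed from unrounded per-step losses. A schedule with warmup steps would fail the step-1 check, so this works as a quick sanity test on an unfamiliar trainer_state.json.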

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:671114193a3823ad5a89e2f6394b2ff9bf4502f73a43e936a619b95cec825f31
+size 5688
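
training_args.bin is stored through Git LFS. The diff therefore shows only the three-line pointer file, not the file itself. The pointer records the spec version, the SHA-256 object id, and the size in bytes. Below is a minimal Python sketch that parses such a pointer. It can also check a downloaded copy against the recorded digest and size. The function names and the local path are hypothetical.

```python
import hashlib
from pathlib import Path

# Pointer text copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:671114193a3823ad5a89e2f6394b2ff9bf4502f73a43e936a619b95cec825f31
size 5688
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each non-empty 'key value' line of an LFS pointer into a dict entry."""
    fields = {}
    for line in text.splitlines():
        if line:
            key, _, value = line.partition(" ")
            fields[key] = value
    return fields


def matches_pointer(path: Path, fields: dict) -> bool:
    """Return True if the file at `path` has the size and SHA-256 recorded in the pointer."""
    algo, _, expected = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    data = path.read_bytes()
    return len(data) == int(fields["size"]) and hashlib.sha256(data).hexdigest() == expected


info = parse_lfs_pointer(POINTER)
print(info["oid"], int(info["size"]))
# matches_pointer(Path("training_args.bin"), info)  # hypothetical local copy of the object
```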

training_loss.png ADDED