Training in progress, step 200, checkpoint

Files changed (11)

last-checkpoint/README.md +210 -0
last-checkpoint/adapter_config.json +52 -0
last-checkpoint/adapter_model.safetensors +3 -0
last-checkpoint/chat_template.jinja +154 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/tokenizer.json +3 -0
last-checkpoint/tokenizer_config.json +31 -0
last-checkpoint/trainer_state.json +334 -0
last-checkpoint/training_args.bin +3 -0

last-checkpoint/README.md ADDED

	@@ -0,0 +1,210 @@

+---
+base_model: Qwen/Qwen3.5-0.8B
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:Qwen/Qwen3.5-0.8B
+- dpo
+- lora
+- transformers
+- trl
+- unsloth
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.19.1

last-checkpoint/adapter_config.json ADDED

	@@ -0,0 +1,52 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": {
+    "base_model_class": "Qwen3_5ForConditionalGeneration",
+    "parent_library": "transformers.models.qwen3_5.modeling_qwen3_5",
+    "unsloth_fixed": true
+  },
+  "base_model_name_or_path": "Qwen/Qwen3.5-0.8B",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 128,
+  "lora_bias": false,
+  "lora_dropout": 0,
+  "lora_ga_config": null,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.19.1",
+  "qalora_group_size": 16,
+  "r": 128,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "gate_proj",
+    "v_proj",
+    "o_proj",
+    "k_proj",
+    "up_proj",
+    "down_proj",
+    "q_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_bdlora": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
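
Reviewer note: the config above sets `r` and `lora_alpha` to the same value with `use_rslora` disabled, so the low-rank update is applied at unit scale. The sketch below works that out from a hand-copied subset of the fields; the helper name `lora_scaling` is hypothetical and not part of PEFT, and the formula is the usual `lora_alpha / r` (or `lora_alpha / sqrt(r)` for rank-stabilized LoRA).

```python
import json

# Subset of the adapter_config.json fields shown in the diff above.
ADAPTER_CONFIG = json.loads(
    '{"r": 128, "lora_alpha": 128, "use_rslora": false, "lora_dropout": 0}'
)


def lora_scaling(cfg: dict) -> float:
    """Return the factor that multiplies the low-rank update B @ A.

    Hypothetical helper for illustration only.
    """
    r, alpha = cfg["r"], cfg["lora_alpha"]
    if cfg.get("use_rslora", False):
        # Rank-stabilized LoRA divides by sqrt(r) instead of r.
        return alpha / r ** 0.5
    return alpha / r


print(lora_scaling(ADAPTER_CONFIG))  # 1.0
```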

last-checkpoint/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:397981c35ab0e961efee228768af1e761a05c64d7fd48066cf3c9be94cc1f401
+size 204500912
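
Reviewer note: the three added lines are a Git LFS pointer, not the weights themselves. A minimal sketch of how a downloaded file could be checked against such a pointer, using only the standard library; the function names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local path passed to `matches_pointer` is whatever location the file was downloaded to.

```python
import hashlib
from pathlib import Path

# Pointer text copied from the adapter_model.safetensors diff above.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:397981c35ab0e961efee228768af1e761a05c64d7fd48066cf3c9be94cc1f401
size 204500912
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a pointer file into its version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict) -> bool:
    """Check a local file's byte size and digest against a parsed pointer."""
    data_path = Path(path)
    if data_path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with data_path.open("rb") as fh:
        # Hash in 1 MiB chunks so large checkpoints are not read into memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```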

last-checkpoint/chat_template.jinja ADDED

	@@ -0,0 +1,154 @@

+{%- set image_count = namespace(value=0) %}
+{%- set video_count = namespace(value=0) %}
+{%- macro render_content(content, do_vision_count, is_system_content=false) %}
+    {%- if content is string %}
+        {{- content }}
+    {%- elif content is iterable and content is not mapping %}
+        {%- for item in content %}
+            {%- if 'image' in item or 'image_url' in item or item.type == 'image' %}
+                {%- if is_system_content %}
+                    {{- raise_exception('System message cannot contain images.') }}
+                {%- endif %}
+                {%- if do_vision_count %}
+                    {%- set image_count.value = image_count.value + 1 %}
+                {%- endif %}
+                {%- if add_vision_id %}
+                    {{- 'Picture ' ~ image_count.value ~ ': ' }}
+                {%- endif %}
+                {{- '<|vision_start|><|image_pad|><|vision_end|>' }}
+            {%- elif 'video' in item or item.type == 'video' %}
+                {%- if is_system_content %}
+                    {{- raise_exception('System message cannot contain videos.') }}
+                {%- endif %}
+                {%- if do_vision_count %}
+                    {%- set video_count.value = video_count.value + 1 %}
+                {%- endif %}
+                {%- if add_vision_id %}
+                    {{- 'Video ' ~ video_count.value ~ ': ' }}
+                {%- endif %}
+                {{- '<|vision_start|><|video_pad|><|vision_end|>' }}
+            {%- elif 'text' in item %}
+                {{- item.text }}
+            {%- else %}
+                {{- raise_exception('Unexpected item type in content.') }}
+            {%- endif %}
+        {%- endfor %}
+    {%- elif content is none or content is undefined %}
+        {{- '' }}
+    {%- else %}
+        {{- raise_exception('Unexpected content type.') }}
+    {%- endif %}
+{%- endmacro %}
+{%- if not messages %}
+    {{- raise_exception('No messages provided.') }}
+{%- endif %}
+{%- if tools and tools is iterable and tools is not mapping %}
+    {{- '<|im_start|>system\n' }}
+    {{- "# Tools\n\nYou have access to the following functions:\n\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>" }}
+    {{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
+    {%- if messages[0].role == 'system' %}
+        {%- set content = render_content(messages[0].content, false, true)|trim %}
+        {%- if content %}
+            {{- '\n\n' + content }}
+        {%- endif %}
+    {%- endif %}
+    {{- '<|im_end|>\n' }}
+{%- else %}
+    {%- if messages[0].role == 'system' %}
+        {%- set content = render_content(messages[0].content, false, true)|trim %}
+        {{- '<|im_start|>system\n' + content + '<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
+{%- for message in messages[::-1] %}
+    {%- set index = (messages|length - 1) - loop.index0 %}
+    {%- if ns.multi_step_tool and message.role == "user" %}
+        {%- set content = render_content(message.content, false)|trim %}
+        {%- if not(content.startswith('<tool_response>') and content.endswith('</tool_response>')) %}
+            {%- set ns.multi_step_tool = false %}
+            {%- set ns.last_query_index = index %}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if ns.multi_step_tool %}
+    {{- raise_exception('No user query found in messages.') }}
+{%- endif %}
+{%- for message in messages %}
+    {%- set content = render_content(message.content, true)|trim %}
+    {%- if message.role == "system" %}
+        {%- if not loop.first %}
+            {{- raise_exception('System message must be at the beginning.') }}
+        {%- endif %}
+    {%- elif message.role == "user" %}
+        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {%- set reasoning_content = '' %}
+        {%- if message.reasoning_content is string %}
+            {%- set reasoning_content = message.reasoning_content %}
+        {%- else %}
+            {%- if '</think>' in content %}
+                {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+                {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+            {%- endif %}
+        {%- endif %}
+        {%- set reasoning_content = reasoning_content|trim %}
+        {%- if loop.index0 > ns.last_query_index %}
+            {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content + '\n</think>\n\n' + content }}
+        {%- else %}
+            {{- '<|im_start|>' + message.role + '\n' + content }}
+        {%- endif %}
+        {%- if message.tool_calls and message.tool_calls is iterable and message.tool_calls is not mapping %}
+            {%- for tool_call in message.tool_calls %}
+                {%- if tool_call.function is defined %}
+                    {%- set tool_call = tool_call.function %}
+                {%- endif %}
+                {%- if loop.first %}
+                    {%- if content|trim %}
+                        {{- '\n\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
+                    {%- else %}
+                        {{- '<tool_call>\n<function=' + tool_call.name + '>\n' }}
+                    {%- endif %}
+                {%- else %}
+                    {{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
+                {%- endif %}
+                {%- if tool_call.arguments is defined %}
+                    {%- for args_name, args_value in tool_call.arguments|items %}
+                        {{- '<parameter=' + args_name + '>\n' }}
+                        {%- set args_value = args_value | tojson | safe if args_value is mapping or (args_value is sequence and args_value is not string) else args_value | string %}
+                        {{- args_value }}
+                        {{- '\n</parameter>\n' }}
+                    {%- endfor %}
+                {%- endif %}
+                {{- '</function>\n</tool_call>' }}
+            {%- endfor %}
+        {%- endif %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if loop.previtem and loop.previtem.role != "tool" %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- content }}
+        {{- '\n</tool_response>' }}
+        {%- if not loop.last and loop.nextitem.role != "tool" %}
+            {{- '<|im_end|>\n' }}
+        {%- elif loop.last %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- else %}
+        {{- raise_exception('Unexpected message role.') }}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+    {%- if enable_thinking is defined and enable_thinking is true %}
+        {{- '<think>\n' }}
+    {%- else %}
+        {{- '<think>\n\n</think>\n\n' }}
+    {%- endif %}
+{%- endif %}
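
Reviewer note: the assistant branch of the template extracts inline reasoning by splitting on `</think>` when `message.reasoning_content` is not a string, and only re-emits the `<think>` block for turns after the last user query. The plain-Python sketch below mirrors just that branch for a text-only message without tool calls; `split_reasoning` and `render_assistant` are hypothetical names, and this is not a substitute for rendering the Jinja template itself.

```python
def split_reasoning(content: str) -> tuple[str, str]:
    """Mirror the template's split of inline <think>...</think> reasoning."""
    reasoning = ""
    if "</think>" in content:
        reasoning = (
            content.split("</think>")[0].rstrip("\n").split("<think>")[-1].lstrip("\n")
        )
        content = content.split("</think>")[-1].lstrip("\n")
    # The template applies |trim to the reasoning text afterwards.
    return reasoning.strip(), content


def render_assistant(content: str, after_last_query: bool) -> str:
    """Render one assistant turn (text only, no tool calls)."""
    # The template trims the rendered content before splitting it.
    reasoning, answer = split_reasoning(content.strip())
    if after_last_query:
        body = "<think>\n" + reasoning + "\n</think>\n\n" + answer
    else:
        # Earlier turns drop the reasoning and keep only the answer.
        body = answer
    return "<|im_start|>assistant\n" + body + "<|im_end|>\n"


print(render_assistant("<think>\nplan\n</think>\n\nanswer", after_last_query=True))
```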

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:87cdf0c40c8d25bb377f19c303e233965a5a758dc24cb6835f06245f3dc44448
+size 104062731

last-checkpoint/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7c800b778fa7e115e4c34de8529902de8b61c9a1b4bab3eb8295d06dafff030e
+size 14645

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bb10f76b99bd9a0760995f07057438774f17eb187337f2cd834a5ab7c625dd05
+size 1465

last-checkpoint/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:87a7830d63fcf43bf241c3c5242e96e62dd3fdc29224ca26fed8ea333db72de4
+size 19989343

last-checkpoint/tokenizer_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "add_prefix_space": false,
+  "audio_bos_token": "<|audio_start|>",
+  "audio_eos_token": "<|audio_end|>",
+  "audio_token": "<|audio_pad|>",
+  "backend": "tokenizers",
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "image_token": "<|image_pad|>",
+  "is_local": false,
+  "model_max_length": 262144,
+  "model_specific_special_tokens": {
+    "audio_bos_token": "<|audio_start|>",
+    "audio_eos_token": "<|audio_end|>",
+    "audio_token": "<|audio_pad|>",
+    "image_token": "<|image_pad|>",
+    "video_token": "<|video_pad|>",
+    "vision_bos_token": "<|vision_start|>",
+    "vision_eos_token": "<|vision_end|>"
+  },
+  "pad_token": "<|endoftext|>",
+  "pretokenize_regex": "(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?[\\p{L}\\p{M}]+|\\p{N}| ?[^\\s\\p{L}\\p{M}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+",
+  "split_special_tokens": false,
+  "tokenizer_class": "TokenizersBackend",
+  "unk_token": null,
+  "video_token": "<|video_pad|>",
+  "vision_bos_token": "<|vision_start|>",
+  "vision_eos_token": "<|vision_end|>"
+}

last-checkpoint/trainer_state.json ADDED

	@@ -0,0 +1,334 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.5317381189764041,
+  "eval_steps": 500,
+  "global_step": 200,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.026586905948820207,
+      "grad_norm": 92.03909301757812,
+      "learning_rate": 9e-08,
+      "logits/chosen": 1.8763988018035889,
+      "logits/rejected": 2.256129264831543,
+      "logps/chosen": -180.8492431640625,
+      "logps/rejected": -294.6668395996094,
+      "loss": 16.764971923828124,
+      "rewards/accuracies": 0.643750011920929,
+      "rewards/chosen": 50.876712799072266,
+      "rewards/margins": 20.225709915161133,
+      "rewards/rejected": 30.651004791259766,
+      "step": 10
+    },
+    {
+      "epoch": 0.053173811897640415,
+      "grad_norm": 72.75655364990234,
+      "learning_rate": 1.8999999999999998e-07,
+      "logits/chosen": 2.2195005416870117,
+      "logits/rejected": 2.3702588081359863,
+      "logps/chosen": -199.29591369628906,
+      "logps/rejected": -293.90887451171875,
+      "loss": 14.240003967285157,
+      "rewards/accuracies": 0.625,
+      "rewards/chosen": 53.17363357543945,
+      "rewards/margins": 24.12602996826172,
+      "rewards/rejected": 29.0476016998291,
+      "step": 20
+    },
+    {
+      "epoch": 0.07976071784646062,
+      "grad_norm": 101.80017852783203,
+      "learning_rate": 2.9e-07,
+      "logits/chosen": 2.360567569732666,
+      "logits/rejected": 2.484600305557251,
+      "logps/chosen": -201.53787231445312,
+      "logps/rejected": -286.9433898925781,
+      "loss": 13.708811950683593,
+      "rewards/accuracies": 0.6875,
+      "rewards/chosen": 59.95117950439453,
+      "rewards/margins": 25.14548110961914,
+      "rewards/rejected": 34.805702209472656,
+      "step": 30
+    },
+    {
+      "epoch": 0.10634762379528083,
+      "grad_norm": 85.12960052490234,
+      "learning_rate": 3.8999999999999997e-07,
+      "logits/chosen": 1.8842649459838867,
+      "logits/rejected": 2.0478363037109375,
+      "logps/chosen": -178.1483917236328,
+      "logps/rejected": -285.0755920410156,
+      "loss": 17.000025939941406,
+      "rewards/accuracies": 0.5874999761581421,
+      "rewards/chosen": 50.454071044921875,
+      "rewards/margins": 17.00231170654297,
+      "rewards/rejected": 33.451759338378906,
+      "step": 40
+    },
+    {
+      "epoch": 0.13293452974410103,
+      "grad_norm": 33.85184097290039,
+      "learning_rate": 4.9e-07,
+      "logits/chosen": 2.229463577270508,
+      "logits/rejected": 2.204373836517334,
+      "logps/chosen": -212.55178833007812,
+      "logps/rejected": -280.9806213378906,
+      "loss": 18.418368530273437,
+      "rewards/accuracies": 0.625,
+      "rewards/chosen": 57.14277267456055,
+      "rewards/margins": 21.43265151977539,
+      "rewards/rejected": 35.710121154785156,
+      "step": 50
+    },
+    {
+      "epoch": 0.15952143569292124,
+      "grad_norm": 61.26063537597656,
+      "learning_rate": 5.9e-07,
+      "logits/chosen": 2.045487403869629,
+      "logits/rejected": 2.2564761638641357,
+      "logps/chosen": -183.6549072265625,
+      "logps/rejected": -311.9967956542969,
+      "loss": 12.530684661865234,
+      "rewards/accuracies": 0.6312500238418579,
+      "rewards/chosen": 40.03901672363281,
+      "rewards/margins": 22.57087516784668,
+      "rewards/rejected": 17.468143463134766,
+      "step": 60
+    },
+    {
+      "epoch": 0.18610834164174145,
+      "grad_norm": 65.56060791015625,
+      "learning_rate": 6.9e-07,
+      "logits/chosen": 2.3583855628967285,
+      "logits/rejected": 2.518134593963623,
+      "logps/chosen": -215.484375,
+      "logps/rejected": -292.7709045410156,
+      "loss": 17.06499786376953,
+      "rewards/accuracies": 0.581250011920929,
+      "rewards/chosen": 54.328125,
+      "rewards/margins": 18.9035587310791,
+      "rewards/rejected": 35.42456817626953,
+      "step": 70
+    },
+    {
+      "epoch": 0.21269524759056166,
+      "grad_norm": 78.31404876708984,
+      "learning_rate": 7.9e-07,
+      "logits/chosen": 2.389976978302002,
+      "logits/rejected": 2.5061419010162354,
+      "logps/chosen": -199.54867553710938,
+      "logps/rejected": -313.9349670410156,
+      "loss": 14.476513671875,
+      "rewards/accuracies": 0.668749988079071,
+      "rewards/chosen": 58.9052619934082,
+      "rewards/margins": 28.68975830078125,
+      "rewards/rejected": 30.215505599975586,
+      "step": 80
+    },
+    {
+      "epoch": 0.23928215353938184,
+      "grad_norm": 55.09129333496094,
+      "learning_rate": 8.9e-07,
+      "logits/chosen": 2.343313694000244,
+      "logits/rejected": 2.381267547607422,
+      "logps/chosen": -195.31007385253906,
+      "logps/rejected": -315.2503356933594,
+      "loss": 13.177040100097656,
+      "rewards/accuracies": 0.6000000238418579,
+      "rewards/chosen": 62.564231872558594,
+      "rewards/margins": 32.514305114746094,
+      "rewards/rejected": 30.049936294555664,
+      "step": 90
+    },
+    {
+      "epoch": 0.26586905948820205,
+      "grad_norm": 94.35275268554688,
+      "learning_rate": 9.9e-07,
+      "logits/chosen": 2.3925650119781494,
+      "logits/rejected": 2.607084274291992,
+      "logps/chosen": -189.29811096191406,
+      "logps/rejected": -319.96844482421875,
+      "loss": 18.26203155517578,
+      "rewards/accuracies": 0.574999988079071,
+      "rewards/chosen": 50.321632385253906,
+      "rewards/margins": 22.894689559936523,
+      "rewards/rejected": 27.426937103271484,
+      "step": 100
+    },
+    {
+      "epoch": 0.2924559654370223,
+      "grad_norm": 93.89908599853516,
+      "learning_rate": 9.9e-07,
+      "logits/chosen": 2.4763197898864746,
+      "logits/rejected": 2.6758036613464355,
+      "logps/chosen": -187.5391082763672,
+      "logps/rejected": -340.25250244140625,
+      "loss": 10.57765884399414,
+      "rewards/accuracies": 0.6875,
+      "rewards/chosen": 58.184059143066406,
+      "rewards/margins": 36.878929138183594,
+      "rewards/rejected": 21.305124282836914,
+      "step": 110
+    },
+    {
+      "epoch": 0.3190428713858425,
+      "grad_norm": 91.25633239746094,
+      "learning_rate": 9.788888888888889e-07,
+      "logits/chosen": 2.5278210639953613,
+      "logits/rejected": 2.6886465549468994,
+      "logps/chosen": -205.1584014892578,
+      "logps/rejected": -349.90093994140625,
+      "loss": 13.945356750488282,
+      "rewards/accuracies": 0.6499999761581421,
+      "rewards/chosen": 46.65082550048828,
+      "rewards/margins": 27.151325225830078,
+      "rewards/rejected": 19.499500274658203,
+      "step": 120
+    },
+    {
+      "epoch": 0.34562977733466266,
+      "grad_norm": 96.81977844238281,
+      "learning_rate": 9.677777777777777e-07,
+      "logits/chosen": 3.0266711711883545,
+      "logits/rejected": 3.194408416748047,
+      "logps/chosen": -198.1504669189453,
+      "logps/rejected": -356.78485107421875,
+      "loss": 15.321591186523438,
+      "rewards/accuracies": 0.6312500238418579,
+      "rewards/chosen": 58.57115936279297,
+      "rewards/margins": 34.659385681152344,
+      "rewards/rejected": 23.911775588989258,
+      "step": 130
+    },
+    {
+      "epoch": 0.3722166832834829,
+      "grad_norm": 93.63339233398438,
+      "learning_rate": 9.566666666666667e-07,
+      "logits/chosen": 3.055471181869507,
+      "logits/rejected": 3.145911455154419,
+      "logps/chosen": -219.0845184326172,
+      "logps/rejected": -345.9827880859375,
+      "loss": 13.172528076171876,
+      "rewards/accuracies": 0.6499999761581421,
+      "rewards/chosen": 55.67122268676758,
+      "rewards/margins": 32.06965637207031,
+      "rewards/rejected": 23.6015682220459,
+      "step": 140
+    },
+    {
+      "epoch": 0.3988035892323031,
+      "grad_norm": 73.2032699584961,
+      "learning_rate": 9.455555555555556e-07,
+      "logits/chosen": 2.777052640914917,
+      "logits/rejected": 2.8150634765625,
+      "logps/chosen": -197.19174194335938,
+      "logps/rejected": -374.73822021484375,
+      "loss": 15.409014892578124,
+      "rewards/accuracies": 0.643750011920929,
+      "rewards/chosen": 48.60415267944336,
+      "rewards/margins": 28.52492332458496,
+      "rewards/rejected": 20.079227447509766,
+      "step": 150
+    },
+    {
+      "epoch": 0.4253904951811233,
+      "grad_norm": 66.92320251464844,
+      "learning_rate": 9.344444444444444e-07,
+      "logits/chosen": 2.996166467666626,
+      "logits/rejected": 3.1385650634765625,
+      "logps/chosen": -212.75375366210938,
+      "logps/rejected": -371.66522216796875,
+      "loss": 10.577291870117188,
+      "rewards/accuracies": 0.6937500238418579,
+      "rewards/chosen": 63.399871826171875,
+      "rewards/margins": 42.74666213989258,
+      "rewards/rejected": 20.653209686279297,
+      "step": 160
+    },
+    {
+      "epoch": 0.4519774011299435,
+      "grad_norm": 64.92620849609375,
+      "learning_rate": 9.233333333333333e-07,
+      "logits/chosen": 2.832219362258911,
+      "logits/rejected": 3.1428098678588867,
+      "logps/chosen": -196.91094970703125,
+      "logps/rejected": -397.47064208984375,
+      "loss": 12.637787628173829,
+      "rewards/accuracies": 0.699999988079071,
+      "rewards/chosen": 54.22844314575195,
+      "rewards/margins": 42.87944793701172,
+      "rewards/rejected": 11.348990440368652,
+      "step": 170
+    },
+    {
+      "epoch": 0.4785643070787637,
+      "grad_norm": 88.73342895507812,
+      "learning_rate": 9.122222222222222e-07,
+      "logits/chosen": 3.001598358154297,
+      "logits/rejected": 3.18257737159729,
+      "logps/chosen": -204.97628784179688,
+      "logps/rejected": -451.9921875,
+      "loss": 10.512740325927734,
+      "rewards/accuracies": 0.699999988079071,
+      "rewards/chosen": 49.384117126464844,
+      "rewards/margins": 53.00005340576172,
+      "rewards/rejected": -3.6159355640411377,
+      "step": 180
+    },
+    {
+      "epoch": 0.5051512130275839,
+      "grad_norm": 97.91619110107422,
+      "learning_rate": 9.01111111111111e-07,
+      "logits/chosen": 2.735273599624634,
+      "logits/rejected": 2.9921531677246094,
+      "logps/chosen": -185.69210815429688,
+      "logps/rejected": -439.780029296875,
+      "loss": 7.097893524169922,
+      "rewards/accuracies": 0.75,
+      "rewards/chosen": 51.99469757080078,
+      "rewards/margins": 52.051963806152344,
+      "rewards/rejected": -0.057262420654296875,
+      "step": 190
+    },
+    {
+      "epoch": 0.5317381189764041,
+      "grad_norm": 75.27459716796875,
+      "learning_rate": 8.9e-07,
+      "logits/chosen": 3.015864610671997,
+      "logits/rejected": 3.321819305419922,
+      "logps/chosen": -192.43789672851562,
+      "logps/rejected": -471.51055908203125,
+      "loss": 11.397718048095703,
+      "rewards/accuracies": 0.7124999761581421,
+      "rewards/chosen": 49.085548400878906,
+      "rewards/margins": 58.6345329284668,
+      "rewards/rejected": -9.54898452758789,
+      "step": 200
+    }
+  ],
+  "logging_steps": 10,
+  "max_steps": 1000,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 3,
+  "save_steps": 200,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 0.0,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}
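
Reviewer note: two patterns in `log_history` can be sanity-checked by hand. First, each `rewards/margins` value equals `rewards/chosen` minus `rewards/rejected` up to float rounding. Second, the logged learning rates are consistent with a linear warmup over 100 steps to a peak of 1e-6 followed by linear decay to zero at `max_steps` (1000), with the logged value lagging the step counter by one. The peak and warmup length are inferred from the logged numbers, not read from the checkpoint; the actual settings live in `training_args.bin`. The function name `inferred_lr` is hypothetical.

```python
import math

# (step, learning_rate, rewards/chosen, rewards/rejected, rewards/margins)
# copied from a few log_history entries above.
LOG = [
    (10, 9e-08, 50.876712799072266, 30.651004791259766, 20.225709915161133),
    (100, 9.9e-07, 50.321632385253906, 27.426937103271484, 22.894689559936523),
    (110, 9.9e-07, 58.184059143066406, 21.305124282836914, 36.878929138183594),
    (120, 9.788888888888889e-07, 46.65082550048828, 19.499500274658203, 27.151325225830078),
    (200, 8.9e-07, 49.085548400878906, -9.54898452758789, 58.6345329284668),
]


def inferred_lr(step: int, peak: float = 1e-6, warmup: int = 100,
                max_steps: int = 1000) -> float:
    """Learning rate under the ASSUMED schedule: linear warmup, linear decay.

    peak and warmup are guesses fitted to the log; max_steps comes from
    trainer_state.json. The logged value matches scheduler step `step - 1`.
    """
    s = step - 1
    if s < warmup:
        return peak * s / warmup
    return peak * (max_steps - s) / (max_steps - warmup)


for step, lr, chosen, rejected, margin in LOG:
    lr_ok = math.isclose(inferred_lr(step), lr, rel_tol=1e-9)
    margin_ok = math.isclose(chosen - rejected, margin, abs_tol=1e-3)
    print(step, lr_ok, margin_ok)
```

If either check printed `False` for a later checkpoint, that would suggest the assumed schedule is wrong rather than that the log is corrupt.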

last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:07d1084fbcea73eed4529408d2dd186b09d81c71318b95b1f0d3c71ddb884015
+size 6289