Model save

Files changed (13)

.gitattributes +1 -0
README.md +68 -0
adapter/README.md +209 -0
adapter/adapter_config.json +46 -0
adapter/adapter_model.safetensors +3 -0
adapter/chat_template.jinja +1 -0
adapter/special_tokens_map.json +23 -0
adapter/tokenizer.json +3 -0
adapter/tokenizer_config.json +194 -0
adapter/training_args.bin +3 -0
all_results.json +8 -0
train_results.json +8 -0
trainer_state.json +1943 -0

.gitattributes CHANGED

@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 tokenizer.json filter=lfs diff=lfs merge=lfs -text
+adapter/tokenizer.json filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,68 @@

+---
+base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
+library_name: transformers
+model_name: DAPO-7B
+tags:
+- generated_from_trainer
+- trl
+- dapo
+licence: license
+---
+# Model Card for DAPO-7B
+This model is a fine-tuned version of [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B).
+It has been trained using [TRL](https://github.com/huggingface/trl).
+## Quick start
+```python
+from transformers import pipeline
+question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+generator = pipeline("text-generation", model="kangdawei/DAPO-7B", device="cuda")
+output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
+print(output["generated_text"])
+```
+## Training procedure
+This model was trained with DAPO, a method introduced in [DAPO: An Open-Source LLM Reinforcement Learning System at Scale](https://huggingface.co/papers/2503.14476).
+### Framework versions
+- TRL: 0.16.0.dev0
+- Transformers: 4.57.1
+- Pytorch: 2.5.1
+- Datasets: 3.2.0
+- Tokenizers: 0.22.1
+## Citations
+Cite DAPO as:
+```bibtex
+@article{yu2025dapo,
+    title        = {{DAPO: An Open-Source LLM Reinforcement Learning System at Scale}},
+    author       = {Qiying Yu and Zheng Zhang and others},
+    year         = 2025,
+    eprint       = {arXiv:2503.14476},
+}
+```
+Cite TRL as:
+```bibtex
+@misc{vonwerra2022trl,
+	title        = {{TRL: Transformer Reinforcement Learning}},
+	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
+	year         = 2020,
+	journal      = {GitHub repository},
+	publisher    = {GitHub},
+	howpublished = {\url{https://github.com/huggingface/trl}}
+}
+```

adapter/README.md ADDED

	@@ -0,0 +1,209 @@

+---
+base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
+- dapo
+- lora
+- transformers
+- trl
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.18.0

adapter/adapter_config.json ADDED

	@@ -0,0 +1,46 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 128,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.0",
+  "qalora_group_size": 16,
+  "r": 64,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "q_proj",
+    "gate_proj",
+    "v_proj",
+    "k_proj",
+    "o_proj",
+    "up_proj",
+    "down_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
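
For orientation, the effective LoRA scaling implied by this config can be worked out by hand. The sketch below assumes the usual LoRA convention that the low-rank update is multiplied by `lora_alpha / r` (or `lora_alpha / sqrt(r)` when `use_rslora` is true). The `lora_scaling` helper is hypothetical; only the field values are copied from the file above.

```python
import math

# Subset of adapter/adapter_config.json, copied from the diff above.
adapter_config = {
    "lora_alpha": 128,
    "r": 64,
    "lora_dropout": 0.05,
    "use_rslora": False,
}


def lora_scaling(config: dict) -> float:
    """Return the multiplier applied to the low-rank update B @ A.

    Hypothetical helper: assumes the standard LoRA / rsLoRA scaling rules.
    """
    if config.get("use_rslora", False):
        return config["lora_alpha"] / math.sqrt(config["r"])
    return config["lora_alpha"] / config["r"]


scaling = lora_scaling(adapter_config)
print(scaling)  # 2.0
```

With `lora_alpha` at twice the rank, the adapter's update is applied at a scale of 2 under that convention.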

adapter/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7f41cef2bb66b9353a9df6cc7f16045d9af89f4627456717e3247d6b15a98a0f
+size 323014560
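
As a rough sanity check on that file size, the number of LoRA parameters can be estimated from `r` and `target_modules`. The base-model dimensions used here (28 layers, hidden size 3584, MLP size 18944, 4 key/value heads of width 128) are assumptions about the Qwen2-style 7B architecture and are not stated anywhere in this diff, as is the 2-bytes-per-parameter storage.

```python
# Assumed base-model dimensions (NOT taken from this diff).
HIDDEN, INTERMEDIATE, KV_DIM, LAYERS = 3584, 18944, 4 * 128, 28
RANK = 64  # "r" from adapter/adapter_config.json

# (in_features, out_features) for each entry in "target_modules".
shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int = RANK) -> int:
    # LoRA adds A (rank x in_features) and B (out_features x rank).
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o) for i, o in shapes.values())
total_params = per_layer * LAYERS
bytes_16bit = total_params * 2  # assumes 2 bytes per parameter
pointer_size = 323014560  # "size" from the LFS pointer above

print(total_params, bytes_16bit, pointer_size - bytes_16bit)
```

Under those assumptions the tensors would account for 322,961,408 bytes, leaving about 53 KB for the file header and metadata, which is consistent with the adapter being stored in a 16-bit dtype.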

adapter/chat_template.jinja ADDED

	@@ -0,0 +1 @@

+ {% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='') %}{%- for message in messages %}{%- if message['role'] == 'system' %}{% set ns.system_prompt = message['content'] %}{%- endif %}{%- endfor %}{{bos_token}}{{ns.system_prompt}}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'<｜User｜>' + message['content']}}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is none %}{%- set ns.is_tool = false -%}{%- for tool in message['tool_calls']%}{%- if not ns.is_first %}{{'<｜Assistant｜><｜tool▁calls▁begin｜><｜tool▁call▁begin｜>' + tool['type'] + '<｜tool▁sep｜>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<｜tool▁call▁end｜>'}}{%- set ns.is_first = true -%}{%- else %}{{'\n' + '<｜tool▁call▁begin｜>' + tool['type'] + '<｜tool▁sep｜>' + tool['function']['name'] + '\n' + '```json' + '\n' + tool['function']['arguments'] + '\n' + '```' + '<｜tool▁call▁end｜>'}}{{'<｜tool▁calls▁end｜><｜end▁of▁sentence｜>'}}{%- endif %}{%- endfor %}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is not none %}{%- if ns.is_tool %}{{'<｜tool▁outputs▁end｜>' + message['content'] + '<｜end▁of▁sentence｜>'}}{%- set ns.is_tool = false -%}{%- else %}{% set content = message['content'] %}{% if '</think>' in content %}{% set content = content.split('</think>')[-1] %}{% endif %}{{'<｜Assistant｜>' + content + '<｜end▁of▁sentence｜>'}}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<｜tool▁outputs▁begin｜><｜tool▁output▁begin｜>' + message['content'] + '<｜tool▁output▁end｜>'}}{%- set ns.is_output_first = false %}{%- else %}{{'\n<｜tool▁output▁begin｜>' + message['content'] + '<｜tool▁output▁end｜>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<｜tool▁outputs▁end｜>'}}{% endif %}{% if add_generation_prompt and not ns.is_tool %}{{'<｜Assistant｜><think>\n'}}{% endif %}
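
To make the template easier to follow, here is a plain-Python approximation of what it renders for conversations without tool calls. It is a sketch, not the tokenizer's own code path: `render_prompt` is a hypothetical helper, the tool-call and tool-output branches are deliberately left out, and the BOS/EOS strings are taken from `adapter/special_tokens_map.json`.

```python
BOS = "<｜begin▁of▁sentence｜>"
EOS = "<｜end▁of▁sentence｜>"


def render_prompt(messages, add_generation_prompt=True):
    """Approximate the template for system/user/assistant text messages only."""
    system_prompt = ""
    for message in messages:
        if message["role"] == "system":
            # The template overwrites on each hit, so the last system message wins.
            system_prompt = message["content"]

    out = BOS + system_prompt
    for message in messages:
        if message["role"] == "user":
            out += "<｜User｜>" + message["content"]
        elif message["role"] == "assistant":
            content = message["content"]
            # Earlier reasoning is dropped: only text after </think> is kept.
            if "</think>" in content:
                content = content.split("</think>")[-1]
            out += "<｜Assistant｜>" + content + EOS

    if add_generation_prompt:
        # The generation prompt opens a reasoning block for the next reply.
        out += "<｜Assistant｜><think>\n"
    return out


prompt = render_prompt([{"role": "user", "content": "What is 2 + 2?"}])
print(repr(prompt))
```

Note that prior assistant turns are re-inserted without their `<think>` section, while the generation prompt always starts a fresh one.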

adapter/special_tokens_map.json ADDED

	@@ -0,0 +1,23 @@

+{
+  "bos_token": {
+    "content": "<｜begin▁of▁sentence｜>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<｜end▁of▁sentence｜>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<｜end▁of▁sentence｜>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

adapter/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a4256422650d141f228fe954acee98679da412984c29a569877eefd3af69315a
+size 11422959
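
The three-line files in this commit are Git LFS pointers rather than the payloads themselves. A minimal sketch of reading one follows; `parse_lfs_pointer` is a hypothetical helper that assumes the `key value` line layout shown above.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its version, hash, and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer contents for adapter/tokenizer.json, copied from the diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:a4256422650d141f228fe954acee98679da412984c29a569877eefd3af69315a
size 11422959
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], f"{info['size_bytes'] / 2**20:.1f} MiB")
```

The `size` field is the byte length of the real file, so it can be compared against a download to confirm the payload was fetched in full.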

adapter/tokenizer_config.json ADDED

	@@ -0,0 +1,194 @@

+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "add_prefix_space": null,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<｜end▁of▁sentence｜>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<｜User｜>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151645": {
+      "content": "<｜Assistant｜>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151646": {
+      "content": "<｜begin▁of▁sentence｜>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|EOT|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151648": {
+      "content": "<think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151649": {
+      "content": "</think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "bos_token": "<｜begin▁of▁sentence｜>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<｜end▁of▁sentence｜>",
+  "extra_special_tokens": {},
+  "legacy": true,
+  "model_max_length": 16384,
+  "pad_token": "<｜end▁of▁sentence｜>",
+  "sp_model_kwargs": {},
+  "tokenizer_class": "LlamaTokenizerFast",
+  "unk_token": null,
+  "use_default_system_prompt": false
+}

adapter/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:92df2953de292a8a4d447867c90e350e8357338da5214d2a17070cb10ce845a7
+size 8760

all_results.json ADDED

	@@ -0,0 +1,8 @@

+{
+    "total_flos": 0.0,
+    "train_loss": 0.03280465058982372,
+    "train_runtime": 132752.8895,
+    "train_samples": 7000,
+    "train_samples_per_second": 0.036,
+    "train_steps_per_second": 0.001
+}
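
The rounded throughput figures are easier to read once converted. This sketch assumes `train_runtime` is in seconds, as the Trainer reports it, and takes `global_step` from `trainer_state.json` in this same commit.

```python
train_runtime_s = 132752.8895  # "train_runtime" from all_results.json
global_step = 100              # "global_step" from trainer_state.json

hours = train_runtime_s / 3600
seconds_per_step = train_runtime_s / global_step
steps_per_second = global_step / train_runtime_s

print(f"{hours:.1f} h, {seconds_per_step:.0f} s/step, {steps_per_second:.3f} steps/s")
# 36.9 h, 1328 s/step, 0.001 steps/s
```

The recomputed steps-per-second value rounds to the logged `0.001`, so the two files agree with each other; each optimizer step took roughly 22 minutes.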

train_results.json ADDED

	@@ -0,0 +1,8 @@

+{
+    "total_flos": 0.0,
+    "train_loss": 0.03280465058982372,
+    "train_runtime": 132752.8895,
+    "train_samples": 7000,
+    "train_samples_per_second": 0.036,
+    "train_steps_per_second": 0.001
+}

trainer_state.json ADDED

	@@ -0,0 +1,1943 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.11428571428571428,
+  "eval_steps": 500,
+  "global_step": 100,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 1681.8854370117188,
+      "dapo/avg_reward_std": 0.3420590679896505,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.48245614610220255,
+      "dapo/num_sampling_attempts": 2.375,
+      "dapo/sampling_efficiency": 54.58333333333333,
+      "dapo/total_prompts_processed": 14.25,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.001142857142857143,
+      "grad_norm": 0.011931957677006721,
+      "kl": 0.0,
+      "learning_rate": 0.0,
+      "loss": 0.0219,
+      "reward": 0.8671084493398666,
+      "reward_std": 0.964848667383194,
+      "step": 1
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2172.913185119629,
+      "dapo/avg_reward_std": 0.27327019289920207,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.4824561500235608,
+      "dapo/num_sampling_attempts": 2.375,
+      "dapo/sampling_efficiency": 67.41071428571428,
+      "dapo/total_prompts_processed": 14.25,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.002285714285714286,
+      "grad_norm": 0.014162006787955761,
+      "kl": 0.0,
+      "learning_rate": 1e-07,
+      "loss": 0.0232,
+      "reward": 0.932205643504858,
+      "reward_std": 0.9607091471552849,
+      "step": 2
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2418.3611373901367,
+      "dapo/avg_reward_std": 0.3202404692769051,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.45833334177732465,
+      "dapo/num_sampling_attempts": 2.5,
+      "dapo/sampling_efficiency": 51.04166666666666,
+      "dapo/total_prompts_processed": 15.0,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.0034285714285714284,
+      "grad_norm": 0.011303936131298542,
+      "kl": 0.0001301020383834839,
+      "learning_rate": 2e-07,
+      "loss": 0.0371,
+      "reward": 0.5818949677050114,
+      "reward_std": 0.928392305970192,
+      "step": 3
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2080.6250228881836,
+      "dapo/avg_reward_std": 0.3523675338788466,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.4545454586094076,
+      "dapo/num_sampling_attempts": 2.75,
+      "dapo/sampling_efficiency": 45.20833333333333,
+      "dapo/total_prompts_processed": 16.5,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.004571428571428572,
+      "grad_norm": 0.010935964062809944,
+      "kl": 8.246302604675293e-05,
+      "learning_rate": 3e-07,
+      "loss": 0.007,
+      "reward": 0.6902085058391094,
+      "reward_std": 0.9576746746897697,
+      "step": 4
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2208.1910247802734,
+      "dapo/avg_reward_std": 0.33842799224351583,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.4912280746196446,
+      "dapo/num_sampling_attempts": 2.375,
+      "dapo/sampling_efficiency": 54.166666666666664,
+      "dapo/total_prompts_processed": 14.25,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.005714285714285714,
+      "grad_norm": 0.01424587145447731,
+      "kl": 0.00011987239122390747,
+      "learning_rate": 4e-07,
+      "loss": 0.0916,
+      "reward": 0.5482002776116133,
+      "reward_std": 0.9192102774977684,
+      "step": 5
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2428.8646087646484,
+      "dapo/avg_reward_std": 0.2724780907233556,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.37222223381201425,
+      "dapo/num_sampling_attempts": 3.75,
+      "dapo/sampling_efficiency": 37.39583333333333,
+      "dapo/total_prompts_processed": 22.5,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.006857142857142857,
+      "grad_norm": 0.012209060601890087,
+      "kl": 0.00013336539268493652,
+      "learning_rate": 5e-07,
+      "loss": 0.063,
+      "reward": 0.6304261162877083,
+      "reward_std": 0.947055421769619,
+      "step": 6
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2028.1111297607422,
+      "dapo/avg_reward_std": 0.35396890342235565,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.5151515284722502,
+      "dapo/num_sampling_attempts": 2.75,
+      "dapo/sampling_efficiency": 48.95833333333333,
+      "dapo/total_prompts_processed": 16.5,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.008,
+      "grad_norm": 0.01456605363637209,
+      "kl": 0.00010842084884643555,
+      "learning_rate": 6e-07,
+      "loss": 0.0863,
+      "reward": 0.7125897314399481,
+      "reward_std": 0.938522607088089,
+      "step": 7
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 1825.9792022705078,
+      "dapo/avg_reward_std": 0.3198123288154602,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.45333334505558015,
+      "dapo/num_sampling_attempts": 3.125,
+      "dapo/sampling_efficiency": 36.45833333333333,
+      "dapo/total_prompts_processed": 18.75,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.009142857142857144,
+      "grad_norm": 0.014117815531790257,
+      "kl": 8.45193862915039e-05,
+      "learning_rate": 7e-07,
+      "loss": 0.024,
+      "reward": 0.7728112610056996,
+      "reward_std": 0.953309640288353,
+      "step": 8
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2424.159713745117,
+      "dapo/avg_reward_std": 0.4454919546842575,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.6785714392151151,
+      "dapo/num_sampling_attempts": 1.75,
+      "dapo/sampling_efficiency": 70.83333333333333,
+      "dapo/total_prompts_processed": 10.5,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.010285714285714285,
+      "grad_norm": 0.008895393460988998,
+      "kl": 0.00011056661605834961,
+      "learning_rate": 8e-07,
+      "loss": 0.013,
+      "reward": 0.6077092736959457,
+      "reward_std": 0.994397833943367,
+      "step": 9
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 1959.0763702392578,
+      "dapo/avg_reward_std": 0.25889470875263215,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.350000007947286,
+      "dapo/num_sampling_attempts": 3.75,
+      "dapo/sampling_efficiency": 40.20833333333333,
+      "dapo/total_prompts_processed": 22.5,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.011428571428571429,
+      "grad_norm": 0.011032010428607464,
+      "kl": 8.809566497802734e-05,
+      "learning_rate": 9e-07,
+      "loss": 0.018,
+      "reward": 0.7773313578218222,
+      "reward_std": 0.9549762830138206,
+      "step": 10
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2597.6979217529297,
+      "dapo/avg_reward_std": 0.3167818512605584,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.44202899284984754,
+      "dapo/num_sampling_attempts": 2.875,
+      "dapo/sampling_efficiency": 42.70833333333333,
+      "dapo/total_prompts_processed": 17.25,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.012571428571428572,
+      "grad_norm": 0.010659257881343365,
+      "kl": 0.00013309717178344727,
+      "learning_rate": 1e-06,
+      "loss": 0.0026,
+      "reward": 0.5649524200707674,
+      "reward_std": 0.9257139712572098,
+      "step": 11
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2214.9444580078125,
+      "dapo/avg_reward_std": 0.33351172175672317,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.5648148208856583,
+      "dapo/num_sampling_attempts": 2.25,
+      "dapo/sampling_efficiency": 49.99999999999999,
+      "dapo/total_prompts_processed": 13.5,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.013714285714285714,
+      "grad_norm": 0.010501649230718613,
+      "kl": 9.53376293182373e-05,
+      "learning_rate": 9.997258721585931e-07,
+      "loss": 0.0287,
+      "reward": 0.7854772098362446,
+      "reward_std": 0.9361946359276772,
+      "step": 12
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 1984.5416717529297,
+      "dapo/avg_reward_std": 0.3313978049490187,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.5925926052861743,
+      "dapo/num_sampling_attempts": 2.25,
+      "dapo/sampling_efficiency": 56.666666666666664,
+      "dapo/total_prompts_processed": 13.5,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.014857142857142857,
+      "grad_norm": 0.012102734297513962,
+      "kl": 9.861588478088379e-05,
+      "learning_rate": 9.989038226169207e-07,
+      "loss": 0.0277,
+      "reward": 0.9007548745721579,
+      "reward_std": 0.9196444824337959,
+      "step": 13
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2267.5069885253906,
+      "dapo/avg_reward_std": 0.21889745750847986,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.3186274560935357,
+      "dapo/num_sampling_attempts": 4.25,
+      "dapo/sampling_efficiency": 40.63988095238095,
+      "dapo/total_prompts_processed": 25.5,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.016,
+      "grad_norm": 0.01004031766206026,
+      "kl": 0.00010375678539276123,
+      "learning_rate": 9.975348529157229e-07,
+      "loss": 0.0342,
+      "reward": 0.5439228732138872,
+      "reward_std": 0.9444419518113136,
+      "step": 14
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2403.170135498047,
+      "dapo/avg_reward_std": 0.24896668710491873,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.4242424321445552,
+      "dapo/num_sampling_attempts": 2.75,
+      "dapo/sampling_efficiency": 58.45238095238095,
+      "dapo/total_prompts_processed": 16.5,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.017142857142857144,
+      "grad_norm": 0.013138854876160622,
+      "kl": 0.00011286139488220215,
+      "learning_rate": 9.956206309337066e-07,
+      "loss": 0.0341,
+      "reward": 0.6446905825287104,
+      "reward_std": 0.9305006489157677,
+      "step": 15
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2368.579849243164,
+      "dapo/avg_reward_std": 0.32238917201757433,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.4416666716337204,
+      "dapo/num_sampling_attempts": 2.5,
+      "dapo/sampling_efficiency": 53.125,
+      "dapo/total_prompts_processed": 15.0,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.018285714285714287,
+      "grad_norm": 0.009644324891269207,
+      "kl": 0.00011764466762542725,
+      "learning_rate": 9.931634888554935e-07,
+      "loss": 0.0184,
+      "reward": 0.6319684982299805,
+      "reward_std": 0.9385868087410927,
+      "step": 16
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2354.590286254883,
+      "dapo/avg_reward_std": 0.2929895012466996,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.41358025482407323,
+      "dapo/num_sampling_attempts": 3.375,
+      "dapo/sampling_efficiency": 43.95833333333333,
+      "dapo/total_prompts_processed": 20.25,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.019428571428571427,
+      "grad_norm": 0.010750290006399155,
+      "kl": 0.00012104213237762451,
+      "learning_rate": 9.901664203302124e-07,
+      "loss": 0.0512,
+      "reward": 0.7495243214070797,
+      "reward_std": 0.9604936093091965,
+      "step": 17
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2353.548599243164,
+      "dapo/avg_reward_std": 0.3144007975404913,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.46212122250686993,
+      "dapo/num_sampling_attempts": 2.75,
+      "dapo/sampling_efficiency": 52.5,
+      "dapo/total_prompts_processed": 16.5,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.02057142857142857,
+      "grad_norm": 0.0106205390766263,
+      "kl": 0.0001283884048461914,
+      "learning_rate": 9.866330768241983e-07,
+      "loss": 0.0356,
+      "reward": 0.7090531028807163,
+      "reward_std": 0.927816279232502,
+      "step": 18
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2599.90283203125,
+      "dapo/avg_reward_std": 0.31102153037985164,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.46527778667708236,
+      "dapo/num_sampling_attempts": 3.0,
+      "dapo/sampling_efficiency": 43.125,
+      "dapo/total_prompts_processed": 18.0,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.021714285714285714,
+      "grad_norm": 0.00998625811189413,
+      "kl": 0.00011986494064331055,
+      "learning_rate": 9.825677631722435e-07,
+      "loss": 0.0501,
+      "reward": 0.8357332646846771,
+      "reward_std": 0.9608008861541748,
+      "step": 19
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2307.482650756836,
+      "dapo/avg_reward_std": 0.3105274804613807,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.4545454633506862,
+      "dapo/num_sampling_attempts": 2.75,
+      "dapo/sampling_efficiency": 45.83333333333333,
+      "dapo/total_prompts_processed": 16.5,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.022857142857142857,
+      "grad_norm": 0.010738078504800797,
+      "kl": 9.399652481079102e-05,
+      "learning_rate": 9.779754323328192e-07,
+      "loss": 0.0104,
+      "reward": 0.7927055042237043,
+      "reward_std": 0.9697678238153458,
+      "step": 20
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 1943.2500457763672,
+      "dapo/avg_reward_std": 0.3021106570959091,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.384615390919722,
+      "dapo/num_sampling_attempts": 3.25,
+      "dapo/sampling_efficiency": 41.78571428571428,
+      "dapo/total_prompts_processed": 19.5,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.024,
+      "grad_norm": 0.01025764923542738,
+      "kl": 6.92903995513916e-05,
+      "learning_rate": 9.728616793536587e-07,
+      "loss": 0.0005,
+      "reward": 0.7050843685865402,
+      "reward_std": 0.9542289972305298,
+      "step": 21
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2265.222198486328,
+      "dapo/avg_reward_std": 0.2858178478020888,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.4102564144593019,
+      "dapo/num_sampling_attempts": 3.25,
+      "dapo/sampling_efficiency": 36.160714285714285,
+      "dapo/total_prompts_processed": 19.5,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.025142857142857144,
+      "grad_norm": 0.015554007142782211,
+      "kl": 0.00011515617370605469,
+      "learning_rate": 9.672327345550543e-07,
+      "loss": 0.1143,
+      "reward": 0.7392658032476902,
+      "reward_std": 0.9592578783631325,
+      "step": 22
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2213.857635498047,
+      "dapo/avg_reward_std": 0.28609917419297354,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.410714291036129,
+      "dapo/num_sampling_attempts": 3.5,
+      "dapo/sampling_efficiency": 38.66071428571428,
+      "dapo/total_prompts_processed": 21.0,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.026285714285714287,
+      "grad_norm": 0.00819400418549776,
+      "kl": 7.683038711547852e-05,
+      "learning_rate": 9.610954559391704e-07,
+      "loss": 0.018,
+      "reward": 0.6645980039611459,
+      "reward_std": 0.919261984527111,
+      "step": 23
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 1544.9930610656738,
+      "dapo/avg_reward_std": 0.27062960465749103,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.38888889948527017,
+      "dapo/num_sampling_attempts": 3.75,
+      "dapo/sampling_efficiency": 37.20238095238095,
+      "dapo/total_prompts_processed": 22.5,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.027428571428571427,
+      "grad_norm": 0.013472510501742363,
+      "kl": 6.948411464691162e-05,
+      "learning_rate": 9.54457320834625e-07,
+      "loss": 0.0006,
+      "reward": 0.6155341246630996,
+      "reward_std": 0.9053066149353981,
+      "step": 24
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2005.5104598999023,
+      "dapo/avg_reward_std": 0.2877837224253293,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.38505747760164327,
+      "dapo/num_sampling_attempts": 3.625,
+      "dapo/sampling_efficiency": 38.75,
+      "dapo/total_prompts_processed": 21.75,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.02857142857142857,
+      "grad_norm": 0.011138558387756348,
+      "kl": 8.162856101989746e-05,
+      "learning_rate": 9.473264167865171e-07,
+      "loss": 0.0493,
+      "reward": 0.6912501659244299,
+      "reward_std": 0.9633006453514099,
+      "step": 25
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2387.5555725097656,
+      "dapo/avg_reward_std": 0.19959817528724672,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.3055555591980616,
+      "dapo/num_sampling_attempts": 3.75,
+      "dapo/sampling_efficiency": 44.49404761904761,
+      "dapo/total_prompts_processed": 22.5,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.029714285714285714,
+      "grad_norm": 0.011900709010660648,
+      "kl": 9.435415267944336e-05,
+      "learning_rate": 9.397114317029974e-07,
+      "loss": 0.0815,
+      "reward": 0.5562675036489964,
+      "reward_std": 0.9110650941729546,
+      "step": 26
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2044.7292137145996,
+      "dapo/avg_reward_std": 0.3619746658951044,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.6354166744276881,
+      "dapo/num_sampling_attempts": 2.0,
+      "dapo/sampling_efficiency": 69.16666666666666,
+      "dapo/total_prompts_processed": 12.0,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.030857142857142857,
+      "grad_norm": 0.01303341705352068,
+      "kl": 8.736550807952881e-05,
+      "learning_rate": 9.316216432703916e-07,
+      "loss": 0.0141,
+      "reward": 0.7769045419991016,
+      "reward_std": 0.9760870188474655,
+      "step": 27
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2458.9305572509766,
+      "dapo/avg_reward_std": 0.2839898039465365,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.427536239442618,
+      "dapo/num_sampling_attempts": 2.875,
+      "dapo/sampling_efficiency": 42.08333333333333,
+      "dapo/total_prompts_processed": 17.25,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.032,
+      "grad_norm": 0.013889433816075325,
+      "kl": 0.00014150142669677734,
+      "learning_rate": 9.230669076497687e-07,
+      "loss": 0.0479,
+      "reward": 0.5980293937027454,
+      "reward_std": 0.9796791076660156,
+      "step": 28
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2496.451416015625,
+      "dapo/avg_reward_std": 0.35542283952236176,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.5648148175742891,
+      "dapo/num_sampling_attempts": 2.25,
+      "dapo/sampling_efficiency": 67.5,
+      "dapo/total_prompts_processed": 13.5,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.03314285714285714,
+      "grad_norm": 0.011365516111254692,
+      "kl": 0.00010502338409423828,
+      "learning_rate": 9.140576474687263e-07,
+      "loss": 0.0278,
+      "reward": 0.6495406329631805,
+      "reward_std": 0.9649527370929718,
+      "step": 29
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 1831.333351135254,
+      "dapo/avg_reward_std": 0.2628121712933416,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.41304348279600556,
+      "dapo/num_sampling_attempts": 2.875,
+      "dapo/sampling_efficiency": 60.625,
+      "dapo/total_prompts_processed": 17.25,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.03428571428571429,
+      "grad_norm": 0.012428080663084984,
+      "kl": 8.240342140197754e-05,
+      "learning_rate": 9.046048391230247e-07,
+      "loss": 0.0408,
+      "reward": 0.7913381233811378,
+      "reward_std": 0.9801043272018433,
+      "step": 30
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2105.7118225097656,
+      "dapo/avg_reward_std": 0.2843361473083496,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.4266666781902313,
+      "dapo/num_sampling_attempts": 3.125,
+      "dapo/sampling_efficiency": 53.75,
+      "dapo/total_prompts_processed": 18.75,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.03542857142857143,
+      "grad_norm": 0.016210218891501427,
+      "kl": 0.0001112520694732666,
+      "learning_rate": 8.9471999940354e-07,
+      "loss": 0.1052,
+      "reward": 0.5814057979732752,
+      "reward_std": 0.9699539840221405,
+      "step": 31
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2366.718818664551,
+      "dapo/avg_reward_std": 0.2371666719173563,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.34482759648355943,
+      "dapo/num_sampling_attempts": 3.625,
+      "dapo/sampling_efficiency": 38.4375,
+      "dapo/total_prompts_processed": 21.75,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.036571428571428574,
+      "grad_norm": 0.01111757755279541,
+      "kl": 0.00011564791202545166,
+      "learning_rate": 8.844151714648274e-07,
+      "loss": 0.0379,
+      "reward": 0.6102676652371883,
+      "reward_std": 0.9229060783982277,
+      "step": 32
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2388.1909942626953,
+      "dapo/avg_reward_std": 0.29336222237156284,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.3118279624369837,
+      "dapo/num_sampling_attempts": 3.875,
+      "dapo/sampling_efficiency": 42.1875,
+      "dapo/total_prompts_processed": 23.25,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.037714285714285714,
+      "grad_norm": 0.01051933504641056,
+      "kl": 9.141862392425537e-05,
+      "learning_rate": 8.737029101523929e-07,
+      "loss": 0.041,
+      "reward": 0.6971308812499046,
+      "reward_std": 0.9577681049704552,
+      "step": 33
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2259.065963745117,
+      "dapo/avg_reward_std": 0.3195795826613903,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.5833333367481828,
+      "dapo/num_sampling_attempts": 2.0,
+      "dapo/sampling_efficiency": 62.49999999999999,
+      "dapo/total_prompts_processed": 12.0,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.038857142857142854,
+      "grad_norm": 0.010114133358001709,
+      "kl": 9.936094284057617e-05,
+      "learning_rate": 8.625962667065487e-07,
+      "loss": 0.0019,
+      "reward": 0.706351961940527,
+      "reward_std": 0.9608398601412773,
+      "step": 34
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2236.6563262939453,
+      "dapo/avg_reward_std": 0.2805841226002266,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.33908046679250126,
+      "dapo/num_sampling_attempts": 3.625,
+      "dapo/sampling_efficiency": 30.952380952380942,
+      "dapo/total_prompts_processed": 21.75,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.04,
+      "grad_norm": 0.01071652490645647,
+      "kl": 0.00013333559036254883,
+      "learning_rate": 8.511087728614862e-07,
+      "loss": 0.0108,
+      "reward": 0.6857370678335428,
+      "reward_std": 0.9366307482123375,
+      "step": 35
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 1998.9166717529297,
+      "dapo/avg_reward_std": 0.30676539919593115,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.4772727360779589,
+      "dapo/num_sampling_attempts": 2.75,
+      "dapo/sampling_efficiency": 44.791666666666664,
+      "dapo/total_prompts_processed": 16.5,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.04114285714285714,
+      "grad_norm": 0.011716869659721851,
+      "kl": 0.00010579824447631836,
+      "learning_rate": 8.392544243589427e-07,
+      "loss": 0.0577,
+      "reward": 0.8430320359766483,
+      "reward_std": 0.8613111302256584,
+      "step": 36
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2699.8819580078125,
+      "dapo/avg_reward_std": 0.280869146873211,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.36781610034663104,
+      "dapo/num_sampling_attempts": 3.625,
+      "dapo/sampling_efficiency": 36.45833333333333,
+      "dapo/total_prompts_processed": 21.75,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.04228571428571429,
+      "grad_norm": 0.011984186246991158,
+      "kl": 0.00011450052261352539,
+      "learning_rate": 8.270476638965461e-07,
+      "loss": 0.0641,
+      "reward": 0.6952194459736347,
+      "reward_std": 0.9531055390834808,
+      "step": 37
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2508.343765258789,
+      "dapo/avg_reward_std": 0.3086147890204475,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.44444445485160466,
+      "dapo/num_sampling_attempts": 2.625,
+      "dapo/sampling_efficiency": 51.45833333333333,
+      "dapo/total_prompts_processed": 15.75,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.04342857142857143,
+      "grad_norm": 0.014813189394772053,
+      "kl": 0.00013363361358642578,
+      "learning_rate": 8.145033635316128e-07,
+      "loss": 0.0815,
+      "reward": 0.6981049925088882,
+      "reward_std": 0.9795023873448372,
+      "step": 38
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2568.090286254883,
+      "dapo/avg_reward_std": 0.2281228665149573,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.30303030799735675,
+      "dapo/num_sampling_attempts": 4.125,
+      "dapo/sampling_efficiency": 35.3125,
+      "dapo/total_prompts_processed": 24.75,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.044571428571428574,
+      "grad_norm": 0.010284055955708027,
+      "kl": 0.0001270771026611328,
+      "learning_rate": 8.01636806561836e-07,
+      "loss": 0.0129,
+      "reward": 0.5480891708284616,
+      "reward_std": 0.9542658925056458,
+      "step": 39
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2255.0798721313477,
+      "dapo/avg_reward_std": 0.3315709355202588,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.46969698437235574,
+      "dapo/num_sampling_attempts": 2.75,
+      "dapo/sampling_efficiency": 45.20833333333333,
+      "dapo/total_prompts_processed": 16.5,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.045714285714285714,
+      "grad_norm": 0.01235182024538517,
+      "kl": 0.00011420249938964844,
+      "learning_rate": 7.884636689049422e-07,
+      "loss": 0.0472,
+      "reward": 0.8707308620214462,
+      "reward_std": 0.9157829731702805,
+      "step": 40
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2417.9444427490234,
+      "dapo/avg_reward_std": 0.2831250044607347,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.3655914020153784,
+      "dapo/num_sampling_attempts": 3.875,
+      "dapo/sampling_efficiency": 37.723214285714285,
+      "dapo/total_prompts_processed": 23.25,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.046857142857142854,
+      "grad_norm": 0.010439831763505936,
+      "kl": 0.00012230873107910156,
+      "learning_rate": 7.75e-07,
+      "loss": 0.0395,
+      "reward": 0.7518008537590504,
+      "reward_std": 0.9689745083451271,
+      "step": 41
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2325.5937881469727,
+      "dapo/avg_reward_std": 0.28424168271677835,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.3869047707745007,
+      "dapo/num_sampling_attempts": 3.5,
+      "dapo/sampling_efficiency": 33.75,
+      "dapo/total_prompts_processed": 21.0,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.048,
+      "grad_norm": 0.010445328429341316,
+      "kl": 8.326023817062378e-05,
+      "learning_rate": 7.612622032536507e-07,
+      "loss": 0.0004,
+      "reward": 0.6408937154337764,
+      "reward_std": 0.9007892906665802,
+      "step": 42
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2423.9617919921875,
+      "dapo/avg_reward_std": 0.28680659715945905,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.4038461624429776,
+      "dapo/num_sampling_attempts": 3.25,
+      "dapo/sampling_efficiency": 46.041666666666664,
+      "dapo/total_prompts_processed": 19.5,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.04914285714285714,
+      "grad_norm": 0.010229532606899738,
+      "kl": 0.00013530254364013672,
+      "learning_rate": 7.472670160550848e-07,
+      "loss": 0.0104,
+      "reward": 0.6538480781018734,
+      "reward_std": 0.9688718169927597,
+      "step": 43
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2088.677085876465,
+      "dapo/avg_reward_std": 0.3208466252455345,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.4423077031970024,
+      "dapo/num_sampling_attempts": 3.25,
+      "dapo/sampling_efficiency": 41.041666666666664,
+      "dapo/total_prompts_processed": 19.5,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.05028571428571429,
+      "grad_norm": 0.011106742545962334,
+      "kl": 0.00012566149234771729,
+      "learning_rate": 7.330314893841101e-07,
+      "loss": 0.0239,
+      "reward": 0.8764502704143524,
+      "reward_std": 0.9285347983241081,
+      "step": 44
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 1721.781234741211,
+      "dapo/avg_reward_std": 0.3683280497789383,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.5083333447575569,
+      "dapo/num_sampling_attempts": 2.5,
+      "dapo/sampling_efficiency": 47.916666666666664,
+      "dapo/total_prompts_processed": 15.0,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.05142857142857143,
+      "grad_norm": 0.01152133010327816,
+      "kl": 7.429718971252441e-05,
+      "learning_rate": 7.185729670371604e-07,
+      "loss": 0.0259,
+      "reward": 0.8203496672213078,
+      "reward_std": 0.9882074818015099,
+      "step": 45
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 3020.9757232666016,
+      "dapo/avg_reward_std": 0.294668085873127,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.37500000691839624,
+      "dapo/num_sampling_attempts": 3.5,
+      "dapo/sampling_efficiency": 38.660714285714285,
+      "dapo/total_prompts_processed": 21.0,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.052571428571428575,
+      "grad_norm": 0.009526599198579788,
+      "kl": 0.00014853477478027344,
+      "learning_rate": 7.039090644965509e-07,
+      "loss": 0.0314,
+      "reward": 0.6035567373037338,
+      "reward_std": 0.9617942646145821,
+      "step": 46
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2869.8958892822266,
+      "dapo/avg_reward_std": 0.37419558623257804,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.5196078481043086,
+      "dapo/num_sampling_attempts": 2.125,
+      "dapo/sampling_efficiency": 66.66666666666666,
+      "dapo/total_prompts_processed": 12.75,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.053714285714285714,
+      "grad_norm": 0.008854555897414684,
+      "kl": 0.00012740492820739746,
+      "learning_rate": 6.890576474687263e-07,
+      "loss": 0.0266,
+      "reward": 0.5126286232843995,
+      "reward_std": 0.9323688969016075,
+      "step": 47
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 1974.5069999694824,
+      "dapo/avg_reward_std": 0.31826632221539813,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.42361111628512543,
+      "dapo/num_sampling_attempts": 3.0,
+      "dapo/sampling_efficiency": 43.541666666666664,
+      "dapo/total_prompts_processed": 18.0,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.054857142857142854,
+      "grad_norm": 0.012630482204258442,
+      "kl": 0.00011485815048217773,
+      "learning_rate": 6.740368101176495e-07,
+      "loss": 0.0259,
+      "reward": 0.7998449765145779,
+      "reward_std": 0.9614248275756836,
+      "step": 48
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2775.854164123535,
+      "dapo/avg_reward_std": 0.24803236694563002,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.41269841435409726,
+      "dapo/num_sampling_attempts": 2.625,
+      "dapo/sampling_efficiency": 65.97222222222223,
+      "dapo/total_prompts_processed": 15.75,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.056,
+      "grad_norm": 0.0115203270688653,
+      "kl": 0.00010813772678375244,
+      "learning_rate": 6.588648530198504e-07,
+      "loss": 0.0626,
+      "reward": 0.5735284592956305,
+      "reward_std": 0.9657324403524399,
+      "step": 49
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2555.2743377685547,
+      "dapo/avg_reward_std": 0.3077625359098117,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.423611119389534,
+      "dapo/num_sampling_attempts": 3.0,
+      "dapo/sampling_efficiency": 48.33333333333333,
+      "dapo/total_prompts_processed": 18.0,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.05714285714285714,
+      "grad_norm": 0.012258801609277725,
+      "kl": 0.00013893842697143555,
+      "learning_rate": 6.435602608679916e-07,
+      "loss": 0.0575,
+      "reward": 0.8288873583078384,
+      "reward_std": 0.950613297522068,
+      "step": 50
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2645.576400756836,
+      "dapo/avg_reward_std": 0.3462034153441588,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.4236111169060071,
+      "dapo/num_sampling_attempts": 3.0,
+      "dapo/sampling_efficiency": 39.99999999999999,
+      "dapo/total_prompts_processed": 18.0,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.05828571428571429,
+      "grad_norm": 0.01161988079547882,
+      "kl": 0.0001646280288696289,
+      "learning_rate": 6.281416799501187e-07,
+      "loss": 0.046,
+      "reward": 0.46879277005791664,
+      "reward_std": 0.9387945607304573,
+      "step": 51
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2043.677101135254,
+      "dapo/avg_reward_std": 0.3387378570826157,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.4347826171180476,
+      "dapo/num_sampling_attempts": 2.875,
+      "dapo/sampling_efficiency": 45.83333333333333,
+      "dapo/total_prompts_processed": 17.25,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.05942857142857143,
+      "grad_norm": 0.011719447560608387,
+      "kl": 0.00012214481830596924,
+      "learning_rate": 6.126278954320294e-07,
+      "loss": 0.0093,
+      "reward": 0.7487262971699238,
+      "reward_std": 0.9444489181041718,
+      "step": 52
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2277.902801513672,
+      "dapo/avg_reward_std": 0.269059170936716,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.37356322695469035,
+      "dapo/num_sampling_attempts": 3.625,
+      "dapo/sampling_efficiency": 41.88988095238095,
+      "dapo/total_prompts_processed": 21.75,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.060571428571428575,
+      "grad_norm": 0.012477328069508076,
+      "kl": 0.00015044212341308594,
+      "learning_rate": 5.97037808470444e-07,
+      "loss": 0.048,
+      "reward": 0.6608240492641926,
+      "reward_std": 0.9770755022764206,
+      "step": 53
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2374.232635498047,
+      "dapo/avg_reward_std": 0.34054997433786804,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.500000013605408,
+      "dapo/num_sampling_attempts": 2.875,
+      "dapo/sampling_efficiency": 37.916666666666664,
+      "dapo/total_prompts_processed": 17.25,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.061714285714285715,
+      "grad_norm": 0.013303548097610474,
+      "kl": 0.0001438036561012268,
+      "learning_rate": 5.813904131848564e-07,
+      "loss": 0.0614,
+      "reward": 0.75572844222188,
+      "reward_std": 0.9565529599785805,
+      "step": 54
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2442.232666015625,
+      "dapo/avg_reward_std": 0.27056889484326047,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.4097222263614337,
+      "dapo/num_sampling_attempts": 3.0,
+      "dapo/sampling_efficiency": 45.83333333333333,
+      "dapo/total_prompts_processed": 18.0,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.06285714285714286,
+      "grad_norm": 0.011922283098101616,
+      "kl": 0.00014710426330566406,
+      "learning_rate": 5.657047735161255e-07,
+      "loss": 0.0447,
+      "reward": 0.6145301992073655,
+      "reward_std": 0.9308876842260361,
+      "step": 55
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2163.7604064941406,
+      "dapo/avg_reward_std": 0.306766193537485,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.47619048612458365,
+      "dapo/num_sampling_attempts": 2.625,
+      "dapo/sampling_efficiency": 57.291666666666664,
+      "dapo/total_prompts_processed": 15.75,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.064,
+      "grad_norm": 0.009786682203412056,
+      "kl": 0.00011900067329406738,
+      "learning_rate": 5.5e-07,
+      "loss": 0.0353,
+      "reward": 0.7467220462858677,
+      "reward_std": 0.9404179230332375,
+      "step": 56
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 1992.7430953979492,
+      "dapo/avg_reward_std": 0.21240893006324768,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.2631579006188794,
+      "dapo/num_sampling_attempts": 4.75,
+      "dapo/sampling_efficiency": 27.708333333333332,
+      "dapo/total_prompts_processed": 28.5,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.06514285714285714,
+      "grad_norm": 0.015636112540960312,
+      "kl": 0.00013278424739837646,
+      "learning_rate": 5.342952264838747e-07,
+      "loss": 0.0652,
+      "reward": 0.5448480695486069,
+      "reward_std": 0.8946049734950066,
+      "step": 57
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 1786.927101135254,
+      "dapo/avg_reward_std": 0.27395731459061307,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.479166679084301,
+      "dapo/num_sampling_attempts": 3.0,
+      "dapo/sampling_efficiency": 51.979166666666664,
+      "dapo/total_prompts_processed": 18.0,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.06628571428571428,
+      "grad_norm": 0.012302345596253872,
+      "kl": 0.00010266900062561035,
+      "learning_rate": 5.186095868151436e-07,
+      "loss": 0.0222,
+      "reward": 0.7567729391157627,
+      "reward_std": 0.9539604857563972,
+      "step": 58
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 1871.125015258789,
+      "dapo/avg_reward_std": 0.26716366639504063,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.3461538547506699,
+      "dapo/num_sampling_attempts": 3.25,
+      "dapo/sampling_efficiency": 51.25,
+      "dapo/total_prompts_processed": 19.5,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.06742857142857143,
+      "grad_norm": 0.012423303909599781,
+      "kl": 0.00013174861669540405,
+      "learning_rate": 5.02962191529556e-07,
+      "loss": 0.0051,
+      "reward": 0.5472707431763411,
+      "reward_std": 0.9848242700099945,
+      "step": 59
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2110.0104446411133,
+      "dapo/avg_reward_std": 0.27772934675216676,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.3933333379030228,
+      "dapo/num_sampling_attempts": 3.125,
+      "dapo/sampling_efficiency": 55.416666666666664,
+      "dapo/total_prompts_processed": 18.75,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.06857142857142857,
+      "grad_norm": 0.010305487550795078,
+      "kl": 0.00013266503810882568,
+      "learning_rate": 4.873721045679706e-07,
+      "loss": -0.0051,
+      "reward": 0.5918029174208641,
+      "reward_std": 0.9419775605201721,
+      "step": 60
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 1820.1597595214844,
+      "dapo/avg_reward_std": 0.2844862639904022,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.351190483463662,
+      "dapo/num_sampling_attempts": 3.5,
+      "dapo/sampling_efficiency": 39.28571428571428,
+      "dapo/total_prompts_processed": 21.0,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.06971428571428571,
+      "grad_norm": 0.01057644933462143,
+      "kl": 9.304285049438477e-05,
+      "learning_rate": 4.7185832004988133e-07,
+      "loss": 0.0019,
+      "reward": 0.5361353289335966,
+      "reward_std": 0.9243106096982956,
+      "step": 61
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2268.913215637207,
+      "dapo/avg_reward_std": 0.2805037432246738,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.3456790220958215,
+      "dapo/num_sampling_attempts": 3.375,
+      "dapo/sampling_efficiency": 39.791666666666664,
+      "dapo/total_prompts_processed": 20.25,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.07085714285714285,
+      "grad_norm": 0.010327951982617378,
+      "kl": 0.00013640522956848145,
+      "learning_rate": 4.5643973913200837e-07,
+      "loss": 0.011,
+      "reward": 0.5703515652567148,
+      "reward_std": 0.9485230222344398,
+      "step": 62
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2150.541679382324,
+      "dapo/avg_reward_std": 0.3610766388868031,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.5000000164697045,
+      "dapo/num_sampling_attempts": 2.375,
+      "dapo/sampling_efficiency": 48.95833333333333,
+      "dapo/total_prompts_processed": 14.25,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.072,
+      "grad_norm": 0.01420843880623579,
+      "kl": 0.00017371773719787598,
+      "learning_rate": 4.4113514698014953e-07,
+      "loss": 0.027,
+      "reward": 0.8152667284011841,
+      "reward_std": 0.9553957208991051,
+      "step": 63
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2542.954879760742,
+      "dapo/avg_reward_std": 0.25789711397627124,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.4057971077120822,
+      "dapo/num_sampling_attempts": 2.875,
+      "dapo/sampling_efficiency": 55.0,
+      "dapo/total_prompts_processed": 17.25,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.07314285714285715,
+      "grad_norm": 0.010388275608420372,
+      "kl": 0.00016424059867858887,
+      "learning_rate": 4.2596318988235037e-07,
+      "loss": 0.0153,
+      "reward": 0.8328269198536873,
+      "reward_std": 0.946412943303585,
+      "step": 64
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2573.9132385253906,
+      "dapo/avg_reward_std": 0.27658049833206905,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.4682539779515493,
+      "dapo/num_sampling_attempts": 2.625,
+      "dapo/sampling_efficiency": 67.01388888888889,
+      "dapo/total_prompts_processed": 15.75,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.07428571428571429,
+      "grad_norm": 0.016587890684604645,
+      "kl": 0.0002205371856689453,
+      "learning_rate": 4.1094235253127374e-07,
+      "loss": 0.071,
+      "reward": 0.8272522762417793,
+      "reward_std": 0.9939362108707428,
+      "step": 65
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2272.4132080078125,
+      "dapo/avg_reward_std": 0.28441278512279194,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.38888889737427235,
+      "dapo/num_sampling_attempts": 3.0,
+      "dapo/sampling_efficiency": 49.375,
+      "dapo/total_prompts_processed": 18.0,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.07542857142857143,
+      "grad_norm": 0.01080800499767065,
+      "kl": 0.00015676021575927734,
+      "learning_rate": 3.9609093550344907e-07,
+      "loss": -0.0104,
+      "reward": 0.7243790216743946,
+      "reward_std": 1.0099836066365242,
+      "step": 66
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2551.920150756836,
+      "dapo/avg_reward_std": 0.29605763202363794,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.4242424314672297,
+      "dapo/num_sampling_attempts": 2.75,
+      "dapo/sampling_efficiency": 50.416666666666664,
+      "dapo/total_prompts_processed": 16.5,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.07657142857142857,
+      "grad_norm": 0.01253009494394064,
+      "kl": 0.0001944899559020996,
+      "learning_rate": 3.8142703296283953e-07,
+      "loss": 0.0544,
+      "reward": 0.7982187271118164,
+      "reward_std": 0.9796509444713593,
+      "step": 67
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2039.6910400390625,
+      "dapo/avg_reward_std": 0.3305485857029756,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.4305555634200573,
+      "dapo/num_sampling_attempts": 3.0,
+      "dapo/sampling_efficiency": 41.041666666666664,
+      "dapo/total_prompts_processed": 18.0,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.07771428571428571,
+      "grad_norm": 0.013196859508752823,
+      "kl": 0.00021713972091674805,
+      "learning_rate": 3.6696851061588994e-07,
+      "loss": 0.0185,
+      "reward": 0.8682084418833256,
+      "reward_std": 0.9861341118812561,
+      "step": 68
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2549.642364501953,
+      "dapo/avg_reward_std": 0.28639274001121523,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.4133333384990692,
+      "dapo/num_sampling_attempts": 3.125,
+      "dapo/sampling_efficiency": 38.95833333333333,
+      "dapo/total_prompts_processed": 18.75,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.07885714285714286,
+      "grad_norm": 0.010159006342291832,
+      "kl": 0.00016075372695922852,
+      "learning_rate": 3.5273298394491515e-07,
+      "loss": -0.0284,
+      "reward": 0.5912708025425673,
+      "reward_std": 0.9797485172748566,
+      "step": 69
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2719.5382232666016,
+      "dapo/avg_reward_std": 0.28611900960957565,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.351851859026485,
+      "dapo/num_sampling_attempts": 3.375,
+      "dapo/sampling_efficiency": 40.625,
+      "dapo/total_prompts_processed": 20.25,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.08,
+      "grad_norm": 0.011270755901932716,
+      "kl": 0.00022423267364501953,
+      "learning_rate": 3.387377967463493e-07,
+      "loss": 0.0265,
+      "reward": 0.5740308649837971,
+      "reward_std": 0.8749020621180534,
+      "step": 70
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2073.2916946411133,
+      "dapo/avg_reward_std": 0.28938476492961246,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.45833334264655906,
+      "dapo/num_sampling_attempts": 3.0,
+      "dapo/sampling_efficiency": 42.49999999999999,
+      "dapo/total_prompts_processed": 18.0,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.08114285714285714,
+      "grad_norm": 0.011867412365972996,
+      "kl": 0.0001347661018371582,
+      "learning_rate": 3.250000000000001e-07,
+      "loss": -0.0577,
+      "reward": 0.5955507848411798,
+      "reward_std": 0.9116542786359787,
+      "step": 71
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2239.322914123535,
+      "dapo/avg_reward_std": 0.30952110344713385,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.4469697041945024,
+      "dapo/num_sampling_attempts": 2.75,
+      "dapo/sampling_efficiency": 44.166666666666664,
+      "dapo/total_prompts_processed": 16.5,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.08228571428571428,
+      "grad_norm": 0.011070906184613705,
+      "kl": 0.000155717134475708,
+      "learning_rate": 3.115363310950578e-07,
+      "loss": 0.0339,
+      "reward": 0.7990612685680389,
+      "reward_std": 0.9683424234390259,
+      "step": 72
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2044.489601135254,
+      "dapo/avg_reward_std": 0.21984713185917248,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.3131313206571521,
+      "dapo/num_sampling_attempts": 4.125,
+      "dapo/sampling_efficiency": 36.77083333333333,
+      "dapo/total_prompts_processed": 24.75,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.08342857142857144,
+      "grad_norm": 0.014109701849520206,
+      "kl": 0.0001436173915863037,
+      "learning_rate": 2.9836319343816397e-07,
+      "loss": 0.085,
+      "reward": 0.8676656074821949,
+      "reward_std": 0.9657078757882118,
+      "step": 73
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 1958.7361068725586,
+      "dapo/avg_reward_std": 0.30799518460812775,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.4927536339863487,
+      "dapo/num_sampling_attempts": 2.875,
+      "dapo/sampling_efficiency": 40.625,
+      "dapo/total_prompts_processed": 17.25,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.08457142857142858,
+      "grad_norm": 0.013041837140917778,
+      "kl": 0.0001519918441772461,
+      "learning_rate": 2.854966364683872e-07,
+      "loss": 0.0492,
+      "reward": 0.6045123310759664,
+      "reward_std": 0.9384523630142212,
+      "step": 74
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 1523.1284942626953,
+      "dapo/avg_reward_std": 0.31539708146682155,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.391025647521019,
+      "dapo/num_sampling_attempts": 3.25,
+      "dapo/sampling_efficiency": 36.875,
+      "dapo/total_prompts_processed": 19.5,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.08571428571428572,
+      "grad_norm": 0.014472462236881256,
+      "kl": 0.0001392364501953125,
+      "learning_rate": 2.729523361034538e-07,
+      "loss": 0.0358,
+      "reward": 0.7163376174867153,
+      "reward_std": 0.9508332461118698,
+      "step": 75
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2640.7813110351562,
+      "dapo/avg_reward_std": 0.3144421911239624,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.44000000655651095,
+      "dapo/num_sampling_attempts": 3.125,
+      "dapo/sampling_efficiency": 43.75,
+      "dapo/total_prompts_processed": 18.75,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.08685714285714285,
+      "grad_norm": 0.011127221398055553,
+      "kl": 0.0002060532569885254,
+      "learning_rate": 2.6074557564105724e-07,
+      "loss": 0.0604,
+      "reward": 0.6046733632683754,
+      "reward_std": 0.9528723284602165,
+      "step": 76
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2088.7292098999023,
+      "dapo/avg_reward_std": 0.3257487453520298,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.4444444552063942,
+      "dapo/num_sampling_attempts": 3.0,
+      "dapo/sampling_efficiency": 45.31249999999999,
+      "dapo/total_prompts_processed": 18.0,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.088,
+      "grad_norm": 0.013021063059568405,
+      "kl": 0.00017440319061279297,
+      "learning_rate": 2.488912271385139e-07,
+      "loss": 0.0353,
+      "reward": 0.5843205824494362,
+      "reward_std": 0.9498706609010696,
+      "step": 77
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2710.0069580078125,
+      "dapo/avg_reward_std": 0.4117408903206096,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.5784313836518455,
+      "dapo/num_sampling_attempts": 2.125,
+      "dapo/sampling_efficiency": 52.08333333333333,
+      "dapo/total_prompts_processed": 12.75,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.08914285714285715,
+      "grad_norm": 0.00956858042627573,
+      "kl": 0.00020110607147216797,
+      "learning_rate": 2.374037332934512e-07,
+      "loss": -0.0019,
+      "reward": 0.7558267749845982,
+      "reward_std": 0.9872319549322128,
+      "step": 78
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2532.888916015625,
+      "dapo/avg_reward_std": 0.29725510747201983,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.34408603031789103,
+      "dapo/num_sampling_attempts": 3.875,
+      "dapo/sampling_efficiency": 31.696428571428562,
+      "dapo/total_prompts_processed": 23.25,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.09028571428571429,
+      "grad_norm": 0.010455719195306301,
+      "kl": 0.00019878149032592773,
+      "learning_rate": 2.2629708984760706e-07,
+      "loss": 0.0433,
+      "reward": 0.7071553282439709,
+      "reward_std": 0.936428040266037,
+      "step": 79
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2045.3507232666016,
+      "dapo/avg_reward_std": 0.24797727167606354,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.4318181872367859,
+      "dapo/num_sampling_attempts": 2.75,
+      "dapo/sampling_efficiency": 56.24999999999999,
+      "dapo/total_prompts_processed": 16.5,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.09142857142857143,
+      "grad_norm": 0.011657273396849632,
+      "kl": 0.00015923380851745605,
+      "learning_rate": 2.1558482853517253e-07,
+      "loss": 0.0016,
+      "reward": 0.8354307417757809,
+      "reward_std": 0.9478549808263779,
+      "step": 80
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2517.621482849121,
+      "dapo/avg_reward_std": 0.3837103931342854,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.5882353055126527,
+      "dapo/num_sampling_attempts": 2.125,
+      "dapo/sampling_efficiency": 56.24999999999999,
+      "dapo/total_prompts_processed": 12.75,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.09257142857142857,
+      "grad_norm": 0.011230596341192722,
+      "kl": 0.00020751357078552246,
+      "learning_rate": 2.0528000059645995e-07,
+      "loss": 0.0523,
+      "reward": 0.6180859599262476,
+      "reward_std": 0.9601781144738197,
+      "step": 81
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2189.6805725097656,
+      "dapo/avg_reward_std": 0.33485331758856773,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.47916667970518273,
+      "dapo/num_sampling_attempts": 3.0,
+      "dapo/sampling_efficiency": 42.70833333333333,
+      "dapo/total_prompts_processed": 18.0,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.09371428571428571,
+      "grad_norm": 0.01085925567895174,
+      "kl": 0.00018781423568725586,
+      "learning_rate": 1.9539516087697517e-07,
+      "loss": 0.0277,
+      "reward": 0.7506253309547901,
+      "reward_std": 0.9654112830758095,
+      "step": 82
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2063.197952270508,
+      "dapo/avg_reward_std": 0.3108914480322883,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.45238096444379716,
+      "dapo/num_sampling_attempts": 2.625,
+      "dapo/sampling_efficiency": 47.291666666666664,
+      "dapo/total_prompts_processed": 15.75,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.09485714285714286,
+      "grad_norm": 0.01137411966919899,
+      "kl": 0.00018197298049926758,
+      "learning_rate": 1.8594235253127372e-07,
+      "loss": 0.0165,
+      "reward": 0.6088770590722561,
+      "reward_std": 0.9752795398235321,
+      "step": 83
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2032.7708587646484,
+      "dapo/avg_reward_std": 0.35138528971444993,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.5000000070957911,
+      "dapo/num_sampling_attempts": 2.625,
+      "dapo/sampling_efficiency": 40.62499999999999,
+      "dapo/total_prompts_processed": 15.75,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.096,
+      "grad_norm": 0.009788557887077332,
+      "kl": 0.0001645982265472412,
+      "learning_rate": 1.7693309235023127e-07,
+      "loss": -0.0005,
+      "reward": 0.6485470458865166,
+      "reward_std": 0.8980466201901436,
+      "step": 84
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2723.2083892822266,
+      "dapo/avg_reward_std": 0.35491983592510223,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.5500000104308128,
+      "dapo/num_sampling_attempts": 2.5,
+      "dapo/sampling_efficiency": 46.87499999999999,
+      "dapo/total_prompts_processed": 15.0,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.09714285714285714,
+      "grad_norm": 0.012261813506484032,
+      "kl": 0.0002092123031616211,
+      "learning_rate": 1.6837835672960831e-07,
+      "loss": 0.0428,
+      "reward": 0.769347533583641,
+      "reward_std": 0.9622702524065971,
+      "step": 85
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2813.6979370117188,
+      "dapo/avg_reward_std": 0.31041908973739263,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.46825397582281203,
+      "dapo/num_sampling_attempts": 2.625,
+      "dapo/sampling_efficiency": 53.125,
+      "dapo/total_prompts_processed": 15.75,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.09828571428571428,
+      "grad_norm": 0.013307915069162846,
+      "kl": 0.00022363662719726562,
+      "learning_rate": 1.6028856829700258e-07,
+      "loss": 0.0893,
+      "reward": 0.7634551003575325,
+      "reward_std": 0.9385863840579987,
+      "step": 86
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2645.5486907958984,
+      "dapo/avg_reward_std": 0.29486309762658747,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.37931035099358396,
+      "dapo/num_sampling_attempts": 3.625,
+      "dapo/sampling_efficiency": 38.95833333333333,
+      "dapo/total_prompts_processed": 21.75,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.09942857142857142,
+      "grad_norm": 0.009606744162738323,
+      "kl": 0.00017684698104858398,
+      "learning_rate": 1.5267358321348285e-07,
+      "loss": 0.0337,
+      "reward": 0.6225443221628666,
+      "reward_std": 0.9135682806372643,
+      "step": 87
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2211.1111221313477,
+      "dapo/avg_reward_std": 0.2131810395254029,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.30092593075500595,
+      "dapo/num_sampling_attempts": 4.5,
+      "dapo/sampling_efficiency": 25.535714285714285,
+      "dapo/total_prompts_processed": 27.0,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.10057142857142858,
+      "grad_norm": 0.011731365695595741,
+      "kl": 0.00017218291759490967,
+      "learning_rate": 1.4554267916537495e-07,
+      "loss": 0.0114,
+      "reward": 0.574246758595109,
+      "reward_std": 0.9149169996380806,
+      "step": 88
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2617.9445037841797,
+      "dapo/avg_reward_std": 0.34073091808118317,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.5087719379287017,
+      "dapo/num_sampling_attempts": 2.375,
+      "dapo/sampling_efficiency": 47.91666666666666,
+      "dapo/total_prompts_processed": 14.25,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.10171428571428572,
+      "grad_norm": 0.013213962316513062,
+      "kl": 0.0002383589744567871,
+      "learning_rate": 1.3890454406082956e-07,
+      "loss": 0.072,
+      "reward": 0.7886459194123745,
+      "reward_std": 0.9416129812598228,
+      "step": 89
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2265.7743225097656,
+      "dapo/avg_reward_std": 0.39400896430015564,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.48412699145930155,
+      "dapo/num_sampling_attempts": 2.625,
+      "dapo/sampling_efficiency": 46.24999999999999,
+      "dapo/total_prompts_processed": 15.75,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.10285714285714286,
+      "grad_norm": 0.011279975064098835,
+      "kl": 0.00017967820167541504,
+      "learning_rate": 1.3276726544494571e-07,
+      "loss": 0.0115,
+      "reward": 0.8188270814716816,
+      "reward_std": 0.956598699092865,
+      "step": 90
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 1751.7951850891113,
+      "dapo/avg_reward_std": 0.346651555462317,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.44696970690380444,
+      "dapo/num_sampling_attempts": 2.75,
+      "dapo/sampling_efficiency": 46.875,
+      "dapo/total_prompts_processed": 16.5,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.104,
+      "grad_norm": 0.013495221734046936,
+      "kl": 0.00012958049774169922,
+      "learning_rate": 1.2713832064634125e-07,
+      "loss": 0.0244,
+      "reward": 0.7544833142310381,
+      "reward_std": 0.920841209590435,
+      "step": 91
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2176.5868530273438,
+      "dapo/avg_reward_std": 0.31276301860809325,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.3866666704416275,
+      "dapo/num_sampling_attempts": 3.125,
+      "dapo/sampling_efficiency": 42.410714285714285,
+      "dapo/total_prompts_processed": 18.75,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.10514285714285715,
+      "grad_norm": 0.014705290086567402,
+      "kl": 0.00018972158432006836,
+      "learning_rate": 1.220245676671809e-07,
+      "loss": 0.082,
+      "reward": 0.6609778106212616,
+      "reward_std": 0.9741540849208832,
+      "step": 92
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2418.0035095214844,
+      "dapo/avg_reward_std": 0.3533540232615037,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.45454546131870965,
+      "dapo/num_sampling_attempts": 2.75,
+      "dapo/sampling_efficiency": 50.416666666666664,
+      "dapo/total_prompts_processed": 16.5,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.10628571428571429,
+      "grad_norm": 0.014526835642755032,
+      "kl": 0.00022083520889282227,
+      "learning_rate": 1.1743223682775649e-07,
+      "loss": 0.0467,
+      "reward": 0.6240662466734648,
+      "reward_std": 0.9587830454111099,
+      "step": 93
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 1759.4409713745117,
+      "dapo/avg_reward_std": 0.31654878084858257,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.4166666741172473,
+      "dapo/num_sampling_attempts": 3.0,
+      "dapo/sampling_efficiency": 47.70833333333333,
+      "dapo/total_prompts_processed": 18.0,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.10742857142857143,
+      "grad_norm": 0.011724472045898438,
+      "kl": 0.00012111663818359375,
+      "learning_rate": 1.1336692317580158e-07,
+      "loss": -0.0008,
+      "reward": 0.8961930721998215,
+      "reward_std": 0.9275476858019829,
+      "step": 94
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 1968.3958435058594,
+      "dapo/avg_reward_std": 0.31933523178100587,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.36000000715255737,
+      "dapo/num_sampling_attempts": 3.125,
+      "dapo/sampling_efficiency": 41.666666666666664,
+      "dapo/total_prompts_processed": 18.75,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.10857142857142857,
+      "grad_norm": 0.012760731391608715,
+      "kl": 0.00015205144882202148,
+      "learning_rate": 1.0983357966978745e-07,
+      "loss": 0.0303,
+      "reward": 0.7966429069638252,
+      "reward_std": 0.9104023575782776,
+      "step": 95
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 1705.9930610656738,
+      "dapo/avg_reward_std": 0.26930796217035363,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.38888889771920665,
+      "dapo/num_sampling_attempts": 3.375,
+      "dapo/sampling_efficiency": 48.4375,
+      "dapo/total_prompts_processed": 20.25,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.10971428571428571,
+      "grad_norm": 0.016185246407985687,
+      "kl": 0.00014796853065490723,
+      "learning_rate": 1.068365111445064e-07,
+      "loss": -0.0016,
+      "reward": 0.7683778572827578,
+      "reward_std": 0.9466121271252632,
+      "step": 96
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2056.079864501953,
+      "dapo/avg_reward_std": 0.3310448744080283,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.48484849387949164,
+      "dapo/num_sampling_attempts": 2.75,
+      "dapo/sampling_efficiency": 51.785714285714285,
+      "dapo/total_prompts_processed": 16.5,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.11085714285714286,
+      "grad_norm": 0.010300490073859692,
+      "kl": 0.00016963481903076172,
+      "learning_rate": 1.0437936906629334e-07,
+      "loss": 0.0027,
+      "reward": 0.7596820928156376,
+      "reward_std": 0.9540099799633026,
+      "step": 97
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2592.8403244018555,
+      "dapo/avg_reward_std": 0.21406691299902425,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.3018018116016646,
+      "dapo/num_sampling_attempts": 4.625,
+      "dapo/sampling_efficiency": 30.376984126984123,
+      "dapo/total_prompts_processed": 27.75,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.112,
+      "grad_norm": 0.01034973282366991,
+      "kl": 0.000193670392036438,
+      "learning_rate": 1.0246514708427701e-07,
+      "loss": 0.0254,
+      "reward": 0.7206093966960907,
+      "reward_std": 0.9074158370494843,
+      "step": 98
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2686.343780517578,
+      "dapo/avg_reward_std": 0.24782394810959144,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.35802469926851765,
+      "dapo/num_sampling_attempts": 3.375,
+      "dapo/sampling_efficiency": 44.513888888888886,
+      "dapo/total_prompts_processed": 20.25,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.11314285714285714,
+      "grad_norm": 0.011502859182655811,
+      "kl": 0.00023734569549560547,
+      "learning_rate": 1.0109617738307911e-07,
+      "loss": 0.0346,
+      "reward": 0.6300379456952214,
+      "reward_std": 0.9057611152529716,
+      "step": 99
+    },
+    {
+      "clip_fraction": 0.0,
+      "completion_length": 2050.166664123535,
+      "dapo/avg_reward_std": 0.3082110931475957,
+      "dapo/filter_reward_index": 0.0,
+      "dapo/kept_prompts_ratio": 0.43055556404093903,
+      "dapo/num_sampling_attempts": 3.0,
+      "dapo/sampling_efficiency": 44.166666666666664,
+      "dapo/total_prompts_processed": 18.0,
+      "dapo/valid_prompts_collected": 6.0,
+      "epoch": 0.11428571428571428,
+      "grad_norm": 0.015181603841483593,
+      "kl": 0.00023311376571655273,
+      "learning_rate": 1.002741278414069e-07,
+      "loss": 0.0389,
+      "reward": 0.7550710588693619,
+      "reward_std": 0.9816905185580254,
+      "step": 100
+    },
+    {
+      "epoch": 0.11428571428571428,
+      "step": 100,
+      "total_flos": 0.0,
+      "train_loss": 0.03280465058982372,
+      "train_runtime": 132752.8895,
+      "train_samples_per_second": 0.036,
+      "train_steps_per_second": 0.001
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 100,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 10,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 0.0,
+  "train_batch_size": 6,
+  "trial_name": null,
+  "trial_params": null
+}