kangdawei committed on
Commit 3939526 · verified · 1 Parent(s): b69a83f

Model save

Files changed (4)
  1. README.md +2 -4
  2. all_results.json +4 -4
  3. train_results.json +4 -4
  4. trainer_state.json +1909 -9
README.md CHANGED
@@ -1,19 +1,17 @@
1
  ---
2
  base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
3
- datasets: knoveleng/open-rs
4
  library_name: transformers
5
  model_name: DAPO
6
  tags:
7
  - generated_from_trainer
8
- - open-r1
9
- - dapo
10
  - trl
 
11
  licence: license
12
  ---
13
 
14
  # Model Card for DAPO
15
 
16
- This model is a fine-tuned version of [deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B) on the [knoveleng/open-rs](https://huggingface.co/datasets/knoveleng/open-rs) dataset.
17
  It has been trained using [TRL](https://github.com/huggingface/trl).
18
 
19
  ## Quick start
 
1
  ---
2
  base_model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
 
3
  library_name: transformers
4
  model_name: DAPO
5
  tags:
6
  - generated_from_trainer
 
 
7
  - trl
8
+ - dapo
9
  licence: license
10
  ---
11
 
12
  # Model Card for DAPO
13
 
14
+ This model is a fine-tuned version of [deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B).
15
  It has been trained using [TRL](https://github.com/huggingface/trl).
16
 
17
  ## Quick start
all_results.json CHANGED
@@ -1,8 +1,8 @@
1
  {
2
  "total_flos": 0.0,
3
- "train_loss": 0.06302435559220612,
4
- "train_runtime": 83388.0569,
5
  "train_samples": 7000,
6
- "train_samples_per_second": 0.058,
7
- "train_steps_per_second": 0.001
8
  }
 
1
  {
2
  "total_flos": 0.0,
3
+ "train_loss": 0.02940896774176508,
4
+ "train_runtime": 83918.4654,
5
  "train_samples": 7000,
6
+ "train_samples_per_second": 0.114,
7
+ "train_steps_per_second": 0.002
8
  }
train_results.json CHANGED
@@ -1,8 +1,8 @@
1
  {
2
  "total_flos": 0.0,
3
- "train_loss": 0.06302435559220612,
4
- "train_runtime": 83388.0569,
5
  "train_samples": 7000,
6
- "train_samples_per_second": 0.058,
7
- "train_steps_per_second": 0.001
8
  }
 
1
  {
2
  "total_flos": 0.0,
3
+ "train_loss": 0.02940896774176508,
4
+ "train_runtime": 83918.4654,
5
  "train_samples": 7000,
6
+ "train_samples_per_second": 0.114,
7
+ "train_steps_per_second": 0.002
8
  }
trainer_state.json CHANGED
@@ -2,9 +2,9 @@
2
  "best_global_step": null,
3
  "best_metric": null,
4
  "best_model_checkpoint": null,
5
- "epoch": 0.11428571428571428,
6
  "eval_steps": 500,
7
- "global_step": 100,
8
  "is_hyper_param_search": false,
9
  "is_local_process_zero": true,
10
  "is_world_process_zero": true,
@@ -1910,17 +1910,1917 @@
1910
  "step": 100
1911
  },
1912
  {
1913
- "epoch": 0.11428571428571428,
1914
- "step": 100,
1915
  "total_flos": 0.0,
1916
- "train_loss": 0.06302435559220612,
1917
- "train_runtime": 83388.0569,
1918
- "train_samples_per_second": 0.058,
1919
- "train_steps_per_second": 0.001
1920
  }
1921
  ],
1922
  "logging_steps": 1,
1923
- "max_steps": 100,
1924
  "num_input_tokens_seen": 0,
1925
  "num_train_epochs": 1,
1926
  "save_steps": 10,
 
2
  "best_global_step": null,
3
  "best_metric": null,
4
  "best_model_checkpoint": null,
5
+ "epoch": 0.22857142857142856,
6
  "eval_steps": 500,
7
+ "global_step": 200,
8
  "is_hyper_param_search": false,
9
  "is_local_process_zero": true,
10
  "is_world_process_zero": true,
 
1910
  "step": 100
1911
  },
1912
  {
1913
+ "clip_fraction": 0.0,
1914
+ "completion_length": 3318.513916015625,
1915
+ "dapo/avg_reward_std": 0.22042016812733242,
1916
+ "dapo/filter_reward_index": 0.0,
1917
+ "dapo/kept_prompts_ratio": 0.29523810063089645,
1918
+ "dapo/num_sampling_attempts": 4.375,
1919
+ "dapo/sampling_efficiency": 28.645833333333332,
1920
+ "dapo/total_prompts_processed": 26.25,
1921
+ "dapo/valid_prompts_collected": 6.0,
1922
+ "epoch": 0.11542857142857142,
1923
+ "grad_norm": 0.22150926291942596,
1924
+ "kl": 0.011791229248046875,
1925
+ "learning_rate": 1e-07,
1926
+ "loss": 0.0631,
1927
+ "reward": 0.46524661034345627,
1928
+ "reward_std": 0.9665903598070145,
1929
+ "step": 101
1930
+ },
1931
+ {
1932
+ "clip_fraction": 0.0,
1933
+ "completion_length": 3083.875,
1934
+ "dapo/avg_reward_std": 0.21663353669232335,
1935
+ "dapo/filter_reward_index": 0.0,
1936
+ "dapo/kept_prompts_ratio": 0.3390804637095024,
1937
+ "dapo/num_sampling_attempts": 3.625,
1938
+ "dapo/sampling_efficiency": 39.93055555555555,
1939
+ "dapo/total_prompts_processed": 21.75,
1940
+ "dapo/valid_prompts_collected": 6.0,
1941
+ "epoch": 0.11657142857142858,
1942
+ "grad_norm": 0.16289636492729187,
1943
+ "kl": 0.008695602416992188,
1944
+ "learning_rate": 6.203955092681039e-07,
1945
+ "loss": 0.098,
1946
+ "reward": 0.8642945289611816,
1947
+ "reward_std": 1.031830094754696,
1948
+ "step": 102
1949
+ },
1950
+ {
1951
+ "clip_fraction": 0.0,
1952
+ "completion_length": 3364.701446533203,
1953
+ "dapo/avg_reward_std": 0.24887267331923207,
1954
+ "dapo/filter_reward_index": 0.0,
1955
+ "dapo/kept_prompts_ratio": 0.3172043090866458,
1956
+ "dapo/num_sampling_attempts": 3.875,
1957
+ "dapo/sampling_efficiency": 31.69642857142857,
1958
+ "dapo/total_prompts_processed": 23.25,
1959
+ "dapo/valid_prompts_collected": 6.0,
1960
+ "epoch": 0.11771428571428572,
1961
+ "grad_norm": 0.08825232833623886,
1962
+ "kl": 0.009820938110351562,
1963
+ "learning_rate": 6.126278954320294e-07,
1964
+ "loss": 0.0178,
1965
+ "reward": 0.3627179069444537,
1966
+ "reward_std": 0.8941863179206848,
1967
+ "step": 103
1968
+ },
1969
+ {
1970
+ "clip_fraction": 0.0,
1971
+ "completion_length": 3255.3055725097656,
1972
+ "dapo/avg_reward_std": 0.24808817549988074,
1973
+ "dapo/filter_reward_index": 0.0,
1974
+ "dapo/kept_prompts_ratio": 0.33950618074999916,
1975
+ "dapo/num_sampling_attempts": 3.375,
1976
+ "dapo/sampling_efficiency": 38.95833333333333,
1977
+ "dapo/total_prompts_processed": 20.25,
1978
+ "dapo/valid_prompts_collected": 6.0,
1979
+ "epoch": 0.11885714285714286,
1980
+ "grad_norm": 0.13638561964035034,
1981
+ "kl": 0.011318206787109375,
1982
+ "learning_rate": 6.048412045323164e-07,
1983
+ "loss": 0.0643,
1984
+ "reward": 0.5508436523377895,
1985
+ "reward_std": 0.9409585371613503,
1986
+ "step": 104
1987
+ },
1988
+ {
1989
+ "clip_fraction": 0.0,
1990
+ "completion_length": 3270.4930419921875,
1991
+ "dapo/avg_reward_std": 0.23700118958950042,
1992
+ "dapo/filter_reward_index": 0.0,
1993
+ "dapo/kept_prompts_ratio": 0.3166666706403097,
1994
+ "dapo/num_sampling_attempts": 3.75,
1995
+ "dapo/sampling_efficiency": 61.07142857142857,
1996
+ "dapo/total_prompts_processed": 22.5,
1997
+ "dapo/valid_prompts_collected": 6.0,
1998
+ "epoch": 0.12,
1999
+ "grad_norm": 0.10357476025819778,
2000
+ "kl": 0.0117034912109375,
2001
+ "learning_rate": 5.97037808470444e-07,
2002
+ "loss": 0.0278,
2003
+ "reward": 0.4137148158624768,
2004
+ "reward_std": 0.9205853268504143,
2005
+ "step": 105
2006
+ },
2007
+ {
2008
+ "clip_fraction": 0.0,
2009
+ "completion_length": 3118.9584045410156,
2010
+ "dapo/avg_reward_std": 0.22452521603554487,
2011
+ "dapo/filter_reward_index": 0.0,
2012
+ "dapo/kept_prompts_ratio": 0.3333333395421505,
2013
+ "dapo/num_sampling_attempts": 4.0,
2014
+ "dapo/sampling_efficiency": 28.869047619047613,
2015
+ "dapo/total_prompts_processed": 24.0,
2016
+ "dapo/valid_prompts_collected": 6.0,
2017
+ "epoch": 0.12114285714285715,
2018
+ "grad_norm": 0.11885393410921097,
2019
+ "kl": 0.011783599853515625,
2020
+ "learning_rate": 5.892200842364462e-07,
2021
+ "loss": 0.0786,
2022
+ "reward": 0.673494272865355,
2023
+ "reward_std": 0.9388571679592133,
2024
+ "step": 106
2025
+ },
2026
+ {
2027
+ "clip_fraction": 0.0,
2028
+ "completion_length": 3183.666717529297,
2029
+ "dapo/avg_reward_std": 0.23609773551716523,
2030
+ "dapo/filter_reward_index": 0.0,
2031
+ "dapo/kept_prompts_ratio": 0.30882353467099805,
2032
+ "dapo/num_sampling_attempts": 4.25,
2033
+ "dapo/sampling_efficiency": 37.74305555555556,
2034
+ "dapo/total_prompts_processed": 25.5,
2035
+ "dapo/valid_prompts_collected": 6.0,
2036
+ "epoch": 0.12228571428571429,
2037
+ "grad_norm": 0.13629400730133057,
2038
+ "kl": 0.0092010498046875,
2039
+ "learning_rate": 5.813904131848564e-07,
2040
+ "loss": 0.0615,
2041
+ "reward": 0.5680118557065725,
2042
+ "reward_std": 0.8982010260224342,
2043
+ "step": 107
2044
+ },
2045
+ {
2046
+ "clip_fraction": 0.0,
2047
+ "completion_length": 3170.263916015625,
2048
+ "dapo/avg_reward_std": 0.21017570431168014,
2049
+ "dapo/filter_reward_index": 0.0,
2050
+ "dapo/kept_prompts_ratio": 0.3018018079770578,
2051
+ "dapo/num_sampling_attempts": 4.625,
2052
+ "dapo/sampling_efficiency": 30.625,
2053
+ "dapo/total_prompts_processed": 27.75,
2054
+ "dapo/valid_prompts_collected": 6.0,
2055
+ "epoch": 0.12342857142857143,
2056
+ "grad_norm": 0.1134539544582367,
2057
+ "kl": 0.010692596435546875,
2058
+ "learning_rate": 5.735511803093248e-07,
2059
+ "loss": 0.0433,
2060
+ "reward": 0.6368884779512882,
2061
+ "reward_std": 0.9655679985880852,
2062
+ "step": 108
2063
+ },
2064
+ {
2065
+ "clip_fraction": 0.0,
2066
+ "completion_length": 2938.5243530273438,
2067
+ "dapo/avg_reward_std": 0.30796096875117374,
2068
+ "dapo/filter_reward_index": 0.0,
2069
+ "dapo/kept_prompts_ratio": 0.3974359052685591,
2070
+ "dapo/num_sampling_attempts": 3.25,
2071
+ "dapo/sampling_efficiency": 38.95833333333333,
2072
+ "dapo/total_prompts_processed": 19.5,
2073
+ "dapo/valid_prompts_collected": 6.0,
2074
+ "epoch": 0.12457142857142857,
2075
+ "grad_norm": 0.16064728796482086,
2076
+ "kl": 0.014812469482421875,
2077
+ "learning_rate": 5.657047735161255e-07,
2078
+ "loss": 0.0874,
2079
+ "reward": 0.4405923653393984,
2080
+ "reward_std": 0.899710550904274,
2081
+ "step": 109
2082
+ },
2083
+ {
2084
+ "clip_fraction": 0.0,
2085
+ "completion_length": 3333.5556030273438,
2086
+ "dapo/avg_reward_std": 0.17683410130698105,
2087
+ "dapo/filter_reward_index": 0.0,
2088
+ "dapo/kept_prompts_ratio": 0.28735632475080164,
2089
+ "dapo/num_sampling_attempts": 3.625,
2090
+ "dapo/sampling_efficiency": 40.104166666666664,
2091
+ "dapo/total_prompts_processed": 21.75,
2092
+ "dapo/valid_prompts_collected": 6.0,
2093
+ "epoch": 0.12571428571428572,
2094
+ "grad_norm": 0.1374766230583191,
2095
+ "kl": 0.00823211669921875,
2096
+ "learning_rate": 5.578535828967777e-07,
2097
+ "loss": 0.0525,
2098
+ "reward": 0.6373127717524767,
2099
+ "reward_std": 0.949370414018631,
2100
+ "step": 110
2101
+ },
2102
+ {
2103
+ "clip_fraction": 0.0,
2104
+ "completion_length": 3404.166717529297,
2105
+ "dapo/avg_reward_std": 0.2707539377734065,
2106
+ "dapo/filter_reward_index": 0.0,
2107
+ "dapo/kept_prompts_ratio": 0.3437500074505806,
2108
+ "dapo/num_sampling_attempts": 4.0,
2109
+ "dapo/sampling_efficiency": 28.124999999999996,
2110
+ "dapo/total_prompts_processed": 24.0,
2111
+ "dapo/valid_prompts_collected": 6.0,
2112
+ "epoch": 0.12685714285714286,
2113
+ "grad_norm": 0.09096160531044006,
2114
+ "kl": 0.0152435302734375,
2115
+ "learning_rate": 5.5e-07,
2116
+ "loss": 0.0286,
2117
+ "reward": 0.4166172882542014,
2118
+ "reward_std": 0.9417606145143509,
2119
+ "step": 111
2120
+ },
2121
+ {
2122
+ "clip_fraction": 0.0,
2123
+ "completion_length": 3306.263946533203,
2124
+ "dapo/avg_reward_std": 0.17227381931410896,
2125
+ "dapo/filter_reward_index": 0.0,
2126
+ "dapo/kept_prompts_ratio": 0.21481482055452134,
2127
+ "dapo/num_sampling_attempts": 5.625,
2128
+ "dapo/sampling_efficiency": 27.395833333333332,
2129
+ "dapo/total_prompts_processed": 33.75,
2130
+ "dapo/valid_prompts_collected": 6.0,
2131
+ "epoch": 0.128,
2132
+ "grad_norm": 0.11950567364692688,
2133
+ "kl": 0.01320648193359375,
2134
+ "learning_rate": 5.421464171032224e-07,
2135
+ "loss": 0.0449,
2136
+ "reward": 0.4937558462843299,
2137
+ "reward_std": 0.9720155894756317,
2138
+ "step": 112
2139
+ },
2140
+ {
2141
+ "clip_fraction": 0.0,
2142
+ "completion_length": 3117.1979064941406,
2143
+ "dapo/avg_reward_std": 0.30339551545106447,
2144
+ "dapo/filter_reward_index": 0.0,
2145
+ "dapo/kept_prompts_ratio": 0.3846153886272357,
2146
+ "dapo/num_sampling_attempts": 3.25,
2147
+ "dapo/sampling_efficiency": 38.95833333333333,
2148
+ "dapo/total_prompts_processed": 19.5,
2149
+ "dapo/valid_prompts_collected": 6.0,
2150
+ "epoch": 0.12914285714285714,
2151
+ "grad_norm": 0.15823398530483246,
2152
+ "kl": 0.01418304443359375,
2153
+ "learning_rate": 5.342952264838747e-07,
2154
+ "loss": 0.0743,
2155
+ "reward": 0.5596551271155477,
2156
+ "reward_std": 0.8979872986674309,
2157
+ "step": 113
2158
+ },
2159
+ {
2160
+ "clip_fraction": 0.0,
2161
+ "completion_length": 3239.031280517578,
2162
+ "dapo/avg_reward_std": 0.24120492219924927,
2163
+ "dapo/filter_reward_index": 0.0,
2164
+ "dapo/kept_prompts_ratio": 0.34000000298023225,
2165
+ "dapo/num_sampling_attempts": 3.125,
2166
+ "dapo/sampling_efficiency": 56.770833333333336,
2167
+ "dapo/total_prompts_processed": 18.75,
2168
+ "dapo/valid_prompts_collected": 6.0,
2169
+ "epoch": 0.13028571428571428,
2170
+ "grad_norm": 0.20106364786624908,
2171
+ "kl": 0.01206207275390625,
2172
+ "learning_rate": 5.264488196906752e-07,
2173
+ "loss": 0.0817,
2174
+ "reward": 0.697497084736824,
2175
+ "reward_std": 0.9489930346608162,
2176
+ "step": 114
2177
+ },
2178
+ {
2179
+ "clip_fraction": 0.0,
2180
+ "completion_length": 3197.2430725097656,
2181
+ "dapo/avg_reward_std": 0.20663932577157632,
2182
+ "dapo/filter_reward_index": 0.0,
2183
+ "dapo/kept_prompts_ratio": 0.26495727056112045,
2184
+ "dapo/num_sampling_attempts": 4.875,
2185
+ "dapo/sampling_efficiency": 38.4375,
2186
+ "dapo/total_prompts_processed": 29.25,
2187
+ "dapo/valid_prompts_collected": 6.0,
2188
+ "epoch": 0.13142857142857142,
2189
+ "grad_norm": 0.15399962663650513,
2190
+ "kl": 0.015567779541015625,
2191
+ "learning_rate": 5.186095868151436e-07,
2192
+ "loss": 0.0667,
2193
+ "reward": 0.5802914081141353,
2194
+ "reward_std": 0.9295158162713051,
2195
+ "step": 115
2196
+ },
2197
+ {
2198
+ "clip_fraction": 0.0,
2199
+ "completion_length": 3272.6007080078125,
2200
+ "dapo/avg_reward_std": 0.22710687816143035,
2201
+ "dapo/filter_reward_index": 0.0,
2202
+ "dapo/kept_prompts_ratio": 0.3166666701436043,
2203
+ "dapo/num_sampling_attempts": 3.75,
2204
+ "dapo/sampling_efficiency": 37.61904761904762,
2205
+ "dapo/total_prompts_processed": 22.5,
2206
+ "dapo/valid_prompts_collected": 6.0,
2207
+ "epoch": 0.13257142857142856,
2208
+ "grad_norm": 0.140142023563385,
2209
+ "kl": 0.01934814453125,
2210
+ "learning_rate": 5.107799157635538e-07,
2211
+ "loss": 0.0611,
2212
+ "reward": 0.6176847349852324,
2213
+ "reward_std": 0.944318100810051,
2214
+ "step": 116
2215
+ },
2216
+ {
2217
+ "clip_fraction": 0.0,
2218
+ "completion_length": 3268.4305725097656,
2219
+ "dapo/avg_reward_std": 0.23266587586238466,
2220
+ "dapo/filter_reward_index": 0.0,
2221
+ "dapo/kept_prompts_ratio": 0.344827591345228,
2222
+ "dapo/num_sampling_attempts": 3.625,
2223
+ "dapo/sampling_efficiency": 38.125,
2224
+ "dapo/total_prompts_processed": 21.75,
2225
+ "dapo/valid_prompts_collected": 6.0,
2226
+ "epoch": 0.1337142857142857,
2227
+ "grad_norm": 0.1582440286874771,
2228
+ "kl": 0.01198577880859375,
2229
+ "learning_rate": 5.02962191529556e-07,
2230
+ "loss": 0.0556,
2231
+ "reward": 0.5785031230188906,
2232
+ "reward_std": 0.954645112156868,
2233
+ "step": 117
2234
+ },
2235
+ {
2236
+ "clip_fraction": 0.0,
2237
+ "completion_length": 2941.9722595214844,
2238
+ "dapo/avg_reward_std": 0.24969401342027328,
2239
+ "dapo/filter_reward_index": 0.0,
2240
+ "dapo/kept_prompts_ratio": 0.3284313814604984,
2241
+ "dapo/num_sampling_attempts": 4.25,
2242
+ "dapo/sampling_efficiency": 27.20238095238095,
2243
+ "dapo/total_prompts_processed": 25.5,
2244
+ "dapo/valid_prompts_collected": 6.0,
2245
+ "epoch": 0.13485714285714287,
2246
+ "grad_norm": 0.1869765818119049,
2247
+ "kl": 0.01676177978515625,
2248
+ "learning_rate": 4.951587954676837e-07,
2249
+ "loss": 0.1063,
2250
+ "reward": 0.6486848145723343,
2251
+ "reward_std": 0.9332743212580681,
2252
+ "step": 118
2253
+ },
2254
+ {
2255
+ "clip_fraction": 0.0,
2256
+ "completion_length": 3206.982635498047,
2257
+ "dapo/avg_reward_std": 0.20580977627209254,
2258
+ "dapo/filter_reward_index": 0.0,
2259
+ "dapo/kept_prompts_ratio": 0.26666667333671024,
2260
+ "dapo/num_sampling_attempts": 4.375,
2261
+ "dapo/sampling_efficiency": 41.28472222222222,
2262
+ "dapo/total_prompts_processed": 26.25,
2263
+ "dapo/valid_prompts_collected": 6.0,
2264
+ "epoch": 0.136,
2265
+ "grad_norm": 0.13004696369171143,
2266
+ "kl": 0.015842437744140625,
2267
+ "learning_rate": 4.873721045679706e-07,
2268
+ "loss": 0.0453,
2269
+ "reward": 0.4798949249088764,
2270
+ "reward_std": 0.9390313774347305,
2271
+ "step": 119
2272
+ },
2273
+ {
2274
+ "clip_fraction": 0.0,
2275
+ "completion_length": 3015.545135498047,
2276
+ "dapo/avg_reward_std": 0.22217401381461852,
2277
+ "dapo/filter_reward_index": 0.0,
2278
+ "dapo/kept_prompts_ratio": 0.3548387149649282,
2279
+ "dapo/num_sampling_attempts": 3.875,
2280
+ "dapo/sampling_efficiency": 28.95833333333333,
2281
+ "dapo/total_prompts_processed": 23.25,
2282
+ "dapo/valid_prompts_collected": 6.0,
2283
+ "epoch": 0.13714285714285715,
2284
+ "grad_norm": 0.229897141456604,
2285
+ "kl": 0.02198028564453125,
2286
+ "learning_rate": 4.79604490731896e-07,
2287
+ "loss": 0.0749,
2288
+ "reward": 0.7311479561030865,
2289
+ "reward_std": 0.9607837572693825,
2290
+ "step": 120
2291
+ },
2292
+ {
2293
+ "clip_fraction": 0.0,
2294
+ "completion_length": 3098.656280517578,
2295
+ "dapo/avg_reward_std": 0.22588159143924713,
2296
+ "dapo/filter_reward_index": 0.0,
2297
+ "dapo/kept_prompts_ratio": 0.32777778506278993,
2298
+ "dapo/num_sampling_attempts": 3.75,
2299
+ "dapo/sampling_efficiency": 44.613095238095234,
2300
+ "dapo/total_prompts_processed": 22.5,
2301
+ "dapo/valid_prompts_collected": 6.0,
2302
+ "epoch": 0.1382857142857143,
2303
+ "grad_norm": 0.13800247013568878,
2304
+ "kl": 0.014202117919921875,
2305
+ "learning_rate": 4.7185832004988133e-07,
2306
+ "loss": 0.0814,
2307
+ "reward": 0.8461479842662811,
2308
+ "reward_std": 0.9660850539803505,
2309
+ "step": 121
2310
+ },
2311
+ {
2312
+ "clip_fraction": 0.0,
2313
+ "completion_length": 3064.3924255371094,
2314
+ "dapo/avg_reward_std": 0.16500467896461488,
2315
+ "dapo/filter_reward_index": 0.0,
2316
+ "dapo/kept_prompts_ratio": 0.19666667193174361,
2317
+ "dapo/num_sampling_attempts": 6.25,
2318
+ "dapo/sampling_efficiency": 21.07142857142857,
2319
+ "dapo/total_prompts_processed": 37.5,
2320
+ "dapo/valid_prompts_collected": 6.0,
2321
+ "epoch": 0.13942857142857143,
2322
+ "grad_norm": 0.1680934727191925,
2323
+ "kl": 0.01361083984375,
2324
+ "learning_rate": 4.641359520805548e-07,
2325
+ "loss": 0.066,
2326
+ "reward": 0.7812346797436476,
2327
+ "reward_std": 0.9529108256101608,
2328
+ "step": 122
2329
+ },
2330
+ {
2331
+ "clip_fraction": 0.0,
2332
+ "completion_length": 3097.4861755371094,
2333
+ "dapo/avg_reward_std": 0.22939075000824466,
2334
+ "dapo/filter_reward_index": 0.0,
2335
+ "dapo/kept_prompts_ratio": 0.33333334038334506,
2336
+ "dapo/num_sampling_attempts": 3.875,
2337
+ "dapo/sampling_efficiency": 33.75,
2338
+ "dapo/total_prompts_processed": 23.25,
2339
+ "dapo/valid_prompts_collected": 6.0,
2340
+ "epoch": 0.14057142857142857,
2341
+ "grad_norm": 0.18081900477409363,
2342
+ "kl": 0.014842987060546875,
2343
+ "learning_rate": 4.5643973913200837e-07,
2344
+ "loss": 0.0877,
2345
+ "reward": 0.7531900368630886,
2346
+ "reward_std": 0.9868133068084717,
2347
+ "step": 123
2348
+ },
2349
+ {
2350
+ "clip_fraction": 0.0,
2351
+ "completion_length": 3203.888885498047,
2352
+ "dapo/avg_reward_std": 0.24352495979379724,
2353
+ "dapo/filter_reward_index": 0.0,
2354
+ "dapo/kept_prompts_ratio": 0.35185185737080044,
2355
+ "dapo/num_sampling_attempts": 3.375,
2356
+ "dapo/sampling_efficiency": 43.05555555555556,
2357
+ "dapo/total_prompts_processed": 20.25,
2358
+ "dapo/valid_prompts_collected": 6.0,
2359
+ "epoch": 0.1417142857142857,
2360
+ "grad_norm": 0.16807734966278076,
2361
+ "kl": 0.0139007568359375,
2362
+ "learning_rate": 4.4877202554526084e-07,
2363
+ "loss": 0.0612,
2364
+ "reward": 0.715996683575213,
2365
+ "reward_std": 0.9595553278923035,
2366
+ "step": 124
2367
+ },
2368
+ {
2369
+ "clip_fraction": 0.0,
2370
+ "completion_length": 2885.5625610351562,
2371
+ "dapo/avg_reward_std": 0.2548297820612788,
2372
+ "dapo/filter_reward_index": 0.0,
2373
+ "dapo/kept_prompts_ratio": 0.31770833814516664,
2374
+ "dapo/num_sampling_attempts": 4.0,
2375
+ "dapo/sampling_efficiency": 27.20238095238095,
2376
+ "dapo/total_prompts_processed": 24.0,
2377
+ "dapo/valid_prompts_collected": 6.0,
2378
+ "epoch": 0.14285714285714285,
2379
+ "grad_norm": 0.16355834901332855,
2380
+ "kl": 0.02027130126953125,
2381
+ "learning_rate": 4.4113514698014953e-07,
2382
+ "loss": 0.0597,
2383
+ "reward": 0.8311022147536278,
2384
+ "reward_std": 0.9600836709141731,
2385
+ "step": 125
2386
+ },
2387
+ {
2388
+ "clip_fraction": 0.0,
2389
+ "completion_length": 3250.843780517578,
2390
+ "dapo/avg_reward_std": 0.2203440727858708,
2391
+ "dapo/filter_reward_index": 0.0,
2392
+ "dapo/kept_prompts_ratio": 0.32758621152105005,
2393
+ "dapo/num_sampling_attempts": 3.625,
2394
+ "dapo/sampling_efficiency": 46.770833333333336,
2395
+ "dapo/total_prompts_processed": 21.75,
2396
+ "dapo/valid_prompts_collected": 6.0,
2397
+ "epoch": 0.144,
2398
+ "grad_norm": 0.18190248310565948,
2399
+ "kl": 0.0158843994140625,
2400
+ "learning_rate": 4.3353142970386557e-07,
2401
+ "loss": 0.068,
2402
+ "reward": 0.7400151332840323,
2403
+ "reward_std": 0.9569809287786484,
2404
+ "step": 126
2405
+ },
2406
+ {
2407
+ "clip_fraction": 0.0,
2408
+ "completion_length": 3264.420166015625,
2409
+ "dapo/avg_reward_std": 0.25137073759521755,
2410
+ "dapo/filter_reward_index": 0.0,
2411
+ "dapo/kept_prompts_ratio": 0.41666667429464205,
2412
+ "dapo/num_sampling_attempts": 3.5,
2413
+ "dapo/sampling_efficiency": 40.11904761904761,
2414
+ "dapo/total_prompts_processed": 21.0,
2415
+ "dapo/valid_prompts_collected": 6.0,
2416
+ "epoch": 0.14514285714285713,
2417
+ "grad_norm": 0.17950685322284698,
2418
+ "kl": 0.0223236083984375,
2419
+ "learning_rate": 4.2596318988235037e-07,
2420
+ "loss": 0.0528,
2421
+ "reward": 0.5194851458072662,
2422
+ "reward_std": 0.9414050430059433,
2423
+ "step": 127
2424
+ },
2425
+ {
2426
+ "clip_fraction": 0.0,
2427
+ "completion_length": 2892.9132690429688,
2428
+ "dapo/avg_reward_std": 0.2416491061449051,
2429
+ "dapo/filter_reward_index": 0.0,
2430
+ "dapo/kept_prompts_ratio": 0.2631579002267436,
2431
+ "dapo/num_sampling_attempts": 4.75,
2432
+ "dapo/sampling_efficiency": 26.9047619047619,
2433
+ "dapo/total_prompts_processed": 28.5,
2434
+ "dapo/valid_prompts_collected": 6.0,
2435
+ "epoch": 0.1462857142857143,
2436
+ "grad_norm": 0.25602471828460693,
2437
+ "kl": 0.02016448974609375,
2438
+ "learning_rate": 4.1843273287476854e-07,
2439
+ "loss": 0.0933,
2440
+ "reward": 0.8592288717627525,
2441
+ "reward_std": 0.9212958365678787,
2442
+ "step": 128
2443
+ },
2444
+ {
2445
+ "clip_fraction": 0.0,
2446
+ "completion_length": 3146.6944580078125,
2447
+ "dapo/avg_reward_std": 0.22558308675371366,
2448
+ "dapo/filter_reward_index": 0.0,
2449
+ "dapo/kept_prompts_ratio": 0.3218390854268238,
2450
+ "dapo/num_sampling_attempts": 3.625,
2451
+ "dapo/sampling_efficiency": 54.07738095238095,
2452
+ "dapo/total_prompts_processed": 21.75,
2453
+ "dapo/valid_prompts_collected": 6.0,
2454
+ "epoch": 0.14742857142857144,
2455
+ "grad_norm": 0.21352027356624603,
2456
+ "kl": 0.0198211669921875,
2457
+ "learning_rate": 4.1094235253127374e-07,
2458
+ "loss": 0.0679,
2459
+ "reward": 0.5732525363564491,
2460
+ "reward_std": 0.9645283669233322,
2461
+ "step": 129
2462
+ },
2463
+ {
2464
+ "clip_fraction": 0.0,
2465
+ "completion_length": 3248.4236450195312,
2466
+ "dapo/avg_reward_std": 0.35807471639580196,
2467
+ "dapo/filter_reward_index": 0.0,
2468
+ "dapo/kept_prompts_ratio": 0.5000000066227384,
2469
+ "dapo/num_sampling_attempts": 2.25,
2470
+ "dapo/sampling_efficiency": 51.041666666666664,
2471
+ "dapo/total_prompts_processed": 13.5,
2472
+ "dapo/valid_prompts_collected": 6.0,
2473
+ "epoch": 0.14857142857142858,
2474
+ "grad_norm": 0.1599435657262802,
2475
+ "kl": 0.0216827392578125,
2476
+ "learning_rate": 4.034943304942796e-07,
2477
+ "loss": 0.0443,
2478
+ "reward": 0.5955070666968822,
2479
+ "reward_std": 0.9924386888742447,
2480
+ "step": 130
2481
+ },
2482
+ {
2483
+ "clip_fraction": 0.0,
2484
+ "completion_length": 2958.5347595214844,
2485
+ "dapo/avg_reward_std": 0.18185590389298228,
2486
+ "dapo/filter_reward_index": 0.0,
2487
+ "dapo/kept_prompts_ratio": 0.23170731998071437,
2488
+ "dapo/num_sampling_attempts": 5.125,
2489
+ "dapo/sampling_efficiency": 24.945436507936506,
2490
+ "dapo/total_prompts_processed": 30.75,
2491
+ "dapo/valid_prompts_collected": 6.0,
2492
+ "epoch": 0.14971428571428572,
2493
+ "grad_norm": 0.21188445389270782,
2494
+ "kl": 0.02074432373046875,
2495
+ "learning_rate": 3.9609093550344907e-07,
2496
+ "loss": 0.0628,
2497
+ "reward": 0.8608505353331566,
2498
+ "reward_std": 0.9059992283582687,
2499
+ "step": 131
2500
+ },
2501
+ {
2502
+ "clip_fraction": 0.0,
2503
+ "completion_length": 3019.888931274414,
2504
+ "dapo/avg_reward_std": 0.3038036392794715,
2505
+ "dapo/filter_reward_index": 0.0,
2506
+ "dapo/kept_prompts_ratio": 0.36419753785486575,
2507
+ "dapo/num_sampling_attempts": 3.375,
2508
+ "dapo/sampling_efficiency": 38.33333333333333,
2509
+ "dapo/total_prompts_processed": 20.25,
2510
+ "dapo/valid_prompts_collected": 6.0,
2511
+ "epoch": 0.15085714285714286,
2512
+ "grad_norm": 0.19752100110054016,
2513
+ "kl": 0.024078369140625,
2514
+ "learning_rate": 3.8873442270461485e-07,
2515
+ "loss": 0.0698,
2516
+ "reward": 0.7191393785178661,
2517
+ "reward_std": 0.9548436179757118,
2518
+ "step": 132
2519
+ },
2520
+ {
2521
+ "clip_fraction": 0.0,
2522
+ "completion_length": 3251.6909790039062,
2523
+ "dapo/avg_reward_std": 0.17617152915114448,
2524
+ "dapo/filter_reward_index": 0.0,
2525
+ "dapo/kept_prompts_ratio": 0.22222222494227545,
2526
+ "dapo/num_sampling_attempts": 5.25,
2527
+ "dapo/sampling_efficiency": 31.369047619047613,
2528
+ "dapo/total_prompts_processed": 31.5,
2529
+ "dapo/valid_prompts_collected": 6.0,
2530
+ "epoch": 0.152,
2531
+ "grad_norm": 0.1220565065741539,
2532
+ "kl": 0.01824951171875,
2533
+ "learning_rate": 3.8142703296283953e-07,
2534
+ "loss": 0.0249,
2535
+ "reward": 0.3546891317819245,
2536
+ "reward_std": 0.9377138167619705,
2537
+ "step": 133
2538
+ },
2539
+ {
2540
+ "clip_fraction": 0.0,
2541
+ "completion_length": 3146.545196533203,
2542
+ "dapo/avg_reward_std": 0.2565364229679108,
2543
+ "dapo/filter_reward_index": 0.0,
2544
+ "dapo/kept_prompts_ratio": 0.32666667103767394,
2545
+ "dapo/num_sampling_attempts": 3.125,
2546
+ "dapo/sampling_efficiency": 47.08333333333333,
2547
+ "dapo/total_prompts_processed": 18.75,
2548
+ "dapo/valid_prompts_collected": 6.0,
2549
+ "epoch": 0.15314285714285714,
2550
+ "grad_norm": 0.15810362994670868,
2551
+ "kl": 0.03081512451171875,
2552
+ "learning_rate": 3.7417099217982686e-07,
2553
+ "loss": 0.0306,
2554
+ "reward": 0.5206232005730271,
2555
+ "reward_std": 0.9619846642017365,
2556
+ "step": 134
2557
+ },
2558
+ {
2559
+ "clip_fraction": 0.0,
2560
+ "completion_length": 3085.5972900390625,
2561
+ "dapo/avg_reward_std": 0.30491976333515985,
2562
+ "dapo/filter_reward_index": 0.0,
2563
+ "dapo/kept_prompts_ratio": 0.40476191469601225,
2564
+ "dapo/num_sampling_attempts": 3.5,
2565
+ "dapo/sampling_efficiency": 31.666666666666664,
2566
+ "dapo/total_prompts_processed": 21.0,
2567
+ "dapo/valid_prompts_collected": 6.0,
2568
+ "epoch": 0.15428571428571428,
2569
+ "grad_norm": 0.2133372277021408,
2570
+ "kl": 0.0204620361328125,
2571
+ "learning_rate": 3.6696851061588994e-07,
2572
+ "loss": 0.0681,
2573
+ "reward": 0.7713347226381302,
2574
+ "reward_std": 0.9403144493699074,
2575
+ "step": 135
2576
+ },
2577
+ {
2578
+ "clip_fraction": 0.0,
2579
+ "completion_length": 3326.295196533203,
2580
+ "dapo/avg_reward_std": 0.22884555886953306,
2581
+ "dapo/filter_reward_index": 0.0,
2582
+ "dapo/kept_prompts_ratio": 0.24358974817471626,
2583
+ "dapo/num_sampling_attempts": 4.875,
2584
+ "dapo/sampling_efficiency": 25.868055555555557,
2585
+ "dapo/total_prompts_processed": 29.25,
2586
+ "dapo/valid_prompts_collected": 6.0,
2587
+ "epoch": 0.15542857142857142,
2588
+ "grad_norm": 0.18792302906513214,
2589
+ "kl": 0.029754638671875,
2590
+ "learning_rate": 3.5982178221668533e-07,
2591
+ "loss": 0.0468,
2592
+ "reward": 0.5651950668543577,
2593
+ "reward_std": 0.9934203922748566,
2594
+ "step": 136
2595
+ },
2596
+ {
2597
+ "clip_fraction": 0.0,
2598
+ "completion_length": 3265.2882080078125,
2599
+ "dapo/avg_reward_std": 0.304972759137551,
2600
+ "dapo/filter_reward_index": 0.0,
2601
+ "dapo/kept_prompts_ratio": 0.43055556155741215,
2602
+ "dapo/num_sampling_attempts": 3.0,
2603
+ "dapo/sampling_efficiency": 54.375,
2604
+ "dapo/total_prompts_processed": 18.0,
2605
+ "dapo/valid_prompts_collected": 6.0,
2606
+ "epoch": 0.15657142857142858,
2607
+ "grad_norm": 0.13081717491149902,
2608
+ "kl": 0.0223846435546875,
2609
+ "learning_rate": 3.5273298394491515e-07,
2610
+ "loss": 0.0443,
2611
+ "reward": 0.5535581167787313,
2612
+ "reward_std": 0.9467164501547813,
2613
+ "step": 137
2614
+ },
2615
+ {
2616
+ "clip_fraction": 0.0,
2617
+ "completion_length": 2895.8646545410156,
2618
+ "dapo/avg_reward_std": 0.2690910736719767,
2619
+ "dapo/filter_reward_index": 0.0,
2620
+ "dapo/kept_prompts_ratio": 0.3333333387970924,
2621
+ "dapo/num_sampling_attempts": 3.75,
2622
+ "dapo/sampling_efficiency": 32.82738095238095,
2623
+ "dapo/total_prompts_processed": 22.5,
2624
+ "dapo/valid_prompts_collected": 6.0,
2625
+ "epoch": 0.15771428571428572,
2626
+ "grad_norm": 0.18165208399295807,
2627
+ "kl": 0.032073974609375,
2628
+ "learning_rate": 3.45704275117204e-07,
2629
+ "loss": 0.0288,
2630
+ "reward": 0.5253790076822042,
2631
+ "reward_std": 0.9247673749923706,
2632
+ "step": 138
2633
+ },
2634
+ {
2635
+ "clip_fraction": 0.0,
2636
+ "completion_length": 3049.8507080078125,
2637
+ "dapo/avg_reward_std": 0.2440622321196965,
2638
+ "dapo/filter_reward_index": 0.0,
2639
+ "dapo/kept_prompts_ratio": 0.33928572067192625,
2640
+ "dapo/num_sampling_attempts": 3.5,
2641
+ "dapo/sampling_efficiency": 40.11904761904761,
2642
+ "dapo/total_prompts_processed": 21.0,
2643
+ "dapo/valid_prompts_collected": 6.0,
2644
+ "epoch": 0.15885714285714286,
2645
+ "grad_norm": 0.19676071405410767,
2646
+ "kl": 0.03052520751953125,
2647
+ "learning_rate": 3.387377967463493e-07,
2648
+ "loss": 0.0477,
2649
+ "reward": 0.6778539270162582,
2650
+ "reward_std": 0.9344745948910713,
2651
+ "step": 139
2652
+ },
2653
+ {
2654
+ "clip_fraction": 0.0,
2655
+ "completion_length": 3029.0486450195312,
2656
+ "dapo/avg_reward_std": 0.3111469969153404,
2657
+ "dapo/filter_reward_index": 0.0,
2658
+ "dapo/kept_prompts_ratio": 0.4916666768491268,
2659
+ "dapo/num_sampling_attempts": 2.5,
2660
+ "dapo/sampling_efficiency": 41.666666666666664,
2661
+ "dapo/total_prompts_processed": 15.0,
2662
+ "dapo/valid_prompts_collected": 6.0,
2663
+ "epoch": 0.16,
2664
+ "grad_norm": 0.18594416975975037,
2665
+ "kl": 0.0277557373046875,
2666
+ "learning_rate": 3.3183567088914833e-07,
2667
+ "loss": 0.0431,
2668
+ "reward": 0.5210836753249168,
2669
+ "reward_std": 0.9851464107632637,
2670
+ "step": 140
2671
+ },
2672
+ {
2673
+ "clip_fraction": 0.0,
2674
+ "completion_length": 3151.5486755371094,
2675
+ "dapo/avg_reward_std": 0.23511080997330802,
2676
+ "dapo/filter_reward_index": 0.0,
2677
+ "dapo/kept_prompts_ratio": 0.3095238127878734,
2678
+ "dapo/num_sampling_attempts": 4.375,
2679
+ "dapo/sampling_efficiency": 26.18055555555555,
2680
+ "dapo/total_prompts_processed": 26.25,
2681
+ "dapo/valid_prompts_collected": 6.0,
2682
+ "epoch": 0.16114285714285714,
2683
+ "grad_norm": 0.17807213962078094,
2684
+ "kl": 0.0266265869140625,
2685
+ "learning_rate": 3.250000000000001e-07,
2686
+ "loss": 0.0498,
2687
+ "reward": 0.5591800361871719,
2688
+ "reward_std": 0.9730060175061226,
2689
+ "step": 141
2690
+ },
2691
+ {
2692
+ "clip_fraction": 0.0,
2693
+ "completion_length": 2963.59033203125,
2694
+ "dapo/avg_reward_std": 0.19928012508898973,
2695
+ "dapo/filter_reward_index": 0.0,
2696
+ "dapo/kept_prompts_ratio": 0.2812500069849193,
2697
+ "dapo/num_sampling_attempts": 4.0,
2698
+ "dapo/sampling_efficiency": 38.02083333333333,
2699
+ "dapo/total_prompts_processed": 24.0,
2700
+ "dapo/valid_prompts_collected": 6.0,
2701
+ "epoch": 0.16228571428571428,
2702
+ "grad_norm": 0.24388359487056732,
2703
+ "kl": 0.0318603515625,
2704
+ "learning_rate": 3.182328662904756e-07,
2705
+ "loss": 0.0567,
2706
+ "reward": 0.7148469444364309,
2707
+ "reward_std": 0.9495278596878052,
2708
+ "step": 142
2709
+ },
2710
+ {
2711
+ "clip_fraction": 0.0,
2712
+ "completion_length": 3157.791717529297,
2713
+ "dapo/avg_reward_std": 0.23966079843895777,
2714
+ "dapo/filter_reward_index": 0.0,
2715
+ "dapo/kept_prompts_ratio": 0.3214285767504147,
2716
+ "dapo/num_sampling_attempts": 3.5,
2717
+ "dapo/sampling_efficiency": 39.166666666666664,
2718
+ "dapo/total_prompts_processed": 21.0,
2719
+ "dapo/valid_prompts_collected": 6.0,
2720
+ "epoch": 0.16342857142857142,
2721
+ "grad_norm": 0.20528583228588104,
2722
+ "kl": 0.041290283203125,
2723
+ "learning_rate": 3.115363310950578e-07,
2724
+ "loss": 0.0443,
2725
+ "reward": 0.5249591246247292,
2726
+ "reward_std": 0.9509934857487679,
2727
+ "step": 143
2728
+ },
2729
+ {
2730
+ "clip_fraction": 0.0,
2731
+ "completion_length": 3030.187530517578,
2732
+ "dapo/avg_reward_std": 0.30880050485332805,
2733
+ "dapo/filter_reward_index": 0.0,
2734
+ "dapo/kept_prompts_ratio": 0.4375000099341075,
2735
+ "dapo/num_sampling_attempts": 3.0,
2736
+ "dapo/sampling_efficiency": 41.04166666666666,
2737
+ "dapo/total_prompts_processed": 18.0,
2738
+ "dapo/valid_prompts_collected": 6.0,
2739
+ "epoch": 0.16457142857142856,
2740
+ "grad_norm": 0.15082307159900665,
2741
+ "kl": 0.02729034423828125,
2742
+ "learning_rate": 3.0491243424323783e-07,
2743
+ "loss": 0.0511,
2744
+ "reward": 0.5894143544137478,
2745
+ "reward_std": 0.954010546207428,
2746
+ "step": 144
2747
+ },
2748
+ {
2749
+ "clip_fraction": 0.0,
2750
+ "completion_length": 2973.3993225097656,
2751
+ "dapo/avg_reward_std": 0.32683228328824043,
2752
+ "dapo/filter_reward_index": 0.0,
2753
+ "dapo/kept_prompts_ratio": 0.4236111181477706,
2754
+ "dapo/num_sampling_attempts": 3.0,
2755
+ "dapo/sampling_efficiency": 48.66071428571428,
2756
+ "dapo/total_prompts_processed": 18.0,
2757
+ "dapo/valid_prompts_collected": 6.0,
2758
+ "epoch": 0.1657142857142857,
2759
+ "grad_norm": 0.2588576078414917,
2760
+ "kl": 0.038238525390625,
2761
+ "learning_rate": 2.9836319343816397e-07,
2762
+ "loss": 0.0611,
2763
+ "reward": 0.6702784113585949,
2764
+ "reward_std": 0.9678368121385574,
2765
+ "step": 145
2766
+ },
2767
+ {
2768
+ "clip_fraction": 0.0,
2769
+ "completion_length": 3289.8368530273438,
2770
+ "dapo/avg_reward_std": 0.29686578666722335,
2771
+ "dapo/filter_reward_index": 0.0,
2772
+ "dapo/kept_prompts_ratio": 0.34567901823255753,
2773
+ "dapo/num_sampling_attempts": 3.375,
2774
+ "dapo/sampling_efficiency": 51.57738095238095,
2775
+ "dapo/total_prompts_processed": 20.25,
2776
+ "dapo/valid_prompts_collected": 6.0,
2777
+ "epoch": 0.16685714285714287,
2778
+ "grad_norm": 0.2035798877477646,
2779
+ "kl": 0.0394744873046875,
2780
+ "learning_rate": 2.918906036420294e-07,
2781
+ "loss": 0.0576,
2782
+ "reward": 0.4602743685245514,
2783
+ "reward_std": 0.9194413796067238,
2784
+ "step": 146
2785
+ },
2786
+ {
2787
+ "clip_fraction": 0.0,
2788
+ "completion_length": 3068.7604064941406,
2789
+ "dapo/avg_reward_std": 0.27814541943371296,
2790
+ "dapo/filter_reward_index": 0.0,
2791
+ "dapo/kept_prompts_ratio": 0.3437500069849193,
2792
+ "dapo/num_sampling_attempts": 4.0,
2793
+ "dapo/sampling_efficiency": 36.666666666666664,
2794
+ "dapo/total_prompts_processed": 24.0,
2795
+ "dapo/valid_prompts_collected": 6.0,
2796
+ "epoch": 0.168,
2797
+ "grad_norm": 0.22469140589237213,
2798
+ "kl": 0.030426025390625,
2799
+ "learning_rate": 2.854966364683872e-07,
2800
+ "loss": 0.0696,
2801
+ "reward": 0.6243265215307474,
2802
+ "reward_std": 0.9174878597259521,
2803
+ "step": 147
2804
+ },
2805
+ {
2806
+ "clip_fraction": 0.0,
2807
+ "completion_length": 3041.357635498047,
2808
+ "dapo/avg_reward_std": 0.2907161459326744,
2809
+ "dapo/filter_reward_index": 0.0,
2810
+ "dapo/kept_prompts_ratio": 0.458333346247673,
2811
+ "dapo/num_sampling_attempts": 2.5,
2812
+ "dapo/sampling_efficiency": 57.70833333333333,
2813
+ "dapo/total_prompts_processed": 15.0,
2814
+ "dapo/valid_prompts_collected": 6.0,
2815
+ "epoch": 0.16914285714285715,
2816
+ "grad_norm": 0.3123789429664612,
2817
+ "kl": 0.0328521728515625,
2818
+ "learning_rate": 2.791832395815782e-07,
2819
+ "loss": 0.0819,
2820
+ "reward": 0.8250775411725044,
2821
+ "reward_std": 0.9233218431472778,
2822
+ "step": 148
2823
+ },
2824
+ {
2825
+ "clip_fraction": 0.0,
2826
+ "completion_length": 2433.0694732666016,
2827
+ "dapo/avg_reward_std": 0.22243764168686336,
2828
+ "dapo/filter_reward_index": 0.0,
2829
+ "dapo/kept_prompts_ratio": 0.2777777839865949,
2830
+ "dapo/num_sampling_attempts": 4.5,
2831
+ "dapo/sampling_efficiency": 35.75892857142857,
2832
+ "dapo/total_prompts_processed": 27.0,
2833
+ "dapo/valid_prompts_collected": 6.0,
2834
+ "epoch": 0.1702857142857143,
2835
+ "grad_norm": 0.2827485203742981,
2836
+ "kl": 0.0386505126953125,
2837
+ "learning_rate": 2.729523361034538e-07,
2838
+ "loss": 0.0784,
2839
+ "reward": 0.6995697831735015,
2840
+ "reward_std": 0.9434132054448128,
2841
+ "step": 149
2842
+ },
2843
+ {
2844
+ "clip_fraction": 0.0,
2845
+ "completion_length": 3096.59033203125,
2846
+ "dapo/avg_reward_std": 0.347408726811409,
2847
+ "dapo/filter_reward_index": 0.0,
2848
+ "dapo/kept_prompts_ratio": 0.541666672565043,
2849
+ "dapo/num_sampling_attempts": 2.0,
2850
+ "dapo/sampling_efficiency": 63.541666666666664,
2851
+ "dapo/total_prompts_processed": 12.0,
2852
+ "dapo/valid_prompts_collected": 6.0,
2853
+ "epoch": 0.17142857142857143,
2854
+ "grad_norm": 0.30529579520225525,
2855
+ "kl": 0.03045654296875,
2856
+ "learning_rate": 2.6680582402757324e-07,
2857
+ "loss": 0.0868,
2858
+ "reward": 0.7112221932038665,
2859
+ "reward_std": 0.9602288007736206,
2860
+ "step": 150
2861
+ },
2862
+ {
2863
+ "clip_fraction": 0.0,
2864
+ "completion_length": 3184.611083984375,
2865
+ "dapo/avg_reward_std": 0.1674806038115887,
2866
+ "dapo/filter_reward_index": 0.0,
2867
+ "dapo/kept_prompts_ratio": 0.20212766528129578,
2868
+ "dapo/num_sampling_attempts": 5.875,
2869
+ "dapo/sampling_efficiency": 23.749999999999996,
2870
+ "dapo/total_prompts_processed": 35.25,
2871
+ "dapo/valid_prompts_collected": 6.0,
2872
+ "epoch": 0.17257142857142857,
2873
+ "grad_norm": 0.19142813980579376,
2874
+ "kl": 0.037353515625,
2875
+ "learning_rate": 2.6074557564105724e-07,
2876
+ "loss": 0.045,
2877
+ "reward": 0.41017685225233436,
2878
+ "reward_std": 0.9152907580137253,
2879
+ "step": 151
2880
+ },
2881
+ {
2882
+ "clip_fraction": 0.0,
2883
+ "completion_length": 3437.3541564941406,
2884
+ "dapo/avg_reward_std": 0.208841644014631,
2885
+ "dapo/filter_reward_index": 0.0,
2886
+ "dapo/kept_prompts_ratio": 0.2571428622518267,
2887
+ "dapo/num_sampling_attempts": 4.375,
2888
+ "dapo/sampling_efficiency": 40.416666666666664,
2889
+ "dapo/total_prompts_processed": 26.25,
2890
+ "dapo/valid_prompts_collected": 6.0,
2891
+ "epoch": 0.1737142857142857,
2892
+ "grad_norm": 0.15321692824363708,
2893
+ "kl": 0.03997802734375,
2894
+ "learning_rate": 2.547734369542718e-07,
2895
+ "loss": 0.0346,
2896
+ "reward": 0.34562894329428673,
2897
+ "reward_std": 0.856454074382782,
2898
+ "step": 152
2899
+ },
2900
+ {
2901
+ "clip_fraction": 0.0,
2902
+ "completion_length": 3008.1285095214844,
2903
+ "dapo/avg_reward_std": 0.3009934023022652,
2904
+ "dapo/filter_reward_index": 0.0,
2905
+ "dapo/kept_prompts_ratio": 0.5000000096857548,
2906
+ "dapo/num_sampling_attempts": 2.5,
2907
+ "dapo/sampling_efficiency": 43.75,
2908
+ "dapo/total_prompts_processed": 15.0,
2909
+ "dapo/valid_prompts_collected": 6.0,
2910
+ "epoch": 0.17485714285714285,
2911
+ "grad_norm": 0.20332548022270203,
2912
+ "kl": 0.0509033203125,
2913
+ "learning_rate": 2.488912271385139e-07,
2914
+ "loss": 0.0536,
2915
+ "reward": 0.7641689777374268,
2916
+ "reward_std": 0.95648343116045,
2917
+ "step": 153
2918
+ },
2919
+ {
2920
+ "clip_fraction": 0.0,
2921
+ "completion_length": 3165.52783203125,
2922
+ "dapo/avg_reward_std": 0.2268627045246271,
2923
+ "dapo/filter_reward_index": 0.0,
2924
+ "dapo/kept_prompts_ratio": 0.35256410905948055,
2925
+ "dapo/num_sampling_attempts": 3.25,
2926
+ "dapo/sampling_efficiency": 40.625,
2927
+ "dapo/total_prompts_processed": 19.5,
2928
+ "dapo/valid_prompts_collected": 6.0,
2929
+ "epoch": 0.176,
2930
+ "grad_norm": 0.2415708601474762,
2931
+ "kl": 0.032623291015625,
2932
+ "learning_rate": 2.4310073797187573e-07,
2933
+ "loss": 0.0658,
2934
+ "reward": 0.6375892572104931,
2935
+ "reward_std": 0.9544621706008911,
2936
+ "step": 154
2937
+ },
2938
+ {
2939
+ "clip_fraction": 0.0,
2940
+ "completion_length": 3226.4652709960938,
2941
+ "dapo/avg_reward_std": 0.2563069482644399,
2942
+ "dapo/filter_reward_index": 0.0,
2943
+ "dapo/kept_prompts_ratio": 0.38333334078391396,
2944
+ "dapo/num_sampling_attempts": 3.75,
2945
+ "dapo/sampling_efficiency": 31.249999999999996,
2946
+ "dapo/total_prompts_processed": 22.5,
2947
+ "dapo/valid_prompts_collected": 6.0,
2948
+ "epoch": 0.17714285714285713,
2949
+ "grad_norm": 0.2137623131275177,
2950
+ "kl": 0.0427093505859375,
2951
+ "learning_rate": 2.374037332934512e-07,
2952
+ "loss": 0.0533,
2953
+ "reward": 0.537381574511528,
2954
+ "reward_std": 0.9281218275427818,
2955
+ "step": 155
2956
+ },
2957
+ {
2958
+ "clip_fraction": 0.0,
2959
+ "completion_length": 2680.3090209960938,
2960
+ "dapo/avg_reward_std": 0.22888225678241614,
2961
+ "dapo/filter_reward_index": 0.0,
2962
+ "dapo/kept_prompts_ratio": 0.3181818226973216,
2963
+ "dapo/num_sampling_attempts": 4.125,
2964
+ "dapo/sampling_efficiency": 31.29960317460317,
2965
+ "dapo/total_prompts_processed": 24.75,
2966
+ "dapo/valid_prompts_collected": 6.0,
2967
+ "epoch": 0.1782857142857143,
2968
+ "grad_norm": 0.3409210443496704,
2969
+ "kl": 0.03851318359375,
2970
+ "learning_rate": 2.3180194846605364e-07,
2971
+ "loss": 0.0962,
2972
+ "reward": 0.8820424377918243,
2973
+ "reward_std": 0.9246840327978134,
2974
+ "step": 156
2975
+ },
2976
+ {
2977
+ "clip_fraction": 0.0,
2978
+ "completion_length": 3045.3299255371094,
2979
+ "dapo/avg_reward_std": 0.2491180575810946,
2980
+ "dapo/filter_reward_index": 0.0,
2981
+ "dapo/kept_prompts_ratio": 0.3653846222620744,
2982
+ "dapo/num_sampling_attempts": 3.25,
2983
+ "dapo/sampling_efficiency": 45.83333333333332,
2984
+ "dapo/total_prompts_processed": 19.5,
2985
+ "dapo/valid_prompts_collected": 6.0,
2986
+ "epoch": 0.17942857142857144,
2987
+ "grad_norm": 0.23701035976409912,
2988
+ "kl": 0.0436248779296875,
2989
+ "learning_rate": 2.2629708984760706e-07,
2990
+ "loss": 0.0414,
2991
+ "reward": 0.6551959328353405,
2992
+ "reward_std": 0.9744707196950912,
2993
+ "step": 157
2994
+ },
2995
+ {
2996
+ "clip_fraction": 0.0,
2997
+ "completion_length": 2918.892364501953,
2998
+ "dapo/avg_reward_std": 0.22537656256130764,
2999
+ "dapo/filter_reward_index": 0.0,
3000
+ "dapo/kept_prompts_ratio": 0.33333333730697634,
3001
+ "dapo/num_sampling_attempts": 4.375,
3002
+ "dapo/sampling_efficiency": 39.93055555555556,
3003
+ "dapo/total_prompts_processed": 26.25,
3004
+ "dapo/valid_prompts_collected": 6.0,
3005
+ "epoch": 0.18057142857142858,
3006
+ "grad_norm": 0.3551786541938782,
3007
+ "kl": 0.0572357177734375,
3008
+ "learning_rate": 2.2089083427137329e-07,
3009
+ "loss": 0.0732,
3010
+ "reward": 0.5248121619224548,
3011
+ "reward_std": 0.9334831684827805,
3012
+ "step": 158
3013
+ },
3014
+ {
3015
+ "clip_fraction": 0.0,
3016
+ "completion_length": 2874.0729446411133,
3017
+ "dapo/avg_reward_std": 0.18832522351294756,
3018
+ "dapo/filter_reward_index": 0.0,
3019
+ "dapo/kept_prompts_ratio": 0.2812500046566129,
3020
+ "dapo/num_sampling_attempts": 4.0,
3021
+ "dapo/sampling_efficiency": 38.69047619047618,
3022
+ "dapo/total_prompts_processed": 24.0,
3023
+ "dapo/valid_prompts_collected": 6.0,
3024
+ "epoch": 0.18171428571428572,
3025
+ "grad_norm": 0.25500980019569397,
3026
+ "kl": 0.03741455078125,
3027
+ "learning_rate": 2.1558482853517253e-07,
3028
+ "loss": 0.0537,
3029
+ "reward": 0.7963100634515285,
3030
+ "reward_std": 0.987776905298233,
3031
+ "step": 159
3032
+ },
3033
+ {
3034
+ "clip_fraction": 0.0,
3035
+ "completion_length": 2940.701385498047,
3036
+ "dapo/avg_reward_std": 0.16297742784023284,
3037
+ "dapo/filter_reward_index": 0.0,
3038
+ "dapo/kept_prompts_ratio": 0.20000000536441803,
3039
+ "dapo/num_sampling_attempts": 6.25,
3040
+ "dapo/sampling_efficiency": 18.368055555555557,
3041
+ "dapo/total_prompts_processed": 37.5,
3042
+ "dapo/valid_prompts_collected": 6.0,
3043
+ "epoch": 0.18285714285714286,
3044
+ "grad_norm": 0.2898014187812805,
3045
+ "kl": 0.058013916015625,
3046
+ "learning_rate": 2.1038068889975259e-07,
3047
+ "loss": 0.037,
3048
+ "reward": 0.5323189618065953,
3049
+ "reward_std": 0.9483579620718956,
3050
+ "step": 160
3051
+ },
3052
+ {
3053
+ "clip_fraction": 0.0,
3054
+ "completion_length": 3090.7882385253906,
3055
+ "dapo/avg_reward_std": 0.3046227526664734,
3056
+ "dapo/filter_reward_index": 0.0,
3057
+ "dapo/kept_prompts_ratio": 0.3733333414793015,
3058
+ "dapo/num_sampling_attempts": 3.125,
3059
+ "dapo/sampling_efficiency": 43.45238095238095,
3060
+ "dapo/total_prompts_processed": 18.75,
3061
+ "dapo/valid_prompts_collected": 6.0,
3062
+ "epoch": 0.184,
3063
+ "grad_norm": 0.28573325276374817,
3064
+ "kl": 0.040771484375,
3065
+ "learning_rate": 2.0528000059645995e-07,
3066
+ "loss": 0.0511,
3067
+ "reward": 0.6970310118049383,
3068
+ "reward_std": 0.9432796016335487,
3069
+ "step": 161
3070
+ },
3071
+ {
3072
+ "clip_fraction": 0.0,
3073
+ "completion_length": 3205.4270629882812,
3074
+ "dapo/avg_reward_std": 0.36972329020500183,
3075
+ "dapo/filter_reward_index": 0.0,
3076
+ "dapo/kept_prompts_ratio": 0.5438596621940011,
3077
+ "dapo/num_sampling_attempts": 2.375,
3078
+ "dapo/sampling_efficiency": 55.625,
3079
+ "dapo/total_prompts_processed": 14.25,
3080
+ "dapo/valid_prompts_collected": 6.0,
3081
+ "epoch": 0.18514285714285714,
3082
+ "grad_norm": 0.390523225069046,
3083
+ "kl": 0.052459716796875,
3084
+ "learning_rate": 2.0028431734436308e-07,
3085
+ "loss": 0.0818,
3086
+ "reward": 0.6346883065998554,
3087
+ "reward_std": 0.9713371768593788,
3088
+ "step": 162
3089
+ },
3090
+ {
3091
+ "clip_fraction": 0.0,
3092
+ "completion_length": 3082.107635498047,
3093
+ "dapo/avg_reward_std": 0.2315557522158469,
3094
+ "dapo/filter_reward_index": 0.0,
3095
+ "dapo/kept_prompts_ratio": 0.3440860264724301,
3096
+ "dapo/num_sampling_attempts": 3.875,
3097
+ "dapo/sampling_efficiency": 44.513888888888886,
3098
+ "dapo/total_prompts_processed": 23.25,
3099
+ "dapo/valid_prompts_collected": 6.0,
3100
+ "epoch": 0.18628571428571428,
3101
+ "grad_norm": 0.31898149847984314,
3102
+ "kl": 0.05328369140625,
3103
+ "learning_rate": 1.9539516087697517e-07,
3104
+ "loss": 0.0722,
3105
+ "reward": 0.6942785531282425,
3106
+ "reward_std": 0.9776681512594223,
3107
+ "step": 163
3108
+ },
3109
+ {
3110
+ "clip_fraction": 0.0,
3111
+ "completion_length": 3027.0243530273438,
3112
+ "dapo/avg_reward_std": 0.15836979811255997,
3113
+ "dapo/filter_reward_index": 0.0,
3114
+ "dapo/kept_prompts_ratio": 0.22972973214613424,
3115
+ "dapo/num_sampling_attempts": 4.625,
3116
+ "dapo/sampling_efficiency": 41.69642857142857,
3117
+ "dapo/total_prompts_processed": 27.75,
3118
+ "dapo/valid_prompts_collected": 6.0,
3119
+ "epoch": 0.18742857142857142,
3120
+ "grad_norm": 0.2931766211986542,
3121
+ "kl": 0.033111572265625,
3122
+ "learning_rate": 1.9061402047871833e-07,
3123
+ "loss": 0.0754,
3124
+ "reward": 0.944303285330534,
3125
+ "reward_std": 0.9451126903295517,
3126
+ "step": 164
3127
+ },
3128
+ {
3129
+ "clip_fraction": 0.0,
3130
+ "completion_length": 2894.260482788086,
3131
+ "dapo/avg_reward_std": 0.224585828371346,
3132
+ "dapo/filter_reward_index": 0.0,
3133
+ "dapo/kept_prompts_ratio": 0.2916666716337204,
3134
+ "dapo/num_sampling_attempts": 4.0,
3135
+ "dapo/sampling_efficiency": 37.5,
3136
+ "dapo/total_prompts_processed": 24.0,
3137
+ "dapo/valid_prompts_collected": 6.0,
3138
+ "epoch": 0.18857142857142858,
3139
+ "grad_norm": 0.24178634583950043,
3140
+ "kl": 0.0533447265625,
3141
+ "learning_rate": 1.8594235253127372e-07,
3142
+ "loss": 0.0505,
3143
+ "reward": 0.6519163623452187,
3144
+ "reward_std": 0.9615699052810669,
3145
+ "step": 165
3146
+ },
3147
+ {
3148
+ "clip_fraction": 0.0,
3149
+ "completion_length": 3002.7882385253906,
3150
+ "dapo/avg_reward_std": 0.29886600477942105,
3151
+ "dapo/filter_reward_index": 0.0,
3152
+ "dapo/kept_prompts_ratio": 0.3160919598464308,
3153
+ "dapo/num_sampling_attempts": 3.625,
3154
+ "dapo/sampling_efficiency": 35.416666666666664,
3155
+ "dapo/total_prompts_processed": 21.75,
3156
+ "dapo/valid_prompts_collected": 6.0,
3157
+ "epoch": 0.18971428571428572,
3158
+ "grad_norm": 0.31221655011177063,
3159
+ "kl": 0.047943115234375,
3160
+ "learning_rate": 1.8138158006995363e-07,
3161
+ "loss": 0.066,
3162
+ "reward": 0.6383479349315166,
3163
+ "reward_std": 0.9029820337891579,
3164
+ "step": 166
3165
+ },
3166
+ {
3167
+ "clip_fraction": 0.0,
3168
+ "completion_length": 2927.295150756836,
3169
+ "dapo/avg_reward_std": 0.34752671499001353,
3170
+ "dapo/filter_reward_index": 0.0,
3171
+ "dapo/kept_prompts_ratio": 0.5438596621940011,
3172
+ "dapo/num_sampling_attempts": 2.375,
3173
+ "dapo/sampling_efficiency": 48.95833333333333,
3174
+ "dapo/total_prompts_processed": 14.25,
3175
+ "dapo/valid_prompts_collected": 6.0,
3176
+ "epoch": 0.19085714285714286,
3177
+ "grad_norm": 0.2697528600692749,
3178
+ "kl": 0.045745849609375,
3179
+ "learning_rate": 1.7693309235023127e-07,
3180
+ "loss": 0.0483,
3181
+ "reward": 0.8266985702211969,
3182
+ "reward_std": 0.9544429406523705,
3183
+ "step": 167
3184
+ },
3185
+ {
3186
+ "clip_fraction": 0.0,
3187
+ "completion_length": 3212.857666015625,
3188
+ "dapo/avg_reward_std": 0.263968757220677,
3189
+ "dapo/filter_reward_index": 0.0,
3190
+ "dapo/kept_prompts_ratio": 0.3690476247242519,
3191
+ "dapo/num_sampling_attempts": 3.5,
3192
+ "dapo/sampling_efficiency": 41.388888888888886,
3193
+ "dapo/total_prompts_processed": 21.0,
3194
+ "dapo/valid_prompts_collected": 6.0,
3195
+ "epoch": 0.192,
3196
+ "grad_norm": 0.27940821647644043,
3197
+ "kl": 0.05059814453125,
3198
+ "learning_rate": 1.7259824442455923e-07,
3199
+ "loss": 0.0415,
3200
+ "reward": 0.7715255841612816,
3201
+ "reward_std": 0.95072440803051,
3202
+ "step": 168
3203
+ },
3204
+ {
3205
+ "clip_fraction": 0.0,
3206
+ "completion_length": 3112.5799255371094,
3207
+ "dapo/avg_reward_std": 0.22730760558231458,
3208
+ "dapo/filter_reward_index": 0.0,
3209
+ "dapo/kept_prompts_ratio": 0.3153153222960395,
3210
+ "dapo/num_sampling_attempts": 4.625,
3211
+ "dapo/sampling_efficiency": 26.249999999999996,
3212
+ "dapo/total_prompts_processed": 27.75,
3213
+ "dapo/valid_prompts_collected": 6.0,
3214
+ "epoch": 0.19314285714285714,
3215
+ "grad_norm": 0.4339730143547058,
3216
+ "kl": 0.06396484375,
3217
+ "learning_rate": 1.6837835672960831e-07,
3218
+ "loss": 0.0777,
3219
+ "reward": 0.5262689627707005,
3220
+ "reward_std": 0.9779800549149513,
3221
+ "step": 169
3222
+ },
3223
+ {
3224
+ "clip_fraction": 0.0,
3225
+ "completion_length": 3088.6632385253906,
3226
+ "dapo/avg_reward_std": 0.2333034286275506,
3227
+ "dapo/filter_reward_index": 0.0,
3228
+ "dapo/kept_prompts_ratio": 0.33333333721384406,
3229
+ "dapo/num_sampling_attempts": 4.0,
3230
+ "dapo/sampling_efficiency": 38.263888888888886,
3231
+ "dapo/total_prompts_processed": 24.0,
3232
+ "dapo/valid_prompts_collected": 6.0,
3233
+ "epoch": 0.19428571428571428,
3234
+ "grad_norm": 0.48384836316108704,
3235
+ "kl": 0.0555419921875,
3236
+ "learning_rate": 1.6427471468404952e-07,
3237
+ "loss": 0.0974,
3238
+ "reward": 0.7407102398574352,
3239
+ "reward_std": 0.9568767622113228,
3240
+ "step": 170
3241
+ },
3242
+ {
3243
+ "clip_fraction": 0.0,
3244
+ "completion_length": 3099.347198486328,
3245
+ "dapo/avg_reward_std": 0.17301563743282766,
3246
+ "dapo/filter_reward_index": 0.0,
3247
+ "dapo/kept_prompts_ratio": 0.27941177127992406,
3248
+ "dapo/num_sampling_attempts": 4.25,
3249
+ "dapo/sampling_efficiency": 31.874999999999996,
3250
+ "dapo/total_prompts_processed": 25.5,
3251
+ "dapo/valid_prompts_collected": 6.0,
3252
+ "epoch": 0.19542857142857142,
3253
+ "grad_norm": 0.42263394594192505,
3254
+ "kl": 0.0595703125,
3255
+ "learning_rate": 1.6028856829700258e-07,
3256
+ "loss": 0.0812,
3257
+ "reward": 0.4282900430262089,
3258
+ "reward_std": 0.914498083293438,
3259
+ "step": 171
3260
+ },
3261
+ {
3262
+ "clip_fraction": 0.0,
3263
+ "completion_length": 3111.232696533203,
3264
+ "dapo/avg_reward_std": 0.2433939976617694,
3265
+ "dapo/filter_reward_index": 0.0,
3266
+ "dapo/kept_prompts_ratio": 0.33333334093913436,
3267
+ "dapo/num_sampling_attempts": 4.0,
3268
+ "dapo/sampling_efficiency": 36.80555555555555,
3269
+ "dapo/total_prompts_processed": 24.0,
3270
+ "dapo/valid_prompts_collected": 6.0,
3271
+ "epoch": 0.19657142857142856,
3272
+ "grad_norm": 0.4814501404762268,
3273
+ "kl": 0.05926513671875,
3274
+ "learning_rate": 1.5642113178727193e-07,
3275
+ "loss": 0.0843,
3276
+ "reward": 0.6843680012971163,
3277
+ "reward_std": 0.8743765726685524,
3278
+ "step": 172
3279
+ },
3280
+ {
3281
+ "clip_fraction": 0.0,
3282
+ "completion_length": 3008.6563110351562,
3283
+ "dapo/avg_reward_std": 0.25363275137814606,
3284
+ "dapo/filter_reward_index": 0.0,
3285
+ "dapo/kept_prompts_ratio": 0.31818182811592566,
3286
+ "dapo/num_sampling_attempts": 4.125,
3287
+ "dapo/sampling_efficiency": 38.78472222222222,
3288
+ "dapo/total_prompts_processed": 24.75,
3289
+ "dapo/valid_prompts_collected": 6.0,
3290
+ "epoch": 0.1977142857142857,
3291
+ "grad_norm": 0.285697877407074,
3292
+ "kl": 0.05755615234375,
3293
+ "learning_rate": 1.5267358321348285e-07,
3294
+ "loss": 0.0456,
3295
+ "reward": 0.5798944532871246,
3296
+ "reward_std": 0.984041191637516,
3297
+ "step": 173
3298
+ },
3299
+ {
3300
+ "clip_fraction": 0.0,
3301
+ "completion_length": 3067.9791870117188,
3302
+ "dapo/avg_reward_std": 0.3438388824462891,
3303
+ "dapo/filter_reward_index": 0.0,
3304
+ "dapo/kept_prompts_ratio": 0.4500000044703484,
3305
+ "dapo/num_sampling_attempts": 2.5,
3306
+ "dapo/sampling_efficiency": 48.33333333333333,
3307
+ "dapo/total_prompts_processed": 15.0,
3308
+ "dapo/valid_prompts_collected": 6.0,
3309
+ "epoch": 0.19885714285714284,
3310
+ "grad_norm": 0.43520498275756836,
3311
+ "kl": 0.07098388671875,
3312
+ "learning_rate": 1.4904706411523448e-07,
3313
+ "loss": 0.0716,
3314
+ "reward": 0.5646946905180812,
3315
+ "reward_std": 0.9460153579711914,
3316
+ "step": 174
3317
+ },
3318
+ {
3319
+ "clip_fraction": 0.0,
3320
+ "completion_length": 3223.2916870117188,
3321
+ "dapo/avg_reward_std": 0.2690600073337555,
3322
+ "dapo/filter_reward_index": 0.0,
3323
+ "dapo/kept_prompts_ratio": 0.4133333420753479,
3324
+ "dapo/num_sampling_attempts": 3.125,
3325
+ "dapo/sampling_efficiency": 41.45833333333333,
3326
+ "dapo/total_prompts_processed": 18.75,
3327
+ "dapo/valid_prompts_collected": 6.0,
3328
+ "epoch": 0.2,
3329
+ "grad_norm": 0.35144945979118347,
3330
+ "kl": 0.06170654296875,
3331
+ "learning_rate": 1.4554267916537495e-07,
3332
+ "loss": 0.0348,
3333
+ "reward": 0.556399748660624,
3334
+ "reward_std": 0.9192204177379608,
3335
+ "step": 175
3336
+ },
3337
+ {
3338
+ "clip_fraction": 0.0,
3339
+ "completion_length": 2946.0799102783203,
3340
+ "dapo/avg_reward_std": 0.25316954652468365,
3341
+ "dapo/filter_reward_index": 0.0,
3342
+ "dapo/kept_prompts_ratio": 0.37500000558793545,
3343
+ "dapo/num_sampling_attempts": 3.0,
3344
+ "dapo/sampling_efficiency": 43.75,
3345
+ "dapo/total_prompts_processed": 18.0,
3346
+ "dapo/valid_prompts_collected": 6.0,
3347
+ "epoch": 0.20114285714285715,
3348
+ "grad_norm": 0.46807849407196045,
3349
+ "kl": 0.063018798828125,
3350
+ "learning_rate": 1.4216149583350755e-07,
3351
+ "loss": 0.0796,
3352
+ "reward": 0.6736351866275072,
3353
+ "reward_std": 0.9649264737963676,
3354
+ "step": 176
3355
+ },
3356
+ {
3357
+ "clip_fraction": 0.0,
3358
+ "completion_length": 3096.829864501953,
3359
+ "dapo/avg_reward_std": 0.31567848042437907,
3360
+ "dapo/filter_reward_index": 0.0,
3361
+ "dapo/kept_prompts_ratio": 0.482456142965116,
3362
+ "dapo/num_sampling_attempts": 2.375,
3363
+ "dapo/sampling_efficiency": 55.625,
3364
+ "dapo/total_prompts_processed": 14.25,
3365
+ "dapo/valid_prompts_collected": 6.0,
3366
+ "epoch": 0.2022857142857143,
3367
+ "grad_norm": 0.31731271743774414,
3368
+ "kl": 0.055938720703125,
3369
+ "learning_rate": 1.3890454406082956e-07,
3370
+ "loss": 0.0386,
3371
+ "reward": 0.681073285639286,
3372
+ "reward_std": 0.9661536440253258,
3373
+ "step": 177
3374
+ },
3375
+ {
3376
+ "clip_fraction": 0.0,
3377
+ "completion_length": 3235.8056030273438,
3378
+ "dapo/avg_reward_std": 0.24198689542967697,
3379
+ "dapo/filter_reward_index": 0.0,
3380
+ "dapo/kept_prompts_ratio": 0.3448275898037286,
3381
+ "dapo/num_sampling_attempts": 3.625,
3382
+ "dapo/sampling_efficiency": 47.08333333333333,
3383
+ "dapo/total_prompts_processed": 21.75,
3384
+ "dapo/valid_prompts_collected": 6.0,
3385
+ "epoch": 0.20342857142857143,
3386
+ "grad_norm": 0.4640950560569763,
3387
+ "kl": 0.072052001953125,
3388
+ "learning_rate": 1.3577281594640182e-07,
3389
+ "loss": 0.0702,
3390
+ "reward": 0.5520291309803724,
3391
+ "reward_std": 0.9967257082462311,
3392
+ "step": 178
3393
+ },
3394
+ {
3395
+ "clip_fraction": 0.0,
3396
+ "completion_length": 3237.77783203125,
3397
+ "dapo/avg_reward_std": 0.30828417566689575,
3398
+ "dapo/filter_reward_index": 0.0,
3399
+ "dapo/kept_prompts_ratio": 0.4242424341765317,
3400
+ "dapo/num_sampling_attempts": 2.75,
3401
+ "dapo/sampling_efficiency": 52.82738095238095,
3402
+ "dapo/total_prompts_processed": 16.5,
3403
+ "dapo/valid_prompts_collected": 6.0,
3404
+ "epoch": 0.20457142857142857,
3405
+ "grad_norm": 0.4502318203449249,
3406
+ "kl": 0.07550048828125,
3407
+ "learning_rate": 1.3276726544494571e-07,
3408
+ "loss": 0.0614,
3409
+ "reward": 0.6213867999613285,
3410
+ "reward_std": 0.9431608989834785,
3411
+ "step": 179
3412
+ },
3413
+ {
3414
+ "clip_fraction": 0.0,
3415
+ "completion_length": 2887.9236450195312,
3416
+ "dapo/avg_reward_std": 0.2488611958645008,
3417
+ "dapo/filter_reward_index": 0.0,
3418
+ "dapo/kept_prompts_ratio": 0.3518518612340645,
3419
+ "dapo/num_sampling_attempts": 3.375,
3420
+ "dapo/sampling_efficiency": 48.035714285714285,
3421
+ "dapo/total_prompts_processed": 20.25,
3422
+ "dapo/valid_prompts_collected": 6.0,
3423
+ "epoch": 0.2057142857142857,
3424
+ "grad_norm": 0.44646504521369934,
3425
+ "kl": 0.073760986328125,
3426
+ "learning_rate": 1.2988880807625927e-07,
3427
+ "loss": 0.0683,
3428
+ "reward": 0.5839751102030277,
3429
+ "reward_std": 0.9090578481554985,
3430
+ "step": 180
3431
+ },
3432
+ {
3433
+ "clip_fraction": 0.0,
3434
+ "completion_length": 3021.2916870117188,
3435
+ "dapo/avg_reward_std": 0.20883248069069602,
3436
+ "dapo/filter_reward_index": 0.0,
3437
+ "dapo/kept_prompts_ratio": 0.2878787942004926,
3438
+ "dapo/num_sampling_attempts": 4.125,
3439
+ "dapo/sampling_efficiency": 39.632936507936506,
3440
+ "dapo/total_prompts_processed": 24.75,
3441
+ "dapo/valid_prompts_collected": 6.0,
3442
+ "epoch": 0.20685714285714285,
3443
+ "grad_norm": 0.36042678356170654,
3444
+ "kl": 0.07421875,
3445
+ "learning_rate": 1.2713832064634125e-07,
3446
+ "loss": 0.054,
3447
+ "reward": 0.5517729418352246,
3448
+ "reward_std": 0.9483400657773018,
3449
+ "step": 181
3450
+ },
+ {
+ "clip_fraction": 0.0,
+ "completion_length": 3249.2118530273438,
+ "dapo/avg_reward_std": 0.2615335573043142,
+ "dapo/filter_reward_index": 0.0,
+ "dapo/kept_prompts_ratio": 0.33333333847778185,
+ "dapo/num_sampling_attempts": 3.5,
+ "dapo/sampling_efficiency": 46.785714285714285,
+ "dapo/total_prompts_processed": 21.0,
+ "dapo/valid_prompts_collected": 6.0,
+ "epoch": 0.208,
+ "grad_norm": 0.4518042504787445,
+ "kl": 0.072021484375,
+ "learning_rate": 1.2451664098030743e-07,
+ "loss": 0.0654,
+ "reward": 0.686168298125267,
+ "reward_std": 0.9350233674049377,
+ "step": 182
+ },
+ {
+ "clip_fraction": 0.0,
+ "completion_length": 3221.6631774902344,
+ "dapo/avg_reward_std": 0.27866364789731574,
+ "dapo/filter_reward_index": 0.0,
+ "dapo/kept_prompts_ratio": 0.3686868738044392,
+ "dapo/num_sampling_attempts": 4.125,
+ "dapo/sampling_efficiency": 28.4375,
+ "dapo/total_prompts_processed": 24.75,
+ "dapo/valid_prompts_collected": 6.0,
+ "epoch": 0.20914285714285713,
+ "grad_norm": 0.32408109307289124,
+ "kl": 0.062255859375,
+ "learning_rate": 1.220245676671809e-07,
+ "loss": 0.0384,
+ "reward": 0.6384344138205051,
+ "reward_std": 0.9783304929733276,
+ "step": 183
+ },
+ {
+ "clip_fraction": 0.0,
+ "completion_length": 3199.1354370117188,
+ "dapo/avg_reward_std": 0.2816663732131322,
+ "dapo/filter_reward_index": 0.0,
+ "dapo/kept_prompts_ratio": 0.316666671137015,
+ "dapo/num_sampling_attempts": 3.75,
+ "dapo/sampling_efficiency": 45.55555555555555,
+ "dapo/total_prompts_processed": 22.5,
+ "dapo/valid_prompts_collected": 6.0,
+ "epoch": 0.2102857142857143,
+ "grad_norm": 0.2197091430425644,
+ "kl": 0.07550048828125,
+ "learning_rate": 1.1966285981663407e-07,
+ "loss": 0.0211,
+ "reward": 0.45471471454948187,
+ "reward_std": 0.9136239141225815,
+ "step": 184
+ },
+ {
+ "clip_fraction": 0.0,
+ "completion_length": 3037.420166015625,
+ "dapo/avg_reward_std": 0.17516983683044846,
+ "dapo/filter_reward_index": 0.0,
+ "dapo/kept_prompts_ratio": 0.2657657728807346,
+ "dapo/num_sampling_attempts": 4.625,
+ "dapo/sampling_efficiency": 25.729166666666664,
+ "dapo/total_prompts_processed": 27.75,
+ "dapo/valid_prompts_collected": 6.0,
+ "epoch": 0.21142857142857144,
+ "grad_norm": 0.4012245535850525,
+ "kl": 0.091796875,
+ "learning_rate": 1.1743223682775649e-07,
+ "loss": 0.0442,
+ "reward": 0.7168623730540276,
+ "reward_std": 0.9515729621052742,
+ "step": 185
+ },
+ {
+ "clip_fraction": 0.0,
+ "completion_length": 3222.767364501953,
+ "dapo/avg_reward_std": 0.2550514280796051,
+ "dapo/filter_reward_index": 0.0,
+ "dapo/kept_prompts_ratio": 0.3444444512327512,
+ "dapo/num_sampling_attempts": 3.75,
+ "dapo/sampling_efficiency": 43.64583333333333,
+ "dapo/total_prompts_processed": 22.5,
+ "dapo/valid_prompts_collected": 6.0,
+ "epoch": 0.21257142857142858,
+ "grad_norm": 0.4945845305919647,
+ "kl": 0.083465576171875,
+ "learning_rate": 1.1533337816991931e-07,
+ "loss": 0.0667,
+ "reward": 0.5391142014414072,
+ "reward_std": 0.9342528805136681,
+ "step": 186
+ },
+ {
+ "clip_fraction": 0.0,
+ "completion_length": 2858.5659942626953,
+ "dapo/avg_reward_std": 0.23423856112264818,
+ "dapo/filter_reward_index": 0.0,
+ "dapo/kept_prompts_ratio": 0.3118279609949358,
+ "dapo/num_sampling_attempts": 3.875,
+ "dapo/sampling_efficiency": 40.0297619047619,
+ "dapo/total_prompts_processed": 23.25,
+ "dapo/valid_prompts_collected": 6.0,
+ "epoch": 0.21371428571428572,
+ "grad_norm": 0.4291866421699524,
+ "kl": 0.091796875,
+ "learning_rate": 1.1336692317580158e-07,
+ "loss": 0.0384,
+ "reward": 0.7481220848858356,
+ "reward_std": 0.9474795907735825,
+ "step": 187
+ },
+ {
+ "clip_fraction": 0.0,
+ "completion_length": 3123.170166015625,
+ "dapo/avg_reward_std": 0.1988734739857751,
+ "dapo/filter_reward_index": 0.0,
+ "dapo/kept_prompts_ratio": 0.25675675997862946,
+ "dapo/num_sampling_attempts": 4.625,
+ "dapo/sampling_efficiency": 31.875,
+ "dapo/total_prompts_processed": 27.75,
+ "dapo/valid_prompts_collected": 6.0,
+ "epoch": 0.21485714285714286,
+ "grad_norm": 0.30453264713287354,
+ "kl": 0.080657958984375,
+ "learning_rate": 1.1153347084664419e-07,
+ "loss": 0.0273,
+ "reward": 0.6236942922696471,
+ "reward_std": 0.9715093299746513,
+ "step": 188
+ },
+ {
+ "clip_fraction": 0.0,
+ "completion_length": 2872.9618530273438,
+ "dapo/avg_reward_std": 0.21385114904372923,
+ "dapo/filter_reward_index": 0.0,
+ "dapo/kept_prompts_ratio": 0.3333333365378841,
+ "dapo/num_sampling_attempts": 3.875,
+ "dapo/sampling_efficiency": 39.18154761904762,
+ "dapo/total_prompts_processed": 23.25,
+ "dapo/valid_prompts_collected": 6.0,
+ "epoch": 0.216,
+ "grad_norm": 0.5780288577079773,
+ "kl": 0.08612060546875,
+ "learning_rate": 1.0983357966978745e-07,
+ "loss": 0.0607,
+ "reward": 0.7514887787401676,
+ "reward_std": 1.0098591819405556,
+ "step": 189
+ },
+ {
+ "clip_fraction": 0.0,
+ "completion_length": 2937.093780517578,
+ "dapo/avg_reward_std": 0.1677520631575117,
+ "dapo/filter_reward_index": 0.0,
+ "dapo/kept_prompts_ratio": 0.21895425284610076,
+ "dapo/num_sampling_attempts": 6.375,
+ "dapo/sampling_efficiency": 20.689484126984123,
+ "dapo/total_prompts_processed": 38.25,
+ "dapo/valid_prompts_collected": 6.0,
+ "epoch": 0.21714285714285714,
+ "grad_norm": 0.3947860896587372,
+ "kl": 0.076263427734375,
+ "learning_rate": 1.0826776744855121e-07,
+ "loss": 0.0487,
+ "reward": 0.6180934552103281,
+ "reward_std": 0.9050487726926804,
+ "step": 190
+ },
+ {
+ "clip_fraction": 0.0,
+ "completion_length": 3252.090301513672,
+ "dapo/avg_reward_std": 0.24265852073828378,
+ "dapo/filter_reward_index": 0.0,
+ "dapo/kept_prompts_ratio": 0.3055555621782939,
+ "dapo/num_sampling_attempts": 3.75,
+ "dapo/sampling_efficiency": 38.020833333333336,
+ "dapo/total_prompts_processed": 22.5,
+ "dapo/valid_prompts_collected": 6.0,
+ "epoch": 0.21828571428571428,
+ "grad_norm": 0.48333072662353516,
+ "kl": 0.09661865234375,
+ "learning_rate": 1.068365111445064e-07,
+ "loss": 0.0584,
+ "reward": 0.4759152363985777,
+ "reward_std": 0.9479196071624756,
+ "step": 191
+ },
+ {
+ "clip_fraction": 0.0,
+ "completion_length": 3074.420166015625,
+ "dapo/avg_reward_std": 0.2189681170315578,
+ "dapo/filter_reward_index": 0.0,
+ "dapo/kept_prompts_ratio": 0.3333333371014431,
+ "dapo/num_sampling_attempts": 3.625,
+ "dapo/sampling_efficiency": 46.45833333333333,
+ "dapo/total_prompts_processed": 21.75,
+ "dapo/valid_prompts_collected": 6.0,
+ "epoch": 0.21942857142857142,
+ "grad_norm": 0.5536202192306519,
+ "kl": 0.09814453125,
+ "learning_rate": 1.0554024673218806e-07,
+ "loss": 0.0731,
+ "reward": 0.48804986744653434,
+ "reward_std": 0.9367131069302559,
+ "step": 192
+ },
+ {
+ "clip_fraction": 0.0,
+ "completion_length": 3026.9097595214844,
+ "dapo/avg_reward_std": 0.21337791310774312,
+ "dapo/filter_reward_index": 0.0,
+ "dapo/kept_prompts_ratio": 0.256756762395034,
+ "dapo/num_sampling_attempts": 4.625,
+ "dapo/sampling_efficiency": 30.3125,
+ "dapo/total_prompts_processed": 27.75,
+ "dapo/valid_prompts_collected": 6.0,
+ "epoch": 0.22057142857142858,
+ "grad_norm": 0.5239105224609375,
+ "kl": 0.0985107421875,
+ "learning_rate": 1.0437936906629334e-07,
+ "loss": 0.0561,
+ "reward": 0.45341441221535206,
+ "reward_std": 0.8912393003702164,
+ "step": 193
+ },
+ {
+ "clip_fraction": 0.0,
+ "completion_length": 2896.656280517578,
+ "dapo/avg_reward_std": 0.31374274492263793,
+ "dapo/filter_reward_index": 0.0,
+ "dapo/kept_prompts_ratio": 0.4333333417773247,
+ "dapo/num_sampling_attempts": 2.5,
+ "dapo/sampling_efficiency": 46.875,
+ "dapo/total_prompts_processed": 15.0,
+ "dapo/valid_prompts_collected": 6.0,
+ "epoch": 0.22171428571428572,
+ "grad_norm": 0.6310634016990662,
+ "kl": 0.108062744140625,
+ "learning_rate": 1.0335423176140511e-07,
+ "loss": 0.0809,
+ "reward": 0.6844924800097942,
+ "reward_std": 0.9649646729230881,
+ "step": 194
+ },
+ {
+ "clip_fraction": 0.0,
+ "completion_length": 3319.7048950195312,
+ "dapo/avg_reward_std": 0.21983732057340216,
+ "dapo/filter_reward_index": 0.0,
+ "dapo/kept_prompts_ratio": 0.30303031251286017,
+ "dapo/num_sampling_attempts": 4.125,
+ "dapo/sampling_efficiency": 29.479166666666664,
+ "dapo/total_prompts_processed": 24.75,
+ "dapo/valid_prompts_collected": 6.0,
+ "epoch": 0.22285714285714286,
+ "grad_norm": 0.47936248779296875,
+ "kl": 0.0997314453125,
+ "learning_rate": 1.0246514708427701e-07,
+ "loss": 0.0479,
+ "reward": 0.3993752491660416,
+ "reward_std": 0.9481607303023338,
+ "step": 195
+ },
+ {
+ "clip_fraction": 0.0,
+ "completion_length": 3298.1736450195312,
+ "dapo/avg_reward_std": 0.2514548934996128,
+ "dapo/filter_reward_index": 0.0,
+ "dapo/kept_prompts_ratio": 0.28333333916962145,
+ "dapo/num_sampling_attempts": 5.0,
+ "dapo/sampling_efficiency": 33.13988095238095,
+ "dapo/total_prompts_processed": 30.0,
+ "dapo/valid_prompts_collected": 6.0,
+ "epoch": 0.224,
+ "grad_norm": 0.36350947618484497,
+ "kl": 0.1043701171875,
+ "learning_rate": 1.017123858587145e-07,
+ "loss": 0.0389,
+ "reward": 0.31427645590156317,
+ "reward_std": 0.8980218172073364,
+ "step": 196
+ },
+ {
+ "clip_fraction": 0.0,
+ "completion_length": 3260.4861450195312,
+ "dapo/avg_reward_std": 0.1836753969009106,
+ "dapo/filter_reward_index": 0.0,
+ "dapo/kept_prompts_ratio": 0.24786325486806723,
+ "dapo/num_sampling_attempts": 4.875,
+ "dapo/sampling_efficiency": 28.154761904761905,
+ "dapo/total_prompts_processed": 29.25,
+ "dapo/valid_prompts_collected": 6.0,
+ "epoch": 0.22514285714285714,
+ "grad_norm": 0.3354601562023163,
+ "kl": 0.0946044921875,
+ "learning_rate": 1.0109617738307911e-07,
+ "loss": 0.0301,
+ "reward": 0.5015182960778475,
+ "reward_std": 0.9334053322672844,
+ "step": 197
+ },
+ {
+ "clip_fraction": 0.0,
+ "completion_length": 3031.3958129882812,
+ "dapo/avg_reward_std": 0.3008538554696476,
+ "dapo/filter_reward_index": 0.0,
+ "dapo/kept_prompts_ratio": 0.5196078463512308,
+ "dapo/num_sampling_attempts": 2.125,
+ "dapo/sampling_efficiency": 76.5625,
+ "dapo/total_prompts_processed": 12.75,
+ "dapo/valid_prompts_collected": 6.0,
+ "epoch": 0.22628571428571428,
+ "grad_norm": 0.48223650455474854,
+ "kl": 0.10247802734375,
+ "learning_rate": 1.0061670936044178e-07,
+ "loss": 0.0648,
+ "reward": 0.573589576408267,
+ "reward_std": 0.9578919112682343,
+ "step": 198
+ },
+ {
+ "clip_fraction": 0.0,
+ "completion_length": 2948.3854064941406,
+ "dapo/avg_reward_std": 0.43072181940078735,
+ "dapo/filter_reward_index": 0.0,
+ "dapo/kept_prompts_ratio": 0.8333333373069763,
+ "dapo/num_sampling_attempts": 1.25,
+ "dapo/sampling_efficiency": 87.5,
+ "dapo/total_prompts_processed": 7.5,
+ "dapo/valid_prompts_collected": 6.0,
+ "epoch": 0.22742857142857142,
+ "grad_norm": 0.6141620874404907,
+ "kl": 0.09808349609375,
+ "learning_rate": 1.002741278414069e-07,
+ "loss": 0.0827,
+ "reward": 0.7053878791630268,
+ "reward_std": 0.9694960787892342,
+ "step": 199
+ },
+ {
+ "clip_fraction": 0.0,
+ "completion_length": 2714.482666015625,
+ "dapo/avg_reward_std": 0.26207208441149804,
+ "dapo/filter_reward_index": 0.0,
+ "dapo/kept_prompts_ratio": 0.37096774914572317,
+ "dapo/num_sampling_attempts": 3.875,
+ "dapo/sampling_efficiency": 30.624999999999993,
+ "dapo/total_prompts_processed": 23.25,
+ "dapo/valid_prompts_collected": 6.0,
+ "epoch": 0.22857142857142856,
+ "grad_norm": 0.2072688341140747,
+ "kl": 0.1064453125,
+ "learning_rate": 1.0006853717962393e-07,
+ "loss": 0.0122,
+ "reward": 0.5771910101175308,
+ "reward_std": 0.9156405553221703,
+ "step": 200
+ },
+ {
+ "epoch": 0.22857142857142856,
+ "step": 200,
  "total_flos": 0.0,
+ "train_loss": 0.02940896774176508,
+ "train_runtime": 83918.4654,
+ "train_samples_per_second": 0.114,
+ "train_steps_per_second": 0.002
  }
  ],
  "logging_steps": 1,
+ "max_steps": 200,
  "num_input_tokens_seen": 0,
  "num_train_epochs": 1,
  "save_steps": 10,