Text Generation
Transformers
Safetensors
English
bolmo
custom_code
Commit 5b2811e (verified), committed by benjamin
1 parent: 1ff0ce4

Upload folder using huggingface_hub
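
The commit message above is the default one produced by `huggingface_hub` folder uploads. A minimal sketch of such an upload follows, assuming the `HfApi.upload_folder` method is used; the local directory, the repository ID, and the choice to pass `path_in_repo="olmo_core"` are hypothetical placeholders, not details taken from this commit.

```python
from pathlib import Path


def build_upload_args(local_dir, repo_id, path_in_repo="olmo_core"):
    """Collect keyword arguments for HfApi.upload_folder.

    All values are illustrative assumptions: the real local path and
    repository ID are not stated in the commit.
    """
    return {
        "folder_path": str(Path(local_dir)),
        "repo_id": repo_id,
        "path_in_repo": path_in_repo,
        "repo_type": "model",
        "commit_message": "Upload folder using huggingface_hub",
    }


def upload_checkpoint(local_dir, repo_id):
    """Upload a checkpoint folder to the Hub (requires network access and a token)."""
    # Imported lazily so build_upload_args can be used without the package installed.
    from huggingface_hub import HfApi

    return HfApi().upload_folder(**build_upload_args(local_dir, repo_id))
```

Calling `upload_checkpoint("path/to/checkpoint", "user/model")` would create a single commit containing every file in the folder, with large files stored through Git LFS.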

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .gitattributes +513 -0
  2. olmo_core/.metadata.json +1 -0
  3. olmo_core/config.json +361 -0
  4. olmo_core/model_and_optim/.metadata +3 -0
  5. olmo_core/model_and_optim/__0_0.distcp +3 -0
  6. olmo_core/model_and_optim/__0_1.distcp +3 -0
  7. olmo_core/model_and_optim/__0_10.distcp +3 -0
  8. olmo_core/model_and_optim/__0_11.distcp +3 -0
  9. olmo_core/model_and_optim/__0_12.distcp +3 -0
  10. olmo_core/model_and_optim/__0_13.distcp +3 -0
  11. olmo_core/model_and_optim/__0_14.distcp +3 -0
  12. olmo_core/model_and_optim/__0_15.distcp +3 -0
  13. olmo_core/model_and_optim/__0_2.distcp +3 -0
  14. olmo_core/model_and_optim/__0_3.distcp +3 -0
  15. olmo_core/model_and_optim/__0_4.distcp +3 -0
  16. olmo_core/model_and_optim/__0_5.distcp +3 -0
  17. olmo_core/model_and_optim/__0_6.distcp +3 -0
  18. olmo_core/model_and_optim/__0_7.distcp +3 -0
  19. olmo_core/model_and_optim/__0_8.distcp +3 -0
  20. olmo_core/model_and_optim/__0_9.distcp +3 -0
  21. olmo_core/model_and_optim/__10_0.distcp +3 -0
  22. olmo_core/model_and_optim/__10_1.distcp +3 -0
  23. olmo_core/model_and_optim/__10_10.distcp +3 -0
  24. olmo_core/model_and_optim/__10_11.distcp +3 -0
  25. olmo_core/model_and_optim/__10_12.distcp +3 -0
  26. olmo_core/model_and_optim/__10_13.distcp +3 -0
  27. olmo_core/model_and_optim/__10_14.distcp +3 -0
  28. olmo_core/model_and_optim/__10_15.distcp +3 -0
  29. olmo_core/model_and_optim/__10_2.distcp +3 -0
  30. olmo_core/model_and_optim/__10_3.distcp +3 -0
  31. olmo_core/model_and_optim/__10_4.distcp +3 -0
  32. olmo_core/model_and_optim/__10_5.distcp +3 -0
  33. olmo_core/model_and_optim/__10_6.distcp +3 -0
  34. olmo_core/model_and_optim/__10_7.distcp +3 -0
  35. olmo_core/model_and_optim/__10_8.distcp +3 -0
  36. olmo_core/model_and_optim/__10_9.distcp +3 -0
  37. olmo_core/model_and_optim/__11_0.distcp +3 -0
  38. olmo_core/model_and_optim/__11_1.distcp +3 -0
  39. olmo_core/model_and_optim/__11_10.distcp +3 -0
  40. olmo_core/model_and_optim/__11_11.distcp +3 -0
  41. olmo_core/model_and_optim/__11_12.distcp +3 -0
  42. olmo_core/model_and_optim/__11_13.distcp +3 -0
  43. olmo_core/model_and_optim/__11_14.distcp +3 -0
  44. olmo_core/model_and_optim/__11_15.distcp +3 -0
  45. olmo_core/model_and_optim/__11_2.distcp +3 -0
  46. olmo_core/model_and_optim/__11_3.distcp +3 -0
  47. olmo_core/model_and_optim/__11_4.distcp +3 -0
  48. olmo_core/model_and_optim/__11_5.distcp +3 -0
  49. olmo_core/model_and_optim/__11_6.distcp +3 -0
  50. olmo_core/model_and_optim/__11_7.distcp +3 -0
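
The shard files are listed in plain string order rather than numeric order, which is why `__0_10.distcp` follows `__0_1.distcp` and `__10_0.distcp` follows `__0_9.distcp`. The sketch below reproduces that ordering and the matching `.gitattributes` entries. The function name and parameters are hypothetical, and reading the two numbers as a first index and a second index running from 0 to 15 is an assumption based only on the names visible in this listing.

```python
def lfs_attribute_lines(prefix, first_indexes, count_per_index):
    """Build Git LFS .gitattributes entries for shard files named __<a>_<b>.distcp.

    Paths are sorted as plain strings, which matches the order seen in the diff.
    """
    paths = [
        f"{prefix}/__{a}_{b}.distcp"
        for a in first_indexes
        for b in range(count_per_index)
    ]
    return [f"{path} filter=lfs diff=lfs merge=lfs -text" for path in sorted(paths)]


lines = lfs_attribute_lines(
    "olmo_core/model_and_optim", first_indexes=[0, 1, 10], count_per_index=16
)
for line in lines[:4]:
    print(line)
```

Because `.` and the digits sort before `_`, every `__10_*` file sorts before `__1_0.distcp`, just as the `__19_*` entries precede `__1_0.distcp` in the `.gitattributes` diff.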
.gitattributes CHANGED
@@ -33,3 +33,516 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ olmo_core/model_and_optim/.metadata filter=lfs diff=lfs merge=lfs -text
37
+ olmo_core/model_and_optim/__0_0.distcp filter=lfs diff=lfs merge=lfs -text
38
+ olmo_core/model_and_optim/__0_1.distcp filter=lfs diff=lfs merge=lfs -text
39
+ olmo_core/model_and_optim/__0_10.distcp filter=lfs diff=lfs merge=lfs -text
40
+ olmo_core/model_and_optim/__0_11.distcp filter=lfs diff=lfs merge=lfs -text
41
+ olmo_core/model_and_optim/__0_12.distcp filter=lfs diff=lfs merge=lfs -text
42
+ olmo_core/model_and_optim/__0_13.distcp filter=lfs diff=lfs merge=lfs -text
43
+ olmo_core/model_and_optim/__0_14.distcp filter=lfs diff=lfs merge=lfs -text
44
+ olmo_core/model_and_optim/__0_15.distcp filter=lfs diff=lfs merge=lfs -text
45
+ olmo_core/model_and_optim/__0_2.distcp filter=lfs diff=lfs merge=lfs -text
46
+ olmo_core/model_and_optim/__0_3.distcp filter=lfs diff=lfs merge=lfs -text
47
+ olmo_core/model_and_optim/__0_4.distcp filter=lfs diff=lfs merge=lfs -text
48
+ olmo_core/model_and_optim/__0_5.distcp filter=lfs diff=lfs merge=lfs -text
49
+ olmo_core/model_and_optim/__0_6.distcp filter=lfs diff=lfs merge=lfs -text
50
+ olmo_core/model_and_optim/__0_7.distcp filter=lfs diff=lfs merge=lfs -text
51
+ olmo_core/model_and_optim/__0_8.distcp filter=lfs diff=lfs merge=lfs -text
52
+ olmo_core/model_and_optim/__0_9.distcp filter=lfs diff=lfs merge=lfs -text
53
+ olmo_core/model_and_optim/__10_0.distcp filter=lfs diff=lfs merge=lfs -text
54
+ olmo_core/model_and_optim/__10_1.distcp filter=lfs diff=lfs merge=lfs -text
55
+ olmo_core/model_and_optim/__10_10.distcp filter=lfs diff=lfs merge=lfs -text
56
+ olmo_core/model_and_optim/__10_11.distcp filter=lfs diff=lfs merge=lfs -text
57
+ olmo_core/model_and_optim/__10_12.distcp filter=lfs diff=lfs merge=lfs -text
58
+ olmo_core/model_and_optim/__10_13.distcp filter=lfs diff=lfs merge=lfs -text
59
+ olmo_core/model_and_optim/__10_14.distcp filter=lfs diff=lfs merge=lfs -text
60
+ olmo_core/model_and_optim/__10_15.distcp filter=lfs diff=lfs merge=lfs -text
61
+ olmo_core/model_and_optim/__10_2.distcp filter=lfs diff=lfs merge=lfs -text
62
+ olmo_core/model_and_optim/__10_3.distcp filter=lfs diff=lfs merge=lfs -text
63
+ olmo_core/model_and_optim/__10_4.distcp filter=lfs diff=lfs merge=lfs -text
64
+ olmo_core/model_and_optim/__10_5.distcp filter=lfs diff=lfs merge=lfs -text
65
+ olmo_core/model_and_optim/__10_6.distcp filter=lfs diff=lfs merge=lfs -text
66
+ olmo_core/model_and_optim/__10_7.distcp filter=lfs diff=lfs merge=lfs -text
67
+ olmo_core/model_and_optim/__10_8.distcp filter=lfs diff=lfs merge=lfs -text
68
+ olmo_core/model_and_optim/__10_9.distcp filter=lfs diff=lfs merge=lfs -text
69
+ olmo_core/model_and_optim/__11_0.distcp filter=lfs diff=lfs merge=lfs -text
70
+ olmo_core/model_and_optim/__11_1.distcp filter=lfs diff=lfs merge=lfs -text
71
+ olmo_core/model_and_optim/__11_10.distcp filter=lfs diff=lfs merge=lfs -text
72
+ olmo_core/model_and_optim/__11_11.distcp filter=lfs diff=lfs merge=lfs -text
73
+ olmo_core/model_and_optim/__11_12.distcp filter=lfs diff=lfs merge=lfs -text
74
+ olmo_core/model_and_optim/__11_13.distcp filter=lfs diff=lfs merge=lfs -text
75
+ olmo_core/model_and_optim/__11_14.distcp filter=lfs diff=lfs merge=lfs -text
76
+ olmo_core/model_and_optim/__11_15.distcp filter=lfs diff=lfs merge=lfs -text
77
+ olmo_core/model_and_optim/__11_2.distcp filter=lfs diff=lfs merge=lfs -text
78
+ olmo_core/model_and_optim/__11_3.distcp filter=lfs diff=lfs merge=lfs -text
79
+ olmo_core/model_and_optim/__11_4.distcp filter=lfs diff=lfs merge=lfs -text
80
+ olmo_core/model_and_optim/__11_5.distcp filter=lfs diff=lfs merge=lfs -text
81
+ olmo_core/model_and_optim/__11_6.distcp filter=lfs diff=lfs merge=lfs -text
82
+ olmo_core/model_and_optim/__11_7.distcp filter=lfs diff=lfs merge=lfs -text
83
+ olmo_core/model_and_optim/__11_8.distcp filter=lfs diff=lfs merge=lfs -text
84
+ olmo_core/model_and_optim/__11_9.distcp filter=lfs diff=lfs merge=lfs -text
85
+ olmo_core/model_and_optim/__12_0.distcp filter=lfs diff=lfs merge=lfs -text
86
+ olmo_core/model_and_optim/__12_1.distcp filter=lfs diff=lfs merge=lfs -text
87
+ olmo_core/model_and_optim/__12_10.distcp filter=lfs diff=lfs merge=lfs -text
88
+ olmo_core/model_and_optim/__12_11.distcp filter=lfs diff=lfs merge=lfs -text
89
+ olmo_core/model_and_optim/__12_12.distcp filter=lfs diff=lfs merge=lfs -text
90
+ olmo_core/model_and_optim/__12_13.distcp filter=lfs diff=lfs merge=lfs -text
91
+ olmo_core/model_and_optim/__12_14.distcp filter=lfs diff=lfs merge=lfs -text
92
+ olmo_core/model_and_optim/__12_15.distcp filter=lfs diff=lfs merge=lfs -text
93
+ olmo_core/model_and_optim/__12_2.distcp filter=lfs diff=lfs merge=lfs -text
94
+ olmo_core/model_and_optim/__12_3.distcp filter=lfs diff=lfs merge=lfs -text
95
+ olmo_core/model_and_optim/__12_4.distcp filter=lfs diff=lfs merge=lfs -text
96
+ olmo_core/model_and_optim/__12_5.distcp filter=lfs diff=lfs merge=lfs -text
97
+ olmo_core/model_and_optim/__12_6.distcp filter=lfs diff=lfs merge=lfs -text
98
+ olmo_core/model_and_optim/__12_7.distcp filter=lfs diff=lfs merge=lfs -text
99
+ olmo_core/model_and_optim/__12_8.distcp filter=lfs diff=lfs merge=lfs -text
100
+ olmo_core/model_and_optim/__12_9.distcp filter=lfs diff=lfs merge=lfs -text
101
+ olmo_core/model_and_optim/__13_0.distcp filter=lfs diff=lfs merge=lfs -text
102
+ olmo_core/model_and_optim/__13_1.distcp filter=lfs diff=lfs merge=lfs -text
103
+ olmo_core/model_and_optim/__13_10.distcp filter=lfs diff=lfs merge=lfs -text
104
+ olmo_core/model_and_optim/__13_11.distcp filter=lfs diff=lfs merge=lfs -text
105
+ olmo_core/model_and_optim/__13_12.distcp filter=lfs diff=lfs merge=lfs -text
106
+ olmo_core/model_and_optim/__13_13.distcp filter=lfs diff=lfs merge=lfs -text
107
+ olmo_core/model_and_optim/__13_14.distcp filter=lfs diff=lfs merge=lfs -text
108
+ olmo_core/model_and_optim/__13_15.distcp filter=lfs diff=lfs merge=lfs -text
109
+ olmo_core/model_and_optim/__13_2.distcp filter=lfs diff=lfs merge=lfs -text
110
+ olmo_core/model_and_optim/__13_3.distcp filter=lfs diff=lfs merge=lfs -text
111
+ olmo_core/model_and_optim/__13_4.distcp filter=lfs diff=lfs merge=lfs -text
112
+ olmo_core/model_and_optim/__13_5.distcp filter=lfs diff=lfs merge=lfs -text
113
+ olmo_core/model_and_optim/__13_6.distcp filter=lfs diff=lfs merge=lfs -text
114
+ olmo_core/model_and_optim/__13_7.distcp filter=lfs diff=lfs merge=lfs -text
115
+ olmo_core/model_and_optim/__13_8.distcp filter=lfs diff=lfs merge=lfs -text
116
+ olmo_core/model_and_optim/__13_9.distcp filter=lfs diff=lfs merge=lfs -text
117
+ olmo_core/model_and_optim/__14_0.distcp filter=lfs diff=lfs merge=lfs -text
118
+ olmo_core/model_and_optim/__14_1.distcp filter=lfs diff=lfs merge=lfs -text
119
+ olmo_core/model_and_optim/__14_10.distcp filter=lfs diff=lfs merge=lfs -text
120
+ olmo_core/model_and_optim/__14_11.distcp filter=lfs diff=lfs merge=lfs -text
121
+ olmo_core/model_and_optim/__14_12.distcp filter=lfs diff=lfs merge=lfs -text
122
+ olmo_core/model_and_optim/__14_13.distcp filter=lfs diff=lfs merge=lfs -text
123
+ olmo_core/model_and_optim/__14_14.distcp filter=lfs diff=lfs merge=lfs -text
124
+ olmo_core/model_and_optim/__14_15.distcp filter=lfs diff=lfs merge=lfs -text
125
+ olmo_core/model_and_optim/__14_2.distcp filter=lfs diff=lfs merge=lfs -text
126
+ olmo_core/model_and_optim/__14_3.distcp filter=lfs diff=lfs merge=lfs -text
127
+ olmo_core/model_and_optim/__14_4.distcp filter=lfs diff=lfs merge=lfs -text
128
+ olmo_core/model_and_optim/__14_5.distcp filter=lfs diff=lfs merge=lfs -text
129
+ olmo_core/model_and_optim/__14_6.distcp filter=lfs diff=lfs merge=lfs -text
130
+ olmo_core/model_and_optim/__14_7.distcp filter=lfs diff=lfs merge=lfs -text
131
+ olmo_core/model_and_optim/__14_8.distcp filter=lfs diff=lfs merge=lfs -text
132
+ olmo_core/model_and_optim/__14_9.distcp filter=lfs diff=lfs merge=lfs -text
133
+ olmo_core/model_and_optim/__15_0.distcp filter=lfs diff=lfs merge=lfs -text
134
+ olmo_core/model_and_optim/__15_1.distcp filter=lfs diff=lfs merge=lfs -text
135
+ olmo_core/model_and_optim/__15_10.distcp filter=lfs diff=lfs merge=lfs -text
136
+ olmo_core/model_and_optim/__15_11.distcp filter=lfs diff=lfs merge=lfs -text
137
+ olmo_core/model_and_optim/__15_12.distcp filter=lfs diff=lfs merge=lfs -text
138
+ olmo_core/model_and_optim/__15_13.distcp filter=lfs diff=lfs merge=lfs -text
139
+ olmo_core/model_and_optim/__15_14.distcp filter=lfs diff=lfs merge=lfs -text
140
+ olmo_core/model_and_optim/__15_15.distcp filter=lfs diff=lfs merge=lfs -text
141
+ olmo_core/model_and_optim/__15_2.distcp filter=lfs diff=lfs merge=lfs -text
142
+ olmo_core/model_and_optim/__15_3.distcp filter=lfs diff=lfs merge=lfs -text
143
+ olmo_core/model_and_optim/__15_4.distcp filter=lfs diff=lfs merge=lfs -text
144
+ olmo_core/model_and_optim/__15_5.distcp filter=lfs diff=lfs merge=lfs -text
145
+ olmo_core/model_and_optim/__15_6.distcp filter=lfs diff=lfs merge=lfs -text
146
+ olmo_core/model_and_optim/__15_7.distcp filter=lfs diff=lfs merge=lfs -text
147
+ olmo_core/model_and_optim/__15_8.distcp filter=lfs diff=lfs merge=lfs -text
148
+ olmo_core/model_and_optim/__15_9.distcp filter=lfs diff=lfs merge=lfs -text
149
+ olmo_core/model_and_optim/__16_0.distcp filter=lfs diff=lfs merge=lfs -text
150
+ olmo_core/model_and_optim/__16_1.distcp filter=lfs diff=lfs merge=lfs -text
151
+ olmo_core/model_and_optim/__16_10.distcp filter=lfs diff=lfs merge=lfs -text
152
+ olmo_core/model_and_optim/__16_11.distcp filter=lfs diff=lfs merge=lfs -text
153
+ olmo_core/model_and_optim/__16_12.distcp filter=lfs diff=lfs merge=lfs -text
154
+ olmo_core/model_and_optim/__16_13.distcp filter=lfs diff=lfs merge=lfs -text
155
+ olmo_core/model_and_optim/__16_14.distcp filter=lfs diff=lfs merge=lfs -text
156
+ olmo_core/model_and_optim/__16_15.distcp filter=lfs diff=lfs merge=lfs -text
157
+ olmo_core/model_and_optim/__16_2.distcp filter=lfs diff=lfs merge=lfs -text
158
+ olmo_core/model_and_optim/__16_3.distcp filter=lfs diff=lfs merge=lfs -text
159
+ olmo_core/model_and_optim/__16_4.distcp filter=lfs diff=lfs merge=lfs -text
160
+ olmo_core/model_and_optim/__16_5.distcp filter=lfs diff=lfs merge=lfs -text
161
+ olmo_core/model_and_optim/__16_6.distcp filter=lfs diff=lfs merge=lfs -text
162
+ olmo_core/model_and_optim/__16_7.distcp filter=lfs diff=lfs merge=lfs -text
163
+ olmo_core/model_and_optim/__16_8.distcp filter=lfs diff=lfs merge=lfs -text
164
+ olmo_core/model_and_optim/__16_9.distcp filter=lfs diff=lfs merge=lfs -text
165
+ olmo_core/model_and_optim/__17_0.distcp filter=lfs diff=lfs merge=lfs -text
166
+ olmo_core/model_and_optim/__17_1.distcp filter=lfs diff=lfs merge=lfs -text
167
+ olmo_core/model_and_optim/__17_10.distcp filter=lfs diff=lfs merge=lfs -text
168
+ olmo_core/model_and_optim/__17_11.distcp filter=lfs diff=lfs merge=lfs -text
169
+ olmo_core/model_and_optim/__17_12.distcp filter=lfs diff=lfs merge=lfs -text
170
+ olmo_core/model_and_optim/__17_13.distcp filter=lfs diff=lfs merge=lfs -text
171
+ olmo_core/model_and_optim/__17_14.distcp filter=lfs diff=lfs merge=lfs -text
172
+ olmo_core/model_and_optim/__17_15.distcp filter=lfs diff=lfs merge=lfs -text
173
+ olmo_core/model_and_optim/__17_2.distcp filter=lfs diff=lfs merge=lfs -text
174
+ olmo_core/model_and_optim/__17_3.distcp filter=lfs diff=lfs merge=lfs -text
175
+ olmo_core/model_and_optim/__17_4.distcp filter=lfs diff=lfs merge=lfs -text
176
+ olmo_core/model_and_optim/__17_5.distcp filter=lfs diff=lfs merge=lfs -text
177
+ olmo_core/model_and_optim/__17_6.distcp filter=lfs diff=lfs merge=lfs -text
178
+ olmo_core/model_and_optim/__17_7.distcp filter=lfs diff=lfs merge=lfs -text
179
+ olmo_core/model_and_optim/__17_8.distcp filter=lfs diff=lfs merge=lfs -text
180
+ olmo_core/model_and_optim/__17_9.distcp filter=lfs diff=lfs merge=lfs -text
181
+ olmo_core/model_and_optim/__18_0.distcp filter=lfs diff=lfs merge=lfs -text
182
+ olmo_core/model_and_optim/__18_1.distcp filter=lfs diff=lfs merge=lfs -text
183
+ olmo_core/model_and_optim/__18_10.distcp filter=lfs diff=lfs merge=lfs -text
184
+ olmo_core/model_and_optim/__18_11.distcp filter=lfs diff=lfs merge=lfs -text
185
+ olmo_core/model_and_optim/__18_12.distcp filter=lfs diff=lfs merge=lfs -text
186
+ olmo_core/model_and_optim/__18_13.distcp filter=lfs diff=lfs merge=lfs -text
187
+ olmo_core/model_and_optim/__18_14.distcp filter=lfs diff=lfs merge=lfs -text
188
+ olmo_core/model_and_optim/__18_15.distcp filter=lfs diff=lfs merge=lfs -text
189
+ olmo_core/model_and_optim/__18_2.distcp filter=lfs diff=lfs merge=lfs -text
190
+ olmo_core/model_and_optim/__18_3.distcp filter=lfs diff=lfs merge=lfs -text
191
+ olmo_core/model_and_optim/__18_4.distcp filter=lfs diff=lfs merge=lfs -text
192
+ olmo_core/model_and_optim/__18_5.distcp filter=lfs diff=lfs merge=lfs -text
193
+ olmo_core/model_and_optim/__18_6.distcp filter=lfs diff=lfs merge=lfs -text
194
+ olmo_core/model_and_optim/__18_7.distcp filter=lfs diff=lfs merge=lfs -text
195
+ olmo_core/model_and_optim/__18_8.distcp filter=lfs diff=lfs merge=lfs -text
196
+ olmo_core/model_and_optim/__18_9.distcp filter=lfs diff=lfs merge=lfs -text
197
+ olmo_core/model_and_optim/__19_0.distcp filter=lfs diff=lfs merge=lfs -text
198
+ olmo_core/model_and_optim/__19_1.distcp filter=lfs diff=lfs merge=lfs -text
199
+ olmo_core/model_and_optim/__19_10.distcp filter=lfs diff=lfs merge=lfs -text
200
+ olmo_core/model_and_optim/__19_11.distcp filter=lfs diff=lfs merge=lfs -text
201
+ olmo_core/model_and_optim/__19_12.distcp filter=lfs diff=lfs merge=lfs -text
202
+ olmo_core/model_and_optim/__19_13.distcp filter=lfs diff=lfs merge=lfs -text
203
+ olmo_core/model_and_optim/__19_14.distcp filter=lfs diff=lfs merge=lfs -text
204
+ olmo_core/model_and_optim/__19_15.distcp filter=lfs diff=lfs merge=lfs -text
205
+ olmo_core/model_and_optim/__19_2.distcp filter=lfs diff=lfs merge=lfs -text
206
+ olmo_core/model_and_optim/__19_3.distcp filter=lfs diff=lfs merge=lfs -text
207
+ olmo_core/model_and_optim/__19_4.distcp filter=lfs diff=lfs merge=lfs -text
208
+ olmo_core/model_and_optim/__19_5.distcp filter=lfs diff=lfs merge=lfs -text
209
+ olmo_core/model_and_optim/__19_6.distcp filter=lfs diff=lfs merge=lfs -text
210
+ olmo_core/model_and_optim/__19_7.distcp filter=lfs diff=lfs merge=lfs -text
211
+ olmo_core/model_and_optim/__19_8.distcp filter=lfs diff=lfs merge=lfs -text
212
+ olmo_core/model_and_optim/__19_9.distcp filter=lfs diff=lfs merge=lfs -text
213
+ olmo_core/model_and_optim/__1_0.distcp filter=lfs diff=lfs merge=lfs -text
214
+ olmo_core/model_and_optim/__1_1.distcp filter=lfs diff=lfs merge=lfs -text
215
+ olmo_core/model_and_optim/__1_10.distcp filter=lfs diff=lfs merge=lfs -text
216
+ olmo_core/model_and_optim/__1_11.distcp filter=lfs diff=lfs merge=lfs -text
217
+ olmo_core/model_and_optim/__1_12.distcp filter=lfs diff=lfs merge=lfs -text
218
+ olmo_core/model_and_optim/__1_13.distcp filter=lfs diff=lfs merge=lfs -text
219
+ olmo_core/model_and_optim/__1_14.distcp filter=lfs diff=lfs merge=lfs -text
220
+ olmo_core/model_and_optim/__1_15.distcp filter=lfs diff=lfs merge=lfs -text
221
+ olmo_core/model_and_optim/__1_2.distcp filter=lfs diff=lfs merge=lfs -text
222
+ olmo_core/model_and_optim/__1_3.distcp filter=lfs diff=lfs merge=lfs -text
223
+ olmo_core/model_and_optim/__1_4.distcp filter=lfs diff=lfs merge=lfs -text
224
+ olmo_core/model_and_optim/__1_5.distcp filter=lfs diff=lfs merge=lfs -text
225
+ olmo_core/model_and_optim/__1_6.distcp filter=lfs diff=lfs merge=lfs -text
226
+ olmo_core/model_and_optim/__1_7.distcp filter=lfs diff=lfs merge=lfs -text
227
+ olmo_core/model_and_optim/__1_8.distcp filter=lfs diff=lfs merge=lfs -text
228
+ olmo_core/model_and_optim/__1_9.distcp filter=lfs diff=lfs merge=lfs -text
229
+ olmo_core/model_and_optim/__20_0.distcp filter=lfs diff=lfs merge=lfs -text
230
+ olmo_core/model_and_optim/__20_1.distcp filter=lfs diff=lfs merge=lfs -text
231
+ olmo_core/model_and_optim/__20_10.distcp filter=lfs diff=lfs merge=lfs -text
232
+ olmo_core/model_and_optim/__20_11.distcp filter=lfs diff=lfs merge=lfs -text
233
+ olmo_core/model_and_optim/__20_12.distcp filter=lfs diff=lfs merge=lfs -text
234
+ olmo_core/model_and_optim/__20_13.distcp filter=lfs diff=lfs merge=lfs -text
235
+ olmo_core/model_and_optim/__20_14.distcp filter=lfs diff=lfs merge=lfs -text
236
+ olmo_core/model_and_optim/__20_15.distcp filter=lfs diff=lfs merge=lfs -text
237
+ olmo_core/model_and_optim/__20_2.distcp filter=lfs diff=lfs merge=lfs -text
238
+ olmo_core/model_and_optim/__20_3.distcp filter=lfs diff=lfs merge=lfs -text
239
+ olmo_core/model_and_optim/__20_4.distcp filter=lfs diff=lfs merge=lfs -text
240
+ olmo_core/model_and_optim/__20_5.distcp filter=lfs diff=lfs merge=lfs -text
241
+ olmo_core/model_and_optim/__20_6.distcp filter=lfs diff=lfs merge=lfs -text
242
+ olmo_core/model_and_optim/__20_7.distcp filter=lfs diff=lfs merge=lfs -text
243
+ olmo_core/model_and_optim/__20_8.distcp filter=lfs diff=lfs merge=lfs -text
244
+ olmo_core/model_and_optim/__20_9.distcp filter=lfs diff=lfs merge=lfs -text
245
+ olmo_core/model_and_optim/__21_0.distcp filter=lfs diff=lfs merge=lfs -text
246
+ olmo_core/model_and_optim/__21_1.distcp filter=lfs diff=lfs merge=lfs -text
247
+ olmo_core/model_and_optim/__21_10.distcp filter=lfs diff=lfs merge=lfs -text
248
+ olmo_core/model_and_optim/__21_11.distcp filter=lfs diff=lfs merge=lfs -text
249
+ olmo_core/model_and_optim/__21_12.distcp filter=lfs diff=lfs merge=lfs -text
250
+ olmo_core/model_and_optim/__21_13.distcp filter=lfs diff=lfs merge=lfs -text
251
+ olmo_core/model_and_optim/__21_14.distcp filter=lfs diff=lfs merge=lfs -text
252
+ olmo_core/model_and_optim/__21_15.distcp filter=lfs diff=lfs merge=lfs -text
253
+ olmo_core/model_and_optim/__21_2.distcp filter=lfs diff=lfs merge=lfs -text
254
+ olmo_core/model_and_optim/__21_3.distcp filter=lfs diff=lfs merge=lfs -text
255
+ olmo_core/model_and_optim/__21_4.distcp filter=lfs diff=lfs merge=lfs -text
256
+ olmo_core/model_and_optim/__21_5.distcp filter=lfs diff=lfs merge=lfs -text
257
+ olmo_core/model_and_optim/__21_6.distcp filter=lfs diff=lfs merge=lfs -text
258
+ olmo_core/model_and_optim/__21_7.distcp filter=lfs diff=lfs merge=lfs -text
259
+ olmo_core/model_and_optim/__21_8.distcp filter=lfs diff=lfs merge=lfs -text
260
+ olmo_core/model_and_optim/__21_9.distcp filter=lfs diff=lfs merge=lfs -text
261
+ olmo_core/model_and_optim/__22_0.distcp filter=lfs diff=lfs merge=lfs -text
262
+ olmo_core/model_and_optim/__22_1.distcp filter=lfs diff=lfs merge=lfs -text
263
+ olmo_core/model_and_optim/__22_10.distcp filter=lfs diff=lfs merge=lfs -text
264
+ olmo_core/model_and_optim/__22_11.distcp filter=lfs diff=lfs merge=lfs -text
265
+ olmo_core/model_and_optim/__22_12.distcp filter=lfs diff=lfs merge=lfs -text
266
+ olmo_core/model_and_optim/__22_13.distcp filter=lfs diff=lfs merge=lfs -text
267
+ olmo_core/model_and_optim/__22_14.distcp filter=lfs diff=lfs merge=lfs -text
268
+ olmo_core/model_and_optim/__22_15.distcp filter=lfs diff=lfs merge=lfs -text
269
+ olmo_core/model_and_optim/__22_2.distcp filter=lfs diff=lfs merge=lfs -text
270
+ olmo_core/model_and_optim/__22_3.distcp filter=lfs diff=lfs merge=lfs -text
271
+ olmo_core/model_and_optim/__22_4.distcp filter=lfs diff=lfs merge=lfs -text
272
+ olmo_core/model_and_optim/__22_5.distcp filter=lfs diff=lfs merge=lfs -text
273
+ olmo_core/model_and_optim/__22_6.distcp filter=lfs diff=lfs merge=lfs -text
274
+ olmo_core/model_and_optim/__22_7.distcp filter=lfs diff=lfs merge=lfs -text
275
+ olmo_core/model_and_optim/__22_8.distcp filter=lfs diff=lfs merge=lfs -text
276
+ olmo_core/model_and_optim/__22_9.distcp filter=lfs diff=lfs merge=lfs -text
277
+ olmo_core/model_and_optim/__23_0.distcp filter=lfs diff=lfs merge=lfs -text
278
+ olmo_core/model_and_optim/__23_1.distcp filter=lfs diff=lfs merge=lfs -text
279
+ olmo_core/model_and_optim/__23_10.distcp filter=lfs diff=lfs merge=lfs -text
280
+ olmo_core/model_and_optim/__23_11.distcp filter=lfs diff=lfs merge=lfs -text
281
+ olmo_core/model_and_optim/__23_12.distcp filter=lfs diff=lfs merge=lfs -text
282
+ olmo_core/model_and_optim/__23_13.distcp filter=lfs diff=lfs merge=lfs -text
283
+ olmo_core/model_and_optim/__23_14.distcp filter=lfs diff=lfs merge=lfs -text
284
+ olmo_core/model_and_optim/__23_15.distcp filter=lfs diff=lfs merge=lfs -text
285
+ olmo_core/model_and_optim/__23_2.distcp filter=lfs diff=lfs merge=lfs -text
286
+ olmo_core/model_and_optim/__23_3.distcp filter=lfs diff=lfs merge=lfs -text
287
+ olmo_core/model_and_optim/__23_4.distcp filter=lfs diff=lfs merge=lfs -text
288
+ olmo_core/model_and_optim/__23_5.distcp filter=lfs diff=lfs merge=lfs -text
289
+ olmo_core/model_and_optim/__23_6.distcp filter=lfs diff=lfs merge=lfs -text
290
+ olmo_core/model_and_optim/__23_7.distcp filter=lfs diff=lfs merge=lfs -text
291
+ olmo_core/model_and_optim/__23_8.distcp filter=lfs diff=lfs merge=lfs -text
292
+ olmo_core/model_and_optim/__23_9.distcp filter=lfs diff=lfs merge=lfs -text
293
+ olmo_core/model_and_optim/__24_0.distcp filter=lfs diff=lfs merge=lfs -text
294
+ olmo_core/model_and_optim/__24_1.distcp filter=lfs diff=lfs merge=lfs -text
295
+ olmo_core/model_and_optim/__24_10.distcp filter=lfs diff=lfs merge=lfs -text
296
+ olmo_core/model_and_optim/__24_11.distcp filter=lfs diff=lfs merge=lfs -text
297
+ olmo_core/model_and_optim/__24_12.distcp filter=lfs diff=lfs merge=lfs -text
298
+ olmo_core/model_and_optim/__24_13.distcp filter=lfs diff=lfs merge=lfs -text
299
+ olmo_core/model_and_optim/__24_14.distcp filter=lfs diff=lfs merge=lfs -text
300
+ olmo_core/model_and_optim/__24_15.distcp filter=lfs diff=lfs merge=lfs -text
301
+ olmo_core/model_and_optim/__24_2.distcp filter=lfs diff=lfs merge=lfs -text
302
+ olmo_core/model_and_optim/__24_3.distcp filter=lfs diff=lfs merge=lfs -text
303
+ olmo_core/model_and_optim/__24_4.distcp filter=lfs diff=lfs merge=lfs -text
304
+ olmo_core/model_and_optim/__24_5.distcp filter=lfs diff=lfs merge=lfs -text
305
+ olmo_core/model_and_optim/__24_6.distcp filter=lfs diff=lfs merge=lfs -text
306
+ olmo_core/model_and_optim/__24_7.distcp filter=lfs diff=lfs merge=lfs -text
307
+ olmo_core/model_and_optim/__24_8.distcp filter=lfs diff=lfs merge=lfs -text
308
+ olmo_core/model_and_optim/__24_9.distcp filter=lfs diff=lfs merge=lfs -text
309
+ olmo_core/model_and_optim/__25_0.distcp filter=lfs diff=lfs merge=lfs -text
310
+ olmo_core/model_and_optim/__25_1.distcp filter=lfs diff=lfs merge=lfs -text
311
+ olmo_core/model_and_optim/__25_10.distcp filter=lfs diff=lfs merge=lfs -text
312
+ olmo_core/model_and_optim/__25_11.distcp filter=lfs diff=lfs merge=lfs -text
313
+ olmo_core/model_and_optim/__25_12.distcp filter=lfs diff=lfs merge=lfs -text
314
+ olmo_core/model_and_optim/__25_13.distcp filter=lfs diff=lfs merge=lfs -text
315
+ olmo_core/model_and_optim/__25_14.distcp filter=lfs diff=lfs merge=lfs -text
316
+ olmo_core/model_and_optim/__25_15.distcp filter=lfs diff=lfs merge=lfs -text
317
+ olmo_core/model_and_optim/__25_2.distcp filter=lfs diff=lfs merge=lfs -text
318
+ olmo_core/model_and_optim/__25_3.distcp filter=lfs diff=lfs merge=lfs -text
319
+ olmo_core/model_and_optim/__25_4.distcp filter=lfs diff=lfs merge=lfs -text
320
+ olmo_core/model_and_optim/__25_5.distcp filter=lfs diff=lfs merge=lfs -text
321
+ olmo_core/model_and_optim/__25_6.distcp filter=lfs diff=lfs merge=lfs -text
322
+ olmo_core/model_and_optim/__25_7.distcp filter=lfs diff=lfs merge=lfs -text
323
+ olmo_core/model_and_optim/__25_8.distcp filter=lfs diff=lfs merge=lfs -text
324
+ olmo_core/model_and_optim/__25_9.distcp filter=lfs diff=lfs merge=lfs -text
325
+ olmo_core/model_and_optim/__26_0.distcp filter=lfs diff=lfs merge=lfs -text
326
+ olmo_core/model_and_optim/__26_1.distcp filter=lfs diff=lfs merge=lfs -text
327
+ olmo_core/model_and_optim/__26_10.distcp filter=lfs diff=lfs merge=lfs -text
328
+ olmo_core/model_and_optim/__26_11.distcp filter=lfs diff=lfs merge=lfs -text
329
+ olmo_core/model_and_optim/__26_12.distcp filter=lfs diff=lfs merge=lfs -text
330
+ olmo_core/model_and_optim/__26_13.distcp filter=lfs diff=lfs merge=lfs -text
331
+ olmo_core/model_and_optim/__26_14.distcp filter=lfs diff=lfs merge=lfs -text
332
+ olmo_core/model_and_optim/__26_15.distcp filter=lfs diff=lfs merge=lfs -text
333
+ olmo_core/model_and_optim/__26_2.distcp filter=lfs diff=lfs merge=lfs -text
334
+ olmo_core/model_and_optim/__26_3.distcp filter=lfs diff=lfs merge=lfs -text
335
+ olmo_core/model_and_optim/__26_4.distcp filter=lfs diff=lfs merge=lfs -text
336
+ olmo_core/model_and_optim/__26_5.distcp filter=lfs diff=lfs merge=lfs -text
337
+ olmo_core/model_and_optim/__26_6.distcp filter=lfs diff=lfs merge=lfs -text
338
+ olmo_core/model_and_optim/__26_7.distcp filter=lfs diff=lfs merge=lfs -text
339
+ olmo_core/model_and_optim/__26_8.distcp filter=lfs diff=lfs merge=lfs -text
340
+ olmo_core/model_and_optim/__26_9.distcp filter=lfs diff=lfs merge=lfs -text
341
+ olmo_core/model_and_optim/__27_0.distcp filter=lfs diff=lfs merge=lfs -text
342
+ olmo_core/model_and_optim/__27_1.distcp filter=lfs diff=lfs merge=lfs -text
343
+ olmo_core/model_and_optim/__27_10.distcp filter=lfs diff=lfs merge=lfs -text
344
+ olmo_core/model_and_optim/__27_11.distcp filter=lfs diff=lfs merge=lfs -text
345
+ olmo_core/model_and_optim/__27_12.distcp filter=lfs diff=lfs merge=lfs -text
346
+ olmo_core/model_and_optim/__27_13.distcp filter=lfs diff=lfs merge=lfs -text
347
+ olmo_core/model_and_optim/__27_14.distcp filter=lfs diff=lfs merge=lfs -text
348
+ olmo_core/model_and_optim/__27_15.distcp filter=lfs diff=lfs merge=lfs -text
349
+ olmo_core/model_and_optim/__27_2.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__27_3.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__27_4.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__27_5.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__27_6.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__27_7.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__27_8.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__27_9.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__28_0.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__28_1.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__28_10.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__28_11.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__28_12.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__28_13.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__28_14.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__28_15.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__28_2.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__28_3.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__28_4.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__28_5.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__28_6.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__28_7.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__28_8.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__28_9.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__29_0.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__29_1.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__29_10.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__29_11.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__29_12.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__29_13.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__29_14.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__29_15.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__29_2.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__29_3.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__29_4.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__29_5.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__29_6.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__29_7.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__29_8.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__29_9.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__2_0.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__2_1.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__2_10.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__2_11.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__2_12.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__2_13.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__2_14.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__2_15.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__2_2.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__2_3.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__2_4.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__2_5.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__2_6.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__2_7.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__2_8.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__2_9.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__30_0.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__30_1.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__30_10.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__30_11.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__30_12.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__30_13.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__30_14.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__30_15.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__30_2.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__30_3.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__30_4.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__30_5.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__30_6.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__30_7.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__30_8.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__30_9.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__31_0.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__31_1.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__31_10.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__31_11.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__31_12.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__31_13.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__31_14.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__31_15.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__31_2.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__31_3.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__31_4.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__31_5.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__31_6.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__31_7.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__31_8.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__31_9.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__3_0.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__3_1.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__3_10.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__3_11.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__3_12.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__3_13.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__3_14.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__3_15.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__3_2.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__3_3.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__3_4.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__3_5.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__3_6.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__3_7.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__3_8.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__3_9.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__4_0.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__4_1.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__4_10.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__4_11.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__4_12.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__4_13.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__4_14.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__4_15.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__4_2.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__4_3.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__4_4.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__4_5.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__4_6.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__4_7.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__4_8.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__4_9.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__5_0.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__5_1.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__5_10.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__5_11.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__5_12.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__5_13.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__5_14.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__5_15.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__5_2.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__5_3.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__5_4.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__5_5.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__5_6.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__5_7.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__5_8.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__5_9.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__6_0.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__6_1.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__6_10.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__6_11.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__6_12.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__6_13.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__6_14.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__6_15.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__6_2.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__6_3.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__6_4.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__6_5.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__6_6.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__6_7.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__6_8.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__6_9.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__7_0.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__7_1.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__7_10.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__7_11.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__7_12.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__7_13.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__7_14.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__7_15.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__7_2.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__7_3.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__7_4.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__7_5.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__7_6.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__7_7.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__7_8.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__7_9.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__8_0.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__8_1.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__8_10.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__8_11.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__8_12.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__8_13.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__8_14.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__8_15.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__8_2.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__8_3.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__8_4.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__8_5.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__8_6.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__8_7.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__8_8.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__8_9.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__9_0.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__9_1.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__9_10.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__9_11.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__9_12.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__9_13.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__9_14.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__9_15.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__9_2.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__9_3.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__9_4.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__9_5.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__9_6.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__9_7.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__9_8.distcp filter=lfs diff=lfs merge=lfs -text
+ olmo_core/model_and_optim/__9_9.distcp filter=lfs diff=lfs merge=lfs -text
olmo_core/.metadata.json ADDED
@@ -0,0 +1 @@
+ {"version": "2.3.0"}
olmo_core/config.json ADDED
@@ -0,0 +1,361 @@
+ {
+   "model": {
+     "d_model": 4096,
+     "vocab_size": 640,
+     "n_layers": 32,
+     "block": {
+       "attention": {
+         "name": "default",
+         "n_heads": 32,
+         "bias": false,
+         "rope": {
+           "name": "default",
+           "theta": 500000,
+           "full_precision": true,
+           "_CLASS_": "olmo_core.nn.rope.RoPEConfig"
+         },
+         "qk_norm": {
+           "name": "rms",
+           "eps": 1e-06,
+           "bias": false,
+           "dtype": "float32",
+           "_CLASS_": "olmo_core.nn.layer_norm.LayerNormConfig"
+         },
+         "use_flash": true,
+         "backend": "flash_2",
+         "dtype": "float32",
+         "sliding_window": {
+           "pattern": [
+             4096,
+             4096,
+             4096,
+             -1
+           ],
+           "force_full_attention_on_first_layer": false,
+           "force_full_attention_on_last_layer": true,
+           "_CLASS_": "olmo_core.nn.attention.SlidingWindowAttentionConfig"
+         },
+         "_CLASS_": "olmo_core.nn.attention.AttentionConfig"
+       },
+       "layer_norm": {
+         "name": "rms",
+         "eps": 1e-06,
+         "bias": false,
+         "dtype": "float32",
+         "_CLASS_": "olmo_core.nn.layer_norm.LayerNormConfig"
+       },
+       "feed_forward": {
+         "hidden_size": 11008,
+         "name": "default",
+         "bias": false,
+         "dtype": "float32",
+         "act_name": "silu",
+         "_CLASS_": "olmo_core.nn.feed_forward.FeedForwardConfig"
+       },
+       "name": "reordered_norm",
+       "_CLASS_": "olmo_core.nn.transformer.config.TransformerBlockConfig"
+     },
+     "lm_head": {
+       "name": "default",
+       "layer_norm": {
+         "name": "rms",
+         "eps": 1e-06,
+         "bias": false,
+         "dtype": "float32",
+         "_CLASS_": "olmo_core.nn.layer_norm.LayerNormConfig"
+       },
+       "bias": false,
+       "dtype": "float32",
+       "loss_implementation": "default",
+       "_CLASS_": "olmo_core.nn.lm_head.LMHeadConfig"
+     },
+     "name": "bolmo_distill",
+     "dtype": "float32",
+     "init_method": "normal",
+     "init_seed": 0,
+     "init_std": 0.02,
+     "freeze_params": [
+       "boundary_predictor.*",
+       "teacher_embeddings.*"
+     ],
+     "local_encoder": {
+       "sliding_window_size": 0,
+       "d_model": 4096,
+       "n_layers": 1,
+       "block_config": {
+         "attention": {
+           "name": "default",
+           "n_heads": 16,
+           "dtype": "float32",
+           "_CLASS_": "olmo_core.nn.attention.AttentionConfig"
+         },
+         "layer_norm": {
+           "name": "rms",
+           "eps": 1e-06,
+           "bias": false,
+           "dtype": "float32",
+           "_CLASS_": "olmo_core.nn.layer_norm.LayerNormConfig"
+         },
+         "feed_forward": {
+           "hidden_size": 5504,
+           "name": "default",
+           "bias": false,
+           "dtype": "float32",
+           "act_name": "silu",
+           "_CLASS_": "olmo_core.nn.feed_forward.FeedForwardConfig"
+         },
+         "xlstm": {
+           "num_heads": 16,
+           "dtype": "float32",
+           "_CLASS_": "olmo_core.nn.xlstm.XLSTMConfig"
+         },
+         "name": "xlstm",
+         "_CLASS_": "olmo_core.nn.transformer.config.TransformerBlockConfig"
+       },
+       "cross_attn_n_heads": 0,
+       "cross_attn_do_project": true,
+       "cross_attn_init_pooling": "amax",
+       "pooling": "hnet",
+       "add_hash_embeddings": false,
+       "add_expanded_embeddings": true,
+       "hash_byte_group_size": [
+         3,
+         4,
+         5,
+         6,
+         7,
+         8
+       ],
+       "hash_byte_group_vocab": [
+         1536,
+         3072,
+         6144,
+         12288,
+         24576,
+         49152
+       ],
+       "hash_byte_group_nb_functions": 1,
+       "add_norm_after_last_block": true,
+       "add_norm_after_pool": false,
+       "add_out_projection": true,
+       "boundary_predictor": "hnet",
+       "boundary_predictor_lookahead": 1,
+       "represent_bytes_with_embeddings": false,
+       "represent_bytes_with_last_mixed_out": false,
+       "blt_compat": false,
+       "dtype": "float32",
+       "_CLASS_": "olmo_core.nn.bolmo.config.LocalEncoderConfig"
+     },
+     "local_decoder": {
+       "sliding_window_size": 0,
+       "d_model": 4096,
+       "n_layers": 4,
+       "cross_attn_n_heads": 0,
+       "block_config": {
+         "attention": {
+           "name": "default",
+           "n_heads": 16,
+           "dtype": "float32",
+           "_CLASS_": "olmo_core.nn.attention.AttentionConfig"
+         },
+         "layer_norm": {
+           "name": "rms",
+           "eps": 1e-06,
+           "bias": false,
+           "dtype": "float32",
+           "_CLASS_": "olmo_core.nn.layer_norm.LayerNormConfig"
+         },
+         "feed_forward": {
+           "hidden_size": 5504,
+           "name": "default",
+           "bias": false,
+           "dtype": "float32",
+           "act_name": "silu",
+           "_CLASS_": "olmo_core.nn.feed_forward.FeedForwardConfig"
+         },
+         "xlstm": {
+           "num_heads": 16,
+           "dtype": "float32",
+           "_CLASS_": "olmo_core.nn.xlstm.XLSTMConfig"
+         },
+         "name": "xlstm",
+         "_CLASS_": "olmo_core.nn.transformer.config.TransformerBlockConfig"
+       },
+       "depooling": "hnet",
+       "add_norm_before_first_block": true,
+       "add_norm_onto_residual": false,
+       "add_in_projection": true,
+       "add_projected_patch_residuals": false,
+       "hnet_smooth": false,
+       "hnet_smooth_ste": false,
+       "hnet_modulate": false,
+       "blt_compat": false,
+       "fuse_boundaries": true,
+       "no_boundaries": false,
+       "dtype": "float32",
+       "_CLASS_": "olmo_core.nn.bolmo.config.LocalDecoderConfig"
+     },
+     "share_blocks_between_teacher_and_student": false,
+     "_CLASS_": "olmo_core.nn.transformer.config.TransformerConfig"
+   },
+   "dataset": {
+     "tokenizer": {
+       "vocab_size": 520,
+       "eos_token_id": 1,
+       "pad_token_id": 0,
+       "bos_token_id": 1,
+       "special_tokens": [
+         "<pad>",
+         "<bos>",
+         "<eos>",
+         "<bpe_token_end>"
+       ],
+       "special_tokens_first": true,
+       "original_identifier": "allenai/dolma2-tokenizer",
+       "bpe_token_end_id": 3,
+       "_CLASS_": "olmo_core.data.tokenizer.ByteTokenizerConfig"
+     },
+     "paths": [],
+     "expand_glob": false,
+     "include_instance_metadata": true,
+     "work_dir": "",
+     "ignore_fingerprint_mismatch": false,
+     "sequence_length": 4096,
+     "generate_doc_lengths": false,
+     "byte_sequence_length": 24576,
+     "_CLASS_": "olmo_core.data.numpy_dataset.NumpyByteFSLDatasetConfig"
+   },
+   "data_loader": {
+     "global_batch_size": 1572864,
+     "seed": 1234,
+     "num_workers": 24,
+     "ignore_fingerprint_mismatch": false,
+     "_CLASS_": "olmo_core.data.data_loader.NumpyDataLoaderConfig"
+   },
+   "train_module": {
+     "rank_microbatch_size": 49152,
+     "max_sequence_length": 24576,
+     "optim": {
+       "group_overrides": [
+         {
+           "params": [
+             "local_encoder.embedding.weight",
+             "local_encoder.expanded_embeddings.weight"
+           ],
+           "opts": {
+             "weight_decay": 0.0
+           },
+           "_CLASS_": "olmo_core.optim.config.OptimGroupOverride"
+         },
+         {
+           "params": [
+             "blocks.*"
+           ],
+           "opts": {
+             "lr": 1.83e-05
+           },
+           "_CLASS_": "olmo_core.optim.config.OptimGroupOverride"
+         }
+       ],
+       "compile": false,
+       "fixed_fields": [
+         "initial_lr"
+       ],
+       "lr": 3.66e-05,
+       "betas": [
+         0.9,
+         0.95
+       ],
+       "eps": 1e-08,
+       "weight_decay": 0.1,
+       "_CLASS_": "olmo_core.optim.adamw.AdamWConfig"
+     },
+     "max_grad_norm": 0.5,
+     "scheduler": {
+       "lr_field": "lr",
+       "initial_lr_field": "initial_lr",
+       "units": "steps",
+       "alpha_f": 0.0,
+       "warmup_fraction": 0.1,
+       "warmup_min_lr": 0.0,
+       "_CLASS_": "olmo_core.optim.scheduler.LinearWithWarmup"
+     },
+     "compile_model": true,
+     "float8_config": {
+       "enabled": false,
+       "_CLASS_": "olmo_core.float8.Float8Config"
+     },
+     "dp_config": {
+       "name": "fsdp",
+       "param_dtype": "bfloat16",
+       "reduce_dtype": "float32",
+       "wrapping_strategy": "full",
+       "prefetch_factor": 0,
+       "_CLASS_": "olmo_core.train.train_module.transformer.config.TransformerDataParallelConfig"
+     },
+     "bolmo_config": {
+       "tokenizer": {
+         "vocab_size": 520,
+         "eos_token_id": 1,
+         "pad_token_id": 0,
+         "bos_token_id": 1,
+         "special_tokens": [
+           "<pad>",
+           "<bos>",
+           "<eos>",
+           "<bpe_token_end>"
+         ],
+         "special_tokens_first": true,
+         "original_identifier": "allenai/dolma2-tokenizer",
+         "bpe_token_end_id": 3,
+         "_CLASS_": "olmo_core.data.tokenizer.ByteTokenizerConfig"
+       },
+       "losses": [
+         "ce",
+         "boundary"
+       ],
+       "loss_weights": [
+         1.0,
+         4.0
+       ],
+       "binarization_temp": 1.0,
+       "temperature": 1.0,
+       "div_fn": "tvd_temp_limit",
+       "boundary_mode": "end",
+       "merge_boundary_loss": false,
+       "use_output_boundary_jsd": false,
+       "eval_add_boundary_logp": false,
+       "do_alm_debiasing": false,
+       "rep_compare_fn": "l2",
+       "start_ratio": 4.3,
+       "target_ratio": 4.3,
+       "gradual_boundary_compression_steps": 150000,
+       "encoder_loss_lookahead": 0,
+       "encoder_loss_no_lookahead_weight": 1.0,
+       "encoder_loss_lookahead_weights": [],
+       "patching": "dolma2",
+       "epsilon": 1e-06,
+       "skip_blocks": false,
+       "skip_teacher_blocks": false,
+       "skip_teacher": true,
+       "compute_teacher_ce": false,
+       "use_student_patch_reps_for_teacher": false,
+       "use_oracle_patch_reps": false,
+       "teacher_blocks_no_grad": true,
+       "student_blocks_no_grad": false,
+       "decoder_backprop_through_encoder": true,
+       "decoder_backprop_through_boundary_predictor": true,
+       "boundary_predictor_backprop_through_encoder": true,
+       "teacher_force_boundaries": false,
+       "boundary_threshold": "sample:0",
+       "xlstm_igate_bias_init": -10.0,
+       "skip_boundary_before_eos": true,
+       "_CLASS_": "olmo_core.nn.bolmo.config.BolmoConfig"
+     },
+     "label_ignore_index": -100,
+     "_CLASS_": "olmo_core.train.train_module.transformer.config.TransformerTrainModuleConfig"
+   },
+   "trainer": {},
+   "init_seed": 12536,
+   "_CLASS_": "__main__.ExperimentConfig"
+ }
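A few quantities follow from the values in `config.json` by plain arithmetic. The Python sketch below works them out under two stated assumptions that the diff itself does not confirm: that `global_batch_size` and `rank_microbatch_size` are counted in the same unit as `byte_sequence_length`, and that the `sliding_window.pattern` list repeats cyclically over the layers with `-1` meaning full attention. The helper `window_for_layer` is hypothetical and is not an olmo_core function.

```python
# Values copied from the config.json diff above.
sequence_length = 4096          # dataset.sequence_length
byte_sequence_length = 24576    # dataset.byte_sequence_length
global_batch_size = 1572864     # data_loader.global_batch_size
rank_microbatch_size = 49152    # train_module.rank_microbatch_size
n_layers = 32                   # model.n_layers
pattern = [4096, 4096, 4096, -1]  # model.block.attention.sliding_window.pattern
force_full_on_last_layer = True

# Assumption: batch sizes are counted in bytes, like byte_sequence_length.
sequences_per_global_batch = global_batch_size // byte_sequence_length   # 64
sequences_per_microbatch = rank_microbatch_size // byte_sequence_length  # 2
bytes_per_sequence_slot = byte_sequence_length // sequence_length        # 6


def window_for_layer(layer_idx: int) -> int:
    """Hypothetical helper: window size for one layer, -1 for full attention.

    Assumption: the pattern is applied cyclically over layer indices.
    """
    if force_full_on_last_layer and layer_idx == n_layers - 1:
        return -1
    return pattern[layer_idx % len(pattern)]


full_attention_layers = [i for i in range(n_layers) if window_for_layer(i) == -1]

print(sequences_per_global_batch, sequences_per_microbatch, bytes_per_sequence_slot)
print(full_attention_layers)
```

Under the cyclic assumption, every fourth layer uses full attention, so the last layer is already a full-attention layer and `force_full_attention_on_last_layer` changes nothing for this pattern.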
olmo_core/model_and_optim/.metadata ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d226d4df3f8b40ba639641867a687cad7cf295ab52e3efcdc1aadb38a275054f
+ size 9450495
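Each `.distcp` entry in this commit is a three-line Git LFS pointer (`version`, `oid`, `size`) and not the checkpoint shard itself. As a minimal sketch, assuming the pointer text has already been read into a string with the leading `+ ` diff markers removed, it can be parsed as follows. The function name `parse_lfs_pointer` is illustrative and not part of any library.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer into its version, hash algorithm, digest and size."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>", split on the first space.
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real object in bytes
    }


# Pointer content copied from olmo_core/model_and_optim/.metadata above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:d226d4df3f8b40ba639641867a687cad7cf295ab52e3efcdc1aadb38a275054f
size 9450495
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

The `size` field is the byte count of the object stored in LFS, so summing it over all pointers gives the download size of the checkpoint without fetching any shard.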
olmo_core/model_and_optim/__0_0.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1fde3916824ea77107fe98a10cd264cbf48cfc786abfb433740460a0003ea6a1
+ size 179234958
olmo_core/model_and_optim/__0_1.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:10980e12c3db08d331302e306468cc0377ba625b1a4092d82fbd1bcdc4899659
+ size 179234958
olmo_core/model_and_optim/__0_10.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:556ce5527abdfe9790dba105a0a56396f89f505b13ca92366f3b389451dbc928
+ size 179672297
olmo_core/model_and_optim/__0_11.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fab5a12c2161877514ec0a5f6d13897d19a90b49a892cdeed0ef49a9e1247906
+ size 179672297
olmo_core/model_and_optim/__0_12.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e14c0a4f7c1f46e41e3ee187e65a513966de7a237cc555662e6d13b651a983fc
+ size 179280658
olmo_core/model_and_optim/__0_13.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b4d1e8af43fa91f5972d537bce7397c73e16930d4615ab03839af4a79945b45f
+ size 179280658
olmo_core/model_and_optim/__0_14.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b790e6f14a76309144a942046be664dd21489fff9014d8bdc82fabe53fe7b646
+ size 180056746
olmo_core/model_and_optim/__0_15.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a97a37b6df0446af12dcb6301bbe7a7af8c864361a086b3121930981d6f2ce7f
+ size 180055169
olmo_core/model_and_optim/__0_2.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:18a1c51f2b7ba82c1cc633c4f4305c71a2ea178ba7cddf6f55b4e2672a62adc7
+ size 179236535
olmo_core/model_and_optim/__0_3.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1637ee8cd584c7186271c88473a45e8a9055c7fedd159c9d084c8b58753706c5
+ size 179345878
olmo_core/model_and_optim/__0_4.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:40b8b057823abbee8227b5b9b0c493defdf5f77b71847f5859533a254d08bf11
+ size 179344617
olmo_core/model_and_optim/__0_5.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1e9bbec4a5a7282e2e69857efde14becefab09b82c04b687d58042e3122b9d2a
+ size 179344617
olmo_core/model_and_optim/__0_6.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:83bbf408da0a0ab10c4e9c69f23b9db392911a11e8d9e649060e446d52d590f4
+ size 179672297
olmo_core/model_and_optim/__0_7.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ca5bc3ab73fef5efc91a7f2311dca3741be6e9456bd1365bf37250e58ffb0263
+ size 179672297
olmo_core/model_and_optim/__0_8.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c04b53514e3fc2ec72789ce8b6cf4cf303056efa07c7aa0e3b757af8faf8be2d
+ size 179672297
olmo_core/model_and_optim/__0_9.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2a05df82c033265d09355a5896191c6032b61b6c1e473dfdfa6db4089c3c8aa6
+ size 179672297
olmo_core/model_and_optim/__10_0.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dbb5fac97c9b75ec68cf760f6c9b9318d16604691f68fc1b97889268937cb93f
+ size 178818828
olmo_core/model_and_optim/__10_1.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:569a71c136e97d2746545a32addb7127b0b1d071437bdea1ae924c15c013d9f2
+ size 178818828
olmo_core/model_and_optim/__10_10.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5c1757135ce728caecb55e79a83eededc9d7d432c3440dcff76be32f4feebe18
+ size 179257428
olmo_core/model_and_optim/__10_11.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9c8d3632eb2c3112095bc1b844af4c922afff5623c2168cda3713f87e420fa5e
+ size 179257428
olmo_core/model_and_optim/__10_12.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:eb8caef839b9d1bf84959c53c64bfb3f85f82708ffe6a96d8317326d88e60f15
+ size 178865789
olmo_core/model_and_optim/__10_13.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:86017597935cf303efd55164064aa86a180a41576a34f47039627a5c4a156d45
+ size 178865789
olmo_core/model_and_optim/__10_14.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0e9cc0aedad97f6f8c6c9eb14b791220c353d9fd02476213d95141aa40aee4b7
+ size 179294937
olmo_core/model_and_optim/__10_15.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:de14d46b4098bc6a5e4e911f9bd479a9871bb99d7237a914d2590f9866251d89
+ size 179294937
olmo_core/model_and_optim/__10_2.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:10f4f79877a2121215d81cb18a4dc5429390eaa6b39bd9566c69908008c654cf
+ size 178820405
olmo_core/model_and_optim/__10_3.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f116a86923d2096a055aafd1e520f9b61e5c6796578c8415b56f4a85f92c45b0
+ size 178929748
olmo_core/model_and_optim/__10_4.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3eae74a28adb5e9c57bc924fbba1205a423e6488f3ba89c62cc3b7f7694a1da1
+ size 178929748
olmo_core/model_and_optim/__10_5.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1cc74c4ea7f53200e309217958585660d18665e920730e5076082d04a6484a05
+ size 178929748
olmo_core/model_and_optim/__10_6.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2ff41964b709da0d9e5a17e38c9a44ed00aa6cf20a412e17a9522b3b7377ac29
+ size 179257428
olmo_core/model_and_optim/__10_7.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c06b8804f9d426d55ce0683baf08b5fec2015c4771137609a1ec933d01a58fb8
+ size 179257428
olmo_core/model_and_optim/__10_8.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4d018b5f2ff0b158574a00ccd1106ca6df8a0a8a829b0a509b76ac88937ca311
+ size 179257428
olmo_core/model_and_optim/__10_9.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e1312e5cd760fd83c909b20d3b430c1488112a39c7057d6b2298e34503570b6
+ size 179257428
olmo_core/model_and_optim/__11_0.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f55336df7daef34ed8ffaeac38da44ae6c65111665821411305869a2ea17ca95
+ size 178818828
olmo_core/model_and_optim/__11_1.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f1290ebf674851c99ade037dfd8e628906396a2e1c49b7f5e6ce943c8e90dcc3
+ size 178818828
olmo_core/model_and_optim/__11_10.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:11c5895fda2bc149ce8c4b532a8f3698873087326fc26093dfc6c1291fc16b0c
+ size 179257428
olmo_core/model_and_optim/__11_11.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9f9a6a095e3fa712117ec5de5ebbad6cf0e7ac482a7b429b341ab7243ee2a499
+ size 179257428
olmo_core/model_and_optim/__11_12.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:749e9683ef7ab1cbceffe47b5d2c2e1ddf0c270fd1bcd2e739350c515cf2fb8e
+ size 178865789
olmo_core/model_and_optim/__11_13.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9ac0b3c96fa5b4b7de7be230e4b377ab0e884b565c281c58c92f74015effc21c
+ size 178865789
olmo_core/model_and_optim/__11_14.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:609775f6385e0f6ca2b4765b328730fd314e24f5dd14443449159c6d0ab7c2bb
+ size 179294937
olmo_core/model_and_optim/__11_15.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5ccb1922d13ad5112b32010f868fb70cab17ea8e1171911aa96d350b659dd344
+ size 179294937
olmo_core/model_and_optim/__11_2.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e7fdac412f40813a7e73205770e612f355f25a3a546ae6d0f0708747a84be18
+ size 178820405
olmo_core/model_and_optim/__11_3.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b34429d67974a376317b3b60e1f32c0561d5428f3348ef6e013ddaef683c25ac
+ size 178929748
olmo_core/model_and_optim/__11_4.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0b8c66fa5e2d3387017972391fb28d02577951c3040f33fb68d6c08fc18e3ca4
+ size 178929748
olmo_core/model_and_optim/__11_5.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1dd8b974d7a4fa4ef8478057b6075ccb524c3dbc82f5fef650c922bb35c03a1a
+ size 178929748
olmo_core/model_and_optim/__11_6.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f204ddb999a622d73b9a72e02e94fe6ab9a62f57b97752237d1cd93cf9cdf259
+ size 179257428
olmo_core/model_and_optim/__11_7.distcp ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:297de0e482eda03814b1505476278af6106ad65850b86cc4a2e8314a3de72a55
+ size 179257428