AntiAtropos / training / openenv_loop.py

Commit History

prompt fixes

8c4ef5c

div18 committed on 19 days ago

prompt changes

8425a53

div18 committed on 19 days ago

prompts

c157063

div18 committed on 19 days ago

fix CPU tensors

e408dca

div18 committed on 19 days ago

fix bug

69f37e9

div18 committed on 19 days ago

reward etc tuning

67810ba

div18 committed on 19 days ago

env changes

70cdeae

div18 committed on 19 days ago

training changes

7dbb622

div18 committed on 19 days ago

entropy spread

d23c9c4

div18 committed on 19 days ago

OOM

0f6141d

div18 committed on 19 days ago

fixes

d222529

div18 committed on 19 days ago

fixes

46cd5c4

div18 committed on 19 days ago

fix backprop

871c1ae

div18 committed on 19 days ago

changes

d41d25d

div18 committed on 19 days ago

fix structure

fa8da3f

div18 committed on 19 days ago

edits

619e74d

div18 committed on 19 days ago

fix: disable Qwen thinking at Jinja template level with enable_thinking=False

5edb1ce

div18 committed on 19 days ago

fix: disable Qwen thinking at Jinja template level with enable_thinking=False

65788cc

div18 committed on 19 days ago

fix: add /no_think to system prompts, strip TRACE blocks from model output

9fd06fa

div18 committed on 19 days ago

fixes

75c8df1

div18 committed on 19 days ago

code

e890160

div18 committed on 19 days ago
