ReLIFT is a training method that interleaves RL with online fine-tuning (FT), achieving superior performance and efficiency compared to using RL or SFT alone.
RoadMa
RoadQAQ
Recent Activity
liked a model about 7 hours ago: stepfun-ai/Step-3.5-Flash
liked a dataset about 7 hours ago: stepfun-ai/CF-Div2-Stepfun
updated a dataset about 1 month ago: RoadQAQ/sft_for_rl