Learning Smooth Reward Models with Temporal Difference for LLM RL and Inference
Dan Zhang
zd21
AI & ML interests
None yet
Recent Activity
published a dataset 2 days ago
zd21/EntiWeave
published a dataset 2 days ago
zd21/PathRefiner
authored a paper about 1 month ago
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search
Organizations
None yet