Thomas Wolf PRO

thomwolf

https://thomwolf.io

AI & ML interests

NLP and open-source :-)

Recent Activity

new activity about 3 hours ago

rl-llm-wiki/knowledge-base:source: arxiv:2607.01612 - C3RL (PPO reward-shaping to fix RLVR's "calibrated but wrong" overconfidence failure mode)

new activity about 3 hours ago

rl-llm-wiki/knowledge-base:source: arxiv:2607.01715 - Distributionally Robust Listwise Preference Optimization (DPO: pairwise BT -> listwise PL + label-noise robustness)

new activity about 3 hours ago

rl-llm-wiki/knowledge-base:source: arxiv:2607.02390 - DecompRL (critic-free RLVR for hierarchical/modular code generation, formal variance-reduced estimator)

thomwolf's models (7)

thomwolf/gpqa-grpo-qwen3-4b

thomwolf/act-sort3

Updated May 20, 2024

thomwolf/codeparrot-small

Text Generation • Updated Jul 27, 2021 • 10

thomwolf/codeparrot

Text Generation • Updated Jul 21, 2021 • 7 • 1

thomwolf/codeparrot-small-vocabulary

Updated Jul 21, 2021

thomwolf/vqgan_imagenet_f16_1024

Updated Jun 8, 2021 • 70

thomwolf/test-model

Updated Jan 21, 2021