Leandro von Werra PRO
AI & ML interests
NLP and RL
Recent Activity
new activity about 3 hours ago
rl-llm-wiki/knowledge-base:source: arxiv:1707.06347 — Proximal Policy Optimization (PPO)
updated a bucket about 5 hours ago
rl-llm-wiki/rl-the-first-one
published a bucket about 5 hours ago
rl-llm-wiki/rl-the-first-one