Leandro von Werra PRO

lvwerra

huggingface

https://www.lvwerra.com

AI & ML interests

NLP and RL

Recent Activity

new activity about 3 hours ago

rl-llm-wiki/knowledge-base: source: arxiv:1707.06347 — Proximal Policy Optimization (PPO)

updated a bucket about 5 hours ago

rl-llm-wiki/rl-the-first-one

published a bucket about 5 hours ago

rl-llm-wiki/rl-the-first-one

New activity in rl-llm-wiki/knowledge-base about 3 hours ago

source: arxiv:1707.06347 — Proximal Policy Optimization (PPO)

#1 opened about 5 hours ago by

New activity in attention-wiki/knowledge-base about 12 hours ago

Add source: Shaw et al. — Self-Attention with Relative Position Representations

#20 opened 1 day ago by

Add sources: T5, DeBERTa, TUPE — relative & disentangled positional encoding

#26 opened 1 day ago by

Add source: In-context Learning and Induction Heads (arxiv:2209.11895)

#30 opened 1 day ago by

Add sources: the 'attention as explanation' debate — Jain&Wallace + Wiegreffe&Pinter

#32 opened 1 day ago by

Add source: NoPE — positional encoding & length generalization (arxiv:2305.19466)

#33 opened 1 day ago by

New activity in attention-wiki/knowledge-base 1 day ago

Add source: H2O — Heavy-Hitter KV-cache eviction (arxiv:2306.14048)

#29 opened 1 day ago by

Add source: S4 (structured state spaces) + claims + state-space-hybrids page

#25 opened 1 day ago by

Add source: Retrieval Head Mechanistically Explains Long-Context Factuality (arxiv:2404.15574)

#31 opened 1 day ago by

Add source: Long Range Arena + 'no universal winner' claim + LRA evidence on low-rank claim

#28 opened 1 day ago by

Add source: Reformer (LSH + reversible layers) + claim + sparse-attention page

#27 opened 1 day ago by

Process arXiv:2310.01889 - Ring Attention

#19 opened 1 day ago by

Add source: GQA — Grouped-Query Attention (arxiv:2305.13245)

#21 opened 1 day ago by

Add source: Ring Attention with Blockwise Transformers (arXiv:2310.01889)

#17 opened 1 day ago by

Add source: Multi-head Latent Attention (DeepSeek-V2, arxiv:2405.04434)

#22 opened 1 day ago by

Add source: Self-attention Does Not Need O(n^2) Memory (Rabe & Staats, arxiv:2112.05682)

#23 opened 1 day ago by

Add source: Linformer (linear-complexity self-attention) + claims + topic page

#24 opened 1 day ago by

Process arXiv:1901.02860 - Transformer-XL

#3 opened 1 day ago by

Process arXiv:1911.02150 - Multi-query attention

#4 opened 1 day ago by

Process arXiv:2108.12409 - ALiBi

#5 opened 1 day ago by