Jeffrey Magder
jmagder
AI & ML interests: None yet
Organizations: None yet
Finished Reading
- Self-Play Preference Optimization for Language Model Alignment
  Paper • 2405.00675 • Published • 28
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
  Paper • 2205.14135 • Published • 15
- Attention Is All You Need
  Paper • 1706.03762 • Published • 122
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
  Paper • 2307.08691 • Published • 9
To read
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces
  Paper • 2312.00752 • Published • 150
- Elucidating the Design Space of Diffusion-Based Generative Models
  Paper • 2206.00364 • Published • 18
- GLU Variants Improve Transformer
  Paper • 2002.05202 • Published • 5
- StarCoder 2 and The Stack v2: The Next Generation
  Paper • 2402.19173 • Published • 156