chen

2395959141pq

AI & ML interests

Generative AI, CV

Organizations

None yet

upvoted 2 articles 9 months ago

Article

Efficient Request Queueing – Optimizing LLM Performance

tngtech • Apr 2, 2025 • 26

Article

Prefill and Decode for Concurrent Requests - Optimizing LLM Performance

tngtech • Apr 16, 2025 • 78

upvoted 2 articles 12 months ago

Article

A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes

ybelkada, timdettmers • Aug 17, 2022 • 132

Article

Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA

ybelkada, timdettmers, artidoro, sgugger, smangrul • May 24, 2023 • 180

upvoted an article about 1 year ago

Article

Mixture of Experts Explained

osanseviero, lewtun, philschmid, smangrul, ybelkada, pcuenq • Dec 11, 2023 • 1.13k