Austin Liu

Austin362667

austin362667

AI & ML interests

None yet

Recent Activity

upvoted an article 1 day ago

Article

Accelerating Transformers Fine-Tuning with NVIDIA NeMo AutoModel

nvidia • 2 days ago • 26

upvoted an article about 1 month ago

Article

Understanding Vector Quantization in VQ-VAE

ariG23498 • Aug 28, 2024 • 64

upvoted an article about 2 months ago

Article

Pallas for people who know JAX but not kernels yet

ariG23498 • Apr 29 • 21

upvoted an article 2 months ago

Article

The PR you would have opened yourself

pcuenq, awni • Apr 16 • 72

upvoted a paper 3 months ago

Embarrassingly Simple Self-Distillation Improves Code Generation

Paper • 2604.01193 • Published Apr 1 • 56

upvoted 2 articles 4 months ago

Article

Assisted Generation: a new direction toward low-latency text generation

joaogante • May 11, 2023 • 79

Article

A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes

ybelkada, timdettmers • Aug 17, 2022 • 136

upvoted a collection 4 months ago

SiliconMind-V1

Collection • 5 items • Updated 9 days ago • 2

upvoted 2 articles 4 months ago

Article

OpenEvolve: An Open Source Implementation of Google DeepMind's AlphaEvolve

codelion • May 20, 2025 • 70

Article

KV Cache from scratch in nanoVLM

ariG23498, kashif, lusxvr, andito, pcuenq • Jun 4, 2025 • 120

upvoted an article 5 months ago

Article

Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective

Jan 27 • 80

upvoted an article 7 months ago

Article

Continuous batching from first principles

ror, ArthurZ, mcpotato • Nov 25, 2025 • 411

upvoted 2 articles 9 months ago

Article

Key Insights into the Law of Vision Representations in MLLMs

Borise • Sep 2, 2024 • 20

Article

KV Caching Explained: Optimizing Transformer Inference Efficiency

not-lain • Jan 30, 2025 • 351

upvoted 5 articles 11 months ago

Article

An Analysis of Chinese LLM Censorship and Bias with Qwen 2 Instruct

leonardlin • Jun 11, 2024 • 69

Article

Parquet Content-Defined Chunking

kszucs • Jul 25, 2025 • 75

Article

SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data

danaaubakirova, andito, merve, ariG23498, fracapuano, loubnabnl, pcuenq, mshukor, cadene • Jun 3, 2025 • 355

Article

TimeScope: How Long Can Your Video Large Multimodal Model Go?

orrzohar, ruili0, andito, nicholswang • Jul 23, 2025 • 48

Article

⚡ nano-vLLM: Lightweight, Low-Latency LLM Inference from Scratch

zamal • Jun 28, 2025 • 44

upvoted an article 12 months ago

Article

SmolVLM2: Bringing Video Understanding to Every Device

orrzohar, mfarre, andito, merve, pcuenq, cyrilzakka, Xenova • Feb 20, 2025 • 343
