Pham Van Linh

phamvanlinh143

2 87 39

AI & ML interests

OCR, AI, DL

Recent Activity

liked a model 4 days ago

baidu/Unlimited-OCR

liked a dataset about 1 month ago

infly/Infinity-Doc2-5M

liked a model about 2 months ago

microsoft/Phi-3.5-vision-instruct

Organizations

None yet

upvoted 3 articles 2 months ago

Article

Illustrating Reinforcement Learning from Human Feedback (RLHF)

natolambert, LouisCastricato, lvwerra, Dahoas • Dec 9, 2022 • 418

Article

Building a Fast Multilingual OCR Model with Synthetic Data

nvidia • Apr 17 • 34

Article

Understanding Gemma 3n: How MatFormer Gives You Many Models in One

rishiraj • Jun 26, 2025 • 50

upvoted an article 3 months ago

Article

Welcome Gemma 4: Frontier multimodal intelligence on device

merve, pcuenq, sergiopaniego, burtenshaw, Steveeeeeeen, alvarobartt, SaylorTwift • Apr 2 • 910

upvoted 5 articles 4 months ago

Article

KV Cache from scratch in nanoVLM

ariG23498, kashif, lusxvr, andito, pcuenq • Jun 4, 2025 • 120

Article

Unlocking Longer Generation with Key-Value Cache Quantization

RaushanTurganbay • May 16, 2024 • 57

Article

GGML and llama.cpp join HF to ensure the long-term progress of Local AI

ggerganov, ngxson, allozaur, lysandre, victor, julien-c • Feb 20 • 507

Article

Continuous batching from first principles

ror, ArthurZ, mcpotato • Nov 25, 2025 • 411

Article

Mixture of Experts (MoEs) in Transformers

ariG23498, pcuenq, merve, IlyasMoutawwakil, ArthurZ, sergiopaniego, Molbap • Feb 26 • 169

upvoted 3 articles 5 months ago

Article

2. Attention Optimizations: From Standard Attention to FlashAttention

atharv6f • Feb 9 • 2

Article

VLM-OCR Recipes on GPU Infrastructure

florentgbelidji • Jan 15 • 16

Article

My Journey Into Vision Models

ngxson • Apr 12, 2025 • 8

upvoted 3 papers 5 months ago

Reinforcement Learning via Self-Distillation

Paper • 2601.20802 • Published Jan 28 • 51

Efficient Memory Management for Large Language Model Serving with PagedAttention

Paper • 2309.06180 • Published Sep 12, 2023 • 60

SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published Mar 14, 2025 • 164

upvoted 2 articles 5 months ago

Article

Performant local mixture-of-experts CPU inference with GPU acceleration in llama.cpp

Doctor-Shotgun • Jan 30 • 28

Article

LightOnOCR-2-1B: a lightweight high-performance end-to-end OCR model family

lightonai • Jan 19 • 96

upvoted 3 articles 6 months ago

Article

The Optimal Architecture for Small Language Models

codelion • Dec 26, 2025 • 121

Article

Tokenization in Transformers v5: Simpler, Clearer, and More Modular

itazap, ariG23498, ArthurZ, sergiopaniego, merve, pcuenq • Dec 18, 2025 • 125

Article

Shrinking Giants: The Quantization Mathematics Making LLMs Accessible

royswastik • May 3, 2025 • 2
