Tung-Lin Wu

tunglinwood

AI & ML interests

None yet

Recent Activity

upvoted a collection 12 days ago

new activity 4 months ago

moonshotai/Kimi-Audio-7B-Instruct:Add Kimi-Audio EOS and pad token ids

updated a model 4 months ago

tunglinwood/Kimi-Audio-7B-Instruct

Organizations

None yet

upvoted a collection 12 days ago

Gemma 4

12 items • Updated 26 days ago • 862

upvoted an article 6 months ago

Continuous batching from first principles

Article • ror, ArthurZ, mcpotato • Nov 25, 2025 • 397

upvoted a collection 9 months ago

DeepSeek-V3.1

3 items • Updated Mar 2 • 262

upvoted 2 collections about 1 year ago

Qwen3

84 items • Updated Dec 31, 2025 • 1.8k

GLM-4-0414

GLM-4-0414 series model • 6 items • Updated Mar 2 • 135

upvoted 2 papers about 1 year ago

OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models

Paper • 2502.01061 • Published Feb 3, 2025 • 225

Training Sparse Mixture Of Experts Text Embedding Models

Paper • 2502.07972 • Published Feb 11, 2025 • 10

upvoted a collection about 1 year ago

Qwen2.5-Omni

End-to-End Omni (text, audio, image, video, and natural speech interaction) model based Qwen2.5 • 6 items • Updated Mar 2 • 167

upvoted a paper about 1 year ago

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5, 2024 • 145

upvoted an article about 1 year ago

Training and Finetuning Embedding Models with Sentence Transformers

Article • tomaarsen • May 28, 2024 • 275

upvoted a paper about 1 year ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4, 2025 • 260

upvoted 2 papers over 1 year ago

Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment

Paper • 2502.16894 • Published Feb 24, 2025 • 33

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22, 2025 • 452

upvoted 3 articles over 1 year ago

What is test-time compute and how to scale it?

Article • Kseniase • Feb 6, 2025 • 122

Mixture of Experts Explained

Article • osanseviero, lewtun, philschmid, smangrul, ybelkada, pcuenq • Dec 11, 2023 • 1.13k

Open-source DeepResearch – Freeing our search agents

Article • m-ric, albertvillanova, merve, thomwolf, clefourrier • Feb 4, 2025 • 1.32k

upvoted a collection over 1 year ago

Llama 3.3

This collection hosts the transformers and original repos of the Llama 3.3 • 1 item • Updated Dec 6, 2024 • 205

upvoted a paper over 1 year ago

HelpSteer2-Preference: Complementing Ratings with Preferences

Paper • 2410.01257 • Published Oct 2, 2024 • 26

upvoted a collection over 1 year ago

Emu3

Emu3: Next-Token Prediction is All You Need • 7 items • Updated Feb 4 • 81

upvoted a paper over 1 year ago

NVLM: Open Frontier-Class Multimodal LLMs

Paper • 2409.11402 • Published Sep 17, 2024 • 74