Aayush Rajput

aayushhumai


AI & ML interests

None yet

Recent Activity

new activity about 1 month ago

SulphurAI/Sulphur-2-base:short movie script

reacted to alvarobartt's post with ❤️ about 1 month ago

Latest `hf-mem` release added a breakdown of Mixture-of-Experts (MoE) memory usage!

TL;DR: MoEs can be misleading to reason about from active parameters alone, since each token only activates a subset of experts, while the serving setup still needs to account for the full resident memory footprint.

🧠 `hf-mem` now splits MoE memory into base model weights, routed experts, and KV cache
🏗️ Dense models usually load and use most weights every forward pass, while MoEs load many experts but only route each token to a few of them
⚡ Active params isn't the same as memory footprint, especially for sparse architectures
📦 Runtime memory is about what is used per request/token, while loading memory also includes the expert weights that need to be resident
📚 KV cache can still dominate depending on context length, batch size, and concurrency
🔀 Expert Parallelism (EP) helps shard experts across accelerators when expert weights dominate
🚀 Data Parallelism (DP) + EP is often a good fit for throughput-oriented MoE serving

Check the repository at https://github.com/alvarobartt/hf-mem
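The gap between active parameters and resident memory described in the post can be illustrated with a rough back-of-the-envelope calculation. The sketch below is not the `hf-mem` API or its method; the `MoEConfig` class, the function names, and every number in the example are hypothetical, and it assumes uniform expert sizes, 2-byte (bf16/fp16) weights and cache entries, and experts that divide evenly across devices.

```python
from dataclasses import dataclass


@dataclass
class MoEConfig:
    """Hypothetical, simplified description of a MoE model (not an hf-mem type)."""
    base_params: int        # shared weights: attention, embeddings, routers
    params_per_expert: int  # parameters of one routed expert in one layer
    num_experts: int        # routed experts per MoE layer
    experts_per_token: int  # experts each token is routed to (top-k)
    num_moe_layers: int     # number of layers that contain routed experts
    bytes_per_param: int = 2  # assumption: bf16/fp16 weights


def expert_weight_bytes(cfg: MoEConfig) -> int:
    """Bytes for all routed experts across all MoE layers."""
    return cfg.params_per_expert * cfg.num_experts * cfg.num_moe_layers * cfg.bytes_per_param


def resident_weight_bytes(cfg: MoEConfig) -> int:
    """Bytes that must be loaded: base weights plus every expert."""
    return cfg.base_params * cfg.bytes_per_param + expert_weight_bytes(cfg)


def active_weight_bytes(cfg: MoEConfig) -> int:
    """Bytes of weights a single token actually touches in one forward pass."""
    routed = cfg.params_per_expert * cfg.experts_per_token * cfg.num_moe_layers
    return (cfg.base_params + routed) * cfg.bytes_per_param


def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   context_len: int, batch_size: int, bytes_per_elem: int = 2) -> int:
    """KV cache size: one K and one V tensor per layer, per token, per sequence."""
    return 2 * num_layers * num_kv_heads * head_dim * context_len * batch_size * bytes_per_elem


def expert_bytes_per_device(cfg: MoEConfig, ep_size: int) -> int:
    """Expert bytes per accelerator when experts are split evenly over ep_size devices."""
    return expert_weight_bytes(cfg) // ep_size


if __name__ == "__main__":
    # Made-up toy model, chosen only to keep the arithmetic easy to follow.
    cfg = MoEConfig(base_params=1_000_000_000, params_per_expert=100_000_000,
                    num_experts=8, experts_per_token=2, num_moe_layers=10)
    kv = kv_cache_bytes(num_layers=10, num_kv_heads=8, head_dim=128,
                        context_len=4096, batch_size=4)
    print(f"resident weights: {resident_weight_bytes(cfg) / 1e9:.1f} GB")
    print(f"active weights:   {active_weight_bytes(cfg) / 1e9:.1f} GB")
    print(f"KV cache:         {kv / 1e9:.2f} GB")
    print(f"experts per device (EP=4): {expert_bytes_per_device(cfg, 4) / 1e9:.1f} GB")
```

In this toy configuration the resident weights are three times the active weights, which is why sizing a deployment from active parameters alone would underestimate the memory needed. The KV cache term grows linearly with both context length and batch size, so it can overtake the weights in long-context or high-concurrency serving.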


Organizations

None yet

models 0

None public yet

datasets 0

None public yet