We just updated GPU-fryer 🍳 to run on the Grace Hopper Superchip (GH200) - fully optimized for ARM-based systems! With this release, we switched to cuBLASLt to support running FP8 benchmarks. You can monitor GPU throttling, TFLOPS outliers, and HBM memory health, and ensure that you get the most out of your hardware setup. Perfect for stress testing and tuning datacenter GPUs.
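GPU-fryer itself is a Rust CLI, but the core idea behind throttling and TFLOPS-outlier detection can be sketched in a few lines: time repeated matmuls and flag runs whose throughput drops well below the median. This is a minimal, CPU-side illustration with made-up function names, not GPU-fryer's actual implementation:

```python
import time
import numpy as np

def measure_gflops(n=512, runs=20):
    """Time repeated matmuls and return achieved GFLOPS per run."""
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    flops = 2 * n**3  # multiply-adds in an n x n matmul
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        a @ b
        elapsed = time.perf_counter() - start
        samples.append(flops / elapsed / 1e9)
    return samples

def find_outliers(samples, threshold=0.9):
    """Flag runs below `threshold` x median throughput (possible throttling)."""
    median = np.median(samples)
    return [i for i, s in enumerate(samples) if s < threshold * median]
```

A sustained dip flagged by a check like this would hint at thermal or power throttling; the real tool additionally reads GPU telemetry and HBM health.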
AMD summer hackathons are here! A chance to get hands-on with MI300X GPUs and accelerate models. 🇫🇷 Paris - Station F - July 5-6 🇮🇳 Mumbai - July 12-13 🇮🇳 Bengaluru - July 19-20
Hugging Face and GPU Mode will be on site, and on July 6 in Paris @ror will share lessons learned while building new kernels to accelerate Llama 3.1 405B on ROCm.
Wrapping up a week of shipping and announcements with Dell Enterprise Hub now featuring AI Applications, on-device models for AI PCs, a new CLI and Python SDK... all you need for building AI on premises!
Enterprise orgs can now enable serverless Inference Providers for all members - includes $2 of free usage per org member (e.g. an Enterprise org with 1,000 members shares $2,000 in free credits each month) - admins can set a monthly spend limit for the entire org - works today with Together, fal, Novita, Cerebras, and HF Inference.
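The credit math above is simple pooling: per-member credit times member count, with an optional admin-set cap on total spend. A tiny sketch (function names are illustrative, not an actual Hugging Face API):

```python
def pooled_monthly_credit(members, credit_per_member=2.00):
    """Free inference credit shared by the whole org each month (USD)."""
    return members * credit_per_member

def within_spend_limit(spent_so_far, monthly_limit):
    """Admins can cap total org spend; returns whether more usage is allowed."""
    return spent_so_far < monthly_limit
```

So a 1,000-member Enterprise org pools 1,000 × $2 = $2,000 of free usage per month, on top of whatever paid limit the admin configures.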
We are introducing multi-backend support in Hugging Face Text Generation Inference! With the new TGI architecture, we can now plug in new modeling backends to get the best performance for the selected model and available hardware. This first step will very soon be followed by the integration of new backends (TRT-LLM, llama.cpp, vLLM, Neuron, and TPU).
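The idea of pluggable backends selected per model and hardware can be sketched with a small registry. TGI is written in Rust and its real backend interface looks nothing like this; the names and capability sets below are purely illustrative:

```python
# Registry mapping backend names to backend classes.
BACKENDS = {}

def register_backend(name):
    """Class decorator that adds a backend to the registry."""
    def wrap(cls):
        BACKENDS[name] = cls
        return cls
    return wrap

@register_backend("trt-llm")
class TrtLlmBackend:
    supports = {"cuda"}  # hypothetical capability set

@register_backend("llama.cpp")
class LlamaCppBackend:
    supports = {"cpu", "cuda"}

def pick_backend(hardware):
    """Return the first registered backend supporting the given hardware."""
    for name, cls in BACKENDS.items():
        if hardware in cls.supports:
            return name
    raise ValueError(f"no backend for {hardware}")
```

The benefit of this pattern is that adding a backend only requires registering it; the routing logic that matches models to hardware stays untouched.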
We are polishing the TensorRT-LLM backend, which achieves impressive performance on NVIDIA GPUs. Stay tuned 🤗!
Cosmos is a family of pre-trained models purpose-built for generating physics-aware videos and world states to advance physical AI development. The release includes the Cosmos Tokenizers: nvidia/cosmos-tokenizer-672b93023add81b66a8ff8e6
Pro Tip - if you're a Firefox user, you can set up Hugging Chat as an integrated AI Assistant, with contextual links to summarize or simplify any text - handy!
These 15 open models are available for serverless inference on Cloudflare Workers AI, powered by GPUs distributed in 150 datacenters globally - 🙏 @rita3ko @mchenco @jtkipp @nkothariCF @philschmid
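Workers AI models are reachable over Cloudflare's REST API at `/accounts/{account_id}/ai/run/{model}`. A minimal sketch that builds such a request without sending it; the account ID, token, and model slug in the usage comment are placeholders:

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4/accounts"

def build_request(account_id, api_token, model, prompt):
    """Build (but do not send) a Workers AI inference request."""
    url = f"{API_BASE}/{account_id}/ai/run/{model}"
    return urllib.request.Request(
        url,
        data=json.dumps({"prompt": prompt}).encode(),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Usage (placeholders, requires a real Cloudflare account and token):
# req = build_request("ACCOUNT_ID", "API_TOKEN",
#                     "@cf/meta/llama-3-8b-instruct", "Hello!")
# with urllib.request.urlopen(req) as resp:
#     print(resp.read())
```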