angt
posted an update 1 day ago
installama.sh at the TigerBeetle 1000x World Tour!

Last week I had the chance to give a short talk during the TigerBeetle 1000x World Tour (organized by @jedisct1 👏), a fantastic event celebrating high-performance engineering and the people who love pushing systems to their limits!

In the talk, I focused on the CPU and Linux side of things, with a simple goal in mind: making the installation of llama.cpp instant, automatic, and optimal, no matter your OS or hardware setup.

For the curious, here are the links worth checking out:
Event page: https://tigerbeetle.com/event/1000x
GitHub repo: https://github.com/angt/installama.sh
Talk: https://youtu.be/pg5NOeJZf0o?si=9Dkcfi2TqjnT_30e

More improvements are coming soon. Stay tuned!
angt
posted an update 8 days ago
I'm excited to share that https://installama.sh is up and running! 🚀

On Linux / macOS / FreeBSD it is easier than ever:
curl https://installama.sh | sh


And Windows just joined the party 🥳
irm https://installama.sh | iex

Stay tuned for new backends on Windows!
angt
posted an update 13 days ago
🚀 installama.sh update: Vulkan & FreeBSD support added!

The fastest way to install and run llama.cpp has just been updated!

We are expanding hardware and OS support to make local AI even more accessible. This includes:

🌋 Vulkan support for Linux on x86_64 and aarch64.
😈 FreeBSD support (CPU backend) on x86_64 and aarch64 too.
✨ Lots of small optimizations and improvements under the hood.

Give it a try right now:
curl angt.github.io/installama.sh | MODEL=unsloth/Qwen3-4B-GGUF:Q4_0 sh
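Before it can pick a backend (Vulkan, Metal, or plain CPU), an installer like this first has to identify the host OS and CPU architecture. A minimal, hypothetical sketch of that detection step using POSIX `uname` — illustrative only, not the actual installama.sh logic:

```shell
# Build a platform triple such as "linux-x86_64" from uname output,
# the kind of probe an installer runs before choosing a binary.
os=$(uname -s | tr '[:upper:]' '[:lower:]')   # linux, darwin, freebsd
arch=$(uname -m)                              # x86_64, aarch64, arm64, ...
case "$arch" in
  arm64) arch=aarch64 ;;                      # normalize Apple's naming
esac
echo "${os}-${arch}"
```

On an Apple Silicon Mac this prints `darwin-aarch64`; a real installer would then go on to probe which backend the hardware supports.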
angt
posted an update 21 days ago
One command line is all you need...

...to launch a local llama.cpp server on any Linux box or any Metal-powered Mac 🚀

curl angt.github.io/installama.sh | MODEL=unsloth/gpt-oss-20b-GGUF sh


Learn more: https://github.com/angt/installama.sh
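The `MODEL=... sh` form works because a variable assignment prefixed to a command is placed into that command's environment, so the script piped into `sh` can read `$MODEL`. A tiny stand-in demonstrates the mechanism — the inline `echo` plays the role of the downloaded installer script; only the `MODEL` variable name comes from the command above:

```shell
# The echoed one-liner stands in for the downloaded install script;
# MODEL is exported into the environment of the `sh` that runs it.
echo 'echo "requested model: ${MODEL:-none}"' | MODEL=unsloth/gpt-oss-20b-GGUF sh
# prints: requested model: unsloth/gpt-oss-20b-GGUF
```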
hlarcher
posted an update 4 months ago
GH200 cooking time 🧑‍🍳🔥!

We just updated GPU-fryer 🍳 to run on the Grace Hopper Superchip (GH200), fully optimized for ARM-based systems!
With this release, we switched to cuBLASLt to support running FP8 benchmarks. You can monitor GPU throttling, TFLOPS outliers, and HBM memory health, and make sure you get the most out of your hardware setup.
Perfect for stress testing and tuning datacenter GPUs.

Check it out on GitHub 👉 https://github.com/huggingface/gpu-fryer
angt
posted an update 4 months ago
angt
posted an update 6 months ago
hlarcher
posted an update 11 months ago
We are introducing multi-backend support in Hugging Face Text Generation Inference!
With the new TGI architecture, we can now plug in new modeling backends to get the best performance for the selected model and available hardware. This first step will very soon be followed by the integration of new backends (TRT-LLM, llama.cpp, vLLM, Neuron and TPU).

We are polishing the TensorRT-LLM backend, which achieves impressive performance on NVIDIA GPUs. Stay tuned 🤗!

Check out the details: https://huggingface.co/blog/tgi-multi-backend