AI & ML interests

None defined yet.

Recent Activity

danielhanchen 
posted an update 3 days ago
You can now fine-tune embedding models in our free Unsloth notebook! 🤗

Fine-tuning embedding models improves retrieval & RAG by aligning vectors to your domain-specific notion of similarity, improving search, clustering, and recommendations on your data.

⭐ Blog + Notebooks: https://unsloth.ai/docs/new/embedding-finetuning

Unsloth trains embedding models 1.8-3.3x faster with 20% less VRAM, 2x longer context & no accuracy loss vs. FA2 setups.

We'd like to thank Hugging Face and Unsloth contributor electroglyph for making this possible!
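The post describes aligning embeddings to a domain-specific notion of similarity. The standard objective behind this kind of fine-tuning is an in-batch-negatives contrastive loss (InfoNCE), where each query's paired document is the positive and every other document in the batch is a negative. Below is a minimal numpy sketch of that objective — an illustration of the idea, not Unsloth's actual training code or API:

```python
import numpy as np

def log_softmax(x):
    # Numerically stable row-wise log-softmax
    x = x - x.max(axis=1, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=1, keepdims=True))

def in_batch_contrastive_loss(query_emb, doc_emb, temperature=0.05):
    """InfoNCE with in-batch negatives: query i's positive is doc i;
    all other docs in the batch act as negatives."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    logits = (q @ d.T) / temperature        # (batch, batch) cosine sims
    log_p = log_softmax(logits)
    idx = np.arange(len(q))
    return -log_p[idx, idx].mean()          # NLL of the diagonal positives
```

Minimizing this pulls each query toward its paired document and pushes it away from the rest of the batch, which is what improves retrieval, clustering, and recommendations on in-domain data.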
danielhanchen 
posted an update 6 days ago
danielhanchen 
posted an update 10 days ago
You can now do reinforcement learning training with 7× longer context and no accuracy loss, via our new batching algorithms.

Long reasoning chains in RL are costly, but now we enable you to train gpt-oss with GRPO & reach 380K context on a 192GB GPU.

Blog: https://unsloth.ai/docs/new/grpo-long-context
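For context on what GRPO computes per batch: instead of training a value network, it samples a group of completions per prompt and normalizes each completion's reward against the group's mean and standard deviation. A small numpy sketch of that group-relative advantage (illustrative only — not Unsloth's batching algorithm):

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages as in GRPO.

    rewards: (num_prompts, group_size) — one reward per sampled
    completion, grouped by the prompt it was sampled for.
    """
    rewards = np.asarray(rewards, dtype=float)
    mean = rewards.mean(axis=1, keepdims=True)   # per-group baseline
    std = rewards.std(axis=1, keepdims=True)     # per-group scale
    return (rewards - mean) / (std + eps)
```

Because the baseline comes from the group itself, advantages within each group sum to roughly zero and no learned critic is needed — which is part of why long-context RL becomes a memory problem about completions rather than extra models.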
danielhanchen 
posted an update 26 days ago
csabakecskemeti 
posted an update about 1 month ago
Just sharing a result of a homelab infrastructure experiment:

I've managed to set up distributed inference infrastructure at home using a DGX Spark (128GB unified GDDR6) and a Linux workstation with an RTX 6000 Pro (96GB GDDR7), connected via 100Gbps RoCEv2. The model I used (https://lnkd.in/gx6J7YuB) is about 140GB, so it could not fit on either GPU alone. Full setup and tutorial coming soon on devquasar.com.
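The core placement problem here — a ~140GB model that fits on neither a 128GB nor a 96GB device alone — can be sketched as greedy pipeline-style layer assignment under per-device memory budgets. This is a toy illustration of the idea, not the actual software stack used in the experiment:

```python
def partition_layers(layer_sizes_gb, device_budgets_gb):
    """Greedily assign consecutive layers to devices, spilling to the
    next device once the current one's memory budget is exhausted."""
    placement, dev, used = [], 0, 0.0
    for size in layer_sizes_gb:
        if used + size > device_budgets_gb[dev]:
            dev += 1                      # spill to the next device
            if dev >= len(device_budgets_gb):
                raise MemoryError("model does not fit on available devices")
            used = 0.0
        placement.append(dev)
        used += size
    return placement
```

With ten 14GB layers and budgets of [128, 96], the first nine layers land on the first device and the tenth spills to the second — the inter-device link (here, 100Gbps RoCEv2) then carries activations across the cut.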



Screen recording:
https://lnkd.in/gKM9H5GJ
danielhanchen 
posted an update about 1 month ago
danielhanchen 
posted an update about 1 month ago
danielhanchen 
posted an update about 1 month ago
danielhanchen 
posted an update about 2 months ago
csabakecskemeti 
posted an update about 2 months ago
danielhanchen 
posted an update about 2 months ago
Mistral's new Ministral 3 models can now be run & fine-tuned locally (16GB RAM)!
Ministral 3 models have vision support and best-in-class performance for their size.
14B Instruct GGUF: unsloth/Ministral-3-14B-Instruct-2512-GGUF
14B Reasoning GGUF: unsloth/Ministral-3-14B-Reasoning-2512-GGUF

🐱 Step-by-step Guide: https://docs.unsloth.ai/new/ministral-3
All GGUFs, BnB, FP8 etc. variants uploads: https://huggingface.co/collections/unsloth/ministral-3
csabakecskemeti 
posted an update about 2 months ago
Looking for some help testing an INT8 DeepSeek 3.2:
SGLang supports channel-wise INT8 quants on CPUs with AMX instructions (Xeon 5 and above, AFAIK):
https://lmsys.org/blog/2025-07-14-intel-xeon-optimization/

Currently uploading an INT8 version of DeepSeek 3.2 Speciale:
DevQuasar/deepseek-ai.DeepSeek-V3.2-Speciale-Channel-INT8

I cannot test this myself since I'm on AMD:
"AssertionError: W8A8Int8LinearMethod on CPU requires that CPU has AMX support"
(I assumed it could fall back to some non-optimized kernel, but apparently not.)

If anyone with the required resources (Intel Xeon 5/6 + ~768GB-1TB RAM) can help test this, that would be awesome.

If you have hints on how to make this work on an AMD Threadripper 7000 Pro series, please guide me.

Thanks all!
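For readers unfamiliar with the quantization scheme named above: channel-wise INT8 keeps one scale per output channel of a weight matrix instead of a single scale for the whole tensor, which preserves accuracy for channels with very different magnitudes. A minimal numpy sketch of symmetric channel-wise INT8 (illustrative — not SGLang's AMX kernel):

```python
import numpy as np

def quantize_int8_per_channel(w):
    """Symmetric channel-wise INT8: one scale per output channel (row),
    chosen so the largest value in each row maps to +/-127."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)   # guard all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original weights
    return q.astype(np.float32) * scale
```

The round-trip error per element is bounded by half a quantization step of its own row's scale, which is why channel-wise scales beat a single tensor-wide scale on matrices with uneven row magnitudes.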
danielhanchen 
posted an update about 2 months ago
csabakecskemeti 
posted an update 3 months ago
Recently there has been a lot of activity around token-efficient formats, so I've also built a package (inspired by TOON).

Deep-TOON

My goal was to handle JSON structures with complex embeddings in a token-efficient way.

This is what I built over the weekend. Feel free to try it:
https://pypi.org/project/deep-toon/0.1.0/
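The core trick behind TOON-style formats is easy to demonstrate: for a list of homogeneous JSON objects, emit the keys once as a header and then one row of values per object, instead of repeating every key in every object. A toy sketch of that flattening (illustrative only — not the actual deep-toon format or API):

```python
import json

def toonify(records):
    """Flatten a list of dicts sharing the same keys into a header line
    plus one comma-separated value row per record."""
    keys = list(records[0])
    header = ",".join(keys)
    rows = [",".join(str(r[k]) for k in keys) for r in records]
    return "\n".join([header] + rows)
```

Even on tiny inputs the flattened form is shorter than `json.dumps`, and the savings grow with the number of records, since key names are paid for only once.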

danielhanchen 
posted an update 3 months ago