gg-tt

company
Activity Feed

AI & ML interests

None defined yet.

Recent Activity

danielhanchen 
posted an update about 8 hours ago
tomaarsen 
posted an update 2 days ago
🤗 Announcing the Ettin Reranker family: six new state-of-the-art CrossEncoder rerankers for search from 17M to 1B parameters, plus the full training data and the ~150-line recipe. Built on the Ettin ModernBERT encoders, Apache 2.0. Details:

All six were trained with the same single-stage pointwise MSE distillation recipe, with mixedbread-ai/mxbai-rerank-large-v2 (1.54B) as the teacher. Only the learning rate and per-device batch size change between sizes. The 1B student matches the teacher within 0.0001 NDCG@10 on MTEB(eng, v2) Retrieval, the 150M is the strongest reranker I tested in the under-600M range, and the 17M beats the 33M ms-marco-MiniLM-L12-v2 by +0.051 NDCG@10 at roughly half the parameter count.
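As a rough illustration of what "pointwise MSE distillation" means here: the student produces one relevance score per (query, document) pair and is trained to match the teacher's score for that same pair. The sketch below assumes teacher scores are precomputed; the function name and the numbers are hypothetical, and the actual training script is the one in the linked blog post.

```python
from typing import Sequence


def pointwise_mse(student_scores: Sequence[float], teacher_scores: Sequence[float]) -> float:
    """Mean squared error between student and teacher scores.

    Each position holds the score for one (query, document) pair, so every
    pair contributes independently: that is what makes the loss pointwise.
    """
    if len(student_scores) != len(teacher_scores):
        raise ValueError("need exactly one teacher score per student score")
    if not student_scores:
        raise ValueError("cannot compute a loss on an empty batch")
    squared_errors = [(s - t) ** 2 for s, t in zip(student_scores, teacher_scores)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical batch of three (query, document) pairs.
teacher = [0.9, 0.1, 0.5]  # precomputed teacher scores (illustrative values)
student = [0.7, 0.1, 0.9]  # current student predictions (illustrative values)
loss = pointwise_mse(student, teacher)
print(f"distillation loss: {loss:.4f}")
```

Because the target is a plain regression onto teacher scores, no positives, negatives, or ranked lists need to be constructed per query.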

Speed matters as much as quality for a reranker, since it determines whether the model fits the latency budget between retrieval and showing results. Our 17M is the fastest reranker in the whole comparison at 7517 pairs/sec on an H100. Our 150M runs 2.3x faster than the two other 150M ModernBERT-base rerankers (gte-reranker-modernbert-base and granite-embedding-reranker-english-r2) because the modular Transformer module propagates unpadded inputs through every layer rather than just the FA2 attention kernel. And our 1B is 2.4x faster than its 1.5B teacher while matching it on quality.

I bootstrapped the training recipe with the new train-sentence-transformers Agent Skill shipped in Sentence Transformers v5.5.0. Install it with hf skills add train-sentence-transformers --claude and ask Claude Code (or Codex / Cursor / Gemini CLI) to fine-tune a SentenceTransformer, CrossEncoder, or SparseEncoder model on your data.

I wrote a blog post walking through usage, results across six embedder pairings, the speed story, and the complete training script. Check it out, or just point your Agent to the URL:

https://huggingface.co/blog/ettin-reranker

Collection: https://huggingface.co/collections/cross-encoder/ettin-rerankers
danielhanchen 
posted an update 8 days ago
We’re excited to announce that Unsloth has joined the PyTorch Ecosystem! 🔥🦥

Unsloth is an open-source project that makes training & running models more accurate and faster with less compute. Our mission is to make local AI accessible to everyone. Thanks to all of you for making this possible! 💕

Blog: https://unsloth.ai/blog/pytorch
GitHub: https://github.com/unslothai/unsloth
tomaarsen 
posted an update 9 days ago
🤖 I've just published Sentence Transformers v5.5.0, headlined by a new train-sentence-transformers Agent Skill that lets your AI coding agent (Claude Code, Codex, Cursor, Gemini CLI, ...) train and finetune embedding, reranker, and sparse encoder models for you. The release also adds new training losses and fixes. Details:

The skill bundles curated guidance for the whole training workflow across all three model types: base model selection, loss and evaluator choice, hard-negative mining, distillation, LoRA, Matryoshka, multilingual training, static embeddings, etc. It also ships production-ready training template scripts the agent can adapt. Install it with hf skills add train-sentence-transformers, then just describe what you want, e.g. "finetune a reranker on my (question, answer) pairs, mine hard negatives, and push it to the Hub".

On the loss side, there are two additions. First, EmbedDistillLoss is a new embedding-level distillation loss for SentenceTransformer. Instead of distilling teacher scores like MarginMSELoss, it aligns the student's embeddings directly with pre-computed teacher embeddings, with an optional learnable projection for when the student and teacher dimensions differ. Second, ADRMSELoss is a new listwise learning-to-rank loss for CrossEncoder from the Rank-DistilLLM paper, aimed at the LLM-distillation reranking setting.
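To make the embedding-level idea concrete, here is a minimal pure-Python sketch: project the student embedding into the teacher's dimension, then penalize the distance to the precomputed teacher embedding. This is not the library's EmbedDistillLoss implementation. The helper names, the fixed projection weights, and the choice of MSE as the distance are all assumptions for illustration; in real training the projection would be learned.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def project(weights: Matrix, vec: Vector) -> Vector:
    """Apply a linear map: each row of `weights` yields one output dimension."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def embedding_mse(student: Vector, teacher: Vector) -> float:
    """Mean squared error between two embeddings of equal dimension."""
    if len(student) != len(teacher):
        raise ValueError("embeddings must have the same dimension")
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(teacher)


# Hypothetical 2-dim student embedding and 3-dim precomputed teacher embedding.
student_emb = [1.0, 2.0]
teacher_emb = [1.0, 2.0, 3.0]

# Hypothetical projection (3 rows x 2 columns); fixed here, learnable in practice.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

projected = project(W, student_emb)          # now lives in the teacher's 3-dim space
loss = embedding_mse(projected, teacher_emb)
print(projected, loss)
```

The contrast with score distillation is that the target here is the embedding itself, so no query-document scores from the teacher are needed.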

encode() and predict() also gained a per-call processing_kwargs override, so you can change processor settings like max_length, a vision-language model's image resolution, or a video's fps, for a single call without rebuilding the model.

The Agent Skill is the part of this release I'm most keen for people to try. Curious to hear how it works for you. I've been using it myself a lot to quickly set up some training runs that immediately use a bunch of best practices.

> pip install sentence-transformers==5.5.0
> hf skills add train-sentence-transformers

The full release notes: https://github.com/huggingface/sentence-transformers/releases/tag/v5.5.0
danielhanchen 
posted an update 12 days ago
We collaborated with NVIDIA to teach you how we made LLM training ~25% faster! 🚀

Learn how 3 optimizations help your home GPU train models faster:
1. Packed-sequence metadata caching
2. Double-buffered checkpoint reloads
3. Faster MoE routing

Guide: https://unsloth.ai/blog/nvidia-collab
GitHub: https://github.com/unslothai/unsloth
danielhanchen 
posted an update 16 days ago
We made a guide on how to run open LLMs in Claude Code, Codex and OpenClaw.

Use Gemma 4 and Qwen3.6 GGUFs for local agentic coding on 24GB RAM.

Run with self-healing tool calls, code execution, and web search via the Unsloth API endpoint and llama.cpp.

Guide: https://unsloth.ai/docs/basics/api
danielhanchen 
posted an update 23 days ago
Unsloth is now one of the top 10 most followed organizations on Hugging Face. 🤗🦥

Thanks so much for all the support!
Our HF page: unsloth
mlabonne 
posted an update 24 days ago
Big update to llm-datasets, my curated list of datasets and tools for post-training LLMs.

> Added many new datasets
> New "thinking" column
> Refreshed recommended tools.

Thanks to everyone who told me they used it for their research at ICLR, you motivated this update!
danielhanchen 
posted an update 29 days ago
danielhanchen 
posted an update about 1 month ago
tomaarsen 
posted an update about 1 month ago
🌐 I've just published Sentence Transformers v5.4 to make the project fully multimodal for embeddings and reranking. The release also includes a modular CrossEncoder, and automatic Flash Attention 2 input flattening. Details:

You can now use SentenceTransformer and CrossEncoder with text, images, audio, and video, with the same familiar API. That means you can compute embeddings for an image and a text query using model.encode(), compare them with model.similarity(), and it just works. Models like Qwen3-VL-Embedding-2B and jinaai/jina-reranker-m0 are supported out of the box.

Beyond multimodal, I also fully modularized the CrossEncoder class. It's now a torch.nn.Sequential of composable modules, just like SentenceTransformer has been. This unlocked support for generative rerankers (CausalLM-based models like mxbai-rerank-v2 and the Qwen3 rerankers) via a new LogitScore module, which wasn't possible before without custom code.

Also, Flash Attention 2 now automatically skips padding for text-only inputs. If your batch has a mix of short and long texts, this gives you a nice speedup and lower VRAM usage for free.
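A small sketch of why skipping padding helps, assuming the common variable-length layout in which all sequences in a batch are concatenated into one flat token list and tracked with cumulative sequence lengths. The function names and token ids are hypothetical, and this does not reproduce the library's internals; it only counts how many token positions each layout has to process.

```python
from itertools import accumulate
from typing import List, Tuple


def flatten_batch(sequences: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Concatenate sequences and return (flat_tokens, cumulative_lengths).

    cumulative_lengths[i]:cumulative_lengths[i + 1] is the slice of
    flat_tokens that belongs to sequence i, so no pad tokens are needed.
    """
    flat = [token for seq in sequences for token in seq]
    cu_seqlens = [0] + list(accumulate(len(seq) for seq in sequences))
    return flat, cu_seqlens


def padded_positions(sequences: List[List[int]]) -> int:
    """Token positions processed when every sequence is padded to the longest."""
    if not sequences:
        return 0
    return len(sequences) * max(len(seq) for seq in sequences)


# Hypothetical batch mixing short and long texts (token ids are made up).
batch = [[101, 7, 102], [101, 7, 8, 9, 10, 11, 12, 102], [101, 102]]

flat, cu = flatten_batch(batch)
print("cumulative lengths:", cu)
print("padded positions:  ", padded_positions(batch))
print("unpadded positions:", len(flat))
```

The wider the spread between the shortest and longest text in a batch, the larger the gap between the two counts.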

I wrote a blog post walking through the multimodal features with practical examples. Check it out if you want to get started, or just point your Agent to the URL: https://huggingface.co/blog/multimodal-sentence-transformers

This release has set up the groundwork for more easily introducing late-interaction models (both text-only and multimodal) into Sentence Transformers in the next major release. I'm looking forward to it!
danielhanchen 
posted an update about 1 month ago
danielhanchen 
posted an update about 2 months ago
danielhanchen 
posted an update about 2 months ago
A new way to use Unsloth.

Coming soon...
danielhanchen 
posted an update about 2 months ago
You don’t need to set LLM parameters anymore! 🚀

llama.cpp uses only the context length and compute your local setup needs. Unsloth also auto-applies the correct model settings.

Try in Unsloth Studio - now with precompiled llama.cpp binaries.

GitHub: https://github.com/unslothai/unsloth
danielhanchen 
posted an update 2 months ago
Introducing Unsloth Studio ✨
A new open-source web UI to train and run LLMs.

• Run models locally on Mac, Windows, Linux
• Train 500+ models 2x faster with 70% less VRAM
• Supports GGUF, vision, audio, embedding models
• Auto-create datasets from PDF, CSV, DOCX
• Self-healing tool calling and code execution
• Compare models side by side + export to GGUF

GitHub: https://github.com/unslothai/unsloth
Blog and Guide: https://unsloth.ai/docs/new/studio

Available now on Hugging Face, NVIDIA, Docker and Colab.
danielhanchen 
posted an update 2 months ago
We collaborated with NVIDIA to teach you about Reinforcement Learning and RL environments. 💚 Learn:

• Why RL environments matter + how to build them
• When RL is better than SFT
• GRPO and RL best practices
• How verifiable rewards and RLVR work
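For readers new to these terms, here is a minimal sketch of two of the ideas in the list: a verifiable reward (a programmatic check instead of a learned reward model) and the group-relative advantage used in GRPO, where each sampled completion is scored against the mean and standard deviation of its own group. The exact-match checker, the sample completions, and the use of population standard deviation are assumptions for illustration, not taken from the linked blog.

```python
from statistics import mean, pstdev
from typing import List


def verifiable_reward(completion: str, expected: str) -> float:
    """Return 1.0 if the completion matches the known answer, else 0.0."""
    return 1.0 if completion.strip() == expected.strip() else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against the other samples for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every sample gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four sampled completions for one prompt.
completions = ["42", "41", " 42 ", "forty-two"]
rewards = [verifiable_reward(c, "42") for c in completions]
advantages = group_relative_advantages(rewards)

for completion, adv in zip(completions, advantages):
    print(f"{completion!r:>12}  advantage={adv:+.3f}")
```

Correct completions end up with positive advantages and incorrect ones with negative advantages; when every sample in a group gets the same reward, all advantages are zero and that group carries no learning signal.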

Blog: https://unsloth.ai/blog/rl-environments