AI & ML interests

None defined yet.

Recent Activity

alielfilali01 
posted an update 4 months ago
Wauplin 
posted an update 5 months ago
view post
Post
3252
Say hello to hf: a faster, friendlier Hugging Face CLI ✨

We are glad to announce a long-awaited quality-of-life improvement: the Hugging Face CLI has been officially renamed from huggingface-cli to hf!

So... why this change?

Typing huggingface-cli constantly gets old fast. More importantly, the CLI’s command structure became messy as new features were added over time (upload, download, cache management, repo management, etc.). Renaming the CLI is a chance to reorganize commands into a clearer, more consistent format.

We decided not to reinvent the wheel and instead follow a well-known CLI pattern: hf <resource> <action>. Isn't hf auth login easier to type and remember?

The full rationale, implementation details, and migration notes are in the blog post: https://huggingface.co/blog/hf-cli

AtAndDev 
posted an update 5 months ago
view post
Post
565
Qwen 3 Coder is a personal attack to k2, and I love it.
It achieves near SOTA on LCB while not having reasoning.
Finally people are understanding that reasoning isnt necessary for high benches...

Qwen ftw!

DECENTRALIZE DECENTRALIZE DECENTRALIZE
chansung 
posted an update 5 months ago
view post
Post
4382
YAML engineering becomes more and more important than ever from infra provisioning to model training (recipes).

Here, I built a simple editor first for @dstackai , and I will share the live endpoint this week. Let me know what you think about this approach.

Based on this approach, if people think this is useful, I am going to do the same thing for the LLM training recipes for popular frameworks such as Hugging Face open-r1, Axolotl, and so on. Let me hear.
AtAndDev 
posted an update 7 months ago
view post
Post
3076
deepseek-ai/DeepSeek-R1-0528

This is the end
  • 1 reply
·
alielfilali01 
posted an update 7 months ago
AtAndDev 
posted an update 8 months ago
view post
Post
3144
Llama 4 is out...
·
Wauplin 
posted an update 9 months ago
view post
Post
2332
‼️ huggingface_hub's v0.30.0 is out with our biggest update of the past two years!

Full release notes: https://github.com/huggingface/huggingface_hub/releases/tag/v0.30.0.

🚀 Ready. Xet. Go!

Xet is a groundbreaking new protocol for storing large objects in Git repositories, designed to replace Git LFS. Unlike LFS, which deduplicates files, Xet operates at the chunk level—making it a game-changer for AI builders collaborating on massive models and datasets. Our Python integration is powered by [xet-core](https://github.com/huggingface/xet-core), a Rust-based package that handles all the low-level details.

You can start using Xet today by installing the optional dependency:

pip install -U huggingface_hub[hf_xet]


With that, you can seamlessly download files from Xet-enabled repositories! And don’t worry—everything remains fully backward-compatible if you’re not ready to upgrade yet.

Blog post: https://huggingface.co/blog/xet-on-the-hub
Docs: https://huggingface.co/docs/hub/en/storage-backends#xet


⚡ Inference Providers

- We’re thrilled to introduce Cerebras and Cohere as official inference providers! This expansion strengthens the Hub as the go-to entry point for running inference on open-weight models.

- Novita is now our 3rd provider to support text-to-video task after Fal.ai and Replicate.

- Centralized billing: manage your budget and set team-wide spending limits for Inference Providers! Available to all Enterprise Hub organizations.

from huggingface_hub import InferenceClient
client = InferenceClient(provider="fal-ai", bill_to="my-cool-company")
image = client.text_to_image(
    "A majestic lion in a fantasy forest",
    model="black-forest-labs/FLUX.1-schnell",
)
image.save("lion.png")


- No more timeouts when generating videos, thanks to async calls. Available right now for Fal.ai, expecting more providers to leverage the same structure very soon!
·
chansung 
posted an update 9 months ago
view post
Post
3947
simple guide on the recipe for GRPO on Open-R1 which is built on top of TRL

I think FastAPI wrapper of vLLM with WeightSyncWorker is pretty cool feature. Also, we have many predefined reward functions out of the box!
·
chansung 
posted an update 9 months ago
view post
Post
2675
Mistral AI Small 3.1 24B is not only commercial free but also the best model in a single GPU deployment.

I packed up all the information you need to know in a single picture. Hope this helps! :)
  • 1 reply
·
AtAndDev 
posted an update 9 months ago
view post
Post
4371
There seems to multiple paid apps shared here that are based on models on hf, but some ppl sell their wrappers as "products" and promote them here. For a long time, hf was the best and only platform to do oss model stuff but with the recent AI website builders anyone can create a product (really crappy ones btw) and try to sell it with no contribution to oss stuff. Please dont do this, or try finetuning the models you use...
Sorry for filling yall feed with this bs but yk...
  • 6 replies
·
chansung 
posted an update 9 months ago
view post
Post
1604
Gemma 3 Release in a nutshell
(seems like function calling is not supported whereas the announcement said so)
AtAndDev 
posted an update 9 months ago
view post
Post
1670
Gemma 3 seems to be really good at human preference. Just waiting for ppl to see it.
alielfilali01 
posted an update 10 months ago
view post
Post
1107
🚨 Arabic LLM Evaluation 🚨

Few models join the ranking of https://huggingface.co/spaces/inceptionai/AraGen-Leaderboard Today.

The new MistralAI model, Saba, is quite impressive, Top10 ! Well done @arthurmensch and team.

Sadly Mistral did not follow its strategy about public weights this time, we hope this changes soon and we get the model with a permissive license.

We added other Mistral models and apparently, we have been sleeping on mistralai/Mistral-Large-Instruct-2411 !

Another impressive model that joined the ranking today is https://huggingface.co/ALLaM-AI/ALLaM-7B-Instruct-preview. After a long wait finally ALLaM is here and it is IMPRESSIVE given its size !

ALLaM is ranked on OALL/Open-Arabic-LLM-Leaderboard as well.
AtAndDev 
posted an update 10 months ago
chansung 
posted an update 10 months ago
view post
Post
3037
Simple Paper Review #5

I briefly reviewed the paper "SFT Memorizes, RL Generalizes," which compares SFT and RL in post-training of LLM/VLM from HKU, UC Berkeley, Google DeepMind, and New York University

The conclusion suggests SFT excels in memorization, while RL is better for generalization. However, since LLM/VLM should benefit humans beyond just generalization, a mix of SFT and RL is advisable. Typically, some SFT is followed by RL to understand prompt formats and enhance generalization through trial and error.

The study focused on one model, Llama-3.2-Vision-11B, using environments like General Points for arithmetic reasoning and V-IRL for spatial reasoning. Training data was used for both SFT and RL, with evaluations on in-distribution and out-of-distribution data to assess memorization and generalization.

I want to apply RL extensively, but it requires building a similar simulation environment. For domain-specific models, significant investment in creating a "playground" for the model is crucial, as the effort will directly influence the outcomes.

https://arxiv.org/abs/2501.17161
chansung 
posted an update 10 months ago
view post
Post
4509
A brief summary of the o3-mini

The OpenAI o3-mini model is a significant improvement over the o1-mini, reaching o1 performance levels. While generally good, its performance isn't universally better than previous models (o1, o1-prev.) or GPT-4o across all benchmarks. This means workflows should be re-evaluated with each model upgrade.

The o3-mini has "low," "medium," and "high" versions, with "low" being the base model used for benchmarking. It's speculated that the higher versions simply involve more processing. A fair comparison with other models like Gemini 2.0 Thinking or DeepSeek-R1 would likely need to use the "low" version and a similar "think more" mechanism.

The system card is recommended reading due to its comprehensive benchmark data.

https://openai.com/index/openai-o3-mini/
AtAndDev 
posted an update 11 months ago
view post
Post
1947
everywhere i go i see his face