Shetu Mohanto

shetumohanto

AI & ML interests

GenAI | MLOps | AI agent | Computer Vision

Recent Activity

reacted to mmhamdy's post with 🚀 about 2 months ago

The new DeepSeek Engram paper is super fun! It also integrates mHC, and I suspect they're probably releasing all these papers to make the V4 report of reasonable length😄 Here's a nice short summary from Gemini

reacted to prithivMLmods's post with 🔥 about 2 months ago

Now, a collection of various compression schemes for Qwen3.6 and the abliterated version 1 of dense models is available on the Hub. Check it out via the links below. 👇 🔗 Qwen3.6-MoE: https://huggingface.co/collections/prithivMLmods/qwen36-35b-a3b-compressions 🔗 Qwen3.6-27B Compressions: https://huggingface.co/collections/prithivMLmods/qwen36-27b-compressions 🤗 > To learn more, visit the app page or the respective model pages.

reacted to burtenshaw's post with ❤️ 10 months ago

Smol course has a distinctive approach to teaching post-training, so I'm posting about how it differs from other post-training courses, including the LLM course that's already available. In short, the smol course is more direct than any of the other courses, and is intended for semi-pro post-trainers.
- It's a minimal set of instructions on the core parts.
- It's intended to bootstrap real projects you're working on.
- The material hands over to existing documentation for details.
- Likewise, it hands over to the LLM course for basics.
- Assessment is based on a leaderboard and does not require reading all the material.
To start the smol course, follow here: https://huggingface.co/smol-course

liked a model almost 2 years ago

mattshumer/Reflection-Llama-3.1-70B

Text Generation • 71B • Updated Sep 24, 2024 • 131 • 1.71k

liked a model about 2 years ago

mistralai/Mistral-7B-v0.1

Text Generation • 7B • Updated Jul 24, 2025 • 892k • 4.11k