Fabio Ferrua

Marvin73


AI & ML interests

None yet

Recent Activity

liked a model 17 days ago

deepreinforce-ai/Ornith-1.0-35B-GGUF

liked a model about 1 month ago

FunAudioLLM/SenseVoiceSmall

liked a model about 1 month ago

nvidia/LocateAnything-3B


Organizations

None yet

liked a model 17 days ago

deepreinforce-ai/Ornith-1.0-35B-GGUF

Text Generation • 35B • Updated 18 days ago • 1.35M • 856

liked 2 models about 1 month ago

FunAudioLLM/SenseVoiceSmall

Automatic Speech Recognition • Updated 23 days ago • 19.1k • 434

nvidia/LocateAnything-3B

Image-Text-to-Text • 4B • Updated about 1 month ago • 1.5M • 2.72k

liked a model about 2 months ago

stevelikesrhino/gemma-4-31B-it-nvfp4-GGUF

Text Generation • 31B • Updated Jun 5 • 572 • 6

liked a model 3 months ago

openbmb/VoxCPM2

Text-to-Speech • 2B • Updated Apr 16 • 920k • 1.47k

reacted to anakin87's post with ❤️ 3 months ago

Post

3314

📣 I just published a free course on Reinforcement Learning Environments for Language Models!

📌 COURSE: https://github.com/anakin87/llm-rl-environments-lil-course

Over the past year, we've seen a shift in LLM Post-Training.
Previously, Supervised Fine-Tuning was the most important part: making models imitate curated Question-Answer pairs.

Now we also have Reinforcement Learning with Verifiable Rewards. With techniques like GRPO, models can learn through trial and error in dynamic environments. They can climb to new heights without relying on expensively prepared data.
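To make the "trial and error" idea concrete, here is a minimal sketch of the group-relative advantage commonly associated with GRPO: several completions are sampled for the same prompt, each gets a verifiable reward, and each reward is normalized against its own group. The function name, the `eps` value, and the use of the population standard deviation are assumptions made for illustration, not details taken from the course.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for the same prompt.

    Assumption: population standard deviation plus a small eps to avoid
    division by zero when every completion gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt, scored by a verifiable check (1.0 = correct).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that beat their group's average get a positive advantage and are reinforced; the rest are pushed down. When all rewards in a group are equal, every advantage is zero and that group contributes no learning signal.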

But what are these environments in practice❓ And how do you build them effectively❓

Fascinated by these concepts, I spent time exploring this space through experiments, post-training Small Language Models.
I've packaged everything I learned into this short course.

What you'll learn

🔹 Agents, Environments, and LLMs: how to map Reinforcement Learning concepts to the LLM domain
🔹 How to use Verifiers (open-source library by Prime Intellect) to build RL environments as software artifacts
🔹 Common patterns: How to build single-turn, multi-turn, and tool-use environments

🔹 Hands-on: turn a small language model (LFM2-2.6B by LiquidAI) into a Tic Tac Toe master
🔸 Build the game Environment
🔸 Use it to generate synthetic data for SFT warm-up
🔸 Group-based Reinforcement Learning
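As a rough illustration of what a game environment with a verifiable reward might look like, here is a plain-Python sketch of a single Tic Tac Toe step. It does not use the Verifiers library, and the board encoding (a 9-character string), the move format (a single digit 0-8), and the reward values are all assumptions chosen for this example rather than the course's actual design.

```python
# All eight winning lines on a 3x3 board, as index triples.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def parse_move(completion):
    """Extract a cell index 0-8 from the model's text, or None if malformed."""
    text = completion.strip()
    if len(text) == 1 and text in "012345678":
        return int(text)
    return None


def step(board, completion, player="X"):
    """Apply the model's move and return (new_board, reward, done).

    Hypothetical reward scheme: -1.0 for an unparseable or illegal move,
    +1.0 for a winning move, 0.0 otherwise.
    """
    move = parse_move(completion)
    if move is None or board[move] != " ":
        return board, -1.0, True
    new_board = board[:move] + player + board[move + 1:]
    if winner(new_board) == player:
        return new_board, 1.0, True
    if " " not in new_board:
        return new_board, 0.0, True  # board full: draw
    return new_board, 0.0, False


# X can win by playing cell 2 (top right).
board = "XX OO    "
print(step(board, "2"))
print(step(board, "I play top right"))
```

Because the reward is computed by the game rules rather than by a human label, the same `step` function can score both synthetic SFT data and rollouts sampled during RL.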

If you're interested in building "little worlds" where LLMs can learn, this course is for you.

---

🤗🕹️ Play against the trained model: anakin87/LFM2-2.6B-mr-tictactoe

📚 HF collection (datasets + models): https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe


liked 6 models 3 months ago

liked a model 4 months ago

nickprock/Gemma3-1B-CulturaViva-ITA

Text Generation • 1.0B • Updated Mar 8 • 57 • 3

liked a model 10 months ago

mistralai/Magistral-Small-2509-GGUF

24B • Updated Sep 18, 2025 • 1.82k • 74

liked a model 12 months ago

onnx-community/Voxtral-Mini-3B-2507-ONNX

Audio-Text-to-Text • Updated Jul 24, 2025 • 400 • 29

upvoted an article about 1 year ago

Article

Fine-tune Llama 3.1 Ultra-Efficiently with Unsloth

mlabonne • Jul 29, 2024 • 373

reacted to mlabonne's post with 👍 over 1 year ago

Post

20014

✂️ AutoAbliteration

I made a Colab notebook to automatically abliterate models.

It's quite general, so you can do interesting stuff like blocking a given language in the model outputs.

💻 Colab: https://colab.research.google.com/drive/1RmLv-pCMBBsQGXQIM8yF-OdCNyoylUR1?usp=sharing


liked 2 models over 1 year ago

mistralai/Mistral-Small-24B-Instruct-2501

24B • Updated Jul 28, 2025 • 75.5k • 960

microsoft/phi-4

Text Generation • 15B • Updated Nov 24, 2025 • 854k • 2.28k

reacted to anakin87's post with 👍 almost 2 years ago

Post

1778

🕵🏻 Agentic RAG with 🦙 Llama 3.2

I was excited to explore Llama 3.2, but as a simple 🇪🇺 EU guy, I don't have access to Meta's multimodal models 😿

🤔 So I thought: why not challenge the small 3B text model with Agentic RAG?

🎯 The plan:
- Build a system that tries to answer questions using a knowledge base.
- If the documents don't contain the answer, use Web search for additional context.
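The two-step plan above can be sketched as a simple fallback router. This is not the notebook's Haystack pipeline: `retrieve`, `web_search`, and `generate` are hypothetical callables supplied by the caller, and the `no_answer` sentinel is an assumed convention for how the model signals that the documents were not enough.

```python
NO_ANSWER = "no_answer"  # assumed sentinel the model emits when documents fall short


def answer_with_fallback(question, retrieve, web_search, generate):
    """Try the knowledge base first; fall back to web search if needed.

    Returns (answer, source) where source is "knowledge_base" or "web".
    """
    docs = retrieve(question)
    reply = generate(question, docs)
    if reply.strip().lower() != NO_ANSWER:
        return reply, "knowledge_base"
    web_docs = web_search(question)
    return generate(question, web_docs), "web"


# Toy stand-ins so the sketch runs without any external service.
KB = {"haystack": "Haystack is an open-source LLM orchestration framework."}


def retrieve(question):
    return [text for key, text in KB.items() if key in question.lower()]


def web_search(question):
    return ["(web snippet about: " + question + ")"]


def generate(question, docs):
    # A real system would prompt the LLM here; the stub echoes the first document.
    return docs[0] if docs else NO_ANSWER


print(answer_with_fallback("What is Haystack?", retrieve, web_search, generate))
print(answer_with_fallback("Who won the match yesterday?", retrieve, web_search, generate))
```

Keeping the routing decision outside the model call makes it easy to swap in a real retriever, search tool, or LLM without changing the control flow.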

Check out my experimental notebook here: 📓 https://colab.research.google.com/github/deepset-ai/haystack-cookbook/blob/main/notebooks/llama32_agentic_rag.ipynb

My stack:
🏗️ haystack (https://haystack.deepset.ai/): open-source LLM orchestration framework
🦙 meta-llama/Llama-3.2-3B-Instruct
🦆🌐 free DuckDuckGo API, integrated with Haystack

✨ The results? Encouraging - a few months ago, this level of performance from a small model would've been unthinkable!
This probably reflects the impressive IFEval score of the model (comparable to Llama 3.1 8B).
