Hugging Face
In a Training Loop 🔄
50
77
191
Stefano Fiorucci
PRO
anakin87
189 followers · 89 following
theanakin87
anakin87
stefano-fiorucci
AI & ML interests
Language Models: orchestration, post-training, GRPO, synthetic data... Contributing to Haystack LLM framework 🏗️
Recent Activity
liked a dataset 7 days ago: VAGOsolutions/SauerkrautLM-Doom-MultiVec-31k
upvoted an article 13 days ago: ML Intern Takes Our Post-Training Internship Test
reacted to their post with ❤️ 13 days ago:
A small model that struggled against a random opponent now beats GPT-5-mini at tic-tac-toe.

I took https://huggingface.co/LiquidAI/LFM2-2.6B and trained it through play. 🧑‍🍳

Here's how:
1️⃣ Build a solid RL env with Verifiers (Prime Intellect)
2️⃣ Generate synthetic data: <200 games sampled from GPT-5-mini playing in the env
3️⃣ SFT warm-up to teach format
4️⃣ Group-based RL (CISPO) against opponents making 20-70% random moves
5️⃣ RL again with stronger opponents (0-25% random moves) + 1.25 temperature to push exploration and shake off suboptimal strategies

Done! Beats GPT-5-mini 🏆

---

🎮 Play against the model: https://huggingface.co/spaces/anakin87/LFM2-2.6B-mr-tictactoe
🤗 Model: https://huggingface.co/anakin87/LFM2-2.6B-mr-tictactoe
📚 Walkthrough/course: https://github.com/anakin87/llm-rl-environments-lil-course
🤗 Dataset and checkpoints: https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe
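Both RL stages in the post train against opponents that make a given share of random moves (20-70%, then 0-25%). As a rough illustration of that idea, here is a minimal sketch of such an opponent. It assumes the non-random moves are minimax-optimal, which the post does not state. The names `noisy_opponent_move`, `best_move`, and `random_move_prob` are hypothetical and are not part of Verifiers or the linked course.

```python
import random
from functools import lru_cache

# All eight winning lines on a 3x3 board indexed 0..8, row by row.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if a line is complete, else None (board is a 9-char string)."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def legal_moves(board):
    """Indices of the empty cells."""
    return [i for i, cell in enumerate(board) if cell == " "]


def play(board, move, player):
    """Return a new board string with `player` placed at `move`."""
    return board[:move] + player + board[move + 1:]


@lru_cache(maxsize=None)
def minimax(board, player):
    """Game value for `player`, who is to move: +1 win, 0 draw, -1 loss."""
    w = winner(board)
    if w is not None:
        return 1 if w == player else -1
    moves = legal_moves(board)
    if not moves:
        return 0
    other = "O" if player == "X" else "X"
    return max(-minimax(play(board, m, player), other) for m in moves)


def best_move(board, player):
    """Pick a move with the highest minimax value (assumed 'strong' policy)."""
    other = "O" if player == "X" else "X"
    return max(legal_moves(board),
               key=lambda m: -minimax(play(board, m, player), other))


def noisy_opponent_move(board, player, random_move_prob, rng):
    """With probability `random_move_prob` play a uniformly random legal move,
    otherwise play the minimax move."""
    if rng.random() < random_move_prob:
        return rng.choice(legal_moves(board))
    return best_move(board, player)


if __name__ == "__main__":
    rng = random.Random(0)
    board = "XX" + " " + "OO" + " " * 4  # X can win immediately at index 2
    # Stage-1-like opponent: 20-70% random moves, resampled here per game.
    p = rng.uniform(0.2, 0.7)
    print("random-move probability:", round(p, 2))
    print("opponent move:", noisy_opponent_move(board, "X", p, rng))
```

Lowering `random_move_prob` toward 0 gives the stronger opponent described for the second RL stage. How the actual environment samples or schedules this value is not specified in the post.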
anakin87's models: 21
anakin87/LFM2-2.6B-mr-tictactoe • Text Generation • 3B • Updated about 1 month ago • 49 • 1
anakin87/LFM2-2.6B-ttt-rl-2 • Text Generation • Updated about 1 month ago • 10
anakin87/LFM2-2.6B-ttt-rl-merged • Text Generation • 3B • Updated about 1 month ago • 14
anakin87/LFM2-2.6B-ttt-rl • Text Generation • Updated about 1 month ago • 2
anakin87/LFM2-2.6B-ttt-sft • Text Generation • 3B • Updated about 1 month ago • 11
anakin87/Phi-3.5-mini-ITA • Text Generation • 4B • Updated Mar 24 • 2.51k • 13
anakin87/Qwen3-0.6B-alphabet-sort-grpo • Text Generation • 0.6B • Updated Sep 4, 2025 • 8
anakin87/gemma-2-2b-ita-sft • Text Generation • 3B • Updated Jun 29, 2025 • 2
anakin87/electra-italian-xxl-cased-squad-it • Question Answering • 0.1B • Updated Jun 29, 2025 • 9 • 8
anakin87/gemma-2b-orpo • Text Generation • 3B • Updated Jun 29, 2025 • 36 • 28
anakin87/qwen-scheduler-7b-grpo • Text Generation • Updated Apr 26, 2025 • 6
anakin87/gemma-2-9b-neogenesis-ita • Text Generation • 9B • Updated Mar 10, 2025 • 1.07k • 11
anakin87/gemma-2-2b-neogenesis-ita • Text Generation • 3B • Updated Jan 16, 2025 • 1.08k • 6
anakin87/yo-Llama-3-8B-Instruct • Text Generation • 8B • Updated Jul 2, 2024 • 8 • 7
anakin87/Llama-3-8b-ita-ties • Text Generation • 8B • Updated May 24, 2024 • 1.09k • 3
anakin87/Llama-3-8b-ita-slerp • Text Generation • 8B • Updated May 24, 2024 • 1.29k • 1
anakin87/Llama-3-8b-ita-ties-pro • Text Generation • 8B • Updated May 24, 2024 • 1.3k • 1
anakin87/gemma-2b-orpo-GGUF • 3B • Updated Apr 9, 2024 • 11 • 7
anakin87/gorilla-openfunctions-v2-sharded • Text Generation • 7B • Updated Mar 16, 2024 • 3
anakin87/gorilla-openfunctions-v0-sharded • Text Generation • 7B • Updated Dec 21, 2023 • 17 • 1
anakin87/zephyr-7b-alpha-sharded • Text Generation • 7B • Updated Nov 24, 2023 • 19 • 16