Over the past year, we've seen a shift in LLM Post-Training. Previously, Supervised Fine-Tuning was the most important part: making models imitate curated Question-Answer pairs.
Now we also have Reinforcement Learning with Verifiable Rewards. With techniques like GRPO, models can learn through trial and error in dynamic environments. They can climb to new heights without relying on expensively prepared data.
But what actually are these environments in practice? And how do you build them effectively?
Fascinated by these concepts, I spent time exploring this space through experiments, post-training Small Language Models. I've packaged everything I learned into this short course.
What you'll learn
🔹 Agents, Environments, and LLMs: how to map Reinforcement Learning concepts to the LLM domain
🔹 How to use Verifiers (open-source library by Prime Intellect) to build RL environments as software artifacts
🔹 Common patterns: how to build single-turn, multi-turn, and tool-use environments
🔹 Hands-on: turn a small language model (LFM2-2.6B by LiquidAI) into a Tic Tac Toe master
  🔸 Build the game Environment
  🔸 Use it to generate synthetic data for SFT warm-up
  🔸 Group-based Reinforcement Learning
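To make "verifiable reward" and "group-based" a bit more concrete, here is a minimal plain-Python sketch. It does not use the Verifiers API and is not the course's code. The function names (`score_game`, `group_advantages`) and the reward scheme (win 1.0, draw 0.5, loss 0.0, illegal move -1.0) are assumptions made for illustration. The advantage step is the group-relative normalization commonly associated with GRPO: each rollout's reward is compared with the mean of its group.

```python
import statistics

# All eight winning lines on a 3x3 board, cells indexed 0..8 row by row.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def score_game(moves):
    """Verifiable reward for the model playing X; moves alternate X, O, X, ...

    Assumed scheme (hypothetical): win 1.0, draw 0.5, loss or unfinished
    game 0.0, illegal move by X -1.0.
    """
    board = [" "] * 9
    for turn, cell in enumerate(moves):
        mark = "X" if turn % 2 == 0 else "O"
        if not 0 <= cell <= 8 or board[cell] != " ":
            if mark == "X":
                return -1.0  # penalize the model for breaking the rules
            raise ValueError("opponent made an illegal move")
        board[cell] = mark
        won = winner(board)
        if won is not None:
            return 1.0 if won == "X" else 0.0
    return 0.5 if " " not in board else 0.0


def group_advantages(rewards):
    """Normalize rewards within one group of rollouts for the same prompt."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        return [0.0] * len(rewards)  # identical rewards carry no signal
    return [(r - mean) / std for r in rewards]


# Four rollouts: X wins, O wins, draw, illegal move by X.
rollouts = [[0, 3, 1, 4, 2], [0, 3, 1, 4, 8, 5],
            [0, 1, 2, 4, 3, 5, 7, 6, 8], [0, 3, 3]]
rewards = [score_game(m) for m in rollouts]
print(rewards)  # [1.0, 0.0, 0.5, -1.0]
print(group_advantages(rewards))
```

Because the reward is computed by code rather than by a learned judge, every rollout can be checked automatically. The group normalization means a rollout is only reinforced when it does better than its siblings.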
If you're interested in building "little worlds" where LLMs can learn, this course is for you.
I was excited to explore Llama 3.2, but as a simple 🇪🇺 EU guy, I don't have access to Meta's multimodal models.
🤔 So I thought: why not challenge the small 3B text model with Agentic RAG?
🎯 The plan:
- Build a system that tries to answer questions using a knowledge base.
- If the documents don't contain the answer, use Web search for additional context.
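The plan above can be sketched as a small routing function. This is a rough illustration, not the actual implementation: `retrieve`, `generate`, and `web_search` are hypothetical callables you would back with your own retriever, the 3B model, and a search API, and the `no_answer` sentinel is an assumed convention for the model to signal that the documents are not enough.

```python
from typing import Callable, Dict, List

NO_ANSWER = "no_answer"  # assumed sentinel the model is asked to emit


def build_prompt(question: str, contexts: List[str]) -> str:
    """Assemble a prompt that asks for the sentinel when context is insufficient."""
    joined = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer the question using only the context below. "
        f"If the context does not contain the answer, reply with '{NO_ANSWER}'.\n"
        f"Context:\n{joined}\n"
        f"Question: {question}"
    )


def agentic_rag(
    question: str,
    retrieve: Callable[[str], List[str]],    # hypothetical knowledge-base retriever
    generate: Callable[[str], str],          # hypothetical LLM call
    web_search: Callable[[str], List[str]],  # hypothetical web search tool
) -> Dict[str, str]:
    """Try the knowledge base first; fall back to web search if it has no answer."""
    reply = generate(build_prompt(question, retrieve(question)))
    if NO_ANSWER not in reply:
        return {"answer": reply, "source": "knowledge_base"}
    # The documents were not enough: fetch extra context from the web.
    reply = generate(build_prompt(question, web_search(question)))
    return {"answer": reply, "source": "web"}


# Usage with stub components standing in for the real retriever, model, and search.
def fake_llm(prompt: str) -> str:
    return "Paris" if "capital of France is Paris" in prompt else NO_ANSWER


result = agentic_rag(
    "What is the capital of France?",
    retrieve=lambda q: ["Berlin is the capital of Germany."],
    generate=fake_llm,
    web_search=lambda q: ["The capital of France is Paris."],
)
print(result)  # {'answer': 'Paris', 'source': 'web'}
```

Passing the components in as callables keeps the routing logic testable without a model or network access. Letting the model itself emit the sentinel is one design choice among several; a separate relevance check on the retrieved documents would be another.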
I'm excited to announce that Transformers.js V3 is finally available on NPM! 🔥 State-of-the-art Machine Learning for the web, now with WebGPU support! 🤯⚡️
Install it from NPM with: npm i @huggingface/transformers