It all starts with **Reinforcement Learning with Verifiable Rewards**:
- question asked
- model generates reasoning + answer
- answer checked against ground truth
- reward drives RL training
In this setup, the environment is simple: fixed questions and answers, rollout logic, and reward(s).
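As a minimal sketch, the "answer checked against ground truth" step can be as simple as an exact-match verifier. The `<answer>` tag format and the 0/1 reward values below are illustrative assumptions, not any particular framework's API:

```python
import re

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Score a completion against a known answer (exact match).

    Real environments typically add format rewards, partial credit,
    or tool-based checks on top of this.
    """
    # Assumed format: the final answer wrapped in <answer>...</answer> tags.
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return 0.0  # unparsable output earns no reward
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0
```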
Consider a more complex tic-tac-toe env. It adds:
- dynamic game generation/handling
- tunable opponent skill
- multi-turn interactions
(envs can also include tools)
---
What happens during training?
We use **Group Relative Policy Optimization** (GRPO) with a tic-tac-toe env.
No critic model is needed: the group itself is the baseline, which makes this simpler than PPO.
1. Rollout generation: from the same board, the model plays N games via sampling
2. Each game is scored with deterministic rewards (win, format, ...)
3. The mean score is computed across the group
4. Each rollout's advantage = its score minus the group mean
5. The model is updated to favor trajectories above the baseline
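Steps 3-4 amount to a few lines. Here is a rough, library-agnostic sketch of the group-relative advantage computation (the reward values in the example are made up):

```python
def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage: each rollout's reward minus the group mean.

    Some implementations also divide by the group's standard deviation;
    this shows the simplest mean-baseline form.
    """
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]

# Example: 5 tic-tac-toe rollouts from the same board (win=1, draw=0.5, loss=0)
rewards = [1.0, 0.0, 1.0, 0.5, 0.0]
print(group_relative_advantages(rewards))
# [0.5, -0.5, 0.5, 0.0, -0.5] -> wins pushed up, losses pushed down
```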
Go-Code-Large is a large-scale corpus of Go (Golang) source code, comprising 316,427 code samples stored in .jsonl format. The dataset is designed to support research and development in large language model (LLM) pretraining, static analysis, cloud-native systems, and modern backend software engineering.
By offering a focused and curated dataset for Go, this corpus enables experimentation in concurrent programming, distributed systems, and performance-oriented backend services, domains where Go is widely adopted.
Go-Code-Large addresses the relative scarcity of large, language-specific datasets for Go, enabling targeted research into idiomatic Go patterns, concurrency primitives, and scalable system design.
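Because the corpus ships as .jsonl, it can be loaded with standard tooling. A sketch using the Hugging Face `datasets` library follows; the file path and field names are my assumptions, so check the dataset card for the real schema:

```python
from datasets import load_dataset

# Path and schema are assumptions; inspect the dataset card first.
ds = load_dataset("json", data_files="go-code-large/*.jsonl", split="train")

print(ds.num_rows)      # expected: 316,427 samples
print(ds[0].keys())     # verify the actual field names before training
```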
We talk a lot about governance, accuracy, and auditability in AI agents. But I keep seeing a gap between the words and the engineering behind them. Many agents have tools, orchestration, memory, graphs, and impressive demos. But when you ask how governance is actually enforced, the answer is often weak. Prompt-level control is not production governance. A production agent needs explicit state design: legal transitions, controlled progression, recovery paths, approval boundaries, and separation between memory, decision, policy, and execution. This article explores the silent crisis unfolding in modern AI development: the urgent need to resurrect the disciplined architecture of state machines.
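To make "explicit state design" concrete, here is a minimal sketch of a transition table with an enforced approval boundary. The states and edges are invented for illustration, not a prescription for any particular agent framework:

```python
from enum import Enum, auto

class AgentState(Enum):
    PLANNING = auto()
    AWAITING_APPROVAL = auto()
    EXECUTING = auto()
    RECOVERING = auto()
    DONE = auto()

# Legal transitions are enumerated up front; anything else is rejected.
LEGAL_TRANSITIONS: dict[AgentState, set[AgentState]] = {
    AgentState.PLANNING: {AgentState.AWAITING_APPROVAL},
    AgentState.AWAITING_APPROVAL: {AgentState.EXECUTING, AgentState.PLANNING},
    AgentState.EXECUTING: {AgentState.DONE, AgentState.RECOVERING},
    AgentState.RECOVERING: {AgentState.PLANNING},
    AgentState.DONE: set(),
}

def transition(current: AgentState, target: AgentState) -> AgentState:
    """Advance the agent only along an explicitly allowed edge."""
    if target not in LEGAL_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target
```

The point is that the approval boundary (PLANNING -> AWAITING_APPROVAL -> EXECUTING) lives in a transition table that can be tested and audited, rather than in a prompt.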
HY-World-2.0, a Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds, is now available on Spaces, and it works both as native Gradio components and in Gradio server mode.
Want to share my enthusiasm for zai-org/GLM-5.1 here too 🔥
I think we have it: our open-source Claude Code = GLM-5.1 + Pi (https://pi.dev/). I built a Three.js racing game as an eval, and it's extremely impressive. Thoughts:
- One-shot car physics with real drift mechanics (this is hard)
- My fav part: awesome at self-iterating (with no vision!). It created 20+ Bun.WebView debugging tools to drive the car programmatically and read game state, and proved a winding bug with vector math without ever seeing the screen
- 531-line racing AI in a single write: 4 personalities, curvature map, racing lines, tactical drifting. Built telemetry tools to compare player vs. AI speed curves and tuned parameters from the data
- All assets from scratch: 3D models, procedural textures, sky shader, engine sounds, spatial AI audio!
- Can do hard math: proved road normals pointed DOWN via vector cross products, computed track curvature normalized by arc length to tune AI cornering speed
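That normal check is just a cross product. A generic sketch of the idea (numpy, not the actual game code):

```python
import numpy as np

def triangle_normal(a, b, c):
    """Normal implied by a triangle's winding order (right-hand rule)."""
    n = np.cross(np.subtract(b, a), np.subtract(c, a))
    return n / np.linalg.norm(n)

# A ground triangle whose winding is flipped for a drivable surface:
n = triangle_normal([0, 0, 0], [1, 0, 0], [0, 0, 1])
assert n[1] < 0  # negative Y: the normal points DOWN, so the mesh
                 # (or the physics raycast) has its winding reversed
```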
You are going to hear about this model a lot in the coming months - open source, let's go - and thanks, z-ai!
We introduce Awesome Multimodal Modeling, a curated repository tracing the architectural evolution of multimodal intelligence, from foundational fusion to native omni-models.
🔹 Taxonomy & Evolution:
- Traditional Multimodal Learning → foundational work on representation, fusion, and alignment.
- Multimodal LLMs (MLLMs) → architectures connecting vision encoders to LLMs for understanding.
- Unified Multimodal Models (UMMs) → models unifying Understanding + Generation via Diffusion, Autoregressive, or Hybrid paradigms.
- Native Multimodal Models (NMMs) → models trained from scratch on all modalities; contrasts early vs. late fusion under scaling laws.

💡 Key Distinction: UMMs unify tasks via generation heads; NMMs enforce interleaving through joint pre-training.
Ran a small controlled study on a frozen 40-task slice of Harbor Terminal-Bench-Pro, using the same model (minimax/minimax-m2.5) with two agent harnesses: Goose and OpenHands-SDK.
Under the base setup, reducing the turn budget from 100 to 60 pushed the two harnesses in opposite directions.
A tweaked 60-turn setup brought OpenHands-SDK back to 0.575. At their best, both harnesses reached the same 0.575 pass rate.
What surprised me most was the token profile: in this setup, the reported token usage for OpenHands-SDK was dramatically higher than Goose's, while converging to the same best score.
Same model, same task slice, different harness behavior under a tighter interaction budget.