🏆 Smol AI WorldCup: A 4B Model Just Beat 8B. Here's the Data
We evaluated 18 small language models from 12 makers on 125 questions across 7 languages. The results challenge the assumption that bigger is always better.
✅ A 1.3B model fabricates confident fake content 80% of the time when prompted with nonexistent entities. The Qwen3 family hits 100% trap detection across all sizes.
✅ Qwen3-1.7B (1.2GB) outscores Mistral-7B, Llama-3.1-8B, and DeepSeek-R1-14B. Latest architecture at 1.7B beats older architecture at 14B.
What makes this benchmark different?
Most benchmarks ask "how smart?" We measure five axes simultaneously: Size, Honesty, Intelligence, Fast, Thrift (SHIFT). Our ranking metric WCS = sqrt(SHIFT × PIR_norm) rewards models that are both high-quality AND efficient. Smart but massive? Low rank. Tiny but poor? Also low.
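As a quick illustration of the ranking math, here is a minimal Python sketch; the assumption that both SHIFT and PIR_norm are pre-scaled to [0, 1] is ours, not stated in the post:

```python
import math

def wcs(shift: float, pir_norm: float) -> float:
    """World Cup Score: geometric mean of the SHIFT composite and
    normalized PIR. Assumes both inputs are pre-scaled to [0, 1]."""
    return math.sqrt(shift * pir_norm)

# The geometric mean punishes imbalance: excelling on one axis
# cannot compensate for collapsing on the other.
print(wcs(0.9, 0.3))  # tiny-but-poor profile     -> ~0.52
print(wcs(0.3, 0.9))  # smart-but-massive profile -> ~0.52
print(wcs(0.8, 0.8))  # balanced profile          -> 0.80
```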
The enable_expert_parallel flag hiding the complexity of GroupedGemmParallel + RouterParallel behind a single config is a great DX win: distributing experts across devices used to require a lot of custom plumbing.
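To make the "custom plumbing" concrete, here is a toy Python sketch of the dispatch step expert parallelism performs under the hood; everything below is an illustrative stand-in, not the actual GroupedGemmParallel/RouterParallel API:

```python
# Toy sketch of expert parallelism: each device owns a slice of the
# experts, and routed tokens must be dispatched to the owning device.
NUM_EXPERTS = 8
NUM_DEVICES = 4

def owner(expert_id: int) -> int:
    """Round-robin mapping of experts to devices."""
    return expert_id % NUM_DEVICES

# The dispatch step a single enable_expert_parallel flag now hides:
# group tokens by the device that holds their routed expert's weights.
routed = [(0, 3), (1, 6), (2, 1), (3, 6)]  # (token_id, expert_id)
per_device: dict[int, list[int]] = {d: [] for d in range(NUM_DEVICES)}
for token_id, expert_id in routed:
    per_device[owner(expert_id)].append(token_id)

print(per_device)  # {0: [], 1: [2], 2: [1, 3], 3: [0]}
```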
🥇 Claw for All: The ultimate all-rounder. Simplifies deployment for both devs & pros with a seamless web/mobile experience.
🥈 OpenClaw Launch: Speed is king. Deploy your apps in under 30 seconds with a single click.
🥉 ClawTeam: Skip the setup. Get pre-configured AI agent blueprints built specifically for OpenClaw.
4️⃣ vibeclaw: Local-first. Run OpenClaw in your browser sandbox in literally 1 second.
5️⃣ Tinkerclaw: The startup favorite. Zero-code platform to deploy, manage, and scale AI assistants.
6️⃣ ClawWrapper: The "last mile" tool. Simplifies the entire packaging and launch process.
Which one are you adding to your stack? 🛠️ (Source: OpenClaw Directory)
🔥 UPGRADE in Kai: 30B Scaling! 🔥

NoesisLab/Kai-30B-Instruct

We are incredibly excited to announce that the Kai-30B-Instruct model and its official Space are now LIVE! 🚀

If you've been following the journey from Kai-0.35B to Kai-3B, you know we're rethinking how models reason. Tired of verbose, slow Chain-of-Thought (CoT) outputs that flood your screen with self-talk? So are we.

Kai-30B-Instruct scales up our Adaptive Dual-Search Distillation (ADS) framework. By bridging classical A* heuristic search with continuous gradient descent, we use an information-theoretic log-barrier to physically prune high-entropy reasoning paths during training. The result? Pure implicit reasoning. The model executes structured logic, arithmetic carries, and branch selections as a reflex in a single forward pass, no external scaffolding required.

At 3B, we observed a phase transition where the model achieved "logical crystallization". Now, at 30B, we are giving the ADS regularizer the massive representational capacity it needs to tackle higher-order symbolic abstractions and complex reasoning tasks.

🧪 Test Kai yourself in our new Space: NoesisLab/Kai-30B-Instruct
📦 Model Weights: NoesisLab/Kai-30B-Instruct

Bring your hardest math, logic, and coding benchmarks. We invite the community to stress-test the limits of the penalty wall! 🧱💥
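Since the post does not show the regularizer itself, here is a toy PyTorch sketch of what a log-barrier on reasoning-path entropy could look like; this is our guess at the general technique, not NoesisLab's actual ADS code, and all names are hypothetical:

```python
import math
import torch

def entropy_log_barrier(branch_logits: torch.Tensor,
                        weight: float = 0.1) -> torch.Tensor:
    """Toy log-barrier penalty on per-step branch entropy.

    As a step's entropy H approaches the maximum log(K), the term
    -log(log(K) - H) grows without bound, forming a 'wall' that
    pushes training away from diffuse, high-entropy reasoning paths.
    Illustrative guess at the technique, not the ADS implementation.
    """
    k = branch_logits.shape[-1]
    probs = torch.softmax(branch_logits, dim=-1)
    entropy = -(probs * torch.log(probs + 1e-9)).sum(dim=-1)
    gap = (math.log(k) - entropy).clamp(min=1e-6)  # keep barrier finite
    return weight * (-torch.log(gap)).mean()

# Usage: add the barrier to the task loss so gradient descent and the
# barrier jointly shape which reasoning branches survive training.
logits = torch.randn(4, 10, requires_grad=True)  # (steps, branches)
penalty = entropy_log_barrier(logits)
penalty.backward()
print(penalty.item())
```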