High-temperature text generation ("creativity" of the model) and ethical AI alignment.
Democratizing local inference: reducing hardware requirements. Against the oligopoly of huge and polluting LLMs.
🌍 World Model Bench — does your world model actually think?
FID measures realism. FVD measures smoothness. But neither tells you whether the model understood the scene.
We just released WM Bench — the first benchmark for cognitive intelligence in world models. The core question: when a beast charges from 3 meters away, does the model know to sprint — not walk? Does it respond differently to a human vs an animal? Does it remember the left corridor was blocked two steps ago?
Those are cognitive questions. No existing benchmark asks them. So we built one.
- 👁 P1 Perception (25%): Can it read the scene?
- 🧠 P2 Cognition (45%): Does it predict threats, escalate emotions, utilize memory?
- 🔥 P3 Embodiment (30%): Does the body respond with the right motion?
All evaluation is via simple JSON I/O — no 3D engine, no special hardware. Any model with an API can participate.
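As a rough illustration of what JSON-only scoring could look like, the sketch below combines per-pillar results into a 1000-point total using the 25% / 45% / 30% split. The payload layout, the key names (`P1`, `P2`, `P3`, `pillars`, `model`), and the linear weighting are all assumptions made for this example; the actual WM Bench schema and scoring rule may differ.

```python
import json

# Maximum points per pillar: the post's 25% / 45% / 30% split on a 1000-point scale.
PILLAR_MAX_POINTS = {"P1": 250, "P2": 450, "P3": 300}


def total_score(pillar_fractions):
    """Combine per-pillar fractions in [0, 1] into a 0-1000 total.

    Linear weighting is an assumption of this sketch, not a documented rule.
    """
    total = 0.0
    for pillar, max_points in PILLAR_MAX_POINTS.items():
        fraction = pillar_fractions[pillar]
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"{pillar} fraction must be in [0, 1], got {fraction}")
        total += max_points * fraction
    return total


# Hypothetical result payload, as it might arrive over a JSON API.
raw = '{"model": "example-model", "pillars": {"P1": 0.8, "P2": 0.5, "P3": 1.0}}'
result = json.loads(raw)
score = total_score(result["pillars"])
print(json.dumps({"model": result["model"], "score": score}))
```

Because everything is plain JSON in and out, a harness like this needs nothing beyond the standard library, which is the point of the "no 3D engine, no special hardware" design.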
We also built PROMETHEUS as a live reference implementation: it runs in your browser on a T4, no install needed. It combines FloodDiffusion motion generation with an LLM cognitive brain (Perceive → Predict → Decide → Act). It scored 726/1000 (Grade B) on Track C and is the only directly verified model so far. Submissions from other teams are very welcome.
We present a methodology for training small language models on CPU at FP32 precision that achieves capability-per-dollar efficiency orders of magnitude beyond GPU-based training. Across 15 models spanning four novel architecture families, namely Mixture of Attentions (MoA), cross-architecture fusion (Qemma), swarm intelligence (SAGI), and metric-space causal language models (DiscoverLM), total compute cost was $24 on a single AMD EPYC 9454P processor. We introduce seven methodological pillars: (1) FP32 precision preservation, with experiments demonstrating 5,810× single-operation error and 23,225× compounding error ratio for FP16 at network depth; (2) sparse cognitive architectures where 0.02–7% of parameters activate per token, matching CPU branching rather than GPU SIMD; (3) developmental curriculum training progressing from language to logic to transfer to depth; (4) continuous belt-fed data ingestion eliminating truncation waste; (5) hardware-native optimization for AMD Zen 4 via AOCL/OpenMP/NUMA-aware allocation; (6) self-regulating thermodynamic governance with emergent temperature measurement grounded in L2-star discrepancy; and (7) open-standard compute (AVX2 SIMD at FP32) free of proprietary vendor dependency. We argue that transformers were designed for GPU hardware rather than mathematical optimality, and that architectures designed for geometric correctness (metric-space attention, triangle inequality enforcement, sparse expert routing) naturally favor CPU execution. For sub-2B parameter models, CPU training produces more capable models at a fraction of the cost.
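The precision argument in pillar (1) can be illustrated with a toy experiment. The sketch below pushes random inputs through a chain of scalar `tanh` layers, rounding after every operation to FP16 or FP32 via the standard-library `struct` formats `"e"` and `"f"`, and compares the result with a binary64 reference. The scalar chain, the depth of 24, and the weight ranges are assumptions chosen for this example; it does not attempt to reproduce the 5,810× or 23,225× figures quoted above.

```python
import math
import random
import struct


def round_to(fmt, value):
    """Round a Python float (binary64) to the nearest value in a struct format."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def forward(x, layers, fmt=None):
    """Run a scalar tanh chain, rounding after every operation when fmt is set."""
    q = (lambda v: round_to(fmt, v)) if fmt else (lambda v: v)
    x = q(x)
    for w, b in layers:
        x = q(math.tanh(q(q(q(w) * x) + q(b))))
    return x


rng = random.Random(0)
DEPTH = 24
# Each trial gets its own input and its own random layers.
trials = [
    (
        rng.uniform(-1.0, 1.0),
        [(rng.uniform(-1.5, 1.5), rng.uniform(-0.5, 0.5)) for _ in range(DEPTH)],
    )
    for _ in range(200)
]


def mean_abs_error(fmt):
    """Mean absolute deviation from the binary64 reference across all trials."""
    errors = [abs(forward(x, layers, fmt) - forward(x, layers)) for x, layers in trials]
    return sum(errors) / len(errors)


err_fp16 = mean_abs_error("e")  # IEEE 754 binary16
err_fp32 = mean_abs_error("f")  # IEEE 754 binary32
print(f"FP16 mean abs error: {err_fp16:.3e}")
print(f"FP32 mean abs error: {err_fp32:.3e}")
print(f"ratio: {err_fp16 / err_fp32:.1f}x")
```

The exact ratio depends on the depth, the activation, and the weight distribution, so the number printed here should be read as qualitative evidence only.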
Neural Gas is a classical unsupervised learning algorithm for vector quantization and topology learning, introduced in the early 1990s. It maintains a set of prototype vectors that move through the data space and gradually approximate the underlying distribution by ranking samples and adapting all units accordingly.
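For reference, the classical rank-based update can be sketched in a few lines of plain Python: every prototype moves toward the sample by a step of `eps * exp(-rank / lam)`, where rank 0 is the closest unit. The function name, the toy data, and the fixed `eps` and `lam` values are choices made for this example; practical implementations usually anneal both parameters over time.

```python
import math
import random


def neural_gas_step(prototypes, x, eps=0.1, lam=1.0):
    """Apply one classical Neural Gas update for a single sample x (in place)."""
    # Squared Euclidean distance from x to every prototype.
    dists = [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in prototypes]
    # order[k] is the index of the k-th closest prototype.
    order = sorted(range(len(prototypes)), key=dists.__getitem__)
    for rank, idx in enumerate(order):
        step = eps * math.exp(-rank / lam)  # closer units take larger steps
        prototypes[idx] = [wi + step * (xi - wi) for xi, wi in zip(x, prototypes[idx])]
    return prototypes


rng = random.Random(0)
# Toy data: a 2-D Gaussian blob centred at (5, -2).
data = [[rng.gauss(5.0, 1.0), rng.gauss(-2.0, 1.0)] for _ in range(500)]
# Prototypes start near the origin, far from the data.
prototypes = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(4)]

for x in data:
    neural_gas_step(prototypes, x)

distances_to_centre = [math.dist(w, (5.0, -2.0)) for w in prototypes]
print(distances_to_centre)
```

Note that the loop mutates the prototypes directly and involves a sort, which is exactly the procedural style that does not compose with gradient-based training.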
While the original formulation is algorithmically elegant, most existing implementations remain procedural and non-differentiable, which limits their integration with modern deep learning systems.
The key idea is to reinterpret the update rules in a way that is compatible with autograd, allowing the algorithm to be embedded inside end-to-end trainable pipelines.
This shift enables several directions that are difficult or impossible with standard implementations:
- joint optimization of Neural Gas with neural networks
- inclusion of topology-learning modules inside differentiable models
- gradient-based tuning of algorithm parameters
- hybrid architectures combining representation learning and vector quantization
The repository provides a clean PyTorch implementation and focuses on making the core mechanism usable as a first-class differentiable component, rather than a standalone preprocessing step.
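One way to make the rank-based update compatible with autograd is to express it as a loss: weight each squared distance by `exp(-rank / lam)` and let the optimizer do the rest. The gradient of that loss with respect to a prototype is proportional to `h * (w - x)`, which is the classical update direction. The sketch below is a minimal illustration under that assumption; `neural_gas_loss` is a hypothetical name, and the repository's actual formulation may differ.

```python
import torch


def neural_gas_loss(x, prototypes, lam=1.0):
    """Rank-weighted quantization loss for a batch x of shape (batch, dim)."""
    # Squared distances, shape (batch, units); computed directly to avoid a sqrt.
    d2 = (x[:, None, :] - prototypes[None, :, :]).pow(2).sum(dim=-1)
    # Double argsort turns distances into ranks (0 = closest unit).
    ranks = d2.argsort(dim=1).argsort(dim=1)
    # Ranks are integers, so h carries no gradient and acts as a fixed weight.
    h = torch.exp(-ranks.to(d2.dtype) / lam)
    return (h * d2).sum(dim=1).mean()


torch.manual_seed(0)
# Toy data: a 2-D Gaussian blob centred at (5, -2).
data = torch.randn(256, 2) + torch.tensor([5.0, -2.0])
prototypes = torch.nn.Parameter(torch.randn(4, 2))
optimizer = torch.optim.SGD([prototypes], lr=0.05)

initial_loss = neural_gas_loss(data, prototypes).item()
for _ in range(200):
    optimizer.zero_grad()
    loss = neural_gas_loss(data, prototypes)
    loss.backward()
    optimizer.step()
final_loss = neural_gas_loss(data, prototypes).item()
print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}")
```

Since the prototypes are an ordinary `torch.nn.Parameter`, the same loss term can be added to the objective of a larger network, with `x` being the output of an encoder rather than raw data.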
The goal is to revisit a well-known algorithm and make it compatible with current machine learning workflows, where differentiability is a central constraint rather than an afterthought.
The Flux-Klein-KV-Edit-Consistency demo is now available on Spaces. It preserves character identity and delivers high-quality, realistic results after edits. No special prompts are needed: just upload the image, type your prompt, and get the resulting image blazing fast.
➔ Built with Headless Gradio, an alternative to using gr.Blocks for creating the frontend and triggering events, powered by FastAPI + Gradio. You can now design the frontend however you want, with continued support for APIs, MCP, and ZeroGPU.
➔ Gradio Server Mode is now available from gradio@v6.10.0.
To learn more, visit the app page or the respective model pages.