One-Shot Hero! (python, game, simulation, transform intent to code)

#7
by BingoBird - opened

Hi! This is a non-agentic test, run as a batch on 86 small models (<13 GB).

THIS ONE IS THE WINNER!
At 2.6 bpw it beat every Qwen3.5, GPT-OSS, and GLM-Flash model with thinking off.

Here's the prompt it solved:

- All balls have the same radius.
- All balls drop from the heptagon center when starting.
- Colors are: #f8b862, #f6ad49, #f39800, #f08300, #ec6d51, #ee7948, #ed6d3d, #ec6800, #ec6800, #ee7800, #eb6238, #ea5506, #ea5506, #eb6101, #e49e61, #e45e32, #e17b34, #dd7a56, #db8449, #d66a35
- The balls should be affected by gravity and friction, and they must be contained within the area of the heptagon by physical collision detection, making the balls bounce off the rotating walls realistically. There should also be collisions between balls.
- The heptagon is spinning around its center, rotating a full cycle once every 5 seconds.
- The heptagon size should be large enough to contain all the balls.
- Do not use the pygame library; implement collision detection algorithms and collision response etc. by yourself. The following python libraries are allowed: tkinter, math, numpy, dataclasses, typing, sys.
- All program code should be put in a single python file, with shebang for execution from bash shell. 
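For anyone curious what the prompt actually demands, here is a minimal sketch of the core physics: wall bounces computed in the rotating wall's frame, plus equal-mass ball-ball collisions. This is not the winning model's code from the link. Gravity, restitution, and the wall-friction factor are assumed values, and each side is treated as a half-plane, which works because a heptagon is convex. Rendering with tkinter is left out.

```python
#!/usr/bin/env python3
"""Sketch of the prompt's core physics: equal balls inside a spinning heptagon.
Assumed values (not from the thread): gravity, restitution, wall friction."""
import math
from dataclasses import dataclass

SIDES = 7
OMEGA = 2 * math.pi / 5.0  # rad/s: one full rotation every 5 seconds
GRAVITY = 900.0            # assumed, px/s^2 (screen y grows downward)
RESTITUTION = 0.8          # assumed bounce factor for walls and balls
WALL_FRICTION = 0.98       # assumed share of tangential speed kept per wall hit


@dataclass
class Ball:
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0
    r: float = 15.0


def heptagon_vertices(cx, cy, R, angle):
    """Corners of a regular heptagon with circumradius R, rotated by `angle`."""
    return [(cx + R * math.cos(angle + 2 * math.pi * i / SIDES),
             cy + R * math.sin(angle + 2 * math.pi * i / SIDES))
            for i in range(SIDES)]


def collide_with_walls(b, cx, cy, R, angle):
    """Treat each wall as a half-plane; bounce in the moving wall's frame."""
    verts = heptagon_vertices(cx, cy, R, angle)
    for i in range(SIDES):
        (x1, y1), (x2, y2) = verts[i], verts[(i + 1) % SIDES]
        length = math.hypot(x2 - x1, y2 - y1)
        nx, ny = -(y2 - y1) / length, (x2 - x1) / length
        if (cx - x1) * nx + (cy - y1) * ny < 0:  # make the normal point inward
            nx, ny = -nx, -ny
        d = (b.x - x1) * nx + (b.y - y1) * ny  # signed distance, > 0 inside
        if d >= b.r:
            continue
        px, py = b.x - nx * d, b.y - ny * d  # closest point on the wall line
        wvx, wvy = -OMEGA * (py - cy), OMEGA * (px - cx)  # omega x (p - c)
        rvx, rvy = b.vx - wvx, b.vy - wvy  # ball velocity relative to the wall
        vn = rvx * nx + rvy * ny
        if vn < 0:  # moving into the wall: reflect normal part, damp tangential
            tx, ty = rvx - vn * nx, rvy - vn * ny
            b.vx = wvx - RESTITUTION * vn * nx + WALL_FRICTION * tx
            b.vy = wvy - RESTITUTION * vn * ny + WALL_FRICTION * ty
        b.x += (b.r - d) * nx  # push back inside so the ball cannot escape
        b.y += (b.r - d) * ny


def collide_balls(a, b):
    """Equal-mass impulse along the line of centers, then split the overlap."""
    dx, dy = b.x - a.x, b.y - a.y
    dist = math.hypot(dx, dy)
    if dist == 0 or dist >= a.r + b.r:
        return
    nx, ny = dx / dist, dy / dist
    approach = (a.vx - b.vx) * nx + (a.vy - b.vy) * ny
    if approach > 0:
        j = (1 + RESTITUTION) * approach / 2
        a.vx, a.vy = a.vx - j * nx, a.vy - j * ny
        b.vx, b.vy = b.vx + j * nx, b.vy + j * ny
    push = (a.r + b.r - dist) / 2
    a.x, a.y = a.x - push * nx, a.y - push * ny
    b.x, b.y = b.x + push * nx, b.y + push * ny


def step(balls, cx, cy, R, angle, dt):
    """One frame: gravity, integrate, ball-ball, then walls. Returns new angle."""
    for b in balls:
        b.vy += GRAVITY * dt
        b.x += b.vx * dt
        b.y += b.vy * dt
    for i in range(len(balls)):
        for k in range(i + 1, len(balls)):
            collide_balls(balls[i], balls[k])
    for b in balls:
        collide_with_walls(b, cx, cy, R, angle)
    return angle + OMEGA * dt


if __name__ == "__main__":
    # Tiny offsets so balls released from the center are not exactly stacked.
    balls = [Ball(300.0 + 2 * i, 300.0) for i in range(5)]
    angle = 0.0
    for _ in range(600):  # 10 simulated seconds at 60 fps
        angle = step(balls, 300.0, 300.0, 250.0, angle, 1 / 60)
    print([(round(b.x), round(b.y)) for b in balls])
```

The key design choice is reflecting the ball's velocity relative to the moving wall (ω × (p − c) at the contact point) rather than its absolute velocity; a static-wall bounce ignores the push from the spinning heptagon.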

Unfortunately, I had data loss on the drive, but here are the next-best models:

amoral-cogito-14b-Q4_K_M.gguf-test2.py 1 ball, xor lines
amoral-cogito-14b-Q4_K_M.gguf-test.py 0 balls, xor lines
Dobby-Mini-Unhinged-Llama-3.1-8B.i1-Q4_K_M.gguf-test.py one ball, no collisions
gemma-3-12b-it-antislop.i1-IQ4_XS.gguf-test2.py no clearing of balls
gemma-3-12b-it-norm-preserved-biprojected-abliterated.i1-IQ4_XS.gguf-test2.py balls outside
gemma-3-amoral-12B-v2.i1-IQ4_NL.gguf-test2.py no heptagon, crazy ball physics
gemma-3-amoral-12B-v2.i1-IQ4_NL.gguf-test.py crazy motion/collision
GLM-4.6V-Flash-Q4_K_M.gguf-test2.py no visible heptagon, good ball motion
Mamba-Codestral-7B-v0.1-Q5_0.gguf-test2.py blank screen
Ministral-3-8B-Reasoning-2512-Q5_K_M.gguf-test2.py no balls
Mistral-Small-3.2-24B-Instruct-2506-IQ4_XS-bartowski.gguf-test.py
Mistral-Small-3.2-24B-Instruct-2506.Q4_K_H.gguf-test.py bad collisions
NousResearch_Hermes-4-14B-IQ4_NL.gguf-test2.py only one ball, too slow gravity, too fast spinning
NousResearch_Hermes-4-14B-IQ4_NL.gguf-test.py  one ball, too fast heptagon, too slow ball
Qwen3-4B-Instruct-2507-Q4_K_M.gguf-test2.py no visible heptagon, too slow ball motion, no collision
Qwen3-4B-Instruct-2507-sombliterated-Q8_0.gguf-test.py no balls, good heptagon motion
Qwen3-Coder-30B-A3B-Instruct-Pruned-Q3_K_M.gguf-test2.py  bad collisions, slow motion

Qwen3-Coder-30B-A3B-Instruct-Q3_K_S-2.69bpw.gguf-test.py BEST PERFECT
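Quick sanity check on how a 30B model lands in the "small (<13 GB)" bucket: a GGUF file's size is roughly parameters × bits per weight / 8. This assumes a nominal 30e9 parameters read off the model name; real files also carry metadata and may keep some tensors at higher precision, so it's only a ballpark.

```python
def gguf_size_gb(params: float, bpw: float) -> float:
    """Approximate model file size in GB (1e9 bytes): params * bits / 8."""
    return params * bpw / 8 / 1e9


# Assumed nominal parameter count for Qwen3-Coder-30B-A3B, from its name.
print(f"{gguf_size_gb(30e9, 2.69):.1f} GB")  # prints 10.1 GB
```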

This test comes from a year-old Reddit thread where SOTA online models were tested and fared no better than this one, which was perfect except that the rotation speed was set too high.

Anyway, wow. Wow.

https://x0.at/UbDs.py

I only changed the rotation speed.
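For reference, the prompt's "one full cycle every 5 seconds" pins the angular velocity at 2π/5 rad/s, i.e. 72 degrees per second. A minimal sketch of deriving the angle from elapsed time, so the spin rate doesn't depend on the frame rate (the function name is hypothetical, not taken from the linked file):

```python
import math

ROTATION_PERIOD_S = 5.0                  # from the prompt: one full turn every 5 s
OMEGA = 2 * math.pi / ROTATION_PERIOD_S  # about 1.2566 rad/s


def heptagon_angle(t_seconds: float) -> float:
    """Hypothetical helper: rotation angle in radians at time t, wrapped to [0, 2*pi)."""
    return (OMEGA * t_seconds) % (2 * math.pi)


# At 60 fps each frame advances the angle by OMEGA / 60, about 1.2 degrees.
print(math.degrees(OMEGA / 60))
```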

I wonder how much GPU time it costs to make byteshape-tuned quants?
