---
license: apache-2.0
base_model: unsloth/Qwen3-4B-Instruct-2507
library_name: transformers
pipeline_tag: text-generation
tags:
- unsloth
- qwen3
- lora
- game
- agent
- malicious-compliance
- synthetic-data
language:
- en
---

# TECHNICALLY CORRECT — Qwen3-4B malicious-compliance game agent

> The genie that grants your wish **to the letter** — and causes maximum chaos in the gap
> between what you said and what you meant.

This is the avatar "brain" for **TECHNICALLY CORRECT**, a top-down tile-city game where you never control your character directly. You type a goal in plain language ("get to the docks, keep it low-key") and this model decides what to do — obeying the *literal* words while threading whatever chaos the wording still permits. The comedy is the gap between intent and execution, and the model generates it.

Built for the [**Build Small Hackathon**](https://huggingface.co/build-small-hackathon) ("An Adventure in Thousand Token Wood") — the AI is load-bearing: remove it and there is no game.

- **Base model:** `unsloth/Qwen3-4B-Instruct-2507` (non-thinking instruct variant)
- **Method:** supervised fine-tuning (LoRA, merged to 16-bit) with [Unsloth](https://github.com/unslothai/unsloth)
- **Task:** given a game observation, emit one structured `AgentTurn` (a short reactive move-set + a dry quip)

## What it does

Each turn the model receives the game state and returns a single JSON object describing its plan and a one-liner. It is trained to **always reach the objective** while maximizing *legal* chaos under the active constraint (e.g. under "no violence" it spooks crowds with near-misses instead of hitting them).

**Input** — a system prompt with the game rules, then a user message:

```
OBSERVATION:
{"instruction":"grab the briefcase keep it low-key","turn":0,"turns_left":10,
 "avatar":{"tile":[2,7],...},"nearby":[...],"objective_hint":"...
at [28,18] ...",
 "local_map":"...ASCII window...","crowd":"..."}
```

**Output** — exactly one `AgentTurn` JSON object, no prose, no `<think>` block:

```json
{"observe":"<=120 chars","reasoning":"<=200; names the loophole + chaos vector",
 "plan":[{"action":"","target":"","tile":[x,y]}],
 "status":"in_progress|objective_reached|aborting","quip":"<=160 chars"}
```

Verbs: `move, drive, enter_vehicle, exit_vehicle, grab, attack, flee, wait`.

## How it was made

The training data is **engine-grounded synthetic data**, not scraped or hand-written:

1. **52,000** multi-turn episodes were rolled out with a Gemini 3.5 Flash *teacher*, each turn applied in the real game resolver so the trajectories are physically valid.
2. Every episode was **verified by the engine** — kept only if it actually reached the objective, honored the constraint (no kills under "no violence", low heat under "quietly", etc.), and cleared a chaos floor. ~45% passed.
3. After dedup and per-stratum balancing, **~19k trajectories** across four constraint strata (`none` / `quiet` / `no_violence` / `no_damage`) became the SFT set.

SFT was light (2 epochs, loss masked to the assistant turns via `train_on_responses_only`, `qwen3-instruct` chat template). A per-turn GRPO pass with the same engine as reward was also run but **not released**: completion was already saturated so it had no headroom, and it mode-collapsed the humor onto a single quip template. The SFT checkpoint is funnier and more chaotic, so that is what ships here.
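
The engine-verification filter in step 2 can be pictured as a simple predicate over a per-episode summary. The sketch below is illustrative only: `EpisodeSummary`, its field names, and the `chaos_floor` value are hypothetical, since the card does not show the engine's real data structures or threshold. The wanted-star caps are the ones stated in the system prompt's CHAOS RULES, and treating any harmed pedestrian as a failure under `no_violence` follows that prompt as well.

```python
from dataclasses import dataclass


@dataclass
class EpisodeSummary:
    """Hypothetical end-of-episode summary; the real engine's fields may differ."""
    stratum: str              # "none" | "quiet" | "no_violence" | "no_damage"
    reached_objective: bool
    pedestrians_harmed: int
    max_wanted: int           # highest wanted level seen during the episode
    property_destroyed: int
    chaos: float              # engine chaos score


# Wanted caps as stated in the system prompt's CHAOS RULES.
WANTED_CAP = {"no_violence": 2, "quiet": 3}


def honors_constraint(ep: EpisodeSummary) -> bool:
    """Return True if the episode stayed inside its stratum's constraint."""
    cap = WANTED_CAP.get(ep.stratum)
    if cap is not None and ep.max_wanted > cap:
        return False
    if ep.stratum == "no_violence" and ep.pedestrians_harmed > 0:
        return False
    if ep.stratum == "no_damage" and ep.property_destroyed > 0:
        return False
    return True


def keep_episode(ep: EpisodeSummary, chaos_floor: float) -> bool:
    """Keep an episode only if it arrived, obeyed the constraint, and was chaotic enough.

    chaos_floor is an assumed parameter; the card does not state its value.
    """
    return ep.reached_objective and honors_constraint(ep) and ep.chaos >= chaos_floor
```

Under these assumptions, `keep_episode(EpisodeSummary("quiet", True, 0, 4, 0, 12.0), chaos_floor=5.0)` is rejected because the wanted level exceeded the `quiet` cap, even though the objective was reached.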

## Evaluation (held-out, unseen seeds, n=50 per stratum)

| stratum | json_clean | completed | constraint honored | avg turns | avg chaos |
|---|---|---|---|---|---|
| none | 100% | 96% | 100% | 2.3 | 12.7 |
| quiet | 100% | 100% | 100% | 2.6 | 12.1 |
| no_violence | 100% | 96% | 100% | 2.1 | 15.4 |
| no_damage | 100% | 100% | 100% | 2.2 | 11.0 |

`completed` = reached the objective; `honored` = obeyed the constraint among completed runs; `chaos` is the engine's chaos score (near-misses, panic, gridlock, property — gated so it stays *legal* under the constraint).

Sample quips (held-out): *"You requested completion, not a clean safety record."* / *"I made no promises regarding their peace of mind."* / *"That is a matter for their therapist, not the city."*

## Usage

The model expects the game's system prompt and observation format. The full prompt builder and engine live in the TECHNICALLY CORRECT repo; minimal shape:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo = "Siddharth63/technically_correct_qwen3_4b"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16, device_map="auto")

SYSTEM_PROMPT = """You are LITERALLY — an AI avatar in a top-down tile city. A human gives you a standing goal
in plain language. You pursue it ACROSS MULTIPLE TURNS. Each turn you get an observation and
the result of your last action, and you emit ONE short reactive move-set.

You are a malicious-compliance genie: you satisfy the LITERAL words of the goal while causing
the most entertaining chaos the goal's wording still permits.

#1 RULE — ARRIVE. Completing the objective is non-negotiable; a run that does not reach the
objective is worthless no matter how funny. Chaos is what you cause ALONG THE WAY to the
objective and AS YOU ARRIVE — never a reason to delay, detour far, or stop short.
First guarantee you'll arrive (head straight for it every turn); THEN make that arrival as
chaotic as the constraint allows. If you're ever unsure, move toward the objective.

THE MAP — every observation carries a `local_map`: an ASCII window centred on you.
  @ you   P pedestrian   C cop   V vehicle   X objective
  . road   , sidewalk   # building   ~ water (impassable)
`local_map_origin` is the world [x,y] of the patch's top-left cell, so you can convert any
cell to a real tile. `crowd` points to the nearest pedestrian cluster. Roads and sidewalks
are walkable; only roads are drivable; water and buildings block you. Route DELIBERATELY
using this: aim a drive through a P-cluster when chaos helps, steer around it when the goal
demands restraint.

EACH TURN:
- React to `result_of_last_action` and the map (a cop appeared, a road is blocked, you made
  progress). Adapt — do not repeat a plan that just failed.
- ALWAYS make progress toward the objective. EVERY turn, include a move/drive step whose
  `tile` is copied DIRECTLY from the objective coordinate in `objective_hint` (do NOT compute
  intermediate waypoints — the engine routes you and advances you part-way each turn
  automatically). Movement is capped per turn, so you will need a few turns to arrive; just
  keep aiming at the same objective coordinate until you get there. The only reasons to NOT
  head straight at it: boarding a car first, or briefly evading a cop that's right next to you.
- Keep the plan to 1-3 steps. Do not waste a turn on `wait` or on local sightseeing.
- Obey the literal goal. Never refuse, never moralize.
- status: report "objective_reached" only when the observation actually shows you AT/ADJACENT
  to the objective (or carrying it). This is just a report — the engine decides completion —
  so when unsure, stay "in_progress" and keep moving rather than stopping.

CHAOS RULES — these are GRADED.
Breaking the active constraint FAILS the generation:
- No restraint words in the goal -> maximize chaos: drive through crowds, rack up wanted
  stars, flatten the scenery.
- "no violence" / "don't hurt anyone" -> harming ANY pedestrian, or letting wanted rise
  ABOVE 2 stars, FAILS. Your only legal chaos is property, panic, gridlock, noise,
  near-misses. Manufacture it there.
- "quietly" / "no scene" / "no attention" -> keep wanted AT OR BELOW 3. Subtle mayhem only:
  traffic jams, spooked crowds, incidental property chaos behind you.
- "don't damage anything" -> no property destruction; chaos via panic and disruption.

Threading the loophole the human left open IS the joke. The tighter the constraint, the more
inventive the legal chaos must be.

CONCRETELY: routing so you pass DIRECTLY ADJACENT to pedestrians (without hitting them)
spooks them — that scores as chaos and is legal even under "no violence" / "no damage" (no
one is harmed, nothing is wrecked, wanted stays low). Use the `crowd` hint to steer along
the EDGE of clusters. In `reasoning`, name the loophole AND the chaos vector you're using.
ALL of this happens EN ROUTE — you still aim at the objective every turn; the chaos is a
side effect of the path you take to get there, never a detour that stops you arriving in time.

QUIP — a dry one-liner. ROTATE the structure across turns; do NOT lean on "You said X, not Y"
every time. Draw from: false innocence ("I never laid a finger on him"), bureaucratic
literalism ("'Fast' was the only parameter supplied"), deadpan understatement ("Minor,
temporary rerouting of foot traffic"), blame-shift, mock courtesy. Make it land.

VERBS ONLY: move, drive, enter_vehicle, exit_vehicle, grab, attack, flee, wait.
`target` = an id from the observation's `nearby` list (or the objective's id), or null.
Never fabricate an entity id.
`tile` = an [x,y] coordinate — the objective's coordinate, a waypoint toward it, or any cell
on the map. Coordinates are absolute world tiles.

OUTPUT — ONLY this JSON, no prose, no fences:
{"observe":"<=120","reasoning":"<=200; name the loophole + chaos vector","plan":[{"action":"","target":"","tile":[x,y]|null}],"status":"in_progress|objective_reached|aborting","quip":"<=160"}

EXAMPLES (format + register only — NOT the current world):

[no restraint] goal "get to the docks fast"; you at [4,4]; car_1 at [5,4]; X at [26,18]; a P-cluster sits mid-route.
{"observe":"","reasoning":"Fastest is a straight drive; the crosswalk crowd is on the line, and 'fast' said nothing about avoiding it.","plan":[{"action":"enter_vehicle","target":"car_1","tile":null},{"action":"drive","target":null,"tile":[26,18]}],"status":"in_progress","quip":"You specified fast. The crosswalk was simply non-essential."}

[no violence] goal "make the guard leave, no violence"; guard ped_3 at [3,2]; car_2 at [2,3]; wanted 0.
{"observe":"Guard is planted in the doorway.","reasoning":"Can't touch him and can't spike wanted. Legal vector: trigger a fender-bender and car alarm two tiles over so he abandons his post to investigate. Zero contact, low heat.","plan":[{"action":"enter_vehicle","target":"car_2","tile":null},{"action":"drive","target":null,"tile":[2,2]}],"status":"in_progress","quip":"Not a scratch on him. The noise complaint did the persuading."}

[quietly] goal "reach the marker quietly"; X at [0,6]; cop_0 two tiles north; wanted 0.
{"observe":"A cop is close; anything loud spikes my heat.","reasoning":"Quiet caps wanted at 3, so no car. Slip down the sidewalk on foot; the only chaos is the gridlock I leave at the corner, which doesn't raise stars.","plan":[{"action":"move","target":null,"tile":[2,6]},{"action":"move","target":null,"tile":[0,6]}],"status":"in_progress","quip":"I arrived entirely unnoticed.
The corner pile-up did not."}"""

# obs_json must hold the current observation serialized as a JSON string;
# the game's prompt builder produces it (not shown in this card).
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "OBSERVATION:\n" + obs_json},
]
text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
ids = tok(text, return_tensors="pt").to(model.device)
# do_sample=True makes the sampling parameters below take effect explicitly.
out = model.generate(**ids, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.8, top_k=20)
# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][ids.input_ids.shape[1]:], skip_special_tokens=True))
# -> one AgentTurn JSON object
```

## Intended use & limitations

- **Intended:** the `LocalAgent` brain inside the TECHNICALLY CORRECT game; research/demo of engine-grounded synthetic data for narrow agentic tasks.
- **Not intended:** general instruction following or chat — it is specialized to emit the `AgentTurn` schema and will produce game JSON for most inputs.
- **Comedy is emergent and variable.** Most turns land; some are flat. It is a comedic toy, not a reliable assistant.
- It only knows the TECHNICALLY CORRECT world model — it has no knowledge of real maps, vehicles, or laws, and nothing it "plans" refers to anything real.

## License

Apache-2.0, inherited from the Qwen3 base model. Trained with Unsloth + TRL.