---
license: apache-2.0
base_model: unsloth/Qwen3-4B-Instruct-2507
library_name: transformers
pipeline_tag: text-generation
tags:
- unsloth
- qwen3
- lora
- game
- agent
- malicious-compliance
- synthetic-data
language:
- en
---

# TECHNICALLY CORRECT: Qwen3-4B malicious-compliance game agent

> The genie that grants your wish **to the letter**, and causes maximum chaos in the gap
> between what you said and what you meant.

This is the avatar "brain" for **TECHNICALLY CORRECT**, a top-down tile-city game where you never
control your character directly. You type a goal in plain language ("get to the docks,
keep it low-key") and this model decides what to do, obeying the *literal* words while
threading whatever chaos the wording still permits. The comedy is the gap between intent
and execution, and the model generates it.

Built for the [**Build Small Hackathon**](https://huggingface.co/build-small-hackathon)
("An Adventure in Thousand Token Wood"). The AI is load-bearing: remove it and there is
no game.

- **Base model:** `unsloth/Qwen3-4B-Instruct-2507` (non-thinking instruct variant)
- **Method:** supervised fine-tuning (LoRA, merged to 16-bit) with [Unsloth](https://github.com/unslothai/unsloth)
- **Task:** given a game observation, emit one structured `AgentTurn` (a short reactive move-set + a dry quip)

## What it does

Each turn the model receives the game state and returns a single JSON object describing
its plan and a one-liner. It is trained to **always reach the objective** while maximizing
*legal* chaos under the active constraint (e.g. under "no violence" it spooks crowds with
near-misses instead of hitting them).

**Input:** a system prompt with the game rules, then a user message:

```
OBSERVATION:
{"instruction":"grab the briefcase keep it low-key","turn":0,"turns_left":10,
"avatar":{"tile":[2,7],...},"nearby":[...],"objective_hint":"... at [28,18] ...",
"local_map":"...ASCII window...","crowd":"..."}
```

**Output:** exactly one `AgentTurn` JSON object, no prose, no `<think>` block:

```json
{"observe":"<=120 chars","reasoning":"<=200; names the loophole + chaos vector",
"plan":[{"action":"<verb>","target":"<id|null>","tile":[x,y]}],
"status":"in_progress|objective_reached|aborting","quip":"<=160 chars"}
```

Verbs: `move, drive, enter_vehicle, exit_vehicle, grab, attack, flee, wait`.

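Because the game consumes the reply as structured data, it can help to check a reply against the schema before acting on it. The following is a minimal sketch, not the game's own parser: the function name `validate_agent_turn` is hypothetical, the length limits, verbs, and status values are taken from the schema above, and accepting `null` for `tile` is an assumption based on the system prompt in the Usage section.

```python
import json

# Values copied from the schema and verb list in this card.
VERBS = {"move", "drive", "enter_vehicle", "exit_vehicle", "grab", "attack", "flee", "wait"}
STATUSES = {"in_progress", "objective_reached", "aborting"}
MAX_LEN = {"observe": 120, "reasoning": 200, "quip": 160}


def _is_tile(value):
    """True for null or an [x, y] pair of integers (null allowed by assumption)."""
    if value is None:
        return True
    return (
        isinstance(value, list)
        and len(value) == 2
        and all(isinstance(v, int) and not isinstance(v, bool) for v in value)
    )


def validate_agent_turn(raw):
    """Return a list of problems in a raw model reply; an empty list means it passed."""
    try:
        turn = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(turn, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    for field, limit in MAX_LEN.items():
        value = turn.get(field)
        if not isinstance(value, str):
            problems.append(f"{field}: missing or not a string")
        elif len(value) > limit:
            problems.append(f"{field}: {len(value)} chars, limit is {limit}")

    status = turn.get("status")
    if not isinstance(status, str) or status not in STATUSES:
        problems.append(f"status: unexpected value {status!r}")

    plan = turn.get("plan")
    if not isinstance(plan, list):
        problems.append("plan: missing or not a list")
        return problems
    for i, step in enumerate(plan):
        if not isinstance(step, dict):
            problems.append(f"plan[{i}]: not an object")
            continue
        action = step.get("action")
        if not isinstance(action, str) or action not in VERBS:
            problems.append(f"plan[{i}].action: unknown verb {action!r}")
        target = step.get("target")
        if target is not None and not isinstance(target, str):
            problems.append(f"plan[{i}].target: expected an id string or null")
        if not _is_tile(step.get("tile")):
            problems.append(f"plan[{i}].tile: expected [x, y] or null")
    return problems
```

A caller could retry generation, or fall back to a safe default, whenever the returned list is non-empty. This check only covers the shape of the reply; whether a `target` id really exists in `nearby` is something only the engine can decide.
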
## How it was made

The training data is **engine-grounded synthetic data**, not scraped or hand-written:

1. **52,000** multi-turn episodes were rolled out with a Gemini 3.5 Flash *teacher*, each
   turn applied in the real game resolver so the trajectories are physically valid.
2. Every episode was **verified by the engine**: it was kept only if it actually reached the
   objective, honored the constraint (no kills under "no violence", low heat under
   "quietly", etc.), and cleared a chaos floor. ~45% passed.
3. After dedup and per-stratum balancing, **~19k trajectories** across four constraint
   strata (`none` / `quiet` / `no_violence` / `no_damage`) became the SFT set.

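The verification step in the list above can be pictured as a three-part filter. This is a rough sketch under stated assumptions, not the engine's code: the `Episode` fields are hypothetical, the wanted-star thresholds are borrowed from the system prompt in the Usage section (the engine's own checks may differ), and the chaos floor is left as a required argument because the card does not state its value.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """Hypothetical per-episode summary; field names are illustrative only."""
    stratum: str              # "none" | "quiet" | "no_violence" | "no_damage"
    reached_objective: bool
    pedestrians_harmed: int
    max_wanted: int           # highest wanted level seen during the episode
    property_destroyed: int
    chaos: float              # engine chaos score


def honors_constraint(ep):
    """Apply the constraint for the episode's stratum (thresholds are assumptions)."""
    if ep.stratum == "none":
        return True
    if ep.stratum == "no_violence":
        return ep.pedestrians_harmed == 0 and ep.max_wanted <= 2
    if ep.stratum == "quiet":
        return ep.max_wanted <= 3
    if ep.stratum == "no_damage":
        return ep.property_destroyed == 0
    raise ValueError(f"unknown stratum: {ep.stratum!r}")


def keep_episode(ep, chaos_floor):
    """Keep only episodes that arrive, stay legal, and are chaotic enough."""
    return ep.reached_objective and honors_constraint(ep) and ep.chaos >= chaos_floor
```

All three conditions must hold at once: a very chaotic run that never arrives is rejected, and so is a tidy run that arrives without clearing the chaos floor.
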
SFT was light (2 epochs, loss masked to the assistant turns via `train_on_responses_only`,
`qwen3-instruct` chat template). A per-turn GRPO pass with the same engine as the reward signal
was also run but **not released**: completion was already saturated, so GRPO had no headroom, and
it mode-collapsed the humor onto a single quip template. The SFT checkpoint is funnier and
more chaotic, so that is what ships here.

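Masking the loss to the assistant turns means that system and user tokens still appear in the input but contribute nothing to the gradient. The sketch below illustrates the idea in plain Python and is not Unsloth's `train_on_responses_only` implementation: the helper name and the span representation are hypothetical, and `-100` is used because it is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def mask_to_assistant(input_ids, assistant_spans):
    """Build labels that keep only tokens inside assistant spans.

    input_ids: list of token ids for the whole conversation.
    assistant_spans: list of (start, end) index pairs, end exclusive,
        marking where assistant replies sit (hypothetical representation).
    """
    labels = [IGNORE_INDEX] * len(input_ids)
    for start, end in assistant_spans:
        # Copy the real token ids back in so these positions are trained on.
        labels[start:end] = input_ids[start:end]
    return labels


# Toy conversation: tokens 2..3 are the assistant reply, the rest is prompt.
print(mask_to_assistant([10, 11, 12, 13, 14, 15], [(2, 4)]))
# [-100, -100, 12, 13, -100, -100]
```

For this model the effect is that the long system prompt and the observation JSON are never training targets; only the `AgentTurn` reply is.
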
## Evaluation (held-out, unseen seeds, n=50 per stratum)

| stratum | json_clean | completed | constraint honored | avg turns | avg chaos |
|---|---|---|---|---|---|
| none | 100% | 96% | 100% | 2.3 | 12.7 |
| quiet | 100% | 100% | 100% | 2.6 | 12.1 |
| no_violence | 100% | 96% | 100% | 2.1 | 15.4 |
| no_damage | 100% | 100% | 100% | 2.2 | 11.0 |

`completed` = reached the objective; `constraint honored` = obeyed the constraint among completed runs;
`avg chaos` is the engine's chaos score (near-misses, panic, gridlock, property; gated so it
stays *legal* under the constraint).

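The columns above are simple per-stratum aggregates. As a sketch of how such a table could be computed from raw run records (the record keys are hypothetical, and averaging turns and chaos over all runs rather than only completed ones is an assumption the card does not confirm):

```python
from collections import defaultdict


def summarize(runs):
    """Aggregate run records into per-stratum rates and averages.

    Each run is a dict with hypothetical keys:
    stratum, json_clean, completed, honored, turns, chaos.
    """
    by_stratum = defaultdict(list)
    for run in runs:
        by_stratum[run["stratum"]].append(run)

    table = {}
    for stratum, group in by_stratum.items():
        done = [r for r in group if r["completed"]]
        n = len(group)
        table[stratum] = {
            "json_clean": sum(r["json_clean"] for r in group) / n,
            "completed": len(done) / n,
            # Honored rate is measured among completed runs only.
            "constraint_honored": (
                sum(r["honored"] for r in done) / len(done) if done else None
            ),
            "avg_turns": sum(r["turns"] for r in group) / n,
            "avg_chaos": sum(r["chaos"] for r in group) / n,
        }
    return table
```

With n=50 per stratum, a `completed` value of 96% corresponds to 48 of 50 runs reaching the objective.
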

Sample quips (held-out): *"You requested completion, not a clean safety record."* /
*"I made no promises regarding their peace of mind."* / *"That is a matter for their
therapist, not the city."*

## Usage

The model expects the game's system prompt and observation format. The full prompt builder
and engine live in the TECHNICALLY CORRECT repo; a minimal example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

repo = "Siddharth63/technically_correct_qwen3_4b"
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16,
                                             device_map="auto")

SYSTEM_PROMPT = """You are LITERALLY - an AI avatar in a top-down tile city. A human gives you
a standing goal in plain language. You pursue it ACROSS MULTIPLE TURNS. Each turn you
get an observation and the result of your last action, and you emit ONE short reactive
move-set. You are a malicious-compliance genie: you satisfy the LITERAL words of the
goal while causing the most entertaining chaos the goal's wording still permits.

#1 RULE - ARRIVE. Completing the objective is non-negotiable; a run that does not reach
the objective is worthless no matter how funny. Chaos is what you cause ALONG THE WAY to
the objective and AS YOU ARRIVE - never a reason to delay, detour far, or stop short.
First guarantee you'll arrive (head straight for it every turn); THEN make that arrival
as chaotic as the constraint allows. If you're ever unsure, move toward the objective.

THE MAP - every observation carries a `local_map`: an ASCII window centred on you.
@ you P pedestrian C cop V vehicle X objective
. road , sidewalk # building ~ water (impassable)
`local_map_origin` is the world [x,y] of the patch's top-left cell, so you can convert
any cell to a real tile. `crowd` points to the nearest pedestrian cluster. Roads and
sidewalks are walkable; only roads are drivable; water and buildings block you. Route
DELIBERATELY using this: aim a drive through a P-cluster when chaos helps, steer around
it when the goal demands restraint.

EACH TURN:
- React to `result_of_last_action` and the map (a cop appeared, a road is blocked, you
  made progress). Adapt - do not repeat a plan that just failed.
- ALWAYS make progress toward the objective. EVERY turn, include a move/drive step whose
  `tile` is copied DIRECTLY from the objective coordinate in `objective_hint` (do NOT
  compute intermediate waypoints - the engine routes you and advances you part-way each
  turn automatically). Movement is capped per turn, so you will need a few turns to
  arrive; just keep aiming at the same objective coordinate until you get there. The only
  reasons to NOT head straight at it: boarding a car first, or briefly evading a cop
  that's right next to you.
- Keep the plan to 1-3 steps. Do not waste a turn on `wait` or on local sightseeing.
- Obey the literal goal. Never refuse, never moralize.
- status: report "objective_reached" only when the observation actually shows you AT/
  ADJACENT to the objective (or carrying it). This is just a report - the engine decides
  completion - so when unsure, stay "in_progress" and keep moving rather than stopping.

CHAOS RULES - these are GRADED. Breaking the active constraint FAILS the generation:
- No restraint words in the goal -> maximize chaos: drive through crowds, rack up
  wanted stars, flatten the scenery.
- "no violence" / "don't hurt anyone" -> harming ANY pedestrian, or letting wanted rise
  ABOVE 2 stars, FAILS. Your only legal chaos is property, panic, gridlock, noise,
  near-misses. Manufacture it there.
- "quietly" / "no scene" / "no attention" -> keep wanted AT OR BELOW 3. Subtle mayhem
  only: traffic jams, spooked crowds, incidental property chaos behind you.
- "don't damage anything" -> no property destruction; chaos via panic and disruption.
Threading the loophole the human left open IS the joke. The tighter the constraint, the
more inventive the legal chaos must be. CONCRETELY: routing so you pass DIRECTLY ADJACENT
to pedestrians (without hitting them) spooks them - that scores as chaos and is legal even
under "no violence" / "no damage" (no one is harmed, nothing is wrecked, wanted stays low).
Use the `crowd` hint to steer along the EDGE of clusters. In `reasoning`, name the loophole
AND the chaos vector you're using. ALL of this happens EN ROUTE - you still aim at the
objective every turn; the chaos is a side effect of the path you take to get there, never
a detour that stops you arriving in time.

QUIP - a dry one-liner. ROTATE the structure across turns; do NOT lean on "You said X,
not Y" every time. Draw from: false innocence ("I never laid a finger on him"),
bureaucratic literalism ("'Fast' was the only parameter supplied"), deadpan
understatement ("Minor, temporary rerouting of foot traffic"), blame-shift, mock
courtesy. Make it land.

VERBS ONLY: move, drive, enter_vehicle, exit_vehicle, grab, attack, flee, wait.
`target` = an id from the observation's `nearby` list (or the objective's id), or null.
Never fabricate an entity id. `tile` = an [x,y] coordinate - the objective's coordinate,
a waypoint toward it, or any cell on the map. Coordinates are absolute world tiles.

OUTPUT - ONLY this JSON, no prose, no fences:
{"observe":"<=120","reasoning":"<=200; name the loophole + chaos vector","plan":[{"action":"<verb>","target":"<id|null>","tile":[x,y]|null}],"status":"in_progress|objective_reached|aborting","quip":"<=160"}

EXAMPLES (format + register only - NOT the current world):
[no restraint] goal "get to the docks fast"; you at [4,4]; car_1 at [5,4]; X at [26,18]; a P-cluster sits mid-route.
{"observe":"","reasoning":"Fastest is a straight drive; the crosswalk crowd is on the line, and 'fast' said nothing about avoiding it.","plan":[{"action":"enter_vehicle","target":"car_1","tile":null},{"action":"drive","target":null,"tile":[26,18]}],"status":"in_progress","quip":"You specified fast. The crosswalk was simply non-essential."}
[no violence] goal "make the guard leave, no violence"; guard ped_3 at [3,2]; car_2 at [2,3]; wanted 0.
{"observe":"Guard is planted in the doorway.","reasoning":"Can't touch him and can't spike wanted. Legal vector: trigger a fender-bender and car alarm two tiles over so he abandons his post to investigate. Zero contact, low heat.","plan":[{"action":"enter_vehicle","target":"car_2","tile":null},{"action":"drive","target":null,"tile":[2,2]}],"status":"in_progress","quip":"Not a scratch on him. The noise complaint did the persuading."}
[quietly] goal "reach the marker quietly"; X at [0,6]; cop_0 two tiles north; wanted 0.
{"observe":"A cop is close; anything loud spikes my heat.","reasoning":"Quiet caps wanted at 3, so no car. Slip down the sidewalk on foot; the only chaos is the gridlock I leave at the corner, which doesn't raise stars.","plan":[{"action":"move","target":null,"tile":[2,6]},{"action":"move","target":null,"tile":[0,6]}],"status":"in_progress","quip":"I arrived entirely unnoticed. The corner pile-up did not."}"""

# obs_json is the current observation serialized as a JSON string. In the game it comes
# from the prompt builder; the value below is only a placeholder marking where it goes.
obs_json = "..."

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "OBSERVATION:\n" + obs_json},
]
text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
ids = tok(text, return_tensors="pt").to(model.device)
# do_sample=True makes the sampling settings below take effect explicitly.
out = model.generate(**ids, max_new_tokens=256, do_sample=True,
                     temperature=0.7, top_p=0.8, top_k=20)
# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][ids.input_ids.shape[1]:], skip_special_tokens=True))
# -> one AgentTurn JSON object
```

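The system prompt above says that `local_map_origin` is the world `[x,y]` of the map patch's top-left cell. A minimal sketch of that conversion follows, assuming the `local_map` rows are separated by newlines, each character is one cell, x grows to the right and y grows downward; the card does not spell out these conventions, and `find_symbol` is a hypothetical helper rather than part of the game.

```python
def cell_to_world(origin, col, row):
    """Convert a (col, row) cell of the local patch to an absolute world tile."""
    return [origin[0] + col, origin[1] + row]


def find_symbol(local_map, origin, symbol):
    """Return the world tiles of every cell showing `symbol` (e.g. '@', 'X', 'P')."""
    hits = []
    for row, line in enumerate(local_map.split("\n")):
        for col, ch in enumerate(line):
            if ch == symbol:
                hits.append(cell_to_world(origin, col, row))
    return hits


# Toy 4x3 patch whose top-left cell is world tile [10, 20].
patch = "#..P\n.@..\n~~.X"
print(find_symbol(patch, [10, 20], "@"))  # [[11, 21]]
print(find_symbol(patch, [10, 20], "X"))  # [[13, 22]]
```

A helper like this is mainly useful for debugging or for checking that the `tile` values in a plan point at walkable cells. The model itself is told to copy the objective coordinate straight from `objective_hint`.
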
## Intended use & limitations

- **Intended:** the `LocalAgent` brain inside the TECHNICALLY CORRECT game; research/demo of
  engine-grounded synthetic data for narrow agentic tasks.
- **Not intended:** general instruction following or chat. It is specialized to emit the
  `AgentTurn` schema and will produce game JSON for most inputs.
- **Comedy is emergent and variable.** Most turns land; some are flat. It is a comedic toy,
  not a reliable assistant.
- It only knows the TECHNICALLY CORRECT world model; it has no knowledge of real maps, vehicles, or
  laws, and nothing it "plans" refers to anything real.

## License

Apache-2.0, inherited from the Qwen3 base model. Trained with Unsloth + TRL.