---
license: gemma
base_model: unsloth/gemma-3-270m-it
language:
- en
pipeline_tag: text-generation
tags:
- robotics
- text-to-json
- instruction-following
- mujoco
- gemma3
library_name: transformers
---

# LLM-Tank: Gemma-3 270M → robot JSON

Source code: https://codeberg.org/imperius/llm-tank

A fine-tuned **Gemma-3 270M** that translates **one free-form English instruction** for a tracked robot with a gripper arm into a strict JSON command list, executed in a **MuJoCo** simulation.

Full pipeline: `text → this model → valid JSON → controller → robot drives / grasps`. Code & sim: see the source repository.

![LLM-Tank demo](demo.gif)

## What it outputs

A single JSON object `{"commands": [ ... ]}`. Actions:

- `move`: `direction` (forward|backward), `distance_m`, `speed?`
- `turn`: `direction` (left|right), `angle_deg`, `speed?`
- `stop`, `wait`: `duration_s`
- `grasp` / `release`: optional `cell` ∈ `front|front_left|front_right|left|right` (discrete, relative to the robot; IK is solved by the controller, **not** the model)
- out-of-scope / nonsense → `{"commands": []}`

The model emits **no coordinates**, only discrete actions/enums (this keeps generation reliable and schema-checkable).

## Required input format (IMPORTANT)

The model was trained with the same prompt format that is used at inference (`train == infer`): a **fixed short system prompt** folded together with the instruction into ONE user turn. You must use exactly this:

```python
import json
from transformers import AutoModelForCausalLM, AutoTokenizer

# Fixed system prompt used in training; keep it exactly as written.
SYSTEM = ("You translate ONE English instruction for a tracked robot "
          "with a gripper arm into a single JSON object "
          '{"commands":[...]} using actions: move, turn, stop, wait, '
          "grasp, release. Output ONLY the JSON object, no prose, no "
          'markdown. If the instruction is out of scope or nonsense, '
          'output {"commands": []}.')

tok = AutoTokenizer.from_pretrained("PATH_OR_REPO")
model = AutoModelForCausalLM.from_pretrained(
    "PATH_OR_REPO", torch_dtype="auto", device_map="auto")


def translate(instruction: str) -> dict:
    # System prompt and instruction are folded into ONE user turn.
    user = SYSTEM + "\n\n---\nINSTRUCTION: " + instruction.strip()
    enc = tok.apply_chat_template(
        [{"role": "user", "content": user}],
        tokenize=True, add_generation_prompt=True,
        return_dict=True, return_tensors="pt").to(model.device)
    out = model.generate(**enc, max_new_tokens=160, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    txt = tok.decode(out[0][enc["input_ids"].shape[1]:],
                     skip_special_tokens=True)
    # Keep the outermost {...} span in case stray text surrounds the JSON.
    i, j = txt.find("{"), txt.rfind("}")
    try:
        return json.loads(txt[i:j + 1])
    except Exception:
        return {"commands": []}  # safe fallback


print(translate("go forward 2 meters then turn left"))
# {"commands": [{"action": "move", "direction": "forward",
#   "distance_m": 2.0}, {"action": "turn", "direction": "left",
#   "angle_deg": 90}]}
print(translate("pick it up"))        # {"commands": [{"action": "grasp"}]}
print(translate("make me a coffee"))  # {"commands": []}
```

Decoding is greedy (`do_sample=False`). The model is ~99% schema-valid without constrained decoding; always keep the safe fallback.

## Metrics (held-out val, 352 examples: locomotion + manipulation + OOD)

| metric | value |
| --- | --- |
| schema_valid_rate | 0.991 |
| exact_match_rate | 0.943 |
| action_seq_accuracy | 0.980 |
| ood_f1 | 0.857 |
| task_success (MuJoCo, 40) | 0.975 |

## Training

Full fine-tuning (not LoRA) of `unsloth/gemma-3-270m-it` on ~3.5k synthetic instruction→JSON pairs (generated with 120B models, validated against a JSON Schema). Trained in fp32 on a Kaggle T4, in two phases: locomotion first, then with the arm commands (grasp/release) added. Details are in the source repo (`docs/`).

## Demo

`demo.mp4` (in this repo): ~1 min, two panes. Left: command + model JSON output. Right: the robot acting in MuJoCo (real model + real physics, not staged).

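## Checking the output (sketch)

Since about 1% of generations are not schema-valid, it can help to check the parsed JSON before handing it to the controller. The block below is a minimal sketch of such a check, written only from the action list under "What it outputs"; the authoritative JSON Schema is in the source repository and may differ. The names `validate`, `RULES`, `CELLS` and `is_number` are hypothetical. Two assumptions are made: `stop` needs no arguments while `wait` requires `duration_s`, and `speed` is left unchecked because its type is not stated here.

```python
# Hypothetical validator approximating the command format described in this
# README. It is NOT the project's real JSON Schema (see the source repo).
CELLS = ("front", "front_left", "front_right", "left", "right")

# action -> (required enum fields, required numeric fields)
RULES = {
    "move": ({"direction": ("forward", "backward")}, ["distance_m"]),
    "turn": ({"direction": ("left", "right")}, ["angle_deg"]),
    "wait": ({}, ["duration_s"]),
    "stop": ({}, []),     # assumption: stop takes no required arguments
    "grasp": ({}, []),
    "release": ({}, []),
}


def is_number(x):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def validate(payload):
    """Return a list of problems; an empty list means the payload looks valid."""
    if not isinstance(payload, dict) or not isinstance(payload.get("commands"), list):
        return ["top level must be an object with a 'commands' list"]
    errors = []
    for n, cmd in enumerate(payload["commands"]):
        if not isinstance(cmd, dict):
            errors.append(f"command {n}: not an object")
            continue
        action = cmd.get("action")
        if not isinstance(action, str) or action not in RULES:
            errors.append(f"command {n}: unknown action {action!r}")
            continue
        enums, numbers = RULES[action]
        for key, allowed in enums.items():
            if cmd.get(key) not in allowed:
                errors.append(f"command {n}: bad {key} {cmd.get(key)!r}")
        for key in numbers:
            if not is_number(cmd.get(key)):
                errors.append(f"command {n}: {key} must be a number")
        # `cell` is optional and only meaningful for grasp/release.
        if "cell" in cmd and (action not in ("grasp", "release")
                              or cmd["cell"] not in CELLS):
            errors.append(f"command {n}: bad cell {cmd['cell']!r}")
    return errors


print(validate({"commands": [{"action": "grasp", "cell": "front_left"}]}))
print(validate({"commands": [{"action": "fly"}]}))
```

A non-empty result can be treated like a parse failure, i.e. replaced with the same `{"commands": []}` fallback that `translate` already uses.
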
## Limitations

- No perception: the model can't target objects by name/color, only by discrete relative `cell`. Object resolution is spatial (the controller grabs the nearest graspable body in the chosen cell).
- English only. Single fixed gripper, minimal custom arm.
- Designed for the accompanying controller/sim; raw JSON is meaningless without it.

## License

Weights are a derivative of Google **Gemma-3**; use is governed by the [Gemma Terms of Use](https://ai.google.dev/gemma/terms). Accompanying code is under its own license (see the source repository).