Imperius committed on
Commit
2eff32d
·
verified ·
1 Parent(s): 217b894

LLM-Tank: Gemma-3 270M robot-JSON weights + model card + demo

.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ demo.gif filter=lfs diff=lfs merge=lfs -text
37
+ demo.mp4 filter=lfs diff=lfs merge=lfs -text
38
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,126 @@
1
  ---
2
- license: apache-2.0
3
  ---
1
  ---
2
+ license: gemma
3
+ base_model: unsloth/gemma-3-270m-it
4
+ language:
5
+ - en
6
+ pipeline_tag: text-generation
7
+ tags:
8
+ - robotics
9
+ - text-to-json
10
+ - instruction-following
11
+ - mujoco
12
+ - gemma3
13
+ library_name: transformers
14
  ---
15
+
16
+ # LLM-Tank — Gemma-3 270M → robot JSON
17
+
18
+ Fine-tuned **Gemma-3 270M** that translates **one free-form English
19
+ instruction** for a tracked robot with a gripper arm into a strict JSON
20
+ command list, executed in a **MuJoCo** simulation.
21
+
22
+ Full pipeline: `text → this model → valid JSON → controller → robot
23
+ drives / grasps`. Code & sim: see the source repository.
24
+
25
+ ![LLM-Tank demo](demo.gif)
26
+
27
+ ## What it outputs
28
+
29
+ A single JSON object `{"commands": [ ... ]}`. Actions:
30
+
31
+ - `move` — `direction` (forward|backward), `distance_m`, `speed?`
32
+ - `turn` — `direction` (left|right), `angle_deg`, `speed?`
33
+ - `stop`, `wait` — `duration_s`
34
+ - `grasp` / `release` — optional `cell` ∈
35
+ `front|front_left|front_right|left|right` (discrete, relative to the
36
+ robot; IK is solved by the controller, **not** the model)
37
+ - out-of-scope / nonsense → `{"commands": []}`
38
+
39
+ The model emits **no coordinates** — only discrete actions/enums (this
40
+ keeps generation reliable and schema-checkable).
41
+
42
+ ## Required input format (IMPORTANT)
43
+
44
+ The model was trained with the same prompt format used at inference
45
+ (`train == infer`): a **fixed short system prompt** folded with the
46
+ instruction into ONE user turn. You must use exactly this:
47
+
+ ```python
+ import json
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Fixed system prompt; must match the training-time text exactly.
+ SYSTEM = ("You translate ONE English instruction for a tracked robot "
+           "with a gripper arm into a single JSON object "
+           '{"commands":[...]} using actions: move, turn, stop, wait, '
+           "grasp, release. Output ONLY the JSON object, no prose, no "
+           'markdown. If the instruction is out of scope or nonsense, '
+           'output {"commands": []}.')
+
+ tok = AutoTokenizer.from_pretrained("PATH_OR_REPO")
+ model = AutoModelForCausalLM.from_pretrained("PATH_OR_REPO",
+                                              torch_dtype="auto",
+                                              device_map="auto")
+
+ def translate(instruction: str) -> dict:
+     # System prompt and instruction are folded into ONE user turn.
+     user = SYSTEM + "\n\n---\nINSTRUCTION: " + instruction.strip()
+     enc = tok.apply_chat_template(
+         [{"role": "user", "content": user}],
+         tokenize=True, add_generation_prompt=True,
+         return_dict=True, return_tensors="pt").to(model.device)
+     out = model.generate(**enc, max_new_tokens=160, do_sample=False)
+     # Decode only the newly generated tokens, not the prompt.
+     txt = tok.decode(out[0][enc["input_ids"].shape[1]:],
+                      skip_special_tokens=True)
+     # Keep the outermost {...} span in case of stray text around it.
+     i, j = txt.find("{"), txt.rfind("}")
+     try:
+         return json.loads(txt[i:j + 1])
+     except Exception:
+         return {"commands": []}  # safe fallback
+
+ print(translate("go forward 2 meters then turn left"))
+ # {"commands": [{"action": "move", "direction": "forward",
+ #   "distance_m": 2.0}, {"action": "turn", "direction": "left",
+ #   "angle_deg": 90}]}
+ print(translate("pick it up"))        # {"commands": [{"action": "grasp"}]}
+ print(translate("make me a coffee"))  # {"commands": []}
+ ```
86
+
87
+ Decoding is greedy (`do_sample=False`, passed explicitly because the
88
+ shipped `generation_config.json` sets `do_sample: true`). The model is ~99% schema-valid without constrained decoding; always keep the safe fallback.
89
+
90
+ ## Metrics (held-out val, 352 examples: locomotion + manipulation + OOD)
91
+
92
+ | metric | value |
93
+ | --- | --- |
94
+ | schema_valid_rate | 0.991 |
95
+ | exact_match_rate | 0.943 |
96
+ | action_seq_accuracy | 0.980 |
97
+ | ood_f1 | 0.857 |
98
+ | task_success (MuJoCo, 40) | 0.975 |
99
+
100
+ ## Training
101
+
102
+ Full fine-tuning (not LoRA) of `unsloth/gemma-3-270m-it` on ~3.5k
103
+ synthetic instruction→JSON pairs (generated with 120B models, validated
104
+ against a JSON Schema). fp32, Kaggle T4. Two phases: locomotion first,
105
+ then the arm (grasp/release) added. Details in the source repo (`docs/`).
106
+
107
+ ## Demo
108
+
109
+ `demo.mp4` (in this repo) — ~1 min, two panes: left = command + model
110
+ JSON output, right = the robot acting in MuJoCo (real model + real
111
+ physics, not staged).
112
+
113
+ ## Limitations
114
+
115
+ - No perception: the model can't target objects by name/color, only by
116
+ discrete relative `cell`. Object resolution is spatial (controller
117
+ grabs the nearest graspable body in the chosen cell).
118
+ - English only. Single fixed gripper, minimal custom arm.
119
+ - Designed for the accompanying controller/sim; raw JSON is meaningless
120
+ without it.
121
+
122
+ ## License
123
+
124
+ Weights are a derivative of Google **Gemma-3** — use is governed by the
125
+ [Gemma Terms of Use](https://ai.google.dev/gemma/terms). Accompanying
126
+ code is under its own license (see the source repository).
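The action list in the README's "What it outputs" section is small enough to check mechanically before a command list reaches the controller. The following is a minimal sketch of such a check, not the project's actual JSON Schema (that lives in the source repository). The function name `is_valid_command_list` and the rule tables are hypothetical, and two points are assumptions read off the README: `wait` requires `duration_s`, and `stop` takes no fields.

```python
from numbers import Real

# Assumed rules, inferred from the README's action list. The authoritative
# JSON Schema is in the source repository and may differ.
CELLS = {"front", "front_left", "front_right", "left", "right"}
REQUIRED = {
    "move": {"direction": {"forward", "backward"}, "distance_m": Real},
    "turn": {"direction": {"left", "right"}, "angle_deg": Real},
    "stop": {},
    "wait": {"duration_s": Real},  # assumption: required for wait only
    "grasp": {},
    "release": {},
}
OPTIONAL = {
    "move": {"speed": Real},
    "turn": {"speed": Real},
    "grasp": {"cell": CELLS},
    "release": {"cell": CELLS},
}


def _matches(value, rule) -> bool:
    """A rule is either a set of allowed strings or a numeric type."""
    if isinstance(rule, set):
        return isinstance(value, str) and value in rule
    # bool is a subclass of int, so reject it explicitly for numeric fields.
    return isinstance(value, rule) and not isinstance(value, bool)


def is_valid_command_list(obj) -> bool:
    """Return True if obj looks like {"commands": [...]} with known actions."""
    if not isinstance(obj, dict) or set(obj) != {"commands"}:
        return False
    commands = obj["commands"]
    if not isinstance(commands, list):
        return False
    for cmd in commands:
        if not isinstance(cmd, dict):
            return False
        action = cmd.get("action")
        if not isinstance(action, str) or action not in REQUIRED:
            return False
        required = REQUIRED[action]
        optional = OPTIONAL.get(action, {})
        fields = {k: v for k, v in cmd.items() if k != "action"}
        if not set(required) <= set(fields):
            return False  # a required field is missing
        for key, value in fields.items():
            rule = required.get(key, optional.get(key))
            if rule is None or not _matches(value, rule):
                return False  # unknown field or wrong value
    return True
```

A caller could run this on the result of `translate(...)` and substitute `{"commands": []}` when it returns `False`, which extends the README's safe fallback from "not parseable" to "parseable but off-schema".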
chat_template.jinja ADDED
@@ -0,0 +1,50 @@
1
+ {# Unsloth Chat template fixes #}
2
+ {{ bos_token }}
3
+ {%- if messages[0]['role'] == 'system' -%}
4
+ {%- if messages[0]['content'] is string -%}
5
+ {%- set first_user_prefix = messages[0]['content'] + '
6
+
7
+ ' -%}
8
+ {%- else -%}
9
+ {%- set first_user_prefix = messages[0]['content'][0]['text'] + '
10
+
11
+ ' -%}
12
+ {%- endif -%}
13
+ {%- set loop_messages = messages[1:] -%}
14
+ {%- else -%}
15
+ {%- set first_user_prefix = "" -%}
16
+ {%- set loop_messages = messages -%}
17
+ {%- endif -%}
18
+ {%- for message in loop_messages -%}
19
+ {%- if (message['role'] == 'user') != (loop.index0 % 2 == 0) -%}
20
+ {{ raise_exception("Conversation roles must alternate user/assistant/user/assistant/...") }}
21
+ {%- endif -%}
22
+ {%- if (message['role'] == 'assistant') -%}
23
+ {%- set role = "model" -%}
24
+ {%- else -%}
25
+ {%- set role = message['role'] -%}
26
+ {%- endif -%}
27
+ {{ '<start_of_turn>' + role + '
28
+ ' + (first_user_prefix if loop.first else "") }}
29
+ {%- if message['content'] is string -%}
30
+ {{ message['content'] | trim }}
31
+ {%- elif message['content'] is iterable -%}
32
+ {%- for item in message['content'] -%}
33
+ {%- if item['type'] == 'image' -%}
34
+ {{ '<start_of_image>' }}
35
+ {%- elif item['type'] == 'text' -%}
36
+ {{ item['text'] | trim }}
37
+ {%- endif -%}
38
+ {%- endfor -%}
39
+ {%- elif message['content'] is defined -%}
40
+ {{ raise_exception("Invalid content type") }}
41
+ {%- endif -%}
42
+ {{ '<end_of_turn>
43
+ ' }}
44
+ {%- endfor -%}
45
+ {%- if add_generation_prompt -%}
46
+ {{'<start_of_turn>model
47
+ '}}
48
+ {%- endif -%}
49
+
50
+ {# Copyright 2025-present Unsloth. Apache 2.0 License. #}
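To make the template's effect concrete, here is a rough pure-Python mirror of it for the common case where every message `content` is a plain string. It is a sketch for reading the template, not a replacement for `apply_chat_template`: the function name `render_chat` is hypothetical, the image branch is omitted, and `<bos>` is taken from `tokenizer_config.json` in this commit.

```python
BOS = "<bos>"  # bos_token as listed in tokenizer_config.json


def render_chat(messages, add_generation_prompt=True) -> str:
    """Approximate the Jinja template above for string-only message content."""
    prefix = ""
    if messages and messages[0]["role"] == "system":
        # A leading system message is folded into the first user turn.
        prefix = messages[0]["content"] + "\n\n"
        messages = messages[1:]
    parts = [BOS]
    for index, message in enumerate(messages):
        # Even positions must be user turns, odd positions assistant turns.
        if (message["role"] == "user") != (index % 2 == 0):
            raise ValueError(
                "Conversation roles must alternate user/assistant/user/assistant/..."
            )
        role = "model" if message["role"] == "assistant" else message["role"]
        parts.append(f"<start_of_turn>{role}\n")
        if index == 0:
            parts.append(prefix)
        parts.append(message["content"].strip())  # Jinja's `trim` filter
        parts.append("<end_of_turn>\n")
    if add_generation_prompt:
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


if __name__ == "__main__":
    print(render_chat([{"role": "user", "content": "pick it up"}]))
```

Because the template folds a system message into the first user turn anyway, the README's approach of concatenating the system prompt and instruction into one user message yields a single `user` turn followed by the `model` generation prompt; the exact separator text (`\n\n---\nINSTRUCTION: `) is what has to match training.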
config.json ADDED
@@ -0,0 +1,63 @@
1
+ {
2
+ "_sliding_window_pattern": 6,
3
+ "architectures": [
4
+ "Gemma3ForCausalLM"
5
+ ],
6
+ "attention_bias": false,
7
+ "attention_dropout": 0.0,
8
+ "attn_logit_softcapping": null,
9
+ "bos_token_id": 2,
10
+ "dtype": "bfloat16",
11
+ "eos_token_id": 106,
12
+ "final_logit_softcapping": null,
13
+ "head_dim": 256,
14
+ "hidden_activation": "gelu_pytorch_tanh",
15
+ "hidden_size": 640,
16
+ "initializer_range": 0.02,
17
+ "intermediate_size": 2048,
18
+ "layer_types": [
19
+ "sliding_attention",
20
+ "sliding_attention",
21
+ "sliding_attention",
22
+ "sliding_attention",
23
+ "sliding_attention",
24
+ "full_attention",
25
+ "sliding_attention",
26
+ "sliding_attention",
27
+ "sliding_attention",
28
+ "sliding_attention",
29
+ "sliding_attention",
30
+ "full_attention",
31
+ "sliding_attention",
32
+ "sliding_attention",
33
+ "sliding_attention",
34
+ "sliding_attention",
35
+ "sliding_attention",
36
+ "full_attention"
37
+ ],
38
+ "max_position_embeddings": 32768,
39
+ "model_type": "gemma3_text",
40
+ "num_attention_heads": 4,
41
+ "num_hidden_layers": 18,
42
+ "num_key_value_heads": 1,
43
+ "pad_token_id": 0,
44
+ "query_pre_attn_scalar": 256,
45
+ "rms_norm_eps": 1e-06,
46
+ "rope_parameters": {
47
+ "full_attention": {
48
+ "rope_theta": 1000000.0,
49
+ "rope_type": "default"
50
+ },
51
+ "sliding_attention": {
52
+ "rope_theta": 10000.0,
53
+ "rope_type": "default"
54
+ }
55
+ },
56
+ "sliding_window": 512,
57
+ "tie_word_embeddings": true,
58
+ "transformers_version": "5.0.0",
59
+ "unsloth_fixed": true,
60
+ "use_bidirectional_attention": false,
61
+ "use_cache": false,
62
+ "vocab_size": 262144
63
+ }
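As a sanity check on these numbers, the parameter count can be estimated directly from the config. The sketch below is back-of-the-envelope arithmetic, not code from the repository. It assumes the usual Gemma 3 decoder layout (four RMSNorm weights per layer plus per-head query/key norms, no biases, tied input/output embeddings), and the function names are hypothetical.

```python
# Values copied from config.json in this commit.
CONFIG = {
    "vocab_size": 262144,
    "hidden_size": 640,
    "intermediate_size": 2048,
    "num_hidden_layers": 18,
    "num_attention_heads": 4,
    "num_key_value_heads": 1,
    "head_dim": 256,
    "_sliding_window_pattern": 6,
}
SAFETENSORS_BYTES = 536_223_056  # size from the model.safetensors LFS pointer


def layer_types(cfg):
    """Every Nth layer uses full attention; the rest use a sliding window."""
    n = cfg["_sliding_window_pattern"]
    return [
        "full_attention" if (i + 1) % n == 0 else "sliding_attention"
        for i in range(cfg["num_hidden_layers"])
    ]


def count_parameters(cfg) -> int:
    """Estimate the parameter count (assumed layout: see lead-in)."""
    h, d = cfg["hidden_size"], cfg["head_dim"]
    q_dim = cfg["num_attention_heads"] * d
    kv_dim = cfg["num_key_value_heads"] * d
    attention = h * q_dim + 2 * h * kv_dim + q_dim * h  # q, k, v, o projections
    mlp = 3 * h * cfg["intermediate_size"]              # gate, up, down
    norms = 4 * h + 2 * d                               # layer norms + q/k norms
    per_layer = attention + mlp + norms
    embeddings = cfg["vocab_size"] * h                  # tied with the LM head
    final_norm = h
    return embeddings + cfg["num_hidden_layers"] * per_layer + final_norm


if __name__ == "__main__":
    total = count_parameters(CONFIG)
    print(total)                           # 268098176
    print(SAFETENSORS_BYTES - 2 * total)   # bytes left over at 2 bytes/param
```

Under these assumptions the estimate is about 268M parameters, of which roughly 168M are the embedding table. At two bytes per parameter (`"dtype": "bfloat16"`) that accounts for all but a few tens of kilobytes of the `model.safetensors` size, which is consistent with a file header.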
demo.gif ADDED

Git LFS Details

  • SHA256: c0656d985d3d58c1355333ff3bd9e637c692bdc39bcdf3dd42fb98e1f7ec2154
  • Pointer size: 132 Bytes
  • Size of remote file: 5.46 MB
demo.mp4 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2fd62048b546b9947c256b86be554f4aa019124dcd825c72863bb29b73adb32e
3
+ size 8814675
generation_config.json ADDED
@@ -0,0 +1,14 @@
1
+ {
2
+ "bos_token_id": 2,
3
+ "cache_implementation": "hybrid",
4
+ "do_sample": true,
5
+ "eos_token_id": [
6
+ 1,
7
+ 106
8
+ ],
9
+ "max_length": 32768,
10
+ "pad_token_id": 0,
11
+ "top_k": 64,
12
+ "top_p": 0.95,
13
+ "transformers_version": "5.0.0"
14
+ }
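These defaults enable sampling, while the README example calls `generate` with `do_sample=False`. Arguments passed to `generate` take precedence over the values in `generation_config.json`. The sketch below only illustrates that precedence with plain dictionaries; it is not the `transformers` implementation, and `effective_settings` is a hypothetical helper.

```python
import json

# Contents of generation_config.json from this commit.
GENERATION_CONFIG = json.loads("""
{
  "bos_token_id": 2,
  "cache_implementation": "hybrid",
  "do_sample": true,
  "eos_token_id": [1, 106],
  "max_length": 32768,
  "pad_token_id": 0,
  "top_k": 64,
  "top_p": 0.95,
  "transformers_version": "5.0.0"
}
""")

# Settings that only matter when sampling is enabled.
SAMPLING_ONLY = ("top_k", "top_p", "temperature")


def effective_settings(defaults: dict, **overrides) -> dict:
    """Merge per-call overrides over file defaults (illustration only)."""
    merged = {**defaults, **overrides}
    if not merged.get("do_sample", False):
        # Greedy decoding does not use the sampling knobs.
        for key in SAMPLING_ONLY:
            merged.pop(key, None)
    return merged


if __name__ == "__main__":
    # Mirrors the README call: generate(..., max_new_tokens=160, do_sample=False)
    print(effective_settings(GENERATION_CONFIG, max_new_tokens=160, do_sample=False))
```

The practical point is that omitting `do_sample=False` would fall back to sampling with `top_k=64` and `top_p=0.95`, which differs from the greedy setup the README's example and schema-validity note describe.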
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b27a0075c15c8436db3e8c6b059247977bcdfe3b95aafa05ad3514a44805a75c
3
+ size 536223056
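The three lines above are a Git LFS pointer, not the weights themselves: they record the hash algorithm, digest, and byte size of the real file. A downloaded copy can be checked against them. The following is a small sketch using only the standard library; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the local path in the usage comment is a placeholder.

```python
import hashlib
from pathlib import Path

# Pointer text as shown in this commit for model.safetensors.
MODEL_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:b27a0075c15c8436db3e8c6b059247977bcdfe3b95aafa05ad3514a44805a75c
size 536223056
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split 'key value' lines and break the oid into algorithm and digest."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict) -> bool:
    """Check a local file's size and digest against a parsed pointer."""
    file_path = Path(path)
    if file_path.stat().st_size != pointer["size"]:
        return False  # cheap check first, avoids hashing a wrong-sized file
    hasher = hashlib.new(pointer["algo"])
    with file_path.open("rb") as handle:
        # Hash in 1 MiB chunks so large weight files are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Usage (placeholder path):
#   pointer = parse_lfs_pointer(MODEL_POINTER)
#   matches_pointer("model.safetensors", pointer)
```

The same check applies to `demo.mp4` and `tokenizer.json`, which this commit also stores as LFS pointers.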
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a74aefb1dc1340a25f29ab8370384b9ed24b2d921d7749ece7bbcfcfdf00d497
3
+ size 33384443
tokenizer_config.json ADDED
@@ -0,0 +1,24 @@
1
+ {
2
+ "backend": "tokenizers",
3
+ "boi_token": "<start_of_image>",
4
+ "bos_token": "<bos>",
5
+ "clean_up_tokenization_spaces": false,
6
+ "eoi_token": "<end_of_image>",
7
+ "eos_token": "<end_of_turn>",
8
+ "image_token": "<image_soft_token>",
9
+ "is_local": false,
10
+ "mask_token": "<mask>",
11
+ "model_max_length": 32768,
12
+ "model_specific_special_tokens": {
13
+ "boi_token": "<start_of_image>",
14
+ "eoi_token": "<end_of_image>",
15
+ "image_token": "<image_soft_token>"
16
+ },
17
+ "pad_token": "<pad>",
18
+ "padding_side": "left",
19
+ "sp_model_kwargs": null,
20
+ "spaces_between_special_tokens": false,
21
+ "tokenizer_class": "GemmaTokenizer",
22
+ "unk_token": "<unk>",
23
+ "use_default_system_prompt": false
24
+ }