| --- |
| license: apache-2.0 |
| language: |
| - en |
| tags: |
| - robotics |
| - instruction-following |
| - structured-generation |
| - text-to-json |
| - ros |
| - ros2 |
| - sparse-transformer |
| - embedded-ai |
| - on-device |
| - temporal-control |
| - control-loop |
| pipeline_tag: text-generation |
| inference: false |
| --- |
| |
| # Foros Robotics Action Engine |
|
|
| **Foros** is an ultra-compact **10M-parameter** instruction-to-JSON model designed |
| for low-latency, on-device robotics control. It translates plain-English robot |
| commands, including **temporal loops, timed sequences, and FSM transitions**, |
| directly into structured JSON arrays of operations compatible with **ROS / ROS2** |
| and major industrial robot controllers (URScript, KRL, RAPID, Fanuc, DRL). |
|
|
| Developed by **AMEFORGE** (https://huggingface.co/AMFORGE). |
| Built on the in-house **SparseMind** architecture (sparse token attention, |
| sparse channel FFN, dynamic neuron typing). |
|
|
| **Current version: v5.10**, production-ready and deployed on CPU / Jetson / Raspberry Pi 4. |
|
|
| --- |
|
|
| ## Benchmark Results (Held-Out, 142 Curated Examples) |
|
|
| Foros is evaluated on a **held-out** test suite of 142 hand-curated robotics commands |
| spanning 5 difficulty tiers. None of these prompts appear in the training corpus. |
| All measurements were taken on a Kaggle T4 GPU with greedy decoding. |
|
|
| ### Per-Tier Breakdown (v5.10) |
|
|
| | Tier | Description | N | Valid JSON | Op Correct | **Exact Match** | |
| |---|---|---:|---:|---:|---:| |
| | **Tier 1** | Paraphrase (novel templates) | 32 | 100.0% | 100.0% | **100.0%** | |
| | **Tier 2** | Informal (natural language) | 29 | 100.0% | 96.6% | **93.1%** | |
| | **Tier 3** | Typos & noise robustness | 30 | 100.0% | 80.0% | **43.3%** | |
| | **Tier 4** | Multi-step sequences | 22 | 100.0% | 100.0% | **72.7%** | |
| | **Tier 5** | Long chains & temporal loops | 29 | 96.6% | 96.6% | **69.0%** | |
| | **Global (weighted)** | | **142** | **99.3%** | **94.4%** | **76.1%** | |
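The Global row is an example-weighted average of the five tiers. A minimal sketch that re-derives it from the per-tier rows follows; the names `TIERS` and `global_rate` are illustrative, and the code assumes each reported percentage is a rounded `count / N`.

```python
# Per-tier results copied from the table above:
# (N, valid JSON %, op correct %, exact match %).
TIERS = {
    "Tier 1": (32, 100.0, 100.0, 100.0),
    "Tier 2": (29, 100.0, 96.6, 93.1),
    "Tier 3": (30, 100.0, 80.0, 43.3),
    "Tier 4": (22, 100.0, 100.0, 72.7),
    "Tier 5": (29, 96.6, 96.6, 69.0),
}

def global_rate(column: int) -> float:
    """Example-weighted average of one metric column (1=valid JSON, 2=op, 3=exact)."""
    total = sum(row[0] for row in TIERS.values())
    # Assumption: each percentage is a rounded count/N, so round() recovers the count.
    hits = sum(round(row[0] * row[column] / 100.0) for row in TIERS.values())
    return round(100.0 * hits / total, 1)

print(global_rate(1), global_rate(2), global_rate(3))
# 99.3 94.4 76.1
```

Under that assumption the recovered counts are 141, 134, and 108 out of 142, which match the Global row.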
|
|
| ### Version Trajectory |
|
|
| We report Exact Match on the held-out benchmark across successive iterations to |
| document architectural and data improvements transparently: |
|
|
| | Version | Held-Out Exact Match | Held-Out JSON Valid | Notable Change | |
| |---|---:|---:|---| |
| | v5.4 (baseline) | 62.7% | ~99% | Initial production deployment | |
| | v5.9 | 63.4% | 100.0% | Numerical precision standardization, deterministic pick-and-place targets | |
| | **v5.10** | **76.1%** | **99.3%** | Refactored conditional templates, expanded informal/imperative vocabulary, integer-form numerical prompts | |
|
|
| ### Head-to-Head Comparison (All Measured) |
|
|
| All baselines evaluated on the same 142-example held-out benchmark, same hardware |
| (Kaggle T4 GPU), same scoring rubric, greedy decoding. |
|
|
| | Model | Exact Match | Valid JSON | Op Correct | Latency (avg) | Size | |
| |---|---:|---:|---:|---:|---:| |
| | **Foros v5.10 (AMEFORGE)** | **76.1%** | **99.3%** | **94.4%** | **508 ms** | **39.6 MB** | |
| | Qwen2.5-1.5B-Instruct | 28.0% | 60.0% | 44.0% | 1,998 ms | 2,944 MB | |
| | Qwen2.5-0.5B-Instruct | 18.0% | 46.0% | 24.0% | 3,766 ms | 942 MB | |
| | TinyLlama-1.1B-Chat | 6.0% | 22.0% | 10.0% | 7,315 ms | 2,098 MB | |
| | SmolLM2-360M-Instruct | 0.0% | 6.0% | 2.0% | 5,884 ms | 690 MB | |
|
|
| **Key takeaways:** |
|
|
| - Foros reaches **76.1% exact match** on held-out robotics commands. The best |
| general-purpose small LM evaluated (Qwen2.5-1.5B, ~150× more parameters) reaches only |
| **28.0%**, so Foros outperforms it by **48 percentage points**. |
| - The smallest general-purpose LM evaluated (SmolLM2-360M, ~36× more parameters) |
| reaches **0.0% exact match** and only 6.0% valid JSON, indicating that |
| general-purpose small models struggle even to produce syntactically valid |
| output on this task. |
| - **4× lower latency** than Qwen2.5-1.5B, **14× lower** than TinyLlama-1.1B. |
| - **17× smaller** on disk than the smallest baseline (SmolLM2-360M), |
| **74× smaller** than Qwen2.5-1.5B. |
| - Runs on Raspberry Pi 4, Jetson Nano/Orin, or any embedded CPU. No GPU required |
| for inference, no cloud dependency, no telemetry. |
|
|
| **Latency profile:** atomic commands (Tiers 1-3) run at **~305 ms**, compound |
| sequences (Tiers 4-5) at **~860 ms**. The 508 ms average reflects the full |
| benchmark distribution including long temporal loops. |
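The latency, size, and accuracy ratios quoted in the takeaways, and the blended latency figure, can be re-derived from the tables. The sketch below only restates that arithmetic; the dictionary layout and variable names are illustrative, and the 91 / 51 split is the sum of the per-tier example counts.

```python
# Figures copied from the head-to-head table (latency in ms, size in MB).
FOROS = {"latency_ms": 508, "size_mb": 39.6, "exact": 76.1}
BASELINES = {
    "Qwen2.5-1.5B-Instruct": {"latency_ms": 1998, "size_mb": 2944, "exact": 28.0},
    "TinyLlama-1.1B-Chat": {"latency_ms": 7315, "size_mb": 2098, "exact": 6.0},
    "SmolLM2-360M-Instruct": {"latency_ms": 5884, "size_mb": 690, "exact": 0.0},
}

for name, b in BASELINES.items():
    speedup = b["latency_ms"] / FOROS["latency_ms"]   # how many times slower
    shrink = b["size_mb"] / FOROS["size_mb"]          # how many times larger on disk
    gap = FOROS["exact"] - b["exact"]                 # exact-match gap in points
    print(f"{name}: {speedup:.1f}x latency, {shrink:.1f}x size, +{gap:.1f} points")

# Latency mix: Tiers 1-3 hold 91 examples, Tiers 4-5 hold 51 (per-tier table).
blended = (91 * 305 + 51 * 860) / 142
print(f"blended latency ~ {blended:.0f} ms")
```

The blended value comes out near 504 ms, which is consistent with the reported 508 ms average given that the two per-group latencies are themselves rounded.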
|
|
| --- |
|
|
| ## What it does |
|
|
| ### Atomic Commands |
|
|
| | Natural Language Input | Structured Output (ROS JSON) | |
| |---|---| |
| | `move to x=0.5 y=-1.2 z=0.8` | `[{"op":"move","x":0.5,"y":-1.2,"z":0.8}]` | |
| | `rotate joints to [0.0, 45.0, 90.0, 0.0, 0.0, 0.0]` | `[{"op":"joint_move","joints":[0.0,45.0,90.0,0.0,0.0,0.0]}]` | |
| | `close gripper with force 0.75` | `[{"op":"gripper","action":"close","force":0.75}]` | |
| | `wait for 3.5 seconds` | `[{"op":"wait","seconds":3.5}]` | |
| | `set velocity to 0.75 m/s` | `[{"op":"speed","velocity":0.75}]` | |
| | `halt all motion` | `[{"op":"stop"}]` | |
| | `upon sensor_trip return to home position` | `[{"op":"safety","cond":"sensor_trip","then":[{"op":"home"}]}]` | |
|
|
| ### Temporal / Loop Commands |
|
|
| | Natural Language Input | Structured Output | |
| |---|---| |
| | `repeat 5 times: move arm` | `[{"op":"repeat","times":5,"body":[...]}]` | |
| | `keep doing move arm until obstacle` | `[{"op":"repeat_until","cond":"obstacle","body":[...]}]` | |
| | `run control loop at 100Hz for 2.5 seconds` | `[{"op":"control_loop","frequency_hz":100,"duration_s":2.5,"body":[...]}]` | |
| | `every 0.5s do rotate joints for 4 steps` | `[{"op":"timed_seq","interval_s":0.5,"count":4,"body":[...]}]` | |
| | `simultaneously move arm and set speed` | `[{"op":"parallel","branches":[[...],[...]]}]` | |
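Loop operations carry their steps in a nested `body` array. As an illustration of how a downstream consumer might handle that nesting, here is a minimal sketch that unrolls fixed-count `repeat` operations into a flat list. The `unroll` helper is hypothetical and not part of Foros; it assumes `times` is a non-negative integer and passes every other operation through unchanged.

```python
from typing import Any

Op = dict[str, Any]

def unroll(program: list[Op]) -> list[Op]:
    """Flatten fixed-count `repeat` ops; leave every other op untouched."""
    flat: list[Op] = []
    for op in program:
        if op.get("op") == "repeat":
            body = unroll(op.get("body", []))  # bodies may nest further repeats
            flat.extend(body * int(op.get("times", 0)))
        else:
            flat.append(op)
    return flat

program = [
    {"op": "repeat", "times": 2, "body": [
        {"op": "move", "x": 0.5, "y": 0.0, "z": 0.2},
        {"op": "wait", "seconds": 1.0},
    ]},
    {"op": "stop"},
]
print([op["op"] for op in unroll(program)])
# ['move', 'wait', 'move', 'wait', 'stop']
```

Condition-driven operations such as `repeat_until` cannot be unrolled ahead of time, so a real executor would need to evaluate them at run time.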
|
|
| ### Complex Sequences (Multi-step planning) |
|
|
| ``` |
| Input: pick up the red_box at 0.5 0.5 0.0 and place it at -0.5 1.0 0.0 |
| |
| Output: [ |
| {"op":"move","x":0.5,"y":0.5,"z":0.0}, |
| {"op":"gripper","action":"close"}, |
| {"op":"move","x":0.5,"y":0.5,"z":0.2}, |
| {"op":"move","x":-0.5,"y":1.0,"z":0.2}, |
| {"op":"move","x":-0.5,"y":1.0,"z":0.0}, |
| {"op":"gripper","action":"open"}, |
| {"op":"move","x":-0.5,"y":1.0,"z":0.2} |
| ] |
| ``` |
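The seven-step output above follows a regular approach, grasp, lift, travel, descend, release, retreat pattern. The sketch below reproduces that pattern as plain Python to make the structure explicit. It is not how the model generates its output; the `pick_and_place` helper and the 0.2 clearance are inferred from this single example.

```python
import json

def move(x: float, y: float, z: float) -> dict:
    """Build one absolute `move` op."""
    return {"op": "move", "x": x, "y": y, "z": z}

def pick_and_place(pick: tuple[float, float, float],
                   place: tuple[float, float, float],
                   clearance: float = 0.2) -> list[dict]:
    """Build the 7-step sequence shown above.

    `clearance` (0.2) is inferred from the example, not a documented parameter.
    """
    px, py, pz = pick
    qx, qy, qz = place
    return [
        move(px, py, pz),                      # approach the object
        {"op": "gripper", "action": "close"},  # grasp
        move(px, py, pz + clearance),          # lift
        move(qx, qy, qz + clearance),          # travel above the drop point
        move(qx, qy, qz),                      # descend
        {"op": "gripper", "action": "open"},   # release
        move(qx, qy, qz + clearance),          # retreat
    ]

ops = pick_and_place((0.5, 0.5, 0.0), (-0.5, 1.0, 0.0))
print(json.dumps(ops, separators=(",", ":")))
```

A template like this can serve as a reference when checking model output for pick-and-place prompts.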
|
|
| --- |
|
|
| ## Supported Operations |
|
|
| | Category | Operations | |
| |---|---| |
| | **Motion** | `move`, `joint_move`, `move_tcp`, `move_joint`, `home`, `trajectory` | |
| | **End Effector** | `gripper`, `tool`, `get_joint_values` | |
| | **Control Flow** | `wait`, `safety`, `stop`, `repeat`, `repeat_until` | |
| | **Temporal** | `timed_seq`, `control_loop`, `parallel`, `state_transition` | |
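Because the operation vocabulary is closed, a consumer can reject any output that names an operation outside this table before it reaches a controller. The following is a minimal sketch of such a check. The `unknown_ops` helper is hypothetical, and the nested keys it walks (`body`, `then`, `branches`) are the ones that appear in the examples in this card; other nesting keys may exist.

```python
SUPPORTED_OPS = {
    "move", "joint_move", "move_tcp", "move_joint", "home", "trajectory",
    "gripper", "tool", "get_joint_values",
    "wait", "safety", "stop", "repeat", "repeat_until",
    "timed_seq", "control_loop", "parallel", "state_transition",
}

def unknown_ops(program: list) -> list[str]:
    """Return op names that are not in the table, searching nested bodies too."""
    bad: list[str] = []
    for op in program:
        if not isinstance(op, dict):
            bad.append(str(op))          # not an op object at all
            continue
        if op.get("op") not in SUPPORTED_OPS:
            bad.append(str(op.get("op")))
        # Nested keys seen in the examples: `body`, `then`, `branches`.
        for key in ("body", "then"):
            bad.extend(unknown_ops(op.get(key, [])))
        for branch in op.get("branches", []):
            bad.extend(unknown_ops(branch))
    return bad

safe = [{"op": "safety", "cond": "sensor_trip", "then": [{"op": "home"}]}]
print(unknown_ops(safe))   # []
```

An empty result means every operation, at every nesting level, is one of the supported names. It does not validate argument types or ranges.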
|
|
| --- |
|
|
| ## Model Details |
|
|
| | Property | Value | |
| |---|---| |
| | Architecture | SparseMind (decoder-only, sparse attention + sparse FFN + dynamic neuron typing) | |
| | Parameters | 10,347,395 (~10.3 M) | |
| | Hidden size / Layers / Heads | 256 / 6 / 8 | |
| | Context length | 384 tokens | |
| | Tokenizer | In-house domain-specific BPE, vocab 3,000, atomic numerical tokens | |
| | Precision | FP32 | |
| | Model size | 39.6 MB | |
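The model size follows from the parameter count and the FP32 precision. A short check of that arithmetic, assuming 4 bytes per parameter and no compression:

```python
PARAMS = 10_347_395      # from the table above
BYTES_PER_PARAM = 4      # FP32

raw_bytes = PARAMS * BYTES_PER_PARAM
print(f"{raw_bytes / 1e6:.1f} MB (decimal)")    # 41.4
print(f"{raw_bytes / 2**20:.1f} MiB (binary)")  # 39.5
```

The weights alone come to about 39.5 MiB, so the listed 39.6 MB appears to use binary megabytes; the small remainder is plausibly checkpoint metadata, though that is an assumption.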
|
|
| --- |
|
|
| ## Training Methodology |
|
|
| Foros is trained on a **hybrid corpus** combining: |
|
|
| - **Programmatic synthetic data** covering all supported operations, with |
| paraphrastic variations (formal, informal, imperative tones), numerical |
| precision variants, and compositional sequences of varying depth. |
| - **Curated production logs:** anonymized real-world prompts collected from |
| deployed instances, with manually verified ground-truth JSON targets. |
| - **Iterative refinement:** successive versions (v5.4 → v5.9 → v5.10) integrate |
| fixes derived from systematic failure analysis on the held-out benchmark. |
|
|
| Training is conducted from scratch (no pre-trained checkpoint) on a single T4 |
| GPU in approximately 4 hours. |
|
|
| Detailed corpus composition, generator weights, and hyperparameter schedules |
| are proprietary to AMEFORGE. |
|
|
| --- |
|
|
| ## Known Limitations |
|
|
| - **Typo robustness:** Tier 3 sits at 43.3% exact match. Severely mangled |
| tokens (e.g., `mvoe` instead of `move`) can degrade numerical extraction. |
| A typo-aware fine-tune is planned for v5.11. |
| - **Relative motion:** Foros operates on **absolute coordinates**. Prompts like |
| `move left by 20 cm` are out of domain and should be resolved by an upstream |
| natural-language pre-processor that converts them to absolute positions. |
| - **Open-ended planning:** Foros is a structured *translator*, not a planner. |
| For multi-step reasoning beyond literal sequencing, pair it with an upstream |
| planner. |
| - **Numerical fidelity in low-confidence contexts:** when the prompt vocabulary |
| is unfamiliar, the model may default to in-distribution coordinate values. |
| For coordinate-critical operations in production, we recommend a lightweight |
| regex post-processor that re-injects explicit numerical values from the prompt |
| as a safety net. |
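The regex post-processor recommended in the last item could look roughly like the sketch below. It is one possible shape, not a shipped component: the `reinject_coordinates` helper is hypothetical, it assumes the `axis=value` prompt style used in the examples, and it deliberately acts only when the prompt states each of x, y, z exactly once and the output holds a single `move` op.

```python
import re

# Matches `x=0.5`, `y = -1.2`, `z=.8`; assumes the `axis=value` prompt style.
_AXIS = re.compile(r"\b([xyz])\s*=\s*(-?\d*\.?\d+)", re.IGNORECASE)

def reinject_coordinates(prompt: str, program: list[dict]) -> list[dict]:
    """Overwrite x/y/z of a lone `move` op with the values stated in the prompt."""
    matches = _AXIS.findall(prompt)
    stated = {axis.lower(): float(value) for axis, value in matches}
    moves = [op for op in program if op.get("op") == "move"]
    # Only act in the unambiguous case: one move op, each axis stated exactly once.
    if len(moves) != 1 or len(matches) != 3 or set(stated) != {"x", "y", "z"}:
        return program
    return [{**op, **stated} if op.get("op") == "move" else op for op in program]

prompt = "move to x=0.5 y=-1.2 z=0.8"
drifted = [{"op": "move", "x": 0.5, "y": 0.0, "z": 0.8}]  # y defaulted by the model
print(reinject_coordinates(prompt, drifted))
# [{'op': 'move', 'x': 0.5, 'y': -1.2, 'z': 0.8}]
```

Multi-step prompts need a mapping from each stated coordinate to the op it belongs to, which this sketch does not attempt; in those cases it returns the program unchanged.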
|
|
| --- |
|
|
| ## Local Inference |
|
|
| ```python |
| import os |
| import torch |
| import sentencepiece as spm |
| from huggingface_hub import hf_hub_download |
| |
| # Download model weights (public) |
| model_file = hf_hub_download(repo_id="AMFORGE/foros", filename="foros.pt") |
| |
| # Download tokenizer (gated: set the HF_TOKEN environment variable) |
| tok_file = hf_hub_download( |
|     repo_id="AMFORGE/foros_tok", |
|     filename="sparsforos_tokenizer.model", |
|     token=os.environ.get("HF_TOKEN"), |
| ) |
| |
| # Tokenizer |
| sp = spm.SentencePieceProcessor() |
| sp.Load(tok_file) |
| |
| # Model: requires the SparseMind reference implementation |
| # (available with the tokenizer via AMEFORGE on request) |
| from sparsemind_robotics_train import SparseMind, Config |
| |
| # weights_only=False unpickles arbitrary objects; only load checkpoints you trust |
| ckpt = torch.load(model_file, map_location="cpu", weights_only=False) |
| cfg = Config(**{k: v for k, v in ckpt["config"].items() |
|                 if k in Config.__dataclass_fields__}) |
| model = SparseMind(cfg) |
| model.load_state_dict(ckpt["model"]) |
| model.eval() |
| |
| # Inference: greedy decoding (top_k=1) is recommended for production |
| prompt = "move to x=0.5 y=-1.2 z=0.8 =>" |
| input_ids = torch.tensor([sp.EncodeAsIds(prompt)]) |
| out_ids = model.generate(input_ids, max_new=128, temp=1.0, top_k=1) |
| result = sp.DecodeIds(out_ids[0, input_ids.shape[1]:].tolist()) |
| print(result) |
| # [{"op":"move","x":0.5,"y":-1.2,"z":0.8}] |
| ``` |
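Since valid-JSON rates are high but not 100%, it is worth parsing `result` defensively before acting on it. A minimal sketch, using only the standard library, is shown below. The `parse_ops` helper is hypothetical, and trimming to the outermost brackets is an assumption about how malformed outputs might look.

```python
import json

def parse_ops(text: str):
    """Return the decoded op list, or None if the text is not a JSON array of op objects."""
    # Keep only the span from the first '[' to the last ']' in case the
    # decoder emits trailing tokens (an assumption about failure modes).
    start, end = text.find("["), text.rfind("]")
    if start == -1 or end < start:
        return None
    try:
        ops = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    if isinstance(ops, list) and all(isinstance(op, dict) and "op" in op for op in ops):
        return ops
    return None

print(parse_ops('[{"op":"move","x":0.5,"y":-1.2,"z":0.8}]'))
# [{'op': 'move', 'x': 0.5, 'y': -1.2, 'z': 0.8}]
print(parse_ops('[{"op":"move"'))
# None
```

A `None` result should be treated as a refusal to act rather than retried blindly, since the command never reached a well-formed state.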
|
|
| --- |
|
|
| ## Citation |
|
|
| ```bibtex |
| @misc{foros_robotics_v5_10, |
| title = {Foros v5.10: An On-Device Instruction-to-JSON Engine for Robotics}, |
| author = {AMEFORGE}, |
| year = {2026}, |
| note = {Built on the SparseMind architecture. |
| https://huggingface.co/AMFORGE/foros} |
| } |
| ``` |
|
|
| --- |
|
|
| ## License & Contact |
|
|
| - **Model weights**: Apache 2.0 |
| - **Tokenizer**: gated access; contact AMEFORGE |
| - **Inquiries**: https://huggingface.co/AMFORGE |