foros-v5.3 / README.md
ameforge's picture
Update README.md
20ee058 verified
---
license: apache-2.0
language:
- en
tags:
- robotics
- instruction-following
- structured-generation
- text-to-json
- ros
- ros2
- sparse-transformer
- embedded-ai
- on-device
- temporal-control
- control-loop
pipeline_tag: text-generation
inference: false
---
# Foros Robotics Action Engine
**Foros** is an ultra-compact **10M parameter** instruction-to-JSON model designed
for low-latency, on-device robotics control. It translates plain-English robot
commands β€” including **temporal loops, timed sequences, and FSM transitions** β€”
directly into structured JSON arrays of operations compatible with **ROS / ROS2**
and major industrial robot controllers (URScript, KRL, RAPID, Fanuc, DRL).
Developed by **AMEFORGE** β€” https://huggingface.co/AMFORGE.
Built on the in-house **SparseMind** architecture (sparse token attention,
sparse channel FFN, dynamic neuron typing).
**Current version: v5.10** β€” production-ready, deployed on CPU / Jetson / Raspberry Pi 4.
---
## Benchmark Results (Held-Out, 142 Curated Examples)
Foros is evaluated on a **held-out** test suite of 142 hand-curated robotics commands
spanning 5 difficulty tiers. None of these prompts appear in the training corpus.
All measurements taken on Kaggle T4 GPU, greedy decoding.
### Per-Tier Breakdown β€” v5.10
| Tier | Description | N | Valid JSON | Op Correct | **Exact Match** |
|---|---|---:|---:|---:|---:|
| **Tier 1** | Paraphrase (novel templates) | 32 | 100.0% | 100.0% | **100.0%** |
| **Tier 2** | Informal (natural language) | 29 | 100.0% | 96.6% | **93.1%** |
| **Tier 3** | Typos & noise robustness | 30 | 100.0% | 80.0% | **43.3%** |
| **Tier 4** | Multi-step sequences | 22 | 100.0% | 100.0% | **72.7%** |
| **Tier 5** | Long chains & temporal loops | 29 | 96.6% | 96.6% | **69.0%** |
| **Global (weighted)** | | **142** | **99.3%** | **94.4%** | **76.1%** |
### Version Trajectory
We report Exact Match on the held-out benchmark across successive iterations to
document architectural and data improvements transparently:
| Version | Held-Out Exact Match | Held-Out JSON Valid | Notable Change |
|---|---:|---:|---|
| v5.4 (baseline) | 62.7% | ~99% | Initial production deployment |
| v5.9 | 63.4% | 100.0% | Numerical precision standardization, deterministic pick-and-place targets |
| **v5.10** | **76.1%** | **99.3%** | Refactored conditional templates, expanded informal/imperative vocabulary, integer-form numerical prompts |
### Head-to-Head Comparison (All Measured)
All baselines evaluated on the same 142-example held-out benchmark, same hardware
(Kaggle T4 GPU), same scoring rubric, greedy decoding.
| Model | Exact Match | Valid JSON | Op Correct | Latency (avg) | Size |
|---|---:|---:|---:|---:|---:|
| πŸš€ **Foros v5.10 β€” AMEFORGE** | **76.1%** | **99.3%** | **94.4%** | **508 ms** | **39.6 MB** |
| Qwen2.5-1.5B-Instruct | 28.0% | 60.0% | 44.0% | 1{,}998 ms | 2{,}944 MB |
| Qwen2.5-0.5B-Instruct | 18.0% | 46.0% | 24.0% | 3{,}766 ms | 942 MB |
| TinyLlama-1.1B-Chat | 6.0% | 22.0% | 10.0% | 7{,}315 ms | 2{,}098 MB |
| SmolLM2-360M-Instruct | 0.0% | 6.0% | 2.0% | 5{,}884 ms | 690 MB |
**Key takeaways:**
- Foros reaches **76.1% exact match** on held-out robotics commands. The best
general-purpose small LM evaluated (Qwen2.5-1.5B, ~150Γ— larger) reaches only
**28.0%** β€” Foros outperforms it by **+48 percentage points**.
- The smallest comparable general-purpose LM (SmolLM2-360M, ~36Γ— larger)
reaches **0.0% exact match** and only 6.0% valid JSON, indicating that
general-purpose small models struggle even to produce syntactically valid
output on this task.
- **4Γ— lower latency** than Qwen2.5-1.5B, **14Γ— lower** than TinyLlama-1.1B.
- **17Γ— smaller** than the smallest competitive baseline (SmolLM2-360M),
**74Γ— smaller** than Qwen2.5-1.5B.
- Runs on Raspberry Pi 4, Jetson Nano/Orin, or any embedded CPU. No GPU required
for inference, no cloud dependency, no telemetry.
**Latency profile** β€” atomic commands (Tier 1–3) run at **~305 ms**, compound
sequences (Tier 4–5) at **~860 ms**. The 508 ms average reflects the full
benchmark distribution including long temporal loops.
---
## What it does
### Atomic Commands
| Natural Language Input | Structured Output (ROS JSON) |
|---|---|
| `move to x=0.5 y=-1.2 z=0.8` | `[{"op":"move","x":0.5,"y":-1.2,"z":0.8}]` |
| `rotate joints to [0.0, 45.0, 90.0, 0.0, 0.0, 0.0]` | `[{"op":"joint_move","joints":[0.0,45.0,90.0,0.0,0.0,0.0]}]` |
| `close gripper with force 0.75` | `[{"op":"gripper","action":"close","force":0.75}]` |
| `wait for 3.5 seconds` | `[{"op":"wait","seconds":3.5}]` |
| `set velocity to 0.75 m/s` | `[{"op":"speed","velocity":0.75}]` |
| `halt all motion` | `[{"op":"stop"}]` |
| `upon sensor_trip return to home position` | `[{"op":"safety","cond":"sensor_trip","then":[{"op":"home"}]}]` |
### Temporal / Loop Commands
| Natural Language Input | Structured Output |
|---|---|
| `repeat 5 times: move arm` | `[{"op":"repeat","times":5,"body":[...]}]` |
| `keep doing move arm until obstacle` | `[{"op":"repeat_until","cond":"obstacle","body":[...]}]` |
| `run control loop at 100Hz for 2.5 seconds` | `[{"op":"control_loop","frequency_hz":100,"duration_s":2.5,"body":[...]}]` |
| `every 0.5s do rotate joints for 4 steps` | `[{"op":"timed_seq","interval_s":0.5,"count":4,"body":[...]}]` |
| `simultaneously move arm and set speed` | `[{"op":"parallel","branches":[[...],[...]]}]` |
### Complex Sequences (Multi-step planning)
```
Input: pick up the red_box at 0.5 0.5 0.0 and place it at -0.5 1.0 0.0
Output: [
{"op":"move","x":0.5,"y":0.5,"z":0.0},
{"op":"gripper","action":"close"},
{"op":"move","x":0.5,"y":0.5,"z":0.2},
{"op":"move","x":-0.5,"y":1.0,"z":0.2},
{"op":"move","x":-0.5,"y":1.0,"z":0.0},
{"op":"gripper","action":"open"},
{"op":"move","x":-0.5,"y":1.0,"z":0.2}
]
```
---
## Supported Operations
| Category | Operations |
|---|---|
| **Motion** | `move`, `joint_move`, `move_tcp`, `move_joint`, `home`, `trajectory` |
| **End Effector** | `gripper`, `tool`, `get_joint_values` |
| **Control Flow** | `wait`, `safety`, `stop`, `repeat`, `repeat_until` |
| **Temporal** | `timed_seq`, `control_loop`, `parallel`, `state_transition` |
---
## Model Details
| Property | Value |
|---|---|
| Architecture | SparseMind (decoder-only, sparse attention + sparse FFN + dynamic neuron typing) |
| Parameters | 10,347,395 (~10.3 M) |
| Hidden size / Layers / Heads | 256 / 6 / 8 |
| Context length | 384 tokens |
| Tokenizer | In-house domain-specific BPE, vocab 3,000, atomic numerical tokens |
| Precision | FP32 |
| Model size | 39.6 MB |
---
## Training Methodology
Foros is trained on a **hybrid corpus** combining:
- **Programmatic synthetic data** covering all supported operations, with
paraphrastic variations (formal, informal, imperative tones), numerical
precision variants, and compositional sequences of varying depth.
- **Curated production logs** β€” anonymized real-world prompts collected from
deployed instances, with manually verified ground-truth JSON targets.
- **Iterative refinement** β€” successive versions (v5.4 β†’ v5.9 β†’ v5.10) integrate
fixes derived from systematic failure analysis on the held-out benchmark.
Training is conducted from scratch (no pre-trained checkpoint) on a single T4
GPU in approximately 4 hours.
Detailed corpus composition, generator weights, and hyperparameter schedules
are proprietary to AMEFORGE.
---
## Known Limitations
- **Typo robustness** β€” Tier 3 sits at 43.3% exact match. Severely mangled
tokens (e.g., `mvoe` instead of `move`) can degrade numerical extraction.
A typo-aware fine-tune is planned for v5.11.
- **Relative motion** β€” Foros operates on **absolute coordinates**. Prompts like
`move left by 20 cm` are out of domain and should be resolved by an upstream
natural-language pre-processor that converts them to absolute positions.
- **Open-ended planning** β€” Foros is a structured *translator*, not a planner.
For multi-step reasoning beyond literal sequencing, pair it with an upstream
planner.
- **Numerical fidelity in low-confidence contexts** β€” when the prompt vocabulary
is unfamiliar, the model may default to in-distribution coordinate values.
For coordinate-critical operations in production, we recommend a lightweight
regex post-processor that re-injects explicit numerical values from the prompt
as a safety net.
---
## Local Inference
```python
import os
import torch
import sentencepiece as spm
from huggingface_hub import hf_hub_download
# Download model weights (public)
model_file = hf_hub_download(repo_id="AMFORGE/foros", filename="foros.pt")
# Download tokenizer (gated β€” set HF_TOKEN environment variable)
tok_file = hf_hub_download(
repo_id="AMFORGE/foros_tok",
filename="sparsforos_tokenizer.model",
token=os.environ.get("HF_TOKEN"),
)
# Tokenizer
sp = spm.SentencePieceProcessor()
sp.Load(tok_file)
# Model β€” requires the SparseMind reference implementation
# (available with the tokenizer via AMEFORGE on request)
from sparsemind_robotics_train import SparseMind, Config
ckpt = torch.load(model_file, map_location="cpu", weights_only=False)
cfg = Config(**{k: v for k, v in ckpt["config"].items()
if k in Config.__dataclass_fields__})
model = SparseMind(cfg)
model.load_state_dict(ckpt["model"])
model.eval()
# Inference β€” greedy decoding recommended for production
prompt = "move to x=0.5 y=-1.2 z=0.8 =>"
input_ids = torch.tensor([sp.EncodeAsIds(prompt)])
out_ids = model.generate(input_ids, max_new=128, temp=1.0, top_k=1)
result = sp.DecodeIds(out_ids[0, input_ids.shape[1]:].tolist())
print(result)
# [{"op":"move","x":0.5,"y":-1.2,"z":0.8}]
```
---
## Citation
```bibtex
@misc{foros_robotics_v5_10,
title = {Foros v5.10: An On-Device Instruction-to-JSON Engine for Robotics},
author = {AMEFORGE},
year = {2026},
note = {Built on the SparseMind architecture.
https://huggingface.co/AMFORGE/foros}
}
```
---
## License & Contact
- **Model weights**: Apache 2.0
- **Tokenizer**: gated access β€” contact AMEFORGE
- **Inquiries**: https://huggingface.co/AMFORGE