	---
	license: apache-2.0
	language:
	- en
	tags:
	- robotics
	- instruction-following
	- structured-generation
	- text-to-json
	- ros
	- ros2
	- sparse-transformer
	- embedded-ai
	- on-device
	- temporal-control
	- control-loop
	pipeline_tag: text-generation
	inference: false
	---

	# Foros Robotics Action Engine

	Foros is an ultra-compact, 10M-parameter instruction-to-JSON model designed
	for low-latency, on-device robotics control. It translates plain-English robot
	commands, including temporal loops, timed sequences, and FSM transitions,
	directly into structured JSON arrays of operations compatible with ROS / ROS2
	and major industrial robot controllers (URScript, KRL, RAPID, Fanuc, DRL).

	Developed by AMEFORGE — https://huggingface.co/AMFORGE.
	Built on the in-house SparseMind architecture (sparse token attention,
	sparse channel FFN, dynamic neuron typing).

	Current version: v5.10 — production-ready, deployed on CPU / Jetson / Raspberry Pi 4.

	---

	## Benchmark Results (Held-Out, 142 Curated Examples)

	Foros is evaluated on a held-out test suite of 142 hand-curated robotics commands
	spanning 5 difficulty tiers. None of these prompts appear in the training corpus.
	All measurements were taken on a Kaggle T4 GPU with greedy decoding.
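This card does not spell out how Valid JSON, Op Correct, and Exact Match are scored. The sketch below shows one plausible reading, and every definition in it is an assumption: an output is valid if it parses to a list of objects, op-correct if the sequence of `op` names matches the target, and an exact match if the parsed structure equals the target. The function name `score_example` is hypothetical.

```python
import json

def score_example(pred_text: str, target: list) -> dict:
    """Score one decoded model output against its ground-truth operation list.

    Assumed metric definitions (not confirmed by the model card):
      valid_json  - output parses to a JSON list of objects
      op_correct  - the ordered list of "op" names equals the target's
      exact_match - the parsed output equals the target structure
    """
    failed = {"valid_json": False, "op_correct": False, "exact_match": False}
    try:
        pred = json.loads(pred_text)
    except json.JSONDecodeError:
        return failed
    if not isinstance(pred, list) or not all(isinstance(s, dict) for s in pred):
        return failed
    pred_ops = [s.get("op") for s in pred]
    target_ops = [s.get("op") for s in target]
    return {
        "valid_json": True,
        "op_correct": pred_ops == target_ops,
        "exact_match": pred == target,
    }

print(score_example('[{"op":"wait","seconds":3.5}]', [{"op": "wait", "seconds": 3.5}]))
```

Comparing parsed structures rather than raw strings makes the score insensitive to key order and whitespace. It also treats `1` and `1.0` as equal, which may or may not match the actual rubric.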

	### Per-Tier Breakdown — v5.10

	| Tier | Description | N | Valid JSON | Op Correct | Exact Match |
	|---|---|---:|---:|---:|---:|
	| Tier 1 | Paraphrase (novel templates) | 32 | 100.0% | 100.0% | 100.0% |
	| Tier 2 | Informal (natural language) | 29 | 100.0% | 96.6% | 93.1% |
	| Tier 3 | Typos & noise robustness | 30 | 100.0% | 80.0% | 43.3% |
	| Tier 4 | Multi-step sequences | 22 | 100.0% | 100.0% | 72.7% |
	| Tier 5 | Long chains & temporal loops | 29 | 96.6% | 96.6% | 69.0% |
	| Global (weighted) | | 142 | 99.3% | 94.4% | 76.1% |
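The global row can be reproduced from the per-tier rows by pooling examples rather than averaging percentages. The sketch below recovers integer hit counts from the rounded per-tier percentages (an assumption, since the raw counts are not published here) and recomputes the three global figures.

```python
# (tier, N, valid JSON %, op correct %, exact match %) copied from the per-tier table
TIERS = [
    ("Tier 1", 32, 100.0, 100.0, 100.0),
    ("Tier 2", 29, 100.0, 96.6, 93.1),
    ("Tier 3", 30, 100.0, 80.0, 43.3),
    ("Tier 4", 22, 100.0, 100.0, 72.7),
    ("Tier 5", 29, 96.6, 96.6, 69.0),
]

def global_rate(column: int) -> float:
    """Pool per-tier hit counts (recovered by rounding) into one global percentage."""
    total = sum(tier[1] for tier in TIERS)
    hits = sum(round(tier[column] * tier[1] / 100) for tier in TIERS)
    return round(100 * hits / total, 1)

valid_json, op_correct, exact_match = (global_rate(c) for c in (2, 3, 4))
print(valid_json, op_correct, exact_match)
```

Under this reading the global exact-match figure corresponds to 108 of 142 examples.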

	### Version Trajectory

	We report Exact Match on the held-out benchmark across successive iterations to
	document architectural and data improvements transparently:

	| Version | Held-Out Exact Match | Held-Out JSON Valid | Notable Change |
	|---|---:|---:|---|
	| v5.4 (baseline) | 62.7% | ~99% | Initial production deployment |
	| v5.9 | 63.4% | 100.0% | Numerical precision standardization, deterministic pick-and-place targets |
	| v5.10 | 76.1% | 99.3% | Refactored conditional templates, expanded informal/imperative vocabulary, integer-form numerical prompts |

	### Head-to-Head Comparison (All Measured)

	All baselines evaluated on the same 142-example held-out benchmark, same hardware
	(Kaggle T4 GPU), same scoring rubric, greedy decoding.

	| Model | Exact Match | Valid JSON | Op Correct | Latency (avg) | Size |
	|---|---:|---:|---:|---:|---:|
	| 🚀 Foros v5.10 (AMEFORGE) | 76.1% | 99.3% | 94.4% | 508 ms | 39.6 MB |
	| Qwen2.5-1.5B-Instruct | 28.0% | 60.0% | 44.0% | 1,998 ms | 2,944 MB |
	| Qwen2.5-0.5B-Instruct | 18.0% | 46.0% | 24.0% | 3,766 ms | 942 MB |
	| TinyLlama-1.1B-Chat | 6.0% | 22.0% | 10.0% | 7,315 ms | 2,098 MB |
	| SmolLM2-360M-Instruct | 0.0% | 6.0% | 2.0% | 5,884 ms | 690 MB |

	Key takeaways:

	- Foros reaches 76.1% exact match on held-out robotics commands. The best
	general-purpose small LM evaluated (Qwen2.5-1.5B, ~150× larger by parameter
	count) reaches only 28.0%, a gap of about 48 percentage points.
	- The smallest general-purpose LM evaluated (SmolLM2-360M, ~36× larger)
	reaches 0.0% exact match and only 6.0% valid JSON, indicating that
	general-purpose small models struggle even to produce syntactically valid
	output on this task.
	- About 4× lower average latency than Qwen2.5-1.5B (508 ms vs. 1,998 ms) and
	14× lower than TinyLlama-1.1B (7,315 ms).
	- 17× smaller by model size than the smallest baseline (SmolLM2-360M, 690 MB)
	and 74× smaller than Qwen2.5-1.5B (2,944 MB).
	- Runs on Raspberry Pi 4, Jetson Nano/Orin, or any embedded CPU. No GPU required
	for inference, no cloud dependency, no telemetry.

	Latency profile — atomic commands (Tier 1–3) run at ~305 ms, compound
	sequences (Tier 4–5) at ~860 ms. The 508 ms average reflects the full
	benchmark distribution including long temporal loops.
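The ratios quoted in the takeaways and the blended latency figure follow from the head-to-head table and the tier sizes. A short check, using only numbers from this card; weighting the two latency regimes by example count is an assumption about how the average was formed.

```python
# Figures copied from the head-to-head table (average latency in ms, size in MB).
FOROS = {"latency_ms": 508, "size_mb": 39.6}
BASELINES = {
    "Qwen2.5-1.5B-Instruct": {"latency_ms": 1998, "size_mb": 2944},
    "Qwen2.5-0.5B-Instruct": {"latency_ms": 3766, "size_mb": 942},
    "TinyLlama-1.1B-Chat": {"latency_ms": 7315, "size_mb": 2098},
    "SmolLM2-360M-Instruct": {"latency_ms": 5884, "size_mb": 690},
}

def ratios(name: str) -> tuple:
    """Return (latency ratio, size ratio) of a baseline relative to Foros."""
    base = BASELINES[name]
    return (base["latency_ms"] / FOROS["latency_ms"],
            base["size_mb"] / FOROS["size_mb"])

# Blend the two latency regimes by the number of benchmark examples in each.
N_ATOMIC = 32 + 29 + 30   # Tiers 1-3, ~305 ms each
N_COMPOUND = 22 + 29      # Tiers 4-5, ~860 ms each
blended_ms = (N_ATOMIC * 305 + N_COMPOUND * 860) / (N_ATOMIC + N_COMPOUND)

for name in BASELINES:
    latency_ratio, size_ratio = ratios(name)
    print(f"{name}: {latency_ratio:.1f}x latency, {size_ratio:.1f}x size")
print(f"blended latency: {blended_ms:.0f} ms")
```

The blend comes out a few milliseconds below the reported 508 ms, which is consistent with the two regime figures being rounded.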

	---

	## What it does

	### Atomic Commands

	| Natural Language Input | Structured Output (ROS JSON) |
	|---|---|
	| `move to x=0.5 y=-1.2 z=0.8` | `[{"op":"move","x":0.5,"y":-1.2,"z":0.8}]` |
	| `rotate joints to [0.0, 45.0, 90.0, 0.0, 0.0, 0.0]` | `[{"op":"joint_move","joints":[0.0,45.0,90.0,0.0,0.0,0.0]}]` |
	| `close gripper with force 0.75` | `[{"op":"gripper","action":"close","force":0.75}]` |
	| `wait for 3.5 seconds` | `[{"op":"wait","seconds":3.5}]` |
	| `set velocity to 0.75 m/s` | `[{"op":"speed","velocity":0.75}]` |
	| `halt all motion` | `[{"op":"stop"}]` |
	| `upon sensor_trip return to home position` | `[{"op":"safety","cond":"sensor_trip","then":[{"op":"home"}]}]` |

	### Temporal / Loop Commands

	| Natural Language Input | Structured Output |
	|---|---|
	| `repeat 5 times: move arm` | `[{"op":"repeat","times":5,"body":[...]}]` |
	| `keep doing move arm until obstacle` | `[{"op":"repeat_until","cond":"obstacle","body":[...]}]` |
	| `run control loop at 100Hz for 2.5 seconds` | `[{"op":"control_loop","frequency_hz":100,"duration_s":2.5,"body":[...]}]` |
	| `every 0.5s do rotate joints for 4 steps` | `[{"op":"timed_seq","interval_s":0.5,"count":4,"body":[...]}]` |
	| `simultaneously move arm and set speed` | `[{"op":"parallel","branches":[[...],[...]]}]` |
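How a downstream executor should consume nested ops is not specified in this card. As a minimal sketch, assuming `repeat` simply means "run `body` `times` times in order", a consumer could unroll bounded loops into a flat step list before dispatch. The helper name `unroll` is hypothetical; the field names `times` and `body` come from the table above.

```python
def unroll(ops: list) -> list:
    """Flatten bounded `repeat` ops into a plain sequence of steps.

    Only `repeat` is expanded; condition- or time-driven ops such as
    `repeat_until` and `control_loop` are passed through untouched because
    they cannot be resolved without runtime state.
    """
    flat = []
    for step in ops:
        if step.get("op") == "repeat":
            body = unroll(step.get("body", []))  # handle nested repeats first
            flat.extend(body * int(step["times"]))
        else:
            flat.append(step)
    return flat

program = [
    {"op": "repeat", "times": 2, "body": [{"op": "home"}, {"op": "wait", "seconds": 1.0}]},
    {"op": "stop"},
]
print(unroll(program))
```

Unrolling keeps the executor simple, at the cost of longer step lists for large `times` values.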

	### Complex Sequences (Multi-step planning)

	```
	Input: pick up the red_box at 0.5 0.5 0.0 and place it at -0.5 1.0 0.0

	Output: [
	  {"op":"move","x":0.5,"y":0.5,"z":0.0},
	  {"op":"gripper","action":"close"},
	  {"op":"move","x":0.5,"y":0.5,"z":0.2},
	  {"op":"move","x":-0.5,"y":1.0,"z":0.2},
	  {"op":"move","x":-0.5,"y":1.0,"z":0.0},
	  {"op":"gripper","action":"open"},
	  {"op":"move","x":-0.5,"y":1.0,"z":0.2}
	]
	```
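The seven-step pattern in this example (approach, grip, lift, traverse, lower, release, retreat) is regular enough to generate programmatically, for instance to build reference targets for testing. The sketch below reproduces it; the 0.2 clearance is inferred from the example output and is not a documented constant, and `pick_and_place` is a hypothetical helper.

```python
def pick_and_place(pick: tuple, place: tuple, clearance: float = 0.2) -> list:
    """Build the move/gripper sequence shown in the pick-and-place example.

    `clearance` is the assumed lift height above the pick and place points.
    """
    def move(x: float, y: float, z: float) -> dict:
        return {"op": "move", "x": x, "y": y, "z": z}

    px, py, pz = pick
    qx, qy, qz = place
    return [
        move(px, py, pz),                      # approach the object
        {"op": "gripper", "action": "close"},  # grasp
        move(px, py, pz + clearance),          # lift
        move(qx, qy, qz + clearance),          # traverse above the target
        move(qx, qy, qz),                      # lower
        {"op": "gripper", "action": "open"},   # release
        move(qx, qy, qz + clearance),          # retreat
    ]

for step in pick_and_place((0.5, 0.5, 0.0), (-0.5, 1.0, 0.0)):
    print(step)
```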

	---

	## Supported Operations

	| Category | Operations |
	|---|---|
	| Motion | `move`, `joint_move`, `move_tcp`, `move_joint`, `home`, `trajectory` |
	| End Effector | `gripper`, `tool`, `get_joint_values` |
	| Control Flow | `wait`, `safety`, `stop`, `repeat`, `repeat_until` |
	| Temporal | `timed_seq`, `control_loop`, `parallel`, `state_transition` |
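Before dispatching model output to a robot, it may be worth checking that every `op` is one the table above lists, including ops nested under `body`, `then`, and `branches`. A minimal sketch follows; the nesting keys are taken from the examples in this card, and `unknown_ops` is a hypothetical helper rather than part of any Foros tooling.

```python
# Operation names copied from the Supported Operations table.
SUPPORTED_OPS = {
    "move", "joint_move", "move_tcp", "move_joint", "home", "trajectory",
    "gripper", "tool", "get_joint_values",
    "wait", "safety", "stop", "repeat", "repeat_until",
    "timed_seq", "control_loop", "parallel", "state_transition",
}

def unknown_ops(ops: list) -> list:
    """Return every op name, at any nesting depth, that is not in SUPPORTED_OPS."""
    found = []
    for step in ops:
        if step.get("op") not in SUPPORTED_OPS:
            found.append(step.get("op"))
        for key in ("body", "then"):           # keys holding a list of steps
            found.extend(unknown_ops(step.get(key, [])))
        for branch in step.get("branches", []):  # `parallel` holds a list of lists
            found.extend(unknown_ops(branch))
    return found

guarded = [{"op": "safety", "cond": "sensor_trip", "then": [{"op": "home"}]}]
print(unknown_ops(guarded))
```

Note that the atomic-command examples also show a `speed` op that is absent from the table, so a strict check like this one would flag it.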

	---

	## Model Details

	| Property | Value |
	|---|---|
	| Architecture | SparseMind (decoder-only, sparse attention + sparse FFN + dynamic neuron typing) |
	| Parameters | 10,347,395 (~10.3 M) |
	| Hidden size / Layers / Heads | 256 / 6 / 8 |
	| Context length | 384 tokens |
	| Tokenizer | In-house domain-specific BPE, vocab 3,000, atomic numerical tokens |
	| Precision | FP32 |
	| Model size | 39.6 MB |
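The listed model size is consistent with the parameter count at FP32. A quick back-of-the-envelope check, assuming 4 bytes per parameter and a conventional even split of the hidden size across heads (SparseMind is a custom architecture, so the per-head figure is only an assumption):

```python
PARAMS = 10_347_395        # from the table above
BYTES_PER_PARAM = 4        # FP32

raw_bytes = PARAMS * BYTES_PER_PARAM
raw_mib = raw_bytes / 2**20    # binary megabytes
raw_mb = raw_bytes / 1e6       # decimal megabytes
head_dim = 256 // 8            # hidden size / heads, if split evenly

print(f"{raw_bytes:,} bytes = {raw_mib:.2f} MiB = {raw_mb:.2f} MB, head_dim = {head_dim}")
```

The raw weights come to roughly 39.5 MiB, so the reported 39.6 MB appears to be a binary-megabyte figure, with the small remainder presumably being checkpoint metadata such as the stored config.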

	---

	## Training Methodology

	Foros is trained on a hybrid corpus combining:

	- Programmatic synthetic data covering all supported operations, with
	paraphrastic variations (formal, informal, imperative tones), numerical
	precision variants, and compositional sequences of varying depth.
	- Curated production logs — anonymized real-world prompts collected from
	deployed instances, with manually verified ground-truth JSON targets.
	- Iterative refinement — successive versions (v5.4 → v5.9 → v5.10) integrate
	fixes derived from systematic failure analysis on the held-out benchmark.

	Training is conducted from scratch (no pre-trained checkpoint) on a single T4
	GPU in approximately 4 hours.

	Detailed corpus composition, generator weights, and hyperparameter schedules
	are proprietary to AMEFORGE.

	---

	## Known Limitations

	- Typo robustness — Tier 3 sits at 43.3% exact match. Severely mangled
	tokens (e.g., `mvoe` instead of `move`) can degrade numerical extraction.
	A typo-aware fine-tune is planned for v5.11.
	- Relative motion — Foros operates on absolute coordinates. Prompts like
	`move left by 20 cm` are out of domain and should be resolved by an upstream
	natural-language pre-processor that converts them to absolute positions.
	- Open-ended planning — Foros is a structured translator, not a planner.
	For multi-step reasoning beyond literal sequencing, pair it with an upstream
	planner.
	- Numerical fidelity in low-confidence contexts — when the prompt vocabulary
	is unfamiliar, the model may default to in-distribution coordinate values.
	For coordinate-critical operations in production, we recommend a lightweight
	regex post-processor that re-injects explicit numerical values from the prompt
	as a safety net.
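The regex post-processor recommended in the last item could look roughly like the sketch below. It handles only the simplest case, a prompt with explicit `x=`, `y=`, `z=` values and an output containing exactly one `move` op, and leaves anything ambiguous untouched. The function name, the pattern, and the patching policy are all assumptions, not AMEFORGE's implementation.

```python
import re

# Matches "x=0.5", "y = -1.2", "z=3" and captures the axis and the number.
AXIS_RE = re.compile(r"\b([xyz])\s*=\s*(-?\d+(?:\.\d+)?)")

def reinject_coordinates(prompt: str, ops: list) -> list:
    """Overwrite x/y/z of a lone `move` op with the values written in the prompt."""
    matches = AXIS_RE.findall(prompt)
    explicit = {axis: float(value) for axis, value in matches}
    moves = [step for step in ops if step.get("op") == "move"]
    # Patch only when the mapping is unambiguous: one value per axis, one move.
    if len(matches) != 3 or len(explicit) != 3 or len(moves) != 1:
        return ops
    return [
        {**step, **explicit} if step.get("op") == "move" else step
        for step in ops
    ]

drifted = [{"op": "move", "x": 0.5, "y": -1.0, "z": 0.8}]
print(reinject_coordinates("move to x=0.5 y=-1.2 z=0.8", drifted))
```

Restricting the patch to unambiguous cases avoids silently assigning a coordinate to the wrong step in multi-move sequences.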

	---

	## Local Inference

	```python
	import os
	import torch
	import sentencepiece as spm
	from huggingface_hub import hf_hub_download

	# Download model weights (public)
	model_file = hf_hub_download(repo_id="AMFORGE/foros", filename="foros.pt")

	# Download tokenizer (gated; set the HF_TOKEN environment variable)
	tok_file = hf_hub_download(
	    repo_id="AMFORGE/foros_tok",
	    filename="sparsforos_tokenizer.model",
	    token=os.environ.get("HF_TOKEN"),
	)

	# Tokenizer
	sp = spm.SentencePieceProcessor()
	sp.Load(tok_file)

	# Model — requires the SparseMind reference implementation
	# (available with the tokenizer via AMEFORGE on request)
	from sparsemind_robotics_train import SparseMind, Config

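	# weights_only=False unpickles arbitrary Python objects, so load trusted checkpoints only
	ckpt = torch.load(model_file, map_location="cpu", weights_only=False)
	# Keep only the checkpoint config keys that the Config dataclass declares
	cfg = Config(**{k: v for k, v in ckpt["config"].items()
	                if k in Config.__dataclass_fields__})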
	model = SparseMind(cfg)
	model.load_state_dict(ckpt["model"])
	model.eval()

	# Inference: top_k=1 keeps only the most likely token (greedy decoding),
	# which is recommended for production
	prompt = "move to x=0.5 y=-1.2 z=0.8 =>"
	input_ids = torch.tensor([sp.EncodeAsIds(prompt)])
	with torch.no_grad():  # no gradients are needed at inference time
	    out_ids = model.generate(input_ids, max_new=128, temp=1.0, top_k=1)
	result = sp.DecodeIds(out_ids[0, input_ids.shape[1]:].tolist())
	print(result)
	# [{"op":"move","x":0.5,"y":-1.2,"z":0.8}]
	```
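Since the benchmark reports 99.3% valid JSON rather than 100%, the decoded `result` string is best parsed defensively before anything is sent to a controller. A minimal sketch, assuming the caller decides what to do on failure (for example, re-prompt or refuse to move); `parse_actions` is a hypothetical helper.

```python
import json
from typing import Optional

def parse_actions(decoded: str) -> Optional[list]:
    """Parse decoded model text into a list of op dicts, or return None if malformed."""
    try:
        ops = json.loads(decoded.strip())
    except json.JSONDecodeError:
        return None
    if not isinstance(ops, list):
        return None
    if not all(isinstance(step, dict) and "op" in step for step in ops):
        return None
    return ops

actions = parse_actions('[{"op":"move","x":0.5,"y":-1.2,"z":0.8}]')
if actions is None:
    print("model output was not a valid operation list; skipping dispatch")
else:
    print(actions)
```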

	---

	## Citation

	```bibtex
	@misc{foros_robotics_v5_10,
	title = {Foros v5.10: An On-Device Instruction-to-JSON Engine for Robotics},
	author = {AMEFORGE},
	year = {2026},
	note = {Built on the SparseMind architecture.
	https://huggingface.co/AMFORGE/foros}
	}
	```

	---

	## License & Contact

	- Model weights: Apache 2.0
	- Tokenizer: gated access — contact AMEFORGE
	- Inquiries: https://huggingface.co/AMFORGE