
	---
	base_model:
	- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
	datasets:
	- homebrewltd/Pick-Place-Table-Reasoning-local-pos-v0.2
	library_name: transformers
	license: apache-2.0
	pipeline_tag: robotics
	---

	# AlphaSpace-1.5B

	## Introduction

	AlphaSpace ([Paper](https://huggingface.co/papers/2503.18769)) is a novel methodology designed to enhance the spatial reasoning capabilities of language models for robotic manipulation in 3D Cartesian space. It employs a hierarchical semantics-based tokenization strategy that encodes spatial information at both coarse and fine-grained levels. Objects are represented by their attributes, positions, and height information through structured tokens, enabling precise spatial reasoning without relying on traditional vision-based embeddings. This lets LLMs manipulate objects accurately by positioning them at specific [x, y, z] coordinates.

	Code: https://github.com/AlanDao/AlphaSpace
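
To make the coarse/fine idea from the introduction concrete, here is a minimal sketch of a two-level position encoding. It is an illustration only and not the tokenizer from the AlphaSpace repository: the 100x100 grid, the 25-cell block size, the token strings, and the function names `encode_position` and `decode_position` are all assumptions chosen for this example.

```python
# Illustrative two-level (coarse + fine) position encoding.
# GRID_SIZE, BLOCK_SIZE and the token formats are hypothetical,
# not taken from the AlphaSpace codebase.
GRID_SIZE = 100
BLOCK_SIZE = 25


def encode_position(x: int, y: int) -> tuple[str, str]:
    """Map a grid cell to a (coarse, fine) token pair."""
    if not (0 <= x < GRID_SIZE and 0 <= y < GRID_SIZE):
        raise ValueError(f"({x}, {y}) is outside the {GRID_SIZE}x{GRID_SIZE} grid")
    coarse = f"<|global-{x // BLOCK_SIZE}-{y // BLOCK_SIZE}|>"  # which block
    fine = f"<|local-{x % BLOCK_SIZE}-{y % BLOCK_SIZE}|>"  # offset inside the block
    return coarse, fine


def decode_position(coarse: str, fine: str) -> tuple[int, int]:
    """Invert encode_position and recover the original (x, y)."""
    bx, by = (int(v) for v in coarse.strip("<|>").split("-")[1:])
    ox, oy = (int(v) for v in fine.strip("<|>").split("-")[1:])
    return bx * BLOCK_SIZE + ox, by * BLOCK_SIZE + oy


coarse, fine = encode_position(51, 43)
print(coarse, fine)
print(decode_position(coarse, fine))
```

The coarse token narrows the location to a block and the fine token pins down the cell inside it, so the pair round-trips back to the original coordinates.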

	## Model Details
	* Model architecture: [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)
	* Dataset:
	  * Training: [homebrewltd/Pick-Place-Table-Reasoning-local-pos-v0.2](https://huggingface.co/datasets/homebrewltd/Pick-Place-Table-Reasoning-local-pos-v0.2)
	  * Eval: [EmbodiedBench/EB-Manipulation](https://huggingface.co/datasets/EmbodiedBench/EB-Manipulation)
	* License: Apache-2.0
	* Developed by: Alan Dao, Dinh Bach Vu, Bui Quang Huy (Menlo Research)


	## How to Get Started

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch
	# tokenize_desk and SYSTEM_PROMPT come from a local utils module,
	# assumed here to be the one shipped with the AlphaSpace code repository.
	from utils import tokenize_desk, SYSTEM_PROMPT

	# Load the model
	model_path = "Menlo/AlphaSpace-1.5B"
	device = "cuda"
	model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16).to(device)
	tokenizer = AutoTokenizer.from_pretrained(model_path)

	# Define your workspace: each object maps a name to its [x, y, z] position
	objects = [
	    {"red-cube": [51, 43, 17]},
	    {"black-cube": [44, 58, 17]},
	    {"purple-cube": [74, 59, 17]},
	    {"green-cube": [65, 82, 17]},
	]

	# Give a natural language instruction
	instruction = "Throw the red cube on top of the blue cylinder"
	desk, object_height = tokenize_desk(objects)
	final_instruction = SYSTEM_PROMPT.format(object_height=object_height, instruction=instruction, TABLE_MAP=desk)
	chat = [
	    {"role": "user", "content": final_instruction.strip()}
	]
	tokenized_chat = tokenizer.apply_chat_template(chat, tokenize=True, add_generation_prompt=True, use_system_prompt=False, return_tensors="pt")
	# print(len(tokenized_chat[0]))

	# Greedy decoding: temperature has no effect when do_sample=False, so it is omitted
	generated_ids = model.generate(
	    tokenized_chat.to(device),
	    max_new_tokens=2048,
	    do_sample=False,
	)

	# Get the solution: decode only the newly generated tokens, not the prompt
	result = tokenizer.decode(generated_ids[0][tokenized_chat.shape[1]:], skip_special_tokens=True)
	print(result)
	```
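
Because the base model is a DeepSeek-R1 distillation, the decoded `result` may contain a reasoning trace before the final answer. The helper below is a sketch that assumes the trace is closed by a `</think>` tag, following the base model's output convention. The function name `split_reasoning` is hypothetical, the sample string is made up, and the actual answer format should be checked against real model output.

```python
def split_reasoning(text: str, end_tag: str = "</think>") -> tuple[str, str]:
    """Split decoded output into (reasoning, answer).

    Assumes the reasoning trace, if any, ends with `end_tag`.
    If the tag is absent, the whole text is treated as the answer.
    """
    head, sep, tail = text.rpartition(end_tag)
    if not sep:
        return "", text.strip()
    # Drop a leading opening tag if the decoded text includes one
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


# Made-up string standing in for `result`; not real model output.
sample = "<think>The red cube is at [51, 43].</think>\nfinal answer goes here"
reasoning, answer = split_reasoning(sample)
print(answer)
```

Splitting on the last closing tag keeps the helper usable whether or not the opening `<think>` appears in the decoded text.
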
	### Hardware

	GPU Configuration: Cluster of 8x NVIDIA H200-SXM-140GB.

	GPU Usage:
	- SFT: 40 mins.

	### Training Arguments

	We use the [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) library to train the model.

	| Parameter | Continual Training |
	| --- | --- |
	| Epoch | 1 |
	| Global batch size | 128 |
	| Learning rate | 1e-4 |
	| Learning rate scheduler | cosine with warmup |
	| Optimizer | [AdamW Fused](https://pytorch.org/docs/stable/generated/torch.optim.AdamW.html) |
	| Warmup ratio | 0.1 |
	| Max length | 4096 |
	| Precision | bf16 |
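
As a rough illustration of the "cosine with warmup" schedule in the table, the sketch below computes the learning rate at a given optimizer step: linear warmup to the peak of 1e-4 over the first 10% of steps, then cosine decay to zero. The function name and the `total_steps` value are assumptions for the example; the real step count depends on the dataset size and the global batch size of 128, and the training framework's own scheduler may round the warmup length differently.

```python
import math


def cosine_with_warmup_lr(step: int, total_steps: int,
                          peak_lr: float = 1e-4,
                          warmup_ratio: float = 0.1) -> float:
    """Learning rate at `step` for linear warmup followed by cosine decay."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr
        return peak_lr * step / max(1, warmup_steps)
    # Cosine decay from peak_lr down to 0 over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical; not the actual training length
for step in (0, 50, 100, 550, 1000):
    print(step, cosine_with_warmup_lr(step, total_steps))
```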

	## Citation
	- https://arxiv.org/abs/2503.18769

	## More Information
	* Contact the authors at alan@menlo.ai, bach@menlo.ai, yuuki@menlo.ai for further details.