---
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
datasets:
- homebrewltd/Pick-Place-Table-Reasoning-local-pos-v0.2
library_name: transformers
license: apache-2.0
pipeline_tag: robotics
---

# AlphaSpace-1.5B

## Introduction

**AlphaSpace** ([Paper](https://huggingface.co/papers/2503.18769)) is a novel methodology designed to enhance the spatial reasoning capabilities of language models for robotic manipulation in 3D Cartesian space. AlphaSpace employs a hierarchical semantics-based tokenization strategy that encodes spatial information at both coarse and fine-grained levels. Our approach represents objects together with their attributes, positions, and height information through structured tokens, enabling precise spatial reasoning without relying on traditional vision-based embeddings. This allows LLMs to accurately manipulate objects by placing them at specific [x, y, z] coordinates.

Code: https://github.com/AlanDao/AlphaSpace

## Model Details

* Model architecture: [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)
* Datasets:
  * Training: [homebrewltd/Pick-Place-Table-Reasoning-local-pos-v0.2](https://huggingface.co/datasets/homebrewltd/Pick-Place-Table-Reasoning-local-pos-v0.2)
  * Eval: [EmbodiedBench/EB-Manipulation](https://huggingface.co/datasets/EmbodiedBench/EB-Manipulation)
* License: Apache-2.0
* Developed by: Alan Dao, Dinh Bach Vu, Bui Quang Huy (Menlo Research)

## How to Get Started

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from utils import tokenize_desk, SYSTEM_PROMPT

model_path = "AlphaSpace-1.5B"  # replace with this model's Hub repo id or local path
device = "cuda"

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16).to(device)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Define your workspace
objects = [
    {"red-cube": [51, 43, 17]},
    {"black-cube": [44, 58, 17]},
    {"purple-cube": [74, 59, 17]},
    {"green-cube": [65, 82, 17]},
]

# Give a natural language instruction
instruction = "Throw the red cube on top of the blue cylinder"

# Serialize the workspace into the tokenized table representation
desk, object_height = tokenize_desk(objects)
final_instruction = SYSTEM_PROMPT.format(object_height=object_height, instruction=instruction, TABLE_MAP=desk)

chat = [
    {"role": "user", "content": final_instruction.strip()}
]
tokenized_chat = tokenizer.apply_chat_template(
    chat, tokenize=True, add_generation_prompt=True, use_system_prompt=False, return_tensors="pt"
)

# Greedy decoding
generated_ids = model.generate(
    tokenized_chat.to(device),
    max_new_tokens=2048,
    do_sample=False,
)

# Decode only the newly generated tokens to get the solution
result = tokenizer.decode(generated_ids[0][tokenized_chat.shape[1]:], skip_special_tokens=True)
print(result)
```

### Hardware

**GPU Configuration**: Cluster of 8x NVIDIA H200-SXM-140GB.

**GPU Usage**:
- **SFT**: 40 mins.

### Training Arguments

We use the [Llama-Factory](https://github.com/hiyouga/LLaMA-Factory) library to train the model.
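As a rough illustration, the hyperparameters in the table below could be expressed in a Llama-Factory SFT config along the following lines. This is a hedged sketch, not the authors' actual config: the dataset registration name and the per-device batch size / gradient accumulation split are assumptions (chosen so that 4 × 4 × 8 GPUs gives the global batch size of 128), and exact keys should be checked against the Llama-Factory documentation.

```yaml
### model
model_name_or_path: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

### method
stage: sft
do_train: true
finetuning_type: full

### dataset
dataset: pick_place_table_reasoning  # assumed local registration of the training dataset
cutoff_len: 4096

### train
per_device_train_batch_size: 4       # 4 x 4 grad-accum x 8 GPUs = global batch 128 (assumed split)
gradient_accumulation_steps: 4
learning_rate: 1.0e-4
num_train_epochs: 1.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
optim: adamw_torch_fused
```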
| **Parameter** | **Continual Training** |
| --- | --- |
| **Epoch** | 1 |
| **Global batch size** | 128 |
| **Learning Rate** | 1e-4 |
| **Learning Scheduler** | cosine with warmup |
| **Optimizer** | [AdamW Fused](https://pytorch.org/docs/stable/generated/torch.optim.AdamW.html) |
| **Warmup Ratio** | 0.1 |
| **Max length** | 4096 |
| **Precision** | bf16 |

## Citation

- https://arxiv.org/abs/2503.18769

## More Information

* Contact the authors at alan@menlo.ai, bach@menlo.ai, yuuki@menlo.ai for further details.