	---
	license: mit
	library_name: pytorch
	tags:
	- robotics
	- progress-estimation
	- behavior-cloning
	---

	# SARM Progress Prediction

	A stage-aware progress prediction model for robot manipulation tasks.

	## Model Description

	SARM predicts:
	- Progress: How far through the task (0.0 to 1.0)
	- Stage: Which stage of the task is being executed

	The model uses a transformer architecture to process sequences of RGB images and robot states.

	- Task: `clearing_food_from_table_into_fridge`
	- Dataset: `IliaLarchenko/behavior_224_rgb`

	## Model Details

	### Architecture
	- Type: Transformer with dual prediction heads (stage classification + progress regression)
	- Model dimension: 768
	- Attention heads: 12
	- Transformer layers: 8
	- MLP dimension: 512
	- Number of stages: 100
	- Number of tasks: 50
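
As a quick sanity check on the hyperparameters listed above, the sketch below collects them in a small dataclass and derives the per-head dimension. `SARMConfig` is a hypothetical name used only for illustration, not a class from the training code, and the calculation assumes the usual multi-head attention convention that the model dimension is split evenly across heads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SARMConfig:
    """Hypothetical container mirroring the values listed in this card."""

    d_model: int = 768
    n_heads: int = 12
    n_layers: int = 8
    d_mlp: int = 512
    num_stages: int = 100
    num_tasks: int = 50

    @property
    def head_dim(self) -> int:
        # Assumption: standard multi-head attention, d_model split evenly over heads.
        if self.d_model % self.n_heads != 0:
            raise ValueError("d_model must be divisible by n_heads")
        return self.d_model // self.n_heads


config = SARMConfig()
print(config.head_dim)  # 768 // 12 = 64
```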

	### Training Details
	- Checkpoint: `best_model.pt`
	- Training step: 4800
	- Epoch: unknown
	- Training loss: unknown
	- Validation loss: 1.0865614609792829
	- Batch size: 16
	- Learning rate: 0.0001
	- Max sequence length: 13
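
The training configuration at the end of this card lists a cosine scheduler with `max_steps` of 10000, so the checkpoint at step 4800 was saved a little before the midpoint of the schedule. The sketch below estimates the learning rate at a given step. It assumes a plain cosine decay from the peak learning rate to zero with no warmup and no floor; the actual scheduler used in training may differ, and `cosine_lr` is a hypothetical helper.

```python
import math


def cosine_lr(step: int, max_steps: int = 10_000, peak_lr: float = 1e-4) -> float:
    """Cosine decay from peak_lr to 0 (assumed: no warmup, no minimum value)."""
    fraction = min(max(step / max_steps, 0.0), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * fraction))


# Learning rate at the start, at the saved checkpoint, and at the end of training.
for step in (0, 4_800, 10_000):
    print(step, cosine_lr(step))
```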

	## Usage

	### Download and Load Model

	```python
	import json

	import torch

	# Assumed to be local modules from the SARM training code, not pip packages
	from hf_model_hub import download_model_from_hub
	from model import SARM

	# Download the checkpoint and its config from the Hub
	files = download_model_from_hub(
	    repo_id="YOUR_USERNAME/YOUR_REPO",
	    checkpoint_name="best_model.pt",
	    output_dir="./downloaded_model",
	)

	# Load the config
	with open(files["config"], "r") as f:
	    config = json.load(f)

	# Build the model from the "model" section of the config
	model_config = config["model"]
	model = SARM(
	    d_model=model_config["d_model"],
	    n_heads=model_config["n_heads"],
	    n_layers=model_config["n_layers"],
	    d_mlp=model_config["d_mlp"],
	    num_stages=model_config["num_stages"],
	    d_state=model_config["d_state"],
	    num_tasks=model_config["num_tasks"],
	)

	# Load the weights on CPU first so this also works on machines without a GPU
	checkpoint = torch.load(files["checkpoint"], map_location="cpu")
	model.load_state_dict(checkpoint["model_state_dict"])
	model.eval()
	```
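
Because the model is built directly from the `model` section of the config, a missing key would only surface as a `KeyError` partway through construction. The sketch below checks the keys up front. `check_model_config` and `REQUIRED_MODEL_KEYS` are hypothetical names introduced here for illustration; the key list is taken from the constructor call above.

```python
REQUIRED_MODEL_KEYS = (
    "d_model", "n_heads", "n_layers", "d_mlp",
    "num_stages", "d_state", "num_tasks",
)


def check_model_config(config: dict) -> dict:
    """Return config["model"] after verifying every required key is present."""
    model_config = config.get("model")
    if not isinstance(model_config, dict):
        raise KeyError("config has no 'model' section")
    missing = [key for key in REQUIRED_MODEL_KEYS if key not in model_config]
    if missing:
        raise KeyError(f"missing model config keys: {missing}")
    return model_config


# Example using the values from this card's training configuration.
example = {
    "model": {
        "d_model": 768, "n_heads": 12, "n_layers": 8, "d_mlp": 512,
        "num_stages": 100, "d_state": 256, "num_tasks": 50,
    }
}
print(check_model_config(example)["d_model"])  # 768
```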

	### Run Inference

	```python
	# `images`, `states`, `tasks`, and `padding_mask` must be prepared beforehand
	with torch.no_grad():
	    stage_logits, progress = model(images, states, tasks, padding_mask)

	# Get predictions for the last frame of each sequence
	predicted_stage = stage_logits[:, -1].argmax(dim=-1)
	predicted_progress = progress[:, -1]
	```
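
Indexing with `[:, -1]` reads the final position of each sequence, which may be a padded frame when a sequence is shorter than the maximum length. The sketch below finds the last non-padded position instead, using plain Python lists to stay framework-agnostic. It assumes that `True` in the padding mask marks a padded position and that padding is appended at the end; both assumptions should be checked against the model code. `last_valid_index` is a hypothetical helper.

```python
from typing import Sequence


def last_valid_index(padding_row: Sequence[bool]) -> int:
    """Index of the last non-padded position in one sequence.

    Assumes True marks a padded position (verify against the model code).
    """
    for idx in range(len(padding_row) - 1, -1, -1):
        if not padding_row[idx]:
            return idx
    raise ValueError("sequence is fully padded")


# Toy batch: two sequences of length 5, the second with two padded frames.
toy_padding_mask = [
    [False, False, False, False, False],
    [False, False, False, True, True],
]
toy_progress = [
    [0.1, 0.2, 0.3, 0.4, 0.5],
    [0.1, 0.2, 0.3, 0.0, 0.0],
]

last_progress = [
    row[last_valid_index(mask)]
    for row, mask in zip(toy_progress, toy_padding_mask)
]
print(last_progress)  # [0.5, 0.3]
```

With tensors, the same per-sequence indices can replace `-1` when reading `stage_logits` and `progress`.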

	## Training Data

	This model was trained on the IliaLarchenko/behavior_224_rgb dataset of robot manipulation episodes.

	- Training episodes: 90 (episodes 1 to 90)
	- Validation episodes: 15 (episodes 91 to 105)
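
The episode split can be reconstructed from the ranges in the training configuration at the end of this card. The short sketch below rebuilds both lists and confirms that they do not overlap; the variable names are illustrative only.

```python
# Episode IDs as listed in the training configuration of this card.
train_episodes = list(range(1, 91))   # 1..90
val_episodes = list(range(91, 106))   # 91..105

# The two sets should be disjoint so validation measures held-out episodes.
overlap = set(train_episodes) & set(val_episodes)
val_fraction = len(val_episodes) / (len(train_episodes) + len(val_episodes))

print(len(train_episodes), len(val_episodes), len(overlap))  # 90 15 0
print(round(val_fraction, 3))  # 0.143
```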

	## Intended Use

	- Progress estimation for robot manipulation tasks
	- Stage classification for multi-step tasks
	- Adaptive window sampling for VLA training
	- Task monitoring and intervention detection
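
For the monitoring use case in the list above, one simple way to use the progress output is to flag a possible stall when predicted progress stops increasing. The sketch below is a hypothetical heuristic, not part of the SARM code; the `window` and `min_gain` values are arbitrary placeholders that would need tuning for a real task.

```python
from typing import Sequence


def is_stalled(
    progress_history: Sequence[float],
    window: int = 5,
    min_gain: float = 0.01,
) -> bool:
    """True if progress rose by less than min_gain over the last `window` predictions."""
    if len(progress_history) < window:
        return False  # not enough history to decide
    recent = progress_history[-window:]
    return (recent[-1] - recent[0]) < min_gain


advancing = [0.10, 0.15, 0.22, 0.30, 0.41]
stuck = [0.10, 0.40, 0.41, 0.41, 0.40, 0.41, 0.41]
print(is_stalled(advancing), is_stalled(stuck))  # False True
```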

	## Limitations

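	- Trained on a specific task (`clearing_food_from_table_into_fridge`) from the BEHAVIOR dataset; results on other tasks are not reported here
	- Requires RGB images (224x224) and robot state information
	- Fixed sequence length input (maximum sequence length of 13)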

	## Citation

	If you use this model, please cite:

	```bibtex
	@misc{sarm-model,
	  author = {Your Name},
	  title = {SARM Progress Prediction},
	  year = {2025},
	  publisher = {HuggingFace},
	  url = {https://huggingface.co/YOUR_USERNAME/YOUR_REPO}
	}
	```

	## Training Configuration

	<details>
	<summary>Click to expand full training configuration</summary>

	```json
	{
	  "metadata": {
	    "model_name": "SARM Progress Prediction",
	    "description": "Stage-aware progress prediction model for robot manipulation tasks",
	    "task": "clearing_food_from_table_into_fridge",
	    "task_number": 25,
	    "dataset": "IliaLarchenko/behavior_224_rgb",
	    "version": "1.0",
	    "author": "Your Name",
	    "tags": ["robotics", "progress-estimation", "behavior-cloning"]
	  },
	  "model": {
	    "d_model": 768,
	    "n_heads": 12,
	    "n_layers": 8,
	    "d_mlp": 512,
	    "num_stages": 100,
	    "d_state": 256,
	    "num_tasks": 50
	  },
	  "training": {
	    "max_steps": 10000,
	    "learning_rate": 0.0001,
	    "weight_decay": 0.0001,
	    "batch_size": 16,
	    "gradient_accumulation_steps": 4,
	    "max_grad_norm": 1.0,
	    "scheduler": "cosine",
	    "stage_loss_weight": 1.0,
	    "progress_loss_weight": 1.0,
	    "validation_steps": 100,
	    "save_steps": 200
	  },
	  "data": {
	    "max_sequence_length": 13,
	    "image_size": 224,
	    "num_workers": 10,
	    "val_workers": 10,
	    "val_samples": 500,
	    "train_episodes": [
	      1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
	      16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
	      31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45,
	      46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60,
	      61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75,
	      76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90
	    ],
	    "val_episodes": [
	      91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105
	    ],
	    "seed": 42
	  },
	  "logging": {
	    "project_name": "sarm-training",
	    "run_name": null,
	    "log_freq": 10,
	    "checkpoint_dir": "checkpoints_sarm_25_2"
	  }
	}
	```

	</details>
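
A few derived quantities follow from the training configuration above. The sketch below computes the effective batch size and checks that the saved checkpoint step falls on both the validation and save intervals. It assumes `batch_size` is the per-micro-batch size on a single device; the card does not state whether `max_steps` counts optimizer steps or micro-batches, so the result is an estimate.

```python
# Values copied from the "training" section of the configuration.
training = {
    "max_steps": 10000,
    "batch_size": 16,
    "gradient_accumulation_steps": 4,
    "validation_steps": 100,
    "save_steps": 200,
}
checkpoint_step = 4800  # training step reported for best_model.pt

# Assumption: single device, so samples per optimizer update = batch * accumulation.
effective_batch = training["batch_size"] * training["gradient_accumulation_steps"]

# The checkpoint step should be a multiple of both intervals.
on_validation_step = checkpoint_step % training["validation_steps"] == 0
on_save_step = checkpoint_step % training["save_steps"] == 0

print(effective_batch, on_validation_step, on_save_step)  # 64 True True
```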