	---
	license: apache-2.0
	base_model:
	- nvidia/Alpamayo-R1-10B
	---
	A 4-bit quantized version of nvidia/Alpamayo-R1-10B.

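	This model can be used to predict events from data collected during autonomous driving.
	It does not drive the vehicle; it reports that a particular situation is about to occur while the vehicle is driving autonomously.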


	```Runinfo
	Download the model to ./Alpamayo-R1-10B-4bit.

	nvidia/Alpamayo-R1-10B needs a large amount of memory, so this model was loaded in 4-bit and saved in that form.
	It now runs on GPUs with 12 GB or 16 GB of memory.

	On a 12 GB GPU, num_frames can be about 1 to 8; anything higher causes an out-of-memory (OOM) error.

	Tested with Transformers 4.57.5. It does not work with Transformers 5.0.0rc.

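	Install the alpamayo package from source:

	git clone https://github.com/NVlabs/alpamayo
	cd alpamayo
	pip install .

	It is better to edit pyproject.toml before running pip install.
	If you use Python 3.13, set requires-python = "==3.13.*".
	If you remove the transformers and torch lines first, the install does not replace the versions you already have.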
	```
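The pyproject.toml edit described in the install notes can be sketched as a small text filter. This is a minimal sketch, not part of the alpamayo repository: the helper name `patch_pyproject` and the sample file contents (package name, pinned versions) are hypothetical and only illustrate the two changes, setting `requires-python` and dropping the `transformers` and `torch` dependency lines.

```python
import re


def patch_pyproject(text, python_spec="==3.13.*", drop=("transformers", "torch")):
    """Return pyproject.toml text with requires-python replaced and the
    named dependency lines removed. Hypothetical helper for illustration."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("requires-python"):
            # Keep the original indentation, replace only the version spec.
            indent = line[: len(line) - len(line.lstrip())]
            out.append(f'{indent}requires-python = "{python_spec}"')
            continue
        # Dependency entries look like  "torch==2.8.0",  inside a TOML array.
        match = re.match(r'^["\']([A-Za-z0-9_.\-]+)', stripped)
        if match and match.group(1).lower() in drop:
            continue  # drop the pin so pip leaves the installed version alone
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical file contents; the real pyproject.toml will differ.
sample = """[project]
name = "example_package"
requires-python = "==3.12.*"
dependencies = [
    "numpy",
    "torch==2.8.0",
    "torchvision",
    "transformers==4.57.1",
]
"""

patched = patch_pyproject(sample)
print(patched)
```

Matching on the full package name keeps entries such as `torchvision`, which only share a prefix with `torch`. In practice you would read the real file, pass its text through the function, and write the result back before running `pip install .`.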
	-----------------------------------
	```python
	import torch
	import numpy as np
	from alpamayo_r1.models.alpamayo_r1 import AlpamayoR1
	from alpamayo_r1.load_physical_aiavdataset import load_physical_aiavdataset
	from alpamayo_r1 import helper

	model_path = "Alpamayo-R1-10B-4bit"
	model = AlpamayoR1.from_pretrained(model_path, dtype=torch.bfloat16).to("cuda")

	processor = helper.get_processor(model.tokenizer)

	clip_id = "030c760c-ae38-49aa-9ad8-f5650a545d26"
	print(f"Loading dataset for clip_id: {clip_id}...")
	# Set a Hugging Face access token or run `huggingface-cli login` before loading the dataset.
	data = load_physical_aiavdataset(clip_id, t0_us=15_100_000, num_frames=1)
	print("Dataset loaded.")

	messages = helper.create_message(data["image_frames"].flatten(0, 1))

	inputs = processor.apply_chat_template(
	    messages,
	    tokenize=True,
	    add_generation_prompt=False,
	    continue_final_message=True,
	    return_dict=True,
	    return_tensors="pt",
	)

	model_inputs = {
	    "tokenized_data": inputs,
	    "ego_history_xyz": data["ego_history_xyz"],
	    "ego_history_rot": data["ego_history_rot"],
	}

	model_inputs = helper.to_device(model_inputs, "cuda")
	torch.cuda.manual_seed_all(42)
	with torch.autocast("cuda", dtype=torch.bfloat16):
	    pred_xyz, pred_rot, extra = model.sample_trajectories_from_data_with_vlm_rollout(
	        data=model_inputs,
	        top_p=0.98,
	        temperature=0.6,
	        num_traj_samples=1,  # Feel free to raise this for more output trajectories and CoC traces.
	        max_generation_length=256,
	        return_extra=True,
	    )


	print("Chain-of-Causation (per trajectory):\n", extra["cot"][0])
	gt_xy = data["ego_future_xyz"].cpu()[0, 0, :, :2].T.numpy()
	pred_xy = pred_xyz.cpu().numpy()[0, 0, :, :, :2].transpose(0, 2, 1)
	diff = np.linalg.norm(pred_xy - gt_xy[None, ...], axis=1).mean(-1)
	min_ade = diff.min()
	print("minADE:", min_ade, "meters")
	print(
	    "Note: VLA-reasoning models produce nondeterministic outputs due to trajectory sampling, "
	    "hardware differences, etc. With num_traj_samples=1 (set for GPU memory compatibility), "
	    "variance in minADE is expected. For visual sanity checks, see notebooks/inference.ipynb"
	)
	```
	--------------------
	```Result:


	Chain-of-Causation (per trajectory):
	[['Nudge to the left to pass the stopped truck encroaching into the lane.']]
	minADE: 1.7749525 meters
	Note: VLA-reasoning models produce nondeterministic outputs due to trajectory sampling, hardware differences, etc. With num_traj_samples=1 (set for GPU memory compatibility), variance in minADE is expected. For visual sanity checks, see notebooks/inference.ipynb
	```
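The minADE value printed above is the smallest average displacement error over the sampled trajectories: for each predicted trajectory, take the mean Euclidean distance to the ground-truth path in the xy plane, then keep the minimum. The following is a dependency-free sketch of that computation. The function name `min_ade` and the toy coordinates are made up for illustration and are not part of the alpamayo_r1 package.

```python
import math


def min_ade(pred_trajs, gt_traj):
    """Minimum average displacement error in the xy plane.

    pred_trajs: list of trajectories, each a list of (x, y) waypoints.
    gt_traj:    ground-truth list of (x, y) waypoints of the same length.
    """
    if not pred_trajs or not gt_traj:
        raise ValueError("need at least one trajectory and one waypoint")
    ades = []
    for traj in pred_trajs:
        if len(traj) != len(gt_traj):
            raise ValueError("trajectory lengths must match the ground truth")
        # Per-waypoint Euclidean distance, averaged over the time steps.
        dists = [
            math.hypot(px - gx, py - gy)
            for (px, py), (gx, gy) in zip(traj, gt_traj)
        ]
        ades.append(sum(dists) / len(dists))
    # Keep the best (smallest) error across the sampled trajectories.
    return min(ades)


# Toy data: a straight ground-truth path and two candidate predictions.
gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
preds = [
    [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],  # constant 1 m lateral offset, ADE = 1.0
    [(0.0, 0.0), (1.0, 0.0), (2.0, 1.5)],  # only the last waypoint is off, ADE = 0.5
]
print("minADE:", min_ade(preds, gt), "meters")  # prints: minADE: 0.5 meters
```

With `num_traj_samples=1` there is only one candidate, so minADE equals that single trajectory's ADE, which is why the value varies from run to run.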


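	To test inference from a single image, I wrote the example below.
	It skips dataset loading and instead initializes the ego history with default values, so the vehicle starts at the origin.
	Running it requires a GPU with at least 12 GB of memory, and the response latency is considerable,
	so using it in a real car seems impractical.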

	```python
	# Zero-time initialization from a single base image (one photo loaded from disk)
	import torch
	import numpy as np
	from PIL import Image
	from alpamayo_r1.models.alpamayo_r1 import AlpamayoR1
	from alpamayo_r1.load_physical_aiavdataset import load_physical_aiavdataset
	from alpamayo_r1 import helper

	num_history_steps = 16  # number of past steps
	num_future_steps = 64  # number of future steps

	# Dummy position data (xyz coordinates)
	ego_history_xyz = torch.zeros((1, 1, num_history_steps, 3))  # (batch, agent, steps, xyz)
	ego_future_xyz = torch.zeros((1, 1, num_future_steps, 3))

	# Dummy rotation data (3x3 identity rotation matrices)
	ego_history_rot = torch.eye(3).repeat(1, 1, num_history_steps, 1, 1)  # (1, 1, steps, 3, 3)
	ego_future_rot = torch.eye(3).repeat(1, 1, num_future_steps, 1, 1)

	print("ego_history_xyz:", ego_history_xyz.shape)
	print("ego_future_xyz:", ego_future_xyz.shape)
	print("ego_history_rot:", ego_history_rot.shape)
	print("ego_future_rot:", ego_future_rot.shape)
	N_cameras = 1
	camera_indices = torch.arange(N_cameras, dtype=torch.long)  # (N_cameras,), explicitly long

	data = {
	    "camera_indices": camera_indices,  # (N_cameras,)
	    "ego_history_xyz": ego_history_xyz,  # (1, 1, num_history_steps, 3)
	    "ego_history_rot": ego_history_rot,  # (1, 1, num_history_steps, 3, 3)
	    "ego_future_xyz": ego_future_xyz,  # (1, 1, num_future_steps, 3)
	    "ego_future_rot": ego_future_rot,  # (1, 1, num_future_steps, 3, 3)
	    # "relative_timestamps": relative_timestamps,  # (N_cameras, num_frames)
	    # "absolute_timestamps": absolute_timestamps,  # (N_cameras, num_frames)
	}
	# Path to the JPG file to run the prediction on
	img_path = "IMG_20260116_065921.jpg"
	image = Image.open(img_path).convert("RGB")
	# helper.create_message expects a tensor, so convert the PIL image:
	# PIL Image -> numpy array -> float32
	image_array = np.array(image).astype(np.float32) / 255.0  # normalize to the 0-1 range
	image_tensor = torch.from_numpy(image_array).unsqueeze(0)  # [batch, H, W, C]
	# Build the chat message
	messages = helper.create_message(image_tensor)

	# Load the 4-bit model and its processor
	model_path = "Alpamayo-R1-10B-4bit"
	model = AlpamayoR1.from_pretrained(model_path, dtype=torch.bfloat16).to("cuda")
	processor = helper.get_processor(model.tokenizer)



	# Tokenize the messages with the chat template

	inputs = processor.apply_chat_template(
	    messages,
	    tokenize=True,
	    add_generation_prompt=False,
	    continue_final_message=True,
	    return_dict=True,
	    return_tensors="pt",
	)

	model_inputs = {
	    "tokenized_data": inputs,
	    "ego_history_xyz": data["ego_history_xyz"],
	    "ego_history_rot": data["ego_history_rot"],
	}

	model_inputs = helper.to_device(model_inputs, "cuda")

	torch.cuda.manual_seed_all(42)
	with torch.autocast("cuda", dtype=torch.bfloat16):
	    pred_xyz, pred_rot, extra = model.sample_trajectories_from_data_with_vlm_rollout(
	        data=model_inputs,
	        top_p=0.98,
	        temperature=0.6,
	        num_traj_samples=1,  # Feel free to raise this for more output trajectories and CoC traces.
	        max_generation_length=256,
	        return_extra=True,
	    )

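	# extra["cot"] has size [batch_size, num_traj_sets, num_traj_samples]
	print("Chain-of-Causation (per trajectory):\n", extra["cot"][0])

	# ego_future_xyz is all zeros in this example, so the "minADE" below is only the mean
	# distance of the predicted waypoints from the origin, not an accuracy measure.
	gt_xy = data["ego_future_xyz"].cpu()[0, 0, :, :2].T.numpy()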
	pred_xy = pred_xyz.cpu().numpy()[0, 0, :, :, :2].transpose(0, 2, 1)
	diff = np.linalg.norm(pred_xy - gt_xy[None, ...], axis=1).mean(-1)
	min_ade = diff.min()
	print("minADE:", min_ade, "meters")
	print(
	    "Note: VLA-reasoning models produce nondeterministic outputs due to trajectory sampling, "
	    "hardware differences, etc. With num_traj_samples=1 (set for GPU memory compatibility), "
	    "variance in minADE is expected. For visual sanity checks, see notebooks/inference.ipynb"
	)
	```

	```output

	Chain-of-Causation (per trajectory):
	[['Keep lane to continue driving since the lane ahead is clear.']]
	minADE: 0.55852604 meters
	Note: VLA-reasoning models produce nondeterministic outputs due to trajectory sampling, hardware differences, etc. With num_traj_samples=1 (set for GPU memory compatibility), variance in minADE is expected. For visual sanity checks, see notebooks/inference.ipynb

	```
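To put a number on the response latency mentioned for the single-image example, the inference call can be wrapped in a simple timer. This is a rough sketch using only the standard library. `time_call` and `fake_inference` are hypothetical names, and `fake_inference` is a stand-in workload rather than the model call.

```python
import time
from statistics import mean


def time_call(fn, *args, repeats=3, **kwargs):
    """Call fn `repeats` times; return (last_result, list_of_durations_in_seconds)."""
    durations = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return result, durations


def fake_inference(n):
    # Stand-in workload; replace with a function that runs the model call.
    return sum(i * i for i in range(n))


result, secs = time_call(fake_inference, 100_000, repeats=3)
print(f"mean latency: {mean(secs):.4f} s over {len(secs)} runs")
```

When timing real GPU work, call `torch.cuda.synchronize()` inside the timed function before it returns, because CUDA kernels run asynchronously and the clock could otherwise stop before the work has finished. The first run usually includes warm-up cost, so it is worth reporting it separately from the later runs.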