
Add model card and metadata for D2-V2X

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +63 -0
README.md CHANGED
@@ -1,3 +1,66 @@
  ---
  license: mit
+ library_name: peft
+ pipeline_tag: image-text-to-text
  ---
+
+ # D2-V2X: Depth-Driven Cooperative V2X Reasoning for Autonomous Driving
+
+ This repository contains the adapter weights for **D2-V2X**, a spatially aware Question-Rationale-Answer (QRA) framework for cooperative autonomous driving.
+
+ [**Paper (arXiv)**](https://arxiv.org/abs/2605.24098) | [**GitHub**](https://github.com/KevinRichard1/D2-V2X) | [**Dataset**](https://huggingface.co/datasets/kr301/d2v2x-qra)
+
+ ## Overview
+ D2-V2X targets the sensor occlusions that limit single-vehicle Vision-Language Models (VLMs) by establishing a benchmark for cooperative reasoning over multimodal vehicle and infrastructure (V2X) sensors. It also provides a baseline that aligns 3D LiDAR features with the VLM's latent space and enforces Chain-of-Thought (CoT) rationales that articulate spatial relations explicitly.
+
+ ## Usage
+
+ For environment setup and data preparation, refer to the [official GitHub repository](https://github.com/KevinRichard1/D2-V2X).
+
+ ### Training
+ To train the model using the provided pipeline:
+ ```bash
+ python train.py \
+     --qwen_path="/path/to/qwen/model" \
+     --train_path="/path/to/train/dataset" \
+     --val_path="/path/to/val/dataset" \
+     --img_path="/path/to/images" \
+     --train_feature_path="/path/to/train/lidar/features" \
+     --val_feature_path="/path/to/val/lidar/features" \
+     --output_path="/checkpoint/path" \
+     --mode="" \
+     --stage="" \
+     --lr=2e-5 \
+     --epochs=3 \
+     --batch_size=1 \
+     --accum_steps=64
+ ```
+
+ ### Evaluation
+ To evaluate the model:
+ ```bash
+ python evaluate.py \
+     --qwen_path="/path/to/qwen/model" \
+     --checkpoint_path="/checkpoint/path" \
+     --inference \
+     --evaluate \
+     --mode="" \
+     --json_path="/path/to/test/dataset" \
+     --img_path="/path/to/images" \
+     --test_feature_path="/path/to/test/lidar/features" \
+     --inference_save_path="results.json"
+ ```
+
+ ## Citation
+ If you find this work useful, please cite:
+ ```bibtex
+ @misc{richard2026d2v2xdepthdrivencooperativev2x,
+       title={D2-V2X: Depth-Driven Cooperative V2X Reasoning for Autonomous Driving},
+       author={Kevin Richard and Alphin Varghese and Colin Pham and David Oh and Srijan Das},
+       year={2026},
+       eprint={2605.24098},
+       archivePrefix={arXiv},
+       primaryClass={cs.CV},
+       url={https://arxiv.org/abs/2605.24098},
+ }
+ ```
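
A note on the training command in the README above: with `--batch_size=1` and `--accum_steps=64`, the optimizer presumably steps once per 64 samples per process. This assumes `--accum_steps` denotes gradient-accumulation steps, which the diff does not state explicitly. The Python sketch below makes that arithmetic concrete and assembles the same `train.py` argument list from a dictionary. The helper names `effective_batch_size` and `build_train_argv` and the `num_processes` parameter are hypothetical and not part of the D2-V2X repository; `--mode` and `--stage` are left empty exactly as in the README, and all paths are the README's placeholders.

```python
import shlex


def effective_batch_size(batch_size: int, accum_steps: int, num_processes: int = 1) -> int:
    """Number of samples that contribute to one optimizer step.

    Assumption: --accum_steps means gradient-accumulation steps.
    num_processes is a hypothetical knob for data-parallel runs,
    not a train.py flag.
    """
    return batch_size * accum_steps * num_processes


def build_train_argv(options: dict) -> list:
    """Turn a {flag_name: value} mapping into a train.py argument list."""
    argv = ["python", "train.py"]
    for name, value in options.items():
        argv.append(f"--{name}={value}")
    return argv


# Values copied from the README's training command; paths are placeholders.
train_options = {
    "qwen_path": "/path/to/qwen/model",
    "train_path": "/path/to/train/dataset",
    "val_path": "/path/to/val/dataset",
    "img_path": "/path/to/images",
    "train_feature_path": "/path/to/train/lidar/features",
    "val_feature_path": "/path/to/val/lidar/features",
    "output_path": "/checkpoint/path",
    "mode": "",   # left empty, as in the README
    "stage": "",  # left empty, as in the README
    "lr": "2e-5",
    "epochs": 3,
    "batch_size": 1,
    "accum_steps": 64,
}

argv = build_train_argv(train_options)

# Print a shell-quoted command line for inspection; nothing is executed.
print(shlex.join(argv))

# 1 sample per step * 64 accumulation steps * 1 process = 64
print(effective_batch_size(train_options["batch_size"], train_options["accum_steps"]))
```

Keeping the flags in a dictionary makes it easy to reuse the same paths for `evaluate.py` and to check how a change to `--batch_size` or `--accum_steps` affects the number of samples per optimizer step before launching a run.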