wcccp committed on
Commit d41f4f8 · verified · 1 Parent(s): 0cfe67f

Create README.md

Files changed (1)
  1. README.md +114 -0
README.md ADDED
@@ -0,0 +1,114 @@
+ ---
+ license: cc-by-nc-4.0
+ language:
+ - en
+ library_name: transformers
+ pipeline_tag: image-text-to-text
+ tags:
+ - vision-language-model
+ - multimodal
+ - panoramic-understanding
+ - 360-degree
+ - equirectangular-panorama
+ - spatial-reasoning
+ - panoworld
+ base_model:
+ - Qwen/Qwen3.5-9B
+ datasets:
+ - wcccp/Pano_dataset
+ ---
+
+
+ # PanoWorld-Hstar
+
+ PanoWorld-Hstar is a vision-language model built on **Qwen3.5-9B** for 360-degree panoramic understanding and spatial reasoning.
+
+ The model is part of the **PanoWorld** project, which focuses on perception that operates natively on equirectangular projection (ERP) panoramas, global spatial topology understanding, and human-centric visual search in 360° scenes.
+
+ * Project: https://github.com/wcpcp/PanoWorld
+ * Model: https://huggingface.co/wcccp/PanoWorld
+ * Dataset: https://huggingface.co/datasets/wcccp/Pano_dataset
+
+ ## Model Description
+
+ PanoWorld-Hstar is fine-tuned for vision-language understanding of ERP panorama images. It targets panoramic scene captioning, spatial relation reasoning, direction understanding, and 360° visual question answering.
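Direction questions on an equirectangular (ERP) panorama ultimately refer to viewing angles, so it can help to see how pixel coordinates relate to them. The following is a minimal sketch, not taken from the PanoWorld code: the function names are hypothetical, and the angle convention (image center at yaw 0 and pitch 0, yaw increasing to the right, pitch increasing upward) is an assumption that may differ from the one used in the PanoWorld dataset. It also includes a rough check that an image has the 2:1 aspect ratio of a full 360° × 180° ERP panorama.

```python
def erp_pixel_to_angles(u, v, width, height):
    """Map an ERP pixel (u, v) to (yaw, pitch) in degrees.

    Assumed convention (hypothetical, not from the PanoWorld docs):
    the image center is yaw 0 / pitch 0, yaw spans [-180, 180) left to
    right, and pitch spans [90, -90] top to bottom. Pixel centers sit
    at integer coordinates, hence the 0.5 offsets.
    """
    yaw = ((u + 0.5) / width - 0.5) * 360.0
    pitch = (0.5 - (v + 0.5) / height) * 180.0
    return yaw, pitch


def angles_to_erp_pixel(yaw, pitch, width, height):
    """Inverse of erp_pixel_to_angles under the same assumed convention."""
    u = (yaw / 360.0 + 0.5) * width - 0.5
    v = (0.5 - pitch / 180.0) * height - 0.5
    return u, v


def looks_like_erp(width, height, tol=0.01):
    """Heuristic: a full-sphere ERP image has a width:height ratio of 2:1."""
    return abs(width / height - 2.0) <= tol
```

For example, `looks_like_erp(1920, 1080)` is false, which is a cheap way to catch an ordinary perspective photo before it is passed to a panorama-specific model.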
+
+ ## Intended Use
+
+ This model is intended for research on:
+
+ * 360° panoramic image understanding
+ * panoramic visual question answering
+ * spatial and directional reasoning
+ * human-centric visual search in panoramic scenes
+ * embodied AI and panoramic scene perception
+
+ ## Usage
+
+ ```python
+ import torch
+ from transformers import AutoProcessor, Qwen3_5ForConditionalGeneration
+
+ model_id = "wcccp/PanoWorld-Hstar"
+
+ processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
+ model = Qwen3_5ForConditionalGeneration.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+     trust_remote_code=True,
+ )
+
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "image", "image": "example_panorama.jpg"},  # path to an ERP panorama
+             {"type": "text", "text": "Describe this 360-degree panoramic scene."},
+         ],
+     }
+ ]
+
+ inputs = processor.apply_chat_template(
+     messages,
+     tokenize=True,
+     add_generation_prompt=True,
+     return_dict=True,
+     return_tensors="pt",
+ ).to(model.device)
+
+ generated_ids = model.generate(
+     **inputs,
+     max_new_tokens=512,
+ )
+
+ generated_ids_trimmed = [
+     output_ids[len(input_ids):]  # drop the prompt tokens, keep only the new ones
+     for input_ids, output_ids in zip(inputs.input_ids, generated_ids)
+ ]
+
+ response = processor.batch_decode(
+     generated_ids_trimmed,
+     skip_special_tokens=True,
+     clean_up_tokenization_spaces=False,
+ )[0]
+
+ print(response)
+ ```
+
+ This example requires a `transformers` release that supports Qwen3.5 and provides `Qwen3_5ForConditionalGeneration`.
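One way to tell whether the installed `transformers` is recent enough is to check for the model class before loading any weights. The sketch below assumes only that the class name used in the usage example, `Qwen3_5ForConditionalGeneration`, is exported at the top level of the package; `has_class` is a hypothetical helper, not part of `transformers`.

```python
import importlib
import importlib.util


def has_class(module_name: str, class_name: str) -> bool:
    """Return True if `module_name` is importable and exposes `class_name`."""
    # find_spec returns None when a top-level module is not installed.
    if importlib.util.find_spec(module_name) is None:
        return False
    module = importlib.import_module(module_name)
    return hasattr(module, class_name)


if __name__ == "__main__":
    if has_class("transformers", "Qwen3_5ForConditionalGeneration"):
        print("transformers exposes Qwen3_5ForConditionalGeneration.")
    else:
        print("Qwen3.5 support not found; try `pip install -U transformers`.")
```

Checking for the class rather than comparing version strings avoids hard-coding a minimum version number, which the model card does not state.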
+
+
+ ## Citation
+
+ ```bibtex
+ @misc{wang2026panoworld,
+       title={PanoWorld: Towards Spatial Supersensing in 360$^\circ$ Panorama World},
+       author={Changpeng Wang and Xin Lin and Junhan Liu and Yuheng Liu and Zhen Wang and Donglian Qi and Yunfeng Yan and Xi Chen},
+       year={2026},
+       eprint={2605.13169},
+       archivePrefix={arXiv},
+       primaryClass={cs.CV},
+       url={https://arxiv.org/abs/2605.13169},
+ }
+ ```