English
LifeIsSoSolong commited on
Commit
f1acdfc
·
verified ·
1 Parent(s): 1db47c1

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +58 -3
README.md CHANGED
@@ -1,3 +1,58 @@
1
- ---
2
- license: apache-2.0
3
- ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ datasets:
4
+ - LifeIsSoSolong/Multimodal_Intelligent_Traffic_Surveillance
5
+ language:
6
+ - en
7
+ base_model:
8
+ - Qwen/Qwen2-VL-7B-Instruct
9
+ ---
10
+ # Qwen2-VL-7B-Instruct-Traffic
11
+
12
+ **Qwen2-VL-7B-Instruct-Traffic** is a multimodal model fine-tuned on the **MITS (Multimodal Intelligent Traffic Surveillance)** dataset for intelligent traffic surveillance scenarios.
13
+
14
+ - **Tasks:** recognition, counting, localization, background awareness, reasoning
15
+ - **Data:** 170,400 images + ~5M instruction-following VQA pairs from MITS
16
+ - **Modality:** Image + Text → Text
17
+ - **Domain:** traffic scenes (congestion, accidents, construction, smoke/fireworks, unusual weather, spills, etc.)
18
+
19
+ ## Quick Links
20
+ - 📚 Dataset: [`zhaokaikai/Multimodal_Intelligent_Traffic_Surveillance`](https://www.modelscope.cn/datasets/zhaokaikai/Multimodal_Intelligent_Traffic_Surveillance)
21
+ - 💻 Usage & examples: please refer to the GitHub repo
22
+ **https://github.com/LifeIsSoSolong/Multimodal-Intelligent-Traffic-Surveillance-Dataset-Models**
23
+
24
+ ## Intended Use
25
+ - Urban traffic monitoring, incident analysis, visual question answering for transportation management
26
+ - Research on ITS-specific multimodal reasoning and instruction following
27
+
28
+ ## Model Inputs/Outputs
29
+ - **Input:** an image (traffic scene) + a natural language instruction/question
30
+ - **Output:** a natural language response (e.g., description, count, event reasoning)
31
+
32
+ ## Training Summary
33
+ - Objective: instruction tuning on MITS traffic QA
34
+ - Backbone family: Qwen2-VL 7B Instruct
35
+ - Notes: align vision-language features to traffic-centric concepts and events
36
+
37
+ ## Limitations & Notes
38
+ - The model may make mistakes on rare objects or extreme weather/night scenes not well represented in training.
39
+ - Not a safety-critical system; human verification is required for real-world decisions.
40
+
41
+ ## License
42
+ - Follow the licenses of this model and the MITS dataset as stated on their ModelScope pages.
43
+
44
+ ## Citation
45
+ If you use this model or dataset, please cite:
46
+ ```bibtex
47
+ @article{zhao2025mits,
48
+ title = {MITS: A large-scale multimodal benchmark dataset for Intelligent Traffic Surveillance},
49
+ author = {Zhao, Kaikai and Liu, Zhaoxiang and Wang, Peng and Wang, Xin and Ma, Zhicheng and Xu, Yajun and Zhang, Wenjing and Nan, Yibing and Wang, Kai and Lian, Shiguo},
50
+ journal = {Image and Vision Computing},
51
+ pages = {105736},
52
+ year = {2025},
53
+ publisher = {Elsevier}
54
+ }
55
+ ```
56
+
57
+ ## Contact
58
+ Unicom AI