LifeIsSoSolong
/

Qwen2-VL-7B-Instruct-Traffic

Model card Files Files and versions

LifeIsSoSolong commited on Oct 25, 2025

Commit

f1acdfc

·

verified ·

1 Parent(s): 1db47c1

Update README.md

Files changed (1) hide show

README.md +58 -3

README.md CHANGED Viewed

@@ -1,3 +1,58 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+datasets:
+- LifeIsSoSolong/Multimodal_Intelligent_Traffic_Surveillance
+language:
+- en
+base_model:
+- Qwen/Qwen2-VL-7B-Instruct
+---
+# Qwen2-VL-7B-Instruct-Traffic
+**Qwen2-VL-7B-Instruct-Traffic** is a multimodal model fine-tuned on the **MITS (Multimodal Intelligent Traffic Surveillance)** dataset for intelligent traffic surveillance scenarios.
+- **Tasks:** recognition, counting, localization, background awareness, reasoning
+- **Data:** 170,400 images + ~5M instruction-following VQA pairs from MITS
+- **Modality:** Image + Text → Text
+- **Domain:** traffic scenes (congestion, accidents, construction, smoke/fireworks, unusual weather, spills, etc.)
+## Quick Links
+- 📚 Dataset: [`zhaokaikai/Multimodal_Intelligent_Traffic_Surveillance`](https://www.modelscope.cn/datasets/zhaokaikai/Multimodal_Intelligent_Traffic_Surveillance)
+- 💻 Usage & examples: please refer to the GitHub repo
+  **https://github.com/LifeIsSoSolong/Multimodal-Intelligent-Traffic-Surveillance-Dataset-Models**
+## Intended Use
+- Urban traffic monitoring, incident analysis, visual question answering for transportation management
+- Research on ITS-specific multimodal reasoning and instruction following
+## Model Inputs/Outputs
+- **Input:** an image (traffic scene) + a natural language instruction/question
+- **Output:** a natural language response (e.g., description, count, event reasoning)
+## Training Summary
+- Objective: instruction tuning on MITS traffic QA
+- Backbone family: Qwen2-VL 7B Instruct
+- Notes: align vision-language features to traffic-centric concepts and events
+## Limitations & Notes
+- The model may make mistakes on rare objects or extreme weather/night scenes not well represented in training.
+- Not a safety-critical system; human verification is required for real-world decisions.
+## License
+- Follow the licenses of this model and the MITS dataset as stated on their ModelScope pages.
+## Citation
+If you use this model or dataset, please cite:
+```bibtex
+@article{zhao2025mits,
+  title   = {MITS: A large-scale multimodal benchmark dataset for Intelligent Traffic Surveillance},
+  author  = {Zhao, Kaikai and Liu, Zhaoxiang and Wang, Peng and Wang, Xin and Ma, Zhicheng and Xu, Yajun and Zhang, Wenjing and Nan, Yibing and Wang, Kai and Lian, Shiguo},
+  journal = {Image and Vision Computing},
+  pages   = {105736},
+  year    = {2025},
+  publisher = {Elsevier}
+}
+```
+## Contact
+Unicom AI