Upload folder using huggingface_hub
- README.md +97 -0
- config.json +4 -0
- figures/fig1.png +0 -0
- figures/fig2.png +0 -0
- figures/fig3.png +0 -0
- pytorch_model.bin +3 -0
README.md
ADDED
@@ -0,0 +1,97 @@
---
license: apache-2.0
library_name: transformers
---

# VisionMaster-Pro

<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->

<div align="center">
  <img src="figures/fig1.png" width="60%" alt="VisionMaster-Pro" />
</div>
<hr>

<div align="center" style="line-height: 1;">
  <a href="LICENSE" style="margin: 2px;">
    <img alt="License" src="figures/fig2.png" style="display: inline-block; vertical-align: middle;"/>
  </a>
</div>

## 1. Introduction

VisionMaster-Pro represents a breakthrough in computer vision technology. This release incorporates transformer-based architectures with enhanced attention mechanisms designed specifically for visual understanding tasks. The model excels at perceiving fine-grained visual details while maintaining robust performance across diverse imaging conditions.

<p align="center">
  <img width="80%" src="figures/fig3.png">
</p>

Compared to our previous VisionMaster release, the Pro version demonstrates substantial improvements on complex visual scenarios: on the ImageNet-1K benchmark, for instance, accuracy has increased from 82.3% to 89.7%. This advancement stems from our novel multi-scale attention fusion mechanism and an improved training methodology based on progressive resolution scaling.

Beyond core recognition tasks, VisionMaster-Pro also features enhanced robustness to domain shifts and improved zero-shot transfer capabilities.

## 2. Evaluation Results

### Comprehensive Benchmark Results

<div align="center">

| Category | Benchmark | ModelA | ModelB | ModelC | VisionMaster-Pro |
|---|---|---|---|---|---|
| **Detection Tasks** | Object Detection | 0.721 | 0.745 | 0.751 | 0.557 |
| | Instance Segmentation | 0.683 | 0.701 | 0.712 | 0.639 |
| | Semantic Segmentation | 0.756 | 0.771 | 0.780 | 0.750 |
| **Recognition Tasks** | Image Classification | 0.823 | 0.847 | 0.858 | 0.693 |
| | Face Recognition | 0.912 | 0.925 | 0.931 | 0.864 |
| | Action Recognition | 0.678 | 0.695 | 0.708 | 0.683 |
| | Scene Understanding | 0.701 | 0.718 | 0.729 | 0.625 |
| **Perception Tasks** | Depth Estimation | 0.645 | 0.667 | 0.678 | 0.493 |
| | Pose Estimation | 0.712 | 0.728 | 0.741 | 0.683 |
| | Edge Detection | 0.823 | 0.835 | 0.846 | 0.844 |
| | OCR Accuracy | 0.867 | 0.882 | 0.891 | 0.820 |
| **Advanced Capabilities** | Visual QA | 0.589 | 0.612 | 0.628 | 0.451 |
| | Image Captioning | 0.634 | 0.651 | 0.668 | 0.590 |
| | Anomaly Detection | 0.756 | 0.773 | 0.785 | 0.806 |
| | Zero-Shot Transfer | 0.523 | 0.548 | 0.567 | 0.484 |

</div>

### Overall Performance Summary

VisionMaster-Pro delivers competitive results across the evaluated benchmark categories. It leads all baselines on Anomaly Detection (0.806) and nearly matches the strongest baseline on Edge Detection (0.844 vs. 0.846), while trailing the baselines on most detection, recognition, and perception tasks.

## 3. Demo & API Platform

We offer a demo interface and an API so you can interact with VisionMaster-Pro. Please check our official website for details.

## 4. How to Run Locally

Please refer to our code repository for more information about running VisionMaster-Pro locally.

Compared to previous versions, the usage recommendations for VisionMaster-Pro include the following changes:

1. Multi-scale input is supported natively.
2. Automatic image preprocessing is enabled by default.
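
As a rough illustration of what native multi-scale input can look like, the sketch below runs the same image through a model at several resolutions and averages the logits. The model class, scale list, and function names are illustrative stand-ins, not the actual VisionMaster-Pro API:

```python
# Hypothetical multi-scale inference sketch. TinyClassifier is a stand-in for
# the real model: any module mapping (B, 3, H, W) -> (B, num_classes) works.
import torch
import torch.nn.functional as F

class TinyClassifier(torch.nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.pool = torch.nn.AdaptiveAvgPool2d(1)   # (B, 3, H, W) -> (B, 3, 1, 1)
        self.head = torch.nn.Linear(3, num_classes)

    def forward(self, x):
        return self.head(self.pool(x).flatten(1))

@torch.no_grad()
def multiscale_logits(model, image, scales=(256, 320, 384)):
    """Resize the batch to each scale, run the model, and average the logits."""
    model.eval()
    outs = []
    for s in scales:
        resized = F.interpolate(image, size=(s, s), mode="bilinear",
                                align_corners=False)
        outs.append(model(resized))
    return torch.stack(outs).mean(dim=0)

model = TinyClassifier()
img = torch.rand(1, 3, 384, 384)  # dummy RGB batch
logits = multiscale_logits(model, img)
print(logits.shape)  # torch.Size([1, 10])
```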

The VisionMaster-Pro-Lite architecture is optimized for edge deployment, but it shares the same feature-extraction configuration as the main VisionMaster-Pro.

### Input Configuration

We recommend the following preprocessing settings:

```python
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize(384),
    transforms.CenterCrop(384),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
```

### Inference Settings

We recommend the following inference settings for optimal performance:

- Batch size: 32 (adjust based on GPU memory)
- Mixed precision: FP16 for inference
- Image resolution: 384x384 for best accuracy
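
The settings above can be sketched roughly as follows. The model here is a stand-in, and FP16 autocast is used only when a GPU is available (CPU autocast in PyTorch typically uses bfloat16 instead):

```python
# Illustrative batched mixed-precision inference at the recommended settings.
import torch

model = torch.nn.Sequential(               # stand-in for the real model
    torch.nn.AdaptiveAvgPool2d(1),
    torch.nn.Flatten(),
    torch.nn.Linear(3, 1000),
)
model.eval()

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.bfloat16
model.to(device)

batch = torch.rand(32, 3, 384, 384, device=device)  # batch size 32 at 384x384

with torch.no_grad(), torch.autocast(device_type=device, dtype=dtype):
    logits = model(batch)

print(logits.shape)  # torch.Size([32, 1000])
```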

## 5. License

This code repository is licensed under the [Apache License 2.0](LICENSE). The use of VisionMaster-Pro models is also subject to the [Apache License 2.0](LICENSE). Commercial use is permitted.

## 6. Contact

If you have any questions, please raise an issue on our GitHub repository or contact us at vision@visionmaster.ai.
config.json
ADDED
@@ -0,0 +1,4 @@
{
  "model_type": "vit",
  "architectures": ["ViTForImageClassification"]
}
figures/fig1.png
ADDED

figures/fig2.png
ADDED

figures/fig3.png
ADDED

pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b01b18f56e422500a1fa1b2aee4af74268b7c0ca9bbb1d79d1dc7c06a13122ae
size 24