Improve model card and add metadata

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +38 -4
README.md CHANGED
@@ -1,10 +1,44 @@
  ---
  tags:
  - model_hub_mixin
  - pytorch_model_hub_mixin
  ---

- This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
- - Code: https://github.com/Intellindust-AI-Lab/EdgeCrafter
- - Paper: https://arxiv.org/abs/2603.18739
- - Docs: [More Information Needed]
 
  ---
+ license: apache-2.0
+ pipeline_tag: keypoint-detection
  tags:
  - model_hub_mixin
  - pytorch_model_hub_mixin
  ---

+ # EdgeCrafter: ECPose
+
+ This model is part of the **EdgeCrafter** framework, a unified compact Vision Transformer (ViT) framework for edge dense prediction tasks. Specifically, this checkpoint corresponds to an **ECPose** model, which is optimized for high-performance human pose estimation on resource-constrained edge devices.
+
+ - **Paper:** [EdgeCrafter: Compact ViTs for Edge Dense Prediction via Task-Specialized Distillation](https://huggingface.co/papers/2603.18739)
+ - **Project Page:** [EdgeCrafter Project Page](https://intellindust-ai-lab.github.io/projects/EdgeCrafter/)
+ - **Repository:** [GitHub - Intellindust-AI-Lab/EdgeCrafter](https://github.com/Intellindust-AI-Lab/EdgeCrafter)
+
+ ## Model Description
+ EdgeCrafter addresses the performance gap between compact Vision Transformers and CNN-based architectures such as YOLO on edge devices. By using task-specialized distillation and an edge-aware encoder-decoder design, ECPose models achieve a competitive accuracy-efficiency tradeoff. For example, ECPose-X reaches 74.8 AP on the COCO2017 validation set, significantly outperforming YOLO-based alternatives.
+
+ ## Evaluation Results (COCO2017 Validation)
+
+ | Model | Size | AP<sub>50:95</sub> | #Params | GFLOPs | Latency (ms) |
+ |:-----:|:----:|:--:|:-------:|:------:|:------------:|
+ | **ECPose-S** | 640 | 68.9 | 10M | 30 | 5.54 |
+ | **ECPose-M** | 640 | 72.4 | 20M | 63 | 9.25 |
+ | **ECPose-L** | 640 | 73.5 | 34M | 112 | 11.83 |
+ | **ECPose-X** | 640 | 74.8 | 51M | 172 | 14.31 |
+
+ *Note: Latency is measured on an NVIDIA T4 GPU with batch size 1 under FP16 precision using TensorRT (v10.6).*
+
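As a rough reading aid for the latency column, per-image latency at batch size 1 can be converted to an approximate throughput with `1000 / latency_ms`. The sketch below uses only the numbers from the table; it assumes back-to-back single-image inference and ignores pre- and post-processing, so the figures are upper bounds rather than measured frame rates.

```python
# Latency values (ms) copied from the evaluation table above.
latency_ms = {
    "ECPose-S": 5.54,
    "ECPose-M": 9.25,
    "ECPose-L": 11.83,
    "ECPose-X": 14.31,
}

# Approximate images per second at batch size 1: 1000 ms divided by per-image latency.
# Assumption: inference runs back to back with no other overhead.
fps = {name: 1000.0 / ms for name, ms in latency_ms.items()}

for name, value in fps.items():
    print(f"{name}: {value:.1f} images/s")
# ECPose-S: 180.5 images/s
# ECPose-M: 108.1 images/s
# ECPose-L: 84.5 images/s
# ECPose-X: 69.9 images/s
```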
+ ## Usage
+ This model has been pushed to the Hub using the [PyTorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration. For inference scripts and reproduction instructions, see the [official GitHub repository](https://github.com/Intellindust-AI-Lab/EdgeCrafter).
+
+ ## Citation
+ If you find this project useful in your research, please consider citing:
+
+ ```bibtex
+ @article{liu2026edgecrafter,
+ title={EdgeCrafter: Compact ViTs for Edge Dense Prediction via Task-Specialized Distillation},
+ author={Liu, Longfei and Hou, Yongjie and Li, Yang and Wang, Qirui and Sha, Youyang and Yu, Yongjun and Wang, Yinzhi and Ru, Peizhe and Yu, Xuanlong and Shen, Xi},
+ journal={arXiv},
+ year={2026}
+ }
+ ```