Intellindust
/

ECPose_S

@@ -1,10 +1,44 @@
 ---
 tags:
 - model_hub_mixin
 - pytorch_model_hub_mixin
 ---
-This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
-- Code: https://github.com/Intellindust-AI-Lab/EdgeCrafter
-- Paper: https://arxiv.org/abs/2603.18739
-- Docs: [More Information Needed]

 ---
+license: apache-2.0
+pipeline_tag: keypoint-detection
 tags:
 - model_hub_mixin
 - pytorch_model_hub_mixin
 ---
+# EdgeCrafter: ECPose
+This checkpoint belongs to **EdgeCrafter**, a unified family of compact Vision Transformers (ViTs) for dense prediction on edge devices. It is an **ECPose** model, the part of that family built for human pose estimation on resource-constrained hardware.
+- **Paper:** [EdgeCrafter: Compact ViTs for Edge Dense Prediction via Task-Specialized Distillation](https://huggingface.co/papers/2603.18739)
+- **Project Page:** [EdgeCrafter Project Page](https://intellindust-ai-lab.github.io/projects/EdgeCrafter/)
+- **Repository:** [GitHub - Intellindust-AI-Lab/EdgeCrafter](https://github.com/Intellindust-AI-Lab/EdgeCrafter)
+## Model Description
+EdgeCrafter targets the performance gap between compact Vision Transformers and CNN-based architectures (such as YOLO) on edge devices. It combines task-specialized distillation with an edge-aware encoder-decoder design to improve the accuracy-efficiency tradeoff of ECPose models. For example, ECPose-X reaches 74.8 AP on COCO2017 validation, which the authors report as outperforming YOLO-based alternatives.
+## Evaluation Results (COCO2017 Validation)
+| Model | Input size (px) | AP<sub>50:95</sub> | #Params | GFLOPs | Latency (ms) |
+|:-----:|:----:|:--:|:-------:|:------:|:------------:|
+| **ECPose-S** | 640 | 68.9 |  10M | 30 | 5.54 |
+| **ECPose-M** | 640 | 72.4 |  20M | 63 | 9.25 |
+| **ECPose-L** | 640 | 73.5 |  34M | 112 | 11.83 |
+| **ECPose-X** | 640 | 74.8 |  51M | 172 | 14.31 |
+*Note: Latency is measured on an NVIDIA T4 GPU with batch size 1 under FP16 precision using TensorRT (v10.6).*
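As a rough illustration of how the table might be used, the sketch below converts the reported batch-size-1 latencies into an idealized frames-per-second figure and picks the most accurate variant that fits a latency budget. The numbers are copied from the table; the names `VARIANTS`, `throughput_fps`, and `best_under_budget` are hypothetical helpers for this example, and the throughput estimate ignores pre-processing, post-processing, and data transfer.

```python
from typing import Optional

# Figures copied from the evaluation table (COCO2017 val; T4, FP16, TensorRT).
VARIANTS = {
    "ECPose-S": {"ap": 68.9, "params_m": 10, "gflops": 30, "latency_ms": 5.54},
    "ECPose-M": {"ap": 72.4, "params_m": 20, "gflops": 63, "latency_ms": 9.25},
    "ECPose-L": {"ap": 73.5, "params_m": 34, "gflops": 112, "latency_ms": 11.83},
    "ECPose-X": {"ap": 74.8, "params_m": 51, "gflops": 172, "latency_ms": 14.31},
}


def throughput_fps(latency_ms: float) -> float:
    """Idealized frames per second at batch size 1 (model time only)."""
    return 1000.0 / latency_ms


def best_under_budget(budget_ms: float) -> Optional[str]:
    """Return the highest-AP variant whose latency fits the budget, or None."""
    fitting = {name: v for name, v in VARIANTS.items() if v["latency_ms"] <= budget_ms}
    if not fitting:
        return None
    return max(fitting, key=lambda name: fitting[name]["ap"])


if __name__ == "__main__":
    for name, stats in VARIANTS.items():
        print(f"{name}: {throughput_fps(stats['latency_ms']):.1f} FPS (model only)")
    print("Best under 10 ms:", best_under_budget(10.0))
```

Latency on other hardware or runtimes will differ, so the budget check is only meaningful for a setup comparable to the one in the note above.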
+## Usage
+This model has been pushed to the Hub using the [PyTorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration. For inference scripts and reproduction instructions, see the [official GitHub repository](https://github.com/Intellindust-AI-Lab/EdgeCrafter).
+## Citation
+If you find this project useful in your research, please consider citing:
+```bibtex
+@article{liu2026edgecrafter,
+  title={EdgeCrafter: Compact ViTs for Edge Dense Prediction via Task-Specialized Distillation},
+  author={Liu, Longfei and Hou, Yongjie and Li, Yang and Wang, Qirui and Sha, Youyang and Yu, Yongjun and Wang, Yinzhi and Ru, Peizhe and Yu, Xuanlong and Shen, Xi},
+  journal={arXiv},
+  year={2026}
+}
+```