TopFormer introduces a lightweight token–pyramid transformer that progressively merges local and global representations, achieving strong accuracy–efficiency trade-offs for mobile and edge vision tasks.

Original paper: TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation

TopFormer-B

This model uses the TopFormer-Base variant, which balances representational capacity and computational efficiency through a hierarchical token pyramid. The weights referenced here were trained on ADE20K at 512x512, so the model is well suited to on-device semantic segmentation and to use as an efficient backbone for downstream tasks where low latency and power efficiency are critical.
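The token pyramid idea can be illustrated with a small NumPy sketch: feature maps from several scales are average-pooled to one small spatial size and concatenated along the channel axis, giving a compact token set that a transformer can process cheaply. This is not the reference implementation. The channel widths, strides, and 8x8 pooled size below are illustrative assumptions, and the helper names `avg_pool_to` and `build_pyramid_tokens` are hypothetical.

```python
import numpy as np

def avg_pool_to(feat, out_hw):
    """Average-pool a (C, H, W) map down to (C, out_h, out_w).

    Assumes H and W are integer multiples of the target size.
    """
    c, h, w = feat.shape
    out_h, out_w = out_hw
    if h % out_h != 0 or w % out_w != 0:
        raise ValueError("input size must be a multiple of the target size")
    fh, fw = h // out_h, w // out_w
    # Split each spatial axis into (blocks, block_size) and average per block.
    return feat.reshape(c, out_h, fh, out_w, fw).mean(axis=(2, 4))

def build_pyramid_tokens(feats, out_hw):
    """Pool every scale to the same size, then concatenate on channels."""
    pooled = [avg_pool_to(f, out_hw) for f in feats]
    return np.concatenate(pooled, axis=0)

# Illustrative multi-scale features for a 512x512 input (assumed values).
rng = np.random.default_rng(0)
channels = [32, 64, 128, 160]   # assumed per-scale channel widths
strides = [4, 8, 16, 32]        # assumed downsampling factors
feats = [
    rng.standard_normal((c, 512 // s, 512 // s)).astype(np.float32)
    for c, s in zip(channels, strides)
]

tokens = build_pyramid_tokens(feats, (8, 8))
print(tokens.shape)  # (384, 8, 8)
```

Pooling every scale to the same small grid keeps the token count fixed (64 tokens here) regardless of how many scales contribute, which is where most of the compute saving comes from in this kind of design.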

Model Configuration:

Reference implementation: TopFormer
Original Weight: TopFormer-B_512x512_4x8_160k
Resolution: 3x512x512
Dataset: ADE20K
Supported Cooper versions:
- Cooper SDK: [2.5.4]
- Cooper Foundry: [2.3]
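As a rough illustration of what the 3x512x512 input and the ADE20K label space imply for application code, the sketch below prepares an input tensor and turns per-class scores into a label map. It deliberately does not call any Cooper SDK API. The normalization constants, the 64x64 output size, and the helper names `preprocess` and `postprocess` are assumptions for illustration; a compiled model may already include normalization or produce a different output layout.

```python
import numpy as np

INPUT_H, INPUT_W = 512, 512   # from the model card: 3x512x512
NUM_CLASSES = 150             # ADE20K scene-parsing label count

# Assumed normalization: commonly used ImageNet RGB mean/std in 0-255 scale.
MEAN = np.array([123.675, 116.28, 103.53], dtype=np.float32)
STD = np.array([58.395, 57.12, 57.375], dtype=np.float32)

def preprocess(image_hwc_uint8):
    """Convert a 512x512x3 uint8 RGB image to a 1x3x512x512 float32 tensor."""
    if image_hwc_uint8.shape != (INPUT_H, INPUT_W, 3):
        raise ValueError("expected an image already resized to 512x512x3")
    x = (image_hwc_uint8.astype(np.float32) - MEAN) / STD
    # HWC -> CHW, then add the batch dimension.
    return np.transpose(x, (2, 0, 1))[None, ...]

def postprocess(logits):
    """Reduce (1, C, h, w) class scores to an (h, w) map of class indices."""
    return np.argmax(logits[0], axis=0)

# Stand-in data: a blank image and random scores in place of real model output.
image = np.zeros((INPUT_H, INPUT_W, 3), dtype=np.uint8)
x = preprocess(image)

rng = np.random.default_rng(0)
fake_logits = rng.standard_normal((1, NUM_CLASSES, 64, 64)).astype(np.float32)  # assumed output size
mask = postprocess(fake_logits)

print(x.shape, mask.shape)
```

If the runtime returns scores at reduced resolution, the label map would typically be upsampled to the original image size before display.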

| Model | Device | Compression | Model Link |
|---|---|---|---|
| TopFormer-B | N1-655 | Amba_optimized | Model_Link |
| TopFormer-B | N1-655 | Activation_fp16 | Model_Link |
| TopFormer-B | CV7 | Amba_optimized | Model_Link |
| TopFormer-B | CV7 | Activation_fp16 | Model_Link |
| TopFormer-B | CV72 | Amba_optimized | Model_Link |
| TopFormer-B | CV72 | Activation_fp16 | Model_Link |
| TopFormer-B | CV75 | Amba_optimized | Model_Link |
| TopFormer-B | CV75 | Activation_fp16 | Model_Link |

Paper: TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation (arXiv 2204.05525, published Apr 12, 2022)