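ConvMixer is a simple vision architecture that operates directly on patch embeddings, using large-kernel depthwise convolutions for spatial mixing and pointwise (1×1) convolutions for channel mixing. The original paper reports accuracy competitive with Vision Transformers and MLP-Mixer at similar parameter counts, while remaining very simple to implement.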
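
The two mixing operations can be illustrated with a toy, framework-free sketch. This is not the reference implementation: it uses nested Python lists, omits residual connections, activations, and normalization, and the function names `depthwise_conv` and `pointwise_conv` are made up for illustration.

```python
# Toy illustration only: nested lists stand in for a [C][H][W] feature map.

def depthwise_conv(x, kernels):
    """Spatial mixing: each channel is filtered by its own k x k kernel.

    x: [C][H][W]; kernels: [C][k][k] with odd k; zero 'same' padding.
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    k = len(kernels[0])
    r = k // 2
    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for c in range(C):
        for i in range(H):
            for j in range(W):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - r, j + dj - r
                        if 0 <= ii < H and 0 <= jj < W:
                            acc += kernels[c][di][dj] * x[c][ii][jj]
                out[c][i][j] = acc
    return out


def pointwise_conv(x, weight):
    """Channel mixing: a 1x1 conv applies the same C_out x C_in matrix at every pixel."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    return [[[sum(weight[o][c] * x[c][i][j] for c in range(C))
              for j in range(W)] for i in range(H)]
            for o in range(len(weight))]


# Two channels, 3x3 image: a single 1.0 at the centre of channel 0.
x = [[[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]],
     [[0.0] * 3 for _ in range(3)]]
box = [[1.0] * 3 for _ in range(3)]  # 3x3 all-ones kernel

spatial = depthwise_conv(x, [box, box])              # spreads within channel 0 only
mixed = pointwise_conv(x, [[1.0, 0.0], [1.0, 0.0]])  # copies channel 0 into both outputs
```

The depthwise step spreads the centre value across positions but never across channels; the pointwise step moves information between channels but never between positions. A ConvMixer block alternates the two.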

Original paper: Patches Are All You Need? ConvMixer

ConvMixer-768/32

This model uses the ConvMixer-768/32 variant, where 768 is the hidden dimension (number of feature channels) and 32 is the depth (number of ConvMixer blocks). As the weight name indicates (ks7, p7, relu), it uses 7×7 depthwise kernels, a patch size of 7, and ReLU activations, so a 224×224 input becomes a 32×32 grid of patch embeddings. It is well suited to image classification tasks where simplicity, speed, and accuracy all matter, and can serve as a lightweight backbone for research or prototyping.
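
As a rough sanity check on the model size, the sketch below counts learnable parameters for this configuration. It assumes the layer layout of the published ConvMixer code (a strided patch-embedding convolution, then per block a depthwise and a pointwise convolution, each followed by an activation and BatchNorm, and a linear classifier), reads `ks7` and `p7` in the weight name as kernel size 7 and patch size 7, and assumes a 1000-class head. The helper name `convmixer_param_count` is hypothetical.

```python
def convmixer_param_count(dim, depth, kernel_size, patch_size,
                          in_chans=3, num_classes=1000):
    """Count learnable parameters (conv biases and BatchNorm affine terms included)."""
    bn = 2 * dim                                              # BatchNorm weight + bias
    patch_embed = in_chans * dim * patch_size ** 2 + dim + bn  # strided patch conv
    depthwise = dim * kernel_size ** 2 + dim + bn              # one k x k kernel per channel
    pointwise = dim * dim + dim + bn                           # 1x1 conv across channels
    head = dim * num_classes + num_classes                     # linear classifier (assumed 1000 classes)
    return patch_embed + depth * (depthwise + pointwise) + head


grid = 224 // 7                                  # patch-embedding grid side for a 224x224 input
total = convmixer_param_count(768, 32, 7, 7)
print(grid, total)                               # 32 21110248
```

Under these assumptions the total is about 21.1M parameters, nearly all of it in the 32 pointwise convolutions.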

Model Configuration:

Reference implementation: Official ConvMixer source code
Original weights: Convmixer_768_32_ks7_p7_relu
Input resolution: 3x224x224 (channels x height x width)
Supported Cooper versions:
- Cooper SDK: [2.5.4]
- Cooper Foundry: [2.3]

Model	Device	Compression	Model Link
ConvMixer-768/32	N1-655	Amba_optimized	Model_Link
ConvMixer-768/32	N1-655	Activation_fp16	Model_Link
ConvMixer-768/32	CV7	Amba_optimized	Model_Link
ConvMixer-768/32	CV7	Activation_fp16	Model_Link
ConvMixer-768/32	CV72	Amba_optimized	Model_Link
ConvMixer-768/32	CV72	Activation_fp16	Model_Link
ConvMixer-768/32	CV75	Amba_optimized	Model_Link
ConvMixer-768/32	CV75	Activation_fp16	Model_Link

Paper: Patches Are All You Need? (arXiv:2201.09792, published Jan 24, 2022)