Release AI-ModelZoo-4.0.0
README.md CHANGED
@@ -83,29 +83,29 @@ For Yamnet-1024
* `tl` stands for "transfer learning", meaning that the model backbone weights were initialized from a pre-trained model, then only the last layer was unfrozen during the training.

### Reference **NPU** memory footprint based on ESC-10 dataset
-|Model | Dataset | Format | Resolution | Series | Internal RAM (KiB) | External RAM (KiB) | Weights Flash (KiB) |
-
-| [Yamnet 256](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/
-| [Yamnet 1024](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/

### Reference **NPU** inference time based on ESC-10 dataset
-| Model | Dataset | Format | Resolution | Board | Execution Engine | Inference time (ms) | Inf / sec |
-
-| [Yamnet 256](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/
-| [Yamnet 1024](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/

### Reference **MCU** memory footprint based on ESC-10 dataset
-| Model | Format | Resolution | Series | Activation RAM (kB) | Runtime RAM (kB) | Weights Flash (kB) | Code Flash (kB) | Total RAM (kB) | Total Flash (kB) |
|-------------------|--------|------------|---------|----------------|-------------|---------------|------------|-------------|-------------|-----------------------|
-|[Yamnet 256](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/
-|[Yamnet 1024](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/

### Reference inference time based on ESC-10 dataset
-| Model | Format | Resolution | Board | Execution Engine | Frequency | Inference time |
|-------------------|--------|------------|------------------|------------------|--------------|-----------------|-----------------------|
-| [Yamnet 256](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/
-|[Yamnet 1024](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/

### Accuracy with ESC-10 dataset
@@ -116,10 +116,10 @@ The reason this metric is used instead of patch-level accuracy is because patch-
| Model | Format | Resolution | Clip-level Accuracy |
|-------|--------|------------|----------------|
-| [Yamnet 256](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/
-| [Yamnet 256](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/
-| [Yamnet 1024](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/
-| [Yamnet 1024](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/

@@ -137,11 +137,12 @@ However, contrary to what the numbers might suggest online performance on device
Note that accuracy with unknown class is lower. This is normal
| Model | Format | Resolution | Clip-level Accuracy |
|-------|--------|------------|----------------|
-| [Yamnet 256 without unknown class](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/
-| [Yamnet 256 without unknown class](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/
-| [Yamnet 256 with unknown class](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/
-| [Yamnet 256 with unknown class](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/

## Retraining and Integration in a simple example:

-Please refer to the stm32ai-modelzoo-services GitHub [here](https://github.com/STMicroelectronics/stm32ai-modelzoo-services)

* `tl` stands for "transfer learning", meaning that the model backbone weights were initialized from a pre-trained model, then only the last layer was unfrozen during the training.

### Reference **NPU** memory footprint based on ESC-10 dataset
+|Model | Dataset | Format | Resolution | Series | Internal RAM (KiB) | External RAM (KiB) | Weights Flash (KiB) | STEdgeAI Core version |
+|----------|------------------|--------|-------------|------------------|------------------|---------------------|-------|-------------------------|
+| [Yamnet 256](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/esc10/yamnet_e256_64x96_tl/yamnet_e256_64x96_tl_int8.tflite) | esc-10 | Int8 | 64x96x1 | STM32N6 | 144 | 0.0 | 137.33 | 3.0.0 |
+| [Yamnet 1024](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/esc10/yamnet_e1024_64x96_tl/yamnet_e1024_64x96_tl_qdq_int8.onnx) | esc-10 | Int8 | 64x96x1 | STM32N6 | 144 | 0.0 | 3159.2 | 3.0.0 |

### Reference **NPU** inference time based on ESC-10 dataset
+| Model | Dataset | Format | Resolution | Board | Execution Engine | Inference time (ms) | Inf / sec | STEdgeAI Core version |
+|--------|------------------|--------|-------------|------------------|------------------|---------------------|-------|-------------------------|
+| [Yamnet 256](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/esc10/yamnet_e256_64x96_tl/yamnet_e256_64x96_tl_int8.tflite) | esc-10 | Int8 | 64x96x1 | STM32N6570-DK | NPU/MCU | 0.93 | 1075.27 | 3.0.0 |
+| [Yamnet 1024](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/esc10/yamnet_e1024_64x96_tl/yamnet_e1024_64x96_tl_qdq_int8.onnx) | esc-10 | Int8 | 64x96x1 | STM32N6570-DK | NPU/MCU | 9.12 | 109.64 | 3.0.0 |

### Reference **MCU** memory footprint based on ESC-10 dataset
+| Model | Format | Resolution | Series | Activation RAM (kB) | Runtime RAM (kB) | Weights Flash (kB) | Code Flash (kB) | Total RAM (kB) | Total Flash (kB) | STEdgeAI Core version |
|-------------------|--------|------------|---------|----------------|-------------|---------------|------------|-------------|-------------|-----------------------|
+|[Yamnet 256](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/esc10/yamnet_e256_64x96_tl/yamnet_e256_64x96_tl_int8.tflite) | Int8 | 64x96x1 | B-U585I-IOT02A | 109.57 | 0.99 | 135.91 | 31.19 | 110.56 | 167.1 | 3.0.0 |
+|[Yamnet 1024](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/esc10/yamnet_e1024_64x96_tl/yamnet_e1024_64x96_tl_qdq_int8.onnx) | Int8 | 64x96x1 | STM32N6 | 144.0 | 1.77 | 3159.2 | 184.74 | 145.77 | 3343.94 | 3.0.0 |

### Reference inference time based on ESC-10 dataset
+| Model | Format | Resolution | Board | Execution Engine | Frequency | Inference time | STEdgeAI Core version |
|-------------------|--------|------------|------------------|------------------|--------------|-----------------|-----------------------|
+| [Yamnet 256](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/esc10/yamnet_e256_64x96_tl/yamnet_e256_64x96_tl_int8.tflite) | Int8 | 64x96x1 | B-U585I-IOT02A | 1 CPU | 160 MHz | 279.99 ms | 3.0.0 |
+|[Yamnet 1024](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/esc10/yamnet_e1024_64x96_tl/yamnet_e1024_64x96_tl_qdq_int8.onnx) | Int8 | 64x96x1 | STM32N6 | 1 CPU + 1 NPU | 800 MHz / 1000 MHz | 9.12 ms | 3.0.0 |

### Accuracy with ESC-10 dataset
| Model | Format | Resolution | Clip-level Accuracy |
|-------|--------|------------|----------------|
+| [Yamnet 256](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/esc10/yamnet_e256_64x96_tl/yamnet_e256_64x96_tl.keras) | float32 | 64x96x1 | 94.9% |
+| [Yamnet 256](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/esc10/yamnet_e256_64x96_tl/yamnet_e256_64x96_tl_int8.tflite) | int8 | 64x96x1 | 94.9% |
+| [Yamnet 1024](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/esc10/yamnet_e1024_64x96_tl/yamnet_e1024_64x96_tl.keras) | float32 | 64x96x1 | 100.0% |
+| [Yamnet 1024](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/esc10/yamnet_e1024_64x96_tl/yamnet_e1024_64x96_tl_qdq_int8.onnx) | int8 | 64x96x1 | 100.0% |

Note that accuracy with unknown class is lower. This is normal
| Model | Format | Resolution | Clip-level Accuracy |
|-------|--------|------------|----------------|
+| [Yamnet 256 without unknown class](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/fsd50k/yamnet_e256_64x96_tl/without_unknown_class/yamnet_e256_64x96_tl.keras) | float32 | 64x96x1 | 86.0% |
+| [Yamnet 256 without unknown class](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/fsd50k/yamnet_e256_64x96_tl/without_unknown_class/yamnet_e256_64x96_tl_int8.tflite) | int8 | 64x96x1 | 87.0% |
+| [Yamnet 256 with unknown class](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/fsd50k/yamnet_e256_64x96_tl/with_unknown_class/yamnet_e256_64x96_tl.keras) | float32 | 64x96x1 | 73.0% |
+| [Yamnet 256 with unknown class](https://github.com/STMicroelectronics/stm32ai-modelzoo/tree/main/audio_event_detection/yamnet/fsd50k/yamnet_e256_64x96_tl/with_unknown_class/yamnet_e256_64x96_tl_int8.tflite) | int8 | 64x96x1 | 73.9% |

## Retraining and Integration in a simple example:

+Please refer to the stm32ai-modelzoo-services GitHub [here](https://github.com/STMicroelectronics/stm32ai-modelzoo-services)
+
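
The derived columns in the added NPU inference-time and MCU memory rows can be cross-checked with a few lines of Python. This is only a sketch under stated assumptions: it assumes `Inf / sec` is `1000 / latency_ms`, that `Total RAM` is activation plus runtime RAM, and that `Total Flash` is weights plus code flash. The README does not state these formulas, and the helper names `inferences_per_second` and `check_tables` are hypothetical. The numbers are copied from the rows above, and the tolerance absorbs last-digit rounding (for example, 1000 / 9.12 rounds to 109.65 while the table lists 109.64).

```python
def inferences_per_second(latency_ms: float) -> float:
    """Throughput implied by a single-inference latency in milliseconds (assumed formula)."""
    return 1000.0 / latency_ms


# NPU inference-time rows: (latency in ms, published Inf / sec)
NPU_ROWS = {
    "Yamnet 256": (0.93, 1075.27),
    "Yamnet 1024": (9.12, 109.64),
}

# MCU memory rows: (activation RAM, runtime RAM, weights flash, code flash,
#                   published total RAM, published total flash)
MCU_ROWS = {
    "Yamnet 256": (109.57, 0.99, 135.91, 31.19, 110.56, 167.1),
    "Yamnet 1024": (144.0, 1.77, 3159.2, 184.74, 145.77, 3343.94),
}


def check_tables(tolerance: float = 0.02) -> bool:
    """Return True if every derived value matches its published value within tolerance."""
    ok = True
    for name, (latency_ms, published) in NPU_ROWS.items():
        computed = inferences_per_second(latency_ms)
        ok = ok and abs(computed - published) <= tolerance
        print(f"{name}: {computed:.2f} inf/s (published {published})")
    for name, (act, runtime, weights, code, total_ram, total_flash) in MCU_ROWS.items():
        # Assumed relations: totals are plain sums of their two components.
        ok = ok and abs((act + runtime) - total_ram) <= tolerance
        ok = ok and abs((weights + code) - total_flash) <= tolerance
        print(f"{name}: RAM {act + runtime:.2f}, Flash {weights + code:.2f}")
    return ok


if __name__ == "__main__":
    print("tables consistent:", check_tables())
```

If a future release changes how these columns are computed, only the assumed formulas in this sketch need updating.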