| <meta charset="utf-8" /><meta name="hf:doc:metadata" content="{"title":"Introduction to model optimization for deployment","local":"introduction-to-model-optimization-for-deployment","sections":[{"title":"What is model optimization?","local":"what-is-model-optimization","sections":[],"depth":2},{"title":"Why is it important for deployment in computer vision?","local":"why-is-it-important-for-deployment-in-computer-vision","sections":[],"depth":2},{"title":"Different types of model optimization techniques","local":"different-types-of-model-optimization-techniques","sections":[],"depth":2},{"title":"Trade-offs between accuracy, performance, and resource usage","local":"trade-offs-between-accuracy-performance-and-resource-usage","sections":[],"depth":2}],"depth":1}"> | |
| <link href="/docs/computer-vision-course/pr_397/en/_app/immutable/assets/0.e3b0c442.css" rel="modulepreload"> | |
| <link rel="modulepreload" href="/docs/computer-vision-course/pr_397/en/_app/immutable/entry/start.7f209408.js"> | |
| <link rel="modulepreload" href="/docs/computer-vision-course/pr_397/en/_app/immutable/chunks/scheduler.7bc62968.js"> | |
| <link rel="modulepreload" href="/docs/computer-vision-course/pr_397/en/_app/immutable/chunks/singletons.b15acae1.js"> | |
| <link rel="modulepreload" href="/docs/computer-vision-course/pr_397/en/_app/immutable/chunks/paths.11cdc4b4.js"> | |
| <link rel="modulepreload" href="/docs/computer-vision-course/pr_397/en/_app/immutable/entry/app.32e8338e.js"> | |
| <link rel="modulepreload" href="/docs/computer-vision-course/pr_397/en/_app/immutable/chunks/index.2f8492b0.js"> | |
| <link rel="modulepreload" href="/docs/computer-vision-course/pr_397/en/_app/immutable/nodes/0.e37092e8.js"> | |
| <link rel="modulepreload" href="/docs/computer-vision-course/pr_397/en/_app/immutable/nodes/90.3defdabc.js"> | |
| <link rel="modulepreload" href="/docs/computer-vision-course/pr_397/en/_app/immutable/chunks/index.514d62da.js"><!-- HEAD_svelte-u9bgzb_START --><meta name="hf:doc:metadata" content="{"title":"Introduction to model optimization for deployment","local":"introduction-to-model-optimization-for-deployment","sections":[{"title":"What is model optimization?","local":"what-is-model-optimization","sections":[],"depth":2},{"title":"Why is it important for deployment in computer vision?","local":"why-is-it-important-for-deployment-in-computer-vision","sections":[],"depth":2},{"title":"Different types of model optimization techniques","local":"different-types-of-model-optimization-techniques","sections":[],"depth":2},{"title":"Trade-offs between accuracy, performance, and resource usage","local":"trade-offs-between-accuracy-performance-and-resource-usage","sections":[],"depth":2}],"depth":1}"><!-- HEAD_svelte-u9bgzb_END --> <p></p> <h1 class="relative group"><a id="introduction-to-model-optimization-for-deployment" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#introduction-to-model-optimization-for-deployment"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" 
fill="currentColor"></path></svg></span></a> <span>Introduction to model optimization for deployment</span></h1> <p data-svelte-h="svelte-xlukum">Have you ever wondered what to do after training a model? If so, this chapter will help. Typically, the step after training a computer vision model is deploying it so that others can use it. However, once a model is deployed in production, problems often arise: the model may be too large, predictions may take too long, or the device may not have enough memory. These problems occur because models are usually deployed on hardware that is less powerful than the hardware used for training. To overcome them, we can add an extra stage before deployment: model optimization.</p> <h2 class="relative group"><a id="what-is-model-optimization" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#what-is-model-optimization"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>What is model optimization?</span></h2> <p data-svelte-h="svelte-anjnz9">Model optimization is the process of modifying a model we
trained to make it more efficient. These modifications are necessary because, in most cases, the hardware used for inference differs greatly from the hardware used for training and is usually less powerful. For example, we might train a model on high-performance GPUs and then run inference on edge devices (e.g., microcomputers, mobile devices, or IoT devices), which have far less compute and memory. Model optimization lets the model run smoothly on these lower-spec devices.</p> <h2 class="relative group"><a id="why-is-it-important-for-deployment-in-computer-vision" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#why-is-it-important-for-deployment-in-computer-vision"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Why is it important for deployment in computer vision?</span></h2> <p data-svelte-h="svelte-z5ucfa">As mentioned above, optimizing a model before deployment is important. But why?
Several factors make optimization an essential step before deployment:</p> <ol data-svelte-h="svelte-wjwe0m"><li>Resource limitations: Computer vision models often require substantial computational resources such as memory, CPU, and GPU. This becomes a problem when deploying on resource-constrained devices such as mobile phones, embedded systems, or edge devices. Optimization techniques can reduce a model’s size and computational cost, making it deployable on those platforms.</li> <li>Latency requirements: Many computer vision applications, such as self-driving cars and augmented reality, require real-time responses, so the model must process data and produce results quickly. Optimization can significantly increase inference speed and help the model meet latency constraints.</li> <li>Power consumption: Battery-powered devices, such as drones and wearables, need models that use power efficiently. Optimization techniques can also reduce battery drain, which large models tend to increase.</li> <li>Hardware compatibility: Different hardware has its own capabilities and limitations, and several optimization techniques target specific hardware.
Applying them helps a model work within the limits of its target hardware.</li></ol> <h2 class="relative group"><a id="different-types-of-model-optimization-techniques" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#different-types-of-model-optimization-techniques"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Different types of model optimization techniques</span></h2> <p data-svelte-h="svelte-1c7g3k3">There are several model optimization techniques, which are explained in detail in the following sections. This section briefly describes the main types:</p> <ol data-svelte-h="svelte-1h6dvc6"><li>Pruning: Pruning removes redundant or unimportant connections from the model.
This reduces the model’s size and complexity.</li></ol> <p data-svelte-h="svelte-1sw8yi0"><img src="https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/pruning.png" alt="Pruning"></p> <ol start="2" data-svelte-h="svelte-z17sbv"><li>Quantization: Quantization converts model weights from high-precision formats (e.g., 32-bit floating-point) to lower-precision formats (e.g., 16-bit floating-point or 8-bit integers), reducing the memory footprint and increasing inference speed.</li> <li>Knowledge Distillation: Knowledge distillation transfers knowledge from a larger, more complex model (the teacher model) to a smaller model (the student model), which is trained to mimic the teacher’s behavior.</li></ol> <p data-svelte-h="svelte-hoszh3"><img src="https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/knowledge_distillation.png" alt="Knowledge Distillation"></p> <ol start="4" data-svelte-h="svelte-1dr81vc"><li>Low-rank approximation: Approximates a large weight matrix with the product of smaller, lower-rank matrices, reducing memory consumption and computational cost.</li> <li>Model compression with hardware accelerators: This process is similar to pruning and quantization.
However, it is tailored to run on specific hardware, such as NVIDIA GPUs and Intel hardware.</li></ol> <h2 class="relative group"><a id="trade-offs-between-accuracy-performance-and-resource-usage" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#trade-offs-between-accuracy-performance-and-resource-usage"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Trade-offs between accuracy, performance, and resource usage</span></h2> <p data-svelte-h="svelte-1a584pj">When deploying a model, there is a trade-off between accuracy, performance, and resource usage, so we have to decide which of them to prioritize for the use case at hand.</p> <ol data-svelte-h="svelte-1pdvi3f"><li>Accuracy is the model’s ability to make correct predictions. High accuracy is desirable in every application, but it usually comes with slower inference and higher resource usage: complex, high-accuracy models typically require a lot of memory, which limits their deployment on resource-constrained devices.</li> <li>Performance is the model’s speed and efficiency (e.g., latency). Good performance lets the model make predictions quickly, even in real time.
However, optimizing for performance often comes at the cost of some accuracy.</li> <li>Resource usage refers to the computational resources needed to run inference, such as CPU, memory, and storage. Efficient resource usage is crucial when deploying models on constrained devices such as smartphones or IoT devices.</li></ol> <p data-svelte-h="svelte-178a9zv">The images below compare common computer vision models in terms of model size, accuracy, and latency. Bigger models tend to be more accurate, but they need more time for inference and have larger file sizes.</p> <p data-svelte-h="svelte-udhmkd"><img src="https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/model_size_vs_accuracy.png" alt="Model Size VS Accuracy"></p> <p data-svelte-h="svelte-1amv05q"><img src="https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/accuracy_vs_latency.png" alt="Accuracy VS Latency"></p> <p data-svelte-h="svelte-hj2tz9">These are the three factors we must weigh when deciding what to prioritize for a trained model. For example, prioritizing high accuracy can lead to slower inference or require extensive resources. To address this, we apply one or more of the optimization techniques described above so that the resulting model strikes the right balance between these three factors.</p> <p></p> | |
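<p>To make the accuracy-versus-resource trade-off concrete, here is a minimal sketch of one of the techniques listed above: quantizing 32-bit floating-point weights to 8-bit integers. It assumes symmetric, per-tensor quantization (a single scale factor maps the largest absolute weight to 127), which is only one of several schemes in use, and the weight values are hypothetical. It uses only the Python standard library and does not reproduce the quantization API of any particular framework.</p>

```python
import array


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: approximate each w by scale * q."""
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude to 127 so every q fits in a signed byte.
    scale = max_abs / 127 if max_abs else 1.0
    q = array.array("b", (round(w / scale) for w in weights))
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float32 weights from int8 values and the shared scale."""
    return array.array("f", (v * scale for v in q))


# Hypothetical float32 weights standing in for one layer of a trained model.
weights = array.array("f", [0.42, -1.27, 0.003, 0.88, -0.56, 1.05, -0.31, 0.0])

q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Storage cost: 4 bytes per float32 weight vs. 1 byte per int8 weight.
fp32_bytes = len(weights) * weights.itemsize
int8_bytes = len(q) * q.itemsize
# Accuracy cost: the largest per-weight rounding error introduced.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"float32: {fp32_bytes} bytes, int8: {int8_bytes} bytes")
print(f"max reconstruction error: {max_error:.5f}")
```

<p>Each weight now occupies one byte instead of four, so this tensor’s storage shrinks fourfold (plus one stored scale value), while each weight picks up a rounding error of at most half the scale. Whether that error noticeably affects accuracy has to be measured on real evaluation data, which is exactly the trade-off this section describes.</p>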
| <script> | |
| { | |
| __sveltekit_1p6gie1 = { | |
| assets: "/docs/computer-vision-course/pr_397/en", | |
| base: "/docs/computer-vision-course/pr_397/en", | |
| env: {} | |
| }; | |
| const element = document.currentScript.parentElement; | |
| const data = [null,null]; | |
| Promise.all([ | |
| import("/docs/computer-vision-course/pr_397/en/_app/immutable/entry/start.7f209408.js"), | |
| import("/docs/computer-vision-course/pr_397/en/_app/immutable/entry/app.32e8338e.js") | |
| ]).then(([kit, app]) => { | |
| kit.start(app, element, { | |
| node_ids: [0, 90], | |
| data, | |
| form: null, | |
| error: null | |
| }); | |
| }); | |
| } | |
| </script> | |