| <meta charset="utf-8" /><meta name="hf:doc:metadata" content="{"title":"Transfer Learning of Multimodal Models","local":"transfer-learning-of-multimodal-models","sections":[{"title":"Transfer learning","local":"transfer-learning","sections":[],"depth":2},{"title":"Transfer Learning Applications","local":"transfer-learning-applications","sections":[],"depth":2}],"depth":1}"> | |
| <link href="/docs/computer-vision-course/pr_397/en/_app/immutable/assets/0.e3b0c442.css" rel="modulepreload"> | |
| <link rel="modulepreload" href="/docs/computer-vision-course/pr_397/en/_app/immutable/entry/start.7f209408.js"> | |
| <link rel="modulepreload" href="/docs/computer-vision-course/pr_397/en/_app/immutable/chunks/scheduler.7bc62968.js"> | |
| <link rel="modulepreload" href="/docs/computer-vision-course/pr_397/en/_app/immutable/chunks/singletons.b15acae1.js"> | |
| <link rel="modulepreload" href="/docs/computer-vision-course/pr_397/en/_app/immutable/chunks/paths.11cdc4b4.js"> | |
| <link rel="modulepreload" href="/docs/computer-vision-course/pr_397/en/_app/immutable/entry/app.32e8338e.js"> | |
| <link rel="modulepreload" href="/docs/computer-vision-course/pr_397/en/_app/immutable/chunks/index.2f8492b0.js"> | |
| <link rel="modulepreload" href="/docs/computer-vision-course/pr_397/en/_app/immutable/nodes/0.e37092e8.js"> | |
| <link rel="modulepreload" href="/docs/computer-vision-course/pr_397/en/_app/immutable/nodes/60.9e740c9a.js"> | |
| <link rel="modulepreload" href="/docs/computer-vision-course/pr_397/en/_app/immutable/chunks/index.514d62da.js"><!-- HEAD_svelte-u9bgzb_START --><meta name="hf:doc:metadata" content="{"title":"Transfer Learning of Multimodal Models","local":"transfer-learning-of-multimodal-models","sections":[{"title":"Transfer learning","local":"transfer-learning","sections":[],"depth":2},{"title":"Transfer Learning Applications","local":"transfer-learning-applications","sections":[],"depth":2}],"depth":1}"><!-- HEAD_svelte-u9bgzb_END --> <p></p> <h1 class="relative group"><a id="transfer-learning-of-multimodal-models" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#transfer-learning-of-multimodal-models"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Transfer Learning of Multimodal Models</span></h1> <p data-svelte-h="svelte-1yup698">In the preceding sections, we’ve delved into the fundamental concepts of multimodal models such as CLIP and its related counterparts. 
In this chapter, we will look at how you can use different types of multimodal models for your own tasks.</p> <p data-svelte-h="svelte-1eit1wn">There are several ways to adapt multimodal models to your use case:</p> <ol data-svelte-h="svelte-rqnokm"><li><p><strong>Zero/few-shot learning</strong>. Zero/few-shot learning leverages large pretrained models that can solve problems not present in their training data. These approaches are useful when a task has very little labeled data (5-10 examples) or none at all. <a href="https://huggingface.co/learn/computer-vision-course/unit11/1" rel="nofollow">Unit 11</a> will delve deeper into this topic.</p></li> <li><p><strong>Training the model from scratch</strong>. This method becomes necessary when pretrained model weights are unavailable or the data the model was trained on differs substantially from your own. Here, we initialize the model weights randomly (or via more sophisticated methods like <a href="https://arxiv.org/abs/1502.01852" rel="nofollow">He initialization</a>) and proceed with the usual training. However, this approach demands substantial amounts of training data.</p></li> <li><p><strong>Transfer learning</strong>. Transfer learning, unlike training from scratch, uses the weights of a pretrained model as the initial weights.</p></li></ol> <p data-svelte-h="svelte-a3wxu0">This chapter focuses primarily on transfer learning for multimodal models. 
It will recap what transfer learning is, explain its advantages, and provide practical examples of how you can apply it to your tasks!</p> <h2 class="relative group"><a id="transfer-learning" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#transfer-learning"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Transfer learning</span></h2> <p data-svelte-h="svelte-rc824">More formally, transfer learning is a set of machine learning techniques in which knowledge, representations, or patterns obtained from solving one problem are reused to solve a different but related problem.</p> <p data-svelte-h="svelte-1w4zzss">In deep learning, this means that when training a model for a particular task, we use the weights of another model as the initial weights. The pretrained model has typically been trained on a huge amount of data and has learned useful knowledge about the structure of that data and the relationships within it. 
This knowledge is embedded in the weights of the pretrained model, so using them as initial weights transfers that knowledge into the model we are training.</p> <div class="flex justify-center" data-svelte-h="svelte-12vvs06"><img src="https://huggingface.co/datasets/hf-vision/course-assets/resolve/main/multimodal_trasnsfer_learning_images/transfer_learning_light.png" alt="Transfer Learning"></div> <p data-svelte-h="svelte-q9yp9o">This approach has several advantages:</p> <ul data-svelte-h="svelte-8yj41r"><li><p><strong>Resource Efficiency:</strong> Since the pretrained model has already been trained on a huge amount of data over a long period, transfer learning requires far fewer computing resources for the model to converge.</p></li> <li><p><strong>Less Labeled Data:</strong> For the same reason, fewer labeled examples are required to achieve decent quality on the test set.</p></li> <li><p><strong>Knowledge Transfer:</strong> When fine-tuning on the new task, the model builds on the knowledge already encoded in the pretrained model’s weights. 
This prior knowledge often leads to better performance on the new task.</p></li></ul> <p data-svelte-h="svelte-cga0tn">However, transfer learning also comes with challenges that should be taken into account:</p> <ul data-svelte-h="svelte-3tbnho"><li><p><strong>Domain Shift:</strong> Adapting knowledge from the source domain to the target domain can be challenging if the data distributions differ substantially.</p></li> <li><p><strong>Catastrophic Forgetting:</strong> During the fine-tuning process, the model adjusts its parameters to adapt to the new task, which often causes it to lose previously learned knowledge or representations related to the original task.</p></li></ul> <h2 class="relative group"><a id="transfer-learning-applications" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#transfer-learning-applications"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Transfer Learning Applications</span></h2> <p data-svelte-h="svelte-pwfvze">We’ll explore practical applications of transfer learning across various tasks. 
The table below describes tasks that can be solved with multimodal models and links to examples of how you can fine-tune them on your own data.</p> <table data-svelte-h="svelte-5h6rbs"><thead><tr><th>Task</th> <th>Description</th> <th>Model</th></tr></thead> <tbody><tr><td><a href="https://colab.research.google.com/github/fariddinar/computer-vision-course/blob/main/notebooks/Unit%204%20-%20Multimodal%20Models/Clip_finetune.ipynb" rel="nofollow">Fine-tune CLIP</a></td> <td>Fine-tuning CLIP on a custom dataset</td> <td><a href="https://huggingface.co/openai/clip-vit-base-patch32" rel="nofollow">openai/clip-vit-base-patch32</a></td></tr> <tr><td><a href="https://huggingface.co/docs/transformers/main/en/tasks/visual_question_answering#train-the-model" rel="nofollow">VQA</a></td> <td>Answering a natural-language question <br> about an image</td> <td><a href="https://huggingface.co/dandelin/vilt-b32-mlm" rel="nofollow">dandelin/vilt-b32-mlm</a></td></tr> <tr><td><a href="https://huggingface.co/docs/transformers/main/en/tasks/image_captioning" rel="nofollow">Image-to-Text</a></td> <td>Describing an image in natural language</td> <td><a href="https://huggingface.co/microsoft/git-base" rel="nofollow">microsoft/git-base</a></td></tr> <tr><td><a href="https://docs.ultralytics.com/models/yolo-world/" rel="nofollow">Open-set object detection</a></td> <td>Detecting objects specified by natural language input</td> <td><a href="https://huggingface.co/papers/2401.17270" rel="nofollow">YOLO-World</a></td></tr> <tr><td><a href="https://github.com/haotian-liu/LLaVA?tab=readme-ov-file#train" rel="nofollow">Assistant (GPT-4V-like)</a></td> <td>Instruction tuning in the multimodal domain</td> <td><a href="https://huggingface.co/docs/transformers/model_doc/llava" rel="nofollow">LLaVA</a></td></tr></tbody></table> <p></p> | |
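<p>The core idea from the Transfer learning section, reusing the weights of a pretrained model as initial weights, can be illustrated with a deliberately tiny sketch. It is not a multimodal model: it assumes a one-feature linear model trained with plain gradient descent, and every name in it (<code>train</code>, <code>mse</code>, <code>source_data</code>, <code>target_data</code>) is hypothetical and chosen only for illustration. The sketch pretrains on a large "source" task and then compares a few fine-tuning steps on a small, related "target" task against the same number of steps from scratch.</p>

```python
# Toy illustration only: a linear model y = w * x + b, not a real multimodal model.

def train(w, b, data, lr=0.05, steps=20):
    """Run `steps` iterations of full-batch gradient descent on mean squared error."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(w, b, data):
    """Mean squared error of the model (w, b) on `data`."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


# "Source" task with plenty of data: y = 2x + 1 (assumed for this sketch).
source_data = [(k / 10, 2 * (k / 10) + 1) for k in range(-20, 21)]

# "Target" task: related but different (y = 2.2x + 0.8), with only three labeled examples.
target_data = [(-1.0, -1.4), (0.0, 0.8), (1.0, 3.0)]

# Pretraining: a long training run on the source task, starting from zero weights.
pretrained_w, pretrained_b = train(0.0, 0.0, source_data, steps=500)

# Transfer learning: a short fine-tuning run that starts from the pretrained weights.
ft_w, ft_b = train(pretrained_w, pretrained_b, target_data, steps=5)

# Baseline: the same short run, but starting from scratch (zero weights).
sc_w, sc_b = train(0.0, 0.0, target_data, steps=5)

loss_transfer = mse(ft_w, ft_b, target_data)
loss_scratch = mse(sc_w, sc_b, target_data)
print(f"target loss after 5 steps, pretrained init: {loss_transfer:.4f}")
print(f"target loss after 5 steps, from scratch:    {loss_scratch:.4f}")
```

<p>With the same small training budget, the run initialized from the pretrained weights ends at a lower target loss here because it starts close to a good solution. In practice the same pattern is applied by loading a pretrained checkpoint and fine-tuning it, as in the notebooks and guides linked in the table above.</p>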
| <script> | |
| { | |
| __sveltekit_1p6gie1 = { | |
| assets: "/docs/computer-vision-course/pr_397/en", | |
| base: "/docs/computer-vision-course/pr_397/en", | |
| env: {} | |
| }; | |
| const element = document.currentScript.parentElement; | |
| const data = [null,null]; | |
| Promise.all([ | |
| import("/docs/computer-vision-course/pr_397/en/_app/immutable/entry/start.7f209408.js"), | |
| import("/docs/computer-vision-course/pr_397/en/_app/immutable/entry/app.32e8338e.js") | |
| ]).then(([kit, app]) => { | |
| kit.start(app, element, { | |
| node_ids: [0, 60], | |
| data, | |
| form: null, | |
| error: null | |
| }); | |
| }); | |
| } | |
| </script> | |