<meta charset="utf-8" /><meta http-equiv="content-security-policy" content=""><meta name="hf:doc:metadata" content="{&quot;local&quot;:&quot;&quot;,&quot;sections&quot;:[{&quot;local&quot;:&quot;&quot;,&quot;title&quot;:&quot;Provide a dataset as a folder&quot;},{&quot;local&quot;:&quot;hub&quot;,&quot;title&quot;:&quot;Upload your data to the Hub&quot;},{&quot;local&quot;:&quot;&quot;,&quot;title&quot;:&quot;Next steps&quot;}],&quot;title&quot;:&quot;Create a dataset for training&quot;}" data-svelte="svelte-1phssyn">
<link rel="modulepreload" href="/docs/diffusers/v0.21.0/ko/_app/assets/pages/__layout.svelte-hf-doc-builder.css">
<link rel="modulepreload" href="/docs/diffusers/v0.21.0/ko/_app/start-hf-doc-builder.js">
<link rel="modulepreload" href="/docs/diffusers/v0.21.0/ko/_app/chunks/vendor-hf-doc-builder.js">
<link rel="modulepreload" href="/docs/diffusers/v0.21.0/ko/_app/chunks/paths-hf-doc-builder.js">
<link rel="modulepreload" href="/docs/diffusers/v0.21.0/ko/_app/pages/__layout.svelte-hf-doc-builder.js">
<link rel="modulepreload" href="/docs/diffusers/v0.21.0/ko/_app/pages/training/create_dataset.mdx-hf-doc-builder.js">
<link rel="modulepreload" href="/docs/diffusers/v0.21.0/ko/_app/chunks/Tip-hf-doc-builder.js">
<link rel="modulepreload" href="/docs/diffusers/v0.21.0/ko/_app/chunks/IconCopyLink-hf-doc-builder.js">
<link rel="modulepreload" href="/docs/diffusers/v0.21.0/ko/_app/chunks/CodeBlock-hf-doc-builder.js">
<h1 id="">Create a dataset for training</h1>
<p>There are many datasets on the <a href="https://huggingface.co/datasets?task_categories=task_categories:text-to-image&sort=downloads" rel="nofollow">Hub</a> to train a model on, but if you can't find one you're interested in or want to use your own, you can create a dataset with the 🤗 <a href="https://huggingface.co/docs/datasets">Datasets</a> library.
The dataset structure depends on the task you want to train your model on.
The most basic dataset structure is a directory of images, for tasks like unconditional image generation.
Another dataset structure may be a directory of images plus a text file containing their corresponding text captions, for tasks like text-to-image generation.</p>
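<p>For the caption-based structure, ImageFolder can read captions from a <code>metadata.jsonl</code> file stored alongside the images: each line is a JSON object whose <code>file_name</code> key names an image and whose remaining keys become extra columns. The sketch below writes such a file using only the Python standard library. The caption column name <code>text</code> is an assumption (chosen to match what we believe is the default caption column of the text-to-image training script), and the file names and captions are hypothetical placeholders.</p>

```python
import json
import tempfile
from pathlib import Path


def write_caption_metadata(data_dir, captions, caption_column="text"):
    """Write an ImageFolder-style metadata.jsonl next to the images.

    `captions` maps an image file name (relative to `data_dir`) to its caption.
    The `caption_column` name is an assumption; use whatever column your
    training script reads captions from.
    """
    data_dir = Path(data_dir)
    metadata_path = data_dir / "metadata.jsonl"
    with metadata_path.open("w", encoding="utf-8") as f:
        for file_name, caption in sorted(captions.items()):
            # One JSON object per line; `file_name` links the row to an image file.
            record = {"file_name": file_name, caption_column: caption}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return metadata_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical captions for two placeholder images.
        path = write_caption_metadata(
            tmp, {"0001.png": "a red bicycle", "0002.png": "a cat on a sofa"}
        )
        print(path.read_text(encoding="utf-8"))
```

<p>A folder prepared this way can then be loaded with <code>load_dataset("imagefolder", data_dir=...)</code>, and the caption column appears next to the <code>image</code> column.</p>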
<p>This guide shows you two ways to create a dataset to fine-tune on:</p>
<ul><li>provide a folder of images to the <code>--train_data_dir</code> argument</li>
<li>upload a dataset to the Hub and pass the dataset repository id to the <code>--dataset_name</code> argument</li></ul>
<div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p>💡 Learn more about how to create an image dataset for training in the <a href="https://huggingface.co/docs/datasets/image_dataset" rel="nofollow">Create an image dataset</a> guide.</p></div>
<h2 id="">Provide a dataset as a folder</h2>
<p>For unconditional generation, you can provide your own dataset as a folder of images.
The training script uses the <a href="https://huggingface.co/docs/datasets/en/image_dataset#imagefolder" rel="nofollow">ImageFolder</a> builder from 🤗 Datasets to
automatically build a dataset from the folder. Your directory structure should look like this:</p>
<div class="code-block relative">
<pre><!-- HTML_TAG_START -->data_dir/xxx.png
data_dir/xxy.png
data_dir/[...]/xxz.png<!-- HTML_TAG_END --></pre></div>
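<p>Before passing a folder to <code>--train_data_dir</code>, it can help to check which files will be picked up, including nested ones such as <code>data_dir/[...]/xxz.png</code>. The sketch below recursively lists image files with the Python standard library. The extension set is an assumption and only a subset of what ImageFolder accepts, so treat the result as a sanity check rather than an exact reproduction of how the builder discovers files.</p>

```python
import tempfile
from pathlib import Path

# Assumed subset of image extensions; ImageFolder itself supports more formats.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".bmp", ".gif"}


def find_images(data_dir):
    """Recursively collect image files under `data_dir` as sorted relative POSIX paths."""
    root = Path(data_dir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Recreate the layout from the example: two top-level images, one nested image.
        (root / "nested").mkdir()
        for rel in ["xxx.png", "xxy.png", "nested/xxz.png", "notes.txt"]:
            (root / rel).touch()
        print(find_images(root))
```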
<p>Pass the path to the dataset directory to the <code>--train_data_dir</code> argument, and then you can start training:</p>
<div class="code-block relative">
<pre><!-- HTML_TAG_START --><span class="hljs-comment"># specify the image folder with the --train_data_dir argument</span>
accelerate launch train_unconditional.py \
--train_data_dir &lt;path-to-train-directory&gt; \
&lt;other-arguments&gt;<!-- HTML_TAG_END --></pre></div>
<h2 class="relative group"><a id="hub" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#hub"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a>
<span>Upload your data to the Hub
</span></h2>
<div class="course-tip bg-gradient-to-br dark:bg-gradient-to-r before:border-green-500 dark:before:border-green-800 from-green-50 dark:from-gray-900 to-white dark:to-gray-950 border border-green-50 text-green-700 dark:text-gray-400"><p>💡 Learn more about creating a dataset and uploading it to the Hub in the <a href="https://huggingface.co/blog/image-search-datasets" rel="nofollow">Image search with 🤗 Datasets</a> post.</p></div>
<p>Start by creating a dataset with the <a href="https://huggingface.co/docs/datasets/image_load#imagefolder" rel="nofollow">ImageFolder</a> feature, which creates an <code>image</code> column containing the PIL-encoded images.</p>
<p>You can use the <code>data_dir</code> or <code>data_files</code> parameters to specify the location of the dataset.
The <code>data_files</code> parameter also supports mapping specific files to dataset splits such as <code>train</code> or <code>test</code>:</p>
<div class="code-block relative">
<pre><!-- HTML_TAG_START --><span class="hljs-keyword">from</span> datasets <span class="hljs-keyword">import</span> load_dataset

<span class="hljs-comment"># example 1: local folder</span>
dataset = load_dataset(<span class="hljs-string">"imagefolder"</span>, data_dir=<span class="hljs-string">"path_to_your_folder"</span>)

<span class="hljs-comment"># example 2: local files (supported formats: tar, gzip, zip, xz, rar, zstd)</span>
dataset = load_dataset(<span class="hljs-string">"imagefolder"</span>, data_files=<span class="hljs-string">"path_to_zip_file"</span>)

<span class="hljs-comment"># example 3: remote files (supported formats: tar, gzip, zip, xz, rar, zstd)</span>
dataset = load_dataset(
    <span class="hljs-string">"imagefolder"</span>,
    data_files=<span class="hljs-string">"https://download.microsoft.com/download/3/E/1/3E1C3F21-ECDB-4869-8368-6DEBA77B919F/kagglecatsanddogs_3367a.zip"</span>,
)

<span class="hljs-comment"># example 4: multiple splits</span>
dataset = load_dataset(
    <span class="hljs-string">"imagefolder"</span>, data_files={<span class="hljs-string">"train"</span>: [<span class="hljs-string">"path/to/file1"</span>, <span class="hljs-string">"path/to/file2"</span>], <span class="hljs-string">"test"</span>: [<span class="hljs-string">"path/to/file3"</span>, <span class="hljs-string">"path/to/file4"</span>]}
)<!-- HTML_TAG_END --></pre></div>
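<p>When there are many files, the split mapping from example 4 can also be built programmatically. The sketch below assumes a hypothetical local layout with one subfolder per split (<code>train/</code>, <code>test/</code>) and a <code>*.png</code> pattern; it produces a dictionary of the same shape as example 4, which could then be passed as <code>data_files</code>. It uses only the standard library and does not call 🤗 Datasets itself.</p>

```python
import tempfile
from pathlib import Path


def build_data_files(root, splits=("train", "test"), pattern="*.png"):
    """Map each split name to a sorted list of file paths under root/split.

    The folder-per-split layout and the glob pattern are assumptions of this
    sketch; adapt them to how your files are actually organized.
    """
    root = Path(root)
    data_files = {}
    for split in splits:
        split_dir = root / split
        if not split_dir.is_dir():
            continue  # no folder for this split
        files = sorted(str(p) for p in split_dir.glob(pattern) if p.is_file())
        if files:  # only keep splits that actually contain matching files
            data_files[split] = files
    return data_files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical layout: two training images and one test image.
        for rel in ["train/file1.png", "train/file2.png", "test/file3.png"]:
            path = Path(tmp) / rel
            path.parent.mkdir(parents=True, exist_ok=True)
            path.touch()
        data_files = build_data_files(tmp)
        print({split: len(files) for split, files in data_files.items()})
        # With the datasets library installed, the dict could then be used as:
        # dataset = load_dataset("imagefolder", data_files=data_files)
```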
<p>Then upload the dataset to the Hub with the <a href="https://huggingface.co/docs/datasets/v2.13.1/en/package_reference/main_classes#datasets.Dataset.push_to_hub" rel="nofollow"><code>push_to_hub</code></a> method:</p>
<div class="code-block relative">
<pre><!-- HTML_TAG_START --><span class="hljs-comment"># assumes you have already run `huggingface-cli login` in a terminal</span>
dataset.push_to_hub(<span class="hljs-string">"name_of_your_dataset"</span>)

<span class="hljs-comment"># to push to a private repository instead, pass `private=True`:</span>
dataset.push_to_hub(<span class="hljs-string">"name_of_your_dataset"</span>, private=<span class="hljs-literal">True</span>)<!-- HTML_TAG_END --></pre></div>
<p>Now the dataset is available for training: pass the dataset name to the <code>--dataset_name</code> argument:</p>
<div class="code-block relative">
<pre><!-- HTML_TAG_START -->accelerate launch --mixed_precision=<span class="hljs-string">"fp16"</span> train_text_to_image.py \
--pretrained_model_name_or_path=<span class="hljs-string">"runwayml/stable-diffusion-v1-5"</span> \
--dataset_name=<span class="hljs-string">"name_of_your_dataset"</span> \
&lt;other-arguments&gt;<!-- HTML_TAG_END --></pre></div>
<h2 id="">Next steps</h2>
<p>Now that you've created a dataset, you can plug it into the <code>train_data_dir</code> argument (if your dataset is local) or the <code>dataset_name</code> argument (if your dataset is on the Hub) of a training script.</p>
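<p>The choice between the two arguments can be captured in a small helper that assembles the <code>accelerate launch</code> command. This is a hypothetical convenience function, not part of 🤗 Diffusers: it only builds the argument list and runs nothing, and it assumes that an existing local directory means a local dataset while anything else is a Hub dataset id. The script names are the ones used earlier in this guide.</p>

```python
import shlex
from pathlib import Path


def build_launch_command(script, dataset, extra_args=()):
    """Return an `accelerate launch` argv list for a training script.

    If `dataset` is an existing local directory it is passed as
    --train_data_dir; otherwise it is treated as a Hub dataset id and
    passed as --dataset_name. This heuristic is an assumption of the sketch.
    """
    if Path(dataset).is_dir():
        data_args = ["--train_data_dir", str(dataset)]
    else:
        data_args = ["--dataset_name", dataset]
    return ["accelerate", "launch", script, *data_args, *extra_args]


if __name__ == "__main__":
    # Hypothetical Hub dataset id (no local directory with this name is expected).
    cmd = build_launch_command("train_text_to_image.py", "name_of_your_dataset")
    # shlex.join quotes each argument so the string can be pasted into a shell.
    print(shlex.join(cmd))
```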
<p>As a next step, try using your dataset to train a model for <a href="https://huggingface.co/docs/diffusers/v0.18.2/en/training/unconditional_training" rel="nofollow">unconditional generation</a> or <a href="https://huggingface.co/docs/diffusers/training/text2image" rel="nofollow">text-to-image generation</a>!</p>
<script type="module" data-hydrate="aq35o0">
import { start } from "/docs/diffusers/v0.21.0/ko/_app/start-hf-doc-builder.js";
start({
target: document.querySelector('[data-hydrate="aq35o0"]').parentNode,
paths: {"base":"/docs/diffusers/v0.21.0/ko","assets":"/docs/diffusers/v0.21.0/ko"},
session: {},
route: false,
spa: false,
trailing_slash: "never",
hydrate: {
status: 200,
error: null,
nodes: [
import("/docs/diffusers/v0.21.0/ko/_app/pages/__layout.svelte-hf-doc-builder.js"),
import("/docs/diffusers/v0.21.0/ko/_app/pages/training/create_dataset.mdx-hf-doc-builder.js")
],
params: {}
}
});
</script>