<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# λͺ¨λΈ

The base classes [`PreTrainedModel`], [`TFPreTrainedModel`], and [`FlaxPreTrainedModel`] implement the common methods for loading and saving a model. The model can come from a local file or directory, or from a pretrained model configuration provided by the library (downloaded from Hugging Face's AWS S3 repository).
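
The sketch below shows the directory round trip these base classes provide. It is a minimal illustration that assumes a toy model whose weights are plain Python lists. `ToyPreTrainedModel`, the `weights.json` file name, and the JSON weight format are all hypothetical. The real classes expose this round trip through `save_pretrained` and `from_pretrained` and store weights in binary formats, so read this only as an illustration of the idea that a saved model is a directory holding a config plus weights.

```python
import json
import os


class ToyPreTrainedModel:
    """Hypothetical stand-in for a pretrained model: a config dict plus named weight lists."""

    def __init__(self, config, weights):
        self.config = dict(config)
        self.weights = {name: list(values) for name, values in weights.items()}

    def save_pretrained(self, save_directory):
        """Write the config and the weights as two files inside `save_directory`."""
        os.makedirs(save_directory, exist_ok=True)
        with open(os.path.join(save_directory, "config.json"), "w") as f:
            json.dump(self.config, f)
        # Assumption: JSON weights for readability; real checkpoints use binary formats.
        with open(os.path.join(save_directory, "weights.json"), "w") as f:
            json.dump(self.weights, f)

    @classmethod
    def from_pretrained(cls, directory):
        """Rebuild a model from a directory written by `save_pretrained`."""
        with open(os.path.join(directory, "config.json")) as f:
            config = json.load(f)
        with open(os.path.join(directory, "weights.json")) as f:
            weights = json.load(f)
        return cls(config, weights)
```

Saving to a temporary directory and calling `from_pretrained` on it yields a model with an equal config and equal weights.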

[`PreTrainedModel`] and [`TFPreTrainedModel`] also implement a few methods common to all models, used to:

- Resize the input token embeddings when new tokens are added to the vocabulary.
- Prune the model's attention heads.

각 λͺ¨λΈμ— 곡톡인 λ‹€λ₯Έ λ©”μ†Œλ“œλ“€μ€ λ‹€μŒμ˜ ν΄λž˜μŠ€μ—μ„œ μ •μ˜λ©λ‹ˆλ‹€. 
- [`~modeling_utils.ModuleUtilsMixin`](νŒŒμ΄ν† μΉ˜ λͺ¨λΈμš©)
- ν…μŠ€νŠΈ 생성을 μœ„ν•œ [`~modeling_tf_utils.TFModuleUtilsMixin`](ν…μ„œν”Œλ‘œ λͺ¨λΈμš©)
- [`~generation.GenerationMixin`](νŒŒμ΄ν† μΉ˜ λͺ¨λΈμš©)
- [`~generation.FlaxGenerationMixin`](Flax/JAX λͺ¨λΈμš©)
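
Because of the mixin arrangement, a model class picks up shared helpers simply by inheriting them. For example, `ModuleUtilsMixin` provides `num_parameters`, and `GenerationMixin` provides `generate`. The sketch below uses two hypothetical mixins: a parameter counter and a greedy-decoding `generate`. A toy model supplies only `parameters` and `next_token_logits`. None of these bodies are the library's actual mixin code, and the real `generate` supports many more decoding strategies than greedy search.

```python
class UtilsMixin:
    """Hypothetical shared helper: count scalar parameters exposed by `self.parameters()`."""

    def num_parameters(self):
        return sum(len(values) for values in self.parameters().values())


class GreedyGenerationMixin:
    """Hypothetical generation helper: greedy decoding only."""

    def generate(self, input_ids, max_new_tokens):
        ids = list(input_ids)
        for _ in range(max_new_tokens):
            logits = self.next_token_logits(ids)
            # Greedy step: append the index of the highest-scoring token.
            ids.append(max(range(len(logits)), key=logits.__getitem__))
        return ids


class ToyCounterModel(UtilsMixin, GreedyGenerationMixin):
    """Toy 'language model' over 5 tokens that always favors (last token + 1) mod 5."""

    vocab_size = 5

    def parameters(self):
        return {"embed": [0.0] * 10, "head": [0.0] * 5}

    def next_token_logits(self, ids):
        target = (ids[-1] + 1) % self.vocab_size
        return [1.0 if t == target else 0.0 for t in range(self.vocab_size)]
```

The model class itself defines no `num_parameters` or `generate`. Both come from the mixins, which is the division of labor the list above describes.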

## PreTrainedModel

[[autodoc]] PreTrainedModel
    - push_to_hub
    - all

μ‚¬μš©μž μ •μ˜ λͺ¨λΈμ€ μ΄ˆκ³ μ† μ΄ˆκΈ°ν™”(superfast init)κ°€ νŠΉμ • λͺ¨λΈμ— 적용될 수 μžˆλŠ”μ§€ μ—¬λΆ€λ₯Ό κ²°μ •ν•˜λŠ” `_supports_assign_param_buffer`도 포함해야 ν•©λ‹ˆλ‹€.
`test_save_and_load_from_pretrained` μ‹€νŒ¨ μ‹œ, λͺ¨λΈμ΄ `_supports_assign_param_buffer`λ₯Ό ν•„μš”λ‘œ ν•˜λŠ”μ§€ ν™•μΈν•˜μ„Έμš”.
ν•„μš”λ‘œ ν•œλ‹€λ©΄ `False`둜 μ„€μ •ν•˜μ„Έμš”. 
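
The sketch below illustrates why such a flag can matter. It rests on our own reading, which this page does not confirm, that the flag chooses between two loading strategies: assigning loaded parameters directly, or copying their values into the existing storage. If a model ties two weights to the same object, assignment silently breaks the tie. A save-and-reload test would catch that kind of mismatch. `TiedToyModel` and `load_into` are hypothetical; only the attribute name comes from this page.

```python
class TiedToyModel:
    """Hypothetical model whose output head shares (ties) its weight list with the embedding."""

    _supports_assign_param_buffer = False  # assignment would break the tie below

    def __init__(self):
        self.params = {"embed": [0.0, 0.0]}
        self.head_weight = self.params["embed"]  # tied: the very same list object


def load_into(model, state_dict):
    """Load `state_dict` either by assignment (fast) or by in-place copy (safe)."""
    if getattr(model, "_supports_assign_param_buffer", True):
        # Fast path: rebind each entry to the loaded object, with no element-wise copy.
        for name, value in state_dict.items():
            model.params[name] = value
    else:
        # Safe path: overwrite the existing storage so every reference sees the new values.
        for name, value in state_dict.items():
            model.params[name][:] = value
    return model
```

With the flag set to `False`, the tied head sees the loaded values. A subclass that flips the flag to `True` keeps the stale head weights.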

## ModuleUtilsMixin

[[autodoc]] modeling_utils.ModuleUtilsMixin

## TFPreTrainedModel

[[autodoc]] TFPreTrainedModel
    - push_to_hub
    - all

## TFModelUtilsMixin

[[autodoc]] modeling_tf_utils.TFModelUtilsMixin

## FlaxPreTrainedModel

[[autodoc]] FlaxPreTrainedModel
    - push_to_hub
    - all

## ν—ˆλΈŒμ— μ €μž₯ν•˜κΈ°

[[autodoc]] utils.PushToHubMixin

## Sharded checkpoints

[[autodoc]] modeling_utils.load_sharded_checkpoint
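
A sharded checkpoint splits a large state dict across several files, plus an index that maps each parameter name to the file holding it. This stdlib sketch makes three assumptions:

- Weights are stored as lists of floats.
- Shard size is counted in number of values rather than bytes.
- JSON is used throughout.

The shard file names loosely follow the library's `model-00001-of-00003` pattern, and the index keeps a `metadata` entry and a `weight_map`. `save_sharded` and `load_sharded` are hypothetical helpers, not `load_sharded_checkpoint` itself.

```python
import json
import os


def save_sharded(state_dict, folder, max_shard_size):
    """Split `state_dict` (name -> list of floats) into JSON shards of at most
    `max_shard_size` values each (a single oversized tensor gets its own shard),
    write an index mapping every name to its shard file, and return the shard count."""
    shards, current, current_size = [], {}, 0
    for name, values in state_dict.items():
        # Start a new shard when adding this tensor would exceed the limit.
        if current and current_size + len(values) > max_shard_size:
            shards.append(current)
            current, current_size = {}, 0
        current[name] = values
        current_size += len(values)
    if current:
        shards.append(current)

    weight_map = {}
    for i, shard in enumerate(shards, start=1):
        filename = f"model-{i:05d}-of-{len(shards):05d}.json"
        with open(os.path.join(folder, filename), "w") as f:
            json.dump(shard, f)
        for name in shard:
            weight_map[name] = filename

    total = sum(len(values) for values in state_dict.values())
    index = {"metadata": {"total_size": total}, "weight_map": weight_map}
    with open(os.path.join(folder, "model.index.json"), "w") as f:
        json.dump(index, f)
    return len(shards)


def load_sharded(folder):
    """Read the index, then merge every shard it references back into one dict."""
    with open(os.path.join(folder, "model.index.json")) as f:
        index = json.load(f)
    state_dict = {}
    for filename in sorted(set(index["weight_map"].values())):
        with open(os.path.join(folder, filename)) as f:
            state_dict.update(json.load(f))
    return state_dict
```

Keeping the index separate means a loader can find any parameter without opening every shard. This is the reason sharded formats ship an index file at all.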