<!--Copyright 2023 Mistral AI and The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Mistral[[mistral]]

## Overview[[overview]]

Mistral was introduced in [this blog post](https://mistral.ai/news/announcing-mistral-7b/) by Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, LΓ©lio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, TimothΓ©e Lacroix and William El Sayed.

The introduction of the blog post says:

*Mistral AI team is proud to release Mistral 7B, the most powerful language model for its size to date.*

Mistral-7B is the first large language model (LLM) released by [mistral.ai](https://mistral.ai/).

### μ•„ν‚€ν…μ²˜ 세뢀사항[[architectural-details]]

λ―ΈμŠ€νŠΈλž„-7BλŠ” λ‹€μŒκ³Ό 같은 ꡬ쑰적 νŠΉμ§•μ„ κ°€μ§„ 디코더 μ „μš© νŠΈλžœμŠ€ν¬λ¨Έμž…λ‹ˆλ‹€:

- μŠ¬λΌμ΄λ”© μœˆλ„μš° μ–΄ν…μ…˜: 8k μ»¨ν…μŠ€νŠΈ 길이와 κ³ μ • μΊμ‹œ 크기둜 ν›ˆλ ¨λ˜μ—ˆμœΌλ©°, 이둠상 128K ν† ν°μ˜ μ–΄ν…μ…˜ λ²”μœ„λ₯Ό κ°€μ§‘λ‹ˆλ‹€.
- GQA(Grouped Query Attention): 더 λΉ λ₯Έ 좔둠이 κ°€λŠ₯ν•˜κ³  더 μž‘μ€ 크기의 μΊμ‹œλ₯Ό μ‚¬μš©ν•©λ‹ˆλ‹€.
- λ°”μ΄νŠΈ 폴백(Byte-fallback) BPE ν† ν¬λ‚˜μ΄μ €: λ¬Έμžλ“€μ΄ μ ˆλŒ€ μ–΄νœ˜ λͺ©λ‘ μ™Έμ˜ ν† ν°μœΌλ‘œ λ§€ν•‘λ˜μ§€ μ•Šλ„λ‘ 보μž₯ν•©λ‹ˆλ‹€.

더 μžμ„Έν•œ λ‚΄μš©μ€ [μΆœμ‹œ λΈ”λ‘œκ·Έ 포슀트](https://mistral.ai/news/announcing-mistral-7b/)λ₯Ό μ°Έμ‘°ν•˜μ„Έμš”.

### λΌμ΄μ„ μŠ€[[license]]

`λ―ΈμŠ€νŠΈλž„-7B`λŠ” μ•„νŒŒμΉ˜ 2.0 λΌμ΄μ„ μŠ€λ‘œ μΆœμ‹œλ˜μ—ˆμŠ΅λ‹ˆλ‹€.

## μ‚¬μš© 팁[[usage-tips]]

λ―ΈμŠ€νŠΈλž„ AIνŒ€μ€ λ‹€μŒ 3κ°€μ§€ 체크포인트λ₯Ό κ³΅κ°œν–ˆμŠ΅λ‹ˆλ‹€:

- κΈ°λ³Έ λͺ¨λΈμΈ [λ―ΈμŠ€νŠΈλž„-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)은 인터넷 규λͺ¨μ˜ λ°μ΄ν„°μ—μ„œ λ‹€μŒ 토큰을 μ˜ˆμΈ‘ν•˜λ„λ‘ 사전 ν›ˆλ ¨λ˜μ—ˆμŠ΅λ‹ˆλ‹€.
- μ§€μ‹œ μ‘°μ • λͺ¨λΈμΈ [λ―ΈμŠ€νŠΈλž„-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1)은 지도 λ―Έμ„Έ μ‘°μ •(SFT)κ³Ό 직접 μ„ ν˜Έλ„ μ΅œμ ν™”(DPO)λ₯Ό μ‚¬μš©ν•œ μ±„νŒ…μ— μ΅œμ ν™”λœ κΈ°λ³Έ λͺ¨λΈμž…λ‹ˆλ‹€.
- κ°œμ„ λœ μ§€μ‹œ μ‘°μ • λͺ¨λΈμΈ [λ―ΈμŠ€νŠΈλž„-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)λŠ” v1을 κ°œμ„ ν•œ λ²„μ „μž…λ‹ˆλ‹€.

κΈ°λ³Έ λͺ¨λΈμ€ λ‹€μŒκ³Ό 같이 μ‚¬μš©ν•  수 μžˆμŠ΅λ‹ˆλ‹€:

```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer

>>> model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", device_map="auto")
>>> tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

>>> prompt = "My favourite condiment is"

>>> model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

>>> generated_ids = model.generate(**model_inputs, max_new_tokens=100, do_sample=True)
>>> tokenizer.batch_decode(generated_ids)[0]
"My favourite condiment is to ..."
```

μ§€μ‹œ μ‘°μ • λͺ¨λΈμ€ λ‹€μŒκ³Ό 같이 μ‚¬μš©ν•  수 μžˆμŠ΅λ‹ˆλ‹€:

```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer

>>> model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2", device_map="auto")
>>> tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

>>> messages = [
...     {"role": "user", "content": "What is your favourite condiment?"},
...     {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
...     {"role": "user", "content": "Do you have mayonnaise recipes?"}
... ]

>>> model_inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")

>>> generated_ids = model.generate(model_inputs, max_new_tokens=100, do_sample=True)
>>> tokenizer.batch_decode(generated_ids)[0]
"Mayonnaise can be made as follows: (...)"
```

μ§€μ‹œ μ‘°μ • λͺ¨λΈμ€ μž…λ ₯이 μ˜¬λ°”λ₯Έ ν˜•μ‹μœΌλ‘œ μ€€λΉ„λ˜λ„λ‘ [μ±„νŒ… ν…œν”Œλ¦Ώ](../chat_templating)을 μ μš©ν•΄μ•Ό ν•©λ‹ˆλ‹€.

## ν”Œλž˜μ‹œ μ–΄ν…μ…˜μ„ μ΄μš©ν•œ λ―ΈμŠ€νŠΈλž„ 속도ν–₯상[[speeding-up-mistral-by-using-flash-attention]]

μœ„μ˜ μ½”λ“œ μŠ€λ‹ˆνŽ«λ“€μ€ μ–΄λ–€ μ΅œμ ν™” 기법도 μ‚¬μš©ν•˜μ§€ μ•Šμ€ μΆ”λ‘  과정을 λ³΄μ—¬μ€λ‹ˆλ‹€. ν•˜μ§€λ§Œ λͺ¨λΈ λ‚΄λΆ€μ—μ„œ μ‚¬μš©λ˜λŠ” μ–΄ν…μ…˜ λ©”μ»€λ‹ˆμ¦˜μ˜ 더 λΉ λ₯Έ κ΅¬ν˜„μΈ [ν”Œλž˜μ‹œ μ–΄ν…μ…˜2](../perf_train_gpu_one.md#flash-attention-2)을 ν™œμš©ν•˜λ©΄ λͺ¨λΈμ˜ 속도λ₯Ό 크게 높일 수 μžˆμŠ΅λ‹ˆλ‹€.

λ¨Όμ €, μŠ¬λΌμ΄λ”© μœˆλ„μš° μ–΄ν…μ…˜ κΈ°λŠ₯을 ν¬ν•¨ν•˜λŠ” ν”Œλž˜μ‹œ μ–΄ν…μ…˜2의 μ΅œμ‹  버전을 μ„€μΉ˜ν•΄μ•Ό ν•©λ‹ˆλ‹€.

```bash
pip install -U flash-attn --no-build-isolation
```

Make sure also that your hardware is compatible with Flash Attention 2. Read more about it in the official documentation of the [flash attention repository](https://github.com/Dao-AILab/flash-attention). Additionally, make sure to load your model in half-precision (e.g. `torch.float16`).

To load and run a model using Flash Attention 2, refer to the snippet below:

```python
>>> import torch
>>> from transformers import AutoModelForCausalLM, AutoTokenizer

>>> model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", torch_dtype=torch.float16, attn_implementation="flash_attention_2", device_map="auto")
>>> tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

>>> prompt = "My favourite condiment is"

>>> model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)

>>> generated_ids = model.generate(**model_inputs, max_new_tokens=100, do_sample=True)
>>> tokenizer.batch_decode(generated_ids)[0]
"My favourite condiment is to (...)"
```

### Expected speedups[[expected-speedups]]

Below is an expected speedup diagram comparing pure inference time between the native implementation in transformers and the Flash Attention 2 version of the model, using the `mistralai/Mistral-7B-v0.1` checkpoint.

<div style="text-align: center">
<img src="https://huggingface.co/datasets/ybelkada/documentation-images/resolve/main/mistral-7b-inference-large-seqlen.png">
</div>

### μŠ¬λΌμ΄λ”© μœˆλ„μš° μ–΄ν…μ…˜[[sliding-window-attention]]

ν˜„μž¬ κ΅¬ν˜„μ€ μŠ¬λΌμ΄λ”© μœˆλ„μš° μ–΄ν…μ…˜ λ©”μ»€λ‹ˆμ¦˜κ³Ό λ©”λͺ¨λ¦¬ 효율적인 μΊμ‹œ 관리 κΈ°λŠ₯을 μ§€μ›ν•©λ‹ˆλ‹€. μŠ¬λΌμ΄λ”© μœˆλ„μš° μ–΄ν…μ…˜μ„ ν™œμ„±ν™”ν•˜λ €λ©΄, μŠ¬λΌμ΄λ”© μœˆλ„μš° μ–΄ν…μ…˜κ³Ό ν˜Έν™˜λ˜λŠ”`flash-attn`(`>=2.3.0`)버전을 μ‚¬μš©ν•˜λ©΄ λ©λ‹ˆλ‹€. 

λ˜ν•œ ν”Œλž˜μ‹œ μ–΄ν…μ…˜2 λͺ¨λΈμ€ 더 λ©”λͺ¨λ¦¬ 효율적인 μΊμ‹œ μŠ¬λΌμ΄μ‹± λ©”μ»€λ‹ˆμ¦˜μ„ μ‚¬μš©ν•©λ‹ˆλ‹€. λ―ΈμŠ€νŠΈλž„ λͺ¨λΈμ˜ 곡식 κ΅¬ν˜„μ—μ„œ ꢌμž₯ν•˜λŠ” 둀링 μΊμ‹œ λ©”μ»€λ‹ˆμ¦˜μ„ 따라, μΊμ‹œ 크기λ₯Ό κ³ μ •(`self.config.sliding_window`)으둜 μœ μ§€ν•˜κ³ , `padding_side="left"`인 κ²½μš°μ—λ§Œ 배치 생성(batch generation)을 μ§€μ›ν•˜λ©°, ν˜„μž¬ ν† ν°μ˜ μ ˆλŒ€ μœ„μΉ˜λ₯Ό μ‚¬μš©ν•΄ μœ„μΉ˜ μž„λ² λ”©μ„ κ³„μ‚°ν•©λ‹ˆλ‹€.

## μ–‘μžν™”λ‘œ λ―ΈμŠ€νŠΈλž„ 크기 쀄이기[[shrinking-down-mistral-using-quantization]]

λ―ΈμŠ€νŠΈλž„ λͺ¨λΈμ€ 70μ–΅ 개의 νŒŒλΌλ―Έν„°λ₯Ό κ°€μ§€κ³  μžˆμ–΄, 절반의 정밀도(float16)둜 μ•½ 14GB의 GPU RAM이 ν•„μš”ν•©λ‹ˆλ‹€. 각 νŒŒλΌλ―Έν„°κ°€ 2λ°”μ΄νŠΈλ‘œ μ €μž₯되기 λ•Œλ¬Έμž…λ‹ˆλ‹€. ν•˜μ§€λ§Œ [μ–‘μžν™”](../quantization.md)λ₯Ό μ‚¬μš©ν•˜λ©΄ λͺ¨λΈ 크기λ₯Ό 쀄일 수 μžˆμŠ΅λ‹ˆλ‹€. λͺ¨λΈμ„ 4λΉ„νŠΈ(즉, νŒŒλΌλ―Έν„°λ‹Ή 반 λ°”μ΄νŠΈ)둜 μ–‘μžν™”ν•˜λ©΄ μ•½ 3.5GB의 RAM만 ν•„μš”ν•©λ‹ˆλ‹€.

λͺ¨λΈμ„ μ–‘μžν™”ν•˜λŠ” 것은 `quantization_config`λ₯Ό λͺ¨λΈμ— μ „λ‹¬ν•˜λŠ” κ²ƒλ§ŒνΌ κ°„λ‹¨ν•©λ‹ˆλ‹€. μ•„λž˜μ—μ„œλŠ” BitsAndBytes μ–‘μžν™”λ₯Ό μ‚¬μš©ν•˜μ§€λ§Œ, λ‹€λ₯Έ μ–‘μžν™” 방법은 [이 νŽ˜μ΄μ§€](../quantization.md)λ₯Ό μ°Έκ³ ν•˜μ„Έμš”:

```python
>>> import torch
>>> from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

>>> # specify how to quantize the model
>>> quantization_config = BitsAndBytesConfig(
...         load_in_4bit=True,
...         bnb_4bit_quant_type="nf4",
...         bnb_4bit_compute_dtype=torch.float16,
... )

>>> model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2", quantization_config=quantization_config, device_map="auto")
>>> tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

>>> messages = [
...     {"role": "user", "content": "What is your favourite condiment?"},
...     {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
...     {"role": "user", "content": "Do you have mayonnaise recipes?"}
... ]

>>> model_inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")

>>> generated_ids = model.generate(model_inputs, max_new_tokens=100, do_sample=True)
>>> tokenizer.batch_decode(generated_ids)[0]
"The expected output"
```

이 λͺ¨λΈμ€ [Younes Belkada](https://huggingface.co/ybelkada)와 [Arthur Zucker](https://huggingface.co/ArthurZ)κ°€ κΈ°μ—¬ν–ˆμŠ΅λ‹ˆλ‹€.
원본 μ½”λ“œλŠ” [이곳](https://github.com/mistralai/mistral-src)μ—μ„œ 확인할 수 μžˆμŠ΅λ‹ˆλ‹€.

## Resources[[resources]]

A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with Mistral. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.

<PipelineTag pipeline="text-generation"/>

- A demo notebook performing supervised fine-tuning (SFT) of Mistral-7B can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Mistral/Supervised_fine_tuning_(SFT)_of_an_LLM_using_Hugging_Face_tooling.ipynb). 🌎
- A [blog post](https://www.philschmid.de/fine-tune-llms-in-2024-with-trl) on how to fine-tune LLMs in 2024 using Hugging Face tooling. 🌎
- The [Alignment Handbook](https://github.com/huggingface/alignment-handbook) by Hugging Face includes scripts and recipes to perform supervised fine-tuning (SFT) and direct preference optimization (DPO) with Mistral-7B. This includes scripts for full fine-tuning, QLoRa on a single GPU as well as multi-GPU fine-tuning.
- [Causal language modeling task guide](../tasks/language_modeling)

## MistralConfig[[transformers.MistralConfig]]

[[autodoc]] MistralConfig

## MistralModel[[transformers.MistralModel]]

[[autodoc]] MistralModel
    - forward

## MistralForCausalLM[[transformers.MistralForCausalLM]]

[[autodoc]] MistralForCausalLM
    - forward

## MistralForSequenceClassification[[transformers.MistralForSequenceClassification]]

[[autodoc]] MistralForSequenceClassification
    - forward

## MistralForTokenClassification[[transformers.MistralForTokenClassification]]

[[autodoc]] MistralForTokenClassification
    - forward

## FlaxMistralModel[[transformers.FlaxMistralModel]]

[[autodoc]] FlaxMistralModel
    - __call__

## FlaxMistralForCausalLM[[transformers.FlaxMistralForCausalLM]]

[[autodoc]] FlaxMistralForCausalLM
    - __call__

## TFMistralModel[[transformers.TFMistralModel]]

[[autodoc]] TFMistralModel
    - call

## TFMistralForCausalLM[[transformers.TFMistralForCausalLM]]

[[autodoc]] TFMistralForCausalLM
    - call

## TFMistralForSequenceClassification[[transformers.TFMistralForSequenceClassification]]

[[autodoc]] TFMistralForSequenceClassification
    - call