# Cohere[[cohere]]
## Overview[[overview]]
The Cohere Command-R model was introduced by the Cohere team in the blog post [Command-R: Retrieval Augmented Generation at Production Scale](https://txt.cohere.com/command-r/).

The abstract from the blog post is the following:

*Command-R is a scalable generative model targeting RAG (retrieval augmented generation) and tool use to enable production-scale AI for enterprise. Today, we are introducing Command-R, a new LLM aimed at large-scale production workloads. Command-R targets the "scalable" category of models that balance high efficiency with strong accuracy, enabling companies to move beyond proof of concept and into production.*

*Command-R is a generative model optimized for long-context tasks such as retrieval augmented generation (RAG) and using external APIs and tools. It is designed to work in concert with our industry-leading Embed and Rerank models to provide best-in-class integration for RAG applications and excel at enterprise use cases. As a model built for companies to implement at scale, Command-R boasts:*

- Strong accuracy on RAG and tool use
- Low latency and high throughput
- Longer 128k context and lower pricing
- Strong capabilities across 10 key languages
- Model weights available on HuggingFace for research and evaluation

Check out the model checkpoints [here](https://huggingface.co/CohereForAI/c4ai-command-r-v01).
This model was contributed by [Saurabh Dash](https://huggingface.co/saurabhdash) and [Ahmet Üstün](https://huggingface.co/ahmetustun). The Hugging Face implementation is based on [GPT-NeoX](https://github.com/EleutherAI/gpt-neox).
## Usage tips[[usage-tips]]
<Tip warning={true}>
The checkpoints uploaded to the Hub use `torch_dtype = 'float16'`, which the `AutoModel` API uses to cast the checkpoints from `torch.float32` to `torch.float16`.

The `dtype` of the online weights is mostly irrelevant unless you pass `torch_dtype="auto"` when initializing the model with `model = AutoModelForCausalLM.from_pretrained("path", torch_dtype = "auto")`. The reason is that the model is first downloaded (using the `dtype` of the online checkpoint), then cast to the default `dtype` of `torch` (which is `torch.float32`), and finally, if a `torch_dtype` is provided in the config, that value is used.

Training the model in `float16` is not recommended and is known to produce `nan`; the model should be trained in `bfloat16` instead.
</Tip>
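One likely contributor to the `nan` values mentioned in the tip is the narrow numeric range of `float16` (this explanation is an assumption, not something the page states). The sketch below uses only the standard library to compare the two formats: `fits_in_float16` and `to_bfloat16` are hypothetical helpers, and `bfloat16` is emulated by truncating a `float32` to its top 16 bits.

```python
import struct

FLOAT16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def fits_in_float16(x: float) -> bool:
    """Return True if x can be stored as a finite float16 value."""
    try:
        struct.pack("<e", x)  # "e" is the half-precision format code
        return True
    except (OverflowError, struct.error):
        return False


def to_bfloat16(x: float) -> float:
    """Emulate bfloat16 by keeping only the top 16 bits of the float32 encoding."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


value = 70000.0
print(fits_in_float16(value))  # False: 70000 is above the float16 maximum
print(to_bfloat16(value))      # 69632.0: coarser precision, but still finite
```

`bfloat16` keeps the 8-bit exponent of `float32`, so large activations or gradients stay finite at the cost of fewer mantissa bits.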
The model and tokenizer can be loaded via:
```python
# pip install transformers
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "CohereForAI/c4ai-command-r-v01"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Format message with the command-r chat template
messages = [{"role": "user", "content": "Hello, how are you?"}]
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
## <BOS_TOKEN><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Hello, how are you?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>

gen_tokens = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.3,
)

gen_text = tokenizer.decode(gen_tokens[0])
print(gen_text)
```
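To make the prompt layout in the comment above concrete, here is a minimal pure-Python sketch that rebuilds the same string by hand. `render_prompt` and `ROLE_TOKENS` are hypothetical names for illustration only. The real template ships with the tokenizer and may handle more cases, and the mapping of the `assistant` role to `<|CHATBOT_TOKEN|>` is an assumption.

```python
# Hypothetical helper for illustration; use tokenizer.apply_chat_template in practice.
ROLE_TOKENS = {
    "user": "<|USER_TOKEN|>",
    "assistant": "<|CHATBOT_TOKEN|>",  # assumption: assistant turns use the chatbot token
}


def render_prompt(messages, add_generation_prompt=True):
    """Render chat messages into the prompt layout shown in the comment above."""
    parts = ["<BOS_TOKEN>"]
    for message in messages:
        parts.append(
            "<|START_OF_TURN_TOKEN|>"
            + ROLE_TOKENS[message["role"]]
            + message["content"]
            + "<|END_OF_TURN_TOKEN|>"
        )
    if add_generation_prompt:
        # Open a chatbot turn so the model continues with its reply.
        parts.append("<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>")
    return "".join(parts)


prompt = render_prompt([{"role": "user", "content": "Hello, how are you?"}])
print(prompt)
```

With `add_generation_prompt=True` the string ends with an open chatbot turn, which is why generation picks up directly with the model's reply.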
- When using Flash Attention 2 via `attn_implementation="flash_attention_2"`, don't pass `torch_dtype` to the `from_pretrained` class method; use Automatic Mixed-Precision training instead. When using `Trainer`, simply set either `fp16` or `bf16` to `True`. Otherwise, make sure you are using `torch.autocast`. This is required because Flash Attention only supports the `fp16` and `bf16` data types.
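
The effect of `torch.autocast` can be seen on a small scale without a GPU. In the sketch below, a toy `torch.nn.Linear` layer stands in for the model (an assumption for illustration, not the Cohere architecture), and CPU autocast with `bfloat16` is used so that the example runs anywhere PyTorch is installed. On a GPU you would pass `device_type="cuda"` instead.

```python
import torch

# Toy stand-in for the model: parameters are created in float32.
layer = torch.nn.Linear(8, 8)
inputs = torch.randn(2, 8)

# Inside the autocast context, eligible ops such as linear layers
# compute in the lower-precision dtype.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    outputs = layer(inputs)

print(layer.weight.dtype)  # torch.float32: the stored weights are untouched
print(outputs.dtype)       # torch.bfloat16: the computation ran in reduced precision
```

The weights stay in `float32` while the forward pass runs in `bfloat16`, which is the combination the bullet above asks for.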
## Resources[[resources]]
A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with Command-R. If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
<PipelineTag pipeline="text-generation"/>
Loading FP16 model
```python
# pip install transformers
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "CohereForAI/c4ai-command-r-v01"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Request float16 explicitly; without torch_dtype the weights are cast to float32
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# Format message with the command-r chat template
messages = [{"role": "user", "content": "Hello, how are you?"}]
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
## <BOS_TOKEN><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Hello, how are you?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>

gen_tokens = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.3,
)

gen_text = tokenizer.decode(gen_tokens[0])
print(gen_text)
```
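
The dtype rules described in the usage tips can be summarized as a small decision function. This is a simplified model under stated assumptions, not the library's actual code: `resolve_load_dtype` is a hypothetical helper, dtypes are represented as plain strings, and behaviour may differ between `transformers` versions.

```python
from typing import Optional


def resolve_load_dtype(torch_dtype: Optional[str], config_dtype: str) -> str:
    """Hypothetical helper: the dtype the weights end up in after loading."""
    if torch_dtype is None:
        # Nothing requested: weights are cast to torch's default dtype.
        return "float32"
    if torch_dtype == "auto":
        # "auto" defers to the dtype recorded in the checkpoint config.
        return config_dtype
    # An explicitly requested dtype is used as given.
    return torch_dtype


print(resolve_load_dtype(None, "float16"))        # float32
print(resolve_load_dtype("auto", "float16"))      # float16
print(resolve_load_dtype("bfloat16", "float16"))  # bfloat16
```

For the Hub checkpoints, whose config records `float16`, only the `"auto"` and explicit cases avoid the upcast to `float32`.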
Loading a 4-bit quantized model with the bitsandbytes library
```python
# pip install transformers bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(load_in_4bit=True)

model_id = "CohereForAI/c4ai-command-r-v01"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)

# Format message with the command-r chat template and move it to the model's device
messages = [{"role": "user", "content": "Hello, how are you?"}]
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
input_ids = input_ids.to(model.device)

gen_tokens = model.generate(
    input_ids,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.3,
)

gen_text = tokenizer.decode(gen_tokens[0])
print(gen_text)
```
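
A rough back-of-the-envelope estimate shows why 4-bit loading is attractive. The sketch assumes about 35 billion parameters for this checkpoint (a figure not stated on this page) and counts weight storage only. It ignores quantization metadata, activations, and the KV cache, so real usage will be higher.

```python
# Assumption: roughly 35 billion parameters; not stated on this page.
NUM_PARAMS = 35_000_000_000

# Bytes needed to store one parameter in each format.
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "bfloat16": 2.0, "4-bit": 0.5}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_gb(NUM_PARAMS, dtype):.1f} GB")
```

Under these assumptions the weights need about 140 GB in `float32`, 70 GB in `float16` or `bfloat16`, and 17.5 GB in 4-bit.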
## CohereConfig[[transformers.CohereConfig]]
[[autodoc]] CohereConfig
## CohereTokenizerFast[[transformers.CohereTokenizerFast]]
[[autodoc]] CohereTokenizerFast
- build_inputs_with_special_tokens
- get_special_tokens_mask
- create_token_type_ids_from_sequences
- update_post_processor
- save_vocabulary
## CohereModel[[transformers.CohereModel]]
[[autodoc]] CohereModel
- forward
## CohereForCausalLM[[transformers.CohereForCausalLM]]
[[autodoc]] CohereForCausalLM
- forward