
Cohere[[cohere]]

Overview[[overview]]

The Cohere Command-R model was introduced by the Cohere team in the blog post Command-R: Retrieval Augmented Generation at Production Scale.

Abstract from the blog post:

Command-R is a scalable generative model targeting RAG (retrieval augmented generation) and tool use to enable production-scale AI for enterprise. Today, we are introducing Command-R, a new LLM aimed at large-scale production workloads. Command-R targets the emerging "scalable" category of models that balance high efficiency with strong accuracy, enabling companies to move beyond proof of concept and into production.

Command-R is a generative model optimized for long-context tasks such as retrieval augmented generation (RAG) and using external APIs and tools. It is designed to work in concert with our industry-leading Embed and Rerank models to provide best-in-class integration for RAG applications and to excel at enterprise use cases. As a model built for companies to implement at scale, Command-R boasts:

  • Strong accuracy on RAG and tool use
  • Low latency and high throughput
  • Longer 128k context and lower pricing
  • Strong capabilities across 10 key languages
  • Model weights available on HuggingFace for research and evaluation

๋ชจ๋ธ ์ฒดํฌํฌ์ธํŠธ๋Š” ์ด๊ณณ์—์„œ ํ™•์ธํ•˜์„ธ์š”. ์ด ๋ชจ๋ธ์€ Saurabh Dash๊ณผ Ahmet รœstรผn์— ์˜ํ•ด ๊ธฐ์—ฌ ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. Hugging Face์—์„œ ์ด ์ฝ”๋“œ์˜ ๊ตฌํ˜„์€ GPT-NeoX์— ๊ธฐ๋ฐ˜ํ•˜์˜€์Šต๋‹ˆ๋‹ค.

Usage tips[[usage-tips]]

The checkpoints uploaded to the Hub use torch_dtype = 'float16', which the AutoModel API uses to cast the checkpoints from torch.float32 to torch.float16.

The dtype of the online weights is mostly irrelevant unless you pass torch_dtype="auto" when initializing the model, as in model = AutoModelForCausalLM.from_pretrained("path", torch_dtype = "auto"). This is because the model is first downloaded (in the dtype of the online checkpoint) and then cast to torch's default dtype (torch.float32); only when torch_dtype="auto" is passed is the torch_dtype stored in the config used instead.

๋ชจ๋ธ์„ float16์œผ๋กœ ํ›ˆ๋ จํ•˜๋Š” ๊ฒƒ์€ ๊ถŒ์žฅ๋˜์ง€ ์•Š์œผ๋ฉฐ nan์„ ์ƒ์„ฑํ•˜๋Š” ๊ฒƒ์œผ๋กœ ์•Œ๋ ค์ ธ ์žˆ์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ ๋ชจ๋ธ์€ bfloat16์œผ๋กœ ํ›ˆ๋ จํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ๋ชจ๋ธ๊ณผ ํ† ํฌ๋‚˜์ด์ €๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๋กœ๋“œํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:

# pip install transformers
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "CohereForAI/c4ai-command-r-v01"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Format message with the command-r chat template
messages = [{"role": "user", "content": "Hello, how are you?"}]
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
## <BOS_TOKEN><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Hello, how are you?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>

gen_tokens = model.generate(
    input_ids, 
    max_new_tokens=100, 
    do_sample=True, 
    temperature=0.3,
    )

gen_text = tokenizer.decode(gen_tokens[0])
print(gen_text)
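The advice to train in bfloat16 rather than float16 is largely about numeric range: float16's largest finite value is 65,504 and it rounds very small values (well below about 6e-8) to zero, while bfloat16 keeps float32's 8-bit exponent at the cost of mantissa precision. One common way nan appears is that an overflowing activation becomes inf and a later operation such as inf - inf turns it into nan. The stdlib-only sketch below illustrates the range difference. The helper names are hypothetical, and the bfloat16 conversion is approximated by truncating a float32 to its top 16 bits (real hardware normally rounds to nearest even).

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def fits_in_fp16(x: float) -> bool:
    """True if x can be stored as a finite float16 ('e' is struct's half-precision format)."""
    try:
        struct.pack("<e", x)
        return True
    except OverflowError:
        return False


def to_fp16(x: float) -> float:
    """Round-trip x through float16; raises OverflowError if x is out of range."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bf16(x: float) -> float:
    """Approximate bfloat16 by keeping only the top 16 bits of x's float32 encoding.

    Simplification: this truncates instead of rounding to nearest even.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


# A large activation: out of range for float16, representable (coarsely) in bfloat16.
print(fits_in_fp16(1e5))  # False
print(to_bf16(1e5))       # 99840.0

# A tiny gradient: rounds to zero in float16, survives in bfloat16.
print(to_fp16(1e-8))      # 0.0
print(to_bf16(1e-8) > 0)  # True
```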
  • When using Flash Attention 2 via attn_implementation="flash_attention_2", don't pass torch_dtype to the from_pretrained class method; use Automatic Mixed-Precision training instead. With the Trainer, simply set fp16 or bf16 to True. Otherwise, make sure you are using torch.autocast. This is required because Flash Attention only supports the fp16 and bf16 data types.
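When training outside the Trainer, the torch.autocast pattern this tip refers to keeps the model's parameters in float32 and runs the forward pass under autocast. The sketch below is a minimal illustration under stated assumptions: a tiny torch.nn.Linear on CPU with bfloat16 stands in for the real model so that it runs without a GPU or a model download. With Flash Attention 2 on a GPU you would use device_type="cuda" instead.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the real model: parameters are created (and stay) in float32.
model = nn.Linear(16, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

x = torch.randn(8, 16)
target = torch.randn(8, 4)

# Autocast runs eligible ops (such as linear layers) in bfloat16
# while leaving the stored parameters in float32.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = model(x)
    loss = nn.functional.mse_loss(out.float(), target)

# Backward and the optimizer step happen outside the autocast region;
# gradients are accumulated into the float32 parameters.
loss.backward()
optimizer.step()

print(out.dtype)           # torch.bfloat16
print(model.weight.dtype)  # torch.float32
```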

Resources[[resources]]

A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with Command-R. If you're interested in submitting a resource to be included here, please open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.

Loading the FP16 model

# pip install transformers
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "CohereForAI/c4ai-command-r-v01"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Without torch_dtype the weights would be cast to torch.float32; request float16 explicitly
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# Format message with the command-r chat template
messages = [{"role": "user", "content": "Hello, how are you?"}]
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
## <BOS_TOKEN><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Hello, how are you?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>

gen_tokens = model.generate(
    input_ids, 
    max_new_tokens=100, 
    do_sample=True, 
    temperature=0.3,
    )

gen_text = tokenizer.decode(gen_tokens[0])
print(gen_text)
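The comment in the example above shows the prompt string that apply_chat_template produces for a single user turn. The hypothetical helper below rebuilds that string in plain Python to make the format explicit. Only the user-turn and generation-prompt tokens are taken from that comment; the `<|SYSTEM_TOKEN|>` name and the assistant-turn layout are assumptions. To inspect the real template, call tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True), which returns the rendered string instead of token IDs.

```python
# Special tokens that mark each speaker in the Command-R chat format.
# USER_TOKEN and CHATBOT_TOKEN come from the rendered prompt in the example above;
# SYSTEM_TOKEN is an assumption.
ROLE_TOKENS = {
    "system": "<|SYSTEM_TOKEN|>",
    "user": "<|USER_TOKEN|>",
    "assistant": "<|CHATBOT_TOKEN|>",
}


def render_command_r_prompt(messages, add_generation_prompt=True):
    """Hypothetical plain-Python rendering of the Command-R chat format."""
    parts = ["<BOS_TOKEN>"]
    for message in messages:
        # Each turn: start marker, speaker token, text, end marker.
        parts.append(
            "<|START_OF_TURN_TOKEN|>"
            + ROLE_TOKENS[message["role"]]
            + message["content"]
            + "<|END_OF_TURN_TOKEN|>"
        )
    if add_generation_prompt:
        # Open a chatbot turn so the model continues as the assistant.
        parts.append("<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>")
    return "".join(parts)


messages = [{"role": "user", "content": "Hello, how are you?"}]
print(render_command_r_prompt(messages))
# <BOS_TOKEN><|START_OF_TURN_TOKEN|><|USER_TOKEN|>Hello, how are you?<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>
```

Because generate returns the prompt tokens followed by the new ones for decoder-only models, the decoded gen_text in the examples starts with this prompt.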

Loading a 4-bit quantized model with the bitsandbytes library

# pip install transformers bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(load_in_4bit=True)

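model_id = "CohereForAI/c4ai-command-r-v01"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)

# Format message with the command-r chat template; move the IDs to the model's device
messages = [{"role": "user", "content": "Hello, how are you?"}]
input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to(model.device)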

gen_tokens = model.generate(
    input_ids, 
    max_new_tokens=100, 
    do_sample=True, 
    temperature=0.3,
    )

gen_text = tokenizer.decode(gen_tokens[0])
print(gen_text)
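A back-of-the-envelope calculation shows why the 4-bit option matters for a model of this size. The sketch assumes roughly 35 billion parameters, the size stated on the c4ai-command-r-v01 model card, and counts only the weights. Activations, the KV cache for long contexts, and any modules bitsandbytes leaves unquantized add to the real footprint, so treat these numbers as lower bounds.

```python
def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Memory needed just to store the weights, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: ~35 billion parameters, as stated on the c4ai-command-r-v01 model card.
COMMAND_R_PARAMS = 35_000_000_000

for label, bits in [("float32", 32), ("float16/bfloat16", 16), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(COMMAND_R_PARAMS, bits):.1f} GB")
# float32: ~140.0 GB
# float16/bfloat16: ~70.0 GB
# 4-bit: ~17.5 GB
```

By this estimate, loading without torch_dtype (float32) needs about twice the weight memory of the FP16 example, and 4-bit quantization brings it down to roughly a quarter of the FP16 figure.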

CohereConfig[[transformers.CohereConfig]]

[[autodoc]] CohereConfig

CohereTokenizerFast[[transformers.CohereTokenizerFast]]

[[autodoc]] CohereTokenizerFast
    - build_inputs_with_special_tokens
    - get_special_tokens_mask
    - create_token_type_ids_from_sequences
    - update_post_processor
    - save_vocabulary

CohereModel[[transformers.CohereModel]]

[[autodoc]] CohereModel
    - forward

CohereForCausalLM[[transformers.CohereForCausalLM]]

[[autodoc]] CohereForCausalLM
    - forward