ValueError: Cannot instantiate this tokenizer from a slow version. If it's based on sentencepiece, make sure you have sentencepiece installed.

#18

by pseudotensor - opened Mar 13, 2024

pseudotensor

Mar 13, 2024

edited Mar 13, 2024

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model = 'CohereForAI/c4ai-command-r-v01'

tokenizer = AutoTokenizer.from_pretrained(model, trust_remote_code=True)

now fails with:

You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Traceback (most recent call last):
  File "/home/jon/h2ogpt/coheretest1.py", line 5, in <module>
    tokenizer = AutoTokenizer.from_pretrained(model, trust_remote_code=True, add_prefix_space=False)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 810, in from_pretrained
    return tokenizer_class.from_pretrained(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2048, in from_pretrained
    return cls._from_pretrained(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2287, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/jon/.cache/huggingface/modules/transformers_modules/CohereForAI/c4ai-command-r-v01/779ade391d0552f47d38c13745f6e2d33eb3d916/tokenization_cohere_fast.py", line 128, in __init__
    super().__init__(
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 102, in __init__
    raise ValueError(
ValueError: Cannot instantiate this tokenizer from a slow version. If it's based on sentencepiece, make sure you have sentencepiece installed.

This worked yesterday.

My sentencepiece is the latest version (0.2.0), and transformers is also the latest (4.38.2).
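For anyone reproducing this, it can help to confirm which versions are actually installed in the active environment before comparing notes. The following is a minimal sketch using only the standard library; the helper name `installed_versions` and the inclusion of `tokenizers` in the list are choices made here for illustration, not something taken from the thread.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the active environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Packages relevant to the error message in this thread.
    report = installed_versions(["transformers", "tokenizers", "sentencepiece"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside the same conda environment as the failing script rules out the case where a different interpreter or environment is being used than the one assumed.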

This does not work either:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
model = 'CohereForAI/c4ai-command-r-v01'

tokenizer = AutoTokenizer.from_pretrained(model, trust_remote_code=True, use_fast=False)

gives:

Traceback (most recent call last):
  File "/home/jon/h2ogpt/coheretest1.py", line 5, in <module>
    tokenizer = AutoTokenizer.from_pretrained(model, trust_remote_code=True, use_fast=False)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 806, in from_pretrained
    tokenizer_class = get_class_from_dynamic_module(class_ref, pretrained_model_name_or_path, **kwargs)
  File "/home/jon/miniconda3/envs/h2ogpt/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 479, in get_class_from_dynamic_module
    if "--" in class_reference:
TypeError: argument of type 'NoneType' is not iterable
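The `TypeError` above is raised while evaluating `"--" in class_reference`, which indicates that `class_reference` was `None` when `use_fast=False` was requested. One plausible explanation, which is an assumption and not confirmed anywhere in this thread, is that the repository's `tokenizer_config.json` declares only a fast custom tokenizer in its `auto_map` and leaves the slow slot empty. A small sketch for inspecting that entry, given the config already loaded as a dict; the helper name `describe_auto_map` and the sample values are hypothetical.

```python
def describe_auto_map(tokenizer_config):
    """Report which custom tokenizer classes a tokenizer config declares.

    Assumes the usual convention that auto_map["AutoTokenizer"] is a
    two-item list of [slow_class_ref, fast_class_ref], where either
    item may be None.
    """
    entry = tokenizer_config.get("auto_map", {}).get("AutoTokenizer")
    if entry is None:
        return {"slow": None, "fast": None}
    # Pad so that a short list still unpacks into two slots.
    slow, fast = (list(entry) + [None, None])[:2]
    return {"slow": slow, "fast": fast}


if __name__ == "__main__":
    # Hypothetical config contents, for illustration only. The module name
    # matches the file seen in the first traceback; the class name is assumed.
    sample = {
        "auto_map": {
            "AutoTokenizer": [None, "tokenization_cohere_fast.CohereTokenizerFast"]
        }
    }
    print(describe_auto_map(sample))
```

If the slow slot really is empty, passing `use_fast=False` would not be a usable workaround for the first error, since there is no slow class to fall back to.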

ahmetustun

Mar 13, 2024

hey, this should be fixed now. Can you please try again?

pseudotensor

Mar 14, 2024

yes

pseudotensor changed discussion status to closed Mar 14, 2024

iamgroot42

Mar 28, 2024

I just got started with command-r (quantized version) and still have this issue!
