How to use the inference API correctly?

#35

by gbhall - opened Apr 21, 2024

Discussion

gbhall

Apr 21, 2024 (edited Apr 21, 2024)

Based on the docs, the following JSON input (tested via Postman) should work:

POST endpoint: https://api-inference.huggingface.co/models/CohereForAI/c4ai-command-r-plus

{
  "inputs": {
    "role": "user",
    "content": "Who is Elon Musk?"
  }
}

But I get this error: "Failed to deserialize the JSON body into the target type: inputs: invalid type: map, expected a string at line 2 column 12"

What's the expected format?
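
The error text itself hints at the cause: this endpoint wants `inputs` to be a plain string, not a role/content object. Below is a minimal sketch of a request body in that shape. The `build_payload` helper is hypothetical, and the optional `parameters.max_new_tokens` field and its value are assumptions for illustration rather than something confirmed in this thread.

```python
import json


def build_payload(prompt: str, max_new_tokens: int = 100) -> str:
    """Serialize a request body whose `inputs` field is a plain string."""
    if not isinstance(prompt, str):
        # A dict here is what leads to the server-side complaint
        # "invalid type: map, expected a string".
        raise TypeError("inputs must be a string, not " + type(prompt).__name__)
    body = {
        "inputs": prompt,
        # Assumed optional generation setting; adjust or drop as needed.
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return json.dumps(body)


if __name__ == "__main__":
    print(build_payload("Who is Elon Musk?"))
```

Sent this way, the model receives the raw string with no chat formatting applied, which is why the replies further down switch to the chat completions route.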

aiworldx

Apr 22, 2024

good

TikaToka

Apr 23, 2024

Sorry for the unrelated question, but is it possible to use Command R+ with the HF Inference API on a Pro subscription, even though it is a 103B model?

gbhall

Apr 25, 2024 (edited Apr 25, 2024)

Okay, I got this to work after reading the docs properly and doing more research.

Out of the box, it's just a model, and the plain endpoint takes a string as input. What I was after is a chat-style interface that accepts a list of messages. Hugging Face offers this through TGI (Text Generation Inference): https://huggingface.co/docs/text-generation-inference/en/index

To use it, send the request through the OpenAI-compatible chat completions API, as in this example: https://huggingface.co/blog/llama3#how-to-prompt-llama-3

from openai import OpenAI

# initialize the client but point it to TGI
client = OpenAI(
    base_url="https://api-inference.huggingface.co/models/CohereForAI/c4ai-command-r-plus" + "/v1/",  # replace with your endpoint url
    api_key="<HF_API_TOKEN>",  # replace with your token
)
chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[
        {"role": "user", "content": "Why is open-source software important?"},
    ],
    stream=True,
    max_tokens=500
)

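# iterate over the stream and print each text delta as it arrives;
# delta.content can be None on some chunks, so fall back to ""
for chunk in chat_completion:
    print(chunk.choices[0].delta.content or "", end="")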

This works.
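
If you want the full reply as one string instead of printing it piece by piece, the streamed deltas can be joined. This is a minimal sketch, assuming the chunks follow the OpenAI client's layout (`chunk.choices[0].delta.content`, which may be `None`). The `collect_stream` and `fake_chunk` helpers are hypothetical, and `fake_chunk` only exists so the example runs without a network call.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_stream(chunks: Iterable) -> str:
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        delta = chunk.choices[0].delta.content
        if delta:  # skip None or empty deltas
            parts.append(delta)
    return "".join(parts)


def fake_chunk(text):
    """Stand-in with the same attribute layout as a streamed chunk (illustration only)."""
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )


if __name__ == "__main__":
    demo = [fake_chunk("Open"), fake_chunk("-source"), fake_chunk(None)]
    print(collect_stream(demo))
```

With a real client, you would pass the object returned by `client.chat.completions.create(..., stream=True)` in place of `demo`.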

gbhall

Apr 25, 2024

Or, to send the same kind of request with Postman:

POST endpoint: https://api-inference.huggingface.co/models/CohereForAI/c4ai-command-r-plus/v1/chat/completions

JSON body:

{
  "model": "tgi",
  "messages": [
    {"role": "user", "content": "Who is Elon Musk?"}
  ],
  "stream": false,
  "max_tokens": 500
}

Response:

{
    "id": "",
    "object": "text_completion",
    "created": 1714021803,
    "model": "text-generation-inference/commandrplus-medusa",
    "system_fingerprint": "1.4.5-native",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "Elon Musk is a business magnate, investor, and philanthropist. He is the founder, CEO, and Chief Engineer of SpaceX; angel investor, CEO and Product Architect of Tesla, Inc.; owner and CEO of Twitter, Inc.; founder of the Boring Company; co-founder of Neuralink; and president of the philanthropic Musk Foundation. With an estimated net worth of around $191 billion, Musk is the second-wealthiest person in the world, behind Bernard Arnault as of August 3, 2023."
            },
            "logprobs": null,
            "finish_reason": "eos_token"
        }
    ],
    "usage": {
        "prompt_tokens": 11,
        "completion_tokens": 109,
        "total_tokens": 120
    }
}
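
For anyone who prefers a script to Postman, the same request and response handling can be sketched with the Python standard library. The endpoint and body fields are taken from the post above. The `build_request` and `extract_answer` helpers are hypothetical, and the `Authorization: Bearer` header is an assumption based on how the token is passed in the earlier OpenAI client example.

```python
import json
import urllib.request

API_URL = (
    "https://api-inference.huggingface.co/models/"
    "CohereForAI/c4ai-command-r-plus/v1/chat/completions"
)


def build_request(question: str, token: str, max_tokens: int = 500) -> urllib.request.Request:
    """Build (but do not send) a POST request with the JSON body shown above."""
    body = {
        "model": "tgi",
        "messages": [{"role": "user", "content": question}],
        "stream": False,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )


def extract_answer(response: dict) -> tuple[str, int]:
    """Return the assistant text and the total token count from a response body."""
    choice = response["choices"][0]
    return choice["message"]["content"], response["usage"]["total_tokens"]


if __name__ == "__main__":
    req = build_request("Who is Elon Musk?", token="<HF_API_TOKEN>")
    # Sending needs a valid token and network access, so it is left commented out:
    # with urllib.request.urlopen(req) as resp:
    #     print(extract_answer(json.load(resp)))
    sample = {  # abbreviated version of the response shown above
        "choices": [{"message": {"role": "assistant", "content": "Elon Musk is ..."}}],
        "usage": {"total_tokens": 120},
    }
    print(extract_answer(sample))
```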

alexrs

Cohere Labs org Sep 4, 2024

Closing it as it seems resolved!

alexrs changed discussion status to closed Sep 4, 2024
