Add base_model

by julien-c HF Staff - opened Jun 5, 2024

base: refs/heads/main ← from: refs/pr/2


julien-c

Jun 5, 2024

No description provided.

Add base_model (63c38494)

bartowski changed pull request status to merged Jun 5, 2024

bartowski

Owner Jun 5, 2024

tyty

Is there any chance this attribute could be changed to something like "original_model"? Just because I know "base_model" is used to describe merges, like here:

https://huggingface.co/mlabonne/Meta-Llama-3-120B-Instruct/blob/main/README.md?code=true#L7

so it makes it trickier to pull in the original model's metadata and then also add a link to the original model as base_model

julien-c

Jun 5, 2024

so, for now we've opted to use base_model for everything, i.e. finetunes, merges, and quants.

see doc here:

https://huggingface.co/docs/hub/en/model-cards#specifying-a-base-model

We've thought about encoding a finer taxonomy of operations, but I was too lazy to do it at the time 🤣

That being said, as that doc shows, we auto-detect whether a model is a finetune, merge, or quant of its base_model(s). And so we have a "tree" of dependency that you can walk back.
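To make the "walk back" idea concrete, here is a minimal sketch of following base_model links to collect a model's ancestors. It does not call the Hub: `BASE_MODELS` is a hypothetical in-memory snapshot built from the repos named in this thread, and `ancestors` is an illustrative helper, not a Hub API.

```python
from typing import Dict, List, Union

# Hypothetical snapshot of base_model metadata (repo id -> parent or parents).
# On the Hub this information lives in each model card's YAML front matter.
BASE_MODELS: Dict[str, Union[str, List[str]]] = {
    "bartowski/Codestral-22B-v0.1-GGUF": "mistralai/Codestral-22B-v0.1",
    "mlabonne/Meta-Llama-3-120B-Instruct": ["meta-llama/Meta-Llama-3-70B-Instruct"] * 7,
}


def ancestors(repo_id: str, base_models: Dict[str, Union[str, List[str]]]) -> List[str]:
    """Return every ancestor of repo_id, nearest first, without duplicates."""
    seen: List[str] = []
    queue = [repo_id]
    while queue:
        current = queue.pop(0)
        parents = base_models.get(current, [])
        if isinstance(parents, str):
            # A single parent is stored as a plain string; normalize to a list.
            parents = [parents]
        for parent in parents:
            if parent not in seen and parent != repo_id:
                seen.append(parent)
                queue.append(parent)
    return seen
```

A merge lists several parents, so the structure is closer to a graph than a strict tree; the `seen` list keeps repeated parents from being reported twice.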

bartowski

Owner Jun 5, 2024

my only concern is from the example above of a merged model

In it, there's this already:

base_model:
- meta-llama/Meta-Llama-3-70B-Instruct
- meta-llama/Meta-Llama-3-70B-Instruct
- meta-llama/Meta-Llama-3-70B-Instruct
- meta-llama/Meta-Llama-3-70B-Instruct
- meta-llama/Meta-Llama-3-70B-Instruct
- meta-llama/Meta-Llama-3-70B-Instruct
- meta-llama/Meta-Llama-3-70B-Instruct

if I were to just automatically add on:
base_model: mlabonne/Meta-Llama-3-120B-Instruct

it'll complain that I have base_model twice, so I'd have to presumably parse through the existing metadata yaml, find if base_model exists, find if it's multi-line, remove them all, and then add my own

not that that's so terrible, I'll survive LOL but it is a weird feeling edge case. Also it makes it so that for a model that's a quant of a merge, it can't list both that it's a merge of a certain model AND that it's a quant of another model, which might be interesting information to have readily available
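For reference, the manual steps described above (find base_model, drop its value whether single-line or multi-line, then add a new one) could look roughly like this in Python. This is a line-based sketch under stated assumptions, not a YAML parser: it assumes base_model is a top-level key and that a multi-line value is a block-style list. `replace_base_model` is a hypothetical helper name.

```python
from typing import List


def replace_base_model(readme: str, parent: str) -> str:
    """Replace any existing base_model entry in the YAML front matter with `parent`."""
    lines = readme.split("\n")
    if not lines or lines[0].strip() != "---":
        # No front matter yet: create one holding only base_model.
        return f"---\nbase_model: {parent}\n---\n" + readme
    try:
        end = lines.index("---", 1)  # closing delimiter of the front matter
    except ValueError:
        raise ValueError("front matter is not closed")
    kept: List[str] = []
    skipping = False
    for line in lines[1:end]:
        if line.startswith("base_model:"):
            skipping = True  # drop the key line itself
            continue
        if skipping and (line.startswith("-") or line.startswith(" ")):
            continue  # drop list items or continuation lines of the old value
        skipping = False
        kept.append(line)
    kept.append(f"base_model: {parent}")
    return "\n".join(["---", *kept, "---", *lines[end + 1:]])
```

Calling `replace_base_model(readme_text, "mlabonne/Meta-Llama-3-120B-Instruct")` on the merged model's README would leave exactly one base_model key. Anything beyond simple cards (flow-style lists, comments, nested keys) is better handled by a real YAML parser.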

julien-c

Jun 6, 2024

For me in your use case you would replace the base_model that's in the source model, with your own (pointing to that parent model)

Are you using Python? Because you can use huggingface_hub to programmatically replace base_model (or any YAML) in a model card.

cc @Wauplin who leads https://github.com/huggingface/huggingface_hub

Wauplin

Jun 6, 2024

To overwrite base_model in the ModelCard metadata, you can use metadata_update:

from huggingface_hub import metadata_update

metadata_update("bartowski/Codestral-22B-v0.1-GGUF", metadata={"base_model": "mistralai/Codestral-22B-v0.1"}, overwrite=True)

If you want to append base_model to an existing list without overwriting any value, you can use ModelCard:

from huggingface_hub import ModelCard

new_model = "bartowski/Codestral-22B-v0.1-GGUF"
base_model = "mistralai/Codestral-22B-v0.1"

# Load existing
card = ModelCard.load(new_model)

# Update field
if card.data.base_model is None:
    card.data.base_model = base_model
elif isinstance(card.data.base_model, str):
    card.data.base_model = [card.data.base_model, base_model]
else:
    card.data.base_model.append(base_model)

# Save
card.push_to_hub(new_model)

Hope this proves useful :)

julien-c

Jun 7, 2024 • edited Jun 7, 2024

in fact base_model should be seen as parent model i.e. the most immediate parent in the evolution tree of models.

So you would overwrite rather than append
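As a rough illustration of "overwrite rather than append" for a quant of a merge: copy the parent's card metadata, then point base_model at the parent itself. The sketch works on plain dicts only; `inherit_metadata` is a hypothetical helper, and no Hub call is made here.

```python
import copy
from typing import Any, Dict


def inherit_metadata(source_metadata: Dict[str, Any], parent_repo: str) -> Dict[str, Any]:
    """Copy the parent's card metadata, pointing base_model at the parent itself."""
    # Deep copy so the parent's own metadata (including its base_model list) is untouched.
    metadata = copy.deepcopy(source_metadata)
    # Overwrite: the quant's immediate parent is the merged model, not the merge inputs.
    metadata["base_model"] = parent_repo
    return metadata
```

The resulting dict could then be passed to `metadata_update(..., overwrite=True)` as shown earlier in the thread. The merge inputs are not lost: they remain recorded on the parent's card, one step further back.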

bartowski

Owner Jun 7, 2024

yeah makes sense :) still think it would be cool, but that works

I'll look at implementing that python code, probably a more appropriate way to update the README and metadata in general than basic bash scripting...
