How to prune layers in AutoModelForCausalLM
I want to prune layers in Mistral and see the results, but I am unable to do it.
What I have tried:
I tried to create a new ModuleList for model.model.layers by removing some layers from it ...
But when I try to run inference with the model now, it breaks. Any suggestions on how to do it correctly?
Did you find a solution? I am interested in pruning Mistral too.
mergekit ...
SLERP method:
Could you please elaborate a little?
OK: the difficulty in pruning a model is choosing which layers to remove.
There are various methods you can choose from.
Here is a method using mergekit:
# Step 1: Clone the mergekit repo and install its requirements
!git clone https://github.com/cg123/mergekit.git
!cd mergekit && pip install -q -e .

import yaml

MODEL_NAME = "Marcoro14-7B-slerp"
yaml_config = """
slices:
  - sources:
      - model: AIDC-ai-business/Marcoroni-7B-v3
        layer_range: [0, 32]
      - model: EmbeddedLLM/Mistral-7B-Merge-14-v0.1
        layer_range: [0, 32]
merge_method: slerp
base_model: AIDC-ai-business/Marcoroni-7B-v3
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
"""

# Save config as yaml file
with open('config.yaml', 'w', encoding="utf-8") as f:
    f.write(yaml_config)
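For reference, slerp interpolates along the arc between two weight tensors instead of along the straight line between them. Here is a minimal pure-Python sketch on flat vectors. It is not mergekit's implementation; the function name `slerp` and the `eps` threshold are illustrative choices:

```python
import math


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two equal-length vectors."""
    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    if norm0 < eps or norm1 < eps:
        # A zero vector has no direction: fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    # Angle between the two vectors, clamped against rounding error.
    cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))
    sin_theta = math.sin(theta)
    if abs(sin_theta) < eps:
        # (Anti)parallel vectors: slerp is ill-defined, use linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]

    w0 = math.sin((1 - t) * theta) / sin_theta
    w1 = math.sin(t * theta) / sin_theta
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


# Halfway between two orthogonal unit vectors stays on the unit circle.
print(slerp(0.5, [1.0, 0.0], [0.0, 1.0]))
```

In the config above, `t` is the interpolation factor. As far as I understand mergekit, t = 0 returns the base model, t = 1 returns the other model, and a list such as [0, 0.5, 0.3, 0.7, 1] is treated as a gradient spread across the layers.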
merge_config = """
base_model: mistralai/Codestral-22B-v0.1
dtype: float16
merge_method: task_arithmetic
slices:
  - sources:
      - layer_range: [0, 32]
        model: mistralai/Codestral-22B-v0.1
      - layer_range: [23, 55]
        model: mistralai/Codestral-22B-v0.1
        parameters:
          weight: 0.4
"""

with open('config.yaml', 'w') as f:
    f.write(merge_config)
I used this today to check that it works! (I reduced the model to 7B.)
I will give it some tests today or align it.
It took 150 GB of disk space!
I used the free Google Colab with a T4:
LeroyDyer/_Spydaz_Web_AI_Codestral_7b
I will try to merge this model with the Mathstral model!
# @title ## Run merge

# @markdown ### Runtime type
# @markdown Select your runtime (CPU, High RAM, GPU)

runtime = "GPU" # @param ["CPU", "CPU + High-RAM", "GPU"]

# @markdown ### Mergekit arguments
# @markdown Use the `main` branch by default, [`mixtral`](https://github.com/cg123/mergekit/blob/mixtral/moe.md) if you want to create a Mixture of Experts.

branch = "main" # @param ["main", "mixtral"]
trust_remote_code = False # @param {type:"boolean"}

# Install mergekit (see Step 1 above)

# Save config as yaml file
with open('config.yaml', 'w', encoding="utf-8") as f:
    f.write(merge_config)

# Base CLI
if branch == "main":
    cli = "mergekit-yaml config.yaml merge --copy-tokenizer --out-shard-size 5B --write-model-card"
elif branch == "mixtral":
    cli = "mergekit-moe config.yaml merge --copy-tokenizer --out-shard-size 5B --write-model-card"

# Additional arguments
if runtime == "CPU":
    cli += " --allow-crimes --lazy-unpickle"
elif runtime == "GPU":
    cli += " --cuda --low-cpu-memory"
if trust_remote_code:
    cli += " --trust-remote-code"

print(cli)

# Merge models
!{cli}
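Note that `!{cli}` only works inside a notebook. Outside Colab, the same command can be assembled and run from plain Python. This is a sketch, assuming mergekit is installed and its commands are on PATH; `build_mergekit_cli` and `run_merge` are hypothetical helper names, and the flags are exactly the ones from the cell above:

```python
import shlex
import subprocess


def build_mergekit_cli(branch="main", runtime="GPU", trust_remote_code=False,
                       config_path="config.yaml", out_dir="merge"):
    """Return the mergekit command from the notebook cell as an argument list."""
    tools = {"main": "mergekit-yaml", "mixtral": "mergekit-moe"}
    if branch not in tools:
        raise ValueError(f"unknown branch: {branch!r}")
    args = [tools[branch], config_path, out_dir,
            "--copy-tokenizer", "--out-shard-size", "5B", "--write-model-card"]
    # Runtime-specific flags, same mapping as the notebook cell.
    if runtime == "CPU":
        args += ["--allow-crimes", "--lazy-unpickle"]
    elif runtime == "GPU":
        args += ["--cuda", "--low-cpu-memory"]
    if trust_remote_code:
        args.append("--trust-remote-code")
    return args


def run_merge(args):
    """Run the merge and raise CalledProcessError if mergekit fails."""
    subprocess.run(args, check=True)


args = build_mergekit_cli(branch="main", runtime="GPU")
print(shlex.join(args))
# run_merge(args)  # uncomment once config.yaml has been written
```

Passing a list to `subprocess.run` avoids shell quoting problems, and `check=True` makes a failed merge stop the script instead of passing silently.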
The important thing to consider is that the models you merge need to have matching projection weight shapes (for example, Mistral 7B uses an MLP intermediate size of 14,336); otherwise you will often get an error about the inputs and outputs of a layer.
If you cut the model down to 32 layers (i.e. 22B to 7B), you should pick the last layers and not the first layers.
If you wish to merge it with a normal 32-layer model, you can add that model to the configuration with its in and out ranges, but I would convert Codestral to 7B first and then merge it with a 7B model afterwards.
Actually, I would merge and then align: I would take a standard dataset and train the trimmed model until the tensors line back up. Because they are the last layers, they will keep their coherence.
Anyway, try it and see what happens.
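Before merging, it can save time to compare the shapes that have to agree. The following is a rough sketch that compares config fields. `shape_mismatches` is a hypothetical helper, and the two dicts hold values I believe match the published configs of Mistral-7B-Instruct-v0.2 and Codestral-22B-v0.1; loading the real ones with `AutoConfig.from_pretrained(name).to_dict()` is the safer route:

```python
SHAPE_KEYS = (
    "hidden_size",
    "intermediate_size",
    "num_attention_heads",
    "num_key_value_heads",
    "vocab_size",
)


def shape_mismatches(config_a, config_b, keys=SHAPE_KEYS):
    """Return {key: (a, b)} for every shape-defining field that differs."""
    return {
        key: (config_a.get(key), config_b.get(key))
        for key in keys
        if config_a.get(key) != config_b.get(key)
    }


# Assumed values; verify against the real config.json files before relying on them.
mistral_7b = {
    "hidden_size": 4096,
    "intermediate_size": 14336,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "vocab_size": 32000,
}
codestral_22b = {
    "hidden_size": 6144,
    "intermediate_size": 16384,
    "num_attention_heads": 48,
    "num_key_value_heads": 8,
    "vocab_size": 32768,
}

for key, (a, b) in shape_mismatches(mistral_7b, codestral_22b).items():
    print(f"{key}: {a} vs {b}")
```

Any field reported here means the corresponding weight tensors have different shapes, which is the kind of mismatch that produces the input/output errors mentioned above.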
Could you please elaborate a little?
Here is the final product. I first merged it down to 32 layers, compressing the model with the merge shown:
0-32
23-52
LeroyDyer/_Spydaz_Web_AI_Codestral_12b
Hmm, it came out to 12B!
Then I linear-merged the product with itself.
So the new model is a linear merge of the model with itself:
LeroyDyer/_Spydaz_Web_AI_Codestral_12b_LM
I did not reduce the model a second time; I decided to merge the model with itself to align its layers and enforce a smooth connection.
I have not tested the model yet!
I was disappointed because it came to 12B (32 layers is the sweet spot for models), so we can see the effect of the unusual settings they used to create the model.
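The 12B figure follows from the layer width rather than the layer count. Here is a back-of-the-envelope count, assuming a Llama/Mistral-style decoder (grouped-query attention, a gated MLP with three projections, untied embeddings) and dimensions I believe match the published configs; `decoder_param_count` is a hypothetical helper:

```python
def decoder_param_count(hidden, intermediate, n_layers, n_heads, n_kv_heads,
                        head_dim, vocab, tied_embeddings=False):
    """Approximate parameter count of a Llama/Mistral-style decoder."""
    q_proj = hidden * n_heads * head_dim
    kv_proj = 2 * hidden * n_kv_heads * head_dim   # k_proj and v_proj
    o_proj = n_heads * head_dim * hidden
    mlp = 3 * hidden * intermediate                # gate, up and down projections
    norms = 2 * hidden                             # two RMSNorm weights per layer
    per_layer = q_proj + kv_proj + o_proj + mlp + norms
    embeddings = vocab * hidden * (1 if tied_embeddings else 2)  # embed + lm_head
    return n_layers * per_layer + embeddings + hidden            # + final norm


# Assumed dimensions; check them against each model's config.json.
mistral_7b = dict(hidden=4096, intermediate=14336, n_layers=32, n_heads=32,
                  n_kv_heads=8, head_dim=128, vocab=32000)
codestral = dict(hidden=6144, intermediate=16384, n_layers=56, n_heads=48,
                 n_kv_heads=8, head_dim=128, vocab=32768)
codestral_32 = dict(codestral, n_layers=32)

for name, dims in [("Mistral 7B", mistral_7b),
                   ("Codestral, 56 layers", codestral),
                   ("Codestral, 32 layers", codestral_32)]:
    print(f"{name}: {decoder_param_count(**dims) / 1e9:.2f}B")
```

Under these assumptions, 32 layers at a hidden size of 6144 come to roughly 12.9B parameters, while 32 layers at 4096 give about 7.2B. Cutting layers alone cannot bring a wider model down to 7B.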
It could not align with the other models; even with Nemo it was unsuccessful. This was due to the settings they used, which are not uniform. They change the embeddings (vocab and tokenizers): mistake number one.
And the hidden layer sizes (second mistake). All these mismatched models are not compatible with each other.
Sad to say!