Instructions to use Joseph717171/Mistral-12.25B-v0.2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Joseph717171/Mistral-12.25B-v0.2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Joseph717171/Mistral-12.25B-v0.2")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Joseph717171/Mistral-12.25B-v0.2")
model = AutoModelForCausalLM.from_pretrained("Joseph717171/Mistral-12.25B-v0.2")


vLLM

How to use Joseph717171/Mistral-12.25B-v0.2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Joseph717171/Mistral-12.25B-v0.2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Joseph717171/Mistral-12.25B-v0.2",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
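The same request can also be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on `localhost:8000`; the helper names `build_completion_request` and `complete` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "Joseph717171/Mistral-12.25B-v0.2"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text (needs a running server)."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-style completion responses carry the text under choices[0].text.
    return body["choices"][0]["text"]
```

With a server running, `complete("Once upon a time,")` should return the continuation as a string. Passing `base_url="http://localhost:30000"` would point the same helper at a server on port 30000, such as the SGLang one.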

Use Docker

docker model run hf.co/Joseph717171/Mistral-12.25B-v0.2

SGLang

How to use Joseph717171/Mistral-12.25B-v0.2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Joseph717171/Mistral-12.25B-v0.2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Joseph717171/Mistral-12.25B-v0.2",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Joseph717171/Mistral-12.25B-v0.2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Joseph717171/Mistral-12.25B-v0.2",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use Joseph717171/Mistral-12.25B-v0.2 with Docker Model Runner:
```
docker model run hf.co/Joseph717171/Mistral-12.25B-v0.2
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Credit for the model card's description goes to ddh0 and mergekit.

Credit for access and conversion of Mistral-7B-v0.2 goes to alpindale (from MistralAI's weights to HF Transformers).

Mistral-12.25B-v0.2

This is Mistral-12.25B-v0.2, a depth-upscaled version of alpindale/Mistral-7B-v0.2-hf.

This model is intended to be used as a basis for further fine-tuning, or as a drop-in upgrade from the original 7 billion parameter model.

Paper detailing how Depth Up-Scaling (DUS) works: SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling

This is a merge of pre-trained language models created using mergekit.

Upstage's stated limitations of their research:

"Our study on the Depth Up-Scaling (DUS) has important limitations and considerations. One key limitation is the need for more thorough explorations of hyperparameters used in the DUS approach. Namely, we removed m = 8 layers from both ends of our base model, primarily due to hardware limitations. However, we have not yet determined if this value is optimal for enhancing performance. The extended time and cost of continued pretraining made it challenging to conduct more comprehensive experiments, which we aim to address in future work through various comparative analyses."

This model was made to help test whether removing m = 8 layers (10.7B parameters) performs better or worse than removing fewer layers, m < 8 (more than 10.7B parameters).
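As a rough illustration of what varying m changes: DUS stacks two copies of the n-layer base model, each with m layers removed, so the final depth is s = 2(n - m). The sketch below assumes n = 32 layers for the 7B base (the SOLAR setup, and consistent with the layer ranges in this model's merge configuration); the function name `dus_depth` is illustrative.

```python
def dus_depth(n_layers: int, m: int) -> int:
    """Layers in the upscaled model: two base copies, each missing m layers."""
    if not 0 <= m < n_layers:
        raise ValueError("m must satisfy 0 <= m < n_layers")
    return 2 * (n_layers - m)


# Compare the SOLAR setting (m = 8) with smaller values of m.
for m in (8, 6, 4, 2):
    print(f"m = {m}: s = {dus_depth(32, m)} layers")
```

Under these assumptions, m = 8 gives the 48-layer SOLAR configuration, while m = 4 gives the 56 layers used here.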

Merge Details

Merge Method

This model was merged using the passthrough merge method.

Models Merged

The following models were included in the merge:

/Users/jsarnecki/opt/Workspace/alpindale/Mistral-7B-v0.2-hf

Configuration

The following YAML configuration was used to produce this model:

dtype: bfloat16
merge_method: passthrough
# Depth UpScaled (DUS) version of Mistral-7B-v0.2
# where m = 4 (The number of layers to remove from the model)
#       s = 56 (The number of layers the model will have after the DUS)
slices:
- sources:
  - layer_range: [0, 28]
    model: /Users/jsarnecki/opt/Workspace/alpindale/Mistral-7B-v0.2-hf
- sources:
  - layer_range: [4, 32]
    model: /Users/jsarnecki/opt/Workspace/alpindale/Mistral-7B-v0.2-hf
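The two `layer_range` entries follow mechanically from m: the first copy keeps layers [0, n - m) and the second keeps layers [m, n). The sketch below regenerates a configuration of the same shape for other values of m. It is a plain-text helper written for illustration, not part of mergekit, and it substitutes the Hub ID `alpindale/Mistral-7B-v0.2-hf` for the local path as an assumption.

```python
def dus_slices(n_layers, m):
    """Return the (start, end) layer ranges for the two stacked copies."""
    if not 0 <= m < n_layers:
        raise ValueError("m must satisfy 0 <= m < n_layers")
    return [(0, n_layers - m), (m, n_layers)]


def render_mergekit_config(model, n_layers, m, dtype="bfloat16"):
    """Render a passthrough-merge YAML config as text (illustrative helper)."""
    lines = [f"dtype: {dtype}", "merge_method: passthrough", "slices:"]
    for start, end in dus_slices(n_layers, m):
        lines += [
            "- sources:",
            f"  - layer_range: [{start}, {end}]",
            f"    model: {model}",
        ]
    return "\n".join(lines) + "\n"


# m = 4 on a 32-layer base reproduces the ranges used for this model.
print(render_mergekit_config("alpindale/Mistral-7B-v0.2-hf", n_layers=32, m=4))
```

Changing `m` to 8 yields the ranges [0, 24] and [8, 32], the layout described in the SOLAR paper.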

exllama (Thanks to blockblockblock)

https://huggingface.co/blockblockblock/Mistral-12.25B-Instruct-v0.2-bpw2.5
https://huggingface.co/blockblockblock/Mistral-12.25B-Instruct-v0.2-bpw3
https://huggingface.co/blockblockblock/Mistral-12.25B-Instruct-v0.2-bpw3.5
https://huggingface.co/blockblockblock/Mistral-12.25B-Instruct-v0.2-bpw3.7
https://huggingface.co/blockblockblock/Mistral-12.25B-Instruct-v0.2-bpw4
https://huggingface.co/blockblockblock/Mistral-12.25B-Instruct-v0.2-bpw4.2
https://huggingface.co/blockblockblock/Mistral-12.25B-Instruct-v0.2-bpw4.4
https://huggingface.co/blockblockblock/Mistral-12.25B-Instruct-v0.2-bpw4.6
https://huggingface.co/blockblockblock/Mistral-12.25B-Instruct-v0.2-bpw4.8
https://huggingface.co/blockblockblock/Mistral-12.25B-Instruct-v0.2-bpw5
https://huggingface.co/blockblockblock/Mistral-12.25B-Instruct-v0.2-bpw5.5
https://huggingface.co/blockblockblock/Mistral-12.25B-Instruct-v0.2-bpw6
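To help choose among the bpw (bits per weight) variants listed above, weight storage can be approximated as parameters × bpw / 8 bytes. The sketch below assumes a nominal 12 billion parameters (the rounded figure shown for this model) and ignores the KV cache, activations, and any tensors a quantizer keeps at higher precision, so real requirements will be higher. The function name is illustrative.

```python
NOMINAL_PARAMS = 12e9  # assumption: rounded parameter count, not an exact figure


def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# bpw values taken from the repository names above; 16 corresponds to BF16.
for bpw in (2.5, 3, 3.5, 3.7, 4, 4.2, 4.4, 4.6, 4.8, 5, 5.5, 6, 16):
    print(f"{bpw:>4} bpw: ~{weight_size_gb(NOMINAL_PARAMS, bpw):.1f} GB")
```

Treat these numbers as lower bounds for sizing a GPU, not as measured file sizes.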


Safetensors

Model size: 12B params

Tensor type: BF16


Paper for Joseph717171/Mistral-12.25B-v0.2

SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling

arXiv: 2312.15166 (published Dec 23, 2023)