jburtoft committed on Jan 3, 2024

Commit 300c81e

1 Parent(s): 84b8790

74f98ed453da674f39ed8a82f7aa3e784d48170b05c3171ee74e6d8151561251

Files changed (18)

pytorch_model.bin/p59.model.layers.6.mlp.gate_proj.weight +3 -0
pytorch_model.bin/p6.model.layers.0.mlp.up_proj.weight +3 -0
pytorch_model.bin/p60.model.layers.6.mlp.up_proj.weight +3 -0
pytorch_model.bin/p61.model.layers.6.mlp.down_proj.weight +3 -0
pytorch_model.bin/p62.model.layers.6.input_layernorm.weight +3 -0
pytorch_model.bin/p63.model.layers.6.post_attention_layernorm.weight +3 -0
pytorch_model.bin/p64.model.layers.7.self_attn.q_proj.weight +3 -0
pytorch_model.bin/p65.model.layers.7.self_attn.k_proj.weight +3 -0
pytorch_model.bin/p66.model.layers.7.self_attn.v_proj.weight +3 -0
pytorch_model.bin/p67.model.layers.7.self_attn.o_proj.weight +3 -0
pytorch_model.bin/p68.model.layers.7.mlp.gate_proj.weight +3 -0
pytorch_model.bin/p69.model.layers.7.mlp.up_proj.weight +3 -0
pytorch_model.bin/p7.model.layers.0.mlp.down_proj.weight +3 -0
pytorch_model.bin/p70.model.layers.7.mlp.down_proj.weight +3 -0
pytorch_model.bin/p71.model.layers.7.input_layernorm.weight +3 -0
pytorch_model.bin/p72.model.layers.7.post_attention_layernorm.weight +3 -0
pytorch_model.bin/p73.model.layers.8.self_attn.q_proj.weight +3 -0
pytorch_model.bin/p74.model.layers.8.self_attn.k_proj.weight +3 -0

pytorch_model.bin/p59.model.layers.6.mlp.gate_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6c6d1e7c823c62f89ab6e1209e1fdd1766f30a81a181358f8fe68b344b777be8
+size 234881910

pytorch_model.bin/p6.model.layers.0.mlp.up_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:93bcc81ff55db4515577bedab9cb6fb8aa946edb9abe77f66aa71f28eeb0c608
+size 234881901

pytorch_model.bin/p60.model.layers.6.mlp.up_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:22505305a75ae0e434215721a5dadda3d0c58ab37a8af2cb2edb33f6ccd3c4d4
+size 234881904

pytorch_model.bin/p61.model.layers.6.mlp.down_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5e6c2af01e7f59982ba2749243dd3ef611adec5ffe5ea8743c861d93f0df2373
+size 234881910

pytorch_model.bin/p62.model.layers.6.input_layernorm.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e85898fdf4baa06c3cca43a2703bbcad2080d657deec63eaeb4b48a57c2051ab
+size 17276

pytorch_model.bin/p63.model.layers.6.post_attention_layernorm.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:95a687a8cf2bd6ddbb16a1cb71b42758a6c3cdc7adf3eb5d5a9ce86b53706cfc
+size 17303

pytorch_model.bin/p64.model.layers.7.self_attn.q_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:aaec2b83f991c746cdda381e4fb29b4f8b710d956d1a172ea0c3006718672729
+size 67109759

pytorch_model.bin/p65.model.layers.7.self_attn.k_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0b8082bf28ff599393d549629e8ac487fca17270829c607bda9322c5dcf398b5
+size 16778111

pytorch_model.bin/p66.model.layers.7.self_attn.v_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4ce55d7590441cbf8e16fe2a075e80a7f1179645dda3428d329c7eadc73ac1b3
+size 16778111

pytorch_model.bin/p67.model.layers.7.self_attn.o_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:061570b4066e5cde3534d0516cdc702870c1c09febb1be9a53d362ddd4284b1e
+size 67109759

pytorch_model.bin/p68.model.layers.7.mlp.gate_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2d045c1191053281a3e00373a853918c2ecc7e1de103e32742f9ced484a7cdc3
+size 234881910

pytorch_model.bin/p69.model.layers.7.mlp.up_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0e47abbf908391b362e5280cd302038223fb4702dd0c7164c1fb54aa7dd77061
+size 234881904

pytorch_model.bin/p7.model.layers.0.mlp.down_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2735559ac7dc8c15f446e5c98a68d4533b6cc8b6b3c9f5e443740a01e65fc8fd
+size 234881907

pytorch_model.bin/p70.model.layers.7.mlp.down_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9c373df847f3cb8c72af964c731cfebd4e0ee137903c3d1ea6854c3694afe9f9
+size 234881910

pytorch_model.bin/p71.model.layers.7.input_layernorm.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:44b685d549fe641d8f38dd918a0b48a6afd10f51859bf0dc931d5c8b2345ee04
+size 17276

pytorch_model.bin/p72.model.layers.7.post_attention_layernorm.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:51c50efaaaca07acfe15ce4c99967d9ec9e4294f5fff15738741b88a0d65a848
+size 17303

pytorch_model.bin/p73.model.layers.8.self_attn.q_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bf873632de22c477573e0827d092aeed54811327fd7b1bc9f974d5347789d983
+size 67109759

pytorch_model.bin/p74.model.layers.8.self_attn.k_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9d0766dff7a231a33e5bc2eb418d2881e47925e22a13295b0e2abc8820751775
+size 16778111
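
Each file added in this commit is a Git LFS pointer rather than the weight data itself: three `key value` lines giving the spec version, the SHA-256 `oid` of the real file, and its `size` in bytes. The following is a minimal sketch, not part of the commit, of how such a pointer could be parsed and used to check downloaded bytes. The names `LfsPointer`, `parse_lfs_pointer`, and `matches_pointer` are hypothetical helpers introduced for illustration; the sample pointer is the one shown above for `p62.model.layers.6.input_layernorm.weight`.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class LfsPointer:
    """Hypothetical container for the three fields of a Git LFS pointer."""
    version: str
    oid: str   # hex SHA-256 digest of the real file contents
    size: int  # size of the real file in bytes


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse 'key value' pointer lines, tolerating a leading diff '+' marker."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip().lstrip("+").strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo!r}")
    return LfsPointer(fields["version"], digest, int(fields["size"]))


def matches_pointer(data: bytes, pointer: LfsPointer) -> bool:
    """Return True if the bytes have the size and SHA-256 digest the pointer records."""
    return (
        len(data) == pointer.size
        and hashlib.sha256(data).hexdigest() == pointer.oid
    )


# Pointer text copied from the diff for p62.model.layers.6.input_layernorm.weight.
pointer = parse_lfs_pointer(
    """
    +version https://git-lfs.github.com/spec/v1
    +oid sha256:e85898fdf4baa06c3cca43a2703bbcad2080d657deec63eaeb4b48a57c2051ab
    +size 17276
    """
)
print(pointer.size)
```

For the larger weight files (over 200 MB each), reading the whole file into memory is wasteful; a real check would more likely feed the file to `hashlib.sha256` in chunks and compare the digest at the end.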