Instructions for using aws-neuron/Mistral-neuron with libraries, inference providers, notebooks, and local apps.

Libraries

How to use aws-neuron/Mistral-neuron with Transformers:

```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="aws-neuron/Mistral-neuron")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("aws-neuron/Mistral-neuron")
model = AutoModelForCausalLM.from_pretrained("aws-neuron/Mistral-neuron")
```

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use aws-neuron/Mistral-neuron with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "aws-neuron/Mistral-neuron"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "aws-neuron/Mistral-neuron",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
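The same completion request can be issued from Python with only the standard library. The sketch below builds the POST request with the payload from the curl example; the helper name `build_completion_request` is hypothetical, and it assumes the vLLM server is listening on `http://localhost:8000` as in the command above.

```python
import json
import urllib.request


def build_completion_request(
    base_url: str,
    model: str,
    prompt: str,
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build a POST to an OpenAI-compatible /v1/completions endpoint (hypothetical helper)."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_completion_request(
    "http://localhost:8000", "aws-neuron/Mistral-neuron", "Once upon a time,"
)
# Sending it requires a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Only the base URL changes when pointing the same request at another OpenAI-compatible server.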

Use Docker

```
docker model run hf.co/aws-neuron/Mistral-neuron
```

SGLang

How to use aws-neuron/Mistral-neuron with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "aws-neuron/Mistral-neuron" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "aws-neuron/Mistral-neuron",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
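Because SGLang serves an OpenAI-compatible `/v1/completions` route, the generated text in the JSON response sits under `choices[i]["text"]`. A minimal sketch for extracting it; the helper name and the trimmed sample response are illustrative assumptions, not output captured from this model.

```python
import json


def completion_texts(response_body: str) -> list[str]:
    """Return the generated text of every choice in an OpenAI-style completions response."""
    data = json.loads(response_body)
    # Each entry in "choices" carries its generated continuation under "text".
    return [choice["text"] for choice in data.get("choices", [])]


# Trimmed, illustrative response body (not real model output).
sample = '{"object": "text_completion", "choices": [{"index": 0, "text": " there lived a fox."}]}'
print(completion_texts(sample))
```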

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "aws-neuron/Mistral-neuron" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "aws-neuron/Mistral-neuron",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Docker Model Runner
How to use aws-neuron/Mistral-neuron with Docker Model Runner:
```
docker model run hf.co/aws-neuron/Mistral-neuron
```

jburtoft committed on Jan 3, 2024

Commit 749a894 (1 parent: 88210cb)

e5bbc6ac1518183adffe256ac9df49f53b150c0d395ea45f14d8d97b684e7541


Files changed (24)

pytorch_model.bin/p187.model.layers.20.mlp.down_proj.weight +3 -0
pytorch_model.bin/p188.model.layers.20.input_layernorm.weight +3 -0
pytorch_model.bin/p189.model.layers.20.post_attention_layernorm.weight +3 -0
pytorch_model.bin/p19.model.layers.2.self_attn.q_proj.weight +3 -0
pytorch_model.bin/p190.model.layers.21.self_attn.q_proj.weight +3 -0
pytorch_model.bin/p191.model.layers.21.self_attn.k_proj.weight +3 -0
pytorch_model.bin/p192.model.layers.21.self_attn.v_proj.weight +3 -0
pytorch_model.bin/p193.model.layers.21.self_attn.o_proj.weight +3 -0
pytorch_model.bin/p194.model.layers.21.mlp.gate_proj.weight +3 -0
pytorch_model.bin/p195.model.layers.21.mlp.up_proj.weight +3 -0
pytorch_model.bin/p196.model.layers.21.mlp.down_proj.weight +3 -0
pytorch_model.bin/p197.model.layers.21.input_layernorm.weight +3 -0
pytorch_model.bin/p198.model.layers.21.post_attention_layernorm.weight +3 -0
pytorch_model.bin/p199.model.layers.22.self_attn.q_proj.weight +3 -0
pytorch_model.bin/p2.model.layers.0.self_attn.k_proj.weight +3 -0
pytorch_model.bin/p20.model.layers.2.self_attn.k_proj.weight +3 -0
pytorch_model.bin/p200.model.layers.22.self_attn.k_proj.weight +3 -0
pytorch_model.bin/p201.model.layers.22.self_attn.v_proj.weight +3 -0
pytorch_model.bin/p202.model.layers.22.self_attn.o_proj.weight +3 -0
pytorch_model.bin/p203.model.layers.22.mlp.gate_proj.weight +3 -0
pytorch_model.bin/p204.model.layers.22.mlp.up_proj.weight +3 -0
pytorch_model.bin/p205.model.layers.22.mlp.down_proj.weight +3 -0
pytorch_model.bin/p206.model.layers.22.input_layernorm.weight +3 -0
pytorch_model.bin/p207.model.layers.22.post_attention_layernorm.weight +3 -0
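The list above is sorted as strings, which is why `p19` and `p2` appear among the `p18x`/`p19x` and `p20x` files. Each name has the form `p<index>.<parameter name>`, and the indices in this commit follow a stride of nine tensors per decoder layer (layer 21's `q_proj` is `p190`, its `post_attention_layernorm` is `p198`). The sketch below parses the names, sorts them numerically, and checks the rule `index = 1 + 9 * layer + offset`. That rule, the per-layer order, and the guess that the single tensor before layer 0 is the token embedding are inferred from these 24 names only; they are not documented here.

```python
import re

# Per-layer tensor order inferred from the file names in this commit (an assumption).
LAYER_ORDER = [
    "self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj", "self_attn.o_proj",
    "mlp.gate_proj", "mlp.up_proj", "mlp.down_proj",
    "input_layernorm", "post_attention_layernorm",
]
NAME_RE = re.compile(r"^p(\d+)\.model\.layers\.(\d+)\.(.+)\.weight$")


def parse_split_name(name: str) -> tuple[int, int, str]:
    """Split 'p190.model.layers.21.self_attn.q_proj.weight' into (190, 21, 'self_attn.q_proj')."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    return int(match.group(1)), int(match.group(2)), match.group(3)


def expected_index(layer: int, param: str) -> int:
    # One tensor precedes layer 0 (presumably the embedding), then nine tensors per layer.
    return 1 + 9 * layer + LAYER_ORDER.index(param)


names = [
    "p19.model.layers.2.self_attn.q_proj.weight",
    "p2.model.layers.0.self_attn.k_proj.weight",
    "p190.model.layers.21.self_attn.q_proj.weight",
    "p187.model.layers.20.mlp.down_proj.weight",
    "p207.model.layers.22.post_attention_layernorm.weight",
]
# Sorting the parsed tuples orders files numerically, unlike the string-sorted listing.
parsed = sorted(parse_split_name(n) for n in names)
for index, layer, param in parsed:
    assert index == expected_index(layer, param), (index, layer, param)
    print(index, layer, param)
```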

pytorch_model.bin/p187.model.layers.20.mlp.down_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2b35b3d21e9d152f4e379513beb1f118477915187fdafc241b95f68362f4cdda
+size 234881916
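Each added file in this diff is a Git LFS pointer rather than the tensor itself: three `key value` lines giving the spec version, the SHA-256 object ID, and the object size in bytes. A minimal sketch that parses such a pointer and checks a downloaded object against it; the function names are hypothetical, and the integrity check only applies once the real object has been fetched.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        raise ValueError("not a Git LFS v1 pointer")
    return fields


def matches_pointer(blob: bytes, fields: dict[str, str]) -> bool:
    """Check a downloaded object against the pointer's size and sha256 oid."""
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return len(blob) == int(fields["size"]) and hashlib.sha256(blob).hexdigest() == digest


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:2b35b3d21e9d152f4e379513beb1f118477915187fdafc241b95f68362f4cdda
size 234881916
"""
fields = parse_lfs_pointer(pointer)
print(fields["size"])  # 234881916
```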

pytorch_model.bin/p188.model.layers.20.input_layernorm.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e602df310ed864b0fef4e102d2c27729a934554406bad2a47b36becaa8133860
+size 17282

pytorch_model.bin/p189.model.layers.20.post_attention_layernorm.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:36b08bffc6a977caecac5a9b75a915484c313e28327327526cf1391ac4b401e0
+size 17309

pytorch_model.bin/p19.model.layers.2.self_attn.q_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:34faf042dc3e0e764d535021c3a5baf5b180f792b05240641cce28e8071ea551
+size 67109759

pytorch_model.bin/p190.model.layers.21.self_attn.q_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:203b554798af6a3ba5159ed483a01d5749e245f5b63a58de6da1562a794e0aa8
+size 67109765

pytorch_model.bin/p191.model.layers.21.self_attn.k_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dcb370d37571f461d3e0e10e02edb23a1125f56725dab8945dba880b5fe7977e
+size 16778117

pytorch_model.bin/p192.model.layers.21.self_attn.v_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:789731477e4ba2b4fe00b384dacb1dd9808b13b50917a476842e14ac8187233c
+size 16778117

pytorch_model.bin/p193.model.layers.21.self_attn.o_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:74719cda63dd0bf6c5d8bc8ebe1fed0b036a415c2587151a6bb878b56ef6fb6d
+size 67109765

pytorch_model.bin/p194.model.layers.21.mlp.gate_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:91f63290ff49ec1afeaa5b20f65d084cd3d58dc19ce9bc2f46b9e0911d67419c
+size 234881916

pytorch_model.bin/p195.model.layers.21.mlp.up_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ea439ab881b26523ab7335105ab5e11d90f22c2d839e2ac979a62d91cdc8e9e3
+size 234881910

pytorch_model.bin/p196.model.layers.21.mlp.down_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c68ecc52d9c96e8934062b62a6f65a21e095f0209c8fb6495f0e8ec9c772dcab
+size 234881916

pytorch_model.bin/p197.model.layers.21.input_layernorm.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:62239a789e338e0dc3b2402adb49a723881601babec999bed7b89b8a7da2e17b
+size 17282

pytorch_model.bin/p198.model.layers.21.post_attention_layernorm.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:871e7fedb488f26423694d5df3457a315d19bd9337c3ec9604c87f6388f0446e
+size 17309

pytorch_model.bin/p199.model.layers.22.self_attn.q_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d723fd506dd34e5aaa60a93ead530c2828ddb7ec92f76061bd2b92b46f6e2872
+size 67109765

pytorch_model.bin/p2.model.layers.0.self_attn.k_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:532904415395bce99a219cc7c1f4f100796ad2ae36e9de6b7752643cd43628f3
+size 16778108

pytorch_model.bin/p20.model.layers.2.self_attn.k_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ee8a3b701e64834bf45b7f2949bc2c5e5ad6421a15e2052f366df73b7c84b520
+size 16778111

pytorch_model.bin/p200.model.layers.22.self_attn.k_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:57d304a40597e76bee146d22a5ea7079e490a2b8bc7216dba270994b630ed2ad
+size 16778117

pytorch_model.bin/p201.model.layers.22.self_attn.v_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8a73c0f25b29dc2882b91d2e6a1c6ec3474e3242c577725a89e5bc3eb51d9463
+size 16778117

pytorch_model.bin/p202.model.layers.22.self_attn.o_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:636ee3a06d8ca2571ddd3ad7dfea0d7ed3666391eab84fb29ad1f86f8c7c97f9
+size 67109765

pytorch_model.bin/p203.model.layers.22.mlp.gate_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:049fb82ee7a6c0ce7c9084968d4b9912dc775e4c66c992d9aeb928e13778dcbe
+size 234881916

pytorch_model.bin/p204.model.layers.22.mlp.up_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0eb240a5b15390d8c35a43c163adb7a0e97c41d9a4ad729c71464cdc3c9862d4
+size 234881910

pytorch_model.bin/p205.model.layers.22.mlp.down_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:276d9a2160c4139e4299cad90be99cea3ac337039d6b72d85aa5a4a7e70dd54f
+size 234881916

pytorch_model.bin/p206.model.layers.22.input_layernorm.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3254f17c8c94ba733963ad697eaee36394718c60ce1fbf049252b9de351e916d
+size 17282

pytorch_model.bin/p207.model.layers.22.post_attention_layernorm.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3275fe61e10ca5cb81e0a71d9bd48c5cec859295813b0f3913c3a58bdf98b047
+size 17309
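The pointer sizes are consistent with each tensor being stored in float32 plus under 1 KB of per-file serialization overhead. The sketch below checks this arithmetic, assuming the standard Mistral-7B dimensions (hidden size 4096, MLP intermediate size 14336, 8 key/value heads of dimension 128, so `k_proj` and `v_proj` project to 1024) and 4 bytes per element. Neither the base model nor the dtype is stated in this commit, so both are inferences from the sizes.

```python
# Assumed Mistral-7B dimensions (not stated in this commit).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 8 * 128  # num_key_value_heads * head_dim
BYTES_PER_ELEM = 4  # assumed float32

# Weight shapes as (out_features, in_features), as stored by torch.nn.Linear.
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (KV_DIM, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "down_proj": (HIDDEN, INTERMEDIATE),
    "input_layernorm": (HIDDEN,),
}
# LFS pointer sizes copied from the layer-21 files in this diff.
POINTER_SIZES = {
    "q_proj": 67109765,
    "k_proj": 16778117,
    "o_proj": 67109765,
    "down_proj": 234881916,
    "input_layernorm": 17282,
}


def raw_bytes(shape: tuple[int, ...]) -> int:
    """Bytes needed for the tensor's elements alone, with no file header."""
    count = 1
    for dim in shape:
        count *= dim
    return count * BYTES_PER_ELEM


for name, shape in SHAPES.items():
    raw = raw_bytes(shape)
    overhead = POINTER_SIZES[name] - raw
    print(name, raw, POINTER_SIZES[name], overhead)
```

At 2 bytes per element the same tensors would need roughly half these file sizes, so the small, nearly constant overhead is what points to float32.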