jburtoft committed on Jan 3, 2024

Commit

483bf93

1 Parent(s): a782784

3a9c3fc5dd2478d36f41dc0609b6521fd14e3ae4d1aad35baa1c159e2c539240

Files changed (19)

pytorch_model.bin/p150.model.layers.16.mlp.up_proj.weight +3 -0
pytorch_model.bin/p151.model.layers.16.mlp.down_proj.weight +3 -0
pytorch_model.bin/p152.model.layers.16.input_layernorm.weight +3 -0
pytorch_model.bin/p153.model.layers.16.post_attention_layernorm.weight +3 -0
pytorch_model.bin/p154.model.layers.17.self_attn.q_proj.weight +3 -0
pytorch_model.bin/p155.model.layers.17.self_attn.k_proj.weight +3 -0
pytorch_model.bin/p156.model.layers.17.self_attn.v_proj.weight +3 -0
pytorch_model.bin/p157.model.layers.17.self_attn.o_proj.weight +3 -0
pytorch_model.bin/p158.model.layers.17.mlp.gate_proj.weight +3 -0
pytorch_model.bin/p159.model.layers.17.mlp.up_proj.weight +3 -0
pytorch_model.bin/p16.model.layers.1.mlp.down_proj.weight +3 -0
pytorch_model.bin/p160.model.layers.17.mlp.down_proj.weight +3 -0
pytorch_model.bin/p161.model.layers.17.input_layernorm.weight +3 -0
pytorch_model.bin/p162.model.layers.17.post_attention_layernorm.weight +3 -0
pytorch_model.bin/p163.model.layers.18.self_attn.q_proj.weight +3 -0
pytorch_model.bin/p164.model.layers.18.self_attn.k_proj.weight +3 -0
pytorch_model.bin/p165.model.layers.18.self_attn.v_proj.weight +3 -0
pytorch_model.bin/p166.model.layers.18.self_attn.o_proj.weight +3 -0
pytorch_model.bin/p167.model.layers.18.mlp.gate_proj.weight +3 -0

pytorch_model.bin/p150.model.layers.16.mlp.up_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:025794f5c32d8898a68da81d565db205e223d0ce2eaaabd9084c35c3f9bd5dd4
+size 234881910
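
Each file added in this commit is a Git LFS pointer rather than the weight tensor itself: three key-value lines giving the spec version, the SHA-256 object id, and the object size in bytes. The following is a minimal sketch of reading such a pointer; `parse_lfs_pointer` is a hypothetical helper name, not part of any library.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        # Tolerate the leading '+' that a diff view puts on added lines.
        line = line.lstrip("+").strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the first file in this commit.
pointer = """\
+version https://git-lfs.github.com/spec/v1
+oid sha256:025794f5c32d8898a68da81d565db205e223d0ce2eaaabd9084c35c3f9bd5dd4
+size 234881910
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```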

pytorch_model.bin/p151.model.layers.16.mlp.down_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:63018e4557735d5747947cf78a39d66b0378d1d9bda89a970e5d909658f66e5b
+size 234881916

pytorch_model.bin/p152.model.layers.16.input_layernorm.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ba14717df190ed288576d405060aac43348be63720131e713ae6cb8e70d438a3
+size 17282

pytorch_model.bin/p153.model.layers.16.post_attention_layernorm.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:074e9422861c9f91c697ff8cfd36ebe99cc2df028dbc562cc0ca34c44a7ba44e
+size 17309

pytorch_model.bin/p154.model.layers.17.self_attn.q_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:82586c82e6097e86bfc6718cb68d8f8e8292eae323d8c056a5e04b8cdfa809cd
+size 67109765

pytorch_model.bin/p155.model.layers.17.self_attn.k_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f3fe7444ecbba47b1d751d6bf2c1d62680110e233547c132bdfa31c2bacd8b27
+size 16778117

pytorch_model.bin/p156.model.layers.17.self_attn.v_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7089b6f60df8a5a3329009d961f043636382f3466c96bc3ef913e3fa2f715768
+size 16778117

pytorch_model.bin/p157.model.layers.17.self_attn.o_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6490ae5bf7385966e131b239ed0087daa2c2a57e93433e242ac714f889d9352c
+size 67109765

pytorch_model.bin/p158.model.layers.17.mlp.gate_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c1b5a90a3416b757a22fbeee007406f5ecf0af86547cbcdec9dc3c944e515023
+size 234881916
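
The pointer sizes seen so far are consistent with 32-bit float tensors plus a small serialization header, if one assumes the published Mistral-7B shapes (hidden size 4096, MLP width 14336, key/value projection width 1024). The commit itself states neither the dtype nor the shapes, so the sketch below is only a plausibility check under those assumptions.

```python
FP32_BYTES = 4
# Assumed shapes (Mistral-7B config); not stated anywhere in this commit.
HIDDEN, INTERMEDIATE, KV_DIM = 4096, 14336, 1024

# (size from the LFS pointer, assumed number of elements)
files = {
    "mlp.up_proj": (234881910, INTERMEDIATE * HIDDEN),
    "self_attn.q_proj": (67109765, HIDDEN * HIDDEN),
    "self_attn.k_proj": (16778117, KV_DIM * HIDDEN),
    "input_layernorm": (17282, HIDDEN),
}

overheads = {}
for name, (size, numel) in files.items():
    payload = numel * FP32_BYTES          # raw tensor bytes at fp32
    overheads[name] = size - payload      # what is left for the file header
    print(f"{name}: payload={payload} overhead={overheads[name]}")
```

Under these assumptions every file carries a little under 1 KB beyond the raw tensor data, which is what one would expect from a per-tensor serialized file.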

pytorch_model.bin/p159.model.layers.17.mlp.up_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:93a768630dd036f50548283f4c991dc1a87a499e4f9a1e098cb1ba75061c1bb3
+size 234881910

pytorch_model.bin/p16.model.layers.1.mlp.down_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ef3f917b7b5dcf8041d4bd835516bff2636d2368490323f016537127b28df841
+size 234881910

pytorch_model.bin/p160.model.layers.17.mlp.down_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ad87b46bf02f172a418577c47c1356ed0c3826a1fe4b5e1cdf7e26921894fc28
+size 234881916

pytorch_model.bin/p161.model.layers.17.input_layernorm.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:80f7433b570ee97438c33ae07b8634042ae6b6d68dedc315403bc7effdbb7da0
+size 17282

pytorch_model.bin/p162.model.layers.17.post_attention_layernorm.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ca2d4d840ce5f3a0b8615010aff98917ff663e45ea6c4e2815707c2659f6818c
+size 17309

pytorch_model.bin/p163.model.layers.18.self_attn.q_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:85062d58e29f3f2ae295d688ae4184690bc18cd7c9ce6e2265bface9a3ddf989
+size 67109765

pytorch_model.bin/p164.model.layers.18.self_attn.k_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:62fb8501dfddd6037b7f189485042a88ee9367dfcbea88ae2e1db3c1a63ffbc1
+size 16778117

pytorch_model.bin/p165.model.layers.18.self_attn.v_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2a6fba6d56fb36bea878c6c0ba66ea10c1f2b8551345aaea90b18cb0156d87df
+size 16778117

pytorch_model.bin/p166.model.layers.18.self_attn.o_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:25c4370ccc9a1357c36e7c8d6009017576453f7c0134d4f31cbb38b021af0b06
+size 67109765

pytorch_model.bin/p167.model.layers.18.mlp.gate_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5995cd642c3af096eb0dde7b664584b79cb436121bb757065ca3f8d811f51417
+size 234881916
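
Once the real objects have been fetched (for example with `git lfs pull`), each file can be checked against the `oid` and `size` recorded in its pointer. The following is a minimal sketch; `matches_pointer` is a hypothetical helper, and the demo runs on a small temporary file rather than the actual weights.

```python
import hashlib
import os
import tempfile


def matches_pointer(path: str, expected_sha256: str, expected_size: int) -> bool:
    """Return True if the file at `path` has the given size and SHA-256 digest."""
    # Cheap check first: a size mismatch means the hash cannot match either.
    if os.path.getsize(path) != expected_size:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so large weight files are never fully in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256


# Demo on a throwaway file standing in for a downloaded weight shard.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_path = tmp.name
demo_digest = hashlib.sha256(b"hello").hexdigest()

ok = matches_pointer(demo_path, demo_digest, 5)    # correct size and digest
bad = matches_pointer(demo_path, demo_digest, 6)   # wrong size is rejected
os.remove(demo_path)
print(ok, bad)
```

For a real shard, the second and third arguments would be the hex digest after `sha256:` and the `size` value from the corresponding pointer above.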