jburtoft committed on Mar 6, 2024

Commit 1edb75f · verified · 1 Parent(s): 1199b8a

998b1c4d5092e36a29099a8826cd6679daea4c8294537a4aa669e6e6d887a35a


Files changed (22)

pytorch_model.bin/p77.model.layers.8.mlp.gate_proj.weight +3 -0
pytorch_model.bin/p78.model.layers.8.mlp.up_proj.weight +3 -0
pytorch_model.bin/p79.model.layers.8.mlp.down_proj.weight +3 -0
pytorch_model.bin/p8.model.layers.0.input_layernorm.weight +3 -0
pytorch_model.bin/p80.model.layers.8.input_layernorm.weight +3 -0
pytorch_model.bin/p81.model.layers.8.post_attention_layernorm.weight +3 -0
pytorch_model.bin/p82.model.layers.9.self_attn.q_proj.weight +3 -0
pytorch_model.bin/p83.model.layers.9.self_attn.k_proj.weight +3 -0
pytorch_model.bin/p84.model.layers.9.self_attn.v_proj.weight +3 -0
pytorch_model.bin/p85.model.layers.9.self_attn.o_proj.weight +3 -0
pytorch_model.bin/p86.model.layers.9.mlp.gate_proj.weight +3 -0
pytorch_model.bin/p87.model.layers.9.mlp.up_proj.weight +3 -0
pytorch_model.bin/p88.model.layers.9.mlp.down_proj.weight +3 -0
pytorch_model.bin/p89.model.layers.9.input_layernorm.weight +3 -0
pytorch_model.bin/p9.model.layers.0.post_attention_layernorm.weight +3 -0
pytorch_model.bin/p90.model.layers.9.post_attention_layernorm.weight +3 -0
pytorch_model.bin/p91.model.layers.10.self_attn.q_proj.weight +3 -0
pytorch_model.bin/p92.model.layers.10.self_attn.k_proj.weight +3 -0
pytorch_model.bin/p93.model.layers.10.self_attn.v_proj.weight +3 -0
pytorch_model.bin/p94.model.layers.10.self_attn.o_proj.weight +3 -0
pytorch_model.bin/p95.model.layers.10.mlp.gate_proj.weight +3 -0
pytorch_model.bin/p96.model.layers.10.mlp.up_proj.weight +3 -0
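
The file names above appear to follow a `p<index>.<parameter name>` pattern, and the listing is sorted as plain text, which is why `p8` lands between `p79` and `p80`. A minimal sketch of recovering the numeric order, assuming that pattern holds for every file (the helper `split_name` is hypothetical and not part of this repository):

```python
import re

# Assumed naming pattern: pytorch_model.bin/p<index>.<parameter name>
NAME_RE = re.compile(r"^pytorch_model\.bin/p(\d+)\.(.+)$")


def split_name(path: str) -> tuple[int, str]:
    """Split a split-checkpoint path into (part index, parameter name)."""
    match = NAME_RE.match(path)
    if match is None:
        raise ValueError(f"unexpected file name: {path}")
    return int(match.group(1)), match.group(2)


# Three paths copied from the listing, in the order the page shows them.
paths = [
    "pytorch_model.bin/p79.model.layers.8.mlp.down_proj.weight",
    "pytorch_model.bin/p8.model.layers.0.input_layernorm.weight",
    "pytorch_model.bin/p80.model.layers.8.input_layernorm.weight",
]

# Sort by the integer part index instead of by string comparison.
ordered = sorted(paths, key=lambda p: split_name(p)[0])
for path in ordered:
    index, parameter = split_name(path)
    print(index, parameter)
```

Sorting on the integer index places `p8` (a layer 0 weight) ahead of the layer 8 parts.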

pytorch_model.bin/p77.model.layers.8.mlp.gate_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6266be1dae85651bb1dd490d62eacc87e05daea544414d4e68546503e1c37574
+size 180355958

pytorch_model.bin/p78.model.layers.8.mlp.up_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b290000536fcdc58af1e23acf40cb4184bec29b3e0c532b2cbcd281a409a2eda
+size 180355952

pytorch_model.bin/p79.model.layers.8.mlp.down_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a6af41c4f5dc8739c5b0c0dfe9db228d1f9667e9eb223b02614fc95cf104163a
+size 180355958

pytorch_model.bin/p8.model.layers.0.input_layernorm.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8722a413e29dc0a9314022e0ff4503faf0d791cc171529ef46c17787576bfbe2
+size 17273

pytorch_model.bin/p80.model.layers.8.input_layernorm.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:578e4cda501a5c3a5285d9fe81a836d48d3cfb99ee4bee2ed57e98dc06fd09c8
+size 17276

pytorch_model.bin/p81.model.layers.8.post_attention_layernorm.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:17d6a5ec9a916319b8fcfda563b1548d3f99b54786a852e58586c47a5552cb18
+size 17303

pytorch_model.bin/p82.model.layers.9.self_attn.q_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d189480a221d069489a679d69dc677378fe0942fc7ae76bd5db891978b7f9a65
+size 67109759

pytorch_model.bin/p83.model.layers.9.self_attn.k_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0d0efadd85fa1a81325a676e3a1af0b5401c6007655b081b0bbf9be90eb17d6b
+size 67109759

pytorch_model.bin/p84.model.layers.9.self_attn.v_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:caa50f40a6d27e0a4c549d3ef57070b11765a8b4f3852b588b8512e8a93c624e
+size 67109759

pytorch_model.bin/p85.model.layers.9.self_attn.o_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e2a8b3e66ae17c0557b3dbebc31e3d3504c363e538e57b15aef3a4a1cc616ed8
+size 67109759

pytorch_model.bin/p86.model.layers.9.mlp.gate_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fd90c33e7402ecffa111e6866095978476ba8a1e9dcd48553ca6ce5628417254
+size 180355958

pytorch_model.bin/p87.model.layers.9.mlp.up_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9a0a81b5d6f734bcf4dc31475dc99280dfce205d5278f8ec6849136d5557ca4b
+size 180355952

pytorch_model.bin/p88.model.layers.9.mlp.down_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:be77227b34ff6d401bdbbc3749a44ab058c0cc1c0a0a55c8ca9b6e6c8a4074f9
+size 180355958

pytorch_model.bin/p89.model.layers.9.input_layernorm.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:054edbe807f70d685de965a837a2ebad55447b1c3cb8d39c0f7ecc0ee568feff
+size 17276

pytorch_model.bin/p9.model.layers.0.post_attention_layernorm.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4805ab6fafe1b0894b91c2fcfcd9b74e6cf2c584b0a292af5c45500f2d5b6f2c
+size 17300

pytorch_model.bin/p90.model.layers.9.post_attention_layernorm.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:311d4abdc7f61e704f17df857f046b656790b2a55bc3e83e0b977d626894d3f0
+size 17303

pytorch_model.bin/p91.model.layers.10.self_attn.q_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:66a9fe48424b68fd7d8785f6cdfa6c66a03528be6607cd64f7c1e0ca07521e70
+size 67109762

pytorch_model.bin/p92.model.layers.10.self_attn.k_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:78f41ad8ea094065aef9be3ceafc03580c9cf09661962cebca2f1c49ea99e4d8
+size 67109762

pytorch_model.bin/p93.model.layers.10.self_attn.v_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:46341bd103524ceb780a7d7437cea9e6ee1a49e89ea56d409836abdce432d5c1
+size 67109762

pytorch_model.bin/p94.model.layers.10.self_attn.o_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a429d1f860f666362306e048818229b0e7deba6dcdb61eb43294ad8ceddefa9a
+size 67109762

pytorch_model.bin/p95.model.layers.10.mlp.gate_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:998360b6efe47b7700ffa72c51f9887670d96e108ad070aa56bf4f3b0d271ccd
+size 180355961

pytorch_model.bin/p96.model.layers.10.mlp.up_proj.weight ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:46c7439b8bbb66c38f709a7bd017a53580eb186ff7a71cc6d5fcc6e67f7e732f
+size 180355955
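
Each added file in this diff is a three-line Git LFS pointer (`version`, `oid`, `size`) rather than the tensor itself. The sketch below parses one pointer copied from the diff and compares its `size` with a raw tensor payload. The shape (11008 x 4096 for `gate_proj`) and the float32 storage are assumptions based on the Llama-2-7b architecture, not something the commit states; `parse_lfs_pointer` is a hypothetical helper.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        line = line.lstrip("+").strip()  # tolerate the leading '+' of a diff
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }


# Pointer for p77.model.layers.8.mlp.gate_proj.weight, copied from the diff.
pointer = """\
+version https://git-lfs.github.com/spec/v1
+oid sha256:6266be1dae85651bb1dd490d62eacc87e05daea544414d4e68546503e1c37574
+size 180355958
"""
info = parse_lfs_pointer(pointer)

# Assumption: an 11008 x 4096 matrix stored as float32 (4 bytes per element).
payload_bytes = 11008 * 4096 * 4
overhead_bytes = info["size"] - payload_bytes
print(info["oid_algo"], info["size"], overhead_bytes)
```

Under these assumptions the payload accounts for all but 886 bytes of the file, which would be consistent with a float32 tensor plus a small serialization header. The same arithmetic with a 4096 x 4096 shape leaves a similar remainder for the attention projection files.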