dacorvo (HF Staff) committed on Nov 16, 2023

Commit b39e7cf (1 parent: 181621f)

Upload folder using huggingface_hub

Files changed (8)

checkpoint/config.json +1 -1
checkpoint/generation_config.json +1 -1
compiled/336a8139cdb6477a7420.neff +1 -1
compiled/91f58547ef748349c68d.neff +0 -0
compiled/964a9622cc995bfa9d99.neff +1 -1
compiled/b9f29083c128f3826c32.neff +0 -0
config.json +2 -2
generation_config.json +1 -1

checkpoint/config.json CHANGED

@@ -33,7 +33,7 @@
     }
   },
   "torch_dtype": "float32",
-  "transformers_version": "4.35.0",
+  "transformers_version": "4.35.2",
   "use_cache": true,
   "vocab_size": 50257
 }

checkpoint/generation_config.json CHANGED

@@ -2,5 +2,5 @@
   "_from_model_config": true,
   "bos_token_id": 50256,
   "eos_token_id": 50256,
-  "transformers_version": "4.35.0"
+  "transformers_version": "4.35.2"
 }

compiled/336a8139cdb6477a7420.neff CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:aaed68685ef9b886b175c16fe89853cffd8f514316f3f2560a93a7568645ad5a
+oid sha256:fcef49b2398cf3fe4a22a721d5291997728528f50f27be1944e2a1f2352150c6
 size 1076224
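
The .neff entries in this commit are Git LFS pointer files: only the `oid sha256:` line changes, while `size` stays the same. The following is a minimal sketch, assuming a locally downloaded copy of the binary, of how such a pointer could be checked against file contents. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names introduced for illustration, not part of git-lfs or huggingface_hub.

```python
import hashlib


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid line has the form '<hash-algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Return True if the bytes have the size and SHA-256 digest named by the pointer."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


# Pointer content after this commit, copied from the diff above.
NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:fcef49b2398cf3fe4a22a721d5291997728528f50f27be1944e2a1f2352150c6
size 1076224
"""

pointer = parse_lfs_pointer(NEW_POINTER)
print(pointer["algo"], pointer["size"])
```

To check a real file, read it in binary mode and pass the bytes to `matches_pointer(data, pointer)`. Because the size is unchanged between the two revisions, only the digest comparison distinguishes the old binary from the new one.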

compiled/91f58547ef748349c68d.neff CHANGED

Binary files a/compiled/91f58547ef748349c68d.neff and b/compiled/91f58547ef748349c68d.neff differ

compiled/964a9622cc995bfa9d99.neff CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:783dca1f00c01d31befa017573d55d79cb4ff19f64c500d632cf2e23cdba2070
+oid sha256:ffc73e3343e81dba09c27c4a0f5128789acc2f88ce71ff9e976a4eca9a255be8
 size 1629184

compiled/b9f29083c128f3826c32.neff CHANGED

Binary files a/compiled/b9f29083c128f3826c32.neff and b/compiled/b9f29083c128f3826c32.neff differ

config.json CHANGED

@@ -21,7 +21,7 @@
     "auto_cast_type": "fp32",
     "batch_size": 16,
     "compiler_type": "neuronx-cc",
-    "compiler_version": "2.11.0.35+4f5279863",
+    "compiler_version": "2.11.0.34+c5231f848",
     "num_cores": 2,
     "sequence_length": 1024,
     "task": "text-generation"
@@ -41,7 +41,7 @@
       "max_length": 50
     }
   },
-  "transformers_version": "4.35.0",
+  "transformers_version": "4.35.2",
   "use_cache": true,
   "vocab_size": 50257
 }
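
The first hunk of config.json changes `compiler_version` from 2.11.0.35+4f5279863 to 2.11.0.34+c5231f848, so the numeric part goes down while the suffix after `+` changes. The sketch below compares the two strings under the assumption that the format is a dotted numeric release followed by `+` and a build identifier. That format is inferred only from the two values in this diff, not from neuronx-cc documentation, and `parse_compiler_version` is a hypothetical helper.

```python
def parse_compiler_version(version: str) -> tuple:
    """Split '<dotted release>+<build id>' into (release tuple, build id).

    Assumed format, inferred from the two values in this diff.
    """
    release, _, build = version.partition("+")
    return tuple(int(part) for part in release.split(".")), build


# Values copied from the config.json hunk.
old_release, old_build = parse_compiler_version("2.11.0.35+4f5279863")
new_release, new_build = parse_compiler_version("2.11.0.34+c5231f848")

# Tuples compare element by element, so this checks the numeric release only.
print("new release is lower:", new_release < old_release)
print("build id changed:", new_build != old_build)
```

Comparing integer tuples rather than raw strings avoids lexicographic surprises, for example "2.9" sorting after "2.11" as text.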

generation_config.json CHANGED

@@ -2,5 +2,5 @@
   "_from_model_config": true,
   "bos_token_id": 50256,
   "eos_token_id": 50256,
-  "transformers_version": "4.35.0"
+  "transformers_version": "4.35.2"
 }