Buckets:

hf-doc-build
/

doc-dev

Files

xet

hf-doc-build/doc-dev / sagemaker /pr_1995 /en /tutorials /jumpstart /jumpstart-quickstart.md

rtrm

about 1 month ago

preview code

download

raw

4.95 kB

	# Quickstart - Deploy Hugging Face Models with SageMaker Jumpstart

	## Why use SageMaker JumpStart for Hugging Face models?

	Amazon SageMaker JumpStart lets you deploy the most-popular open Hugging Face models with one click—inside your own AWS account. JumpStart offers a curated [selection](https://aws.amazon.com/sagemaker-ai/jumpstart/getting-started/?sagemaker-jumpstart-cards.sort-by=item.additionalFields.model-name&sagemaker-jumpstart-cards.sort-order=asc&awsf.sagemaker-jumpstart-filter-product-type=all&awsf.sagemaker-jumpstart-filter-text=all&awsf.sagemaker-jumpstart-filter-vision=all&awsf.sagemaker-jumpstart-filter-tabular=all&awsf.sagemaker-jumpstart-filter-audio-tasks=all&awsf.sagemaker-jumpstart-filter-multimodal=all&awsf.sagemaker-jumpstart-filter-RL=*all&awsm.page-sagemaker-jumpstart-cards=1&sagemaker-jumpstart-cards.q=qwen&sagemaker-jumpstart-cards.q_operator=AND) of model checkpoints for various tasks, including text generation, embeddings, vision, audio, and more. Most models are deployed using the official [Hugging Face Deep Learning Containers](https://huggingface.co/docs/sagemaker/main/en/dlcs/introduction) with a sensible default instance type, so you can move from idea to production in minutes.

	In this quickstart guide, we will deploy [Qwen/Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct).

	## 1. Prerequisites

	\| \| Requirement \|
	\|---\|-------------\|
	\| AWS account with SageMaker enabled \| An AWS account that will contain all your AWS resources. \|
	\| An IAM role to access SageMaker AI \| Learn more about how IAM works with SageMaker AI in this [guide](https://docs.aws.amazon.com/sagemaker/latest/dg/security-iam.html). \|
	\| SageMaker Studio domain and user profile \| We recommend using SageMaker Studio for straightforward deployment and inference. Follow this [guide](https://docs.aws.amazon.com/sagemaker/latest/dg/onboard-quick-start.html). \|
	\| Service quotas \| Most LLMs need GPU instances (e.g. ml.g5). Verify you have quota for `ml.g5.24xlarge` or [request it](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-requesting-quota-increases.html). \|

	## 2· Endpoint deployment

	Let's explain how you would deploy a Hugging Face model to SageMaker browsing through the Jumpstart catalog:
	1. Open SageMaker → JumpStart.
	2. Filter “Hugging Face” or search for your model (e.g. Qwen2.5-14B).
	3. Click Deploy → (optional) adjust instance size / count → Deploy.
	4. Wait until Endpoints shows In service.
	5. Copy the Endpoint name (or ARN) for later use.

	<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/sagemaker/jumpstart-deployment.gif"
	alt="JumpStart deployment demo"
	width="500">

	Alternatively, you can also browse through the Hugging Face Model Hub:
	1. Open the model page → Click Deploy → SageMaker → Jumpstart tab if model is available.
	2. Copy the code snippet and use it from a SageMaker Notebook instance.


	<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/sagemaker/hf-jumpstart-deployment.gif"
	alt="JumpStart deployment demo"
	width="500">

	```python
	# SageMaker JumpStart provides APIs as part of SageMaker SDK that allow you to deploy and fine-tune models in network isolation using scripts that SageMaker maintains.

	from sagemaker.jumpstart.model import JumpStartModel


	model = JumpStartModel(model_id="huggingface-llm-qwen2-5-14b-instruct")
	example_payloads = model.retrieve_all_examples()

	predictor = model.deploy()

	for payload in example_payloads:
	response = predictor.predict(payload.body)
	print("Input:\n", payload.body[payload.prompt_key])
	print("Output:\n", response[0]["generated_text"], "\n\n===============\n")
	```

	The endpoint creation can take several minutes, depending on the size of the model.

	## 3. Test interactively

	If you deployed through the console, you need to grab the endpoint ARN and reuse in your code.
	```python
	from sagemaker.predictor import retrieve_default
	endpoint_name = "MY ENDPOINT NAME"
	predictor = retrieve_default(endpoint_name)
	payload = {
	"messages": [
	{
	"role": "system",
	"content": "You are a passionate data scientist."
	},
	{
	"role": "user",
	"content": "what is machine learning?"
	}
	],
	"max_tokens": 2048,
	"temperature": 0.7,
	"top_p": 0.9,
	"stream": False
	}

	response = predictor.predict(payload)
	print(response)
	```

	The endpoint support the Open AI API specification.

	## 4. Clean‑up

	To avoid incurring unnecessary costs, when you’re done, delete the SageMaker endpoints in the Deployments → Endpoints console or using the following code snippets:
	```python
	predictor.delete_model()
	predictor.delete_endpoint()
	```

	<EditOnGithub source="https://github.com/huggingface/hub-docs/blob/main/docs/sagemaker/source/tutorials/jumpstart/jumpstart-quickstart.md" />

Xet Storage Details

Size:: 4.95 kB
Xet hash:: 977b5047dbe92123c80bba3e1ae5090d1d928f7900dd1b5cbbf79f2b5803fc01

Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.