Failed to deploy on Amazon SageMaker ml.g5.4xlarge instance

#11
by Khalizo - opened

Hi there,

I tried to deploy this model on an ml.g5.4xlarge instance, but the deployment failed. Which instance type does this model work on?

Script:
import boto3
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = 'insert role here'

# Define model configuration

model_id = 'google/gemma-3-1b-it'
model_name = model_id.split('/')[-1]
instance_count = 1
instance_type = 'ml.g5.4xlarge'
endpoint_name = f'{model_name}-{instance_count}'
region = 'eu-west-1'

# Create a boto3 session with the region

session = boto3.Session(region_name=region)
sagemaker_session = sagemaker.Session(boto_session=session)

# Hub model configuration

hub = {
    'HF_MODEL_ID': model_id,
    'HF_TASK': 'text-generation'
}

# Create Hugging Face Model Class

huggingface_model = HuggingFaceModel(
    transformers_version='4.37.0',
    pytorch_version='2.1.0',
    py_version='py310',
    env=hub,
    role=role,
    sagemaker_session=sagemaker_session
)

# Deploy model to SageMaker Inference

print(f"Deploying model {model_name} to endpoint {endpoint_name}...")
predictor = huggingface_model.deploy(
    initial_instance_count=instance_count,
    instance_type=instance_type,
    endpoint_name=endpoint_name
)

print(f"Model deployed successfully to endpoint: {endpoint_name}")
print(f"Endpoint URL: https://runtime.sagemaker.{region}.amazonaws.com/endpoints/{endpoint_name}/invocations")
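
One thing that may be worth checking before changing the instance type: the script pins transformers_version='4.37.0', and a container that old may not know the Gemma 3 architecture at all. The sketch below compares the pinned version against a minimum. The minimum of 4.50.0 is an assumption that is not confirmed anywhere in this thread, and the helper names (parse_version, container_supports_model) are hypothetical, so please verify the real requirement against the model card.

```python
# Assumption (not from this thread): Gemma 3 needs transformers >= 4.50.0.
# Check the model card for the actual minimum before relying on this value.
MIN_TRANSFORMERS_FOR_GEMMA3 = "4.50.0"


def parse_version(version: str) -> tuple:
    """Turn a plain 'X.Y.Z' string such as '4.37.0' into (4, 37, 0)."""
    return tuple(int(part) for part in version.split("."))


def container_supports_model(
    container_version: str,
    minimum: str = MIN_TRANSFORMERS_FOR_GEMMA3,
) -> bool:
    """Return True if the container's transformers version meets the minimum."""
    return parse_version(container_version) >= parse_version(minimum)


if __name__ == "__main__":
    # '4.37.0' is the transformers_version pinned in the script above.
    pinned = "4.37.0"
    if not container_supports_model(pinned):
        print(f"transformers {pinned} is older than the assumed minimum "
              f"{MIN_TRANSFORMERS_FOR_GEMMA3}; the failure may not be "
              "related to the instance type.")
```

If the check fails, the instance type is probably not the problem, and trying a newer container version (if one is available for HuggingFaceModel) would be the first thing to test.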

Khalizo changed discussion status to closed
