Failed to deploy on Amazon SageMaker ml.g5.4xlarge instance
Hi there,
I tried to deploy this model on an ml.g5.4xlarge instance, but the deployment failed. Which instance type does this model work on?
Script:
import boto3
import sagemaker
from sagemaker.huggingface import HuggingFaceModel
role = 'insert role here'
# Define model configuration
model_id = 'google/gemma-3-1b-it'
model_name = model_id.split('/')[-1]
instance_count = 1
instance_type = 'ml.g5.4xlarge'
endpoint_name = f'{model_name}-{instance_count}'
region = 'eu-west-1'
# Create a boto3 session with the region
session = boto3.Session(region_name=region)
sagemaker_session = sagemaker.Session(boto_session=session)
# Hub Model configuration
hub = {
    'HF_MODEL_ID': model_id,
    'HF_TASK': 'text-generation'
}
# Create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.37.0',
    pytorch_version='2.1.0',
    py_version='py310',
    env=hub,
    role=role,
    sagemaker_session=sagemaker_session
)
# Deploy model to SageMaker Inference
print(f"Deploying model {model_name} to endpoint {endpoint_name}...")
predictor = huggingface_model.deploy(
    initial_instance_count=instance_count,
    instance_type=instance_type,
    endpoint_name=endpoint_name
)
print(f"Model deployed successfully to endpoint: {endpoint_name}")
print(f"Endpoint URL: https://runtime.sagemaker.{region}.amazonaws.com/endpoints/{endpoint_name}/invocations")