Deploying machine learning (ML) models into production is often a complex and resource-intensive task, especially for customers without deep ML and DevOps expertise. Amazon SageMaker Canvas simplifies model building by offering a no-code interface, so you can create highly accurate ML models using your existing data sources and without writing a single line of code. But building a model is only half the journey; deploying it efficiently and cost-effectively is just as crucial. Amazon SageMaker Serverless Inference is designed for workloads with variable traffic patterns and idle periods. It automatically provisions and scales infrastructure based on demand, removing the need to manage servers or pre-configure capacity.
In this post, we walk through how to take an ML model built in SageMaker Canvas and deploy it using SageMaker Serverless Inference. This solution can help you go from model creation to production-ready predictions quickly, efficiently, and without managing any infrastructure.
To demonstrate serverless endpoint creation for a SageMaker Canvas trained model, let’s explore an example workflow:
You can also automate the process, as illustrated in the following diagram.

In this example, we deploy a pre-trained classification model to a serverless SageMaker endpoint. This way, we can use our model for variable workloads that don’t require real-time inference.
As a prerequisite, you must have access to Amazon Simple Storage Service (Amazon S3) and Amazon SageMaker AI. If you don’t already have a SageMaker AI domain configured in your account, you also need permissions to create a SageMaker AI domain.
You must also have a regression or classification model that you have trained. You can train your SageMaker Canvas model as you normally would. This includes creating the Amazon SageMaker Data Wrangler flow, performing necessary data transformations, and choosing the model training configuration. If you don’t already have a trained model, you can follow one of the labs in the Amazon SageMaker Canvas Immersion Day to create one before continuing. For this example, we use a classification model that was trained on the canvas-sample-shipping-logs.csv sample dataset.
Complete the following steps to save your model to the SageMaker Model Registry:


You can now exit SageMaker Canvas by logging out. To manage costs and prevent additional workspace charges, you can also configure SageMaker Canvas to automatically shut down when idle.
After you have added your model to the Model Registry, complete the following steps:
The model you just exported from SageMaker Canvas should be added with a deployment status of Pending manual approval.

SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT.
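The console steps in this section can also be scripted. The following is a minimal sketch, not the method used in this post: `approve_and_describe` is a hypothetical helper that assumes you pass in a boto3 SageMaker client and the ARN of the model package exported from SageMaker Canvas. It approves the package and returns the container image, model data location, and environment variables that you need when creating the model.

```python
def approve_and_describe(sagemaker_client, model_package_arn):
    """Approve a model package and return its container details.

    sagemaker_client is expected to be boto3.client("sagemaker");
    it is passed in so the helper can be exercised with a stub.
    """
    # Same effect as changing "Pending manual approval" to "Approved" in the console.
    sagemaker_client.update_model_package(
        ModelPackageArn=model_package_arn,
        ModelApprovalStatus="Approved",
    )

    package = sagemaker_client.describe_model_package(
        ModelPackageName=model_package_arn
    )

    containers = []
    for container in package["InferenceSpecification"]["Containers"]:
        containers.append({
            "Image": container["Image"],
            "ModelDataUrl": container["ModelDataUrl"],
            # Environment can be missing or empty; normalize to a dict.
            "Environment": container.get("Environment") or {},
        })
    return containers
```

To try it, create the client with `boto3.client("sagemaker")` and pass your own model package ARN. Note that if an automation reacts to approval events in your account, approving the package this way triggers it as well.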
Complete the following steps to create a new model:
CompressedModel type. The environment variables will be shown as a single line in SageMaker Studio, with the following format:
SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT: text/csv, SAGEMAKER_INFERENCE_OUTPUT: predicted_label, SAGEMAKER_INFERENCE_SUPPORTED: predicted_label, SAGEMAKER_PROGRAM: tabular_serve.py, SAGEMAKER_SUBMIT_DIRECTORY: /opt/ml/model/code
You might have different variables than those in the preceding example. Add every environment variable listed for your model package to your new model, and make sure that each one is on its own line when you create the model.
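If you copy the single-line string, splitting it by hand is error-prone because a value can itself contain commas. The following sketch shows one way to split it into separate key-value pairs. `parse_env_line` is a hypothetical helper, and it assumes that every variable name is written in upper case with underscores, as in the preceding example.

```python
import re

# A key starts the line or follows a comma, and is upper case with underscores.
KEY_PATTERN = re.compile(r"(?:^|,\s*)([A-Z_][A-Z0-9_]*):\s*")


def parse_env_line(line):
    """Split 'KEY: value, KEY: value' into a dict, keeping commas inside values."""
    matches = list(KEY_PATTERN.finditer(line))
    env = {}
    for i, match in enumerate(matches):
        # A value runs until the next key (or the end of the line).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(line)
        env[match.group(1)] = line[match.end():end].strip()
    return env


studio_line = (
    "SAGEMAKER_DEFAULT_INVOCATIONS_ACCEPT: text/csv, "
    "SAGEMAKER_INFERENCE_OUTPUT: predicted_label, "
    "SAGEMAKER_INFERENCE_SUPPORTED: predicted_label, "
    "SAGEMAKER_PROGRAM: tabular_serve.py, "
    "SAGEMAKER_SUBMIT_DIRECTORY: /opt/ml/model/code"
)

# Print one variable per line, ready to enter as separate key-value pairs.
for key, value in parse_env_line(studio_line).items():
    print(f"{key}: {value}")
```

The resulting dictionary has the same shape as the `Environment` field of a container definition, so it can also be reused if you create the model programmatically.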

Complete the following steps to create an endpoint configuration:

Complete the following steps to create an endpoint:

The endpoint might take a few minutes to be created. When the status is updated to InService, you can begin calling the endpoint.
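The model, endpoint configuration, and endpoint can also be created with boto3 instead of the console. The following is a sketch under stated assumptions rather than a definitive implementation: `serverless_variant` and `deploy_serverless` are hypothetical helpers, the client is passed in by the caller, and the `-config` and `-endpoint` name suffixes are only a convention. The memory check assumes that serverless endpoints accept memory sizes in 1 GB steps between 1024 MB and 6144 MB, and a maximum concurrency between 1 and 200.

```python
ALLOWED_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def serverless_variant(model_name, memory_mb=1024, max_concurrency=20):
    """Build the production variant for a serverless endpoint configuration."""
    if memory_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(f"memory_mb must be one of {ALLOWED_MEMORY_MB}")
    if not 1 <= max_concurrency <= 200:
        raise ValueError("max_concurrency must be between 1 and 200")
    return {
        "VariantName": "AllTraffic",
        "ModelName": model_name,
        "ServerlessConfig": {
            "MemorySizeInMB": memory_mb,
            "MaxConcurrency": max_concurrency,
        },
    }


def deploy_serverless(sagemaker_client, model_name, container, role_arn,
                      memory_mb=1024, max_concurrency=20):
    """Create model, endpoint config, and endpoint, then wait for InService."""
    # Validate the serverless settings before creating any resource.
    variant = serverless_variant(model_name, memory_mb, max_concurrency)

    sagemaker_client.create_model(
        ModelName=model_name,
        PrimaryContainer=container,
        ExecutionRoleArn=role_arn,
    )
    config_name = f"{model_name}-config"
    sagemaker_client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[variant],
    )
    endpoint_name = f"{model_name}-endpoint"
    sagemaker_client.create_endpoint(
        EndpointName=endpoint_name,
        EndpointConfigName=config_name,
    )
    # Blocks until the endpoint status is InService (or the waiter gives up).
    sagemaker_client.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)
    return endpoint_name
```

Here `container` is a dictionary with `Image`, `ModelDataUrl`, and `Environment` keys taken from your model package, and `sagemaker_client` is `boto3.client("sagemaker")`.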
The following sample code demonstrates how you can call an endpoint from a Jupyter notebook located in your SageMaker Studio environment:
import ast
import csv
import time
from io import StringIO

import boto3


def invoke_shipping_prediction(features):
    sagemaker_client = boto3.client('sagemaker-runtime')

    # Convert to CSV string format
    output = StringIO()
    csv.writer(output).writerow(features)
    payload = output.getvalue()

    response = sagemaker_client.invoke_endpoint(
        EndpointName='canvas-shipping-data-model-1-serverless-endpoint',
        ContentType='text/csv',
        Accept='text/csv',
        Body=payload
    )

    response_body = response['Body'].read().decode()
    reader = csv.reader(StringIO(response_body))
    result = list(reader)[0]  # Get first row

    # Parse the response into a more usable format.
    # ast.literal_eval parses the list literals without executing arbitrary code.
    prediction = {
        'predicted_label': result[0],
        'confidence': float(result[1]),
        'class_probabilities': ast.literal_eval(result[2]),
        'possible_labels': ast.literal_eval(result[3])
    }
    return prediction

# Features for inference
features_set_1 = [
    "Bell",
    "Base",
    14,
    6,
    11,
    11,
    "GlobalFreight",
    "Bulk Order",
    "Atlanta",
    "2020-09-11 00:00:00",
    "Express",
    109.25199890136719
]
features_set_2 = [
    "Bell",
    "Base",
    14,
    6,
    15,
    15,
    "MicroCarrier",
    "Single Order",
    "Seattle",
    "2021-06-22 00:00:00",
    "Standard",
    155.0483856201172
]

# Invoke the SageMaker endpoint for feature set 1
start_time = time.time()
result = invoke_shipping_prediction(features_set_1)

# Print output and timing
end_time = time.time()
total_time = end_time - start_time
print(f"Total response time with endpoint cold start: {total_time:.3f} seconds")
print(f"Prediction for feature set 1: {result['predicted_label']}")
print(f"Confidence for feature set 1: {result['confidence']*100:.2f}%")
print("\nProbabilities for feature set 1:")
for label, prob in zip(result['possible_labels'], result['class_probabilities']):
    print(f"{label}: {prob*100:.2f}%")
print("---------------------------------------------------------")

# Invoke the SageMaker endpoint for feature set 2
start_time = time.time()
result = invoke_shipping_prediction(features_set_2)

# Print output and timing
end_time = time.time()
total_time = end_time - start_time
print(f"Total response time with warm endpoint: {total_time:.3f} seconds")
print(f"Prediction for feature set 2: {result['predicted_label']}")
print(f"Confidence for feature set 2: {result['confidence']*100:.2f}%")
print("\nProbabilities for feature set 2:")
for label, prob in zip(result['possible_labels'], result['class_probabilities']):
    print(f"{label}: {prob*100:.2f}%")
To automatically create serverless endpoints each time a new model is approved, you can use the following YAML file with AWS CloudFormation. This file will automate the creation of SageMaker endpoints with the configuration you specify.
This sample CloudFormation template is provided for illustration only and is not intended for direct production use. Test it thoroughly against your organization’s security guidelines before deployment.
AWSTemplateFormatVersion: "2010-09-09"
Description: Template for creating Lambda function to handle SageMaker model
  package state changes and create serverless endpoints
Parameters:
  MemorySizeInMB:
    Type: Number
    Default: 1024
    Description: Memory size in MB for the serverless endpoint (between 1024 and 6144)
    MinValue: 1024
    MaxValue: 6144
  MaxConcurrency:
    Type: Number
    Default: 20
    Description: Maximum number of concurrent invocations for the serverless endpoint
    MinValue: 1
    MaxValue: 200
  AllowedRegion:
    Type: String
    Default: "us-east-1"
    Description: AWS region where SageMaker resources can be created
  AllowedDomainId:
    Type: String
    Description: SageMaker Studio domain ID that can trigger deployments
    NoEcho: true
  AllowedDomainIdParameterName:
    Type: String
    Default: "/sagemaker/serverless-deployment/allowed-domain-id"
    Description: SSM Parameter name containing the SageMaker Studio domain ID that can trigger deployments
Resources:
  AllowedDomainIdParameter:
    Type: AWS::SSM::Parameter
    Properties:
      Name: !Ref AllowedDomainIdParameterName
      Type: String
      Value: !Ref AllowedDomainId
      Description: SageMaker Studio domain ID that can trigger deployments
  SageMakerAccessPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      Description: Managed policy for SageMaker serverless endpoint creation
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Action:
              - sagemaker:CreateModel
              - sagemaker:CreateEndpointConfig
              - sagemaker:CreateEndpoint
              - sagemaker:DescribeModel
              - sagemaker:DescribeEndpointConfig
              - sagemaker:DescribeEndpoint
              - sagemaker:DeleteModel
              - sagemaker:DeleteEndpointConfig
              - sagemaker:DeleteEndpoint
            Resource: !Sub "arn:aws:sagemaker:${AllowedRegion}:${AWS::AccountId}:*"
          - Effect: Allow
            Action:
              - sagemaker:DescribeModelPackage
            Resource: !Sub "arn:aws:sagemaker:${AllowedRegion}:${AWS::AccountId}:model-package/*/*"
          - Effect: Allow
            Action:
              - iam:PassRole
            Resource: !Sub "arn:aws:iam::${AWS::AccountId}:role/service-role/AmazonSageMaker-ExecutionRole-*"
            Condition:
              StringEquals:
                "iam:PassedToService": "sagemaker.amazonaws.com"
          - Effect: Allow
            Action:
              - ssm:GetParameter
            Resource: !Sub "arn:aws:ssm:${AllowedRegion}:${AWS::AccountId}:parameter${AllowedDomainIdParameterName}"
  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
        - !Ref SageMakerAccessPolicy
  ModelDeploymentFunction:
    Type: AWS::Lambda::Function
    Properties:
      Handler: index.handler
      Role: !GetAtt LambdaExecutionRole.Arn
      Code:
        ZipFile: |
          import os
          import json
          import boto3

          sagemaker_client = boto3.client('sagemaker')
          ssm_client = boto3.client('ssm')

          def handler(event, context):
              print(f"Received event: {json.dumps(event, indent=2)}")
              try:
                  # Get details directly from the event
                  detail = event['detail']
                  print(f'detail: {detail}')

                  # Get allowed domain ID from SSM Parameter Store
                  parameter_name = os.environ.get('ALLOWED_DOMAIN_ID_PARAMETER_NAME')
                  try:
                      response = ssm_client.get_parameter(Name=parameter_name)
                      allowed_domain = response['Parameter']['Value']
                  except Exception as e:
                      print(f"Error retrieving parameter {parameter_name}: {str(e)}")
                      # Default fallback: every domain is allowed if the parameter cannot be read
                      allowed_domain = '*'

                  # Check if domain ID is allowed
                  if allowed_domain != '*':
                      created_by_domain = detail.get('CreatedBy', {}).get('DomainId')
                      if created_by_domain != allowed_domain:
                          print(f"Domain {created_by_domain} not allowed. Allowed: {allowed_domain}")
                          return {'statusCode': 403, 'body': 'Domain not authorized'}

                  # Get the model package ARN from the event resources
                  model_package_arn = event['resources'][0]

                  # Get the model package details from SageMaker
                  model_package_response = sagemaker_client.describe_model_package(
                      ModelPackageName=model_package_arn
                  )

                  # Parse model name and version from ModelPackageName
                  model_name, version = detail['ModelPackageName'].split('/')
                  serverless_model_name = f"{model_name}-{version}-serverless"

                  # Get all container details directly from the event
                  container_defs = detail['InferenceSpecification']['Containers']

                  # Get the execution role from the event and convert to proper IAM role ARN format
                  assumed_role_arn = detail['CreatedBy']['IamIdentity']['Arn']
                  execution_role_arn = (
                      assumed_role_arn.replace(':sts:', ':iam:')
                      .replace('assumed-role', 'role/service-role')
                      .rsplit('/', 1)[0]
                  )

                  # Prepare containers configuration for the model
                  containers = []
                  for i, container_def in enumerate(container_defs):
                      # Get environment variables from the model package for this container
                      environment_vars = model_package_response['InferenceSpecification']['Containers'][i].get('Environment', {}) or {}
                      containers.append({
                          'Image': container_def['Image'],
                          'ModelDataUrl': container_def['ModelDataUrl'],
                          'Environment': environment_vars
                      })

                  # Create model with all containers
                  if len(containers) == 1:
                      # Use PrimaryContainer if there's only one container
                      create_model_response = sagemaker_client.create_model(
                          ModelName=serverless_model_name,
                          PrimaryContainer=containers[0],
                          ExecutionRoleArn=execution_role_arn
                      )
                  else:
                      # Use Containers parameter for multiple containers
                      create_model_response = sagemaker_client.create_model(
                          ModelName=serverless_model_name,
                          Containers=containers,
                          ExecutionRoleArn=execution_role_arn
                      )

                  # Create endpoint config
                  endpoint_config_name = f"{serverless_model_name}-config"
                  create_endpoint_config_response = sagemaker_client.create_endpoint_config(
                      EndpointConfigName=endpoint_config_name,
                      ProductionVariants=[{
                          'VariantName': 'AllTraffic',
                          'ModelName': serverless_model_name,
                          'ServerlessConfig': {
                              'MemorySizeInMB': int(os.environ.get('MEMORY_SIZE_IN_MB')),
                              'MaxConcurrency': int(os.environ.get('MAX_CONCURRENT_INVOCATIONS'))
                          }
                      }]
                  )

                  # Create endpoint
                  endpoint_name = f"{serverless_model_name}-endpoint"
                  create_endpoint_response = sagemaker_client.create_endpoint(
                      EndpointName=endpoint_name,
                      EndpointConfigName=endpoint_config_name
                  )

                  return {
                      'statusCode': 200,
                      'body': json.dumps({
                          'message': 'Serverless endpoint deployment initiated',
                          'endpointName': endpoint_name
                      })
                  }
              except Exception as e:
                  print(f"Error: {str(e)}")
                  raise
      Runtime: python3.12
      Timeout: 300
      MemorySize: 128
      Environment:
        Variables:
          MEMORY_SIZE_IN_MB: !Ref MemorySizeInMB
          MAX_CONCURRENT_INVOCATIONS: !Ref MaxConcurrency
          ALLOWED_DOMAIN_ID_PARAMETER_NAME: !Ref AllowedDomainIdParameterName
  EventRule:
    Type: AWS::Events::Rule
    Properties:
      Description: Rule to trigger Lambda when SageMaker Model Package state changes
      EventPattern:
        source:
          - aws.sagemaker
        detail-type:
          - SageMaker Model Package State Change
        detail:
          ModelApprovalStatus:
            - Approved
          UpdatedModelPackageFields:
            - ModelApprovalStatus
      State: ENABLED
      Targets:
        - Arn: !GetAtt ModelDeploymentFunction.Arn
          Id: ModelDeploymentFunction
  LambdaInvokePermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !Ref ModelDeploymentFunction
      Action: lambda:InvokeFunction
      Principal: events.amazonaws.com
      SourceArn: !GetAtt EventRule.Arn
Outputs:
  LambdaFunctionArn:
    Description: ARN of the Lambda function
    Value: !GetAtt ModelDeploymentFunction.Arn
  EventRuleArn:
    Description: ARN of the EventBridge rule
    Value: !GetAtt EventRule.Arn
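Two string transformations inside the Lambda function in the preceding template are easy to misread, so the following sketch isolates them for experimentation. The function names `execution_role_from_assumed_role` and `resource_names` are hypothetical, and the account ID and role name in the usage note are made-up placeholders. The role conversion mirrors the template's logic and therefore assumes that the execution role lives under the `service-role/` path; for a role on another path, the derived ARN would be wrong.

```python
def execution_role_from_assumed_role(assumed_role_arn):
    """Turn an STS assumed-role ARN into the IAM role ARN, as the Lambda does."""
    return (
        assumed_role_arn.replace(":sts:", ":iam:")
        .replace("assumed-role", "role/service-role")
        .rsplit("/", 1)[0]  # drop the trailing session name
    )


def resource_names(model_package_name):
    """Derive model, endpoint config, and endpoint names from 'group/version'."""
    model_name, version = model_package_name.split("/")
    base = f"{model_name}-{version}-serverless"
    return {
        "model": base,
        "endpoint_config": f"{base}-config",
        "endpoint": f"{base}-endpoint",
    }


names = resource_names("canvas-shipping-data-model/1")
print(names["endpoint"])
```

For example, passing the placeholder ARN `arn:aws:sts::111122223333:assumed-role/AmazonSageMaker-ExecutionRole-Example/SageMaker` to `execution_role_from_assumed_role` yields `arn:aws:iam::111122223333:role/service-role/AmazonSageMaker-ExecutionRole-Example`. The endpoint name printed by the sketch is the one used in the notebook example earlier in this post.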
This stack limits automated serverless endpoint creation to a specific AWS Region and domain. You can find your domain ID when accessing SageMaker Studio from the SageMaker AI console, or by running the following command: aws sagemaker list-domains --region [your-region]
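If you prefer to look up the domain ID and launch the stack from Python, the following sketch shows one possible approach. `list_domain_ids`, `stack_parameters`, and `create_deployment_stack` are hypothetical helpers, the stack name is up to you, and both clients are passed in by the caller. The parameter keys match the template's `Parameters` section. Because the template creates IAM resources, the sketch acknowledges the `CAPABILITY_IAM` capability.

```python
def list_domain_ids(sagemaker_client):
    """Return the IDs of the SageMaker AI domains in the client's Region."""
    return [domain["DomainId"] for domain in sagemaker_client.list_domains()["Domains"]]


def stack_parameters(domain_id, region="us-east-1", memory_mb=1024, max_concurrency=20):
    """Build the CloudFormation parameter list expected by the template."""
    values = {
        "AllowedDomainId": domain_id,
        "AllowedRegion": region,
        # CloudFormation parameter values are passed as strings.
        "MemorySizeInMB": str(memory_mb),
        "MaxConcurrency": str(max_concurrency),
    }
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()]


def create_deployment_stack(cloudformation_client, stack_name, template_body,
                            domain_id, region="us-east-1"):
    """Create the stack from the template text and return the API response."""
    return cloudformation_client.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Parameters=stack_parameters(domain_id, region),
        Capabilities=["CAPABILITY_IAM"],  # the template creates an IAM role and policy
    )
```

In practice, you would create the clients with `boto3.client("sagemaker", region_name=...)` and `boto3.client("cloudformation", region_name=...)`, read the YAML file into `template_body`, and pick the domain ID that should be allowed to trigger deployments.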
To manage costs and prevent additional workspace charges, make sure that you have logged out of SageMaker Canvas. If you tested your endpoint using a Jupyter notebook, you can shut down your JupyterLab instance by choosing Stop or configuring automated shutdown for JupyterLab.
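If you no longer need the endpoint itself, you can also remove it together with its endpoint configuration and model. The following is a minimal sketch with a hypothetical helper, `delete_serverless_resources`. It assumes the `-config` and `-endpoint` naming convention used by the Lambda function in the CloudFormation template, so pass explicit names if you chose different ones in the console.

```python
def delete_serverless_resources(sagemaker_client, model_name,
                                endpoint_config_name=None, endpoint_name=None):
    """Delete the endpoint, its configuration, and the model, in that order."""
    # Fall back to the naming convention when no explicit names are given.
    endpoint_config_name = endpoint_config_name or f"{model_name}-config"
    endpoint_name = endpoint_name or f"{model_name}-endpoint"

    sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
    sagemaker_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
    sagemaker_client.delete_model(ModelName=model_name)
    return [endpoint_name, endpoint_config_name, model_name]
```

For the example in this post, the call would be `delete_serverless_resources(boto3.client("sagemaker"), "canvas-shipping-data-model-1-serverless")`.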

In this post, we showed how to deploy a SageMaker Canvas model to a serverless endpoint using SageMaker Serverless Inference. By using this serverless approach, you can quickly and efficiently serve predictions from your SageMaker Canvas models without needing to manage the underlying infrastructure.
This seamless deployment experience is just one example of how AWS services like SageMaker Canvas and SageMaker Serverless Inference simplify the ML journey, helping businesses of different sizes and technical proficiencies unlock the value of AI and ML. As you continue exploring the SageMaker ecosystem, be sure to check out how you can unlock data governance for no-code ML with Amazon DataZone, and seamlessly transition between no-code and code-first model development using SageMaker Canvas and SageMaker Studio.
Nadhya Polanco is a Solutions Architect at AWS based in Brussels, Belgium. In this role, she supports organizations looking to incorporate AI and Machine Learning into their workloads. In her free time, Nadhya enjoys indulging in her passion for coffee and traveling.
Brajendra Singh is a Principal Solutions Architect at Amazon Web Services, where he partners with enterprise customers to design and implement innovative solutions. With a strong background in software development, he brings deep expertise in Data Analytics, Machine Learning, and Generative AI.