This post is co-authored by Manuel Lopez Roldan, SiMa.ai, and Jason Westra, AWS Senior Solutions Architect.
Are you looking to deploy machine learning (ML) models at the edge? With Amazon SageMaker AI and SiMa.ai’s Palette Edgematic platform, you can efficiently build, train, and deploy optimized ML models at the edge for a variety of use cases. Designed to work on SiMa’s MLSoC (Machine Learning System on Chip) hardware, your models will have seamless compatibility across the entire SiMa.ai product family, allowing for effortless scaling, upgrades, transitions, and mix-and-match capabilities—ultimately minimizing your total cost of ownership.
In safety-critical environments like warehouses, construction sites, and manufacturing floors, detecting human presence and safety equipment in restricted areas can prevent accidents and enforce compliance. Cloud-based image recognition often falls short in safety use cases where low latency is essential. However, by deploying an object detection model optimized to detect personal protective equipment (PPE) on SiMa.ai MLSoC, you can achieve high-performance, real-time monitoring directly on edge devices without the latency typically associated with cloud-based inference.

In this post, we demonstrate how to retrain and quantize a model using SageMaker AI and the SiMa.ai Palette software suite. The goal is to accurately detect individuals in environments where visibility and protective equipment detection are essential for compliance and safety. We then show how to create a new application within Palette Edgematic in just a few minutes. This streamlined process enables you to deploy high-performance, real-time monitoring directly on edge devices, providing low latency for fast, accurate safety alerts, and it supports an immediate response to potential hazards, enhancing overall workplace safety.
The solution integrates SiMa.ai Edgematic with SageMaker JupyterLab to deploy an ML model, YOLOv7, to the edge. YOLO models are computer vision and ML models for object detection and image segmentation.
The following diagram shows the solution architecture you will follow to deploy a model to the edge. Edgematic offers a seamless, low-code/no-code, end-to-end cloud-based pipeline, from model preparation to edge deployment. This approach provides high performance and accuracy, alleviates the complexity of managing updates or toolchain maintenance on devices, and simplifies inference testing and performance evaluation on edge hardware. This workflow makes sure AI applications run entirely on the edge without needing continuous cloud connectivity, which decreases latency, reduces security risks, and keeps data in-house.

The solution workflow comprises two main stages:
The steps to build your solution are as follows:
Before you get started, make sure you have the following:
SageMaker AI provides ML capabilities for data scientists and developers to prepare, build, train, and deploy high-quality ML models efficiently. It has numerous features, including SageMaker JupyterLab, which enables ML developers to rapidly build, train, and deploy models. SageMaker JupyterLab allows you to create a custom image, then access it from within JupyterLab environments. You will access Palette APIs to build, train, and optimize your object detection model for the edge, from within a familiar user experience in the AWS Cloud. To set up SageMaker JupyterLab to integrate with Palette, complete the steps in this section.
Provision the necessary AWS resources within the us-east-1 AWS Region. Create a SageMaker domain and user to train models and run Jupyter notebooks. Then, create an Amazon Elastic Container Registry (Amazon ECR) private repository to store Docker images.
Palette is a Docker container that contains the necessary tools to quantize and compile ML models for SiMa.ai MLSoC devices. SiMa.ai provides an AWS-compatible Palette version that integrates seamlessly with SageMaker JupyterLab. From it, you can attach the GPUs you need to train your model, export it to ONNX format, and then optimize, quantize, and compile it, all within a familiar ML environment on AWS.
Download the Docker image from the Software Downloads page on the SiMa.ai Developer Portal (see the following screenshot) and then download the sample Jupyter notebook from the following SiMa.ai GitHub repository. You can choose to scan the image to maintain a secure posture.

The following steps require that you have set up your AWS Management Console credentials, have set up an IAM user with AmazonEC2ContainerRegistryFullAccess permissions, and can successfully perform Docker login to AWS. For more information, see Private registry authentication in Amazon ECR.
Tag the image that you downloaded from the SiMa.ai Developer Portal using the AWS CLI and then push it to Amazon ECR to make it available to SageMaker JupyterLab. On the Amazon ECR console, navigate to the registry you created to locate the ECR URI of the image. Your console experience will look similar to the following screenshot.

Copy the URI of the repository and use it to set the ECR environment variable in the following command:
# set up variables as per your AWS environment
REGION=<your region here>
AWS_ACCOUNT_ID=<your 12 digit AWS Account ID here>
ECR=$AWS_ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/<your ECR repository name here>
With your environment variables set and Docker running locally, run the following commands. If you haven’t used SageMaker AI before, you might have to create a new IAM user, attach the AmazonEC2ContainerRegistryPowerUser policy, and then run the aws configure command.
# login to the ECR repository
aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $AWS_ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com
Upon receiving a “Login Succeeded” message, you’re logged in to Amazon ECR and can run the following Docker commands to tag the image and push it to Amazon ECR:
# Load the palette.tar image into docker
docker load < palette.tar
docker tag palette/sagemaker $ECR
docker push $ECR
The Palette image is over 25 GB. Therefore, with a 20 Mbps internet connection, the docker push operation can take several hours to upload to AWS.
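As a rough check on that estimate, the transfer time follows directly from the image size and the link speed. The sketch below assumes decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits per second) and ignores protocol overhead and layer compression; the function name is illustrative.

```python
def upload_hours(size_gb: float, link_mbps: float) -> float:
    """Estimate upload time in hours, ignoring overhead and compression."""
    size_bits = size_gb * 1e9 * 8            # gigabytes -> bits (decimal units)
    seconds = size_bits / (link_mbps * 1e6)  # bits / (bits per second)
    return seconds / 3600


# 25 GB image over a 20 Mbps uplink
print(f"{upload_hours(25, 20):.1f} hours")  # 2.8 hours
```

A faster uplink shortens the push proportionally, so it can be worth running the push from a machine with better upstream bandwidth.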
After you upload the custom image to Amazon ECR, you configure SageMaker JupyterLab to use it. We recommend watching the two-minute SageMaker AI/Palette Edgematic video to guide you as you walk through the steps to configure JupyterLab.
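When you create the custom image, enter palette for the image name and the display name, and enter Custom palette image for SageMaker AI integration as the description. Verify that your custom image looks similar to the one in the video example.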
Settings in the Image properties section are defaulted for your convenience, but you can choose a different IAM role and Amazon Elastic File System (Amazon EFS) mount path, if needed.
With the Palette image configured, you are ready to launch SageMaker JupyterLab in Amazon SageMaker Studio and work in your custom environment.
When selecting an instance with a GPU, you might need to request a quota increase for that instance type. For more details, see Requesting a quota increase.
Congratulations! You’ve created a custom image for SageMaker JupyterLab using the Palette image and launched a JupyterLab space.
Now you are able to prepare the model for the edge using the Palette Model SDK. In this section, we walk through the sample SiMa.ai Jupyter notebook so you understand how to work with the YOLOv7 model and prepare it to run on SiMa.ai devices.
To download the notebook from the SiMa.ai GitHub repository, open a terminal in your notebook and run a git clone command. This will clone the repository to your instance and from there you can launch the yolov7.ipynb file.
To run the notebook, change the Amazon Simple Storage Service (Amazon S3) bucket name in the variable s3_bucket in the third cell to an S3 bucket such as the one generated with the SageMaker domain.
To run all the cells in the notebook, choose the double-arrow icon above the cells, which restarts the kernel and runs every cell.
The yolov7.ipynb notebook describes in detail how to prepare the model package and how to optimize and compile the model. The following section covers only the key features of the notebook as they relate to SiMa.ai Palette and the training of your workplace safety model. Describing every cell is out of scope for this post.
To recognize human heads and protective equipment, you will use the notebook to fine-tune the model to recognize these classes of objects. The following Python code defines the classes to detect, and it uses the open source open-images-v7 dataset and the fiftyone library to retrieve a set of 8,000 labeled images per class to train the model effectively. 75% of images are used for training and 25% for validation of the model. This cell also structures the dataset into YOLO format, optimizing it for your training workflow.
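import fiftyone
import fiftyone.zoo

# Object classes the model is fine-tuned to detect
classes = ['Person', 'Human head', 'Helmet']
...
# Download labeled samples for these classes from Open Images V7
dataset = fiftyone.zoo.load_zoo_dataset(
    "open-images-v7",
    split="train",
    label_types=["detections"],
    classes=classes,
    max_samples=total,
)
...
# Export the samples and their labels in YOLO format
dataset.export(
    dataset_type=fiftyone.types.YOLOv5Dataset,
    labels_path=path,
    classes=classes,
)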
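To make the 75/25 split and the YOLO label layout concrete, here is a minimal sketch that is independent of the notebook. It assumes bounding boxes arrive in the relative `[x, y, width, height]` form (top-left corner, values between 0 and 1) that fiftyone uses for detections, and it produces the `class cx cy w h` line that YOLO label files expect. The function names and the fixed seed are illustrative and are not taken from the notebook.

```python
import random


def split_train_val(items, train_fraction=0.75, seed=0):
    """Shuffle a copy of items and split it into (train, val) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def to_yolo_line(class_index, box):
    """Convert a relative [x, y, w, h] box (top-left origin) to a YOLO label line."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2  # YOLO stores the box center, not the corner
    return f"{class_index} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


classes = ["Person", "Human head", "Helmet"]
train, val = split_train_val(range(8000))
print(len(train), len(val))  # 6000 2000
print(to_yolo_line(classes.index("Helmet"), [0.25, 0.5, 0.5, 0.25]))
# 2 0.500000 0.625000 0.500000 0.250000
```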
The next important cell configures the dataset and downloads the required weights. This post uses the yolov7-tiny weights, but you can choose a different YOLOv7 variant. Each is distributed under the GPL-3.0 license. The full YOLOv7 model achieves better performance than YOLOv7-Tiny, but it takes longer to train. After choosing a variant, retrain the model by running the following command:
!cd yolov7 && python3 train.py --workers 4 --device 0 --batch-size 16 --data data/custom.yaml --img 640 640 --cfg cfg/training/yolov7-tiny.yaml --weights 'yolov7-tiny.pt' --name sima-yolov7 --hyp data/hyp.scratch.custom.yaml --epochs 10
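The training command reads its dataset definition from `data/custom.yaml`, which the notebook's configuration step is expected to prepare. As an illustration of the layout that YOLOv7 data files typically follow (`train`, `val`, `nc`, `names`), the sketch below renders one as text. The image directory paths are placeholders, not the notebook's actual paths, and the helper name is made up for this example.

```python
def render_data_yaml(train_dir, val_dir, class_names):
    """Render a YOLOv7-style dataset YAML as a string."""
    names = ", ".join(f"'{name}'" for name in class_names)
    return (
        f"train: {train_dir}\n"
        f"val: {val_dir}\n"
        f"nc: {len(class_names)}\n"
        f"names: [{names}]\n"
    )


# Placeholder paths; substitute the directories produced by the export step.
yaml_text = render_data_yaml(
    "./dataset/images/train",
    "./dataset/images/val",
    ["Person", "Human head", "Helmet"],
)
print(yaml_text)
```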
Retraining for 10 epochs with the new dataset and the yolov7-tiny weights achieves a mAP of approximately 0.6, which should deliver accurate detection of the new classes. Finally, the following command exports the retrained model to ONNX format:
!cd yolov7 && python3 export.py --weights runs/train/sima-yolov7/weights/best.pt --grid --end2end --simplify --topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 640 640 --max-wh 640
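The `--conf-thres`, `--iou-thres`, and `--topk-all` values passed to the export script configure the non-maximum suppression (NMS) stage that the end-to-end export includes. As a rough illustration of what those thresholds control, here is a plain-Python sketch of greedy, class-agnostic NMS. It is not the YOLOv7 implementation, and the `(x1, y1, x2, y2, score)` box layout is an assumption made for this example.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2, ...) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, conf_thres=0.35, iou_thres=0.65, topk=100):
    """Greedy NMS over (x1, y1, x2, y2, score) boxes."""
    # Drop low-confidence boxes, then visit the rest from highest score down.
    candidates = sorted(
        (b for b in boxes if b[4] >= conf_thres), key=lambda b: b[4], reverse=True
    )
    kept = []
    for box in candidates:
        # Keep a box only if it does not overlap an already kept box too strongly.
        if all(iou(box, k) <= iou_thres for k in kept):
            kept.append(box)
        if len(kept) == topk:
            break
    return kept


detections = [
    (100, 100, 200, 200, 0.90),  # strong detection
    (105, 105, 205, 205, 0.80),  # near-duplicate of the first box
    (300, 300, 400, 400, 0.70),  # separate object
    (10, 10, 50, 50, 0.20),      # below the confidence threshold
]
print(nms(detections))
```

In this example the second box overlaps the first with an IoU of about 0.82, so it is suppressed, and the fourth box is discarded by the confidence threshold.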
To optimize the architecture, you must modify the YOLOv7 model in ONNX format. In the following figure, the scissors and dotted red line show where graph surgery is performed on a YOLOv7 model. How is graph surgery different from model pruning? Model pruning reduces the overall size and complexity of a neural network by removing less significant weights or entire neurons, whereas graph surgery restructures the computational graph by modifying or replacing specific operations to provide compatibility with the target hardware, without changing the model’s learned parameters. The net effect is that you replace unwanted operations on the heads, such as Reshape, Split, and Concat, with supported operations that are mathematically equivalent (point-wise convolutions). Afterwards, you remove the postprocessing operations from the ONNX graph; that work moves into the application’s postprocessing logic.

See the following code:
import onnx

# Load the exported ONNX model
model = onnx.load(f"{model_name}.onnx")
...
remove_nodes(model)
insert_pointwise_conv(model)
update_elmtwise_const(model)
update_output_nodes(model)
...
onnx.save(model, ONNX_MODEL_NAME)
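To see why a point-wise (1×1) convolution can stand in for an operation such as Split, note that a 1×1 convolution is a per-pixel matrix multiply across channels, so a weight matrix made of identity rows simply selects channels. The following sketch checks this on a tiny tensor in plain Python. It is illustrative only and does not reproduce the notebook's `insert_pointwise_conv` helper.

```python
def pointwise_conv(tensor, weights):
    """Apply a 1x1 convolution to a [C][H][W] tensor with [C_out][C_in] weights."""
    c_in, height, width = len(tensor), len(tensor[0]), len(tensor[0][0])
    return [
        [
            [sum(row[c] * tensor[c][y][x] for c in range(c_in)) for x in range(width)]
            for y in range(height)
        ]
        for row in weights
    ]


def selection_weights(channels, c_in):
    """Build identity rows that pick the given input channels."""
    return [[1 if c == ch else 0 for c in range(c_in)] for ch in channels]


# A 4-channel, 2x2 feature map with distinct values per channel.
feature_map = [[[10 * c + 2 * y + x for x in range(2)] for y in range(2)] for c in range(4)]

# A "Split" that keeps channels 2 and 3, expressed as slicing and as a 1x1 conv.
sliced = feature_map[2:4]
convolved = pointwise_conv(feature_map, selection_weights([2, 3], c_in=4))
print(convolved == sliced)  # True
```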
After surgery, you quantize the model. Quantization simplifies AI models by reducing the precision of the data they use from 32-bit floating point to 8-bit integers, making models smaller, faster, and more efficient to run at the edge. Quantized models consume less power and fewer resources, which is critical for deploying on lower-powered devices and optimizing overall efficiency. The following code quantizes your model using the validation dataset. It also runs some inference using the quantized model to provide insight into how well the model performs after post-training quantization.
...
loaded_net = _load_model()
# Quantize model
quant_configs = default_quantization.with_calibration(HistogramMSEMethod(num_bins=1024))
calibration_data = _make_calibration_data()
quantized_net = loaded_net.quantize(calibration_data=calibration_data, quantization_config=quant_configs)
...
if QUANTIZED:
    preprocessed_image1 = preprocess(img=image, input_shape=(640, 640)).transpose(0, 2, 3, 1)
    inputs = {InputName('images'): preprocessed_image1}
    out = quantized_net.execute(inputs)
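To illustrate what reducing precision from 32-bit floats to 8-bit integers involves, the sketch below applies affine (scale and zero-point) quantization to a handful of values. It calibrates with a simple min/max range rather than the `HistogramMSEMethod` used in the notebook, and it is not the Palette implementation; all names are illustrative.

```python
def calibrate_min_max(values, qmin=-128, qmax=127):
    """Derive an int8 scale and zero point from the observed value range."""
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)  # range must include 0
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale for constant input
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to int8 codes, clamping to the representable range."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(codes, scale, zero_point):
    """Map int8 codes back to approximate float values."""
    return [(q - zero_point) * scale for q in codes]


activations = [-1.0, -0.25, 0.0, 0.4, 1.55]
scale, zero_point = calibrate_min_max(activations)
codes = quantize(activations, scale, zero_point)
restored = dequantize(codes, scale, zero_point)

# Compare the original values with their round-tripped approximations.
max_error = max(abs(a - r) for a, r in zip(activations, restored))
print(codes, f"max error {max_error:.4f} (step {scale:.4f})")
```

Values that fall between two quantization steps are rounded to the nearest one, which is why it is worth checking predictions after quantization.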
Because quantization reduces precision, verify that the model accuracy remains high by testing some predictions. After validation, compile the model to generate files that enable it to run on SiMa.ai MLSoC devices, along with the required configuration for supporting plugins. This compilation produces an .lm file, the binary executable for the ML accelerator in the MLSoC, and a .json file containing configuration details like input image size and quantization type.
saved_mpk_directory = "./compiled_yolov7"
quantized_net.save("yolov7", output_directory=saved_mpk_directory)
quantized_net.compile(output_path=saved_mpk_directory, compress=False)
The notebook uploads the compiled file to the S3 bucket you specified, then generates a pre-signed link that is valid for 30 minutes. If the link expires, rerun the last cell. Copy the generated link at the end of the notebook; you will use it in SiMa.ai Edgematic shortly.
s3.meta.client.upload_file(file_name, S3_BUCKET_NAME, f"models/{name}.tar.gz")
...
presigned_url = s3_client.generate_presigned_url(
    ClientMethod="get_object",
    Params={
        "Bucket": s3_bucket,
        "Key": object_key
    },
    ExpiresIn=1800  # 30 minutes
)
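Because the link is valid for only 30 minutes, it can help to check how much time is left before pasting it into Edgematic. Pre-signed S3 URLs carry their expiry in the query string: SigV4 URLs use `X-Amz-Date` plus `X-Amz-Expires`, while older-style URLs use an absolute `Expires` timestamp. The helper below is a sketch that handles both; its name is made up for this example, and the URL shown is a fabricated placeholder, not output from the notebook.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def seconds_remaining(presigned_url, now=None):
    """Return seconds until a pre-signed URL expires (negative if already expired)."""
    query = parse_qs(urlparse(presigned_url).query)
    now = now or datetime.now(timezone.utc)
    if "X-Amz-Expires" in query:
        # SigV4: signing time plus a lifetime in seconds
        signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
        signed_at = signed_at.replace(tzinfo=timezone.utc)
        expires_at = signed_at + timedelta(seconds=int(query["X-Amz-Expires"][0]))
    elif "Expires" in query:
        # Older style: absolute expiry as a Unix timestamp
        expires_at = datetime.fromtimestamp(int(query["Expires"][0]), tz=timezone.utc)
    else:
        raise ValueError("no expiry information found in URL")
    return (expires_at - now).total_seconds()


# Placeholder URL with the signature and credential parameters omitted.
example_url = (
    "https://example-bucket.s3.amazonaws.com/models/example.tar.gz"
    "?X-Amz-Date=20250101T120000Z&X-Amz-Expires=1800"
)
ten_minutes_later = datetime(2025, 1, 1, 12, 10, tzinfo=timezone.utc)
print(seconds_remaining(example_url, now=ten_minutes_later))  # 1200.0
```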
After you complete your cloud-based model fine-tuning in AWS, transition to Edgematic for building the complete edge application, including plugins for preprocessing and postprocessing. Edgematic integrates the optimized model with essential plugins, like UDP sync for data transmission, video encoders for streaming predictions, and preprocessing tailored for the SiMa.ai MLA. These plugins are provided as drag-and-drop blocks, improving developer productivity by eliminating the need for custom coding. After it’s configured, Edgematic compiles and deploys the application to the edge device, transforming the model into a functional, real-world AI application.

Your model will appear under User defined on the Models tab. You can open the model folder and choose Run to get KPIs on the model such as frames per second.

Next, you will change the existing people detection pipeline to a PPE use case by replacing the existing YOLOv7 model with your newly trained PPE model.


Now you connect it back to the blocks that YOLOv7 was connected to.

After the application is deployed on the SiMa.ai MLSoC, you should see the detections of categories such as “Human head,” “Person,” and “Glasses,” as seen in the following screenshot.

Next, you change the application postprocessing logic from performing people detection to performing PPE detection. This is done by adding logic in the postprocessing that will perform business logic to detect if PPE is present or not. For this post, the PPE logic has already been written, and you just enable it.
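The PPE check itself ships with the plugin, and its exact rules are not shown in this post. As a hypothetical illustration of the kind of business logic involved, the sketch below marks a detected head as safe when a helmet box covers enough of it. The class names come from the training cell; the function names, the `(x1, y1, x2, y2)` box layout, and the 0.3 overlap threshold are assumptions made for this example.

```python
def overlap_fraction(inner, outer):
    """Fraction of the `inner` box area that is covered by the `outer` box."""
    ix1, iy1 = max(inner[0], outer[0]), max(inner[1], outer[1])
    ix2, iy2 = min(inner[2], outer[2]), min(inner[3], outer[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = (inner[2] - inner[0]) * (inner[3] - inner[1])
    return inter / area if area > 0 else 0.0


def label_heads(detections, min_overlap=0.3):
    """Return (head_box, 'safe' | 'unsafe') pairs from (label, box) detections."""
    helmets = [box for label, box in detections if label == "Helmet"]
    results = []
    for label, box in detections:
        if label != "Human head":
            continue
        covered = any(overlap_fraction(box, h) >= min_overlap for h in helmets)
        results.append((box, "safe" if covered else "unsafe"))
    return results


frame = [
    ("Person", (50, 40, 150, 300)),
    ("Human head", (80, 40, 120, 90)),
    ("Helmet", (78, 35, 122, 65)),        # sits on the first head
    ("Human head", (300, 50, 340, 100)),  # no helmet nearby
]
for head, status in label_heads(frame):
    print(head, status)
```

Here the helmet covers half of the first head box, so that head is labeled safe, while the second head has no overlapping helmet and is labeled unsafe.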
Open YoloV7_Post_Overlay.py under yolov7, plugins, YoloV7_Post_Overlay, and change self.PPE on line 36 from False to True.
Select rtspsrc_1, and on the Type dropdown menu, choose Custom video, then upload a custom video.
For example, the following video frame illustrates how the model at the edge detects the PPE and labels the workers as safe.

To avoid ongoing costs, clean up your resources. In SiMa.ai Edgematic, sign out by choosing your profile picture at the top right and then signing out. To avoid additional costs on AWS, we recommend that you shut down the JupyterLab space by choosing the stop icon for the domain and user. For more details, see Where to shut down resources per SageMaker AI features.
This post demonstrated how to use SageMaker AI and Edgematic to retrain object detection models such as YOLOv7 in the cloud, then optimize these models for edge deployment, and build an entire edge application within minutes without the need for custom coding.
The streamlined workflow using SiMa.ai Palette on SageMaker JupyterLab helps ML applications achieve high performance, low latency, and energy efficiency, while minimizing the complexity of development and deployment. Whether you’re enhancing workplace safety with real-time monitoring or deploying advanced AI applications at the edge, SiMa.ai solutions empower developers to accelerate innovation and bring cutting-edge technology to the real world efficiently and effectively.
Experience firsthand how Palette Edgematic and SageMaker AI can streamline your ML workflow from cloud to edge. Get started today:
Together, let’s accelerate the future of edge AI.
Manuel Lopez Roldan is a Product Manager at SiMa.ai, focused on growing the user base and improving the usability of software platforms for developing and deploying AI. With a strong background in machine learning and performance optimization, he leads cross-functional initiatives to deliver intuitive, high-impact developer experiences that drive adoption and business value. He is also an advocate for industry innovation, sharing insights on how to accelerate AI adoption at the edge through scalable tools and developer-centric design.
Jason Westra is a Senior Solutions Architect at AWS based in Colorado, where he helps startups build innovative products with generative AI and ML. Outside of work, he is an avid outdoorsman, backcountry skier, climber, and mountain biker.