Accelerate edge AI development with SiMa.ai Edgematic with a seamless AWS integration

May 16, 2025

This post is co-authored by Manuel Lopez Roldan, SiMa.ai, and Jason Westra, AWS Senior Solutions Architect.

Are you looking to deploy machine learning (ML) models at the edge? With Amazon SageMaker AI and SiMa.ai’s Palette Edgematic platform, you can efficiently build, train, and deploy optimized ML models at the edge for a variety of use cases. Because they are designed to run on SiMa.ai’s MLSoC (Machine Learning System on Chip) hardware, your models are compatible across the entire SiMa.ai product family, which allows for effortless scaling, upgrades, transitions, and mix-and-match capabilities, and ultimately minimizes your total cost of ownership.

In safety-critical environments like warehouses, construction sites, and manufacturing floors, detecting human presence and safety equipment in restricted areas can prevent accidents and enforce compliance. Cloud-based image recognition often falls short in safety use cases where low latency is essential. However, by deploying an object detection model optimized to detect personal protective equipment (PPE) on SiMa.ai MLSoC, you can achieve high-performance, real-time monitoring directly on edge devices without the latency typically associated with cloud-based inference.

In this post, we demonstrate how to retrain and quantize a model using SageMaker AI and the SiMa.ai Palette software suite. The goal is to accurately detect individuals in environments where visibility and protective equipment detection are essential for compliance and safety. We then show how to create a new application within Palette Edgematic in just a few minutes. This streamlined process enables you to deploy high-performance, real-time monitoring directly on edge devices, providing low latency for fast, accurate safety alerts, and it supports an immediate response to potential hazards, enhancing overall workplace safety.

Solution overview

The solution integrates SiMa.ai Edgematic with SageMaker JupyterLab to deploy an ML model, YOLOv7, to the edge. YOLO models are computer vision and ML models for object detection and image segmentation.

The following diagram shows the solution architecture you will follow to deploy a model to the edge. Edgematic offers a seamless, low-code/no-code, end-to-end cloud-based pipeline, from model preparation to edge deployment. This approach provides high performance and accuracy, alleviates the complexity of managing updates or toolchain maintenance on devices, and simplifies inference testing and performance evaluation on edge hardware. This workflow makes sure AI applications run entirely on the edge without needing continuous cloud connectivity, which decreases latency, reduces security risks, and keeps data in-house.

The solution workflow comprises two main stages:

ML training and exporting – During this phase, you train and validate the model in SageMaker AI so that it is ready for SiMa.ai edge deployment. This step also involves optimizing and compiling the model: you use the SiMa.ai SDKs to load, quantize, test, and compile models from frameworks like PyTorch, TensorFlow, and ONNX, producing binaries that run efficiently on the SiMa.ai Machine Learning Accelerator.
ML edge evaluation and deployment – Next, you transfer the compiled model artifacts to Edgematic for a streamlined deployment to the edge device. Finally, you validate the model’s real-time performance and accuracy directly on the edge device, making sure it meets the safety monitoring requirements.

The steps to build your solution are as follows:

Create a custom image for SageMaker JupyterLab.
Launch SageMaker JupyterLab with your custom image.
Train the object detection model on the SageMaker JupyterLab notebook.
Perform graph surgery, quantization, and compilation.
Move the edge optimized model to SiMa.ai Edgematic software to evaluate its performance.

Prerequisites

Before you get started, make sure you have the following:

An AWS account. If you don’t have an AWS account, you can create one.
The AWS Command Line Interface (AWS CLI), Docker, and Git installed locally.
An AWS Identity and Access Management (IAM) user with the necessary permissions for creating and managing AWS resources.
SiMa.ai Developer Portal access. If you don’t have developer access, contact SiMa.ai from the Developer Portal to register for a free account.

Create a custom image for SageMaker JupyterLab

SageMaker AI provides ML capabilities for data scientists and developers to prepare, build, train, and deploy high-quality ML models efficiently. It has numerous features, including SageMaker JupyterLab, which enables ML developers to rapidly build, train, and deploy models. SageMaker JupyterLab allows you to create a custom image, then access it from within JupyterLab environments. You will access Palette APIs to build, train, and optimize your object detection model for the edge, from within a familiar user experience in the AWS Cloud. To set up SageMaker JupyterLab to integrate with Palette, complete the steps in this section.

Set up SageMaker AI and Amazon ECR

Provision the necessary AWS resources within the us-east-1 AWS Region. Create a SageMaker domain and user to train models and run Jupyter notebooks. Then, create an Amazon Elastic Container Registry (Amazon ECR) private repository to store Docker images.

Download the SiMa.ai SageMaker Palette Docker image

Palette is a Docker container that contains the necessary tools to quantize and compile ML models for SiMa.ai MLSoC devices. SiMa.ai provides an AWS compatible Palette version that integrates seamlessly with SageMaker JupyterLab. From it, you can attach to the necessary GPUs you need to train, export to ONNX format, optimize, quantize, and compile your model—all within a familiar ML environment on AWS.

Download the Docker image from the Software Downloads page on the SiMa.ai Developer Portal (see the following screenshot) and then download the sample Jupyter notebook from the following SiMa.ai GitHub repository. You can choose to scan the image to maintain a secure posture.

Build and tag a custom Docker image ECR URI

The following steps require that you have set up your AWS Management Console credentials, have set up an IAM user with AmazonEC2ContainerRegistryFullAccess permissions, and can successfully perform Docker login to AWS. For more information, see Private registry authentication in Amazon ECR.

Tag the image that you downloaded from the SiMa.ai Developer Access portal using the AWS CLI and then push it to Amazon ECR to make it available to SageMaker JupyterLab. On the Amazon ECR console, navigate to the registry you created to locate the ECR URI of the image. Your console experience will look similar to the following screenshot.

Copy the URI of the repository and use it to set the ECR environment variable in the following command:

# setup variables as per your AWS environment
REGION=<your region here>
AWS_ACCOUNT_ID=<your 12 digit AWS Account ID here>
ECR=$AWS_ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/<your ECR repository name here>
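
A private Amazon ECR repository URI follows the pattern account ID, `.dkr.ecr.`, Region, `.amazonaws.com/`, repository name. The following Python helper is a minimal sketch, not part of the SiMa.ai tooling, that builds the same value and catches the most common typos before you run any Docker commands. The function name and the Region check are our own assumptions, and the sketch only covers the standard `amazonaws.com` domain.

```python
import re


def ecr_repository_uri(account_id: str, region: str, repository: str) -> str:
    """Build <account>.dkr.ecr.<region>.amazonaws.com/<repository>."""
    # AWS account IDs are always 12 digits.
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("AWS account ID must be exactly 12 digits")
    # Loose sanity check for names such as us-east-1 or ap-southeast-2.
    if not re.fullmatch(r"[a-z]{2}(-[a-z]+)+-\d", region):
        raise ValueError(f"unexpected Region format: {region!r}")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}"


# Placeholder values for illustration only.
print(ecr_repository_uri("123456789012", "us-east-1", "palette"))
```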

Now that you’ve set up your environment variables and with Docker running locally, you can enter the following commands. If you haven’t used SageMaker AI before, you might have to create a new IAM user and attach the AmazonEC2ContainerRegistryPowerUser policy and then run the aws configure command.

# login to the ECR repository
aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $AWS_ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com

Upon receiving a “Login Succeeded” message, you’re logged in to Amazon ECR and can run the following Docker commands to tag the image and push it to Amazon ECR:

# Load the palette.tar image into Docker
docker load < palette.tar
# Tag the loaded image with your ECR repository URI, then push it
docker tag palette/sagemaker $ECR
docker push $ECR

The Palette image is over 25 GB. Therefore, with a 20 Mbps internet connection, the docker push operation can take several hours to upload to AWS.
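
To plan around the upload, you can estimate the transfer time from the image size and your uplink speed. The following sketch assumes decimal gigabytes and a link that sustains its nominal rate, so treat the result as a lower bound.

```python
def upload_hours(size_gb: float, link_mbps: float) -> float:
    """Estimate upload time in hours for size_gb gigabytes over link_mbps."""
    bits = size_gb * 8e9                 # 1 GB = 8e9 bits (decimal units)
    seconds = bits / (link_mbps * 1e6)   # 1 Mbps = 1e6 bits per second
    return seconds / 3600


print(f"{upload_hours(25, 20):.1f} hours")  # prints "2.8 hours"
```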

Configure SageMaker with the custom image

After you upload the custom image to Amazon ECR, you configure SageMaker JupyterLab to use it. We recommend watching the two-minute SageMaker AI/Palette Edgematic video to guide you as you walk through the steps to configure JupyterLab.

On the Amazon ECR console, navigate to the private registry, choose your repository from the list, choose Images, then choose Copy URI.
On the SageMaker AI console, choose Images in the navigation pane, and choose Create Image.
Provide your ECR URI and choose Next.
For Image properties, fill in the following fields. When filling in the fields, make sure that the image name and display name don’t use capital letters or special characters.
1. For Image name, enter palette.
2. For Image display name, enter palette.
3. For Description, enter Custom palette image for SageMaker AI integration.
4. For IAM role, either choose an existing role or create a new role (recommended).
For Image type, choose JupyterLab image.
Choose Submit.

Verify your custom image looks similar to that in the video example.

If everything matches, navigate to Admin configurations, Domains, and choose your domain.
On the Environment tab, choose Attach image in the Custom images for personal Studio apps section.
Choose Existing Image and your Palette image using the latest version, and choose Next.

Settings in the Image properties section are defaulted for your convenience, but you can choose a different IAM role and Amazon Elastic File System (Amazon EFS) mount path, if needed.

For this post, leave the defaults and choose the JupyterLab image option.
To finish, choose Submit.

Launch SageMaker JupyterLab with your custom image

With the Palette image configured, you are ready to launch SageMaker JupyterLab in Amazon SageMaker Studio and work in your custom environment.

Following the video as your guide, go to the User profiles section of your SageMaker domain and choose Launch, Studio.
In SageMaker Studio, choose Applications, JupyterLab.
Choose Create JupyterLab space.
For Name, enter a name for your new JupyterLab Space.
Choose Create Space.
For Instance, a GPU-based instance with at least 16 GB memory is recommended for the Model SDK to train efficiently. Both instance types, ml.g4dn.xlarge with Fast Launch and ml.g4dn.2xlarge, work. Allocate at least 30 GB of disk space.

When selecting an instance with a GPU, you might need to request a quota increase for that instance type. For more details, see Requesting a quota increase.

For Image, choose the new custom attached image you created in the prior step.
Choose Run space to start JupyterLab.
Choose Open JupyterLab when the status is Running.

Congratulations! You’ve created a custom image for SageMaker JupyterLab using the Palette image and launched a JupyterLab space.

Train the object detection model on a SageMaker JupyterLab notebook

Now you are able to prepare the model for the edge using the Palette Model SDK. In this section, we walk through the sample SiMa.ai Jupyter notebook so you understand how to work with the YOLOv7 model and prepare it to run on SiMa.ai devices.

To download the notebook from the SiMa.ai GitHub repository, open a terminal in your notebook and run a git clone command. This will clone the repository to your instance and from there you can launch the yolov7.ipynb file.

To run the notebook, change the Amazon Simple Storage Service (Amazon S3) bucket name in the variable s3_bucket in the third cell to an S3 bucket such as the one generated with the SageMaker domain.

To run all the cells in the notebook, choose the arrow icon above the cells, which restarts the kernel and runs every cell.

The yolov7.ipynb notebook describes in detail how to prepare the model package and how to optimize and compile the model. The following section covers only the key features of the notebook as they relate to SiMa.ai Palette and the training of your workplace safety model. Describing every cell is out of scope for this post.

Jupyter notebook walkthrough

You will use the notebook to fine-tune the model to recognize human heads and protective equipment. The following Python code defines the classes to detect, and it uses the open source open-images-v7 dataset and the fiftyone library to retrieve a set of 8,000 labeled images per class to train the model effectively. Of these images, 75% are used for training and 25% for validation. This cell also structures the dataset into YOLO format, optimizing it for your training workflow.

classes = ['Person', 'Human head', 'Helmet']
...
dataset = fiftyone.zoo.load_zoo_dataset(
    "open-images-v7",
    split="train",
    label_types=["detections"],
    classes=classes,
    max_samples=total,
)
...
dataset.export(
    dataset_type=fiftyone.types.YOLOv5Dataset,
    labels_path=path,
    classes=classes,
)
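
To make the YOLO format and the 75/25 split concrete, the following standalone sketch shows what the export conceptually produces for each object: one text line containing the class index followed by the box center, width, and height, all relative to the image size. It assumes boxes arrive as relative `[x_min, y_min, width, height]` values, which is how fiftyone stores detections. The helper names are hypothetical and this is not the notebook's code.

```python
import random

CLASSES = ["Person", "Human head", "Helmet"]


def to_yolo_line(label, box):
    """Convert a relative [x_min, y_min, width, height] box to a YOLO label line."""
    x_min, y_min, width, height = box
    x_center = x_min + width / 2
    y_center = y_min + height / 2
    return f"{CLASSES.index(label)} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def split_samples(samples, train_fraction=0.75, seed=0):
    """Shuffle reproducibly, then split into training and validation subsets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


print(to_yolo_line("Helmet", [0.25, 0.10, 0.50, 0.20]))
train, val = split_samples(range(8000))
print(len(train), len(val))  # prints "6000 2000"
```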

The next important cell configures the dataset and downloads the required weights. This post uses the yolov7-tiny weights, but you can choose a different YOLOv7 variant. Each is distributed under the GPL-3.0 license. YOLOv7 achieves better performance than YOLOv7-Tiny, but it takes longer to train. After choosing which YOLOv7 variant you prefer, retrain the model by running the command, as shown in the following code:

!cd yolov7 && python3 train.py --workers 4 --device 0 --batch-size 16 --data data/custom.yaml --img 640 640 --cfg cfg/training/yolov7-tiny.yaml --weights 'yolov7-tiny.pt' --name sima-yolov7 --hyp data/hyp.scratch.custom.yaml --epochs 10

The preceding command retrains the model for 10 epochs with the new dataset and the yolov7-tiny weights. This achieves a mAP of approximately 0.6, which should deliver accurate detection of the new classes. Finally, the following command exports the model to ONNX format:

!cd yolov7 && python3 export.py --weights runs/train/sima-yolov7/weights/best.pt --grid --end2end --simplify --topk-all 100 --iou-thres 0.65 --conf-thres 0.35 --img-size 640 640 --max-wh 640
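
The `--conf-thres`, `--iou-thres`, and `--topk-all` flags in the export command configure non-maximum suppression (NMS), which discards low-confidence boxes and near-duplicate boxes for the same object. The following pure-Python sketch shows the idea using the same threshold values. It is a simplified, class-agnostic illustration and not the operator that export.py embeds in the ONNX graph.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, conf_thres=0.35, iou_thres=0.65, top_k=100):
    """Greedy NMS over a list of (box, score) pairs."""
    # Drop low-confidence boxes, then visit the rest from most to least confident.
    candidates = sorted(
        (d for d in detections if d[1] >= conf_thres),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(box, kept_box) <= iou_thres for kept_box, _ in kept):
            kept.append((box, score))
        if len(kept) == top_k:
            break
    return kept


detections = [
    ((100, 100, 200, 200), 0.90),
    ((105, 105, 205, 205), 0.80),  # near-duplicate of the first box
    ((300, 300, 400, 400), 0.50),
    ((0, 0, 50, 50), 0.20),        # below the confidence threshold
]
print(len(nms(detections)))  # prints 2
```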

Perform graph surgery, quantization, and compilation

To optimize the architecture, you must modify the YOLOv7 model in ONNX format. In the following figure, the scissors and dotted red line show where graph surgery is performed on a YOLOv7 model. How is graph surgery different from model pruning? Model pruning reduces the overall size and complexity of a neural network by removing less significant weights or entire neurons, whereas graph surgery restructures the computational graph by modifying or replacing specific operations to provide compatibility with target hardware without changing the model’s learned parameters. The net effect is that you replace unwanted operations on the heads, such as Reshape, Split, and Concat, with supported operations that are mathematically equivalent (point-wise convolutions). Afterwards, you remove the postprocessing operations from the ONNX graph. These will be included in the postprocessing logic instead.

See the following code:

model = onnx.load(f"{model_name}.onnx")
...
remove_nodes(model)
insert_pointwise_conv(model)
update_elmtwise_const(model)
update_output_nodes(model)
...
onnx.save(model, ONNX_MODEL_NAME)
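
To see why a point-wise (1x1) convolution can stand in for an operation such as Split, consider the following minimal sketch. A point-wise convolution multiplies each pixel's channel vector by a weight matrix, so a weight matrix of zeros and ones that copies selected channels reproduces a channel slice exactly. This only illustrates the equivalence. It is not the implementation of the notebook's `insert_pointwise_conv` helper, and the function names here are hypothetical.

```python
def pointwise_conv(pixels, weights):
    """Apply a 1x1 convolution: weights is [out_channels][in_channels]."""
    return [
        [sum(w * x for w, x in zip(row, pixel)) for row in weights]
        for pixel in pixels
    ]


def selection_weights(in_channels, start, stop):
    """Build 0/1 weights whose 1x1 convolution copies channels start..stop-1."""
    return [
        [1.0 if c == o else 0.0 for c in range(in_channels)]
        for o in range(start, stop)
    ]


# Two pixels with four channels each (spatial dimensions flattened).
pixels = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]

sliced = [pixel[1:3] for pixel in pixels]                    # Split/Slice
convolved = pointwise_conv(pixels, selection_weights(4, 1, 3))

print(convolved)  # prints [[2.0, 3.0], [6.0, 7.0]]
```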

After surgery, you quantize the model. Quantization simplifies AI models by reducing the precision of the data they use from 32-bit floating point to 8-bit integer, making models smaller, faster, and more efficient to run at the edge. Quantized models consume less power and fewer resources, which is critical for deploying on lower-powered devices and optimizing overall efficiency. The following code quantizes your model using the validation dataset. It also runs some inference using the quantized model to provide insight into how well the model performs after post-training quantization.

...
loaded_net = _load_model()
# Quantize model
quant_configs = default_quantization.with_calibration(HistogramMSEMethod(num_bins=1024))
calibration_data = _make_calibration_data()
quantized_net = loaded_net.quantize(calibration_data=calibration_data, quantization_config=quant_configs)
...
    if QUANTIZED:
        preprocessed_image1 = preprocess(img=image, input_shape=(640, 640)).transpose(0, 2, 3, 1)
        inputs = {InputName('images'): preprocessed_image1}
        out = quantized_net.execute(inputs)
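
To build intuition for what the float-to-int8 mapping does, the following sketch implements a basic affine quantizer in plain Python. Note the swap in technique: it derives the scale from the minimum and maximum of the calibration values, whereas the notebook calibrates with `HistogramMSEMethod`. The function names are hypothetical and this is not the Palette SDK's implementation.

```python
def quant_params(values, qmin=-128, qmax=127):
    """Derive scale and zero point from the calibration range (min/max method)."""
    lo = min(min(values), 0.0)  # the range must include 0.0
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = round(qmin - lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """Map floats to int8 codes, clipping to the representable range."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(codes, scale, zero_point):
    """Map int8 codes back to approximate float values."""
    return [(q - zero_point) * scale for q in codes]


calibration = [-1.0, -0.25, 0.0, 0.5, 3.0]
scale, zero_point = quant_params(calibration)
codes = quantize(calibration, scale, zero_point)
restored = dequantize(codes, scale, zero_point)

# The round-trip error stays within one quantization step.
worst = max(abs(a - b) for a, b in zip(calibration, restored))
print(codes, f"max error {worst:.4f}, step {scale:.4f}")
```

The same trade-off drives the accuracy check in the notebook: a better calibration method narrows the range that the 256 available codes must cover, which reduces the rounding error per value.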

Because quantization reduces precision, verify that the model accuracy remains high by testing some predictions. After validation, compile the model to generate files that enable it to run on SiMa.ai MLSoC devices, along with the required configuration for supporting plugins. This compilation produces an .lm file, the binary executable for the ML accelerator in the MLSoC, and a .json file containing configuration details like input image size and quantization type.

saved_mpk_directory = "./compiled_yolov7"
quantized_net.save("yolov7", output_directory=saved_mpk_directory)
quantized_net.compile(output_path=saved_mpk_directory, compress=False)
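
Before uploading, you might want to confirm that the compiled package contains the artifacts described above. The following sketch assumes the package is a gzip-compressed tar archive that holds the `.lm` and `.json` files. The function name and the archive path in the usage comment are hypothetical.

```python
import tarfile


def missing_artifacts(archive_path, required=(".lm", ".json")):
    """Return the required file extensions that are absent from the archive."""
    with tarfile.open(archive_path, "r:gz") as archive:
        names = archive.getnames()
    return [ext for ext in required if not any(n.endswith(ext) for n in names)]


# Example usage (hypothetical path):
# missing = missing_artifacts("compiled_yolov7/yolov7.tar.gz")
# if missing:
#     raise RuntimeError(f"compiled package is missing: {missing}")
```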

The notebook uploads the compiled file to the S3 bucket you specified, then generates a pre-signed link that is valid for 30 minutes. If the link expires, rerun the last cell. Copy the generated link at the end of the notebook. You will use it shortly in SiMa.ai Edgematic.

s3.meta.client.upload_file(file_name, S3_BUCKET_NAME, f"models/{name}.tar.gz")
...
presigned_url = s3_client.generate_presigned_url(
    ClientMethod="get_object",
    Params={
        "Bucket": s3_bucket,
        "Key": object_key,
    },
    ExpiresIn=1800,  # 30 minutes
)
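
If you are unsure whether a copied link is still valid, you can read the expiry from the URL itself. The following sketch uses only the Python standard library and handles the two query-string styles that S3 pre-signed URLs commonly use: `X-Amz-Date` plus `X-Amz-Expires` for Signature Version 4, and an absolute `Expires` timestamp for the older style. The function names and the example URL are made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def presigned_url_expiry(url):
    """Return the UTC time at which a pre-signed S3 URL stops working."""
    query = parse_qs(urlparse(url).query)
    if "X-Amz-Date" in query:  # Signature Version 4: signing time + lifetime
        signed_at = datetime.strptime(
            query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
        ).replace(tzinfo=timezone.utc)
        return signed_at + timedelta(seconds=int(query["X-Amz-Expires"][0]))
    if "Expires" in query:  # older style: absolute Unix timestamp
        return datetime.fromtimestamp(int(query["Expires"][0]), tz=timezone.utc)
    raise ValueError("no expiry parameters found in URL")


def is_expired(url, now=None):
    """Check the expiry against the current time (or a supplied time)."""
    now = now or datetime.now(timezone.utc)
    return now >= presigned_url_expiry(url)


# Illustrative URL only; the signature is not real.
example = (
    "https://example-bucket.s3.amazonaws.com/models/yolov7.tar.gz"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20250516T120000Z"
    "&X-Amz-Expires=1800&X-Amz-Signature=abc"
)
print(presigned_url_expiry(example).isoformat())  # prints 2025-05-16T12:30:00+00:00
```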

Move the model to SiMa.ai Edgematic to evaluate its performance

After you complete your cloud-based model fine-tuning in AWS, transition to Edgematic for building the complete edge application, including plugins for preprocessing and postprocessing. Edgematic integrates the optimized model with essential plugins, like UDP sync for data transmission, video encoders for streaming predictions, and preprocessing tailored for the SiMa.ai MLA. These plugins are provided as drag-and-drop blocks, improving developer productivity by eliminating the need for custom coding. After it’s configured, Edgematic compiles and deploys the application to the edge device, transforming the model into a functional, real-world AI application.

To begin, log in to Edgematic, create a new project, and drag and drop the YoloV7 pipeline under Developer Community.

To run your YOLOv7 workplace safety application, request a device and choose the play icon. The application will be compiled, installed on the remote device assigned upon login, and started. After 30 seconds, the complete application will be running on the SiMa.ai MLSoC and you will see that it detects people in the video stream.
Choose the Models tab, then choose Add Model.
Choose the Amazon S3 pre-signed link, enter the previously copied link, then choose Add.

Your model will appear under User defined on the Models tab. You can open the model folder and choose Run to get KPIs on the model such as frames per second.

Next, you will change the existing people detection pipeline to a PPE use case by replacing the existing YOLOv7 model with your newly trained PPE model.

To change the model, stop the pipeline by choosing the stop icon.
Choose Delete to delete the YOLOv7 block of the application.

Drag and drop your new model imported from the User defined folder on the Models tab.

Now you connect it back to the blocks that YOLOv7 was connected to.

First, change the tool in canvas to Connect, then choose the connecting points between the respective plugins.
Choose the play icon.

After the application is deployed on the SiMa.ai MLSoC, you should see the detections of categories such as “Human head,” “Person,” and “Glasses,” as seen in the following screenshot.

Next, you change the application postprocessing logic from performing people detection to performing PPE detection. This is done by adding business logic to the postprocessing step that checks whether PPE is present. For this post, the PPE logic has already been written, and you just enable it.

First, stop the previous application by choosing the stop icon.
Next, locate the Explorer section and locate the file named YoloV7_Post_Overlay.py under yolov7, plugins, YoloV7_Post_Overlay.
Open the file and change the variable self.PPE on line 36 from False to True.
Rerun the application by choosing the play icon.
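
To give a sense of the kind of business logic such a PPE check involves, the following sketch labels each detected person as safe or unsafe depending on whether a helmet detection falls within the upper third of the person's bounding box. This is a hypothetical illustration with made-up function names and a made-up rule. It is not the code in YoloV7_Post_Overlay.py.

```python
def center(box):
    """Return the center point of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def contains(box, point):
    """Check whether a point lies inside an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return x1 <= point[0] <= x2 and y1 <= point[1] <= y2


def label_workers(detections):
    """Label each 'Person' box as 'safe' if a helmet sits in its head region."""
    helmets = [box for name, box in detections if name == "Helmet"]
    results = []
    for name, box in detections:
        if name != "Person":
            continue
        x1, y1, x2, y2 = box
        # Assumption: the head occupies roughly the upper third of the box.
        head_region = (x1, y1, x2, y1 + (y2 - y1) / 3)
        wearing = any(contains(head_region, center(h)) for h in helmets)
        results.append((box, "safe" if wearing else "unsafe"))
    return results


detections = [
    ("Person", (100, 50, 200, 350)),
    ("Helmet", (130, 55, 170, 95)),
    ("Person", (400, 60, 500, 360)),
]
print(label_workers(detections))
```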

Finally, you can add a custom video by choosing the gear icon on the first application plugin called rtspsrc_1, and on the Type dropdown menu, choose Custom video, then upload a custom video.

For example, the following video frame illustrates how the model at the edge detects the PPE and labels the workers as safe.

Clean up

To avoid ongoing costs, clean up your resources. In SiMa.ai Edgematic, sign out by choosing your profile picture at the top right and then signing out. To avoid additional costs on AWS, we recommend that you shut down the JupyterLab Space by choosing the stop icon for the domain and user. For more details, see Where to shut down resources per SageMaker AI features.

Conclusion

This post demonstrated how to use SageMaker AI and Edgematic to retrain object detection models such as YOLOv7 in the cloud, then optimize these models for edge deployment, and build an entire edge application within minutes without the need for custom coding.

The streamlined workflow using SiMa.ai Palette on SageMaker JupyterLab helps ML applications achieve high performance, low latency, and energy efficiency, while minimizing the complexity of development and deployment. Whether you’re enhancing workplace safety with real-time monitoring or deploying advanced AI applications at the edge, SiMa.ai solutions empower developers to accelerate innovation and bring cutting-edge technology to the real world efficiently and effectively.

Experience firsthand how Palette Edgematic and SageMaker AI can streamline your ML workflow from cloud to edge. Get started today:

Access our complete workshop materials and example code on AWS Marketplace
Join our developer community to share experiences and best practices

Together, let’s accelerate the future of edge AI.

About the Authors

Manuel Lopez Roldan is a Product Manager at SiMa.ai, focused on growing the user base and improving the usability of software platforms for developing and deploying AI. With a strong background in machine learning and performance optimization, he leads cross-functional initiatives to deliver intuitive, high-impact developer experiences that drive adoption and business value. He is also an advocate for industry innovation, sharing insights on how to accelerate AI adoption at the edge through scalable tools and developer-centric design.

Jason Westra is a Senior Solutions Architect at AWS based in Colorado, where he helps startups build innovative products with Generative AI and ML. Outside of work, he is an avid outdoorsman, backcountry skier, climber, and mountain biker.