Introduction to Edge AI Inference with AWS IoT Greengrass V2/SageMaker

Takahiro Iwasa

I held a study session titled “Introduction to Edge AI Inference with AWS IoT Greengrass V2/SageMaker” to introduce AWS IoT Greengrass V2 and SageMaker.

I would like to share the content of that presentation in the hope that it helps you get started with edge inference on AWS. The example code used in this post is available in my GitHub repository.


Participants are expected to:

  • Be interested in edge AI inference on AWS.
  • Have basic knowledge of machine learning and AWS.

Goals in this post:

  • Provide an overview of edge AI inference on AWS.
  • Demonstrate examples using IoT Greengrass V2, SageMaker, SageMaker Neo, and SageMaker Edge Manager.

Edge Computing Overview

Edge computing is a distributed computing paradigm that brings computation and data storage closer to the sources of data. This is expected to improve response times and save bandwidth.

Edge computing offers benefits such as:

  • Low latency (especially important for real-time applications such as autonomous driving)
  • Decreased security risks
  • Lower communication costs

However, edge computing may have some disadvantages, such as:

  • Challenges in vertical scaling, which is easier in cloud environments
  • Requirements for capacity planning
  • Infrastructure management

Edge computing is used not only for AI but also for content delivery networks (CDNs).

One of the current challenges with edge AI is limited computing capacity. Modern machine learning and deep learning models contain large numbers of parameters and demand substantial computing resources. Various efforts have been made to optimize ML/DL models for edge AI. 1

IoT Greengrass V2


AWS IoT Greengrass is an open source Internet of Things (IoT) edge runtime and cloud service that helps you build, deploy and manage IoT applications on your devices.

IoT Greengrass V2 offers:

  • Support for Windows 2 and Linux (Ubuntu, Raspberry Pi OS, and so on).
  • Support for both x86 and ARM architectures.
  • Support for Lambda Functions.
  • Deep Learning Runtime (DLR) for edge AI inference.


The key concepts of IoT Greengrass V2 are the following:

  • Greengrass Core Devices:
    • Run Greengrass Core on your edge.
    • Are registered as AWS IoT Things.
    • Communicate with AWS.
  • Greengrass Client Devices:
    • Communicate with a Greengrass Core Device using MQTT.
    • Are registered as AWS IoT Things.
    • Communicate with other client devices when a Greengrass Core Device is used as a message broker.
  • Greengrass Components:
    • Are software running on Greengrass Core Devices.
    • Are implemented and registered by users.
  • Deployment:
    • Consists of instructions from AWS to Greengrass Core Devices.


SageMaker Overview

SageMaker is a managed service for machine learning on AWS. Users can use 17 built-in algorithms to perform machine learning with less code 3. It supports major deep learning frameworks such as TensorFlow and PyTorch 4.

SageMaker offers the following types of inference endpoints:

SageMaker Neo Overview

Neo is a capability of Amazon SageMaker that enables machine learning models to train once and run anywhere in the cloud and at the edge.

With a single click, SageMaker Neo optimizes the trained model and compiles it into an executable. The compiler uses a machine learning model to apply the performance optimizations that extract the best available performance for your model on the cloud instance or edge device.

SageMaker Edge Manager Overview

Amazon SageMaker Edge Manager provides model management for edge devices so you can optimize, secure, monitor, and maintain machine learning models on fleets of edge devices such as smart cameras, robots, personal computers, and mobile devices.

The example in this post does not use the monitoring feature.

Example of Edge AI Inference


In this example, an EC2 instance serves as the edge device.

The example walks through the following steps to run inference at the edge.

  1. Setting up
    1. Preparing AWS resources
    2. Implementing training script
    3. Implementing inference script
  2. SageMaker
    1. Training with SageMaker
    2. Compiling model with SageMaker Neo
    3. Packaging model with SageMaker Edge Manager
  3. Greengrass
    1. Setting up Greengrass Core
    2. Registering Greengrass Component for edge inference
    3. Deploying Greengrass Component
  4. Testing

Preparing AWS resources

Prepare the following AWS resources beforehand. For specific details on each resource, check the CloudFormation template in my GitHub repository.

Resource | Name | Description
IAM User | greengrass-core-setup-user | For setting up Greengrass Core
IAM Role | sagemaker-execution-role | SageMaker execution role
IAM Role | GreengrassV2TokenExchangeRole | Greengrass Core role
S3 | sagemaker-ml-model-artifacts-{account_id}-{region} | Bucket for ML models

You can create these resources by running the following command.

% aws cloudformation deploy --template-file ./cfn.yaml --stack-name greengrass-sample --capabilities CAPABILITY_NAMED_IAM

Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - greengrass-sample

Implementing training script

This example uses PyTorch’s pre-trained VGG16 model, so install PyTorch and torchvision with the following command.

% pip install torch torchvision

Write the training script with the following content. It will run in SageMaker.

import argparse
import os
from datetime import datetime

import torch
from torchvision import models

def fit(model: torch.nn.modules.Module) -> None:
    # Write some training codes...
    pass

def save(model: torch.nn.modules.Module, path: str) -> None:
    suffix = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
    path = os.path.join(path, f'model-{suffix}.pt')
    # If you use `model.state_dict()`, SageMaker compilation will fail.
    torch.save(model, path)

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser()

    # hyperparameters sent by the client are passed as command-line arguments to the script.

    # input data and model directories
    parser.add_argument('--model_dir', type=str)
    parser.add_argument('--sm_model_dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
    parser.add_argument('--train', type=str, default=os.environ.get('SM_CHANNEL_TRAIN'))
    parser.add_argument('--test', type=str, default=os.environ.get('SM_CHANNEL_TEST'))

    args, _ = parser.parse_known_args()
    return args

if __name__ == '__main__':
    args = parse_args()
    vgg16 = models.vgg16(pretrained=True)
    save(vgg16, args.sm_model_dir)

You can implement training scripts for SageMaker in the same way as in your local environment, but keep the following points in mind. Please refer to the official documentation for further information.

  • Runtime arguments
  • Environment variables
  • Training dataset
    • If you are using FILE mode, the training dataset will be replicated automatically from S3 to your SageMaker instance.
    • If you are using PIPE mode, you will need to implement code for reading streamed training data. For more information, please read the official documentation.
  • Model saving directory
    • SM_MODEL_DIR environment variable can be used for saving your trained model, which will be automatically uploaded to S3.

The example above does not include the training code.
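As a rough illustration of the runtime arguments and environment variables listed above, the following sketch shows how a hyperparameter and the `SM_MODEL_DIR` default are resolved. The `--epochs` hyperparameter is hypothetical, and `SM_MODEL_DIR` is set by hand here only so that the snippet can run outside a SageMaker training container.

```python
import argparse
import os


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # Hypothetical hyperparameter; SageMaker passes hyperparameters as command-line arguments.
    parser.add_argument('--epochs', type=int, default=1)
    # The default is read from the environment when this function is called.
    parser.add_argument('--sm_model_dir', type=str, default=os.environ.get('SM_MODEL_DIR'))
    # parse_known_args tolerates arguments the script does not declare.
    args, _ = parser.parse_known_args(argv)
    return args


if __name__ == '__main__':
    # Simulate the variable that SageMaker sets inside a training container.
    os.environ.setdefault('SM_MODEL_DIR', '/opt/ml/model')
    args = parse_args(['--epochs', '3', '--unknown', 'x'])
    print(args.epochs, args.sm_model_dir)
```

Using `parse_known_args` rather than `parse_args` keeps the script from failing when it receives arguments it does not declare.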

Implementing inference script

You can load the model compiled by SageMaker Neo using Deep Learning Runtime (DLR). Install DLR with the following command.

% pip install dlr

Write the inference script with the following content. It will run on your Greengrass Core.

import argparse
import glob
import json
import os
import time

import numpy as np
from dlr import DLRModel

def load_model() -> DLRModel:
    return DLRModel('/greengrass/v2/work/vgg16-component')

def load_labels() -> dict:
    path = os.path.dirname(os.path.abspath(__file__))
    # See
    path = os.path.join(path, 'imagenet_class_index.json')
    with open(path, 'r') as f:
        labels = json.load(f)
    return labels

def iter_files(path: str) -> str:
    path = path[:-1] if path.endswith('/') else path
    files = glob.glob(f'{path}/*.npy')
    for file in files:
        yield file

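def predict(model: DLRModel, image: np.ndarray) -> np.ndarray:
    # DLRModel.run returns a list of output arrays; the first one holds the class scores.
    return model.run(image)[0]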

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument('--test_dir', type=str)
    parser.add_argument('--interval', type=int, default=300)
    args, _ = parser.parse_known_args()
    return args

def start(model: DLRModel, path: str, labels: dict) -> None:
    for file in iter_files(path):
        image = np.load(file)
        y = predict(model, image)
        index = int(np.argmax(y))
        label = labels.get(str(index), '')
        print(f'Prediction result of {file}: {label}')

if __name__ == '__main__':
    args = parse_args()
    print(f'args: {args}')
    model = load_model()
    labels = load_labels()

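    if args.interval == 0:
        # Run inference once and exit.
        start(model, args.test_dir, labels)
    else:
        # Run inference repeatedly, sleeping between rounds.
        while True:
            start(model, args.test_dir, labels)
            print(f'Sleep in {args.interval} seconds...')
            time.sleep(args.interval)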

Please note that PyTorch expects torch.Tensor as input data, while models compiled by SageMaker Neo expect numpy.ndarray. For information on the input shapes of PyTorch’s pre-trained models, refer to the official documentation.

In this example, inference runs periodically on /greengrass/v2/work/vgg16-inference-component/images/*.npy placed on your Greengrass Core, at the interval given by the --interval argument. You may also refer to aws.greengrass.DLRImageClassification or aws.greengrass.DLRObjectDetection, which are AWS-provided components.
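The label lookup in `start` relies on the layout of `imagenet_class_index.json`, which maps a class index (as a string) to a pair of WordNet ID and class name. The sketch below mimics that lookup with plain Python lists in place of NumPy arrays. The two-entry excerpt and its indices are illustrative only, not the real file, and the helper names are hypothetical.

```python
import json

# Illustrative excerpt; the real file has 1000 entries with different indices.
LABELS_JSON = '{"0": ["n03085013", "computer_keyboard"], "1": ["n03388183", "fountain_pen"]}'


def argmax(scores: list) -> int:
    """Return the index of the highest score (the role np.argmax plays in the script)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def lookup(labels: dict, scores: list):
    # JSON object keys are strings, so the integer index must be converted.
    return labels.get(str(argmax(scores)), '')


if __name__ == '__main__':
    labels = json.loads(LABELS_JSON)
    print(lookup(labels, [0.1, 0.9]))
```

Because each label is a two-element list, the log prints both the WordNet ID and the class name for every prediction.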

To register an inference Greengrass Component, upload a zip file containing the inference script and associated files to your S3 bucket.

% cd vgg16-inference-component
% zip imagenet_class_index.json
% aws s3 cp s3://{YOUR_BUCKET}/artifacts/

Training with SageMaker

Install SageMaker Python SDK with the following command.

% pip install sagemaker

To queue a SageMaker training job from your local environment, write a script with the following content. You can also use the SageMaker management console instead, which is not covered in this post.

from sagemaker.pytorch import PyTorch

AWS_ACCOUNT_ID = '123456789012'
S3_BUCKET = f's3://sagemaker-ml-model-artifacts-{AWS_ACCOUNT_ID}-ap-northeast-1'

if __name__ == '__main__':
    pytorch_estimator = PyTorch(

After running the script, you will see billing information in the execution log. For example, the following log shows that 255 seconds were billed.

% python


2022-02-16 15:41:56 Uploading - Uploading generated training model
2022-02-16 15:42:56 Completed - Training job completed
ProfilerReport-1645025749: NoIssuesFound
Training seconds: 255
Billable seconds: 255

The model will be saved in your S3 bucket (output/model.tar.gz). It will then be compiled and optimized using SageMaker Neo.

Compiling model with SageMaker Neo

Create a SageMaker compilation job. In this example, the job took about 4 minutes to complete.

Specify the input configuration according to the following:

Field | Value
Artifact | S3 URI of model.tar.gz
Input shape | Model input shape
Framework version | 1.8

For input shape, the official documentation describes the following.

All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224.
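The input shape for the compilation job can be written as a JSON object. The following is a minimal sketch, assuming a single model input named `input0` and a batch of one 224 x 224 RGB image. The helper name `build_input_config` is hypothetical.

```python
import json


def build_input_config(batch: int = 1, height: int = 224, width: int = 224) -> str:
    """Return an (N, C, H, W) input shape for a 3-channel RGB image as a JSON string."""
    if height < 224 or width < 224:
        raise ValueError('H and W are expected to be at least 224')
    # 'input0' is an assumed input name; 3 is the number of RGB channels.
    return json.dumps({'input0': [batch, 3, height, width]})


if __name__ == '__main__':
    print(build_input_config())
```

The batch dimension comes first, which matches the 4-dimensional (N, C, H, W) array that the inference script feeds to the compiled model.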

Specify the output configuration as you like.

Although I could not find the information in AWS official documentation, it was commented in the AWS Forum.

You will see the following at the bottom of the page.

The library compiled by Sagemaker Neo with target rasp4b returns “ELF-64 bit LSB pie executable, ARM aarch64, version 1 (GNU/Linux), dynamically linked,

You can leave the following default.

Packaging model with SageMaker Edge Manager

Create a SageMaker Edge Packaging Job.

Enter the SageMaker Neo compilation job name.

If you choose Greengrass V2 component as deploy preset, the compiled model will be:

  • Registered as Greengrass V2 component by SageMaker Edge.
  • Saved to /greengrass/v2/work/vgg16-component/ on the Greengrass Core.

Setting up Greengrass Core

Set up Greengrass Core on your edge device. This post uses an EC2 instance running Ubuntu 20.04.3. For detailed instructions on how to install Greengrass Core, please refer to the official documentation.

Please note that MQTT over TLS uses port 8883. If the port is not open, you will need to follow the manual setup guide.
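Before installing, it can help to confirm that outbound connections on port 8883 are possible from the device. The sketch below only checks TCP reachability and does not perform a TLS or MQTT handshake. The function name is hypothetical, and you need to supply the AWS IoT endpoint of your own account as the host.

```python
import socket


def can_connect(host: str, port: int = 8883, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

Calling `can_connect('<YOUR_IOT_ENDPOINT>')` returns False when a firewall or security group blocks the port, in which case the manual setup guide applies.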

Install JDK.

% sudo apt install default-jdk
% java -version

Add a user and group for Greengrass Core.

% sudo useradd --system --create-home ggc_user
% sudo groupadd --system ggc_group

Configure AWS credentials.

% # Set the credential of greengrass-core-setup-user already provisioned by CloudFormation

Install Greengrass Core.

% curl -s >
% unzip -d GreengrassInstaller && rm
% sudo -E java -Droot="/greengrass/v2" \
  -jar ./GreengrassInstaller/lib/Greengrass.jar \
  --aws-region ap-northeast-1 \
  --thing-name MyGreengrassCore \
  --thing-group-name MyGreengrassCoreGroup \
  --thing-policy-name GreengrassV2IoTThingPolicy \
  --tes-role-name GreengrassV2TokenExchangeRole \
  --tes-role-alias-name GreengrassCoreTokenExchangeRoleAlias \
  --component-default-user ggc_user:ggc_group \
  --provision true \
  --setup-system-service true

Check the Greengrass Core service. Given its memory usage, the core device should have at least 2 GB of memory.

% sudo systemctl status greengrass
● greengrass.service - Greengrass Core
     Loaded: loaded (/etc/systemd/system/greengrass.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2022-02-16 05:09:16 UTC; 1 day 2h ago
   Main PID: 1454 (sh)
      Tasks: 51 (limit: 2197)
     Memory: 734.2M
     CGroup: /system.slice/greengrass.service

This post uses automatic resource provisioning, so the following AWS resources have been provisioned automatically. Alternatively, you can provision the resources manually.

Resource | Name
Thing Group | MyGreengrassCoreGroup
Thing Policy | GreengrassV2IoTThingPolicy
Token Exchange Role | GreengrassV2TokenExchangeRole
Token Exchange Role Alias | GreengrassCoreTokenExchangeRoleAlias

Registering Greengrass Component for edge inference

Create recipe.yaml to register a Greengrass Component for inference. For more information on the component recipe, please refer to the official documentation.

RecipeFormatVersion: '2020-01-25'

ComponentName: vgg16-inference-component
ComponentVersion: 1.0.0
ComponentDescription: Inference component for VGG16
ComponentPublisher: Iret

# Arguments to be passed.
ComponentConfiguration:
  DefaultConfiguration:
    Interval: 60

# Dependencies which will be installed with this component.
ComponentDependencies:
  variant.DLR:
    VersionRequirement: ">=1.6.5 <1.7.0"
    DependencyType: HARD
  vgg16-component:
    VersionRequirement: ">=1.0.0"
    DependencyType: HARD

Manifests:
- Name: Linux
  Platform:
    os: linux
  Lifecycle:
    Run:
      RequiresPrivilege: true
      Script: |
        . {variant.DLR:configuration:/MLRootPath}/greengrass_ml_dlr_venv/bin/activate
        python3 -u {artifacts:decompressedPath}/vgg16-inference-component-1.0.0/ --interval {configuration:/Interval} --test_dir {work:path}/images/
  Artifacts:
  - Uri: s3://sagemaker-ml-model-artifacts-123456789012-ap-northeast-1/artifacts/
    Unarchive: ZIP
Interval in the above example is the inference interval in seconds.

You can specify component dependencies in ComponentDependencies. In this example, the following must be specified.

  • variant.DLR
    • Needed for loading models compiled by SageMaker Neo. For more details, please refer to the official documentation.
    • Has Python virtual environment in /greengrass/v2/work/variant.DLR/greengrass_ml/greengrass_ml_dlr_venv on your Greengrass Core device.
  • vgg16-component
    • Model compiled by SageMaker Neo
    • Registered by SageMaker Edge Manager
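To see what a requirement such as ">=1.6.5 <1.7.0" accepts, the following sketch evaluates space-separated version constraints. It is a simplified illustration with hypothetical helper names, not the resolver Greengrass uses, and it ignores pre-release and build tags.

```python
def parse_version(text: str) -> tuple:
    """Turn '1.6.5' into (1, 6, 5) so versions compare numerically."""
    return tuple(int(part) for part in text.split('.'))


def satisfies(version: str, requirement: str) -> bool:
    """Check a version against space-separated constraints such as '>=1.6.5 <1.7.0'."""
    ops = {
        '>=': lambda a, b: a >= b,
        '<=': lambda a, b: a <= b,
        '>': lambda a, b: a > b,
        '<': lambda a, b: a < b,
        '=': lambda a, b: a == b,
    }
    current = parse_version(version)
    for constraint in requirement.split():
        # Try two-character operators first so '>=' is not read as '>'.
        for symbol in ('>=', '<=', '>', '<', '='):
            if constraint.startswith(symbol):
                bound = parse_version(constraint[len(symbol):])
                if not ops[symbol](current, bound):
                    return False
                break
        else:
            raise ValueError(f'Unsupported constraint: {constraint}')
    return True


if __name__ == '__main__':
    print(satisfies('1.6.8', '>=1.6.5 <1.7.0'))
    print(satisfies('1.7.0', '>=1.6.5 <1.7.0'))
```

Every constraint in the requirement must hold, so the upper bound keeps a deployment from picking up a 1.7.x release of the dependency.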

After creating recipe.yaml, create the Greengrass component.

When the component is deployed, Greengrass Core downloads and extracts the artifacts from S3. Greengrass Core validates the checksum of each artifact, so overwriting an artifact directly in S3 leaves the component in a broken state. Please refer to the official documentation for more details.
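The effect of overwriting an artifact can be illustrated with a digest comparison. This is a minimal sketch that assumes SHA-256 as the hash algorithm. The byte strings stand in for two uploads of the artifact to the same S3 key.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


if __name__ == '__main__':
    registered = sha256_digest(b'inference script v1')   # digest at registration time
    overwritten = sha256_digest(b'inference script v2')  # same S3 key, new content
    # The digests differ, so a checksum validation against the registered value fails.
    print(registered == overwritten)
```

A safer route, under the same assumption, is to upload the changed artifact under a new key and register a new component version rather than replacing the existing object.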

Deploying Greengrass Component

In this step, you can choose components to deploy.

For My components, specify the following. vgg16-component will be installed even if you do not select it, because the vgg16-inference-component recipe declares a HARD dependency on it.

Component | Description
vgg16-component | The VGG16 component packaged by SageMaker Edge Manager
vgg16-inference-component | The inference component

For Public components, specify the following. variant.DLR will be installed even if you do not select it, because the vgg16-inference-component recipe declares a HARD dependency on it.

Component | Description
variant.DLR | The component necessary for loading the model
aws.greengrass.Nucleus | The component necessary for Greengrass Core

On each of the next two screens, press Next with no configuration changes.

After reviewing, press Deploy to start deploying components.


Testing

To test inference on your Greengrass Core, follow these steps:

  1. Pre-trained PyTorch models expect a 4-dimensional tensor (N, C, H, W) as input, so convert the images for inference into NumPy arrays of that shape. For more information, please refer to the official documentation.
  2. Transfer the converted data to /greengrass/v2/work/vgg16-inference-component/images/ on your Greengrass Core device.
  3. Check the /greengrass/v2/logs/vgg16-inference-component.log file on your Greengrass Core device.

You can use the following Python script to convert images into NumPy arrays.

import argparse
import os
from PIL import Image

import numpy as np
import torch
from torchvision import transforms

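def load_image_to_tensor(path: str) -> torch.Tensor:
    # The resize, crop, and ToTensor steps are assumed here: they are the
    # standard torchvision preprocessing for ImageNet pre-trained models.
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

    img = Image.open(path)
    tensor_3d = preprocess(img)
    # Add the batch dimension: (C, H, W) -> (1, C, H, W).
    return torch.unsqueeze(tensor_3d, 0)

def save(tensor: torch.Tensor, path: str) -> None:
    np.save(path, tensor.numpy())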

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument('image', type=str)
    args, _ = parser.parse_known_args()
    return args

if __name__ == '__main__':
    args = parse_args()
    image = args.image

    tensor = load_image_to_tensor(image)
    save(tensor, os.path.basename(image) + '.npy')

Run the script with the following command.

% python <YOUR_IMAGE>

Transfer the converted NumPy array files to your Greengrass Core device. You can then see the inference results in /greengrass/v2/logs/vgg16-inference-component.log.

% scp xxx.jpg.npy <GREENGRASS_HOST>:/greengrass/v2/work/vgg16-inference-component/images/
% tail -f /greengrass/v2/logs/vgg16-inference-component.log


2022-02-19T21:32:21.993Z [INFO] (Copier) vgg16-inference-component: stdout. Prediction result of /greengrass/v2/work/vgg16-inference-component/images/keyboard.jpg.npy: ['n03085013', 'computer_keyboard']. {scriptName=services.vgg16-inference-component.lifecycle.Run.Script, serviceName=vgg16-inference-component, currentState=RUNNING}
2022-02-19T21:32:22.257Z [INFO] (Copier) vgg16-inference-component: stdout. Prediction result of /greengrass/v2/work/vgg16-inference-component/images/pen.jpg.npy: ['n03388183', 'fountain_pen']. {scriptName=services.vgg16-inference-component.lifecycle.Run.Script, serviceName=vgg16-inference-component, currentState=RUNNING}

This post used two images: a keyboard (keyboard.jpg), for which the inference result is computer_keyboard, and a pen (pen.jpg), for which the inference result is fountain_pen.


AWS users can easily implement the edge inference feature provided by IoT Greengrass V2 and the SageMaker ecosystem.

I hope you will find this post useful.


  1. CACM. Shrinking Artificial Intelligence.




Takahiro Iwasa

Software Developer at KAKEHASHI Inc.
Involved in the requirements definition, design, and development of cloud-native applications using AWS. Now, building a new prescription data collection platform at KAKEHASHI Inc. Japan AWS Top Engineers 2020-2023.