Implementing Face Detection with a USB Camera Connected to a Raspberry Pi Using Kinesis Video Streams and Rekognition Video

Takahiro Iwasa (岩佐孝浩)

December 9, 2023

11 min read

Kinesis Video Streams Rekognition Video

I implemented face detection with a USB camera connected to a Raspberry Pi, using Kinesis Video Streams and Rekognition Video. This post walks through the implementation.

Searching faces in a collection in streaming video - Amazon Rekognition

Overview of Amazon Rekognition face search in streaming video.

Overview

Prerequisites

Video Producer

This post uses the following as the video producer.

Raspberry Pi 4B with 4GB RAM
- Ubuntu 23.10 (installed with Raspberry Pi Imager)
USB camera
GStreamer
- Amazon Kinesis Video Streams CPP Producer, GStreamer Plugin and JNI

AWS Resources

This post uses the following.

AWS SAM CLI
Python 3.11

Creating SAM Application

The code used in this post is available in my GitHub repository.

Directory Structure

/
|-- src/
|   |-- app.py
|   `-- requirements.txt
|-- samconfig.toml
`-- template.yaml

AWS SAM Template

AWSTemplateFormatVersion: 2010-09-09
Transform: AWS::Serverless-2016-10-31
Description: face-detector-using-kinesis-video-streams

Resources:
  Function:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: face-detector-function
      CodeUri: src/
      Handler: app.lambda_handler
      Runtime: python3.11
      Architectures:
        - arm64
      Timeout: 3
      MemorySize: 128
      Role: !GetAtt FunctionIAMRole.Arn
      Events:
        KinesisEvent:
          Type: Kinesis
          Properties:
            Stream: !GetAtt KinesisStream.Arn
            MaximumBatchingWindowInSeconds: 10
            MaximumRetryAttempts: 3
            StartingPosition: LATEST

  FunctionIAMRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: face-detector-function-role
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service: lambda.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
        - arn:aws:iam::aws:policy/service-role/AWSLambdaKinesisExecutionRole
      Policies:
        - PolicyName: policy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - kinesisvideo:GetHLSStreamingSessionURL
                  - kinesisvideo:GetDataEndpoint
                Resource: !GetAtt KinesisVideoStream.Arn

  KinesisVideoStream:
    Type: AWS::KinesisVideo::Stream
    Properties:
      Name: face-detector-kinesis-video-stream
      DataRetentionInHours: 24

  RekognitionCollection:
    Type: AWS::Rekognition::Collection
    Properties:
      CollectionId: FaceCollection

  RekognitionStreamProcessor:
    Type: AWS::Rekognition::StreamProcessor
    Properties:
      Name: face-detector-rekognition-stream-processor
      KinesisVideoStream:
        Arn: !GetAtt KinesisVideoStream.Arn
      KinesisDataStream:
        Arn: !GetAtt KinesisStream.Arn
      RoleArn: !GetAtt RekognitionStreamProcessorIAMRole.Arn
      FaceSearchSettings:
        CollectionId: !Ref RekognitionCollection
        FaceMatchThreshold: 80
      DataSharingPreference:
        OptIn: false

  KinesisStream:
    Type: AWS::Kinesis::Stream
    Properties:
      Name: face-detector-kinesis-stream
      StreamModeDetails:
        StreamMode: ON_DEMAND

  RekognitionStreamProcessorIAMRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: face-detector-rekognition-stream-processor-role
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service: rekognition.amazonaws.com
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AmazonRekognitionServiceRole
      Policies:
        - PolicyName: policy
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - kinesis:PutRecord
                  - kinesis:PutRecords
                Resource:
                  - !GetAtt KinesisStream.Arn

Python Script

requirements.txt

Leave this file empty.

app.py

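The Rekognition Video stream processor streams data about detected faces to the Kinesis Data Stream, and the Lambda function receives it as a Base64-encoded string (line 18 of app.py). For details on the data structure, see the official documentation.

The Lambda function generates an HLS URL using the KinesisVideoArchivedMedia#get_hls_streaming_session_url API (lines 52-64 of app.py).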

import base64
import json
import logging
from datetime import datetime, timedelta, timezone
from functools import cache

import boto3

JST = timezone(timedelta(hours=9))
kvs_client = boto3.client('kinesisvideo')
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


def lambda_handler(event: dict, context: dict) -> dict:
    for record in event['Records']:
        base64_data = record['kinesis']['data']
        stream_processor_event = json.loads(base64.b64decode(base64_data).decode())
        # For details on the result structure, see https://docs.aws.amazon.com/rekognition/latest/dg/streaming-video-kinesis-output.html

        if not stream_processor_event.get('FaceSearchResponse'):
            continue

        logger.info(stream_processor_event)
        url = get_hls_streaming_session_url(stream_processor_event)
        logger.info(url)

    return {
        'statusCode': 200,
    }


@cache
def get_kvs_am_client(api_name: str, stream_arn: str):
    # See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/kinesisvideo/client/get_data_endpoint.html
    endpoint = kvs_client.get_data_endpoint(
        APIName=api_name.upper(),
        StreamARN=stream_arn
    )['DataEndpoint']
    return boto3.client('kinesis-video-archived-media', endpoint_url=endpoint)


def get_hls_streaming_session_url(stream_processor_event: dict) -> str:
    # See https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/kinesis-video-archived-media/client/get_hls_streaming_session_url.html

    kinesis_video = stream_processor_event['InputInformation']['KinesisVideo']
    stream_arn = kinesis_video['StreamArn']
    kvs_am_client = get_kvs_am_client('get_hls_streaming_session_url', stream_arn)
    start_timestamp = datetime.fromtimestamp(kinesis_video['ServerTimestamp'], JST)
    end_timestamp = start_timestamp + timedelta(minutes=1)

    return kvs_am_client.get_hls_streaming_session_url(
        StreamARN=stream_arn,
        PlaybackMode='ON_DEMAND',
        HLSFragmentSelector={
            'FragmentSelectorType': 'SERVER_TIMESTAMP',
            'TimestampRange': {
                'StartTimestamp': start_timestamp,
                'EndTimestamp': end_timestamp,
            },
        },
        ContainerFormat='FRAGMENTED_MP4',
        Expires=300,
    )['HLSStreamingSessionURL']
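
To try the Base64 decoding step without deploying anything, a minimal sketch like the following may help. It builds a synthetic Kinesis event and decodes it the same way `lambda_handler` does. The helper names `decode_kinesis_record` and `make_record` and the sample payload are assumptions made for illustration; they are not part of the original app.py.

```python
import base64
import json


def decode_kinesis_record(record: dict) -> dict:
    """Decode the Base64 payload of one Kinesis record into a dict."""
    return json.loads(base64.b64decode(record['kinesis']['data']).decode())


def make_record(payload: dict) -> dict:
    """Build a synthetic record shaped like a Lambda Kinesis event entry (hypothetical helper)."""
    data = base64.b64encode(json.dumps(payload).encode()).decode()
    return {'kinesis': {'data': data}}


# Hypothetical sample payload; real stream processor events carry many more fields.
sample = {
    'StreamProcessorInformation': {'Status': 'RUNNING'},
    'FaceSearchResponse': [],
}
event = {'Records': [make_record(sample)]}

decoded = [decode_kinesis_record(record) for record in event['Records']]
print(decoded[0]['StreamProcessorInformation']['Status'])
```

Because the sample `FaceSearchResponse` is empty, the handler in app.py would skip this record; add a matched face to the payload to exercise the URL generation path.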

Build and Deploy

Build and deploy with the following commands.

sam build
sam deploy

Indexing Faces

Index the faces that you want the USB camera to detect. Before running the following command, replace the placeholders <YOUR_BUCKET>, <YOUR_OBJECT>, and <PERSON_ID> with actual values.

aws rekognition index-faces \
  --image '{"S3Object": {"Bucket": "<YOUR_BUCKET>", "Name": "<YOUR_OBJECT>"}}' \
  --collection-id FaceCollection \
  --external-image-id <PERSON_ID>

Rekognition does not store the actual images in the collection.

Adding faces to a collection - Amazon Rekognition

For each face detected, Amazon Rekognition extracts facial features and stores the feature information in a database. In addition, the command stores metadata for each face that’s detected in the specified face collection. Amazon Rekognition doesn’t store the actual image bytes.
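
If you prefer to index faces from Python rather than the AWS CLI, the same request can be sketched with boto3. This is an illustrative sketch, not code from the repository: `build_index_faces_params` and `index_face` are hypothetical helper names, and boto3 is imported lazily so that the parameter builder can be checked without AWS credentials.

```python
def build_index_faces_params(bucket: str, key: str, collection_id: str, person_id: str) -> dict:
    """Build the keyword arguments for the Rekognition IndexFaces call."""
    return {
        'CollectionId': collection_id,
        'Image': {'S3Object': {'Bucket': bucket, 'Name': key}},
        'ExternalImageId': person_id,
    }


def index_face(params: dict) -> list[str]:
    """Call IndexFaces and return the IDs of the faces that were indexed."""
    import boto3  # Imported lazily; requires AWS credentials at call time.

    response = boto3.client('rekognition').index_faces(**params)
    return [record['Face']['FaceId'] for record in response['FaceRecords']]


# Placeholder values, mirroring the CLI example above.
params = build_index_faces_params('<YOUR_BUCKET>', '<YOUR_OBJECT>', 'FaceCollection', '<PERSON_ID>')
print(params)
```

Calling `index_face(params)` sends the request; the value passed as `ExternalImageId` is what later appears in the stream processor's matched face results.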

Video Producer Setup

This post uses a Raspberry Pi 4B (4GB) with Ubuntu 23.10 installed as the video producer.

Build the Amazon Kinesis Video Streams CPP Producer, GStreamer Plugin and JNI provided by AWS.

AWS also provides a Docker image that includes the GStreamer plugin, but it did not work on my Raspberry Pi.

Building the GStreamer Plugin

Run the following commands to build the plugin. Depending on your machine's specifications, this may take 20 minutes or more.

sudo apt update
sudo apt upgrade
sudo apt install \
  make \
  cmake \
  build-essential \
  m4 \
  autoconf \
  default-jdk
sudo apt install \
  libssl-dev \
  libcurl4-openssl-dev \
  liblog4cplus-dev \
  libgstreamer1.0-dev \
  libgstreamer-plugins-base1.0-dev \
  gstreamer1.0-plugins-base-apps \
  gstreamer1.0-plugins-bad \
  gstreamer1.0-plugins-good \
  gstreamer1.0-plugins-ugly \
  gstreamer1.0-tools

git clone https://github.com/awslabs/amazon-kinesis-video-streams-producer-sdk-cpp.git
mkdir -p amazon-kinesis-video-streams-producer-sdk-cpp/build
cd amazon-kinesis-video-streams-producer-sdk-cpp/build

sudo cmake .. -DBUILD_GSTREAMER_PLUGIN=ON -DBUILD_JNI=TRUE
sudo make

Verify the build result with the following commands.

cd ~/amazon-kinesis-video-streams-producer-sdk-cpp
export GST_PLUGIN_PATH=`pwd`/build
export LD_LIBRARY_PATH=`pwd`/open-source/local/lib
gst-inspect-1.0 kvssink

You should see output like the following.

Factory Details:
  Rank                     primary + 10 (266)
  Long-name                KVS Sink
  Klass                    Sink/Video/Network
  Description              GStreamer AWS KVS plugin
  Author                   AWS KVS <[email protected]>
...
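
The same check can be scripted, for example in a startup script on the Raspberry Pi. The sketch below is an assumption-laden illustration: `gst_element_available` is a hypothetical helper, and it relies on `gst-inspect-1.0` exiting with a non-zero status when the element cannot be found. `GST_PLUGIN_PATH` and `LD_LIBRARY_PATH` must already be exported in the environment, as in the commands above.

```python
import shutil
import subprocess


def gst_element_available(element: str, inspect_cmd: str = 'gst-inspect-1.0') -> bool:
    """Return True if the inspect command exists and reports the element."""
    if shutil.which(inspect_cmd) is None:
        # The GStreamer tools are not installed or not on PATH.
        return False
    result = subprocess.run([inspect_cmd, element], capture_output=True, text=True)
    return result.returncode == 0


if __name__ == '__main__':
    print(gst_element_available('kvssink'))
```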

So that these variables are set the next time you log in, it is convenient to add the export <XXX_PATH> lines to ~/.profile.

echo "" >> ~/.profile
echo "# GStreamer" >> ~/.profile
echo "export GST_PLUGIN_PATH=$GST_PLUGIN_PATH" >> ~/.profile
echo "export LD_LIBRARY_PATH=$LD_LIBRARY_PATH" >> ~/.profile

Running GStreamer

Connect the USB camera to the device and run the following command.

Note that higher video quality may increase costs.

gst-launch-1.0 -v v4l2src device=/dev/video0 \
  ! videoconvert \
  ! video/x-raw,format=I420,width=320,height=240,framerate=5/1 \
  ! x264enc bframes=0 key-int-max=45 bitrate=500 tune=zerolatency \
  ! video/x-h264,stream-format=avc,alignment=au \
  ! kvssink stream-name=<KINESIS_VIDEO_STREAM_NAME> storage-size=128 access-key="<YOUR_ACCESS_KEY>" secret-key="<YOUR_SECRET_KEY>" aws-region="<YOUR_AWS_REGION>"

You can view the live video streamed from the camera in the Kinesis Video Streams management console.
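
To get a feel for how the `bitrate` setting of `x264enc` relates to data volume, and therefore to cost, the following sketch converts a target bitrate in kbit/s into a nominal daily volume. It assumes the encoder holds the target bitrate continuously and ignores container overhead; real output can be well below the target for static scenes, and the pricing simulation later in this post assumes about 1 GB per day. The function name `nominal_gb_per_day` is hypothetical.

```python
def nominal_gb_per_day(bitrate_kbps: float) -> float:
    """Nominal data volume per day in GB (10^9 bytes) for a constant bitrate."""
    bytes_per_second = bitrate_kbps * 1000 / 8
    return bytes_per_second * 86_400 / 1e9


# 500 kbit/s matches the bitrate in the gst-launch-1.0 command above.
for kbps in (250, 500, 1000):
    print(f'{kbps} kbit/s -> {nominal_gb_per_day(kbps):.2f} GB/day')
```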

Testing

Starting the Rekognition Video Stream Processor

Start the Rekognition Video stream processor. It subscribes to the Kinesis Video Stream, searches for faces using the face collection, and streams the results to the Kinesis Data Stream.

aws rekognition start-stream-processor --name face-detector-rekognition-stream-processor

Confirm that the stream processor status is "Status": "RUNNING".

aws rekognition describe-stream-processor --name face-detector-rekognition-stream-processor | grep "Status"
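
The stream processor may take a moment to leave the STARTING state, so instead of re-running the command by hand you could poll the status. The following is a minimal sketch, assuming a boto3 Rekognition client is passed in; `wait_until_running` and its timeout defaults are hypothetical choices, not something the original post prescribes.

```python
import time


def wait_until_running(client, name: str, timeout_s: float = 120.0, interval_s: float = 5.0) -> str:
    """Poll DescribeStreamProcessor until the status is RUNNING or FAILED, or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = client.describe_stream_processor(Name=name)['Status']
        if status in ('RUNNING', 'FAILED') or time.monotonic() >= deadline:
            return status
        time.sleep(interval_s)


# Usage (requires AWS credentials):
#   import boto3
#   wait_until_running(boto3.client('rekognition'), 'face-detector-rekognition-stream-processor')
```

The function returns the last observed status, so the caller can distinguish RUNNING from FAILED or from a status that was still pending when the timeout elapsed.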

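Capturing a Face

After the USB camera captures a face, the video data is processed in the following order.

1. The video data is streamed to the Kinesis Video Stream.
2. The streamed data is processed by the Rekognition Video stream processor.
3. The stream processor's results are streamed to the Kinesis Data Stream.
4. The Lambda function generates an HLS URL.

Check the Lambda function's log records with the following command.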

sam logs -n Function --stack-name face-detector-using-kinesis-video-streams --tail

The log records contain stream processor event data like the following.

{
    "InputInformation": {
        "KinesisVideo": {
            "StreamArn": "arn:aws:kinesisvideo:<AWS_REGION>:<AWS_ACCOUNT_ID>:stream/face-detector-kinesis-video-stream/xxxxxxxxxxxxx",
            "FragmentNumber": "91343852333181501717324262640137742175000164731",
            "ServerTimestamp": 1702208586.022,
            "ProducerTimestamp": 1702208585.699,
            "FrameOffsetInSeconds": 0.0
        }
    },
    "StreamProcessorInformation": {"Status": "RUNNING"},
    "FaceSearchResponse": [
        {
            "DetectedFace": {
                "BoundingBox": {
                    "Height": 0.4744676,
                    "Width": 0.29107505,
                    "Left": 0.33036956,
                    "Top": 0.19599175
                },
                "Confidence": 99.99677,
                "Landmarks": [
                    {"X": 0.41322955, "Y": 0.33761832, "Type": "eyeLeft"},
                    {"X": 0.54405355, "Y": 0.34024307, "Type": "eyeRight"},
                    {"X": 0.424819, "Y": 0.5417343, "Type": "mouthLeft"},
                    {"X": 0.5342691, "Y": 0.54362005, "Type": "mouthRight"},
                    {"X": 0.48934412, "Y": 0.43806323, "Type": "nose"}
                ],
                "Pose": {"Pitch": 5.547308, "Roll": 0.85795176, "Yaw": 4.76913},
                "Quality": {"Brightness": 57.938313, "Sharpness": 46.0298}
            },
            "MatchedFaces": [
                {
                    "Similarity": 99.986176,
                    "Face": {
                        "BoundingBox": {
                            "Height": 0.417963,
                            "Width": 0.406223,
                            "Left": 0.28826,
                            "Top": 0.242463
                        },
                        "FaceId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
                        "Confidence": 99.996605,
                        "ImageId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
                        "ExternalImageId": "iwasa"
                    }
                }
            ]
        }
    ]
}
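
In practice you will often only need to know who was matched and how closely. The sketch below pulls the `ExternalImageId` and `Similarity` of each matched face out of an event shaped like the one above. The function `matched_people` and its `min_similarity` default are assumptions for illustration and do not appear in app.py; the sample event is a trimmed-down version of the log record.

```python
def matched_people(stream_processor_event: dict, min_similarity: float = 80.0) -> list[tuple[str, float]]:
    """Return (ExternalImageId, Similarity) pairs for matches at or above the threshold."""
    matches = []
    for face in stream_processor_event.get('FaceSearchResponse') or []:
        for match in face.get('MatchedFaces') or []:
            external_id = match['Face'].get('ExternalImageId')
            if external_id is not None and match['Similarity'] >= min_similarity:
                matches.append((external_id, match['Similarity']))
    return matches


# Trimmed-down sample: one detected face with a match, one without.
event = {
    'FaceSearchResponse': [
        {'MatchedFaces': [{'Similarity': 99.986176, 'Face': {'ExternalImageId': 'iwasa'}}]},
        {'MatchedFaces': []},
    ]
}
print(matched_people(event))
```

A detected face with an empty `MatchedFaces` list means a face was seen but did not match anything in the collection, so it contributes nothing to the result.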

The log records also contain an HLS URL like the following.

https://x-xxxxxxxx.kinesisvideo.<AWS_REGION>.amazonaws.com/hls/v1/getHLSMasterPlaylist.m3u8?SessionToken=xxxxxxxxxx

To watch the on-demand video, open the URL in Safari or Edge.

At the time of writing, Chrome does not support HLS natively, but third-party extensions such as Native HLS Playback are available.
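
Keep in mind what the URL covers: app.py requests a one-minute window that starts at the event's `ServerTimestamp`, and the URL itself expires after 300 seconds. The window computation can be sketched in isolation as follows. The helper `playback_window` is hypothetical, and it uses UTC instead of the JST offset in app.py; both describe the same instants.

```python
from datetime import datetime, timedelta, timezone


def playback_window(server_timestamp: float, minutes: int = 1) -> tuple[datetime, datetime]:
    """Return the (start, end) datetimes of the on-demand playback window."""
    start = datetime.fromtimestamp(server_timestamp, timezone.utc)
    return start, start + timedelta(minutes=minutes)


# ServerTimestamp taken from the sample log record above.
start, end = playback_window(1702208586.022)
print(start.isoformat(), end.isoformat())
```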

Cleanup

Delete the provisioned AWS resources with the following commands.

aws rekognition stop-stream-processor --name face-detector-rekognition-stream-processor
sam delete

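Pricing

The following is an example cost simulation based on these conditions.

The AWS region is ap-northeast-1.
The USB camera continuously streams video data to the Kinesis Video Stream, at about 1 GB per day, retained for one day.
The Rekognition Video stream processor continuously analyzes the Kinesis Video Stream.
The face collection contains 10 faces.
The Kinesis Data Stream always has one shard provisioned.
The Lambda function (Arm/128MB) is invoked every second, with a billed duration of about 100 ms per invocation.
A user watches on-demand video through the HLS URL for one hour every day.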

Kinesis Video Streams

Pricing Table

Item	Price
Data Ingested into Kinesis Video Streams (per GB data ingested)	$0.01097
Data Consumed from Kinesis Video Streams (per GB data egressed)	$0.01097
Data Consumed from Kinesis Video Streams using HLS (per GB data egressed)	$0.01536
Data Stored in Kinesis Video Streams (per GB-Month data stored)	$0.025

Simulation

Item	Formula	Cost
Data Ingested	1 GB * 31 days * $0.01097	$0.34007
Data Consumed	1 GB * 31 days * $0.01097	$0.34007
Data Consumed using HLS	1 hour * 31 days * $0.01536	$0.47616
Data Stored	1 GB * 31 days * $0.025	$0.775
Total	-	$1.9
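
The figures in this table can be re-derived with a few lines of Python, which also makes it easy to try a different data volume. The sketch follows the table's formulas as written, including the HLS row, which counts one billable unit per day of viewing. The variable names are illustrative.

```python
DAYS = 31
GB_PER_DAY = 1  # Assumed ingest volume from the simulation conditions.

items = {
    'Data Ingested': GB_PER_DAY * DAYS * 0.01097,
    'Data Consumed': GB_PER_DAY * DAYS * 0.01097,
    # Follows the table: one unit per day of HLS viewing.
    'Data Consumed using HLS': 1 * DAYS * 0.01536,
    'Data Stored': GB_PER_DAY * DAYS * 0.025,
}
total = sum(items.values())

for name, cost in items.items():
    print(f'{name}: ${cost:.5f}')
print(f'Total: ${total:.2f}')
```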

Rekognition Video

Pricing Table

Item	Price
Face Vector Storage	$0.000013/face metadata per month
Face Search	$0.15/min

Simulation

Item	Formula	Cost
Face Vector Storage	10 faces * $0.000013	$0.00013
Face Search	60 minutes * 24 hours * 31 days * $0.15	$6,696
Total	-	$6,696

Given this cost, I strongly recommend using motion detection when you want to detect faces in streaming video.
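
To illustrate why this matters, the sketch below compares an always-on stream processor with one that runs only while motion is detected. It assumes that Face Search is billed for the minutes of video the processor analyzes, as the $0.15/min rate suggests, and that some mechanism not covered in this post starts and stops the processor. The 30 minutes per day figure is a hypothetical example.

```python
FACE_SEARCH_PER_MIN = 0.15  # USD per minute, from the pricing table above.
DAYS = 31


def monthly_face_search_cost(active_minutes_per_day: float) -> float:
    """Monthly Face Search cost for a given number of analyzed minutes per day."""
    return active_minutes_per_day * DAYS * FACE_SEARCH_PER_MIN


always_on = monthly_face_search_cost(24 * 60)
motion_only = monthly_face_search_cost(30)  # Hypothetical: 30 minutes of motion per day.

print(f'Always on:   ${always_on:,.2f}')
print(f'Motion only: ${motion_only:,.2f}')
```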

Kinesis Data Streams

Pricing Table

Item	Price
Shard Hour (1MB/second ingress, 2MB/second egress)	$0.0195
PUT Payload Units, per 1,000,000 units	$0.0215

Simulation

Item	Formula	Cost
Shard Hour	1 shard * 24 hours * 31 days * $0.0195	$14.508
PUT Payload Units	1 unit * 60 seconds * 60 minutes * 24 hours * 31 days * $0.0215 / 1000000	$0.0575856
Total	-	$14.6

Lambda

Pricing Table

Item	Price
Price per 1ms (128MB)	$0.0000000017

Simulation

Item	Formula	Cost
Price per 1ms	100ms * 60 seconds * 60 minutes * 24 hours * 31 days * $0.0000000017	$0.455328
Total	-	$0.5
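
Putting the four simulations side by side shows where the money goes. The following sketch recomputes each subtotal from the formulas in the tables above and prints each service's share of the monthly total. It is only a re-derivation under the stated conditions, not an official estimate.

```python
SECONDS_PER_MONTH = 60 * 60 * 24 * 31

subtotals = {
    'Kinesis Video Streams': 0.34007 + 0.34007 + 0.47616 + 0.775,
    'Rekognition Video': 10 * 0.000013 + 60 * 24 * 31 * 0.15,
    'Kinesis Data Streams': 24 * 31 * 0.0195 + SECONDS_PER_MONTH * 0.0215 / 1_000_000,
    'Lambda': 100 * SECONDS_PER_MONTH * 0.0000000017,
}
grand_total = sum(subtotals.values())

for name, cost in subtotals.items():
    print(f'{name}: ${cost:,.2f} ({cost / grand_total:.1%})')
print(f'Total: ${grand_total:,.2f}')
```

Under these conditions Rekognition Video Face Search accounts for more than 99% of the total, which is why limiting how long the stream processor runs has by far the largest effect.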