[Machine Learning] Object Recognition, Object Detection, and Image Analysis with the Jetson Nano 2GB Tutorial [Hello AI World]

Now let's finally try it out!

In the previous post, we set up jetson-inference via its Docker container.

[Machine Learning] First Steps with the Jetson Nano Developer Kit: Setting Up Docker [Hello AI World] (7)

It comes with three kinds of pre-trained deep learning capability: image recognition (imageNet), object detection (detectNet), and semantic segmentation (segNet).

For now, let's just try these features without worrying too much about the details!

First, from your home directory, start the Docker container we set up last time:

$ cd jetson-inference
$ docker/run.sh

Then, inside the Docker container:

# cd build/aarch64/bin

Go to that directory; the demo commands live there. Note that this post assumes a USB camera is connected.

Let's run the demos.

# ./imagenet /dev/video0

Point the camera at various objects and it recognizes them, like this.

# ./detectnet /dev/video0

This time, it detects and recognizes what is located where in the frame.

# ./segnet --network=fcn-resnet18-mhp  /dev/video0

This one detects the objects and people in the room and shows them as segmented regions,
but the results can still be a little rough.

So what about the details of each one?

They get a bit involved, so let's look at the help output for each command.

jetson-inference

Task               C++        Python
Image Recognition  imageNet   imageNet
Object Detection   detectNet  detectNet
Segmentation       segNet     segNet

ImageNet

# ./imagenet --help

Typing this brings up the usage information:

usage: imagenet [--help] [--network=NETWORK] input_URI [output_URI]

Classify a video/image stream using an image recognition DNN.
See below for additional arguments that may not be shown above.

positional arguments:
input_URI resource URI of input stream (see videoSource below)
output_URI resource URI of output stream (see videoOutput below)

imageNet arguments:
--network=NETWORK pre-trained model to load, one of the following:
* alexnet
* googlenet (default)
* googlenet-12
* resnet-18
* resnet-50
* resnet-101
* resnet-152
* vgg-16
* vgg-19
* inception-v4
--model=MODEL path to custom model to load (caffemodel, uff, or onnx)
--prototxt=PROTOTXT path to custom prototxt to load (for .caffemodel only)
--labels=LABELS path to text file containing the labels for each class
--input-blob=INPUT name of the input layer (default is 'data')
--output-blob=OUTPUT name of the output layer (default is 'prob')
--batch-size=BATCH maximum batch size (default is 1)
--profile enable layer profiling in TensorRT

videoSource arguments:
input_URI resource URI of the input stream, for example:
* /dev/video0 (V4L2 camera #0)
* csi://0 (MIPI CSI camera #0)
* rtp://@:1234 (RTP stream)
* rtsp://user:pass@ip:1234 (RTSP stream)
* file://my_image.jpg (image file)
* file://my_video.mp4 (video file)
* file://my_directory/ (directory of images)
--input-width=WIDTH explicitly request a width of the stream (optional)
--input-height=HEIGHT explicitly request a height of the stream (optional)
--input-rate=RATE explicitly request a framerate of the stream (optional)
--input-codec=CODEC RTP requires the codec to be set, one of these:
* h264, h265
* vp8, vp9
* mpeg2, mpeg4
* mjpeg
--input-flip=FLIP flip method to apply to input (excludes V4L2):
* none (default)
* counterclockwise
* rotate-180
* clockwise
* horizontal
* vertical
* upper-right-diagonal
* upper-left-diagonal
--input-loop=LOOP for file-based inputs, the number of loops to run:
* -1 = loop forever
* 0 = don't loop (default)
* >0 = set number of loops

videoOutput arguments:
output_URI resource URI of the output stream, for example:
* file://my_image.jpg (image file)
* file://my_video.mp4 (video file)
* file://my_directory/ (directory of images)
* rtp://<remote-ip>:1234 (RTP stream)
* display://0 (OpenGL window)
--output-codec=CODEC desired codec for compressed output streams:
* h264 (default), h265
* vp8, vp9
* mpeg2, mpeg4
* mjpeg
--bitrate=BITRATE desired target VBR bitrate for compressed streams,
in bits per second. The default is 4000000 (4 Mbps)
--headless don't create a default OpenGL GUI window

logging arguments:
--log-file=FILE output destination file (default is stdout)
--log-level=LEVEL message output threshold, one of the following:
* silent
* error
* warning
* success
* info
* verbose (default)
* debug
--verbose enable verbose logging (same as --log-level=verbose)
--debug enable debug logging (same as --log-level=debug)
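
For example, combining the videoSource and videoOutput arguments above, you can classify a still image and save the annotated result instead of viewing the live camera. The file names below are only placeholders (substitute any image on your Jetson), and --headless is handy when working over SSH without a display:

# ./imagenet --headless file://my_image.jpg file://my_image_output.jpg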

Roughly speaking, NETWORK refers to the pre-trained model, and you can choose from several of them. Each model seems to have its own strengths and weaknesses. If a model isn't installed yet, you can download additional ones with the following commands.

$ cd jetson-inference/tools
$ ./download-models.sh
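
Once the extra models are downloaded, you can pick one with --network. As a quick sketch, this runs the camera demo with resnet-18 instead of the default googlenet (your camera device may differ):

# ./imagenet --network=resnet-18 /dev/video0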

The default model has apparently been trained on these 1,000 classes:

jetson-inference/data/networks/ilsvrc12_synset_words.txt at master
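
If you are curious which 1,000 classes those are, you can peek at the label file from the host. The path below assumes jetson-inference was cloned into your home directory:

$ head ~/jetson-inference/data/networks/ilsvrc12_synset_words.txt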

DetectNet

For this one too, check the help to see how to use it:

# ./detectnet --help

usage: detectnet [--help] [--network=NETWORK] [--threshold=THRESHOLD] input_URI [output_URI]

Locate objects in a video/image stream using an object detection DNN.
See below for additional arguments that may not be shown above.

positional arguments:
input_URI resource URI of input stream (see videoSource below)
output_URI resource URI of output stream (see videoOutput below)

detectNet arguments:
--network=NETWORK pre-trained model to load, one of the following:
* ssd-mobilenet-v1
* ssd-mobilenet-v2 (default)
* ssd-inception-v2
* pednet
* multiped
* facenet
* coco-airplane
* coco-bottle
* coco-chair
* coco-dog
--model=MODEL path to custom model to load (caffemodel, uff, or onnx)
--prototxt=PROTOTXT path to custom prototxt to load (for .caffemodel only)
--labels=LABELS path to text file containing the labels for each class
--input-blob=INPUT name of the input layer (default is 'data')
--output-cvg=COVERAGE name of the coverage output layer (default is 'coverage')
--output-bbox=BOXES name of the bounding output layer (default is 'bboxes')
--mean-pixel=PIXEL mean pixel value to subtract from input (default is 0.0)
--batch-size=BATCH maximum batch size (default is 1)
--threshold=THRESHOLD minimum threshold for detection (default is 0.5)
--alpha=ALPHA overlay alpha blending value, range 0-255 (default: 120)
--overlay=OVERLAY detection overlay flags (e.g. --overlay=box,labels,conf)
valid combinations are: 'box', 'labels', 'conf', 'none'
--profile enable layer profiling in TensorRT

videoSource arguments:
input_URI resource URI of the input stream, for example:
* /dev/video0 (V4L2 camera #0)
* csi://0 (MIPI CSI camera #0)
* rtp://@:1234 (RTP stream)
* rtsp://user:pass@ip:1234 (RTSP stream)
* file://my_image.jpg (image file)
* file://my_video.mp4 (video file)
* file://my_directory/ (directory of images)
--input-width=WIDTH explicitly request a width of the stream (optional)
--input-height=HEIGHT explicitly request a height of the stream (optional)
--input-rate=RATE explicitly request a framerate of the stream (optional)
--input-codec=CODEC RTP requires the codec to be set, one of these:
* h264, h265
* vp8, vp9
* mpeg2, mpeg4
* mjpeg
--input-flip=FLIP flip method to apply to input (excludes V4L2):
* none (default)
* counterclockwise
* rotate-180
* clockwise
* horizontal
* vertical
* upper-right-diagonal
* upper-left-diagonal
--input-loop=LOOP for file-based inputs, the number of loops to run:
* -1 = loop forever
* 0 = don't loop (default)
* >0 = set number of loops

videoOutput arguments:
output_URI resource URI of the output stream, for example:
* file://my_image.jpg (image file)
* file://my_video.mp4 (video file)
* file://my_directory/ (directory of images)
* rtp://<remote-ip>:1234 (RTP stream)
* display://0 (OpenGL window)
--output-codec=CODEC desired codec for compressed output streams:
* h264 (default), h265
* vp8, vp9
* mpeg2, mpeg4
* mjpeg
--bitrate=BITRATE desired target VBR bitrate for compressed streams,
in bits per second. The default is 4000000 (4 Mbps)
--headless don't create a default OpenGL GUI window

logging arguments:
--log-file=FILE output destination file (default is stdout)
--log-level=LEVEL message output threshold, one of the following:
* silent
* error
* warning
* success
* info
* verbose (default)
* debug
--verbose enable verbose logging (same as --log-level=verbose)
--debug enable debug logging (same as --log-level=debug)
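
As a small example of the detectNet-specific flags above, the following lowers the detection threshold and draws boxes, labels, and confidence values on the overlay. The values here are only a starting point, not recommended settings:

# ./detectnet --network=ssd-mobilenet-v2 --threshold=0.3 --overlay=box,labels,conf /dev/video0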

Now, each of these models is specialized for a particular kind of object (see the example after the table).

Model                    Object classes
SSD-Mobilenet-v1         91 (COCO classes)
SSD-Mobilenet-v2         91 (COCO classes)
SSD-Inception-v2         91 (COCO classes)
DetectNet-COCO-Dog       dogs
DetectNet-COCO-Bottle    bottles
DetectNet-COCO-Chair     chairs
DetectNet-COCO-Airplane  airplanes
ped-100                  pedestrians
multiped-500             pedestrians, luggage
facenet-120              faces

If a model has not been downloaded yet, you can download it using the method introduced above.
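
For instance, if you only care about faces, you could switch to the face detection model (assuming it has already been fetched with download-models.sh):

# ./detectnet --network=facenet /dev/video0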

SegNet

# ./segnet --help

usage: segnet [--help] [--network NETWORK] input_URI [output_URI]

Segment and classify a video/image stream using a semantic segmentation DNN.
See below for additional arguments that may not be shown above.

positional arguments:
input_URI resource URI of input stream (see videoSource below)
output_URI resource URI of output stream (see videoOutput below)

segNet arguments:
--network=NETWORK pre-trained model to load, one of the following:
* fcn-resnet18-cityscapes-512x256
* fcn-resnet18-cityscapes-1024x512
* fcn-resnet18-cityscapes-2048x1024
* fcn-resnet18-deepscene-576x320
* fcn-resnet18-deepscene-864x480
* fcn-resnet18-mhp-512x320
* fcn-resnet18-mhp-640x360
* fcn-resnet18-voc-320x320 (default)
* fcn-resnet18-voc-512x320
* fcn-resnet18-sun-512x400
* fcn-resnet18-sun-640x512
--model=MODEL path to custom model to load (caffemodel, uff, or onnx)
--prototxt=PROTOTXT path to custom prototxt to load (for .caffemodel only)
--labels=LABELS path to text file containing the labels for each class
--colors=COLORS path to text file containing the colors for each class
--input-blob=INPUT name of the input layer (default: 'data')
--output-blob=OUTPUT name of the output layer (default: 'score_fr_21classes')
--batch-size=BATCH maximum batch size (default is 1)
--alpha=ALPHA overlay alpha blending value, range 0-255 (default: 150)
--visualize=VISUAL visualization flags (e.g. --visualize=overlay,mask)
valid combinations are: 'overlay', 'mask'
--profile enable layer profiling in TensorRT

videoSource arguments:
input_URI resource URI of the input stream, for example:
* /dev/video0 (V4L2 camera #0)
* csi://0 (MIPI CSI camera #0)
* rtp://@:1234 (RTP stream)
* rtsp://user:pass@ip:1234 (RTSP stream)
* file://my_image.jpg (image file)
* file://my_video.mp4 (video file)
* file://my_directory/ (directory of images)
--input-width=WIDTH explicitly request a width of the stream (optional)
--input-height=HEIGHT explicitly request a height of the stream (optional)
--input-rate=RATE explicitly request a framerate of the stream (optional)
--input-codec=CODEC RTP requires the codec to be set, one of these:
* h264, h265
* vp8, vp9
* mpeg2, mpeg4
* mjpeg
--input-flip=FLIP flip method to apply to input (excludes V4L2):
* none (default)
* counterclockwise
* rotate-180
* clockwise
* horizontal
* vertical
* upper-right-diagonal
* upper-left-diagonal
--input-loop=LOOP for file-based inputs, the number of loops to run:
* -1 = loop forever
* 0 = don't loop (default)
* >0 = set number of loops

videoOutput arguments:
output_URI resource URI of the output stream, for example:
* file://my_image.jpg (image file)
* file://my_video.mp4 (video file)
* file://my_directory/ (directory of images)
* rtp://<remote-ip>:1234 (RTP stream)
* display://0 (OpenGL window)
--output-codec=CODEC desired codec for compressed output streams:
* h264 (default), h265
* vp8, vp9
* mpeg2, mpeg4
* mjpeg
--bitrate=BITRATE desired target VBR bitrate for compressed streams,
in bits per second. The default is 4000000 (4 Mbps)
--headless don't create a default OpenGL GUI window

logging arguments:
--log-file=FILE output destination file (default is stdout)
--log-level=LEVEL message output threshold, one of the following:
* silent
* error
* warning
* success
* info
* verbose (default)
* debug
--verbose enable verbose logging (same as --log-level=verbose)
--debug enable debug logging (same as --log-level=debug)

For segmentation, it looks like you should pick a different model depending on the scene you are working with (see the example after the table below).

Dataset      Resolution  CLI Argument                       Accuracy  Jetson Nano  Jetson Xavier
Cityscapes   512x256     fcn-resnet18-cityscapes-512x256    83.3%     48 FPS       480 FPS
Cityscapes   1024x512    fcn-resnet18-cityscapes-1024x512   87.3%     12 FPS       175 FPS
Cityscapes   2048x1024   fcn-resnet18-cityscapes-2048x1024  89.6%     3 FPS        47 FPS
DeepScene    576x320     fcn-resnet18-deepscene-576x320     96.4%     26 FPS       360 FPS
DeepScene    864x480     fcn-resnet18-deepscene-864x480     96.9%     14 FPS       190 FPS
Multi-Human  512x320     fcn-resnet18-mhp-512x320           86.5%     34 FPS       370 FPS
Multi-Human  640x360     fcn-resnet18-mhp-640x360           87.1%     23 FPS       325 FPS
Pascal VOC   320x320     fcn-resnet18-voc-320x320           85.9%     45 FPS       508 FPS
Pascal VOC   512x320     fcn-resnet18-voc-512x320           88.5%     34 FPS       375 FPS
SUN RGB-D    512x400     fcn-resnet18-sun-512x400           64.3%     28 FPS       340 FPS
SUN RGB-D    640x512     fcn-resnet18-sun-640x512           65.1%     17 FPS       224 FPS
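
As a sketch of picking a model per scene: for people indoors, the Multi-Human (mhp) model used in the demo earlier is a reasonable start, and you can also show the raw class mask next to the overlay and soften the blending. All flags come from the help output above; the exact values are just a starting point:

# ./segnet --network=fcn-resnet18-mhp-512x320 --visualize=overlay,mask --alpha=120 /dev/video0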

Well, this has gotten long again, so let's stop here for now.