Now let's actually try it out!
Last time, we set up the jetson-inference Docker container.
It ships with pre-trained models for the following three tasks:
- Classifying Images with ImageNet (image classification: tells you what is in the image)
- Locating Objects with DetectNet (object detection: tells you where objects are in the image and what they are)
- Semantic Segmentation with SegNet (semantic segmentation: classifies each region of the image and splits it into segments)
To start with, let's just use these features without thinking too hard about the details!
From your home directory, start the Docker container we installed last time:
$ cd jetson-inference
$ docker/run.sh
Then, inside the container, move to the directory that holds the demo binaries:
# cd build/aarch64/bin
The commands we will use live there. Note that everything below assumes a USB camera is connected.
Let's run the demos.
# ./imagenet /dev/video0
Point the camera at various objects and it recognizes them, something like this.
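If you don't have a camera handy, the same binary also accepts files for input and output (see the videoSource/videoOutput arguments in the help further down). A minimal sketch, with hypothetical file names:
# ./imagenet file://my_image.jpg file://my_image_classified.jpg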
# ./detectnet /dev/video0
This time it detects where things are in the frame and what they are.
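As with imagenet, you can also write the annotated stream to a file by giving an output URI; the output file name here is just an example:
# ./detectnet /dev/video0 file://my_detections.mp4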
# ./segnet --network=fcn-resnet18-mhp /dev/video0
This one detects the objects and people in the room and shows the image split into segments...
though the results may still look a bit rough.
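The default overlay can be hard to read, so it may help to also show the raw class mask; the --visualize flag used below is documented in the segnet help later in this article:
# ./segnet --network=fcn-resnet18-mhp --visualize=overlay,mask /dev/video0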
So, what does each of these commands offer? The details get fairly involved, so I'd encourage you to look at each one yourself, but here is the overview:
jetson-inference
 | C++ | Python |
---|---|---|
Image Recognition | imageNet | imageNet |
Object Detection | detectNet | detectNet |
Segmentation | segNet | segNet |
ImageNet
# ./imagenet --help
Typing this prints the usage information:
usage: imagenet [--help] [--network=NETWORK] input_URI [output_URI]
Classify a video/image stream using an image recognition DNN.
See below for additional arguments that may not be shown above.
positional arguments:
input_URI resource URI of input stream (see videoSource below)
output_URI resource URI of output stream (see videoOutput below)
imageNet arguments:
--network=NETWORK pre-trained model to load, one of the following:
* alexnet
* googlenet (default)
* googlenet-12
* resnet-18
* resnet-50
* resnet-101
* resnet-152
* vgg-16
* vgg-19
* inception-v4
--model=MODEL path to custom model to load (caffemodel, uff, or onnx)
--prototxt=PROTOTXT path to custom prototxt to load (for .caffemodel only)
--labels=LABELS path to text file containing the labels for each class
--input-blob=INPUT name of the input layer (default is 'data')
--output-blob=OUTPUT name of the output layer (default is 'prob')
--batch-size=BATCH maximum batch size (default is 1)
--profile enable layer profiling in TensorRT
videoSource arguments:
input_URI resource URI of the input stream, for example:
* /dev/video0 (V4L2 camera #0)
* csi://0 (MIPI CSI camera #0)
* rtp://@:1234 (RTP stream)
* rtsp://user:pass@ip:1234 (RTSP stream)
* file://my_image.jpg (image file)
* file://my_video.mp4 (video file)
* file://my_directory/ (directory of images)
--input-width=WIDTH explicitly request a width of the stream (optional)
--input-height=HEIGHT explicitly request a height of the stream (optional)
--input-rate=RATE explicitly request a framerate of the stream (optional)
--input-codec=CODEC RTP requires the codec to be set, one of these:
* h264, h265
* vp8, vp9
* mpeg2, mpeg4
* mjpeg
--input-flip=FLIP flip method to apply to input (excludes V4L2):
* none (default)
* counterclockwise
* rotate-180
* clockwise
* horizontal
* vertical
* upper-right-diagonal
* upper-left-diagonal
--input-loop=LOOP for file-based inputs, the number of loops to run:
* -1 = loop forever
* 0 = don't loop (default)
* >0 = set number of loops
videoOutput arguments:
output_URI resource URI of the output stream, for example:
* file://my_image.jpg (image file)
* file://my_video.mp4 (video file)
* file://my_directory/ (directory of images)
* rtp://<remote-ip>:1234 (RTP stream)
* display://0 (OpenGL window)
--output-codec=CODEC desired codec for compressed output streams:
* h264 (default), h265
* vp8, vp9
* mpeg2, mpeg4
* mjpeg
--bitrate=BITRATE desired target VBR bitrate for compressed streams,
in bits per second. The default is 4000000 (4 Mbps)
--headless don't create a default OpenGL GUI window
logging arguments:
--log-file=FILE output destination file (default is stdout)
--log-level=LEVEL message output threshold, one of the following:
* silent
* error
* warning
* success
* info
* verbose (default)
* debug
--verbose enable verbose logging (same as --log-level=verbose)
--debug enable debug logging (same as --log-level=debug)
Roughly speaking, NETWORK is the pre-trained model, and you can choose among several of them; each model apparently has its own strengths and weaknesses. If a model hasn't been installed yet, you can download it with the following commands:
$ cd jetson-inference/tools
$ ./download-models.sh
The classes it currently recognizes are apparently this set of 1,000 ImageNet categories.
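For example, to try one of the other networks listed in the help above (assuming it has already been downloaded), you can pass it with --network; the choice of resnet-18 here is just an illustration:
# ./imagenet --network=resnet-18 /dev/video0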
DetectNet
Again, the usage shows up in the help:
# ./detectnet --help
usage: detectnet [--help] [--network=NETWORK] [--threshold=THRESHOLD] input_URI [output_URI]
Locate objects in a video/image stream using an object detection DNN.
See below for additional arguments that may not be shown above.
positional arguments:
input_URI resource URI of input stream (see videoSource below)
output_URI resource URI of output stream (see videoOutput below)
detectNet arguments:
--network=NETWORK pre-trained model to load, one of the following:
* ssd-mobilenet-v1
* ssd-mobilenet-v2 (default)
* ssd-inception-v2
* pednet
* multiped
* facenet
* coco-airplane
* coco-bottle
* coco-chair
* coco-dog
--model=MODEL path to custom model to load (caffemodel, uff, or onnx)
--prototxt=PROTOTXT path to custom prototxt to load (for .caffemodel only)
--labels=LABELS path to text file containing the labels for each class
--input-blob=INPUT name of the input layer (default is 'data')
--output-cvg=COVERAGE name of the coverge output layer (default is 'coverage')
--output-bbox=BOXES name of the bounding output layer (default is 'bboxes')
--mean-pixel=PIXEL mean pixel value to subtract from input (default is 0.0)
--batch-size=BATCH maximum batch size (default is 1)
--threshold=THRESHOLD minimum threshold for detection (default is 0.5)
--alpha=ALPHA overlay alpha blending value, range 0-255 (default: 120)
--overlay=OVERLAY detection overlay flags (e.g. --overlay=box,labels,conf)
valid combinations are: 'box', 'labels', 'conf', 'none'
--profile enable layer profiling in TensorRT
videoSource arguments:
input_URI resource URI of the input stream, for example:
* /dev/video0 (V4L2 camera #0)
* csi://0 (MIPI CSI camera #0)
* rtp://@:1234 (RTP stream)
* rtsp://user:pass@ip:1234 (RTSP stream)
* file://my_image.jpg (image file)
* file://my_video.mp4 (video file)
* file://my_directory/ (directory of images)
--input-width=WIDTH explicitly request a width of the stream (optional)
--input-height=HEIGHT explicitly request a height of the stream (optional)
--input-rate=RATE explicitly request a framerate of the stream (optional)
--input-codec=CODEC RTP requires the codec to be set, one of these:
* h264, h265
* vp8, vp9
* mpeg2, mpeg4
* mjpeg
--input-flip=FLIP flip method to apply to input (excludes V4L2):
* none (default)
* counterclockwise
* rotate-180
* clockwise
* horizontal
* vertical
* upper-right-diagonal
* upper-left-diagonal
--input-loop=LOOP for file-based inputs, the number of loops to run:
* -1 = loop forever
* 0 = don't loop (default)
* >0 = set number of loops
videoOutput arguments:
output_URI resource URI of the output stream, for example:
* file://my_image.jpg (image file)
* file://my_video.mp4 (video file)
* file://my_directory/ (directory of images)
* rtp://<remote-ip>:1234 (RTP stream)
* display://0 (OpenGL window)
--output-codec=CODEC desired codec for compressed output streams:
* h264 (default), h265
* vp8, vp9
* mpeg2, mpeg4
* mjpeg
--bitrate=BITRATE desired target VBR bitrate for compressed streams,
in bits per second. The default is 4000000 (4 Mbps)
--headless don't create a default OpenGL GUI window
logging arguments:
--log-file=FILE output destination file (default is stdout)
--log-level=LEVEL message output threshold, one of the following:
* silent
* error
* warning
* success
* info
* verbose (default)
* debug
--verbose enable verbose logging (same as --log-level=verbose)
--debug enable debug logging (same as --log-level=debug)
Now, each of these models has its own focus:
Model | Object classes |
---|---|
SSD-Mobilenet-v1 | 91 (COCO classes) |
SSD-Mobilenet-v2 | 91 (COCO classes) |
SSD-Inception-v2 | 91 (COCO classes) |
DetectNet-COCO-Dog | dogs |
DetectNet-COCO-Bottle | bottles |
DetectNet-COCO-Chair | chairs |
DetectNet-COCO-Airplane | airplanes |
ped-100 | pedestrians |
multiped-500 | pedestrians, luggage |
facenet-120 | faces |
If a model hasn't been downloaded yet, you can fetch it with the download-models.sh script shown above.
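For example, to run only face detection with a somewhat stricter confidence threshold (the 0.6 value is just an illustration), something like this should work:
# ./detectnet --network=facenet --threshold=0.6 /dev/video0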
SegNet
# ./segnet --help
usage: segnet [--help] [--network NETWORK] input_URI [output_URI]
Segment and classify a video/image stream using a semantic segmentation DNN.
See below for additional arguments that may not be shown above.
positional arguments:
input_URI resource URI of input stream (see videoSource below)
output_URI resource URI of output stream (see videoOutput below)
segNet arguments:
--network=NETWORK pre-trained model to load, one of the following:
* fcn-resnet18-cityscapes-512x256
* fcn-resnet18-cityscapes-1024x512
* fcn-resnet18-cityscapes-2048x1024
* fcn-resnet18-deepscene-576x320
* fcn-resnet18-deepscene-864x480
* fcn-resnet18-mhp-512x320
* fcn-resnet18-mhp-640x360
* fcn-resnet18-voc-320x320 (default)
* fcn-resnet18-voc-512x320
* fcn-resnet18-sun-512x400
* fcn-resnet18-sun-640x512
--model=MODEL path to custom model to load (caffemodel, uff, or onnx)
--prototxt=PROTOTXT path to custom prototxt to load (for .caffemodel only)
--labels=LABELS path to text file containing the labels for each class
--colors=COLORS path to text file containing the colors for each class
--input-blob=INPUT name of the input layer (default: 'data')
--output-blob=OUTPUT name of the output layer (default: 'score_fr_21classes')
--batch-size=BATCH maximum batch size (default is 1)
--alpha=ALPHA overlay alpha blending value, range 0-255 (default: 150)
--visualize=VISUAL visualization flags (e.g. --visualize=overlay,mask)
valid combinations are: 'overlay', 'mask'
--profile enable layer profiling in TensorRT
videoSource arguments:
input_URI resource URI of the input stream, for example:
* /dev/video0 (V4L2 camera #0)
* csi://0 (MIPI CSI camera #0)
* rtp://@:1234 (RTP stream)
* rtsp://user:pass@ip:1234 (RTSP stream)
* file://my_image.jpg (image file)
* file://my_video.mp4 (video file)
* file://my_directory/ (directory of images)
--input-width=WIDTH explicitly request a width of the stream (optional)
--input-height=HEIGHT explicitly request a height of the stream (optional)
--input-rate=RATE explicitly request a framerate of the stream (optional)
--input-codec=CODEC RTP requires the codec to be set, one of these:
* h264, h265
* vp8, vp9
* mpeg2, mpeg4
* mjpeg
--input-flip=FLIP flip method to apply to input (excludes V4L2):
* none (default)
* counterclockwise
* rotate-180
* clockwise
* horizontal
* vertical
* upper-right-diagonal
* upper-left-diagonal
--input-loop=LOOP for file-based inputs, the number of loops to run:
* -1 = loop forever
* 0 = don't loop (default)
* >0 = set number of loops
videoOutput arguments:
output_URI resource URI of the output stream, for example:
* file://my_image.jpg (image file)
* file://my_video.mp4 (video file)
* file://my_directory/ (directory of images)
* rtp://<remote-ip>:1234 (RTP stream)
* display://0 (OpenGL window)
--output-codec=CODEC desired codec for compressed output streams:
* h264 (default), h265
* vp8, vp9
* mpeg2, mpeg4
* mjpeg
--bitrate=BITRATE desired target VBR bitrate for compressed streams,
in bits per second. The default is 4000000 (4 Mbps)
--headless don't create a default OpenGL GUI window
logging arguments:
--log-file=FILE output destination file (default is stdout)
--log-level=LEVEL message output threshold, one of the following:
* silent
* error
* warning
* success
* info
* verbose (default)
* debug
--verbose enable verbose logging (same as --log-level=verbose)
--debug enable debug logging (same as --log-level=debug)
This time it looks like you should pick the model to match the scene you are working with; the table below compares them, and an example follows the table.
Dataset | Resolution | CLI Argument | Accuracy | Jetson Nano | Jetson Xavier |
---|---|---|---|---|---|
Cityscapes | 512×256 | fcn-resnet18-cityscapes-512x256 | 83.3% | 48 FPS | 480 FPS |
Cityscapes | 1024×512 | fcn-resnet18-cityscapes-1024x512 | 87.3% | 12 FPS | 175 FPS |
Cityscapes | 2048×1024 | fcn-resnet18-cityscapes-2048x1024 | 89.6% | 3 FPS | 47 FPS |
DeepScene | 576×320 | fcn-resnet18-deepscene-576x320 | 96.4% | 26 FPS | 360 FPS |
DeepScene | 864×480 | fcn-resnet18-deepscene-864x480 | 96.9% | 14 FPS | 190 FPS |
Multi-Human | 512×320 | fcn-resnet18-mhp-512x320 | 86.5% | 34 FPS | 370 FPS |
Multi-Human | 640×360 | fcn-resnet18-mhp-640x360 | 87.1% | 23 FPS | 325 FPS |
Pascal VOC | 320×320 | fcn-resnet18-voc-320x320 | 85.9% | 45 FPS | 508 FPS |
Pascal VOC | 512×320 | fcn-resnet18-voc-512x320 | 88.5% | 34 FPS | 375 FPS |
SUN RGB-D | 512×400 | fcn-resnet18-sun-512x400 | 64.3% | 28 FPS | 340 FPS |
SUN RGB-D | 640×512 | fcn-resnet18-sun-640x512 | 65.1% | 17 FPS | 224 FPS |
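For example, for an indoor scene you could pick the SUN RGB-D model from the table above (this particular choice is just an illustration; any CLI argument from the table can be substituted):
# ./segnet --network=fcn-resnet18-sun-512x400 /dev/video0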
Well... this has gotten long again, so I'll stop here.