[Machine Learning] Object Recognition, Object Detection, and Image Analysis with the Jetson Nano 2GB Tutorial [Hello AI World]

Now let's finally put it to use!

Last time, we installed the jetson-inference Docker container:

[Machine Learning] First Steps with the Jetson Nano Developer Kit: Installing Docker [Hello AI World] (7)

It comes with pre-trained models for three tasks: image recognition, object detection, and segmentation.

For now, let's just try these features out without thinking too hard about them!

From your home directory, start the Docker container we installed last time:

$ cd jetson-inference
$ docker/run.sh

Then, inside the container, change into the directory where the demo commands live:

# cd build/aarch64/bin

Note that everything below assumes a USB camera is connected.

Let's run the demos.

# ./imagenet /dev/video0

Point the camera at various objects and it recognizes them like this.

# ./detectnet /dev/video0

This one detects and recognizes what is where in the frame.

# ./segnet --network=fcn-resnet18-mhp  /dev/video0

This one detects the things and people in the room and displays them segmented into regions, though honestly the results can still be a bit rough.

So, what about the details of each one? They get a bit involved, so I'd encourage you to explore them yourself, but here is an overview.

jetson-inference

                      C++         Python
Image Recognition     imageNet    imageNet
Object Detection      detectNet   detectNet
Segmentation          segNet      segNet
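
The same three classes are also exposed to Python inside the container. Here is a rough sketch of my own (not part of the official walkthrough, and assuming the container's pre-installed jetson.inference module; the network names are just the defaults listed in each command's --help output below):

# Rough sketch (assumes the jetson-inference container, where the
# jetson.inference Python bindings are pre-installed).
import jetson.inference

net = jetson.inference.imageNet("googlenet")            # Image Recognition
# net = jetson.inference.detectNet("ssd-mobilenet-v2")  # Object Detection
# net = jetson.inference.segNet("fcn-resnet18-voc")     # Segmentation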

ImageNet

# ./imagenet --help

Typing this prints the usage:

usage: imagenet [--help] [--network=NETWORK] input_URI [output_URI]

Classify a video/image stream using an image recognition DNN.
See below for additional arguments that may not be shown above.


positional arguments:
input_URI resource URI of input stream (see videoSource below)
output_URI resource URI of output stream (see videoOutput below)

imageNet arguments:
--network=NETWORK pre-trained model to load, one of the following:
* alexnet
* googlenet (default)
* googlenet-12
* resnet-18
* resnet-50
* resnet-101
* resnet-152
* vgg-16
* vgg-19
* inception-v4
--model=MODEL path to custom model to load (caffemodel, uff, or onnx)
--prototxt=PROTOTXT path to custom prototxt to load (for .caffemodel only)
--labels=LABELS path to text file containing the labels for each class
--input-blob=INPUT name of the input layer (default is 'data')
--output-blob=OUTPUT name of the output layer (default is 'prob')
--batch-size=BATCH maximum batch size (default is 1)
--profile enable layer profiling in TensorRT

videoSource arguments:
input_URI resource URI of the input stream, for example:
* /dev/video0 (V4L2 camera #0)
* csi://0 (MIPI CSI camera #0)
* rtp://@:1234 (RTP stream)
* rtsp://user:pass@ip:1234 (RTSP stream)
* file://my_image.jpg (image file)
* file://my_video.mp4 (video file)
* file://my_directory/ (directory of images)
--input-width=WIDTH explicitly request a width of the stream (optional)
--input-height=HEIGHT explicitly request a height of the stream (optional)
--input-rate=RATE explicitly request a framerate of the stream (optional)
--input-codec=CODEC RTP requires the codec to be set, one of these:
* h264, h265
* vp8, vp9
* mpeg2, mpeg4
* mjpeg
--input-flip=FLIP flip method to apply to input (excludes V4L2):
* none (default)
* counterclockwise
* rotate-180
* clockwise
* horizontal
* vertical
* upper-right-diagonal
* upper-left-diagonal
--input-loop=LOOP for file-based inputs, the number of loops to run:
* -1 = loop forever
* 0 = don't loop (default)
* >0 = set number of loops

videoOutput arguments:
output_URI resource URI of the output stream, for example:
* file://my_image.jpg (image file)
* file://my_video.mp4 (video file)
* file://my_directory/ (directory of images)
* rtp://<remote-ip>:1234 (RTP stream)
* display://0 (OpenGL window)
--output-codec=CODEC desired codec for compressed output streams:
* h264 (default), h265
* vp8, vp9
* mpeg2, mpeg4
* mjpeg
--bitrate=BITRATE desired target VBR bitrate for compressed streams,
in bits per second. The default is 4000000 (4 Mbps)
--headless don't create a default OpenGL GUI window

logging arguments:
--log-file=FILE output destination file (default is stdout)
--log-level=LEVEL message output threshold, one of the following:
* silent
* error
* warning
* success
* info
* verbose (default)
* debug
--verbose enable verbose logging (same as --log-level=verbose)
--debug enable debug logging (same as --log-level=debug)

Roughly speaking, NETWORK is the pre-trained model to load, and you can specify various ones. Each model seems to have its own strengths and weaknesses. If a model isn't installed yet, you can download additional ones with the following commands.

$ cd jetson-inference/tools
$ ./download-models.sh

The things it has currently been trained on are apparently these 1,000 categories:

https://github.com/dusty-nv/jetson-inference/blob/master/data/networks/ilsvrc12_synset_words.txt
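
For reference, the same classification loop can also be written in a few lines of Python. This is only a minimal sketch modeled on the library's imagenet.py sample, assuming the container's jetson.inference / jetson.utils bindings and the same USB camera; swap the network name for any entry in the --network list above:

# Minimal classification loop (sketch; assumes the jetson-inference
# container and a USB camera on /dev/video0, as in the demo above).
import jetson.inference
import jetson.utils

net = jetson.inference.imageNet("googlenet")        # any model from the --network list
camera = jetson.utils.videoSource("/dev/video0")    # same input URI as the CLI demo
display = jetson.utils.videoOutput("display://0")   # OpenGL window

while display.IsStreaming():
    img = camera.Capture()                      # grab a frame
    class_id, confidence = net.Classify(img)    # run the DNN
    label = net.GetClassDesc(class_id)          # one of the 1,000 categories above
    display.Render(img)
    display.SetStatus("{:s}  {:.1f}%".format(label, confidence * 100))

Save it inside the container as, say, my_imagenet.py (the filename is just an example) and run it with python3.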

DetectNet

You can check how to use this one too by looking at its help:

# ./detectnet --help

usage: detectnet [--help] [--network=NETWORK] [--threshold=THRESHOLD] input_URI [output_URI]

Locate objects in a video/image stream using an object detection DNN.
See below for additional arguments that may not be shown above.

positional arguments:
input_URI resource URI of input stream (see videoSource below)
output_URI resource URI of output stream (see videoOutput below)

detectNet arguments:
--network=NETWORK pre-trained model to load, one of the following:
* ssd-mobilenet-v1
* ssd-mobilenet-v2 (default)
* ssd-inception-v2
* pednet
* multiped
* facenet
* coco-airplane
* coco-bottle
* coco-chair
* coco-dog
--model=MODEL path to custom model to load (caffemodel, uff, or onnx)
--prototxt=PROTOTXT path to custom prototxt to load (for .caffemodel only)
--labels=LABELS path to text file containing the labels for each class
--input-blob=INPUT name of the input layer (default is 'data')
--output-cvg=COVERAGE name of the coverage output layer (default is 'coverage')
--output-bbox=BOXES name of the bounding output layer (default is 'bboxes')
--mean-pixel=PIXEL mean pixel value to subtract from input (default is 0.0)
--batch-size=BATCH maximum batch size (default is 1)
--threshold=THRESHOLD minimum threshold for detection (default is 0.5)
--alpha=ALPHA overlay alpha blending value, range 0-255 (default: 120)
--overlay=OVERLAY detection overlay flags (e.g. --overlay=box,labels,conf)
valid combinations are: 'box', 'labels', 'conf', 'none'
--profile enable layer profiling in TensorRT

(The videoSource, videoOutput, and logging arguments are the same as the ones shown for imagenet above, so I'll omit them here.)

Now, here each of the available models has its own specialty.

Model                     Object classes
SSD-Mobilenet-v1          91 (COCO classes)
SSD-Mobilenet-v2          91 (COCO classes)
SSD-Inception-v2          91 (COCO classes)
DetectNet-COCO-Dog        dogs
DetectNet-COCO-Bottle     bottles
DetectNet-COCO-Chair      chairs
DetectNet-COCO-Airplane   airplanes
ped-100                   pedestrians
multiped-500              pedestrians, luggage
facenet-120               faces

If a model hasn't been downloaded yet, you can get it with the download-models.sh script introduced above.
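
Detection works the same way from Python. Here is a minimal sketch modeled on the library's detectnet.py sample (the model name and threshold are just example values; any model from the table above should work):

# Minimal detection loop (sketch; assumes the jetson-inference
# container and a USB camera on /dev/video0).
import jetson.inference
import jetson.utils

net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)
camera = jetson.utils.videoSource("/dev/video0")
display = jetson.utils.videoOutput("display://0")

while display.IsStreaming():
    img = camera.Capture()
    detections = net.Detect(img)       # draws boxes/labels/confidence onto img by default
    for d in detections:
        print(net.GetClassDesc(d.ClassID), "{:.0f}%".format(d.Confidence * 100))
    display.Render(img)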

SegNet

# ./segnet --help

usage: segnet [--help] [--network NETWORK] input_URI [output_URI]

Segment and classify a video/image stream using a semantic segmentation DNN.
See below for additional arguments that may not be shown above.

positional arguments:
input_URI resource URI of input stream (see videoSource below)
output_URI resource URI of output stream (see videoOutput below)

segNet arguments:
--network=NETWORK pre-trained model to load, one of the following:
* fcn-resnet18-cityscapes-512x256
* fcn-resnet18-cityscapes-1024x512
* fcn-resnet18-cityscapes-2048x1024
* fcn-resnet18-deepscene-576x320
* fcn-resnet18-deepscene-864x480
* fcn-resnet18-mhp-512x320
* fcn-resnet18-mhp-640x360
* fcn-resnet18-voc-320x320 (default)
* fcn-resnet18-voc-512x320
* fcn-resnet18-sun-512x400
* fcn-resnet18-sun-640x512
--model=MODEL path to custom model to load (caffemodel, uff, or onnx)
--prototxt=PROTOTXT path to custom prototxt to load (for .caffemodel only)
--labels=LABELS path to text file containing the labels for each class
--colors=COLORS path to text file containing the colors for each class
--input-blob=INPUT name of the input layer (default: 'data')
--output-blob=OUTPUT name of the output layer (default: 'score_fr_21classes')
--batch-size=BATCH maximum batch size (default is 1)
--alpha=ALPHA overlay alpha blending value, range 0-255 (default: 150)
--visualize=VISUAL visualization flags (e.g. --visualize=overlay,mask)
valid combinations are: 'overlay', 'mask'
--profile enable layer profiling in TensorRT

(The videoSource, videoOutput, and logging arguments are again the same as the ones shown for imagenet above, so I'll omit them here.)

For this one, it looks like you'll want to choose the model based on the scene you're working with.

Dataset       Resolution   CLI Argument                        Accuracy   Jetson Nano   Jetson Xavier
Cityscapes    512×256      fcn-resnet18-cityscapes-512x256     83.3%      48 FPS        480 FPS
Cityscapes    1024×512     fcn-resnet18-cityscapes-1024x512    87.3%      12 FPS        175 FPS
Cityscapes    2048×1024    fcn-resnet18-cityscapes-2048x1024   89.6%      3 FPS         47 FPS
DeepScene     576×320      fcn-resnet18-deepscene-576x320      96.4%      26 FPS        360 FPS
DeepScene     864×480      fcn-resnet18-deepscene-864x480      96.9%      14 FPS        190 FPS
Multi-Human   512×320      fcn-resnet18-mhp-512x320            86.5%      34 FPS        370 FPS
Multi-Human   640×360      fcn-resnet18-mhp-640x360            87.1%      23 FPS        325 FPS
Pascal VOC    320×320      fcn-resnet18-voc-320x320            85.9%      45 FPS        508 FPS
Pascal VOC    512×320      fcn-resnet18-voc-512x320            88.5%      34 FPS        375 FPS
SUN RGB-D     512×400      fcn-resnet18-sun-512x400            64.3%      28 FPS        340 FPS
SUN RGB-D     640×512      fcn-resnet18-sun-640x512            65.1%      17 FPS        224 FPS
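
In Python it looks roughly like this. Again, just a sketch modeled on the library's segnet.py sample; the network name is one pick from the table above, and the overlay is drawn into a separately allocated CUDA buffer:

# Minimal segmentation loop (sketch; assumes the jetson-inference
# container and a USB camera on /dev/video0).
import jetson.inference
import jetson.utils

net = jetson.inference.segNet("fcn-resnet18-mhp-512x320")   # pick the model that matches your scene
camera = jetson.utils.videoSource("/dev/video0")
display = jetson.utils.videoOutput("display://0")
overlay = None

while display.IsStreaming():
    img = camera.Capture()
    if overlay is None:
        # buffer for the colorized class overlay, same size/format as the camera frames
        overlay = jetson.utils.cudaAllocMapped(width=img.width, height=img.height, format=img.format)
    net.Process(img)         # run the segmentation network on the frame
    net.Overlay(overlay)     # blend the per-class colors into the overlay buffer
    display.Render(overlay)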

Well, this has gotten long again, so let's stop here.
