Now let's actually try it out!
Last time, we set up the jetson-inference Docker container.
It ships with pre-trained models for the following three tasks:
- Classifying Images with ImageNet (image classification: tells you what is in the image)
- Locating Objects with DetectNet (object detection: tells you where objects are in the image and what they are)
- Semantic Segmentation with SegNet (semantic segmentation: classifies each region of the image and splits it into segments)
To start with, let's just use these features without thinking too hard about the details!
From your home directory, start the Docker container we installed last time:
$ cd jetson-inference
$ docker/run.sh
Then, inside the container, move to the directory that holds the demo binaries:
# cd build/aarch64/bin
The commands we will use live there. Note that everything below assumes a USB camera is connected.
Let's run the demos.
# ./imagenet /dev/video0
Point the camera at various objects and it recognizes them, something like this.
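If you don't have a camera handy, the same binary also accepts files for input and output (see the videoSource/videoOutput arguments in the help further down). A minimal sketch, with hypothetical file names:
# ./imagenet file://my_image.jpg file://my_image_classified.jpg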
# ./detectnet /dev/video0
This time it detects where things are in the frame and what they are.
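As with imagenet, you can also write the annotated stream to a file by giving an output URI; the output file name here is just an example:
# ./detectnet /dev/video0 file://my_detections.mp4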
# ./segnet --network=fcn-resnet18-mhp /dev/video0
This one detects the objects and people in the room and shows the image split into segments...
though the results may still look a bit rough.
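The default overlay can be hard to read, so it may help to also show the raw class mask; the --visualize flag used below is documented in the segnet help later in this article:
# ./segnet --network=fcn-resnet18-mhp --visualize=overlay,mask /dev/video0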
So, what does each of these commands offer? The details get fairly involved, so I'd encourage you to look at each one yourself, but here is the overview:
jetson-inference
 | C++ | Python |
---|---|---|
Image Recognition | imageNet | imageNet |
Object Detection | detectNet | detectNet |
Segmentation | segNet | segNet |
ImageNet
# ./imagenet --help
Typing this prints the usage information:
usage: imagenet [--help] [--network=NETWORK] input_URI [output_URI]
Classify a video/image stream using an image recognition DNN.
See below for additional arguments that may not be shown above.
positional arguments:
input_URI resource URI of input stream (see videoSource below)
output_URI resource URI of output stream (see videoOutput below)
imageNet arguments:
--network=NETWORK pre-trained model to load, one of the following:
* alexnet
* googlenet (default)
* googlenet-12
* resnet-18
* resnet-50
* resnet-101
* resnet-152
* vgg-16
* vgg-19
* inception-v4
--model=MODEL path to custom model to load (caffemodel, uff, or onnx)
--prototxt=PROTOTXT path to custom prototxt to load (for .caffemodel only)
--labels=LABELS path to text file containing the labels for each class
--input-blob=INPUT name of the input layer (default is 'data')
--output-blob=OUTPUT name of the output layer (default is 'prob')
--batch-size=BATCH maximum batch size (default is 1)
--profile enable layer profiling in TensorRT
videoSource arguments:
input_URI resource URI of the input stream, for example:
* /dev/video0 (V4L2 camera #0)
* csi://0 (MIPI CSI camera #0)
* rtp://@:1234 (RTP stream)
* rtsp://user:pass@ip:1234 (RTSP stream)
* file://my_image.jpg (image file)
* file://my_video.mp4 (video file)
* file://my_directory/ (directory of images)
--input-width=WIDTH explicitly request a width of the stream (optional)
--input-height=HEIGHT explicitly request a height of the stream (optional)
--input-rate=RATE explicitly request a framerate of the stream (optional)
--input-codec=CODEC RTP requires the codec to be set, one of these:
* h264, h265
* vp8, vp9
* mpeg2, mpeg4
* mjpeg
--input-flip=FLIP flip method to apply to input (excludes V4L2):
* none (default)
* counterclockwise
* rotate-180
* clockwise
* horizontal
* vertical
* upper-right-diagonal
* upper-left-diagonal
--input-loop=LOOP for file-based inputs, the number of loops to run:
* -1 = loop forever
* 0 = don't loop (default)
* >0 = set number of loops
videoOutput arguments:
output_URI resource URI of the output stream, for example:
* file://my_image.jpg (image file)
* file://my_video.mp4 (video file)
* file://my_directory/ (directory of images)
* rtp://<remote-ip>:1234 (RTP stream)
* display://0 (OpenGL window)
--output-codec=CODEC desired codec for compressed output streams:
* h264 (default), h265
* vp8, vp9
* mpeg2, mpeg4
* mjpeg
--bitrate=BITRATE desired target VBR bitrate for compressed streams,
in bits per second. The default is 4000000 (4 Mbps)
--headless don't create a default OpenGL GUI window
logging arguments:
--log-file=FILE output destination file (default is stdout)
--log-level=LEVEL message output threshold, one of the following:
* silent
* error
* warning
* success
* info
* verbose (default)
* debug
--verbose enable verbose logging (same as --log-level=verbose)
--debug enable debug logging (same as --log-level=debug)
Roughly speaking, NETWORK is the pre-trained model, and you can choose among several of them; each model apparently has its own strengths and weaknesses. If a model hasn't been installed yet, you can download it with the following commands:
$ cd jetson-inference/tools
$ ./download-models.sh
The classes it currently recognizes are apparently this set of 1,000 ImageNet categories.
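For example, to try one of the other networks listed in the help above (assuming it has already been downloaded), you can pass it with --network; the choice of resnet-18 here is just an illustration:
# ./imagenet --network=resnet-18 /dev/video0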
DetectNet
Again, the usage shows up in the help:
# ./detectnet --help
usage: detectnet [--help] [--network=NETWORK] [--threshold=THRESHOLD] input_URI [output_URI]
Locate objects in a video/image stream using an object detection DNN.
See below for additional arguments that may not be shown above.
positional arguments:
input_URI resource URI of input stream (see videoSource below)
output_URI resource URI of output stream (see videoOutput below)
detectNet arguments:
--network=NETWORK pre-trained model to load, one of the following:
* ssd-mobilenet-v1
* ssd-mobilenet-v2 (default)
* ssd-inception-v2
* pednet
* multiped
* facenet
* coco-airplane
* coco-bottle
* coco-chair
* coco-dog
--model=MODEL path to custom model to load (caffemodel, uff, or onnx)
--prototxt=PROTOTXT path to custom prototxt to load (for .caffemodel only)
--labels=LABELS path to text file containing the labels for each class
--input-blob=INPUT name of the input layer (default is 'data')
--output-cvg=COVERAGE name of the coverge output layer (default is 'coverage')
--output-bbox=BOXES name of the bounding output layer (default is 'bboxes')
--mean-pixel=PIXEL mean pixel value to subtract from input (default is 0.0)
--batch-size=BATCH maximum batch size (default is 1)
--threshold=THRESHOLD minimum threshold for detection (default is 0.5)
--alpha=ALPHA overlay alpha blending value, range 0-255 (default: 120)
--overlay=OVERLAY detection overlay flags (e.g. --overlay=box,labels,conf)
valid combinations are: 'box', 'labels', 'conf', 'none'
--profile enable layer profiling in TensorRT
videoSource arguments:
input_URI resource URI of the input stream, for example:
* /dev/video0 (V4L2 camera #0)
* csi://0 (MIPI CSI camera #0)
* rtp://@:1234 (RTP stream)
* rtsp://user:pass@ip:1234 (RTSP stream)
* file://my_image.jpg (image file)
* file://my_video.mp4 (video file)
* file://my_directory/ (directory of images)
--input-width=WIDTH explicitly request a width of the stream (optional)
--input-height=HEIGHT explicitly request a height of the stream (optional)
--input-rate=RATE explicitly request a framerate of the stream (optional)
--input-codec=CODEC RTP requires the codec to be set, one of these:
* h264, h265
* vp8, vp9
* mpeg2, mpeg4
* mjpeg
--input-flip=FLIP flip method to apply to input (excludes V4L2):
* none (default)
* counterclockwise
* rotate-180
* clockwise
* horizontal
* vertical
* upper-right-diagonal
* upper-left-diagonal
--input-loop=LOOP for file-based inputs, the number of loops to run:
* -1 = loop forever
* 0 = don't loop (default)
* >0 = set number of loops
videoOutput arguments:
output_URI resource URI of the output stream, for example:
* file://my_image.jpg (image file)
* file://my_video.mp4 (video file)
* file://my_directory/ (directory of images)
* rtp://<remote-ip>:1234 (RTP stream)
* display://0 (OpenGL window)
--output-codec=CODEC desired codec for compressed output streams:
* h264 (default), h265
* vp8, vp9
* mpeg2, mpeg4
* mjpeg
--bitrate=BITRATE desired target VBR bitrate for compressed streams,
in bits per second. The default is 4000000 (4 Mbps)
--headless don't create a default OpenGL GUI window
logging arguments:
--log-file=FILE output destination file (default is stdout)
--log-level=LEVEL message output threshold, one of the following:
* silent
* error
* warning
* success
* info
* verbose (default)
* debug
--verbose enable verbose logging (same as --log-level=verbose)
--debug enable debug logging (same as --log-level=debug)
Now, each of these models has its own focus:
Model | Object classes |
---|---|
SSD-Mobilenet-v1 | 91 (COCO classes) |
SSD-Mobilenet-v2 | 91 (COCO classes) |
SSD-Inception-v2 | 91 (COCO classes) |
DetectNet-COCO-Dog | dogs |
DetectNet-COCO-Bottle | bottles |
DetectNet-COCO-Chair | chairs |
DetectNet-COCO-Airplane | airplanes |
ped-100 | pedestrians |
multiped-500 | pedestrians, luggage |
facenet-120 | faces |
If a model hasn't been downloaded yet, you can fetch it with the download-models.sh script shown above.
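For example, to run only face detection with a somewhat stricter confidence threshold (the 0.6 value is just an illustration), something like this should work:
# ./detectnet --network=facenet --threshold=0.6 /dev/video0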
SegNet
# ./segnet --help
usage: segnet [--help] [--network NETWORK] input_URI [output_URI]
Segment and classify a video/image stream using a semantic segmentation DNN.
See below for additional arguments that may not be shown above.
positional arguments:
input_URI resource URI of input stream (see videoSource below)
output_URI resource URI of output stream (see videoOutput below)
segNet arguments:
--network=NETWORK pre-trained model to load, one of the following:
* fcn-resnet18-cityscapes-512x256
* fcn-resnet18-cityscapes-1024x512
* fcn-resnet18-cityscapes-2048x1024
* fcn-resnet18-deepscene-576x320
* fcn-resnet18-deepscene-864x480
* fcn-resnet18-mhp-512x320
* fcn-resnet18-mhp-640x360
* fcn-resnet18-voc-320x320 (default)
* fcn-resnet18-voc-512x320
* fcn-resnet18-sun-512x400
* fcn-resnet18-sun-640x512
--model=MODEL path to custom model to load (caffemodel, uff, or onnx)
--prototxt=PROTOTXT path to custom prototxt to load (for .caffemodel only)
--labels=LABELS path to text file containing the labels for each class
--colors=COLORS path to text file containing the colors for each class
--input-blob=INPUT name of the input layer (default: 'data')
--output-blob=OUTPUT name of the output layer (default: 'score_fr_21classes')
--batch-size=BATCH maximum batch size (default is 1)
--alpha=ALPHA overlay alpha blending value, range 0-255 (default: 150)
--visualize=VISUAL visualization flags (e.g. --visualize=overlay,mask)
valid combinations are: 'overlay', 'mask'
--profile enable layer profiling in TensorRT
videoSource arguments:
input_URI resource URI of the input stream, for example:
* /dev/video0 (V4L2 camera #0)
* csi://0 (MIPI CSI camera #0)
* rtp://@:1234 (RTP stream)
* rtsp://user:pass@ip:1234 (RTSP stream)
* file://my_image.jpg (image file)
* file://my_video.mp4 (video file)
* file://my_directory/ (directory of images)
--input-width=WIDTH explicitly request a width of the stream (optional)
--input-height=HEIGHT explicitly request a height of the stream (optional)
--input-rate=RATE explicitly request a framerate of the stream (optional)
--input-codec=CODEC RTP requires the codec to be set, one of these:
* h264, h265
* vp8, vp9
* mpeg2, mpeg4
* mjpeg
--input-flip=FLIP flip method to apply to input (excludes V4L2):
* none (default)
* counterclockwise
* rotate-180
* clockwise
* horizontal
* vertical
* upper-right-diagonal
* upper-left-diagonal
--input-loop=LOOP for file-based inputs, the number of loops to run:
* -1 = loop forever
* 0 = don't loop (default)
* >0 = set number of loops
videoOutput arguments:
output_URI resource URI of the output stream, for example:
* file://my_image.jpg (image file)
* file://my_video.mp4 (video file)
* file://my_directory/ (directory of images)
* rtp://<remote-ip>:1234 (RTP stream)
* display://0 (OpenGL window)
--output-codec=CODEC desired codec for compressed output streams:
* h264 (default), h265
* vp8, vp9
* mpeg2, mpeg4
* mjpeg
--bitrate=BITRATE desired target VBR bitrate for compressed streams,
in bits per second. The default is 4000000 (4 Mbps)
--headless don't create a default OpenGL GUI window
logging arguments:
--log-file=FILE output destination file (default is stdout)
--log-level=LEVEL message output threshold, one of the following:
* silent
* error
* warning
* success
* info
* verbose (default)
* debug
--verbose enable verbose logging (same as --log-level=verbose)
--debug enable debug logging (same as --log-level=debug)
This time it looks like you should pick the model to match the scene you are working with; the table below compares them, and an example follows the table.
Dataset | Resolution | CLI Argument | Accuracy | Jetson Nano | Jetson Xavier |
---|---|---|---|---|---|
Cityscapes | 512×256 | fcn-resnet18-cityscapes-512x256 | 83.3% | 48 FPS | 480 FPS |
Cityscapes | 1024×512 | fcn-resnet18-cityscapes-1024x512 | 87.3% | 12 FPS | 175 FPS |
Cityscapes | 2048×1024 | fcn-resnet18-cityscapes-2048x1024 | 89.6% | 3 FPS | 47 FPS |
DeepScene | 576×320 | fcn-resnet18-deepscene-576x320 | 96.4% | 26 FPS | 360 FPS |
DeepScene | 864×480 | fcn-resnet18-deepscene-864x480 | 96.9% | 14 FPS | 190 FPS |
Multi-Human | 512×320 | fcn-resnet18-mhp-512x320 | 86.5% | 34 FPS | 370 FPS |
Multi-Human | 640×360 | fcn-resnet18-mhp-640x360 | 87.1% | 23 FPS | 325 FPS |
Pascal VOC | 320×320 | fcn-resnet18-voc-320x320 | 85.9% | 45 FPS | 508 FPS |
Pascal VOC | 512×320 | fcn-resnet18-voc-512x320 | 88.5% | 34 FPS | 375 FPS |
SUN RGB-D | 512×400 | fcn-resnet18-sun-512x400 | 64.3% | 28 FPS | 340 FPS |
SUN RGB-D | 640×512 | fcn-resnet18-sun-640x512 | 65.1% | 17 FPS | 224 FPS |
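For example, for an indoor scene you could pick the SUN RGB-D model from the table above (this particular choice is just an illustration; any CLI argument from the table can be substituted):
# ./segnet --network=fcn-resnet18-sun-512x400 /dev/video0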
Well... this has gotten long again, so I'll stop here.