oneDNN implements most of the heavy lifting for CNN inference, and the major DL frameworks, including OpenVINO, are all built on top of oneDNN. Why, then, does OpenVINO have better performance than the other DL frameworks? Tickets: 67678
---
Based on ticket 67678, we can answer this by comparing the inference performance of some well-known networks between OpenVINO and PaddlePaddle.

**Compile Paddle:** https://www.paddlepaddle.org.cn/documentation/docs/zh/1.5/beginners_guide/install/compile/compile_Ubuntu.html

Note: compiling Paddle may open a lot of files at the same time; Ubuntu 18.04's default limit on simultaneously open files is 1024 (see `ulimit -a`), so this value needs to be raised.
**Inference using Paddle**

There is a repo for inference using Paddle:

```sh
git clone https://github.com/PaddlePaddle/Paddle-Inference-Demo.git
cd Paddle-Inference-Demo/c++/lib
# create the paddle_inference softlink
ln -sf ~/paddle-venv/Paddle/_build/paddle_inference_install_dir paddle_inference
cd resnet50
# change compile.sh: `WITH_GPU=OFF`
./compile.sh
# run.sh will download/extract/run the resnet50 model (folder resnet50)
chmod +x ./run.sh
./run.sh
# we can re-trigger the test manually with a different config
./build/resnet50_test --model_file resnet50/inference.pdmodel --params_file resnet50/inference.pdiparams --repeats 10 --batch_size 10
```

But only one core (and one thread) is used during the whole process; on average, one image takes 84 ms to infer with Paddle's ResNet-50.
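For context, Paddle's C++ inference `Config` does expose CPU threading knobs; below is a minimal sketch of how the single-thread default could be widened. The thread count of 4 is an assumption chosen to match the 4-core machine used below, and the wiring is illustrative, not taken from the demo's own code:

```cpp
#include "paddle_inference_api.h"  // Paddle Inference C++ API

int main() {
    paddle_infer::Config config;
    config.SetModel("resnet50/inference.pdmodel",
                    "resnet50/inference.pdiparams");
    config.DisableGpu();
    // By default the CPU math library runs single-threaded, which matches
    // the one-core behaviour observed above; allow 4 threads instead
    // (assumed value, matching the 4-core test machine).
    config.SetCpuMathLibraryNumThreads(4);
    config.EnableMKLDNN();  // route supported ops through oneDNN
    auto predictor = paddle_infer::CreatePredictor(config);
    return 0;
}
```

**Compile and install OpenVINO:**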
```sh
# use a virtual env to install the model optimizer's python dependencies
virtualenv ov
. ./ov/bin/activate
cd ~/openvino/_build/install/tools/model_optimizer/install_prerequisites/
./install_prerequisites_caffe.sh
./install_prerequisites_onnx.sh
./install_prerequisites_tf.sh
```

**Inference using OpenVINO**

Prepare (see https://github.com/openvinotoolkit/open_model_zoo/blob/master/tools/model_tools/README.md):

```sh
git clone [email protected]:openvinotoolkit/open_model_zoo.git
cd open_model_zoo/tools/downloader
pip install -r ./requirements-pytorch.in
pip install -r ./requirements-caffe2.in
pip install -r ./requirements-tensorflow.in
```

Download and convert the model:

```sh
cd open_model_zoo/tools/downloader
./downloader.py --name resnet-50-pytorch
./converter.py --name resnet-50-pytorch --precisions FP32
```

Inference with benchmark_app:

```sh
$ ./benchmark_app -m /home/hddl/open_model_zoo/tools/downloader/public/resnet-50-pytorch/FP32/resnet-50-pytorch.xml -d CPU
```
```
Count:      2804 iterations
Duration:   60108.46 ms
Latency:    81.84 ms
Throughput: 46.65 FPS
```

Since benchmark_app uses all 4 cores (8 hyper-threads), the expected throughput is roughly 4 × 1000 / 81.84 ≈ 48.9 FPS, close to the measured 46.65 FPS.

```sh
$ ./benchmark_app -m /home/hddl/open_model_zoo/tools/downloader/public/resnet-50-pytorch/FP32/resnet-50-pytorch.xml -d CPU -nstreams=1 -nthreads=1
```
```
Count:      741 iterations
Duration:   60143.24 ms
Latency:    80.29 ms    <== Paddle takes 84 ms
Throughput: 12.32 FPS
```

Luocheng's method to show per-layer performance: at utils.py line 82, add `config.enable_profile()`.
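As an aside, the single-stream, single-thread setup from the second benchmark_app run above can also be requested directly from the Inference Engine C++ API. A minimal sketch under the 2021-era API; the config key strings are InferenceEngine's documented ones, the model path abbreviates the converted model above, and input handling is omitted:

```cpp
#include <inference_engine.hpp>

int main() {
    InferenceEngine::Core core;
    auto network = core.ReadNetwork(
        "public/resnet-50-pytorch/FP32/resnet-50-pytorch.xml");
    // Mirror `benchmark_app -nstreams=1 -nthreads=1`: one throughput
    // stream, pinned to a single CPU thread.
    auto exec = core.LoadNetwork(network, "CPU",
        {{"CPU_THROUGHPUT_STREAMS", "1"}, {"CPU_THREADS_NUM", "1"}});
    auto request = exec.CreateInferRequest();
    request.Infer();  // one synchronous inference (inputs left default here)
    return 0;
}
```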
---
**Paddle framework is heavy for inference**

Since Paddle reuses the same framework for both training and inference, it treats inference the same as a training forward pass, which limits the optimization that can be done at the framework level. OpenVINO, on the other hand, carries no such burden: it is designed specifically for inference, so its framework cost is much lower than Paddle's.

This framework-level cost may not be a big deal for training, where the heavy computation dominates the overall latency, but it becomes considerable when inference runs a light-weight CNN like MobileNetV2: each convolution layer in such networks executes extremely fast (they are depth-wise or 1x1 convolutions), while the framework cost stays the same for every kind of convolution.

**std::string is a bad choice as a map key**

The Paddle framework uses std::string as the key of its map structures, even when the caller actually passes a string literal as the key; this requires run-time string construction and hashing on every lookup.
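To illustrate (with made-up map contents, not Paddle's actual code): with a plain `std::map<std::string, T>`, every lookup by string literal must construct a temporary std::string; a transparent comparator (`std::less<>`, C++14) enables heterogeneous lookup that skips the temporary (std::unordered_map needs a transparent hash, available from C++20, for the same trick):

```cpp
#include <iostream>
#include <map>
#include <string>
#include <string_view>

int main() {
    // Plain map: find("conv2d") builds a temporary std::string
    // (allocation + copy) at every call site that passes a literal.
    std::map<std::string, int> ops{{"conv2d", 1}, {"relu", 2}};
    auto a = ops.find("conv2d");

    // Transparent comparator: the literal (or a string_view) is compared
    // directly against the stored keys, with no temporary std::string.
    std::map<std::string, int, std::less<>> ops2{{"conv2d", 1}, {"relu", 2}};
    auto b = ops2.find("conv2d");
    auto c = ops2.find(std::string_view{"relu"});

    std::cout << a->second << b->second << c->second << "\n";
}
```

**Performance analysis**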
```
$ sudo -E perf stat -B -e cache-references,cache-misses,cycles,instructions sleep 1

 Performance counter stats for 'sleep 1':

        65,532   cache-references
        30,131   cache-misses          # 45.979 % of all cache refs
     1,329,247   cycles
     1,050,063   instructions          # 0.79 insn per cycle

   1.001454689 seconds time elapsed
   0.001199000 seconds user
   0.000000000 seconds sys

$ sudo -E perf stat -B -e cache-references,cache-misses,cycles,instructions sleep 10

 Performance counter stats for 'sleep 10':

        77,823   cache-references
        35,103   cache-misses          # 45.106 % of all cache refs
     1,515,698   cycles
     1,050,428   instructions          # 0.69 insn per cycle

  10.001294553 seconds time elapsed
   0.001234000 seconds user
   0.000000000 seconds sys
```