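Last Friday Loongson announced that it has finished porting NCNN to LoongArch, with most operators implemented using vector optimizations.
I ran the benchmark on a 3A5000 today; the results are as follows: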
loop_count = 10
num_threads = 4
powersave = 0
gpu_device = 0
cooling_down = 1
squeezenet min = 11.95 max = 24.30 avg = 14.82
squeezenet_int8 min = 15.00 max = 15.62 avg = 15.53
mobilenet min = 21.27 max = 22.50 avg = 21.58
mobilenet_int8 min = 25.68 max = 32.68 avg = 26.80
mobilenet_v2 min = 12.42 max = 19.07 avg = 13.47
mobilenet_v3 min = 11.16 max = 11.84 avg = 11.58
shufflenet min = 7.46 max = 18.08 avg = 8.75
shufflenet_v2 min = 7.24 max = 20.32 avg = 9.18
mnasnet min = 12.88 max = 21.25 avg = 14.14
proxylessnasnet min = 15.88 max = 27.34 avg = 18.73
efficientnet_b0 min = 24.02 max = 24.84 avg = 24.46
efficientnetv2_b0 min = 26.87 max = 47.11 avg = 34.14
regnety_400m min = 23.27 max = 24.00 avg = 23.62
blazeface min = 2.65 max = 3.24 avg = 2.77
googlenet min = 45.21 max = 69.04 avg = 50.09
googlenet_int8 min = 54.79 max = 70.45 avg = 56.77
resnet18 min = 37.77 max = 54.84 avg = 40.74
resnet18_int8 min = 43.62 max = 62.59 avg = 50.74
alexnet min = 40.64 max = 48.64 avg = 42.05
vgg16 min = 209.65 max = 217.28 avg = 212.94
vgg16_int8 min = 237.81 max = 253.14 avg = 241.95
resnet50 min = 105.24 max = 122.17 avg = 108.04
resnet50_int8 min = 116.89 max = 128.39 avg = 119.44
squeezenet_ssd min = 36.93 max = 48.87 avg = 39.78
squeezenet_ssd_int8 min = 36.52 max = 50.78 avg = 39.46
mobilenet_ssd min = 44.12 max = 56.93 avg = 46.52
mobilenet_ssd_int8 min = 53.10 max = 65.37 avg = 55.21
mobilenet_yolo min = 128.45 max = 144.19 avg = 131.26
mobilenetv2_yolov3 min = 51.20 max = 65.60 avg = 57.39
yolov4-tiny min = 71.50 max = 83.50 avg = 73.95
nanodet_m min = 16.89 max = 18.13 avg = 17.13
yolo-fastest-1.1 min = 7.14 max = 7.80 avg = 7.40
yolo-fastestv2 min = 6.99 max = 18.71 avg = 9.33
vision_transformer min = 2942.76 max = 2966.90 avg = 2954.92
FastestDet min = 7.75 max = 8.42 avg = 8.09
Benchmark results for other platforms are available on ncnn's GitHub page.
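
To compare these numbers with another platform, or to check how each int8 model fares against its float counterpart, the output can be parsed with a few lines of Python. The following is only a minimal sketch: it assumes every result line has the `name min = … max = … avg = …` shape shown above, and the helper names `parse_results` and `int8_vs_fp32` are made up for this illustration and are not part of ncnn.

```python
import re

# Matches one result line, e.g. "squeezenet min = 11.95 max = 24.30 avg = 14.82".
# Settings lines such as "loop_count = 10" do not match and are skipped.
LINE_RE = re.compile(
    r"^\s*(\S+)\s+min\s*=\s*([\d.]+)\s+max\s*=\s*([\d.]+)\s+avg\s*=\s*([\d.]+)\s*$"
)


def parse_results(text):
    """Return {model: (min, max, avg)} for every result line in text."""
    results = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            results[m.group(1)] = tuple(float(x) for x in m.groups()[1:])
    return results


def int8_vs_fp32(results):
    """Return {model: avg_int8 / avg_float} for models that have both variants."""
    ratios = {}
    for name, (_, _, avg) in results.items():
        if name.endswith("_int8"):
            base = name[: -len("_int8")]
            if base in results:
                ratios[base] = avg / results[base][2]
    return ratios


# A few lines copied from the output above, used as sample input.
SAMPLE = """\
loop_count = 10
squeezenet min = 11.95 max = 24.30 avg = 14.82
squeezenet_int8 min = 15.00 max = 15.62 avg = 15.53
squeezenet_ssd min = 36.93 max = 48.87 avg = 39.78
squeezenet_ssd_int8 min = 36.52 max = 50.78 avg = 39.46
"""

if __name__ == "__main__":
    for model, ratio in int8_vs_fp32(parse_results(SAMPLE)).items():
        print(f"{model}: int8/float avg ratio = {ratio:.2f}")
```

A ratio above 1 means the int8 model took longer on average than the float one. Pasting the full output into `SAMPLE` (or reading it from a saved log file) extends the comparison to every model pair.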