Metal vs GLSL CoreImage performance
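In WWDC session 510, Apple engineers introduced support for writing CIKernels in Metal and claimed that it should run faster.
I put together a test project that implements motion blur in both Metal and GLSL (the code is similar to the code from session 510).
Sometimes the Metal kernel is faster and sometimes the GLSL kernel is faster, but I definitely don't see the Metal kernel performing consistently and significantly better overall. Is that how it is supposed to be, or am I missing something?
Note: the project won't run on the simulator; you need an A8-powered device.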
It looks like some of this is hardware-dependent. Here are my results on a 10.5-inch iPad Pro:
glsl 1 took 229.572057723999ms
glsl 2 took 49.1310358047485ms
glsl 3 took 46.7269420623779ms
glsl 4 took 53.08997631073ms
glsl 5 took 48.9979982376099ms
glsl 6 took 49.0390062332153ms
glsl 7 took 52.5139570236206ms
glsl 8 took 46.4930534362793ms
glsl 9 took 39.6310091018677ms
glsl 10 took 45.9860563278198ms
metal 1 took 77.7549743652344ms
metal 2 took 44.1800355911255ms
metal 3 took 46.0859537124634ms
metal 4 took 45.3709363937378ms
metal 5 took 43.5279607772827ms
metal 6 took 38.9848947525024ms
metal 7 took 37.1809005737305ms
metal 8 took 37.8340482711792ms
metal 9 took 37.6850366592407ms
metal 10 took 37.5720262527466ms
And here are my iPhone SE results:
glsl 1 took 394.147992134094ms
glsl 2 took 94.601035118103ms
glsl 3 took 81.4379453659058ms
glsl 4 took 76.9931077957153ms
glsl 5 took 77.0320892333984ms
glsl 6 took 75.8579969406128ms
glsl 7 took 76.9950151443481ms
glsl 8 took 77.8199434280396ms
glsl 9 took 79.7009468078613ms
glsl 10 took 79.4800519943237ms
metal 1 took 146.992921829224ms
metal 2 took 88.6669158935547ms
metal 3 took 81.8150043487549ms
metal 4 took 78.1329870223999ms
metal 5 took 79.5910358428955ms
metal 6 took 93.6589241027832ms
metal 7 took 94.8940515518188ms
metal 8 took 89.0530347824097ms
metal 9 took 84.3830108642578ms
metal 10 took 77.949047088623ms
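One rough way to read the two result sets is to separate the first run of each kernel from the nine runs that follow it, since the first run is much slower in every case. The Python sketch below does only that arithmetic. It is not part of the test project: the timings are copied from the results above and rounded to two decimals, and the helper name `summarize` and the choice to treat run 1 separately are assumptions made for illustration.

```python
from statistics import mean

# Timings in milliseconds, copied from the posted results (rounded to 2 decimals).
RESULTS = {
    ("iPad Pro 10.5", "glsl"): [229.57, 49.13, 46.73, 53.09, 49.00,
                                49.04, 52.51, 46.49, 39.63, 45.99],
    ("iPad Pro 10.5", "metal"): [77.75, 44.18, 46.09, 45.37, 43.53,
                                 38.98, 37.18, 37.83, 37.69, 37.57],
    ("iPhone SE", "glsl"): [394.15, 94.60, 81.44, 76.99, 77.03,
                            75.86, 77.00, 77.82, 79.70, 79.48],
    ("iPhone SE", "metal"): [146.99, 88.67, 81.82, 78.13, 79.59,
                             93.66, 94.89, 89.05, 84.38, 77.95],
}


def summarize(timings):
    """Return (first_run_ms, mean_of_remaining_runs_ms) for one kernel."""
    first, rest = timings[0], timings[1:]
    return first, mean(rest)


# Print one summary line per device and kernel type.
for (device, kernel), timings in RESULTS.items():
    first, later = summarize(timings)
    print(f"{device:14} {kernel:5} first={first:7.2f} ms  later mean={later:6.2f} ms")
```

On these numbers the later-run mean comes out at roughly 48 ms for GLSL against 41 ms for Metal on the iPad Pro, and roughly 80 ms against 85 ms on the iPhone SE, while the first Metal run is well under half the first GLSL run on both devices. Ten runs per kernel is a small sample, so this is a reading aid rather than a benchmark.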
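One question and one thought:
- What device produced your results?
- I'd be curious whether a different type of filter, say a color kernel, would behave differently.
- Hi again @dfd :) My results are from an iPhone 6 Plus. Strange that the SE computes even faster than the 6. Your results are generally similar to what I got: no substantial improvement from using Metal over GLSL.
- I chose blur because it is fairly computationally heavy, and I thought that would make the difference between the technologies clearer.
- By the way, thanks for recommending "Core Image for Swift" in the other thread; I learned a lot from it.
- I don't think the talk claims that MSL-based kernels would be faster than CIKL kernels (at least I can't find such a claim in the talk's transcript). CIKL is compiled to MSL where possible, so the results shouldn't be too different. The advantage of an MSL-based kernel is that it is precompiled rather than compiled at runtime.
- Maybe you'd like to try using MTLComputePipelineState with the kernel function?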
Source: https://www.codenong.com/48683922/