The semiconductor industry has seen its fair share of debates. The latest one revolves around the internal architecture of Google’s Tensor Processing Units (TPUs) for machine learning; it has even sparked accusations of clickbait on social media and generated plenty of speculation among analysts.
The discussion reminded me of one of my personal favorite (non-)topics: the relevance of FP64 for mobile compute. Let’s rewind four years and look at what happened: the year is 2012, and a number of mobile silicon vendors start adding FP64 support to their GPU ALUs. Industry commentators are quick to label the move the best thing in embedded graphics since the arrival of OpenGL ES.
Around the same time, 64-bit computing becomes a hot topic in mobile. Marketing folks take the opportunity to employ some (deliberately) confusing terminology and start linking 64-bit CPUs and FP64-capable GPUs. Makes sense, right?
The cold, hard reality is that 64-bit CPUs had little (i.e. absolutely nothing) to do with FP64 support: the “64” in a 64-bit CPU refers to the width of its addresses and integer registers, while FP64 refers to double-precision floating-point arithmetic in the GPU’s ALUs. Somehow the two got lumped together because everyone scoffed at the mere sight of anything 32-related.
Anyway, once the initial dust settled and some of the confusion cleared, the second wave of marketing hit. This time FP64 was positioned as a vital requirement for next-generation compute applications, which led to a lot of huffing and puffing about how the mobile industry was going all out on GPU compute. Articles went on forever about how we would soon see data center workloads running on mobile devices.
Fast forward to 2016, and every single use case for GPU compute still revolves around maximum power efficiency. Can you guess what that implies? Mainly finding the optimal balance between FP16 and FP32 support. Apart from a few developers still stuck in desktop-only mode, nobody is crying out for FP64 ALUs in GPUs. Instead, the industry is moving in the opposite direction by increasing the number of FP16 cores. For example, PowerVR Series7 GPUs doubled FP16 performance over the previous generation, while NVIDIA made a big deal of introducing half-precision support in Tegra X1.
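To see why that FP16/FP32 balance makes sense, it helps to look at the raw precision each IEEE 754 format offers. Here is a minimal sketch using NumPy purely as an illustration (none of this code comes from any vendor SDK): half precision carries roughly three decimal digits of relative precision, which is plenty for pixel and texture math on normalized data.

```python
import numpy as np

# Machine epsilon: the relative precision of each IEEE 754 format.
eps16 = np.finfo(np.float16).eps  # 2**-10 ~ 9.8e-4  (~3 decimal digits)
eps32 = np.finfo(np.float32).eps  # 2**-23 ~ 1.2e-7  (~7 decimal digits)
eps64 = np.finfo(np.float64).eps  # 2**-52 ~ 2.2e-16 (~16 decimal digits)

# Half precision represents integers exactly only up to 2048;
# beyond that, adjacent representable values are 2 apart.
lost = np.float16(2048) + np.float16(1)  # the +1 is rounded away
```

For color channels normalized to [0, 1], half precision’s relative error near 1.0 is smaller than the quantization step of an 8-bit-per-channel display (1/256), which is one reason doubling FP16 throughput is a better use of silicon than adding FP64 ALUs.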
Everyone now agrees that FP64 is too power hungry for mobile while delivering absolutely no performance benefit in typical applications (image convolution, augmented reality, software-based video decoding, etc.). However, FP64 precision is indeed a requirement for a very limited set of scenarios. For example, the microserver market defines a set of high-precision workloads that can be offloaded from the main CPU to on-chip accelerators (including GPUs) for improved efficiency.
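A toy illustration of why those high-precision workloads genuinely need wider formats (again, just a NumPy sketch, not any particular microserver benchmark): once a running sum in single precision grows past 2^24, further unit increments are rounded away entirely, while double precision absorbs them exactly.

```python
import numpy as np

big32 = np.float32(2 ** 24)       # 16777216: float32 spacing here is 2.0
stuck = big32 + np.float32(1.0)   # the increment is lost to rounding
# stuck == big32

big64 = np.float64(2 ** 24)
moved = big64 + np.float64(1.0)   # float64 tracks the increment exactly
# moved - big64 == 1.0
```

Long-running accumulations of this kind (statistical simulations, financial models) are exactly where FP64 earns its power budget; a shader blending pixels never gets anywhere near this regime.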
Since I don’t plan to run any DNA analysis or fintech simulations on my smartphone anytime soon, I am perfectly happy with FP32/FP16 precision in mobile right now. You should be too.