Understanding the metal
Saturday, Sep 1, 2012 · 800 words · approx 4 mins to read
Hardware is nothing without software. Or, if you’re that way inclined, software is nothing without hardware. Both are true.
I wrote a post for the PowerVR Insider blog not that long ago that points out that in the world of graphics hardware the software is often overlooked. It’s a technical marketing piece with the sentiment that, without high quality software (driver and shader compiler in the main), our hardware designs would have worse area and power efficiency and we’d do much worse in competition.
The reverse is true too, although it doesn’t always make intuitive sense since software is malleable and hardware is fixed. While you can always change your code to better fit a given processing architecture, it can certainly be the case that the payoff isn’t actually worth the effort.
High quality software that understands the machine it’s running on is crucial for embedded GPUs in particular, because they’re firmware, driver and compiler driven. High quality implementations of that trio are an absolute must, or we leave efficiency on the table before the graphics programmer even gets involved. But it’s a much worse general problem that programmers aren’t led to understand what the hardware should be capable of in the first place, and to tailor their code to suit.
In graphics land it’s so much of a black box that you as the graphics programmer could argue that you shouldn’t even bother trying to understand it too much. You can’t change the compiler, you have a very limited number of languages and you don’t provide the API. But the payoff from understanding the system underneath is absolutely worth it.
GPUs are machines that require you to understand state and data flow more than parallelism (which tends to be implied in real-time graphics, so you don’t have to worry about it too much). The model is pretty simple and you can understand the basics very quickly, leading to an instinct about what to do next when optimising.
If you’re writing in a general purpose language targeting modern CPU architectures, the payoff from understanding the machine is potentially even greater, since it’s much harder to extract peak performance on a CPU than a GPU. The trend in computing since I’ve taken part in it has been to make understanding the CPU machine (by machine here I mean the actual native ‘metal’) less of a priority. In fact, in many CPU programming languages today the machine that runs your code is virtual and/or there’s JIT compilation and a runtime. That means most of the time you’re many layers above being native and you’re actively blocked from understanding it.
Your at-the-metal runtime efficiency is therefore first and foremost in the hands of the language runtime. Often it goes to heck, trading runtime efficiency for programmer efficiency. I’ve argued before that it doesn’t have to be that way, although programmer productivity is king.
So despite it being incredibly hard to program a CPU efficiently, you’re more readily removed from it than ever before. My interest in hardware meant I’ve always strived to understand it as well as I can and I’ve benefited from that greatly when writing software for both CPU and GPU. My worry is that there’s often no need to do that any more and that’s going to cause us big problems in the future.
It’s a dying art and often even the language runtime and application VM authors don’t seem to care too much. If you’re a language designer and you’re targeting someone else’s application VM then you’ll tend to just punt on understanding the metal entirely.
I argued with a friend recently that the biggest problem facing at-the-metal runtime efficiency on future system architectures is the manipulation and movement of data structures, not dispatch or where the code runs. It’s a problem domain that requires understanding the at-the-metal efficiency of multiple hardware and systems architectures, and I’m worried that we have a dwindling number of people able to understand them and design efficient software for them properly.
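To make the data structure point a bit more concrete, here’s a minimal sketch in C of the same data laid out two ways. Everything in it is hypothetical and for illustration only: the particle record, the field names, the fixed count `N` and the two summing functions. The assumption is a typical CPU that fetches memory a cache line at a time (64 bytes is common, but check your own machine).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical particle count, chosen only for illustration. */
#define N 1024

/* Array of structures (AoS): one particle's fields sit next to each other. */
struct particle_aos {
    float x, y, z;
    float mass;
};

/* The comments below assume a 16-byte record with no padding. */
static_assert(sizeof(struct particle_aos) == 16, "expected a 16-byte record");

/* Structure of arrays (SoA): each field is stored contiguously on its own. */
struct particles_soa {
    float x[N], y[N], z[N];
    float mass[N];
};

/* Sums mass from the AoS layout. Only 4 bytes of every 16-byte record
 * are used, so three quarters of the data fetched is ignored. */
float total_mass_aos(const struct particle_aos *p, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += p[i].mass;
    return total;
}

/* Sums mass from the SoA layout. The loop walks one dense array of
 * floats, so every byte fetched is a byte the loop actually needs. */
float total_mass_soa(const struct particles_soa *p, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += p->mass[i];
    return total;
}
```

Both functions compute the same answer and the source looks almost identical. What differs is how much data has to move through the memory system to get there, which is exactly the kind of thing you only reason about if you have some picture of the metal. Whether the difference matters for a given workload is something to measure rather than assume.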
Something like HSA is inherently a much more complex system to write software for, and we already do it badly on today’s simpler architectures. As a programmer, always strive to understand the metal, even if it’s several layers below the code you write, so that we have a fighting chance of exploiting future computer systems architectures efficiently, and so that it makes sense to go and build them in the first place.