About this resource
The variety of CPUs, GPUs, FPGAs, and now an explosion of new AI hardware accelerators is making the field of high performance machine learning hardware increasingly diverse. At the same time, multiple deep learning frameworks exist, each providing its own performance/productivity trade-offs. Finally, the deep learning models themselves are the area in which a true Cambrian explosion is happening. From convolutional nets to recurrent nets to attention-based models to autoencoders to GANs to all sorts of previously unknown exotic beasts, new models emerge almost every day.
Which hardware is good for which models, and is it properly supported by your favourite framework? This resource is our attempt to put together some statistics on the performance of deep learning models implemented with different frameworks on a set of hardware platforms.
What are we measuring
First of all, let us clarify what deep learning performance is. For the user, the ultimate parameter is the time needed to train a model. [This part is mostly relevant to training; for inference things are a bit simpler.]
This time essentially depends on two factors: how many steps are needed to converge, and how much time is needed to compute one step (sample/mini-batch/etc.). Convergence rate does not, with few exceptions, depend on the hardware platform. More importantly, optimizing for convergence rate involves too many special tricks, like learning rate scheduling, switching between different optimizers, etc., so maintaining the same training regime across all frameworks is problematic. For this reason we focus on the time to iterate over a training set with the default SGD optimizer and a fixed learning rate. One possible issue with this approach is that a given platform could potentially enable faster computation in terms of samples per second at the cost of a weaker convergence rate. The most typical case would be the use of reduced arithmetic precision. We address this issue in several ways. First of
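To make the methodology concrete, here is a minimal, framework-agnostic sketch of the kind of measurement described: time a fixed number of plain-SGD steps at a fixed learning rate and report throughput in samples per second. The toy linear model, batch size, and learning rate are illustrative placeholders, not part of the actual benchmark suite.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
batch_size, n_features, n_steps, lr = 64, 512, 100, 0.01

# Toy linear model; a real benchmark would run a full network instead.
w = rng.standard_normal(n_features)

start = time.perf_counter()
for _ in range(n_steps):
    # Synthetic mini-batch; in practice this comes from the training set.
    x = rng.standard_normal((batch_size, n_features))
    y = rng.standard_normal(batch_size)
    grad = x.T @ (x @ w - y) / batch_size  # gradient of 1/2 * MSE loss
    w -= lr * grad                         # plain SGD, fixed learning rate
elapsed = time.perf_counter() - start

throughput = n_steps * batch_size / elapsed
print(f"{throughput:.0f} samples/sec")
```

Because the optimizer and learning rate are held fixed, differences in the reported samples/sec reflect the hardware and framework rather than the training recipe.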