Google’s current AI accelerator hardware, the Tensor Processing Unit (TPU), promises to speed up AI model training. However, accelerators only speed up the later stages of the training pipeline; earlier stages, such as reading and preprocessing data, still run on ordinary processors and can bottleneck the whole pipeline.
In a recently published paper, researchers at the search giant’s AI research division, Google Brain, proposed a technique called ‘data echoing,’ which promises to speed up AI training by reusing the outputs of earlier pipeline stages instead of recomputing them for every training step.
According to the Google researchers, the most efficient data echoing variant matched the baseline algorithm’s performance while using less upstream processing.
“Training a neural network requires more than just the operations that run well on accelerators, so we cannot rely on accelerator improvements alone to keep producing speedups in all cases,” say the researchers in the paper. “A training program may need to read and decompress training data, shuffle it, batch it, and even transform or augment it.”
Typically, a training pipeline first reads and decodes the input data, then shuffles it and applies a set of transformations to augment it. The system then gathers examples into batches and iteratively updates the model’s parameters to reduce error. Data echoing inserts a stage into this pipeline that repeats the previous stage’s output before the parameter update, so the repeated data fills accelerator capacity that would otherwise sit idle while waiting for fresh data.
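The idea can be sketched as a chain of Python generators. This is a minimal illustration, not Google’s implementation: the stage names, the echo factor, and the toy data are all assumptions made for clarity.

```python
def read_and_decode():
    """Upstream stage (hypothetical): simulates expensive reading/decoding
    of raw examples, which runs on the CPU and limits throughput."""
    for i in range(6):
        yield {"id": i, "pixels": [i] * 4}

def augment(example):
    """Downstream stage (hypothetical): a cheap deterministic transform
    standing in for data augmentation."""
    return {**example, "pixels": [p + 1 for p in example["pixels"]]}

def echo(stream, echo_factor):
    """Data echoing: repeat each upstream item `echo_factor` times, so the
    cheap downstream stages and the accelerator stay busy while the slow
    upstream stage produces the next fresh item."""
    for item in stream:
        for _ in range(echo_factor):
            yield item

def batches(stream, batch_size):
    """Final stage: gather examples into fixed-size batches for the
    parameter update."""
    buf = []
    for item in stream:
        buf.append(item)
        if len(buf) == batch_size:
            yield buf
            buf = []

# Pipeline with echoing inserted right after the expensive stage:
steps = list(batches(map(augment, echo(read_and_decode(), echo_factor=2)),
                     batch_size=4))
# 6 fresh examples echoed twice yield 12 items, i.e. 3 batches of 4.
```

Where the echo stage is inserted matters: echoing before augmentation (as above, if the augmentation were randomized) lets each repeat be transformed differently, trading a little extra downstream work for more diverse repeated data.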
To test the efficiency of the new technique, the researchers applied data echoing to two language modelling tasks, two image classification tasks, and one object detection task. They measured the number of ‘fresh’ training examples each method needed to reach a target metric, as well as the corresponding training time.
The results show that data echoing reached the target metric with fewer fresh examples than the baseline, which reduced training time, and its benefit grew with larger batch sizes.
“All data echoing variants achieved at least the same performance as the baseline for both tasks,” the researchers said. “Data echoing is an effective alternative to optimizing the training pipeline or adding additional workers to perform upstream data processing, which may not always be possible or desirable.”