We are developing lightweight machine learning methods to enable real-time distributed autonomous sensing with environmental and health objectives. To this end, we focused on efficient machine learning techniques that run in real time on the drone itself. Specifically, we strove to reduce the energy cost of learning by dropping unnecessary computations at three complementary levels: stochastic mini-batch dropping at the data level; selective layer update at the model level; and sign prediction for low-cost, low-precision back-propagation at the algorithm level. Extensive simulations and ablation studies, with real energy measurements from an FPGA board, confirm the effectiveness of the proposed strategies and demonstrate substantial energy savings for machine learning training. For example, when training ResNet-110 on CIFAR-100, we achieve over 84% training energy savings without degrading inference accuracy.
The machine learning algorithms in this project must perform sensing and learning missions in real-world environments. On-drone adaptation and continuous learning are therefore critical to algorithmic performance, because the data encountered during real-world missions may differ substantially from the data used to pretrain the algorithms offline. The challenge is that on-drone computational and storage resources are severely limited by weight and form-factor constraints, whereas adaptation and continuous learning of machine learning algorithms are typically expensive. We therefore strove to remove redundancy from the adaptation and learning process, aggressively reducing computational and storage costs while maintaining algorithmic performance such as sensing and detection accuracy.
We proposed a novel energy-efficient machine learning training framework dubbed E2-Train. It trims unnecessary training computations and data movements through three complementary levels of effort:
Data-Level: Stochastic mini-batch dropping (SMD). We show that machine learning training can be accelerated by a “frustratingly easy” strategy: randomly skipping each mini-batch with probability 0.5 throughout training. This can be interpreted as data sampling with (limited) replacement, and it incurs minimal accuracy loss (and sometimes even increases accuracy).
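The sketch below illustrates how SMD could slot into an ordinary training loop; the function name and signature are hypothetical, assuming a PyTorch-style model, data loader, optimizer, and loss criterion.

```python
import random

def train_one_epoch_with_smd(model, loader, optimizer, criterion, drop_prob=0.5):
    """Illustrative sketch of stochastic mini-batch dropping (SMD):
    each mini-batch is skipped with probability drop_prob, so skipped
    batches cost no forward or backward computation at all."""
    model.train()
    for inputs, targets in loader:
        # Randomly drop this mini-batch before any computation happens.
        if random.random() < drop_prob:
            continue
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
```

With drop_prob = 0.5, roughly half of the mini-batches in each epoch are never processed, which is where the per-epoch compute and energy savings come from.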
Model-Level: Input-dependent selective layer update (SLU). For each mini-batch, we select a different subset of the CNN layers to be updated. The input-adaptive selection is made by a low-cost gating function learned jointly during training. Similar ideas have been explored for efficient inference, but here they are applied to and evaluated for training for the first time.
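The following is a minimal sketch of the idea, not the paper's exact gate design: a tiny pooling-plus-linear gate decides, per mini-batch, whether a residual block is updated; when skipped, the block runs under no_grad so it receives no gradient for that mini-batch. The class name GatedBlock and the hard 0/1 gating decision are illustrative assumptions (the actual framework learns the gate jointly with the network, e.g., via a differentiable relaxation).

```python
import torch
import torch.nn as nn

class GatedBlock(nn.Module):
    """Illustrative sketch of input-dependent selective layer update (SLU)."""
    def __init__(self, block, in_channels, gate_hidden=16):
        super().__init__()
        self.block = block
        # Low-cost gate: global average pooling followed by a tiny MLP.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_channels, gate_hidden), nn.ReLU(),
            nn.Linear(gate_hidden, 1),
        )

    def forward(self, x):
        # Hard per-mini-batch decision (illustration only; the real gate is
        # trained jointly with the network rather than thresholded like this).
        update = torch.sigmoid(self.gate(x)).mean() > 0.5
        if update:
            out = self.block(x)          # block participates in backprop and is updated
        else:
            with torch.no_grad():        # block is frozen for this mini-batch:
                out = self.block(x)      # no gradient, no weight update, cheaper backward
        return out + x                   # residual connection keeps gradient flowing
```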
Algorithm-Level: Predictive sign gradient descent (PSG). We explore an extremely low-precision gradient descent algorithm called SignSGD, which has recently gained both theoretical and empirical support. The original algorithm still requires computing the full gradient and therefore saves no energy. We create a novel “predictive” variant that obtains the sign without computing the full gradient, via low-cost, bit-level prediction. Combined with a mixed-precision design, it reduces both computation and data-movement costs.
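As a rough software-level sketch only: the update below applies a SignSGD-style step in which the sign is taken from a half-precision copy of the gradient, standing in for the low-cost, bit-level sign prediction that PSG performs in hardware. The function name psg_step and the use of a half-precision cast as the "prediction" are assumptions made for illustration.

```python
import torch

@torch.no_grad()
def psg_step(params, lr=0.01):
    """Sketch of a SignSGD-style update: parameters move by +/- lr only.
    Here the sign is "predicted" from a low-precision (fp16) view of the
    gradient, emulating PSG's low-cost bit-level sign prediction."""
    for p in params:
        if p.grad is None:
            continue
        predicted_sign = p.grad.half().sign().float()  # low-precision sign estimate
        p.add_(predicted_sign, alpha=-lr)              # p <- p - lr * sign(grad)
```

Because only the sign is needed, the bulk of a full-precision gradient computation (and its data movement) can in principle be avoided, which is the source of the algorithm-level savings.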
Beyond these largely experimental explorations, we found that E2-Train has interesting connections to recent machine learning training theories. We evaluated E2-Train against its closest state-of-the-art competitors. To measure its actual performance, E2-Train was implemented and evaluated on an FPGA board. The results show that models trained with E2-Train consistently achieve higher training energy efficiency with only marginal accuracy drops.
Publications:
Yue Wang, Ziyu Jiang, Xiaohan Chen, Pengfei Xu, Yang Zhao, Atlas Wang, and Yingyan Lin, “E2-Train: Training State-of-the-art CNNs with Over 80% Energy Savings,” Thirty-third Conference on Neural Information Processing Systems (NeurIPS 2019).