Journal article in ACM Transactions on Architecture and Code Optimization, 2019

The Next 700 Accelerated Layers

Abstract

Deep learning frameworks automate the deployment, distribution, synchronization, memory allocation, and hardware acceleration of models represented as graphs of computational operators. These operators wrap high-performance libraries such as cuDNN or NNPACK. When the computation does not match any predefined library call, custom operators must be implemented, often at high engineering cost and performance penalty, limiting the pace of innovation. To address this productivity gap, we propose and evaluate: (1) a domain-specific language with a tensor notation close to the mathematics of deep learning; (2) a Just-In-Time optimizing compiler based on the polyhedral framework; (3) carefully coordinated linear optimization and evolutionary algorithms to synthesize high-performance CUDA kernels; (4) the transparent integration of our flow into PyTorch and Caffe2, providing the fully automatic synthesis of high-performance GPU kernels from simple tensor algebra. The performance is comparable to, and often exceeds the performance of, highly tuned libraries.
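
As a concrete illustration of the tensor notation and PyTorch integration described in the abstract, the sketch below defines a fused fully connected + ReLU layer. It assumes the open-source Tensor Comprehensions release associated with this work (the `tensor_comprehensions` Python package); the `tc.define` entry point and the exact calling convention varied across releases, so the names here are illustrative rather than authoritative.

```python
# Minimal sketch, assuming the tensor_comprehensions Python package and its
# PyTorch integration; API details changed between releases, treat as illustrative.
import torch
import tensor_comprehensions as tc

# The layer is written once, close to its mathematical definition. Loop bounds are
# inferred from tensor shapes, and the JIT polyhedral compiler synthesizes a CUDA
# kernel behind this definition.
LANG = """
def fcrelu(float(B, M) I, float(N, M) W, float(N) Bias) -> (O) {
    O(b, n) +=! I(b, m) * W(n, m)
    O(b, n) = O(b, n) + Bias(n)
    O(b, n) = fmax(O(b, n), 0)
}
"""

fcrelu = tc.define(LANG, name="fcrelu")   # JIT-compiled on first use
x = torch.randn(128, 256).cuda()
w = torch.randn(512, 256).cuda()
bias = torch.randn(512).cuda()
out = fcrelu(x, w, bias)                  # a plain torch.Tensor on the GPU
```

The evolutionary autotuner mentioned in the abstract can be run on such a definition to search the space of kernel mapping options; the sketch above relies on default options only.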

Dates and versions

hal-02458550, version 1 (28-01-2020)

Identifiers

HAL Id: hal-02458550
DOI: 10.1145/3355606

Cite

Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary Devito, et al.. The Next 700 Accelerated Layers. ACM Transactions on Architecture and Code Optimization, 2019, 16 (4), pp.1-26. ⟨10.1145/3355606⟩. ⟨hal-02458550⟩