Runtime Vectorization Transformations of Binary Code

Nabil Hallou; Erven Rohou; Philippe Clauss

doi:10.1007/s10766-016-0480-z

Journal article, International Journal of Parallel Programming, Year: 2017


Nabil Hallou

Role: Author

Amdahl's Law is Forever

Erven Rohou

Role: Author
PersonId : 176658
IdHAL : erven-rohou
ORCID : 0000-0002-8060-8360
IdRef : 135287065

Pushing Architecture and Compilation for Application Performance

ARCHITECTURE

Philippe Clauss

Role: Author
PersonId : 739331
IdHAL : philippe-clauss
ORCID : 0000-0002-5759-9195

Compilation pour les Architectures MUlti-coeurS

Abstract

In many cases, applications are not optimized for the hardware on which they run. Several factors contribute to this unsatisfying situation, such as legacy code, commercial code distributed in binary form, or deployment on compute farms. Indeed, backward compatibility of the ISA guarantees only functionality, not the best exploitation of the hardware. In this work, we focus on maximizing CPU efficiency for the SIMD extensions. The first contribution was originally published in the International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, SAMOS XV, Jul 2015, Agios Konstantinos, Greece. It is a binary-to-binary optimization framework in which loops vectorized for an older version of the processor's SIMD extension are automatically converted to a newer one. It is a lightweight mechanism that does not include a vectorizer, but instead leverages what a static vectorizer previously did. We show that many loops compiled for x86 SSE can be dynamically converted to the more recent and more powerful AVX, and how correctness is maintained in the face of challenges such as data dependencies and reductions. We obtain speedups in line with those of a native compiler targeting AVX. The second contribution is the runtime vectorization of loops in binary code that was not originally vectorized. For this purpose, we use open source frameworks that we have tuned and integrated to (1) dynamically lift the x86 binary into the Intermediate Representation form of the LLVM compiler, (2) abstract hot loops in the polyhedral model, (3) use the power of this mathematical framework to vectorize them, and (4) finally compile them back into executable form using the LLVM Just-In-Time compiler. In most cases, the obtained speedups are close to the number of elements that can be simultaneously processed by the SIMD unit.
The re-vectorizer and auto-vectorizer are implemented inside a dynamic optimization platform that is completely transparent to the user, does not require any rewriting of the binaries, and operates during program execution.
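
To make the reduction challenge mentioned in the abstract more concrete, here is a minimal sketch in portable C. It is not the authors' binary translator: plain arrays stand in for SIMD registers, and the function names `sum_4_lanes` and `sum_8_lanes` are hypothetical. The sketch assumes that a 4-lane loop models SSE on 32-bit elements and an 8-lane loop models AVX, and it shows which parts of a vectorized sum would have to change when the width doubles: the loop step, the number of partial accumulators, the final horizontal sum, and the scalar remainder loop.

```c
#include <stddef.h>

/* Hypothetical model: each "vector register" is a small array of lanes. */

/* Sum shaped like a 4-lane (SSE-like) vectorized reduction. */
int sum_4_lanes(const int *a, size_t n) {
    int acc[4] = {0, 0, 0, 0};          /* one partial sum per lane */
    size_t i = 0;
    for (; i + 4 <= n; i += 4)          /* vector body: 4 elements per step */
        for (size_t l = 0; l < 4; ++l)
            acc[l] += a[i + l];
    int total = acc[0] + acc[1] + acc[2] + acc[3];  /* horizontal sum */
    for (; i < n; ++i)                  /* scalar remainder: at most 3 elements */
        total += a[i];
    return total;
}

/* The same reduction widened to 8 lanes (AVX-like). */
int sum_8_lanes(const int *a, size_t n) {
    int acc[8] = {0};                   /* accumulator now holds 8 partial sums */
    size_t i = 0;
    for (; i + 8 <= n; i += 8)          /* step doubles from 4 to 8 */
        for (size_t l = 0; l < 8; ++l)
            acc[l] += a[i + l];
    int total = 0;
    for (size_t l = 0; l < 8; ++l)      /* horizontal sum must cover all 8 lanes */
        total += acc[l];
    for (; i < n; ++i)                  /* remainder: now at most 7 elements */
        total += a[i];
    return total;
}
```

For an array holding 1 through 19, both functions return 190. Integers are used on purpose: with floating-point data the two versions add elements in a different order, so their results may differ in the last bits.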

Domains

Other [cs.OH]

Main file

DynamicRevectorizationExtended.pdf (741.56 KB)

Origin: Files produced by the author(s)

https://inria.hal.science/hal-01593216

Submitted on: Monday, 25 September 2017, 21:02:57

Last modified on: Thursday, 11 April 2024, 13:08:14

Long-term archiving on: Tuesday, 26 December 2017, 14:24:34

Dates and versions

hal-01593216 , version 1 (25-09-2017)

Identifiers

HAL Id : hal-01593216 , version 1
DOI : 10.1007/s10766-016-0480-z

Cite

Nabil Hallou, Erven Rohou, Philippe Clauss. Runtime Vectorization Transformations of Binary Code. International Journal of Parallel Programming, 2017, 8 (6), pp.1536 - 1565. ⟨10.1007/s10766-016-0480-z⟩. ⟨hal-01593216⟩

Collections

INSTITUT-TELECOM UNIV-RENNES1 CNRS INRIA INSA-RENNES ENGEES IRISA INSA-STRASBOURG IRISA-D3 INRIA2 INC-CNRS UR1-MATH-STIC UR1-UFR-ISTIC SITE-ALSACE UNIV-RENNES INSA-GROUPE UR1-MATH-NUM

350 views

947 downloads
