Binary level toolchain provenance identification with graph neural networks - INRIA - Institut National de Recherche en Informatique et en Automatique
Conference paper. Year: 2021

Binary level toolchain provenance identification with graph neural networks

Tristan Benoit
Jean-Yves Marion
Sébastien Bardin

Abstract

We consider the problem of recovering the compilation toolchain used to generate a given stripped binary. We present a Graph Neural Network framework at the binary level to solve this problem, with the aim of taking into account the shallow semantics provided by the binary code's structured control flow graph (CFG). We introduce a Graph Neural Network, called Site Neural Network (SNN), dedicated to this problem. To attain scalability at the binary level, feature extraction is simplified by discarding almost everything in a CFG except control-transfer instructions and by performing a parametric graph reduction. Our experiments show that our method recovers the compiler family with a very high F1-Score of 0.9950, while the optimization level is recovered with a moderately high F1-Score of 0.7517. On the compiler version prediction task, the F1-Score is about 0.8167 excluding the clang family. A comparison with previous work demonstrates the accuracy and performance of this framework.
Main file: SANER_2021_CAMERA_READY_1.pdf (940.91 KB)
Origin: files produced by the author(s)

Dates and versions

hal-03447628, version 1 (24-11-2021)

Cite

Tristan Benoit, Jean-Yves Marion, Sébastien Bardin. Binary level toolchain provenance identification with graph neural networks. SANER 2021 - 28th IEEE International Conference on Software Analysis, Evolution and Reengineering, Mar 2021, Honolulu / Virtual, United States. pp.131-141, ⟨10.1109/SANER50967.2021.00021⟩. ⟨hal-03447628⟩