Software Provenance Tracking at the Scale of Public Source Code - INRIA - Institut National de Recherche en Informatique et en Automatique Accéder directement au contenu
Article Dans Une Revue Empirical Software Engineering Année : 2020

Software Provenance Tracking at the Scale of Public Source Code

Résumé

We study the possibilities to track provenance of software source code artifacts within the largest publicly accessible corpus of publicly available source code, the Software Heritage archive, with over 4 billions unique source code files and 1 billion commits capturing their development histories across 50 million software projects. We perform a systematic and generic estimate of the replication factor across the different layers of this corpus, analysing how much the same artifacts (e.g., SLOC, files or commits) appear in different contexts (e.g., files, commits or source code repositories). We observe a combinatorial explosion in the number of identical source code files across different commits. To discuss the implication of these findings, we benchmark different data models for capturing software provenance information at this scale, and we identify a viable solution, based on the properties of isochrone subgraphs, that is deployable on commodity hardware, is incremental and appears to be maintainable for the foreseeable future. Using these properties, we quantify, at a scale never achieved previously, the growth rate of original, i.e. never-seen-before, source code files and commits, and find it to be exponential over a period of more than 40 years.
Fichier principal
Vignette du fichier
main.pdf (1.14 Mo) Télécharger le fichier
Origine : Fichiers produits par l'(les) auteur(s)
Loading...

Dates et versions

hal-02543794 , version 1 (15-04-2020)

Identifiants

  • HAL Id : hal-02543794 , version 1

Citer

Guillaume Rousseau, Roberto Di Cosmo, Stefano Zacchiroli. Software Provenance Tracking at the Scale of Public Source Code. Empirical Software Engineering, 2020. ⟨hal-02543794⟩

Collections

INRIA
329 Consultations
441 Téléchargements

Partager

Gmail Facebook X LinkedIn More