Experimentation of Data Locality Performance for a Parallel Hierarchical Algorithm on the Origin2000 - INRIA - Institut National de Recherche en Informatique et en Automatique
Conference paper, year: 1998

Experimentation of Data Locality Performance for a Parallel Hierarchical Algorithm on the Origin2000

Laurent Alonso
  • Role: Author
  • PersonId: 830118
Jean-Claude Paul
  • Role: Author

Abstract

Hierarchical algorithms form a class of applications widely used in high-performance scientific computing because they can solve very large physical problems. They rely on the physical property that the farther apart two points are, the less they influence each other. However, their irregular and dynamic characteristics make them challenging to parallelize efficiently: two conflicting objectives, load balancing and data locality, have to be taken into account. It has been shown that the message-passing paradigm is not well suited to this kind of application because of the intensive communication these applications introduce. Implicit communication through a shared address space appears to be better adapted. In particular, the ccNUMA architecture of the Origin2000 can help obtain the desired data locality through its memory hierarchy. We have experimented with a parallel implementation of a well-known hierarchical algorithm from computer graphics: wavelet radiosity. This algorithm is a very efficient approach to computing global illumination in diffuse environments, but it remains too time- and memory-consuming when dealing with extremely complex models. Our parallel algorithm focuses on optimizing load balancing and relies heavily on the efficiency of the ccNUMA architecture for data locality. Load balancing is handled by a general dynamic tasking mechanism with specific improvements. Only minimal effort is devoted to memory management (such as writing thread-safe, non-blocking versions of the C malloc/free functions), and the Origin2000 proves fully capable of efficiently handling the natural data locality of our application. Our best results yield a speed-up of 24 with 36 processors. Moreover, we were able to compute the illumination of a complex scene (a cloister in Quito, composed of 54789 initial surfaces and leading to 600000 final meshes) in 2 hours 41 minutes with 24 processors. To the knowledge of the authors, this is the most complex "real world" scene ever computed.
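
As a reading aid, the reported speed-up can be converted into a parallel efficiency using the standard definition (speed-up divided by the number of processors). The short sketch below is an illustration only: the function name `parallel_efficiency` is hypothetical and does not come from the paper, and the only figures used are the speed-up of 24 on 36 processors stated in the abstract.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Return speed-up divided by processor count (standard definition)."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


# Figures from the abstract: speed-up of 24 with 36 processors.
efficiency = parallel_efficiency(24, 36)
print(f"efficiency = {efficiency:.2f}")  # prints "efficiency = 0.67"
```

Under this definition, the best reported result corresponds to roughly two thirds of ideal linear scaling on 36 processors.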
Main file
98-R-212.pdf (220.41 KB)

Dates and versions

inria-00098705 , version 1 (26-09-2006)

Identifiers

  • HAL Id : inria-00098705 , version 1

Cite

Xavier Cavin, Laurent Alonso, Jean-Claude Paul. Experimentation of Data Locality Performance for a Parallel Hierarchical Algorithm on the Origin2000. Fourth European CRAY-SGI MPP Workshop, 1998, Garching/Munich, Germany, pp.178-187. ⟨inria-00098705⟩
