Implementing Multifrontal Sparse Solvers for Multicore Architectures with Sequential Task Flow Runtime Systems, submitted article, 2014. ,
DOI : 10.1145/2898348
URL : https://hal.archives-ouvertes.fr/hal-01333645
Multifrontal QR Factorization for Multicore Architectures over Runtime Systems, conference proceedings, pp.521-532, 2013. ,
DOI : 10.1007/978-3-642-40047-6_53
URL : https://hal.archives-ouvertes.fr/hal-01220611
Task-Based Multifrontal QR Solver for GPU-Accelerated Multicore Architectures, 2015 IEEE 22nd International Conference on High Performance Computing (HiPC), 2015. ,
DOI : 10.1109/HiPC.2015.27
URL : https://hal.archives-ouvertes.fr/hal-01166312
Matrices Over Runtime Systems at Exascale, Poster at SuperComputing, 2015. ,
Sparse direct solvers on top of a runtime system, Presentation at the PMAA 2014 international conference, 2014. ,
Sparse direct solvers on top of a runtime system, Presentation at the SIAM Computational Science and Engineering international conference, 2015. ,
Direct methods on GPU-based systems, preliminary work towards a functioning code. Presentation at the Sparse Days workshop, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00727020
On the out-of-core factorization of large sparse matrices, 2008. ,
URL : https://hal.archives-ouvertes.fr/tel-00563463
Robust Memory-Aware Mappings for Parallel Multifrontal Factorizations, SIAM Journal on Scientific Computing, vol.38, issue.3 ,
DOI : 10.1137/130938505
URL : https://hal.archives-ouvertes.fr/hal-00726644
LU factorization for accelerator-based systems, 2011 9th IEEE/ACS International Conference on Computer Systems and Applications (AICCSA) ,
DOI : 10.1109/AICCSA.2011.6126599
URL : https://hal.archives-ouvertes.fr/hal-00654193
QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, 2011 IEEE International Parallel & Distributed Processing Symposium, pp.932-943, 2011. ,
DOI : 10.1109/IPDPS.2011.90
URL : https://hal.archives-ouvertes.fr/inria-00547614
A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs, GPU Computing Gems, pp.473-484, 2011. ,
DOI : 10.1016/B978-0-12-385963-1.00034-4
Bridging the Gap between Performance and Bounds of Cholesky Factorization on Heterogeneous Platforms, 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015. ,
DOI : 10.1109/IPDPSW.2015.35
URL : https://hal.archives-ouvertes.fr/hal-01120507
Task-based FMM for heterogeneous architectures, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00974674
QR factorization of tall and skinny matrices in a grid computing environment, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp.1-11, 2010. ,
DOI : 10.1109/IPDPS.2010.5470475
URL : https://hal.archives-ouvertes.fr/inria-00548900
Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, vol.180, 2009. ,
DOI : 10.1088/1742-6596/180/1/012037
A Fully Empirical Autotuned Dense QR Factorization for Multicore Architectures, arXiv:1102.5328, 2011. ,
DOI : 10.1007/978-3-642-23397-5_19
URL : https://hal.archives-ouvertes.fr/hal-00726654
Optimizing Compilers for Modern Architectures: A Dependence-Based Approach, 2002. ,
Algorithm 837, ACM Transactions on Mathematical Software, vol.30, issue.3, pp.381-388, 2004. ,
DOI : 10.1145/1024074.1024081
MUMPS: A General Purpose Distributed Memory Sparse Solver, Proceedings of PARA2000, the Fifth International Workshop on Applied Parallel Computing, Lecture Notes in Computer Science, vol.1947, pp.122-131. ,
DOI : 10.1007/3-540-70734-4_16
URL : https://hal.archives-ouvertes.fr/hal-00856652
Multifrontal QR Factorization in a Multiprocessor Environment, Numerical Linear Algebra with Applications, vol.3, issue.4, pp.275-300, 1996. ,
DOI : 10.1002/(SICI)1099-1506(199607/08)3:4<275::AID-NLA83>3.0.CO;2-7
Communication-Avoiding QR Decomposition for GPUs, 2011 IEEE International Parallel & Distributed Processing Symposium, pp.48-58, 2011. ,
DOI : 10.1109/IPDPS.2011.15
The Landscape of Parallel Computing Research: A View from Berkeley, 2006. ,
StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, 2009. ,
URL : https://hal.archives-ouvertes.fr/inria-00384363
An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, pp.851-862, 2009. ,
DOI : 10.1109/TPDS.2003.1214317
Parallelizing dense and banded linear algebra libraries using SMPSs, Concurrency and Computation: Practice and Experience, vol.21, issue.18, p.2438, 2009. ,
DOI : 10.1002/cpe.1463
Communication lower bounds and optimal algorithms for numerical linear algebra, Acta Numerica, vol.23, 2014. ,
Minimizing Communication in Numerical Linear Algebra, SIAM Journal on Matrix Analysis and Applications, vol.32, issue.3, 2011. ,
DOI : 10.1137/090769156
Numerical methods for Least Squares Problems, Philadelphia: SIAM, 1996. ,
DOI : 10.1137/1.9781611971484
PaRSEC: Exploiting Heterogeneity to Enhance Scalability, Computing in Science & Engineering, vol.15, issue.6, pp.36-45, 2013. ,
DOI : 10.1109/MCSE.2013.98
DAGuE: A generic distributed DAG engine for High Performance Computing ,
Dense Linear Algebra on Distributed Heterogeneous Hardware with a Symbolic DAG Approach, Scalable Computing and Communications: Theory and Practice, pp.699-733, 2013. ,
Distributed Dense Numerical Linear Algebra Algorithms on massively parallel architectures: DPLASMA, Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW'11), PDSEC 2011, Anchorage, United States, pp.1432-1441, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00809680
Tiled QR Factorization Algorithms, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, 2011. ,
hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, pp.180-186, 2010. ,
DOI : 10.1109/PDP.2010.67
URL : https://hal.archives-ouvertes.fr/inria-00429889
Fine granularity sparse QR factorization for multicore based systems, Proceedings of the 10th international conference on Applied Parallel and Scientific Computing, PARA'10, pp.226-236, 2012. ,
Fine-Grained Multithreading for the Multifrontal $QR$ Factorization of Sparse Matrices, SIAM Journal on Scientific Computing, vol.35, issue.4, pp.323-345, 2013. ,
DOI : 10.1137/110846427
URL : https://hal.archives-ouvertes.fr/hal-01122471
The Impact of Multicore on Math Software, Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing. PARA'06, 2007. ,
DOI : 10.1007/978-3-540-75755-9_1
A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009. ,
DOI : 10.1016/j.parco.2008.10.002
Parallel tiled QR factorization for multicore architectures, Concurrency and Computation: Practice and Experience, vol.20, issue.13, 2008. ,
Versatile, scalable, and accurate simulation of distributed applications and platforms, Journal of Parallel and Distributed Computing, vol.74, issue.10, pp.2899-2917, 2014. ,
DOI : 10.1016/j.jpdc.2014.06.008
URL : https://hal.archives-ouvertes.fr/hal-01017319
SuperMatrix, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, PPoPP '08, pp.123-132, 2008. ,
DOI : 10.1145/1345206.1345227
Automatic task graph generation techniques, Proceedings of the Twenty-Eighth Hawaii International Conference on System Sciences, pp.113-122, 1995. ,
-Step Iteration Procedures, Journal of Mathematics and Physics, vol.34, issue.1, pp.64-73, 1955. ,
DOI : 10.1002/sapm195534164
URL : https://hal.archives-ouvertes.fr/hal-01308911
A column approximate minimum degree ordering algorithm, ACM Transactions on Mathematical Software, vol.30, issue.3, pp.353-376, 2004. ,
DOI : 10.1145/1024074.1024079
Direct Methods for Sparse Linear Systems, Society for Industrial and Applied Mathematics, 2006. ,
DOI : 10.1137/1.9780898718881
Algorithm 915, SuiteSparseQR, ACM Transactions on Mathematical Software, vol.38, issue.1, 2011. ,
DOI : 10.1145/2049662.2049670
The University of Florida Sparse Matrix Collection, ACM Transactions on Mathematical Software, vol.38, issue.1, pp.1-1, 2011. ,
DOI : 10.1145/2049662.2049663
Communication-optimal Parallel and Sequential QR and LU Factorizations, SIAM Journal on Scientific Computing, vol.34, issue.1, 2012. ,
DOI : 10.1137/080731992
URL : https://hal.archives-ouvertes.fr/hal-00870930
Een algorithme ter voorkoming van de dodelijke omarming [An algorithm for the prevention of the deadly embrace], circulated privately, 1965. ,
The Mathematics Behind the Banker's Algorithm, In: Selected Writings on Computing: A Personal Perspective, Texts and Monographs in Computer Science, pp.308-312, 1982. ,
Hierarchical QR factorization algorithms for multi-core clusters, Parallel Computing, vol.39, issue.4-5, 2013. ,
DOI : 10.1016/j.parco.2013.01.003
URL : https://hal.archives-ouvertes.fr/hal-00809770
Direct Methods for Sparse Matrices, 1986. ,
The Multifrontal Solution of Indefinite Sparse Symmetric Linear Equations, ACM Transactions on Mathematical Software, vol.9, issue.3, pp.302-325, 1983. ,
DOI : 10.1145/356044.356047
Parallel Scheduling of Task Trees with Limited Memory, ACM Transactions on Parallel Computing, vol.2, issue.2, pp.1-13, 2015. ,
DOI : 10.1145/2779052
URL : https://hal.archives-ouvertes.fr/hal-01160118
X-kaapi: A Multi Paradigm Runtime for Multicore Architectures, 2013 42nd International Conference on Parallel Processing, pp.728-735, 2013. ,
DOI : 10.1109/ICPP.2013.86
URL : https://hal.archives-ouvertes.fr/hal-00727827
Task scheduling for parallel sparse Cholesky factorization, International Journal of Parallel Programming, vol.18, issue.4, pp.291-314, 1989. ,
DOI : 10.1007/BF01407861
Nested Dissection of a Regular Finite Element Mesh, SIAM Journal on Numerical Analysis, vol.10, issue.2, pp.345-363, 1973. ,
DOI : 10.1137/0710032
Computation of Plane Unitary Rotations Transforming a General Matrix to Triangular Form, Journal of the Society for Industrial and Applied Mathematics, vol.6, issue.1, 1958. ,
Numerical methods for solving linear least squares problems, Numerische Mathematik, pp.206-216, 1965. ,
DOI : 10.1007/BF01436075
Impact of reordering on the memory of a multifrontal solver, Parallel Computing, vol.29, issue.9, pp.1191-1218, 2003. ,
DOI : 10.1016/S0167-8191(03)00099-1
URL : https://hal.archives-ouvertes.fr/hal-00807378
Tile QR factorization with parallel panel processing for multicore architectures, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp.1-10, 2010. ,
DOI : 10.1109/IPDPS.2010.5470443
URL : https://hal.archives-ouvertes.fr/inria-00548899
Improving performance of adaptive component-based dataflow middleware, Parallel Computing, vol.38, issue.6-7, 2012. ,
DOI : 10.1016/j.parco.2012.03.005
PaStiX: a high-performance parallel direct solver for sparse symmetric positive definite systems, Parallel Computing, vol.28, issue.2, pp.301-321, 2002. ,
DOI : 10.1016/S0167-8191(01)00141-7
Power Efficient Processor Architecture and The Cell Processor, 11th International Symposium on High-Performance Computer Architecture, pp.258-262, 2005. ,
DOI : 10.1109/HPCA.2005.26
An indefinite sparse direct solver for large problems on multicore machines, 2010. ,
A Sparse Symmetric Indefinite Direct Solver for GPU Architectures, ACM Transactions on Mathematical Software, vol.42, issue.1, 2016. ,
DOI : 10.1145/2756548
Design of a Multicore Sparse Cholesky Factorization Using DAGs, SIAM Journal on Scientific Computing, vol.32, issue.6, pp.3627-3649, 2010. ,
DOI : 10.1137/090757216
Unitary Triangularization of a Nonsymmetric Matrix, Journal of the ACM, vol.5, issue.4, pp.339-342, 1958. ,
DOI : 10.1145/320941.320947
URL : https://hal.archives-ouvertes.fr/hal-01316095
A Runtime Approach to Dynamic Resource Allocation for Sparse Direct Solvers, 2014 43rd International Conference on Parallel Processing, pp.481-490, 2014. ,
DOI : 10.1109/ICPP.2014.57
URL : https://hal.archives-ouvertes.fr/hal-01101054
Composing multiple StarPU applications over heterogeneous machines: A supervised approach, International Journal of High Performance Computing Applications, vol.28, issue.3, pp.285-300, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-00824514
Composability of parallel codes on heterogeneous architectures, PhD thesis. ,
URL : https://hal.archives-ouvertes.fr/tel-01162975
The FLAME approach: From dense linear algebra algorithms to high-performance multi-accelerator implementations, Journal of Parallel and Distributed Computing, vol.72, issue.9, pp.1134-1143, 2012. ,
DOI : 10.1016/j.jpdc.2011.10.014
On Optimal Tree Traversals for Sparse Matrix Factorization, 2011 IEEE International Parallel & Distributed Processing Symposium, 2011. ,
DOI : 10.1109/IPDPS.2011.60
URL : https://hal.archives-ouvertes.fr/hal-00945078
CHARM++: A Portable Concurrent Object Oriented System Based On C++, pp.91-108, 1993. ,
A Parallel Sparse Direct Solver via Hierarchical DAG Scheduling, ACM Transactions on Mathematical Software, vol.41, issue.1, pp.1-3, 2014. ,
DOI : 10.1145/2629641
Scheduling a Parallel Sparse Direct Solver to Multiple GPUs, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum, pp.1401-1408, 2013. ,
DOI : 10.1109/IPDPSW.2013.26
Programming Heterogeneous Clusters with Accelerators Using Object-Based Programming, Scientific Programming, vol.19, issue.1, pp.47-62, 2011. ,
DOI : 10.1155/2011/525717
Fully Dynamic Scheduler for Numerical Computing on Multicore Processors, LAPACK Working Note 220, 2009. ,
Scheduling and memory optimizations for sparse direct solver on multicore/multi-gpu cluster systems ,
Multifrontal methods for large sparse systems of linear equations: parallelism, memory usage, performance optimization and numerical issues, 2012. ,
An Application of Generalized Tree Pebbling to Sparse Matrix Factorization, SIAM Journal on Algebraic Discrete Methods, vol.8, issue.3, 1987. ,
DOI : 10.1137/0608031
On the storage requirement in the out-of-core multifrontal method for sparse factorization, ACM Transactions on Mathematical Software, vol.12, issue.3, pp.127-148, 1986. ,
DOI : 10.1145/7921.11325
The Role of Elimination Trees in Sparse Factorization, SIAM Journal on Matrix Analysis and Applications, vol.11, issue.1, pp.134-172, 1990. ,
DOI : 10.1137/0611010
A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators, pp.93-101, 2010. ,
DOI : 10.1007/978-3-642-03869-3_79
Multifrontal Computations on GPUs and Their Multi-core Hosts, Proceedings of the 9th international conference on High performance computing for computational science, VECPAR'10, pp.71-82, 2011. ,
DOI : 10.1016/0167-8191(86)90019-0
Qilin, Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, Micro-42, pp.45-55, 2009. ,
DOI : 10.1145/1669112.1669121
STREAM: Sustainable Memory Bandwidth in High Performance Computers. Tech. rep, 1991. ,
Accelerating GPU Kernels for Dense Linear Algebra, pp.83-92, 2010. ,
DOI : 10.1007/978-3-642-01970-8_89
An Improved Magma Gemm For Fermi Graphics Processing Units, The International Journal of High Performance Computing Applications, vol.24, issue.4, pp.511-515, 2010. ,
DOI : 10.1177/1094342010385729
LSQR: An Algorithm for Sparse Linear Equations and Sparse Least Squares, ACM Transactions on Mathematical Software, vol.8, issue.1, 1982. ,
DOI : 10.1145/355984.355989
A Mapping Algorithm for Parallel Sparse Cholesky Factorization, SIAM Journal on Scientific Computing, vol.14, issue.5, pp.1253-1253, 1993. ,
DOI : 10.1137/0914074
Solving dense linear systems on platforms with multiple hardware accelerators, pp.121-130, 2009. ,
Programming matrix algorithms-by-blocks for thread-level parallelism, ACM Transactions on Mathematical Software, vol.36, issue.3, 2009. ,
DOI : 10.1145/1527286.1527288
Intel Threading Building Blocks: Outfitting C++ for Multi-Core Processor Parallelism, 2007. ,
Accelerating Sparse Cholesky Factorization on GPUs, Proceedings of the Fourth Workshop on Irregular Applications: Architectures and Algorithms, IA3 '14, 2014. ,
Memory and performance issues in parallel multifrontal factorizations and triangular solutions with sparse right-hand sides, PhD thesis (in English), 2012. ,
URL : https://hal.archives-ouvertes.fr/tel-00785748
A Sparse Direct Solver for Distributed Memory Xeon Phi-Accelerated Systems, 2015 IEEE International Parallel and Distributed Processing Symposium, pp.71-81, 2015. ,
DOI : 10.1109/IPDPS.2015.104
A Distributed CPU-GPU Sparse Direct Solver, Euro-Par 2014 Parallel Processing, pp.487-498, 2014. ,
DOI : 10.1007/978-3-319-09873-9_41
Über die Auflösung linearer Gleichungen mit unendlich vielen Unbekannten, pp.53-77, 1884. ,
A Storage-Efficient $WY$ Representation for Products of Householder Transformations, SIAM Journal on Scientific and Statistical Computing, vol.10, issue.1, pp.52-57, 1989. ,
DOI : 10.1137/0910005
A New Implementation of Sparse Gaussian Elimination, ACM Transactions on Mathematical Software, vol.8, issue.3, pp.256-276, 1982. ,
DOI : 10.1145/356004.356006
Complete Register Allocation Problems, Proceedings of the Fifth Annual ACM Symposium on Theory of Computing. STOC '73, pp.182-195, 1973. ,
The Generation of Optimal Code for Arithmetic Expressions, Journal of the ACM, vol.17, issue.4, 1970. ,
DOI : 10.1145/321607.321620
Scalable Tile Communication-Avoiding QR Factorization on Multicore Cluster Systems, Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC '10, 2010. ,
Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, p.9, 2009. ,
DOI : 10.1145/1654059.1654079
Faithful performance prediction of a dynamic task-based runtime system for heterogeneous multi-core architectures, Concurrency and Computation: Practice and Experience, 2015. ,
DOI : 10.1002/cpe.3555
URL : https://hal.archives-ouvertes.fr/hal-01147997
Modeling and Simulation of a Dynamic Task-Based Runtime System for Heterogeneous Multicore Architectures, Euro-Par 2014 Parallel Processing, 20th International Conference, Proceedings, pp.50-62, 2014. ,
URL : https://hal.archives-ouvertes.fr/hal-01011633
Towards dense linear algebra for hybrid GPU accelerated manycore systems, Parallel Computing, vol.36, issue.5-6, 2010. ,
DOI : 10.1016/j.parco.2009.12.005
Dense linear algebra solvers for multicore with GPU accelerators, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), pp.1-8, 2010. ,
DOI : 10.1109/IPDPSW.2010.5470941
Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, pp.260-274, 2002. ,
DOI : 10.1109/71.993206
Realm, Proceedings of the 23rd international conference on Parallel architectures and compilation, PACT '14, pp.263-276, 2014. ,
DOI : 10.1145/2628071.2628084
Introducing: The Libflame Library for Dense Matrix Computations, Computing in Science & Engineering, vol.99, 2009. ,
DOI : 10.1109/MCSE.2009.154
Benchmarking GPUs to tune dense linear algebra, 2008 SC, International Conference for High Performance Computing, Networking, Storage and Analysis, pp.8-31, 2008. ,
DOI : 10.1109/SC.2008.5214359
Roofline, Communications of the ACM, vol.52, issue.4, 2009. ,
DOI : 10.1145/1498765.1498785
Hierarchical DAG Scheduling for Hybrid Distributed Systems, 2015 IEEE International Parallel and Distributed Processing Symposium, pp.156-165, 2015. ,
DOI : 10.1109/IPDPS.2015.56
URL : https://hal.archives-ouvertes.fr/hal-01078359
QUARK Users' Guide: QUeueing And Runtime for Kernels, 2011. ,
A CPU-GPU hybrid approach for the unsymmetric multifrontal method, Parallel Computing, vol.37, issue.12, pp.759-770, 2011. ,
DOI : 10.1016/j.parco.2011.09.002
Supernodal sparse Cholesky factorization on graphics processing units, Concurrency and Computation: Practice and Experience, vol.26, 2014. ,