E. Agullo, A. Buttari, A. Guermouche, and F. Lopez, Implementing Multifrontal Sparse Solvers for Multicore Architectures with Sequential Task Flow Runtime Systems, submitted article, 2014.
DOI : 10.1145/2898348

URL : https://hal.archives-ouvertes.fr/hal-01333645

E. Agullo, A. Buttari, A. Guermouche, and F. Lopez, Multifrontal QR Factorization for Multicore Architectures over Runtime Systems, pp.521-532, 2013.
DOI : 10.1007/978-3-642-40047-6_53

URL : https://hal.archives-ouvertes.fr/hal-01220611

E. Agullo, A. Buttari, A. Guermouche, and F. Lopez, Task-Based Multifrontal QR Solver for GPU-Accelerated Multicore Architectures, 2015 IEEE 22nd International Conference on High Performance Computing (HiPC), 2015.
DOI : 10.1109/HiPC.2015.27

URL : https://hal.archives-ouvertes.fr/hal-01166312

E. Agullo, Matrices Over Runtime Systems at Exascale, Poster at the SuperComputing conference, 2015.

E. Agullo, A. Buttari, A. Guermouche, and F. Lopez, Sparse direct solvers on top of a runtime system, Presentation at the PMAA 2014 international conference, 2014.

E. Agullo, A. Buttari, A. Guermouche, and F. Lopez, Sparse direct solvers on top of a runtime system, Presentation at the SIAM Computational Science and Engineering international conference, 2015.

A. Decollas and F. Lopez, Direct methods on GPU-based systems, preliminary work towards a functioning code. Presentation at the Sparse Days workshop, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00727020

E. Agullo, On the out-of-core factorization of large sparse matrices, 2008.
URL : https://hal.archives-ouvertes.fr/tel-00563463

E. Agullo, P. R. Amestoy, A. Buttari, A. Guermouche, J.-Y. L'Excellent et al., Robust Memory-Aware Mappings for Parallel Multifrontal Factorizations, SIAM Journal on Scientific Computing, vol.38, issue.3, 2016.
DOI : 10.1137/130938505

URL : https://hal.archives-ouvertes.fr/hal-00726644

E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, J. Langou et al., LU factorization for accelerator-based systems, 2011 9th IEEE/ACS International Conference on Computer Systems and Applications (AICCSA)
DOI : 10.1109/AICCSA.2011.6126599

URL : https://hal.archives-ouvertes.fr/hal-00654193

E. Agullo, C. Augonnet, J. Dongarra, M. Faverge, H. Ltaief et al., QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, 2011 IEEE International Parallel & Distributed Processing Symposium, pp.932-943, 2011.
DOI : 10.1109/IPDPS.2011.90

URL : https://hal.archives-ouvertes.fr/inria-00547614

E. Agullo, C. Augonnet, J. Dongarra, H. Ltaief, R. Namyst et al., A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs, GPU Computing Gems, pp.473-484, 2011.
DOI : 10.1016/B978-0-12-385963-1.00034-4

E. Agullo, O. Beaumont, L. Eyraud-dubois, J. Herrmann, S. Kumar et al., Bridging the Gap between Performance and Bounds of Cholesky Factorization on Heterogeneous Platforms, 2015 IEEE International Parallel and Distributed Processing Symposium Workshop, 2015.
DOI : 10.1109/IPDPSW.2015.35

URL : https://hal.archives-ouvertes.fr/hal-01120507

E. Agullo, B. Bramas, O. Coulaud, E. Darve, M. Messner et al., Task-based FMM for heterogeneous architectures, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00974674

E. Agullo, C. Coti, J. Dongarra, T. Herault, and J. Langou, QR factorization of tall and skinny matrices in a grid computing environment, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp.1-11, 2010.
DOI : 10.1109/IPDPS.2010.5470475

URL : https://hal.archives-ouvertes.fr/inria-00548900

E. Agullo, J. Demmel, J. Dongarra, B. Hadri, J. Kurzak et al., Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, vol.180, 2009.
DOI : 10.1088/1742-6596/180/1/012037

E. Agullo, J. Dongarra, R. Nath, and S. Tomov, A Fully Empirical Autotuned Dense QR Factorization for Multicore Architectures, arXiv:1102.5328, 2011.
DOI : 10.1007/978-3-642-23397-5_19

URL : https://hal.archives-ouvertes.fr/hal-00726654

R. Allen and K. Kennedy, Optimizing Compilers for Modern Architectures: A Dependence-Based Approach, 2002.

P. R. Amestoy, T. A. Davis, and I. S. Duff, Algorithm 837: AMD, an approximate minimum degree ordering algorithm, ACM Transactions on Mathematical Software, vol.30, issue.3, pp.381-388, 2004.
DOI : 10.1145/1024074.1024081

P. R. Amestoy, I. S. Duff, J. Koster, J.-Y. L'Excellent, F. Manne et al., MUMPS: A General Purpose Distributed Memory Sparse Solver, Proceedings of PARA2000, the Fifth International Workshop on Applied Parallel Computing, Lecture Notes in Computer Science, vol.1947, pp.122-131, 2001.
DOI : 10.1007/3-540-70734-4_16

URL : https://hal.archives-ouvertes.fr/hal-00856652

P. R. Amestoy, I. S. Duff, and C. Puglisi, Multifrontal QR Factorization in a Multiprocessor Environment, Numerical Linear Algebra with Applications, vol.3, issue.4, pp.275-300, 1996.
DOI : 10.1002/(SICI)1099-1506(199607/08)3:4<275::AID-NLA83>3.0.CO;2-7

M. Anderson, G. Ballard, J. Demmel, and K. Keutzer, Communication-Avoiding QR Decomposition for GPUs, 2011 IEEE International Parallel & Distributed Processing Symposium, pp.48-58, 2011.
DOI : 10.1109/IPDPS.2011.15

K. Asanovic, The Landscape of Parallel Computing Research: A View from Berkeley, 2006.

C. Augonnet, S. Thibault, R. Namyst, and P. Wacrenier, StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures, Concurrency and Computation: Practice and Experience, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00384363

E. Ayguadé, R. M. Badia, F. D. Igual, J. Labarta, R. Mayo et al., An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, pp.851-862, 2009.
DOI : 10.1109/TPDS.2003.1214317

R. M. Badia, J. R. Herrero, J. Labarta, J. M. Pérez, E. S. Quintana-ortí et al., Parallelizing dense and banded linear algebra libraries using SMPSs, Concurrency and Computation: Practice and Experience, vol.21, issue.18, p.2438, 2009.
DOI : 10.1002/cpe.1463

G. Ballard, E. Carson, J. Demmel, M. Hoemmen, N. Knight et al., Communication lower bounds and optimal algorithms for numerical linear algebra, Acta Numerica, vol.23, 2014.

G. Ballard, J. Demmel, O. Holtz, and O. Schwartz, Minimizing Communication in Numerical Linear Algebra, SIAM Journal on Matrix Analysis and Applications, vol.32, issue.3, 2011.
DOI : 10.1137/090769156

Å. Björck, Numerical Methods for Least Squares Problems, Philadelphia: SIAM, 1996.
DOI : 10.1137/1.9781611971484

G. Bosilca, A. Bouteiller, A. Danalis, M. Faverge, T. Hérault et al., PaRSEC: Exploiting Heterogeneity to Enhance Scalability, Computing in Science & Engineering, vol.15, issue.6, pp.36-45, 2013.
DOI : 10.1109/MCSE.2013.98

G. Bosilca, A. Bouteiller, A. Danalis, T. Hérault, P. Lemarinier et al., DAGuE: A generic distributed DAG engine for High Performance Computing

G. Bosilca, A. Bouteiller, A. Danalis, T. Herault, P. Luszczek et al., Dense Linear Algebra on Distributed Heterogeneous Hardware with a Symbolic DAG Approach, Scalable Computing and Communications: Theory and Practice, pp.699-733, 2013.

G. Bosilca, Distributed Dense Numerical Linear Algebra Algorithms on massively parallel architectures: DPLASMA, Proceedings of the 25th IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum (IPDPSW'11), PDSEC 2011, Anchorage, United States, pp.1432-1441, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00809680

H. Bouwmeester, M. Jacquelin, J. Langou, and Y. Robert, Tiled QR Factorization Algorithms, Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, 2011.

F. Broquedis, J. Clet-ortega, S. Moreaud, N. Furmento, B. Goglin et al., hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, pp.180-186, 2010.
DOI : 10.1109/PDP.2010.67

URL : https://hal.archives-ouvertes.fr/inria-00429889

A. Buttari, Fine granularity sparse QR factorization for multicore based systems, Proceedings of the 10th international conference on Applied Parallel and Scientific Computing, PARA'10, pp.226-236, 2012.

A. Buttari, Fine-Grained Multithreading for the Multifrontal QR Factorization of Sparse Matrices, SIAM Journal on Scientific Computing, vol.35, issue.4, pp.323-345, 2013.
DOI : 10.1137/110846427

URL : https://hal.archives-ouvertes.fr/hal-01122471

A. Buttari, J. Dongarra, J. Kurzak, J. Langou, P. Luszczek et al., The Impact of Multicore on Math Software, Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing. PARA'06, 2007.
DOI : 10.1007/978-3-540-75755-9_1

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, A class of parallel tiled linear algebra algorithms for multicore architectures, Parallel Computing, vol.35, issue.1, pp.38-53, 2009.
DOI : 10.1016/j.parco.2008.10.002

A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, Parallel tiled QR factorization for multicore architectures, Concurrency and Computation: Practice and Experience, vol.20, issue.13, 2008.

H. Casanova, A. Giersch, A. Legrand, M. Quinson, and F. Suter, Versatile, scalable, and accurate simulation of distributed applications and platforms, Journal of Parallel and Distributed Computing, vol.74, issue.10, pp.2899-2917, 2014.
DOI : 10.1016/j.jpdc.2014.06.008

URL : https://hal.archives-ouvertes.fr/hal-01017319

E. Chan, F. G. Van-zee, P. Bientinesi, E. S. Quintana-ortí, G. Quintana-ortí et al., SuperMatrix: a multithreaded runtime scheduling system for algorithms-by-blocks, Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, PPoPP '08, pp.123-132, 2008.
DOI : 10.1145/1345206.1345227

M. Cosnard and M. Loi, Automatic task graph generation techniques, Proceedings of the Twenty-Eighth Hawaii International Conference on System Sciences, pp.113-122, 1995.

E. J. Craig, The N-Step Iteration Procedures, Journal of Mathematics and Physics, vol.34, issue.1, pp.64-73, 1955.
DOI : 10.1002/sapm195534164

URL : https://hal.archives-ouvertes.fr/hal-01308911

T. A. Davis, J. R. Gilbert, S. I. Larimore, and E. G. Ng, A column approximate minimum degree ordering algorithm, ACM Transactions on Mathematical Software, vol.30, issue.3, pp.353-376, 2004.
DOI : 10.1145/1024074.1024079

T. Davis, Direct Methods for Sparse Linear Systems, Society for Industrial and Applied Mathematics, 2006.
DOI : 10.1137/1.9780898718881

T. A. Davis, Algorithm 915, SuiteSparseQR: Multifrontal multithreaded rank-revealing sparse QR factorization, ACM Transactions on Mathematical Software, vol.38, issue.1, 2011.
DOI : 10.1145/2049662.2049670

T. A. Davis and Y. Hu, The University of Florida sparse matrix collection, ACM Transactions on Mathematical Software, vol.38, issue.1, pp.1-1, 2011.
DOI : 10.1145/2049662.2049663

J. Demmel, L. Grigori, M. Hoemmen, and J. Langou, Communication-optimal Parallel and Sequential QR and LU Factorizations, SIAM Journal on Scientific Computing, vol.34, issue.1, 2012.
DOI : 10.1137/080731992

URL : https://hal.archives-ouvertes.fr/hal-00870930

E. W. Dijkstra, Een algorithme ter voorkoming van de dodelijke omarming, circulated privately, 1965.

E. W. Dijkstra, The Mathematics Behind the Banker's Algorithm, in: Selected Writings on Computing: A Personal Perspective, Texts and Monographs in Computer Science, pp.308-312, 1982.

J. Dongarra, M. Faverge, T. Hérault, M. Jacquelin, J. Langou et al., Hierarchical QR factorization algorithms for multi-core clusters, Parallel Computing, vol.39, issue.4-5, 2013.
DOI : 10.1016/j.parco.2013.01.003

URL : https://hal.archives-ouvertes.fr/hal-00809770

I. S. Duff, A. M. Erisman, and J. K. Reid, Direct Methods for Sparse Matrices, 1986.

I. S. Duff and J. K. Reid, The Multifrontal Solution of Indefinite Sparse Symmetric Linear Equations, ACM Transactions on Mathematical Software, vol.9, issue.3, pp.302-325, 1983.
DOI : 10.1145/356044.356047

L. Eyraud-dubois, L. Marchal, O. Sinnen, and F. Vivien, Parallel Scheduling of Task Trees with Limited Memory, ACM Transactions on Parallel Computing, vol.2, issue.2, pp.1-13, 2015.
DOI : 10.1145/2779052

URL : https://hal.archives-ouvertes.fr/hal-01160118

L. Eyraud-dubois, L. Marchal, O. Sinnen, and F. Vivien, Parallel Scheduling of Task Trees with Limited Memory, ACM Transactions on Parallel Computing, vol.2, issue.2, 2014.
DOI : 10.1145/2779052

URL : https://hal.archives-ouvertes.fr/hal-01160118

T. Gautier, F. Le-mentec, V. Faucher, and B. Raffin, X-kaapi: A Multi Paradigm Runtime for Multicore Architectures, 2013 42nd International Conference on Parallel Processing, pp.728-735, 2013.
DOI : 10.1109/ICPP.2013.86

URL : https://hal.archives-ouvertes.fr/hal-00727827

A. Geist and E. G. Ng, Task scheduling for parallel sparse Cholesky factorization, International Journal of Parallel Programming, vol.18, issue.4, pp.291-314, 1989.
DOI : 10.1007/BF01407861

A. J. George, Nested Dissection of a Regular Finite Element Mesh, SIAM Journal on Numerical Analysis, vol.10, issue.2, pp.345-363, 1973.
DOI : 10.1137/0710032

W. Givens, Computation of Plane Unitary Rotations Transforming a General Matrix to Triangular Form, Journal of the Society for Industrial and Applied Mathematics, vol.6, issue.1, 1958.

G. Golub, Numerical methods for solving linear least squares problems, Numerische Mathematik, pp.206-216, 1965.
DOI : 10.1007/BF01436075

A. Guermouche, J.-Y. L'Excellent, and G. Utard, Impact of reordering on the memory of a multifrontal solver, Parallel Computing, vol.29, issue.9, pp.1191-1218, 2003.
DOI : 10.1016/S0167-8191(03)00099-1

URL : https://hal.archives-ouvertes.fr/hal-00807378

B. Hadri, H. Ltaief, E. Agullo, and J. Dongarra, Tile QR factorization with parallel panel processing for multicore architectures, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pp.1-10, 2010.
DOI : 10.1109/IPDPS.2010.5470443

URL : https://hal.archives-ouvertes.fr/inria-00548899

T. D. Hartley and E. V. Saule, Improving performance of adaptive component-based dataflow middleware, Parallel Computing, vol.38, issue.6-7, 2012.
DOI : 10.1016/j.parco.2012.03.005

P. Hénon, P. Ramet, and J. Roman, PaStiX: a high-performance parallel direct solver for sparse symmetric positive definite systems, Parallel Computing, vol.28, issue.2, pp.301-321, 2002.
DOI : 10.1016/S0167-8191(01)00141-7

H. P. Hofstee, Power Efficient Processor Architecture and The Cell Processor, 11th International Symposium on High-Performance Computer Architecture, pp.258-262, 2005.
DOI : 10.1109/HPCA.2005.26

J. D. Hogg and J. A. Scott, An indefinite sparse direct solver for large problems on multicore machines, 2010.

J. Hogg, E. Ovtchinnikov, and J. Scott, A Sparse Symmetric Indefinite Direct Solver for GPU Architectures, ACM Transactions on Mathematical Software, vol.42, issue.1, 2016.
DOI : 10.1145/2756548

J. Hogg, J. K. Reid, and J. A. Scott, Design of a Multicore Sparse Cholesky Factorization Using DAGs, SIAM Journal on Scientific Computing, vol.32, issue.6, pp.3627-3649, 2010.
DOI : 10.1137/090757216

A. S. Householder, Unitary Triangularization of a Nonsymmetric Matrix, Journal of the ACM, vol.5, issue.4, pp.339-342, 1958.
DOI : 10.1145/320941.320947

URL : https://hal.archives-ouvertes.fr/hal-01316095

A. Hugo, A. Guermouche, P. Wacrenier, and R. Namyst, A Runtime Approach to Dynamic Resource Allocation for Sparse Direct Solvers, 2014 43rd International Conference on Parallel Processing, pp.481-490, 2014.
DOI : 10.1109/ICPP.2014.57

URL : https://hal.archives-ouvertes.fr/hal-01101054

A. Hugo, A. Guermouche, P. Wacrenier, and R. Namyst, Composing multiple StarPU applications over heterogeneous machines: A supervised approach, International Journal of High Performance Computing Applications, vol.28, issue.3, pp.285-300, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00824514

A. Hugo, Composability of parallel codes on heterogeneous architectures, PhD thesis.
URL : https://hal.archives-ouvertes.fr/tel-01162975

F. D. Igual, E. Chan, E. S. Quintana-ortí, G. Quintana-ortí, R. A. Van-de-geijn et al., The FLAME approach: From dense linear algebra algorithms to high-performance multi-accelerator implementations, Journal of Parallel and Distributed Computing, vol.72, issue.9, pp.1134-1143, 2012.
DOI : 10.1016/j.jpdc.2011.10.014

M. Jacquelin, L. Marchal, Y. Robert, and B. Uçar, On Optimal Tree Traversals for Sparse Matrix Factorization, 2011 IEEE International Parallel & Distributed Processing Symposium, 2011.
DOI : 10.1109/IPDPS.2011.60

URL : https://hal.archives-ouvertes.fr/hal-00945078

L. V. Kalé and S. Krishnan, CHARM++: A Portable Concurrent Object Oriented System Based On C++, pp.91-108, 1993.

K. Kim and V. Eijkhout, A Parallel Sparse Direct Solver via Hierarchical DAG Scheduling, ACM Transactions on Mathematical Software, vol.41, issue.1, pp.1-3, 2014.
DOI : 10.1145/2629641

K. Kim and V. Eijkhout, Scheduling a Parallel Sparse Direct Solver to Multiple GPUs, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum, pp.1401-1408, 2013.
DOI : 10.1109/IPDPSW.2013.26

D. M. Kunzman and L. V. Kalé, Programming Heterogeneous Clusters with Accelerators Using Object-Based Programming, Scientific Programming, vol.19, issue.1, pp.47-62, 2011.
DOI : 10.1155/2011/525717

J. Kurzak and J. Dongarra, Fully Dynamic Scheduler for Numerical Computing on Multicore Processors, LAPACK working note, p.220, 2009.

X. Lacoste, Scheduling and memory optimizations for sparse direct solver on multicore/multi-gpu cluster systems

J.-Y. L'Excellent, Multifrontal methods for large sparse systems of linear equations: parallelism, memory usage, performance optimization and numerical issues, 2012.

J. W. Liu, An Application of Generalized Tree Pebbling to Sparse Matrix Factorization, SIAM Journal on Algebraic Discrete Methods, vol.8, issue.3, 1987.
DOI : 10.1137/0608031

J. W. Liu, On the storage requirement in the out-of-core multifrontal method for sparse factorization, ACM Transactions on Mathematical Software, vol.12, issue.3, pp.127-148, 1986.
DOI : 10.1145/7921.11325

J. W. Liu, The Role of Elimination Trees in Sparse Factorization, SIAM Journal on Matrix Analysis and Applications, vol.11, issue.1, pp.134-172, 1990.
DOI : 10.1137/0611010

H. Ltaief, S. Tomov, R. Nath, P. Du, and J. Dongarra, A Scalable High Performant Cholesky Factorization for Multicore with GPU Accelerators, pp.93-101, 2010.
DOI : 10.1007/978-3-642-03869-3_79

R. F. Lucas, G. Wagenbreth, D. M. Davis, and R. Grimes, Multifrontal Computations on GPUs and Their Multi-core Hosts, Proceedings of the 9th international conference on High performance computing for computational science, VECPAR'10, pp.71-82, 2011.
DOI : 10.1016/0167-8191(86)90019-0

C. Luk, S. Hong, and H. Kim, Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping, Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, Micro-42, pp.45-55, 2009.
DOI : 10.1145/1669112.1669121

J. D. McCalpin, STREAM: Sustainable Memory Bandwidth in High Performance Computers, Tech. rep., 1991.

R. Nath, S. Tomov, and J. Dongarra, Accelerating GPU Kernels for Dense Linear Algebra, pp.83-92, 2010.
DOI : 10.1007/978-3-642-01970-8_89

R. Nath, S. Tomov, and J. Dongarra, An Improved Magma Gemm For Fermi Graphics Processing Units, The International Journal of High Performance Computing Applications, vol.27, issue.1, pp.511-515, 2010.
DOI : 10.1177/1094342010385729

C. C. Paige and M. A. Saunders, LSQR: An Algorithm for Sparse Linear Equations and Sparse Least Squares, ACM Transactions on Mathematical Software, vol.8, issue.1, 1982.
DOI : 10.1145/355984.355989

A. Pothen and C. Sun, A Mapping Algorithm for Parallel Sparse Cholesky Factorization, SIAM Journal on Scientific Computing, vol.14, issue.5, pp.1253-1253, 1993.
DOI : 10.1137/0914074

G. Quintana-ortí, F. D. Igual, E. S. Quintana-ortí, and R. A. Van-de-geijn, Solving dense linear systems on platforms with multiple hardware accelerators, pp.121-130, 2009.

G. Quintana-ortí, E. S. Quintana-ortí, R. A. Van-de-geijn, F. G. Van-zee, and E. Chan, Programming matrix algorithms-by-blocks for thread-level parallelism, ACM Transactions on Mathematical Software, vol.36, issue.3, 2009.
DOI : 10.1145/1527286.1527288

J. Reinders, Intel Threading Building Blocks: Outfitting C++ for Multi-Core Processor Parallelism, 2007.

S. C. Rennich, D. Stosic, and T. A. Davis, Accelerating Sparse Cholesky Factorization on GPUs, Proceedings of the Fourth Workshop on Irregular Applications: Architectures and Algorithms, IA3 '14, 2014.

F. Rouet, Memory and performance issues in parallel multifrontal factorizations and triangular solutions with sparse right-hand sides, PhD thesis, 2012.
URL : https://hal.archives-ouvertes.fr/tel-00785748

P. Sao, X. Liu, R. Vuduc, and X. Li, A Sparse Direct Solver for Distributed Memory Xeon Phi-Accelerated Systems, 2015 IEEE International Parallel and Distributed Processing Symposium, pp.71-81, 2015.
DOI : 10.1109/IPDPS.2015.104

P. Sao, R. W. Vuduc, and X. S. Li, A Distributed CPU-GPU Sparse Direct Solver, Euro-Par 2014 Parallel Processing, pp.487-498, 2014.
DOI : 10.1007/978-3-319-09873-9_41

E. Schmidt, Über die Auflösung linearer Gleichungen mit unendlich vielen Unbekannten, pp.53-77, 1908.

R. Schreiber and C. Van-loan, A Storage-Efficient WY Representation for Products of Householder Transformations, SIAM Journal on Scientific and Statistical Computing, vol.10, issue.1, pp.52-57, 1989.
DOI : 10.1137/0910005

R. Schreiber, A New Implementation of Sparse Gaussian Elimination, ACM Transactions on Mathematical Software, vol.8, issue.3, pp.256-276, 1982.
DOI : 10.1145/356004.356006

R. Sethi, Complete Register Allocation Problems, Proceedings of the Fifth Annual ACM Symposium on Theory of Computing. STOC '73, pp.182-195, 1973.

R. Sethi and J. D. Ullman, The Generation of Optimal Code for Arithmetic Expressions, Journal of the ACM, vol.17, issue.4, 1970.
DOI : 10.1145/321607.321620

F. Song, H. Ltaief, B. Hadri, and J. Dongarra, Scalable Tile Communication-Avoiding QR Factorization on Multicore Cluster Systems, Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC '10, 2010.

F. Song, A. Yarkhan, and J. Dongarra, Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems, Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, SC '09, p.9, 2009.
DOI : 10.1145/1654059.1654079

L. Stanisic, S. Thibault, A. Legrand, B. Videau, and J. Méhaut, Faithful performance prediction of a dynamic task-based runtime system for heterogeneous multi-core architectures, Concurrency and Computation: Practice and Experience, 2015.
DOI : 10.1002/cpe.3555

URL : https://hal.archives-ouvertes.fr/hal-01147997

L. Stanisic, S. Thibault, A. Legrand, B. Videau, and J. Méhaut, Modeling and Simulation of a Dynamic Task-Based Runtime System for Heterogeneous Multicore Architectures, Euro-Par 2014 Parallel Processing, 20th International Conference Proceedings, pp.50-62, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01011633

S. Tomov, J. Dongarra, and M. Baboulin, Towards dense linear algebra for hybrid GPU accelerated manycore systems, Parallel Computing, vol.36, issue.5-6, 2010.
DOI : 10.1016/j.parco.2009.12.005

S. Tomov, R. Nath, H. Ltaief, and J. Dongarra, Dense linear algebra solvers for multicore with GPU accelerators, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), pp.1-8, 2010.
DOI : 10.1109/IPDPSW.2010.5470941

H. Topcuoglu, S. Hariri, and M. Wu, Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, pp.260-274, 2002.
DOI : 10.1109/71.993206

S. Treichler, M. Bauer, and A. Aiken, Realm: an event-based low-level runtime for distributed memory architectures, Proceedings of the 23rd international conference on Parallel architectures and compilation, PACT '14, pp.263-276, 2014.
DOI : 10.1145/2628071.2628084

F. Van-zee, E. Chan, R. Van-de-geijn, E. Quintana, and G. Quintana-orti, Introducing: The Libflame Library for Dense Matrix Computations, Computing in Science & Engineering, vol.99, 2009.
DOI : 10.1109/MCSE.2009.154

V. Volkov and J. Demmel, Benchmarking GPUs to tune dense linear algebra, 2008 SC, International Conference for High Performance Computing, Networking, Storage and Analysis, pp.8-31, 2008.
DOI : 10.1109/SC.2008.5214359

S. Williams, A. Waterman, and D. Patterson, Roofline: an insightful visual performance model for multicore architectures, Communications of the ACM, vol.52, issue.4, 2009.
DOI : 10.1145/1498765.1498785

W. Wu, A. Bouteiller, G. Bosilca, M. Faverge, and J. Dongarra, Hierarchical DAG Scheduling for Hybrid Distributed Systems, 2015 IEEE International Parallel and Distributed Processing Symposium, pp.156-165, 2015.
DOI : 10.1109/IPDPS.2015.56

URL : https://hal.archives-ouvertes.fr/hal-01078359

A. Yarkhan, J. Kurzak, and J. Dongarra, QUARK Users' Guide: QUeueing And Runtime for Kernels, 2011.

C. D. Yu, W. Wang, and D. Pierce, A CPU-GPU hybrid approach for the unsymmetric multifrontal method, Parallel Computing, vol.37, issue.12, pp.759-770, 2011.
DOI : 10.1016/j.parco.2011.09.002

D. Zou, Y. Dou, S. Guo, R. Li, and L. Deng, Supernodal sparse Cholesky factorization on graphics processing units, Concurrency and Computation: Practice and Experience, vol.26, 2014.