Structured Mixture of Linear Mappings in High Dimension - INRIA - Institut National de Recherche en Informatique et en Automatique
Preprint, Working Paper. Year: 2018

Structured Mixture of Linear Mappings in High Dimension

Abstract

When analyzing data with complex structures such as high dimensionality and non-linearity, one often needs sophisticated models to capture the intrinsic complexity. However, implementing these models in practice can be difficult. Striking a balance between parsimony and model flexibility is essential to tackle data complexity while maintaining feasibility and satisfactory prediction performance. In this work, we propose the Structured Mixture of Gaussian Locally Linear Mapping (SMoGLLiM) for settings in which high-dimensional predictors are used to predict low-dimensional responses and the underlying associations may be heterogeneous or non-linear. Besides using mixtures of linear associations to approximate non-linear patterns locally and using inverse regression to mitigate the complications due to high-dimensional predictors, SMoGLLiM also aims at robustness by adopting cluster-size constraints and trimming abnormal samples. Its hierarchical structure allows covariance matrices and latent factors to be shared across smaller clusters, which effectively reduces the number of parameters. An Expectation-Maximization (EM) algorithm is devised for parameter estimation and, because the updates have analytical solutions, the estimation process is computationally efficient. Numerical results obtained from three real-world datasets demonstrate the flexibility of SMoGLLiM and its ability to accommodate complex data structures. The applications are: using high-dimensional face images to predict the parameters under which the images were taken; predicting sucrose levels from high-dimensional hyperspectral measurements obtained from different types of orange juice; and a magnetic resonance vascular fingerprinting (MRvF) study in which researchers are interested in using the so-called MRv fingerprints at the voxel level to predict microvascular properties in the brain.
The three datasets have different features and present different types of challenges. For example, the size of the MRv fingerprint dataset demands special consideration to reduce the computational burden. With the hierarchical structure of SMoGLLiM, we are able to adopt parallel computing techniques to reduce the model building time by 97%. These examples illustrate the wide range of applicability of SMoGLLiM in handling different kinds of complex data structures.
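The combination of a mixture of local linear associations with inverse regression can be illustrated with a toy sketch. This is not the authors' implementation and it omits the EM estimation, the hierarchical sharing, the cluster-size constraints, and the trimming. It assumes a scalar response y, isotropic noise, and already known parameters, and every name in it (`predict_response`, the keys `pi`, `a`, `b`, `c`, `gamma`, `sigma2`) is hypothetical. Each component models the high-dimensional predictor given the response as x = a*y + b + noise, and the prediction of y from x is obtained by Bayes inversion.

```python
import math


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def predict_response(x, components):
    """Predict a scalar response y from a predictor vector x.

    Assumed toy model (not taken from the paper): within component k,
      y ~ N(c, gamma)  and  x | y ~ N(a*y + b, sigma2 * I).
    Returns (E[y | x], list of component weights).
    """
    D = len(x)
    log_w, post_means = [], []
    for comp in components:
        a, b = comp["a"], comp["b"]
        c, gamma, sigma2 = comp["c"], comp["gamma"], comp["sigma2"]
        aa = dot(a, a)

        # Marginal of x in this component: N(a*c + b, sigma2*I + gamma*a*a^T).
        # The rank-one structure gives the log-density without a D x D inverse
        # (Sherman-Morrison identity and the matrix determinant lemma).
        r = [xi - (ai * c + bi) for xi, ai, bi in zip(x, a, b)]
        ar = dot(a, r)
        quad = (dot(r, r) - gamma * ar * ar / (sigma2 + gamma * aa)) / sigma2
        log_det = D * math.log(sigma2) + math.log(1.0 + gamma * aa / sigma2)
        log_w.append(
            math.log(comp["pi"])
            - 0.5 * (D * math.log(2.0 * math.pi) + log_det + quad)
        )

        # Posterior mean of y given x within this component (linear in x).
        s = 1.0 / (1.0 / gamma + aa / sigma2)
        xb = [xi - bi for xi, bi in zip(x, b)]
        post_means.append(s * (c / gamma + dot(a, xb) / sigma2))

    # Normalise the weights in a numerically stable way (log-sum-exp).
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    w = [wi / total for wi in w]
    return sum(wi * mi for wi, mi in zip(w, post_means)), w
```

In this sketch the only quantities computed per component are inner products of length D, which hints at why modelling the predictor given the low-dimensional response keeps the cost manageable when D is large.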
Main file
SMoGLLiM_manuscript_20171221.pdf (1.27 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-01700053, version 1 (03-02-2018)

Identifiers

  • HAL Id: hal-01700053, version 1

Cite

Chun-Chen Tu, Florence Forbes, Benjamin Lemasson, Naisyin Wang. Structured Mixture of Linear Mappings in High Dimension. 2018. ⟨hal-01700053⟩