Sparsity regularization and graph-based representation in medical imaging

Katerina Gkirtzou

Abstract

Medical images are widely used in modern medicine to depict anatomy or function, both for clinical purposes and for studying normal anatomy. Analyzing medical images efficiently and accurately is a crucial step. The high dimensionality and non-linear nature of medical imaging data make their analysis a challenging problem. In this thesis, we address medical image analysis from the viewpoint of statistical learning theory, concentrating on regularization methods and on graph representation and comparison.

First, we approach the problem of graph representation and comparison for analyzing medical images. Graphs are commonly used to represent data with inherent structure. Exploiting such data requires the ability to represent and compare graphs efficiently. Unfortunately, standard solutions to these problems are either NP-hard, hard to parametrize and adapt to the problem at hand, or not expressive enough. Graph kernels, which were introduced in the machine learning community over the last decade, are a promising solution to these problems. Despite significant progress in the design and improvement of graph kernels in the past few years, existing graph kernels focus on either unlabeled or discretely labeled graphs, while efficient and expressive representation and comparison of graphs with complex labels, such as real numbers and high-dimensional vectors, remains an open research problem. We introduce a novel method, the pyramid quantized Weisfeiler-Lehman graph representation, to tackle the graph comparison and representation problem for graphs with continuous vector labels. Our algorithm considers statistics of subtree patterns based on the Weisfeiler-Lehman algorithm and uses a pyramid quantization strategy to determine a logarithmic number of discrete labelings. As a result, we approximate a graph with continuous or vector-valued labels as a sequence of graphs with discrete labels of increasing granularity. We evaluate the proposed algorithm on two different tasks with real datasets: an fMRI analysis task and the generic problem of 3D shape classification.

Second, we examine different regularization methods for analyzing medical images, and more specifically MRI data. Regularization methods are a powerful tool for improving predictive performance and avoiding overfitting by introducing additional information into an ill-posed problem, such as the analysis of medical images. In this direction, we introduce a novel regularization method, the k-support regularized Support Vector Machine. This algorithm extends the l1 regularized SVM to a mixed norm that combines the l1 and l2 norms, which enables correlated sparsity regularization within the SVM framework. We evaluate the algorithm on a neuromuscular disease classification task using MRI-based markers. We furthermore explore the importance of diffusion tensor imaging for discriminating between neuromuscular conditions.

Overall, as graphs are fundamental mathematical objects and regularization methods are widely used to control ill-posed problems, both the pyramid quantized Weisfeiler-Lehman graph representation and the k-support regularized SVM are potentially applicable to a wide range of application domains in computer vision, medical image analysis and data mining.
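To make the "mixed norm of both l1 and l2 norms" concrete, the sketch below evaluates a norm with that interpolating behavior. The abstract does not define the regularizer, so this assumes it is the k-support norm as commonly defined in the sparsity literature (Argyriou, Foygel and Srebro); the function name `k_support_norm` is hypothetical and this is not the thesis implementation. Under that assumption, k = 1 recovers the l1 norm and k equal to the dimension recovers the l2 norm.

```python
import math


def k_support_norm(w, k):
    """Sketch of the k-support norm (assumed definition, see lead-in).

    With magnitudes sorted in decreasing order a_1 >= ... >= a_d, find the
    integer r in {0, ..., k-1} such that
        a_{k-r-1} > (1/(r+1)) * sum_{i=k-r}^{d} a_i >= a_{k-r},   a_0 = +inf,
    then return
        sqrt( sum_{i=1}^{k-r-1} a_i^2 + (1/(r+1)) * (sum_{i=k-r}^{d} a_i)^2 ).
    """
    a = sorted((abs(x) for x in w), reverse=True)  # magnitudes, descending
    d = len(a)
    if not 1 <= k <= d:
        raise ValueError("k must satisfy 1 <= k <= len(w)")
    for r in range(k):
        split = k - r - 1                      # 0-based index of a_{k-r}
        tail_mean = sum(a[split:]) / (r + 1)   # averaged l1 mass of the tail
        upper = math.inf if split == 0 else a[split - 1]
        lower = a[split]
        # isclose guards the ">=" test against floating-point rounding on ties
        if upper > tail_mean and (tail_mean >= lower or math.isclose(tail_mean, lower)):
            head = sum(x * x for x in a[:split])  # l2 part on the largest entries
            return math.sqrt(head + (r + 1) * tail_mean ** 2)
    raise ArithmeticError("no valid split found (numerical issue)")


if __name__ == "__main__":
    w = [3, -4, 0, 1]
    print(round(k_support_norm(w, 1), 3))  # 8.0, equal to the l1 norm
    print(round(k_support_norm(w, 2), 3))  # 5.657, between l2 and l1
    print(round(k_support_norm(w, 4), 3))  # 5.099, equal to the l2 norm
```

In this reading, the parameter k controls how many correlated features can share weight before the penalty starts to behave like l1, which is one way to interpret the "correlated sparsity" mentioned above.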

