Scalable algorithms for large-scale machine learning problems : Application to multiclass classification and asynchronous distributed optimization

Bikash Joshi

Thesis Year: 2017

Algorithmes d'apprentissage pour les grandes masses de données : Application à la classification multi-classes et à l'optimisation distribuée asynchrone

Role: Author
PersonId: 980312

Laboratoire d'Informatique de Grenoble

Abstract

This thesis develops scalable algorithms for large-scale machine learning and presents two approaches to handling large data. The first part addresses large-scale multiclass classification. We introduce the multiclass classification task and the challenges of classifying with a large number of classes. To address these challenges, we propose an algorithm that reduces the original multiclass problem to an equivalent binary one. Based on this reduction, we introduce a scalable method for multiclass classification with a very large number of classes and provide detailed theoretical and empirical analyses. The second part addresses distributed machine learning. We introduce an asynchronous framework for distributed optimization and apply it to two popular problems: matrix factorization for large-scale recommender systems and large-scale binary classification. For matrix factorization, we run Stochastic Gradient Descent (SGD) in an asynchronous distributed manner. For large-scale binary classification, we use SVRG, a variant of SGD that uses variance reduction, as the optimization algorithm.

The goal of this thesis is to develop learning algorithms suited to large volumes of data. First, we consider the problem of classification with a large number of classes. To obtain an algorithm that scales to high dimensions, we propose an algorithm that transforms the multiclass problem into a binary classification problem, which we then subsample drastically. To validate this method, we provide a detailed theoretical and experimental analysis. In the second part, we address the problem of learning on distributed data by introducing an asynchronous framework for data processing. We apply this framework to two flagship applications: matrix factorization for large-scale recommender systems and binary classification.
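
To illustrate the variance-reduced SGD named in the abstract, here is a minimal single-machine sketch of SVRG on L2-regularized logistic regression. It is not the asynchronous distributed implementation from the thesis. The function names (`svrg`, `grad_i`, `full_grad`, `loss`), the choice of loss, the step size, and the inner-loop length are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def grad_i(w, x, y, lam):
    # Gradient of log(1 + exp(-y * <w, x>)) + (lam / 2) * ||w||^2
    # for a single example, with label y in {-1, +1}.
    margin = y * sum(wj * xj for wj, xj in zip(w, x))
    coeff = -y * sigmoid(-margin)
    return [coeff * xj + lam * wj for xj, wj in zip(x, w)]


def full_grad(w, X, Y, lam):
    # Average of the per-example gradients over the whole dataset.
    n = len(X)
    g = [0.0] * len(w)
    for x, y in zip(X, Y):
        gi = grad_i(w, x, y, lam)
        g = [a + b / n for a, b in zip(g, gi)]
    return g


def loss(w, X, Y, lam):
    # Regularized average logistic loss, computed in a stable form.
    total = 0.0
    for x, y in zip(X, Y):
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(X) + 0.5 * lam * sum(wj * wj for wj in w)


def svrg(X, Y, lam=0.01, step=0.1, epochs=20, inner=None, seed=0):
    # Hypothetical defaults; in practice step and inner need tuning.
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    inner = inner or 2 * n
    w_snap = [0.0] * d
    for _ in range(epochs):
        # Full gradient at the snapshot, computed once per outer epoch.
        mu = full_grad(w_snap, X, Y, lam)
        w = list(w_snap)
        for _ in range(inner):
            i = rng.randrange(n)
            g_w = grad_i(w, X[i], Y[i], lam)
            g_s = grad_i(w_snap, X[i], Y[i], lam)
            # Variance-reduced step: stochastic gradient corrected by the
            # same example's gradient at the snapshot plus the full gradient.
            w = [wj - step * (a - b + m)
                 for wj, a, b, m in zip(w, g_w, g_s, mu)]
        w_snap = w  # the last inner iterate becomes the next snapshot
    return w_snap
```

Calling `svrg(X, Y)` with `X` a list of feature lists and `Y` a list of labels in {-1, +1} returns a weight vector. The correction term `g_w - g_s + mu` is an unbiased estimate of the full gradient whose variance shrinks as the iterate approaches the snapshot, which is why SVRG can use a constant step size.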

Keywords

Machine learning; Collaborative filtering; Distributed framework

Domains

Artificial Intelligence [cs.AI]

Main file

JOSHI_2017_archivage.pdf (2.18 MB)

Origin: Version validated by the jury (STAR)

https://theses.hal.science/tel-02402056

Submitted on: Tuesday, 10 December 2019, 12:07:07

Last modified on: Thursday, 4 April 2024, 21:28:07

Long-term archiving on: Wednesday, 11 March 2020, 21:58:40

Dates and versions

tel-02402056 , version 1 (10-12-2019)

Identifiers

HAL Id: tel-02402056, version 1

Cite

Bikash Joshi. Scalable algorithms for large-scale machine learning problems : Application to multiclass classification and asynchronous distributed optimization. Artificial Intelligence [cs.AI]. Université Grenoble Alpes, 2017. English. ⟨NNT : 2017GREAM046⟩. ⟨tel-02402056⟩

Collections

UGA CNRS LIG STAR PERSYVAL-LAB LIG_SIDCH
