A comparison of unsupervised curve classification methods for sport training data

Achieving peak performance at a specified time is the primary goal of athletes' training programs. To optimize performance and reduce the risk of injury, a comprehensive list of training program parameters (e.g. intensity, volume, frequency, distribution, duration and type) requires careful management. This work focuses on clustering the time evolution curves of training measurements. Training data are recorded densely over time. However, the duration of follow-up and the duration of the seasons vary among subjects, and subject-specific variation can induce substantial error. Functional data analysis (FDA) and longitudinal data analysis (LDA) are the main approaches for analyzing repeated measures data (in which multiple measurements are made on the same subject over time). Typically, FDA is applied when the data are dense, assumed to be observed on a continuum, and a function of time. LDA is usually applied when the data are sparse, possibly with different numbers of measurements across individuals, and subject to error. We compared several FDA and LDA methods implemented in publicly available R code: k-means based on the standard Euclidean distance, a discrete Fréchet distance [2], and a functional distance [1]; Gaussian mixture model-based clustering for standard [4], longitudinal [5] and functional [3] data; and latent class mixed models [6]. We discuss advantages and limitations, including computational and practical aspects.

References

[1] Febrero-Bande, M. and Oviedo de la Fuente, M. (2012). Statistical computing in functional data analysis: the R package fda.usc. Journal of Statistical Software, 51, 1–28.
[2] Genolini, C. and Falissard, B. (2011). Kml: A package to cluster longitudinal data. Computer Methods and Programs in Biomedicine.
[3] Jacques, J. and Preda, C. (2013). Funclust: A curves clustering method using functional random variables density approximation. Neurocomputing, 112, 164–171.
[4] Lebret, R., Iovleff, S., Langrognet, F., Biernacki, C., Celeux, G., and Govaert, G. (2014). Rmixmod: The R package of the model-based unsupervised, supervised and semi-supervised classification mixmod library. Journal of Statistical Software.
[5] McNicholas, P. D. and Murphy, T. B. (2010). Model-based clustering of longitudinal data. Canadian Journal of Statistics, 38, 153–168.
[6] Proust-Lima, C., Philipps, V., and Liquet, B. (2015). Estimation of extended mixed models using latent classes and latent processes: the R package lcmm. Technical report, University of Bordeaux. arXiv:1503.00890v2.
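As an illustration of two of the distances named in the abstract, the following minimal Python sketch contrasts the pointwise Euclidean distance, which requires curves sampled on a common grid, with a discrete Fréchet distance computed by dynamic programming, which also accepts curves with different numbers of points. It is not the R code compared in this work; the function names `euclidean_distance` and `discrete_frechet` and the toy curves are assumptions made for the example.

```python
import math


def euclidean_distance(y1, y2):
    """Pointwise Euclidean distance between two curves observed on the same grid.

    y1, y2: sequences of measurements taken at identical time points.
    """
    if len(y1) != len(y2):
        raise ValueError("curves must have the same number of points")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y1, y2)))


def discrete_frechet(p, q):
    """Discrete Fréchet distance between two curves given as lists of (time, value) points.

    The curves may have different lengths. ca[i][j] holds the distance
    for the prefixes p[:i+1] and q[:j+1].
    """
    n, m = len(p), len(q)
    if n == 0 or m == 0:
        raise ValueError("curves must be non-empty")
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])  # distance between the two current points
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Best of the three ways to reach (i, j), then account for d.
                best_previous = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_previous, d)
    return ca[n - 1][m - 1]


# Toy curves (hypothetical values, not data from the study).
flat_low = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
flat_high = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
short_curve = [(0.0, 0.0), (2.0, 0.0)]

d_euclid = euclidean_distance([y for _, y in flat_low], [y for _, y in flat_high])
d_frechet = discrete_frechet(flat_low, flat_high)
d_unequal = discrete_frechet(flat_low, short_curve)
```

Either distance could then be supplied to a k-means-type algorithm. The Fréchet variant is shown because, unlike the pointwise Euclidean distance, it does not require subjects to share the same observation times.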

Keywords

Functional data analysis; Longitudinal data analysis; Sport science data

Domains

Machine Learning [stat.ML]; Methodology [stat.ME]; Computation [stat.CO]; Applications [stat.AP]; Machine Learning [cs.LG]; Public health and epidemiology

https://inria.hal.science/hal-01396366

Submitted on: Monday, 14 November 2016, 13:13:17

Last modified on: Wednesday, 19 July 2023, 11:04:04

Dates and versions

hal-01396366, version 1 (14-11-2016)

Identifiers

HAL Id: hal-01396366, version 1

Cite

Gaëlle Lefort, Marta Fernandez Avalos, Perrine Soret, David Pyne, Jean-François Toussaint, et al. A comparison of unsupervised curve classification methods for sport training data. 3rd conference of the International Society for Non-Parametric Statistics (ISNPS), The International Society for NonParametric Statistics (ISNPS), Jun 2016, Avignon, France. ⟨hal-01396366⟩

Collections

GENES INRIA ENSAI UPEC INRIA2 INSEP USPC UP-SANTE U1219
