Incremental Embedding Within a Dissimilarity-Based Framework
Abstract
Numerical representations of objects as vectors or matrices can be combined with numerous machine learning methods. However, this kind of representation does not readily encode both the objects and their relationships. Conversely, structural pattern recognition methods based on strings or graphs provide a natural encoding of the relationships between objects but can usually be combined with only a few machine learning methods. The last decade has seen major advances aiming to link these two fields. The two major research directions concern the design of new graph and string kernels and various explicit embedding schemes for structural data. An explicit embedding of structural data can be combined with any machine learning method. Dissimilarity representation methods are important because they provide an explicit embedding together with a connection to the kernel framework. However, these methods require the whole universe to be known during the learning phase, and, to obtain a Euclidean embedding, the matrix encoding the dissimilarities between all pairs of objects must be regularized. This last point somewhat violates the usual separation between training and test sets, since both sets must be processed jointly, and it is an important limitation in many practical applications where the test set is unbounded and unknown during the learning phase. Moreover, requiring the whole universe is a bottleneck when processing massive datasets. In this paper, we propose to overcome these limitations with an incremental embedding based on dissimilarity representations. We study the pros and cons of two methods that compute, implicitly and separately, the embeddings of points in the test set and in the learning set. Conclusions are drawn from experiments performed on different datasets.
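
To make the setting concrete, the following minimal sketch illustrates the general idea of embedding the learning set from its dissimilarity matrix and then placing a previously unseen object using only its dissimilarities to the learning set. It is not one of the two methods studied in this paper: it assumes classical multidimensional scaling with negative eigenvalues clipped to zero as the regularization, and a standard out-of-sample projection for new points. The function names `fit_embedding` and `embed_new` are hypothetical and introduced only for illustration.

```python
import numpy as np


def fit_embedding(D_train, dim):
    """Embed the learning set from its dissimilarity matrix (classical MDS).

    Assumption for this sketch: regularization is done by clipping
    negative eigenvalues of the centered matrix to zero.
    """
    n = D_train.shape[0]
    D2 = D_train ** 2
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ D2 @ J                      # pseudo-Gram matrix
    evals, evecs = np.linalg.eigh(B)           # ascending eigenvalues
    order = np.argsort(evals)[::-1][:dim]      # keep the `dim` largest
    evals = np.clip(evals[order], 0.0, None)   # regularize: drop negative part
    evecs = evecs[:, order]
    return {
        "X": evecs * np.sqrt(evals),           # coordinates of the learning set
        "evals": evals,
        "evecs": evecs,
        "col_means": D2.mean(axis=0),
        "grand_mean": D2.mean(),
    }


def embed_new(model, d_new):
    """Embed one unseen object from its dissimilarities to the learning set."""
    d2 = np.asarray(d_new, dtype=float) ** 2
    # Inner products between the centered new object and the centered learning objects.
    b = -0.5 * (d2 - d2.mean() - model["col_means"] + model["grand_mean"])
    evals = model["evals"]
    scale = np.zeros_like(evals)
    pos = evals > 1e-12                        # skip directions removed by clipping
    scale[pos] = 1.0 / np.sqrt(evals[pos])
    return scale * (model["evecs"].T @ b)
```

In this sketch the learning set is processed once, and each test object is embedded afterwards without recomputing the decomposition, which is the separation between learning and test sets that the abstract argues for. When the dissimilarities are not Euclidean, the clipping step discards information, so the embedded distances only approximate the original ones.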