Different word representations and their combination for proper name retrieval from diachronic documents

Irina Illina; Dominique Fohr

Conference paper, Year: 2015

Irina Illina

Role: Author
PersonId : 15663
IdHAL : irina-illina
IdRef : 120731746

Speech Modeling for Facilitating Oral-Based Communication

Dominique Fohr

Role: Author
PersonId : 15652
IdHAL : dominique-fohr
IdRef : 031092942

Speech Modeling for Facilitating Oral-Based Communication

Abstract

This paper addresses the problem of building high-quality transcription systems for very large vocabulary automatic speech recognition (ASR). We investigate the automatic retrieval of out-of-vocabulary (OOV) proper names (PNs), taking into account the temporal, syntactic and semantic context of words. Artificial neural networks (NNs) are now widely used in natural language processing: continuous-space representations of words are learned automatically from unstructured text data. For modeling latent topics at the document level, Latent Dirichlet Allocation (LDA) has proven successful. In this paper, we propose OOV PN retrieval using (1) temporal versus topic context modeling; (2) different word representation spaces for word-level and document-level context modeling; (3) combinations of retrieval results. Experimental evaluation on broadcast news data shows that combining the proposed methods improves retrieval results, which confirms that the methods are complementary.
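As an illustration of point (3), the following is a minimal sketch of one way retrieval results from two context models could be combined. It is not the authors' implementation: the abstract does not specify the combination scheme, so the min-max normalization, the linear interpolation, the `weight` parameter, and all function and candidate names are assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def score_candidates(context_vec, candidate_vecs):
    """Score each candidate proper name by similarity to the document context vector."""
    return {name: cosine(context_vec, vec) for name, vec in candidate_vecs.items()}


def min_max(scores):
    """Rescale scores to [0, 1] so that two models become comparable (assumed scheme)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {name: 0.0 for name in scores}
    return {name: (s - lo) / (hi - lo) for name, s in scores.items()}


def combine(scores_a, scores_b, weight=0.5):
    """Linearly interpolate two normalized score lists (hypothetical combination rule)."""
    a, b = min_max(scores_a), min_max(scores_b)
    names = set(a) | set(b)
    return {n: weight * a.get(n, 0.0) + (1 - weight) * b.get(n, 0.0) for n in names}


def top_n(scores, n):
    """Return the n best candidates, breaking ties alphabetically."""
    return sorted(scores, key=lambda name: (-scores[name], name))[:n]


if __name__ == "__main__":
    # Toy scores standing in for a word-level model and a document-level (topic) model.
    word_level = {"name_a": 0.9, "name_b": 0.2, "name_c": 0.5}
    topic_level = {"name_a": 0.1, "name_b": 0.3, "name_c": 0.25}
    print(top_n(combine(word_level, topic_level), 2))
```

In this toy setting, a candidate that ranks moderately well under both models can overtake one that ranks first under only one of them, which is the kind of complementarity the abstract refers to.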

Keywords

speech recognition; neural networks; LDA; vocabulary extension; out-of-vocabulary words; proper names

Domains

Human-Computer Interaction [cs.HC]

https://inria.hal.science/hal-01201533

Submitted on: Thursday, September 17, 2015, 14:58:55

Last modified on: Monday, September 11, 2023, 17:41:19

Dates and versions

hal-01201533, version 1 (17-09-2015)

Identifiers

HAL Id: hal-01201533, version 1

Cite

Irina Illina, Dominique Fohr. Different word representations and their combination for proper name retrieval from diachronic documents. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2015), Dec 2015, Scottsdale, United States. ⟨hal-01201533⟩

Collections

CNRS INRIA UNIV-LORRAINE INRIA2 LORIA LORIA-NLPKD ANR
