Automatic discovery of topics and acoustic morphemes from speech

Christophe Cerisara

doi:10.1016/j.csl.2008.06.004

Journal article, Computer Speech and Language, Year: 2009

Christophe Cerisara

Role: Author
PersonId : 2353
IdHAL : christophe-cerisara
IdRef : 102700168

Analysis, perception and recognition of speech

Abstract

This work addresses automatic lexical acquisition and topic discovery from a speech stream. The proposed algorithm builds a lexicon enriched with topic information in three steps: (1) transcription of the audio stream into phone sequences with a speaker- and task-independent phone recogniser, (2) automatic lexical acquisition based on approximate string matching, and (3) hierarchical topic clustering of the lexical entries based on a knowledge-poor co-occurrence approach. The resulting semantic lexicon is then used to automatically cluster the incoming speech stream into topics. The main advantages of this algorithm are its very low computational requirements and its independence from pre-defined linguistic resources, which make it easy to port to new languages and to adapt to new tasks. It is evaluated both qualitatively and quantitatively on two corpora and on two tasks related to topic clustering. The results are encouraging and point to future research directions for the proposed algorithm, such as automatically building orthographic labels for the lexical items.
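To make the second and third steps more concrete, the following is a minimal sketch and not the procedure of the paper, whose details are not given in this record. It assumes that phone sequences are lists of phone symbols, that candidate lexical entries are fixed-length phone n-grams matched greedily by Levenshtein distance, and that co-occurrence is counted per utterance. The names `edit_distance`, `acquire_lexicon`, `cooccurrence` and the parameters `length` and `max_dist` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def acquire_lexicon(utterances, length=4, max_dist=1):
    """Group fixed-length phone n-grams that approximately match.

    Assumption: the first n-gram seen becomes the prototype of an entry,
    and later n-grams join the first entry within max_dist edits.
    Returns (prototype, set of utterance indices) for recurring entries.
    """
    entries = []
    for idx, phones in enumerate(utterances):
        for start in range(len(phones) - length + 1):
            segment = tuple(phones[start:start + length])
            for proto, where in entries:
                if edit_distance(proto, segment) <= max_dist:
                    where.add(idx)
                    break
            else:
                entries.append((segment, {idx}))
    # Keep only entries observed in at least two utterances.
    return [(p, w) for p, w in entries if len(w) >= 2]


def cooccurrence(entries):
    """Count, for each pair of entries, the utterances containing both."""
    counts = Counter()
    for (i, (_, wi)), (j, (_, wj)) in combinations(enumerate(entries), 2):
        shared = len(wi & wj)
        if shared:
            counts[(i, j)] = shared
    return counts


if __name__ == "__main__":
    # Toy "phone" sequences; single letters stand in for phone symbols.
    utterances = [list("abcdx"), list("yabcd"), list("abxd")]
    lexicon = acquire_lexicon(utterances)
    print(lexicon)
    print(cooccurrence(lexicon))
```

The sketch stops at pairwise co-occurrence counts. A hierarchical clustering step, as mentioned in the abstract, could take such counts as its similarity measure, but how the paper does this is not stated here.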

Keywords

Speech processing; Topic clustering; Lexical acquisition

Domains

Machine Learning [cs.LG]

https://inria.hal.science/inria-00330698

Submitted on: Wednesday, October 15, 2008, 11:40:46

Last modified on: Friday, March 24, 2023, 14:52:51

Dates and versions

inria-00330698 , version 1 (15-10-2008)

Identifiers

HAL Id : inria-00330698 , version 1
DOI : 10.1016/j.csl.2008.06.004

Cite

Christophe Cerisara. Automatic discovery of topics and acoustic morphemes from speech. Computer Speech and Language, 2009, 23 (2), pp.220-239. ⟨10.1016/j.csl.2008.06.004⟩. ⟨inria-00330698⟩

Collections

CNRS INRIA UNIV-LORRAINE INRIA2 LORIA
