KMSAV: Korean multi‐speaker spontaneous audiovisual dataset

Park, Kiyoung ; Oh, Changhan ; Dong, Sunghee

ETRI Journal, 2024, 46(1), pp. 71-81 [Peer-reviewed journal]

Electronics and Telecommunications Research Institute (ETRI)

Full text available

  • Title:
    KMSAV: Korean multi‐speaker spontaneous audiovisual dataset
  • Author: Park, Kiyoung ; Oh, Changhan ; Dong, Sunghee
  • Subjects: audiovisual data ; dataset ; multi-speaker spontaneous data ; multimodal data ; speech recognition ; Electronics/Information and Communication Engineering
  • Is part of: ETRI Journal, 2024, 46(1), pp. 71-81
  • Notes: Funding information
    This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (2022-0-00989, Development of Artificial Intelligence Technology for Multi-speaker Dialog Modeling).
    KISTI1.1003/JNL.JAKO202450348456438
    https://doi.org/10.4218/etrij.2023-0352
  • Description: Recent advances in deep learning for speech and visual recognition have accelerated the development of multimodal speech recognition, yielding many innovative results. We introduce a Korean audiovisual speech recognition corpus. This dataset comprises approximately 150 h of manually transcribed and annotated audiovisual data, supplemented with an additional 2000 h of untranscribed videos collected from YouTube under the Creative Commons License. The dataset is intended to be freely accessible for unrestricted research purposes. Along with the corpus, we propose an open-source framework for automatic speech recognition (ASR) and audiovisual speech recognition (AVSR). We validate the effectiveness of the corpus with evaluations using state-of-the-art ASR and AVSR techniques, capitalizing on both pretrained models and fine-tuning processes. After fine-tuning, ASR and AVSR achieve character error rates of 11.1% and 18.9%, respectively. This error difference highlights the need for improvement in AVSR techniques. We expect that our corpus will be an instrumental resource to support improvements in AVSR.
  • Publisher: Electronics and Telecommunications Research Institute (ETRI)
  • Language: English; Korean
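The character error rates (CER) quoted in the description are conventionally computed as the Levenshtein (edit) distance between the hypothesis and reference transcripts, divided by the reference length. A minimal sketch of that metric (the function name `cer` is our own illustration, not part of the paper's open-source framework):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance over reference length (illustrative sketch)."""
    ref, hyp = list(reference), list(hypothesis)
    # Dynamic-programming Levenshtein distance, one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / len(ref)

# One substitution in an 11-character reference: CER = 1/11 ≈ 0.091
print(cer("audiovisual", "audiovizual"))
```

A CER of 11.1% for ASR versus 18.9% for AVSR thus means the audiovisual system made roughly 1.7 times as many character-level edits per reference character, which is the gap the authors highlight.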