
Bi-Modal Bi-Task Emotion Recognition Based on Transformer Architecture

  • Title: Bi-Modal Bi-Task Emotion Recognition Based on Transformer Architecture
  • Authors: Song, Yu ; Zhou, Qi
  • Is part of: Applied Artificial Intelligence, 2024-12, Vol. 38 (1) [Peer-reviewed journal]
  • Description: ABSTRACT: In the field of emotion recognition, analyzing emotions from speech alone (single-modal speech emotion recognition) has several limitations, including limited data volume and low accuracy. Additionally, single-task models generalize poorly and fail to fully exploit relevant information. To address these issues, this paper proposes a new bi-modal bi-task emotion recognition model. The proposed model introduces multi-task learning on the Transformer architecture. On one hand, unsupervised contrastive predictive coding is used to extract denser features from the data while preserving self-information and context-related information. On the other hand, robustness against interfering information is enhanced through self-supervised contrastive learning. Furthermore, the model uses a modality fusion module to combine textual and audio information, implicitly aligning features from the two modalities. The proposed model achieved weighted accuracy (WA) of 82.3% and 83.5% on the IEMOCAP and RAVDESS datasets, respectively, and unweighted accuracy (UA) of 83.0% and 82.4%. Compared with existing methods, this is a further improvement in performance. (An illustrative sketch of the fusion and contrastive-loss ideas follows this record.)
  • Publisher: Taylor & Francis Group
  • Language: English
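
The abstract mentions two mechanisms that lend themselves to a concrete illustration: an in-batch contrastive (InfoNCE-style) objective of the kind used in self-supervised contrastive learning and contrastive predictive coding, and a cross-attention fusion block that implicitly aligns text and audio features. The PyTorch sketch below is not the authors' code; all module names, dimensions, pooling choices, and the toy training step are assumptions made for illustration only.

# Minimal illustrative sketch, NOT the paper's implementation: a cross-attention
# fusion block for text/audio features plus an InfoNCE-style contrastive loss.
# All names, dimensions, and design details here are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiModalFusion(nn.Module):
    def __init__(self, dim=256, heads=4, num_classes=4):
        super().__init__()
        # Text tokens query audio frames; the attention weights provide an
        # implicit (soft) alignment between the two modalities.
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                                  batch_first=True)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, text_feats, audio_feats):
        # text_feats: (B, T_text, dim); audio_feats: (B, T_audio, dim)
        fused, _ = self.cross_attn(text_feats, audio_feats, audio_feats)
        fused = self.encoder(fused)          # (B, T_text, dim)
        pooled = fused.mean(dim=1)           # sequence-level representation
        return self.classifier(pooled), pooled

def info_nce(anchor, positive, temperature=0.1):
    """In-batch InfoNCE: the i-th anchor's positive is the i-th row of
    `positive`; every other row in the batch acts as a negative."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.t() / temperature         # (B, B) cosine similarities
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)

# Toy usage: emotion classification plus a contrastive auxiliary objective,
# a simple stand-in for the bi-task setup described in the abstract.
model = BiModalFusion()
text = torch.randn(8, 30, 256)               # e.g. token embeddings
audio = torch.randn(8, 120, 256)             # e.g. CPC-encoded audio frames
logits, pooled = model(text, audio)
labels = torch.randint(0, 4, (8,))
loss = F.cross_entropy(logits, labels) + info_nce(pooled, audio.mean(dim=1))
loss.backward()

Under these assumptions, the classification loss and the contrastive loss share the fusion encoder, which is the basic multi-task pattern the abstract describes; the actual model's losses, encoders, and alignment scheme may differ.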
