Evaluation of Vietnamese speech recognition platforms (Vais, Viettel, Zalo, FPT and Google) in news

Authors

  • Nguyen Thi My Thanh, Eastern International University, Vietnam
  • Phan Xuan Dung, Eastern International University, Vietnam
  • Nguyen Ngoc Hay, Eastern International University, Vietnam
  • Le Ngoc Bich, Eastern International University, Vietnam
  • Dao Xuan Quy, Eastern International University, Vietnam

Corresponding author's email:

quy.dao@eiu.edu.vn

DOI:

https://doi.org/10.54644/jte.63.2021.46

Keywords:

Natural language processing, Speech recognition, WER, News, API

Abstract

This article evaluates Vietnamese Automatic Speech Recognition (VASR) in the news domain, comparing platforms from leading Vietnamese speech recognition companies (Vais, Viettel, Zalo and FPT) with the platform of a leading global company (Google). The systems were evaluated with the Word Error Rate (WER), computed on the text recognized by each platform. The recognized texts were obtained by submitting news-domain audio files to the speech-to-text API of each of the five platforms. The WER results show that the VASR platforms from Viettel, Zalo, FPT and Google are adequate, while the platform from Vais is superior.
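
As an illustration of the metric named in the abstract, the following is a minimal Python sketch of WER, defined as the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference transcript. It is not the authors' evaluation code. The function name `word_error_rate`, the lowercasing and whitespace tokenization, and the sample sentences are assumptions made for this example only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N, where N is the reference word count.

    Assumption: text is lowercased and split on whitespace; a real
    evaluation may normalize punctuation and numerals differently.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sample: the recognizer drops the last of five words.
reference_text = "hôm nay trời đẹp quá"
recognized_text = "hôm nay trời đẹp"
print(word_error_rate(reference_text, recognized_text))  # 0.2
```

A lower WER indicates a better recognizer, and the value can exceed 1.0 when the hypothesis contains many inserted words.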

References

V. Këpuska and G. Bohouta, Comparing speech recognition systems (Microsoft API, Google API and CMU Sphinx), Int. J. Eng. Res. Appl., 7(03), pp. 20-24, 2017. DOI: https://doi.org/10.9790/9622-0703022024

F. Filippidou and L. Moussiades, A Benchmarking of IBM, Google and Wit Automatic Speech Recognition Systems, IFIP International Conference on Artificial Intelligence Applications and Innovations, pp. 73-82, 2020. DOI: https://doi.org/10.1007/978-3-030-49161-1_7

L.C. Mai and D.Q. Truong, Report on the Speech-to-Text Shared Task in VLSP Campaign 2019, Vietnamese Language Signal Processing, 2019. https://vlsp.org.vn/sites/default/files/2019-10/VLSP2019-ASR-summary.pdf

A. C. Morris, V. Maier and P. Green, From WER and RIL to MER and WIL: improved evaluation measures for connected speech recognition, Eighth International Conference on Spoken Language Processing, pp. 2765-2768, 2004. DOI: https://doi.org/10.21437/Interspeech.2004-668

Jitsi, JiWER: Similarity measures for automatic speech recognition evaluation. https://github.com/jitsi/jiwer

Giải vô địch quốc gia trở lại với những trận đấu đầy sôi động – VTV24 https://youtu.be/N2FfBEWO84A

Ngôi làng của những đầu sư tử thổi nữa – VTV24 https://youtu.be/YZc5TiXi_DE

Thiệt hại ban đầu do bão số 5 tại Huế - VTV Go https://youtu.be/kqnmPdwk62A

Phản ứng của Quốc tế trước thông tin Tổng thống Mỹ mắc covid-19 – HTV tin tức https://youtu.be/k6OTsmpKtbc

Ông Trump mắc covid-19-Chiến dịch tranh cử Tổng thống Mỹ có thể vỡ trận – VTC Now https://youtu.be/QehJIcATgH8

Published

29-04-2021

How to Cite

[1]
Nguyễn Thị Mỹ Thanh, Phan Xuân Dũng, Nguyễn Ngọc Hay, Lê Ngọc Bích, and Đào Xuân Quy, “Evaluation of Vietnamese speech recognition platforms (Vais, Viettel, Zalo, FPT and Google) in news”, JTE, vol. 16, no. 2, pp. 28–36, Apr. 2021.

Section

Research Article
