Evaluation of Vietnamese speech recognition platforms (Vais, Viettel, Zalo, FPT and Google) in news
Corresponding author's email: quy.dao@eiu.edu.vn
DOI: https://doi.org/10.54644/jte.63.2021.46
Keywords: Natural language processing, Speech recognition, WER, News, API
Abstract
This article evaluates Vietnamese Automatic Speech Recognition (VASR) in the news domain, comparing platforms from leading Vietnamese speech recognition companies (Vais, Viettel, Zalo and FPT) with one from a leading global company (Google). The systems were evaluated with the Word Error Rate (WER) metric, computed on the text recognized by each platform. The recognized texts were obtained by submitting audio files from the news domain to the speech-to-text API of each platform. The WER results show that the Viettel, Zalo, FPT and Google platforms are adequate, while Vais is superior.
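As an illustration of the metric named in the abstract, the sketch below computes WER as the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference transcript. It is a minimal sketch, not the authors' evaluation code: the function name `word_error_rate`, the lowercasing and whitespace tokenization, and the sample sentences are all assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    # Assumption: lowercase and split on whitespace; real evaluations may
    # also strip punctuation or normalize numbers before comparing.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: the recognizer drops the last of five reference words.
reference = "hôm nay trời đẹp quá"
hypothesis = "hôm nay trời đẹp"
print(word_error_rate(reference, hypothesis))  # 0.2
```

A lower WER means the recognized text is closer to the reference; a value above 1.0 is possible when the hypothesis contains many inserted words.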
References
V. Këpuska and G. Bohouta, Comparing speech recognition systems (Microsoft API, Google API and CMU Sphinx), Int. J. Eng. Res. Appl, 7(03), pp. 20-24. 2017. DOI: https://doi.org/10.9790/9622-0703022024
F. Filippidou and L. Moussiades, A Benchmarking of IBM, Google and Wit Automatic Speech Recognition Systems, IFIP International Conference on Artificial Intelligence Applications and Innovations, pp. 73-82, 2020. DOI: https://doi.org/10.1007/978-3-030-49161-1_7
L.C. Mai and D.Q. Truong, Report on the Speech-to-Text Shared Task in VLSP Campaign 2019, Vietnamese Language Signal Processing, 2019. (https://vlsp.org.vn/sites/default/files/2019-10/VLSP2019-ASR-summary.pdf)
A. C. Morris, V. Maier and P. Green, From WER and RIL to MER and WIL: improved evaluation measures for connected speech recognition, Eighth International Conference on Spoken Language Processing, pp. 2786-2768, 2004. DOI: https://doi.org/10.21437/Interspeech.2004-668
Jitsi, JiWER: Similarity measures for automatic speech recognition evaluation. https://github.com/jitsi/jiwer
Giải vô địch quốc gia trở lại với những trận đấu đầy sôi động – VTV24 https://youtu.be/N2FfBEWO84A
Ngôi làng của những đầu sư tử thổi nữa – VTV24 https://youtu.be/YZc5TiXi_DE
Thiệt hại ban đầu do bão số 5 tại Huế - VTV Go https://youtu.be/kqnmPdwk62A
Phản ứng của Quốc tế trước thông tin Tổng thống Mỹ mắc covid-19 – HTV tin tức https://youtu.be/k6OTsmpKtbc
Ông Trump mắc covid-19-Chiến dịch tranh cử Tổng thống Mỹ có thể vỡ trận – VTC Now https://youtu.be/QehJIcATgH8
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Copyright © JTE.


