Evaluation of Vietnamese speech recognition platforms (Vais, Viettel, Zalo, FPT and Google) in news

Nguyen Thi My Thanh; Phan Xuan Dung; Nguyen Ngoc Hay; Le Ngoc Bich; Dao Xuan Quy

doi:10.54644/jte.63.2021.46

Authors

Nguyen Thi My Thanh, Eastern International University, Vietnam
Phan Xuan Dung, Eastern International University, Vietnam
Nguyen Ngoc Hay, Eastern International University, Vietnam
Le Ngoc Bich, Eastern International University, Vietnam
Dao Xuan Quy, Eastern International University, Vietnam

Corresponding author's email:

quy.dao@eiu.edu.vn

Keywords:

Natural language processing, Speech recognition, WER, News, API

Abstract

This article evaluates Vietnamese Automatic Speech Recognition (VASR) in the news domain, comparing platforms from leading Vietnamese speech recognition companies (Vais, Viettel, Zalo and FPT) with one from a leading global company (Google). The systems are compared using the Word Error Rate (WER), computed on the text recognized by each platform. The recognized texts were obtained by submitting audio files from the news domain to the speech-to-text API of each platform. The WER results show that the Viettel, Zalo, FPT and Google platforms are adequate, while the Vais platform is superior.
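As an illustration of the metric named in the abstract, WER is conventionally defined as the number of word substitutions, deletions and insertions needed to turn the recognized text into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal pure-Python version of that definition and is not the authors' evaluation script. The function name `word_error_rate`, the lowercasing and whitespace tokenization, the sample sentence and the platform labels are all assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: words are obtained by lowercasing and splitting on whitespace.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word is missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical reference transcript and hypothetical platform outputs.
reference = "bão số 5 gây thiệt hại tại huế"
hypotheses = {
    "platform_a": "bão số 5 gây thiệt hại tại huế",
    "platform_b": "bão số 5 gây thiệt hại huế",
}
for name, text in hypotheses.items():
    print(name, word_error_rate(reference, text))
```

A lower WER means fewer word-level errors, so platforms can be ranked by their score on the same set of reference transcripts. The reference list also names the JiWER package, which offers the same kind of measure as a library.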

References

V. Këpuska and G. Bohouta, Comparing speech recognition systems (Microsoft API, Google API and CMU Sphinx), Int. J. Eng. Res. Appl, 7(03), pp. 20-24. 2017. DOI: https://doi.org/10.9790/9622-0703022024

F. Filippidou and L. Moussiades, A Benchmarking of IBM, Google and Wit Automatic Speech Recognition Systems, IFIP International Conference on Artificial Intelligence Applications and Innovations, pp. 73-82, 2020. DOI: https://doi.org/10.1007/978-3-030-49161-1_7

L.C. Mai and D.Q. Truong, Report on the Speech-to-Text Shared Task in VLSP Campaign 2019, Vietnamese Language Signal Processing, 2019. (https://vlsp.org.vn/sites/default/files/2019-10/VLSP2019-ASR-summary.pdf)

A. C. Morris, V. Maier and P. Green, From WER and RIL to MER and WIL: improved evaluation measures for connected speech recognition, Eighth International Conference on Spoken Language Processing, pp. 2765-2768, 2004. DOI: https://doi.org/10.21437/Interspeech.2004-668

Jitsi, JiWER: Similarity measures for automatic speech recognition evaluation. https://github.com/jitsi/jiwer

Giải vô địch quốc gia trở lại với những trận đấu đầy sôi động – VTV24 https://youtu.be/N2FfBEWO84A

Ngôi làng của những đầu sư tử thổi nữa – VTV24 https://youtu.be/YZc5TiXi_DE

Thiệt hại ban đầu do bão số 5 tại Huế - VTV Go https://youtu.be/kqnmPdwk62A

Phản ứng của Quốc tế trước thông tin Tổng thống Mỹ mắc covid-19 – HTV tin tức https://youtu.be/k6OTsmpKtbc

Ông Trump mắc covid-19-Chiến dịch tranh cử Tổng thống Mỹ có thể vỡ trận – VTC Now https://youtu.be/QehJIcATgH8
