Create the Video Subtitles Based on Voice Recognition Technology: Test for Some Programs at VTV
Corresponding author's email: phongsolo@gmail.com
DOI: https://doi.org/10.54644/jte.71B.2022.1128
Keywords: STT, WER, VOD, OTT, CC
Abstract
This paper presents the trial results of a Speech-To-Text (STT) recognition tool for VOD (Video On Demand) content on the VTVgo system of Vietnam Television. To evaluate the accuracy of the STT tool, the word error rate (WER), a common metric for automatic speech recognition and machine translation systems, was used to measure its performance. Test results for 10 different types of TV shows, totaling 1,065 hours of video, were analyzed. The WER was low, ranging from 2.8% to 4.3%, for genres such as news, the 19h program, and weather forecasts, in which most speakers are presenters (MCs) reading in a standard voice in the studio, the dialogue comes from a single speaker, and there is little interference from outside noise. In addition, to illustrate the video subtitle application, we conducted a test on the VTVgo system by integrating an optional subtitle display tool into the VTVgo app. The test was carried out on Android Smart TVs and smartphones, demonstrating the ability to apply video subtitles on OTT (Over The Top) digital content distribution platforms.
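WER counts the substitutions (S), deletions (D) and insertions (I) needed to turn the STT output into the reference transcript, divided by the number of reference words N, i.e. WER = (S + D + I) / N. The sketch below is a minimal Python illustration, not the tool the authors used: it assumes transcripts are lower-cased and split on whitespace (the actual text normalization for Vietnamese is not described in the abstract), and the function names `wer` and `corpus_wer` are hypothetical. It also shows one way to aggregate many clips, by pooling errors and reference words before dividing.

```python
def normalize(text: str) -> list[str]:
    # Assumed normalization: lower-case and split on whitespace.
    return text.lower().split()


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    # Word-level Levenshtein distance; prev[j] is the distance between
    # the reference prefix processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match if cost == 0)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    # WER = (S + D + I) / N for a single reference/hypothesis pair.
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return word_edit_distance(ref, hyp) / len(ref)


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    # Pool errors and reference words over all clips before dividing,
    # so longer programs weigh more than short ones.
    errors = sum(word_edit_distance(normalize(r), normalize(h)) for r, h in pairs)
    words = sum(len(normalize(r)) for r, _ in pairs)
    return errors / words


if __name__ == "__main__":
    print(wer("hôm nay trời nắng đẹp", "hôm nay trời nắng"))  # one deletion out of 5 words
```

For example, `wer("hôm nay trời nắng đẹp", "hôm nay trời nắng")` gives 0.2, one deletion over five reference words. Because insertions are counted, WER can exceed 100% when the hypothesis is much longer than the reference.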
License
Copyright (c) 2022 Journal of Technical Education Science

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.