Toxic Text Detection In Vietnamese Language

Authors

  • Dong Tran Trường Đại học Sư phạm Kỹ thuật Tp.HCM, Việt Nam
  • Minh Phuoc Huynh Trường Đại học Sư phạm Kỹ thuật Tp.HCM, Việt Nam
  • Mai Thanh Nhat Van Trường Đại học Sư phạm Kỹ thuật Tp.HCM, Việt Nam
  • Nhat Quang Tran HCMUTE
  • Minh Tan Le https://orcid.org/0009-0004-4912-6795

Corresponding author's email:

tanlm@hcmute.edu.vn

DOI:

https://doi.org/10.54644/jte.2024.1528

Keywords:

Machine Learning, Natural Language Processing, Text Classification, Long Short-Term Memory, Gated Recurrent Unit

Abstract

In recent years, the online world has seen an explosion of platforms for communication and sharing. Social networks, forums, and countless websites have created a vast and diverse online landscape. This abundance of content, while exciting, has also introduced new challenges, particularly when it comes to protecting children. Easy access to the internet can expose them to risks such as toxic language and online bullying. Traditional mitigation methods, like blocking connections or restricting screen time, can be cumbersome and may not be entirely effective. This paper proposes a novel solution based on deep learning. We train deep learning models to identify malicious phrases, so that they can recognize various forms of inappropriate language, including both sensitive words and seemingly harmless words used with harmful intent. This filtering system can be deployed on both the server side and the client side of online platforms, offering a robust layer of protection for users as they navigate the digital world.

Author Biographies

Dong Tran, Trường Đại học Sư phạm Kỹ thuật Tp.HCM, Việt Nam

Trần Đông is pursuing a degree in Information Technology at the Ho Chi Minh City University of Technology and Education (HCMUTE). Email: 20133035@student.hcmute.edu.vn

Minh Phuoc Huynh, Trường Đại học Sư phạm Kỹ thuật Tp.HCM, Việt Nam

Huỳnh Minh Phước is pursuing a degree in Information Technology at the Ho Chi Minh City University of Technology and Education (HCMUTE). Email: 20133082@student.hcmute.edu.vn

Mai Thanh Nhat Van, Trường Đại học Sư phạm Kỹ thuật Tp.HCM, Việt Nam

Văn Mai Thanh Nhật is pursuing a degree in Information Technology at the Ho Chi Minh City University of Technology and Education (HCMUTE). Email: 20133076@student.hcmute.edu.vn

Nhat Quang Tran, HCMUTE

Trần Nhật Quang completed his PhD in Computer Science at Curtin University (Australia). He is currently focusing on applying artificial intelligence to real-world problems, such as predicting agricultural prices to support crop planning and developing systems to support people with hearing and visual impairments. Email: quangtn@hcmute.edu.vn

Minh Tan Le

Lê Minh Tân graduated from Ho Chi Minh City University of Technology and Education with a bachelor’s degree in Information Technology in 2019 and a master’s degree in Computer Science in 2021.

Research interests: Artificial Intelligence, Machine Learning, Deep Learning, Computer Vision

Email: tanlm@hcmute.edu.vn

ORCID: https://orcid.org/0009-0004-4912-6795

References

"Vietnam 'in' top 5 countries with poor online behavior." https://vtc.vn/viet-nam-lot-top-5-ung-xu-kem-van-minh-tren-internet-ar529256.html, 2020. Accessed: Dec. 21, 2023.

C. C. Aggarwal, "Neural networks and deep learning: A textbook." Cham, Switzerland: Springer Nature, Jun. 2023. DOI: https://doi.org/10.1007/978-3-031-29642-0

C. C. Aggarwal, "Machine learning for text." Cham, Switzerland: Springer Nature, May 2022. DOI: https://doi.org/10.1007/978-3-030-96623-2

S. A. Amidi, "Recurrent neural networks cheatsheet." https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks, 2019. Accessed: Dec. 13, 2023.

"What is word embedding? Why is it important?" https://trituenhantao.io/kien-thuc/word-embedding-la-gi-tai-sao-no-quan-trong/, 2019. Accessed: Dec. 13, 2023.

B. Q. Manh, "Viblo - word embedding - understanding basic concepts in NLP." https://viblo.asia/p/word-embedding-tim-hieu-khai-niem-co-ban-trong-nlp-1Je5E93G5nL, 2020. Accessed: Dec. 13, 2023.

V. A. Krithika, "Introduction to fasttext embeddings and its implication." https://www.analyticsvidhya.com/blog/2023/01/introduction-to-fasttext-embeddings-and-its-implication/, 2023. Accessed: Dec. 13, 2023.

"Toxic comment classification challenge." https://www.kaggle.com/competitions/jigsaw-toxic-comment-classification-challenge/data, 2018. Accessed: Dec. 21, 2023.

K. Dubey, R. Nair, M. U. Khan, and S. Shaikh, "Toxic comment detection using LSTM," in Proc. 2020 Third International Conference on Advances in Electronics, Computers and Communications (ICAECC), Dec. 2020. DOI: https://doi.org/10.1109/ICAECC50550.2020.9339521

A. K. Bala, "Toxic comments identification and classification using Deep Neural Networks," Academia.edu, https://www.academia.edu/41458366/Toxic_Comments_Identification_and_Classification_Using_Deep_Neural_Networks. Accessed: Dec. 21, 2023.

R. Sharma and M. Patel, "Toxic comment classification using neural networks and machine learning," IARJSET, vol. 5, no. 9, pp. 47–52, Sep. 2018. DOI: https://doi.org/10.17148/IARJSET.2018.597

V. M. Krešňáková, M. Sarnovský, P. Butka, and K. Machová, "Comparison of deep learning models and various text pre-processing techniques for the toxic comments classification," Applied Sciences, vol. 10, no. 23, p. 8631, Dec. 2020. DOI: https://doi.org/10.3390/app10238631

Published

28-04-2024

How to Cite

[1]
D. Tran, M. P. Huynh, M. T. N. Van, N. Q. Tran, and M. T. Le, “Toxic Text Detection In Vietnamese Language”, JTE, vol. 19, no. 02, pp. 46–57, Apr. 2024.

Section

Research Article
