Toxic Text Detection in Vietnamese Language

Dong Tran; Minh Phuoc Huynh; Mai Thanh Nhat Van; Nhat Quang Tran; Minh Tan Le

doi:10.54644/jte.2024.1528

Authors

Dong Tran, Ho Chi Minh City University of Technology and Education, Vietnam
Minh Phuoc Huynh, Ho Chi Minh City University of Technology and Education, Vietnam
Mai Thanh Nhat Van, Ho Chi Minh City University of Technology and Education, Vietnam
Nhat Quang Tran, HCMUTE
Minh Tan Le, ORCID: https://orcid.org/0009-0004-4912-6795

Corresponding author's email: tanlm@hcmute.edu.vn

DOI: https://doi.org/10.54644/jte.2024.1528

Keywords:

Machine Learning, Natural Language Processing, Text Classification, Long Short-Term Memory, Gated Recurrent Unit

Abstract

In recent years, the online world has seen an explosion of platforms for communication and sharing. Social networks, forums, and countless websites have created a vast and diverse online landscape. This abundance of content also introduces new challenges, particularly for protecting children: easy access to the internet can expose them to risks such as toxic language and online bullying. Traditional mitigation methods, such as blocking connections or restricting screen time, can be cumbersome and are not always effective. This paper proposes a solution based on deep learning. We train deep learning models to identify malicious phrases, enabling them to recognize various forms of inappropriate language, including both sensitive words and seemingly harmless words used with harmful intent. The resulting filtering system can be deployed on both the server side and the client side of online platforms, adding a layer of protection for users as they navigate the digital world.

Author Biographies

Dong Tran, Ho Chi Minh City University of Technology and Education, Vietnam

Trần Đông is pursuing a degree in Information Technology at the Ho Chi Minh City University of Technology and Education (HCMUTE). Email: 20133035@student.hcmute.edu.vn

Minh Phuoc Huynh, Ho Chi Minh City University of Technology and Education, Vietnam

Huỳnh Minh Phước is pursuing a degree in Information Technology at the Ho Chi Minh City University of Technology and Education (HCMUTE). Email: 20133082@student.hcmute.edu.vn

Mai Thanh Nhat Van, Ho Chi Minh City University of Technology and Education, Vietnam

Văn Mai Thanh Nhật is pursuing a degree in Information Technology at the Ho Chi Minh City University of Technology and Education (HCMUTE). Email: 20133076@student.hcmute.edu.vn

Nhat Quang Tran, HCMUTE

Trần Nhật Quang completed his PhD in Computer Science at Curtin University (Australia). He is currently focusing on applying artificial intelligence to real-world problems, such as predicting agricultural prices to support crop planning and developing systems to support people with hearing and visual impairments. Email: quangtn@hcmute.edu.vn

Minh Tan Le

Lê Minh Tân graduated from Ho Chi Minh City University of Technology and Education with a bachelor's degree in Information Technology in 2019 and a master's degree in Computer Science in 2021.

Research interests: Artificial Intelligence, Machine Learning, Deep Learning, Computer Vision

Email: tanlm@hcmute.edu.vn

ORCID: https://orcid.org/0009-0004-4912-6795
