Using Deep Learning for the Taxonomic Classification of Microbial Sequences

Authors

  • Manh Hung Hoang HCMC Industry and Trade College, Vietnam
  • Vu Hoang HCMC University of Technology and Education, Vietnam
  • Van-Vinh Le HCM University of Technology and Education, Vietnam https://orcid.org/0000-0001-5218-3089

Corresponding author's email:

vinhlv@hcmute.edu.vn

DOI:

https://doi.org/10.54644/jte.2024.1521

Keywords:

Taxonomic classification, Deep learning, DNA, Metagenomic, k-mer embedding

Abstract

Microbes are ubiquitous organisms that play a crucial role in our world, so understanding microbial communities benefits human life. Because microbial samples contain sequences from many different organisms, an important step in their analysis is to classify the sequences into groups of different species or closely related organisms, a task called metagenomic classification. Many classification approaches have been proposed for metagenomic data. However, because microbial samples are complex, achieving high classification accuracy with these methods remains a challenge. This study applies an effective deep learning framework to the classification of microbial sequences. The proposed architecture combines a sequence embedding layer with bidirectional Long Short-Term Memory (BiLSTM), self-attention, and dropout layers for feature learning. Experimental results on real metagenomic datasets demonstrate the strength of the proposed method.
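The abstract names the layers of the architecture but not their sizes or the exact form of attention. The following is a minimal NumPy sketch of a forward pass through that kind of pipeline, not the authors' implementation. It assumes overlapping k-mer tokenization (k = 3 here) feeding the embedding layer, additive attention pooling over the BiLSTM states (one common form of self-attention for sequence classification), inverted dropout applied only during training, and a softmax output over taxa. All function names, layer sizes, and the five-class output are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

# Hypothetical sketch: k-mer embedding -> BiLSTM -> attention pooling -> dropout -> softmax.
# Layer sizes and the attention form are assumptions, not taken from the paper.
ALPHABET = "ACGT"

def kmer_ids(seq, k=3):
    """Overlapping k-mers -> integer ids in [1, 4**k]; id 0 marks k-mers with non-ACGT bases (e.g. N)."""
    ids = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k].upper()
        if any(c not in ALPHABET for c in kmer):
            ids.append(0)
            continue
        idx = 0
        for c in kmer:
            idx = idx * 4 + ALPHABET.index(c)  # base-4 encoding of the k-mer
        ids.append(idx + 1)
    return ids

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def lstm(x, W, U, b):
    """Single-direction LSTM over x of shape (T, d_in); returns all hidden states (T, h)."""
    h = np.zeros(U.shape[0])
    c = np.zeros(U.shape[0])
    states = []
    for x_t in x:
        i, f, g, o = np.split(x_t @ W + h @ U + b, 4)  # input, forget, candidate, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        states.append(h)
    return np.stack(states)

def init_params(k=3, emb_dim=16, hidden=8, att_dim=8, n_classes=5, seed=0):
    """Random (untrained) weights; all sizes are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    r = lambda *shape: rng.normal(0.0, 0.1, size=shape)
    p = {"emb": r(4 ** k + 1, emb_dim)}  # row 0 = ambiguous k-mer
    for d in ("fw", "bw"):
        p[d] = (r(emb_dim, 4 * hidden), r(hidden, 4 * hidden), np.zeros(4 * hidden))
    p["att_W"], p["att_v"] = r(2 * hidden, att_dim), r(att_dim)
    p["out_W"], p["out_b"] = r(2 * hidden, n_classes), np.zeros(n_classes)
    return p

def classify(seq, p, k=3, train=False, drop=0.5, rng=None):
    """Return class probabilities for one read (k must match init_params)."""
    ids = kmer_ids(seq, k)
    if not ids:
        raise ValueError("read must be at least k bases long")
    x = p["emb"][ids]                                      # (T, emb_dim) embedding lookup
    h_fw = lstm(x, *p["fw"])
    h_bw = lstm(x[::-1], *p["bw"])[::-1]                   # realign backward states with positions
    H = np.concatenate([h_fw, h_bw], axis=1)               # (T, 2 * hidden)
    alpha = softmax(np.tanh(H @ p["att_W"]) @ p["att_v"])  # one attention weight per k-mer
    ctx = alpha @ H                                        # weighted sum -> (2 * hidden,)
    if train:                                              # inverted dropout, training only
        if rng is None:
            rng = np.random.default_rng()
        ctx = ctx * (rng.random(ctx.shape) >= drop) / (1.0 - drop)
    return softmax(ctx @ p["out_W"] + p["out_b"])
```

For example, `classify("ACGTACGGTTAGCNNAGT", init_params())` returns a vector of five class probabilities that sums to 1. In practice the weights would be learned by backpropagation, usually in a deep learning framework rather than hand-written NumPy; the sketch only shows how data flows between the layers the abstract lists.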


Author Biographies

Manh Hung Hoang, HCMC Industry and Trade College, Vietnam

Hoang Manh Hung received a bachelor's degree in Information Technology from the University of Information Technology in 2013 and an MSc degree in Computer Science from Ho Chi Minh City University of Technology and Education in 2023.

He currently works at the Faculty of Information Technology, HCMC Industry and Trade College, Vietnam.

Email: manhhung@hitu.edu.vn

Vu Hoang, HCMC University of Technology and Education, Vietnam

Hoang Vu received a bachelor's degree in Information Technology from Ho Chi Minh City University of Technology in 2006 and an MSc degree in Computer Science from Ho Chi Minh City University of Technology and Education in 2021.

He is currently studying at the Faculty of Information Technology, HCMC University of Technology and Education, Vietnam.

Email: hvu267@gmail.com

Van-Vinh Le, HCM University of Technology and Education, Vietnam

Le Van Vinh received a bachelor's degree in Information Technology and an MSc degree in Computer Science from the University of Science (Vietnam National University, Ho Chi Minh City) in 2005 and 2009, respectively. He received a PhD degree in Computer Science from HCMC University of Technology (Vietnam National University, Ho Chi Minh City) in 2017. He currently works at the Faculty of Information Technology, HCMC University of Technology and Education, Vietnam. His research interests include bioinformatics, high-performance computing, data science, and deep learning. Email: vinhlv@hcmute.edu.vn. ORCID: https://orcid.org/0000-0001-5218-3089


Published

28-02-2024

How to Cite

[1] M. H. Hoang, V. Hoang, and V.-V. Le, “Using Deep Learning for the Taxonomic Classification of Microbial Sequences”, JTE, vol. 19, no. Special Issue 01, pp. 8–14, Feb. 2024.