Retinal Diseases Classification From OCT Images Using Pretrained Dual-Encoder Architecture
Corresponding author email: vinhlv@hcmute.edu.vn
DOI: https://doi.org/10.54644/jte.2026.2062
Keywords: Multimodal, Classification, Retinal diseases, OCT images, Deep learning
Abstract
Retinal diseases, such as age-related macular degeneration (AMD), diabetic retinopathy (DR), and glaucoma, are leading causes of irreversible vision loss, making early and accurate diagnosis essential for effective treatment. Optical coherence tomography (OCT) provides high-resolution cross-sectional retinal images that support disease assessment; however, challenges remain due to image noise and artifacts and to complex retinal structures. In this study, we propose a dual-encoder framework for retinal disease classification from OCT B-scan images that jointly leverages two pretrained foundation models: RETFound and MIRAGE. Following standardized preprocessing and resampling, high-quality features extracted from the two encoders are combined for the final classification task. To mitigate overfitting on limited medical data, the RETFound encoder is frozen during training to preserve general visual features, whereas the MIRAGE encoder is fine-tuned to adapt to the specific classification objective. Extensive experiments on seven public OCT benchmark datasets demonstrate that the proposed method outperforms single-encoder baselines on the majority of benchmarks. The framework achieved an average balanced accuracy (BAcc) of 89.8%, an F1-score of 90.7%, and a Matthews correlation coefficient (MCC) of 83.9%. These results confirm the effectiveness of combining complementary pretrained encoders for robust and generalizable retinal disease classification in clinical settings.
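The freeze-one, fine-tune-one fusion strategy described in the abstract can be sketched in a few lines of PyTorch. This is a minimal illustration, not the authors' implementation: the tiny convolutional encoders below are hypothetical stand-ins for the pretrained RETFound and MIRAGE backbones, and simple feature concatenation is assumed as the fusion mechanism.

```python
import torch
import torch.nn as nn

class DualEncoderClassifier(nn.Module):
    """Dual-encoder sketch: one encoder is frozen to preserve general
    visual features, the other is fine-tuned; their embeddings are
    concatenated and passed to a linear classification head."""

    def __init__(self, frozen_enc, tuned_enc, feat_dim, num_classes):
        super().__init__()
        self.frozen_enc = frozen_enc
        # Freeze to mitigate overfitting on limited medical data.
        for p in self.frozen_enc.parameters():
            p.requires_grad = False
        self.tuned_enc = tuned_enc  # trained end to end
        self.head = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, x):
        with torch.no_grad():
            f1 = self.frozen_enc(x)  # general features, no gradients
        f2 = self.tuned_enc(x)       # task-adapted features
        return self.head(torch.cat([f1, f2], dim=1))

def tiny_encoder(feat_dim=32):
    # Hypothetical stand-in for a pretrained foundation-model encoder.
    return nn.Sequential(
        nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(8, feat_dim),
    )

model = DualEncoderClassifier(tiny_encoder(), tiny_encoder(),
                              feat_dim=32, num_classes=4)
# Batch of 2 grayscale OCT B-scans (1 channel, 64x64 pixels).
logits = model(torch.randn(2, 1, 64, 64))
```

In training, only `tuned_enc` and `head` receive gradient updates, so an optimizer would be built from `filter(lambda p: p.requires_grad, model.parameters())`.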
Copyright (c) 2026 Journal of Technical Education Science.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Copyright belongs to JTE.


