Using and improving cosine similarity algorithm for building and managing question bank
Corresponding author's email:
pvtinh@hcmuaf.edu.vn
Keywords:
Near Duplicate Detection, Text similarity, Cosine similarity, Weighted Cosine Similarity, Question bank
Abstract
The multiple-choice question bank is a core component of the evaluation system that ensures the quality of training in educational institutions. Current research focuses only on methods for generating exams from a prepared question bank and does not address the prevention of duplicate material within the bank itself. As the number of questions grows, managing question content becomes more difficult and duplicated content becomes unavoidable. In this study, we propose using and improving the Cosine similarity algorithm by weighting the keywords (shingles) to detect duplicate question content in exams or in the question bank, so that exams are generated more accurately.
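The idea of weighting shingles inside a cosine similarity can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `shingles` and `weighted_cosine`, the choice of 2-word shingles, the whitespace tokenisation, and the hand-picked weights are all assumptions made for illustration. Each shingle's contribution to the dot product and to both norms is multiplied by its weight, and shingles with no listed weight default to 1.0.

```python
import math
from collections import Counter


def shingles(text, k=2):
    """Count the k-word shingles of a lowercased, whitespace-tokenised text.

    Assumption: simple whitespace tokenisation; a real system would also
    normalise punctuation and language-specific word boundaries.
    """
    words = text.lower().split()
    if not words:
        return Counter()
    if len(words) < k:
        return Counter([" ".join(words)])
    return Counter(" ".join(words[i:i + k]) for i in range(len(words) - k + 1))


def weighted_cosine(a, b, weights=None, k=2):
    """Cosine similarity of two texts under per-shingle positive weights.

    `weights` maps a shingle to its weight (hypothetical values chosen by
    the caller); unlisted shingles get weight 1.0, which reduces the
    measure to the ordinary cosine similarity.
    """
    weights = weights or {}
    va, vb = shingles(a, k), shingles(b, k)

    def w(s):
        return weights.get(s, 1.0)

    dot = sum(w(s) * va[s] * vb[s] for s in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(w(s) * c * c for s, c in va.items()))
    norm_b = math.sqrt(sum(w(s) * c * c for s, c in vb.items()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


q1 = "what is the capital of France"
q2 = "what is the capital of Germany"

# Unweighted: 4 of the 5 shingles in each question are shared.
plain = weighted_cosine(q1, q2)

# Up-weighting the shingles that carry the distinguishing keyword
# lowers the score, so the two questions look less like duplicates.
key_weights = {"of france": 5.0, "of germany": 5.0}
weighted = weighted_cosine(q1, q2, key_weights)
```

With these inputs the unweighted score is 0.8, while the weighted score drops to about 0.44, showing how emphasising key shingles can separate questions that share most of their wording but ask about different things.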
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Copyright © JTE.


