Using and improving cosine similarity algorithm for building and managing question bank

Authors

  • Van Tinh Pham, Ho Chi Minh City University of Agriculture and Forestry, Vietnam
  • Thi Phuong Tram Nguyen, Ho Chi Minh City University of Agriculture and Forestry, Vietnam

Corresponding author's email:

pvtinh@hcmuaf.edu.vn

Keywords:

Near Duplicate Detection, Text similarity, Cosine similarity, Weighted Cosine Similarity, Question bank

Abstract

A bank of multiple-choice questions is a core component of the assessment system that educational institutions use to ensure training quality. Existing research focuses on methods for generating exams from a prepared question bank, but not on preventing duplicate material within the bank. As the number of questions in the bank grows, managing question content becomes more difficult and duplicated content becomes unavoidable. In this study, we propose using and improving the cosine similarity algorithm by weighting the keywords (shingles) used to detect duplicate question content in exams or in the question bank, so that exams are generated more accurately.
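
To make the idea concrete, the following is a minimal sketch of cosine similarity over weighted word shingles. It is not the authors' implementation: the abstract does not specify the weighting scheme, so the function names (shingles, weighted_cosine), the shingle size k=2, and the weights dictionary are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def shingles(text, k=2):
    """Return the list of word k-shingles of a lowercased text."""
    words = text.lower().split()
    if len(words) < k:
        return [" ".join(words)] if words else []
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]


def weighted_cosine(a, b, weights=None, k=2):
    """Cosine similarity of shingle-count vectors under a weighted inner product.

    `weights` is a hypothetical mapping from shingle to non-negative weight;
    shingles not listed default to 1.0, which gives plain cosine similarity.
    """
    weights = weights or {}
    ca, cb = Counter(shingles(a, k)), Counter(shingles(b, k))

    def w(s):
        return weights.get(s, 1.0)

    # Weighted dot product over the shingles both texts share.
    dot = sum(w(s) * ca[s] * cb[s] for s in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(w(s) * c * c for s, c in ca.items()))
    norm_b = math.sqrt(sum(w(s) * c * c for s, c in cb.items()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


q1 = "what is the capital of vietnam"
q2 = "what is the capital of france"
plain = weighted_cosine(q1, q2)
# Assumed weights: emphasize the shingles that distinguish the two questions.
emphasized = weighted_cosine(q1, q2, {"of vietnam": 5.0, "of france": 5.0})
```

In this sketch the two questions share four of their five shingles, so the unweighted score is high even though they ask different things. Raising the weight of the distinguishing shingles lowers the score, which is one way weighting could help separate near-duplicates from questions that merely share a template.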

Published

31-07-2019

How to Cite

[1]
V. T. Pham and T. P. T. Nguyen, “Using and improving cosine similarity algorithm for building and managing question bank”, JTE, vol. 14, no. 3, pp. 17–24, Jul. 2019.

Section

Research Article
