An Exact and Inexact Data Matching Approach for Saving Time and Preventing Errors in Processing of Student Exam Results at the University of Botswana

George Anderson(1*), Audrey Masizana(2), Dimane Mpoeleng(3)

(1) Department of Computer Science, University of Botswana, P/BAG UB 00704, Gaborone, Botswana
(2) Department of Computer Science, University of Botswana, P/BAG UB 00704, Gaborone, Botswana
(3) Department of Computer Science, University of Botswana, P/BAG UB 00704, Gaborone, Botswana
(*) Corresponding author



Abstract


The University of Botswana (UB) offers a comprehensive range of academic programmes leading to a variety of undergraduate and postgraduate qualifications across its seven faculties. Like other universities, UB faces the problem of large enrolments, which produce very large classes in some courses. This problem is most often seen in the university-wide General Education Courses, which UB created to provide students with basic interdisciplinary skills. We study an examination problem using one such course, Computing and Information Skills Fundamentals I, which has a registration of more than 4000 students. The exam consists of multiple-choice questions, and the answer sheets are marked using an Optical Mark Recognition (OMR) scanner. A recurring problem is that students do not always shade their details perfectly, which makes matching exam records with class-list records cumbersome. We implement and test a data matching solution, with excellent results. An exact matching step is applied first, followed by standard data matching methods for indexing, comparison, and classification: q-gram indexing, edit-distance comparison, and Support Vector Machine (SVM) classification. The results show that the impact is very large. First, the processing time is reduced from more than 13 hours to less than 18 minutes, with no mistakes made in the automated processing phase. Second, the record pairs that contain too little information to be matched automatically, and therefore have to be matched manually, are few and hence manageable.
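The abstract names the pipeline stages (exact matching, then q-gram indexing, edit-distance comparison, and classification) without giving their details. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. Records are hypothetical (student ID, surname) pairs. The exact step requires both fields to agree. Indexing is a simplified q-gram blocking that keeps only candidate pairs sharing at least `min_shared` bigrams of the ID. Comparison uses normalised Levenshtein similarity. In place of a trained SVM, classification applies a linear decision function of the form a linear SVM learns, sign(w . x + b), with hand-set placeholder weights `W` and `B`. Exam records with no pair classified as a match are left for manual matching.

```python
from collections import defaultdict

# Hypothetical record layout: (student_id, surname), read from the OMR
# output (exam) and from the registration system (class list).
Record = tuple[str, str]

# Placeholder weights standing in for a trained linear SVM, whose decision
# function is sign(w . x + b). A real system would learn these from
# labelled record pairs.
W = (2.0, 1.0)  # weights for (id_similarity, name_similarity)
B = -2.0


def qgrams(s: str, q: int = 2) -> set[str]:
    """Set of q-grams of s, padded with '#' so edge characters are covered."""
    pad = "#" * (q - 1)
    padded = f"{pad}{s}{pad}"
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Edit-distance similarity normalised to [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def match_records(exam: list[Record], class_list: list[Record],
                  q: int = 2, min_shared: int = 3):
    """Return (matches, review): exam ID -> class-list ID, plus the exam IDs
    left for manual matching."""
    names = dict(class_list)
    matches: dict[str, str] = {}
    review: list[str] = []

    # Step 1: exact matching on both student ID and surname.
    leftovers = []
    for sid, name in exam:
        if names.get(sid) == name:
            matches[sid] = sid
        else:
            leftovers.append((sid, name))

    # Step 2: q-gram index over class-list IDs (simplified q-gram blocking).
    index: dict[str, set[str]] = defaultdict(set)
    for cid in names:
        for g in qgrams(cid, q):
            index[g].add(cid)

    for sid, name in leftovers:
        shared: dict[str, int] = defaultdict(int)
        for g in qgrams(sid, q):
            for cid in index.get(g, ()):
                shared[cid] += 1

        # Step 3: compare candidate pairs. Step 4: keep the best pair whose
        # linear score is positive, i.e. classified as a match.
        best_score, best_cid = 0.0, None
        for cid, n in shared.items():
            if n < min_shared:
                continue  # too few shared q-grams: not a candidate pair
            x = (similarity(sid, cid), similarity(name, names[cid]))
            score = W[0] * x[0] + W[1] * x[1] + B
            if score > best_score:
                best_score, best_cid = score, cid
        if best_cid is None:
            review.append(sid)  # too little information: match manually
        else:
            matches[sid] = best_cid
    return matches, review


CLASS_LIST = [("201200123", "MOKGOSI"), ("201200456", "SEBINA"),
              ("201300789", "TAU")]
EXAM = [("201200123", "MOKGOSI"),  # shaded correctly: exact match
        ("201200465", "SEBINA"),   # two ID digits swapped
        ("201300789", "TAO"),      # one surname letter mis-shaded
        ("201200", "MOK"),         # ID and surname only partly shaded
        ("999999999", "XYZ")]      # nothing similar in the class list

if __name__ == "__main__":
    matches, review = match_records(EXAM, CLASS_LIST)
    print(matches)
    print(review)
```

With these made-up records, the first three exam rows are linked to class-list entries (the second and third through the inexact steps), while the last two are left for manual matching, mirroring the split between automated and manual processing described in the abstract.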
Copyright © 2013 Praise Worthy Prize - All rights reserved.

Keywords


Data Matching; Computational University Administration; Optical Mark Recognition Data Processing







