
Determining Optimal Blocking Sizes for Large Exam Groups When Data Matching



Computer literacy courses at the University of Botswana enrol a large number of students every year. The exams are multiple-choice: students shade their identification details and their answers on answer forms. Because identification details are sometimes shaded incorrectly, matching answer forms to students becomes a data matching problem. In this paper, we explore how to structure such exams so that answer forms can be matched to students in the master registration list with high accuracy, by blocking on exam group, while keeping each exam room group as large as possible to reduce exam administration costs. Our results indicate that an optimally sized exam room group is effective.
Copyright © 2021 Praise Worthy Prize - All rights reserved.


Data Matching; Optimal Blocking; Computational University Administration
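
As a rough illustration of blocking by exam group as described in the abstract, the following Python sketch compares each answer form only against the registered students in the same exam group. It is not the authors' implementation: the nine-digit ID format, the use of Hamming distance, the `max_distance` threshold, the rule that tied best matches are rejected, and all function names are assumptions made for this example.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Count differing positions; unequal lengths count as maximally different."""
    if len(a) != len(b):
        return max(len(a), len(b))
    return sum(x != y for x, y in zip(a, b))


def build_blocks(master):
    """Group registered student IDs by exam group (the blocking key)."""
    blocks = defaultdict(list)
    for student_id, group in master:
        blocks[group].append(student_id)
    return blocks


def match_form(shaded_id, group, blocks, max_distance=1):
    """Return the unique closest registered ID within the form's exam group.

    Returns None when no candidate is within max_distance or when the
    best distance is shared by two or more candidates (ambiguous match).
    """
    scored = sorted((hamming(shaded_id, sid), sid) for sid in blocks.get(group, []))
    if not scored or scored[0][0] > max_distance:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None
    return scored[0][1]


# Hypothetical master registration list: (student ID, exam group).
master = [
    ("201300123", "A"),
    ("201300124", "A"),
    ("201300853", "B"),
    ("201301123", "B"),
]

# One digit of 201300123 was shaded incorrectly on this form.
blocked = build_blocks(master)
print(match_form("201300153", "A", blocked))      # 201300123

# Treating all students as one large group makes the same form ambiguous.
unblocked = build_blocks([(sid, "ALL") for sid, _ in master])
print(match_form("201300153", "ALL", unblocked))  # None
```

In this toy data the mis-shaded ID is one digit away from a student in group A and also one digit away from a student in group B, so the match is recoverable only when candidates are restricted to the form's own group. A larger group means more candidates per form and therefore more chances of such ties, which is the tension against room size that the abstract describes.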



