A Modified Decision Tree Algorithm for Uncertain Data Classification

S. Meenakshi; V. Venkatachalam

doi:10.15866/irecos.v9i1.1049




Abstract

Classifying uncertain data is one of the more tedious tasks in the data mining domain. Uncertain data contain tuples with different probability distributions, so finding tuples of a similar class is a complex process. With uncertain data, the feature vector is not single-valued but a function. Recently, several decision-tree-based methods for uncertain data classification have been proposed that rely on binary operations on the decision tree. When multiclass data are given to such a decision tree, the algorithm must repeat its calculations to produce the probability distribution matching each class label, so its time and memory utilization are high. In this paper, we propose a classification method for uncertain data based on the decision tree. The proposed approach centers on an adaptive averaging method that combines the mean and median of a tuple to produce the feature value used for decision making in the tree. A probability calculation is then executed to find the relevance of the tuple to each class. If the calculated probability value is similar to a particular probability distribution, the tuple is assigned to that class. Thus, we produce a decision tree with c leaf nodes, where c is the number of class labels in the training data. The test data are passed through the trained decision tree to obtain the classified data. Experimental analysis is conducted to evaluate the performance of the proposed approach. The vehicle and segment datasets from the UCI data repository are selected for the performance analysis. The results show that the adaptive method achieved a maximum average accuracy of 0.997, while the existing approach achieved only 0.985.
Copyright © 2014 Praise Worthy Prize - All rights reserved.
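
The adaptive averaging step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the mean and median are combined, so the equal-weight blend below, the `weight` parameter, and the function names `adaptive_feature` and `to_feature_vector` are all assumptions made for illustration, not the authors' formulation. Each uncertain attribute is represented here as a list of sampled values.

```python
from statistics import mean, median


def adaptive_feature(samples, weight=0.5):
    """Collapse one uncertain attribute (a list of sampled values) to a single value.

    ASSUMPTION: the blend of mean and median and the default weight of 0.5
    are illustrative; the abstract only says both statistics are used.
    """
    return weight * mean(samples) + (1 - weight) * median(samples)


def to_feature_vector(uncertain_tuple):
    """Map a tuple of uncertain attributes to a single-valued feature vector."""
    return [adaptive_feature(samples) for samples in uncertain_tuple]


# One tuple with two uncertain attributes, each given as sampled values.
example_tuple = [[1, 2, 3, 10], [5, 5, 5]]
features = to_feature_vector(example_tuple)
print(features)
```

The resulting single-valued vectors could then be passed to any standard decision tree learner. The probability calculation that assigns a tuple to a class is not specified in enough detail in the abstract to sketch here.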

Keywords

Uncertain Data; Probability Distribution; Decision Tree Algorithm; Classification


