Adaptive Classification for Concept Drifting Data Streams with Unlabeled Data

Pramod D. Patil; Parag Kulkarni

doi:10.15866/irecos.v8i6.3275

Abstract

Research in data stream mining has attracted considerable attention owing to the importance of its applications and the growing volume of streaming information. A data stream is a sequence of data drawn from some underlying distribution, and any change or variation in its target concept is regarded as concept drift. Recent research has concentrated on reducing the level of concept drift. In this paper, we propose a method for controlling concept drift in unlabeled data streams. The method combines a clustering technique with a decision tree algorithm. Training starts with the unlabeled data; after training, the data are labeled and a decision tree is built from the training data. The data are then tested for their error value over a specified number of iterations. Experiments are conducted on the sky concept dataset. The experimental evaluation produced satisfactory results: the minimum error rate obtained is in the range of 0.02 to 0.3 percent.
Copyright © 2013 Praise Worthy Prize - All rights reserved.
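
The pipeline outlined in the abstract (cluster the unlabeled data, label it, build a decision tree, then test the error over successive iterations) can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not name the clustering technique, the tree learner, the similarity measure, or the bias value, so two-cluster k-means and a one-level decision tree (a stump) are swapped in here. The function names, the `threshold` parameter, the toy data, and the choice to measure error as disagreement between the tree and the cluster-derived labels of each new chunk are all assumptions made for illustration.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points (tuples of numbers)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centroids):
    """Index of the centroid closest to p."""
    return min(range(len(centroids)), key=lambda c: dist2(p, centroids[c]))


def init_centroids(points):
    """Deterministic start: the first point and the point farthest from it."""
    first = points[0]
    return [first, max(points, key=lambda p: dist2(p, first))]


def kmeans(points, centroids, iters=10):
    """Two-cluster k-means started from the given centroids."""
    centroids = list(centroids)
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, [nearest(p, centroids) for p in points]


def fit_stump(points, labels):
    """One-level decision tree: best single feature/threshold split on 0/1 labels."""
    best = None
    for f in range(len(points[0])):
        for t in sorted({p[f] for p in points}):
            for left in (0, 1):
                errs = sum((left if p[f] <= t else 1 - left) != y
                           for p, y in zip(points, labels))
                if best is None or errs < best[0]:
                    best = (errs, f, t, left)
    _, f, t, left = best
    return lambda p: left if p[f] <= t else 1 - left


def run_stream(chunks, threshold=0.1):
    """Return the tree's error on every chunk after the first."""
    centroids = init_centroids(chunks[0])
    tree, errors = None, []
    for chunk in chunks:
        # Label the unlabeled chunk by clustering; cluster ids stay aligned
        # because k-means restarts from the previous chunk's centroids.
        centroids, pseudo = kmeans(chunk, centroids)
        retrain = tree is None
        if tree is not None:
            err = sum(tree(p) != y for p, y in zip(chunk, pseudo)) / len(chunk)
            errors.append(err)
            retrain = err > threshold  # assumption: a high error signals drift
        if retrain:
            tree = fit_stump(chunk, pseudo)
    return errors


# Hypothetical toy stream: the lower cluster shifts from around (0.5, 0.5)
# to around (2.5, 2.5) in the third chunk.
old = [(0, 0), (1, 1), (0, 1), (1, 0), (10, 10), (11, 11), (10, 11), (11, 10)]
new = [(2, 2), (3, 3), (2, 3), (3, 2), (10, 10), (11, 11), (10, 11), (11, 10)]
print(run_stream([old, old, new, new]))  # [0.0, 0.5, 0.0]
```

In this toy run the error jumps when the lower cluster moves, the tree is rebuilt from the newly labeled chunk, and the error falls back on the next chunk. A fuller treatment would replace the stump with a complete decision tree learner and handle more than two clusters.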

Keywords

Data Stream; Concept Drift; Clustering; Decision Tree; Similarity Measure; Bias Value
