Data Access Prediction and Optimization in Data Grid Using SVM and AHL Classifications

Grace R. Kingsy; R. Manimegalai

doi:10.15866/irecos.v9i7.2109


Abstract

Scientific applications increasingly need to handle huge volumes of data effectively. Processing and managing such large quantities of data requires large-scale computing infrastructure such as grids. The main objective of the proposed system is to improve the performance of the grid system by predicting the behavior of an application and its future events. Time series classification models the behavior of an application based on three properties: stochasticity, linearity, and stationarity. Its major drawbacks are that it is a complex process and that it is not suitable for long-term forecasting. To overcome these difficulties, this work uses two techniques, Support Vector Machines (SVM) and Adaptive Hypergraph Learning (AHL), to predict the behavior of user applications. SVM is a supervised learning algorithm used for classification and regression analysis in many domains. AHL is a graph-based learning algorithm that uses the K-Nearest Neighbors (KNN) algorithm to construct a hypergraph. A confusion matrix is constructed to determine the prediction accuracy of the proposed SVM- and AHL-based algorithms. The jobs are executed with the predicted events in the grid simulator OptorSim. Experimental results show that the proposed data prediction schemes using SVM and AHL decrease application execution time by more than 50%.
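
As a rough illustration of two steps named in the abstract, the sketch below builds KNN-based hyperedges and computes accuracy from a confusion matrix. It is not the authors' implementation: the function names, the 2-D feature points, the Euclidean distance, and the "read"/"write" event labels are all assumptions chosen for the example, and the adaptive part of AHL (varying the neighborhood size and learning hyperedge weights) is omitted.

```python
import math
from collections import Counter


def knn_hyperedges(points, k):
    """Build one hyperedge per point: the point itself plus its k nearest neighbors.

    Euclidean distance is an assumption; the abstract does not name a metric.
    """
    edges = []
    for i, p in enumerate(points):
        # Rank every other point by its distance to p.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        edges.append({i, *others[:k]})
    return edges


def confusion_matrix(actual, predicted, labels):
    """Rows are actual labels, columns are predicted labels."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Fraction of predictions that fall on the diagonal of the matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Hypothetical feature vectors for four access events.
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
edges = knn_hyperedges(points, k=1)

# Hypothetical actual and predicted event labels.
actual = ["read", "read", "write", "write"]
predicted = ["read", "write", "write", "write"]
matrix = confusion_matrix(actual, predicted, ["read", "write"])
print(edges, matrix, accuracy(matrix))
```

With these made-up inputs, each point is grouped with its single nearest neighbor, and three of the four predictions fall on the diagonal of the confusion matrix.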

Keywords

Classification; Time Series; Support Vector Machine; Adaptive Hypergraph Learning; Adaptive Sliding Window; Prediction


