Similarity Distance Based Clustering Framework for Aggregation of Web Usage Data

Joan M. John; A. Shajin Nargunam

doi:10.15866/irecos.v8i1.2782


Corresponding author: Joan M. John


Abstract

The rapid growth of the World Wide Web as a standard medium for disseminating information has generated increasing interest in Web personalization. Because of the growing size and complexity of the WWW, website publishers find it increasingly difficult to attract and retain users. To design popular and attractive websites, publishers must understand their users' needs. Web Usage Mining (WUM) has been applied to discover usage patterns from users' navigational behavior. The knowledge obtained from analyzing this navigational behavior (usage data) can be exploited to adapt the Web information space to users' needs. Various techniques for identifying Web user profiles have been studied in the literature. Recent work on web usage mining has used clustering models such as k-means to aggregate web user information. K-means, a distance-based clustering model, is effective only when attribute value ranges are finite and categorical; with wide value ranges and undefined attributes, it cannot cluster the data efficiently. To improve the web usage mining process based on user behavior, this work clusters web usage data obtained from web server logs in order to aggregate users' data-access information for the related websites. Our proposal deploys a multi-cluster model whose distance measure is based on more than one predominant attribute. The similarity distances of the dominant attributes are evaluated against threshold ranges derived from the web user's profile history. Attribute cohesiveness is maintained across web access logs covering both past and current data ranges. The web access attributes considered in the evaluation include user visits, page visits, session duration, product feature lists, and active page-access time. Experiments are conducted on real data sets extracted from the UCI repository.
Performance is evaluated at three stages to show the effectiveness of the proposed technique compared with existing web usage mining techniques proposed by other researchers.
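The abstract does not spell out the multi-attribute distance or the threshold test, so the following is only a minimal Python sketch of one possible reading, not the authors' method. Each cluster keeps a per-attribute (low, high) range built from its history; a session's similarity distance is the mean normalised gap between its dominant-attribute values and those ranges; and a session joins the nearest cluster only if that distance is within a threshold, otherwise it starts a new cluster. The attribute names (`page_visits`, `session_duration`, `active_time`), the helpers `Cluster`, `attr_distance`, `similarity_distance` and `cluster_sessions`, the normalisation by observed spread, the threshold of 0.1 and the toy sessions are all hypothetical. Unlike k-means, this sketch does not fix the number of clusters in advance; whether the authors' model does is not stated in the abstract.

```python
from dataclasses import dataclass, field

# Hypothetical attribute names standing in for the page visits, session
# duration and active page-access time listed in the abstract.
DOMINANT_ATTRS = ("page_visits", "session_duration", "active_time")


@dataclass
class Cluster:
    # Per-attribute (low, high) range accumulated from the profile history.
    ranges: dict
    members: list = field(default_factory=list)

    def absorb(self, record):
        """Add a record and widen each attribute range to cover it."""
        self.members.append(record)
        for a in DOMINANT_ATTRS:
            lo, hi = self.ranges[a]
            self.ranges[a] = (min(lo, record[a]), max(hi, record[a]))


def attr_distance(value, lo, hi, scale):
    """0 inside the historical range, else the gap to the nearest bound / scale."""
    if lo <= value <= hi:
        return 0.0
    gap = lo - value if value < lo else value - hi
    return gap / scale


def similarity_distance(record, cluster, scales):
    """Mean normalised distance over all dominant attributes."""
    total = sum(
        attr_distance(record[a], *cluster.ranges[a], scales[a])
        for a in DOMINANT_ATTRS
    )
    return total / len(DOMINANT_ATTRS)


def cluster_sessions(records, threshold=0.1):
    # Scale each attribute by its observed spread so that wide-range
    # attributes (e.g. durations in seconds) do not dominate the distance.
    scales = {}
    for a in DOMINANT_ATTRS:
        vals = [r[a] for r in records]
        scales[a] = (max(vals) - min(vals)) or 1.0

    clusters = []
    for r in records:
        # Find the existing cluster with the smallest similarity distance.
        best, best_d = None, float("inf")
        for c in clusters:
            d = similarity_distance(r, c, scales)
            if d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= threshold:
            best.absorb(r)
        else:
            # No profile is close enough: open a new cluster seeded by r.
            new = Cluster({a: (r[a], r[a]) for a in DOMINANT_ATTRS})
            new.absorb(r)
            clusters.append(new)
    return clusters


# Illustrative toy sessions, not data from the paper.
sessions = [
    {"page_visits": 5, "session_duration": 120, "active_time": 60},
    {"page_visits": 6, "session_duration": 130, "active_time": 65},
    {"page_visits": 40, "session_duration": 900, "active_time": 500},
]
clusters = cluster_sessions(sessions)
print([len(c.members) for c in clusters])  # → [2, 1]
```

The first two toy sessions lie at a mean normalised distance of roughly 0.018 and share a cluster, while the third is far from both and opens its own. Because absorbed records widen a cluster's ranges, a practical version would need some bound on range growth to avoid chaining; the abstract does not say how the authors handle this.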
Copyright © 2013 Praise Worthy Prize - All rights reserved.

Keywords

Data Aggregation; Data Preprocessing; Fuzzy Clustering; Knowledge Discovery; Similarity Distance Based Text Clustering; Web Usage Mining
