Adaptive Piecewise Approximation Based on Weighted Average Approach for Symbolic Time Series Representation



DOI: https://doi.org/10.15866/irecos.v10i8.7151

Abstract


Time series representation has become a significant research problem. Effectively managing and analyzing large volumes of high-dimensional data is a challenging task. Many high-level representations of time series have been proposed for mining tasks, and selecting an effective, scalable method to represent the data appropriately is another challenge. The Piecewise Aggregate Approximation (PAA) method is used for efficient similarity search in time series: it reduces dimensionality by replacing each equal-sized frame with its mean value. Because it relies on the mean, PAA captures only the central tendency of each segment and ignores its dispersion. As a result, important patterns may be missed in some time series data sets, and the lower bound may not be tight. We propose a PAA-based method called Time-weighted Average for Symbolic Aggregate Approximation (TWA_SAX) and compare its performance with existing methods. TWA_SAX allows raw data to be compared directly with the reduced representation and guarantees a lower bound on the Euclidean distance. TWA_SAX can be used to build faster and more accurate similarity search algorithms, and it improves the accuracy of the time series representation by yielding a tighter lower bound.
Copyright © 2015 Praise Worthy Prize - All rights reserved.
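The abstract positions TWA_SAX against the baseline PAA/SAX pipeline. The following Python code is a rough sketch of that baseline only. It is not TWA_SAX, whose time-weighted averaging is not specified on this page. The sketch computes PAA frame means, maps them to SAX symbols using equiprobable N(0, 1) breakpoints, and checks that the PAA distance and the SAX MINDIST both lower-bound the Euclidean distance. It also reproduces the limitation the abstract describes: two frames with the same mean but different dispersion collapse to the same PAA value. The function names, the example series, and the assumption that the series length is a multiple of the number of frames are illustrative choices. The distance formulas follow the commonly used PAA and SAX definitions.

```python
import math
from statistics import NormalDist, mean, pstdev


def znorm(series):
    """Z-normalize a series to zero mean and unit standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def paa(series, w):
    """PAA: replace each of w equal-sized frames with its mean value.

    Assumption for this sketch: len(series) is a multiple of w.
    """
    n = len(series)
    if n % w != 0:
        raise ValueError("sketch assumes len(series) is a multiple of w")
    size = n // w
    return [mean(series[i * size:(i + 1) * size]) for i in range(w)]


def paa_distance(a_paa, b_paa, n):
    """Distance between PAA vectors; lower-bounds the Euclidean distance of the originals."""
    w = len(a_paa)
    return math.sqrt(n / w) * math.sqrt(sum((x - y) ** 2 for x, y in zip(a_paa, b_paa)))


def breakpoints(alphabet_size):
    """Breakpoints that split N(0, 1) into alphabet_size equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax(series, w, alphabet_size):
    """SAX word: z-normalize, apply PAA, then map each frame mean to a letter."""
    betas = breakpoints(alphabet_size)
    # The symbol index is the number of breakpoints lying below the frame mean.
    return "".join(chr(ord("a") + sum(v > b for b in betas))
                   for v in paa(znorm(series), w))


def mindist(word_a, word_b, n, alphabet_size):
    """MINDIST between SAX words; lower-bounds the PAA (hence Euclidean) distance."""
    betas = breakpoints(alphabet_size)

    def cell(s, t):
        r, c = ord(s) - ord("a"), ord(t) - ord("a")
        if abs(r - c) <= 1:  # identical or adjacent symbols: distance 0
            return 0.0
        return betas[max(r, c) - 1] - betas[min(r, c)]

    w = len(word_a)
    return math.sqrt(n / w) * math.sqrt(sum(cell(s, t) ** 2 for s, t in zip(word_a, word_b)))


if __name__ == "__main__":
    # Same mean, different dispersion: PAA maps both frames to the same value.
    print(paa([0.0, 0.0, 0.0, 0.0], 1), paa([1.0, -1.0, 1.0, -1.0], 1))  # [0.0] [0.0]

    q = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    c = q[::-1]
    n, w, alpha = len(q), 4, 4
    euclid = math.dist(znorm(q), znorm(c))
    lb_paa = paa_distance(paa(znorm(q), w), paa(znorm(c), w), n)
    lb_sax = mindist(sax(q, w, alpha), sax(c, w, alpha), n, alpha)
    print(sax(q, w, alpha), sax(c, w, alpha))  # abcd dcba
    print(lb_sax <= lb_paa <= euclid)  # True
```

The flat and zigzag frames illustrate the dispersion problem the abstract attributes to PAA. A weighted variant such as TWA_SAX would presumably change the statistic computed in `paa`, but its weighting scheme is not given here, so it is not sketched.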

Keywords


Time Series; Dimensionality Reduction; Piecewise Aggregate Approximation; Similarity Search; Symbolic; Discretize

