DCMOGA: a New Method for Multiple Sequences Alignment Based on the Principle Divide and Conquers and the Multi-Objective Genetic Algorithm

Abdelhakim El Fatmi; Arakil Chentoufi; Molay Ali Bekri; Said Benhlima; Mohamed Sabbane

doi:10.15866/irecos.v11i8.10007

DCMOGA: a New Method for Multiple Sequences Alignment Based on the Principle Divide and Conquers and the Multi-Objective Genetic Algorithm

Abdelhakim El Fatmi^(1*), Arakil Chentoufi⁽²⁾, Molay Ali Bekri⁽³⁾, Said Benhlima⁽⁴⁾, Mohamed Sabbane⁽⁵⁾

^(*) Corresponding author

Authors' affiliations

DOI: https://doi.org/10.15866/irecos.v11i8.10007

Abstract

In this paper, we present a new algorithm to solve multiple alignments of nucleic acid sequences. The presented method is divided in three steps. Firstly, the algorithm separates the common regions between the aligned sequences (conserved regions) from the non-common regions. Consequently, the first step produces two types of sub-sequences. Secondly, the obtained non-common sub-sequences are aligned using a multi-objective genetic algorithm. Thirdly, we concatenate the sub-alignments obtained from the second step and the similar parts obtained from the first step to construct the final alignment. In order to evaluate the performance of the proposed algorithm, a comparative study has been performed using some of the most popular methods for alignment of nucleic acid sequences and the benchmark Bralibase (Benchmark RNA Database). The obtained results illustrate clearly the effectiveness of the proposed algorithm.
Copyright © 2016 Praise Worthy Prize - All rights reserved.

Keywords

Bioinformatics; RNA Sequence Alignment; Genetic Algorithm; DCMOGA (Distributed Cooperation Model of Multi-Objective Genetic Algorithm)

Full Text:

PDF

References

C. Notredame, Recent progress in multiple sequence alignment:a survey, Pharmacogenomics, vol. 3, no. 1, pp. 131–144, 2002.
http://dx.doi.org/10.1517/14622416.3.1.131

S. B. Needleman and C. D. Wunsch, A general method applicable to the search for similarities in the amino acid sequence of two proteins, Journal of molecular biology, vol. 48, no. 3, pp. 443–453, 1970.
http://dx.doi.org/10.1016/0022-2836(70)90057-4

T. F. Smith and M. S. Waterman, Identification of common molecular sub sequences, Journal of molecular biology, vol. 147, no. 1, pp. 195–197, 1981.
http://dx.doi.org/10.1016/0022-2836(81)90087-5

H. Carrillo and D. Lipman, The multiple sequence alignment problem in biology, SIAM Journal on Applied Mathematics, vol. 48, no. 5, pp. 1073–1082, 1988.
http://dx.doi.org/10.1137/0148063

J. Stoye, V. Moulton, and A. Dress, Dca: an efficient implementation of the divide-and-conquer approach to simultaneous multiple sequence alignment, Computer applications in the biosciences, vol. 13, no. 6, 1997.
http://dx.doi.org/10.1093/bioinformatics/13.6.625

X.-D. Wang, J.-X. Liu, Y. Xu, and J. Zhang, A survey of multiple sequence alignment techniques, in InternationalConference on Intelligent Computing. Springer, pp.529–538, 2015.
http://dx.doi.org/10.1007/978-3-319-22180-9_52

P. Hogeweg and B. Hesper, The alignment of sets of sequences and the construction of phyletic trees: an integrated method, Journal of molecular evolution, vol. 20, no. 2, pp.175–186, 1984.
http://dx.doi.org/10.1007/bf02257378

D.-F. Feng and R. F. Doolittle, Progressive alignment and phylogenetic tree construction of protein sequences, Methods in enzymology, vol. 183, pp. 375–387, 1990.
http://dx.doi.org/10.1016/0076-6879(90)83025-5

F. Corpet, Multiple sequence alignment with hierarchical clustering, Nucleic acids research, vol. 16, no. 22, pp.10 881–10 890, 1988.
http://dx.doi.org/10.1093/nar/16.22.10881

K. Katoh, K. Misawa, K.-i. Kuma, and T. Miyata, MAFFT: a novel method for rapid multiple sequence alignment based on fast fourier transform, Nucleic acids research, vol. 30,no. 14, pp. 3059–3066, 2002.
http://dx.doi.org/10.1093/nar/gkf436

C. Notredame, D. G. Higgins, and J. Heringa, T-coffee: A novel method for fast and accurate multiple sequence alignment, Journal of molecular biology, vol. 302, no. 1, pp. 205–217, 2000.
http://dx.doi.org/10.1006/jmbi.2000.4042

T. Lassmann and E. L. Sonnhammer, Kalign–an accurate and fast multiple sequence alignment algorithm, BMC bioinformatics, vol. 6, no. 1, p. 298, 2005.

J. D. Thompson, D. G. Higgins, and T. J. Gibson, Clustal w: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice, Nucleic acids research, vol. 22, no. 22, pp. 4673–4680, 1994.
http://dx.doi.org/10.1093/nar/22.22.4673

S. Wu and U. Manber, Fast text searching: allowing errors, Communications of the ACM, vol. 35, no. 10, pp. 83–91, 1992.
http://dx.doi.org/10.1145/135239.135244

D. G. Higgins and P. M. Sharp, Clustal: a package for performing multiple sequence alignment on a microcomputer, Gene, vol. 73, no. 1, pp. 237–244, 1988.
http://dx.doi.org/10.1016/0378-1119(88)90330-7

N. Saitou and M. Nei, The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular biology and evolution, vol. 4, no. 4, pp. 406–425, 1987.

F. Ortuno, J. P. Florido, J. M. Urquiza, H. Pomares, A. Prieto, and I. Rojas, Optimization of multiple sequence alignment methodologies using a multiobjective evolutionary algorithm based on nsga-ii, in Evolutionary Computation (CEC), 2012 IEEE Congress on. IEEE, pp. 1–8, 2012.
http://dx.doi.org/10.1109/cec.2012.6256146

R. C. Edgar, Muscle: multiple sequence alignment with high accuracy and high throughput, Nucleic acids research, vol. 32, no. 5, pp. 1792–1797, 2004.
http://dx.doi.org/10.1093/nar/gkh340

F. Naznin, R. Sarker, and D. Essam, Progressive alignment method using genetic algorithm for multiple sequence alignment, ,IEEE Transactions on Evolutionary Computation, vol. 16, no. 5, pp. 615–631, 2012.
http://dx.doi.org/10.1109/tevc.2011.2162849

D. Haussler, A. Krogh, K. Sj¨olander et al., Protein modelling using hidden markov models: Analysis of globins, in System Sciences, 1993, Proceeding of the Twenty-Sixth Hawaii International Conference on, vol. 1. IEEE, pp. 792–802, 1993.
http://dx.doi.org/10.1109/hicss.1993.270611

C. Notredame and D. G. Higgins, Saga: sequence alignment by genetic algorithm, Nucleic acids research, vol. 24, no. 8, pp. 1515–1524, 1996.
http://dx.doi.org/10.1093/nar/24.8.1515

C. Gondro and B. Kinghorn, A simple genetic algorithm for multiple sequence alignment, Genetics and Molecular Research, vol. 6, no. 4, pp. 964–982, 2007.

J. Taheri and A. Y. Zomaya, RBT-GA: a novel metaheuristic for solving the multiple sequence alignment problem, Bmc Genomics, vol. 10, no. Supp l 1, p. S10, 2009.
http://dx.doi.org/10.1186/1471-2164-10-s1-s10

P. Seeluangsawat and P. Chongstitvatana, A multiple objective evolutionary algorithm for multiple sequence alignment, in Proceedings of the 7th annual conference on Genetic and evolutionary computation. ACM, pp. 477–478, 2005.
http://dx.doi.org/10.1145/1068009.1068088

F. J. M. da Silva, J. M. S. Perez, J. Pulido, and M. Rodr´ıguez, Parallel niche pareto alineaga–an evolutionary multiobjective approach on multiple sequence alignment, Journal of Integrative Bioinformatics, vol. 8, no. 3, p. 174, 2011.

M. Kaya, A. Sarhan, and R. Alhajj, Multiple sequence alignment with affine gap by using multi-objective genetic algorithm, Computer methods and programs in biomedicine, vol. 114, no. 1, pp. 38–49, 2014.
http://dx.doi.org/10.1016/j.cmpb.2014.01.013

M. Kaya, B. Kaya, and R. Alhajj, A novel multi-objective genetic algorithm for multiple sequence alignment, International Journal of Data Mining and Bioinformatics, vol. 14, no. 2, pp. 139–158, 2016.
http://dx.doi.org/10.1504/ijdmb.2016.074684

H.B. Nicholas Jr, A. J. Ropelewski, D.W. Deerfield II, Strategies for Multiple Sequence Alignment. BioTechniques vol. 32, no 3, p. 572-591. 2002.

S.F. Altschul, R. J. Carroll, D.J. Lipman, Weights for Data Related by a Tree. J. Mol. Biol, vol. 207, no 4, p. 647-653.1989.
http://dx.doi.org/10.1016/0022-2836(89)90234-9

C. Notredame, L.Holm, D. G. Higgins, COFFEE: an objective function for multiple. BIOINFORMATICS, vol. 14, no 5, p. 407-422, 1998.
http://dx.doi.org/10.1093/bioinformatics/14.5.407

R. A. Cartwright, Logarithmic gap costs decrease alignment accuracy, BMC bioinformatics, vol. 7, no. 1, p. 527, 2006.

P. P. Gardner, A. Wilm, and S. Washietl, A benchmark of multiple sequence alignment programs upon structural rnas, Nucleic acids research, vol. 33, no. 8, pp. 2433–2439, 2005.
http://dx.doi.org/10.1093/nar/gki541

http://projects.binf.ku.dk/pgardner/bralibase/bralibase2.html

Refbacks

There are currently no refbacks.

Username
Password
Remember me