Retention Index Prediction of Flavor and Fragrance by Multiple Linear Regression and the Genetic Algorithm

The Kovats retention indexes of 51 flavor and fragrance compounds are predicted from an optimal set of molecular descriptors. A genetic algorithm (GA), implemented in Perl, selects the optimal descriptors. These descriptors are then used to predict the Kovats retention index with a multiple linear regression (MLR) model built in R. The molecular descriptor values can be computed efficiently with the free, open-source Online Chemical Database software, and Perl and R are also free software. For the 51 flavor and fragrance compounds, 170 molecular descriptors are obtained. From these, the GA selects an optimal set of six descriptors over 200 repetitions, and these six are used to build the MLR model. The best model gives an R-squared of 0.981, an adjusted R-squared of 0.978, and a root mean square error (RMSE) of 43.50. The resulting GA-MLR model predicts the Kovats retention index with an average difference of 6.6%. These results demonstrate that the GA-MLR method can predict the Kovats retention index of flavor and fragrance compounds.
Genetic Algorithm; Molecular Descriptor; Multiple Linear Regression; Kovats Retention Index; Flavor and Fragrance
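The abstract describes a two-stage pipeline: a GA searches for a descriptor subset, and an MLR fitted on that subset is judged by R-squared, adjusted R-squared and RMSE. The Python sketch below is not the authors' Perl/R code. It is a minimal illustration under stated assumptions: each chromosome is a fixed-size subset of k descriptor indices, fitness is the adjusted R-squared of an intercept-plus-descriptors least-squares fit, crossover draws k descriptors from the union of two parents, mutation swaps one descriptor for an unused one, and the best half of each generation survives (elitism). The function names (`ga_select`, `fit_mlr`, `metrics`, `adjusted_r2`), the GA parameters, and the synthetic data are hypothetical. The "average difference" is assumed here to be the mean absolute percentage difference between predicted and observed indexes.

```python
import random

def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system (collinear descriptors)")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_mlr(X, y, cols):
    """Least-squares MLR on descriptor columns `cols` plus an intercept (normal equations)."""
    rows = [[1.0] + [x[j] for j in cols] for x in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    pred = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    return beta, pred

def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n samples and p predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def metrics(y, pred, p):
    """R^2, adjusted R^2, RMSE (divisor n, an assumption) and mean absolute % difference."""
    n = len(y)
    mean = sum(y) / n
    ss_res = sum((t - q) ** 2 for t, q in zip(y, pred))
    ss_tot = sum((t - mean) ** 2 for t in y)
    r2 = 1 - ss_res / ss_tot
    rmse = (ss_res / n) ** 0.5
    mapd = 100 * sum(abs(t - q) / abs(t) for t, q in zip(y, pred)) / n
    return r2, adjusted_r2(r2, n, p), rmse, mapd

def ga_select(X, y, k=6, pop_size=30, generations=40, p_mut=0.2, seed=0):
    """Hypothetical GA over k-descriptor subsets; fitness = adjusted R^2 of the MLR fit."""
    rng = random.Random(seed)
    d = len(X[0])
    cache = {}

    def score(c):
        if c not in cache:
            try:
                cache[c] = metrics(y, fit_mlr(X, y, c)[1], k)[1]
            except ValueError:  # collinear subset: give it the worst fitness
                cache[c] = float("-inf")
        return cache[c]

    def crossover(a, b):
        # Child draws k distinct descriptors from the union of both parents.
        return tuple(sorted(rng.sample(sorted(set(a) | set(b)), k)))

    def mutate(c):
        # With probability p_mut, replace one descriptor with an unused one.
        c = list(c)
        if rng.random() < p_mut and d > k:
            c[rng.randrange(k)] = rng.choice([j for j in range(d) if j not in c])
        return tuple(sorted(c))

    pop = [tuple(sorted(rng.sample(range(d), k))) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: pop_size // 2]  # elitism: the best half survives unchanged
        children = [mutate(crossover(*rng.sample(elite, 2)))
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=score)

if __name__ == "__main__":
    # Hypothetical synthetic data: 51 "compounds" x 20 "descriptors".
    rng = random.Random(1)
    X = [[rng.uniform(0, 10) for _ in range(20)] for _ in range(51)]
    y = [1000 + 40 * x[2] - 25 * x[7] + 15 * x[11] + rng.gauss(0, 5) for x in X]
    best = ga_select(X, y, k=3)
    beta, pred = fit_mlr(X, y, best)
    print(best, [round(v, 3) for v in metrics(y, pred, len(best))])
```

As a consistency check on the reported figures, and assuming all 51 compounds were used in the fit (n = 51, p = 6), `adjusted_r2(0.981, 51, 6)` gives about 0.978, which matches the reported adjusted R-squared. One plausible reading of the "200 repetitions" is 200 independent GA runs with different `seed` values, keeping the subset with the best fitness. The abstract does not specify this, so treat it as an assumption.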

