Randic Shape and Size Indices Account for the Variability in Xanthine Oxidase Inhibitory Activity of a Family of Fused Pyrans LIZA

Gout and oxidative stress have been strongly associated with hyperuricemia, a metabolic defect marked by high levels of uric acid (UA) in the serum. Hyperuricemia has been managed by the use of drugs that inhibit xanthine oxidase. The recent account on synthesis of 4-aryl/heteroaryl4H-fused pyrans as XO inhibitors provided excellent opportunity to uncover the crucial properties of these compounds that confer XO inhibitory action. In here, multiple linear regression analysis of DRAGON-type descriptors showed that the Randic Shape Index (PW3), and a size descriptor P_VSA_v_3account for the 75% of the variability of IC50 values. Correlation studies with familiar QSAR descriptors indicate that the observed activity is primarily influenced by the molecular ovality and volume, and partly by charge distribution. The Comparative Molecular Field Analysis (CoMFA) models provide further insights on the steric and electronic features of this class of XO inhibitors.


INTRODUCTION
Gout, oxidative stress, and other systemic disorders like cardiovascular and other metabolic diseases have been strongly associated with hyperuricemia, a disorder in the metabolism marked by high levels of uric acid (UA) in the serum 1 .The management of hyperuricemia involves reducing the production, increasing the excretion, and mopping out of excess uric acid 2 .The reduction of UA level in the blood has been accomplished by dispensing xanthine oxidase (XO) inhibitors.XO is responsible for the conversion of purine to UA in the latter part of the metabolic pathway, particularly the catabolism of hypoxanthine to xanthine followed by hydroxylation of xanthine to uric acid 3,4 .During purine oxidation, XO utilizes dioxygen as electron acceptor that is converted to reactive oxygen species (ROS) in the process.The ROS derived from XO has been linked to several disease conditions including carcinogenesis, atherosclerosis, inflammation, and chronic obstructive pulmonary disease (COPD) 5 .Therefore, the selective inhibition of XO may pave the way for multi-targeted chemotherapy for gout, cardiovascular disease, cancer, and oxidative damage 6 .
Recently, a series of forty-one 4-aryl/ heteroaryl-4H-fused pyrans, has been prepared and tested for xanthine oxidase (XO) inhibition 26 .This family of compounds offers great opportunity to identify the relevant properties inherent in these compounds that dictate their xanthine oxidase inhibitory activity.A routine approach in drug design is the generation of quantitative structure-activity relationship (QSAR) models, which relate the observed bioactivity to the molecular properties 27 as exemplified in our previous works 28,29 .QSAR studies are useful in the ligand-based design of next generation drug candidates with improved pharmacodynamics and pharmacokinetics properties.A QSAR study on this sizeable class of compounds unravels the crucial chemical properties and the extent of the contribution of each predictor to effect XO inhibition.Thus, in this study over 4000 DRAGON-type descriptors were generated for each member of the pyran family.The structure-derived predictors that influence the variation of experimental XO activity were determined by multiple linear regression analyses.The generated QSAR models were supplemented with Comparative Molecular Field Analysis (COMFA) on the same set of pyran XO inhibitors.In COMFA 30 , the 3D structure of a molecule defined by an array of grid structures is subjected to a probe atom or functional group to calculate the steric and electrostatic fields at certain points outside the grid 27 .The results of this work provide critical insight in the development of next generation of fused pyrans as therapeutic agents for hyperuricemia.

Data Collection
The biological data (IC 50 ) of 4-aryl/ heteroaryl-4H-fused pyrans (Table 1) included in this study were obtained from literature 26 .IC 50 denotes the concentration of the test compound that exerts 50% inhibition of the biological action.The 3D structures of all compounds were generated using Spartan 14 ® (Wave function, Inc.) software.Geometry optimization was performed at semiempirical PM3 level of theory prior to the generation of molecular electrostatic potential map and other isosurfaces.The CoMFA models were created with the use of BIOVIA Discovery Studio (DS) 2017 software (http://accelrys.com).The QSAR descriptors were computed (using the DRAGON 5 software (http://www.talete.mi.it).Over 4000 descriptors were computed including 0D properties (constitutional); 1D descriptors (i.e.atom-centered fragments, functional groups, properties, and information descriptors); 2D descriptors (i.e.connectivity, edge adjacency, molecular walk counts, topological, topological charge indices, eigenvalue-based indices, Burden eigenvalues, and 2D autocorrelation descriptors); and 3D descriptors (i.e.geometry, charge, Randic molecular profiles, 3D-MoRSE, RDF, GETAWAY, and WHIM descriptors) 31 .

QSAR Model Building
To establish a quantitative relationship between structure-based properties (dependent variables) and experimental activity (independent variable) of pyrans, a multiple linear regression (model was derived with the use of forward stepping protocol 32 in SPSS version 20 that ran on MacOS 10.11 system.In order to relate the k independent variables or descriptors (Xi) to the dependent variable Y, a linear function, with parameters α and β,were obtained in the form: To preliminarily assess the quality of the fitted equation, the amount of the variability in the dependent variable that is explained by the regression equation was determined by calculating the squared correlation coefficient (r 2 ) 33 .In addition, bivariate correlation studies were done to check for multi-collinearity among the predictors. (1)

QSAR model validation
The predictive ability of the QSAR model was evaluated using Leave-One-Out (LOO) 34 and Leave-Group-Out (LGO) 35 techniques.In LOO method, a single entry (row) is removed from the dataset containing n entries, then a new equation is generated based on n-1 dataset left.Subsequently, the equation is used to calculate the response variable y for the compound that had been removed.This process is done repeatedly until all y values in the dataset are obtained.In the LGO approach, the test set (a group of compounds omitted in each instance) is usually 20% of n.A model is then generated based on the remaining 80% of the dataset, also called as training set.The generated model is used to calculate the y values for the excluded compounds.After which, the original dataset of n compounds is restored and another round of test set selection, refitting based on new training set, and prediction of y values for the compounds in the test set.This process is continued until all the y values for n dataset have been calculated.In this work, four more cycles of y value calculations were done in order to obtain an average of five predicted values for each compound.
The cross-validated r 2 value, (also known as q 2 ) was determined to assess the statistical validity of the model.The q 2 values for the crossvalidated models were calculated from the model PRESS(prediction error sum of the squares) according to equations 2 and 3, respectively.Both r 2 and q 2 are useful indicators of model validity.While the r 2 value quantifies the goodnessof-fit, the q 2 value determines the goodness of prediction 33 .
Further more, to verify the absence of chance correlation, y-randomization 36 was performed by randomly scrambling the y values while keeping the x values intact.Accordingly, the QSAR model was used to predict the randomly scrambled y values.

QSAR in drug discovery
Drug discovery in modern times follows a repetitive, cyclic work flow involving three major steps that include design, synthesis, and evaluation 33 .One of the two approaches in drug design,called ligand-based discovery, is appropriate in cases where a set of compounds with common biological action is known.QSAR study can be performed on a set of compounds in order to decipher the structure-based properties that influence the experimental bioactivity.The deep understanding of the relationship between molecular structure and bioactivity is useful in the design of a lead compound and its derivatives.
QSAR models are mathematical equations derived from systematic analysis of diverse descriptors or properties of a compound using appropriate statistical techniques.A supervised univariate method of analysis called multiple linear

DRAGON-Type Descriptors and QSAR Model
In this study, over 4000 molecular descriptors were calculated for each of the 41 XO inhibitors using the DRAGON software.The output of DRAGON as text file was converted to Microsoft Excel ® format and, in turn, exported to SPSS ® .After removing the descriptors that have zero or invariant values throughout the dataset of 41 compounds, only 3062 independent variables (descriptors) were left.With a sample size of 41, the QSAR model should contain at most 8 descriptors, following the rule-of-thumb in model development of having 1 independent variable per 5 samples) 33,37 .Incidentally, the multiple linear regression analysis of this dataset using both Enter and Stepwise methods returned eight equations consisting of one, two, three up to eight variables, respectively.The eight-variable model includes the following descriptors: PW3, P_VSA_v_3, Sp Diam_AEA (ed), RDF045u, SpMax2_Bh(m), H8m, R4i, and Mor15s, in order of decreasing contribution to the variability of the dependent variable (IC 50 ).
Nevertheless, the correlation analysis on these descriptors revealed notable multi-collinearity involving several descriptors, particularly the four least important contributors to XO inhibitory activity.Thus, it is instructive to remove these four descriptors namely, Sp Max2_Bh(m), H8m, R4i, and Mor15s.The r 2 value was hardly eroded (Δr 2 = 0.067) by the exclusion of the four variables, and thus the resulting QSAR model (Equation 4) with just four predictors remains adequate in explaining the variability of IC 50 .
The MLR analysis showed that only 7% of the variability of IC 50 values was explained by the four excluded descriptors combined.On the other hand, the four included predictors in the model account for 90% of the variation in XO inhibitory activity of fused pyrans.Specifically, the Randic shape index Path/Walk 3 (PW3)alone explains more than half (57%) of the variation of response variable, and the combined PW3 and a size index P_VSA_v_3, account for 75% of the fluctuations in IC 50 values.The negative coefficient of PW3 indicates that its value should increase in order to minimize IC 50 and improve the potency of the XO inhibitor.The graph theoretical index Path/Walk 3 introduced by Randic 38 has been demonstrated to give regressions of high quality for a number of physicochemical properties.Meanwhile, P_VSA_v_3, which stands for P-VSA on van der Waals volume, bin 3, is obviously a size descriptor.P-VSA-like descriptors are defined as the amount of van der Waals surface area (VSA) having a property P in a certain range 39 .
To appreciate the remarkable influence of a poorly understood graph theoretical index like PW3 on the observed activity of a compound, the DRAGON-type descriptors were correlated with typical QSAR variables such as those that can be derived from Spartan software (e.g.MW, area, volume, etc.).The interrelatedness of the two groups of descriptors is evident in Table 2.It can be seen that PW3 is highly significantly inversely correlated with Ovality and LogP.These results indicate that a more spherical and more hydrophilic pyran-based lead tends to be a more potent XO inhibitor.Similarly, the negative coefficient of P_VSA_v_3, which is highly significantly correlated with size-dependent properties like MW, area, volume, and polarizability, indicates that a larger and more polarizable variant of fused pyran tends to have smaller IC 50 .Further more, a molecule with smaller SpDiam_AEA(ed) and larger RDF045u values tends to enhance potency, albeit these descriptors have minor contributions to observed XO inhibitory activity.SpDiam_AEA(ed) is an edge adjacency index while RDF045u is a radial distribution function descriptor (http://www.talete.mi.it).

QSAR Model Validation
The predictive ability of the QSAR model was evaluated by performing cross-validation using both leave-one-out (LOO) and leave-group-out (LGO) methods.The results detailed in Table 3 and illustrated graphically in Figure 1

3D QSAR by CoMFA
Finally, Comparative Molecular Field Analysis (CoMFA) was also performed on the same set of 4-aryl/heteroaryl-4H-fused pyrans to further shed light on the structural features needed to effect XO inhibition.The 3D QSAR model was built using Partial Least Squares (PLS) regression using energy grids as descriptors.The energy grids were computed using two probe types, a positive point charge H + and van der Waals Carbon, designed to measure electrostatic and steric effects, respectively.In model building, the most uncorrelated descriptors were chosen.The dataset of 41 compounds was split into two; the 11 compounds were randomly selected to constitute the test set, while the remaining 30 compounds served as the training set.Then, the log IC 50 model was successfully created and validated.The external validation using the test set yielded a predictive squared correlation coefficient, q 2 of 0.49, which is above the threshold value 40 of 0.3, although lower than practical cut-off 41 of 0.6.Although the 3D QSAR model is not as robust as Equation 4, the insights it offers are consistent with that of the QSAR model consisting of DRAGON-derived indices (vide infra).
The results of CoMFA (Fig. 4) indicate that a new molecule must have stronger van der Waals (vdW) attractive interaction at the 4-aryl position and at distal end of the heteroaryl moiety (i.e.green vdW isosurface) of the fused pyran.These spots almost coincide with the red electrostatic potential (EP) isosurface, where there should be no negative functionalities.In addition, the EP isosurface of the 3D QSAR model favors the presence of electronegative groups at the proximal end (i.e.blue isosurface) of the heteroaryl fragment as well as the amino/nitrile-substituted side of the core pyran.Compounds 20 -25, the most active class of fused pyrans satisfy these requirements, the additional benzene fused to the pyranone having set the stage for van der Waals interaction while the lactone functionality provided the negative electrostatic potential.Apparently, the compounds with larger heteroaryl group (i.e.20 -30, 36 -41) are generally more active agents.Furthermore, the chromen-2one-4H-fused pyrans (20 -25), having negative potential at the right position, are more active than their naphthalene counterparts (26 -30).These observations are consistent with the QSAR model derived from DRAGON-type descriptors and the 3D    created by performing multiple linear regression analysis on a dataset consisting of 41 compounds, with over 4000 Dragon-type descriptors each, unveils the crucial molecular properties that confer XO inhibitory action.In particular, the Randic Shape Index Path/Walk 3 (PW3), and the size descriptor P_VSA-like on van der Waals Volume, bin 3 (P_VSA_v_3), which are highly significantly correlated with hydrophilicity and molecular size, respectively, account for three quarters of the variation in IC 50 values.A more spherical, hydrophilic, and larger fused pyran tends to be a more active XO inhibitor.This is in accord with Comparative Molecular Field Analysis (CoMFA), which points to a compound with larger heteroaryl fragment that enhances van der Waals attraction at the distal end, and a negative functionality at its promixal side with respect to the pyran core.These findings provide useful insights in the design of more active drug candidates for hyperuricemia.
demonstrate the robustness (large q 2 ) and thus the utility of the QSAR model in the design of next generation XO inhibitors.The random distribution of residuals around Z-score(4)   IC 50 = 507.335(±37.711)-1747.216(±100.700)PW3 -0.139 (±0.015)P_VSA_v_3 + 25.837 (±4.061)SpDiam_AEA(ed)-1.344(±0.235)RDF045u n = 41 r 2 = 0.902 F = 82.6Sig = 0.000value of 0 and within ±3 (Fig.2) indicates the absence of any bias or systematic error, and outliers with the use of the QSAR model.Furthermore, the results of y-randomization validation also indicate the absence of chance correlation, with a negligible correlation coefficient (r = 0.089) for the jumbled experimental IC 50 values and the calculated values.

Fig. 4 .
Fig. 4. Training set molecules aligned with the is osurface of the 3D QSAR model coefficients on van der Waals grids (left) and electrostatic potential grids (right)