Pharmacoinformatics Profiling and Dynamic Studies of Selected Compounds Acting as Potential Inhibitors against DPP4 Enzyme

DPP-IV rapidly degrades glucagon-like peptide-1 and glucose-dependent insulinotropic peptides. Delaying the breakdown of endogenous incretin hormones with DPP-IV inhibitors may help correct the physiologic deficit. The purpose of this work is to identify new compounds that inhibit the DPP-IV enzyme. The anticipated compounds were potent anti-diabetic candidates in this investigation. Two 2d QSAR models were created using 179 different substances from diverse sources. QSAR models were created using two methods. The first technique included docking score as an additional descriptor, while the second did not. Docking-based QSAR considered 74 compounds out of 179. Another approach used 40 molecules from 179 compounds. Each method had a precise strategy. Descriptors were computed using DRAGON for both training and test sets. Using DRAGON data, SYSTAT generated regression curves. The docking-based QSAR model produced R2=0.7098 (training set) and R2=0.9987 (test set), whereas the other technique produced R2=0.7644 (training set) and R2=0.9857 (test set).


INTRODUCTION
Diabetes is a worldwide metabolic disorder that has become a widespread epidemic disease in the last few decades, owing especially to the increasing frequency and incidence of type 2 diabetes mellitus (T2DM), which affects 90-95 percent of people with the disease and is difficult to cure. The ultimate treatment goals for T2DM include the long-term regulation of blood sugar levels and the treatment of diabetes complications. Exercise plays an important role in the prevention and control of insulin resistance, prediabetes, gestational diabetes mellitus, type 2 diabetes, and diabetes-related health problems. Exercise must be done regularly to reap long-term benefits, and it must comprise a variety of regular pieces of training. Aerobic and resistance training both increase insulin action, at least immediately, and aid in the management of blood glucose levels, lipids, blood pressure, cardiovascular risk, mortality, and quality of life. For all of the reasons described above, physical exercise is an important component in the prevention and control of type 2 diabetes and can be regarded as a cornerstone of diabetes management, along with proper food and medicine. Current oral therapy options include sulfonylureas, metformin, thiazolidinedione derivatives, glycosidase inhibitors, and newly developed dipeptidyl peptidase-4 (DPP-4) inhibitors and glucagon-like peptide-1 (GLP-1) analogs 1,2 .
Glucagon-like peptide-1 (GLP-1) is an incretin hormone. Which is secreted by intestinal L-cells in response to food intake. The active form of GLP-1 is a 30-amino acid long peptide, which stimulates insulin release, inhibits glucagons release, and slows down gastric emptying. Each of these phenomena benefits in the control of glucose homeostasis in patients with type II diabetes. The active form of GLP-1 is rapidly inactivated (t1/2 = 1 min) by the plasma DPP-IV enzyme, which cleaves a dipeptide from the N-terminus 14,15 . Thus inhibition of DPP-IV extends the half-life of endogenously secreted GLP-1, which in turn enhances insulin secretion and improves glucose tolerance. [3][4][5] The main enzyme of interest considered in this study is the DPP-4 inhibitors. DPP IV (dipeptidyl peptidase IV) is a versatile protein involved in a variety of physiological functions. It acts as a binding protein, a receptor, and a proteolytic enzyme. It's a serine peptidase from the S9b protein family. DPP IV occurs as a soluble homodimer and as a ubiquitous type II integral plasma membrane glycoprotein. It has a strong link to several disorders, including diabetes, obesity, and tumor growth, making it an appealing target for drug discovery research 2,6,7 .
DPP-IV is a serine protease that deactivates the glucagon-like peptide 1 (GLP-1) and the glucosedependent insulinotropic peptide (GIP), both of which stimulate insulin production. GLP-1 is a substrate of DPP-IV, a major incretin hormone that regulates glucose activity in a glucose-dependent manner, inhibits glucagon release, reduces stomach emptying, and promotes islet-cell regeneration and differentiation. DPP-IV inhibitors raise the concentration of active GLP-1 in the blood and promote insulin release in response to an increase in blood glucose levels. In-silico approaches appear to be a potent practice in the area of technology for creating new and optimized pharmacological agents with superior therapeutic intervention. The quantitative association between structural characteristics of substances and their biological activity is studied using statistical correlation analysis in QSAR. It is a key method in the design of ligandbased drugs. In this study, we presented a 2D QSAR methodology for generating and optimizing new medicines against the DPP-IV enzyme from the existing literature. We used computational methods such as 2D QSAR modeling, molecular docking, virtual screening, and molecular dynamics simulation to develop a new, selective, and powerful DPP-IV inhibitor for diabetes treatment. The findings of this study could be crucial in the future development of effective Type II anti-diabetic medicines based on prospective DPP-IV inhibitors.   Out of 40 molecules 7 molecules were randomly chosen as the test set and the remaining 33 molecules were considered as the training set. The next step was forecasting the predicted PIC50 value of test set molecules using the SYSTAT program (total 7 molecules with mole ID 60, 62, 147, 152, 171, 172, 179). The column of experimental PIC50 value of the test set was chosen as the dependent variable, while the other four columns of the test set were chosen as the independent variables. As a result, we obtained the following chart Table 1 in SYSTAT software 9 . Then by using this chart we generate a formula to predict the PIC50 value (explain in result section). The same formula is also used to predict the PIC50 value of the training set.
The fundamental issue with 2d QSAR is that it fails when the structure of the molecules and their scaffold is extremely diverse. For a highly diverse structure, even 3d QSAR fails. The key reason for the failure is that only ligand-based QSAR was used. No DRAGON characteristics are related to receptor-ligand interactions. The descriptors of DRAGON only give an idea about the ligand structure and morphology. But the IC 50 value of a drug is highly correlated with receptor-ligand interaction-related values such as Docking Score, Ligand Efficiency, Glide Ligand Efficiency, Glide Gscore, Glide Lipo, Glide Hbond, Glide Rewards, Glide Evdw, Glide Ecoul, Glide Erotb, Glide Esite, Glide Emodel, Glide Energy, Glide Einternal, Glide Confnum, Glide Posenum, and so on.

Molecular Docking
A new dataset consisting of 74 compounds was established from the list of 179 original sets of compounds. These 74 compounds were docked against the DPP-IV receptor.

Protein Preparation
The crystal structure of protein Human DPP-4 (Dipeptidyl peptidase-4) (PDB ID: 6B1E) was downloaded from the Protein Data Bank (PDB) and modeled in this study using Protein preparation wizard (Maestro version 11.4) at 2.2 Å (10,11). Protein structures must be processed before they can be employed as a receptor for docking. Some common activities are I hydrogen atom addition, (ii) atomic charge assignment, (iii) removal of water molecules that are not involved in ligand binding, and (iv) selenocysteine replacement with cysteines. Site Map Generation: The site map generation tool was used to identify the probable binding site of the protein. The binding site containing the native ligand was chosen for the docking study.

Ligand Preparation
The ligand structures were created in the CDX format using the application Chem Draw extreme version 8.0. These ligands were then translated to the mol2 3d format and run using the Maestro LigPrep module in the Schrodinger suite, version 11.4 12 . They were transformed to a stable form by minimizing energy and optimizing missing hydrogen atoms. These ligands' bond orders were set, and the charged groups were neutralized. The ionization and tautomeric states were created using the Epik module at pH levels ranging from 6.8 to 7.2. Compounds were minimized in the last stage of LigPrep.

Receptor Grid Generation
For the selected binding site, the grid was generated taking the binding site as the centroid.

Glide Ligand Docking
The proposed compounds were glide docked using the previously created receptor grid and ligand molecules. 74 molecules were docked for the chosen grid. The Glide ligand docking program was used to score the favorable contacts between ligand molecules and the receptor 13 . All docking calculations were done in standard precession (SP) mode with the OPLS-2005 force field.

2D QSAR modeling with help of docked scores
There is no good association between the docking score and the experimental PIC50 result. Because, in addition to docking score, other factors such as Ligand Efficiency, Glide Ligand Efficiency, Glide gscore, Glide lipo, Glide hbond, Glide rewards, Glide evdw, Glide ecoul, Glide erotb, Glide esite, Glide emodel, Glide energy, Glide einternal, Glide confnum, Glide posenum, and so on are responsible for a molecule's activity (PIC50). As a result, an attempt was made to identify the best descriptors among them that are highly connected with the experimental PIC50 value. Then, a test set of 9 random molecules was created. The remaining molecules (65 molecules) are used as the training set.
All compounds in the training and test sets were chosen using the following criteria: (i) Both training and test sets should cover the broadest range of molecular bioactivities (IC 50 ); (ii) both low-active and high-active compounds should be included and (iii) both training and test sets should include varied architectures. The training set contains 65 compounds exhibiting high structural diversity and a broad range of molecular bioactivities with PIC50 values ranging from 4.6 to 9 nM (PIC50=9-log IC 50 ) Table 5, 6.
In SYSTAT software, the experimental PIC50 is used as the dependent variable for the test set (9 molecules), and a couple of the above-mentioned descriptors (which are highly correlated with the experimental PIC50 value) are used as the independent variable. As a result, we were able to gather the following information. Then, using a test set, an attempt was made to construct a regression curve and produce an activity forecasting equation Table 4.

Molecular Dynamics Simulations Study
The molecule with the best binding affinity was further subjected to a molecular dynamics simulation study. Molecular Dynamics Simulation is a computer-based simulation approach used to analyze the physical motions of atoms or molecules.
MD simulations can identify a few critical hydrogen bond interactions. MD simulations assist in protein docking and virtual screening advances. The iMODS server was utilized in this work to simulate molecular dynamics. The iMODS service aids in the exploration of normal mode analysis and generates accessible information about routes that may involve macromolecules or homologous structures [14][15][16] .
For the hit chemical receptor complex, molecular dynamics simulations were also run using the Desmond program 17 . Individually, the complex was solvated in an explicit water box of size 10 with a single-point charge (SPC) water model TIP3P with periodic boundary condition (PBC). The protein and ligand were modeled using the OPLS3e force field, and Na and Clions were added to make the total charge of the system neutral. Following that, the system was energy reduced for 2000 steps before a 50ns production run. Following minimization, the complex was subjected to manufacturing run at the NPT ensemble. Using the Nose-Hoover thermostatic algorithm and the Martina-Tobias-Klein approach, the system was gently heated to maintain a temperature of 300 K and pressure. To simulate long-range electrostatic interactions, the Particle-Mesh Ewald (PME) approach was used with a grid spacing of 0.8. The Desmond package's Simulation Interaction Diagram tool was used to investigate the precise interactions between the ligand and protein. The data was examined in terms of protein and ligand RMSD and root mean square fluctuation (RMSF).

2D QSAR Result analysis (Without docking approach)
The PIC50 value of the test set molecules (total of 7 molecules with mole IDs 60, 62, 147, 152, 171, 172, 179) was predicted using the formula given below and the regression curve was constructed using MICROSOFT EXCEL. The following formula was Here we performed 2d QSAR because it is more robust than 3d QSAR. And we got R 2 = 0.7644 which is a decent value for 2d QSAR [ Fig. 1]. For 2D QSAR modeling, the value of the regression coefficient, R 2 should be greater than 0.7 to build a decent model and also predict the activity of all molecules with reasonable precision. [27][28][29]  Glide Evdw are docking descriptors that are highly correlated with Experimental PIC50 (VAR9). With the help of SYSTAT software taking those descriptors as independent variables and the Experimental PIC50 value as the dependent variable for the TEST set, we obtained the following data Table 4.
We used the following formula to predict the PIC50 value of any molecule.
We classified our 74 molecules into three groups: (i) those with a PIC50 value less than 5.5 are inactive, those with a PIC50 value greater than 5.5 are active, and those with a PIC50 value less than 5.5 are inactive; (ii) those with a PIC50 score between 5.5 and 7.5 are moderately active; (iii) while those with a PIC50 value greater than 7.5 are active. However, we discovered that a few compounds produce erroneous active results, which are referred to as outliers, in which the projected PIC50 value differs from the experimental PIC50 value. However, because the R 2 value is 0.7098, or greater than 70%, the activity of the majority of the molecules may be predicted with reasonable precision Figure 2.
Biovia Discovery Studio Visualizer 18 . Compound 131.mol2 showed two hydrogen bond interactions at SER209, VAL207. It also 2 pi-alkyl interactions at ARG356 and one pi-alkyl interaction at PHE357. Furthermore, the native co-crystallized ligand, vildagliptin was also separately docked with the DPP IV receptor. To validate the docking procedure, the docked pose of the hit compound 131 was superimposed with the docked pose of the cocrystallized ligand, vildagliptin Fig. 3. It was observed that both the docked poses of the two compounds superimposed with each other thus validating the docking protocol Fig. 4.

Molecular Docking Result analysis
The 40 compounds used for 2D QSAR modeling using the docking approach were analyzed. The best docking score was demonstrated by compound 131.mol2. The compound 131.mol2 demonstrated a docking score of -6.637. The structural analysis of this compound was done on

Molecular Dynamics Simulation Results
Compound 131 was identified as the best hit and was subjected to molecular dynamics simulation analysis. Here the docked complex of the compound 131 with receptor dipeptidyl peptidase IV was considered for MD simulation. Normal mode analysis mobility allows us to analyze the large-scale B-factor and mobility as well as the stability of the molecules Fig. 5. The IMOD server exposed the internal coordinates analysis depending on the protein-ligand structural interactions.

Fig. 5. Normal Mode Analysis of hit compound 131 with target receptor dipeptidyl peptidase IV using iMODS software
IMODs also measure the B-factor and structural deformity and calculate the eigenvalue. Image 1 represents the docked complex of our protein and ligand. Image 2 of the Figure represents the deformability graph. The deformity graph illustrated peaks in the graph which represent regions in the protein with deformability. Image 3 represents the B-Factor graph. The main-chain deformability, also known as the B-Factor, is a measure of a molecule's ability to deform at each of its residues. Image 4 represents the eigenvalue of the complex. The motion stiffness is represented by the eigenvalue associated with each normal mode. Its value is proportional to the amount of energy required to distort the structure. The simpler the deformation, the lower the eigenvalue. Our docked complex demonstrated an eigenvalue of 1.145917e-04 which eventually suggested that our protein-ligand complex can be deformed easily. Image 5 represents the variance plot. The variance plot demonstrates individual variances in red color whereas cumulative variance in green color. Image 6 represents the covariance map. This map demonstrates the correlation motion between a pair of residues in red color, uncorrelated motion in white color, and anti-correlated motion in blue color. Image 7 represents the elastic map of our docked complex. Each dot in the graph represents one spring inside the atoms' pair. The dots are colored dependent on stiffness, with darker grey dots indicating stiffer springs and lighter grey dots indicating softer springs. From the molecular dynamics study, it was evident that our complex showed a good amount of deformability. Furthermore, it also showed a moderately low eigenvalue, suggesting that it could be deformed easily. The variance map exhibited a higher degree of cumulative variances than an individual variance. The elastic network map also produced satisfactory results.
For MD Simulation using the Desmond program for our hit compound-receptor complex, the protein RMSD exhibited a stable trajectory throughout the entire 50 ns simulation process. The ligand RMSD exhibited fluctuations until 34ns but then showed a stable trajectory throughout the rest of the simulation process Fig. 6. Regarding the Protein RMSF analysis, the highest fluctuations were observed at 4.8 Å and 4.2 Å. Overall, the Protein and Ligand RMSF trajectories were found to be stable Fig. 7, Fig. 8. Other than these, the amino acid interactions of our protein-ligand complex were also analyzed. The notable hydrogen interactions were observed at PHE208, GLU361, ASP302, ILE405, CYS551, TYR585. The notable hydrophobic interactions were observed at ARG429, TYR585, CYS551, MET591. Water bridges were observed in VAL207, CYS301, ASP302, GLY355, ARG356, PRO359, GLU361, ILE405, CYS551, TYR585. Ionic interactions were found in ASP302 and GLU361. Among these residues, only GLU361 and TYR585 exhibited strong interactions for more than 30% of the entire simulation process Fig. 9, Fig. 10, Fig.  11. Other than this, other ligand properties of the hit compound such as radius of gyration, molecular surface area, intramolecular hydrogen bonds, solvent accessible surface area, and polar surface area were monitored throughout the 50ns simulation process Fig. 12. The radius of gyration, molecular surface area, and polar surface area plots of the hit compound demonstrated stable trajectories throughout the entire 50ns simulation study. The solvent-accessible surface area plot showed slight fluctuations between 16-28ns but still showed a stable trajectory throughout the rest of the simulation process. Furthermore, the intramolecular hydrogen bond plot of the hit compounds demonstrated zero hydrogen bonds. The 50ns simulation process gave a detailed analysis of the structural stability of our protein-ligand complex.

DISCUSSION
Designing a drug that targets DPP4 is not a new area of research. In previous studies, for designing a potent DPP4 inhibitor most of the researchers have chosen 3 to 4 well-known marketed DPP4 inhibitors and then tried to build a scaffold virtually by pharmacophore modeling i.e combining the important pharmacophore. Then they have synthesized several derivatives and measured their IC 50 Value 19,20 or performed virtual screening from a large database [20][21][22][23] . In most of the cases they have used a structurally similar scaffold for QSAR modeling and that's too in less number [24][25][26] . But in this cumulative study, we have collected 180 highly diversified molecules with their experimental PIC50 value. All of them are either marketed drugs or passing through the clinical trial phase. Still, we are getting a good correlation curve, so this model is quite reliable. Contributing a good model along with a reliable activity predicting formula to the current research community was our motto, which we have achieved successfully. The practical application of this work is that we can predict the PIC50 value of any molecule by using the above-mentioned formula, just we need to dock this molecule with the DPP4 receptor and have to collect the descriptors obtained after docking. If the predicted PIC50 value comes out to be less than 5 then the molecule would be non-potent as a DPP4 inhibitor. So we can skip the synthesis process of that molecule. That will save our time as well as expenditure.
With the help of this study, we were able to generate two different 2D QSAR models with the help of two different approaches. The first approach was to generate 2D descriptors for a given set of molecules using the DRAGON software and then eventually generating regression curves with the aid of SYSTAT software. The first approach generated a QSAR model with regression coefficient, R 2 = 0.7644 (training set) and R 2 = 0.9857 (test set). The second approach was to perform docking for another set of molecules and eventually use docking results as an additional set of descriptors. Finally, the descriptors data were used to generate regression curves using SYSTAT software. The second approach also gave a satisfactory regression coefficient value, R 2 = 0.7098 (training set) and R 2 = 0.9987 (test set). The proposed modeling process and computer-aided drug creation were founded on computational trials using statistically stable descriptor values. This method can be used to find new potential DPP-4 inhibitors.

ACKNOWLEDGEMENT
The products used in this study are products that are commonly and primarily used in our field of study and country.There is no conflict of interest between the authors and producers of the products because we do not intend to use these products as a means of litigation but rather to advance knowledge. Furthermore, the research was not funded by the production company, but rather by the authors' own personal efforts.