In silico Screening and Identification of Inhibitor Molecules Targeting SDS22 protein

The world's population is increasing at an alarming rate, yet contraceptive methods for men are less common than those for women. Sperm motility, an indicator of fertilisation potential, is regulated by a set of proteins of the protein phosphatase (PP) family, among which PP1 is directly related to sperm motility. SDS22 (suppressor of Dis2 mutant 2) is a conserved and widely expressed PP1 regulator whose function is poorly understood. This study used the SDS22 protein from Homo sapiens as the target and screened 100 plant-based compounds to identify the lead molecules with the strongest binding energy and affinity. The work combines homology modelling of SDS22 with protein-ligand interaction analysis. Benzeneacetonitrile, 4-hydroxy- had a binding energy of -6.9 kcal mol⁻¹, stronger (more negative) than the -3.5 kcal mol⁻¹ of the reference MDP, while α-terpineol, Coumarin, and 2-Phenylpropan-2-ol each had a binding energy of -6.2 kcal mol⁻¹. These compounds may reduce sperm motility and pave a promising path towards male contraception.


INTRODUCTION
The phosphoprotein phosphatase (PPP) family consists of seven members that are present in all eukaryotic cells and contribute a significant amount of the protein phosphatase activity in tissue extracts. 1 In vertebrates, 200 regulatory proteins regulate PP1 spatiotemporally, allowing for the creation of highly selective PP1 holoenzymes. 2 The primary role of PP1 regulatory proteins is to localize PP1 to a particular subcellular region, regulate its catalytic activity, and/or facilitate substrate selection.
Intriguingly, PP1 regulators may also form trimeric PP1 complexes.
PP1γ1 is present ubiquitously, whereas PP1γ2 is present only in germ cells and spermatozoa. 1 PP1γ2 has been found in most mammalian species with a strikingly similar structure, suggesting that this conservation is related to a sperm-specific role across species. 3 Testis-specific PP1γ2 is crucial in the final stages of spermatogenesis, 4 and understanding how it is activated and inhibited is therefore important, since PP1γ2 is not inhibited by the usual PP1 regulators such as I1 and I2. The activity of PP1γ2 phosphatase has been shown to be negatively correlated with motility: activity is low in caudal spermatozoa, which are actively moving, and high in caput spermatozoa, which are not. 5 Previous studies have isolated an inactive PP1γ2-SDS22 complex from caudal sperm, whereas catalytically active PP1γ2 has been isolated from caput sperm. It can be inferred that binding to SDS22 and the subsequent inactivation of PP1γ2 is one of the major steps in the maturation of spermatozoa. 6,7 SDS22 (also known as PPP1R7) is a PP1 regulatory component that is largely conserved among bovine, human and mouse orthologs. 8,9 Human SDS22 has 360 residues, with twelve predicted leucine-rich repeats (LRRs). An LRR cap that flanks the SDS22 LRRs at the C-terminus is thought to shield the hydrophobic core of the LRRs from solvent. 10 SDS22 and PP1 may combine with Inhibitor-3 8,11 or KNL1 12 to produce heterotrimeric complexes, indicating that SDS22 and Inhibitor-3/KNL1 have PP1-binding sites that at least partially overlap one another.
Although SDS22 is one of the most prevalent and conserved PP1 regulators, 13 little is known about its function. PP1:SDS22 has been linked to chromosomal segregation 12,14 and cell shape control. 15 SDS22 decreases the catalytic activity of PP1γ2, which in turn enhances sperm motility. 9,16 Recent findings suggest that SDS22 regulates various stages of the PP1 life cycle. It is involved in the stabilization, translocation and storage of PP1. 17 According to preliminary findings, SDS22 may also scavenge PP1 that has been released or has aged, for reuse in holoenzyme assembly or for proteolytic degradation. 18,19 SDS22 is located on chromosome 2q37.3, which has been found to be one of the most frequently deleted regions in various malignancies, and it is mostly inactive owing to loss of heterozygosity. 20 Previous investigations have shown it to be a crucial regulator of G2/M cell cycle progression. 21,22,23 It has also been linked to chemoresistance in ovarian cancer. 24
Computer-aided drug design (CADD) and in silico pharmacotherapy are expanding fields that involve the development of software tools for gathering, evaluating, and integrating biological and medical data from several sources. 25 The pharmaceutical sciences and other academic fields have facilitated therapeutic agent screening, with rapid high-throughput results. 26 Furthermore, by providing a richer understanding of the target-receptor interaction, bioinformatics tools offer a better comprehension of the biological effect. 27,28 Identifying prospective drug compounds entails a sequence of steps: selecting a disease, selecting an appropriate target molecule, building a small-molecule library, and scoring target-ligand interactions. Molecular docking predicts how the ligand binds to the protein and the ways in which the two interact. 29,30 Although molecular docking is a quick method for determining how a ligand will attach to a protein's active site, its results have several limitations. Docking is therefore usually followed by simulation, in which the system is subjected to temperature fluctuations, to mimic natural biological systems. 31
This study combines homology modelling of the SDS22 protein with molecular docking and protein-ligand interaction analysis. SDS22 of Homo sapiens was used as the target protein, and 100 plant-based compounds were screened to identify suitable lead molecules with the highest affinity and binding energy, in order to elucidate the mechanism by which sperm motility may be inhibited, using in silico studies.

Homology modelling of protein
The choice of protein and ligand is crucial to the CADD process as a whole. The SDS22 amino acid sequence was retrieved from the protein sequence database UniProt. 32 The retrieved sequence was then searched with the Protein-BLAST program to obtain modeling templates. 33 The template with the highest sequence similarity and query coverage was then selected. The selected template was the crystal structure of protein phosphatase-1 bound to the tumor promoter okadaic acid (PDB ID: 1JK7), which showed 100% sequence identity and 93% query coverage with the PP1γ2 sequence. The three-dimensional protein structure of SDS22 was then predicted using Phyre2, followed by a comparison with the available models in the Protein Model Portal 34 predicted through MODBASE 35 and SWISS-MODEL. 36

Model Validation and Optimization
Models of SDS22 were generated using two different homology modeling approaches and an ab initio prediction, and were then validated using different methods. A predicted model can contain small flaws and high-energy configurations that are likely to disrupt and destabilise the structure, so optimisation is a suitable way to correct errors and carry out energy minimization. The model was energy-minimized using the Swiss-PdbViewer (SPDBV) tool. 37 Model evaluation and validation are crucial steps in accurately modelling the structure of a protein. The generated SDS22 models were evaluated with the SAVES (Structure Analysis and Verification Server) servers, which check a model with several independent programs, including ERRAT, Verify 3D 38 , PROVE and PROCHECK, and also with ProSA, ProQ and RAMPAGE 39 . To predict overall model quality in terms of the total energy deviation of the protein, the Z-score was estimated using the ProSA web server. To further check the accuracy of the protein structure, a Ramachandran plot was generated using Discovery Studio 40 , which supports the appropriateness of the predicted model.

Phytochemical ligand library preparation
Phytochemical compounds were selected as potential lead molecules to dock against the target protein, SDS22. Since ancient times, plant-based compounds have been in the limelight for their immense potential as antibacterial and antiviral agents. One hundred such plant-based compounds were selected, and their structures were obtained from PubChem (https://pubchem.ncbi.nlm.nih.gov), together with information on their structure, characteristics, and uses. The phytochemicals were acquired in SDF format and converted into PDB files using Open Babel.

Drug-likeness property analysis of ligand
A drug-likeness calculation was carried out with the open-source software DruLiTo to screen the compounds for use in humans. A ligand's drug-likeness, or bioavailability as a drug, is determined by specific physicochemical and structural characteristics and defines its pharmacological significance. All ligands were therefore assessed for their potential as drugs using Lipinski's rule of five 41 in DruLiTo. The file format accepted by DruLiTo is .sdf, so the 100 molecule structures obtained from the database were first merged into a single .sdf file using the Open Babel file format converter. The combined file was then loaded into DruLiTo, and the Lipinski rule of five was the sole filter used to screen the molecules. The calculated properties were exported as a .csv file to the desired folder. The structures of the molecules that passed the filter were then saved using the file-save Lipinski filtered molecule option.
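The Lipinski screen described above can be sketched in a few lines. This is a minimal illustration rather than DruLiTo's own code: it assumes the four descriptors (molecular weight, logP, hydrogen-bond donors and acceptors) have already been computed, and the class and function names, the zero-violation threshold and the two example entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Pre-computed physicochemical descriptors for one ligand."""
    name: str
    mol_weight: float   # g/mol
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four criteria a ligand violates."""
    return sum([
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def passes_lipinski(d: Descriptors, max_violations: int = 0) -> bool:
    """Keep a ligand if it violates at most `max_violations` criteria."""
    return lipinski_violations(d) <= max_violations


# Illustrative descriptor values only; they are not taken from the study.
library = [
    Descriptors("small_aromatic", 146.1, 1.4, 0, 2),
    Descriptors("large_glycoside", 610.5, -1.3, 10, 16),
]
filtered = [d.name for d in library if passes_lipinski(d)]
print(filtered)
```

Some implementations tolerate one violation; passing `max_violations=1` reproduces that looser variant.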

Protein and Ligand preparation
The 3D structure of the SDS22 model was generated and then prepared for docking. All ligands, ions, and water molecules were stripped from the protein using PyMOL. After hydrogen atoms were added to the receptor, molecular docking was carried out using MGLTools and AutoDock Vina. 42 The protein structure was then stored in PDB format for later study. Before docking, the centre of mass of the co-crystallized ligand was computed with the "centre of mass" command in PyMOL to obtain the x, y, and z coordinates of the reference molecule for the protein; these coordinates were taken as the active site. The active-site coordinates of the protein SDS22 were (-2.13, 24.63, -9.66). The ligands were also prepared before docking. The success of the docking process depends on choosing and processing the coordinates of the receptor and ligands correctly, so the quality of the ligand and protein coordinates matters. To complete these preparation steps, the .pdb files need to be converted to the PDBQT file format. All water molecules were removed, and AutoDockTools (ADT) was used to prepare the necessary files for AutoDock Vina by assigning polar hydrogens and calculating Gasteiger and Kollman charges for the protein and ligand structures. 43,44,45
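To make the centre-of-mass step concrete, the sketch below computes a mass-weighted centre from the ATOM/HETATM records of a PDB file. It is an illustration under stated assumptions, not PyMOL's implementation: the small mass table, the helper `hetatm` and the toy two-atom ligand are hypothetical, and the parser relies on the standard fixed-width PDB columns.

```python
# Approximate atomic masses (g/mol); extend for other elements as needed.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def centre_of_mass(pdb_text):
    """Return the mass-weighted mean (x, y, z) of all ATOM/HETATM records."""
    total_mass = 0.0
    weighted = [0.0, 0.0, 0.0]
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        # Fixed-width PDB columns: x 31-38, y 39-46, z 47-54, element 77-78.
        coords = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        mass = ATOMIC_MASS[line[76:78].strip().capitalize()]  # KeyError if unknown
        total_mass += mass
        for axis in range(3):
            weighted[axis] += mass * coords[axis]
    if total_mass == 0.0:
        raise ValueError("no ATOM/HETATM records found")
    return tuple(w / total_mass for w in weighted)


def hetatm(serial, name, x, y, z, element):
    """Format one HETATM record for a hypothetical ligand residue 'LIG'."""
    return (f"HETATM{serial:5d} {name:<4s} LIG A{1:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}{'':10s}{element:>2s}")


# Toy ligand: two carbon atoms 2 angstroms apart on the x axis.
toy_ligand = "\n".join([
    hetatm(1, "C1", 0.0, 0.0, 0.0, "C"),
    hetatm(2, "C2", 2.0, 0.0, 0.0, "C"),
])
print(centre_of_mass(toy_ligand))
```

For a real structure, the same function would be applied to the records of the reference ligand only, and the resulting triple used as the grid-box centre.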

Molecular Docking study
Molecular docking into the active site of the SDS22 domain was carried out using AutoDock Vina 42 under PyRx (an open-source GUI, version 0.8, for AutoDock). AutoDock Vina and PyRx are docking tools that take the atomic structure of the target protein together with the ligand of choice and predict the most favourable docking conformation between the two. Target protein coordinates for docking are usually derived from X-ray crystallography or NMR spectroscopy (here, the homology model was used), whereas the ligand molecules were obtained as structure-data files (.sdf) from the PubChem database. One simplification that AutoDock Vina makes is to treat the receptor as a rigid molecule. This reduces the size of the conformational search space, making the process reliable and less time-consuming, because trial receptor conformations need not be scored. The force field used in the study was a physics-based method that included directional hydrogen bonding, primarily with polar hydrogens, and electrostatics. The coordinate files were prepared as described above, and each prepared file was then inspected closely for its protonation state and charges. The metal-bound state was checked for consistency with existing knowledge. Because ADT does not assign charges to bound metal ions, these had to be added manually; a text editor was used to add them directly to the prepared PDBQT file. The protein, prepared by adding missing hydrogens, was loaded into AutoDock Vina, and a grid box for SDS22 was built in the x, y, and z directions around its coordinate site with a grid spacing of 25 Å. A Vina configuration file named conf.txt was then generated as a text document. The configuration file for each ligand was prepared with the coordinates mentioned above (-2.13, 24.63, -9.66), a grid box size of ≈25, and a grid spacing of 25 Å. The target protein was kept rigid during the docking process, while the ligands were flexible in determining the most suitable pose. AutoDock Vina was run from the command line: the directory containing the protein and ligand files was opened and the commands were run there. vina_split and vina.exe were also kept in the same directory.
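As an illustration of the conf.txt step, the following sketch builds a Vina configuration from the active-site coordinates and the 25 Å box size given in the text. The receptor, ligand and output file names are hypothetical, the helper `vina_config` is not part of AutoDock Vina, and optional settings such as exhaustiveness are left at their defaults.

```python
def vina_config(receptor, ligand, out, center, box_size):
    """Build the text of an AutoDock Vina configuration file."""
    cx, cy, cz = center
    lines = [
        f"receptor = {receptor}",
        f"ligand = {ligand}",
        f"out = {out}",
        f"center_x = {cx}",
        f"center_y = {cy}",
        f"center_z = {cz}",
        f"size_x = {box_size}",
        f"size_y = {box_size}",
        f"size_z = {box_size}",
    ]
    return "\n".join(lines) + "\n"


conf = vina_config(
    receptor="sds22.pdbqt",            # hypothetical file names
    ligand="ligand_001.pdbqt",
    out="ligand_001_out.pdbqt",
    center=(-2.13, 24.63, -9.66),      # active-site coordinates from the text
    box_size=25,                       # box edge in angstroms, from the text
)
print(conf)
```

Saved as conf.txt, such a file would typically be passed to the program as `vina --config conf.txt`, with one configuration (or one ligand entry) per compound in the library.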
In PyRx (an open-source GUI, version 0.8, for AutoDock), the protein and ligand preparation steps were performed within the program, removing the need for preparation in ADT. PDBQT files were generated, and the grid box was then adjusted to the position of the active site of the protein. All the ligands were loaded in the same panel, and the AutoDock program was run. The resulting files are saved automatically to the MGLTools folder, whose location can be viewed or edited from the Edit-Preferences option of the main menu. The output of a run in PyRx can be visualized by opening the PDBQT file of the target protein together with the PDBQT files of the ligands obtained after the run. To observe the binding of a small molecule inside the protein pocket, right-clicking the protein name in the left-hand options panel shows a Display option and then Molecular Surface. The ligand molecule, in contrast, should be displayed in ball-and-stick form so that the docked molecule is visible. All the poses of one molecule, or poses of several molecules, can be superimposed to compare their binding in the protein's active-site pocket.

Protein-ligand interactions
Following the docking procedure, the highest-ranking poses were chosen for further protein-ligand interaction analysis. The interactions were displayed using LigPlot+ v.1.4.5. This tool translates the 3D structure into a 2D diagram, allowing thorough investigation of the hydrogen-bond and hydrophobic interactions within the protein-ligand complex.
The resulting binding energies were read from the log file of each ligand (small molecule). The best ligands were selected from the results and the others were filtered out. Each ligand's output file (in PDBQT format) contained all of its poses. After the top-scoring molecules and their highest-ranking poses had been identified, the poses were split to obtain a separate structure for each. First, the three Vina files (vina, vina_license, and vina_split) were copied into the directory containing the docking output. Then, from the command prompt, vina_split was run to obtain the split files.
The separated poses for each ligand were then obtained in the same folder. The output PDBQT file was opened in LigPlot+ v.1.4.5 together with the PDBQT file of the protein to observe the interactions between protein residues and the ligand. The interaction between the most suitable ligand and the protein observed in LigPlot+ was then examined to determine the interacting amino acid residues in the protein's active site, the types of bonds formed between residues and ligand atoms, and the bond lengths. The recorded data were then compared with the reference molecule and its interactions with the target protein.
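Ranking poses by binding energy can also be scripted. The sketch below reads the `REMARK VINA RESULT` lines that AutoDock Vina writes for each MODEL in its output PDBQT file and picks the lowest-energy pose. The sample text is a shortened, illustrative stand-in for a real output file (atom records omitted, values invented apart from the -6.9 kcal/mol top score), and the names `Pose` and `parse_vina_poses` are hypothetical.

```python
from typing import List, NamedTuple


class Pose(NamedTuple):
    model: int        # MODEL number in the output file
    affinity: float   # predicted binding energy, kcal/mol
    rmsd_lb: float    # RMSD lower bound relative to the best pose
    rmsd_ub: float    # RMSD upper bound relative to the best pose


def parse_vina_poses(pdbqt_text: str) -> List[Pose]:
    """Collect one Pose per MODEL from Vina output PDBQT text."""
    poses = []
    model = 0
    for line in pdbqt_text.splitlines():
        if line.startswith("MODEL"):
            model = int(line.split()[1])
        elif line.startswith("REMARK VINA RESULT:"):
            affinity, rmsd_lb, rmsd_ub = map(float, line.split()[3:6])
            poses.append(Pose(model, affinity, rmsd_lb, rmsd_ub))
    return poses


# Illustrative excerpt only; real files also contain the atom records.
SAMPLE_OUTPUT = """\
MODEL 1
REMARK VINA RESULT:      -6.9      0.000      0.000
ENDMDL
MODEL 2
REMARK VINA RESULT:      -6.5      1.842      2.310
ENDMDL
"""

poses = parse_vina_poses(SAMPLE_OUTPUT)
best = min(poses, key=lambda p: p.affinity)  # most negative energy wins
print(best)
```

Applied to every ligand in the library, the `best.affinity` values give the ranking from which the top-scoring molecules are chosen.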

Homology modelling
SDS22 consists of 360 amino acid residues and has a molecular weight of 41,564 Da (about 42 kDa). The three-dimensional protein structure of SDS22 was predicted using Phyre2, followed by a comparison with the available models in the Protein Model Portal 34 predicted through MODBASE 35 and SWISS-MODEL. 36 The predicted model aligned with the MODBASE and SWISS-MODEL models with RMSDs of 2.44 Å and 0.75 Å, respectively, supporting the reliability of the modeled structure (Figure 1).
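For readers unfamiliar with the RMSD values quoted above, the quantity is the square root of the mean squared distance between matched atoms. The sketch below is a minimal illustration that assumes the two structures have already been superimposed and that atoms are paired in the same order; it does not perform the alignment itself, and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy example: every atom of the second structure is shifted 1 angstrom in z.
model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(model_a, model_b))
```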

Model Optimization
The model was checked and evaluated using several bioinformatics tools and the servers of SAVES (Structure Analysis and Verification Server), including ERRAT, Verify 3D 38 , PROVE, and PROCHECK. In addition to this server, the ProSA, ProQ and RAMPAGE tools were used to evaluate the quality of the protein structure. The results from all the tools indicated that the predicted structure is accurate and likely to remain stable during biological processes (Table 1). In the predicted model of SDS22, 72.3% of residues were in the most favoured region, 26.2% in the allowed region, 1.5% in the generously allowed region and 0.0% in the disallowed region, demonstrating that the predicted model of SDS22 is of suitable quality (Figure 2).

Active site prediction
Accurate prediction of the active site before docking is essential in bioinformatics 46 . The coordinates of the active site of the protein were therefore calculated using the "centre of mass" command and VMD, as described earlier. The active-site coordinates of the protein were found to be (-2.13, 24.63, -9.66).

Construction of phytochemical library
Because phytochemicals have been little studied as potential lead molecules against SDS22, this study explored 100 small molecules of plant origin to determine their binding affinity for the selected protein. The 100 compounds were selected from the PubChem database on the basis of their properties, and their SDF files were retrieved from the same database. Lipinski's rule of five is used to assess whether a compound is likely to be suitable for oral delivery in humans 47 . Using Open Babel, the SDF data of the 100 molecules were converted to PDB files; DruLiTo was then used to screen the molecules and identified 28 that were suitable as lead compounds because they adhered to Lipinski's rule of five.

Molecular Docking Analysis
A docking study begins by defining a specific region of the protein, the binding site, into which small-molecule compounds are docked and their affinity estimated; this is a significant part of the structure-based drug design protocol. 48 Out of 100 candidate phytochemicals, docking against the molecular target SDS22 revealed five with better binding affinities and binding modes than the reference (MDP). For further study, the 2D structures of the selected phytochemicals were acquired.
The binding energies of the molecular target, SDS22, with six compounds are shown in Table 2. Predicting the binding affinity of a ligand is a crucial step in the CADD procedure 49 , where the equilibrium free energy of binding between two molecules is used to define the binding affinity. 50 For SDS22, Benzeneacetonitrile, 4-hydroxy- showed a binding energy of -6.9 kcal mol⁻¹, stronger (more negative) than that of the reference MDP (-3.5 kcal mol⁻¹), while the other ligands had binding energies of -6.2 kcal mol⁻¹ for α-terpineol, -6.2 kcal mol⁻¹ for Coumarin, -6.2 kcal mol⁻¹ for 2-Phenylpropan-2-ol and -6.0 kcal mol⁻¹ for Alpha citral. Compared with the reference compound, all five phytochemicals displayed lower (more negative) binding energies. These five phytochemicals may therefore be effective inhibitors of this molecular target.
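The link between binding free energy and affinity can be made concrete with the standard relation ΔG = RT ln Kd. The sketch below converts the docking energies reported above into nominal dissociation constants. It is illustrative only: docking scores are rough estimates of ΔG, the temperature of 298.15 K is an assumption, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal mol^-1 K^-1
TEMPERATURE = 298.15   # assumed temperature in kelvin


def dissociation_constant(delta_g, temperature=TEMPERATURE):
    """Nominal Kd (mol/L) from a binding free energy in kcal/mol."""
    return math.exp(delta_g / (R_KCAL * temperature))


# Binding energies (kcal/mol) as reported in the text.
energies = {
    "Benzeneacetonitrile, 4-hydroxy-": -6.9,
    "alpha-terpineol": -6.2,
    "Coumarin": -6.2,
    "2-Phenylpropan-2-ol": -6.2,
    "Alpha citral": -6.0,
    "MDP (reference)": -3.5,
}
for name, dg in energies.items():
    print(f"{name}: Kd ~ {dissociation_constant(dg):.2e} M")
```

A more negative energy gives a smaller Kd, so under these assumptions the 3.4 kcal/mol gap between the top ligand and the reference corresponds to a difference of roughly two orders of magnitude in nominal affinity.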

Ligand and Receptor Interaction
The 2D interactions between the molecular target protein and the assessed phytochemicals were examined in terms of hydrogen-bonding sites and hydrophobic interactions with various residues, and compared with those of the reference (Fig. 3). In the figures, the reference molecule is depicted in green and the screened phytochemicals are shown in purple. Several molecular targets had distinct standard binding sites. The red spoked arcs depict residues forming hydrophobic interactions with the phytochemicals, while the green dotted lines represent hydrogen bonds. Protein residues in equivalent 3D positions are denoted by red circles and ellipses.
Fewer interactions appear in the results because of the clean-up step that LigPlot+ performs before plotting, which minimizes the number of overlapping atoms and bonds to give as clear a picture of the ligand interactions as possible. Table 2 lists the amino acids of the SDS22 protein that interact with the selected ligands. Hydrogen bonds and hydrophobic contacts are treated differently: for residues forming hydrogen bonds, all side-chain atoms are drawn and main-chain atoms can also be kept, whereas a residue forming a hydrophobic contact is shown as a single spot linked to the ligand atom via a virtual bond. The interactions were also visualized using Discovery Studio to get a clearer picture (Figure 4).

DISCUSSION
The SDS22 protein from Homo sapiens was used as the target protein in this study, and 100 plant-based compounds were selected to identify the most pertinent lead molecule with the highest binding energy and affinity. A computational biology pipeline was followed to study the library of medicinally active compounds and assess their drug-likeness. Promising compounds were docked with the SDS22 protein. From the docking analysis, five compounds from the library were selected and their interactions were analysed. The criteria for the interaction analysis were binding energy, H-bond distance and the interacting atoms. The identified compounds had good binding energies and also met the other criteria. This work thus combines homology modelling of the SDS22 protein with molecular docking and protein-ligand interaction analysis. The study showed that Benzeneacetonitrile, 4-hydroxy- had a binding energy of -6.9 kcal mol⁻¹, stronger (more negative) than that of the reference MDP (-3.5 kcal mol⁻¹), while the other ligands had binding energies of -6.2 kcal mol⁻¹ for α-terpineol, -6.2 kcal mol⁻¹ for Coumarin, and -6.2 kcal mol⁻¹ for 2-Phenylpropan-2-ol.

CONCLUSION
This study aimed to identify potential inhibitors of the SDS22 protein, a well-known regulator of PP1γ2, which mediates sperm motility. Out of 100 compounds, five were selected on the basis of binding energy; among these, Benzeneacetonitrile, 4-hydroxy- had the strongest binding energy, -6.9 kcal mol⁻¹, and could therefore be used to control sperm motility by acting as a potential inhibitor of the SDS22 protein.
Given the shortcomings of the available male and female contraceptive methods, this study could point to a new approach to male contraception. To substantiate these results, however, in vitro tests and molecular dynamics simulations are needed.

Fig. 2. Ramachandran plot and Z-score calculation of the modelled protein