Publicly available data are suitable as a training set for (Q)SAR modelling, despite their diversity. However, additional data filtering is necessary to avoid errors caused by misinterpretation of the data. A data preparation procedure was developed and applied to the data from ChEMBL_21.
The computational system, which is based on data extracted from ChEMBL and the machine learning method implemented in the computer program PASS, showed good results in all in silico validation experiments performed on the various test sets.
These results confirm that the computational system presented herein is suitable for a number of applications in computer-aided drug design. The system may be used to search for potential ligands of particular targets and to select chemical compounds with a desirable spectrum of interactions with distinct protein targets, that is, for drug repurposing or for the search for novel compounds.
PASS Targets provides the following functionalities:
• Prediction of direct interactions of drug-like compounds with 930 human protein targets, with an average prediction accuracy of 0.94 in leave-one-out cross-validation, assessed as IAP (numerically equal to the ROC AUC). This type of prediction is based on SAR studies of data obtained in biochemical assays, in which only one target protein, only one studied compound, and a reporter system are used at a time. The score for each target is called confidence. Confidence is the difference between the probabilities that a chemical compound interacts and does not interact with a particular target. The higher the confidence, the higher the chance that a positive prediction is true.
• Prediction of probable indirect interactions with 764 targets, with an average accuracy of 0.98 in leave-one-out cross-validation. This type of prediction is based on SAR studies of data obtained in various types of assays (cell-based, tissue-based, etc.) other than biochemical ones. These assays are complex systems, and the observed activity of a chemical compound may result not only from binding to the target of interest, but also from interactions with other assay components, which, in turn, may change the activity of the target of interest. The score for each target is called confidence. Confidence is the difference between the probabilities that a chemical compound interacts and does not interact with a particular target. The higher the confidence, the higher the chance that a positive prediction is true.
• Investigation of predicted targets in distinct groups of related proteins (according to the ChEMBL protein classification).
• Investigation of predicted targets which are involved in distinct biological processes (according to ChEMBL GO_slim).
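The confidence score and the accuracy measure mentioned in the list above can be illustrated with a minimal Python sketch. It is not the PASS implementation: the function names, the target labels, and the probability values are hypothetical and chosen only for illustration. The sketch assumes that each target comes with two estimated probabilities (to interact and not to interact), computes confidence as their difference, ranks targets by it, and computes ROC AUC in its pairwise form, that is, the fraction of (interacting, non-interacting) pairs in which the interacting case receives the higher score.

```python
from typing import Dict, List, Sequence, Tuple


def confidence(p_interact: float, p_not_interact: float) -> float:
    """Confidence as defined in the text: P(interacts) - P(does not interact)."""
    return p_interact - p_not_interact


def rank_targets(
    probabilities: Dict[str, Tuple[float, float]]
) -> List[Tuple[str, float]]:
    """Return (target, confidence) pairs sorted from highest to lowest confidence."""
    scored = [
        (target, confidence(p_int, p_not))
        for target, (p_int, p_not) in probabilities.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def pairwise_auc(
    active_scores: Sequence[float], inactive_scores: Sequence[float]
) -> float:
    """ROC AUC as the share of (active, inactive) pairs ranked correctly.

    Ties count as half a correctly ranked pair.
    """
    if not active_scores or not inactive_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for a in active_scores:
        for i in inactive_scores:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(active_scores) * len(inactive_scores))


# Hypothetical predictions for one compound: target -> (P_interact, P_not_interact).
example = {
    "TARGET_A": (0.5, 0.5),
    "TARGET_B": (0.75, 0.25),
    "TARGET_C": (0.125, 0.875),
}
ranked = rank_targets(example)

# Hypothetical confidence values of known interacting / non-interacting cases.
auc = pairwise_auc([0.5, 0.0], [-0.75, 0.0])
```

With these made-up numbers, `ranked` lists TARGET_B first (confidence 0.5), then TARGET_A (0.0), then TARGET_C (-0.75). Confidence always lies between -1 and 1, and positive values mean the compound is estimated to be more likely to interact with the target than not.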