Discovery of novel antibacterial agents is a high-priority task because existing therapies do not provide the necessary safety and sufficient long-term efficacy due to emerging resistance. The process may be optimized using (Q)SAR methods based on the experimental data accumulated in the field.
Nowadays, data on the antibacterial action of chemical compounds are well represented in the public domain. The ChEMBL database, for example, contains records on the activity of chemical compounds against 1386 bacteria. We extracted bioactivity records on minimum inhibitory concentrations (MICs) of chemical compounds from ChEMBL_24 and prepared them as follows:
• Chemical data were prepared according to good (Q)SAR practice.
• Biological data were reviewed to exclude unreliable data points and to identify the records related to resistant microorganisms.
In general, the data preparation pipeline was similar to those described in publications [4, 5].
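The record-level part of this preparation can be illustrated roughly. The sketch below is a hypothetical example, not the pipeline from [4, 5]. It assumes each record is a dict whose `standard_type`, `standard_relation`, `standard_value` and `standard_units` keys mirror the corresponding ChEMBL activity fields. The `molecule_id` and `organism` keys and the compound IDs are hypothetical. It keeps exact MIC values in nM and takes the median per compound-organism pair.

```python
from collections import defaultdict
from statistics import median


def aggregate_mic_records(records):
    """Hypothetical curation step: keep exact MIC values in nM and
    return the median MIC for each (compound, organism) pair."""
    values = defaultdict(list)
    for rec in records:
        if rec["standard_type"] != "MIC":
            continue  # only minimum inhibitory concentrations
        if rec["standard_units"] != "nM" or rec["standard_relation"] != "=":
            continue  # skip other units and censored values such as ">" or "<"
        if rec["standard_value"] is None:
            continue  # skip records without a numeric value
        key = (rec["molecule_id"], rec["organism"])  # hypothetical keys
        values[key].append(float(rec["standard_value"]))
    # Replicate measurements are collapsed to their median (an assumption of this sketch).
    return {key: median(v) for key, v in values.items()}


records = [  # hypothetical records
    {"molecule_id": "CHEMBL_A", "organism": "Staphylococcus aureus",
     "standard_type": "MIC", "standard_relation": "=",
     "standard_value": 2000, "standard_units": "nM"},
    {"molecule_id": "CHEMBL_A", "organism": "Staphylococcus aureus",
     "standard_type": "MIC", "standard_relation": "=",
     "standard_value": 4000, "standard_units": "nM"},
    {"molecule_id": "CHEMBL_B", "organism": "Escherichia coli",
     "standard_type": "MIC", "standard_relation": ">",
     "standard_value": 64000, "standard_units": "nM"},
]
print(aggregate_mic_records(records))
```

Dropping censored values keeps the example short. A real pipeline might instead keep high ">" values as evidence of inactivity.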
A training set containing the structures of 41,065 chemical compounds and data on their antibacterial activity was prepared. All molecules with MIC < 10,000 nM were considered "actives". Using this set, we trained PASS to classify drug-like molecules as "actives" or "inactives" against 353 bacteria, including resistant strains. The average prediction accuracy, assessed as IAP (numerically equivalent to the ROC AUC) in leave-one-out cross-validation, is 0.93; for individual biological activities, IAP ranged from 0.75 to 1.00.
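The two quantitative ingredients mentioned above are the MIC threshold that defines "actives" and an accuracy measure numerically equal to the ROC AUC. Both can be sketched in a few lines. The sketch uses the pairwise (Mann-Whitney) formulation of the ROC AUC, the probability that a randomly chosen active outscores a randomly chosen inactive. It is not the PASS implementation of IAP and does not perform the leave-one-out procedure. The scores below are hypothetical.

```python
ACTIVE_THRESHOLD_NM = 10_000  # MIC < 10,000 nM counts as "active", as stated in the text


def label(mic_nm):
    """Binarize an MIC value (in nM) into active (True) / inactive (False)."""
    return mic_nm < ACTIVE_THRESHOLD_NM


def roc_auc(scores_active, scores_inactive):
    """ROC AUC as the fraction of (active, inactive) pairs ranked correctly; ties count 1/2."""
    wins = 0.0
    for a in scores_active:
        for i in scores_inactive:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(scores_active) * len(scores_inactive))


mics = {"cpd1": 500.0, "cpd2": 25_000.0, "cpd3": 9_999.0}  # hypothetical MICs
print({cpd: label(m) for cpd, m in mics.items()})
print(roc_auc([0.9, 0.7, 0.4], [0.5, 0.2]))  # 5 of 6 pairs ranked correctly
```

The pairwise form is quadratic in the number of compounds. It is chosen here for readability; rank-based formulas give the same value faster.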
Using our web service, one can select the most promising chemical compounds for synthesis and prioritize them for testing of their antibacterial activity.
1. Brown, E. D., Wright, G. D. (2016). Antibacterial drug discovery in the resistance era. Nature, 529(7586), 336.
2. Gaulton, A., et al. (2016). The ChEMBL database in 2017. Nucleic Acids Research, 45(D1), D945-D954.
3. Fourches, D., et al. (2015). Curation of chemogenomics data. Nature Chemical Biology, 11(8), 535.
4. Pogodin, P. V., et al. (2015). PASS Targets: Ligand-based multi-target computational system based on a public data and naïve Bayes approach. SAR and QSAR in Environmental Research, 26(10), 783-793.
5. Pogodin, P., et al. (2018). How to Achieve Better Results Using PASS-Based Virtual Screening: Case Study for Kinase Inhibitors. Frontiers in Chemistry, 6, 133.
6. Filimonov, D. A., et al. (2014). Prediction of the biological activity spectra of organic compounds using the PASS online web resource. Chemistry of Heterocyclic Compounds, 50(3), 444-457.
MICF allows the user to predict whether a chemical compound can inhibit the growth of one or more of 38 fungi at a concentration below 5000 nM. The score for each compound is expressed as the confidence in its activity. This is the difference between the probability that the compound inhibits the growth of the particular microorganism (Pa) and the probability that it does not (Pi). A higher confidence means a higher chance that a positive prediction is true.
Only activities with Pa > Pi (confidence > 0) are considered possible for a particular compound.
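The confidence rule above amounts to a filter and a sort. The sketch below assumes predictions arrive as a hypothetical mapping from organism name to a (Pa, Pi) pair. The organism names and probability values are illustrative, not service output.

```python
def possible_activities(predictions):
    """Keep activities with Pa > Pi and rank them by confidence = Pa - Pi (highest first)."""
    scored = [(name, pa - pi) for name, (pa, pi) in predictions.items() if pa > pi]
    return sorted(scored, key=lambda item: item[1], reverse=True)


predictions = {  # hypothetical (Pa, Pi) pairs
    "Candida albicans": (0.62, 0.05),
    "Aspergillus niger": (0.31, 0.12),
    "Cryptococcus neoformans": (0.10, 0.40),  # Pa < Pi: not considered possible
}
for name, conf in possible_activities(predictions):
    print(f"{name}: confidence = {conf:.2f}")
```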
Keep in mind that Pa primarily reflects the similarity of the molecule under prediction to the structures that are most typical of the "actives" subset of the training set. Therefore, there is usually no direct correlation between Pa values and quantitative characteristics of activity.
Even an active and potent compound whose structure is not typical of the "actives" in the training set may obtain a low Pa value, and even Pa < Pi. This follows from how the functions Pa(B) and Pi(B) of the prediction score B are constructed: the values of Pa for "actives" and of Pi for "inactives" are uniformly distributed. Taking this into account, the prediction results can be interpreted as follows.
If, for instance, Pa equals 0.9, then 90% of the "actives" in the training set have lower B values than this compound, and only 10% have higher ones. If we reject the hypothesis that this compound is active, we will make a wrong decision with probability 0.9.
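The interpretation above suggests a simple empirical construction. In this sketch, Pa(B) is the fraction of training-set "actives" scoring below B, and Pi(B) is the fraction of training-set "inactives" scoring above B. With these definitions, Pa over the actives and Pi over the inactives are uniformly distributed, as stated in the text. This is an assumption-based sketch: the actual PASS estimation procedure may differ, for example by smoothing. The training scores below are hypothetical integers.

```python
from bisect import bisect_left, bisect_right


def make_pa_pi(b_actives, b_inactives):
    """Build empirical Pa(B) and Pi(B) from training-set scores B (assumed definitions)."""
    act = sorted(b_actives)
    inact = sorted(b_inactives)

    def pa(b):
        # Fraction of training "actives" whose score is strictly below b.
        return bisect_left(act, b) / len(act)

    def pi(b):
        # Fraction of training "inactives" whose score is strictly above b.
        return (len(inact) - bisect_right(inact, b)) / len(inact)

    return pa, pi


pa, pi = make_pa_pi(b_actives=list(range(1, 11)), b_inactives=list(range(-5, 5)))
print(pa(9.5), pi(9.5))  # 0.9 0.0
print(pa(3.5), pi(3.5))  # 0.3 0.1
```

A compound scoring B = 9.5 reproduces the Pa = 0.9 case: 9 of the 10 training actives score lower. B = 3.5 gives Pa < 0.5 with Pa > Pi.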
If Pa is less than 0.5 but Pa > Pi, then more than half of the "actives" in the training set have higher B values than this compound. If we reject the hypothesis that this compound is active, we will make a wrong decision with probability less than 0.5. In this case, the probability of confirming this activity experimentally is small, but if it is confirmed, there is a more than 50% chance that the structure is highly novel and may become a New Chemical Entity (NCE).
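The interpretation rules above can be condensed into a small decision function. This is an illustrative paraphrase, not a PASS output format. The category wording and the placement of the exact Pa = 0.5 boundary are assumptions of the sketch.

```python
def interpret(pa: float, pi: float) -> str:
    """Map a (Pa, Pi) pair to the qualitative readings described in the text (illustrative)."""
    if pa <= pi:
        return "not predicted (Pa <= Pi)"
    if pa >= 0.5:
        # Most training-set actives score lower; rejecting activity is wrong with probability Pa.
        return "typical of known actives"
    # Pa > Pi but Pa < 0.5: confirmation is less likely, but a confirmed hit may be highly novel.
    return "atypical; possible novel chemotype (NCE candidate)"


for pa, pi in [(0.9, 0.003), (0.3, 0.1), (0.2, 0.4)]:  # hypothetical values
    print(f"Pa={pa}, Pi={pi}: {interpret(pa, pi)}")
```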
If the predicted biological activity spectrum is wide, this indicates that the structure of the compound is quite simple and lacks the specific features responsible for the selectivity of its biological action.
If the structure under prediction contains several new MNA (Multilevel Neighbourhoods of Atoms) descriptors, i.e., descriptors absent from the compounds of the training set, then the structure has low similarity to every structure in the training set, and the prediction results should be considered very rough estimates.
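The novelty check above is essentially a set difference between the query's descriptors and the descriptors seen in training. In the sketch below, the strings "D1", "D2", and so on are placeholders standing in for real MNA descriptors. The `min_new` threshold is a hypothetical parameter, since the text only says "several".

```python
def new_descriptors(query_descriptors, training_vocabulary):
    """Return the query's descriptors that never occur in the training set, sorted."""
    return sorted(set(query_descriptors) - set(training_vocabulary))


def is_rough_estimate(query_descriptors, training_vocabulary, min_new=1):
    """Flag a prediction as a very rough estimate if at least `min_new` descriptors are new."""
    return len(new_descriptors(query_descriptors, training_vocabulary)) >= min_new


training_vocabulary = {"D1", "D2", "D3", "D4"}  # placeholder descriptor strings
query = ["D1", "D2", "D7", "D9"]
print(new_descriptors(query, training_vocabulary))   # ['D7', 'D9']
print(is_rough_estimate(query, training_vocabulary))  # True
```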
Based on these criteria, one may decide which activities to test for the studied compounds, as a compromise between the novelty of the pharmacological action and the risk of obtaining a negative result in experimental testing. A detailed explanation of how to interpret PASS results is given in this publication.
Laboratory for Structure-Function Based Drug Design, Department for Bioinformatics, Institute of Biomedical Chemistry (IBMC) Pogodinskaya Str. 10, Moscow, Russia, 119121
• Prof. Dr. Vladimir Poroikov, Tel: +7 499 246-09-20, Fax: +7 499 245-08-57, E-mail: email@example.com
• PhD. MD. Dmitry Druzhilovsky, Tel: +7 499 255-30-29, Fax: +7 499 245-08-57, E-mail: firstname.lastname@example.org