Naive Bayes

Only activities with Confidence=Pa > Pi are considered possible for a particular compound.

It is important to remember that the probability Pa primarily reflects how similar the molecule under prediction is to the structures that are most typical of the "actives" subset of the training set. Therefore, there is usually no direct correlation between Pa values and quantitative characteristics of activity.

Even an active and potent compound whose structure is not typical of the "actives" in the training set may receive a low Pa value, or even Pa < Pi, during prediction. This follows from the way the functions Pa(B) and Pi(B) are constructed: the Pa values for "actives" and the Pi values for "inactives" are uniformly distributed. With this in mind, the prediction results can be interpreted as follows.

If, for instance, the Pa value equals 0.9, then for 90% of the "actives" in the training set the B values are lower than for this compound, and for only 10% of the "actives" they are higher. If we reject the hypothesis that this compound is active, we will make a wrong decision with probability 0.9.
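
The reading of Pa as a fraction of training-set "actives" can be illustrated with a minimal sketch. It assumes that Pa for a compound is simply the empirical share of "actives" whose B value lies strictly below the compound's B value; the function name empirical_pa and the ten B values are hypothetical and serve only as an illustration, not as the actual PASS implementation.

```python
from bisect import bisect_left


def empirical_pa(b_value, active_b_values):
    """Share of training-set "actives" whose B value is strictly below b_value.

    Assumption: Pa is read as an empirical distribution function over
    the B values of the "actives"; ties are not counted as lower.
    """
    ordered = sorted(active_b_values)
    # bisect_left returns the number of elements strictly less than b_value.
    return bisect_left(ordered, b_value) / len(ordered)


# Hypothetical B values for ten training-set "actives" (illustration only).
actives_b = [-0.8, -0.5, -0.3, -0.1, 0.0, 0.2, 0.4, 0.5, 0.7, 0.9]

# Nine of the ten "actives" have B below 0.8, so Pa is 9/10.
pa = empirical_pa(0.8, actives_b)
print(pa)
```

Under this reading, applying the function to the B values of the "actives" themselves yields evenly spaced ranks, which is one way to see why the Pa values of "actives" are uniformly distributed.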

If the Pa value is less than 0.5 but Pa > Pi, then for more than half of the "actives" in the training set the B values are higher than for this compound. If we reject the hypothesis that this compound is active, we will make a wrong decision with probability less than 0.5. In this case the probability of confirming the activity experimentally is small, but if it is confirmed, there is a more than 50% chance that the structure is highly novel and may become a New Chemical Entity (NCE).
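
The interpretation rules above can be condensed into a small helper. This is only a sketch: the function name interpret_prediction and the wording of the returned labels are hypothetical, while the Pa > Pi condition and the 0.5 boundary are taken from the text.

```python
def interpret_prediction(pa, pi):
    """Map a (Pa, Pi) pair to a short verbal interpretation.

    The labels are illustrative; only the Pa > Pi rule and the 0.5
    boundary come from the text.
    """
    if pa <= pi:
        # Activities without Pa > Pi are not considered possible.
        return "not considered"
    if pa < 0.5:
        # More than half of the training "actives" have higher B values:
        # confirmation is unlikely, but the structure may be novel.
        return "possible, low chance of confirmation, potentially novel"
    # At least half of the training "actives" have lower B values.
    return "possible, resembles typical actives"


# Hypothetical (Pa, Pi) pairs, for illustration only.
for pa, pi in [(0.9, 0.01), (0.3, 0.1), (0.2, 0.4)]:
    print(pa, pi, interpret_prediction(pa, pi))
```

In each case where the activity is considered possible, rejecting it carries a probability of error equal to Pa.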

Self-Consistent Extreme Classifier