Confidence is calculated as the difference P_a - P_i.

P_a (probability "to be active") estimates the chance that the studied amino acid substitution belongs to the class of pathogenic variants (it resembles the structures of molecules that are most typical of the sub-set of "actives" in the PASS training set). The association of an amino acid substitution with pathogenicity is treated here as its activity.

P_i (probability "to be inactive") estimates the chance that the studied amino acid substitution belongs to the class of amino acid substitutions that do not cause pathogenicity (it resembles the structures of molecules that are most typical of the sub-set of "inactives" in the PASS training set).

IAP (Invariant Accuracy of Prediction) is the average accuracy of prediction obtained for the whole PASS training set in a leave-one-out cross-validation procedure.

IAP is numerically equal to the ROC AUC.
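Because IAP coincides numerically with the ROC AUC, it can be read as the probability that a randomly chosen "active" receives a higher score than a randomly chosen "inactive". The sketch below computes the ROC AUC in exactly that pairwise way. The function name `roc_auc` and the example scores are illustrative assumptions and are not part of PASS.

```python
from typing import Sequence


def roc_auc(active_scores: Sequence[float], inactive_scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (active, inactive) pairs ranked correctly.

    A tie between an active and an inactive counts as half a correct pair.
    """
    if not active_scores or not inactive_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for a in active_scores:
        for i in inactive_scores:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(active_scores) * len(inactive_scores))


# Hypothetical scores: 5 of the 6 active/inactive pairs are ordered correctly.
auc = roc_auc([0.9, 0.8, 0.4], [0.7, 0.3])
```

A value of 1.0 means every "active" outranks every "inactive", and 0.5 corresponds to random ordering.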

The leave-one-out cross-validation (LOO CV) procedure uses the whole PASS training set to validate prediction quality. Each amino acid substitution is excluded from the training set in turn, its activity is predicted from the remaining data, and the prediction is compared with the known experimental data for that substitution. The procedure is repeated for all amino acid substitutions in the PASS training set; the average Invariant Accuracy of Prediction (IAP = 1 - IEP) is then calculated for each biological activity.
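The leave-one-out loop can be sketched as follows. This is a minimal illustration and not the PASS implementation: `toy_scorer` is a hypothetical stand-in for the real predictor, each sample is reduced to a single made-up numeric feature, and the function only collects the held-out scores from which an accuracy measure would then be computed.

```python
from statistics import mean
from typing import Callable, List, Sequence

Scorer = Callable[[Sequence[float], Sequence[bool], float], float]


def toy_scorer(train_x: Sequence[float], train_y: Sequence[bool], x: float) -> float:
    """Hypothetical scorer: higher when x lies closer to the training actives."""
    actives = [v for v, y in zip(train_x, train_y) if y]
    inactives = [v for v, y in zip(train_x, train_y) if not y]
    return abs(x - mean(inactives)) - abs(x - mean(actives))


def leave_one_out_scores(xs: Sequence[float], ys: Sequence[bool], scorer: Scorer) -> List[float]:
    """Score every sample with a model that has never seen that sample."""
    scores = []
    for k in range(len(xs)):
        # Remove sample k from the training data before scoring it.
        train_x = [v for j, v in enumerate(xs) if j != k]
        train_y = [y for j, y in enumerate(ys) if j != k]
        scores.append(scorer(train_x, train_y, xs[k]))
    return scores


# Made-up data: two inactives (low feature values) and two actives (high values).
features = [1.0, 2.0, 8.0, 9.0]
labels = [False, False, True, True]
held_out_scores = leave_one_out_scores(features, labels, toy_scorer)
```

Comparing `held_out_scores` with `labels` gives an estimate of prediction quality that is not inflated by scoring samples the model was trained on.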

Only activities with P_a > P_i are considered possible for a particular amino acid substitution (Confidence > 0).
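The selection rule can be expressed in a few lines. The sketch below assumes the prediction results are available as a mapping from an activity name to a (P_a, P_i) pair; the names `confidence`, `possible_activities`, and the example values are hypothetical.

```python
from typing import Dict, List, Tuple


def confidence(p_a: float, p_i: float) -> float:
    """Confidence is the difference P_a - P_i."""
    return p_a - p_i


def possible_activities(predictions: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float]]:
    """Keep activities with P_a > P_i, sorted by decreasing confidence."""
    kept = [
        (name, confidence(p_a, p_i))
        for name, (p_a, p_i) in predictions.items()
        if p_a > p_i
    ]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Made-up predictions for one amino acid substitution: activity -> (P_a, P_i).
example = {
    "activity_1": (0.75, 0.25),  # confidence 0.5, kept
    "activity_2": (0.25, 0.5),   # confidence -0.25, dropped
}
selected = possible_activities(example)
```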

It is important to remember that the probability P_a reflects, first of all, the similarity of the molecule under prediction to the structures of molecules that are most typical of the sub-set of "actives" in the training set. Therefore, there is usually no direct correlation between P_a values and quantitative characteristics of pathogenicity.

Even active and potent amino acid sequences that are not typical of the sequences of "actives" in the training set may obtain a low P_a value, or even P_a < P_i, during prediction. This follows from the way the functions P_a(B) and P_i(B) are constructed: the P_a values for "actives" and the P_i values for "inactives" are distributed uniformly. Taking this into account, the prediction results can be interpreted as follows.

If, for instance, the P_a value equals 0.9, then for 90% of "actives" from the training set the B values are lower than for this compound, and for only 10% of "actives" they are higher. If we reject the suggestion that this compound is active, we will make a wrong decision with probability 0.9.
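The interpretation above matches reading P_a as the fraction of training-set "actives" whose B value lies below the B value of the compound under prediction. The sketch below illustrates that reading with an empirical distribution. It is an assumption made for illustration: the exact construction of P_a(B) in PASS is not given here, and the function name and the B values are made up.

```python
from bisect import bisect_left
from typing import Sequence


def empirical_p_a(b_value: float, active_b_values: Sequence[float]) -> float:
    """Fraction of training-set actives whose B value is strictly below b_value."""
    if not active_b_values:
        raise ValueError("at least one active B value is required")
    ordered = sorted(active_b_values)
    # bisect_left returns the number of elements strictly less than b_value.
    return bisect_left(ordered, b_value) / len(ordered)


# Made-up B values for ten training-set actives.
actives_b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
p_a = empirical_p_a(9.5, actives_b)  # 9 of the 10 actives lie below 9.5
```

Under this reading, the P_a values obtained for the "actives" themselves are spread evenly between 0 and 1, which is why an unusual but genuinely active compound can still receive a low P_a.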

If the P_a value is less than 0.5 but P_a > P_i, then for more than half of the "actives" from the training set the B values are higher than for this compound. If we reject the suggestion that this compound is active, we will make a wrong decision with probability less than 0.5. In this case the probability of confirming the activity experimentally is small, but if it is confirmed, there is a more than 50% chance that the structure is highly novel.

If the structure under prediction contains a few MNA (Multilevel Neighborhoods of Atoms) descriptors that are new in comparison with the descriptors of the compounds in the training set, then the structure has low similarity to any structure from the training set, and the prediction results should be regarded as very rough estimates.

Based on these criteria, one may choose which activities to test for the studied compounds as a compromise between novelty and the risk of obtaining a negative result in experimental testing.