Proteochem dataset

Prediction of protein-ligand interaction based on sequence similarity and ligand structural features

Dmitry Karasev, Boris Sobolev, Alexey Lagunin, Dmitry Filimonov, Vladimir Poroikov

Abstract

Computational prediction of protein-ligand interactions addresses three main tasks: the search for new target proteins for ligands with known targets; the search for new ligands for targets with established ligand sets; and the prediction of interactions between new proteins and new ligands. To address the last, most complicated case, we propose an approach based on protein sequences classified by ligand specificity, with fuzzy membership coefficients derived from the ligands' structural features. We tested the approach on five protein groups representing promising drug targets. The training sets were built with an original procedure that overcomes data ambiguity. Our study showed high accuracy in predicting new ligands for targets. For known targets, the accuracy of predicting new ligands was high, close to our previous results and comparable to or exceeding that of other methods. Using fuzzy coefficients that reflect target-to-ligand specificity, we predicted interactions for pairs of new proteins and new ligands; the accuracy values were quite acceptable for such a difficult task. The protein kinase family proved a more complicated case, probably because of the very subtle features required for interaction specificity. Thus, our approach is suitable for a wide range of tasks.

Supplementary materials