The SPrOS (Specificity Projection On Sequence) method was developed to analyze amino acid sequences belonging to the same protein family in order to identify the amino acid residues associated with distinct subclasses within that family.
The SPrOS algorithm requires a training set of pre-classified amino acid sequences. The user chooses the test sequence(s) in which class-specific positions are to be predicted. Each test sequence is excluded from the training set and compared with all the remaining sequences. The resulting positional scores are passed to a procedure that estimates the specificity of each position of the test sequence for each given class.
The method produces specificity estimates Eia, each of which evaluates the specificity of position i of the test sequence for class A. The higher the Eia value, the stronger the estimated specificity. P-values give the statistical significance of the Eia estimates: the lower the p-value, the more significant the Eia.
For more details see SPrOS.pdf
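The training-set preparation described above can be sketched in Python. This is only an illustration, not the SPrOS implementation: the function names `read_classification` and `hold_out` are hypothetical, the file layout follows the Sequence Classification description (tab-delimited ID and class pairs, terminated by a `#` row), and applying the five-protein minimum before the test sequence is removed is an assumption.

```python
from collections import defaultdict

# Classes with fewer than five proteins are not processed (stated in the text).
MIN_CLASS_SIZE = 5


def read_classification(lines):
    """Parse 'protein_id<TAB>class_id' rows, stopping at the '#' terminator row."""
    classes = defaultdict(list)
    for line in lines:
        line = line.strip()
        if line == "#":          # the last row contains the single symbol '#'
            break
        if not line:             # tolerate blank lines (assumption)
            continue
        protein_id, class_id = (field.strip() for field in line.split("\t", 1))
        classes[class_id].append(protein_id)
    # Drop classes that are too small to be processed.
    return {c: ids for c, ids in classes.items() if len(ids) >= MIN_CLASS_SIZE}


def hold_out(classes, test_id):
    """Return the training set with the test sequence excluded from every class."""
    return {c: [p for p in ids if p != test_id] for c, ids in classes.items()}
```

Calling `hold_out(classes, test_id)` once per selected test sequence yields the set of remaining sequences against which that test sequence would be compared.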
|Use Sequence to upload protein sequences from a file in FASTA format. See sequence.fasta for an example.|
|Use Sequence Classification to upload a file of tab-delimited pairs, each consisting of a protein ID and the identifier of the class to which the protein is assigned (classes containing fewer than five proteins are not processed). The last row should contain the single symbol ‘#’. See seq_to_group.txt for an example.|
|Use Test Sequence to select test sequences from the uploaded sequence set.|
|Frame defines the length of compared sequence segments.|
|Mode defines the type of positional similarity scores. The Smooth scores better account for the similarity of the regions surrounding a position within a single group, as well as for the differences in those regions between groups. The Focused scores better reveal the positions that differ between groups, even if the surrounding regions are conserved across the whole family.|
|Cutoff p-value limits the output to results whose p-values do not exceed the specified value.|
|Results are output to a tab-delimited file in which each row presents the protein identifier, amino acid position number, amino acid type, class identifier, specificity estimate (Eia), p-value for the obtained Eia, and the coefficient of belonging of the protein to the given class (predefined in the training data). If the calculated p-value equals 0.0, the minimal non-zero value that can be obtained is output with the prefix symbol "<". See result.txt for an example.|
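The result file can be read with a few lines of Python. The sketch below is a reading aid, not part of SPrOS: the names `ResultRow`, `parse_result_line` and `significant` are hypothetical, the column order follows the description above, and it assumes that the file has no header row and that the coefficient of belonging is numeric. A p-value written with the "<" prefix is kept as an upper bound and flagged.

```python
from typing import NamedTuple


class ResultRow(NamedTuple):
    protein_id: str
    position: int        # amino acid position number
    residue: str         # amino acid type
    class_id: str
    specificity: float   # the Eia estimate
    p_value: float       # an upper bound when p_is_bound is True
    p_is_bound: bool     # True if the value was written as "<x"
    membership: float    # coefficient of belonging (assumed numeric)


def parse_result_line(line):
    """Parse one tab-delimited result row; raises ValueError on malformed rows."""
    fields = line.rstrip("\n").split("\t")
    protein_id, position, residue, class_id, e_ia, p_raw, membership = fields[:7]
    p_raw = p_raw.strip()
    is_bound = p_raw.startswith("<")   # calculated p-value was 0.0
    return ResultRow(
        protein_id, int(position), residue, class_id,
        float(e_ia), float(p_raw.lstrip("<")), is_bound, float(membership),
    )


def significant(rows, cutoff):
    """Keep rows whose p-value does not exceed the cutoff."""
    return [row for row in rows if row.p_value <= cutoff]
```

Keeping the `p_is_bound` flag separate from the number lets later filtering or sorting treat "<x" values numerically without losing the fact that the true p-value is smaller than the reported one.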