Predicting quality using percentage identity

Percentage identity is a frequently quoted statistic for an alignment of two sequences. However, the expected value of percentage identity depends strongly on the length of the alignment, and this is frequently overlooked. Figure 4 shows the percentage identities found for a large number of locally optimal alignments of differing length between proteins known to have unrelated three-dimensional structures. Clearly, an alignment of length 200 showing 30% identity is more significant than an alignment of length 50 with the same identity.
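As an illustration only, the statistic can be computed from a pair of already aligned sequences as sketched below. The function name `percentage_identity`, the gap character `-`, and the choice to count only columns where neither sequence has a gap are assumptions made for this sketch, not conventions taken from the text. Other definitions divide by the full alignment length or by the length of the shorter sequence. The number of compared columns is returned alongside the percentage so that the two are always reported together.

```python
from __future__ import annotations


def percentage_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> tuple[float, int]:
    """Return (percentage identity, number of compared columns).

    Both inputs must already be aligned, i.e. have equal length with
    gaps written as `gap`. Only columns where neither sequence has a
    gap are compared (an assumed convention; others exist).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    # Keep only the columns in which both sequences contain a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0, 0

    # Count columns with the same residue in both sequences.
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns), len(columns)


# Hypothetical toy alignment, for demonstration only.
pid, n_columns = percentage_identity("ACDE-FGH", "ACDQWFGH")
print(f"{pid:.1f}% identity over {n_columns} compared columns")
```

Reporting the compared length with the percentage makes it harder to overlook the length dependence described above: the same percentage carries different weight at 50 columns than at 200.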

Figure 4. Plot of alignment length vs percentage identity for local alignments between unrelated proteins.



Original from:
geoff.barton@ox.ac.uk
The original location of this page is http://barton.ebi.ac.uk/papers/rev93_1/subsubsection3_5_1_1.html#SECTION0005110000000000000