Item Type: | Article |
---|---|
Title: | Benchmarking homology detection procedures with low complexity filters |
Creators Name: | Forslund, K. and Sonnhammer, E.L.L. |
Abstract: | Background: Low-complexity sequence regions present a common problem in finding true homologs to a protein query sequence. Several solutions to this have been suggested, but a detailed comparison between these on challenging data has so far been lacking. A common benchmark for homology detection procedures is to use SCOP/ASTRAL domain sequences belonging to the same or different superfamilies, but these contain almost no low complexity sequences. Results: We here introduce an alternative benchmarking strategy based around Pfam domains and clans on whole-proteome data sets. This gives a realistic level of low complexity sequences. We used it to evaluate all six built-in BLAST low complexity filter settings as well as a range of settings in the MSPcrunch post-processing filter. The effect on alignment length was also assessed. Conclusion: Score matrix adjustment methods provide a low false positive rate at a relatively small loss in sensitivity relative to no filtering, across the range of test conditions we apply. MSPcrunch achieved even less loss in sensitivity, but at a higher false positive rate. A drawback of the score matrix adjustment methods is however that the alignments often become truncated. Availability: Perl scripts for MSPcrunch BLAST filtering and for generating the benchmark dataset are available at http://sonnhammer.sbc.su.se/download/software/MSPcrunch+Blixem/benchmark.tar.gz |
Keywords: | Amino Acid Sequence, Amino Acid Sequence Homology, Benchmarking, Computational Biology, Protein Databases, Protein Sequence Analysis, Proteins, Sequence Alignment, Tertiary Protein Structure |
Source: | Bioinformatics |
ISSN: | 1367-4803 |
Volume: | 25 |
Number: | 19 |
Page Range: | 2500-2505 |
Date: | 1 October 2009 |
Official Publication: | https://doi.org/10.1093/bioinformatics/btp446 |