Item Type: | Article |
---|---|
Title: | Towards completion of the Earth's proteome |
Creators Name: | Perez-Iratxeta, C., Palidwor, G. and Andrade-Navarro, M.A. |
Abstract: | New protein sequences are deposited in databases at an accelerating pace; however, many of these are homologous to known proteins and could be considered redundant. If all historical releases of the protein database are analysed using the original sequence-clustering procedure described here, the fraction of newly sequenced proteins that are redundant is increasing. We interpret this as an indication that the sequencing of the Earth's proteome, the complete set of proteins on Earth, is approaching completion. We estimate the approximate size of the Earth's proteome to be 5 million sequences, most of which will be identified during the next 5 years. As the Earth's proteome nears completion, cluster analysis of the protein database will become essential to identify under-explored taxa to which future sequencing efforts should be directed and to focus research on protein families without experimental characterization. |
Keywords: | Protein Sequence Database, Genomics, Sequencing Project, Database Annotation, Phylogenetic Analysis, Animals |
Source: | EMBO Reports |
ISSN: | 1469-221X |
Publisher: | Nature Publishing Group |
Volume: | 8 |
Number: | 12 |
Page Range: | 1135-1141 |
Date: | 1 December 2007 |
Official Publication: | https://doi.org/10.1038/sj.embor.7401117 |