PDB Statistics: Growth in the Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at 70% Sequence Identity

This chart shows the annual and cumulative numbers of protein sequences in released PDB structures from the beginning of the PDB archive to the present. It can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in each year.
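As a minimal illustration (a Python sketch, not code used by the PDB site), the cumulative total for any year is a running sum of the yearly additions up to and including that year. The sketch uses the yearly values listed for 1976 to 1980 in the table on this page; every row of that table is consistent with this running-sum relationship.

```python
from itertools import accumulate

# Yearly counts of new protein sequences for 1976 to 1980,
# copied from the table on this page (70% identity level).
years = [1976, 1977, 1978, 1979, 1980]
new_per_year = [13, 8, 3, 2, 4]

# Cumulative total = running (prefix) sum of the yearly additions.
cumulative = list(accumulate(new_per_year))

for year, new, total in zip(years, new_per_year, cumulative):
    print(f"{year}: {new} new, {total} total")
```

The running sums come out as 13, 21, 24, 26 and 30, matching the table's cumulative column for those years.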

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. To balance performance, the statistics are calculated with a default precision threshold, so the counts here may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page gives the exact count; the statistics page shows the trend.

Year  New sequences  Cumulative total
1976             13                13
1977              8                21
1978              3                24
1979              2                26
1980              4                30
1981              8                38
1982             17                55
1983              9                64
1984             10                74
1985             12                86
1986              8                94
1987             11               105
1988             23               128
1989             40               168
1990             41               209
1991             50               259
1992             59               318
1993            178               496
1994            347               843
1995            275             1,118
1996            328             1,446
1997            458             1,904
1998            594             2,498
1999            710             3,208
2000            822             4,030
2001            876             4,906
2002            932             5,838
2003          1,316             7,154
2004          1,817             8,971
2005          2,001            10,972
2006          2,234            13,206
2007          2,458            15,664
2008          2,287            17,951
2009          2,296            20,247
2010          2,297            22,544
2011          2,056            24,600
2012          2,204            26,804
2013          2,336            29,140
2014          2,826            31,966
2015          2,290            34,256
2016          2,571            36,827
2017          2,672            39,499
2018          2,604            42,103
2019          2,793            44,896
2020          3,452            48,348
2021          2,720            51,068
2022          3,548            54,616
2023          3,438            58,054
2024          3,466            61,520
2025          3,952            65,472
2026            577            66,049
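To read the table as a growth rate rather than raw counts, one option is to divide each year's additions by the previous year's cumulative total. The Python sketch below does this for 2021 to 2025 using values copied from the table; `growth_percent` is a hypothetical helper written for this illustration, and it also checks that each total equals the previous total plus that year's additions. 2026 is left out on the assumption (not stated on this page) that its much lower count reflects a year still in progress.

```python
# Yearly additions and cumulative totals for 2021 to 2025, copied from the table.
# Each value is (new sequences that year, cumulative total at year end).
table = {
    2021: (2720, 51068),
    2022: (3548, 54616),
    2023: (3438, 58054),
    2024: (3466, 61520),
    2025: (3952, 65472),
}


def growth_percent(rows):
    """Percent growth of the cumulative total relative to the previous year's total.

    Hypothetical helper for illustration. Raises ValueError if a row's total
    is not the previous total plus that year's additions.
    """
    years = sorted(rows)
    result = {}
    for prev, year in zip(years, years[1:]):
        new, total = rows[year]
        prev_total = rows[prev][1]
        # Consistency check: the cumulative column should be a running sum.
        if total != prev_total + new:
            raise ValueError(f"inconsistent row for {year}")
        result[year] = 100 * new / prev_total
    return result


for year, pct in growth_percent(table).items():
    print(f"{year}: {pct:.2f}%")
```

Running it prints 6.95%, 6.29%, 5.97% and 6.42% for 2022 to 2025: the archive of unique sequences at this identity level grew by roughly 6 to 7 percent per year over that span.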