PDB Statistics: Growth in Number of Unique Protein Sequences in Released PDB Structures (Cumulative) at Identity 70%

This chart shows the annual and cumulative numbers of unique protein sequences in released PDB structures, from the beginning of the PDB archive onward. It can be viewed at several levels of sequence identity. The cumulative bars show the growth in unique protein sequences (number of polymeric entities) over time. The yearly bars (dark blue) show how many new protein sequences were added in each year.

Note: The total number of sequence clusters in the statistics table links to the sequence cluster group search result page. For performance reasons, the statistics are calculated with a default precision threshold, so the count may differ slightly from the actual non-redundant group search result when the result count approaches or exceeds 10,000. The group search result page provides the accurate count; the statistics page shows the trend.

Year  Number of New Protein Sequences  Total Number of Protein Sequences
1976  13                               13
1977  8                                21
1978  3                                24
1979  2                                26
1980  4                                30
1981  8                                38
1982  18                               56
1983  9                                65
1984  11                               76
1985  12                               88
1986  9                                97
1987  11                               108
1988  24                               132
1989  40                               172
1990  40                               212
1991  48                               260
1992  60                               320
1993  182                              502
1994  351                              853
1995  277                              1,130
1996  335                              1,465
1997  461                              1,926
1998  587                              2,513
1999  712                              3,225
2000  821                              4,046
2001  877                              4,923
2002  942                              5,865
2003  1,329                            7,194
2004  1,838                            9,032
2005  2,025                            11,057
2006  2,257                            13,314
2007  2,469                            15,783
2008  2,298                            18,081
2009  2,322                            20,403
2010  2,316                            22,719
2011  2,087                            24,806
2012  2,242                            27,048
2013  2,375                            29,423
2014  2,859                            32,282
2015  2,334                            34,616
2016  2,643                            37,259
2017  2,721                            39,980
2018  2,713                            42,693
2019  2,853                            45,546
2020  3,524                            49,070
2021  2,880                            51,950
2022  3,629                            55,579
2023  3,509                            59,088
2024  3,486                            62,574
2025  4,007                            66,581
2026  1,712                            68,293
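
The relationship between the two data columns can be illustrated with a short Python sketch: each cumulative total is the running sum of the yearly counts up to and including that year. The function name `cumulative_totals` and the dictionary layout are hypothetical, chosen only for illustration. The counts are the 1976 to 1980 values from the table. This is not how the statistics page itself computes its numbers.

```python
from itertools import accumulate

# Yearly counts of new protein sequences for 1976-1980, copied from the table.
new_per_year = {1976: 13, 1977: 8, 1978: 3, 1979: 2, 1980: 4}


def cumulative_totals(yearly):
    """Return {year: running total} for a mapping of year -> new count.

    Hypothetical helper for illustration; years are processed in
    ascending order so each total includes all earlier years.
    """
    years = sorted(yearly)
    running = accumulate(yearly[year] for year in years)
    return dict(zip(years, running))


totals = cumulative_totals(new_per_year)

# Print year, new count, and cumulative total side by side.
for year, total in totals.items():
    print(year, new_per_year[year], total)
```

The same check can be applied to any row: the total for a given year minus the total for the previous year should equal that year's count of new sequences.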