RCSB PDB Help

File Download Services

● The PDB archive

○ File Access URLs

○ Automated download of data

○ Archival snapshots of the PDB archive

○ Major directories in the PDB archive

● Other downloads offered by RCSB PDB

○ PDB entry files

○ Small molecule files

○ Experimental data files and 3DEM maps

○ Sequence data

○ Sequence clusters data

○ Holdings data

○ Chemical Component Dictionary (CCD) Data

The PDB archive

Searches and reports on this RCSB PDB website use data from the PDB archive. The wwPDB maintains the archive at two locations: the main archive, files.wwpdb.org (data download details), and the versioned archive, files-versioned.wwpdb.org (versioning details). Since February 1, 2023, the wwPDB has enriched PDB entries with additional annotation and distributed the latest version of each entry via the next-generation archive (NextGen), accessible at files-nextgen.wwpdb.org.

In addition to experimental structures, the PDB archive includes structures determined by integrative and hybrid structure determination methods (IHM). Users can access and download IHM structures and associated data at files.wwpdb.org/pub/pdb_ihm/.

All data are available via the HTTPS protocol. Note that the FTP protocol is no longer supported. See the announcement.

RCSB PDB hosts the archive as part of the Registry of Open Data on Amazon Web Services (AWS).

File Access URLs

DNS names are required for programmatic access to PDB archive downloads:

HTTPS: https://files.wwpdb.org
RSYNC: rsync://rsync.rcsb.org

Automated download of data

The URLs in this document are suitable for scripted downloads with utilities such as wget. For instance, you can use the batch downloads shell script.

The RCSB PDB provides rsync capabilities for efficiently maintaining full copies of the archive. To facilitate automated downloads, we offer scripts that simplify the process.

Use the following script to copy the current contents of the entire archive: rsyncPDB.sh.

Additional information on obtaining and maintaining copies of the entire PDB archive or certain portions of it is available at wwpdb.org/ftp/pdb-ftp-sites.

Archival snapshots of the PDB archive

Since 2004, the PDB archive has been preserved through yearly time-stamped snapshots. These snapshots provide well-defined datasets for research on the PDB archive and include coordinate data in multiple formats, as well as experimental data.

The archival snapshots maintain the historical directory structure of the PDB archive. Coordinate files are organized into subdirectories based on the two middle characters of the PDB ID. For example, the structure 100d is located in the directory '00'.

Each file's date and timestamp reflect the last modification time, providing a historical reference for changes within the archive.

For more details on accessing these snapshots, visit the documentation.

Major directories in the PDB archive

The directory pub/pdb is the entry directory for the PDB archive downloads.

Some general notes:

Entry files are date-stamped to show the date they were released.
Entries are grouped by the middle two characters of the 4-character PDB identifier. For example, entry file pdb100d.ent can be found at pub/pdb/data/structures/divided/pdb/00/pdb100d.ent.gz.
The two-letter naming convention for structure holdings is retained for the directories within /pub/pdb/data/structures/divided and /pub/pdb/data/structures/divided/obsolete, but not for the directories within /pub/pdb/data/structures/all, which contain the structure holdings in an undivided layout.
PDB entries are available in PDB, mmCIF, and PDBML/XML format.
Only UNIX compressed files are supported for coordinates, structure factors, and restraints.
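
The grouping rule above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the host is the HTTPS name listed under "File Access URLs", and the sketch covers just the legacy PDB-format files of 4-character IDs in the divided layout.

```python
ARCHIVE_BASE = "https://files.wwpdb.org"  # HTTPS host from "File Access URLs"


def divided_pdb_path(pdb_id):
    """Return the divided-layout path of a legacy PDB-format entry file.

    The subdirectory is the middle two characters of the 4-character ID.
    """
    pdb_id = pdb_id.lower()
    if len(pdb_id) != 4:
        raise ValueError("expected a 4-character PDB ID, got %r" % pdb_id)
    middle = pdb_id[1:3]  # characters 2 and 3 of the ID
    return f"pub/pdb/data/structures/divided/pdb/{middle}/pdb{pdb_id}.ent.gz"


def divided_pdb_url(pdb_id):
    """Prefix the archive path with the HTTPS host (hypothetical helper)."""
    return f"{ARCHIVE_BASE}/{divided_pdb_path(pdb_id)}"


print(divided_pdb_path("100d"))
# pub/pdb/data/structures/divided/pdb/00/pdb100d.ent.gz
```

The resulting URL can be passed to wget or any other HTTPS client.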

For information about large structures that cannot be represented in the legacy PDB file format see here.

Directory	Contents
/pub/pdb/data/assemblies/mmCIF	Biological assembly coordinate files in mmCIF format
/pub/pdb/data/biounit/PDB	Biological assembly coordinate files in legacy PDB format
/pub/pdb/data/monomers	PDB Chemical Component Dictionary and other info on monomers
/pub/pdb/data/status	Details of entries on hold and in processing
/pub/pdb/data/structures/all	Analogous to the divided directory, containing pdb, mmCIF, nmr_restraint, and structure_factors directories, with symbolic links to files in the divided subdirectories. Unlike the divided directory, files in ./all are not split into two-letter subdirectories.
/pub/pdb/data/structures/divided	The entry point for finding a structure. This directory contains the current PDB holdings in pdb, mmCIF, XML, nmr_restraint, and structure_factors directories, with files divided by a two-letter scheme: entries are grouped by the middle two characters of the PDB ID. For example, entry file pdb1abc.ent can be found in pub/pdb/data/structures/divided/pdb/ab
/pub/pdb/data/structures/models	Theoretical model files that are maintained separately from the main archive
/pub/pdb/data/structures/obsolete	Structures and associated data files no longer part of the archive
/pub/pdb/derived_data	Plain text files that list information derived from all PDB entries, such as all PDB sequences in FASTA format.
/pub/pdb/doc	Documentation, including file format descriptions and RCSB PDB Newsletters
/pub/pdb/validation_reports	Validation report files in mmCIF, PDF, and XML formats, with supporting data
/pub/pdb_ihm/data/entries/	Structures determined by integrative and hybrid structure determination methods (IHM) and associated data files
/pub/pdb_ihm/holdings/	Current PDB-IHM holdings, last-modified dates of released IHM structures, and unreleased IHM entries

Other downloads offered by RCSB PDB

Some of the links above are also available in a short form (e.g., /download/4hhb.cif.gz). Each short-form link is offered as two URLs, which differ in the response headers sent to the client:

view: The HTTP/HTTPS response headers to the client are set with: Content-Type: text/plain
download: The HTTP/HTTPS response headers to the client are set with: Content-Type: application/octet-stream and Content-Transfer-Encoding: binary

PDB entry files

PDB entry files are available in several file formats (PDBx/mmCIF, XML, BinaryCIF and legacy PDB for some entries), compressed or uncompressed, and with an option to download a file containing only "header" information (summary data, no coordinates).

File Format	Action	Storage Compression	Example URL
PDBx/mmCIF	Download	Compressed	https://files.rcsb.org/download/4hhb.cif.gz or https://files.rcsb.org/download/pdb_00004hhb.cif.gz
PDBx/mmCIF	Download	Uncompressed	https://files.rcsb.org/download/4hhb.cif or https://files.rcsb.org/download/pdb_00004hhb.cif
Biological Assembly File in PDBx/mmCIF	Download	Compressed	https://files.rcsb.org/download/5a9z-assembly1.cif.gz
Biological Assembly File in PDBx/mmCIF	Download	Uncompressed	https://files.rcsb.org/download/5a9z-assembly1.cif
PDBx/BinaryCIF	Download	Compressed	https://models.rcsb.org/4hhb.bcif.gz
PDBx/BinaryCIF	Download	Uncompressed	https://models.rcsb.org/4hhb.bcif
XML	Download	Compressed	https://files.rcsb.org/download/4hhb.xml.gz
XML	Download	Uncompressed	https://files.rcsb.org/download/4hhb.xml
XML (header only)	Download	Compressed	https://files.rcsb.org/download/4hhb-noatom.xml.gz
XML (header only)	Download	Uncompressed	https://files.rcsb.org/download/4hhb-noatom.xml
PDBx/mmCIF	View	Uncompressed	https://files.rcsb.org/view/4hhb.cif
PDBx/mmCIF (header only)	View	Uncompressed	https://files.rcsb.org/header/4hhb.cif
Biological Assembly File in PDBx/mmCIF	View	Uncompressed	https://files.rcsb.org/view/5a9z-assembly1.cif
XML (header only)	View	Uncompressed	https://files.rcsb.org/view/4hhb-noatom.xml
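
The URL patterns in the table above can be generated programmatically. The following Python sketch is one possible way to do so; the function names are hypothetical, and the extended-ID form (pdb_ followed by the ID left-padded with zeros to eight characters) is inferred from the pdb_00004hhb example rather than from a formal specification.

```python
from urllib.request import urlretrieve

FILES_BASE = "https://files.rcsb.org"


def extended_id(pdb_id):
    """Return the extended form of a PDB ID, e.g. 4hhb -> pdb_00004hhb (inferred pattern)."""
    return f"pdb_{pdb_id.lower():0>8}"


def entry_file_url(pdb_id, ext="cif", action="download", compressed=False):
    """Build a short-form URL for an entry file.

    ext: "cif", "xml", or "noatom.xml" style suffixes are passed by the caller;
    action: "download", "view", or "header", as listed in the table.
    """
    if action not in ("download", "view", "header"):
        raise ValueError("unknown action: %r" % action)
    if compressed and action != "download":
        # The table lists compressed files for the download action only.
        raise ValueError("compressed files are listed for download only")
    name = f"{pdb_id.lower()}.{ext}" + (".gz" if compressed else "")
    return f"{FILES_BASE}/{action}/{name}"


def assembly_file_url(pdb_id, assembly_id=1, action="download", compressed=False):
    """Build the URL of a biological assembly file in PDBx/mmCIF format."""
    return entry_file_url(f"{pdb_id}-assembly{assembly_id}", "cif", action, compressed)


def fetch(url, dest):
    """Download url to the local path dest (requires network access)."""
    urlretrieve(url, dest)


print(entry_file_url("4hhb", compressed=True))
# https://files.rcsb.org/download/4hhb.cif.gz
```

For example, fetch(entry_file_url("4hhb", compressed=True), "4hhb.cif.gz") would save the compressed PDBx/mmCIF file to the current directory.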

The following table contains all of the legacy PDB format URLs. Please note these are to be discontinued when the PDB transitions to extended PDB IDs.

File Format	Action	Storage Compression	Example URL
Legacy PDB	Download	Compressed	https://files.rcsb.org/download/4hhb.pdb.gz
Legacy PDB	Download	Uncompressed	https://files.rcsb.org/download/4hhb.pdb
Biological Assembly File in legacy PDB format	Download	Compressed	https://files.rcsb.org/download/1hh3.pdb1.gz
Biological Assembly File in legacy PDB format	Download	Uncompressed	https://files.rcsb.org/download/1hh3.pdb1
Legacy PDB	View	Uncompressed	https://files.rcsb.org/view/4hhb.pdb
Legacy PDB (header only)	View	Uncompressed	https://files.rcsb.org/header/4hhb.pdb
Biological Assembly File in legacy PDB format	View	Uncompressed	https://files.rcsb.org/view/1hh3.pdb1

Small molecule files

Small molecule files, including the ligands/chemical components maintained in the Chemical Component Dictionary and the Biologically Interesting Molecule Reference Dictionary (BIRD), are available in multiple formats.

Type	Format	Action	Example URL
BIRD atom representation	CIF	Download	https://files.rcsb.org/birds/download/PRDCC_000001.cif
BIRD definition	CIF	Download	https://files.rcsb.org/birds/download/PRD_000001.cif
Definition	CIF	Download	https://files.rcsb.org/ligands/download/HEM.cif
Ideal coordinates	SDF	Download	https://files.rcsb.org/ligands/download/HEM_ideal.sdf
Definition	CIF	View	https://files.rcsb.org/ligands/view/HEM.cif
Ideal coordinates	SDF	View	https://files.rcsb.org/ligands/view/HEM_ideal.sdf
BIRD definition	CIF	View	https://files.rcsb.org/birds/view/PRD_000001.cif
BIRD atom representation	CIF	View	https://files.rcsb.org/birds/view/PRDCC_000001.cif
Chemical Component Instance	SDF	View	https://models.rcsb.org/v1/4hhb/ligand?auth_asym_id=A&auth_seq_id=142&encoding=sdf
Chemical Component Instance	MOL	View	https://models.rcsb.org/v1/4hhb/ligand?auth_asym_id=A&auth_seq_id=142&encoding=mol
Chemical Component Instance	MOL2	View	https://models.rcsb.org/v1/4hhb/ligand?auth_asym_id=A&auth_seq_id=142&encoding=mol2
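
As an illustration, the ligand URLs in the table above could be assembled as follows. This is a minimal Python sketch; the function names are hypothetical, and the query parameters are the ones that appear in the example URLs.

```python
from urllib.parse import urlencode

LIGANDS_BASE = "https://files.rcsb.org/ligands"
MODELS_BASE = "https://models.rcsb.org/v1"


def ligand_definition_url(comp_id, action="download"):
    """URL of a chemical component definition in CIF format."""
    return f"{LIGANDS_BASE}/{action}/{comp_id.upper()}.cif"


def ligand_ideal_sdf_url(comp_id, action="download"):
    """URL of the ideal coordinates of a chemical component in SDF format."""
    return f"{LIGANDS_BASE}/{action}/{comp_id.upper()}_ideal.sdf"


def ligand_instance_url(pdb_id, auth_asym_id, auth_seq_id, encoding="sdf"):
    """URL of one ligand instance within an entry, in SDF, MOL, or MOL2 encoding."""
    if encoding not in ("sdf", "mol", "mol2"):
        raise ValueError("unsupported encoding: %r" % encoding)
    # urlencode keeps the insertion order of the dict and escapes the values.
    query = urlencode({
        "auth_asym_id": auth_asym_id,
        "auth_seq_id": auth_seq_id,
        "encoding": encoding,
    })
    return f"{MODELS_BASE}/{pdb_id.lower()}/ligand?{query}"


print(ligand_instance_url("4hhb", "A", 142, "mol2"))
# https://models.rcsb.org/v1/4hhb/ligand?auth_asym_id=A&auth_seq_id=142&encoding=mol2
```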

Experimental data files and 3DEM maps

This table includes structure factors, NMR restraints, chemical shifts, combined NMR data, and electron density maps.

File Format	Action	Storage Compression	Example URL
Chemical Shifts	Download	Compressed	https://files.rcsb.org/download/2n2z_cs.str.gz
Chemical Shifts	Download	Uncompressed	https://files.rcsb.org/download/2n2z_cs.str
Chemical Shifts	View	Uncompressed	https://files.rcsb.org/view/2n2z_cs.str
Combined NMR data (NEF)	Download	Compressed	https://files.rcsb.org/pub/pdb/data/structures/divided/nmr_data/e5/1e52_nmr-data.nef.gz
Combined NMR data (NMR-STAR)	Download	Compressed	https://files.rcsb.org/pub/pdb/data/structures/divided/nmr_data/e5/1e52_nmr-data.str.gz
Structure Factors	Download	Compressed	https://files.rcsb.org/download/1btn-sf.cif.gz
Structure Factors	Download	Uncompressed	https://files.rcsb.org/download/1btn-sf.cif
Structure Factors	View	Uncompressed	https://files.rcsb.org/view/1btn-sf.cif
Electron Density 2Fo-Fc & Fo-Fc Map (might be downsampled) - BinaryCIF format	Download	Uncompressed	https://maps.rcsb.org/x-ray/6dil/cell/
NMR Restraints	Download	Compressed	https://files.rcsb.org/download/108d.mr.gz
NMR Restraints	Download	Uncompressed	https://files.rcsb.org/download/108d.mr
NMR Restraints	View	Uncompressed	https://files.rcsb.org/view/108d.mr
NMR Restraints v2	Download	Compressed	https://files.rcsb.org/download/108d_mr.str.gz
NMR Restraints v2	Download	Uncompressed	https://files.rcsb.org/download/108d_mr.str
NMR Restraints v2	View	Uncompressed	https://files.rcsb.org/view/108d_mr.str
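
The file-name suffixes used in the table above lend themselves to a small lookup table. The Python sketch below is illustrative only: the dictionary keys and function names are hypothetical labels, while the suffixes and paths are copied from the example URLs.

```python
FILES_BASE = "https://files.rcsb.org"

# Suffix appended to the PDB ID for each kind of experimental data file.
SUFFIXES = {
    "chemical_shifts": "_cs.str",
    "structure_factors": "-sf.cif",
    "nmr_restraints": ".mr",
    "nmr_restraints_v2": "_mr.str",
}


def experimental_data_url(pdb_id, kind, action="download", compressed=False):
    """Build a short-form URL for an experimental data file."""
    if kind not in SUFFIXES:
        raise ValueError("unknown kind: %r" % kind)
    if compressed and action != "download":
        # The table lists compressed files for the download action only.
        raise ValueError("compressed files are listed for download only")
    name = pdb_id.lower() + SUFFIXES[kind] + (".gz" if compressed else "")
    return f"{FILES_BASE}/{action}/{name}"


def combined_nmr_data_url(pdb_id, fmt="nef"):
    """URL of combined NMR data ("nef" or "str") in the divided layout.

    The subdirectory is the middle two characters of the PDB ID.
    """
    if fmt not in ("nef", "str"):
        raise ValueError("unknown format: %r" % fmt)
    pdb_id = pdb_id.lower()
    return (f"{FILES_BASE}/pub/pdb/data/structures/divided/nmr_data/"
            f"{pdb_id[1:3]}/{pdb_id}_nmr-data.{fmt}.gz")


print(experimental_data_url("1btn", "structure_factors", compressed=True))
# https://files.rcsb.org/download/1btn-sf.cif.gz
```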

Sequence data

Sequence data in FASTA format (full deposited sequence as in SEQRES records).

Please note that the FASTA download service at URL /pdb/download/downloadFastaFiles.do?structureIdList=4hhb&compressionType=uncompressed has been discontinued. Users will need to migrate to the new endpoints below. Note that the output of the new endpoints is per entity (with chain identifiers provided in the header) instead of per chain.

File	Type	Storage Compression	URL
FASTA sequences per PDB entry	Download	Uncompressed	/fasta/entry/4HHB/download
FASTA sequence per polymer entity (identified by `<pdb_id>_<entity_id>`)	Download	Uncompressed	/fasta/entity/4HHB_1/download
FASTA sequence per polymer entity instance (chain) (identified by `<pdb_id>.<asym_id>`, please note this is the label_asym_id and not the author chain id)	Download	Uncompressed	/fasta/chain/4HHB.A/download
Sequences in FASTA format for all entries in the PDB archive	Download	Compressed	https://files.rcsb.org/pub/pdb/derived_data/pdb_seqres.txt.gz
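
The three identifier styles accepted by the FASTA endpoints can be sketched as follows. The function names are hypothetical, and because the table gives the paths relative to the site, the default host used here (https://www.rcsb.org) is an assumption that should be checked before use.

```python
SITE_BASE = "https://www.rcsb.org"  # assumption: the table only lists relative paths


def entry_fasta_path(pdb_id):
    """All polymer sequences of one entry."""
    return f"/fasta/entry/{pdb_id.upper()}/download"


def entity_fasta_path(pdb_id, entity_id):
    """One polymer entity, identified by <pdb_id>_<entity_id>."""
    return f"/fasta/entity/{pdb_id.upper()}_{entity_id}/download"


def chain_fasta_path(pdb_id, label_asym_id):
    """One polymer entity instance, identified by <pdb_id>.<asym_id>.

    Note that asym_id is the label_asym_id, not the author chain ID.
    """
    return f"/fasta/chain/{pdb_id.upper()}.{label_asym_id}/download"


def absolute_url(path, site_base=SITE_BASE):
    """Join a relative endpoint path with the (assumed) site host."""
    return site_base.rstrip("/") + path


print(entity_fasta_path("4hhb", 1))
# /fasta/entity/4HHB_1/download
```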

Sequence clusters data

Results of the weekly clustering of protein sequences in the PDB by DIAMOND at 30%, 40%, 50%, 70%, 90%, 95%, and 100% sequence identity. Note that these files use polymer entity identifiers rather than chain identifiers, to avoid redundancy. The files are plain text with one cluster per line, sorted from largest cluster to smallest.

File	Type	Storage Compression	URL
Sequence clusters at 30% sequence identity clustering	Download	Uncompressed	https://cdn.rcsb.org/resources/sequence/clusters/clusters-by-entity-30.txt
Sequence clusters at 40% sequence identity clustering	Download	Uncompressed	https://cdn.rcsb.org/resources/sequence/clusters/clusters-by-entity-40.txt
Sequence clusters at 50% sequence identity clustering	Download	Uncompressed	https://cdn.rcsb.org/resources/sequence/clusters/clusters-by-entity-50.txt
Sequence clusters at 70% sequence identity clustering	Download	Uncompressed	https://cdn.rcsb.org/resources/sequence/clusters/clusters-by-entity-70.txt
Sequence clusters at 90% sequence identity clustering	Download	Uncompressed	https://cdn.rcsb.org/resources/sequence/clusters/clusters-by-entity-90.txt
Sequence clusters at 95% sequence identity clustering	Download	Uncompressed	https://cdn.rcsb.org/resources/sequence/clusters/clusters-by-entity-95.txt
Sequence clusters at 100% sequence identity clustering	Download	Uncompressed	https://cdn.rcsb.org/resources/sequence/clusters/clusters-by-entity-100.txt
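
A cluster file can be read with a few lines of Python. The sketch below assumes that the entity identifiers on each line are separated by whitespace, which is not stated above; the function names are hypothetical and the sample text is made up for illustration rather than taken from a real cluster file.

```python
CLUSTERS_BASE = "https://cdn.rcsb.org/resources/sequence/clusters"
IDENTITY_LEVELS = (30, 40, 50, 70, 90, 95, 100)


def cluster_file_url(identity):
    """URL of the cluster file for one of the listed sequence identity levels."""
    if identity not in IDENTITY_LEVELS:
        raise ValueError("no cluster file at %r%% identity" % identity)
    return f"{CLUSTERS_BASE}/clusters-by-entity-{identity}.txt"


def parse_clusters(text):
    """Return one list of entity IDs per non-empty line.

    Assumption: IDs on a line are whitespace-separated.
    """
    return [line.split() for line in text.splitlines() if line.strip()]


def cluster_index(clusters):
    """Map each entity ID to the zero-based index of its cluster."""
    return {entity: i for i, members in enumerate(clusters) for entity in members}


# Made-up sample: two clusters, largest first.
sample = "AAAA_1 BBBB_1 CCCC_2\nDDDD_1\n"
clusters = parse_clusters(sample)
print(len(clusters), cluster_index(clusters)["CCCC_2"])
# 2 0
```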

Holdings data

PDB ID holdings data in JSON format. For more information, see the data API documentation.

File	Type	Storage Compression	URL
All current PDB IDs	Download	Uncompressed	https://data.rcsb.org/rest/v1/holdings/current/entry_ids
All unreleased PDB IDs	Download	Uncompressed	https://data.rcsb.org/rest/v1/holdings/unreleased/entry_ids
All removed PDB IDs (obsoleted entries or theoretical models)	Download	Uncompressed	https://data.rcsb.org/rest/v1/holdings/removed/entry_ids
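
The holdings endpoints can be queried from a script. The Python sketch below assumes that each endpoint returns a JSON array of ID strings; that shape is an assumption and should be confirmed against the data API documentation. The function names are hypothetical.

```python
import json
from urllib.request import urlopen

HOLDINGS_BASE = "https://data.rcsb.org/rest/v1/holdings"
STATUSES = ("current", "unreleased", "removed")


def holdings_url(status):
    """URL of the entry ID list for one of the three holdings categories."""
    if status not in STATUSES:
        raise ValueError("unknown holdings status: %r" % status)
    return f"{HOLDINGS_BASE}/{status}/entry_ids"


def parse_entry_ids(payload):
    """Parse a JSON payload into a set of upper-case IDs.

    Assumption: the payload is a JSON array of strings.
    """
    data = json.loads(payload)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of IDs")
    return {str(item).upper() for item in data}


def fetch_entry_ids(status):
    """Fetch and parse one holdings list (requires network access)."""
    with urlopen(holdings_url(status)) as response:
        return parse_entry_ids(response.read().decode("utf-8"))


print(holdings_url("current"))
# https://data.rcsb.org/rest/v1/holdings/current/entry_ids
```

Storing the IDs in a set makes it cheap to compare lists, for example to check whether an ID of interest appears among the removed entries.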

Chemical Component Dictionary (CCD) Data

A subset of properties is provided for all components from the Chemical Component Dictionary (CCD), which describes the chemical properties of all molecules in the PDB archive. The atom file (cca.bcif) provides the following CIF columns: atom_id, comp_id, charge, and pdbx_stereo_config. The bond file (ccb.bcif) provides the following CIF columns: atom_id_1, atom_id_2, comp_id, molstar_protonation_variant, pdbx_aromatic_flag, pdbx_stereo_config, and value_order.

This data can be used by the Mol* ModelServer.

File	Format	Action	URL
Chemical Component Atom Data	BinaryCIF	Download	https://models.rcsb.org/cca.bcif
Chemical Component Bond Data	BinaryCIF	Download	https://models.rcsb.org/ccb.bcif