RCSB PDB Help
Search Examples
Structure Attributes
Search by a wide range of metadata associated with macromolecular structures. This includes annotations defined in the mmCIF dictionary (e.g. Source Organism), external value-added annotations (e.g. Enzyme Classification, Gene Ontology), and ligand-specific information (e.g. Chemical Name). You can mix these attributes using AND and OR logic to build complex queries.
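The AND/OR combination described above can be sketched as a nested query document. This is a minimal illustration, assuming the JSON layout used by the RCSB Search API (terminal nodes combined in groups); the attribute names `rcsb_entity_source_organism.ncbi_scientific_name` and `exptl.method` and the `exact_match` operator are assumptions that should be checked against the current attribute list before use.

```python
import json


def terminal(attribute, operator, value):
    """Build a single attribute condition (assumed 'terminal' node layout)."""
    return {
        "type": "terminal",
        "service": "text",
        "parameters": {"attribute": attribute, "operator": operator, "value": value},
    }


def group(logical_operator, *nodes):
    """Combine conditions with 'and' or 'or'."""
    if logical_operator not in ("and", "or"):
        raise ValueError("logical_operator must be 'and' or 'or'")
    return {"type": "group", "logical_operator": logical_operator, "nodes": list(nodes)}


# Human structures solved by X-ray diffraction OR electron microscopy.
# Attribute names and values below are assumptions for illustration.
query = {
    "query": group(
        "and",
        terminal("rcsb_entity_source_organism.ncbi_scientific_name", "exact_match", "Homo sapiens"),
        group(
            "or",
            terminal("exptl.method", "exact_match", "X-RAY DIFFRACTION"),
            terminal("exptl.method", "exact_match", "ELECTRON MICROSCOPY"),
        ),
    ),
    "return_type": "entry",
}

print(json.dumps(query, indent=2))
```

Nesting an "or" group inside an "and" group mirrors how conditions are grouped in the query builder.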
Macromolecules
The polymers in PDB structures can be proteins, DNA, RNA, and DNA/RNA hybrids. Polymer instances (a.k.a. chains) are the individual copies of distinct macromolecules. A structure may contain multiple copies of identical macromolecules.
Macromolecular Composition Search
- This search example queries structures with a single protein chain (and no others) of length between 350 and 400 residues;
- This search example queries structures that contain an RNA polymer, regardless of what other polymer types the structures may or may not contain.
Macromolecule Type Search
- This search example queries for all membrane protein structures in the PDB (as annotated by the resources PDBTM, MemProtMD, OPM, or mpstruc).
Modified Residues Search
Modified residues are non-standard polymeric components (i.e. non-standard amino acids in protein sequences or non-standard nucleotides in nucleic acid sequences).
This example shows how to find structures with modified residues.
Chimeric Macromolecular Entities Search
Polymeric sequences in the PDB are sometimes engineered by fusing sequence fragments from different organisms. These are known as chimeric entities. This search will find any PDB entry containing chimeric entities.
Assemblies
The biological assembly is the arrangement of macromolecules in the structure that is believed to be the biologically meaningful molecular assembly.
Assembly Composition
Below are examples of searches that query biological assemblies with different compositional features:
- This search example queries the total number of polymers in the biological assembly, regardless of whether that includes multiple identical molecules or different molecules.
- This search example queries biological assemblies with a single protein chain (and no others) of length between 350 and 400 residues.
- This search example queries biological assemblies that contain exactly 24 identical chains. For example, the biological assembly of the ferritin 1aew is composed of 24 copies of a single polymer chain.
- This search example queries for immunoglobulin Fab fragments bound to a dimeric antigen (i.e., the assembly should have 2 Fab heavy chains, 2 Fab light chains, 2 antigen chains) using a stoichiometry based search (A2B2C2) AND a structure based search for a Fab light chain (e.g., using the PDB structure 1bj1, chain A).
- This search example queries for assemblies in the PDB that contain at least one heavy water (or DOD).
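The stoichiometry notation used in the examples above (e.g. A2B2C2) can be illustrated with a short sketch. It assumes that letters are assigned to distinct molecules in order of decreasing copy number and that the count is always written, even when it is 1; both conventions, and the function name, are assumptions for illustration rather than a statement of how RCSB computes the value.

```python
from collections import Counter
from string import ascii_uppercase


def stoichiometry(chain_molecules):
    """Return a stoichiometry string from a list naming the molecule of each chain.

    Assumed convention: most abundant molecule gets 'A', the next 'B', and so on.
    """
    counts = sorted(Counter(chain_molecules).values(), reverse=True)
    if len(counts) > len(ascii_uppercase):
        raise ValueError("more distinct molecules than available letters")
    return "".join(f"{letter}{count}" for letter, count in zip(ascii_uppercase, counts))


# 24 copies of a single polymer chain, as in the ferritin example.
ferritin = stoichiometry(["ferritin"] * 24)

# Two Fab heavy chains, two Fab light chains, two antigen chains.
fab_complex = stoichiometry(["heavy", "light", "antigen"] * 2)

print(ferritin, fab_complex)
```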
Ligands
Ligands are chemical substances that form a complex with larger biomolecule(s).
Free vs. Polymeric Ligands
Most ligands are considered “standalone ligands” that interact non-covalently with macromolecules. Less frequently, ligands can be covalently linked to macromolecules or other heterogen groups.
Find structures with adenosine triphosphate (ATP) where:
- ATP is present as a standalone ligand
- ATP is present as a covalently linked ligand
Structure-Ligand Complexes
This search example queries complexes with ligands of any type.
You can also narrow down this search to include only complexes with specific features. For example:
- This search example queries the protein-ligand complexes solved using the X-ray diffraction experimental technique;
- This search example queries the complexes of proteins from Staphylococcus aureus (strain N315) with ligands.
- This search example queries the DNA-ligand complexes from structures with the following experimental details:
  - Experimental method: X-Ray diffraction
  - Refinement X-Ray Resolution: 0-2 Å
  - Refinement R-Factors (R Work): 0-0.2
  - Refinement R-Factors (R Free): 0-0.214
  - Has Experimental data: Yes
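The experimental filters listed above can be sketched as a group of range and exact-match conditions. This is a hedged illustration only: the attribute names (`refine.ls_d_res_high`, `refine.ls_R_factor_R_work`, `refine.ls_R_factor_R_free`, `rcsb_accession_info.has_released_experimental_data`), the `range` value layout, and the "Y" flag value are assumptions about the RCSB Search API and are not taken from this page. The DNA and ligand conditions are left out to keep the sketch short.

```python
def exact(attribute, value):
    """Exact-match condition on one attribute (assumed node layout)."""
    return {
        "type": "terminal",
        "service": "text",
        "parameters": {"attribute": attribute, "operator": "exact_match", "value": value},
    }


def in_range(attribute, low, high):
    """Inclusive numeric range condition (assumed value layout)."""
    return {
        "type": "terminal",
        "service": "text",
        "parameters": {
            "attribute": attribute,
            "operator": "range",
            "value": {"from": low, "to": high, "include_lower": True, "include_upper": True},
        },
    }


def experimental_filters():
    """AND together the experimental details from the list above."""
    return {
        "type": "group",
        "logical_operator": "and",
        "nodes": [
            exact("exptl.method", "X-RAY DIFFRACTION"),
            in_range("refine.ls_d_res_high", 0, 2),            # resolution, in angstroms
            in_range("refine.ls_R_factor_R_work", 0, 0.2),     # R Work
            in_range("refine.ls_R_factor_R_free", 0, 0.214),   # R Free
            exact("rcsb_accession_info.has_released_experimental_data", "Y"),
        ],
    }


filters = experimental_filters()
```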
Ligand Of Interest (LOI)
Structures may include small molecules annotated as "ligands of interest", meaning that the small molecule is a focus of the authors’ research.
This search example queries structures that contain "ligand(s) of interest".
Binding Affinity
You can search for structure-ligand complexes with associated binding affinity data from the BindingDB resource.
Binding affinity measurements are one of the following types:
- IC50: the concentration of ligand that reduces enzyme activity by 50%;
- EC50: the concentration of compound that generates a half-maximal response;
- Kd: dissociation constant;
- Ka: association constant;
- Ki: enzyme inhibition constant;
- ΔG: Gibbs free energy of binding (for association reaction);
- ΔH: change in enthalpy upon binding;
- -TΔS: entropic contribution to the free energy of binding (the change in entropy multiplied by the negative absolute temperature).
The concentration constants (IC50, EC50) and binding constants (Ki, Kd) are given in nM; the thermodynamic parameters (ΔG, ΔH, -TΔS) are given in kJ/mol; the association constant (Ka) is given in M⁻¹.
For example, this search returns structure-ligand complexes with an EC50 of 2 nM, e.g. the Thyroid Hormone Receptor from the 3GWS structure has an EC50 of 2 nM for 3,5,3'TRIIODOTHYRONINE (T3).
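Because the measurement types use different units, it can help to see how they relate. The sketch below uses the standard thermodynamic relations Ka = 1/Kd and ΔG = RT ln(Kd) for the association reaction; these relations, the temperature of 298.15 K, and the 1 M standard state are assumptions brought in for illustration and are not stated on this page. The 2 nM input is a made-up Kd value, not a measurement from any entry.

```python
import math

GAS_CONSTANT_KJ_PER_MOL_K = 8.314462618e-3  # R in kJ/(mol*K)


def kd_nm_to_ka_per_molar(kd_nm):
    """Convert a dissociation constant in nM to an association constant in M^-1."""
    kd_molar = kd_nm * 1e-9
    return 1.0 / kd_molar


def kd_nm_to_delta_g(kd_nm, temperature_k=298.15):
    """Free energy of association in kJ/mol, assuming a 1 M standard state."""
    kd_molar = kd_nm * 1e-9
    return GAS_CONSTANT_KJ_PER_MOL_K * temperature_k * math.log(kd_molar)


ka = kd_nm_to_ka_per_molar(2.0)        # hypothetical Kd of 2 nM
delta_g = kd_nm_to_delta_g(2.0)

print(f"Ka = {ka:.3g} M^-1, dG = {delta_g:.2f} kJ/mol")
```

A tighter binder (smaller Kd) gives a larger Ka and a more negative ΔG.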
Drugs
A variety of information about small-molecule drugs from DrugBank is available for searching the PDB archive. This includes the drug’s target name, brand name, classification (e.g., approved, investigational, withdrawn), and market availability (United States, Europe, and Canada). These features can be used to find PDB structures in which a specific drug molecule is present as a ligand.
Approval and Market Availability
Small molecule drugs can be searched based on their availability on one of the following markets: US, EU, Canada. In the U.S., all new drugs must be shown to be safe and effective for their intended use before marketing, and FDA approval is required.
Use this search to find structures with drugs that were approved for use on the US market at any point in history.
This search example includes only structures with drugs that are currently on the market; this is done by leaving the Drug Marketing End field empty.
Withdrawn
Following their approval for use in a clinical setting, some drugs may be withdrawn due to harmful side effects. For example, the painkiller Vioxx, also known as Rofecoxib (RCX in PDB entry 5kir), was recalled after the discovery of an increased risk of heart attack and stroke.
This search example helps find structures with withdrawn drugs.
Publications
Search for PDB structures that do not have a publication associated with them.
This search example queries for structures in the PDB that have the primary publication journal listed as "To be published".
Chemical Attributes
Chemical components include all residues (present in protein or nucleic acid sequences), small molecules (ligands) as well as peptide-like antibiotic and inhibitor molecules found in the PDB archive.
Search for small molecules using metadata from the CCD (e.g., Chemical Name, Molecular Weight) and external value-added annotations (e.g., PubChem ID, DrugBank ID).
Synonyms
Synonym-based chemical searches allow you to query chemical components using alternative names commonly used in the literature.
This search example finds the small molecule Aciclovir using synonym-based chemical component search.
Computed Structure Models (CSMs)
As of August 2022, CSMs predicted by AlphaFold2 (Jumper et al., 2021) and RoseTTAFold (Baek et al., 2021) are available from RCSB.org for query, visualization, and analysis.
No Experimental Structures Available
To search for mouse proteins that have CSMs but no corresponding experimental structure, approach the problem as follows:
1. Search for all mouse sequences
2. Group by UniProt ID
3. Order results by group size (starting from smallest)
The groups of size 1 that are listed first mostly contain only models with no experimental data.
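The grouping and ordering steps above can be sketched on a small in-memory list. The records, identifiers, and function names below are hypothetical placeholders invented for illustration; they are not real search results.

```python
from collections import defaultdict

# Hypothetical hits: (structure identifier, UniProt ID, is computed model).
hits = [
    ("MODEL_A", "UNIPROT_A", True),
    ("MODEL_B", "UNIPROT_B", True),
    ("EXP_B1", "UNIPROT_B", False),
    ("EXP_B2", "UNIPROT_B", False),
    ("MODEL_C", "UNIPROT_C", True),
]


def group_and_order(records):
    """Group hits by UniProt ID, then order groups from smallest to largest."""
    groups = defaultdict(list)
    for structure_id, uniprot_id, is_model in records:
        groups[uniprot_id].append((structure_id, is_model))
    return sorted(groups.items(), key=lambda item: len(item[1]))


def model_only(ordered_groups):
    """Keep the UniProt IDs whose groups contain computed models only."""
    return [
        uniprot_id
        for uniprot_id, members in ordered_groups
        if all(is_model for _, is_model in members)
    ]


ordered = group_and_order(hits)
without_experiment = model_only(ordered)
print(without_experiment)
```

Checking each group explicitly, rather than relying on group size alone, covers the case where a group of size 1 holds an experimental structure instead of a model.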
Predicted Structure Confidence
Query for high-quality (pLDDT > 90) computed structure models of human proteins.
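A query of this kind might be sketched as follows. The attribute names (`rcsb_ma_qa_metric_global.ma_qa_metric_global.value`, `rcsb_entity_source_organism.ncbi_scientific_name`), the `greater` operator, and the `results_content_type` request option are assumptions about the RCSB Search API and should be verified before use; the function name is hypothetical.

```python
def high_confidence_csm_query(organism="Homo sapiens", min_plddt=90):
    """Build a query document for computed models above a pLDDT threshold."""
    return {
        "query": {
            "type": "group",
            "logical_operator": "and",
            "nodes": [
                {
                    "type": "terminal",
                    "service": "text",
                    "parameters": {
                        # Assumed attribute for the source organism name.
                        "attribute": "rcsb_entity_source_organism.ncbi_scientific_name",
                        "operator": "exact_match",
                        "value": organism,
                    },
                },
                {
                    "type": "terminal",
                    "service": "text",
                    "parameters": {
                        # Assumed attribute for the global confidence score.
                        "attribute": "rcsb_ma_qa_metric_global.ma_qa_metric_global.value",
                        "operator": "greater",
                        "value": min_plddt,
                    },
                },
            ],
        },
        "return_type": "entry",
        # Assumed option restricting results to computed structure models.
        "request_options": {"results_content_type": ["computational"]},
    }


csm_query = high_confidence_csm_query()
```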














