Protein Structure Initiative
An Information Portal to Biological Macromolecular Structures

TargetDB | Target deposition to TargetDB

TargetDB provides a simple summary of target progress. A more detailed description is provided by the Protein Expression, Purification, and Crystallization Database (PepcDB, http://pepcdb.rcsb.org), which includes detailed status information as well as protein production protocols.

The TargetDB database is updated every Wednesday, when target data files are downloaded from each contributing structural genomics center. Targets are submitted to TargetDB in XML format, as specified by a document type definition (DTD). To add target information from your center to TargetDB, send mail to target-help@deposit.rcsb.org.

Preparing a data file for TargetDB

Key data elements:

  1. Center-specific target identifier
  2. Laboratory name
  3. Date of status update (YYYY-MM-DD)
  4. Target status
  5. Protein sequence (using IUPAC 1-letter codes)
  6. Protein name
  7. Related URLs
  8. Additional remarks
  9. Target source organism
  10. Database references for sequences and solved structures
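
As a rough illustration of the elements listed above, the following Python sketch holds one target in memory and checks the two elements whose format is stated explicitly: the status date (YYYY-MM-DD) and the sequence (IUPAC 1-letter codes). The class name, field names, and the decision to accept the ambiguity codes B, Z, and X are assumptions made for this example; they are not the TargetDB XML schema, which is defined by the DTD.

```python
import re
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

# 20 standard IUPAC one-letter amino acid codes plus the ambiguity codes
# B, Z and X. Accepting the ambiguity codes is an assumption of this sketch.
IUPAC_CODES = set("ACDEFGHIKLMNPQRSTVWY" + "BZX")


@dataclass
class TargetRecord:
    """Hypothetical container for the ten key data elements."""
    target_id: str
    laboratory: str
    status_date: str           # YYYY-MM-DD
    statuses: List[str]
    sequence: str              # IUPAC one-letter codes
    protein_name: str
    source_organism: str
    urls: List[str] = field(default_factory=list)
    remarks: List[str] = field(default_factory=list)
    database_refs: List[str] = field(default_factory=list)


def format_errors(record: TargetRecord) -> List[str]:
    """Return human-readable problems with the date and sequence fields."""
    errors = []
    if not re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", record.status_date):
        errors.append("status date is not in YYYY-MM-DD form")
    else:
        try:
            # Rejects impossible dates such as month 13.
            datetime.strptime(record.status_date, "%Y-%m-%d")
        except ValueError:
            errors.append("status date is not a real calendar date")
    if not record.sequence:
        errors.append("sequence is empty")
    else:
        bad = sorted(set(record.sequence.upper()) - IUPAC_CODES)
        if bad:
            errors.append("sequence has non-IUPAC characters: " + "".join(bad))
    return errors
```

A center could run a check of this kind over its own records before exporting them to XML; an empty list means no format problems were found in these two fields.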

Detailed considerations:

Once a target is deposited to TargetDB, it cannot be removed from the database. The target ID, protein sequence, source organism, and protein name should not be changed after deposition. The data elements that can be modified are the date of status update, target status, database references, related URLs, and remarks.
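
The rule above can be checked before a weekly file is published. The sketch below compares the previous and updated versions of a center's data, assuming each target is held as a plain dictionary; the field names and both helper functions are hypothetical and exist only for this example. The laboratory name is left out of the comparison because the text does not say whether it may change.

```python
from typing import Dict, Iterable, List

# Elements that should stay fixed after deposition (hypothetical field names).
FIXED_FIELDS = ("target_id", "sequence", "source_organism", "protein_name")


def changed_fixed_fields(previous: Dict[str, object],
                         updated: Dict[str, object]) -> List[str]:
    """Names of fixed elements whose value differs between two versions."""
    return [name for name in FIXED_FIELDS
            if previous.get(name) != updated.get(name)]


def removed_targets(previous_ids: Iterable[str],
                    updated_ids: Iterable[str]) -> List[str]:
    """Target IDs present in the earlier file but missing from the new one."""
    return sorted(set(previous_ids) - set(updated_ids))
```

An update that only touches the status list or the status date yields an empty list from `changed_fixed_fields`, while a non-empty result from either function points to a change the guidelines advise against.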

  1. Center-specific target identifier

    Target identifiers should uniquely represent a single protein sequence. All of the information about a particular target should be provided within one target element. In cases where a target is pursued by multiple experimental tracks, the progress of all approaches should be summarized within a single target data element. A number of common situations arise in organizing target data for TargetDB:
    • Ortholog and homolog sequences
      Protein orthologs/homologs should be submitted using different target identifiers. Relationships between ortholog and homolog sequences can be represented in a database reference to another TargetDB sequence and as a remark record.
    • Multiple experimental approaches
      If structure determination for the same protein is pursued by both X-ray and NMR, target progress should be represented in a single TargetDB entry. In this situation, all experimental steps specific to either approach should be merged into one status list. If structures from both NMR and X-ray are solved and deposited in the PDB, both PDB identifiers should be noted as database references.
    • Protein complexes
      Each protein complex should be identified in TargetDB as a unique entry. The components (protein sequences) of a complex should be represented together in that single entry and listed in its "target sequence list".
    • Work on target performed by multiple laboratories within a SG center
      If determining the target structure is a joint effort of several laboratories, target progress should be represented by a single TargetDB entry. In this situation, all participating laboratory names should be included in the laboratory name record.
  2. Laboratory name

    Name of a laboratory within the contributing structural genomics center.
  3. Date of status update

    Date on which this target information was updated, in the format YYYY-MM-DD.
  4. Target status

    Target status provides a measure of the progress in obtaining a protein structure. Summary statistics for individual structural genomics centers and overall statistics are calculated weekly (http://targetdb.rcsb.org/statistics/TargetStatistics.html). It is important that all experimental steps associated with a particular target are provided in the target status list. For meaningful analysis of target status data, the list must give a logical sequence of experimental steps. For example, if the current target status is "Purified", the target status list should also include "Selected", "Cloned", "Expressed", and "Soluble". Once a target is solved and deposited in the PDB, it should be given the status "In PDB". In addition, the PDB identifier should be included as a database reference.
  5. Protein sequence

    The protein sequence in one-letter code, used for FASTA sequence comparison.
  6. Target source organism

    The scientific name of the source organism for the target sequence following the nomenclature of the NCBI Taxonomy database (http://www.ncbi.nih.gov/Taxonomy).
  7. Database references

    Database name and database identifier for target sequences. Example databases: GenBank, Swiss-Prot, UniProt.
  8. Protein name

    The name of the protein for the target sequence.
  9. Related URLs

    Uniform Resource Locator (internet address) related to this target. This can be a link to a project site containing more information about this target, or some other related site address.
  10. Additional remarks

    Any additional useful information about a target. For example: its relation to other targets in TargetDB as part of a protein complex or as an ortholog/homolog.

    Questions may be sent to target-help@deposit.rcsb.org.
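
The target status rule in item 4 can be expressed as a small check. The Python sketch below uses only the status names quoted in the text; the full TargetDB status vocabulary is not listed on this page, so any steps between "Purified" and "In PDB" are deliberately left out. The function names and the (database, identifier) tuple layout for database references are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# Ordered steps named in the text. Steps that may exist between
# "Purified" and "In PDB" are not given there and are omitted here.
EARLY_STEPS = ["Selected", "Cloned", "Expressed", "Soluble", "Purified"]


def missing_steps(statuses: Sequence[str]) -> List[str]:
    """Earlier steps that the furthest reported status implies but that
    are absent from the status list."""
    present = set(statuses)
    if "In PDB" in present:
        last = len(EARLY_STEPS) - 1          # a deposited target passed every early step
    else:
        reached = [i for i, step in enumerate(EARLY_STEPS) if step in present]
        if not reached:
            return []
        last = max(reached)
    return [step for step in EARLY_STEPS[:last + 1] if step not in present]


def lacks_pdb_reference(statuses: Sequence[str],
                        database_refs: Sequence[Tuple[str, str]]) -> bool:
    """True when the status list says "In PDB" but no PDB reference is given."""
    has_pdb_ref = any(db == "PDB" for db, _identifier in database_refs)
    return "In PDB" in statuses and not has_pdb_ref
```

For a status list of "Selected", "Cloned", and "Purified", `missing_steps` reports "Expressed" and "Soluble" as the steps still to be added.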

© RCSB PDB