TargetDB | Target deposition to TargetDB
TargetDB provides a very simple picture of target progress. A more detailed
description is provided by the Protein Expression, Purification, and
Crystallization Database (PepcDB, http://pepcdb.rcsb.org), which includes
detailed status information as well as protein production protocols.
The TargetDB database is updated every Wednesday, when target data
files are downloaded from each contributing structural genomics
center. Targets are submitted to TargetDB in XML format (document type
definition). To add target information from your center to TargetDB,
send mail to target-help@deposit.rcsb.org.
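As a rough illustration of what producing an XML target file might look like, the sketch below builds one target element with Python's standard library. The tag names (target, id, lab, date, status, sequence) and the sample values are hypothetical placeholders, not the element names from the TargetDB document type definition, which should be consulted for the actual structure.

```python
import xml.etree.ElementTree as ET


def build_target_xml(target_id, lab, status_date, status, sequence):
    """Build one <target> element.

    All tag names here are hypothetical and are NOT taken from the
    TargetDB DTD; replace them with the names the DTD defines.
    """
    target = ET.Element("target")
    ET.SubElement(target, "id").text = target_id
    ET.SubElement(target, "lab").text = lab
    ET.SubElement(target, "date").text = status_date
    ET.SubElement(target, "status").text = status
    ET.SubElement(target, "sequence").text = sequence
    return target


# Made-up sample values, for illustration only.
element = build_target_xml(
    "XYZ0001", "Example Lab", "2005-06-15", "Cloned", "MKTAYIAKQR"
)
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```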
Preparing a data file for TargetDB
Key data elements:
- Center-specific target identifier
- Laboratory name
- Date of status update (YYYY-MM-DD)
- Target status
- Protein sequence (using IUPAC 1-letter codes)
- Protein name
- Related URLs
- Additional remarks
- Target source organism
- Database references for sequences and solved structures
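The key data elements listed above could be held in a simple record type before being written out. The following is a minimal sketch; the class name, field names, and sample values are assumptions made for illustration and are not part of the TargetDB format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TargetRecord:
    """Hypothetical in-memory holder for one target's key data elements."""
    target_id: str            # center-specific target identifier
    laboratory: str           # laboratory name
    status_date: str          # date of status update, YYYY-MM-DD
    status_list: List[str]    # all experimental steps reached so far
    sequence: str             # protein sequence, IUPAC 1-letter codes
    protein_name: str
    source_organism: str
    urls: List[str] = field(default_factory=list)
    remarks: List[str] = field(default_factory=list)
    # (database name, database identifier) pairs
    database_refs: List[Tuple[str, str]] = field(default_factory=list)


# Made-up sample values, for illustration only.
record = TargetRecord(
    target_id="XYZ0001",
    laboratory="Example Lab",
    status_date="2005-06-15",
    status_list=["Selected", "Cloned"],
    sequence="MKTAYIAKQR",
    protein_name="Hypothetical protein",
    source_organism="Escherichia coli",
)
```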
Detailed considerations:
Once a target is deposited to TargetDB, it cannot be removed from the
database. Target data elements such as target ID, protein
sequence, source organism, and protein name should not be
changed. Target data elements that can be
modified are date of status update, target status, database
references, related URLs, and remarks.
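A center could guard against accidental changes to the fixed elements by comparing each update with the previously submitted version. This is a minimal sketch under the assumption that records are plain dictionaries; the key names are hypothetical.

```python
# Elements the guidelines say should not change after deposition.
# The dictionary key names are assumptions for this sketch.
IMMUTABLE_FIELDS = ("target_id", "sequence", "source_organism", "protein_name")


def changed_immutable_fields(old, new):
    """Return the names of fixed elements whose values differ."""
    return [name for name in IMMUTABLE_FIELDS if old.get(name) != new.get(name)]


previous = {
    "target_id": "XYZ0001",
    "sequence": "MKTAYIAKQR",
    "source_organism": "Escherichia coli",
    "protein_name": "Hypothetical protein",
    "status_date": "2005-06-15",
}
# A status-date change is allowed; a sequence change is not.
update = dict(previous, status_date="2005-07-01", sequence="MKTAYIAKQK")
problems = changed_immutable_fields(previous, update)
```

An empty result means the update only touches elements that may be modified.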
Center-specific target identifier
Target identifiers should uniquely represent a single protein
sequence. All of the information about a particular target should be
provided within one target element. In cases where a target is pursued
by multiple experimental tracks, the progress of all approaches should
be summarized within a single target data element.
A number of common situations arise in organizing target data for TargetDB:
Ortholog and homolog sequences
Protein orthologs/homologs should be submitted using different target
identifiers. Relationships between ortholog and homolog sequences can
be represented in a database reference to another TargetDB sequence and
as a remark record.
Multiple experimental approaches
If structure determination for the same protein is pursued by both X-ray
and NMR, target progress should be represented in a single TargetDB entry.
In this situation, all experimental steps specific to both approaches
should be merged in one status list. If structures from both NMR and X-ray
are solved and deposited in the PDB, then both PDB identifiers should be
noted as database references.
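Merging the steps from two experimental tracks into one status list can be sketched as an order-preserving union. The function name is hypothetical, and the track-specific status names used in the example ("Crystallized", "HSQC") are illustrative values rather than a list taken from this page.

```python
def merge_status_lists(*tracks):
    """Combine several status lists into one, keeping first-seen order
    and dropping duplicates shared between tracks."""
    merged = []
    for track in tracks:
        for status in track:
            if status not in merged:
                merged.append(status)
    return merged


# Illustrative status names; check them against the allowed status values.
xray_track = ["Selected", "Cloned", "Expressed", "Soluble", "Purified", "Crystallized"]
nmr_track = ["Selected", "Cloned", "Expressed", "Soluble", "Purified", "HSQC"]
combined = merge_status_lists(xray_track, nmr_track)
```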
Protein complexes
Each protein complex should be identified in TargetDB as a unique
entry. All components (protein sequences) of a protein complex
should be represented within that single TargetDB entry. In this situation,
the components of the complex should be listed in the "target
sequence list".
Work on a target performed by multiple laboratories within an SG center
If determining a target structure is a joint effort of several
laboratories, target progress should be represented by a single TargetDB
entry. In this situation, all participating laboratory names should be
included in the laboratory name record.
Laboratory name
Name of a laboratory within the contributing structural genomics center.
Date of status update
Date on which this target information was updated, in the
format YYYY-MM-DD.
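A date string can be checked against this format before submission. The sketch below uses Python's standard datetime module; the function name is an assumption.

```python
from datetime import datetime


def is_valid_status_date(text):
    """Return True only for a real calendar date written as YYYY-MM-DD."""
    try:
        parsed = datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    # strptime also accepts unpadded values such as "2005-6-1", so require
    # that formatting the parsed date reproduces the input exactly.
    return parsed.strftime("%Y-%m-%d") == text
```

For example, "2005-06-15" passes, while "2005-6-15", "15/06/2005", and "2005-02-30" are rejected.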
Target status
Target status provides a measure of the progress in obtaining a protein
structure. Summary statistics for individual structural genomics centers
and overall statistics are calculated weekly
(http://targetdb.rcsb.org/statistics/TargetStatistics.html).
It is important that all experimental steps associated with a particular
target are provided in the target status list. For meaningful
analysis to be applied to target status data, a logical sequence of
experimental steps must be provided. For example, if the current target
status is "Purified", the target status list should also include
"Selected", "Cloned", "Expressed", and "Soluble".
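The requirement for a logical sequence of steps can be checked mechanically. This is a minimal sketch that covers only the five statuses named in the example above, in the order given there; the function name is hypothetical, and a real check would need the complete list of allowed status values.

```python
# Only the statuses named in the example above, in that order.
STATUS_ORDER = ["Selected", "Cloned", "Expressed", "Soluble", "Purified"]


def missing_prior_statuses(status_list):
    """Return earlier steps that are absent from status_list, given the
    furthest step it contains. Statuses outside STATUS_ORDER are ignored."""
    known = [s for s in status_list if s in STATUS_ORDER]
    if not known:
        return []
    furthest = max(STATUS_ORDER.index(s) for s in known)
    return [s for s in STATUS_ORDER[:furthest] if s not in status_list]


gaps = missing_prior_statuses(["Selected", "Purified"])
```

Here gaps lists the three intermediate steps that should be added before the entry is submitted.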
Once a target is solved and deposited in the PDB, it should be identified
as status "In PDB". In addition, the PDB identifier should be included as
a database reference.
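A simple consistency check can flag entries marked "In PDB" that lack the corresponding database reference. The sketch assumes database references are stored as (database name, identifier) pairs; that representation, the function name, and the sample identifier are assumptions.

```python
def in_pdb_without_reference(status_list, database_refs):
    """Return True if the status list contains "In PDB" but no database
    reference names the PDB."""
    has_pdb_ref = any(name.strip().upper() == "PDB" for name, _ in database_refs)
    return "In PDB" in status_list and not has_pdb_ref


# "1ABC" is a made-up placeholder identifier.
needs_fix = in_pdb_without_reference(["Selected", "In PDB"], [("UniProt", "P00000")])
is_fine = in_pdb_without_reference(["Selected", "In PDB"], [("PDB", "1ABC")])
```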
Protein sequence
The protein sequence in IUPAC one-letter code, used for FASTA comparison.
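A sequence can be screened for unexpected characters before submission. The sketch below accepts only the 20 standard amino acid one-letter codes in upper case; whether ambiguity codes such as X, B, or Z are acceptable to TargetDB is not stated here, so the allowed set is an assumption that may need widening.

```python
# The 20 standard amino acids; extend this set if ambiguity codes are allowed.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def invalid_residues(sequence):
    """Return (position, character) pairs, 1-based, for every character
    that is not a standard upper-case one-letter amino acid code."""
    return [
        (position, char)
        for position, char in enumerate(sequence, start=1)
        if char not in STANDARD_AMINO_ACIDS
    ]
```

An empty list means every character was recognized; lower-case letters, digits, and whitespace are all reported.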
Target source organism
The scientific name of the source organism for the target sequence following
the nomenclature of the NCBI Taxonomy database (http://www.ncbi.nih.gov/Taxonomy).
Database references
Database name and database identifier for target sequences.
For example: GenBank, Swiss-Prot, UniProt.
Protein name
The name of the protein for the target sequence.
Related URLs
Uniform resource locator (URL) or internet address related to this target.
This can be a link to a project site containing more information about this
target, or some other related site address.
Additional remarks
Any additional useful information about a target.
For example, the relation to other targets in TargetDB as part of a protein
complex or as an ortholog/homolog.
Questions may be sent to target-help@deposit.rcsb.org.