RCSB PDB Protein Data Bank A Member of the wwPDB
An Information Portal to Biological Macromolecular Structures

TargetDB Document Type Definition (DTD): clearly explained



Center-specific Target Identifier

  • A unique identifier for the target sequence defined by each structural genomics center.

Examples:
  • WR90EC
  • NYSGRC-P007
XML Presentation
<id>WR90EC</id>

Laboratory Name

  • Laboratory name

Examples:
  • Southeast Collaboratory for Structural Genomics
  • National Institute for Medical Research, UK
XML Presentation
<lab>Southeast Collaboratory for Structural Genomics</lab>

Date of Target Update

  • Date on which this target information was updated, in yyyy-mm-dd format.

Example:
  • 2001-02-14
XML Presentation
<date>2001-02-14</date>
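
A reader consuming these entries may want to check that a <date> value really is in yyyy-mm-dd form. The following is a minimal sketch, not part of TargetDB itself; the function name parse_target_date is hypothetical, and tolerating surrounding whitespace is an assumption based on the padded values in the full example entry on this page.

```python
import re
from datetime import date

# Exactly four digits, two digits, two digits, separated by hyphens.
DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def parse_target_date(text: str) -> date:
    """Parse the text of a <date> element given in yyyy-mm-dd form.

    Raises ValueError if the layout is wrong or the date does not exist.
    """
    value = text.strip()  # assumption: padding around the value is harmless
    if not DATE_PATTERN.fullmatch(value):
        raise ValueError(f"not in yyyy-mm-dd form: {value!r}")
    # fromisoformat also rejects impossible dates such as 2001-02-30.
    return date.fromisoformat(value)


updated = parse_target_date("2001-02-14")
print(updated.year, updated.month, updated.day)
```

Returning a date object rather than the raw string makes it straightforward to sort targets by update date.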

Status

  • The status of the target sequence, given as one or more of the descriptive terms listed below.

Examples:
  • Selected
  • Cloned
  • Expressed
  • Soluble
  • Purified
  • Crystallized
  • Diffraction-quality Crystals
  • Diffraction
  • HSQC
  • NMR Assigned
  • Crystal Structure
  • NMR Structure
  • In PDB
  • Work Stopped
  • Test Target
  • Other
XML Presentation
<status>Selected</status>
<status>Cloned</status>
<status>Expressed</status>
<status>Soluble</status>
<status>Purified</status>
<status>Crystallized</status>
<status>Diffraction</status>
<status>Diffraction-quality Crystals</status>
<status>Crystal Structure</status>
<status>In PDB</status>
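
Because a target can carry several <status> elements, a simple check against the terms listed above can catch misspellings. This is a sketch under the assumption that the list above is the complete vocabulary, which this page does not state; the names KNOWN_STATUSES and unknown_statuses are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Assumption: the status terms listed on this page form the full vocabulary.
KNOWN_STATUSES = {
    "Selected", "Cloned", "Expressed", "Soluble", "Purified",
    "Crystallized", "Diffraction-quality Crystals", "Diffraction",
    "HSQC", "NMR Assigned", "Crystal Structure", "NMR Structure",
    "In PDB", "Work Stopped", "Test Target", "Other",
}


def unknown_statuses(target_xml: str) -> List[str]:
    """Return the <status> values of a target that are not in KNOWN_STATUSES."""
    root = ET.fromstring(target_xml)
    found = [(element.text or "").strip() for element in root.iter("status")]
    return [status for status in found if status not in KNOWN_STATUSES]


sample = "<target><status>Selected</status><status>Cloned</status></target>"
print(unknown_statuses(sample))
```

Under this assumption, the terms Proposed and Active Target used in the full example entry on this page would be reported as unknown, so the list above may not be exhaustive.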
     

Protein Sequence

  • The protein sequence in one-letter amino acid code, used for FASTA comparison.

Example:
  • MKTIIALSYIFCLVFAQDLPGNDNNSTATLCLGHHAVPNGTLVKTITNDQIEVTNATELV
             
XML Presentation
<sequence>MKTIIALSYIFCLVFAQDLPGNDNNSTATLCLGHHAVPNGTLVKTITNDQIEVTNATELV</sequence>
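
Since the sequence is used for FASTA comparison, it can help to normalize it and write it out as a FASTA record. The sketch below assumes that only the 20 standard amino acid letters are expected (the DTD may well allow others, such as X) and that whitespace inside <sequence> is not significant; all function names are hypothetical.

```python
# Assumption: only the 20 standard one-letter amino acid codes are expected.
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(text: str) -> str:
    """Remove all whitespace from a <sequence> value and upper-case it."""
    return "".join(text.split()).upper()


def invalid_residues(sequence: str) -> set:
    """Return the characters that are not standard one-letter codes."""
    return set(sequence) - STANDARD_RESIDUES


def to_fasta(identifier: str, sequence: str, width: int = 60) -> str:
    """Format a sequence as a FASTA record with lines of at most `width` letters."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + identifier + "\n" + "\n".join(lines) + "\n"


seq = clean_sequence(
    "MKTIIALSYIFCLVFAQDLPGNDNNSTATLCLGHHAVPNGTLVKTITNDQIEVTNATELV"
)
print(invalid_residues(seq))
print(to_fasta("WR90EC", seq), end="")
```

The identifier WR90EC is borrowed from the Center-specific Target Identifier example purely for illustration; this page does not say that the sequence belongs to that target.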

Protein Name

  • The name of the protein for the target sequence.

Examples:
  • Glutamate synthase
  • 29-C10
XML Presentation
<name>Glutamate synthase</name>

Related URL

  • Uniform Resource Locator (URL) or other internet address related to this target. This can link to a project site with more information about the target, or to another related site.

Example:
  • http://www.doe-mbi.ucla.edu/TB/PUBLIC/qs/qsearch.php?dowork=Rv2031c
XML Presentation
<url>http://www.doe-mbi.ucla.edu/TB/PUBLIC/qs/qsearch.php?dowork=Rv2031c</url>

Remarks

  • Additional text details about this target

Example:
  • An important comment about this target.
XML Presentation
<remark>An important comment about this target</remark>

Source Organism

  • The scientific name of the source organism for the target sequence following the nomenclature of the NCBI Taxonomy database (http://www.ncbi.nih.gov/Taxonomy).

Examples:
  • Arabidopsis thaliana
  • Escherichia coli
  • Caenorhabditis elegans
XML Presentation
<sourceOrganism>Arabidopsis thaliana</sourceOrganism>

Database References

  • Describes related information about this target as a pair: a database name and an identifier within that database. A target may have several such references.

XML Presentation
<databaseRef>
  <databaseName>Genbank</databaseName>
  <databaseId>189676547</databaseId>
</databaseRef>
<databaseRef>
  <databaseName>PDB</databaseName>
  <databaseId>1y63</databaseId>
</databaseRef>
<databaseRef>
  <databaseName>PDB</databaseName>
  <databaseId>2tem</databaseId>
</databaseRef>
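
The name/identifier pairs above can be read with Python's standard xml.etree.ElementTree module. This is a minimal sketch: the enclosing <target> element is assumed from the full example entry on this page, and the function name database_refs is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def database_refs(target: ET.Element) -> List[Tuple[str, str]]:
    """Return (databaseName, databaseId) pairs for every <databaseRef>."""
    refs = []
    for ref in target.iter("databaseRef"):
        name = (ref.findtext("databaseName") or "").strip()
        identifier = (ref.findtext("databaseId") or "").strip()
        refs.append((name, identifier))
    return refs


# The three references shown above, wrapped in an assumed <target> parent.
SAMPLE = """<target>
<databaseRef>
  <databaseName>Genbank</databaseName>
  <databaseId>189676547</databaseId>
</databaseRef>
<databaseRef>
  <databaseName>PDB</databaseName>
  <databaseId>1y63</databaseId>
</databaseRef>
<databaseRef>
  <databaseName>PDB</databaseName>
  <databaseId>2tem</databaseId>
</databaseRef>
</target>"""

refs = database_refs(ET.fromstring(SAMPLE))
pdb_ids = [identifier for name, identifier in refs if name == "PDB"]
print(refs)
print(pdb_ids)
```

Keeping the pairs as a list rather than a dictionary preserves repeated database names, such as the two PDB references here.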

Database Name

  • The name of the database that the reference points to.

Examples:
  • PDB
  • BMRB
  • Genbank
XML Presentation
<databaseName>PDB</databaseName>

Database ID

  • The accession code for the named database.

Examples:
  • 189676547
  • 1y63
XML Presentation
<databaseId>1y63</databaseId>

FASTA Sequence Comparison Details

  • Pearson, W.R. and Lipman, D.J. Improved tools for biological sequence comparison. PNAS 85:2444-2448 (1988).

The E()-value cutoff limits the number of scores and alignments shown, based on the expected number of scores. A cutoff value of 2.0 (-E 2.0) shows every library sequence whose score has an expectation value <= 2.0.

For protein searches, matched sequences with E()-values < 0.01 for searches of 10,000 protein sequences are almost always homologous. Frequently sequences with E()-values from 1 - 10 are related as well. However, E()-values also reflect differences between the amino acid composition of the query sequence and that of the "average" database sequence. Thus, when searches are done with query sequences with "biased" amino-acid composition, unrelated sequences may have "significant" scores because of sequence bias.
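
The effect of the cutoff can be illustrated on a list of already parsed hits. The sketch below does not run or parse FASTA output; the hit identifiers and E()-values are invented for illustration, and the names apply_cutoff and likely_homologous are hypothetical. The 0.01 threshold is the one quoted above for searches of about 10,000 protein sequences.

```python
from typing import Iterable, List, Tuple

Hit = Tuple[str, float]  # (library sequence identifier, E()-value)


def apply_cutoff(hits: Iterable[Hit], cutoff: float = 2.0) -> List[Hit]:
    """Keep hits with E()-value <= cutoff, best (smallest) value first."""
    kept = [hit for hit in hits if hit[1] <= cutoff]
    return sorted(kept, key=lambda hit: hit[1])


def likely_homologous(evalue: float, threshold: float = 0.01) -> bool:
    """True when the E()-value is below the threshold quoted in the text.

    This is only a rule of thumb: biased amino acid composition can give
    unrelated sequences apparently significant scores.
    """
    return evalue < threshold


# Hypothetical hits, not taken from any real search.
hits = [("hitC", 5.0), ("hitA", 1e-30), ("hitB", 0.5)]
for identifier, evalue in apply_cutoff(hits, cutoff=2.0):
    print(identifier, evalue, likely_homologous(evalue))
```

Hits that pass the cutoff but not the 0.01 threshold are not necessarily unrelated; as noted above, they may still deserve a closer look.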

FASTA is available from ftp://ftp.virginia.edu/pub/fasta/.

XML Format

  • Example of target entry in XML format.


The XML format follows the recommendations of the Task Force on Target Tracking. [XML format DTD]

Example target:
<target>
<id> Pfu-157236-001 </id>
<lab> Southeast Collaboratory for Structural Genomics </lab>
<date> 2001-06-18 </date>
<status> Proposed </status>
<status> Active Target </status>
<sequence>
 MLKIDLSGKLAFTTASSKGIGFGVAKVLAMAGADVIILSRNEENLKKAKEKIKEIADVNVEYIVADLTKKEDLER
 IVEVKNIGDPDIFFYSTGGPKPGYFMEMTMEDWEEAVKLLLYPAVYLTRALVPGMEKKGFGRIIYSTSVAIKEPI
 PNIALSNVVRISLAGLVRTLAKELGPKGITVNGIMPGIIRTDRVIQLAQDKARREGKSLEEALQDYAKPIPLGRL
 GEPEEIGYLVAFLSSELGSYINGAMIPVDGGRLNSVF  
</sequence>
<name> putative alcohol dehydrogenase/reductase </name> 
</target> 
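
An entry like the one above can be read with Python's standard xml.etree.ElementTree module. This is a minimal sketch rather than an official TargetDB client: the function name parse_target and the shape of the returned dictionary are hypothetical, and it assumes that padding around values and line breaks inside <sequence> are not significant.

```python
import xml.etree.ElementTree as ET

EXAMPLE = """<target>
<id> Pfu-157236-001 </id>
<lab> Southeast Collaboratory for Structural Genomics </lab>
<date> 2001-06-18 </date>
<status> Proposed </status>
<status> Active Target </status>
<sequence>
 MLKIDLSGKLAFTTASSKGIGFGVAKVLAMAGADVIILSRNEENLKKAKEKIKEIADVNVEYIVADLTKKEDLER
 IVEVKNIGDPDIFFYSTGGPKPGYFMEMTMEDWEEAVKLLLYPAVYLTRALVPGMEKKGFGRIIYSTSVAIKEPI
 PNIALSNVVRISLAGLVRTLAKELGPKGITVNGIMPGIIRTDRVIQLAQDKARREGKSLEEALQDYAKPIPLGRL
 GEPEEIGYLVAFLSSELGSYINGAMIPVDGGRLNSVF
</sequence>
<name> putative alcohol dehydrogenase/reductase </name>
</target>"""


def parse_target(xml_text: str) -> dict:
    """Turn one <target> entry into a plain dictionary of its fields."""
    root = ET.fromstring(xml_text)

    def text_of(tag: str) -> str:
        # Missing or empty elements become an empty string.
        return (root.findtext(tag) or "").strip()

    return {
        "id": text_of("id"),
        "lab": text_of("lab"),
        "date": text_of("date"),
        # <status> may repeat, so collect every occurrence in document order.
        "statuses": [(el.text or "").strip() for el in root.findall("status")],
        # Join the wrapped sequence lines into one string without whitespace.
        "sequence": "".join((root.findtext("sequence") or "").split()),
        "name": text_of("name"),
    }


entry = parse_target(EXAMPLE)
print(entry["id"], entry["date"], entry["statuses"])
print(len(entry["sequence"]), "residues")
```

Only the elements present in this example are extracted; optional elements such as <url>, <remark>, <sourceOrganism>, and <databaseRef> could be read the same way.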
         
© RCSB PDB