Protein Structure Initiative

TargetDB Statistics Summary Report

Last updated: Jul 2 2008


Target Status Statistics

Total number of targets deposited by worldwide SG Centers in TargetDB: 188563

Table 1: TargetDB Status Statistics

Status                        Total Number  % Relative to   % Relative to   % Relative to   % Relative to
                              of Targets    "Cloned"        "Expressed"     "Purified"      "Crystallized"
Cloned                        122933        100.0           -               -               -
Expressed                     81220         66.1            100.0           -               -
Soluble                       32543         26.5            40.1            -               -
Purified                      28760         23.4            35.4            100.0           -
Crystallized                  10451         8.5             12.9            36.3            100.0
Diffraction-quality Crystals  5195          4.2             6.4             18.1            49.7
Diffraction                   4750          3.9             5.8             16.5            45.5
NMR Assigned                  1766          1.4             2.2             6.1             -
HSQC                          3240          2.6             4.0             11.3            -
Crystal Structure             3881          3.2             4.8             13.5            37.1
NMR Structure                 1684          1.4             2.1             5.9             -
In PDB (Note 1)               5710          4.6             7.0             19.9            39
Work Stopped                  30855         -               -               -               -
Test Target                   1             -               -               -               -
Other                         6963          -               -               -               -

Last updated: Jul 2 2008

Note 1:   Number of targets with status "in PDB". A target may reference several PDB IDs (for example, structures of the same polypeptides with different ligands). Multiple targets in TargetDB may identify the same PDB structure when a structure is the result of a collaboration between different centers and each center includes the target on its target list.
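
The percentage columns in Table 1 appear to be each status count divided by the count of the reference status, rounded to one decimal place. A minimal sketch of that arithmetic follows, using counts copied from the table; the function name `percent_relative` is hypothetical and the rounding rule is an assumption inferred from the published values.

```python
# Counts copied from Table 1 (TargetDB, Jul 2 2008).
counts = {
    "Cloned": 122933,
    "Expressed": 81220,
    "Purified": 28760,
    "Crystallized": 10451,
    "Crystal Structure": 3881,
}


def percent_relative(status: str, reference: str) -> float:
    """Percent of `reference` targets that reached `status` (one decimal place).

    Hypothetical helper; the one-decimal rounding is an assumption.
    """
    return round(100.0 * counts[status] / counts[reference], 1)


# Reproduce the "Crystal Structure" row of Table 1.
for reference in ("Cloned", "Expressed", "Purified", "Crystallized"):
    value = percent_relative("Crystal Structure", reference)
    print(f"Crystal Structure relative to {reference}: {value}%")
```

Under these assumptions the loop reproduces the 3.2, 4.8, 13.5 and 37.1 shown in the "Crystal Structure" row.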

Figure 1: Experimental Status in TargetDB

Last updated: Jul 2 2008

This graph is normalized relative to the number of cloned targets in TargetDB.
Targets that progressed to status "Cloned" constitute 65% of TargetDB (122933 of 188563 deposited targets).

Table 2: TargetDB Status Statistics by Organism

Organism          Total     Work     Cloned  Expressed  Purified  Crystallized  Crystal    NMR        In PDB
                  (Note 1)  Stopped                                             Structure  Structure  (Note 2)
Total Viruses     603       117      343     228        124       32            27         8          31
Archaea           13476     1758     9488    6470       3098      1235          600        45         679
Bacteria          109941    12416    73152   53578      19294     7775          2782       192        3103
Total Prokaryotes 123417    14174    82640   60048      22392     9010          3382       237        3782
Yeast             2604      664      1906    1335       774       112           54         14         50
Plasmodium        5200      336      2957    1263       200       67            19         0          19
Trypanosoma       6419      793      3974    1930       300       59            9          0          8
Leishmania        9597      288      4575    2208       404       146           21         0          17
Arabidopsis       8099      5390     4030    1273       319       82            36         52         87
Rice              134       101      125     63         12        4             1          0          1
Nematode          15068     3463     12645   5590       425       99            28         7          36
Fly               877       271      171     93         42        5             3          3          6
Mouse             2281      922      1773    1404       755       203           67         267        336
Human             12321     4404     6354    4671       2655      520           159        1080       1241
Other Eukaryotes  1940      646      1437    1111       356       112           75         14         94
Total Eukaryotes  64540     16564    39947   20941      6242      1409          472        1437       1895
Synthetic         3         0        3       3          3         1             1          2          3
Unknown           1         0        1       1          0         0             0          0          0
Total             188564    30855    122934  81221      28761     10452         3882       1684       5711

Last updated: Jul 2 2008

Note 1:   Total counts in this table may differ from the total number of targets. If a target is a hybrid complex
(for example, a complex of human and mouse polypeptides), it is counted under each of its organism classifications.

Note 2:   Number of targets with status "in PDB". A target may reference several PDB IDs (for example, structures of the same polypeptides with different ligands).
Multiple targets in TargetDB may identify the same PDB structure when a structure is the result of a collaboration between different centers and each center includes
the target on its target list.
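
Note 1 to Table 2 describes why per-organism counts can add up to more than the number of distinct targets: a hybrid target is counted once per organism. The sketch below illustrates that effect with made-up data; the target IDs and their organism assignments are hypothetical and are not taken from TargetDB.

```python
from collections import Counter

# Hypothetical targets, invented for illustration only.
targets = {
    "T001": ["Human"],
    "T002": ["Mouse"],
    "T003": ["Human", "Mouse"],  # hybrid complex of human and mouse polypeptides
}

# Count each target once under every organism it involves.
per_organism = Counter(
    organism for organisms in targets.values() for organism in set(organisms)
)

unique_targets = len(targets)
sum_over_organisms = sum(per_organism.values())

print(dict(per_organism))
print("distinct targets:", unique_targets)
print("sum of per-organism counts:", sum_over_organisms)
```

Here three distinct targets yield a per-organism sum of four, because the hybrid target contributes to both the Human and the Mouse rows.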

Figure 2: Source Organisms in TargetDB

Last updated: Jul 2 2008


Deposited Structure Statistics

Number of released X-Ray structures reported to TargetDB: 4128

Number of released NMR structures reported to TargetDB: 1679

Number of released Cryo-Electron Microscopy structures reported to TargetDB: 3

Total number of released structures from worldwide SG Centers reported to TargetDB: 5810


Table 3: PDB Status Statistics for Structural Genomics Structures

Status                   All Centers  PSI Centers  Non-PSI SG Centers  SG Centers  SG Centers
                                                   in North America    in Europe   in Asia
Total Deposited          5943         3040         111                 134         2670
Released                 5810         2943         108                 130         2641
Release on Publication   27           0            1                   0           26
Release on Certain Date  44           41           0                   0           3
In Process               62           56           2                   4           0
Last updated: Jul 2 2008
Note 1:   Some PDB IDs are cross-referenced by different centers. Example: PDB ID 106Y is associated with both the SPINE and TB centers. As a result, the number of structures in the "All Centers" column can differ from the direct sum of the numbers for the individual projects and geographical regions.
Note 2:   "Total Deposited" covers all structures in the PDB, both those released to the public and those still awaiting release ("Release on Publication", "Release on Certain Date", etc.).
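
The cross-referencing effect described in the first note can be checked against Table 3 by comparing each "All Centers" value with the direct sum of the four group columns. The sketch below does that with values copied from the table, then shows the same effect on a toy set of IDs. In the toy part, "HYP1" is a made-up placeholder ID and the assignment of IDs to centers beyond the 106Y example is hypothetical.

```python
# Values copied from Table 3:
# status -> (All Centers, [PSI, non-PSI North America, Europe, Asia])
table3 = {
    "Total Deposited": (5943, [3040, 111, 134, 2670]),
    "Released": (5810, [2943, 108, 130, 2641]),
    "Release on Publication": (27, [0, 1, 0, 26]),
    "Release on Certain Date": (44, [41, 0, 0, 3]),
    "In Process": (62, [56, 2, 4, 0]),
}


def excess(status: str) -> int:
    """Direct sum over groups minus the "All Centers" value for one row."""
    all_centers, groups = table3[status]
    return sum(groups) - all_centers


for status in table3:
    print(f"{status}: excess of {excess(status)} over All Centers")

# Toy illustration: one ID shared by two centers is counted twice in a
# direct sum but only once among distinct IDs.
ids_by_center = {
    "SPINE": {"106Y", "HYP1"},  # "HYP1" is a hypothetical placeholder
    "TB": {"106Y"},
}
direct_sum = sum(len(ids) for ids in ids_by_center.values())
distinct = len(set().union(*ids_by_center.values()))
print(direct_sum, distinct)
```

For the "Released" and "Total Deposited" rows the direct sum exceeds the "All Centers" value by 12, while the other rows agree exactly. That excess counts duplicate memberships, so it is not necessarily the number of shared PDB IDs.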

Figure 3: Structures Released by SG Centers by Year

Last updated: Jul 2 2008


Sequence Redundancy Statistics

Table 4: TargetDB Sequence Redundancy Statistics by Experimental Status

Sequence              Novel Targets, by Status
Identity (%)          Selected  Cloned  Expressed  Purified  Crystallized  Crystal    NMR        in PDB
                                                                           Structure  Structure
<100                  126672    90922   60928      23701     9052          3488       1566       4875
<90                   116074    85184   57312      22408     8643          3323       1546       4685
<70                   105202    78621   53545      21259     8429          3277       1422       4527
<50                   86255     66658   46023      18648     7826          3121       1269       4214
<30                   48127     40569   28867      12470     6022          2577       894        3295
Last updated: 08-04-08
Sequence redundancy is calculated by clustering analysis using the BLASTClust program, with the similarity threshold set to the given percent sequence identity.   Please view the detailed explanation of sequence redundancy calculations and BLASTClust threshold settings.  Sequence redundancy calculations are based on comparison to all protein sequences in TargetDB that are in the same experimental status category and at least 20 amino acids long.
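
To make the idea of threshold-based clustering concrete, here is a minimal sketch that groups sequences by single linkage and counts the resulting clusters. It is not BLASTClust itself. It assumes that pairwise percent identities are already available, that two sequences are linked when their identity is at or above the threshold, and that the number of clusters corresponds to the "novel targets" count. The function name, sequence IDs, sequences and identity values are all hypothetical.

```python
def count_clusters(sequences, identities, threshold, min_length=20):
    """Count single-linkage clusters at a percent-identity threshold.

    sequences:  dict of sequence ID -> amino acid string
    identities: dict of (id_a, id_b) -> percent identity (assumed precomputed)
    Sequences shorter than `min_length` residues are ignored.
    """
    kept = [sid for sid, seq in sequences.items() if len(seq) >= min_length]
    parent = {sid: sid for sid in kept}

    def find(x):
        # Follow parent links to the cluster representative (path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), identity in identities.items():
        if a in parent and b in parent and identity >= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    return len({find(sid) for sid in kept})


# Hypothetical data, for illustration only.
sequences = {
    "seqA": "M" * 40,
    "seqB": "M" * 38,
    "seqC": "M" * 30,
    "seqD": "M" * 10,  # shorter than 20 residues, so excluded
}
identities = {
    ("seqA", "seqB"): 95.0,
    ("seqB", "seqC"): 60.0,
    ("seqA", "seqC"): 40.0,
    ("seqC", "seqD"): 99.0,
}

for threshold in (100, 90, 50):
    print(threshold, count_clusters(sequences, identities, threshold))
```

Lowering the threshold merges more sequences into the same cluster, which matches the trend in Table 4, where the counts shrink from the <100 row to the <30 row.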

Table 5: Sequence Redundancy Statistics for Structures Released by SG Centers in the PDB by Year

Year      Released    Number of Released Structures  Percent (%) of Released Structures
          Structures  <30% Sequence Identity         <30% Sequence Identity
                      at Time of Release             at Time of Release
<= 2000   84          31                             37
2001      59          24                             41
2002      152         58                             38
2003      388         154                            40
2004      911         384                            42
2005      1023        367                            36
2006      1087        455                            42
2007      1548        583                            38
2008      558         229                            41
Total     5810        2285                           39
Last updated: 08-07-02
Sequence redundancy is calculated by clustering analysis using the BLASTClust program, with the similarity threshold set to the given percent sequence identity.   Please view the detailed explanation of sequence redundancy calculations and BLASTClust threshold settings.  Sequence redundancy calculations are based on comparison to all protein sequences in the PDB that are at least 20 amino acids long.
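
The percent column of Table 5 can be recomputed from the two count columns. The sketch below assumes the percentage is the number of structures with <30% sequence identity divided by the number of released structures, rounded to the nearest whole number; the function name `percent_novel` is hypothetical.

```python
# Rows copied from Table 5: (year, released structures, structures with <30% identity)
rows = [
    ("<= 2000", 84, 31),
    ("2001", 59, 24),
    ("2002", 152, 58),
    ("2003", 388, 154),
    ("2004", 911, 384),
    ("2005", 1023, 367),
    ("2006", 1087, 455),
    ("2007", 1548, 583),
    ("2008", 558, 229),
]


def percent_novel(released: int, novel: int) -> int:
    """Whole-number percent of released structures below 30% identity (assumed rounding)."""
    return round(100 * novel / released)


total_released = sum(released for _, released, _ in rows)
total_novel = sum(novel for _, _, novel in rows)

for year, released, novel in rows:
    print(year, percent_novel(released, novel))
print("Total", total_released, total_novel, percent_novel(total_released, total_novel))
```

Under that assumption the yearly values and the totals (5810 released, 2285 below 30% identity, 39%) agree with the table.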

Figure 4: Comparison of Novel Structures with Number of Structures Released By SG Centers

Last updated: 08-07-02


© RCSB PDB