====================================================
TargetDB files
Last updated: June 19, 2008
====================================================

Database files:
***************

target.sql - TargetDB tables "target_complex" and "targ_sequence".
  These two tables together contain all data items provided by depositors.
  Please see target.dtd for details: http://targetdb.pdb.org/target.dtd
  pdbt_id is an incremental ID assigned to each target entry when the data
  file provided by depositors is parsed.
  Sequence_id is an internal protein sequence ID assigned to each distinct
  sequence in the target sequence list. A distinct sequence is a sequence
  (string) that differs from every other sequence in the database by at
  least one character.

targetpdb_info.sql - TargetDB table "targetpdb_info".
  This table lists the structures in the PDB that were determined by SG
  (structural genomics) centers and included in the target data file.

XML files:
**********

targets.xml - all targets in TargetDB. The file is built according to the
  old version of the TargetDB DTD: target.dtd
targets.xml.zip / targets.xml.gz - compressed copies of targets.xml

targetsV2.xml - all targets in TargetDB. The file is built according to the
  latest version of the TargetDB DTD: target.v2.dtd
targetsV2.xml.zip / targetsV2.xml.gz - compressed copies of targetsV2.xml

lablist.xml - list of SG laboratory names as provided by depositors.

FASTA files:
************

targets.fa.gz - targets in FASTA format.
  >header: 'target id'_'SG center initials'_'sequence_id'_'internal database counter'_

targets_distinct.fa.gz - distinct targets in FASTA format. A distinct
  sequence is a sequence (string) that differs from every other sequence in
  the database by at least one character.
  >header: 'sequence_id'|'SG center initials'_'target_id'
  If a sequence is reported by more than one SG center, the header contains
  all the 'SG center initials'_'target_id' pairs, separated by a bar (|).
  Example: >SQ154604|NESGC_MbR227|NON-SSGCID_MytuD.00010.a

Tab delimited TXT files:
************************

targets.txt.gz - compressed, tab-delimited output of TargetDB tables
  "targ_sequence" and "target_complex".

targetpdb_info.txt - tab-delimited output of TargetDB table "targetpdb_info".

tarDBstructures.txt (List of SG structures) - tab-delimited version of the
  table: http://targetdb.pdb.org/statistics/pdb_targetdb_title.html
  This is a parsed version of the targetpdb_info table; the pdbt_id and
  method columns were excluded from this file.
  Related PDB_ID(s): PDB_ID(s) that are referenced by the same target in
  TargetDB.

lablist.txt - list of SG laboratory names as provided by depositors.

tarDBorganisms.txt - list of source organisms in TargetDB: target IDs and
  source organism names. If the source organism is not provided by the
  depositor, the organism name is derived by parsing the BLAST output of the
  target sequence searched against the NCBI NR database.

SOFTWARE:
************************

targetdb_search.pl - Perl script that performs a TargetDB search. The output
  file contains the query result in XML format.

====================================================
If you have any questions or suggestions, please contact:
target-help@deposit.rcsb.org
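
Reading targets_distinct.fa.gz: the header format described under FASTA
files ('sequence_id'|'SG center initials'_'target_id', with extra pairs
separated by bars) can be unpacked with a short script. This is a minimal
sketch, not a TargetDB tool. It assumes that SG center initials never contain
an underscore, so each pair is split on its first '_' and any further
underscores stay in the target ID. The names DistinctHeader,
parse_distinct_header and read_distinct_fasta are hypothetical.

```python
import gzip
from typing import Iterator, List, NamedTuple, Tuple


class DistinctHeader(NamedTuple):
    """Fields of one targets_distinct.fa.gz header (hypothetical type)."""
    sequence_id: str
    # (SG center initials, target_id) pairs, one per reporting center
    reports: List[Tuple[str, str]]


def parse_distinct_header(line: str) -> DistinctHeader:
    """Parse '>sequence_id|CENTER_target|CENTER_target...'.

    Assumption: center initials contain no '_', so each pair is split on
    its first underscore; the target ID keeps any remaining underscores.
    """
    fields = line.strip().lstrip(">").split("|")
    if len(fields) < 2:
        raise ValueError(f"unexpected header: {line!r}")
    reports = []
    for pair in fields[1:]:
        center, sep, target_id = pair.partition("_")
        if not sep:
            raise ValueError(f"missing '_' in {pair!r}")
        reports.append((center, target_id))
    return DistinctHeader(fields[0], reports)


def read_distinct_fasta(path: str) -> Iterator[Tuple[DistinctHeader, str]]:
    """Yield (header, sequence) records from a gzipped FASTA file."""
    header, chunks = None, []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = parse_distinct_header(line), []
            elif line:
                # Sequence lines may be wrapped; join them into one string
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    h = parse_distinct_header(">SQ154604|NESGC_MbR227|NON-SSGCID_MytuD.00010.a")
    print(h.sequence_id)  # SQ154604
    print(h.reports)      # [('NESGC', 'MbR227'), ('NON-SSGCID', 'MytuD.00010.a')]
```

Usage: for header, seq in read_distinct_fasta("targets_distinct.fa.gz")
iterates over every distinct sequence together with all centers and target
IDs that reported it.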
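
Distinct sequences: the definition above (a sequence that differs from every
other sequence by at least one character) amounts to grouping targets by exact
string identity. The sketch below illustrates that grouping and rebuilds a
targets_distinct-style header for each group. It is an illustration only,
not TargetDB's own procedure: the "SQ" + counter ID scheme is a hypothetical
stand-in for the internal sequence_id assignment, and sequences are compared
exactly as given (any normalization, such as upper-casing, is not described
in this README). The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def group_distinct(
    targets: Iterable[Tuple[str, str, str]],
) -> Dict[str, Tuple[str, List[Tuple[str, str]]]]:
    """Group (center, target_id, sequence) records by exact sequence string.

    Two sequences fall into the same group only if they are identical
    character for character. Returns
    {sequence: (sequence_id, [(center, target_id), ...])}.
    The 'SQ' + counter ID is a hypothetical stand-in for TargetDB's IDs.
    """
    groups: Dict[str, Tuple[str, List[Tuple[str, str]]]] = {}
    for center, target_id, sequence in targets:
        if sequence not in groups:
            # First time this exact string is seen: assign the next ID
            groups[sequence] = (f"SQ{len(groups) + 1}", [])
        groups[sequence][1].append((center, target_id))
    return groups


def distinct_header(sequence_id: str, reports: List[Tuple[str, str]]) -> str:
    """Build a '>sequence_id|CENTER_target|...' header line."""
    pairs = "|".join(f"{center}_{target_id}" for center, target_id in reports)
    return f">{sequence_id}|{pairs}"


if __name__ == "__main__":
    records = [
        ("NESGC", "MbR227", "MKTAYIAK"),
        ("NON-SSGCID", "MytuD.00010.a", "MKTAYIAK"),
        ("JCSG", "X1", "MKTAYIAR"),  # differs by one character: distinct
    ]
    # Dicts keep insertion order, so groups print in first-seen order
    for seq_id, reports in group_distinct(records).values():
        print(distinct_header(seq_id, reports))
    # >SQ1|NESGC_MbR227|NON-SSGCID_MytuD.00010.a
    # >SQ2|JCSG_X1
```

A plain dictionary keyed on the full sequence string is enough here because
"distinct" means exact identity; no alignment or similarity threshold is
involved.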
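
Reading targets.fa.gz headers: in the format
'target id'_'SG center initials'_'sequence_id'_'internal database counter'_
all fields are joined by underscores, so a target ID that itself contains an
underscore makes a left-to-right split ambiguous. A minimal sketch, assuming
that the center initials, sequence_id and counter never contain '_', splits
from the right instead. TargetHeader and parse_target_header are hypothetical
names, and the header used below is a made-up example, not a real TargetDB
entry.

```python
from typing import NamedTuple


class TargetHeader(NamedTuple):
    """Fields of one targets.fa.gz header (hypothetical type)."""
    target_id: str
    center: str
    sequence_id: str
    counter: str


def parse_target_header(line: str) -> TargetHeader:
    """Parse '>target_center_sequenceid_counter_'.

    Splits from the right so underscores inside the target ID survive.
    Assumption: the last three fields contain no '_'.
    """
    # Drop the leading '>' and the trailing '_' shown in the header format
    body = line.strip().lstrip(">").rstrip("_")
    parts = body.rsplit("_", 3)
    if len(parts) != 4:
        raise ValueError(f"unexpected header: {line!r}")
    return TargetHeader(*parts)


if __name__ == "__main__":
    # Hypothetical header; the target ID 'Tm_0001' contains an underscore
    print(parse_target_header(">Tm_0001_JCSG_SQ1_42_"))
    # TargetHeader(target_id='Tm_0001', center='JCSG', sequence_id='SQ1', counter='42')
```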