Inter-chain beta-sheet contacts play, in particular, a structural role in protein-protein interactions that are central to healthy biological function and to diseases ranging from AIDS and cancer to Alzheimer's and Huntington's diseases. Beta-sheets that extend over more than one protein chain can be spotted by analyzing the atom coordinates of known protein structures, or by analyzing the corresponding secondary structure, as defined by programs such as DSSP.
The ICBS database is intended as a tool to:
Note that there might be occasional delays between the release of a new / modified PDB structure and the release of the corresponding new / modified PQS structures. Some PQS structures might thus appear or be updated with some delay in the ICBS database with respect to the release of the corresponding PDB entries.
So far, however, the likely quaternary structures corresponding to PDB structures, which the EBI generates from PDB structures, are only available as PDB files. Moreover, mmCIF files do not yet represent the official release of the PDB structures. For these reasons, the current version of the ICBS database relies on PDB files. This prompted a number of methodological choices linked to limitations of the PDB file format, or to variations or errors in the PDB files. When mmCIF files are officially released, or when an API is provided to query the databases directly for PDB and PQS entries, a new, improved version of the ICBS database will be generated. For further details, please refer to:
Obtention of structure coordinates
PDB files are downloaded from the PDB ftp site. PQS files are downloaded from the PQS web server. We only consider the few PQS entry types that are relevant to the ICBS database, namely:
Determination of the secondary structure
The DSSP (Define Secondary Structure of Proteins) program is then run on every retrieved file, producing a secondary structure file for each coordinate file.
Identification of inter-chain ß-sheets and number of hydrogen bonds
The secondary structure files are then scanned for ladders that join residues belonging to different chains. For each pair of chains found to interact through ß-sheets, we estimate the number of hydrogen bonds in the ICBS interface as the number of beta-bridges in the ladders that contribute to the inter-chain ß-sheet interface. See a scanned version of the DSSP paper (page 19 of 21) for a definition of secondary structure elements such as hydrogen bonds, bridges, ladders, sheets, etc.
Note that we take single beta-bridges, i.e., ladders of length 1, into account. Single beta-bridges can strengthen the effect of regular ladders (of length > 1), and they are therefore included in the computation of the ICBS index. The ICBS database thus contains some proteins for which the inter-chain interface is reduced to a single beta-bridge, and whose ICBS index is consequently very low. The query interface lets users exclude ICBS entries whose ICBS interface consists solely of single beta-bridges.
Characterization of the strength of inter-chain ß-sheets: ICBS index
In order to characterize and rank inter-chain ß-sheet interactions, we developed a simple 'ICBS' index. This index attempts to capture the relative importance of the ß-sheet interactions in the overall inter-chain interface. We first scan the coordinate files and count all heavy-atom contacts between chains that pair through inter-chain ß-sheets. Two heavy atoms belonging to different chains are considered to be in contact when they are less than 4.4 Angstroms apart.
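The contact criterion above, together with the index formula given in the next paragraph (hydrogen bonds divided by heavy-atom contacts, multiplied by 1000, with the maximum over all pairs kept for the whole structure), can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the ICBS code: atoms are assumed to be plain (element, x, y, z) tuples already read from ATOM rows, hydrogen (and, as an assumption, deuterium) atoms are excluded as non-heavy atoms, and all function names are hypothetical.

```python
from typing import Iterable, Sequence, Tuple

Atom = Tuple[str, float, float, float]  # (element, x, y, z), hypothetical representation

CONTACT_CUTOFF = 4.4    # Angstroms: atoms strictly closer than this are in contact
NON_HEAVY = {"H", "D"}  # assumption: hydrogen and deuterium are not heavy atoms


def heavy_atom_contacts(chain_a: Sequence[Atom], chain_b: Sequence[Atom],
                        cutoff: float = CONTACT_CUTOFF) -> int:
    """Count pairs of heavy atoms (one from each chain) closer than `cutoff`."""
    heavy_a = [a for a in chain_a if a[0].upper() not in NON_HEAVY]
    heavy_b = [b for b in chain_b if b[0].upper() not in NON_HEAVY]
    cutoff_sq = cutoff * cutoff  # compare squared distances to avoid square roots
    count = 0
    for _, xa, ya, za in heavy_a:
        for _, xb, yb, zb in heavy_b:
            dx, dy, dz = xa - xb, ya - yb, za - zb
            if dx * dx + dy * dy + dz * dz < cutoff_sq:
                count += 1
    return count


def icbs_index(h_bonds: int, contacts: int) -> float:
    """Per-pair index: hydrogen bonds / heavy-atom contacts, scaled by 1000."""
    if contacts == 0:
        return 0.0  # guard only: chains paired by beta-bridges should always have contacts
    return 1000.0 * h_bonds / contacts


def structure_index(pairs: Iterable[Tuple[int, int]]) -> float:
    """Global index: the highest per-pair value, given (h_bonds, contacts) tuples."""
    return max(icbs_index(hb, ct) for hb, ct in pairs)
```

For example, a pair with 12 hydrogen bonds and 400 heavy-atom contacts gets an index of 30.0. The brute-force double loop is chosen for readability; for large structures a spatial grid or k-d tree would avoid comparing every atom pair.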
The ICBS index is then obtained for each pair of chains as the ratio between the number of hydrogen bonds in the ß-sheets between the two chains and the number of heavy-atom contacts between them. For readability, the result is multiplied by 1000. The highest value over all pairs is retained as a global characterization of the ICBS interface at the level of the whole structure.
Homogeneity and sense of the ICBS interface
For each pair of chains, we determine two additional characteristics.
Homogeneity
Two chains are considered identical when they have the same number of residues. When identical chains pair through inter-chain ß-sheets, their interface is considered homogeneous; it is considered heterogeneous when the two chains are different. The overall homogeneity of the interface at the level of the whole structure is derived from the homogeneity of all pairs of chains. See the explanation of the Homogeneity column of the results table.
Orientation
Depending on the parallel or anti-parallel nature of the single or multiple ladders found in the inter-chain ß-sheets, the interface is considered parallel, anti-parallel, or mixed. See the explanation of the Orientation column of the results table. See a scanned version of the DSSP paper (page 19 of 21) for a definition of parallel and antiparallel ladders and sheets.
Redundancy of the ICBS interface
Some ICBS entries correspond to likely protein quaternary structures, i.e., to PQS entries. They can contain the same ICBS interfaces as the PDB structure they are derived from, or they can contain new, unique ICBS interfaces. The query form lets users choose whether to display unique and/or redundant entries. A simple but effective redundancy criterion is used: an ICBS entry corresponding to a PQS structure is considered redundant if one of the following conditions is met:
Note on some methodological choices that were prompted by limitations in the format of the source data, or by errors and variations in the data.
Automatically spotting and analyzing inter-chain beta-bridges, ladders and beta-sheets in PDB files representing PDB and PQS entries is a challenging task for a number of reasons:
Unique chain IDs versus non-unique chain labels
PDB and DSSP chain labels are one character long. Upper- and lower-case letters, digits, and non-alphanumeric characters are used, thus providing at least 62 unique chain identifiers. However, this is not sufficient for some large macromolecules; in addition, not all authors use non-alphanumeric labels. Instead of rebuilding unique ICBS chain identifiers and mapping them to non-unique PDB and DSSP labels, we currently ignore chains whose label has previously been used for another chain in the same entry. This only affects the results for some very large PQS macromolecules. As these entries consist of numerous repetitions of the same PDB asymmetric unit, and of the same interfaces between them, this solution is not likely to cause any unique ICBS interaction to be missed.
Unique beta-bridge, ladder and ß-sheet IDs versus non-unique labels
DSSP labels beta-bridges with a letter, and the case of the letter denotes the parallel (lower case) or anti-parallel (upper case) nature of the bridge. Only 26 unique identifiers are thus available in each category. ß-sheets are labeled with capital letters. Therefore, one would theoretically have to rebuild the whole connectivity of ß-bridges and ladders to uniquely identify bridges, ladders and ß-sheets. Instead of adopting this costly and error-prone solution, we created an ICBS ladder label that is very unlikely to be non-unique, by combining the DSSP ladder label, the label of the ß-sheet it belongs to, and the labels of the two pairing chains. We are thus able to count ladders and hydrogen bonds, and to determine the interface orientation for each pair of chains. This method might overestimate the number of ladders in the probably rare cases where the same ladder joins more than two chains. This would, however, have no consequence for the data displayed in the database interface.
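Two of the workarounds above can be sketched in Python. This is a hypothetical illustration, not the ICBS implementation: `atom_rows_by_chain` keeps only ATOM rows and ignores any block of rows whose chain label (column 22 of a PDB ATOM record) was already used by an earlier chain, and `pair_orientation` derives the orientation of a pair of chains from the case of the DSSP labels of its inter-chain ladders. The input layout and both function names are assumptions.

```python
from typing import Dict, Iterable, List


def atom_rows_by_chain(pdb_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group ATOM rows by chain label, skipping chains whose label was reused."""
    chains: Dict[str, List[str]] = {}
    closed = set()    # labels of chain blocks that have already ended
    current = None    # label of the block currently being read
    skipping = False  # True while reading a block whose label was already used
    for line in pdb_lines:
        if not line.startswith("ATOM  "):
            continue  # HETATM, TER and header records are not used here
        label = line[21]  # chain identifier, column 22 of the PDB record
        if label != current:
            if current is not None:
                closed.add(current)
            current = label
            skipping = label in closed
        if not skipping:
            chains.setdefault(label, []).append(line)
    return chains


def pair_orientation(ladder_labels: Iterable[str]) -> str:
    """Orientation of a chain pair from the DSSP labels of its inter-chain ladders."""
    # DSSP: lower-case label = parallel bridge, upper-case label = anti-parallel bridge
    kinds = {"parallel" if lab.islower() else "anti-parallel" for lab in ladder_labels}
    if not kinds:
        raise ValueError("pair has no inter-chain ladder")
    return kinds.pop() if len(kinds) == 1 else "mixed"
```

With rows labeled A, A, B, A, the last block is dropped because label A was already used by an earlier chain; labels `["a", "B"]` give "mixed".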
HETATM PDB rows and ß-bridges that occur in "non-standard" groups
HETATM rows of PDB files specify "non-standard" groups that sometimes contain amino-acid residues. Such residues are sometimes found to participate in ß-bridges with residues belonging to "standard" chains specified in ATOM rows, or with other "non-standard" groups. Some of these bridges are relevant to the characterization of inter-chain interfaces. However, because of the numerous variations in the way HETATM rows are used and placed in PDB files, taking HETATM rows into account in the analysis of inter-chain ß-sheets has a number of undesirable effects. For the time being, we therefore only consider beta-bridges and atom contacts between residues that belong to "standard" chains, i.e., residues whose coordinates are provided in PDB ATOM rows. As a consequence, we might under- or over-estimate the ICBS index for a few ICBS entries, and we might miss a few proteins in which ICBS interactions occur only between residues specified in HETATM rows. Please refer to the PDB documentation for details on ATOM and HETATM PDB records.
Case of very large entries that break the DSSP format
In some rare cases, the DSSP sequential number that identifies residues can be higher than 9999. As the 'BP1' and 'BP2' DSSP fields that specify ß-bridge partners are 4 characters long, partners whose sequential number is 10000 or more cause DSSP to output rows that break its own format. There is no 100%-safe way of resolving the problem. When such cases are encountered, we simply ignore the corresponding DSSP rows. As a consequence, we might occasionally miss some inter-chain ß-sheet interactions.
Case of inconsistent DSSP partner and/or secondary structure assignment
When the output of the DSSP program specifies that a residue R1 'pairs' with another residue R2 through a beta-bridge, one would expect R2 to pair with R1.
However, this is not always the case, presumably because of precision or secondary-structure assignment problems in the DSSP program.
Given our current method of counting 'partnerships' in ladders and inter-chain interfaces
(i.e., counting every partnership twice, once for R1 and once for R2, and then dividing by 2),
this DSSP problem might cause an underestimation of the number of hydrogen bonds and of the ICBS index.
To limit the impact of this rare problem, we round any fractional
number of partnerships up to the next integer.
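The counting scheme above can be sketched in Python. This is a minimal, hypothetical illustration: each input record stands for one (residue, bridge partner) entry read from a DSSP file, given as (chain, partner_chain, ladder_label, sheet_label). Ladders are keyed by an ICBS-style ladder label (DSSP ladder label, sheet label, and the two chain labels, sorted here so that R1 and R2 produce the same key). Rounding up is applied per ladder, which is an assumption, since the text does not state at which level it happens. As explained earlier, the resulting number of bridges serves as the estimate of the number of hydrogen bonds.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, str, str, str]  # (chain, partner_chain, ladder_label, sheet_label)


def bridges_per_chain_pair(records: Iterable[Record]) -> Dict[Tuple[str, str], int]:
    """Estimate inter-chain beta-bridges (hydrogen bonds) for each pair of chains."""
    # Number of partnership entries seen for each ICBS-style ladder label.
    seen: Counter = Counter()
    for chain, partner_chain, ladder, sheet in records:
        if chain == partner_chain:
            continue  # intra-chain bridge: not part of an ICBS interface
        c1, c2 = sorted((chain, partner_chain))
        seen[(ladder, sheet, c1, c2)] += 1

    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for (_ladder, _sheet, c1, c2), n in seen.items():
        # Each bridge should be listed twice (from R1 and from R2). An odd count
        # means DSSP listed one side only, so the half is rounded up.
        totals[(c1, c2)] += math.ceil(n / 2)
    return dict(totals)
```

For instance, a ladder listed 6 times contributes 3 bridges, while a ladder listed only 3 times (one partner missing on one side) contributes 2 rather than 1.5.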
PDB header name
Entries containing the entered word or text fragment in their PDB header name will be selected.
PDB ID
Enter or paste one or more space-separated PDB codes. Spaces and newline characters are ignored. Entries corresponding to the PDB code(s) will be selected.
Entry origin
The ICBS database contains entries corresponding to PDB structures, as well as hypothetical quaternary structures derived from them. One can select entries from either category or from both.
PQS entries
While some entries corresponding to PQS structures contain unique ICBS interactions that are not present in the corresponding PDB structure, others do not and are thus redundant. The query form lets one select which categories of PQS entries should be displayed: only entries with unique ICBS interactions; only entries whose ICBS interactions are redundant (with respect to the corresponding PDB structure); or both. Please note that this selection criterion does not exclude entries based on protein sequence redundancy; it only concerns ICBS-interface redundancy.
Ladder length
The ICBS interface between two chains may contain only single beta-bridges (ladders of length 1). One can display: entries for which at least one pair of chains has an ICBS interface involving ladders of length greater than 1; entries for which only single-bridge ladders occur; or both categories. Please note that the interface between two chains can comprise several single beta-bridges. Please refer to the Methods section for details on secondary structure determination.
Journal
Select one or more abbreviated journal names. Entries whose PDB structure's primary citation was published in the selected journal(s) will be selected. Please see the note on the information extracted from PDB files for an explanation of why there are 'missing' values and/or some discrepancies with what the web interface to the PDB database shows.
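Two of these criteria, PDB ID and ladder length, can be sketched in Python to make their semantics explicit. This is a hypothetical illustration, not the query engine: entries are assumed to be dictionaries with a `pdb_id` key and a `max_ladder_lengths` key (the length of the longest inter-chain ladder for each pair of chains), and PDB codes are assumed to be matched case-insensitively.

```python
from typing import Iterable, List, Set


def parse_pdb_ids(text: str) -> Set[str]:
    """Split pasted PDB codes; str.split() with no argument drops spaces and newlines."""
    return {code.lower() for code in text.split()}


def ladder_category(max_ladder_lengths: Iterable[int]) -> str:
    """'long' if at least one chain pair has a ladder longer than 1, else 'single'."""
    return "long" if any(n > 1 for n in max_ladder_lengths) else "single"


def select(entries: Iterable[dict], pdb_text: str = "", ladders: str = "both") -> List[dict]:
    """Apply the PDB ID and ladder-length criteria; `ladders` is 'long', 'single' or 'both'."""
    wanted = parse_pdb_ids(pdb_text)
    selected = []
    for entry in entries:
        if wanted and entry["pdb_id"].lower() not in wanted:
            continue  # a PDB ID list was given and this entry is not in it
        if ladders != "both" and ladder_category(entry["max_ladder_lengths"]) != ladders:
            continue  # wrong ladder-length category
        selected.append(entry)
    return selected
```

Note that an entry whose pairs all consist of single bridges falls in the 'single' category even if one pair has several of them, as the note on ladder length above points out.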
Deposition date or revision date
Enter a lower and/or upper limit to set a selection criterion on the deposition date or the revision date of the PDB structure corresponding to the ICBS entries. To display only recent additions to the ICBS database, one can set a lower limit on the PDB revision date. All ICBS entries (PDB and PQS structures) whose PDB revision date is later than the limit will be displayed. Please note that the release of PQS structures corresponding to new or modified PDB entries might occasionally be delayed. In such cases, some PQS structures will appear or be updated with subsequent updates of the ICBS database.
Technique
Select one or more experimental techniques. ICBS entries whose PDB structure was obtained using the selected technique(s) will be selected.
ICBS index
Enter a minimum and/or maximum ICBS index value to limit the selection to structures whose ICBS index is in a given range.
Homogeneity of the ICBS interface
Select one of the homogeneity codes from the list. Entries whose ICBS interface matches this code will be selected.
ß-sheets orientation
Select one of the orientation codes from the list. Entries whose ICBS interface matches this code will be selected.
Number of chains
Enter a minimum and/or maximum number of protein chains an ICBS entry must have in order to be selected.
Number of residues
Enter a minimum and/or maximum number of residues an ICBS entry must have in order to be selected.
Number of pairs of chains with ICBS
Enter a minimum and/or maximum number of pairs of ICBS chains an entry must have in order to be selected. Note that the number of pairs is not directly displayed in the results table; the table instead shows a list of pairs.
Number of hydrogen bonds
Enter a minimum and/or maximum number of hydrogen bonds. Entries that have at least one pair of chains with a number of hydrogen bonds in the specified range will be selected.
Display options
Queries can return a large number of ICBS entries.
Results are thus broken up into a number of display pages. Users can:
Tabulation of results
The results of a query are presented in a table. Some columns reproduce pieces of information extracted from PDB files, such as the deposition date of the PDB structure. Note that several ICBS entries can correspond to the same PDB code, and therefore share the same PDB information. Other columns present global information that characterizes the inter-chain ß-sheet interfaces found in the protein. Lastly, some columns present detailed information on each pair of chains that interact through ß-sheets.
Navigation from page to page
The display of query results is broken up into a number of pages. The number of results per page can be adjusted in the query form. Results pages can be accessed using the controls available in the navigation section of the page.
Context specific help
Sorting by column
Controls (
Running a new query
To run a new query from a results page, click on the 'Query' link at the top of the page.
For some entries, you might therefore notice some discrepancies between what the ICBS database shows and what the PDB query interface returns. For instance, many original PDB files are missing the primary citation record, or have this information placed under the wrong section of the file. Such missing citation problems have been fixed in the improved data set, but not in the PDB files themselves. Typically, the ICBS Journal column will show a 'missing' primary citation for such entries. While we tried to fix some common errors found in PDB files (e.g., by handling the many variations of, or errors in, journal abbreviations, as explained below), fixing problems such as missing primary citation records, and duplicating the results of the PDB query interface, will only be possible once we use improved primary sources. The pieces of data extracted from PDB files are still considered helpful to filter and sort the query results. Each ICBS entry has a link to the corresponding (improved) PDB entry, so users can verify the corresponding PDB information, e.g., check whether a primary citation is actually missing. The primary citation record, when present in PDB files, often contains errors. Instead of displaying the journal name it contains 'as is', we use the journal abbreviation used by the Journal Citation Reports.
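A minimal sketch of this kind of clean-up, assuming raw journal strings such as those found in the JRNL REF record of PDB files (for example "J.MOL.BIOL."): the string is upper-cased, periods and runs of whitespace are collapsed to single spaces, and known variants are mapped through a lookup table. The table below is illustrative only and is not the ICBS mapping; 'missing' is returned when no citation is available, mirroring the Journal column.

```python
import re
from typing import Optional

# Illustrative aliases only (hypothetical): normalized variant -> abbreviation to display.
JOURNAL_ALIASES = {
    "JOURNAL OF MOLECULAR BIOLOGY": "J MOL BIOL",
    "J MOLEC BIOL": "J MOL BIOL",
}


def normalize_journal(raw: Optional[str]) -> str:
    """Map a raw journal string to a display abbreviation, or 'missing'."""
    if raw is None or not raw.strip():
        return "missing"
    # Upper-case, then turn every run of periods/whitespace into a single space.
    key = re.sub(r"[.\s]+", " ", raw.upper()).strip()
    return JOURNAL_ALIASES.get(key, key)
```

With this normalization, "J.MOL.BIOL.", "j. mol. biol." and "Journal of Molecular Biology" all map to the same abbreviation, so they can be selected and sorted together.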
PDB header name
This column displays part of the header of the PDB file. Names are usually generic. To obtain more specific information on the compound, please use the link to the corresponding PDB entry. Clicking on the header name brings up a RASMOL view of the ICBS entry (a PDB structure or a quaternary structure derived from a PDB structure). For help on how to install RASMOL or modify display options, click here.
ICBS ID (ICBS)
This column displays the unique identifier of the entry in the ICBS database. The identifier is:
PDB ID (PDB)
This column displays the PDB code corresponding to an ICBS entry. Clicking on the PDB code brings up the corresponding PDB entry in a new window.
PQS PID (PQS)
This column displays the PQS code of ICBS entries. Clicking on the PQS code brings up the corresponding PQS entry in a new window.
Journal
This column displays the abbreviated name of the journal of the primary citation for the PDB structure of the ICBS entry. Please see the note on the information extracted from PDB files for an explanation of why there are 'missing' values and/or some discrepancies with what the web interface to the PDB database shows.
Deposition date (Dep. Date)
This column displays the deposition date of the PDB structure corresponding to the ICBS entry, in yyyy-mm-dd format.
Technique
This column displays the experimental technique that was used to determine the coordinates of the PDB structure corresponding to the ICBS entry.
ICBS Index (Index)
This column displays the index value that characterizes the overall 'strength' of the inter-chain ß-sheet interactions in the protein structure. Index values are computed for every pair of chains where inter-chain ß-sheets occur, and the maximum value over all pairs is retained to characterize the overall strength. The higher the value, the greater the importance of the inter-chain ß-sheets in the interface between chains. For details on the computation of the ICBS index, see Characterization of the strength of inter-chain ß-sheets: ICBS index.
Homogeneity of the ICBS interface (Hom.)
This column displays the overall 'homogeneity' of the ICBS interactions within a structure. The interface is considered:
See Homogeneity and sense of the ICBS interface for an explanation of the simple identity criterion retained here.
ß-sheet orientation (Sense)
This column shows the overall orientation of the ICBS interface. The following codes are used:
Total number of protein chains (Chains)
This column displays the total number of protein chains in the structure.
Total number of residues (Res.)
This column displays the total number of residues found in standard protein chains. Residues belonging to "non-standard" groups (i.e., corresponding to HETATM rows in the PDB file) are not counted. Consequently, the number displayed in this column does not always match the number of residues shown by the PDB query interface, the corresponding PDB files, or the corresponding DSSP files. Please see the note on the information extracted from PDB files for an explanation of other possible discrepancies between ICBS columns and what the web interface to the PDB database displays.
Pairing chains (Pairs)
This column shows both the names of the protein chains that pair through inter-chain ß-sheets and the 'homogeneity' of the pair, i.e., whether the chains are identical or not. Chains are represented by their one-letter PDB or PQS code, as specified in the coordinate file. Note that when a quaternary structure corresponds to the assembly of several copies of a PDB structure, the chains are renamed in the coordinate file so that each chain is uniquely identified. The homogeneity of a pair is displayed as a 2-character code inserted between the chain identifiers. The homogeneity codes are as follows:
See Homogeneity and sense of the ICBS interface for an explanation of the simple identity criterion retained here. Use the sorting buttons (
Per pair ICBS index (Index)
This column displays the value of the ICBS index for each pair of chains that interact through ß-sheets.
Per pair number of hydrogen bonds (HB)
This column shows the number of hydrogen bonds in the ICBS interface for each pair of chains. Use the sorting buttons (
Per pair number of heavy atom contacts (Cont.)
This column shows the number of heavy-atom contacts in the ICBS interface for each pair of chains. Use the sorting buttons (
Per pair ß-sheet orientation (Sense)
This column shows the orientation of the ICBS interface for each pair of chains. The following codes are used: