Contents:
1. Overview
2. Query form
3. Results pages




1. Overview


Interchain ß-sheets

Interactions between the edges of protein beta-sheets occur widely in the formation of protein quaternary structures, in protein-protein interactions, and in protein aggregation. In this under-appreciated mode of molecular recognition between proteins, hydrogen bonds form between the edges of protein beta-sheets, which stabilizes the partnership in conjunction with many other non-covalent forces (e.g., hydrophobic, van der Waals, salt-bridges).

In particular, inter-chain beta-sheet contacts play a structural role in protein-protein interactions that are central to healthy biological function and to diseases ranging from AIDS and cancer to Alzheimer's and Huntington's diseases.

Beta-sheets that extend over more than one protein chain can be spotted through the analysis of the atom coordinates of known protein structures, or through the analysis of the corresponding secondary structure, as defined by programs such as DSSP.


Contents, primary data sources, purpose

The Inter-Chain Beta-Sheet (ICBS) database identifies and characterizes all inter-chain beta-sheet interactions found in PDB structures and in the corresponding quaternary structures (PQS entries). The data are stored in a relational database, which can be accessed through the Web and queried through a simple form. Entries can be ranked according to the relative structural importance of their inter-chain ß-sheet interactions, or according to other criteria.

The ICBS database is intended as a tool to:

  • further the study of ß-sheet protein-protein interactions
  • identify new ICBS interactions as new structures are deposited and as old structures are revised in the Protein Data Bank
  • help select targets for drug design.


Update of the ICBS database

The ICBS database is updated on a weekly basis. New PDB structures are analyzed, as well as new versions of already deposited structures. The corresponding quaternary structures (PQS entries) are also analyzed. The ICBS entries are then updated, and any obsolete structure is removed from the database.

Note that there might be occasional delays between the release of a new / modified PDB structure and the release of the corresponding new / modified PQS structures. Some PQS structures might thus appear or be updated with some delay in the ICBS database with respect to the release of the corresponding PDB entries.


Limitations linked to PDB files - Foreseen evolution of the ICBS database

PDB files, although they still represent the official public release of the PDB data (as of July 2002), contain errors and suffer from the limitations of their fixed-field-width format. Over the past few years, PDB and the EBI have revised the data PDB files contain. As a result of their Data Uniformity Project, revised data are displayed by the web interface that PDB offers to query its database. The revised data are also available as 'mmCIF' files deposited on a 'beta' PDB site.

So far, however, the likely quaternary structures corresponding to PDB structures, which are generated by the EBI from PDB structures, are only available as PDB files. Moreover, mmCIF files do not yet represent the official release of the PDB structures.

For these reasons, the current version of the ICBS database relies on PDB files. This prompted a number of methodological choices linked to limitations of the PDB file format or to variations or errors in the PDB files.

When mmCIF files are officially released, or when an API is provided to directly query the database for PDB and PQS entries, a new, improved version of the ICBS database will be generated.

For further details, please refer to:

.

Methods


Retrieval of structure coordinates

PDB files are downloaded from the PDB ftp site.

PQS files are downloaded from the PQS web server.

We only consider the few PQS-entry types that are relevant to the ICBS database, namely:

  • 'SYMMETRY-COMPLEX'
  • 'SPLIT-SYMMETRY'
Please refer to the PQS documentation for a description of PQS-entry types.

Determination of the secondary structure

The DSSP (Define Secondary Structure of Proteins) program is then run on every retrieved file, to produce a secondary structure file for each coordinate file.

Identification of inter-chain ß-sheets and number of hydrogen bonds

The secondary structure files are then scanned to spot ladders that join residues belonging to different chains. For each pair of chains that are found to interact through ß-sheets, we estimate the number of hydrogen bonds in the ICBS interface, as the number of beta-bridges in the ladders that contribute to the inter-chain ß-sheet interfaces.
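As a rough illustration of this counting step, the sketch below tallies beta-bridges per pair of distinct chains. It is not the code used to build the database: it assumes the DSSP output has already been parsed into a list of (chain, partner chain) tuples, one tuple per bridge, and the function name and sample data are hypothetical.

```python
from collections import Counter

def count_interchain_bridges(bridges):
    """Count beta-bridges per unordered pair of distinct chains.

    `bridges` is an iterable of (chain_1, chain_2) tuples, one per bridge
    (an assumed, pre-parsed representation of the DSSP output).
    """
    counts = Counter()
    for chain_1, chain_2 in bridges:
        if chain_1 != chain_2:  # keep inter-chain bridges only
            counts[tuple(sorted((chain_1, chain_2)))] += 1
    return dict(counts)

# Made-up example: two A-B bridges, one intra-chain bridge, one A-C bridge
bridges = [("A", "B"), ("B", "A"), ("A", "A"), ("A", "C")]
hbonds = count_interchain_bridges(bridges)
```

Here the intra-chain bridge is discarded, and the per-pair counts serve as the estimated numbers of hydrogen bonds.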

See a scanned version of the DSSP paper (page 19 of 21) for a definition of secondary structure elements such as hydrogen bonds, bridges, ladders, sheets, etc.

Note that we take into account single beta-bridges, i.e., ladders of length 1. Single beta-bridges can strengthen the effect of regular ladders (of length > 1), and they are therefore taken into account for the computation of the ICBS index. The ICBS database thus contains some proteins for which the inter-chain interface is reduced to a single beta-bridge, and whose ICBS index is consequently very low. The query interface lets users exclude ICBS entries whose ICBS interface consists solely of single beta-bridges.

Characterization of the strength of inter-chain ß-sheets: ICBS index

In order to characterize and rank inter-chain ß-sheet interactions, we developed a simple 'ICBS' index. This index attempts to capture the relative importance of the ß-sheet interactions in the overall inter-chain interface.

We first scan coordinate files and count all heavy atom contacts between chains that pair through inter-chain ß-sheets. Two heavy atoms belonging to different chains are here considered to be in contact when they are less than 4.4 Angstroms apart.
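The contact count can be sketched as follows. This is a naive all-pairs illustration, not the authors' implementation: it assumes each chain is given as a list of (x, y, z) heavy-atom coordinates in Angstroms, with hydrogens already excluded, and the names are hypothetical.

```python
import math

CUTOFF = 4.4  # Angstroms, the contact distance stated in the text

def count_contacts(atoms_1, atoms_2, cutoff=CUTOFF):
    """Count pairs of heavy atoms, one from each chain, closer than `cutoff`."""
    return sum(
        1
        for a in atoms_1
        for b in atoms_2
        if math.dist(a, b) < cutoff
    )

# Made-up coordinates for two tiny "chains"
chain_a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
chain_b = [(0.0, 0.0, 4.0), (0.0, 0.0, 5.0)]
n_contacts = count_contacts(chain_a, chain_b)
```

For large structures a spatial index would be faster than this quadratic loop; the sketch favors readability.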

The ICBS index is then obtained for each pair of chains as the ratio between the number of hydrogen bonds in the ß-sheets between the 2 chains and the number of heavy atom contacts. For readability, the result is multiplied by 1000.

The highest value over all pairs is retained for a global characterization of the ICBS interface at the level of the whole structure.
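The computation described above can be written in a few lines. The sketch below is illustrative only: the function names and the sample counts are hypothetical, and it assumes every pair has at least one heavy atom contact.

```python
def icbs_index(n_hbonds, n_contacts):
    """ICBS index of one pair of chains: 1000 * hydrogen bonds / contacts."""
    return 1000.0 * n_hbonds / n_contacts

def structure_index(pairs):
    """Highest per-pair index, used to characterize the whole structure.

    `pairs` maps a chain pair to (hydrogen bonds, heavy atom contacts).
    """
    return max(icbs_index(h, c) for h, c in pairs.values())

# Made-up counts for a structure with two pairing chain pairs
pairs = {("A", "B"): (6, 300), ("A", "C"): (2, 400)}
overall = structure_index(pairs)
```

With these made-up counts, pair A-B has an index of 20.0 and pair A-C an index of 5.0, so the structure is assigned 20.0.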

Homogeneity and sense of the ICBS interface

For each pair of chains, we determine two additional characteristics.

Homogeneity

Two chains are considered identical when they have the same number of residues.

When identical chains pair through inter-chain ß-sheets, their interface is considered homogeneous. It is considered heterogeneous when the 2 chains are different.

The overall homogeneity of the interface at the level of the whole structure is derived from the homogeneity of all pairs of chains. See the explanation of the Homogeneity column of the results table.
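A minimal sketch of this classification follows, using the 2-character codes listed for the Homogeneity column of the results table. The function names are hypothetical, and chain identity is reduced to the residue-count comparison described above.

```python
def pair_homogeneity(n_residues_1, n_residues_2):
    """'==' when both chains have the same number of residues, else '!='."""
    return "==" if n_residues_1 == n_residues_2 else "!="

def overall_homogeneity(pair_codes):
    """Combine per-pair codes into the structure-level code."""
    codes = set(pair_codes)
    if not codes:
        raise ValueError("at least one pair of chains is required")
    if codes == {"=="}:
        return "=="  # homogeneous: only identical chains pair
    if codes == {"!="}:
        return "!="  # heterogeneous: only different chains pair
    return "!!"      # mixed: both kinds of pairs occur

# Made-up residue counts for two pairs of chains
codes = [pair_homogeneity(120, 120), pair_homogeneity(120, 95)]
overall = overall_homogeneity(codes)
```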

Orientation

Depending on the parallel or anti-parallel nature of the single or multiple ladders found in the inter-chain ß-sheets, the interface is considered parallel, anti-parallel, or mixed.
See the explanation of the Orientation column of the results table.
See a scanned version of the DSSP paper (page 19 of 21) for a definition of parallel and antiparallel ladders and sheets.
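The orientation rule can be sketched in the same way. The example assumes each ladder has already been classified as parallel (True) or antiparallel (False); the function name is hypothetical and the codes are those listed for the Orientation column.

```python
def orientation_code(ladders_are_parallel):
    """Return '+', '-' or '*' from one boolean per ladder (True = parallel)."""
    kinds = set(ladders_are_parallel)
    if not kinds:
        raise ValueError("at least one ladder is required")
    if kinds == {True}:
        return "+"  # all ladders parallel
    if kinds == {False}:
        return "-"  # all ladders antiparallel
    return "*"      # both kinds present

# Made-up interface with one parallel and one antiparallel ladder
code = orientation_code([True, False])
```

The same function applies to a single pair of chains or, given the ladders of all pairs, to the whole structure.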

Redundancy of the ICBS interface

Some ICBS entries correspond to likely protein quaternary structures, i.e., to PQS entries. They can contain the same ICBS interfaces as the PDB structure they are derived from, or they can contain new, unique ICBS interfaces. The query form lets users choose whether to display unique and/or redundant entries.

A simple but effective redundancy criterion is used. An ICBS entry corresponding to a PQS structure is considered redundant if one of the following conditions is met:

  • it represents only a part of (i.e., has fewer chains than) the PDB structure
  • the ratio between the number of chains and the number of pairs of chains is the same in the PQS and in the PDB structures.
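The two conditions can be expressed as a small predicate. This is a sketch under stated assumptions rather than the database's actual code: "pairs of chains" is taken to mean pairs that interact through inter-chain ß-sheets, and the ratios are compared by cross-multiplication to avoid dividing by zero. All names are hypothetical.

```python
def is_redundant(pqs_chains, pqs_pairs, pdb_chains, pdb_pairs):
    """Apply the redundancy criterion to a PQS-derived ICBS entry."""
    # Condition 1: the PQS structure is only a part of the PDB structure
    if pqs_chains < pdb_chains:
        return True
    # Condition 2: same chains-to-pairs ratio in both structures,
    # i.e. pqs_chains / pqs_pairs == pdb_chains / pdb_pairs
    return pqs_chains * pdb_pairs == pdb_chains * pqs_pairs

# Made-up examples: a PDB dimer with one pairing pair of chains
doubled = is_redundant(pqs_chains=4, pqs_pairs=2, pdb_chains=2, pdb_pairs=1)
new_interfaces = is_redundant(pqs_chains=4, pqs_pairs=4, pdb_chains=2, pdb_pairs=1)
```

In the first made-up case the PQS entry merely repeats the PDB interface twice and is flagged as redundant; in the second, the extra pairs suggest interfaces that are absent from the PDB structure.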

Note on some methodological choices that were prompted by limitations in the format of the source data, or by errors and variations in the data.

Automatically spotting and analyzing inter-chain beta-bridges, ladders and beta-sheets in PDB files representing PDB and PQS entries is a challenging task for a number of reasons:
  • Limitations in PDB and DSSP file formats (which both use fixed-width fields) make it difficult to uniquely identify chains, beta-bridges, ladders, and beta-sheets, and to correctly process large macro-molecules.
  • Not all PDB files match the current PDB format.
  • In submitting PDB entries, authors have used and broken a variety of rules.
  • Large files and variations or errors in file format cause some errors in the output of DSSP.
We have adopted a number of methodological choices, with the purpose of limiting the impacts of such limitations, variations or errors. These choices will change when other formats become officially available for PDB structures and for the corresponding likely macro-molecular structures generated at the EBI. Please refer to Limitations linked to PDB files - Foreseen evolution of the ICBS database for details on the current choice of source data and format and future alternatives.

Unique chain IDs versus non-unique chain labels

PDB and DSSP chain labels are one character long. Upper- and lower-case letters, digits, and non-alphanumerical characters are used, thus providing for at least 62 unique chain identifiers. However, this is not sufficient for some large macro-molecules; in addition, not all authors use non-alphanumerical labels.

Instead of rebuilding unique ICBS chain identifiers and mapping them to non-unique PDB and DSSP labels, we currently ignore chains whose label has previously been used for another chain in the same entry. This only affects the results of some very large PQS macro-molecules. As these entries consist of numerous repetitions of the same PDB asymmetric unit, and of the same interfaces between them, this solution is not likely to cause any unique ICBS interaction to be missed.
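The "ignore reused labels" rule amounts to keeping only the first chain seen for each label. A minimal sketch, assuming chains are supplied in file order as (label, data) tuples (a hypothetical representation):

```python
def keep_first_chain_per_label(chains):
    """Keep the first chain for each label; ignore later chains reusing it."""
    seen = set()
    kept = []
    for label, data in chains:
        if label in seen:
            continue  # label already used by an earlier chain: ignore
        seen.add(label)
        kept.append((label, data))
    return kept

# Made-up entry in which label "A" is reused by a third chain
chains = [("A", 120), ("B", 95), ("A", 120)]
kept = keep_first_chain_per_label(chains)
```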

Unique beta-bridge, ladder and ß-sheet IDs versus non-unique labels

DSSP labels beta-bridges with a letter. Case is used to denote the parallel or anti-parallel nature of the bridge. Only 26 unique identifiers are thus available in each category. ß-sheets are labeled with capital letters. Therefore, one theoretically would have to rebuild the whole connectivity of ß-bridges and ladders to uniquely identify bridges, ladders and ß-sheets.

Instead of adopting this costly and error-prone solution, we created an ICBS ladder label that is very unlikely to be non-unique, by combining the DSSP ladder label, the label of the ß-sheet it belongs to, and the labels of the two pairing chains. We are thus able to count ladders and hydrogen-bonds, and to determine the interface orientation for each pair of chains.
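One way to picture such a composite label is as a tuple used as a grouping key. The sketch below is an assumption about the mechanics, not the actual implementation: the tuple layout, the sorting of the two chain labels, and the sample bridges are all hypothetical.

```python
from collections import Counter

def icbs_ladder_label(ladder, sheet, chain_1, chain_2):
    """Combine DSSP ladder, sheet and chain labels into one grouping key."""
    first, second = sorted((chain_1, chain_2))  # make the pair order-independent
    return (ladder, sheet, first, second)

# Made-up bridges: (DSSP ladder label, sheet label, chain, partner chain)
bridges = [
    ("A", "A", "X", "Y"),
    ("A", "A", "Y", "X"),
    ("b", "A", "X", "Y"),
    ("A", "B", "X", "Z"),
]
# Number of bridges in each ladder, keyed by the composite label
ladder_sizes = Counter(icbs_ladder_label(*b) for b in bridges)
# Number of distinct ladders for each pair of chains
ladders_per_pair = Counter((c1, c2) for (_, _, c1, c2) in ladder_sizes)
```

In this made-up example, the DSSP ladder label "A" occurs in two different sheets, and the composite key keeps the two ladders apart.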

This method might overestimate the number of ladders in the probably rare cases where the same ladder joins more than 2 chains. This would however be of no consequence as far as the data we display in the database interface are concerned.

HETATM PDB rows and ß-bridges that occur in "non-standard" groups

HETATM rows of PDB files specify "non-standard" groups that sometimes contain amino-acid residues. Such residues are sometimes found to participate in ß-bridges with other residues belonging to "standard" chains specified in ATOM rows, or to other "non-standard" groups. Some of these bridges are relevant to the characterization of inter-chain interfaces.

However, because of the numerous variations in the way HETATM rows were used and placed in PDB files, taking into account HETATM rows in the analysis of inter-chain ß-sheets has a number of undesirable effects.

For the time being, we therefore only consider beta-bridges and atom contacts between residues that belong to "standard" chains, i.e., residues whose coordinates are provided in PDB ATOM rows. As a consequence, we might under- or over-estimate the ICBS index for a few ICBS entries, and we might miss a few proteins in which ICBS interactions occur only between residues specified in HETATM rows. Please refer to the PDB documentation for details on ATOM and HETATM PDB records.

Case of very large entries that break DSSP format

In some rare cases, the DSSP sequential number that identifies residues can be higher than 9999. As the 'BP1' and 'BP2' DSSP fields that specify ß-bridge partners are 4 characters long, partners corresponding to a sequential number >= 10000 cause DSSP to output rows that break its own format. There is no 100%-safe way of resolving the problem.

When such cases are encountered, we simply ignore the corresponding DSSP rows. As a consequence, we might occasionally miss some inter-chain ß-sheet interactions.

Case of inconsistent DSSP partner and/or secondary structure assignment

When the output of the DSSP program specifies that a residue R1 'pairs' with another residue R2 through a beta-bridge, one would expect residue R2 to pair with R1. However, this is sometimes not the case, presumably due to precision or secondary-structure assignment problems in the DSSP program.

Given our current method to count 'partnerships' in ladders and inter-chain interfaces (i.e., counting every partnership twice, once for R1, once for R2, and then dividing by 2), this DSSP problem might cause an underestimation of the number of hydrogen bonds and of the ICBS index. To limit the impact of this rare problem, we round any fractional number of partnerships up to the next integer.
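The halving and rounding step can be illustrated as follows. The sketch assumes the DSSP output has been reduced to a list of directed (residue, partner) records, one per reported partnership; the function name and residue identifiers are hypothetical.

```python
def count_partnerships(directed_records):
    """Halve the number of directed records, rounding up.

    Each partnership is normally reported twice (R1 -> R2 and R2 -> R1).
    When one of the two records is missing, the half-integer result is
    rounded up to the next integer.
    """
    n = len(directed_records)
    return (n + 1) // 2  # integer form of ceil(n / 2)

# Made-up records: R1/R2 reported both ways, R3 -> R4 reported only once
records = [("R1", "R2"), ("R2", "R1"), ("R3", "R4")]
n_partnerships = count_partnerships(records)
```

Three directed records give 1.5 partnerships, which is rounded up to 2.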



2. Query form

Query parameters can be specified in the ICBS query form for the following criteria...

PDB header name

Entries containing the entered word or text fragment in their PDB header name will be selected.

PDB ID

Enter or paste one or more space-separated PDB codes. Spaces and new line characters will be ignored. Entries corresponding to the PDB code(s) will be selected.

Entry origin

The ICBS database contains entries corresponding to PDB structures, as well as hypothetical quaternary structures derived from them. One can select entries corresponding to one, to the other, or to both categories.

PQS entries

While some entries corresponding to PQS structures contain unique ICBS interactions that are not present in the corresponding PDB structure, others do not and are thus redundant. The query form lets one select which categories of PQS entries should be displayed: only entries with unique ICBS interactions; only entries with ICBS interactions that are redundant (with respect to the corresponding PDB structure); both.
Please note that this selection criterion does not exclude entries based on protein sequence redundancy; it is for ICBS-interface redundancy.

Ladder length

The ICBS interface between two chains may contain single beta-bridges (ladders of length 1) only. One can display: entries for which at least one pair of chains has an ICBS interface involving ladders of length greater than 1; entries for which only single-bridge ladders occur; both categories.
Please note that the interface between two chains can comprise several single beta-bridges. Please refer to the Methods section for details on secondary structure determination.

Journal

Select one or more abbreviated Journal names. Entries whose PDB structure's primary citation was published in this (these) journal(s) will be selected.
Please see the note on the information extracted from PDB files for an explanation of why there are 'missing' values and/or some discrepancies with what the web interface to the PDB database shows.

Deposition date or revision date

Enter a lower and/or upper limit to set a selection criterion on the deposition date or the revision date of the PDB structure corresponding to the ICBS entries.

To display recent additions to the ICBS database only, one can set a lower limit on the PDB revision date. All ICBS entries (PDB and PQS structures) whose PDB revision date is later than the limit will be displayed. Please note that the release of PQS structures corresponding to new or modified PDB entries might occasionally suffer some delay. In such cases, some PQS structures will appear or be updated with subsequent updates of the ICBS database.

Technique

Select one or more experimental techniques. ICBS entries whose PDB structure was obtained using this (these) technique(s) will be selected.

ICBS index

Enter a minimum and/or maximum ICBS index value(s) to limit the selection to structures whose ICBS index is in a given range.

Homogeneity of the ICBS interface

Select one of the homogeneity codes from the list. Entries whose ICBS interface matches this code will be selected.

ß-sheets orientation

Select one of the orientation codes from the list. Entries whose ICBS interface matches this code will be selected.

Number of chains

Enter a minimum and/or maximum number of protein chains an ICBS entry must have in order to be selected.

Number of residues

Enter a minimum and/or maximum number of residues an ICBS entry must have in order to be selected.

Number of pairs of chains with ICBS

Enter a minimum and/or maximum number of pairs of ICBS chains an entry must have in order to be selected. Note that the number of pairs is not directly displayed in the results table. The table instead shows a list of pairs.

Number of hydrogen bonds

Enter a minimum and/or maximum number of hydrogen bonds. Entries that have at least one pair of chains with a number of hydrogen bonds in the specified range will be selected.

Display options

Queries can return a large number of ICBS entries. Results are thus broken up in a number of display pages. Users can:
  • Modify the maximum number of ICBS entries that will be displayed on each display page;
  • Select a criterion to sort the set of ICBS entries matching the query, in ascending or descending order. Depending on the type of the criterion, the order will be alphabetical or numerical. Once the query is executed, the sorting criterion can be changed from any of the results pages.



3. Results pages


General description


Tabulation of results

The results of a query are presented in a table.

Some columns reproduce pieces of information extracted from PDB files, such as the deposition date of the PDB structure. Note that several ICBS entries can correspond to the same PDB code, and therefore share the same PDB information.

Other columns present global information that characterizes the inter-chain ß-sheet interfaces found in the protein.

Lastly, some columns present detailed information on each pair of chains that interact through ß-sheets.

Navigation from page to page

The display of query results is broken up into a number of pages.
The number of results per page can be adjusted in the query form.
Results pages can be accessed using the controls available in the navigation section of the page.

Context specific help

  • Popup legends can be obtained by positioning the mouse over each title cell.
  • Clicking on the title of a cell brings up a help topic for the corresponding column.

Sorting by column

Controls (sorting buttons) are available in most title cells to sort results according to the current column, in ascending or descending order (alphabetical or numerical, depending on the column).

Running a new query

To run a new query from a results page, click on the 'Query' link at the top of the page.


Note on the information extracted from PDB files

Some pieces of data in the ICBS database are derived from PDB files. The web-interface that the PDB offers to query PDB structures does not rely on these files, but on an improved and expanded data set that is not yet publicly available as an official release.

For some entries, you might therefore notice some discrepancies between what the ICBS database shows and what the PDB query interface returns. For instance, many original PDB files are missing the primary citation record, or have this information placed under a wrong section of the file. Such missing citation problems have been fixed in the improved data set, but not in the PDB files themselves. Typically, the ICBS Journal column will show a 'missing' primary citation for such entries.

While we tried to fix some common errors found in PDB files (e.g., handling the many variations or errors in Journal abbreviations as explained below), fixing problems such as missing primary citation records, and reproducing the results of the PDB query interface, will only be possible once we use improved primary sources.

The pieces of data extracted from PDB files are still considered helpful to filter and sort the query results. Each ICBS entry has a link to the corresponding (improved) PDB entry. Users can thus verify the corresponding PDB information, e.g., check whether a primary citation is actually missing.

The primary citation record, when present in PDB files, often contains errors. Instead of displaying the journal name they contain 'as is', we use the journal abbreviation used by the Journal Citation Reports.


Column specific help

The content of each column of the results table is described below.

PDB header name

This column displays part of the header of the PDB file. Names are usually generic. To obtain more specific information on the compound, please use the link to the corresponding PDB entry.

Clicking on the header name brings up a RASMOL view of the ICBS entry (a PDB structure or a quaternary structure derived from a PDB structure). For help on how to install RASMOL or modify display options, click here.

ICBS ID (ICBS)

This column displays the unique identifier of the entry in the ICBS database. The identifier is:
  • a PDB code when the entry corresponds to a PDB structure
  • a PDB code, followed by an underscore and a number, when the entry corresponds to a hypothetical quaternary structure derived from a PDB structure. A '0' indicates a unique structure; a '1' indicates the first of several possible structures; etc.
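As an illustration of this identifier scheme, the sketch below splits an ICBS ID into its PDB code and optional structure number. The function name and the sample identifiers are hypothetical and are not taken from the database.

```python
def parse_icbs_id(icbs_id):
    """Split an ICBS ID into (PDB code, quaternary structure number or None)."""
    pdb_code, separator, number = icbs_id.partition("_")
    if not separator:
        return pdb_code, None  # entry corresponds to a PDB structure
    return pdb_code, int(number)  # entry derived from a PDB structure

# Made-up identifiers
pdb_entry = parse_icbs_id("1abc")
pqs_entry = parse_icbs_id("1abc_2")
```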

PDB ID (PDB)

This column displays the PDB code corresponding to an ICBS entry.

Clicking on the PDB code brings up the corresponding PDB entry in a new window.

PQS ID (PQS)

This column displays the PQS code of ICBS entries.

Clicking on the PQS code brings up the corresponding PQS entry in a new window.

Journal

This column displays the abbreviated name of the journal corresponding to the primary citation for the PDB structure of the ICBS entry.

Please see the note on the information extracted from PDB files for an explanation of why there are 'missing' values and/or some discrepancies with what the web-interface to the PDB database shows.

Deposition date (Dep. Date)

This column displays the deposition date of the PDB structure corresponding to the ICBS entry, in yyyy-mm-dd format.

Technique

This column displays the experimental technique that was used to determine the coordinates of the PDB structure corresponding to the ICBS entry.

ICBS Index (Index)

This column displays the index value that characterizes the overall 'strength' of the inter-chain ß-sheet interactions in the protein structure. Index values are computed for every pair of chains where inter-chain ß-sheets occur. The maximum value over all pairs is retained to characterize the overall strength.

The higher the value, the higher the importance of the inter-chain ß-sheets in the interface between chains.

For details on the computation of the ICBS index, see Characterization of the strength of inter-chain ß-sheets: ICBS index.

Homogeneity of the ICBS interface (Hom.)

This column displays the overall 'homogeneity' of the ICBS interactions within a structure.
The interface is considered:
  • homogeneous when all pairs of chains that interact through inter-chain ß-sheets contain identical chains
  • heterogeneous when all pairs contain chains that are different
  • mixed when both homogeneous and heterogeneous pairs are found.
The following 2-character codes are used for the homogeneity:
  • == homogeneous: only chains that are identical pair.
  • != heterogeneous: only chains that are different pair.
  • !! mixed: both identical and different chains pair within the structure.

See Homogeneity and sense of the ICBS interface for an explanation of the simple identity criterion retained here.

ß-sheet orientation (Sense)

This column shows the overall orientation of the ICBS interface. The following codes are used:
  • +     parallel: all the ladders between all pairing chains are parallel
  • -     antiparallel: all the ladders between all pairing chains are antiparallel
  • *     mixed: there are both parallel and antiparallel ladders between chains.

Total number of protein chains (Chains)

This column displays the total number of protein chains in the structure.

Total number of residues (Res.)

This column displays the total number of residues that are found in standard protein chains. Residues belonging to "non-standard" groups (i.e., corresponding to HETATM rows in the PDB file) are not counted. Consequently, the number displayed in this column does not always correspond to the number of residues displayed in the PDB query interface and/or the corresponding PDB files and/or the corresponding DSSP files.

Please see the note on the information extracted from PDB files for an explanation of other possible discrepancies between ICBS columns and what the web-interface to the PDB database displays.

Pairing chains (Pairs)

This column shows both the name of the protein chains that pair through inter-chain ß-sheets, and the 'homogeneity' of the pair, i.e., whether the chains are identical or not.

Chains are represented by their PDB or PQS one-letter code, as specified in the coordinate file. Note that when a quaternary structure corresponds to the assembly of several copies of a PDB structure, the chains are renamed in the coordinate file so that each chain is uniquely identified.

The homogeneity of a pair is displayed as a 2-character code inserted between the chain identifiers.

The homogeneity codes are as follows:

  • == homogeneous: the two chains in the pair are identical
  • != heterogeneous: the two chains in the pair are different

See Homogeneity and sense of the ICBS interface for an explanation of the simple identity criterion retained here.

Use the sorting buttons below the column title to sort the entries according to their number of pairing chains.

Per pair ICBS index (Index)

This column displays the value of the ICBS index for each pair of chains that interact through ß-sheets.

Per pair number of hydrogen bonds (HB)

This column shows the number of hydrogen bonds in the ICBS interface, for each pair of chains.

Use the sorting buttons below the column title to sort the entries according to their maximum number of hydrogen bonds over all chain pairs.

Per pair number of heavy atom contacts (Cont.)

This column shows the number of heavy-atom contacts in the ICBS interface, for each pair of chains.

Use the sorting buttons below the column title to sort the entries according to their maximum number of heavy-atom contacts over all chain pairs.

Per pair ß-sheet orientation (Sense)

This column shows the orientation of the ICBS interface for each pair of chains. The following codes are used:
  • +     parallel: all the ladders between the 2 chains are parallel
  • -     antiparallel: all the ladders between the 2 chains are antiparallel
  • *     mixed: there are both parallel and antiparallel ladders between the 2 chains
