Global alignment of protein interaction networks by graph matching methods

Mikhail Zaslavskiy, Francis Bach and Jean-Philippe Vert

This site presents supplementary materials for the article "Global alignment of protein interaction networks by graph matching methods". It aims to propose a detailed description of the Results section presented in the paper.

Source data

In our experiments we used the data from http://www.cellcircuits.org/Bandyopadhyay2006/:
  1. Fly protein interaction network fly.intr with 7038 proteins (vertices), 20720 interactions (edges).
  2. Yeast protein interaction network yeast.intr with 4391 proteins (vertices), 14319 interactions (edges).
  3. Fly-Yeast InParanoid clusters table.SC-DM.highconfidence, 2244 clusters covering 3837 Fly and 2837 Yeast proteins.
Fly PPI network Yeast PPI network
Fly PPI network Yeast PPI network
Up to date interaction data may be retrieved from the Database of Interacting Proteins server, InParanoid clustering software are available at http://inparanoid.cgb.ki.se/. We use the same date that was used by S.Bandyopadhyay and al. in order to be able to compare results of different algorithms.

Methods


Results.

Section 5.1 Disambiguation of functional orthologs within Inparanoid clusters

Table 1
Algorithm #cons.interactions matching matching id
Message Passing 238 mp_match.list mp_match.id
MRF 233 mrf_match.list mrf_match.id
IsoRank 228 rank_match.list rank_match.id
Gradient Ascent 238 ga_match.list ga_match.id
PATH 238 path_match.list path_match.id
To check the number of conserved interactions produced by different matchings, you should download PPI network adjacency matrices fly_inp.adj and yeast_inp.ad, a matching of interest (with ids, the last column) and use the following script (octave, matlab, scilab)
f=load('fly_inp.adj');y=load('yeast_inp.adj');                        
G=zeros(max(f(:)));H=zeros(max(y(:)));
G(sub2ind(size(G),f(:,1),f(:,2)))=f(:,3);H(sub2ind(size(H),y(:,1),y(:,2)))=y(:,3);
T=load('mp_match.id');
num_cons_int=sum(sum(G(T(:,1),T(:,1)).*H(T(:,2),T(:,2))>0.25))/2 

Section 5.2 Disambiguation of Inparanoid clusters with second-order interactions

Table 2
Algorithm #cons.interactions matching matching id
MRF 1112 mrf_match_so.list mrf_match_so.id
IsoRank 1101 rank_match_so.list rank_match_so.id
Gradient Ascent 1140 ga_match_so.list ga_match_so.id
PATH 1143 path_match_so.list path_match_so.id
To check the number of conserved interactions produced by different matchings, you can use the following script (octave, matlab, scilab)
f=load('fly_inp.adj');y=load('yeast_inp.adj');                        
G=zeros(max(f(:)));H=zeros(max(y(:)));
G(sub2ind(size(G),f(:,1),f(:,2)))=f(:,3);H(sub2ind(size(H),y(:,1),y(:,2)))=y(:,3);
T=load('path_match.id');
num_cons_int=sum(sum(G(T(:,1),T(:,1)).*H(T(:,2),T(:,2))>0))/2 

Running algorithms

Here we show how results from previous section may be reproduced

Data preprocessing

Section 5.1
InParanoid clusters table.SC-DM.highconfidence provide information on protein pairs which may be assigned as functional orthologs. As it was mentioned already by S.Bandyopadhyay and al. in ``Systematic identification ...", the majority of Inparanoid clusters are trivial, consisting of two proteins (one of Fly, and one of Yeast), where the choice of functional orthologs is clear. Information on protein-protein interactions may be useful in ambiguous clusters with more than two proteins where some protein pairs may produce a conserved interaction with other pairs of proteins. The graph on the figure below represents the system of InParanoid clusters
InParanoid clusters
InParanoid clusters
where each vertex corresponds to one cluster and there is an edge between two clusters C1 and C2 if there exists two pairs of proteins (f1,y1) in C1 and (f2,y2) in C2 such that one of the following conditions is true
  1. f1 interacts to f2, and y1 interacts to y2 in Fly and Yeast PPI networks correspondingly
  2. f1 interacts to f2, and y1 has a common neighbor with y2 in Fly and Yeast PPI networks correspondingly
  3. f1 has a common neighbor with f2, and y1 interacts to y2 in Fly and Yeast PPI networks correspondingly
Since we are interested only in clusters which are not trivial and which are not isolated (otherwise the information on PPI is useless), we keep only 121 InParanoid clusters.
Ambiguous InParanoid clusters
Ambiguous InParanoid clusters
Because we trace only ambiguous clusters, some clusters on the figure above are isolated, it means that they are connected to some trivial clusters and information on PPI may be useful for ortholog assignment in these clusters as well. We can work directly with initial PPI networks, or we can consider only proteins which are interesting for ortholog assignment in ambiguous clusters of interest. After filtering out proteins which are not important, there stay 356 Fly proteins and 256 Yeast proteins. At the end of the preprocessing step we obtain:
  1. ind_fly_sel (ids of fly selected proteins),
  2. ind_yeast_sel (idem for yeast),
  3. c_inp_sel.sim (matrix encoding Inparanoid clusters on selected proteins),
  4. fly_inp_sel.adj (ppi network of fly selected proteins),
  5. yeast_inp_sel.adj (idem for yeast).


Section 5.2
In this case we expand the definition of the conserved interaction by adding the forth condition.
  1. f1 has a common neighbor with f2, and y1 has a common neighbor with y2 in Fly and Yeast PPI networks correspondingly.
Because of the generalized definition, the graph of InParanoid clusters becomes much more complex (see figure below)
Ambiguous InParanoid clusters (second-order interactions)
Ambiguous InParanoid clusters
Now, there are 602 InParanoid clusters where information on PPI may be useful. After filtering out unimportant proteins there stay
  1. ind_fly_sel_so (ids of 984 fly selected proteins),
  2. ind_yeast_sel_so (ids of 736 yeast selected proteins),
  3. c_inp_so_sel.sim (matrix encoding Inparanoid clusters on selected proteins),
  4. fly_inp_so_sel.adj (ppi network of fly selected proteins),
  5. yeast_inp_so_sel.adj (idem for yeast).

Examples of scripts

To run the Message Passing Algorithm in matlab, you need to download (or create yourself) the following files: ind_fly_sel (Ids of Fly selected proteins), ind_yeast_sel (idem for Yeast), fly_inp_sel.adj (PPI network of Fly selected proteins), yeast_inp_sel.adj (idem for Yeast),c_inp_sel.sim (matrix encoding InParanoid clusters on selected proteins), trivial protein matchings ind_simple_match, mp_algo.m(matlab script implementing Message Passing). Then you just run the following code in matlab
%Message passing
T=mp_algo('fly_inp_sel.adj','yeast_inp_sel.adj','c_inp_sel.sim','ind_fly_sel','ind_yeast_sel','ind_simple_match');
%number of conserved interactions
f=load('fly_inp.adj');y=load('yeast_inp.adj');                        
G=zeros(max(f(:)));H=zeros(max(y(:)));
G(sub2ind(size(G),f(:,1),f(:,2)))=f(:,3);H(sub2ind(size(H),y(:,1),y(:,2)))=y(:,3);
T=load('path_match.id');
num_cons_int=sum(sum(G(T(:,1),T(:,1)).*H(T(:,2),T(:,2))>0.25))/2 
The GA,PATH and IsoRank algorithms are implemented in the GraphM package. Here you have to provide PPI network adjacency matrices and matrix C encoding InParanoid clusters. Consider, for example, the case of second-order interactions (see Section 5.2 in the article). First, again, you download (or create yourself) ind_fly_sel_so (ids of 984 fly selected proteins), ind_yeast_sel_so (ids of 736 yeast selected proteins),c_inp_so_sel.sim (matrix encoding Inparanoid clusters on selected proteins),fly_inp_so_sel.adj (ppi network of fly selected proteins), yeast_inp_so_sel.adj (idem for yeast), trivial protein matchings ind_simple_match, and configuration file config.txt. To run, for instance, the GA algorithm
./graphm config.txt "graph_1=fly_inp_so_sel.adj s;graph_2=yeast_inp_so_sel.adj s;C_matrix=c_inp_so_sel.sim s;algo=GA s;exp_out_file=ga_out.txt s;" 
Here GA may be substituted by PATH, IsoRANK or any other graph matching algorithm from the GraphM package, for more information on algorithm parameters, see documentation of GraphM. To count the number of conserved interactions, you can use graphm_nci.m (matlab script)
  
n=graphm_nci('ga_out.txt','fly_inp.adj','yeast_inp.adj','ind_fly_sel_so','ind_yeast_sel_so','ind_simple_match');