Introduction

This notebook shows how to retrieve the neighbours of a given protein in pypath and how to find the shortest paths between two given nodes. Both operations can be used to extract a smaller network that connects a set of nodes of interest.

Analysis

Preliminaries

In [1]:
# Show all the plots inside the notebook
%matplotlib inline
In [2]:
# load packages
import pypath
import igraph  # import igraph to use the plot function

import numpy as np
import pandas as pd
import seaborn as sns
In [3]:
# Load the ipython display and image module
from IPython.display import Image
from IPython.display import display
In [4]:
pa = pypath.main.PyPath()

	=== d i s c l a i m e r ===

	All data coming with this module
	either as redistributed copy or downloaded using the
	programmatic interfaces included in the present module
	are available under public domain, are free to use at
	least for academic research or education purposes.
	Please be aware of the licences of all the datasets
	you use in your analysis, and please give appropriate
	credits for the original sources when you publish your
	results. To find out more about data sources please
	look at `pypath.descriptions` and
	`pypath.data_formats.urls`.

	» New session started,
	session ID: 'b9u8a'
	logfile:'./log/b9u8a.log'.
In [5]:
pa.init_network()
	:: Loading data from cache previously downloaded from www.uniprot.org
	:: Ready. Resulted `plain text` of type file object. 
	:: Local file at `/Users/admin/Documents/ltobalina/COMBINE/Projects/PrECISE/Code/Omnipath/cache/ec920965677ac83b8805d72853c79d45-`.
	:: Loading data from cache previously downloaded from www.uniprot.org
	:: Ready. Resulted `plain text` of type file object. 
	:: Local file at `/Users/admin/Documents/ltobalina/COMBINE/Projects/PrECISE/Code/Omnipath/cache/ec920965677ac83b8805d72853c79d45-`.
	:: Loading data from cache previously downloaded from ftp.uniprot.org
	:: Ready. Resulted `gz extracted data` of type file object. 
	:: Local file at `/Users/admin/Documents/ltobalina/COMBINE/Projects/PrECISE/Code/Omnipath/cache/079410d8e3f429e7699c167b9d4ef3b7-HUMAN_9606_idmapping.dat.gz`.
	:: Processing ID conversion list: finished, 100.0%
 » NetPath
	:: Reading from cache: cache/netpath.edges.pickle
	:: Processing nodes: finished, 100.0%
	:: Processing edges: finished, 100.0%
	:: Processing attributes: finished, 100.0%
WARNING:pypath.logn:### WARNING ###### Vertex attribute `exp` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative_refs` has only None values
 » DOMINO
	:: Reading from cache: cache/domino.edges.pickle
	:: Loading 'genesymbol' to 'uniprot' mapping table
	:: Processing nodes: finished, 100.0%
	:: Processing edges: finished, 100.0%
	:: Processing attributes: finished, 100.0%
WARNING:pypath.logn:### WARNING ###### Vertex attribute `exp` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative_refs` has only None values
 » AlzPathway
	:: Reading from cache: cache/alzpathway.edges.pickle
	:: Processing nodes: finished, 100.0%
	:: Processing edges: finished, 100.0%
	:: Processing attributes: finished, 100.0%
WARNING:pypath.logn:### WARNING ###### Vertex attribute `name` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `label` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `exp` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative_refs` has only None values
 » CancerCellMap
	:: Reading from cache: cache/cancercellmap.edges.pickle
	:: Processing nodes: finished, 100.0%
	:: Processing edges: finished, 100.0%
	:: Processing attributes: finished, 100.0%
WARNING:pypath.logn:### WARNING ###### Vertex attribute `name` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `exp` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative_refs` has only None values
 » ARN
	:: Reading from cache: cache/arn.edges.pickle
	:: Processing nodes: finished, 100.0%
	:: Processing edges: finished, 100.0%
	:: Processing attributes: finished, 100.0%
WARNING:pypath.logn:### WARNING ###### Vertex attribute `name` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `label` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `exp` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative_refs` has only None values
 » DeathDomain
	:: Reading from cache: cache/deathdomain.edges.pickle
	:: Processing nodes: finished, 100.0%
	:: Processing edges: finished, 100.0%
	:: Processing attributes: finished, 100.0%
WARNING:pypath.logn:### WARNING ###### Vertex attribute `name` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `exp` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative_refs` has only None values
 » ELM
	:: Reading from cache: cache/elm.edges.pickle
	:: Processing nodes: finished, 100.0%
	:: Processing edges: finished, 100.0%
	:: Processing attributes: finished, 100.0%
WARNING:pypath.logn:### WARNING ###### Vertex attribute `name` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `exp` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative_refs` has only None values
 » CA1
	:: Reading from cache: cache/ca1.edges.pickle
	:: Processing nodes: finished, 100.0%
	:: Processing edges: finished, 100.0%
	:: Processing attributes: finished, 100.0%
WARNING:pypath.logn:### WARNING ###### Vertex attribute `name` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `label` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `exp` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative_refs` has only None values
 » DEPOD
	:: Reading from cache: cache/depod.edges.pickle
	:: Processing nodes: finished, 100.0%
	:: Processing edges: finished, 100.0%
	:: Processing attributes: finished, 100.0%
WARNING:pypath.logn:### WARNING ###### Vertex attribute `name` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `label` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `exp` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative_refs` has only None values
 » TRIP
	:: Reading from cache: cache/trip.edges.pickle
	:: Processing nodes: finished, 100.0%
	:: Processing edges: finished, 100.0%
	:: Processing attributes: finished, 100.0%
WARNING:pypath.logn:### WARNING ###### Vertex attribute `name` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `label` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `exp` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative_refs` has only None values
 » HPRD
	:: Reading from cache: cache/hprd.edges.pickle
	:: Processing nodes: finished, 100.0%
	:: Processing edges: finished, 100.0%
	:: Processing attributes: finished, 100.0%
WARNING:pypath.logn:### WARNING ###### Vertex attribute `name` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `label` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `exp` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `hprd_mechanism` has multiple types of values: str, list
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative_refs` has only None values
 » SPIKE
	:: Reading from cache: cache/spike.edges.pickle
	:: Processing nodes: finished, 100.0%
	:: Processing edges: finished, 100.0%
	:: Processing attributes: finished, 100.0%
WARNING:pypath.logn:### WARNING ###### Vertex attribute `name` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `label` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `exp` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative_refs` has only None values
 » LMPID
	:: Reading from cache: cache/lmpid.edges.pickle
	:: Processing nodes: finished, 100.0%
	:: Processing edges: finished, 100.0%
	:: Processing attributes: finished, 100.0%
WARNING:pypath.logn:### WARNING ###### Vertex attribute `name` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `label` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `exp` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative_refs` has only None values
 » dbPTM
	:: Reading from cache: cache/dbptm.edges.pickle
	:: Processing nodes: finished, 100.0%
	:: Processing edges: finished, 100.0%
	:: Processing attributes: finished, 100.0%
WARNING:pypath.logn:### WARNING ###### Vertex attribute `name` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `label` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `exp` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative_refs` has only None values
 » SignaLink3
	:: Reading from cache: cache/signalink3.edges.pickle
	:: Processing nodes: finished, 100.0%
	:: Processing edges: finished, 100.0%
	:: Processing attributes: finished, 100.0%
WARNING:pypath.logn:### WARNING ###### Vertex attribute `name` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `label` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `exp` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `netbiol_is_direct` has multiple types of values: str, list
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative_refs` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `netbiol_effect` has multiple types of values: str, list, unicode
WARNING:pypath.logn:### WARNING ###### Edge attribute `netbiol_mechanism` has multiple types of values: str, list
 » MatrixDB
	:: Reading from cache: cache/matrixdb.edges.pickle
	:: Processing nodes: finished, 100.0%
	:: Processing edges: finished, 100.0%
	:: Processing attributes: finished, 100.0%
WARNING:pypath.logn:### WARNING ###### Vertex attribute `name` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `label` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `exp` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative_refs` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `netbiol_effect` has multiple types of values: list, unicode
 » InnateDB
	:: Reading from cache: cache/innatedb.edges.pickle
	:: Processing nodes: finished, 100.0%
	:: Processing edges: finished, 100.0%
	:: Processing attributes: finished, 100.0%
WARNING:pypath.logn:### WARNING ###### Vertex attribute `name` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `label` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `exp` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative_refs` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `netbiol_effect` has multiple types of values: list, unicode
 » MPPI
	:: Reading from cache: cache/mppi.edges.pickle
	:: Processing nodes: finished, 100.0%
	:: Processing edges: finished, 100.0%
	:: Processing attributes: finished, 100.0%
WARNING:pypath.logn:### WARNING ###### Vertex attribute `name` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `label` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `exp` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative_refs` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `netbiol_effect` has multiple types of values: list, unicode
 » NRF2ome
	:: Reading from cache: cache/nrf2ome.edges.pickle
	:: Processing nodes: finished, 100.0%
	:: Processing edges: finished, 100.0%
	:: Processing attributes: finished, 100.0%
WARNING:pypath.logn:### WARNING ###### Vertex attribute `name` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `label` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `exp` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative_refs` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `netbiol_effect` has multiple types of values: list, unicode
 » Signor
	:: Reading from cache: cache/signor.edges.pickle
	:: Processing nodes: finished, 100.0%
	:: Processing edges: finished, 100.0%
	:: Processing attributes: finished, 100.0%
WARNING:pypath.logn:### WARNING ###### Vertex attribute `name` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `label` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `exp` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative_refs` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `netbiol_effect` has multiple types of values: list, unicode
 » Macrophage
	:: Reading from cache: cache/macrophage.edges.pickle
	:: Processing nodes: finished, 100.0%
	:: Processing edges: finished, 100.0%
	:: Processing attributes: finished, 100.0%
WARNING:pypath.logn:### WARNING ###### Vertex attribute `name` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `label` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `exp` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative_refs` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `netbiol_effect` has multiple types of values: list, unicode
 » PDZBase
	:: Reading from cache: cache/pdzbase.edges.pickle
	:: Processing nodes: finished, 100.0%
	:: Processing edges: finished, 100.0%
	:: Processing attributes: finished, 100.0%
WARNING:pypath.logn:### WARNING ###### Vertex attribute `name` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `label` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `exp` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative_refs` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `netbiol_effect` has multiple types of values: list, unicode
 » PhosphoSite
	:: Reading from cache: cache/phosphosite.edges.pickle
	:: Processing nodes: finished, 100.0%
	:: Processing edges: finished, 100.0%
	:: Processing attributes: finished, 100.0%
WARNING:pypath.logn:### WARNING ###### Vertex attribute `name` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `label` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `exp` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative_refs` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `netbiol_effect` has multiple types of values: list, unicode
 » BioGRID
	:: Reading from cache: cache/biogrid.edges.pickle
	:: Processing nodes: finished, 100.0%
	:: Processing edges: finished, 100.0%
	:: Processing attributes: finished, 100.0%
WARNING:pypath.logn:### WARNING ###### Vertex attribute `name` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `label` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `exp` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative_refs` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `netbiol_effect` has multiple types of values: list, unicode
 » Guide2Pharma
	:: Reading from cache: cache/guide2pharma.edges.pickle
	:: Processing nodes: finished, 100.0%
	:: Processing edges: finished, 100.0%
	:: Processing attributes: finished, 100.0%
WARNING:pypath.logn:### WARNING ###### Vertex attribute `name` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `label` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `exp` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative_refs` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `netbiol_effect` has multiple types of values: list, unicode
 » DIP
	:: Reading from cache: cache/dip.edges.pickle
	:: Processing nodes: finished, 100.0%
	:: Processing edges: finished, 100.0%
	:: Processing attributes: finished, 100.0%
WARNING:pypath.logn:### WARNING ###### Vertex attribute `name` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `label` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `exp` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative_refs` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `netbiol_effect` has multiple types of values: list, unicode
 » phosphoELM
	:: Reading from cache: cache/phosphoelm.edges.pickle
	:: Processing nodes: finished, 100.0%
	:: Processing edges: finished, 100.0%
	:: Processing attributes: finished, 100.0%
WARNING:pypath.logn:### WARNING ###### Vertex attribute `name` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `label` has multiple types of values: str, unicode
WARNING:pypath.logn:### WARNING ###### Vertex attribute `exp` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `negative_refs` has only None values
WARNING:pypath.logn:### WARNING ###### Edge attribute `netbiol_effect` has multiple types of values: list, unicode
 :: Comparing with reference lists... done.

 » 29949 interactions between 7476 nodes
 from 27 resources have been loaded,
 for details see the log: ./log/b9u8a.log
In [6]:
# remove links reported in papers with more than 50 interactions (by default)
pa.remove_htp()
	:: Interactions from only high-throughput resources have been removed.
	   4009 interactions removed.
	   Number of edges decreased from 29949 to 25940, number of vertices from 7476 to 6820.
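
The filtering step above can be illustrated with a small, library-free sketch. It assumes, based only on the comment and the log message, that a reference counts as high-throughput when it reports more than a threshold number of interactions, and that an edge is dropped only when all of its references are high-throughput. The edge list, the reference names and the function `remove_high_throughput` are hypothetical and are not part of pypath.

```python
from collections import Counter

# Toy edge list: (source, target, set of references). All names are made up.
edges = [
    ("A", "B", {"ref_small"}),
    ("B", "C", {"ref_big"}),
    ("C", "D", {"ref_big", "ref_small"}),
    ("D", "E", {"ref_big"}),
    ("E", "A", {"ref_big"}),
]


def remove_high_throughput(edges, threshold=50):
    """Drop edges supported only by references reporting more than `threshold` edges."""
    # Count how many edges each reference supports.
    edges_per_ref = Counter(ref for _, _, refs in edges for ref in refs)
    htp_refs = {ref for ref, n in edges_per_ref.items() if n > threshold}
    # Keep an edge if at least one of its references is not high-throughput.
    return [edge for edge in edges if edge[2] - htp_refs]


# A low threshold is used here only because the toy data set is tiny.
kept = remove_high_throughput(edges, threshold=3)
print("{} interactions removed".format(len(edges) - len(kept)))
```

In this sketch the edge C-D survives because one of its two references is low-throughput, which is the behaviour the log message "from only high-throughput resources" suggests.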

After loading OmniPath, we prepare the list of proteins we want to query. The following are the five most frequently mutated genes in prostate cancer.

In [7]:
query_nodes = set(['PTEN', 'FOXA1', 'TP53', 'SPOP', 'AR'])

Neighbourhood

In [8]:
for igene in query_nodes:
    # to query a node based on the value of an attribute we can use the igraph find() method
    #prot = pa.graph.vs.find(label=igene)['name']
    # if the attribute is the vertex label (genesymbol) we can use pypath's genesymbol() function
    prot = pa.genesymbol(igene)['name']
    #neighbours_of_prot = pa.first_neighbours(prot)
    neighbours_of_prot = list(pa.gs_neighbors(igene).gs())
    print('{} ({}) has {} neighbours:'.format(igene, prot, len(neighbours_of_prot)))
    if len(neighbours_of_prot) <= 10:
        print(neighbours_of_prot)
    else:
        print('(showing only 10 proteins)')
        print(neighbours_of_prot[0:10])
    print('---')
FOXA1 (P55317) has 4 neighbours:
['AR', 'TLE1', 'NFIX', 'NFIB']
---
PTEN (P60484) has 50 neighbours:
(showing only 10 proteins)
['AKT1', 'RELA', 'CDC42', 'EGR1', 'PPP1CA', 'CREB1', 'RAC1', 'ROCK1', 'CTNNB1', 'AR']
---
SPOP (O43791) has 4 neighbours:
['TRAF6', 'H2AFY', 'DAXX', 'CUL3']
---
AR (P10275) has 245 neighbours:
(showing only 10 proteins)
['AKT1', 'KDM3A', 'GTF2H3', 'GTF2H2', 'B3KNJ3', 'BAG1', 'AHR', 'CDK5', 'SVIL', 'IARS']
---
TP53 (P04637) has 271 neighbours:
(showing only 10 proteins)
['HDAC2', 'MAML1', 'CDK5', 'KDM1A', 'XPO1', 'BAD', 'DDX5', 'SIRT1', 'CCNA2', 'RELA']
---
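
As a rough illustration of what a first-neighbour query does, the sketch below builds an adjacency mapping from an edge list and looks up the neighbours of a gene. The edges are taken from the FOXA1 and SPOP neighbour lists printed above; the helper `build_adjacency` is a hypothetical name and this is not how pypath stores its network.

```python
from collections import defaultdict

# Undirected edges taken from the FOXA1 and SPOP outputs above.
edges = [
    ("FOXA1", "AR"), ("FOXA1", "TLE1"), ("FOXA1", "NFIX"), ("FOXA1", "NFIB"),
    ("SPOP", "TRAF6"), ("SPOP", "H2AFY"), ("SPOP", "DAXX"), ("SPOP", "CUL3"),
]


def build_adjacency(edges):
    """Map each node to the set of nodes it shares an edge with."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


adjacency = build_adjacency(edges)
for gene in ("FOXA1", "SPOP"):
    neighbours = sorted(adjacency[gene])
    print("{} has {} neighbours: {}".format(gene, len(neighbours), neighbours))
```
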
In [9]:
# modify some of the visual style settings for the igraph plotting function
visual_style = {'bbox': (300, 300),
               'margin': 50}
In [10]:
# get neighbourhood graphs for each of the query nodes
subgraph = {}
for igene in query_nodes:
    subgraph[igene] = pa.neighbourhood_network(pa.genesymbol(igene)['name'])
    igraph.plot(subgraph[igene], layout=subgraph[igene].layout_auto(), **visual_style)
In [11]:
# plot neighbourhood of SPOP
igene = 'SPOP'
print(subgraph[igene].vs['label'])
plot2 = igraph.plot(subgraph[igene], layout=subgraph[igene].layout_auto(), **visual_style)
plot2.save('neighbourhood_SPOP.png')
display(Image('neighbourhood_SPOP.png'))
['TRAF6', 'H2AFY', 'DAXX', 'SPOP', 'CUL3']
In [12]:
# for some reason, the node labels are not always correctly displayed inside IPython notebook
# however, they appear correctly if printed to a file
igene = 'FOXA1'
print(subgraph[igene].vs['label'])
plot2 = igraph.plot(subgraph[igene], layout=subgraph[igene].layout_auto(), **visual_style)
plot2.save('neighbourhood_FOXA1.png')
display(Image('neighbourhood_FOXA1.png'))
['AR', 'FOXA1', 'TLE1', 'NFIX', 'NFIB']
In [13]:
# the *.pdf file generated after executing this line contains the graph with the correct labels
#igraph.plot(subgraph[igene], 'FOXA1_neighbourhood.pdf', layout=subgraph[igene].layout_auto(), **visual_style)

Shortest path

In [14]:
# find shortest path between SPOP and FOXA1
path = pa.graph.get_shortest_paths(pa.genesymbol('SPOP')['name'], to=pa.genesymbol('FOXA1')['name'])
# get_shortest_paths() returns one path per target vertex; with a single target the list has one element
path = path[0]
In [15]:
path_SPOP_to_FOXA1_length = len(path)-1
print('The path from SPOP to FOXA1 has {} steps:'.format(path_SPOP_to_FOXA1_length))
print('\t' + ' --> '.join(pa.graph.vs[i]['label'] for i in path))
The path from SPOP to FOXA1 has 4 steps:
	SPOP --> TRAF6 --> AKT1 --> AR --> FOXA1
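
For readers unfamiliar with how a shortest path is found in an unweighted graph, here is a minimal breadth-first search sketch. The adjacency mapping contains the edges of the path printed above plus two SPOP and FOXA1 neighbours from the earlier output; the function `shortest_path` is a hypothetical helper, not the igraph implementation.

```python
from collections import deque

# Small undirected graph built from edges that appear in the outputs above.
adjacency = {
    "SPOP": ["TRAF6", "CUL3"],
    "CUL3": ["SPOP"],
    "TRAF6": ["SPOP", "AKT1"],
    "AKT1": ["TRAF6", "AR"],
    "AR": ["AKT1", "FOXA1"],
    "FOXA1": ["AR", "TLE1"],
    "TLE1": ["FOXA1"],
}


def shortest_path(adjacency, start, end):
    """Return one shortest path from start to end, or [] if end is unreachable."""
    previous = {start: None}  # also serves as the set of visited nodes
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return []


path = shortest_path(adjacency, "SPOP", "FOXA1")
print("The path has {} steps:".format(len(path) - 1))
print(" --> ".join(path))
```

Returning an empty list for an unreachable target mirrors the `len(path)>0` checks used later in this notebook.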
In [16]:
# find shortest path between FOXA1 and SPOP
path = pa.graph.get_shortest_paths(pa.genesymbol('FOXA1')['name'], to=pa.genesymbol('SPOP')['name'])
In [17]:
# alternative way of showing path members
for i in path[0]:
    print(pa.graph.vs[i]['label'])
FOXA1
AR
AKT1
TRAF6
SPOP
In [18]:
# find all paths between SPOP and FOXA1 (of length equal to the shortest path length)

# to find the index based on the value of an attribute we can use igraph's select() function
#node_start = pa.graph.vs.select(label='SPOP').indices[0]
#node_end = pa.graph.vs.select(label='FOXA1').indices[0]
# or, if the attribute is the gene symbol, we can use pypath's genesymbol() function
node_start = pa.genesymbol('SPOP').index
node_end = pa.genesymbol('FOXA1').index

paths = pa.find_all_paths(start=node_start, end=node_end, maxlen=path_SPOP_to_FOXA1_length)
print('Number of paths: {}'.format(len(paths)))
	:: Looking up all paths up to length 4: finished, 100.0%
Number of paths: 20
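
Enumerating every path up to a maximum length can be sketched as a depth-first search that never revisits a node. The graph below is a made-up toy example, `all_simple_paths` is a hypothetical helper, and the sketch assumes that `maxlen` counts edges (steps); whether pypath's `find_all_paths` uses exactly the same convention is an assumption.

```python
# Toy undirected graph: two routes of 2 steps and one route of 3 steps from S to T.
adjacency = {
    "S": ["A", "B", "C"],
    "A": ["S", "T"],
    "B": ["S", "T"],
    "C": ["S", "D"],
    "D": ["C", "T"],
    "T": ["A", "B", "D"],
}


def all_simple_paths(adjacency, start, end, maxlen):
    """List all paths from start to end that use at most `maxlen` edges."""
    paths = []

    def extend(path):
        node = path[-1]
        if node == end:
            paths.append(list(path))
            return
        if len(path) - 1 >= maxlen:
            return  # the step budget is used up
        for neighbour in adjacency[node]:
            if neighbour not in path:  # keep the path simple (no repeated nodes)
                path.append(neighbour)
                extend(path)
                path.pop()

    extend([start])
    return paths


paths_len2 = all_simple_paths(adjacency, "S", "T", maxlen=2)
paths_len3 = all_simple_paths(adjacency, "S", "T", maxlen=3)
print("Number of paths (maxlen=2): {}".format(len(paths_len2)))
print("Number of paths (maxlen=3): {}".format(len(paths_len3)))

# Collect the distinct nodes that appear on any of the paths.
nodes = set().union(*paths_len3)
print("Number of nodes: {}".format(len(nodes)))
```

Raising `maxlen` by one step already adds a route in this toy graph; in a dense network the number of paths grows much faster.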
In [19]:
p = [item for sublist in paths for item in sublist]
p = set(p)
print('Number of nodes: {}'.format(len(p)))
Number of nodes: 25
In [20]:
# extract the subgraph induced by the nodes that appear on these paths
connection_graph = pa.graph.induced_subgraph(p)
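
An induced subgraph keeps every edge whose two endpoints are both in the selected node set, so it can contain edges that were not part of the paths used to collect the nodes. The sketch below shows this on a made-up edge list; `induced_edges` is a hypothetical helper and not an igraph function.

```python
# Toy undirected edge list; node names are arbitrary.
edges = [("S", "A"), ("A", "B"), ("B", "T"), ("S", "B"), ("B", "X")]

# Nodes collected from the path S, A, B, T.
selected = {"S", "A", "B", "T"}


def induced_edges(edges, nodes):
    """Keep the edges whose source and target are both in `nodes`."""
    return [(a, b) for a, b in edges if a in nodes and b in nodes]


kept = induced_edges(edges, selected)
print(kept)
```

The edge S-B is kept although the path did not use it, while B-X is dropped because X is not among the selected nodes.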
In [21]:
plot2 = igraph.plot(connection_graph, layout=connection_graph.layout_auto(), **visual_style)
plot2.save('connection_SPOP_FOXA1.png')
display(Image('connection_SPOP_FOXA1.png'))
In [22]:
igraph.plot(connection_graph.get_adjacency())
Out[22]:
In [23]:
import matplotlib.pyplot as plt  # recent seaborn releases no longer expose pyplot as sns.plt
xs, ys = zip(*[(left, count) for left, _, count in connection_graph.degree_distribution().bins()])
plt.bar(xs, ys)
plt.title('Degree distribution of shortest path network between SPOP and FOXA1')
Out[23]:
<matplotlib.text.Text at 0x12bb59390>
In [24]:
degree_threshold = 10
label_tmp = [node if d>degree_threshold else '\n' for node, d in zip(connection_graph.vs['label'], connection_graph.degree())]
plot2 = igraph.plot(connection_graph, layout=connection_graph.layout_auto(), vertex_label=label_tmp, vertex_size=connection_graph.degree(), vertex_color='#ff000022', **visual_style)
plot2.save('connection_SPOP_FOXA1_(v2).png')
display(Image('connection_SPOP_FOXA1_(v2).png'))
In [25]:
label_high_degree = [node for node, d in zip(connection_graph.vs['label'], connection_graph.degree()) if d>degree_threshold]
print('Nodes with degree greater than {}:'.format(degree_threshold))
print(label_high_degree)
Nodes with degree greater than 10:
['TRAF6', 'AR']
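
The degree distribution and the high-degree filter used above can be reproduced without igraph on a small example. The edge list below is made up, and the threshold of 2 is chosen only to suit the toy data.

```python
from collections import Counter

# Toy undirected edge list; node names are arbitrary.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b"), ("c", "d")]

# Degree of a node = number of edges that touch it.
degree = Counter()
for source, target in edges:
    degree[source] += 1
    degree[target] += 1

# Degree distribution = how many nodes have each degree.
distribution = Counter(degree.values())
for d in sorted(distribution):
    print("degree {}: {} node(s)".format(d, distribution[d]))

degree_threshold = 2
high_degree = sorted(node for node, d in degree.items() if d > degree_threshold)
print("Nodes with degree greater than {}: {}".format(degree_threshold, high_degree))
```
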
In [26]:
# find all paths between SPOP and FOXA1 (of maximum length equal to the shortest path length + 1)
new_maxlen = path_SPOP_to_FOXA1_length + 1
node_start = pa.genesymbol('SPOP').index
node_end = pa.genesymbol('FOXA1').index
paths = pa.find_all_paths(start=node_start, end=node_end, maxlen=new_maxlen)
print('Number of paths: {}'.format(len(paths)))
	:: Looking up all paths up to length 5: finished, 100.0%
Number of paths: 884
In [27]:
p = [item for sublist in paths for item in sublist]
p = set(p)
print('Number of nodes: {}'.format(len(p)))
Number of nodes: 258
In [28]:
connection_graph = pa.graph.induced_subgraph(p)
In [29]:
plot2 = igraph.plot(connection_graph, layout=connection_graph.layout_auto(), **visual_style)
plot2.save('connection_SPOP_FOXA1_l5.png')
display(Image('connection_SPOP_FOXA1_l5.png'))
In [30]:
import matplotlib.pyplot as plt  # recent seaborn releases no longer expose pyplot as sns.plt
xs, ys = zip(*[(left, count) for left, _, count in connection_graph.degree_distribution().bins()])
plt.bar(xs, ys)
plt.title('Degree distribution of shortest path network (min_length+1) between SPOP and FOXA1')
Out[30]:
<matplotlib.text.Text at 0x11e5ea390>
In [31]:
label_tmp = [node if d>90 else '\n' for node, d in zip(connection_graph.vs['label'], connection_graph.degree())]
plot2 = igraph.plot(connection_graph, layout=connection_graph.layout_auto(), vertex_label=label_tmp, vertex_size=connection_graph.degree(), vertex_color='#ff000022', **visual_style)
plot2.save('connection_SPOP_FOXA1_l5_(v2).png')
display(Image('connection_SPOP_FOXA1_l5_(v2).png'))
In [32]:
degree_threshold = 40
label_high_degree = [node for node, d in zip(connection_graph.vs['label'], connection_graph.degree()) if d>degree_threshold]
print('Nodes with degree greater than {}:'.format(degree_threshold))
print(label_high_degree)
Nodes with degree greater than 40:
['AKT1', 'RELA', 'CHUK', 'MAPK3', 'STAT3', 'SMAD3', 'TRAF6', 'EP300', 'CTNNB1', 'AR', 'SRC', 'TP53', 'MAPK1', 'IKBKB', 'ESR1', 'CSNK2A1']

Next, we build a network from the nodes on the first shortest path found between each pair of query nodes.

In [33]:
node_list = set()
# a sorted list gives a deterministic row/column order (a set is unordered)
distance = pd.DataFrame(np.nan, index=sorted(query_nodes), columns=sorted(query_nodes))
for igene1 in query_nodes:
    for igene2 in query_nodes:
        if igene1 == igene2:
            distance.loc[igene1, igene2] = 0
        else:
            path = pa.graph.get_shortest_paths(pa.genesymbol(igene1)['name'], to=pa.genesymbol(igene2)['name'])[0]
            #node_list = node_list.union(set(path))
            node_list.update(path)
            distance.loc[igene1, igene2] = len(path)-1 if len(path)>0 else np.nan
In [34]:
interconnection_graph = pa.graph.induced_subgraph(node_list)
In [35]:
plot2 = igraph.plot(interconnection_graph, layout=interconnection_graph.layout_auto(), **visual_style)
plot2.save('connection_5nodes.png')
display(Image('connection_5nodes.png'))
In [36]:
pa.get_directed()
	:: Setting directions: finished, 100.0%
In [37]:
dnode_list = set()
# a sorted list gives a deterministic row/column order (a set is unordered)
ddistance = pd.DataFrame(np.nan, index=sorted(query_nodes), columns=sorted(query_nodes))
for igene1 in query_nodes:
    for igene2 in query_nodes:
        if igene1 == igene2:
            ddistance.loc[igene1, igene2] = 0
        else:
            path = pa.dgraph.get_shortest_paths(pa.dgenesymbol(igene1)['name'], to=pa.dgenesymbol(igene2)['name'])[0]
            dnode_list.update(path)
            ddistance.loc[igene1, igene2] = len(path)-1 if len(path)>0 else np.nan
/Applications/anaconda/envs/py27/lib/python2.7/site-packages/ipykernel/__main__.py:8: RuntimeWarning: Couldn't reach some vertices at structural_properties.c:740
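
The warning above appears because, in a directed network, some query nodes cannot be reached from others. The sketch below shows the same effect on a made-up directed chain, filling a distance table with NaN for unreachable pairs. The graph and the helper `bfs_distances` are hypothetical.

```python
import math
from collections import deque

# Toy directed graph A -> B -> C; C has no outgoing edges.
successors = {"A": ["B"], "B": ["C"], "C": []}


def bfs_distances(successors, start):
    """Return the number of steps from start to every reachable node."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return distance


nodes = ["A", "B", "C"]
matrix = {}
for source in nodes:
    reached = bfs_distances(successors, source)
    # Unreachable targets get NaN, as in the distance tables above.
    matrix[source] = {target: reached.get(target, math.nan) for target in nodes}

for source in nodes:
    print(source, [matrix[source][target] for target in nodes])
```

Unlike the undirected case, the table is not symmetric: A reaches C in two steps, but C reaches neither A nor B.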
In [38]:
interconnection_dgraph = pa.dgraph.induced_subgraph(dnode_list)
In [39]:
plot2 = igraph.plot(interconnection_dgraph, layout=interconnection_dgraph.layout_auto(), **visual_style)
plot2.save('connection_d_5nodes.png')
display(Image('connection_d_5nodes.png'))

Now we extract an undirected network containing every node that lies on any of the shortest paths between our genes of interest.

In [40]:
node_list_all = set()
for igene1 in query_nodes:
    for igene2 in query_nodes:
        if igene1 != igene2:
            # the distance table stores floats, so cast the path length to int
            paths = pa.find_all_paths(start=pa.genesymbol(igene1).index, end=pa.genesymbol(igene2).index, maxlen=int(distance.loc[igene1, igene2]))
            for sublist in paths:
                node_list_all.update(sublist)
	:: Looking up all paths up to length 2: finished, 100.0%
	:: Looking up all paths up to length 4: finished, 100.0%
	:: Looking up all paths up to length 1: finished, 100.0%
	:: Looking up all paths up to length 3: finished, 100.0%
	:: Looking up all paths up to length 2: finished, 100.0%
	:: Looking up all paths up to length 3: finished, 100.0%
	:: Looking up all paths up to length 1: finished, 100.0%
	:: Looking up all paths up to length 2: finished, 100.0%
	:: Looking up all paths up to length 4: finished, 100.0%
	:: Looking up all paths up to length 3: finished, 100.0%
	:: Looking up all paths up to length 3: finished, 100.0%
	:: Looking up all paths up to length 2: finished, 100.0%
	:: Looking up all paths up to length 1: finished, 100.0%
	:: Looking up all paths up to length 1: finished, 100.0%
	:: Looking up all paths up to length 3: finished, 100.0%
	:: Looking up all paths up to length 2: finished, 100.0%
	:: Looking up all paths up to length 3: finished, 100.0%
	:: Looking up all paths up to length 2: finished, 100.0%
	:: Looking up all paths up to length 2: finished, 100.0%
	:: Looking up all paths up to length 2: finished, 100.0%
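
An alternative way to collect the nodes on all shortest paths, without enumerating the paths, uses two breadth-first searches: in an undirected graph a node lies on some shortest path between s and t exactly when its distance to s plus its distance to t equals the distance between s and t. The sketch below applies this to a made-up graph; it is a different technique from the `find_all_paths` call used in this notebook, and both helper functions are hypothetical.

```python
from collections import deque

# Toy undirected graph: S-A-T and S-B-T are shortest (2 steps), S-C-D-T is longer.
adjacency = {
    "S": ["A", "B", "C"],
    "A": ["S", "T"],
    "B": ["S", "T"],
    "C": ["S", "D"],
    "D": ["C", "T"],
    "T": ["A", "B", "D"],
}


def bfs_distances(adjacency, start):
    """Return the number of steps from start to every reachable node."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


def nodes_on_shortest_paths(adjacency, start, end):
    """Return the nodes lying on at least one shortest start-end path."""
    from_start = bfs_distances(adjacency, start)
    from_end = bfs_distances(adjacency, end)
    if end not in from_start:
        return set()  # no path at all
    shortest = from_start[end]
    return {
        node for node in from_start
        if node in from_end and from_start[node] + from_end[node] == shortest
    }


on_paths = nodes_on_shortest_paths(adjacency, "S", "T")
print(sorted(on_paths))
```
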
In [41]:
interconnection_all_graph = pa.graph.induced_subgraph(node_list_all)
In [42]:
plot2 = igraph.plot(interconnection_all_graph, layout=interconnection_all_graph.layout_auto(), **visual_style)
plot2.save('connection_5nodes_asp.png')
display(Image('connection_5nodes_asp.png'))

Export network

Once we have our network, we can save it to a file to analyze it with other programs or algorithms.

We count the number of references supporting each edge and store it in a new edge attribute. Then we use write_ncol(), available on any igraph Graph object, to write a space-delimited file that lists the labels of the source and target vertices of each edge, plus one numerical edge attribute used as the weight.

In [43]:
interconnection_dgraph.es['nrefs'] = [len(edge['references']) for edge in interconnection_dgraph.es]
In [44]:
# 'dsp' stands for 'from directed network using shortest path'
file_to_write = 'prior_network_dsp.ncol'
interconnection_dgraph.write_ncol(file_to_write, names='label', weights='nrefs')
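
To show what such a file looks like, the sketch below writes a few edges in the NCOL layout: one edge per line, with source name, target name and weight separated by spaces. The edge weights are hypothetical placeholders rather than real reference counts, and `write_ncol_like` is an illustrative helper, not the igraph method.

```python
import io

# Hypothetical edges: (source label, target label, number of references).
edges = [("SPOP", "TRAF6", 2), ("AR", "FOXA1", 5)]


def write_ncol_like(edges, handle):
    """Write one 'source target weight' line per edge."""
    for source, target, weight in edges:
        handle.write("{} {} {}\n".format(source, target, weight))


# An in-memory buffer stands in for a file opened with open(..., 'w').
buffer = io.StringIO()
write_ncol_like(edges, buffer)
ncol_text = buffer.getvalue()
print(ncol_text, end="")
```

Because the separator is a space, this layout only works when vertex names contain no spaces, which is one reason to prefer a tab-delimited file for richer attributes.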

We can also define our own function for more flexibility. For example, the function below writes the network to a tab-delimited file and can print several attributes of the source and target vertices as well as several attributes of each edge.

In [45]:
def custom_write(fname, graph, names=['name'], edge_attributes=[], sep='\t'):
    """
    Write edge list to text file with attributes
    
    @param fname: the name of the file to write to.
    @param graph: the igraph object containing the network
    @param names: list with the vertex attribute names to be printed for source and target vertices
    @param edge_attributes: list with the edge attribute names to be printed
    @param sep: string used to separate columns
    """
    # check that input 'names' and 'edge_attributes' exist
    names = [iname for iname in names if iname in graph.vs.attribute_names()]
    edge_attributes = [eattr for eattr in edge_attributes if eattr in graph.es.attribute_names()]
    # write file
    with open(fname, 'wt') as fid:
        # write header
        for iname in names:
            fid.write(sep.join(['{}_{}'.format(st, iname) for st in ('source', 'target')]))
            fid.write(sep)
        fid.write(sep.join(eattr for eattr in edge_attributes))
        fid.write('\n')
        # write data
        for edge in graph.es:
            for iname in names:
                fid.write(sep.join([graph.vs[v][iname] for v in edge.tuple]))
                fid.write(sep)
            fid.write(sep.join(['{}'.format(edge[eattr]) for eattr in edge_attributes]))
            fid.write('\n')
In [46]:
custom_write('prior_network_dsp.txt', interconnection_dgraph, names=['name', 'label'], edge_attributes=['nrefs'])
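
A file written this way can be read back with the standard csv module. The sketch below parses text in the layout that custom_write() produces for names=['name', 'label'] and edge_attributes=['nrefs']. The UniProt IDs and gene symbols come from the outputs above, but the two rows, their direction and their nrefs values are hypothetical.

```python
import csv
import io

# Text in the column layout produced by custom_write(); rows are illustrative only.
sample = (
    "source_name\ttarget_name\tsource_label\ttarget_label\tnrefs\n"
    "P10275\tP55317\tAR\tFOXA1\t3\n"
    "P60484\tP10275\tPTEN\tAR\t1\n"
)

# io.StringIO stands in for open('prior_network_dsp.txt').
reader = csv.DictReader(io.StringIO(sample), delimiter="\t")
edges = [
    (row["source_label"], row["target_label"], int(row["nrefs"]))
    for row in reader
]
for source, target, nrefs in edges:
    print("{} -> {} ({} references)".format(source, target, nrefs))
```
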
In [47]:
interconnection_all_graph.es['nrefs'] = [len(edge['references']) for edge in interconnection_all_graph.es]
In [48]:
# 'asp' stands for 'from undirected network using all possible shortest paths'
custom_write('prior_network_asp.txt', interconnection_all_graph, names=['name', 'label'], edge_attributes=['nrefs'])