Title: | Network Pre-Processing and Normalization |
---|---|
Description: | Network Pre-Processing and normalization. Methods for normalizing graphs, including Chua normalization, Laplacian normalization, Binary magnification, min-max normalization and others. Methods to sparsify adjacency matrices. Methods for graph pre-processing and for filtering edges of the graph. |
Authors: | Giorgio Valentini [aut, cre] |
Maintainer: | Giorgio Valentini <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.2 |
Built: | 2025-01-23 04:55:06 UTC |
Source: | https://github.com/cran/NetPreProc |
Network Pre-Processing and normalization.
Package: | NetPreProc |
Type: | Package |
Version: | 1.2 |
Date: | 2022-09-14 |
License: | GPL (>= 2) |
LazyLoad: | yes |
Depends: | methods |
Imports: | graph |
Suggests: | bionedata |
Network Pre-Processing and normalization. Methods for normalizing graphs, including Chua normalization, Laplacian normalization, Binary magnification, min-max normalization and oters. Methods to sparsify adjacency matrices. Methods for graph pre-processing and for filtering edges of the graph.
Giorgio Valentini – Universita' degli Studi di Milano
Maintainer: Giorgio Valentini<[email protected]>
Methods to transform a a real-valued network matrix into a binary matrix. The binary matrix is obtained by thresholding: values above the given threshold are set to 1, otherwise to 0
Binary.matrix.by.thresh(W, thresh=0.5)
Binary.matrix.by.thresh(W, thresh=0.5)
W |
an object representing the graph to be normalized |
thresh |
the threshold (def.=0.5) |
The normalized binary adjacency matrix of the network
signature(W = "graph")
an object of the virtual class graph (hence including objects of class graphAM
and graphNEL
from the package graph)
signature(W = "matrix")
a matrix representing the adjacency matrix of the graph
library(bionetdata); data(DD.chem.data); W <- Binary.matrix.by.thresh(DD.chem.data); # Using both methods with both signatures "matrix" and "graph" # reducing dimension of the graph library(graph); DD.chem.data.red <- DD.chem.data[1:100,1:100]; W.red <- Binary.matrix.by.thresh(DD.chem.data.red); g <- new("graphAM", adjMat=DD.chem.data.red, values=list(weight=DD.chem.data.red)); Wg <- Binary.matrix.by.thresh(g); any(W.red!=Wg);
library(bionetdata); data(DD.chem.data); W <- Binary.matrix.by.thresh(DD.chem.data); # Using both methods with both signatures "matrix" and "graph" # reducing dimension of the graph library(graph); DD.chem.data.red <- DD.chem.data[1:100,1:100]; W.red <- Binary.matrix.by.thresh(DD.chem.data.red); g <- new("graphAM", adjMat=DD.chem.data.red, values=list(weight=DD.chem.data.red)); Wg <- Binary.matrix.by.thresh(g); any(W.red!=Wg);
Method to check the characteristics of a graph. Check if its adjacency matrix is symmetric, if it has NA, NaN o Inf values, and some minimals statistics about nodes and edges.
check.network(W, name="Network matrix")
check.network(W, name="Network matrix")
W |
an object representing the graph to be checked |
name |
a character vector that will be printed as heading |
It return a list of strings about the characteristics of the graph
signature(W = "graph")
an object of the virtual class graph (hence including objects of class graphAM
and graphNEL
from the package graph)
signature(W = "matrix")
a matrix representing the adjacency matrix of the graph
library(bionetdata); data(DD.chem.data); check.network(DD.chem.data); W <- Prob.norm(DD.chem.data); check.network(W, "prob. transition matrix"); WL <- Laplacian.norm(DD.chem.data); check.network(WL, "Laplacian norm. matrix"); library(graph) g1 = randomEGraph(LETTERS[1:15], edges = 40); check.network(g1, "random graph");
library(bionetdata); data(DD.chem.data); check.network(DD.chem.data); W <- Prob.norm(DD.chem.data); check.network(W, "prob. transition matrix"); WL <- Laplacian.norm(DD.chem.data); check.network(WL, "Laplacian norm. matrix"); library(graph) g1 = randomEGraph(LETTERS[1:15], edges = 40); check.network(g1, "random graph");
Normalization of graphs according to Chua et al., 2007.
The normalized weigths between nodes are computed by taking into account their neighborhoods.
This normalization is meaningful in particular with interaction data.
More precisely, the normalized weigth between nodes
and
is computed by taking into account their neighborhods
and
:
where is the set of the neighbors of gene
(
is included).
Chua.norm(W)
Chua.norm(W)
W |
an object representing the graph to be normalized |
The normalized adjacency matrix of the network
signature(W = "graph")
an object of the virtual class graph (hence including objects of class graphAM
and graphNEL
from the package graph)
signature(W = "matrix")
a matrix representing the adjacency matrix of the graph
Chua, H., Sung, W., & Wong, L. An efficient strategy for extensive integration of diverse biological data for protein function prediction. Bioinformatics, 23 , 3364–3373, 2007.
library(bionetdata); data(Yeast.Biogrid.data); W <- Chua.norm(Yeast.Biogrid.data);
library(bionetdata); data(Yeast.Biogrid.data); W <- Chua.norm(Yeast.Biogrid.data);
Function to obtain the Pearson correlation matrix between rows of a given matrix.
Do.sim.matrix.Pearson(M, cut = TRUE, remove.negatives = TRUE, min.thresh = 0)
Do.sim.matrix.Pearson(M, cut = TRUE, remove.negatives = TRUE, min.thresh = 0)
M |
input matrix |
cut |
if TRUE (def.) at least one edge is maintained for each node, all the other edges are set to 0. If false no edgeis set to 0. |
remove.negatives |
if TRUE (def) negative values are replaced with 0 in the correlation matrix |
min.thresh |
minimum allowed threshold (def. 0). If a threshold lower than min.thresh is selected, thanit is substituted by min.thresh. Warning: setting min.thresh to large values may lead to highly disconneted network |
You can also "sparsify" the matrix, by putting to 0 all the weights, by setting a threshold such that at least one edge is maintained for each node. The diagonal values are set to 0.
a square symmetric matrix of the Pearson correlation coefficients computed between the rows of M
# a gaussian random matrix D <- matrix(rnorm(20000),nrow=200); W <- Do.sim.matrix.Pearson (D); # the same without default parameters W2 <- Do.sim.matrix.Pearson (D, cut=FALSE, remove.negatives=FALSE, min.thresh=-20);
# a gaussian random matrix D <- matrix(rnorm(20000),nrow=200); W <- Do.sim.matrix.Pearson (D); # the same without default parameters W2 <- Do.sim.matrix.Pearson (D, cut=FALSE, remove.negatives=FALSE, min.thresh=-20);
Methods to normalize weights of square symmetric adjacency matrices.
A network matrix is normalized by dividing each entry by the square root of the product of the sum of elements of row
and the sum of the elemnts in column
.
In other words if
is a diagonal matrix such that
, then the normalize matrix is:
Laplacian.norm(W)
Laplacian.norm(W)
W |
an object representing the graph to be normalized |
The normalized adjacency matrix of the network
signature(W = "graph")
an object of the virtual class graph (hence including objects of class graphAM
and graphNEL
from the package graph)
signature(W = "matrix")
a matrix representing the adjacency matrix of the graph
library(bionetdata); # normalization of drug-drug similarity networks data(DD.chem.data); W <- Laplacian.norm(DD.chem.data); # the same using an object of class graphAM g <- new("graphAM", adjMat=DD.chem.data, values=list(weight=DD.chem.data)); Wg <- Laplacian.norm(g);
library(bionetdata); # normalization of drug-drug similarity networks data(DD.chem.data); W <- Laplacian.norm(DD.chem.data); # the same using an object of class graphAM g <- new("graphAM", adjMat=DD.chem.data, values=list(weight=DD.chem.data)); Wg <- Laplacian.norm(g);
Normalization of binary matrices according to the procedure described in Mostafavi et al. 2008. Having a binary matrix M, for each feature, if b is the proportion of 1, then ones are replaced with -log(b) and zeros with log(1-b).
Magnify.binary.features.norm(M)
Magnify.binary.features.norm(M)
M |
an object representing the matrix to be normalized |
The normalized matrix
signature(M = "matrix")
Input binary matrix. Rows are examples, columns features
Mostafavi, S., Ray, D., Warde-Farley, D., Grouios, C., & Morris, Q. GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function. Genome Biology, 9, 2008.
D <- matrix(ifelse(runif(40000)>0.9,1,0),nrow=100); M <- Magnify.binary.features.norm(D);
D <- matrix(ifelse(runif(40000)>0.9,1,0),nrow=100); M <- Magnify.binary.features.norm(D);
Graph normalization with respect to the minimum and maximum value of its weights.
Each entry of the normalized matrix is in the range :
Max.Min.norm(W)
Max.Min.norm(W)
W |
an object representing the graph to be normalized |
The normalized adjacency matrix of the network
signature(W = "graph")
an object of the virtual class graph (hence including objects of class graphAM
and graphNEL
from the package graph)
signature(W = "matrix")
a matrix representing the adjacency matrix of the graph
library(bionetdata); # max-min normalization for a drug-drug similarity network data(DD.chem.data); W <- Max.Min.norm(DD.chem.data); # the same using an object of class graphAM g <- new("graphAM", adjMat=DD.chem.data, values=list(weight=DD.chem.data)); Wg <- Max.Min.norm(g);
library(bionetdata); # max-min normalization for a drug-drug similarity network data(DD.chem.data); W <- Max.Min.norm(DD.chem.data); # the same using an object of class graphAM g <- new("graphAM", adjMat=DD.chem.data, values=list(weight=DD.chem.data)); Wg <- Max.Min.norm(g);
Method to compute the transition probability matrix of network.
A network matrix is normalized by dividing each entry by the the sum of elements of row
In other words if
is a diagonal matrix such that
then the normalize matrix is:
Prob.norm(W)
Prob.norm(W)
W |
an object representing the graph to be normalized |
The normalized transition probability matrix of network
signature(W = "graph")
an object of the virtual class graph (hence including objects of class graphAM
and graphNEL
from the package graph)
signature(W = "matrix")
a matrix representing the adjacency matrix of the graph
library(bionetdata); # making transition prob matrix for a drug-drug similarity network data(DD.chem.data); W <- Prob.norm(DD.chem.data); # the same using an object of class graphAM and of class graphNEL g <- new("graphAM", adjMat=DD.chem.data, values=list(weight=DD.chem.data)); Wg <- Prob.norm(g); g2 <- as(g, "graphNEL"); Wg2 <- Prob.norm(g2);
library(bionetdata); # making transition prob matrix for a drug-drug similarity network data(DD.chem.data); W <- Prob.norm(DD.chem.data); # the same using an object of class graphAM and of class graphNEL g <- new("graphAM", adjMat=DD.chem.data, values=list(weight=DD.chem.data)); Wg <- Prob.norm(g); g2 <- as(g, "graphNEL"); Wg2 <- Prob.norm(g2);
Methods to sparsify a network matrix. By this method a general threshold is et such that you a minimum of k edges is guranteed for each node
Sparsify.matrix(W, k=1)
Sparsify.matrix(W, k=1)
W |
an object representing the graph to be sparsified |
k |
the number of guaranteed edges for each node (def.=1) |
The sparsified adjacency matrix of the network
signature(W = "graph")
an object of the virtual class graph (hence including objects of class graphAM
and graphNEL
from the package graph)
signature(W = "matrix")
a matrix representing the adjacency matrix of the graph
library(bionetdata); data(FIN.data); W <- Laplacian.norm(as.matrix(FIN.data)); # sparsification by maintaining at least one neighbour per node W1 <- Sparsify.matrix(W); # sparsification by maintaining at least 20 neighbours per node (if any) W20 <- Sparsify.matrix(W, k=20);
library(bionetdata); data(FIN.data); W <- Laplacian.norm(as.matrix(FIN.data)); # sparsification by maintaining at least one neighbour per node W1 <- Sparsify.matrix(W); # sparsification by maintaining at least 20 neighbours per node (if any) W20 <- Sparsify.matrix(W, k=20);
Methods to sparsify a network matrix by fixing the number of edges for each node. It selects the first k neighbours for each node (by row) according to the weight of the edge By this function you select exactly k edges for each node (if there are at least k edges in the adjacency matrix). The resulting matrix is not symmetric.
Sparsify.matrix.fixed.neighbours(W, k=10)
Sparsify.matrix.fixed.neighbours(W, k=10)
W |
an object representing the graph to be normalized |
k |
the number of edges for each node (def.=10) |
a sparsified matrix (Warning: the matrix is not symmetric)
signature(W = "graph")
an object of the virtual class graph (hence including objects of class graphAM
and graphNEL
from the package graph)
signature(W = "matrix")
a matrix representing the adjacency matrix of the graph
library(bionetdata); data(FIN.data); W <- Laplacian.norm(as.matrix(FIN.data)); # sparsification with 10 neighbours per node W10 <- Sparsify.matrix.fixed.neighbours(W); # sparsification with 20 neighbours per node W20 <- Sparsify.matrix.fixed.neighbours(W, k=20);
library(bionetdata); data(FIN.data); W <- Laplacian.norm(as.matrix(FIN.data)); # sparsification with 10 neighbours per node W10 <- Sparsify.matrix.fixed.neighbours(W); # sparsification with 20 neighbours per node W20 <- Sparsify.matrix.fixed.neighbours(W, k=20);