Title: | Ranking of Nodes with Kernelized Score Functions |
---|---|
Description: | Implementation of Kernelized score functions and other semi-supervised learning algorithms for node label ranking to analyze biomolecular networks. RANKS can be easily applied to a large set of different relevant problems in computational biology, ranging from automatic protein function prediction, to gene disease prioritization and drug repositioning, and more in general to any bioinformatics problem that can be formalized as a node label ranking problem in a graph. The modular nature of the implementation allows to experiment with different score functions and kernels and to easily compare the results with baseline network-based methods such as label propagation and random walk algorithms, as well as to enlarge the algorithmic scheme by adding novel user-defined score functions and kernels. |
Authors: | Giorgio Valentini [aut, cre] |
Maintainer: | Giorgio Valentini <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.1 |
Built: | 2025-02-05 03:55:32 UTC |
Source: | https://github.com/cran/RANKS |
Implementation of Kernelized score functions and other semi-supervised learning algorithms for node label ranking in biomolecular networks.
RANKS can be easily applied to a large set of different relevant problems in computational biology, ranging from automatic protein function prediction, to gene disease prioritization and drug repositioning, and more in general to any bioinformatics problem that can be formalized as a node label ranking problem in a graph. The modular nature of the implementation allows to experiment with different score functions and kernels and to easily compare the results with baseline network-based methods such as label propagation and random walk algorithms, as well as to enlarge the algorithmic scheme by adding novel user-defined score functions and kernels.
Giorgio Valentini
AnacletoLab
DI, Dipartimento di Informatica
Universita' degli Studi di Milano
Maintainer: Giorgio Valentini
Giorgio Valentini, Giuliano Armano, Marco Frasca, Jianyi Lin, Marco Mesiti, and Matteo Re RANKS: a flexible tool for node label ranking and classification in biological networks Bioinformatics first published online June 2, 2016 doi:10.1093/bioinformatics/btw235
Re M, Mesiti M, Valentini G: A fast ranking algorithm for predicting gene functions in biomolecular networks. IEEE ACM Trans Comput Biol Bioinform 2012, 9(6):1812-1818.
Re M, Valentini G: Cancer module genes ranking using kernelized score functions. BMC Bioinformatics 2012, 13(S14):S3.
Re M, Valentini G: Network-based drug ranking and repositioning with respect to DrugBank therapeutic categories. IEEE/ACM Trans Comput Biol Bioinform 2013, 10(6):1359-1371.
G. Valentini, A. Paccanaro, H. Caniza, A. Romero, M. Re: An extensive analysis of disease-gene associations using network integration and fast kernel-based gene prioritization methods, Artif. Intell. in Med. 61 (2) (2014) 63-78
High level function to perform experiments with GBA. It perform a k fold CV repeated 1 time on a given data set
do.GBA(fun = GBAsum, k = 5, stratified=TRUE, filter = TRUE, seed = 1, data.dir, labels.dir, output.dir, data, labels)
do.GBA(fun = GBAsum, k = 5, stratified=TRUE, filter = TRUE, seed = 1, data.dir, labels.dir, output.dir, data, labels)
fun |
function performing GBA. it can be one of the following: - GBAsum: it sums the edge weights connecting a node to its positive neighbours - GBAmax: it computes the maximum between the edge weights connecting a node to its positive neighbours |
k |
number of folds for the cross validation (def. 5) |
stratified |
boolean. If TRUE (def.) stratified CV is performed otherwise vanilla CV is done |
filter |
if TRUE (def) the adjacency matrix is sparsified otherwise not |
seed |
seed of the random generator for the generation of the folds (def: 1) |
data.dir |
relative path to directory where the adjiacency matrix is stored |
labels.dir |
relative path to directory where the label matrix is stored |
output.dir |
relative path to directory where the results are stored |
data |
name of the data set to loaded (without rda extension). It must be an .rda file containing the adjiacency matrix of the graph. It assumes that it is in the "data.dir" directory |
labels |
name of the target labels (without rda extension). It must be an .rda file containing the label matrix of the examples. Rows correspond to examples and columns to classes It assumes that it is in the "labels.dir" directory |
High level function to perform cross-validation experiments with multiple classes using GBA.
It performs a k fold CV on a given data set, and output scores, AUC and Precision at a given recall results for multiple classes.
Graph data are read from a matrix representing the adjiacency matrix of the graph stored as a .rda file. The labels are read from a matrix having examples as rows and classes as columns stored as a .rda file. If M is the label matrix, then M[i,j]=1, if example i is annotated with class j, otherwise M[i,j] = 0.
Results are included in matrices representing Scores, AUC and precision at a given recall results stored as .rda files.
3 rda files stored in the "Results" directory:
Scores results |
A matrix with examples on rows and classes on columns representing the computed scores for each example and for each considered class |
AUC results |
AUC results files computed through |
Precision at given recall results |
computed through |
The name of the Score file starts with Score, of the AUC file with AUC, and of the Precision at given recall file with PXR. Other learning parameters are appended to the name of the file. All the results .rda files are stored in the Results directory (that must exist in advance).
# Yeast prediction of 177 FunCat classes by 5-fold cross validation using STRING data # data obtained from the bionetdata package from CRAN # See the AUC and Precision/recall results in the Results directory library(bionetdata); dd=tempdir(); rr=tempdir(); data(Yeast.STRING.data); data(Yeast.STRING.FunCat); save(Yeast.STRING.data, file=paste(dd,"/net.rda", sep="")); save(Yeast.STRING.FunCat, file=paste(dd,"/labels.rda", sep="")); do.GBA(data.dir=dd, labels.dir=dd, output.dir=rr, data="/net", labels="/labels");
# Yeast prediction of 177 FunCat classes by 5-fold cross validation using STRING data # data obtained from the bionetdata package from CRAN # See the AUC and Precision/recall results in the Results directory library(bionetdata); dd=tempdir(); rr=tempdir(); data(Yeast.STRING.data); data(Yeast.STRING.FunCat); save(Yeast.STRING.data, file=paste(dd,"/net.rda", sep="")); save(Yeast.STRING.FunCat, file=paste(dd,"/labels.rda", sep="")); do.GBA(data.dir=dd, labels.dir=dd, output.dir=rr, data="/net", labels="/labels");
High level function to perform RANKS leave one out (loo) experiments with mutliple classes.
do.loo.RANKS(score = eav.score, compute.kernel = TRUE, kernel = rw.kernel, a = 2, k = 19, d = 2, p = 1, sparsify = FALSE, norm = FALSE, data, labels, output.name, data.dir, labels.dir, output.dir)
do.loo.RANKS(score = eav.score, compute.kernel = TRUE, kernel = rw.kernel, a = 2, k = 19, d = 2, p = 1, sparsify = FALSE, norm = FALSE, data, labels, output.name, data.dir, labels.dir, output.dir)
score |
function. It must be a kernel-based score method: - eav.score (default) - NN.score - KNN.score - WSLD.score |
compute.kernel |
logical. If TRUE (def.) a kernel matrix is computed from data according to the choice of the function kernel, otherwise the data matrix is used as it is. |
kernel |
kernel method or function (def. rw.kernel) |
a |
kernel parameter (def. 2) |
k |
number of neighbours for KNN.score. It is meaningful only for kNN (def.19) |
d |
integer. Coefficient of linear decay for the WSLD score. It is meaningful only for the WSLD score (def.2) |
p |
number of steps of the RW kernel (def. 1) |
sparsify |
boolean. If TRUE the input matrix is sparsified using Sparsify.matrix from the package NetpreProc (def: FALSE) |
norm |
logical. If TRUE for each class the score is normalized in [0,1], otherwise the raw scores are maintained (default). |
data |
name of the network data set to be loaded (without rda extension). It must be an .rda file containing the adjiacency matrix of the graph. By default it assumes that it is in the data.dir directory |
labels |
name of the target labels (without rda extension). It must be an .rda file containing the label matrix of the examples. By default it assumes that it is in the net.dir directory |
output.name |
name of the output file (without rda extension). Other informations including the learning parameters are added in the name of the file |
data.dir |
relative path to the directory where the adjiacency matrix is stored |
labels.dir |
relative path to directory where the label matrix is stored |
output.dir |
relative path to directory where the results are stored. Note that data and labels must have the same number of rows and in the same order. Moreover if any label column corresponds to any GO root term, this is eliminated to avoid prediction of GO root nodes. |
High level function to perform loo experiments with multiple classes using RANKS.
It performs a loo on a given data set, and scores, AUC and Precision at a given recall results for multiple classes are generated.
Graph data are read from a matrix representing the adjiacency matrix of the graph stored as a .rda file. The labels are read from a matrix having examples as rows and classes as columns stored as a .rda file. If is the label matrix, then
, if example
is annotated with class
, otherwise
.
Results are included in matrices representing Scores, AUC and precision at a given recall results stored as .rda files.
3 rda files stored in the output.dir directory:
Scores results |
A matrix with examples on rows and classes on columns representing the computed scores for each example and for each considered class |
AUC results |
AUC results files computed through |
Precision at given recall results |
computed through |
The name of the Score file starts with Score.loo, of the AUC file with AUC.loo, and of the Precision at given recall file with PXR.loo. Other learning parameters are appended to the name of the file.
# Yeast prediction of 177 FunCat classes by leave-one-out using STRING data # data obtained from the bionetdata package from CRAN. # See the AUC and Precision/recall results in the Results directory library(bionetdata); dd=tempdir(); rr=tempdir(); data(Yeast.STRING.data); data(Yeast.STRING.FunCat); save(Yeast.STRING.data, file=paste(dd,"/net.rda", sep="")); save(Yeast.STRING.FunCat, file=paste(dd,"/labels.rda", sep="")); do.loo.RANKS(data.dir=dd, labels.dir=dd, output.dir=rr, data="/net", labels="/labels", output.name="Yeast.loo"); # another leave-one-out prediction using KNN score and 2 steps random walk kernel do.loo.RANKS(score = KNN.score, k=3, p=2, data.dir=dd, labels.dir=dd, output.dir=rr, data="/net", labels="/labels", output.name="Yeast.loo");
# Yeast prediction of 177 FunCat classes by leave-one-out using STRING data # data obtained from the bionetdata package from CRAN. # See the AUC and Precision/recall results in the Results directory library(bionetdata); dd=tempdir(); rr=tempdir(); data(Yeast.STRING.data); data(Yeast.STRING.FunCat); save(Yeast.STRING.data, file=paste(dd,"/net.rda", sep="")); save(Yeast.STRING.FunCat, file=paste(dd,"/labels.rda", sep="")); do.loo.RANKS(data.dir=dd, labels.dir=dd, output.dir=rr, data="/net", labels="/labels", output.name="Yeast.loo"); # another leave-one-out prediction using KNN score and 2 steps random walk kernel do.loo.RANKS(score = KNN.score, k=3, p=2, data.dir=dd, labels.dir=dd, output.dir=rr, data="/net", labels="/labels", output.name="Yeast.loo");
High level function to perform RANKS cross-validation experiments with multiple classes.
do.RANKS(score = eav.score, kernel = rw.kernel, a = 2, p = 1, sparsify = TRUE, kk = 5, rep = 1, stratified=TRUE, seed = 0, data.dir, labels.dir, output.dir, data, labels, ...)
do.RANKS(score = eav.score, kernel = rw.kernel, a = 2, p = 1, sparsify = TRUE, kk = 5, rep = 1, stratified=TRUE, seed = 0, data.dir, labels.dir, output.dir, data, labels, ...)
score |
function. It must be a kernel-based score method: - eav.score (default) - NN.score - KNN.score - WSLD.score |
kernel |
kernel metod or function (def. rw.kernel) |
a |
kernel parameter (def. 2) |
p |
number of steps of the RW kernel (def. 1) |
sparsify |
boolean. If TRUE (def) the input matrix is sparsified using Sparsify.matrix from the package NetpreProc |
kk |
number of folds of the cross validation (def: 5) |
rep |
number of repetitions of the cross validation (def: 1) |
stratified |
boolean. If TRUE (def.) stratified CV is performed otherwise vanilla CV is done |
seed |
initialization seed for the random generator to create folds (def:0) |
data.dir |
relative path to directory where the adjiacency matrix is stored |
labels.dir |
relative path to directory where the label matrix is stored |
output.dir |
relative path to directory where the results are stored |
data |
name of the data set to loaded (without rda extension). It must be an .rda file containing the adjiacency matrix of the graph. It assumes that it is in the data.dir directory |
labels |
name of the target labels (without rda extension). It must be an .rda file containing the label matrix of the examples. It assumes that it is in the labels.dir directory. Note that data and labels must have the same number of rows and in the same order |
... |
optional arguments to be passed to the function |
High level function to perform cross-validation experiments with multiple classes using RANKS.
It performs a k fold CV repeated multiple times on a given data set, and scores, AUC and Precision at a given recall results for multiple classes are generated.
Graph data are read from a matrix representing the adjiacency matrix of the graph stored as a .rda file. The labels are read from a matrix having examples as rows and classes as columns stored as a .rda file. If is the label matrix, then
, if example
is annotated with class
, otherwise
.
Results are included in matrices representing Scores, AUC and precision at a given recall results stored as .rda files.
3 rda files stored in the output.dir directory:
Scores results |
A matrix with examples on rows and classes on columns representing the computed scores for each example and for each considered class |
AUC results |
AUC results files computed through |
Precision at given recall results |
computed through |
The name of the Score file starts with Score, of the AUC file with AUC, and of the Precision at given recall file with PXR. Other learning parameters are appended to the name of the file.
multiple.ker.score.cv
, do.loo.RANKS
# Yeast prediction of 177 FunCat classes by 5-fold cross validation using STRING data # data obtained from the bionetdata package from CRAN # See the AUC and Precision/recall results in the Results directory library(bionetdata); dd=tempdir(); rr=tempdir(); data(Yeast.STRING.data); data(Yeast.STRING.FunCat); save(Yeast.STRING.data, file=paste(dd,"/net.rda", sep="")); save(Yeast.STRING.FunCat, file=paste(dd,"/labels.rda", sep="")); do.RANKS(data.dir=dd, labels.dir=dd, output.dir=rr, data="/net", labels="/labels");
# Yeast prediction of 177 FunCat classes by 5-fold cross validation using STRING data # data obtained from the bionetdata package from CRAN # See the AUC and Precision/recall results in the Results directory library(bionetdata); dd=tempdir(); rr=tempdir(); data(Yeast.STRING.data); data(Yeast.STRING.FunCat); save(Yeast.STRING.data, file=paste(dd,"/net.rda", sep="")); save(Yeast.STRING.FunCat, file=paste(dd,"/labels.rda", sep="")); do.RANKS(data.dir=dd, labels.dir=dd, output.dir=rr, data="/net", labels="/labels");
High level function to perform random walk cross-validation experiments with multiple classes.
do.RW(tmax = 1000, eps = 1e-10, k = 5, stratified=TRUE, filter = TRUE, seed = 1, data.dir, labels.dir, output.dir, data, labels)
do.RW(tmax = 1000, eps = 1e-10, k = 5, stratified=TRUE, filter = TRUE, seed = 1, data.dir, labels.dir, output.dir, data, labels)
tmax |
maximum number of iterations (def: 1000) |
eps |
maximum allowed difference between the computed probabilities at the steady state (def. 1e-10) |
k |
number of folds for the cross validation (def. 5) |
stratified |
boolean. If TRUE (def.) stratified CV is performed otherwise vanilla CV is done |
filter |
if TRUE (def) the adjacency matrix is sparsified otherwise not |
seed |
seed of the random generator for the generation of the folds (def: 1) |
data.dir |
relative path to directory where the adjiacency matrix is stored |
labels.dir |
relative path to directory where the label matrix is stored |
output.dir |
relative path to directory where the results are stored |
data |
name of the data set to loaded (without rda extension). It must be an .rda file containing the adjacency matrix of the graph. It assumes that it is in the data.dir directory |
labels |
name of the target labels (without rda extension). It must be an .rda file containing the label matrix of the examples. It assumes that it is in the labels.dir directory |
High level function to perform cross-validation experiments with multiple classes using RW.
It performs a k fold CV on a given data set, and output scores, AUC and Precision at a given recall results for multiple classes.
Graph data are read from a matrix representing the adjiacency matrix of the graph stored as a .rda file. The labels are read from a matrix having examples as rows and classes as columns stored as a .rda file. If is the label matrix, then
, if example
is annotated with class
, otherwise
.
Results are included in matrices representing Scores, AUC and precision at a given recall results stored as .rda files.
3 rda files stored in the Results directory:
Scores results |
A matrix with examples on rows and classes on columns representing the computed scores for each example and for each considered class |
AUC results |
AUC results files computed through |
Precision at given recall results |
computed through |
The name of the Score file starts with Score, of the AUC file with AUC, and of the Precision at given recall file with PXR. Other learning parameters are appended to the name of the file. All the results .rda files are stored in the Results directory (that must exist in advance).
# Yeast prediction of 177 FunCat classes by 5-fold cross validation # using 3 steps of Random walk and STRING data. # data obtained from the bionetdata package from CRAN # See the AUC and Precision/recall results in the Results directory library(bionetdata); dd=tempdir(); rr=tempdir(); data(Yeast.STRING.data); data(Yeast.STRING.FunCat); save(Yeast.STRING.data, file=paste(dd,"/net.rda", sep="")); save(Yeast.STRING.FunCat, file=paste(dd,"/labels.rda", sep="")); do.RW(tmax = 3, filter = FALSE, seed = 1, data.dir=dd, labels.dir=dd, output.dir=rr, data="/net", labels="/labels");
# Yeast prediction of 177 FunCat classes by 5-fold cross validation # using 3 steps of Random walk and STRING data. # data obtained from the bionetdata package from CRAN # See the AUC and Precision/recall results in the Results directory library(bionetdata); dd=tempdir(); rr=tempdir(); data(Yeast.STRING.data); data(Yeast.STRING.FunCat); save(Yeast.STRING.data, file=paste(dd,"/net.rda", sep="")); save(Yeast.STRING.FunCat, file=paste(dd,"/labels.rda", sep="")); do.RW(tmax = 3, filter = FALSE, seed = 1, data.dir=dd, labels.dir=dd, output.dir=rr, data="/net", labels="/labels");
High level function to perform random walk with restart cross-validation experiments with multiple classes.
do.RWR(gamma = 0.6, tmax = 1000, eps = 1e-10, k = 5, stratified=TRUE, filter = TRUE, seed = 1, data.dir, labels.dir, output.dir, data, labels)
do.RWR(gamma = 0.6, tmax = 1000, eps = 1e-10, k = 5, stratified=TRUE, filter = TRUE, seed = 1, data.dir, labels.dir, output.dir, data, labels)
gamma |
restart parameter (def: 0.6) |
tmax |
maximum number of iterations (def: 1000) |
eps |
maximum allowed difference between the computed probabilities at the steady state (def. 1e-10) |
k |
number of folds for the cross validation (def. 5) |
stratified |
boolean. If TRUE (def.) stratified CV is performed otherwise vanilla CV is done |
filter |
if TRUE (def) the adjacency matrix is sparsified otherwise not |
seed |
seed of the random generator for the generation of the folds (def: 1) |
data.dir |
relative path to directory where the adjiacency matrix is stored |
labels.dir |
relative path to directory where the label matrix is stored |
output.dir |
relative path to directory where the results are stored |
data |
name of the data set to loaded (without rda extension). It must be an .rda file containing the adjiacency matrix of the graph. It assumes that it is in the data.dir directory |
labels |
name of the target labels (without rda extension). It must be an .rda file containing the label matrix of the examples. It assumes that it is in the labels.dir directory |
High level function to perform cross-validation experiments with multiple classes using RWR.
It performs a k fold CV on a given data set, and output scores, AUC and Precision at a given recall results for multiple classes.
Graph data are read from a matrix representing the adjiacency matrix of the graph stored as a .rda file. The labels are read from a matrix having examples as rows and classes as columns stored as a .rda file. If is the label matrix, then
, if example
is annotated with class
, otherwise
.
Results are included in matrices representing Scores, AUC and precision at a given recall results stored as .rda files.
3 rda files stored in the output.dir directory:
Scores results |
A matrix with examples on rows and classes on columns representing the computed scores for each example and for each considered class |
AUC results |
AUC results files computed through |
Precision at given recall results |
computed through |
The name of the Score file starts with Score, of the AUC file with AUC, and of the Precision at given recall file with PXR. Other learning parameters are appended to the name of the file. All the results .rda files are stored in the Results directory (that must exist in advance).
# Yeast prediction of 177 FunCat classes by 5-fold cross validation # using 3 steps of Random walk with restart and STRING data. # data obtained from the bionetdata package from CRAN # See the AUC and Precision/recall results in the Results directory library(bionetdata); dd=tempdir(); rr=tempdir(); data(Yeast.STRING.data); data(Yeast.STRING.FunCat); save(Yeast.STRING.data, file=paste(dd,"/net.rda", sep="")); save(Yeast.STRING.FunCat, file=paste(dd,"/labels.rda", sep="")); do.RWR(tmax = 3, k = 5, filter = FALSE, seed = 1, data.dir=dd, labels.dir=dd, output.dir=rr, data="/net", labels="/labels"); # the same experiment, but the iterations are repeated till to convergence # (this can require a quite long time ...) do.RWR(tmax = 1000, k = 5, eps = 1e-5, filter = FALSE, seed = 1, data.dir=dd, labels.dir=dd, output.dir=rr, data="/net", labels="/labels");
# Yeast prediction of 177 FunCat classes by 5-fold cross validation # using 3 steps of Random walk with restart and STRING data. # data obtained from the bionetdata package from CRAN # See the AUC and Precision/recall results in the Results directory library(bionetdata); dd=tempdir(); rr=tempdir(); data(Yeast.STRING.data); data(Yeast.STRING.FunCat); save(Yeast.STRING.data, file=paste(dd,"/net.rda", sep="")); save(Yeast.STRING.FunCat, file=paste(dd,"/labels.rda", sep="")); do.RWR(tmax = 3, k = 5, filter = FALSE, seed = 1, data.dir=dd, labels.dir=dd, output.dir=rr, data="/net", labels="/labels"); # the same experiment, but the iterations are repeated till to convergence # (this can require a quite long time ...) do.RWR(tmax = 1000, k = 5, eps = 1e-5, filter = FALSE, seed = 1, data.dir=dd, labels.dir=dd, output.dir=rr, data="/net", labels="/labels");
Function to find the optimal quantile alpha and corresponding threshold by cross-validation with a kernel-based score method.
find.optimal.thresh.cv(K, ind.pos, ind.non.pos, m = 5, alpha = seq(from = 0.05, to = 0.6, by = 0.05), init.seed = NULL, opt.fun = compute.F, fun = KNN.score, ...)
find.optimal.thresh.cv(K, ind.pos, ind.non.pos, m = 5, alpha = seq(from = 0.05, to = 0.6, by = 0.05), init.seed = NULL, opt.fun = compute.F, fun = KNN.score, ...)
K |
matrix. Kernel matrix or any valid symmetric matrix |
ind.pos |
indices of the positive examples. They are the indices the row of RW corresponding to positive examples of the training set. |
ind.non.pos |
indices of the non positive examples. They are the indices the row of RW corresponding to non positive examples of the training set. |
m |
number of folds (default: 5) |
alpha |
vector of the quantiles to be tested |
init.seed |
initial seed for the random generator. If NULL (def) no initialization is performed |
opt.fun |
Function implementing the metric to select the optimal threshold. The F-score (compute.F) is the default. Available functions: - compute.F: F-score (default) - compute.acc:accuracy. Any function having two arguments representing the vector of predicted and true labels can be in principle used. |
fun |
function. It must be a kernel-based score method (default KNN.score) |
... |
optional arguments for the function fun |
Function to find the optimal quantile alpha and corresponding threshold by cross-validation with a kernel-based
score method. The optimality is computed with respect to a specific metric (def: F-score).
This function is used by multiple.ker.score.thresh.cv
, ker.score.classifier.holdout
, ker.score.classifier.cv
.
A list with 3 elements:
alpha |
quantile corresponding to the best F-score |
thresh |
threshold corresponding to the best F-score |
pos.scores |
scores of the positive elements computed through CV |
multiple.ker.score.thresh.cv
, Kernel functions
, ker.score.classifier.holdout
# Finding the optimal threshold in the Tanimoto chemical structure similarity network # between 1253 DrugBank drugs for the prediction of the DrugBank category Penicillins using # the KNN-score with the random walk kernel library(bionetdata); data(DD.chem.data); data(DrugBank.Cat); K <- rw.kernel(DD.chem.data); labels <- DrugBank.Cat[,"Penicillins"]; ind.pos <- which(labels==1); ind.non.pos <- which(labels==0); res <- find.optimal.thresh.cv(K, ind.pos, ind.non.pos); res
# Finding the optimal threshold in the Tanimoto chemical structure similarity network # between 1253 DrugBank drugs for the prediction of the DrugBank category Penicillins using # the KNN-score with the random walk kernel library(bionetdata); data(DD.chem.data); data(DrugBank.Cat); K <- rw.kernel(DD.chem.data); labels <- DrugBank.Cat[,"Penicillins"]; ind.pos <- which(labels==1); ind.non.pos <- which(labels==0); res <- find.optimal.thresh.cv(K, ind.pos, ind.non.pos); res
GBAmax implements a Guilt By Association (GBA) method based on the maximum of incident edge weights
GBAmax(W, ind.positives)
GBAmax(W, ind.positives)
W |
numeric matrix representing the adjacency matrix of the graph |
ind.positives |
indices of the "core" positive examples of the graph. They represent the indices of W corresponding to the positive examples. |
GBAmax implements a Guilt By Association (GBA) method for label ranking based on the maximum between the edge weights connecting a node to its positive neighbours
a list with one element:
p |
score associated to each node |
Oliver, S., Guilt-by-association goes global, Nature, 403, pp. 601-603, 2000.
# Application of GBAmax to the prediction of the DrugBank category Penicillins # using the Tanimoto chemical structure similarity network # between 1253 DrugBank drugs library(bionetdata); data(DD.chem.data); data(DrugBank.Cat); labels <- DrugBank.Cat[,"Penicillins"]; ind.pos <- which(labels==1); GBAmax(DD.chem.data, ind.pos); # Application of GBAmax to the prediction of the DrugBank category "Anti_HIV_Agents" labels <- DrugBank.Cat[,"Anti_HIV_Agents"]; ind.pos <- which(labels==1); GBAmax(DD.chem.data, ind.pos);
# Application of GBAmax to the prediction of the DrugBank category Penicillins # using the Tanimoto chemical structure similarity network # between 1253 DrugBank drugs library(bionetdata); data(DD.chem.data); data(DrugBank.Cat); labels <- DrugBank.Cat[,"Penicillins"]; ind.pos <- which(labels==1); GBAmax(DD.chem.data, ind.pos); # Application of GBAmax to the prediction of the DrugBank category "Anti_HIV_Agents" labels <- DrugBank.Cat[,"Anti_HIV_Agents"]; ind.pos <- which(labels==1); GBAmax(DD.chem.data, ind.pos);
GBAsum implements a Guilt By Association (GBA) method based on the sum of incident edge weights
GBAsum(W, ind.positives)
GBAsum(W, ind.positives)
W |
numeric matrix representing the adjacency matrix of the graph |
ind.positives |
indices of the "core" positive examples of the graph. They represent the indices of W corresponding to the positive examples. |
Function that implements a Guilt By Association (GBA) method for label ranking based on the sum of edge weights connecting a node to its positive neighbours.
a list with one element:
p |
score associated to each node |
Oliver, S., Guilt-by-association goes global, Nature, 403, pp. 601-603, 2000.
# Application of GBAsum to the prediction of the DrugBank category Penicillins # using the Tanimoto chemical structure similarity network # between 1253 DrugBank drugs library(bionetdata); data(DD.chem.data); data(DrugBank.Cat); labels <- DrugBank.Cat[,"Penicillins"]; ind.pos <- which(labels==1); GBAsum(DD.chem.data, ind.pos); # Application of GBAsum to the prediction of the DrugBank category "Anti_HIV_Agents" labels <- DrugBank.Cat[,"Anti_HIV_Agents"]; ind.pos <- which(labels==1); GBAsum(DD.chem.data, ind.pos);
# Application of GBAsum to the prediction of the DrugBank category Penicillins # using the Tanimoto chemical structure similarity network # between 1253 DrugBank drugs library(bionetdata); data(DD.chem.data); data(DrugBank.Cat); labels <- DrugBank.Cat[,"Penicillins"]; ind.pos <- which(labels==1); GBAsum(DD.chem.data, ind.pos); # Application of GBAsum to the prediction of the DrugBank category "Anti_HIV_Agents" labels <- DrugBank.Cat[,"Anti_HIV_Agents"]; ind.pos <- which(labels==1); GBAsum(DD.chem.data, ind.pos);
Function to classify labels according to an external cross-validation procedure with a kernel-based score method.
ker.score.classifier.cv(K, ind.pos, m = 5, p = 100, alpha = seq(from = 0.05, to = 0.6, by = 0.05), init.seed = 0, opt.fun = compute.F, fun = KNN.score, ...)
ker.score.classifier.cv(K, ind.pos, m = 5, p = 100, alpha = seq(from = 0.05, to = 0.6, by = 0.05), init.seed = 0, opt.fun = compute.F, fun = KNN.score, ...)
K |
matrix. Kernel matrix or any valid symmetric matrix |
ind.pos |
indices of the positive examples. They are the row indices of RW corresponding to positive examples. |
m |
number of folds for each cross-validation |
p |
number of repeated cross-validations |
alpha |
vector of the quantiles to be tested |
init.seed |
initial seed for the random generator (def: 0) |
opt.fun |
: function. Function implementing the metric to choice the optimal threshold. The F-score (compute.F) is the default. Available functions: - compute.F: F-score (default) - compute.acc: accuracy. Any function having two arguments representing the vector of predicted and true labels can be in principle used. |
fun |
function. It must be a kernel-based score method (default KNN.score) |
... |
optional arguments for the function fun |
Function to classify labels according to an external cross-validation procedure with a kernel-based score method. The optimal threshold for a given class id found by internal cross-validation. Scores are computed by averaging across (possibly) multiple external cross-validations. The optimal quantile and corresponding threshold are selected by internal cross-validation using the F-score (default) or the accuracy as metric.
A list with 4 components:
labels |
vector of the predicted labels (1 represents positive, 0 negative) |
av.scores |
a vector with the average scores across multiple cross-validations. Elements of the vector av.scores correspond to the rows of RW |
opt.alpha |
the optimal quantile alpha |
opt.thresh |
the optimal threshold |
a vector of the predicted scores for the test set
rw.kernel-methods
, Kernel functions
, ker.score.classifier.holdout
# Nodel label classification of the DrugBank category Penicillins # on the Tanimoto chemical structure similarity network (1253 drugs) # using 5 fold cross-validation repeated 3 times # and NN-score with 1-step random walk kernel library(bionetdata); data(DD.chem.data); data(DrugBank.Cat); labels <- DrugBank.Cat[,"Penicillins"]; ind.pos <- which(labels==1); K <- rw.kernel(DD.chem.data); res <- ker.score.classifier.cv(K, ind.pos, m = 5, p = 3, fun = NN.score);
# Nodel label classification of the DrugBank category Penicillins # on the Tanimoto chemical structure similarity network (1253 drugs) # using 5 fold cross-validation repeated 3 times # and NN-score with 1-step random walk kernel library(bionetdata); data(DD.chem.data); data(DrugBank.Cat); labels <- DrugBank.Cat[,"Penicillins"]; ind.pos <- which(labels==1); K <- rw.kernel(DD.chem.data); res <- ker.score.classifier.cv(K, ind.pos, m = 5, p = 3, fun = NN.score);
Functions to perform an held-out procedure for a single class with a kernel-based score method
ker.score.classifier.holdout(K, ind.pos, ind.test, m = 5, p = 10, alpha = seq(from = 0.05, to = 0.6, by = 0.05), init.seed = 0, opt.fun = compute.F, fun = KNN.score, ...) ker.score.holdout (K, ind.pos, ind.test, fun=KNN.score, ...)
ker.score.classifier.holdout(K, ind.pos, ind.test, m = 5, p = 10, alpha = seq(from = 0.05, to = 0.6, by = 0.05), init.seed = 0, opt.fun = compute.F, fun = KNN.score, ...) ker.score.holdout (K, ind.pos, ind.test, fun=KNN.score, ...)
K |
matrix. Kernel matrix or any valid symmetric matrix |
ind.pos |
indices of the positive examples of the training set. They are the indices the row of RW corresponding to positive examples of the training set |
ind.test |
indices of the examples of the test set. They are the indices the row of RW corresponding to examples of the test set |
m |
number of folds for the cross-validation on the training set |
p |
number of repeated cross-validations on the training set |
alpha |
vector of the quantiles to be tested |
init.seed |
nitial seed for the random generator (def: 0) |
opt.fun |
Function implementing the metric to select the optimal threshold. The F-score (compute.F) is the default. Available functions: - compute.F: F-score (default) - compute.acc:accuracy. Any function having two arguments representing the vector of predicted and true labels can be in principle used. |
fun |
function. It must be a kernel-based score method (default KNN.score) |
... |
optional arguments for the function fun |
ker.score.classifier.holdout
is a function to classify labels according to an hold-out procedure with a kernel-based score method.
The optimal threshold for a given class is obtained by (possibly multiple) internal cross-validation on the training set.
Scores of the held-out nodes are computed. Thresholds are computed on the training set
by cross-validation and then are used to classify the held-out nodes in the test set.
The optimal quantile and corresponding threshold are selected by internal cross-validation using the F-score as metrics.
Note the test examples are given as indices of the rows of the input matrix.
ker.score.holdout
provides a ranking according to an hold-out procedure with a kernel-based score method.
ker.score.classifier.holdout
returns a list with four components:
A list with 4 components:
labels |
vector of the predicted labels for the test set(1 represent positive, 0 negative) |
av.scores |
a vector with the scores computed on the test set. Elements of the vector av.scores correspond to ind.test rows of RW |
opt.alpha |
the optimal quantile alpha |
opt.thresh |
the optimal threshold |
ker.score.holdout
returns a vector of the predicted scores for the test set
rw.kernel-methods
, Kernel functions
, ker.score.classifier.cv
# Node label classification of the DrugBank category Penicillins # on the Tanimoto chemical structure similarity network (1253 drugs) # with eav-score with 1-step random walk kernel # using held-out with 5-fold CV repeated 10 times on the training set # to set the "optimal" threshold for classifiaction library(bionetdata); data(DD.chem.data); data(DrugBank.Cat); labels <- DrugBank.Cat[,"Penicillins"]; ind.test <- 1:300; ind.train <- 301:length(labels); ind.pos <- which(labels==1); ind.pos <- ind.pos[ind.pos>300]; K <- rw.kernel(DD.chem.data); res <- ker.score.classifier.holdout(K, ind.pos, ind.test, m = 5, p = 10, fun = eav.score);
# Node label classification of the DrugBank category Penicillins # on the Tanimoto chemical structure similarity network (1253 drugs) # with eav-score with 1-step random walk kernel # using held-out with 5-fold CV repeated 10 times on the training set # to set the "optimal" threshold for classifiaction library(bionetdata); data(DD.chem.data); data(DrugBank.Cat); labels <- DrugBank.Cat[,"Penicillins"]; ind.test <- 1:300; ind.train <- 301:length(labels); ind.pos <- which(labels==1); ind.pos <- ind.pos[ind.pos>300]; K <- rw.kernel(DD.chem.data); res <- ker.score.classifier.holdout(K, ind.pos, ind.test, m = 5, p = 10, fun = eav.score);
Function to perform cross-validation for a single class with a kernel-based score method
ker.score.cv(RW, ind.pos, m = 5, init.seed = NULL, fun = KNN.score, ...)
ker.score.cv(RW, ind.pos, m = 5, init.seed = NULL, fun = KNN.score, ...)
RW |
matrix. It can be a kernel matrix or the adjacency matrix of a graph |
ind.pos |
indices of the positive examples. They are the row indices of RW corresponding to positive examples. |
m |
number of folds (def: 5) |
init.seed |
initial seed for the random generator to generate folds. If NULL (default) no initialization is performed |
fun |
function. It must be a kernel-based score method (default KNN.score) |
... |
optional arguments for the function fun |
It performs a cross-validation using RANKS to predict the cross-validated scores. The cross-validation is stratified: the folds are constructed separately for each class, to maintain an equal ratio between classes among folds.
a numeric vector with the scores computed for each example
multiple.ker.score.cv
, multiple.ker.score.thresh.cv
, rw.kernel-methods
, Kernel functions
.
# Nodel label ranking of the DrugBank category Penicillins # on the Tanimoto chemical structure similarity network (1253 drugs) # using 5 fold cross-validation # and eav-score with 1-step random walk kernel library(bionetdata); data(DD.chem.data); data(DrugBank.Cat); labels <- DrugBank.Cat[,"Penicillins"]; ind.pos <- which(labels==1); K <- rw.kernel(DD.chem.data); res <- ker.score.cv(K, ind.pos, m = 5, init.seed = NULL, fun = eav.score);
# Nodel label ranking of the DrugBank category Penicillins # on the Tanimoto chemical structure similarity network (1253 drugs) # using 5 fold cross-validation # and eav-score with 1-step random walk kernel library(bionetdata); data(DD.chem.data); data(DrugBank.Cat); labels <- DrugBank.Cat[,"Penicillins"]; ind.pos <- which(labels==1); K <- rw.kernel(DD.chem.data); res <- ker.score.cv(K, ind.pos, m = 5, init.seed = NULL, fun = eav.score);
Compute similarities between feature vectors according to a specific kernel function
cauchy.kernel(W, sigma = 1) laplacian.kernel(W, sigma = 1) gaussian.kernel(W, sigma = 1) inv.multiquadric.kernel(W, v = 1) identity.kernel(W, a = 1) linear.kernel(W, a = 1) poly.kernel(W, degree = 2, scale = -1, v = 0)
cauchy.kernel(W, sigma = 1) laplacian.kernel(W, sigma = 1) gaussian.kernel(W, sigma = 1) inv.multiquadric.kernel(W, v = 1) identity.kernel(W, a = 1) linear.kernel(W, a = 1) poly.kernel(W, degree = 2, scale = -1, v = 0)
W |
a numeric matrix, Rows are examples and columns are features |
sigma |
a real value representing the sigma parameter (def. 1) of the Cauchy, Gaussian and Laplacian kernel |
v |
constant factor (def. 1) of the inverse multiquadric kernel and of the polynomail kernel; for the inverse multiquadric kernel v must be larger than 0. |
a |
unused parameter, maintained for compatibility reasons . |
degree |
integer corresponding to a degree of the polynomial (def. 2) |
scale |
double: scaling factor of the polynomial kernel. If |
All the kernel matrices are computed by calling C code to speed-up the computation.
cauchy.kernel
computes the Cauchy kernel.
laplacian.kernel
computes the Lapalacian kernel.
gaussian.kernel
computes the Gaussian kernel.
inv.multiquadric.kernel
computes the inverse multiquadric kernel.
identity.kernel
computes the identity kernel. In this case the input W represents a similarity square matrix (obtained i.e. through the Pearson correlation) between examples.
linear.kernel
computes the linear kernel.
A kernel matrix representing the similarities between the examples (rows of W), according to a specific kernel function.
# computing kernels on the Tanimoto chemical structure similarity matrix library(bionetdata); data(DD.chem.data); K <- identity.kernel(DD.chem.data); K <- linear.kernel(DD.chem.data); K <- gaussian.kernel(DD.chem.data); K <- inv.multiquadric.kernel(DD.chem.data); K <- poly.kernel(DD.chem.data);
# computing kernels on the Tanimoto chemical structure similarity matrix library(bionetdata); data(DD.chem.data); K <- identity.kernel(DD.chem.data); K <- linear.kernel(DD.chem.data); K <- gaussian.kernel(DD.chem.data); K <- inv.multiquadric.kernel(DD.chem.data); K <- poly.kernel(DD.chem.data);
Function that implements the Label propagation algorithm of Zhu and Ghahramani
label.prop(W, ind.positives, tmax = 1000, eps = 1e-05, norm = TRUE)
label.prop(W, ind.positives, tmax = 1000, eps = 1e-05, norm = TRUE)
W |
a numeric matrix representing the adjacency matrix of the graph |
ind.positives |
indices of the "core" positive examples of the graph. They represent the indices of W corresponding to the positive examples |
tmax |
maximum number of iterations (def: 1000) |
eps |
numeric. Maximum allowed difference between the computed probabilities at the steady state (def. 1e-5) |
norm |
boolean. If TRUE (def) the adjacency matrix |
label.prop implements the label propagation algorithm on a given graph by performing 1 or more steps on the graph, depending on the value of the tmax parameter. It stops also if the difference of the norm of the scores between two consecutive steps is less than eps.
A list with three elements:
p |
numeric vector. Scores of each node at the steady state or after tmax iterations |
ind.positives |
indices of the "core" positive examples of the graph (it is equal to the same input parameter) |
n.iter |
number of performed steps/iterations |
Zhu, X., Ghahramani, Z., Lafferty, J.: Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions. In: Proc. of the Twentieth International Conference on Machine Learning, Washington DC (2003) 912-919
# Application of label prop algorithm to the prediction of the DrugBank category Penicillins # using the Tanimoto chemical structure similarity network # between 1253 DrugBank drugs library(bionetdata); data(DD.chem.data); data(DrugBank.Cat); labels <- DrugBank.Cat[,"Penicillins"]; ind.pos <- which(labels==1); label.prop(DD.chem.data, ind.pos, tmax = 10, eps = 1e-05, norm = TRUE);
# Application of label prop algorithm to the prediction of the DrugBank category Penicillins # using the Tanimoto chemical structure similarity network # between 1253 DrugBank drugs library(bionetdata); data(DD.chem.data); data(DrugBank.Cat); labels <- DrugBank.Cat[,"Penicillins"]; ind.pos <- which(labels==1); label.prop(DD.chem.data, ind.pos, tmax = 10, eps = 1e-05, norm = TRUE);
Function to execute multiple cross-validation with RANKS for a single class.
multiple.ker.score.cv(RW, ind.pos, m = 5, p = 100, stratified=TRUE, init.seed = 0, fun = KNN.score, ...)
multiple.ker.score.cv(RW, ind.pos, m = 5, p = 100, stratified=TRUE, init.seed = 0, fun = KNN.score, ...)
RW |
matrix. Kernel matrix or any valid symmetric matrix |
ind.pos |
indices of the positive examples. They are the row indices of RW corresponding to positive examples. |
m |
number of folds for each cross-validation |
p |
number of repeated cross-validations |
stratified |
boolean. If TRUE (def.) stratified CV is performed otherwise vanilla CV is done |
init.seed |
initial seed for the random generator (def: 0) |
fun |
function. It must be a kernel-based score method (default KNN.score) |
... |
optional arguments for the function fun |
It performs multiple cross-validation using RANKS to predict the cross-validated scores. The cross-validation is stratified: the folds are constructed separately for each class, to maintain an equal ratio between classes among folds. It computes the scores by averaging across multiple cross validations.
A list with two components:
av.scores |
a vector with the average scores across multiple cross-validations. Elements of the vector av.scores correspond to the rows of RW |
pos.scores |
a vector with the scores of positive elements collected at each iteration |
ker.score.cv
, multiple.ker.score.thresh.cv
, rw.kernel-methods
, Kernel functions
.
# Nodel label ranking for the DrugBank category Penicillins # on the Tanimoto chemical structure similarity network (1253 drugs) # using 5 fold cross-validation repeated 10 times # and eav-score with 1-step random walk kernel library(bionetdata); data(DD.chem.data); data(DrugBank.Cat); labels <- DrugBank.Cat[,"Penicillins"]; ind.pos <- which(labels==1); K <- rw.kernel(DD.chem.data); res <- multiple.ker.score.cv(K, ind.pos, m = 5, p = 10, init.seed = 0, fun = eav.score); # the same but using the NN-score res <- multiple.ker.score.cv(K, ind.pos, m = 5, p = 10, init.seed = 0, fun = NN.score);
# Nodel label ranking for the DrugBank category Penicillins # on the Tanimoto chemical structure similarity network (1253 drugs) # using 5 fold cross-validation repeated 10 times # and eav-score with 1-step random walk kernel library(bionetdata); data(DD.chem.data); data(DrugBank.Cat); labels <- DrugBank.Cat[,"Penicillins"]; ind.pos <- which(labels==1); K <- rw.kernel(DD.chem.data); res <- multiple.ker.score.cv(K, ind.pos, m = 5, p = 10, init.seed = 0, fun = eav.score); # the same but using the NN-score res <- multiple.ker.score.cv(K, ind.pos, m = 5, p = 10, init.seed = 0, fun = NN.score);
Function to execute multiple cross-validation and to find the optimal threshold with RANKS for a single class.
multiple.ker.score.thresh.cv(K, ind.pos, m = 5, p = 100, alpha = seq(from = 0.05, to = 0.6, by = 0.05), init.seed = 0, fun = KNN.score, ...)
multiple.ker.score.thresh.cv(K, ind.pos, m = 5, p = 100, alpha = seq(from = 0.05, to = 0.6, by = 0.05), init.seed = 0, fun = KNN.score, ...)
K |
matrix. Kernel matrix or any valid symmetric matrix |
ind.pos |
indices of the positive examples. They are the row indices of RW corresponding to positive examples. |
m |
number of folds for each cross-validation |
p |
number of repeated cross-validations |
alpha |
vector of the quantiles to be tested |
init.seed |
initial seed for the random generator (def: 0) |
fun |
function. It must be a kernel-based score method (default KNN.score) |
... |
optional arguments for the function fun |
Function to execute multiple cross-validation with a kernel-based score method and to find the optimal threshold for a given class by internal cross-validation.
Scores are computed by averaging across multiple external cross-validations. The optimal quantile and corresponding threshold are selected by internal cross-validation using a specific metric (def: F-score).
A list with three components:
av.scores |
a vector with the average scores across multiple cross-validations. Elements of the vector av.scores correspond to the rows of RW |
opt.alpha |
the optimal quantile alpha |
opt.thresh |
the optimal threshold |
ker.score.cv
, multiple.ker.score.cv
, rw.kernel-methods
, Kernel functions
.
# Node label ranking and best threshold search for the DrugBank category Penicillins # on the Tanimoto chemical structure similarity network (1253 drugs) # using 5 fold cross-validation repeated 2 times # and eav-score with 1-step random walk kernel library(bionetdata); data(DD.chem.data); data(DrugBank.Cat); labels <- DrugBank.Cat[,"Penicillins"]; ind.pos <- which(labels==1); K <- rw.kernel(DD.chem.data); res <- multiple.ker.score.thresh.cv (K, ind.pos, m = 5, p = 2, init.seed = 0, fun = KNN.score);
# Node label ranking and best threshold search for the DrugBank category Penicillins # on the Tanimoto chemical structure similarity network (1253 drugs) # using 5 fold cross-validation repeated 2 times # and eav-score with 1-step random walk kernel library(bionetdata); data(DD.chem.data); data(DrugBank.Cat); labels <- DrugBank.Cat[,"Penicillins"]; ind.pos <- which(labels==1); K <- rw.kernel(DD.chem.data); res <- multiple.ker.score.thresh.cv (K, ind.pos, m = 5, p = 2, init.seed = 0, fun = KNN.score);
Function to execute multiple cross-validation with random walk based, labelprop and GBA methods
multiple.RW.cv(W, ind.pos, k = 5, p = 100, init.seed = 0, fun = RW, ...)
multiple.RW.cv(W, ind.pos, k = 5, p = 100, init.seed = 0, fun = RW, ...)
W |
a numeric matrix representing the adjacency matrix of the graph. Note that if the optional argument norm=TRUE (def.), the W matrix is normalized, otherwise it is assumed that W is just normalized |
ind.pos |
indices of the "core" positive examples of the graph. They represent the indices of W corresponding to the positive examples |
k |
number of folds (def: 5) |
p |
number of repeated cross-validations |
init.seed |
initial seed for the random generator. If 0 (default) no initialization is performed |
fun |
function. It must be one of the following functions: - RW (default) - RWR - label.prop - GBAsum - GBAmax |
... |
optional arguments for the function fun: - gamma : restart parameter (def: 0.6) (meaningful only for RWR) - tmax : maximum number of iterations (def: 1000) - eps : maximum allowed difference between the computed probabilities at the steady state (def. 1e-10) |
Function to execute multiple cross-validation with random walk based, labelprop and GBA methods for a single class. It computes the scores by averaging across multiple cross validations. It can be used with of the following methods: RW, RWR, label.prop, GBAsum, GBAmax.
a vector with the the probabilities for each example at the steady state averaged across multiple cross-validations
RW
, RWR
, label.prop
, GBAsum
, GBAmax
, RW.cv
# Nodel label ranking of the DrugBank category Penicillins # on the Tanimoto chemical structure similarity network (1253 drugs) # using 5 fold cross-validation repeated 2 times # and "vanilla" 2-step random walk library(bionetdata); data(DD.chem.data); data(DrugBank.Cat); labels <- DrugBank.Cat[,"Penicillins"]; ind.pos <- which(labels==1); res <- multiple.RW.cv(DD.chem.data, ind.pos, k = 5, p = 2, init.seed = 0, fun = GBAmax) # the same but using the label.prop res <- multiple.RW.cv(DD.chem.data, ind.pos, k = 5, p = 2, init.seed = 0, fun = label.prop, tmax=2) # the same but using "vanilla" 2-step random walk res <- multiple.RW.cv(DD.chem.data, ind.pos, k = 5, p = 2, init.seed = 0, fun = RW, tmax=2)
# Nodel label ranking of the DrugBank category Penicillins # on the Tanimoto chemical structure similarity network (1253 drugs) # using 5 fold cross-validation repeated 2 times # and "vanilla" 2-step random walk library(bionetdata); data(DD.chem.data); data(DrugBank.Cat); labels <- DrugBank.Cat[,"Penicillins"]; ind.pos <- which(labels==1); res <- multiple.RW.cv(DD.chem.data, ind.pos, k = 5, p = 2, init.seed = 0, fun = GBAmax) # the same but using the label.prop res <- multiple.RW.cv(DD.chem.data, ind.pos, k = 5, p = 2, init.seed = 0, fun = label.prop, tmax=2) # the same but using "vanilla" 2-step random walk res <- multiple.RW.cv(DD.chem.data, ind.pos, k = 5, p = 2, init.seed = 0, fun = RW, tmax=2)
The function performs a random Walk on a given graph.
RW(W, ind.positives, tmax = 1000, eps = 1e-10, norm = TRUE)
RW(W, ind.positives, tmax = 1000, eps = 1e-10, norm = TRUE)
W |
a numeric matrix representing the adjacency matrix of the graph |
ind.positives |
indices of the "core" positive examples of the graph. They represent the indices of W corresponding to the positive examples |
tmax |
maximum number of iterations (steps) (def: 1000) |
eps |
maximum allowed difference between the computed probabilities at the steady state (def. 1e-10) |
norm |
if TRUE (def) the adjacency matrix |
RW performs a random Walk on a given graph by performing 1 or more steps on the graph, depending on the value of the tmax parameter. It stops also if the difference of the norm of the probabilities between two consecutive steps is less than eps.
A list with three elements:
p |
numeric vector. Probability of each node at the steady state or after tmax iterations |
ind.positives |
indices of the "core" positive examples of the graph (it is equal to the same input parameter) |
n.iter |
number of performed steps/iterations |
L. Lovasz, Random Walks on Graphs: a Survey, Combinatorics, Paul Erdos is Eighty, vol. 2, pp. 146, 1993.
# Application of the random walk to the prediction of the DrugBank category Penicillins # using the Tanimoto chemical structure similarity network # between 1253 DrugBank drugs library(bionetdata); data(DD.chem.data); data(DrugBank.Cat); labels <- DrugBank.Cat[,"Penicillins"]; ind.pos <- which(labels==1); # 2-step random walk res <- RW(DD.chem.data, ind.pos, tmax = 2); # 5 steps random walk res <- RW(DD.chem.data, ind.pos, tmax = 5);
# Application of the random walk to the prediction of the DrugBank category Penicillins # using the Tanimoto chemical structure similarity network # between 1253 DrugBank drugs library(bionetdata); data(DD.chem.data); data(DrugBank.Cat); labels <- DrugBank.Cat[,"Penicillins"]; ind.pos <- which(labels==1); # 2-step random walk res <- RW(DD.chem.data, ind.pos, tmax = 2); # 5 steps random walk res <- RW(DD.chem.data, ind.pos, tmax = 5);
Function to execute cross-validation with random walk based, labelprop and GBA methods
RW.cv(W, ind.pos, k = 5, stratified=TRUE, init.seed = 0, fun = RW, ...)
RW.cv(W, ind.pos, k = 5, stratified=TRUE, init.seed = 0, fun = RW, ...)
W |
a numeric matrix representing the adjacency matrix of the graph. Note that if the optional argument norm=TRUE (def.), the W matrix is normalized, otherwise it is assumed that W is just normalized |
ind.pos |
indices of the "core" positive examples of the graph. They represent the indices of W corresponding to the positive examples |
k |
number of folds (def: 5) |
stratified |
boolean. If TRUE (def.) stratified CV is performed otherwise vanilla CV is done |
init.seed |
initial seed for the random generator. If 0 (default) no initialization is performed |
fun |
function. It must be one of the following functions: - RW (default) - RWR - label.prop - GBAsum - GBAmax |
... |
optional arguments for the function fun: - gamma : restart parameter (def: 0.6) (meaningful only for RWR) - tmax : maximum number of iterations (def: 1000) - eps : maximum allowed difference between the computed probabilities at the steady state (def. 1e-10) |
It performs a single cross-validation for a single class. It can be used with of the following methods: RW, RWR, label.prop, GBAsum, GBAmax.
a vector with the the probabilities for each example at the steady state
RW
, RWR
, label.prop
, GBAsum
, GBAmax
, multiple.RW.cv
# Nodel label ranking of the DrugBank category Penicillins # on the Tanimoto chemical structure similarity network (1253 drugs) # using 5 fold cross-validation and GBAsum library(bionetdata); data(DD.chem.data); data(DrugBank.Cat); labels <- DrugBank.Cat[,"Penicillins"]; ind.pos <- which(labels==1); res <- RW.cv(DD.chem.data, ind.pos, k = 5, init.seed = 0, fun = GBAsum); # the same but using label.prop res <- RW.cv(DD.chem.data, ind.pos, k = 5, init.seed = 0, fun = label.prop, tmax=2); # the same but using "vanilla" 2-step random walk res <- RW.cv(DD.chem.data, ind.pos, k = 5, init.seed = 0, fun = RW, tmax=2);
# Nodel label ranking of the DrugBank category Penicillins # on the Tanimoto chemical structure similarity network (1253 drugs) # using 5 fold cross-validation and GBAsum library(bionetdata); data(DD.chem.data); data(DrugBank.Cat); labels <- DrugBank.Cat[,"Penicillins"]; ind.pos <- which(labels==1); res <- RW.cv(DD.chem.data, ind.pos, k = 5, init.seed = 0, fun = GBAsum); # the same but using label.prop res <- RW.cv(DD.chem.data, ind.pos, k = 5, init.seed = 0, fun = label.prop, tmax=2); # the same but using "vanilla" 2-step random walk res <- RW.cv(DD.chem.data, ind.pos, k = 5, init.seed = 0, fun = RW, tmax=2);
Methods to compute the random walk kernel (Smola and Kondor, 2003)
## S4 method for signature 'matrix' rw.kernel(W, a = 2) ## S4 method for signature 'graph' rw.kernel(W, a = 2) ## S4 method for signature 'graph' p.step.rw.kernel(RW, p = 2) ## S4 method for signature 'matrix' p.step.rw.kernel(RW, p = 2)
## S4 method for signature 'matrix' rw.kernel(W, a = 2) ## S4 method for signature 'graph' rw.kernel(W, a = 2) ## S4 method for signature 'graph' p.step.rw.kernel(RW, p = 2) ## S4 method for signature 'matrix' p.step.rw.kernel(RW, p = 2)
W |
a square symmetric matrix with positive values or an object of the class graphAM or graphNEL of the package graph |
RW |
matrix. It must be a random walk kernel matrix |
a |
numeric. It is correlated to the probability of remaining at the same vertex. Larger a, larger the probability (def. 2) |
p |
integer. Number of steps (def: p=2) |
rw.kernel
methods computes the one step random walk kernel RW, i.e.:
where is the identity matrix,
is the weighted adjacency matrix of an undirected graph,
and
is a diagonal matrix with
p.step.rw.kernel
methods compute the p-step random walk kernel pRW, i.e.:
rw.kernel
: A numeric square matrix representing a one-step random walk kernel matrix
p.step.rw.kernel
: A numeric square matrix representing a p-step random walk kernel matrix
signature(W = "graph")
rw.kernel
computes the random walk kernel starting from a graph of class graph
(hence including objects of class graphAM and graphNEL from the package graph)
signature(W = "matrix")
rw.kernel
computes the random walk kernel starting from a weighted adjacency matrix representing the graph
signature(RW = "graph")
p.step.rw.kernel
computes the a p-step random walk kernel starting from a graph of class graph
(hence including objects of class graphAM and graphNEL from the package graph)
signature(RW = "matrix")
p.step.rw.kernel
computes the p-step random walk kernel starting from a one-step random walk kernel matrix
# Random walk kernel computation using Functional Interaction network data library(bionetdata); data(FIN.data); W <- as.matrix(FIN.data); K <- rw.kernel(W); # this a 2-step random walk kernel K2 <- p.step.rw.kernel(K, p=2);
# Random walk kernel computation using Functional Interaction network data library(bionetdata); data(FIN.data); W <- as.matrix(FIN.data); K <- rw.kernel(W); # this a 2-step random walk kernel K2 <- p.step.rw.kernel(K, p=2);
Function that performs a random Walk with restart (RWR) on a given graph
RWR(W, ind.positives, gamma = 0.6, tmax = 1000, eps = 1e-10, norm = TRUE)
RWR(W, ind.positives, gamma = 0.6, tmax = 1000, eps = 1e-10, norm = TRUE)
W |
a numeric matrix representing the adjacency matrix of the graph |
ind.positives |
indices of the "core" positive examples of the graph. They represent the indices of W corresponding to the positive examples |
gamma |
restart parameter (def: 0.6) |
tmax |
maximum number of iterations (steps) (def: 1000) |
eps |
maximum allowed difference between the computed probabilities at the steady state (def. 1e-10) |
norm |
if TRUE (def) the adjacency matrix |
RWR performs a random Walk with restart on a given graph by performing 1 or more steps on the graph, depending on the value of the tmax parameter. The restart parameter expresses the probability of "restarting" from a "core" node at each step of the random walk algorithm. It stops also if the difference of the norm of the probabilities between two consecutive steps is less than eps.
A list with three elements:
p |
numeric vector. Probability of each node at the steady state or after tmax iterations |
ind.positives |
indices of the "core" positive examples of the graph (it is equal to the same input parameter) |
n.iter |
number of performed steps/iterations |
L. Lovasz, Random Walks on Graphs: a Survey, Combinatorics, Paul Erdos is Eighty, vol. 2, pp. 146, 1993.
# Application of the random walk with restart to the prediction of the # DrugBank category Penicillins # using the Tanimoto chemical structure similarity network # between 1253 DrugBank drugs library(bionetdata); data(DD.chem.data); data(DrugBank.Cat); labels <- DrugBank.Cat[,"Penicillins"]; ind.pos <- which(labels==1); # 2-step RWR res <- RWR(DD.chem.data, ind.pos, tmax = 2); # till to convergence res <- RWR(DD.chem.data, ind.pos, tmax = 5000, eps=1e-6); # 5 steps and higher gamma res <- RWR(DD.chem.data, ind.pos, tmax = 5, gamma=0.8);
# Application of the random walk with restart to the prediction of the # DrugBank category Penicillins # using the Tanimoto chemical structure similarity network # between 1253 DrugBank drugs library(bionetdata); data(DD.chem.data); data(DrugBank.Cat); labels <- DrugBank.Cat[,"Penicillins"]; ind.pos <- which(labels==1); # 2-step RWR res <- RWR(DD.chem.data, ind.pos, tmax = 2); # till to convergence res <- RWR(DD.chem.data, ind.pos, tmax = 5000, eps=1e-6); # 5 steps and higher gamma res <- RWR(DD.chem.data, ind.pos, tmax = 5, gamma=0.8);
Methods to compute score functions for multiple vertices of the graph
## S4 method for signature 'graph' NN.score(RW, x, x.pos, auto = FALSE, norm = TRUE) ## S4 method for signature 'matrix' NN.score(RW, x, x.pos, auto = FALSE, norm = TRUE) ## S4 method for signature 'graph' KNN.score(RW, x, x.pos, k = 3, auto = FALSE, norm = TRUE) ## S4 method for signature 'matrix' KNN.score(RW, x, x.pos, k = 3, auto = FALSE, norm = TRUE) ## S4 method for signature 'graph' eav.score(RW, x, x.pos, auto = FALSE, norm = TRUE) ## S4 method for signature 'matrix' eav.score(RW, x, x.pos, auto = FALSE, norm = TRUE) ## S4 method for signature 'graph' WSLD.score(RW, x, x.pos, d = 2, auto = FALSE, norm = TRUE) ## S4 method for signature 'matrix' WSLD.score(RW, x, x.pos, d = 2, auto = FALSE, norm = TRUE)
## S4 method for signature 'graph' NN.score(RW, x, x.pos, auto = FALSE, norm = TRUE) ## S4 method for signature 'matrix' NN.score(RW, x, x.pos, auto = FALSE, norm = TRUE) ## S4 method for signature 'graph' KNN.score(RW, x, x.pos, k = 3, auto = FALSE, norm = TRUE) ## S4 method for signature 'matrix' KNN.score(RW, x, x.pos, k = 3, auto = FALSE, norm = TRUE) ## S4 method for signature 'graph' eav.score(RW, x, x.pos, auto = FALSE, norm = TRUE) ## S4 method for signature 'matrix' eav.score(RW, x, x.pos, auto = FALSE, norm = TRUE) ## S4 method for signature 'graph' WSLD.score(RW, x, x.pos, d = 2, auto = FALSE, norm = TRUE) ## S4 method for signature 'matrix' WSLD.score(RW, x, x.pos, d = 2, auto = FALSE, norm = TRUE)
RW |
matrix. It must be a kernel matrix or a symmetric matrix expressing the similarity between nodes |
x |
vector of integer. Indices corresponding to the elements of the RW matrix for which the score must be computed |
x.pos |
vector of integer. Indices of the positive elements of the RW matrix |
k |
integer. Number of the k nearest neighbours to be considered |
d |
integer. Coefficient of linear decay (def. 2) |
auto |
boolean. If TRUE the components |
norm |
boolean. If TRUE (def.) the scores are normalized between 0 and 1. |
The methods compute the scores for multiple vertices according to NN, KNN, Empirical Average or WSLD score (see reference for bibliographic details). Note that the argument x indicates the set of nodes for which the score must be computed. The vector x represents the indices of the rows of the matrix RW corresponding to the vertices for which the scores must be computed. If x = 1:nrow(RW) the scores for all the vertices of the graph are computed.
NN.score
: a numeric vector with the NN scores of the vertices. The names of the vector correspond to the indices x
KNN.score
: a numeric vector with the KNN scores of the vertices. The names of the vector correspond to the indices x
eav.score
: a numeric vector with the Empirical Average score of the vertices. The names of the vector correspond to the indices x
WSLD.score
: a numeric vector with the Weighted Sum with Linear Decay score (WSLD) of the vertices. The names of the vector correspond to the indices x
signature(RW = "graph")
NN.score
computes the NN score for multiple vertices using a graph of class graph
(hence including objects of class graphAM and graphNEL from the package graph)
KNN.score
computes the KNN score for multiple vertices using a graph of class graph
(hence including objects of class graphAM and graphNEL from the package graph)
eav.score
computes the Empirical Average score for multiple verticesusing a graph of class graph
(hence including objects of class graphAM and graphNEL from the package graph)
WSLD.score
computes the Weighted Sum with Linear Decay score for multiple vertices using a graph of class graph
(hence including objects of class graphAM and graphNEL from the package graph)
signature(RW = "matrix")
NN.score
computes the NN score for multiple vertices using a kernel matrix or a symmetric matrix expressing the similarity between nodes
KNN.score
computes the KNN score for multiple vertices using a kernel matrix or a symmetric matrix expressing the similarity between nodes
eav.score
computes the Empirical Average score multiple for vertices using a kernel matrix or a symmetric matrix expressing the similarity between nodes
WSLD.score
computes the Weighted Sum with Linear Decay score for multiple vertices using a kernel matrix or a symmetric matrix expressing the similarity between nodes
Re M, Mesiti M, Valentini G: A fast ranking algorithm for predicting gene functions in biomolecular networks. IEEE ACM Trans Comput Biol Bioinform 2012, 9(6):1812-1818.
Insuk Lee, Bindu Ambaru, Pranjali Thakkar, Edward M. Marcotte, and Seung Y. Rhee. Nature Biotechnology 28, 149-156, 2010
Methods for scoring a single vertex
# Computation of scores using STRING data with respect to # the FunCat category 11.02.01 rRNA synthesis library(bionetdata); data(Yeast.STRING.data); data(Yeast.STRING.FunCat); labels <- Yeast.STRING.FunCat[,"11.02.01"]; n <- length(labels); ind.pos <- which(labels==1); # NN-scores computed directly on the STRING matrix s <- NN.score(Yeast.STRING.data, 1:n, ind.pos); # NN-scores computed on the 1 step and 2-step random walk kernel matrix K <- rw.kernel(Yeast.STRING.data); sK <- NN.score(K, 1:n, ind.pos); K2 <- p.step.rw.kernel(K, p=2); sK2 <- NN.score(K2, 1:n, ind.pos); # WSLD-scores computed directly on the STRING matrix s <- WSLD.score(Yeast.STRING.data, 1:n, ind.pos); # WSLD-scores computed on the 1 step and 2-step random walk kernel matrix sK <- WSLD.score(K, 1:n, ind.pos); sK2 <- WSLD.score(K2, 1:n, ind.pos);
# Computation of scores using STRING data with respect to # the FunCat category 11.02.01 rRNA synthesis library(bionetdata); data(Yeast.STRING.data); data(Yeast.STRING.FunCat); labels <- Yeast.STRING.FunCat[,"11.02.01"]; n <- length(labels); ind.pos <- which(labels==1); # NN-scores computed directly on the STRING matrix s <- NN.score(Yeast.STRING.data, 1:n, ind.pos); # NN-scores computed on the 1 step and 2-step random walk kernel matrix K <- rw.kernel(Yeast.STRING.data); sK <- NN.score(K, 1:n, ind.pos); K2 <- p.step.rw.kernel(K, p=2); sK2 <- NN.score(K2, 1:n, ind.pos); # WSLD-scores computed directly on the STRING matrix s <- WSLD.score(Yeast.STRING.data, 1:n, ind.pos); # WSLD-scores computed on the 1 step and 2-step random walk kernel matrix sK <- WSLD.score(K, 1:n, ind.pos); sK2 <- WSLD.score(K2, 1:n, ind.pos);
Methods to compute score functions applied to a single vertex of the graph
## S4 method for signature 'graph' single.NN.score(RW, x, x.pos, auto = FALSE) ## S4 method for signature 'matrix' single.NN.score(RW, x, x.pos, auto = FALSE) ## S4 method for signature 'graph' single.KNN.score(RW, x, x.pos, k = 3, auto = FALSE) ## S4 method for signature 'matrix' single.KNN.score(RW, x, x.pos, k = 3, auto = FALSE) ## S4 method for signature 'graph' single.eav.score(RW, x, x.pos, auto = FALSE) ## S4 method for signature 'matrix' single.eav.score(RW, x, x.pos, auto = FALSE) ## S4 method for signature 'graph' single.WSLD.score(RW, x, x.pos, d = 2, auto = FALSE) ## S4 method for signature 'matrix' single.WSLD.score(RW, x, x.pos, d = 2, auto = FALSE)
## S4 method for signature 'graph' single.NN.score(RW, x, x.pos, auto = FALSE) ## S4 method for signature 'matrix' single.NN.score(RW, x, x.pos, auto = FALSE) ## S4 method for signature 'graph' single.KNN.score(RW, x, x.pos, k = 3, auto = FALSE) ## S4 method for signature 'matrix' single.KNN.score(RW, x, x.pos, k = 3, auto = FALSE) ## S4 method for signature 'graph' single.eav.score(RW, x, x.pos, auto = FALSE) ## S4 method for signature 'matrix' single.eav.score(RW, x, x.pos, auto = FALSE) ## S4 method for signature 'graph' single.WSLD.score(RW, x, x.pos, d = 2, auto = FALSE) ## S4 method for signature 'matrix' single.WSLD.score(RW, x, x.pos, d = 2, auto = FALSE)
RW |
matrix. It must be a kernel matrix or a symmetric matrix expressing the similarity between nodes |
x |
integer. Index corresponding to the element of the RW matrix for which the score must be computed |
x.pos |
vector of integer. Indices of the positive elements of the RW matrix |
k |
integer. Number of the k nearest neighbours to be considered |
d |
integer. Coefficient of linear decay (def. 2) |
auto |
boolean. If TRUE the components |
single.NN.score
computes the NN score for a single vertex:
where is the set of positive vertices.
single.KNN.score
compute KNN score for a single vertex:
single.eav.score
computes the Empirical Average score for a single vertex:
single.WSLD.score
computes the WSLD score for a single vertex:
Let be the kth rank order index w.r.t.
, and
, then:
single.NN.score
: the NN score of the vertex
single.KNN.score
: the KNN score of the vertex
single.eav.score
: the Empirical Average score of the vertex
single.WSLD.score
: the Weighted Sum with Linear Decay score (WSLD) of the vertex
signature(RW = "graph")
single.NN.score
computes the NN score for a single vertex using a graph of class graph
(hence including objects of class graphAM and graphNEL from the package graph)
single.KNN.score
computes the KNN score for a single vertex using a graph of class graph
(hence including objects of class graphAM and graphNEL from the package graph)
single.eav.score
computes the Empirical Average score for a single vertex using a graph of class graph
(hence including objects of class graphAM and graphNEL from the package graph)
single.WSLD.score
computes the Weighted Sum with Linear Decay score for a single vertex using a graph of class graph
(hence including objects of class graphAM and graphNEL from the package graph)
signature(RW = "matrix")
single.NN.score
computes the NN score for a single vertex using a kernel matrix or a symmetric matrix expressing the similarity between nodes
single.KNN.score
computes the KNN score for a single vertex using a kernel matrix or a symmetric matrix expressing the similarity between nodes
single.eav.score
computes the Empirical Average score using a kernel matrix or a symmetric matrix expressing the similarity between nodes
single.WSLD.score
computes the Weighted Sum with Linear Decay score for a single vertex using a kernel matrix or a symmetric matrix expressing the similarity between nodes
Methods for scoring multiple vertices
# Computation of scores using STRING data with respect to # the FunCat category 11.02.01 rRNA synthesis library(bionetdata); data(Yeast.STRING.data); data(Yeast.STRING.FunCat); labels <- Yeast.STRING.FunCat[,"11.02.01"]; n <- length(labels); ind.pos <- which(labels==1); # NN-score computed directly on the STRING matrix on the first yeast gene YJR121W s <- single.NN.score(Yeast.STRING.data, 1, ind.pos); # NN-score computed on the 1 step and 2-step random walk kernel matrix K <- rw.kernel(Yeast.STRING.data); sK <- single.NN.score(K, 1, ind.pos); K2 <- p.step.rw.kernel(K, p=2); sK2 <- single.NN.score(K2, 1, ind.pos); # WSLD-score computed directly on the STRING matrix on the first yeast gene YJR121W s <- single.WSLD.score(Yeast.STRING.data, 1, ind.pos); # WSLD-scores computed on the 1 step and 2-step random walk kernel matrix sK <- single.WSLD.score(K, 1, ind.pos); sK2 <- single.WSLD.score(K2, 1, ind.pos);
# Computation of scores using STRING data with respect to # the FunCat category 11.02.01 rRNA synthesis library(bionetdata); data(Yeast.STRING.data); data(Yeast.STRING.FunCat); labels <- Yeast.STRING.FunCat[,"11.02.01"]; n <- length(labels); ind.pos <- which(labels==1); # NN-score computed directly on the STRING matrix on the first yeast gene YJR121W s <- single.NN.score(Yeast.STRING.data, 1, ind.pos); # NN-score computed on the 1 step and 2-step random walk kernel matrix K <- rw.kernel(Yeast.STRING.data); sK <- single.NN.score(K, 1, ind.pos); K2 <- p.step.rw.kernel(K, p=2); sK2 <- single.NN.score(K2, 1, ind.pos); # WSLD-score computed directly on the STRING matrix on the first yeast gene YJR121W s <- single.WSLD.score(Yeast.STRING.data, 1, ind.pos); # WSLD-scores computed on the 1 step and 2-step random walk kernel matrix sK <- single.WSLD.score(K, 1, ind.pos); sK2 <- single.WSLD.score(K2, 1, ind.pos);
Mixed utility functions to compute accuracy, norms, labels from scores and to perform stratified cross-validation.
compute.acc(pred, labels) compute.F(pred, labels) norm1(x) Unit.sphere.norm(K) do.stratified.cv.data(examples, positives, k = 5, seed = NULL) do.cv.data(examples, positives, k = 5, seed = NULL) labelsfromscores(scores, thresh) Multiple.labels.from.scores(S, thresh.vect) selection.test(pos.scores, av.scores, ind.positives, alpha = 0.05, thresh.pos = 0)
compute.acc(pred, labels) compute.F(pred, labels) norm1(x) Unit.sphere.norm(K) do.stratified.cv.data(examples, positives, k = 5, seed = NULL) do.cv.data(examples, positives, k = 5, seed = NULL) labelsfromscores(scores, thresh) Multiple.labels.from.scores(S, thresh.vect) selection.test(pos.scores, av.scores, ind.positives, alpha = 0.05, thresh.pos = 0)
pred |
vector of the predicted labels |
labels |
vector of the true labels. Note that 0 stands for negative and 1 for positive. In general the first level is negative and the second positive |
x |
numeric vector |
K |
a kernel matrix |
examples |
indices of the examples (a vector of integer) |
positives |
vector of integer. Indices of the positive examples. The indices refer to the indices of examples |
k |
number of folds (def = 5) |
seed |
seed of the random generator (def=NULL). If is set to NULL no initiazitation is performed |
scores |
numeric. Vector of scores: each element correspond to the score of an example |
thresh |
real value. Threshold for the classification |
S |
numeric matrix. Matrix of scores: rows represent examples, columns classes |
thresh.vect |
numeric vector. Vector of the thresholds for multiple classes (one threshold for each class) |
pos.scores |
vector with scores of positive examples. It is returned from multiple.ker.score.cv. |
av.scores |
a vector with the average scores computed by multiple.ker.score.cv. It may be a named vector. If not, the names attributes corresponding to the indices of the vector are added. |
ind.positives |
indices of the positive examples. They are the indices of av.scores corresponding to positive examples. |
alpha |
quantile level (def. 0.05) |
thresh.pos |
only values larger than thresh.pos are retained in pos.scores (def.: 0) |
compute.acc
computes the accuracy for a single class
compute.F
computes the F-score for a single class
norm1
computes the L1-norm of a numeric vector
Unit.sphere.norm
normalize a kernel according to the unit sphere
do.stratified.cv.data
generates data for the stratified cross-validation. In particular subdivides the indices that refer to the rows of the data matrix in different folds (separated for positive and negative examples)
do.cv.data
generates data for the vanilla not stratified cross-validation.
labelsfromscores
computes the labels of a single class from the corresponding scores
Multiple.labels.from.scores
computes the labels of multiple classes from the corresponding scores
selection.test
is a non parametric test to select the most significant unlabeled examples
compute.acc
returns the accuracy
compute.F
returns the F-score
norm1
returns the L1-norm value
Unit.sphere.norm
returns the kernel normalized according to the unit sphere
do.stratified.cv.data
returns a list with 2 two components:
fold.non.positives |
a list with k components. Each component is a vector with the indices of the non positive elements of the fold |
fold.positives |
a list with k components. Each component is a vector with the indices of the positive elements of the fold |
Indices refer to row numbers of the data matrix
do.cv.data
returns a list with 2 two components:
fold.non.positives |
a list with k components. Each component is a vector with the indices of the non positive elements of the fold |
fold.positives |
a list with k components. Each component is a vector with the indices of the positive elements of the fold |
Indices refer to row numbers of the data matrix
labelsfromscores
returns a numeric vector res with 0 or 1 values. The label res[i]=1 if scores[i]>thresh, otherwise res[i]=0
Multiple.labels.from.scores
returns a binary matrix with the labels of the predictions. Rows represent examples, columns classes. Element L[i,j] is the label of example i w.r.t. class j. L[i,j]=1 if i belongs to j, 0 otherwise.
selection.test
returns a list with 5 components:
selected |
a named vector with the components of av.scores selected by the test |
selected.labeled |
a named vector with the labeled components of av.scores selected by the test |
selected.unlabeled |
a named vector with the unlabeled components of av.scores selected by the test |
thresh |
the score threshold selected by the test |
alpha |
significance level (the same value of the input) |
# L1-norm of a vector norm1(rnorm(10)); # generation of 5 stratified folds; do.stratified.cv.data(1:100, 1:10, k = 5, seed = NULL); # generation of labels form scores. labelsfromscores(runif(20), thresh=0.3);
# L1-norm of a vector norm1(rnorm(10)); # generation of 5 stratified folds; do.stratified.cv.data(1:100, 1:10, k = 5, seed = NULL); # generation of labels form scores. labelsfromscores(runif(20), thresh=0.3);
Methods to compute weighted score functions for multiple vertices of the graph
## S4 method for signature 'graph' NN.w.score(RW, x, x.pos, w, norm = TRUE) ## S4 method for signature 'matrix' NN.w.score(RW, x, x.pos, w, norm = TRUE) ## S4 method for signature 'graph' KNN.w.score(RW, x, x.pos, w, k = 3, norm = TRUE) ## S4 method for signature 'matrix' KNN.w.score(RW, x, x.pos, w, k = 3, norm = TRUE) ## S4 method for signature 'graph' eav.w.score(RW, x, x.pos, w, auto = FALSE, norm = TRUE) ## S4 method for signature 'matrix' eav.w.score(RW, x, x.pos, w, auto = FALSE, norm = TRUE)
## S4 method for signature 'graph' NN.w.score(RW, x, x.pos, w, norm = TRUE) ## S4 method for signature 'matrix' NN.w.score(RW, x, x.pos, w, norm = TRUE) ## S4 method for signature 'graph' KNN.w.score(RW, x, x.pos, w, k = 3, norm = TRUE) ## S4 method for signature 'matrix' KNN.w.score(RW, x, x.pos, w, k = 3, norm = TRUE) ## S4 method for signature 'graph' eav.w.score(RW, x, x.pos, w, auto = FALSE, norm = TRUE) ## S4 method for signature 'matrix' eav.w.score(RW, x, x.pos, w, auto = FALSE, norm = TRUE)
RW |
matrix. It must be a kernel matrix or a symmetric matrix expressing the similarity between nodes |
x |
vector of integer. Indices corresponding to the elements of the RW matrix for which the score must be computed |
x.pos |
vector of integer. Indices of the positive elements of the RW matrix |
k |
integer. Number of the k nearest neighbours to be considered |
w |
vector of numeric. Its elements represent the initial likelihood that the nodes of the graph belong to the class under study. The elements of w correspond to the columns of RW and the length of w and the number of columns of RW must be equal. |
auto |
boolean. If TRUE the components |
norm |
boolean. If TRUE (def.) the scores are normalized between 0 and 1. |
The methods compute the weighted scores for multiple vertices according to the weighted version of NN, KNN, and Empirical Average score. Note that the argument x indicates the set of nodes for which the score must be computed. The vector x represents the indices of the rows of the matrix RW corresponding to the vertices for which the scores must be computed. If x = 1:nrow(RW) the scores for all the vertices of the graph are computed.
NN.w.score
: a numeric vector with the weighted NN scores of the vertices. The names of the vector correspond to the indices x
KNN.score
: a numeric vector with the weighted KNN scores of the vertices. The names of the vector correspond to the indices x
eav.score
: a numeric vector with the weighted Empirical Average score of the vertices. The names of the vector correspond to the indices x
signature(RW = "graph")
NN.w.score
computes the weighted NN score for multiple vertices using a graph of class graph
(hence including objects of class graphAM and graphNEL from the package graph)
KNN.w.score
computes the weighted KNN score for multiple vertices using a graph of class graph
(hence including objects of class graphAM and graphNEL from the package graph)
eav.w.score
computes the weighted Empirical Average score for multiple verticesusing a graph of class graph
(hence including objects of class graphAM and graphNEL from the package graph)
signature(RW = "matrix")
NN.w.score
computes the weighted NN score for multiple vertices using a kernel matrix or a symmetric matrix expressing the similarity between nodes
KNN.w.score
computes the weighted KNN score for multiple vertices using a kernel matrix or a symmetric matrix expressing the similarity between nodes
eav.w.score
computes the weighted Empirical Average score multiple for vertices using a kernel matrix or a symmetric matrix expressing the similarity between nodes
Giorgio Valentini, Giuliano Armano, Marco Frasca, Jianyi Lin, Marco Mesiti, and Matteo Re RANKS: a flexible tool for node label ranking and classification in biological networks Bioinformatics first published online June 2, 2016 doi:10.1093/bioinformatics/btw235
Re M, Mesiti M, Valentini G: A fast ranking algorithm for predicting gene functions in biomolecular networks. IEEE ACM Trans Comput Biol Bioinform 2012, 9(6):1812-1818.
Methods for scoring a single vertex - weighted version
Methods for scoring multiple vertices
# Computation of scores using STRING data with respect to # the FunCat category 11.02.01 rRNA synthesis library(bionetdata); data(Yeast.STRING.data); data(Yeast.STRING.FunCat); labels <- Yeast.STRING.FunCat[,"11.02.01"]; n <- length(labels); ind.pos <- which(labels==1); # NN-scores computed directly on the STRING matrix s <- NN.w.score(Yeast.STRING.data, 1:n, ind.pos, w=labels); # Weighted NN-scores computed directly on the STRING matrix # using this time random weights for the value of positive nodes w <- runif(n); s <- NN.w.score(Yeast.STRING.data, 1:n, ind.pos, w=w); # Weighted NN-scores computed on the 1 step and 2-step random walk kernel matrix K <- rw.kernel(Yeast.STRING.data); sK <- NN.w.score(K, 1:n, ind.pos, w); K2 <- p.step.rw.kernel(K, p=2); sK2 <- NN.w.score(K2, 1:n, ind.pos, w);
# Computation of scores using STRING data with respect to # the FunCat category 11.02.01 rRNA synthesis library(bionetdata); data(Yeast.STRING.data); data(Yeast.STRING.FunCat); labels <- Yeast.STRING.FunCat[,"11.02.01"]; n <- length(labels); ind.pos <- which(labels==1); # NN-scores computed directly on the STRING matrix s <- NN.w.score(Yeast.STRING.data, 1:n, ind.pos, w=labels); # Weighted NN-scores computed directly on the STRING matrix # using this time random weights for the value of positive nodes w <- runif(n); s <- NN.w.score(Yeast.STRING.data, 1:n, ind.pos, w=w); # Weighted NN-scores computed on the 1 step and 2-step random walk kernel matrix K <- rw.kernel(Yeast.STRING.data); sK <- NN.w.score(K, 1:n, ind.pos, w); K2 <- p.step.rw.kernel(K, p=2); sK2 <- NN.w.score(K2, 1:n, ind.pos, w);
Methods to compute weighted score functions applied to a single vertex of the graph
## S4 method for signature 'graph' single.NN.w.score(RW, x, x.pos, w) ## S4 method for signature 'matrix' single.NN.w.score(RW, x, x.pos, w) ## S4 method for signature 'graph' single.KNN.w.score(RW, x, x.pos, w, k = 3) ## S4 method for signature 'matrix' single.KNN.w.score(RW, x, x.pos, w, k = 3) ## S4 method for signature 'graph' single.eav.w.score(RW, x, x.pos, w, auto = FALSE) ## S4 method for signature 'matrix' single.eav.w.score(RW, x, x.pos, w, auto = FALSE)
## S4 method for signature 'graph' single.NN.w.score(RW, x, x.pos, w) ## S4 method for signature 'matrix' single.NN.w.score(RW, x, x.pos, w) ## S4 method for signature 'graph' single.KNN.w.score(RW, x, x.pos, w, k = 3) ## S4 method for signature 'matrix' single.KNN.w.score(RW, x, x.pos, w, k = 3) ## S4 method for signature 'graph' single.eav.w.score(RW, x, x.pos, w, auto = FALSE) ## S4 method for signature 'matrix' single.eav.w.score(RW, x, x.pos, w, auto = FALSE)
RW |
matrix. It must be a kernel matrix or a symmetric matrix expressing the similarity between nodes |
x |
integer. Index corresponding to the element of the RW matrix for which the score must be computed |
x.pos |
vector of integer. Indices of the positive elements of the RW matrix |
w |
vector of numeric. Its elements represent the initial likelihood that the nodes of the graph belong to the class under study. The elements of w correspond to the columns of RW and the length of w and the number of columns of RW must be equal. |
k |
integer. Number of the k nearest neighbours to be considered |
auto |
boolean. If TRUE the components |
single.NN.w.score
computes the weighted NN score for a single vertex:
where is the set of positive vertices, and
is the weight associated to the node
single.KNN.w.score
compute the weighted KNN score for a single vertex:
single.eav.score
computes the weighted Empirical Average score for a single vertex:
single.NN.w.score
: the weighted NN score of the vertex
single.KNN.w.score
: the weighted KNN score of the vertex
single.eav.w.score
: the weighted Empirical Average score of the vertex
signature(RW = "graph")
single.NN.w.score
computes the weighted NN score for a single vertex using a graph of class graph
(hence including objects of class graphAM and graphNEL from the package graph)
single.KNN.w.score
computes the weighted KNN score for a single vertex using a graph of class graph
(hence including objects of class graphAM and graphNEL from the package graph)
single.eav.w.score
computes the weighted Empirical Average score for a single vertex using a graph of class graph
(hence including objects of class graphAM and graphNEL from the package graph)
signature(RW = "matrix")
single.NN.w.score
computes the weighted NN score for a single vertex using a kernel matrix or a symmetric matrix expressing the similarity between nodes
single.KNN.w.score
computes the weighted KNN score for a single vertex using a kernel matrix or a symmetric matrix expressing the similarity between nodes
single.eav.score
computes the weighted Empirical Average score using a kernel matrix or a symmetric matrix expressing the similarity between nodes
Methods for scoring a single vertex
Methods for scoring multiple vertices - weighted version
# Computation of scores using STRING data with respect to # the FunCat category 11.02.01 rRNA synthesis library(bionetdata); data(Yeast.STRING.data); data(Yeast.STRING.FunCat); labels <- Yeast.STRING.FunCat[,"11.02.01"]; n <- length(labels); ind.pos <- which(labels==1); # NN-score computed directly on the STRING matrix on the first yeast gene YJR121W s <- single.NN.w.score(Yeast.STRING.data, 1, ind.pos, w=labels); # NN-score weighted computed directly on the STRING matrix on the first yeast gene YJR121W, # using this time random weights for the value of positive nodes w <- runif(n); s <- single.NN.w.score(Yeast.STRING.data, 1, ind.pos, w=w); # NN-score weighted computed on the 1 step and 2-step random walk kernel matrix K <- rw.kernel(Yeast.STRING.data); sK <- single.NN.w.score(K, 1, ind.pos, w); K2 <- p.step.rw.kernel(K, p=2); sK2 <- single.NN.w.score(K2, 1, ind.pos, w);
# Computation of scores using STRING data with respect to # the FunCat category 11.02.01 rRNA synthesis library(bionetdata); data(Yeast.STRING.data); data(Yeast.STRING.FunCat); labels <- Yeast.STRING.FunCat[,"11.02.01"]; n <- length(labels); ind.pos <- which(labels==1); # NN-score computed directly on the STRING matrix on the first yeast gene YJR121W s <- single.NN.w.score(Yeast.STRING.data, 1, ind.pos, w=labels); # NN-score weighted computed directly on the STRING matrix on the first yeast gene YJR121W, # using this time random weights for the value of positive nodes w <- runif(n); s <- single.NN.w.score(Yeast.STRING.data, 1, ind.pos, w=w); # NN-score weighted computed on the 1 step and 2-step random walk kernel matrix K <- rw.kernel(Yeast.STRING.data); sK <- single.NN.w.score(K, 1, ind.pos, w); K2 <- p.step.rw.kernel(K, p=2); sK2 <- single.NN.w.score(K2, 1, ind.pos, w);