Title: | Collaborative Graphical Lasso - Multi-Omics Network Reconstruction |
---|---|
Description: | Reconstruct networks from multi-omics data sets with the collaborative graphical lasso (coglasso) algorithm described in Albanese, A., Kohlen, W., and Behrouzi, P. (2024) <arXiv:2403.18602>. Build multiple networks using the coglasso() function, select the best one with stars_coglasso(). |
Authors: | Alessio Albanese [aut, cre, cph] , Pariya Behrouzi [aut] |
Maintainer: | Alessio Albanese <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.2.9000 |
Built: | 2024-11-17 05:56:36 UTC |
Source: | https://github.com/drquestion/coglasso |
bs() wraps the two main functions of the package in a single one: coglasso(), to build multiple multi-omics networks, and select_coglasso(), to select the best one according to the chosen criterion.
bs( data, p = NULL, pX = lifecycle::deprecated(), lambda_w = NULL, lambda_b = NULL, c = NULL, nlambda_w = NULL, nlambda_b = NULL, nc = NULL, lambda_w_max = NULL, lambda_b_max = NULL, c_max = NULL, lambda_w_min_ratio = NULL, lambda_b_min_ratio = NULL, c_min_ratio = NULL, icov_guess = NULL, cov_output = FALSE, lock_lambdas = FALSE, method = "xestars", stars_thresh = 0.1, stars_subsample_ratio = NULL, rep_num = 20, max_iter = 10, old_sampling = FALSE, light = TRUE, ebic_gamma = 0.5, verbose = TRUE )
data |
The input multi-omics data set. Rows should be samples, columns
should be variables. Variables should be grouped by their assay (e.g.
transcripts first, then metabolites). |
p |
A vector with the number of variables for each omic layer of the
data set (e.g. the number of transcripts, metabolites etc.), in the same
order the layers have in the data set. If given a single number,
|
pX |
|
lambda_w |
A vector of values for the parameter λ_w |
lambda_b |
A vector of values for the parameter λ_b |
c |
A vector of values for the parameter c |
nlambda_w |
The number of requested λ_w values |
nlambda_b |
The number of requested λ_b values |
nc |
The number of requested c values |
lambda_w_max |
The greatest generated λ_w |
lambda_b_max |
The greatest generated λ_b |
c_max |
The greatest generated c |
lambda_w_min_ratio |
The ratio of the smallest generated λ_w to the greatest generated λ_w |
lambda_b_min_ratio |
The ratio of the smallest generated λ_b to the greatest generated λ_b |
c_min_ratio |
The ratio of the smallest generated c to the greatest generated c |
icov_guess |
Use a predetermined inverse covariance matrix as an initial guess for the network estimation. |
cov_output |
Add the estimated variance-covariance matrix to the output. |
lock_lambdas |
Set |
method |
The model selection method to select the best combination of hyperparameters. The available options are "xstars", "xestars" and "ebic". Defaults to "xestars". |
stars_thresh |
The threshold set for variability of the explored
networks at each iteration of the algorithm. The |
stars_subsample_ratio |
The proportion of samples in the multi-omics
data set to be randomly subsampled to estimate the variability of the
network under the given hyperparameters setting. Defaults to 80% when the
number of samples is smaller than 144, otherwise it defaults to
|
rep_num |
The number of subsamples of the multi-omics data set used to estimate the variability of the network under the given hyperparameters setting. Defaults to 20. |
max_iter |
The greatest number of times the algorithm is allowed to
choose a new best |
old_sampling |
Perform the same subsampling |
light |
If set to TRUE, do not store the "merged" matrices recording the average variability of each edge, making the algorithm more memory efficient. Defaults to TRUE. |
ebic_gamma |
The |
verbose |
Print information regarding the network building and the network selection processes. |
When using bs(), first, coglasso() estimates multiple multi-omics networks with the collaborative graphical lasso algorithm, one for each combination of input values for the hyperparameters λ_w, λ_b and c. Then, select_coglasso() selects the best combination of hyperparameters given to coglasso() according to the selected model selection method. The three available options that can be set for the argument method are "xstars", "xestars" and "ebic". For more information on these selection methods, visit the help page of select_coglasso().
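The relationship between the one-step and the two-step workflow can be sketched as follows. This is a minimal illustration that uses only the functions, arguments and output elements documented on this page, and it assumes the coglasso package is installed. The "ebic" method is chosen here only because it involves no subsampling.

```r
library(coglasso)

# Two-step workflow: build the networks, then select one of them.
cg <- coglasso(multi_omics_sd_micro, p = c(4, 2),
               nlambda_w = 3, nlambda_b = 3, nc = 3, verbose = FALSE)
sel_two_step <- select_coglasso(cg, method = "ebic", verbose = FALSE)

# One-step workflow: bs() takes the arguments of both functions.
sel_one_step <- bs(multi_omics_sd_micro, p = c(4, 2),
                   nlambda_w = 3, nlambda_b = 3, nc = 3,
                   method = "ebic", verbose = FALSE)

# Both results are select_coglasso objects carrying the selected
# adjacency matrix in the element sel_adj.
class(sel_one_step)
dim(sel_two_step$sel_adj)
dim(sel_one_step$sel_adj)
```

The two-step form is useful when the same set of estimated networks should be passed to more than one selection method.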
bs() returns an object of S3 class select_coglasso containing several elements. The most important is probably sel_adj, the adjacency matrix of the selected network. Some output elements depend on the chosen model selection method.
These elements are always returned, and they are the result of network
estimation with coglasso()
:
loglik
is a numerical vector containing the likelihoods of all
the estimated networks.
density
is a numerical vector containing a measure of the density of all
the estimated networks.
df
is an integer vector containing the degrees of freedom of all the
estimated networks.
convergence
is a binary vector containing whether a network was
successfully estimated for the given combination of hyperparameters or not.
path
is a list containing the adjacency matrices of all the estimated
networks.
icov
is a list containing the inverse covariance matrices of all the
estimated networks.
nexploded
is the number of combinations of hyperparameters for which
coglasso()
failed to converge.
data
is the input multi-omics data set.
hpars
is the ordered table of all the combinations of hyperparameters
given as input to bs()
, with
being the key to sort rows.
lambda_w, lambda_b, and c are numerical vectors with, respectively, all the λ_w, λ_b, and c values bs() used.
p
is the vector with the number of variables for each omic layer of the
data set.
D
is the number of omics layers in the data set.
cov
optional, returned when cov_output
is TRUE, is a list containing
the variance-covariance matrices of all the estimated networks.
These elements are returned by all selection methods available:
sel_index_c, sel_index_lw and sel_index_lb are the indexes of the final selected parameters c, λ_w and λ_b leading to the most stable sparse network.
sel_c, sel_lambda_w and sel_lambda_b are the final selected parameters c, λ_w and λ_b leading to the most stable sparse network.
sel_adj
is the adjacency matrix of the final selected network.
sel_density
is the density of the final selected network.
sel_icov
is the inverse covariance matrix of the final selected network.
call
is the matched call.
method
is the chosen model selection method.
These are the additional elements returned when choosing "xestars":
opt_adj is a list of the adjacency matrices finally selected for each c parameter explored.
opt_variability is a numerical vector containing the variabilities associated with the adjacency matrices in opt_adj.
opt_index_lw and opt_index_lb are integer vectors containing the index of the selected λ_w (or λ_b) for each c parameter explored.
opt_lambda_w and opt_lambda_b are vectors containing the selected λ_w (or λ_b) for each c parameter explored.
merge_lw and merge_lb are returned only if light is set to FALSE. They are lists with as many elements as the number of c parameters explored. Every element is a "merged" adjacency matrix, the average of all the adjacency matrices estimated for that specific c and the selected λ_w (or λ_b) value across all the subsamples in the last path explored before convergence, the one when the final combination of λ_w and λ_b is selected for the given c value.
These are the additional elements returned when choosing "xstars":
merge_lw and merge_lb are lists with as many elements as the number of c parameters explored. Every element is in turn a list of as many matrices as the number of λ_w (or λ_b) values explored. Each matrix is the "merged" adjacency matrix, the average of all the adjacency matrices estimated for those specific c and λ_w (or λ_b) values across all the subsamples in the last path explored before convergence, the one when the final combination of λ_w and λ_b is selected for the given c value.
variability_lw and variability_lb are lists with as many elements as the number of c parameters explored. Every element is a numeric vector of as many items as the number of λ_w (or λ_b) values explored. Each item is the variability of the network estimated for those specific c and λ_w (or λ_b) values in the last path explored before convergence, the one when the final combination of λ_w and λ_b is selected for the given c value.
opt_adj is a list of the adjacency matrices finally selected for each c parameter explored.
opt_variability is a numerical vector containing the variabilities associated with the adjacency matrices in opt_adj.
opt_index_lw and opt_index_lb are integer vectors containing the index of the selected λ_w (or λ_b) for each c parameter explored.
opt_lambda_w and opt_lambda_b are vectors containing the selected λ_w (or λ_b) for each c parameter explored.
These are the additional elements returned when choosing "ebic":
ebic_scores is a numerical vector containing the eBIC scores for all the hyperparameter combinations.
# Suggested usage: give the input data set, set the values for `p` and the
# number of hyperparameters to explore (to choose how extensively to explore
# the possible hyperparameters). Then, let the default behavior do the rest:
sel_mo_net <- bs(multi_omics_sd_micro, p = c(4, 2), nlambda_w = 3, nlambda_b = 3, nc = 3, verbose = FALSE)
coglasso() estimates multiple multi-omics networks with the collaborative graphical lasso algorithm, one for each combination of input values for the hyperparameters λ_w, λ_b and c.
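To make "one network for each combination" concrete, the following base R sketch enumerates a hyperparameter grid. The numeric values are hypothetical and chosen only for illustration. coglasso() generates its own values when only nlambda_w, nlambda_b and nc are given, and the row order here is not meant to reproduce the ordering of the hpars element.

```r
# Hypothetical hyperparameter values (not generated by coglasso()).
lambda_w <- c(0.9, 0.6, 0.3)
lambda_b <- c(0.8, 0.5, 0.2)
c_values <- c(10, 1, 0.1)

# Every combination of the three hyperparameters corresponds to one network.
grid <- expand.grid(lambda_w = lambda_w, lambda_b = lambda_b, c = c_values)

nrow(grid)  # 3 * 3 * 3 = 27 combinations
head(grid)
```

With nlambda_w = 3, nlambda_b = 3 and nc = 3, as in the examples on this page, 27 networks are therefore estimated.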
coglasso( data, p = NULL, pX = lifecycle::deprecated(), lambda_w = NULL, lambda_b = NULL, c = NULL, nlambda_w = NULL, nlambda_b = NULL, nc = NULL, lambda_w_max = NULL, lambda_b_max = NULL, c_max = NULL, lambda_w_min_ratio = NULL, lambda_b_min_ratio = NULL, c_min_ratio = NULL, icov_guess = NULL, cov_output = FALSE, lock_lambdas = FALSE, verbose = TRUE )
data |
The input multi-omics data set. Rows should be samples, columns
should be variables. Variables should be grouped by their assay (e.g.
transcripts first, then metabolites). |
p |
A vector with the number of variables for each omic layer of the
data set (e.g. the number of transcripts, metabolites etc.), in the same
order the layers have in the data set. If given a single number,
|
pX |
|
lambda_w |
A vector of values for the parameter λ_w |
lambda_b |
A vector of values for the parameter λ_b |
c |
A vector of values for the parameter c |
nlambda_w |
The number of requested λ_w values |
nlambda_b |
The number of requested λ_b values |
nc |
The number of requested c values |
lambda_w_max |
The greatest generated λ_w |
lambda_b_max |
The greatest generated λ_b |
c_max |
The greatest generated c |
lambda_w_min_ratio |
The ratio of the smallest generated λ_w to the greatest generated λ_w |
lambda_b_min_ratio |
The ratio of the smallest generated λ_b to the greatest generated λ_b |
c_min_ratio |
The ratio of the smallest generated c to the greatest generated c |
icov_guess |
Use a predetermined inverse covariance matrix as an initial guess for the network estimation. |
cov_output |
Add the estimated variance-covariance matrix to the output. |
lock_lambdas |
Set |
verbose |
Print information regarding current |
coglasso()
returns an object of S3
class coglasso
, that has the
following elements:
loglik
is a numerical vector containing the likelihoods of all
the estimated networks.
density
is a numerical vector containing a measure of the density of all
the estimated networks.
df
is an integer vector containing the degrees of freedom of all the
estimated networks.
convergence
is a binary vector containing whether a network was
successfully estimated for the given combination of hyperparameters or not.
path
is a list containing the adjacency matrices of all the estimated
networks.
icov
is a list containing the inverse covariance matrices of all the
estimated networks.
nexploded
is the number of combinations of hyperparameters for which
coglasso()
failed to converge.
data
is the input multi-omics data set.
hpars
is the ordered table of all the combinations of hyperparameters
given as input to coglasso()
, with
being the key to sort rows.
lambda_w is a numerical vector with all the λ_w values coglasso() used.
lambda_b is a numerical vector with all the λ_b values coglasso() used.
c is a numerical vector with all the c values coglasso() used.
p
is the vector with the number of variables for each omic layer of the
data set.
D
is the number of omics layers in the data set.
icov_guess
optional, returned when icov_guess
is given. It is the
predetermined inverse covariance matrix given by the user as an initial
guess for the network estimation.
cov
optional, returned when cov_output
is TRUE, is a list containing
the variance-covariance matrices of all the estimated networks.
call
is the matched call.
# Typical usage: set the number of hyperparameters to explore
cg <- coglasso(multi_omics_sd_micro, p = c(4, 2), nlambda_w = 3, nlambda_b = 3, nc = 3, verbose = FALSE)
# Model selection using eXtended Efficient StARS, takes less than five seconds
sel_cg_xestars <- select_coglasso(cg, method = "xestars", verbose = FALSE)
get_network() extracts the selected network from a select_coglasso object, or a different specific one from either a select_coglasso or a coglasso object when specifying the optional parameters.
get_network(sel_cg_obj, index_c = NULL, index_lw = NULL, index_lb = NULL)
sel_cg_obj |
The object of |
index_c |
The index of the c value |
index_lw |
The index of the λ_w value |
index_lb |
The index of the λ_b value |
If the input is a coglasso object, it is necessary to specify all the indexes to extract a selected network.
If the input is a select_coglasso object, it extracts by default the selected network. If the selection method was "ebic", and you want to extract a network other than the selected one, specify all indexes. Otherwise, if the objective is to extract the optimal network for a specific c value different from the selected one, set index_c to your chosen one. Here too, it is possible to extract a specific non-optimal network by setting all the indexes to the chosen ones.
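The cases described above can be sketched as follows, using only the functions and arguments documented on this page and assuming the coglasso package is installed. The index values are arbitrary choices within the 3 x 3 x 3 grid requested here.

```r
library(coglasso)

cg <- coglasso(multi_omics_sd_micro, p = c(4, 2),
               nlambda_w = 3, nlambda_b = 3, nc = 3, verbose = FALSE)

# coglasso object: all three indexes must be given.
net_any <- get_network(cg, index_c = 1, index_lw = 2, index_lb = 3)

sel_cg <- select_coglasso(cg, method = "xestars", verbose = FALSE)

# select_coglasso object: the selected network is extracted by default.
net_sel <- get_network(sel_cg)

# With "xestars" or "xstars", setting only index_c extracts the optimal
# network for that specific c value.
net_c2 <- get_network(sel_cg, index_c = 2)

# Setting all the indexes extracts a specific, possibly non-optimal, network.
net_fixed <- get_network(sel_cg, index_c = 2, index_lw = 1, index_lb = 1)
```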
get_network()
returns the selected network, in the form of an
object of class igraph
.
sel_cg <- bs(multi_omics_sd_micro, p = c(4, 2), nlambda_w = 3, nlambda_b = 3, nc = 3, verbose = FALSE)
sel_net <- get_network(sel_cg)
# Could even plot the selected network with plot(sel_net), but then it would
# plot an unannotated network, better to directly plot(sel_cg).
print(sel_net)
A dataset containing transcript and metabolite values analysed in Albanese et al. 2023, subset of the multi-omics data set published in Jan, M., Gobet, N., Diessler, S. et al. A multi-omics digital research object for the genetics of sleep regulation. Sci Data 6, 258 (2019).
multi_omics_sd_small
is a smaller version, limited to the transcript Cirbp
and the transcripts and metabolites belonging to its neighborhood as
described in Albanese et al. 2023
multi_omics_sd_micro
is a minimal version with Cirbp and a selection of its
neighborhood.
multi_omics_sd multi_omics_sd_small multi_omics_sd_micro
multi_omics_sd
A data frame with 30 rows and 238 variables (162 transcripts and 76 metabolites):
log2 CPM values of 162 transcripts in mouse cortex under sleep deprivation (-4.52–10.46)
abundance values of 76 metabolites (0.02–1112.67)
multi_omics_sd_small
A data frame with 30 rows and 19 variables (14 transcripts and 5 metabolites)
log2 CPM values of 14 transcripts in mouse cortex under sleep deprivation (4.24–9.31)
Abundance values of 5 metabolites (0.17–145.33)
multi_omics_sd_micro
A data frame with 30 rows and 6 variables (4 transcripts and 2 metabolites)
log2 CPM values of 4 transcripts in mouse cortex under sleep deprivation (4.78–9.31)
Abundance values of 2 metabolites (58.80–145.33)
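The column layout described above is what the argument p of coglasso() and bs() refers to. The following base R sketch shows the correspondence on a simulated stand-in with the same shape as multi_omics_sd_micro (30 rows, 4 transcripts followed by 2 metabolites). The object toy and its column names are hypothetical and are not part of the package.

```r
set.seed(1)

# Simulated stand-in: 30 samples, 4 "transcripts" then 2 "metabolites".
toy <- matrix(rnorm(30 * 6), nrow = 30,
              dimnames = list(NULL, c(paste0("transcript_", 1:4),
                                      paste0("metabolite_", 1:2))))

# p lists the number of variables per layer, in column order.
p <- c(4, 2)
stopifnot(sum(p) == ncol(toy))

# Layer membership of each column implied by p.
layer <- rep(seq_along(p), times = p)
by_layer <- split(colnames(toy), layer)
by_layer
```

For multi_omics_sd_small the same reasoning gives p = c(14, 5), which is the value used in the plotting example on this page.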
Jan, M., Gobet, N., Diessler, S. et al. A multi-omics digital research object for the genetics of sleep regulation. Sci Data 6, 258 (2019) doi:10.1038/s41597-019-0171-x
Figshare folder of the original manuscript: https://figshare.com/articles/dataset/Input_data_for_systems_genetics_of_sleep_regulation/7797434
plot.select_coglasso() creates an annotated plot of a coglasso selected network from an object of S3 class select_coglasso. Variables from different data sets will have different color coding. To plot the network, it is enough to call plot() on the select_coglasso object.
plot.coglasso() works the same way as plot.select_coglasso(), but from an object of S3 class coglasso. In this case, it is compulsory to specify index_c, index_lw, and index_lb.
## S3 method for class 'select_coglasso'
plot( x, index_c = NULL, index_lw = NULL, index_lb = NULL, node_labels = TRUE, hide_isolated = TRUE, ... )
## S3 method for class 'coglasso'
plot( x, index_c, index_lw, index_lb, node_labels = TRUE, hide_isolated = TRUE, ... )
x |
The object of |
index_c |
The index of the c value |
index_lw |
The index of the λ_w value |
index_lb |
The index of the λ_b value |
node_labels |
Show node names in the network. Defaults to TRUE. |
hide_isolated |
Hide nodes that are not connected to any other node. Defaults to TRUE. |
... |
System required, not used here. |
If the input is a coglasso object, it is necessary to specify all the indexes to extract a selected network.
If the input is a select_coglasso object, it extracts by default the selected network. If the selection method was "ebic", and you want to extract a network other than the selected one, specify all indexes. Otherwise, if the objective is to extract the optimal network for a specific c value different from the selected one, set index_c to your chosen one. Here too, it is possible to extract a specific non-optimal network by setting all the indexes to the chosen ones.
Returns NULL, invisibly.
See get_network() to understand what it means to select a specific network with index_c, index_lw, and index_lb.
sel_cg <- bs(multi_omics_sd_small, p = c(14, 5), nlambda_w = 15, nlambda_b = 15, nc = 3, lambda_w_min_ratio = 0.6, verbose = FALSE)
plot(sel_cg)
select_coglasso() selects the best combination of hyperparameters given to coglasso() according to the selected model selection method. The three available options that can be set for the argument method are "xstars", "xestars" and "ebic".
select_coglasso( coglasso_obj, method = "xestars", stars_thresh = 0.1, stars_subsample_ratio = NULL, rep_num = 20, max_iter = 10, old_sampling = FALSE, light = TRUE, ebic_gamma = 0.5, verbose = TRUE )
coglasso_obj |
The object of |
method |
The model selection method to select the best combination of hyperparameters. The available options are "xstars", "xestars" and "ebic". Defaults to "xestars". |
stars_thresh |
The threshold set for variability of the explored
networks at each iteration of the algorithm. The |
stars_subsample_ratio |
The proportion of samples in the multi-omics
data set to be randomly subsampled to estimate the variability of the
network under the given hyperparameters setting. Defaults to 80% when the
number of samples is smaller than 144, otherwise it defaults to
|
rep_num |
The number of subsamples of the multi-omics data set used to estimate the variability of the network under the given hyperparameters setting. Defaults to 20. |
max_iter |
The greatest number of times the algorithm is allowed to
choose a new best |
old_sampling |
Perform the same subsampling |
light |
If set to TRUE, do not store the "merged" matrices recording the average variability of each edge, making the algorithm more memory efficient. Defaults to TRUE. |
ebic_gamma |
The |
verbose |
Print information regarding the progress of the selection procedure on the console. |
select_coglasso() provides three model selection strategies:
"xstars" uses eXtended StARS (XStARS), selecting the most stable, yet sparse network. Stability is computed upon network estimation from multiple subsamples of the multi-omics data set, allowing repetition. Subsamples are collected a fixed number of times (rep_num), and with a fixed proportion of the total number of samples (stars_subsample_ratio). See xstars() for more information on the methodology.
"xestars" uses eXtended Efficient StARS (XEStARS), a significantly faster and more memory-efficient version of XStARS. It could produce marginally different results from "xstars" due to a different sampling strategy. See xestars() for more information on the methodology.
"ebic" uses the extended Bayesian Information Criterion (eBIC), selecting the network that minimizes it. ebic_gamma sets the weight given to the extended component, turning the model selection method into the standard BIC if set to 0.
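The role of the weight in the "ebic" strategy can be illustrated with a small base R sketch. It uses the form of the extended BIC commonly given for Gaussian graphical models (Foygel and Drton, 2010), which is an assumption here: this page does not state how coglasso scales the log-likelihood or counts edges. The function ebic_score() and all the input values are hypothetical.

```r
# eBIC for a graphical model with n_edges edges, n samples and p variables.
# Assumed form: -2 * loglik + |E| * log(n) + 4 * gamma * |E| * log(p).
ebic_score <- function(loglik, n_edges, n, p, gamma = 0.5) {
  -2 * loglik + n_edges * log(n) + 4 * gamma * n_edges * log(p)
}

# Hypothetical log-likelihoods and edge counts of three candidate networks.
loglik  <- c(-120, -100, -95)
n_edges <- c(2, 5, 9)

scores <- ebic_score(loglik, n_edges, n = 30, p = 6, gamma = 0.5)
bic    <- ebic_score(loglik, n_edges, n = 30, p = 6, gamma = 0)

# The network minimizing the score is the one selected.
which.min(scores)

# With gamma = 0 the extended component vanishes, leaving the standard BIC.
all.equal(bic, -2 * loglik + n_edges * log(30))
```

Larger values of the weight penalize dense networks more strongly, so the selected network tends to be sparser.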
select_coglasso()
returns an object of S3
class select_coglasso
containing the results of the
selection procedure, built upon an object of S3
class coglasso
. Some
output elements depend on the chosen model selection method.
These elements are returned by all methods:
... are the same elements returned by coglasso()
.
sel_index_c, sel_index_lw and sel_index_lb are the indexes of the final selected parameters c, λ_w and λ_b leading to the most stable sparse network.
sel_c, sel_lambda_w and sel_lambda_b are the final selected parameters c, λ_w and λ_b leading to the most stable sparse network.
sel_adj
is the adjacency matrix of the final selected network.
sel_density
is the density of the final selected network.
sel_icov
is the inverse covariance matrix of the final selected network.
call
is the matched call.
method
is the chosen model selection method.
These are the additional elements returned when choosing "xestars":
opt_adj is a list of the adjacency matrices finally selected for each c parameter explored.
opt_variability is a numerical vector containing the variabilities associated with the adjacency matrices in opt_adj.
opt_index_lw and opt_index_lb are integer vectors containing the index of the selected λ_w (or λ_b) for each c parameter explored.
opt_lambda_w and opt_lambda_b are vectors containing the selected λ_w (or λ_b) for each c parameter explored.
merge_lw and merge_lb are returned only if light is set to FALSE. They are lists with as many elements as the number of c parameters explored. Every element is a "merged" adjacency matrix, the average of all the adjacency matrices estimated for that specific c and the selected λ_w (or λ_b) value across all the subsamples in the last path explored before convergence, the one when the final combination of λ_w and λ_b is selected for the given c value.
These are the additional elements returned when choosing "xstars":
merge_lw and merge_lb are lists with as many elements as the number of c parameters explored. Every element is in turn a list of as many matrices as the number of λ_w (or λ_b) values explored. Each matrix is the "merged" adjacency matrix, the average of all the adjacency matrices estimated for those specific c and λ_w (or λ_b) values across all the subsamples in the last path explored before convergence, the one when the final combination of λ_w and λ_b is selected for the given c value.
variability_lw and variability_lb are lists with as many elements as the number of c parameters explored. Every element is a numeric vector of as many items as the number of λ_w (or λ_b) values explored. Each item is the variability of the network estimated for those specific c and λ_w (or λ_b) values in the last path explored before convergence, the one when the final combination of λ_w and λ_b is selected for the given c value.
opt_adj is a list of the adjacency matrices finally selected for each c parameter explored.
opt_variability is a numerical vector containing the variabilities associated with the adjacency matrices in opt_adj.
opt_index_lw and opt_index_lb are integer vectors containing the index of the selected λ_w (or λ_b) for each c parameter explored.
opt_lambda_w and opt_lambda_b are vectors containing the selected λ_w (or λ_b) for each c parameter explored.
These are the additional elements returned when choosing "ebic":
ebic_scores is a numerical vector containing the eBIC scores for all the hyperparameter combinations.
cg <- coglasso(multi_omics_sd_micro, p = c(4, 2), nlambda_w = 3, nlambda_b = 3, nc = 3, verbose = FALSE)
# Using eXtended Efficient StARS, takes less than five seconds
sel_cg_xestars <- select_coglasso(cg, method = "xestars", verbose = FALSE)
# Using eXtended StARS, takes around a minute
sel_cg_xstars <- select_coglasso(cg, method = "xstars", verbose = FALSE)
# Using eBIC
sel_cg_ebic <- select_coglasso(cg, method = "ebic", verbose = FALSE)
xestars() provides a more efficient and lighter implementation than xstars() to select the combination of hyperparameters given to coglasso() yielding the most stable, yet sparse network. Stability is computed upon network estimation from multiple subsamples of the multi-omics data set, allowing repetition. Subsamples are collected a fixed number of times (rep_num), and with a fixed proportion of the total number of samples (stars_subsample_ratio).
xestars( coglasso_obj, stars_thresh = 0.1, stars_subsample_ratio = NULL, rep_num = 20, max_iter = 10, old_sampling = FALSE, light = TRUE, verbose = TRUE )
coglasso_obj |
The object of |
stars_thresh |
The threshold set for variability of the explored
networks at each iteration of the algorithm. The |
stars_subsample_ratio |
The proportion of samples in the multi-omics
data set to be randomly subsampled to estimate the variability of the
network under the given hyperparameters setting. Defaults to 80% when the
number of samples is smaller than 144, otherwise it defaults to
|
rep_num |
The number of subsamples of the multi-omics data set used to estimate the variability of the network under the given hyperparameters setting. Defaults to 20. |
max_iter |
The greatest number of times the algorithm is allowed to
choose a new best |
old_sampling |
Perform the same subsampling |
light |
If set to TRUE, do not store the "merged" matrices recording the average variability of each edge, making the algorithm more memory efficient. Defaults to TRUE. |
verbose |
Print information regarding the progress of the selection procedure on the console. |
eXtended Efficient StARS (XEStARS) is a more efficient and memory-light version of XStARS, the adaptation for collaborative graphical regression of the method published by Liu, H. et al. (2010): Stability Approach to Regularization Selection (StARS). StARS was developed for network estimation regulated by a single penalty parameter, while collaborative graphical lasso needs to explore three different hyperparameters. In particular, two of these are penalty parameters with a direct influence on network sparsity, hence on stability. For every c parameter, xestars() explores one of the two penalty parameters (λ_w or λ_b), keeping the other one fixed at its previous best estimate, using the normal, one-dimensional StARS approach, until finding the best couple. What makes it more efficient than xstars() is that the stability check, which in the original algorithm (even in the original StARS) is performed for every λ_w or λ_b value, is implemented here as a stopping criterion. This considerably reduces the number of iterations before convergence. It then selects the c parameter for which the best (λ_w, λ_b) couple yielded the most stable, yet sparse network.
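The one-dimensional StARS step mentioned above can be sketched in base R. This follows the generic rule of Liu et al. (2010): along a path of decreasing penalties, monotonize the variability and keep the smallest penalty whose variability is still within the threshold. It is not the package's source code. The function stars_pick(), its fallback to the sparsest network and the input values are all hypothetical.

```r
# Pick the index of the penalty selected by the one-dimensional StARS rule.
stars_pick <- function(lambdas, variability, thresh = 0.1) {
  ord  <- order(lambdas, decreasing = TRUE)  # from sparsest to densest
  mono <- cummax(variability[ord])           # monotonized variability
  ok   <- which(mono <= thresh)
  if (length(ok) == 0) {
    return(ord[1])                           # assumed fallback: sparsest network
  }
  ord[max(ok)]                               # densest network within threshold
}

lambdas     <- c(0.9, 0.7, 0.5, 0.3, 0.1)
variability <- c(0.00, 0.04, 0.08, 0.15, 0.12)

picked <- stars_pick(lambdas, variability, thresh = 0.1)
lambdas[picked]
```

In this sketch the whole path is scanned. Using the threshold as a stopping criterion, as described for xestars(), would instead stop the scan at the first penalty whose variability exceeds the threshold, so the denser networks after it would never need to be estimated.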
The original XStARS computes a new subsampling every time the algorithm switches between optimizing λ_w and λ_b, and for every c. This does not allow the hyperparameters to be compared on equal ground, and can slow the selection down with bigger data sets or a larger hyperparameter space. To allow a fairer (and faster) comparison among different optimizations, the old_sampling parameter has been implemented. If set to TRUE, the subsampling is the same one xstars() would perform. Otherwise the subsampling is performed at the beginning of the algorithm once and for all its iterations.
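The difference between the two sampling strategies can be sketched in base R. The helper draw_subsamples() is hypothetical, and drawing each subsample without replacement is an assumption borrowed from the original StARS procedure. The values n = 30, a ratio of 0.8 and 20 repetitions match the data sets and defaults described on this page.

```r
set.seed(7)

n       <- 30    # samples in the data set
ratio   <- 0.8   # stars_subsample_ratio when n is smaller than 144
rep_num <- 20    # number of subsamples

# Draw rep_num sets of row indexes, each with floor(ratio * n) samples.
draw_subsamples <- function(n, ratio, rep_num) {
  size <- floor(ratio * n)
  replicate(rep_num, sort(sample.int(n, size)), simplify = FALSE)
}

# Shared strategy: draw once and reuse the same index sets for every c and
# for every switch between the two penalty parameters.
shared <- draw_subsamples(n, ratio, rep_num)

# Per-sweep strategy: a fresh draw for each sweep (three sweeps shown here).
per_sweep <- lapply(1:3, function(i) draw_subsamples(n, ratio, rep_num))

lengths(shared)
```

Reusing the same index sets means that any difference in variability between two hyperparameter settings comes from the settings themselves rather than from a different random draw.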
To allow xestars() to be lighter on memory, the light parameter has been implemented. If set to TRUE, the "merged" matrices traditionally returned by both StARS and XStARS are not returned.
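What a "merged" matrix holds, and how a variability value can be derived from it, can be sketched in base R. The adjacency matrices below are simulated, and the variability follows the definition in Liu et al. (2010), where the instability of an edge with selection frequency θ is 2θ(1 - θ), averaged over all possible edges. The exact normalization used by coglasso is not stated on this page, so this is an illustration under that assumption.

```r
set.seed(42)

n_nodes <- 4
rep_num <- 20

# Simulate one symmetric 0/1 adjacency matrix with an empty diagonal.
sim_adj <- function(n_nodes, prob = 0.3) {
  a <- matrix(0, n_nodes, n_nodes)
  a[upper.tri(a)] <- rbinom(choose(n_nodes, 2), 1, prob)
  a + t(a)
}

# One adjacency matrix per subsample.
adj_list <- replicate(rep_num, sim_adj(n_nodes), simplify = FALSE)

# "Merged" matrix: selection frequency of each edge across subsamples.
merged <- Reduce(`+`, adj_list) / rep_num

# Edge-wise instability and overall variability of the network.
edge_instability <- 2 * merged * (1 - merged)
variability <- sum(edge_instability[upper.tri(edge_instability)]) /
  choose(n_nodes, 2)

merged
variability
```

Keeping one such matrix per hyperparameter value is what makes the non-light output grow with the number of variables and of explored values.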
xestars()
returns an object of S3
class select_coglasso
containing the results of the selection
procedure, built upon the object of S3
class coglasso
returned by coglasso()
.
... are the same elements returned by coglasso()
.
opt_adj is a list of the adjacency matrices finally selected for each c parameter explored.
opt_variability is a numerical vector containing the variabilities associated with the adjacency matrices in opt_adj.
opt_index_lw and opt_index_lb are integer vectors containing the index of the selected λ_w (or λ_b) for each c parameter explored.
opt_lambda_w and opt_lambda_b are vectors containing the selected λ_w (or λ_b) for each c parameter explored.
sel_index_c, sel_index_lw and sel_index_lb are the indexes of the final selected parameters c, λ_w and λ_b leading to the most stable sparse network.
sel_c, sel_lambda_w and sel_lambda_b are the final selected parameters c, λ_w and λ_b leading to the most stable sparse network.
sel_adj
is the adjacency matrix of the final selected network.
sel_density
is the density of the final selected network.
sel_icov
is the inverse covariance matrix of the final selected network.
call
is the matched call.
method
is the chosen model selection method. Here, it is "xestars".
merge_lw
and merge_lb
are returned only if light
is set to FALSE.
They are lists with as many elements as the number of $c$
parameters explored. Every element is a "merged" adjacency matrix,
the average of all the adjacency matrices estimated for that specific
$c$ and the selected $\lambda_w$ (or $\lambda_b$) value
across all the subsamples in the last path explored before convergence,
the one in which the final combination of $\lambda_w$
and $\lambda_b$ is selected for the given $c$ value.
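As an illustration of what a "merged" adjacency matrix is, the following base R sketch averages three toy adjacency matrices, as if each had been estimated from a different subsample under the same hyperparameters. The matrices and names are made up for the example and are not produced by the package.

```r
# Three toy adjacency matrices, as if estimated from three subsamples
# for one fixed (c, lambda_w, lambda_b) combination.
adj_list <- list(
  matrix(c(0, 1, 0,
           1, 0, 1,
           0, 1, 0), nrow = 3, byrow = TRUE),
  matrix(c(0, 1, 0,
           1, 0, 0,
           0, 0, 0), nrow = 3, byrow = TRUE),
  matrix(c(0, 1, 1,
           1, 0, 1,
           1, 1, 0), nrow = 3, byrow = TRUE)
)

# Element-wise average: each entry is the fraction of subsamples in which
# that edge was selected.
merged <- Reduce(`+`, adj_list) / length(adj_list)
```

Entries close to 0 or 1 correspond to edges that are consistently absent or present across subsamples; entries near 0.5 correspond to unstable edges.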
cg <- coglasso(multi_omics_sd_micro, p = c(4, 2), nlambda_w = 3, nlambda_b = 3, nc = 3, verbose = FALSE)
# Takes less than five seconds
sel_cg <- xestars(cg, verbose = FALSE)
coglasso
network
xstars()
selects the combination of hyperparameters given to
coglasso()
yielding the most stable, yet sparse network. Stability is
computed upon network estimation from multiple subsamples of the multi-omics data set,
allowing repetition. Subsamples are drawn a fixed number of times
(rep_num
), and with a fixed proportion of the total number of samples
(stars_subsample_ratio
).
xstars( coglasso_obj, stars_thresh = 0.1, stars_subsample_ratio = NULL, rep_num = 20, max_iter = 10, verbose = TRUE )
coglasso_obj |
The object of S3 class coglasso returned by coglasso(). |
stars_thresh |
The threshold set for variability of the explored
networks at each iteration of the algorithm. The |
stars_subsample_ratio |
The proportion of samples in the multi-omics
data set to be randomly subsampled to estimate the variability of the
network under the given hyperparameters setting. Defaults to 80% when the
number of samples is smaller than 144, otherwise it defaults to
$\frac{10}{n}\sqrt{n}$, where $n$ is the number of samples. |
rep_num |
The amount of subsamples of the multi-omics data set used to estimate the variability of the network under the given hyperparameters setting. Defaults to 20. |
max_iter |
The greatest number of times the algorithm is allowed to
choose a new best |
verbose |
Print information regarding the progress of the selection procedure on the console. |
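The default for stars_subsample_ratio can be sketched as a small base R helper. `default_subsample_ratio()` is a hypothetical name, not a function of the package, and the large-sample branch, 10 * sqrt(n) / n, is an assumption taken from the usual StARS convention rather than from the text above.

```r
# Hypothetical helper mirroring the documented default for
# stars_subsample_ratio. The large-sample branch is an assumption
# based on the usual StARS convention.
default_subsample_ratio <- function(n) {
  if (n < 144) 0.8 else 10 * sqrt(n) / n
}

ratio_small <- default_subsample_ratio(100)  # small data set: 80% of samples
ratio_large <- default_subsample_ratio(400)  # larger data set: smaller ratio

# Number of samples drawn in each subsample for a data set with 400 rows.
rows_per_subsample <- floor(400 * ratio_large)
```

Under this assumption, the fraction of samples per subsample shrinks as the data set grows, which keeps each of the rep_num network estimations cheaper on large data sets.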
eXtended StARS (XStARS) is an adaptation for collaborative graphical regression of the method
published by Liu, H. et al. (2010): Stability Approach to Regularization
Selection (StARS). StARS was developed for network estimation regulated by
a single penalty parameter, while collaborative graphical lasso needs to
explore three different hyperparameters. In particular, two of these are
penalty parameters with a direct influence on network sparsity, hence on
stability. For every $c$ parameter,
xstars()
explores one of
the two penalty parameters ($\lambda_w$ or $\lambda_b$), keeping the other one
fixed at its previous best estimate, using the normal, one-dimensional
StARS approach, until finding the best couple. It then selects the $c$
parameter for which the best ($\lambda_w$, $\lambda_b$) couple yielded the most
stable, yet sparse network.
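The alternating search described above can be sketched in base R for a single value of $c$. This is a simplified illustration, not the package's implementation: `stars_pick()` and `alternate_search()` are hypothetical helpers, the variability values are a made-up toy surface instead of being re-estimated from subsamples, and the starting indices are arbitrary.

```r
# Hypothetical helper: one-dimensional StARS pick along a path ordered from
# the sparsest to the densest network. Variability is made monotone with
# cummax(), then the densest network still at or below the threshold is chosen.
stars_pick <- function(variability, thresh = 0.1) {
  ok <- which(cummax(variability) <= thresh)
  if (length(ok) == 0) 1L else max(ok)
}

# Toy variability surface for a single value of c:
# rows index lambda_w, columns index lambda_b, both from sparsest to densest.
v <- matrix(c(0.00, 0.02, 0.05,
              0.01, 0.06, 0.12,
              0.04, 0.11, 0.20), nrow = 3, byrow = TRUE)

alternate_search <- function(v, thresh = 0.1, max_iter = 10) {
  i_lw <- 1L  # arbitrary starting guess
  i_lb <- 1L
  for (iter in seq_len(max_iter)) {
    new_lw <- stars_pick(v[, i_lb], thresh)    # optimize lambda_w, lambda_b fixed
    new_lb <- stars_pick(v[new_lw, ], thresh)  # optimize lambda_b, lambda_w fixed
    converged <- new_lw == i_lw && new_lb == i_lb
    i_lw <- new_lw
    i_lb <- new_lb
    if (converged) break
  }
  c(index_lw = i_lw, index_lb = i_lb)
}

best <- alternate_search(v)
```

The loop stops when neither index changes between two consecutive passes, or after max_iter passes. Repeating this for every $c$ and comparing the resulting couples corresponds to the final selection step.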
xstars()
returns an object of S3
class select_coglasso
containing the results of the
selection procedure, built upon the object of S3
class coglasso
returned by coglasso()
.
... are the same elements returned by coglasso()
.
merge_lw
and merge_lb
are lists with as many elements as the number of $c$
parameters explored. Every element is in turn a list of as many
matrices as the number of $\lambda_w$ (or $\lambda_b$) values explored. Each
matrix is the "merged" adjacency matrix, the average of all the adjacency
matrices estimated for those specific $c$
and $\lambda_w$ (or $\lambda_b$)
values across all the subsamples in the last path explored before
convergence, the one in which the final combination of $\lambda_w$
and $\lambda_b$ is selected for the given $c$ value.
variability_lw
and variability_lb
are lists with as many elements as
the number of $c$ parameters explored. Every element is a numeric vector
of as many items as the number of $\lambda_w$ (or $\lambda_b$) values explored.
Each item is the variability of the network estimated for those specific
$c$ and $\lambda_w$ (or $\lambda_b$) values in the last path explored before
convergence, the one in which the final combination of $\lambda_w$
and $\lambda_b$ is selected for the given $c$ value.
opt_adj
is a list of the adjacency matrices finally selected for each
$c$ parameter explored.
opt_variability
is a numerical vector containing the variabilities
associated with the adjacency matrices in opt_adj
.
opt_index_lw
and opt_index_lb
are integer vectors containing the
index of the selected $\lambda_w$ (or $\lambda_b$) for each
$c$ parameter explored.
opt_lambda_w
and opt_lambda_b
are vectors containing the selected
$\lambda_w$ (or $\lambda_b$) for each
$c$ parameter explored.
sel_index_c
, sel_index_lw
and sel_index_lb
are the indexes of the
final selected parameters $c$, $\lambda_w$
and $\lambda_b$
leading to the
most stable sparse network.
sel_c
, sel_lambda_w
and sel_lambda_b
are the final selected
parameters $c$, $\lambda_w$
and $\lambda_b$
leading to the most stable
sparse network.
sel_adj
is the adjacency matrix of the final selected network.
sel_density
is the density of the final selected network.
sel_icov
is the inverse covariance matrix of the final selected network.
call
is the matched call.
method
is the chosen model selection method. Here, it is "xstars".
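To make the relation between a "merged" adjacency matrix and its variability concrete, here is a minimal base R sketch following the definition in Liu, H. et al. (2010), where the instability of an edge selected in a fraction $\theta$ of the subsamples is $2\theta(1-\theta)$ and the variability is the average over all node pairs. `stars_variability()` is a hypothetical helper and the matrix is a toy example; the package's internal computation may use a different scaling.

```r
# Hypothetical helper: StARS variability of a "merged" adjacency matrix,
# i.e. the average over all node pairs of 2 * theta * (1 - theta).
stars_variability <- function(merged) {
  theta <- merged[upper.tri(merged)]
  mean(2 * theta * (1 - theta))
}

# Toy merged matrix: edge 1-2 is always selected, edges 1-3 and 2-3 are
# selected in one third and two thirds of the subsamples respectively.
merged <- matrix(c(0,   1,   1/3,
                   1,   0,   2/3,
                   1/3, 2/3, 0), nrow = 3, byrow = TRUE)

v <- stars_variability(merged)
```

An edge that is always or never selected contributes 0, while an edge selected in half of the subsamples contributes the maximum of 0.5, so lower values indicate a more stable network.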
cg <- coglasso(multi_omics_sd_micro, p = c(4, 2), nlambda_w = 3, nlambda_b = 3, nc = 3, verbose = FALSE)
# Takes around one minute
sel_cg <- xstars(cg, verbose = FALSE)