This function fits separate classification/regression models, specified in
the tidymodels framework, for each response variable in a data set. This is
the core function of mrIML.
Usage
mrIMLpredicts(
  X,
  X1 = NULL,
  Y,
  Model,
  balance_data = "no",
  dummy = FALSE,
  prop = 0.7,
  tune_grid_size = 10,
  k = 10,
  racing = TRUE
)
Arguments
- Y, X, X1
Data frames containing the response variables, the predictors, and the joint response variables (i.e. the responses that are also used as predictors when fitting a GN model), respectively. If X1 is not provided, then a standard multi-response model is fit to the data (i.e. the response models are independent of one another conditional on the predictors supplied in X). See the Details section below.
- Model
Any model from the tidymodels package. See Examples.
- balance_data
A character string:
"up": up-samples the data to equal class sizes.
"down": down-samples the data to equal class sizes.
"no": leaves the data as is. "no" is the default value.
- dummy
A logical value indicating whether recipes::step_dummy() should be included in the data recipe.
- prop
A numeric value between 0 and 1. Defines the training-testing data proportion to be used, which defaults to prop = 0.7.
- tune_grid_size
A numeric value that sets the grid size for hyperparameter tuning. Larger grid sizes increase computational time. Ignored if racing = TRUE.
- k
A numeric value. Sets the number of folds in the cross-validation. 10-fold CV is the default.
- racing
A logical value. If TRUE, mrIML performs the grid search using the finetune::tune_race_anova() method; otherwise, tune::tune_grid() is used. racing = TRUE is the default tuning method.
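The tuning-related arguments above map onto familiar tidymodels steps. The following is a minimal sketch of what prop, k and tune_grid_size conventionally control, written directly against rsample and tune. It is not the internal code of mrIMLpredicts; the toy data set (modeldata::two_class_dat), the object names and the small settings are assumptions chosen for illustration.

```r
library(tidymodels)  # attaches rsample, parsnip, workflows, tune, ...

set.seed(1)
data(two_class_dat, package = "modeldata")  # toy two-class data, unrelated to mrIML

# prop = 0.7: split the rows into training and testing sets
split <- initial_split(two_class_dat, prop = 0.7)
train <- training(split)
test  <- testing(split)

# k = 5: cross-validation folds built from the training rows
folds <- vfold_cv(train, v = 5)

# A tunable random forest, mirroring the model used in the Examples section
spec <- rand_forest(
  trees = 50,
  mode = "classification",
  mtry = tune(),
  min_n = tune()
) %>%
  set_engine("randomForest")

wf <- workflow() %>%
  add_formula(Class ~ .) %>%
  add_model(spec)

# tune_grid_size = 5: number of candidate hyperparameter combinations.
# This is the racing = FALSE path; finetune::tune_race_anova() accepts the
# same workflow, resamples and grid arguments.
res <- tune_grid(wf, resamples = folds, grid = 5)

# Pick the best candidate and plug it back into the workflow
best <- select_best(res, metric = "roc_auc")
final_wf <- finalize_workflow(wf, best)
```

Larger values of k and the grid size multiply the number of model fits, which is why the Examples section uses small values.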
Value
A list object with three slots:
$Model: The tidymodels object that was fit.
$Data: A list of the raw data.
$Fits: A list of the fitted models for each response variable.
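As a rough illustration of how such a return value can be inspected, the sketch below builds a stand-in list with the three documented slots. The contents are placeholders and purely hypothetical; a real result from mrIMLpredicts() holds model objects and data rather than strings.

```r
# Stand-in for the list returned by mrIMLpredicts(); contents are placeholders
fit <- list(
  Model = "model specification placeholder",
  Data  = list(),  # the raw data would be stored here
  Fits  = list("fit for response 1", "fit for response 2")
)

names(fit)                  # the three documented slots
length(fit$Fits)            # one entry per response variable
first_fit <- fit$Fits[[1]]  # fitted model for the first response
```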
Details
mrIMLpredicts fits the supplied tidy model to each response variable in the
data frame Y. If only X (a data frame of predictors) is supplied, then
independent models are fit, i.e., the other response variables are not used as
predictors. If X1 (a data frame of all or select response variables) is
supplied, then those response variables are also used as predictors in the
response models. For example, supplying X1 means that a co-occurrence model is
fit.
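The difference between the two modes can be sketched in base R by assembling the predictor set for each response. This is an illustration under stated assumptions, not the package's internal code: the helper build_predictors() and the toy column names are hypothetical, and the sketch assumes that a response is dropped from its own predictor set.

```r
# Toy presence/absence responses and one environmental predictor
set.seed(42)
Y <- data.frame(
  sp1 = rbinom(20, 1, 0.5),
  sp2 = rbinom(20, 1, 0.5),
  sp3 = rbinom(20, 1, 0.5)
)
X <- data.frame(env = rnorm(20))
X1 <- Y  # all responses are also candidate predictors

# Hypothetical helper: predictors used when modelling one response
build_predictors <- function(response, X, X1 = NULL) {
  if (is.null(X1)) {
    return(X)  # independent models: covariates only
  }
  others <- setdiff(names(X1), response)  # never predict a response from itself
  cbind(X, X1[, others, drop = FALSE])
}

# Independent multi-response model: every response sees only X
indep_sets <- lapply(names(Y), build_predictors, X = X)

# Co-occurrence (GN) model: every response also sees the other responses
gn_sets <- lapply(names(Y), build_predictors, X = X, X1 = X1)
names(gn_sets) <- names(Y)

names(gn_sets$sp1)  # "env" "sp2" "sp3"
```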
If balance_data = "up", then themis::step_rose() is used to upsample the
dataset; however, we recommend using balance_data = "no" in most cases.
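To make the balancing options concrete, here is a minimal sketch of class balancing inside a recipe, using a made-up imbalanced data set. themis::step_rose() is the step named above for "up"; the use of themis::step_downsample() for "down" is an assumption, since this page does not say which step the package applies.

```r
library(recipes)
library(themis)

# Made-up imbalanced data: 15 presences, 85 absences
set.seed(7)
toy <- data.frame(
  y  = factor(c(rep("present", 15), rep("absent", 85))),
  x1 = rnorm(100),
  x2 = rnorm(100)
)

# "up": ROSE generates synthetic rows so the classes are roughly balanced
rec_up <- recipe(y ~ ., data = toy) %>%
  step_rose(y)

# "down" (assumed step): drop majority-class rows until the classes match
rec_down <- recipe(y ~ ., data = toy) %>%
  step_downsample(y)

# new_data = NULL returns the processed training data, sampling steps included
up   <- bake(prep(rec_up), new_data = NULL)
down <- bake(prep(rec_down), new_data = NULL)

table(toy$y)   # original class counts
table(up$y)    # after ROSE
table(down$y)  # after down-sampling
```

Both sampling steps are skipped when a prepared recipe is applied to new data, so only the training data is rebalanced.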
Examples
library(mrIML)
library(tidymodels)
data <- MRFcov::Bird.parasites
# Define the response variables of interest
Y <- data %>%
  select(-scale.prop.zos) %>%
  select(order(everything()))
# Define the predictors
X <- data %>%
  select(scale.prop.zos)
# Specify a random forest tidy model
model_rf <- rand_forest(
  trees = 10, # 10 trees are set for brevity. Aim to start with 1000
  mode = "classification",
  mtry = tune(),
  min_n = tune()
) %>%
  set_engine("randomForest")
# Fitting independent multi-response model -----------------------------------
MR_model_rf <- mrIMLpredicts(
  X = X,
  Y = Y,
  Model = model_rf,
  prop = 0.7,
  k = 2,
  racing = FALSE
)
#> i Creating pre-processing data to finalize unknown parameter: mtry
#> i Creating pre-processing data to finalize unknown parameter: mtry
#> i Creating pre-processing data to finalize unknown parameter: mtry
#> i Creating pre-processing data to finalize unknown parameter: mtry
# Fitting a graphical network model -----------------------------------------
# Define the dependent response variables (all in this case)
# \donttest{
X1 <- Y
GN_model <- mrIMLpredicts(
  X = X,
  Y = Y,
  X1 = X1,
  Model = model_rf,
  prop = 0.7,
  k = 2,
  racing = FALSE
)
#> i Creating pre-processing data to finalize unknown parameter: mtry
#> i Creating pre-processing data to finalize unknown parameter: mtry
#> i Creating pre-processing data to finalize unknown parameter: mtry
#> i Creating pre-processing data to finalize unknown parameter: mtry
# }