document addition of "aorsf" engine to bonsai (#1120)

bcjaeger · simonpcouch · web-flow · commit 8880aff86b69 · 2024-05-14T14:22:16.000-05:00
* document classification and regression with aorsf
* explain classification `type = "class"` divergence with aorsf
* `update_model_info_table()`

---------

Co-authored-by: Simon P. Couch <simonpatrickcouch@gmail.com>
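
As a rough illustration of the two modes this change documents, the sketch below fits the `aorsf` engine through bonsai for classification and for regression. The datasets (`iris`, `mtcars`), the `trees = 100` setting, and the object names are assumptions chosen for the example and are not part of the commit. It also assumes installed versions of bonsai and aorsf that support these modes.

```r
library(bonsai)  # attaches parsnip, which provides rand_forest()

set.seed(1)

# Classification: specify the model, then fit on a built-in dataset (assumed).
class_fit <- rand_forest(trees = 100) %>%
  set_engine("aorsf") %>%
  set_mode("classification") %>%
  fit(Species ~ ., data = iris)

# Regression: same engine, different mode.
reg_fit <- rand_forest(trees = 100) %>%
  set_engine("aorsf") %>%
  set_mode("regression") %>%
  fit(mpg ~ ., data = mtcars)

# Class and probability predictions for classification; numeric for regression.
class_pred <- predict(class_fit, new_data = iris, type = "class")
prob_pred  <- predict(class_fit, new_data = iris, type = "prob")
reg_pred   <- predict(reg_fit, new_data = mtcars)
```

Each `predict()` call returns a tibble with one row per row of `new_data`, so `class_pred` and `prob_pred` can be bound column-wise to compare the predicted class against the predicted probabilities.
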
diff --git a/R/rand_forest_aorsf.R b/R/rand_forest_aorsf.R
@@ -1,6 +1,6 @@
 #' Oblique random survival forests via aorsf
 #'
-#' [aorsf::orsf()] fits a model that creates a large number of decision
+#' [aorsf::orsf()] fits a model that creates a large number of oblique decision
 #' trees, each de-correlated from the others. The final prediction uses all
 #' predictions from the individual trees and combines them.
 #'
diff --git a/inst/models.tsv b/inst/models.tsv
@@ -107,11 +107,13 @@
 "proportional_hazards"	"censored regression"	"survival"	"censored"
 "rand_forest"	"censored regression"	"aorsf"	"censored"
 "rand_forest"	"censored regression"	"partykit"	"censored"
+"rand_forest"	"classification"	"aorsf"	"bonsai"
 "rand_forest"	"classification"	"h2o"	"agua"
 "rand_forest"	"classification"	"partykit"	"bonsai"
 "rand_forest"	"classification"	"randomForest"	NA
 "rand_forest"	"classification"	"ranger"	NA
 "rand_forest"	"classification"	"spark"	NA
+"rand_forest"	"regression"	"aorsf"	"bonsai"
 "rand_forest"	"regression"	"h2o"	"agua"
 "rand_forest"	"regression"	"partykit"	"bonsai"
 "rand_forest"	"regression"	"randomForest"	NA
diff --git a/man/details_rand_forest_aorsf.Rd b/man/details_rand_forest_aorsf.Rd
diff --git a/man/rmd/rand_forest_aorsf.Rmd b/man/rmd/rand_forest_aorsf.Rmd
@@ -26,10 +26,9 @@ param$item
 
 Additionally, this model has one engine-specific tuning parameter:
 
- * `split_min_stat`: Minimum test statistic required to split a node. Default is `3.841459` for the log-rank test, which is roughly a p-value of 0.05.
+ * `split_min_stat`: Minimum test statistic required to split a node. Defaults are `3.841459` for censored regression (which is roughly a p-value of 0.05) and `0` for classification and regression. For classification, this tuning parameter should be between 0 and 1, and for regression it should be greater than or equal to 0. Higher values of this parameter cause trees grown by `aorsf` to have less depth.
 
-
-# Translation from parsnip to the original package (censored regression)
+## Translation from parsnip to the original package (censored regression)
 
 `r uses_extension("rand_forest", "aorsf", "censored regression")`
 
@@ -42,6 +41,32 @@ rand_forest() %>%
   translate()
 ```
 
+## Translation from parsnip to the original package (regression)
+
+`r uses_extension("rand_forest", "aorsf", "regression")`
+
+```{r aorsf-reg}
+library(bonsai)
+
+rand_forest() %>%
+  set_engine("aorsf") %>%
+  set_mode("regression") %>%
+  translate()
+```
+
+## Translation from parsnip to the original package (classification)
+
+`r uses_extension("rand_forest", "aorsf", "classification")`
+
+```{r aorsf-class}
+library(bonsai)
+
+rand_forest() %>%
+  set_engine("aorsf") %>%
+  set_mode("classification") %>%
+  translate()
+```
+
 ## Preprocessing requirements
 
 ```{r child = "template-tree-split-factors.Rmd"}
@@ -56,6 +81,8 @@ rand_forest() %>%
 
 Predictions of survival probability at a time exceeding the maximum observed event time are the predicted survival probability at the maximum observed time in the training data.
 
+The class predict method in `aorsf` uses the standard 'each tree gets one vote' approach, which is usually but not always consistent with picking the class that has the highest predicted probability. It is okay for this inconsistency to occur in `aorsf` because it is intentionally applying the traditional class prediction method for random forests, but in `tidymodels` it is preferable to embrace consistency. Thus, we opted to make predicted probability consistent with predicted class all the time by making the predicted class a function of predicted probability (see [tidymodels/bonsai#78](https://github.com/tidymodels/bonsai/pull/78)).
+
 ## References
 
 - Jaeger BC, Long DL, Long DM, Sims M, Szychowski JM, Min YI, Mcclure LA, Howard G, Simon N. Oblique random survival forests. Annals of applied statistics 2019 Sep; 13(3):1847-83. DOI: 10.1214/19-AOAS1261
diff --git a/man/rmd/rand_forest_aorsf.md b/man/rmd/rand_forest_aorsf.md
@@ -1,7 +1,7 @@
 
 
 
-For this engine, there is a single mode: censored regression
+For this engine, there are multiple modes: censored regression, classification, and regression
 
 ## Tuning Parameters
 
@@ -17,10 +17,9 @@ This model has 3 tuning parameters:
 
 Additionally, this model has one engine-specific tuning parameter:
 
- * `split_min_stat`: Minimum test statistic required to split a node. Default is `3.841459` for the log-rank test, which is roughly a p-value of 0.05.
+ * `split_min_stat`: Minimum test statistic required to split a node. Defaults are `3.841459` for censored regression (which is roughly a p-value of 0.05) and `0` for classification and regression. For classification, this tuning parameter should be between 0 and 1, and for regression it should be greater than or equal to 0. Higher values of this parameter cause trees grown by `aorsf` to have less depth.
 
-
-# Translation from parsnip to the original package (censored regression)
+## Translation from parsnip to the original package (censored regression)
 
 The **censored** extension package is required to fit this model.
 
@@ -43,6 +42,54 @@ rand_forest() %>%
 ## aorsf::orsf(formula = missing_arg(), data = missing_arg(), weights = missing_arg())
 ```
 
+## Translation from parsnip to the original package (regression)
+
+The **bonsai** extension package is required to fit this model.
+
+
+```r
+library(bonsai)
+
+rand_forest() %>%
+  set_engine("aorsf") %>%
+  set_mode("regression") %>%
+  translate()
+```
+
+```
+## Random Forest Model Specification (regression)
+## 
+## Computational engine: aorsf 
+## 
+## Model fit template:
+## aorsf::orsf(formula = missing_arg(), data = missing_arg(), weights = missing_arg(), 
+##     n_thread = 1, verbose_progress = FALSE)
+```
+
+## Translation from parsnip to the original package (classification)
+
+The **bonsai** extension package is required to fit this model.
+
+
+```r
+library(bonsai)
+
+rand_forest() %>%
+  set_engine("aorsf") %>%
+  set_mode("classification") %>%
+  translate()
+```
+
+```
+## Random Forest Model Specification (classification)
+## 
+## Computational engine: aorsf 
+## 
+## Model fit template:
+## aorsf::orsf(formula = missing_arg(), data = missing_arg(), weights = missing_arg(), 
+##     n_thread = 1, verbose_progress = FALSE)
+```
+
 ## Preprocessing requirements
 
 
@@ -59,6 +106,8 @@ The `fit()` and `fit_xy()` arguments have arguments called `case_weights` that e
 
 Predictions of survival probability at a time exceeding the maximum observed event time are the predicted survival probability at the maximum observed time in the training data.
 
+The class predict method in `aorsf` uses the standard 'each tree gets one vote' approach, which is usually but not always consistent with picking the class that has the highest predicted probability. It is okay for this inconsistency to occur in `aorsf` because it is intentionally applying the traditional class prediction method for random forests, but in `tidymodels` it is preferable to embrace consistency. Thus, we opted to make predicted probability consistent with predicted class all the time by making the predicted class a function of predicted probability (see [tidymodels/bonsai#78](https://github.com/tidymodels/bonsai/pull/78)).
+
 ## References
 
 - Jaeger BC, Long DL, Long DM, Sims M, Szychowski JM, Min YI, Mcclure LA, Howard G, Simon N. Oblique random survival forests. Annals of applied statistics 2019 Sep; 13(3):1847-83. DOI: 10.1214/19-AOAS1261
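
The divergence described in the class-prediction note can be seen in a small base R sketch. The per-tree probabilities below are made-up numbers for a single observation, and the code is a toy illustration of the two rules, not the actual `aorsf` or bonsai implementation.

```r
# Hypothetical per-tree class probabilities for one observation (rows = trees).
tree_probs <- matrix(
  c(0.9, 0.1,
    0.4, 0.6,
    0.4, 0.6),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("tree", 1:3), c("A", "B"))
)

# Traditional rule: each tree votes for its own most probable class,
# and the class with the most votes wins.
votes <- colnames(tree_probs)[apply(tree_probs, 1, which.max)]
vote_class <- names(which.max(table(votes)))

# Probability-based rule: average the probabilities across trees,
# then pick the class with the highest mean probability.
mean_probs <- colMeans(tree_probs)
prob_class <- names(which.max(mean_probs))

vote_class  # "B": two of the three trees vote for B
prob_class  # "A": the mean probability for A is about 0.57
```

One confident tree outweighs two lukewarm ones when probabilities are averaged, but not when votes are counted. Deriving the class from the averaged probabilities, as the note describes, removes that mismatch.
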