Commit 8880aff

document addition of "aorsf" engine to bonsai (#1120)
* document classification and regression with aorsf
* explain classification `type = "class"` divergence with aorsf
* `update_model_info_table()`

---------

Co-authored-by: Simon P. Couch
1 parent 320affd commit 8880aff

5 files changed

+154
-15
lines changed

R/rand_forest_aorsf.R

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 #' Oblique random survival forests via aorsf
 #'
-#' [aorsf::orsf()] fits a model that creates a large number of decision
+#' [aorsf::orsf()] fits a model that creates a large number of oblique decision
 #' trees, each de-correlated from the others. The final prediction uses all
 #' predictions from the individual trees and combines them.
 #'

inst/models.tsv

Lines changed: 2 additions & 0 deletions
@@ -107,11 +107,13 @@
 "proportional_hazards" "censored regression" "survival" "censored"
 "rand_forest" "censored regression" "aorsf" "censored"
 "rand_forest" "censored regression" "partykit" "censored"
+"rand_forest" "classification" "aorsf" "bonsai"
 "rand_forest" "classification" "h2o" "agua"
 "rand_forest" "classification" "partykit" "bonsai"
 "rand_forest" "classification" "randomForest" NA
 "rand_forest" "classification" "ranger" NA
 "rand_forest" "classification" "spark" NA
+"rand_forest" "regression" "aorsf" "bonsai"
 "rand_forest" "regression" "h2o" "agua"
 "rand_forest" "regression" "partykit" "bonsai"
 "rand_forest" "regression" "randomForest" NA

man/details_rand_forest_aorsf.Rd

Lines changed: 68 additions & 7 deletions
Some generated files are not rendered by default.

man/rmd/rand_forest_aorsf.Rmd

Lines changed: 30 additions & 3 deletions
@@ -26,10 +26,9 @@ param$item

 Additionally, this model has one engine-specific tuning parameter:

-* `split_min_stat`: Minimum test statistic required to split a node. Default is `3.841459` for the log-rank test, which is roughly a p-value of 0.05.
+* `split_min_stat`: Minimum test statistic required to split a node. Defaults are `3.841459` for censored regression (which is roughly a p-value of 0.05) and `0` for classification and regression. For classification, this tuning parameter should be between 0 and 1, and for regression it should be greater than or equal to 0. Higher values of this parameter cause trees grown by `aorsf` to have less depth.

-
-# Translation from parsnip to the original package (censored regression)
+## Translation from parsnip to the original package (censored regression)

 `r uses_extension("rand_forest", "aorsf", "censored regression")`

@@ -42,6 +41,32 @@ rand_forest() %>%
   translate()
 ```

+## Translation from parsnip to the original package (regression)
+
+`r uses_extension("rand_forest", "aorsf", "regression")`
+
+```{r aorsf-reg}
+library(bonsai)
+
+rand_forest() %>%
+  set_engine("aorsf") %>%
+  set_mode("regression") %>%
+  translate()
+```
+
+## Translation from parsnip to the original package (classification)
+
+`r uses_extension("rand_forest", "aorsf", "classification")`
+
+```{r aorsf-class}
+library(bonsai)
+
+rand_forest() %>%
+  set_engine("aorsf") %>%
+  set_mode("classification") %>%
+  translate()
+```
+
 ## Preprocessing requirements

 ```{r child = "template-tree-split-factors.Rmd"}
@@ -56,6 +81,8 @@ rand_forest() %>%

 Predictions of survival probability at a time exceeding the maximum observed event time are the predicted survival probability at the maximum observed time in the training data.

+The class predict method in `aorsf` uses the standard 'each tree gets one vote' approach, which is usually but not always consistent with picking the class that has the highest predicted probability. It is okay for this inconsistency to occur in `aorsf` because it is intentionally applying the traditional class prediction method for random forests, but in `tidymodels` it is preferable to embrace consistency. Thus, we opted to make predicted probability consistent with predicted class all the time by making the predicted class a function of predicted probability (see [tidymodels/bonsai#78](https://github.com/tidymodels/bonsai/pull/78)).
+
 ## References

 - Jaeger BC, Long DL, Long DM, Sims M, Szychowski JM, Min YI, Mcclure LA, Howard G, Simon N. Oblique random survival forests. Annals of applied statistics 2019 Sep; 13(3):1847-83. DOI: 10.1214/19-AOAS1261
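
The `split_min_stat` documentation in this file describes the censored regression default of `3.841459` as roughly a p-value of 0.05. A minimal base-R sketch of that correspondence follows. It assumes the log-rank statistic is referred to a chi-squared distribution with one degree of freedom, which is the usual two-group asymptotic result and not something this commit states. The variable names are illustrative only.

```r
# Default split_min_stat for censored regression, as documented above.
threshold <- 3.841459

# ASSUMPTION: the log-rank statistic is compared against a chi-squared
# distribution with 1 degree of freedom.
# Upper-tail probability of the threshold: approximately 0.05.
p_value <- pchisq(threshold, df = 1, lower.tail = FALSE)

# Going the other way: the 95th percentile of that distribution
# recovers the documented default (approximately 3.841459).
critical_value <- qchisq(0.95, df = 1)

print(p_value)
print(critical_value)
```

Under the same assumption, a larger `split_min_stat` corresponds to a smaller p-value cutoff, which fits the documented behaviour that higher values produce shallower trees.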

man/rmd/rand_forest_aorsf.md

Lines changed: 53 additions & 4 deletions
@@ -1,7 +1,7 @@



-For this engine, there is a single mode: censored regression
+For this engine, there are multiple modes: censored regression, classification, and regression

 ## Tuning Parameters

@@ -17,10 +17,9 @@ This model has 3 tuning parameters:

 Additionally, this model has one engine-specific tuning parameter:

-* `split_min_stat`: Minimum test statistic required to split a node. Default is `3.841459` for the log-rank test, which is roughly a p-value of 0.05.
+* `split_min_stat`: Minimum test statistic required to split a node. Defaults are `3.841459` for censored regression (which is roughly a p-value of 0.05) and `0` for classification and regression. For classification, this tuning parameter should be between 0 and 1, and for regression it should be greater than or equal to 0. Higher values of this parameter cause trees grown by `aorsf` to have less depth.

-
-# Translation from parsnip to the original package (censored regression)
+## Translation from parsnip to the original package (censored regression)

 The **censored** extension package is required to fit this model.

@@ -43,6 +42,54 @@ rand_forest() %>%
 ## aorsf::orsf(formula = missing_arg(), data = missing_arg(), weights = missing_arg())
 ```

+## Translation from parsnip to the original package (regression)
+
+The **bonsai** extension package is required to fit this model.
+
+
+```r
+library(bonsai)
+
+rand_forest() %>%
+  set_engine("aorsf") %>%
+  set_mode("regression") %>%
+  translate()
+```
+
+```
+## Random Forest Model Specification (regression)
+##
+## Computational engine: aorsf
+##
+## Model fit template:
+## aorsf::orsf(formula = missing_arg(), data = missing_arg(), weights = missing_arg(),
+##     n_thread = 1, verbose_progress = FALSE)
+```
+
+## Translation from parsnip to the original package (classification)
+
+The **bonsai** extension package is required to fit this model.
+
+
+```r
+library(bonsai)
+
+rand_forest() %>%
+  set_engine("aorsf") %>%
+  set_mode("classification") %>%
+  translate()
+```
+
+```
+## Random Forest Model Specification (classification)
+##
+## Computational engine: aorsf
+##
+## Model fit template:
+## aorsf::orsf(formula = missing_arg(), data = missing_arg(), weights = missing_arg(),
+##     n_thread = 1, verbose_progress = FALSE)
+```
+
 ## Preprocessing requirements


@@ -59,6 +106,8 @@ The `fit()` and `fit_xy()` arguments have arguments called `case_weights` that e

 Predictions of survival probability at a time exceeding the maximum observed event time are the predicted survival probability at the maximum observed time in the training data.

+The class predict method in `aorsf` uses the standard 'each tree gets one vote' approach, which is usually but not always consistent with picking the class that has the highest predicted probability. It is okay for this inconsistency to occur in `aorsf` because it is intentionally applying the traditional class prediction method for random forests, but in `tidymodels` it is preferable to embrace consistency. Thus, we opted to make predicted probability consistent with predicted class all the time by making the predicted class a function of predicted probability (see [tidymodels/bonsai#78](https://github.com/tidymodels/bonsai/pull/78)).
+
 ## References

 - Jaeger BC, Long DL, Long DM, Sims M, Szychowski JM, Min YI, Mcclure LA, Howard G, Simon N. Oblique random survival forests. Annals of applied statistics 2019 Sep; 13(3):1847-83. DOI: 10.1214/19-AOAS1261
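
The class-prediction note added in this file says that 'each tree gets one vote' can disagree with picking the class that has the highest predicted probability. A toy base-R sketch of how that can happen follows. The per-tree probabilities are invented, and averaging them across trees is an assumption made for illustration, not a description of how `aorsf` or `bonsai` compute predictions internally.

```r
# Invented per-tree class probabilities for ONE observation (rows = trees).
tree_probs <- matrix(
  c(0.55, 0.45,
    0.55, 0.45,
    0.55, 0.45,
    0.05, 0.95,
    0.05, 0.95),
  ncol = 2, byrow = TRUE,
  dimnames = list(NULL, c("a", "b"))
)

# 'Each tree gets one vote': every tree votes for its most likely class,
# and the class with the most votes wins.
votes <- colnames(tree_probs)[max.col(tree_probs, ties.method = "first")]
class_by_vote <- names(which.max(table(votes)))

# Probability-based rule (ASSUMPTION: forest probability = mean over trees):
# average the probabilities, then pick the class with the largest mean.
mean_probs <- colMeans(tree_probs)
class_by_prob <- names(which.max(mean_probs))

print(class_by_vote)  # "a": three weakly confident trees outvote two
print(mean_probs)     # a = 0.35, b = 0.65
print(class_by_prob)  # "b": the two confident trees dominate the average
```

Making the predicted class a function of the predicted probability, as the note describes, rules out this kind of disagreement between the two prediction types.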
