extra docs for new engines #1120

Merged
merged 7 commits into from
May 14, 2024
2 changes: 1 addition & 1 deletion R/rand_forest_aorsf.R
@@ -1,6 +1,6 @@
#' Oblique random survival forests via aorsf
#'
#' [aorsf::orsf()] fits a model that creates a large number of decision
#' [aorsf::orsf()] fits a model that creates a large number of oblique decision
#' trees, each de-correlated from the others. The final prediction uses all
#' predictions from the individual trees and combines them.
#'
2 changes: 2 additions & 0 deletions inst/models.tsv
@@ -107,11 +107,13 @@
"proportional_hazards" "censored regression" "survival" "censored"
"rand_forest" "censored regression" "aorsf" "censored"
"rand_forest" "censored regression" "partykit" "censored"
"rand_forest" "classification" "aorsf" "bonsai"
"rand_forest" "classification" "h2o" "agua"
"rand_forest" "classification" "partykit" "bonsai"
"rand_forest" "classification" "randomForest" NA
"rand_forest" "classification" "ranger" NA
"rand_forest" "classification" "spark" NA
"rand_forest" "regression" "aorsf" "bonsai"
"rand_forest" "regression" "h2o" "agua"
"rand_forest" "regression" "partykit" "bonsai"
"rand_forest" "regression" "randomForest" NA
75 changes: 68 additions & 7 deletions man/details_rand_forest_aorsf.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

33 changes: 30 additions & 3 deletions man/rmd/rand_forest_aorsf.Rmd
@@ -26,10 +26,9 @@ param$item

Additionally, this model has one engine-specific tuning parameter:

* `split_min_stat`: Minimum test statistic required to split a node. Default is `3.841459` for the log-rank test, which is roughly a p-value of 0.05.
* `split_min_stat`: Minimum test statistic required to split a node. Defaults are `3.841459` for censored regression (which is roughly a p-value of 0.05) and `0` for classification and regression. For classification, this tuning parameter should be between 0 and 1, and for regression it should be greater than or equal to 0. Higher values of this parameter cause trees grown by `aorsf` to have less depth.


# Translation from parsnip to the original package (censored regression)
## Translation from parsnip to the original package (censored regression)

`r uses_extension("rand_forest", "aorsf", "censored regression")`

@@ -42,6 +41,32 @@ rand_forest() %>%
translate()
```

## Translation from parsnip to the original package (regression)

`r uses_extension("rand_forest", "aorsf", "regression")`

```{r aorsf-reg}
library(bonsai)

rand_forest() %>%
  set_engine("aorsf") %>%
  set_mode("regression") %>%
  translate()
```

## Translation from parsnip to the original package (classification)

`r uses_extension("rand_forest", "aorsf", "classification")`

```{r aorsf-class}
library(bonsai)

rand_forest() %>%
  set_engine("aorsf") %>%
  set_mode("classification") %>%
  translate()
```

## Preprocessing requirements

```{r child = "template-tree-split-factors.Rmd"}
@@ -56,6 +81,8 @@ rand_forest() %>%

Predictions of survival probability at a time exceeding the maximum observed event time are the predicted survival probability at the maximum observed time in the training data.

The class predict method in `aorsf` uses the standard 'each tree gets one vote' approach, which usually, but not always, agrees with picking the class that has the highest predicted probability. This inconsistency is acceptable in `aorsf`, which intentionally applies the traditional class prediction method for random forests, but `tidymodels` prefers consistency. We therefore make the predicted class a function of the predicted probability, so the two always agree (see [tidymodels/bonsai#78](https://github.com/tidymodels/bonsai/pull/78)).

## References

- Jaeger BC, Long DL, Long DM, Sims M, Szychowski JM, Min YI, Mcclure LA, Howard G, Simon N. Oblique random survival forests. Annals of applied statistics 2019 Sep; 13(3):1847-83. DOI: 10.1214/19-AOAS1261
57 changes: 53 additions & 4 deletions man/rmd/rand_forest_aorsf.md
@@ -1,7 +1,7 @@



For this engine, there is a single mode: censored regression
For this engine, there are multiple modes: censored regression, classification, and regression

## Tuning Parameters

@@ -17,10 +17,9 @@ This model has 3 tuning parameters:

Additionally, this model has one engine-specific tuning parameter:

* `split_min_stat`: Minimum test statistic required to split a node. Default is `3.841459` for the log-rank test, which is roughly a p-value of 0.05.
* `split_min_stat`: Minimum test statistic required to split a node. Defaults are `3.841459` for censored regression (which is roughly a p-value of 0.05) and `0` for classification and regression. For classification, this tuning parameter should be between 0 and 1, and for regression it should be greater than or equal to 0. Higher values of this parameter cause trees grown by `aorsf` to have less depth.
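The "roughly a p-value of 0.05" remark for the censored regression default can be checked with base R. This is a minimal sketch that assumes the log-rank statistic is compared against a chi-squared distribution with 1 degree of freedom; that reference distribution is an assumption made here for illustration and is not stated in this PR.

```r
# Assumption: the log-rank test statistic is referred to a chi-squared
# distribution with 1 degree of freedom.
default_stat <- 3.841459

# Upper-tail probability of the default threshold
p_value <- pchisq(default_stat, df = 1, lower.tail = FALSE)
round(p_value, 4)

# Going the other way: the 95th percentile of the same distribution
threshold <- qchisq(0.95, df = 1)
round(threshold, 6)
```

Under that assumption, raising `split_min_stat` corresponds to demanding a smaller p-value before a node is split, which is consistent with the note that higher values produce shallower trees.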


# Translation from parsnip to the original package (censored regression)
## Translation from parsnip to the original package (censored regression)

The **censored** extension package is required to fit this model.

@@ -43,6 +42,54 @@ rand_forest() %>%
## aorsf::orsf(formula = missing_arg(), data = missing_arg(), weights = missing_arg())
```

## Translation from parsnip to the original package (regression)

The **bonsai** extension package is required to fit this model.


```r
library(bonsai)

rand_forest() %>%
  set_engine("aorsf") %>%
  set_mode("regression") %>%
  translate()
```

```
## Random Forest Model Specification (regression)
##
## Computational engine: aorsf
##
## Model fit template:
## aorsf::orsf(formula = missing_arg(), data = missing_arg(), weights = missing_arg(),
## n_thread = 1, verbose_progress = FALSE)
```

## Translation from parsnip to the original package (classification)

The **bonsai** extension package is required to fit this model.


```r
library(bonsai)

rand_forest() %>%
  set_engine("aorsf") %>%
  set_mode("classification") %>%
  translate()
```

```
## Random Forest Model Specification (classification)
##
## Computational engine: aorsf
##
## Model fit template:
## aorsf::orsf(formula = missing_arg(), data = missing_arg(), weights = missing_arg(),
## n_thread = 1, verbose_progress = FALSE)
```
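The templates above only show how the specification is translated. As a hedged sketch of what an end-to-end fit might look like, the following assumes that **bonsai** and a version of **aorsf** with classification support are installed; the `iris` data, the `trees = 100` setting, and the `split_min_stat = 0.1` value are illustrative choices, not recommendations from this PR.

```r
library(bonsai)  # attaches parsnip, which provides rand_forest() and %>%

# Illustrative specification: engine-specific arguments such as
# split_min_stat are passed through set_engine().
spec <- rand_forest(trees = 100) %>%
  set_engine("aorsf", split_min_stat = 0.1) %>%
  set_mode("classification")

# Fit on a built-in data set (chosen here only for convenience)
fit_cls <- fit(spec, Species ~ ., data = iris)

# Class probabilities and hard class predictions for a few rows
predict(fit_cls, head(iris), type = "prob")
predict(fit_cls, head(iris), type = "class")
```

For classification, the value passed as `split_min_stat` is kept between 0 and 1, following the parameter description earlier in this file.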

## Preprocessing requirements


@@ -59,6 +106,8 @@ The `fit()` and `fit_xy()` arguments have arguments called `case_weights` that e

Predictions of survival probability at a time exceeding the maximum observed event time are the predicted survival probability at the maximum observed time in the training data.
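The carry-forward behaviour described above can be sketched in base R. The survival curve below is hypothetical (it is not produced by `aorsf`), and `surv_at()` is an illustrative helper, not a function from any package.

```r
# Hypothetical step-function survival curve: probabilities at observed
# event times from some training data.
event_times <- c(1, 3, 5, 8)
surv_probs  <- c(0.9, 0.7, 0.5, 0.3)

surv_at <- function(eval_time) {
  # Clamp evaluation times to the largest observed event time, so later
  # times reuse the prediction at that maximum.
  clamped <- pmin(eval_time, max(event_times))
  # Index of the last event time that is <= the clamped evaluation time
  idx <- findInterval(clamped, event_times)
  # Before the first event time, survival probability is 1
  c(1, surv_probs)[idx + 1]
}

# Times 8 and 20 yield the same predicted survival probability
surv_at(c(0.5, 5, 8, 20))
```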

The class predict method in `aorsf` uses the standard 'each tree gets one vote' approach, which usually, but not always, agrees with picking the class that has the highest predicted probability. This inconsistency is acceptable in `aorsf`, which intentionally applies the traditional class prediction method for random forests, but `tidymodels` prefers consistency. We therefore make the predicted class a function of the predicted probability, so the two always agree (see [tidymodels/bonsai#78](https://github.com/tidymodels/bonsai/pull/78)).
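A small base R illustration of how the two approaches can disagree. The per-tree probabilities are made up, and the sketch assumes the forest-level probability is the average of per-tree probabilities; that is a simplification for illustration, not a description of `aorsf` internals.

```r
# Hypothetical per-tree class probabilities for a single observation
tree_probs <- rbind(
  tree_1 = c(A = 0.9, B = 0.1),
  tree_2 = c(A = 0.4, B = 0.6),
  tree_3 = c(A = 0.4, B = 0.6)
)

# 'Each tree gets one vote': every tree votes for its most likely class,
# and the class with the most votes wins.
votes <- colnames(tree_probs)[apply(tree_probs, 1, which.max)]
vote_class <- names(which.max(table(votes)))

# Probability-based class: average the tree probabilities (assumption),
# then pick the class with the highest averaged probability.
forest_prob <- colMeans(tree_probs)
prob_class <- names(which.max(forest_prob))

vote_class   # class chosen by majority vote
prob_class   # class chosen from the averaged probabilities
```

Here two of three trees vote for `B`, while one confident tree pulls the averaged probability toward `A`, so the two rules pick different classes. Deriving the class from the probability, as described above, removes this kind of mismatch.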

## References

- Jaeger BC, Long DL, Long DM, Sims M, Szychowski JM, Min YI, Mcclure LA, Howard G, Simon N. Oblique random survival forests. Annals of applied statistics 2019 Sep; 13(3):1847-83. DOI: 10.1214/19-AOAS1261