
Commit 68a4a49

add stop_iter as a main argument to rule_fit() (#749)
1 parent 4473786 commit 68a4a49

8 files changed, +51 -16 lines changed

DESCRIPTION

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 Package: parsnip
 Title: A Common API to Modeling and Analysis Functions
-Version: 0.2.1.9002
+Version: 0.2.1.9003
 Authors@R: c(
     person("Max", "Kuhn", , "[email protected]", role = c("aut", "cre")),
     person("Davis", "Vaughan", , "[email protected]", role = "aut"),

R/rule_fit.R

Lines changed: 2 additions & 0 deletions
@@ -40,6 +40,7 @@ rule_fit <-
          tree_depth = NULL, learn_rate = NULL,
          loss_reduction = NULL,
          sample_size = NULL,
+         stop_iter = NULL,
          penalty = NULL,
          engine = "xrf") {

@@ -51,6 +52,7 @@ rule_fit <-
     learn_rate = enquo(learn_rate),
     loss_reduction = enquo(loss_reduction),
     sample_size = enquo(sample_size),
+    stop_iter = enquo(stop_iter),
     penalty = enquo(penalty)
   )
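
With this change, `stop_iter` can be given directly to `rule_fit()` rather than through `set_engine()`. A minimal, illustrative sketch (values are arbitrary and not part of the diff; the xrf engine implementation lives in the rules package):

```r
library(parsnip)
library(rules)  # provides the "xrf" engine used when fitting rule_fit() models

# stop_iter as a main argument: stop boosting if the objective has not
# improved after 10 iterations (illustrative values, not defaults)
rf_spec <-
  rule_fit(trees = 100, tree_depth = 4, stop_iter = 10, penalty = 0.01) %>%
  set_engine("xrf") %>%
  set_mode("regression")
```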

man/rmd/boost_tree_xgboost.Rmd

Lines changed: 2 additions & 5 deletions
@@ -75,11 +75,8 @@ By default, the model is trained without parallel processing. This can be change
 
 ### Early stopping
 
-The `stop_iter()` argument allows the model to prematurely stop training if the objective function does not improve within `early_stop` iterations.
-
-The best way to use this feature is in conjunction with an _internal validation set_. To do this, pass the `validation` parameter of [xgb_train()] via the parsnip [set_engine()] function. This is the proportion of the training set that should be reserved for measuring performance (and stop early).
-
-If the model specification has `early_stop >= trees`, `early_stop` is converted to `trees - 1` and a warning is issued.
+```{r child = "template-early-stopping.Rmd"}
+```
 
 ### Objective function

man/rmd/boost_tree_xgboost.md

Lines changed: 2 additions & 1 deletion
@@ -130,9 +130,10 @@ parsnip and its extensions accommodate this parameterization using the `counts`
 
 ### Early stopping
 
+
 The `stop_iter()` argument allows the model to prematurely stop training if the objective function does not improve within `early_stop` iterations.
 
-The best way to use this feature is in conjunction with an _internal validation set_. To do this, pass the `validation` parameter of [xgb_train()] via the parsnip [set_engine()] function. This is the proportion of the training set that should be reserved for measuring performance (and stop early).
+The best way to use this feature is in conjunction with an _internal validation set_. To do this, pass the `validation` parameter of \code{\link[=xgb_train]{xgb_train()}} via the parsnip \code{\link[=set_engine]{set_engine()}} function. This is the proportion of the training set that should be reserved for measuring performance (and stopping early).
 
 If the model specification has `early_stop >= trees`, `early_stop` is converted to `trees - 1` and a warning is issued.

man/rmd/rule_fit_xrf.Rmd

Lines changed: 10 additions & 5 deletions
@@ -7,8 +7,8 @@
 
 ```{r xrf-param-info, echo = FALSE}
 defaults <- 
-  tibble::tibble(parsnip = c("tree_depth", "trees", "learn_rate", "mtry", "min_n", "loss_reduction", "sample_size", "penalty"),
-                 default = c("6L", "15L", "0.3", "1.0", "1L", "0.0", "1.0", "0.1"))
+  tibble::tibble(parsnip = c("tree_depth", "trees", "learn_rate", "mtry", "min_n", "loss_reduction", "sample_size", "stop_iter", "penalty"),
+                 default = c("6L", "15L", "0.3", "see below", "1L", "0.0", "1.0", "Inf", "0.1"))
 
 param <-
   rule_fit() %>%

@@ -83,18 +83,23 @@ Also, there are several configuration differences in how `xrf()` is fit between
 
 These differences will create a disparity in the values of the `penalty` argument that **glmnet** uses. Also, **rules** can also set `penalty` whereas **xrf** uses an internal 5-fold cross-validation to determine it (by default).
 
-## Other details
-
-### Preprocessing requirements
+## Preprocessing requirements
 
 ```{r child = "template-makes-dummies.Rmd"}
 ```
 
+## Other details
+
 ### Interpreting `mtry`
 
 ```{r child = "template-mtry-prop.Rmd"}
 ```
 
+### Early stopping
+
+```{r child = "template-early-stopping.Rmd"}
+```
+
 ## Case weights
 
 ```{r child = "template-no-case-weights.Rmd"}

man/rmd/rule_fit_xrf.md

Lines changed: 25 additions & 4 deletions
@@ -9,7 +9,7 @@ For this engine, there are multiple modes: classification and regression
 
 This model has 8 tuning parameters:
 
-- `mtry`: Proportion Randomly Selected Predictors (type: double, default: 1.0)
+- `mtry`: Proportion Randomly Selected Predictors (type: double, default: see below)
 
 - `trees`: # Trees (type: integer, default: 15L)
 

@@ -65,7 +65,7 @@ rule_fit(
 ## Computational engine: xrf 
 ## 
 ## Model fit template:
-## rules::xrf_fit(object = missing_arg(), data = missing_arg(),
+## rules::xrf_fit(formula = missing_arg(), data = missing_arg(),
 ##     colsample_bytree = numeric(1), nrounds = integer(1), min_child_weight = integer(1),
 ##     max_depth = integer(1), eta = numeric(1), gamma = numeric(1),
 ##     subsample = numeric(1), lambda = numeric(1))

@@ -111,7 +111,7 @@ rule_fit(
 ## Computational engine: xrf 
 ## 
 ## Model fit template:
-## rules::xrf_fit(object = missing_arg(), data = missing_arg(),
+## rules::xrf_fit(formula = missing_arg(), data = missing_arg(),
 ##     colsample_bytree = numeric(1), nrounds = integer(1), min_child_weight = integer(1),
 ##     max_depth = integer(1), eta = numeric(1), gamma = numeric(1),
 ##     subsample = numeric(1), lambda = numeric(1))

@@ -134,9 +134,30 @@ These differences will create a disparity in the values of the `penalty` argumen
 
 ## Preprocessing requirements
 
-
 Factor/categorical predictors need to be converted to numeric values (e.g., dummy or indicator variables) for this engine. When using the formula method via \code{\link[=fit.model_spec]{fit()}}, parsnip will convert factor columns to indicators.
 
+## Other details
+
+### Interpreting `mtry`
+
+
+The `mtry` argument denotes the number of predictors that will be randomly sampled at each split when creating tree models.
+
+Some engines, such as `"xgboost"`, `"xrf"`, and `"lightgbm"`, interpret their analogue to the `mtry` argument as the _proportion_ of predictors that will be randomly sampled at each split rather than the _count_. In some settings, such as when tuning over preprocessors that influence the number of predictors, this parameterization is quite helpful---interpreting `mtry` as a proportion means that $[0, 1]$ is always a valid range for that parameter, regardless of input data.
+
+parsnip and its extensions accommodate this parameterization using the `counts` argument: a logical indicating whether `mtry` should be interpreted as the number of predictors that will be randomly sampled at each split. `TRUE` indicates that `mtry` will be interpreted in its sense as a count, `FALSE` indicates that the argument will be interpreted in its sense as a proportion.
+
+`mtry` is a main model argument for \code{\link[=boost_tree]{boost_tree()}} and \code{\link[=rand_forest]{rand_forest()}}, and thus should not have an engine-specific interface. So, regardless of engine, `counts` defaults to `TRUE`. For engines that support the proportion interpretation---currently `"xgboost"`, `"xrf"` (via the rules package), and `"lightgbm"` (via the bonsai package)---the user can pass the `counts = FALSE` argument to `set_engine()` to supply `mtry` values within $[0, 1]$.
+
+### Early stopping
+
+
+The `stop_iter()` argument allows the model to prematurely stop training if the objective function does not improve within `early_stop` iterations.
+
+The best way to use this feature is in conjunction with an _internal validation set_. To do this, pass the `validation` parameter of \code{\link[=xgb_train]{xgb_train()}} via the parsnip \code{\link[=set_engine]{set_engine()}} function. This is the proportion of the training set that should be reserved for measuring performance (and stopping early).
+
+If the model specification has `early_stop >= trees`, `early_stop` is converted to `trees - 1` and a warning is issued.
+
 ## Case weights
 
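For the proportion interpretation described in the added text above, a hedged sketch (illustrative only, not part of the commit) of passing `counts = FALSE` through `set_engine()` so that `mtry` is read as a value in $[0, 1]$:

```r
library(parsnip)
library(rules)  # xrf engine implementation

# mtry = 0.5: sample half of the predictors at each split;
# counts = FALSE tells the engine to treat mtry as a proportion, not a count
xrf_spec <-
  rule_fit(mtry = 0.5, trees = 50) %>%
  set_engine("xrf", counts = FALSE) %>%
  set_mode("classification")
```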

man/rmd/template-early-stopping.Rmd

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+The `stop_iter()` argument allows the model to prematurely stop training if the objective function does not improve within `early_stop` iterations.
+
+The best way to use this feature is in conjunction with an _internal validation set_. To do this, pass the `validation` parameter of \code{\link[=xgb_train]{xgb_train()}} via the parsnip \code{\link[=set_engine]{set_engine()}} function. This is the proportion of the training set that should be reserved for measuring performance (and stopping early).
+
+If the model specification has `early_stop >= trees`, `early_stop` is converted to `trees - 1` and a warning is issued.
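
As a usage sketch (illustrative values, not part of the diff): the `validation` proportion is supplied as an engine argument while `stop_iter` stays a main argument, shown here with `boost_tree()` and the xgboost engine that `xgb_train()` backs.

```r
library(parsnip)

# hold out 20% of the training set internally for measuring performance;
# stop if the objective has not improved in 10 iterations (stop_iter < trees)
xgb_spec <-
  boost_tree(trees = 500, stop_iter = 10) %>%
  set_engine("xgboost", validation = 0.2) %>%
  set_mode("regression")
```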

man/rule_fit.Rd

Lines changed: 4 additions & 0 deletions
Some generated files are not rendered by default.
