`man/rmd/boost_tree_xgboost.Rmd` (2 additions, 5 deletions)
@@ -75,11 +75,8 @@ By default, the model is trained without parallel processing. This can be changed
### Early stopping
The `stop_iter()` argument allows the model to prematurely stop training if the objective function does not improve within `early_stop` iterations.
The best way to use this feature is in conjunction with an _internal validation set_. To do this, pass the `validation` parameter of [xgb_train()] via the parsnip [set_engine()] function. This is the proportion of the training set that should be reserved for measuring performance (and stopping early).
If the model specification has `early_stop >= trees`, `early_stop` is converted to `trees - 1` and a warning is issued.
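
A minimal sketch of this interface, with placeholder values for the number of trees, the stopping patience, and the validation proportion:

```r
library(parsnip)

# Hold out 20% of the training set internally; training stops if the
# objective has not improved after 10 iterations. `validation` is an
# xgb_train() argument, so it is supplied through set_engine().
boost_tree(trees = 500, stop_iter = 10) |>
  set_engine("xgboost", validation = 0.2) |>
  set_mode("regression")
```
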
`man/rmd/boost_tree_xgboost.md` (2 additions, 1 deletion)
@@ -130,9 +130,10 @@ parsnip and its extensions accommodate this parameterization using the `counts`
### Early stopping
The `stop_iter()` argument allows the model to prematurely stop training if the objective function does not improve within `early_stop` iterations.
- The best way to use this feature is in conjunction with an _internal validation set_. To do this, pass the `validation` parameter of [xgb_train()] via the parsnip [set_engine()] function. This is the proportion of the training set that should be reserved for measuring performance (and stop early).
+ The best way to use this feature is in conjunction with an _internal validation set_. To do this, pass the `validation` parameter of \code{\link[=xgb_train]{xgb_train()}} via the parsnip \code{\link[=set_engine]{set_engine()}} function. This is the proportion of the training set that should be reserved for measuring performance (and stopping early).
If the model specification has `early_stop >= trees`, `early_stop` is converted to `trees - 1` and a warning is issued.
@@ -83,18 +83,23 @@ Also, there are several configuration differences in how `xrf()` is fit between
These differences will create a disparity in the values of the `penalty` argument that **glmnet** uses. In addition, **rules** can set `penalty` directly, whereas **xrf** uses an internal 5-fold cross-validation to determine it (by default).
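
A sketch of setting the penalty explicitly rather than relying on the internal cross-validation; the values shown are placeholders:

```r
library(parsnip)
library(rules)  # registers the "xrf" engine for rule_fit()

# Supplying penalty as a main argument bypasses xrf's internal
# 5-fold cross-validation for choosing the glmnet penalty.
rule_fit(trees = 100, penalty = 0.01) |>
  set_engine("xrf") |>
  set_mode("regression")
```
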
## max_depth = integer(1), eta = numeric(1), gamma = numeric(1),
## subsample = numeric(1), lambda = numeric(1))
@@ -134,9 +134,30 @@ These differences will create a disparity in the values of the `penalty` argument
## Preprocessing requirements
Factor/categorical predictors need to be converted to numeric values (e.g., dummy or indicator variables) for this engine. When using the formula method via \code{\link[=fit.model_spec]{fit()}}, parsnip will convert factor columns to indicators.
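
A sketch of this conversion via the formula method; the data frame, column names, and engine choice here are hypothetical:

```r
library(parsnip)

# Hypothetical toy data: one numeric and one factor predictor.
toy <- data.frame(
  y = rnorm(20),
  x = rnorm(20),
  color = factor(sample(c("red", "blue"), 20, replace = TRUE))
)

spec <- boost_tree(trees = 50) |>
  set_engine("xgboost") |>
  set_mode("regression")

# The formula method converts `color` to indicator columns before
# the data reach the engine.
fit(spec, y ~ x + color, data = toy)
```
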
139
138
139
+
## Other details
### Interpreting `mtry`
The `mtry` argument denotes the number of predictors that will be randomly sampled at each split when creating tree models.
Some engines, such as `"xgboost"`, `"xrf"`, and `"lightgbm"`, interpret their analogue to the `mtry` argument as the _proportion_ of predictors that will be randomly sampled at each split rather than the _count_. In some settings, such as when tuning over preprocessors that influence the number of predictors, this parameterization is quite helpful---interpreting `mtry` as a proportion means that $[0, 1]$ is always a valid range for that parameter, regardless of input data.
parsnip and its extensions accommodate this parameterization using the `counts` argument: a logical indicating whether `mtry` should be interpreted as the number of predictors that will be randomly sampled at each split. `TRUE` indicates that `mtry` will be interpreted in its sense as a count; `FALSE` indicates that the argument will be interpreted in its sense as a proportion.
`mtry` is a main model argument for \code{\link[=boost_tree]{boost_tree()}} and \code{\link[=rand_forest]{rand_forest()}}, and thus should not have an engine-specific interface. So, regardless of engine, `counts` defaults to `TRUE`. For engines that support the proportion interpretation---currently `"xgboost"`, `"xrf"` (via the rules package), and `"lightgbm"` (via the bonsai package)---the user can pass the `counts = FALSE` argument to `set_engine()` to supply `mtry` values within $[0, 1]$.
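
A minimal sketch of the proportion interpretation; the `mtry` and `trees` values are placeholders:

```r
library(parsnip)

# With counts = FALSE, mtry = 0.5 means "sample half of the
# predictors at each split" rather than "sample 0.5 predictors".
boost_tree(mtry = 0.5, trees = 100) |>
  set_engine("xgboost", counts = FALSE) |>
  set_mode("classification")
```
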
### Early stopping
The `stop_iter()` argument allows the model to prematurely stop training if the objective function does not improve within `early_stop` iterations.
The best way to use this feature is in conjunction with an _internal validation set_. To do this, pass the `validation` parameter of \code{\link[=xgb_train]{xgb_train()}} via the parsnip \code{\link[=set_engine]{set_engine()}} function. This is the proportion of the training set that should be reserved for measuring performance (and stopping early).
If the model specification has `early_stop >= trees`, `early_stop` is converted to `trees - 1` and a warning is issued.
The `stop_iter()` argument allows the model to prematurely stop training if the objective function does not improve within `early_stop` iterations.
The best way to use this feature is in conjunction with an _internal validation set_. To do this, pass the `validation` parameter of \code{\link[=xgb_train]{xgb_train()}} via the parsnip \code{\link[=set_engine]{set_engine()}} function. This is the proportion of the training set that should be reserved for measuring performance (and stopping early).
If the model specification has `early_stop >= trees`, `early_stop` is converted to `trees - 1` and a warning is issued.
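
A sketch of that conversion; at fit time, parsnip would warn and use `trees - 1` as the stopping limit (all values here are placeholders):

```r
library(parsnip)

# stop_iter exceeds trees, so when this specification is fit,
# parsnip warns and lowers the early-stopping limit to trees - 1
# (here, 9 iterations).
boost_tree(trees = 10, stop_iter = 50) |>
  set_engine("xgboost", validation = 0.2) |>
  set_mode("regression")
```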