
Commit 06d6f75

draft boost_tree(engine = "lightgbm") docs
1 parent c6e97e6 commit 06d6f75

3 files changed: +206 −1 lines changed

R/engine_docs.R

Lines changed: 1 addition & 1 deletion
@@ -49,7 +49,7 @@ knit_engine_docs <- function(pattern = NULL) {
 extensions <- function() {
   c("baguette", "censored", "discrim", "multilevelmod", "plsmod",
-    "poissonreg", "rules")
+    "poissonreg", "rules", "bonsai")
 }

# ------------------------------------------------------------------------------
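
Note: the `"lightgbm"` engine itself ships in the bonsai extension package; once bonsai is loaded, the engine is registered for `boost_tree()`. A quick availability check (a sketch only, assuming parsnip and bonsai are installed):

```r
library(parsnip)
library(bonsai)   # loading bonsai registers the "lightgbm" engine

# the engines table should now list "lightgbm" for both modes
show_engines("boost_tree")
```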

man/rmd/boost_tree_lightgbm.Rmd

Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
```{r, child = "aaa.Rmd", include = FALSE}
```

`r descr_models("boost_tree", "lightgbm")`

## Tuning Parameters

```{r lightgbm-param-info, echo = FALSE}
defaults <-
  tibble::tibble(parsnip = c("mtry", "trees", "tree_depth", "learn_rate", "min_n", "loss_reduction"),
                 default = c("see below", 100L, -1, 0.1, 20, 0))

# For this model, this is the same for all modes
param <-
  boost_tree() %>%
  set_engine("lightgbm") %>%
  set_mode("regression") %>%
  make_parameter_list(defaults)
```

This model has `r nrow(param)` tuning parameters:

```{r lightgbm-param-list, echo = FALSE, results = "asis"}
param$item
```

The `mtry` parameter gives the _number_ of predictors that will be randomly sampled at each split. The default is to use all predictors.

The analogous argument to [lightgbm::lgb.train()], `feature_fraction`, instead takes this value as the _proportion_ of predictors that will be randomly sampled at each split. parsnip translates between the two under the hood: users should still supply `mtry` to `boost_tree()` as a _number_ of predictors, and parsnip will convert that count to a proportion before passing it on to [lightgbm::lgb.train()].

This translation can be overridden via the `counts` argument to `set_engine()`. `counts` defaults to `TRUE`; setting `counts = FALSE` instead tells parsnip to interpret `mtry` as a proportion rather than a number.

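As a small illustration of the two interfaces (a sketch only, not evaluated when this document is knit; it assumes parsnip and bonsai are attached, and the `mtry` values are arbitrary):

```{r lightgbm-mtry-counts, eval = FALSE}
# mtry as a count of predictors (the default, counts = TRUE)
boost_tree(mtry = 3) %>%
  set_engine("lightgbm") %>%
  set_mode("regression")

# mtry as a proportion of predictors (counts = FALSE)
boost_tree(mtry = 0.5) %>%
  set_engine("lightgbm", counts = FALSE) %>%
  set_mode("regression")
```
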
## Translation from parsnip to the original package (regression)

`r uses_extension("boost_tree", "lightgbm", "regression")`

```{r lightgbm-reg}
boost_tree(
  mtry = integer(), trees = integer(), tree_depth = integer(),
  learn_rate = numeric(), min_n = integer(), loss_reduction = numeric()
) %>%
  set_engine("lightgbm") %>%
  set_mode("regression") %>%
  translate()
```

## Translation from parsnip to the original package (classification)

`r uses_extension("boost_tree", "lightgbm", "classification")`

```{r lightgbm-cls}
boost_tree(
  mtry = integer(), trees = integer(), tree_depth = integer(),
  learn_rate = numeric(), min_n = integer(), loss_reduction = numeric()
) %>%
  set_engine("lightgbm") %>%
  set_mode("classification") %>%
  translate()
```

[train_lightgbm()] is a wrapper around [lightgbm::lgb.train()] (and other functions) that makes it easier to run this model.

## Other details

### Preprocessing

```{r child = "template-tree-split-factors.Rmd"}
```

Non-numeric predictors (i.e., factors) are internally converted to numeric. In the classification context, non-numeric outcomes (i.e., factors) are also internally converted to numeric.

### Verbosity

bonsai quiets much of the logging output from [lightgbm::lgb.train()] by default. With default settings, logged warnings and errors will still be passed on to the user. To silence lightgbm's logging output entirely during training, set `quiet = TRUE`.
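
For example (a sketch, not evaluated here; it assumes `quiet` is passed as an engine argument via `set_engine()`, like other engine-specific options):

```{r lightgbm-quiet, eval = FALSE}
boost_tree() %>%
  set_engine("lightgbm", quiet = TRUE) %>%
  set_mode("regression")
```
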
## Examples

<!-- TODO: update url to bonsai pkgdown site -->
The "Introduction to bonsai" article contains [examples](https://github.com/tidymodels/bonsai) of `boost_tree()` with the `"lightgbm"` engine.
## References

- [LightGBM: A Highly Efficient Gradient Boosting Decision Tree](https://papers.nips.cc/paper/2017/hash/6449f44a102fde848669bdd9eb6b76fa-Abstract.html)

man/rmd/boost_tree_lightgbm.md

Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
For this engine, there are multiple modes: regression and classification

## Tuning Parameters

This model has 6 tuning parameters:

- `tree_depth`: Tree Depth (type: integer, default: -1)

- `trees`: # Trees (type: integer, default: 100)

- `learn_rate`: Learning Rate (type: double, default: 0.1)

- `mtry`: # Randomly Selected Predictors (type: integer, default: see below)

- `min_n`: Minimal Node Size (type: integer, default: 20)

- `loss_reduction`: Minimum Loss Reduction (type: double, default: 0)

The `mtry` parameter gives the _number_ of predictors that will be randomly sampled at each split. The default is to use all predictors.

The analogous argument to [lightgbm::lgb.train()], `feature_fraction`, instead takes this value as the _proportion_ of predictors that will be randomly sampled at each split. parsnip translates between the two under the hood: users should still supply `mtry` to `boost_tree()` as a _number_ of predictors, and parsnip will convert that count to a proportion before passing it on to [lightgbm::lgb.train()].

This translation can be overridden via the `counts` argument to `set_engine()`. `counts` defaults to `TRUE`; setting `counts = FALSE` instead tells parsnip to interpret `mtry` as a proportion rather than a number.

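As a small illustration of the two interfaces (a sketch only; it assumes parsnip and bonsai are attached, and the `mtry` values are arbitrary):

```r
# mtry as a count of predictors (the default, counts = TRUE)
boost_tree(mtry = 3) %>%
  set_engine("lightgbm") %>%
  set_mode("regression")

# mtry as a proportion of predictors (counts = FALSE)
boost_tree(mtry = 0.5) %>%
  set_engine("lightgbm", counts = FALSE) %>%
  set_mode("regression")
```
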
## Translation from parsnip to the original package (regression)

```r
boost_tree(
  mtry = integer(), trees = integer(), tree_depth = integer(),
  learn_rate = numeric(), min_n = integer(), loss_reduction = numeric()
) %>%
  set_engine("lightgbm") %>%
  set_mode("regression") %>%
  translate()
```

```
## Boosted Tree Model Specification (regression)
##
## Main Arguments:
##   mtry = integer()
##   trees = integer()
##   min_n = integer()
##   tree_depth = integer()
##   learn_rate = numeric()
##   loss_reduction = numeric()
##
## Computational engine: lightgbm
##
## Model fit template:
## bonsai::train_lightgbm(x = missing_arg(), y = missing_arg(),
##     feature_fraction = integer(), num_iterations = integer(),
##     min_data_in_leaf = integer(), max_depth = integer(), learning_rate = numeric(),
##     min_gain_to_split = numeric(), verbose = -1)
```

## Translation from parsnip to the original package (classification)

```r
boost_tree(
  mtry = integer(), trees = integer(), tree_depth = integer(),
  learn_rate = numeric(), min_n = integer(), loss_reduction = numeric()
) %>%
  set_engine("lightgbm") %>%
  set_mode("classification") %>%
  translate()
```

```
## Boosted Tree Model Specification (classification)
##
## Main Arguments:
##   mtry = integer()
##   trees = integer()
##   min_n = integer()
##   tree_depth = integer()
##   learn_rate = numeric()
##   loss_reduction = numeric()
##
## Computational engine: lightgbm
##
## Model fit template:
## bonsai::train_lightgbm(x = missing_arg(), y = missing_arg(),
##     feature_fraction = integer(), num_iterations = integer(),
##     min_data_in_leaf = integer(), max_depth = integer(), learning_rate = numeric(),
##     min_gain_to_split = numeric(), verbose = -1)
```

[train_lightgbm()] is a wrapper around [lightgbm::lgb.train()] (and other functions) that makes it easier to run this model.

## Other details

### Preprocessing

This engine does not require any special encoding of the predictors. Categorical predictors can be partitioned into groups of factor levels (e.g. `{a, c}` vs `{b, d}`) when splitting at a node. Dummy variables are not required for this model.

Non-numeric predictors (i.e., factors) are internally converted to numeric. In the classification context, non-numeric outcomes (i.e., factors) are also internally converted to numeric.

### Verbosity

bonsai quiets much of the logging output from [lightgbm::lgb.train()] by default. With default settings, logged warnings and errors will still be passed on to the user. To silence lightgbm's logging output entirely during training, set `quiet = TRUE`.
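
For example (a sketch; it assumes `quiet` is passed as an engine argument via `set_engine()`, like other engine-specific options):

```r
boost_tree() %>%
  set_engine("lightgbm", quiet = TRUE) %>%
  set_mode("regression")
```
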
## Examples

<!-- TODO: update url to bonsai pkgdown site -->
The "Introduction to bonsai" article contains [examples](https://github.com/tidymodels/bonsai) of `boost_tree()` with the `"lightgbm"` engine.

## References

- [LightGBM: A Highly Efficient Gradient Boosting Decision Tree](https://papers.nips.cc/paper/2017/hash/6449f44a102fde848669bdd9eb6b76fa-Abstract.html)
