For this engine, there are multiple modes: regression and classification

## Tuning Parameters

This model has 6 tuning parameters:

- `tree_depth`: Tree Depth (type: integer, default: -1)

- `trees`: # Trees (type: integer, default: 100)

- `learn_rate`: Learning Rate (type: double, default: 0.1)

- `mtry`: # Randomly Selected Predictors (type: integer, default: see below)

- `min_n`: Minimal Node Size (type: integer, default: 20)

- `loss_reduction`: Minimum Loss Reduction (type: double, default: 0)

The `mtry` parameter gives the _number_ of predictors that will be randomly sampled at each split. The default is to use all predictors.

Rather than as a number, [lightgbm::lgb.train()]'s `feature_fraction` argument encodes `mtry` as the _proportion_ of predictors that will be randomly sampled at each split. parsnip translates `mtry`, supplied as the _number_ of predictors, to a proportion under the hood. That is, the user should still supply the argument as `mtry` to `boost_tree()`, and do so in its sense as a number rather than a proportion; before passing `mtry` to [lightgbm::lgb.train()], parsnip will convert the `mtry` value to a proportion.

Note that parsnip's translation can be overridden via the `counts` argument, supplied to `set_engine()`. By default, `counts` is set to `TRUE`, but supplying the argument `counts = FALSE` allows the user to supply `mtry` as a proportion rather than a number.
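For example, a minimal sketch of the two interfaces (the `mtry` values here are arbitrary):

```r
library(bonsai)

# default behavior: `mtry` is a count of predictors, converted to a
# proportion by parsnip before being handed to lightgbm
spec_counts <- boost_tree(mtry = 3) %>%
  set_engine("lightgbm") %>%
  set_mode("regression")

# with `counts = FALSE`, `mtry` is supplied directly as a proportion
spec_prop <- boost_tree(mtry = 0.5) %>%
  set_engine("lightgbm", counts = FALSE) %>%
  set_mode("regression")
```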
## Translation from parsnip to the original package (regression)

```r
boost_tree(
  mtry = integer(), trees = integer(), tree_depth = integer(),
  learn_rate = numeric(), min_n = integer(), loss_reduction = numeric()
) %>%
  set_engine("lightgbm") %>%
  set_mode("regression") %>%
  translate()
```

```
## Boosted Tree Model Specification (regression)
##
## Main Arguments:
##   mtry = integer()
##   trees = integer()
##   min_n = integer()
##   tree_depth = integer()
##   learn_rate = numeric()
##   loss_reduction = numeric()
##
## Computational engine: lightgbm
##
## Model fit template:
## bonsai::train_lightgbm(x = missing_arg(), y = missing_arg(),
##     feature_fraction = integer(), num_iterations = integer(),
##     min_data_in_leaf = integer(), max_depth = integer(), learning_rate = numeric(),
##     min_gain_to_split = numeric(), verbose = -1)
```
## Translation from parsnip to the original package (classification)

```r
boost_tree(
  mtry = integer(), trees = integer(), tree_depth = integer(),
  learn_rate = numeric(), min_n = integer(), loss_reduction = numeric()
) %>%
  set_engine("lightgbm") %>%
  set_mode("classification") %>%
  translate()
```

```
## Boosted Tree Model Specification (classification)
##
## Main Arguments:
##   mtry = integer()
##   trees = integer()
##   min_n = integer()
##   tree_depth = integer()
##   learn_rate = numeric()
##   loss_reduction = numeric()
##
## Computational engine: lightgbm
##
## Model fit template:
## bonsai::train_lightgbm(x = missing_arg(), y = missing_arg(),
##     feature_fraction = integer(), num_iterations = integer(),
##     min_data_in_leaf = integer(), max_depth = integer(), learning_rate = numeric(),
##     min_gain_to_split = numeric(), verbose = -1)
```
[train_lightgbm()] is a wrapper around [lightgbm::lgb.train()] (and other functions) that makes it easier to run this model.
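For instance, a minimal end-to-end fit might look like the sketch below (the data set and parameter values are arbitrary, and the lightgbm package must be installed):

```r
library(bonsai)

# loading bonsai registers the "lightgbm" engine with parsnip
lgbm_fit <-
  boost_tree(trees = 200, learn_rate = 0.05, min_n = 5) %>%
  set_engine("lightgbm") %>%
  set_mode("regression") %>%
  fit(mpg ~ ., data = mtcars)

# predictions are returned as a tibble with a `.pred` column
predict(lgbm_fit, new_data = mtcars[1:5, ])
```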
## Other details

### Preprocessing

This engine does not require any special encoding of the predictors. Categorical predictors can be partitioned into groups of factor levels (e.g. `{a, c}` vs `{b, d}`) when splitting at a node. Dummy variables are not required for this model.

Non-numeric predictors (i.e., factors) are internally converted to numeric. In the classification context, non-numeric outcomes (i.e., factors) are also internally converted to numeric.
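As an illustration (a sketch using the built-in `ToothGrowth` data, where `supp` is a factor; parameter values are arbitrary), factor predictors can be passed in directly with no dummy-variable step:

```r
library(bonsai)

# `supp` is a factor with levels "OJ" and "VC"; it is handled by the
# engine without being expanded into dummy variables first
boost_tree(trees = 100, min_n = 5) %>%
  set_engine("lightgbm") %>%
  set_mode("regression") %>%
  fit(len ~ supp + dose, data = ToothGrowth)
```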
### Verbosity

bonsai quiets much of the logging output from [lightgbm::lgb.train()] by default. With default settings, logged warnings and errors will still be passed on to the user. To print out all logs during training, set `quiet = FALSE`.
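For example, a sketch assuming `quiet` is forwarded as an engine argument through `set_engine()` (the model values here are arbitrary):

```r
library(bonsai)

# engine arguments are passed along to bonsai::train_lightgbm(); here,
# `quiet = FALSE` requests the full lightgbm training log
boost_tree(trees = 50, min_n = 5) %>%
  set_engine("lightgbm", quiet = FALSE) %>%
  set_mode("regression") %>%
  fit(mpg ~ ., data = mtcars)
```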
## Examples

<!-- TODO: update url to bonsai pkgdown site -->
| 118 | +The "Introduction to bonsai" article contains [examples](https://github.com/tidymodels/bonsai) of `boost_tree()` with the `"lightgbm"` engine. |
## References

 - [LightGBM: A Highly Efficient Gradient Boosting Decision Tree](https://papers.nips.cc/paper/2017/hash/6449f44a102fde848669bdd9eb6b76fa-Abstract.html)