tidymodels · topepo · May 16, 2022 · May 10, 2022 · May 10, 2022 · May 13, 2022
diff --git a/R/engine_docs.R b/R/engine_docs.R
@@ -49,7 +49,7 @@ knit_engine_docs <- function(pattern = NULL) {
 
 extensions <- function() {
   c("baguette", "censored", "discrim", "multilevelmod", "plsmod",
-    "poissonreg", "rules")
+    "poissonreg", "rules", "bonsai")
 }
 
 # ------------------------------------------------------------------------------

diff --git a/R/translate.R b/R/translate.R
@@ -166,7 +166,8 @@ deharmonize <- function(args, key) {
   merged <-
     dplyr::left_join(parsn, key, by = "parsnip") %>%
     dplyr::arrange(order)
-  # TODO correct for bad merge?
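+  # A parsnip argument with several matches in `key` is repeated by the join; keep one row per argument
+  merged <- merged[!duplicated(merged$order), ]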
 
   names(args) <- merged$original
   args[!is.na(merged$original)]
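
A minimal base-R sketch of the failure the `deharmonize()` change guards against. The `parsn` and `key` tables are hypothetical toys, and base `merge()` stands in for `dplyr::left_join()`. When one parsnip argument has more than one matching row in `key`, the join repeats it, and the later `names(args) <- merged$original` assignment would then fail because the names vector is longer than `args`.

```r
# Toy stand-ins for the tables used in deharmonize(); hypothetical data.
parsn <- data.frame(
  parsnip = c("tree_depth", "min_n"),
  order = c(1L, 2L),
  stringsAsFactors = FALSE
)
key <- data.frame(
  parsnip = c("tree_depth", "min_n", "min_n"),  # `min_n` is listed twice
  original = c("maxdepth", "minsplit", "minsplit"),
  stringsAsFactors = FALSE
)
args <- list(tree_depth = 3, min_n = 20)

# Left join (all.x = TRUE), then sort by the original argument order.
merged <- merge(parsn, key, by = "parsnip", all.x = TRUE)
merged <- merged[order(merged$order), ]
n_before <- nrow(merged)  # 3 rows for only 2 arguments

# Without this line, `names(args) <- merged$original` errors because
# the names vector (length 3) is longer than `args` (length 2).
merged <- merged[!duplicated(merged$order), ]

names(args) <- merged$original
args[!is.na(merged$original)]
```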

diff --git a/man/rmd/boost_tree_lightgbm.Rmd b/man/rmd/boost_tree_lightgbm.Rmd
@@ -0,0 +1,85 @@
+```{r, child = "aaa.Rmd", include = FALSE}
+```
+
+`r descr_models("boost_tree", "lightgbm")`
+
+## Tuning Parameters
+
+```{r lightgbm-param-info, echo = FALSE}
+defaults <- 
+  tibble::tibble(parsnip = c("mtry", "trees", "tree_depth", "learn_rate", "min_n", "loss_reduction"),
+                 default = c("see below", 100L, -1, 0.1, 20, 0))
+
+# For this model, this is the same for all modes
+param <-
+  boost_tree() %>% 
+  set_engine("lightgbm") %>% 
+  set_mode("regression") %>% 
+  make_parameter_list(defaults)
+```
+
+This model has `r nrow(param)` tuning parameters:
+
+```{r lightgbm-param-list, echo = FALSE, results = "asis"}
+param$item
+```
+
+The `mtry` parameter gives the _number_ of predictors that will be randomly sampled at each split. The default is to use all predictors. 
+
+Rather than a number, [lightgbm::lgb.train()]'s `feature_fraction` argument encodes `mtry` as the _proportion_ of predictors that will be randomly sampled at each split. parsnip handles this under the hood: the user should still supply `mtry` to `boost_tree()` as a _number_ of predictors, and parsnip converts that value to a proportion before passing it to [lightgbm::lgb.train()].
+
+parsnip's translation can be turned off with the `counts` argument to `set_engine()`. The default, `counts = TRUE`, treats `mtry` as a number of predictors; setting `counts = FALSE` lets the user supply `mtry` as a proportion instead.
+
+## Translation from parsnip to the original package (regression)
+
+`r uses_extension("boost_tree", "lightgbm", "regression")`
+
+```{r lightgbm-reg}
+boost_tree(
+  mtry = integer(), trees = integer(), tree_depth = integer(), 
+  learn_rate = numeric(), min_n = integer(), loss_reduction = numeric()
+) %>%
+  set_engine("lightgbm") %>%
+  set_mode("regression") %>%
+  translate()
+```
+
+## Translation from parsnip to the original package (classification)
+
+`r uses_extension("boost_tree", "lightgbm", "classification")`
+
+```{r lightgbm-cls}
+boost_tree(
+  mtry = integer(), trees = integer(), tree_depth = integer(), 
+  learn_rate = numeric(), min_n = integer(), loss_reduction = numeric()
+) %>% 
+  set_engine("lightgbm") %>% 
+  set_mode("classification") %>% 
+  translate()
+```
+
+`bonsai::train_lightgbm()` is a wrapper around [lightgbm::lgb.train()] (and other functions) that makes it easier to run this model. 
+
+## Other details
+
+### Preprocessing
+
+```{r child = "template-tree-split-factors.Rmd"}
+```
+
+Non-numeric predictors (i.e., factors) are internally converted to numeric. In the classification context, non-numeric outcomes (i.e., factors) are also internally converted to numeric. 
+
+### Verbosity
+
+bonsai quiets much of the logging output from [lightgbm::lgb.train()] by default. With default settings, logged warnings and errors will still be passed on to the user. To print out all logs during training, set `quiet = TRUE`.
+
+## Examples 
+
+<!-- TODO: update url to bonsai pkgdown site -->
+The "Introduction to bonsai" article contains [examples](https://github.com/tidymodels/bonsai) of `boost_tree()` with the `"lightgbm"` engine.
+
+## References
+
+- [LightGBM: A Highly Efficient Gradient Boosting Decision Tree](https://papers.nips.cc/paper/2017/hash/6449f44a102fde848669bdd9eb6b76fa-Abstract.html)
+
+- Kuhn, M, and K Johnson. 2013. _Applied Predictive Modeling_. Springer.
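
A sketch of the `mtry` conversion described in the lightgbm docs above. `mtry_to_feature_fraction()` is a hypothetical helper, not bonsai's code, and it assumes the proportion is simply `mtry` divided by the number of predictors. In user code the same switch is made through the engine argument, e.g. `set_engine("lightgbm", counts = FALSE)`.

```r
# Hypothetical helper mirroring the `counts` behavior described above;
# it is not bonsai's implementation.
mtry_to_feature_fraction <- function(mtry, n_predictors, counts = TRUE) {
  # counts = TRUE: `mtry` is a number of predictors, so divide by the total.
  # counts = FALSE: `mtry` is already a proportion, so pass it through.
  prop <- if (counts) mtry / n_predictors else mtry
  if (prop <= 0 || prop > 1) {
    stop("`mtry` must correspond to a proportion in (0, 1].", call. = FALSE)
  }
  prop
}

# 3 of 12 predictors, supplied as a count (the default)
mtry_to_feature_fraction(3, n_predictors = 12)
#> [1] 0.25

# The same setting supplied directly as a proportion
mtry_to_feature_fraction(0.25, n_predictors = 12, counts = FALSE)
#> [1] 0.25
```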
diff --git a/man/rmd/boost_tree_lightgbm.md b/man/rmd/boost_tree_lightgbm.md
@@ -0,0 +1,124 @@
+
+
+
+For this engine, there are multiple modes: regression and classification
+
+## Tuning Parameters
+
+
+
+This model has 6 tuning parameters:
+
+- `tree_depth`: Tree Depth (type: integer, default: -1)
+
+- `trees`: # Trees (type: integer, default: 100)
+
+- `learn_rate`: Learning Rate (type: double, default: 0.1)
+
+- `mtry`: # Randomly Selected Predictors (type: integer, default: see below)
+
+- `min_n`: Minimal Node Size (type: integer, default: 20)
+
+- `loss_reduction`: Minimum Loss Reduction (type: double, default: 0)
+
+The `mtry` parameter gives the _number_ of predictors that will be randomly sampled at each split. The default is to use all predictors. 
+
+Rather than a number, [lightgbm::lgb.train()]'s `feature_fraction` argument encodes `mtry` as the _proportion_ of predictors that will be randomly sampled at each split. parsnip handles this under the hood: the user should still supply `mtry` to `boost_tree()` as a _number_ of predictors, and parsnip converts that value to a proportion before passing it to [lightgbm::lgb.train()].
+
+parsnip's translation can be turned off with the `counts` argument to `set_engine()`. The default, `counts = TRUE`, treats `mtry` as a number of predictors; setting `counts = FALSE` lets the user supply `mtry` as a proportion instead.
+
+## Translation from parsnip to the original package (regression)
+
+
+
+
+```r
+boost_tree(
+  mtry = integer(), trees = integer(), tree_depth = integer(), 
+  learn_rate = numeric(), min_n = integer(), loss_reduction = numeric()
+) %>%
+  set_engine("lightgbm") %>%
+  set_mode("regression") %>%
+  translate()
+```
+
+```
+## Boosted Tree Model Specification (regression)
+## 
+## Main Arguments:
+##   mtry = integer()
+##   trees = integer()
+##   min_n = integer()
+##   tree_depth = integer()
+##   learn_rate = numeric()
+##   loss_reduction = numeric()
+## 
+## Computational engine: lightgbm 
+## 
+## Model fit template:
+## bonsai::train_lightgbm(x = missing_arg(), y = missing_arg(), 
+##     feature_fraction = integer(), num_iterations = integer(), 
+##     min_data_in_leaf = integer(), max_depth = integer(), learning_rate = numeric(), 
+##     min_gain_to_split = numeric(), verbose = -1)
+```
+
+## Translation from parsnip to the original package (classification)
+
+
+
+
+```r
+boost_tree(
+  mtry = integer(), trees = integer(), tree_depth = integer(), 
+  learn_rate = numeric(), min_n = integer(), loss_reduction = numeric()
+) %>% 
+  set_engine("lightgbm") %>% 
+  set_mode("classification") %>% 
+  translate()
+```
+
+```
+## Boosted Tree Model Specification (classification)
+## 
+## Main Arguments:
+##   mtry = integer()
+##   trees = integer()
+##   min_n = integer()
+##   tree_depth = integer()
+##   learn_rate = numeric()
+##   loss_reduction = numeric()
+## 
+## Computational engine: lightgbm 
+## 
+## Model fit template:
+## bonsai::train_lightgbm(x = missing_arg(), y = missing_arg(), 
+##     feature_fraction = integer(), num_iterations = integer(), 
+##     min_data_in_leaf = integer(), max_depth = integer(), learning_rate = numeric(), 
+##     min_gain_to_split = numeric(), verbose = -1)
+```
+
+`bonsai::train_lightgbm()` is a wrapper around [lightgbm::lgb.train()] (and other functions) that makes it easier to run this model. 
+
+## Other details
+
+### Preprocessing
+
+
+This engine does not require any special encoding of the predictors. Categorical predictors can be partitioned into groups of factor levels (e.g. `{a, c}` vs `{b, d}`) when splitting at a node. Dummy variables are not required for this model. 
+
+Non-numeric predictors (i.e., factors) are internally converted to numeric. In the classification context, non-numeric outcomes (i.e., factors) are also internally converted to numeric. 
+
+### Verbosity
+
+bonsai quiets much of the logging output from [lightgbm::lgb.train()] by default. With default settings, logged warnings and errors will still be passed on to the user. To print out all logs during training, set `quiet = TRUE`.
+
+## Examples 
+
+<!-- TODO: update url to bonsai pkgdown site -->
+The "Introduction to bonsai" article contains [examples](https://github.com/tidymodels/bonsai) of `boost_tree()` with the `"lightgbm"` engine.
+
+## References
+
+- [LightGBM: A Highly Efficient Gradient Boosting Decision Tree](https://papers.nips.cc/paper/2017/hash/6449f44a102fde848669bdd9eb6b76fa-Abstract.html)
+
+- Kuhn, M, and K Johnson. 2013. _Applied Predictive Modeling_. Springer.
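
The preprocessing note says factor predictors and factor outcomes are converted to numeric internally. The sketch below shows one plausible conversion in base R. It assumes integer level codes for predictors and 0-based class indices for the outcome (the label range LightGBM's classification objectives expect); bonsai's actual internals may differ.

```r
# Hypothetical illustration of converting factor columns to numeric codes;
# bonsai's actual internals may differ.
x <- data.frame(
  size  = factor(c("small", "large", "medium", "small")),
  width = c(1.2, 3.4, 2.2, 0.9)
)
y <- factor(c("no", "yes", "yes", "no"))

# Replace each factor predictor with its integer level codes.
# Levels sort alphabetically: large = 1, medium = 2, small = 3.
is_fct <- vapply(x, is.factor, logical(1))
x[is_fct] <- lapply(x[is_fct], as.integer)
x_mat <- as.matrix(x)  # all-numeric matrix

# Map the outcome levels to 0, 1, ..., K - 1.
y_num <- as.integer(y) - 1L

x$size
#> [1] 3 1 2 3
y_num
#> [1] 0 1 1 0
```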
diff --git a/man/rmd/decision_tree_partykit.Rmd b/man/rmd/decision_tree_partykit.Rmd
@@ -0,0 +1,65 @@
+```{r, child = "aaa.Rmd", include = FALSE}
+```
+
+`r descr_models("decision_tree", "partykit")`
+
+## Tuning Parameters
+
+```{r partykit-param-info, echo = FALSE}
+defaults <- 
+  tibble::tibble(parsnip = c("tree_depth", "min_n"),
+                 default = c("see below", "20L"))
+
+param <-
+  decision_tree() %>% 
+  set_engine("partykit") %>% 
+  set_mode("regression") %>% 
+  make_parameter_list(defaults)
+```
+
+This model has `r nrow(param)` tuning parameters:
+
+```{r partykit-param-list, echo = FALSE, results = "asis"}
+param$item
+```
+
+The `tree_depth` parameter defaults to `Inf`, which means no restrictions are applied to tree depth.
+
+An engine-specific parameter for this model is: 
+
+ * `mtry`: the number of predictors, selected at random, that are evaluated for splitting. The default is to use all predictors.
+
+## Translation from parsnip to the original package (regression)
+
+`r uses_extension("decision_tree", "partykit", "regression")`
+
+```{r partykit-creg}
+decision_tree(tree_depth = integer(1), min_n = integer(1)) %>% 
+  set_engine("partykit") %>% 
+  set_mode("regression") %>% 
+  translate()
+```
+
+## Translation from parsnip to the original package (classification)
+
+`r uses_extension("decision_tree", "partykit", "classification")`
+
+```{r partykit-class}
+decision_tree(tree_depth = integer(1), min_n = integer(1)) %>% 
+  set_engine("partykit") %>% 
+  set_mode("classification") %>% 
+  translate()
+```
+
+`parsnip::ctree_train()` is a wrapper around [partykit::ctree()] (and other functions) that makes it easier to run this model. 
+
+## Preprocessing requirements
+
+```{r child = "template-tree-split-factors.Rmd"}
+```
+
+## References
+
+ - [partykit: A Modular Toolkit for Recursive Partytioning in R](https://jmlr.org/papers/v16/hothorn15a.html)
+
+ - Kuhn, M, and K Johnson. 2013. _Applied Predictive Modeling_. Springer.
diff --git a/man/rmd/decision_tree_partykit.md b/man/rmd/decision_tree_partykit.md
@@ -0,0 +1,87 @@
+
+
+
+For this engine, there are multiple modes: censored regression, regression, and classification
+
+## Tuning Parameters
+
+
+
+This model has 2 tuning parameters:
+
+- `tree_depth`: Tree Depth (type: integer, default: see below)
+
+- `min_n`: Minimal Node Size (type: integer, default: 20L)
+
+The `tree_depth` parameter defaults to `Inf`, which means no restrictions are applied to tree depth.
+
+An engine-specific parameter for this model is: 
+
+ * `mtry`: the number of predictors, selected at random, that are evaluated for splitting. The default is to use all predictors.
+
+## Translation from parsnip to the original package (regression)
+
+
+
+
+```r
+decision_tree(tree_depth = integer(1), min_n = integer(1)) %>% 
+  set_engine("partykit") %>% 
+  set_mode("regression") %>% 
+  translate()
+```
+
+```
+## Decision Tree Model Specification (regression)
+## 
+## Main Arguments:
+##   tree_depth = integer(1)
+##   min_n = integer(1)
+## 
+## Computational engine: partykit 
+## 
+## Model fit template:
+## parsnip::ctree_train(formula = missing_arg(), data = missing_arg(), 
+##     weights = missing_arg(), maxdepth = integer(1), minsplit = min_rows(0L, 
+##         data))
+```
+
+## Translation from parsnip to the original package (classification)
+
+
+
+
+```r
+decision_tree(tree_depth = integer(1), min_n = integer(1)) %>% 
+  set_engine("partykit") %>% 
+  set_mode("classification") %>% 
+  translate()
+```
+
+```
+## Decision Tree Model Specification (classification)
+## 
+## Main Arguments:
+##   tree_depth = integer(1)
+##   min_n = integer(1)
+## 
+## Computational engine: partykit 
+## 
+## Model fit template:
+## parsnip::ctree_train(formula = missing_arg(), data = missing_arg(), 
+##     weights = missing_arg(), maxdepth = integer(1), minsplit = min_rows(0L, 
+##         data))
+```
+
+`parsnip::ctree_train()` is a wrapper around [partykit::ctree()] (and other functions) that makes it easier to run this model. 
+
+## Preprocessing requirements
+
+
+This engine does not require any special encoding of the predictors. Categorical predictors can be partitioned into groups of factor levels (e.g. `{a, c}` vs `{b, d}`) when splitting at a node. Dummy variables are not required for this model. 
+
+## References
+
+ - [partykit: A Modular Toolkit for Recursive Partytioning in R](https://jmlr.org/papers/v16/hothorn15a.html)
+
+ - Kuhn, M, and K Johnson. 2013. _Applied Predictive Modeling_. Springer.
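
The partykit translation maps `tree_depth` to `maxdepth` and wraps `min_n` in `min_rows(..., data)` for `minsplit`. A base-R sketch of that mapping follows, assuming `min_rows()` caps the requested size at the number of rows in the training data. `cap_rows()` and `to_ctree_args()` are hypothetical names, not parsnip functions.

```r
# Hypothetical stand-in for the row capping done by `min_rows()` in the
# translated call: a requested size larger than the data is reduced.
cap_rows <- function(num_rows, data) {
  n <- nrow(data)
  if (num_rows > n) {
    warning(num_rows, " samples were requested but there were ", n,
            " rows in the data. ", n, " will be used.", call. = FALSE)
    num_rows <- n
  }
  as.integer(num_rows)
}

# Hypothetical helper: rename parsnip arguments to their ctree_train() names.
to_ctree_args <- function(tree_depth, min_n, data) {
  list(maxdepth = tree_depth, minsplit = cap_rows(min_n, data))
}

small_data <- data.frame(y = 1:10, x = 1:10)
ctree_args <- suppressWarnings(
  to_ctree_args(tree_depth = 3, min_n = 20, data = small_data)
)
ctree_args$minsplit
#> [1] 10
```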