For this engine, there are multiple modes: regression and classification

## Tuning Parameters

This model has 6 tuning parameters:

- `tree_depth`: Tree Depth (type: integer, default: -1)

- `trees`: # Trees (type: integer, default: 100)

- `learn_rate`: Learning Rate (type: double, default: 0.1)

- `mtry`: # Randomly Selected Predictors (type: integer, default: see below)

- `min_n`: Minimal Node Size (type: integer, default: 20)

- `loss_reduction`: Minimum Loss Reduction (type: double, default: 0)

The `mtry` parameter gives the _number_ of predictors that will be randomly sampled at each split. The default is to use all predictors.

Rather than as a number, [lightgbm::lgb.train()]'s `feature_fraction` argument encodes `mtry` as the _proportion_ of predictors that will be randomly sampled at each split. parsnip translates `mtry`, supplied as the _number_ of predictors, to a proportion under the hood. That is, the user should still supply the argument as `mtry` to `boost_tree()`, and do so in its sense as a number rather than a proportion; before passing `mtry` to [lightgbm::lgb.train()], parsnip will convert the `mtry` value to a proportion.

Note that parsnip's translation can be overridden via the `counts` argument, supplied to `set_engine()`. By default, `counts` is set to `TRUE`, but supplying the argument `counts = FALSE` allows the user to supply `mtry` as a proportion rather than a number.
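For example, a minimal sketch of the two interfaces (the `mtry` values here are arbitrary):

```r
library(bonsai)

# default behavior: `mtry` is a count of predictors, converted to a
# proportion by parsnip before being handed to lightgbm
spec_counts <- boost_tree(mtry = 3) %>%
  set_engine("lightgbm") %>%
  set_mode("regression")

# with `counts = FALSE`, `mtry` is supplied directly as a proportion
spec_prop <- boost_tree(mtry = 0.5) %>%
  set_engine("lightgbm", counts = FALSE) %>%
  set_mode("regression")
```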
## Translation from parsnip to the original package (regression)

```r
boost_tree(
  mtry = integer(), trees = integer(), tree_depth = integer(),
  learn_rate = numeric(), min_n = integer(), loss_reduction = numeric()
) %>%
  set_engine("lightgbm") %>%
  set_mode("regression") %>%
  translate()
```

```
## Boosted Tree Model Specification (regression)
##
## Main Arguments:
##   mtry = integer()
##   trees = integer()
##   min_n = integer()
##   tree_depth = integer()
##   learn_rate = numeric()
##   loss_reduction = numeric()
##
## Computational engine: lightgbm
##
## Model fit template:
## bonsai::train_lightgbm(x = missing_arg(), y = missing_arg(),
##     feature_fraction = integer(), num_iterations = integer(),
##     min_data_in_leaf = integer(), max_depth = integer(), learning_rate = numeric(),
##     min_gain_to_split = numeric(), verbose = -1)
```
## Translation from parsnip to the original package (classification)

```r
boost_tree(
  mtry = integer(), trees = integer(), tree_depth = integer(),
  learn_rate = numeric(), min_n = integer(), loss_reduction = numeric()
) %>%
  set_engine("lightgbm") %>%
  set_mode("classification") %>%
  translate()
```

```
## Boosted Tree Model Specification (classification)
##
## Main Arguments:
##   mtry = integer()
##   trees = integer()
##   min_n = integer()
##   tree_depth = integer()
##   learn_rate = numeric()
##   loss_reduction = numeric()
##
## Computational engine: lightgbm
##
## Model fit template:
## bonsai::train_lightgbm(x = missing_arg(), y = missing_arg(),
##     feature_fraction = integer(), num_iterations = integer(),
##     min_data_in_leaf = integer(), max_depth = integer(), learning_rate = numeric(),
##     min_gain_to_split = numeric(), verbose = -1)
```
[train_lightgbm()] is a wrapper around [lightgbm::lgb.train()] (and other functions) that makes it easier to run this model.
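For instance, a minimal end-to-end fit might look like the sketch below (the data set and parameter values are arbitrary, and the lightgbm package must be installed):

```r
library(bonsai)

# loading bonsai registers the "lightgbm" engine with parsnip
lgbm_fit <-
  boost_tree(trees = 200, learn_rate = 0.05, min_n = 5) %>%
  set_engine("lightgbm") %>%
  set_mode("regression") %>%
  fit(mpg ~ ., data = mtcars)

# predictions are returned as a tibble with a `.pred` column
predict(lgbm_fit, new_data = mtcars[1:5, ])
```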
## Other details

### Preprocessing

This engine does not require any special encoding of the predictors. Categorical predictors can be partitioned into groups of factor levels (e.g. `{a, c}` vs `{b, d}`) when splitting at a node. Dummy variables are not required for this model.

Non-numeric predictors (i.e., factors) are internally converted to numeric. In the classification context, non-numeric outcomes (i.e., factors) are also internally converted to numeric.
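As an illustration (a sketch using the built-in `ToothGrowth` data, where `supp` is a factor; parameter values are arbitrary), factor predictors can be passed in directly with no dummy-variable step:

```r
library(bonsai)

# `supp` is a factor with levels "OJ" and "VC"; it is handled by the
# engine without being expanded into dummy variables first
boost_tree(trees = 100, min_n = 5) %>%
  set_engine("lightgbm") %>%
  set_mode("regression") %>%
  fit(len ~ supp + dose, data = ToothGrowth)
```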
### Verbosity

bonsai quiets much of the logging output from [lightgbm::lgb.train()] by default. With default settings, logged warnings and errors will still be passed on to the user. To print out all logs during training, set `quiet = FALSE`.
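For example, a sketch assuming `quiet` is forwarded as an engine argument through `set_engine()` (the model values here are arbitrary):

```r
library(bonsai)

# engine arguments are passed along to bonsai::train_lightgbm(); here,
# `quiet = FALSE` requests the full lightgbm training log
boost_tree(trees = 50, min_n = 5) %>%
  set_engine("lightgbm", quiet = FALSE) %>%
  set_mode("regression") %>%
  fit(mpg ~ ., data = mtcars)
```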
## Examples

<!-- TODO: update url to bonsai pkgdown site -->
| 118 | +The "Introduction to bonsai" article contains [examples](https://github.com/tidymodels/bonsai) of `boost_tree()` with the `"lightgbm"` engine. |
## References

 - [LightGBM: A Highly Efficient Gradient Boosting Decision Tree](https://papers.nips.cc/paper/2017/hash/6449f44a102fde848669bdd9eb6b76fa-Abstract.html)