Tables for engine specific params in model docs #272

Merged 2 commits on Mar 26, 2020
juliasilge (Member)

Addresses #211. This PR implements a table in each individual model .Rd file to show the mapping from the parsnip parameters to the engine parameters.
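
As a rough illustration of the kind of table described here (not the code in this PR), the sketch below builds a parsnip-to-engine lookup in base R. The object names `mapping` and `wide` are made up for the example, and the rows are copied from the `decision_tree` example that appears later in this thread.

```r
# Hypothetical long-format mapping: one row per (engine, parsnip argument) pair.
# The values mirror the decision_tree example in this thread.
mapping <- data.frame(
  engine   = c("rpart", "rpart", "rpart", "C5.0", "spark", "spark"),
  parsnip  = c("tree_depth", "min_n", "cost_complexity",
               "min_n", "tree_depth", "min_n"),
  original = c("maxdepth", "minsplit", "cp",
               "minCases", "max_depth", "min_instances_per_node"),
  stringsAsFactors = FALSE
)

# One row per parsnip argument, one column per engine; NA marks an
# argument that the engine does not expose.
wide <- matrix(
  NA_character_,
  nrow = length(unique(mapping$parsnip)),
  ncol = length(unique(mapping$engine)),
  dimnames = list(unique(mapping$parsnip), unique(mapping$engine))
)

# Fill the cells by (row name, column name) pairs.
wide[cbind(mapping$parsnip, mapping$engine)] <- mapping$original

wide
```

A character matrix is enough here because every cell is either an engine argument name or `NA`; the PR itself works from the model environment rather than a hand-written data frame.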

Still TODO: the defaults for these parameters. That is a WHOLE THING (maybe not something that can be automated), so I plan to do it in another PR in the hopefully near future. If this piece is good as is, let's merge it.

topepo (Member) commented Mar 24, 2020

Looks good. I agree that the default arguments should be a second stage.

topepo (Member) commented Mar 25, 2020

For future PRs... this is a little kludgy but we could add some code to the _data files that define the methods to catalog their default values:

library(tidymodels)
#> ── Attaching packages ───────────────────────────── tidymodels 0.1.0 ──
#> ✓ broom     0.5.4          ✓ recipes   0.1.10    
#> ✓ dials     0.0.4.9000     ✓ rsample   0.0.5.9000
#> ✓ dplyr     0.8.5          ✓ tibble    2.1.3     
#> ✓ ggplot2   3.3.0          ✓ tune      0.0.1.9000
#> ✓ infer     0.5.1          ✓ workflows 0.1.0     
#> ✓ parsnip   0.0.5          ✓ yardstick 0.0.5     
#> ✓ purrr     0.3.3
#> ── Conflicts ──────────────────────────────── tidymodels_conflicts() ──
#> x purrr::discard()  masks scales::discard()
#> x dplyr::filter()   masks stats::filter()
#> x dplyr::lag()      masks stats::lag()
#> x ggplot2::margin() masks dials::margin()
#> x recipes::step()   masks stats::step()
library(rlang)
#> 
#> Attaching package: 'rlang'
#> The following objects are masked from 'package:purrr':
#> 
#>     %@%, as_function, flatten, flatten_chr, flatten_dbl, flatten_int,
#>     flatten_lgl, flatten_raw, invoke, list_along, modify, prepend,
#>     splice


# Look up the default of argument `arg` for function `f` in namespace `ns`
# and return it as character
get_arg <- function(ns, f, arg) {
  args <- as.list(formals(utils::getFromNamespace(f, ns)))
  as.character(args[[arg]])
}

# Store the defaults as character because some entries will be descriptive text
# rather than a value, e.g. glmnet::glmnet would have "lambda (all)" or something similar
dt_defaults <- 
  tibble::tribble(
    ~model,         ~engine,                 ~original,  ~default,
    "decision_tree", "rpart",               "maxdepth", get_arg("rpart", "rpart.control", "maxdepth"),
    "decision_tree", "rpart",               "minsplit", get_arg("rpart", "rpart.control", "minsplit"),
    "decision_tree", "rpart",                     "cp", get_arg("rpart", "rpart.control", "cp"),
    "decision_tree",  "C5.0",               "minCases", get_arg("C50", "C5.0Control", "minCases"),
    "decision_tree", "spark",              "max_depth", get_arg("sparklyr", "ml_decision_tree", "max_depth"),
    "decision_tree", "spark", "min_instances_per_node", get_arg("sparklyr", "ml_decision_tree", "min_instances_per_node"),
  )

# emulating convert_args("decision_tree")
model_name <- "decision_tree"

envir <- get_model_env()

args <-
  ls(envir) %>%
  tibble::tibble(name = .) %>%
  dplyr::filter(grepl("args", name)) %>%
  dplyr::mutate(model = sub("_args", "", name),
                args  = purrr::map(name, ~envir[[.x]])) %>%
  dplyr::filter(grepl(model_name, model)) %>%
  tidyr::unnest(args) %>%
  dplyr::select(model:original) %>% 
  full_join(dt_defaults) %>% 
  mutate(original = paste0(original, " (", default, ")")) %>% 
  select(-default)
#> Joining, by = c("model", "engine", "original")

convert_df <- args %>%
  dplyr::select(-model) %>%
  tidyr::pivot_wider(names_from = engine, values_from = original)

convert_df %>%
  knitr::kable(col.names = paste0("**", colnames(convert_df), "**"))
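
| **parsnip**     | **rpart**     | **C5.0**     | **spark**                  |
|:----------------|:--------------|:-------------|:---------------------------|
| tree_depth      | maxdepth (30) | NA           | max_depth (5)              |
| min_n           | minsplit (20) | minCases (2) | min_instances_per_node (1) |
| cost_complexity | cp (0.01)     | NA           | NA                         |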

Created on 2020-03-24 by the reprex package (v0.3.0)
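
One caveat with the `get_arg()` approach above: `as.character()` on a default that is a call (for example `c("a", "b")`) returns several strings, and on a `NULL` default it returns `character(0)`, so neither fits into a single table cell. A possible variant, sketched here under the assumption that a one-line label is what the table needs, uses `deparse()` instead. The name `get_arg_label` is hypothetical, and `stats::loess` is used only because it ships with R and has both kinds of default.

```r
# Hypothetical variant of get_arg(): always returns a single string.
# Only intended for arguments that have a default value.
get_arg_label <- function(ns, f, arg) {
  args <- as.list(formals(utils::getFromNamespace(f, ns)))
  if (!arg %in% names(args)) {
    stop("`", f, "()` has no argument named `", arg, "`", call. = FALSE)
  }
  # deparse() turns the unevaluated default back into source text;
  # collapse in case a long default is split over several lines.
  paste(deparse(args[[arg]]), collapse = " ")
}

get_arg_label("stats", "loess", "span")    # numeric default
get_arg_label("stats", "loess", "family")  # default is an unevaluated call
```

For simple numeric defaults such as those in the `decision_tree` table, this should give the same label as the original helper.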

topepo merged commit 8b73b3b into tidymodels:master on Mar 26, 2020
juliasilge deleted the engine-specific-parameters branch on June 30, 2020 20:14
github-actions bot commented Mar 7, 2021

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Mar 7, 2021