Reorder columns at prediction time for glmnet #382

Merged: 3 commits, Oct 23, 2020

juliasilge (Member)

Closes #273.

I experimented with the various predict functions and methods today, and I think the best place to reorder the columns of the new data is in the prediction module for the two glmnet models. This is mainly because:

  • the fitted object is available
  • all the preprocessing has already been done

```r
library(parsnip)
library(tidyverse)
set.seed(989)

# Gaussian response with 26 predictors named a to z
x <- matrix(rnorm(100 * length(letters)), 100, length(letters)) %>%
  magrittr::set_colnames(letters)
y <- rnorm(100)

glmnet_spec <- linear_reg(penalty = .1) %>%
  set_engine('glmnet')

glmnet_fit <- fit_xy(glmnet_spec, x, y)

# Predict with the columns in their original order
glmnet_fit %>%
  predict(x)
#> # A tibble: 100 x 1
#>      .pred
#>      <dbl>
#>  1 -0.135
#>  2  0.285
#>  3 -0.126
#>  4  0.222
#>  5 -0.138
#>  6 -0.157
#>  7  0.0271
#>  8 -0.0215
#>  9 -0.0702
#> 10 -0.0283
#> # … with 90 more rows

# Predict with the columns shuffled: the predictions are identical
glmnet_fit %>%
  predict(x[, sample(colnames(x))])
#> # A tibble: 100 x 1
#>      .pred
#>      <dbl>
#>  1 -0.135
#>  2  0.285
#>  3 -0.126
#>  4  0.222
#>  5 -0.138
#>  6 -0.157
#>  7  0.0271
#>  8 -0.0215
#>  9 -0.0702
#> 10 -0.0283
#> # … with 90 more rows

## still works with dummy variables
data(ames, package = "modeldata")

parsnip_form_fit <- glmnet_spec %>%
  fit(Sale_Price ~ Year_Built + Alley, data = ames)

predict(parsnip_form_fit, ames %>% select(Year_Built, Alley))
#> # A tibble: 2,930 x 1
#>      .pred
#>      <dbl>
#>  1 163033.
#>  2 164552.
#>  3 159996.
#>  4 175180.
#>  5 219215.
#>  6 220733.
#>  7 225289.
#>  8 211623.
#>  9 216178.
#> 10 222252.
#> # … with 2,920 more rows
```

Created on 2020-10-20 by the reprex package (v0.3.0.9001)

I ran extratests with this change and there were no failures related to this.
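
For readers who want to see the idea in isolation, here is a minimal base R sketch of reordering columns by name before a positional prediction. It is not the code from this PR: `reorder_to_training()` is a hypothetical helper, `beta` is a made-up stand-in for fitted coefficients, and the sketch assumes the training column names can be recovered from the fitted object.

```r
# Hypothetical helper (not part of parsnip): put the columns of `new_x` in
# the order used at fit time, and fail loudly if any are missing.
reorder_to_training <- function(new_x, training_names) {
  missing_cols <- setdiff(training_names, colnames(new_x))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  new_x[, training_names, drop = FALSE]
}

# Small deterministic design matrix with named columns
x <- matrix(1:15, nrow = 5, dimnames = list(NULL, c("a", "b", "c")))

# Made-up coefficients standing in for a fitted model (an assumption)
beta <- c(a = 1, b = -2, c = 0.5)

# A plain matrix product pairs columns and coefficients by position
pred <- drop(x %*% beta)

# Same rows, columns shuffled
x_shuffled <- x[, c("c", "a", "b")]
pred_wrong <- drop(x_shuffled %*% beta)  # coefficients hit the wrong columns
pred_fixed <- drop(reorder_to_training(x_shuffled, names(beta)) %*% beta)

all.equal(pred, pred_fixed)          # TRUE
isTRUE(all.equal(pred, pred_wrong))  # FALSE
```

Matching by name rather than position is what makes the shuffled-column prediction in the reprex above agree with the original one. Extra columns in `new_x` are silently dropped in this sketch, while missing ones raise an error.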

juliasilge requested a review from topepo October 20, 2020 23:31

topepo (Member) commented Oct 22, 2020

This needs to happen in the poissonreg package too.

topepo added a commit to tidymodels/poissonreg that referenced this pull request Oct 22, 2020
topepo added a commit to tidymodels/poissonreg that referenced this pull request Oct 23, 2020
topepo merged commit 53722db into master Oct 23, 2020
juliasilge deleted the glmnet-column-fix branch October 23, 2020 13:44

github-actions bot commented Mar 6, 2021

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

github-actions bot locked and limited conversation to collaborators Mar 6, 2021
Development

Successfully merging this pull request may close these issues.

Check at prediction time if column names are the same and in same order