Handling alternative data argument names and fixing calls #316
As mentioned in #315:
In some cases, a model function that `parsnip` is calling has non-standard argument names for the data-oriented components. For example, a function `foo()` that only has the x/y interface might have a signature like `foo(X, Y)` instead of the standard `foo(x, y)`.

Some real examples:

- `sparklyr` models use `x` as the `data` slot (and put it in the first position)
- `kernlab::ksvm(x, data)` where `x` is a formula
- `MASS::lda(x, grouping)`

This PR allows for these types of arguments.
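
The remapping that such functions need can be pictured with a small base R sketch. The helper `remap_data_args()` and the mapping vectors are hypothetical and exist only for illustration; they are not the code in this PR.

```r
# Hypothetical helper (not part of parsnip): rename the standard data
# arguments to the names that a particular model function expects.
remap_data_args <- function(args, mapping) {
  # `mapping` is a named character vector: the names are the standard
  # argument names and the values are the names used by the model function.
  hits <- names(args) %in% names(mapping)
  names(args)[hits] <- unname(mapping[names(args)[hits]])
  args
}

# Formula interface: kernlab::ksvm() expects the formula in `x`.
std_formula_args <- list(formula = quote(Species ~ .), data = quote(iris), C = 1)
ksvm_args <- remap_data_args(std_formula_args, c(formula = "x", data = "data"))
names(ksvm_args)

# x/y interface: MASS::lda() calls the outcome `grouping`.
std_xy_args <- list(x = quote(predictors), y = quote(classes))
lda_args <- remap_data_args(std_xy_args, c(y = "grouping"))
names(lda_args)
```

Arguments that are not listed in the mapping (such as `C` here) pass through unchanged, which is why models with standard argument names would need no mapping at all.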
Model definitions have the option to add a `data` element to the definition that maps the argument name to the objects that are created by `fit()` or `fit_xy()`. For example, the new definition for formula-based `ksvm()` models uses this element.

This is optional, so most models are unaffected. The model definitions that were changed in the PR:

- `svm_rbf()` and `svm_poly()` with `kernlab` engines (for the above reason)
- `mars()` with `earth` engines (for other reasons)

Bonus Feature 1 🎉
Previously, when the arguments were translated to a call, the actual formula was not used. It contained a generic symbol called `formula`. In this PR we substitute the user-provided formula into the call.

This should help with #274. @StephenMilborrow, @apreshill, and others have mentioned this issue of borked calls in the past. This and the extra special bonus feature below should make this more tolerable.
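
The idea of swapping the generic `formula` symbol for the user's formula can be sketched in base R. This is only an illustration of the technique under assumed names (`generic_call`, `user_formula`, `fixed_call`); it is not the code used inside `parsnip`.

```r
# A call template that still holds the generic symbols `formula` and `data`.
generic_call <- quote(stats::lm(formula = formula, data = data))

# The formula that the user supplied when fitting the model.
user_formula <- mpg ~ wt + hp

# Replace the generic symbol with the actual formula object.
fixed_call <- generic_call
fixed_call[["formula"]] <- user_formula

# The formula slot now carries the user's terms; `data` is still generic.
print(fixed_call)
all.vars(fixed_call[["formula"]])
```

After the substitution the call shows which terms were used, but the `data` argument is still the placeholder symbol.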
Bonus Feature 2 🎉 🎉
The above change doesn't really solve #274 since 1) the generic `data` symbol does not link to the user's data object and 2) extra arguments (like `anova` above) are quosures and not the real object.

This version contains a function called `repair_call()` that fixes both of these issues if the user provides the original data to the function (just to get its name).
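
A minimal usage sketch, assuming `repair_call()` takes the fitted model followed by the data object. The `lm` model on `mtcars` is an illustrative stand-in, not the example from the original description.

```r
library(parsnip)

# Fit a simple model; `model = TRUE` is an extra engine argument that is
# captured as a quosure in the stored call.
spec <- set_engine(linear_reg(), "lm", model = TRUE)
fitted_model <- fit(spec, mpg ~ ., data = mtcars)

# The stored call refers to a generic `data` symbol rather than `mtcars`.
fitted_model$fit$call

# Pass the original data object so that its name can be placed in the call.
repaired <- repair_call(fitted_model, mtcars)
repaired$fit$call
```

The data object is only needed for its name, so it should be passed as a plain variable rather than as an expression.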
You might wonder why we need to manually call this extra function. Can't this function be invoked at the end of `fit()` to happen automatically?

The problem with automating this is that, once `tune` and `workflows` are used, the data that are passed internally to `fit()` might not be available to the user (i.e. they might not match the user's data object). The easiest example is when a recipe is used in a workflow; the processed data are not the same as the original data.

In this case, the user could run the recipe on the data that was used to build the model and then repair the call with that object.
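
A rough sketch of that last step, assuming the `recipes` package is available. To keep the example small, the model is fit directly on the processed data instead of through a workflow, and the object names (`rec`, `processed`, `model_fit`) are illustrative.

```r
library(parsnip)
library(recipes)

# A recipe that changes the predictors, so the data seen by the model
# differ from the original `mtcars`.
rec <- recipe(mpg ~ ., data = mtcars)
rec <- step_normalize(rec, all_predictors())

# Run the recipe on the data that were used to build the model.
prepped <- prep(rec, training = mtcars)
processed <- bake(prepped, new_data = NULL)

# Stand-in for the fit that a workflow would perform internally.
model_fit <- fit(set_engine(linear_reg(), "lm"), mpg ~ ., data = processed)

# Repair the call with the processed data object, not the original data.
repaired <- repair_call(model_fit, processed)
repaired$fit$call
```

With a real workflow, the fitted `parsnip` model would first have to be extracted from the workflow object before the call could be repaired in this way.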