Computed variable and delayed evaluation docs #5117
R/aes-evaluation.r
Outdated
#' fun = ~ round(mean(.x), 2),
#' fun.max = ~ round(sd(.x), 2)
Small nitpick, only because of my love/hate relationship with the `fun.*` arguments in `stat_summary()`: may I suggest just passing a function to `fun.data` that returns a data frame with the desired variables as columns, as opposed to supplying functions to `fun` and `fun.max` that each return vectors?
So those 2 lines can be replaced with:
# possibly `\(val) ...` so the argument is not confused with the x-positional aesthetic
fun.data = \(x) round(data.frame(mean = mean(x), sd = sd(x)), 2)
Then the columns from the `fun.data` output can be referenced in the `after_stat()` for `label` like this, which IMO is more transparent:
after_stat(paste(mean, sd, sep = " ± "))
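To make the "variables-as-columns" idea concrete, here is a minimal base R sketch of what such a summary function returns when called on a plain vector. The helper name `summarise_mean_sd` and the input values are made up for illustration; the function body mirrors the suggestion above, written with `function(x)` so it also parses on R versions older than 4.1.0.

```r
# Hypothetical helper mirroring the suggested `fun.data` function:
# it returns a one-row data frame whose column names become the
# variables available inside after_stat().
summarise_mean_sd <- function(x) {
  round(data.frame(mean = mean(x), sd = sd(x)), 2)
}

# Illustrative input, not taken from the mpg data
out <- summarise_mean_sd(c(1, 2, 3, 4))

names(out)  # "mean" "sd"
nrow(out)   # 1
```

Because the columns are called `mean` and `sd` rather than `y` and `ymax`, they do not collide with positional aesthetics.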
And it also produces a more straightforward layer data:
p_new <- ggplot(mpg, aes(class, displ)) +
  geom_violin() +
  stat_summary(
    aes(y = stage(displ, after_stat = 8),
        label = after_stat(paste(mean, sd, sep = " ± "))),
    geom = "text",
    fun.data = \(x) round(data.frame(mean = mean(x), sd = sd(x)), 2)
  )
layer_data(last_plot(), 2)[, 1:7]
#>   y       label x group mean   sd PANEL
#> 1 8 6.16 ± 0.53 1     1 6.16 0.53     1
#> 2 8 2.33 ± 0.45 2     2 2.33 0.45     1
#> 3 8 2.92 ± 0.72 3     3 2.92 0.72     1
#> 4 8 3.39 ± 0.45 4     4 3.39 0.45     1
#> 5 8 4.42 ± 0.83 5     5 4.42 0.83     1
#> 6 8  2.66 ± 1.1 6     6 2.66 1.10     1
#> 7 8 4.46 ± 1.07 7     7 4.46 1.07     1
Without overriding `fun.data` (or doing an ugly `fun.min = ~ NULL`), you get the vestigial `ymin` column that's there to support the default pointrange geom. Plus, the mean that's calculated from `fun` gets mapped to `y` and is then immediately overridden by `stage(after_stat = 8)`, which may be surprising (in case one wanted to use it to map `size` in the after-scale of something):
p_old <- ggplot(mpg, aes(class, displ)) +
  geom_violin() +
  stat_summary(
    aes(y = stage(displ, after_stat = 8),
        label = after_stat(paste(y, ymax, sep = " ± "))),
    geom = "text",
    fun = ~ round(mean(.x), 2),
    fun.max = ~ round(sd(.x), 2)
  )
layer_data(last_plot(), 2)[, 1:7]
#>   y       label x group ymin ymax PANEL
#> 1 8 6.16 ± 0.53 1     1   NA 0.53     1
#> 2 8 2.33 ± 0.45 2     2   NA 0.45     1
#> 3 8 2.92 ± 0.72 3     3   NA 0.72     1
#> 4 8 3.39 ± 0.45 4     4   NA 0.45     1
#> 5 8 4.42 ± 0.83 5     5   NA 0.83     1
#> 6 8  2.66 ± 1.1 6     6   NA 1.10     1
#> 7 8 4.46 ± 1.07 7     7   NA 1.07     1
Lastly and relatedly, because sd is held in the `ymax` column in the old example, it gets used later in the retraining of the y-scale. So if you plot `p_old`, you see the y-axis actually gets expanded down to around 0, because `ymax` is around that value. All of this mess because `stat_summary()` appears as if it's geom-agnostic when it kinda isn't 😅
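One way to check the y-axis effect without eyeballing the plots is to compare the built y ranges of the two versions. This is only a sketch: reading the range from `ggplot_build(p)$layout$panel_params[[1]]$y.range` is an assumption about ggplot2's built-plot internals (it holds for recent releases with the default Cartesian coord), and the helper name `y_range` is made up here.

```r
library(ggplot2)

# Shared base plot from the two examples
base <- ggplot(mpg, aes(class, displ)) + geom_violin()

# Old approach: sd ends up in the `ymax` column
p_old <- base + stat_summary(
  aes(y = stage(displ, after_stat = 8),
      label = after_stat(paste(y, ymax, sep = " ± "))),
  geom = "text",
  fun = ~ round(mean(.x), 2),
  fun.max = ~ round(sd(.x), 2)
)

# New approach: mean and sd live in their own, non-aesthetic columns
p_new <- base + stat_summary(
  aes(y = stage(displ, after_stat = 8),
      label = after_stat(paste(mean, sd, sep = " ± "))),
  geom = "text",
  fun.data = function(x) round(data.frame(mean = mean(x), sd = sd(x)), 2)
)

# Hypothetical helper: y range of the first panel after building the plot
# (assumes the default coord_cartesian() and its `y.range` panel parameter)
y_range <- function(p) ggplot_build(p)$layout$panel_params[[1]]$y.range

y_range(p_old)  # lower limit is pulled down by the sd values in `ymax`
y_range(p_new)  # lower limit follows the data only
```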
I may be missing obvious problems with my approach, and this is possibly off-label usage of `fun.data`, but as long as the `fun.data` data frame doesn't touch standard aesthetics (like overriding/duplicating the `x` column or hard-coding `colour` before the scale sees it), this should be ok?
(And of course thanks for this great addition to the docs!)
Thanks June, I love the suggestion. It makes it much clearer what is going on! I hadn't spotted the y-axis expansion in particular, so thanks :) The only way I'll deviate from your suggestion is that the example should also work in R versions older than 4.1.0, so I'll stick to the rlang lambda syntax instead of the base R lambda syntax for now.
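For readers unfamiliar with the two lambda styles, a minimal sketch of how they relate. Converting the formula by hand with `rlang::as_function()` is done here purely for illustration (in the examples above the formula is passed straight to `stat_summary()`), and the input vector is made up.

```r
# Base R lambda shorthand, which only parses on R >= 4.1.0:
#   \(x) round(mean(x), 2)

# rlang-style lambda: a one-sided formula where `.x` is the first argument.
# as_function() turns it into an ordinary function.
rlang_lambda <- rlang::as_function(~ round(mean(.x), 2))

# Equivalent plain function that works on any R version
plain_fun <- function(x) round(mean(x), 2)

x <- c(1, 2, 4)
rlang_lambda(x)  # 2.33
plain_fun(x)     # 2.33
```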
…into comp_var_docs # Conflicts: # R/aes-evaluation.r
Looks good! I added a few comments specifically about the style.
Thanks for your comments Hadley! I've adapted at most places, and I'll go write an |
Alright if we would disagree about the list method (describe vs itemize), it should now be relatively easy to adjust the |
R/aes-evaluation.r
Outdated
#' aesthetics from, and three functions to control at which stage aesthetics
#' should be evaluated.
#'
#' @usage # These functions can be used inside the `aes()` function
- #' @usage # These functions can be used inside the `aes()` function
+ #' @usage
+ #' # These functions can be used inside the `aes()` function
I think this makes it easier to read and shouldn't affect the output
R/aes-evaluation.r
Outdated
#' If you want to map the same aesthetic multiple times, e.g. map `x` to a
#' data column for the stat, but remap it for the geom, you can use the
#' `stage()` function to collect multiple mappings.
Indent
R/aes-evaluation.r
Outdated
#' )
#' ```
#' @note
#' `after_stat()` replaces the old approaches of using either `stat()`, e.g.
I think I'd put this in the `@description`. I assume it already has a `stat` alias?
R/utilities-help.r
Outdated
header <- "@section Computed variables: "
intro <- paste0(
  "These are calculated by the 'stat' part of layers and can be accessed ",
  "with \\link[=aes_eval]{delayed evaluation}. "
Why not use markdown here?
# Compose item-list
fmt_descr <- paste0(gsub("\n", "", descr), "}")
fmt_list <- paste(fmt_items, fmt_descr, sep = "\\cr ")
- fmt_list <- paste(fmt_items, fmt_descr, sep = "\\cr ")
+ fmt_list <- paste(fmt_items, fmt_descr, sep = ": ")
?
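To illustrate what the two separators do to the composed strings, a small base R sketch. The item and description values below are hypothetical stand-ins, not the real contents of `fmt_items` and `fmt_descr` in `R/utilities-help.r` (which also carry Rd markup). In Rd, `\cr` forces a line break, so the first variant puts the description on its own line, while the second keeps name and description on one line.

```r
# Hypothetical stand-ins for the real item names and descriptions
fmt_items <- c("after_stat(count)", "after_stat(density)")
fmt_descr <- c("number of points in bin", "density of points in bin")

# Original: Rd line break between item name and description
with_cr <- paste(fmt_items, fmt_descr, sep = "\\cr ")

# Suggested: name and description on the same line
with_colon <- paste(fmt_items, fmt_descr, sep = ": ")

with_colon[1]  # "after_stat(count): number of points in bin"
```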
Thanks for your review Hadley! I've processed the latest comments. I'm still going to make one last attempt at advocating for This is what the documentation of Whereas if we separate item name and description with Nonetheless, I have taken your suggestion to use |
Ok, that looks good to me!
This PR aims to fix #4951
Briefly, in 'computed variables' sections there is now a pointer to the `?after_stat` page, and all variable names are wrapped in `after_stat()`. In addition, the `?after_stat` page has been restructured to articulate more clearly what to expect at each stage, and a few extra, slightly more complicated examples have been added.
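As a reminder of what a computed variable wrapped in `after_stat()` looks like in use, here is a minimal sketch. The choice of `geom_histogram()` with its `density` variable and `bins = 30` is illustrative and not taken from this PR.

```r
library(ggplot2)

# `density` is computed by the stat (stat_bin), not present in `mpg`;
# after_stat() delays evaluation of the mapping until the stat has run.
p <- ggplot(mpg, aes(displ)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30)

# The computed variables show up as columns in the layer data
d <- layer_data(p, 1)
sum(d$density * (d$xmax - d$xmin))  # bar areas of a density histogram sum to 1
```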