Skip to content

Computed variable and delayed evaluation docs #5117

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Jan 3, 2023

Conversation

teunbrand
Copy link
Collaborator

@teunbrand teunbrand commented Dec 23, 2022

This PR aims to fix #4951

Briefly, in 'computed variables' sections there is now a pointer to the ?after_stat page, and all variable names are wrapped in after_stat(). In addition, the ?after_stat page has been restructured to more clearly articulate what to expect for each stage, as well as sprinkling around some extra, slightly more complicated examples.

Comment on lines 101 to 102
#' fun = ~ round(mean(.x), 2),
#' fun.max = ~ round(sd(.x), 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nitpicking only because of my love/hate relationship with the fun.* arguments in stat_summary():

May I suggest just passing a function to fun.data returning a dataframe with the desired variables-as-columns, as opposed to supplying functions to fun and fun.max that each return vectors?

So those 2 lines can be replaced with:

# possibly `\(val) ...` so the argument is not confused with the x-positional aesthetic
fun.data = \(x) round(data.frame(mean = mean(x), sd = sd(x)), 2)

Then, the columns from fun.data output can be referenced in the after_stat() for label like this, which IMO is more transparent:

after_stat(paste(mean, sd, sep = " ± "))

And it also produces a more straightforward layer data:

p_new <- ggplot(mpg, aes(class, displ)) +
   geom_violin() +
   stat_summary(
     aes(y = stage(displ, after_stat = 8),
         label = after_stat(paste(mean, sd, sep = " ± "))),
     geom = "text",
     fun.data = \(x) round(data.frame(mean = mean(x), sd = sd(x)), 2)
   )
layer_data(last_plot(), 2)[,1:7]
#>   y       label x group mean   sd PANEL
#> 1 8 6.16 ± 0.53 1     1 6.16 0.53     1
#> 2 8 2.33 ± 0.45 2     2 2.33 0.45     1
#> 3 8 2.92 ± 0.72 3     3 2.92 0.72     1
#> 4 8 3.39 ± 0.45 4     4 3.39 0.45     1
#> 5 8 4.42 ± 0.83 5     5 4.42 0.83     1
#> 6 8  2.66 ± 1.1 6     6 2.66 1.10     1
#> 7 8 4.46 ± 1.07 7     7 4.46 1.07     1

Without overriding fun.data (or doing an ugly fun.min = ~ NULL), you get the vestigial ymin column that's there to support the default pointrange geom. Plus, the mean that's calculated from fun gets mapped to y and then is immediately overriden by stage(after_stat = 8) which may be surprising (in case one wanted to use it to map size in the after-scale of something):

p_old <- ggplot(mpg, aes(class, displ)) +
   geom_violin() +
   stat_summary(
     aes(y = stage(displ, after_stat = 8),
         label = after_stat(paste(y, ymax, sep = " ± "))),
     geom = "text",
     fun = ~ round(mean(.x), 2),
     fun.max = ~ round(sd(.x), 2)
   )
layer_data(last_plot(), 2)[,1:7]
#>   y       label x group ymin ymax PANEL
#> 1 8 6.16 ± 0.53 1     1   NA 0.53     1
#> 2 8 2.33 ± 0.45 2     2   NA 0.45     1
#> 3 8 2.92 ± 0.72 3     3   NA 0.72     1
#> 4 8 3.39 ± 0.45 4     4   NA 0.45     1
#> 5 8 4.42 ± 0.83 5     5   NA 0.83     1
#> 6 8  2.66 ± 1.1 6     6   NA 1.10     1
#> 7 8 4.46 ± 1.07 7     7   NA 1.07     1

Lastly and relatedly, because sd is held in the ymax column in the old example, it gets used later in the retraining of the y-scales, so if you plot p_old you see it actually gets an expansion down to around 0 in the y-axis because ymax is around that value. All of this mess because stat_summary() appears as if it's geom-agnostic when it kinda isn't 😅

I may be missing obvious problems with my approach and this is possibly off-label usage of fun.data, but as long as fun.data dataframe doesn't touch standard aesthetics (like overriding/duplicating the x column or hard-coding colour before the scale sees it), this should be ok?

(And of course thanks for this great addition to the docs!)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks June, I love the suggestion. It makes it much clearer about what is going on! Especially the y-axis expansion I hadn't spotted, so thanks :) The only way I'll deviate from your suggestion is that the example should work in older R versions than R4.1.0, so I'll stick to the rlang lambda syntax instead of base R lambda syntax for now.

@yutannihilation
Copy link
Member

Looks awesome!

@hadley
Could you take a look when you have time? Since #4951 is what you proposed, probably you are the best one to review this.

Copy link
Member

@hadley hadley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I added a few comments specifically about the style.

@teunbrand
Copy link
Collaborator Author

Thanks for your comments Hadley! I've adapted at most places, and I'll go write an rd_computed_vars() helper function to format the section better.

@teunbrand
Copy link
Collaborator Author

Alright if we would disagree about the list method (describe vs itemize), it should now be relatively easy to adjust the rd_compute_vars() wrapper instead of manually going through all of the stat functions :)

@teunbrand teunbrand requested a review from hadley December 27, 2022 15:51
#' aesthetics from, and three functions to control at which stage aesthetics
#' should be evaluated.
#'
#' @usage # These functions can be used inside the `aes()` function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#' @usage # These functions can be used inside the `aes()` function
#' @usage
#' # These functions can be used inside the `aes()` function

I think this makes it easier to read and shouldn't affect the output

Comment on lines 83 to 85
#' If you want to map the same aesthetic multiple times, e.g. map `x` to a
#' data column for the stat, but remap it for the geom, you can use the
#' `stage()` function to collect multiple mappings.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indent

#' )
#' ```
#' @note
#' `after_stat()` replaces the old approaches of using either `stat()`, e.g.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd put this in the @description. I assume it already has a stat alias?

header <- "@section Computed variables: "
intro <- paste0(
"These are calculated by the 'stat' part of layers and can be accessed ",
"with \\link[=aes_eval]{delayed evaluation}. "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use markdown here?


# Compose item-list
fmt_descr <- paste0(gsub("\n", "", descr), "}")
fmt_list <- paste(fmt_items, fmt_descr, sep = "\\cr ")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fmt_list <- paste(fmt_items, fmt_descr, sep = "\\cr ")
fmt_list <- paste(fmt_items, fmt_descr, sep = ": ")

?

@teunbrand
Copy link
Collaborator Author

Thanks for your review Hadley! I've processed the latest comments.

I'm still going to make one last attempt at advocating for \cr separation of item name and description. Ideally, what I think would be best is to have the typesetting as in @param fields, where all descriptions are indented the same. I don't think this can be replicated outside @param fields though.

This is what the documentation of ?geom_boxplot looks like if we separate with :. It is visually hard to discriminate where the item name ends and the description begins. The only visual clue is that the typesetting is a monospaced font for the item name (if we ignore the 'or's) and the colon itself. The good thing about this, is that it is more compact.

image

Whereas if we separate item name and description with \cr, the documentation looks like the one below. I think it is visually easier to discriminate the item name from the description, and it looks a bit tidier than separating with :.

image

Nonetheless, I have taken your suggestion to use : in the PR (but I couldn't let this go without an honest attempt at advocating for \cr one last time).

@hadley
Copy link
Member

hadley commented Jan 2, 2023

Ok, that looks good to me!

@teunbrand teunbrand merged commit 3daeb33 into tidyverse:main Jan 3, 2023
@teunbrand teunbrand deleted the comp_var_docs branch January 3, 2023 20:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Computed variable docs should suggest after_stat()
4 participants