Computed variable and delayed evaluation docs #5117

Merged: 16 commits, Jan 3, 2023
185 changes: 152 additions & 33 deletions R/aes-evaluation.r
@@ -1,41 +1,114 @@
#' Control aesthetic evaluation
#'
#' Most aesthetics are mapped from variables found in the data. Sometimes,
#' however, you want to delay the mapping until later in the rendering process.
#' ggplot2 has three stages of the data that you can map aesthetics from. The
#' default is to map at the beginning, using the layer data provided by the
#' user. The second stage is after the data has been transformed by the layer
#' stat. The third and last stage is after the data has been transformed and
#' mapped by the plot scales. The most common example of mapping from stat
#' transformed data is the height of bars in [geom_histogram()]:
#' the height does not come from a variable in the underlying data, but
#' is instead mapped to the `count` computed by [stat_bin()]. An example of
#' mapping from scaled data could be to use a desaturated version of the stroke
#' colour for fill. If you want to map directly from the layer data you should
#' not do anything special. In order to map from stat transformed data you
#' @description
#' Most [aesthetics][aes()] are mapped from variables found in the data.
#' Sometimes, however, you want to delay the mapping until later in the
#' rendering process. ggplot2 has three stages of the data that you can map
#' aesthetics from, and three functions to control at which stage aesthetics
#' should be evaluated.
#'
#' @description
#' `after_stat()` replaces the old approaches of using either `stat()`, e.g.
#' `stat(density)`, or surrounding the variable names with `..`, e.g.
#' `..density..`.
#'
#' @usage
#' # These functions can be used inside the `aes()` function
#' # used as the `mapping` argument in layers, for example:
#' # geom_density(mapping = aes(y = after_stat(scaled)))
#'
#' @param x <[`data-masking`][rlang::topic-data-mask]> An aesthetic expression
#' using variables calculated by the stat (`after_stat()`) or layer aesthetics
#' (`after_scale()`).
#' @param start <[`data-masking`][rlang::topic-data-mask]> An aesthetic
#' expression using variables from the layer data.
#' @param after_stat <[`data-masking`][rlang::topic-data-mask]> An aesthetic
#' expression using variables calculated by the stat.
#' @param after_scale <[`data-masking`][rlang::topic-data-mask]> An aesthetic
#' expression using layer aesthetics.
#'
#' @details
#' # Staging
#' Below follows an overview of the three stages of evaluation and how aesthetic
#' evaluation can be controlled.
#'
#' ## Stage 1: direct input
#' The default is to map at the beginning, using the layer data provided by
#' the user. If you want to map directly from the layer data you should not do
#' anything special. This is the only stage where the original layer data can
#' be accessed.
#'
#' ```r
#' # 'x' and 'y' are mapped directly
#' ggplot(mtcars) + geom_point(aes(x = mpg, y = disp))
#' ```
#'
#' ## Stage 2: after stat transformation
#' The second stage is after the data has been transformed by the layer
#' stat. The most common example of mapping from stat transformed data is the
#' height of bars in [geom_histogram()]: the height does not come from a
#' variable in the underlying data, but is instead mapped to the `count`
#' computed by [stat_bin()]. In order to map from stat transformed data you
#' should use the `after_stat()` function to flag that evaluation of the
#' aesthetic mapping should be postponed until after stat transformation.
#' Similarly, you should use `after_scale()` to flag evaluation of mapping for
#' after data has been scaled. If you want to map the same aesthetic multiple
#' times, e.g. map `x` to a data column for the stat, but remap it for the geom,
#' you can use the `stage()` function to collect multiple mappings.
#'
#' `after_stat()` replaces the old approaches of using either `stat()` or
#' surrounding the variable names with `..`.
#'
#' @note Evaluation after stat transformation will have access to the
#' variables calculated by the stat, not the original mapped values. Evaluation
#' after scaling will only have access to the final aesthetics of the layer
#' (including non-mapped, default aesthetics). The original layer data can only
#' be accessed at the first stage.
#'
#' @param x An aesthetic expression using variables calculated by the stat
#' (`after_stat()`) or layer aesthetics (`after_scale()`).
#' @param start An aesthetic expression using variables from the layer data.
#' @param after_stat An aesthetic expression using variables calculated by the
#' stat.
#' @param after_scale An aesthetic expression using layer aesthetics.
#' Evaluation after stat transformation will have access to the variables
#' calculated by the stat, not the original mapped values. The 'computed
#' variables' section in each stat lists which variables are available to
#' access.
#'
#' ```r
#' # The 'y' values for the histogram are computed by the stat
#' ggplot(faithful, aes(x = waiting)) +
#'   geom_histogram()
#'
#' # Choosing a different computed variable to display, matching up the
#' # histogram with the density plot
#' ggplot(faithful, aes(x = waiting)) +
#'   geom_histogram(aes(y = after_stat(density))) +
#'   geom_density()
#' ```
#'
#' ## Stage 3: after scale transformation
#' The third and last stage is after the data has been transformed and
#' mapped by the plot scales. An example of mapping from scaled data could
#' be to use a desaturated version of the stroke colour for fill. You should
#' use `after_scale()` to flag that evaluation of the mapping should be
#' postponed until after the data has been scaled. Evaluation after scaling
#' will only have access to the final aesthetics of the layer (including
#' non-mapped, default aesthetics).
#'
#' ```r
#' # The exact colour is known after scale transformation
#' ggplot(mpg, aes(cty, colour = factor(cyl))) +
#'   geom_density()
#'
#' # We re-use colour properties for the fill without a separate fill scale
#' ggplot(mpg, aes(cty, colour = factor(cyl))) +
#'   geom_density(aes(fill = after_scale(alpha(colour, 0.3))))
#' ```
#'
#' ## Complex staging
#' If you want to map the same aesthetic multiple times, e.g. map `x` to a
#' data column for the stat, but remap it for the geom, you can use the
#' `stage()` function to collect multiple mappings.
#'
#' ```r
#' # Use stage to modify the scaled fill
#' ggplot(mpg, aes(class, hwy)) +
#'   geom_boxplot(aes(fill = stage(class, after_scale = alpha(fill, 0.4))))
#'
#' # Using data for computing summary, but placing label elsewhere.
#' # Also, we're making our own computed variable to use for the label.
#' ggplot(mpg, aes(class, displ)) +
#'   geom_violin() +
#'   stat_summary(
#'     aes(
#'       y = stage(displ, after_stat = 8),
#'       label = after_stat(paste(mean, "±", sd))
#'     ),
#'     geom = "text",
#'     fun.data = ~ round(data.frame(mean = mean(.x), sd = sd(.x)), 2)
#'   )
#' ```
#' @rdname aes_eval
#' @name aes_eval
#'
@@ -55,6 +128,52 @@
#' # Use stage to modify the scaled fill
#' ggplot(mpg, aes(class, hwy)) +
#'   geom_boxplot(aes(fill = stage(class, after_scale = alpha(fill, 0.4))))
#'
#' # Making a proportional stacked density plot
#' ggplot(mpg, aes(cty)) +
#'   geom_density(
#'     aes(
#'       colour = factor(cyl),
#'       fill = after_scale(alpha(colour, 0.3)),
#'       y = after_stat(count / sum(n[!duplicated(group)]))
#'     ),
#'     position = "stack", bw = 1
#'   ) +
#'   geom_density(bw = 1)
#'
#' # Imitating a ridgeline plot
#' ggplot(mpg, aes(cty, colour = factor(cyl))) +
#'   geom_ribbon(
#'     stat = "density", outline.type = "upper",
#'     aes(
#'       fill = after_scale(alpha(colour, 0.3)),
#'       ymin = after_stat(group),
#'       ymax = after_stat(group + ndensity)
#'     )
#'   )
#'
#' # Labelling a bar plot
#' ggplot(mpg, aes(class)) +
#'   geom_bar() +
#'   geom_text(
#'     aes(
#'       y = after_stat(count + 2),
#'       label = after_stat(count)
#'     ),
#'     stat = "count"
#'   )
#'
#' # Labelling the upper hinge of a boxplot,
#' # inspired by June Choe
#' ggplot(mpg, aes(displ, class)) +
#'   geom_boxplot(outlier.shape = NA) +
#'   geom_text(
#'     aes(
#'       label = after_stat(xmax),
#'       x = stage(displ, after_stat = xmax)
#'     ),
#'     stat = "boxplot", hjust = -0.5
#'   )
NULL

#' @rdname aes_eval
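The staging documented in `R/aes-evaluation.r` can be checked by inspecting the built layer data. The sketch below is not part of the PR. It assumes ggplot2 is installed and uses `layer_data()` to look at what ends up in the `y` and `fill` columns; the object names and `bins = 30` are illustrative choices.

```r
library(ggplot2)

# Stage 2: 'y' is taken from a variable computed by the stat.
p_hist <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30)
hist_data <- layer_data(p_hist)
# The built data keeps the computed variables (count, density, ...);
# 'y' should now hold the density rather than the default count.

# Stage 3: 'fill' is derived from the already-scaled 'colour'.
p_dens <- ggplot(mpg, aes(cty, colour = factor(cyl))) +
  geom_density(aes(fill = after_scale(alpha(colour, 0.3))))
dens_data <- layer_data(p_dens)
# 'fill' should be the scaled colour with an alpha channel appended.
head(dens_data[, c("colour", "fill")])
```

Looking at the built data this way is a quick check that a delayed mapping refers to a variable that actually exists at that stage.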
2 changes: 2 additions & 0 deletions R/aes.r
@@ -31,6 +31,8 @@ NULL
#' are typically omitted because they are so common; all other aesthetics must be named.
#' @seealso [vars()] for another quoting function designed for
#' faceting specifications.
#'
#' [Delayed evaluation][aes_eval] for working with computed variables.
#' @return A list with class `uneval`. Components of the list are either
#' quosures or constants.
#' @export
23 changes: 11 additions & 12 deletions R/geom-dotplot.r
@@ -17,18 +17,17 @@
#' to match the number of dots.
#'
#' @eval rd_aesthetics("geom", "dotplot")
#' @section Computed variables:
#' \describe{
#' \item{x}{center of each bin, if binaxis is "x"}
#' \item{y}{center of each bin, if binaxis is "x"}
#' \item{binwidth}{max width of each bin if method is "dotdensity";
#' width of each bin if method is "histodot"}
#' \item{count}{number of points in bin}
#' \item{ncount}{count, scaled to maximum of 1}
#' \item{density}{density of points in bin, scaled to integrate to 1,
#' if method is "histodot"}
#' \item{ndensity}{density, scaled to maximum of 1, if method is "histodot"}
#' }
#' @eval rd_computed_vars(
#' x = 'center of each bin, if `binaxis` is `"x"`.',
#' y = 'center of each bin, if `binaxis` is `"y"`.',
#' binwidth = 'maximum width of each bin if method is `"dotdensity"`;
#' width of each bin if method is `"histodot"`.',
#' count = "number of points in bin.",
#' ncount = "count, scaled to a maximum of 1.",
#' density = 'density of points in bin, scaled to integrate to 1, if method
#' is `"histodot"`.',
#' ndensity = 'density, scaled to a maximum of 1, if method is `"histodot"`.'
#' )
#'
#' @inheritParams layer
#' @inheritParams geom_point
15 changes: 7 additions & 8 deletions R/stat-bin.r
@@ -27,14 +27,13 @@
#' or left edges of bins are included in the bin.
#' @param pad If `TRUE`, adds empty bins at either end of x. This ensures
#' frequency polygons touch 0. Defaults to `FALSE`.
#' @section Computed variables:
#' \describe{
#' \item{`count`}{number of points in bin}
#' \item{`density`}{density of points in bin, scaled to integrate to 1}
#' \item{`ncount`}{count, scaled to maximum of 1}
#' \item{`ndensity`}{density, scaled to maximum of 1}
#' \item{`width`}{widths of bins}
#' }
#' @eval rd_computed_vars(
#' count = "number of points in bin.",
#' density = "density of points in bin, scaled to integrate to 1.",
#' ncount = "count, scaled to a maximum of 1.",
#' ndensity = "density, scaled to a maximum of 1.",
#' width = "widths of bins."
#' )
#'
#' @section Dropped variables:
#' \describe{
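The relationships between the computed variables listed for `stat_bin()` can be sketched in base R. This is an illustration under assumptions, not ggplot2's implementation: the bin edges in `breaks` are arbitrary, and `hist()` only stands in for the binning step.

```r
# Illustrative only: re-create the documented computed variables by hand.
x <- faithful$waiting
breaks <- seq(40, 100, by = 5)           # hypothetical bin edges covering the data
h <- hist(x, breaks = breaks, plot = FALSE)

width    <- diff(h$breaks)               # widths of bins
count    <- h$counts                     # number of points in bin
density  <- count / width / sum(count)   # density, scaled to integrate to 1
ncount   <- count / max(count)           # count, scaled to a maximum of 1
ndensity <- density / max(density)       # density, scaled to a maximum of 1

binned <- data.frame(count, density, ncount, ndensity, width)
head(binned)
```

With equal-width bins, `ncount` and `ndensity` coincide; they differ only when bin widths vary.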
13 changes: 6 additions & 7 deletions R/stat-bin2d.r
@@ -5,13 +5,12 @@
#' @param drop if `TRUE` removes all cells with 0 counts.
#' @export
#' @rdname geom_bin_2d
#' @section Computed variables:
#' \describe{
#' \item{count}{number of points in bin}
#' \item{density}{density of points in bin, scaled to integrate to 1}
#' \item{ncount}{count, scaled to maximum of 1}
#' \item{ndensity}{density, scaled to maximum of 1}
#' }
#' @eval rd_computed_vars(
#' count = "number of points in bin.",
#' density = "density of points in bin, scaled to integrate to 1.",
#' ncount = "count, scaled to a maximum of 1.",
#' ndensity = "density, scaled to a maximum of 1."
#' )
stat_bin_2d <- function(mapping = NULL, data = NULL,
geom = "tile", position = "identity",
...,
13 changes: 6 additions & 7 deletions R/stat-binhex.r
@@ -1,13 +1,12 @@
#' @export
#' @rdname geom_hex
#' @inheritParams stat_bin_2d
#' @section Computed variables:
#' \describe{
#' \item{count}{number of points in bin}
#' \item{density}{density of points in bin, scaled to integrate to 1}
#' \item{ncount}{count, scaled to maximum of 1}
#' \item{ndensity}{density, scaled to maximum of 1}
#' }
#' @eval rd_computed_vars(
#' count = "number of points in bin.",
#' density = "density of points in bin, scaled to integrate to 1.",
#' ncount = "count, scaled to maximum of 1.",
#' ndensity = "density, scaled to maximum of 1."
#' )
stat_bin_hex <- function(mapping = NULL, data = NULL,
geom = "hex", position = "identity",
...,
26 changes: 14 additions & 12 deletions R/stat-boxplot.r
@@ -1,19 +1,21 @@
#' @rdname geom_boxplot
#' @param coef Length of the whiskers as multiple of IQR. Defaults to 1.5.
#' @inheritParams stat_identity
#' @section Computed variables:
#' `stat_boxplot()` provides the following variables, some of which depend on the orientation:
#' \describe{
#' \item{width}{width of boxplot}
#' \item{ymin *or* xmin}{lower whisker = smallest observation greater than or equal to lower hinge - 1.5 * IQR}
#' \item{lower *or* xlower}{lower hinge, 25% quantile}
#' \item{notchlower}{lower edge of notch = median - 1.58 * IQR / sqrt(n)}
#' \item{middle *or* xmiddle}{median, 50% quantile}
#' \item{notchupper}{upper edge of notch = median + 1.58 * IQR / sqrt(n)}
#' \item{upper *or* xupper}{upper hinge, 75% quantile}
#' \item{ymax *or* xmax}{upper whisker = largest observation less than or equal to upper hinge + 1.5 * IQR}
#' }
#' @export
#' @eval rd_computed_vars(
#' .details = "`stat_boxplot()` provides the following variables, some of
#' which depend on the orientation:",
#' width = "width of boxplot.",
#' "ymin|xmin" = "lower whisker = smallest observation greater than or equal
#' to lower hinge - 1.5 * IQR.",
#' "lower|xlower" = "lower hinge, 25% quantile.",
#' notchlower = "lower edge of notch = median - 1.58 * IQR / sqrt(n).",
#' "middle|xmiddle" = "median, 50% quantile.",
#' notchupper = "upper edge of notch = median + 1.58 * IQR / sqrt(n).",
#' "upper|xupper" = "upper hinge, 75% quantile.",
#' "ymax|xmax" = "upper whisker = largest observation less than or equal to
#' upper hinge + 1.5 * IQR."
#' )
stat_boxplot <- function(mapping = NULL, data = NULL,
geom = "boxplot", position = "dodge2",
...,
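The formulas in the `stat_boxplot()` computed-variables list can be written out directly. The helper below, `boxplot_vars()`, is hypothetical and follows only the documented definitions (hinges as 25% and 75% quantiles, whiskers within `coef` times the IQR, notch at median ± 1.58 * IQR / sqrt(n)); ggplot2's own code may differ in detail.

```r
# Hypothetical helper mirroring the documented formulas for vertical boxplots.
boxplot_vars <- function(x, coef = 1.5) {
  q   <- unname(quantile(x, c(0.25, 0.5, 0.75)))  # lower hinge, median, upper hinge
  iqr <- q[3] - q[1]
  n   <- length(x)

  # Whiskers extend to the most extreme observations inside the fences.
  inside <- x[x >= q[1] - coef * iqr & x <= q[3] + coef * iqr]

  data.frame(
    ymin       = min(inside),
    lower      = q[1],
    notchlower = q[2] - 1.58 * iqr / sqrt(n),
    middle     = q[2],
    notchupper = q[2] + 1.58 * iqr / sqrt(n),
    upper      = q[3],
    ymax       = max(inside)
  )
}

vars <- boxplot_vars(mtcars$mpg)
vars
```

Observations outside `ymin` and `ymax` are the ones a boxplot would draw as outliers.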
30 changes: 15 additions & 15 deletions R/stat-contour.r
@@ -3,21 +3,21 @@
#' @export
#' @eval rd_aesthetics("stat", "contour")
#' @eval rd_aesthetics("stat", "contour_filled")
#' @section Computed variables:
#' The computed variables differ somewhat for contour lines (computed by
#' `stat_contour()`) and contour bands (filled contours, computed by `stat_contour_filled()`).
#' The variables `nlevel` and `piece` are available for both, whereas `level_low`, `level_high`,
#' and `level_mid` are only available for bands. The variable `level` is a numeric or a factor
#' depending on whether lines or bands are calculated.
#' \describe{
#' \item{`level`}{Height of contour. For contour lines, this is numeric vector that
#' represents bin boundaries. For contour bands, this is an ordered factor that
#' represents bin ranges.}
#' \item{`level_low`, `level_high`, `level_mid`}{(contour bands only) Lower and upper
#' bin boundaries for each band, as well the mid point between the boundaries.}
#' \item{`nlevel`}{Height of contour, scaled to maximum of 1.}
#' \item{`piece`}{Contour piece (an integer).}
#' }
#' @eval rd_computed_vars(
#' .details = "The computed variables differ somewhat for contour lines
#' (computed by `stat_contour()`) and contour bands (filled contours,
#' computed by `stat_contour_filled()`). The variables `nlevel` and `piece`
#' are available for both, whereas `level_low`, `level_high`, and `level_mid`
#' are only available for bands. The variable `level` is a numeric or a factor
#' depending on whether lines or bands are calculated.",
#' level = "Height of contour. For contour lines, this is a numeric vector
#' that represents bin boundaries. For contour bands, this is an ordered
#' factor that represents bin ranges.",
#' "level_low,level_high,level_mid" = "(contour bands only) Lower and upper
#' bin boundaries for each band, as well as the mid point between boundaries.",
#' nlevel = "Height of contour, scaled to a maximum of 1.",
#' piece = "Contour piece (an integer)."
#' )
#'
#' @section Dropped variables:
#' \describe{
9 changes: 4 additions & 5 deletions R/stat-count.r
Original file line number Diff line number Diff line change
@@ -1,8 +1,7 @@
#' @section Computed variables:
#' \describe{
#' \item{count}{number of points in bin}
#' \item{prop}{groupwise proportion}
#' }
#' @eval rd_computed_vars(
#' count = "number of points in bin.",
#' prop = "groupwise proportion."
#' )
#' @seealso [stat_bin()], which bins data in ranges and counts the
#' cases in each range. It differs from `stat_count()`, which counts the
#' number of cases at each `x` position (without binning into ranges).
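As a rough illustration of what "groupwise proportion" means for `prop`, the base R sketch below divides each count by the total count of its group. It assumes `am` plays the role of the grouping variable and `cyl` the role of `x`; both choices, and the `counts` data frame, are illustrative rather than taken from the PR.

```r
# One row per (group, x) combination with its count.
counts <- as.data.frame(
  table(group = mtcars$am, x = mtcars$cyl),
  responseName = "count"
)

# Groupwise proportion: each count divided by its group's total.
counts$prop <- counts$count / ave(counts$count, counts$group, FUN = sum)
counts
```

Within each group the proportions sum to 1, so when every `x` value forms its own group, every `prop` equals 1.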