Commit 3daeb33

Merge pull request #5117 from teunbrand/comp_var_docs
Computed variable and delayed evaluation docs
2 parents ef00be7 + 7b703f4 commit 3daeb33

41 files changed: +606 / -321 lines changed

R/aes-evaluation.r

Lines changed: 152 additions & 33 deletions
@@ -1,41 +1,114 @@
 #' Control aesthetic evaluation
 #'
-#' Most aesthetics are mapped from variables found in the data. Sometimes,
-#' however, you want to delay the mapping until later in the rendering process.
-#' ggplot2 has three stages of the data that you can map aesthetics from. The
-#' default is to map at the beginning, using the layer data provided by the
-#' user. The second stage is after the data has been transformed by the layer
-#' stat. The third and last stage is after the data has been transformed and
-#' mapped by the plot scales. The most common example of mapping from stat
-#' transformed data is the height of bars in [geom_histogram()]:
-#' the height does not come from a variable in the underlying data, but
-#' is instead mapped to the `count` computed by [stat_bin()]. An example of
-#' mapping from scaled data could be to use a desaturated version of the stroke
-#' colour for fill. If you want to map directly from the layer data you should
-#' not do anything special. In order to map from stat transformed data you
+#' @description
+#' Most [aesthetics][aes()] are mapped from variables found in the data.
+#' Sometimes, however, you want to delay the mapping until later in the
+#' rendering process. ggplot2 has three stages of the data that you can map
+#' aesthetics from, and three functions to control at which stage aesthetics
+#' should be evaluated.
+#'
+#' @description
+#' `after_stat()` replaces the old approaches of using either `stat()`, e.g.
+#' `stat(density)`, or surrounding the variable names with `..`, e.g.
+#' `..density..`.
+#'
+#' @usage
+#' # These functions can be used inside the `aes()` function
+#' # used as the `mapping` argument in layers, for example:
+#' # geom_density(mapping = aes(y = after_stat(scaled)))
+#'
+#' @param x <[`data-masking`][rlang::topic-data-mask]> An aesthetic expression
+#' using variables calculated by the stat (`after_stat()`) or layer aesthetics
+#' (`after_scale()`).
+#' @param start <[`data-masking`][rlang::topic-data-mask]> An aesthetic
+#' expression using variables from the layer data.
+#' @param after_stat <[`data-masking`][rlang::topic-data-mask]> An aesthetic
+#' expression using variables calculated by the stat.
+#' @param after_scale <[`data-masking`][rlang::topic-data-mask]> An aesthetic
+#' expression using layer aesthetics.
+#'
+#' @details
+#' # Staging
+#' Below is an overview of the three stages of evaluation and how aesthetic
+#' evaluation can be controlled.
+#'
+#' ## Stage 1: direct input
+#' The default is to map at the beginning, using the layer data provided by
+#' the user. If you want to map directly from the layer data, you don't need to
+#' do anything special. This is the only stage where the original layer data can
+#' be accessed.
+#'
+#' ```r
+#' # 'x' and 'y' are mapped directly
+#' ggplot(mtcars) + geom_point(aes(x = mpg, y = disp))
+#' ```
+#'
+#' ## Stage 2: after stat transformation
+#' The second stage is after the data has been transformed by the layer
+#' stat. The most common example of mapping from stat transformed data is the
+#' height of bars in [geom_histogram()]: the height does not come from a
+#' variable in the underlying data, but is instead mapped to the `count`
+#' computed by [stat_bin()]. In order to map from stat transformed data you
 #' should use the `after_stat()` function to flag that evaluation of the
 #' aesthetic mapping should be postponed until after stat transformation.
-#' Similarly, you should use `after_scale()` to flag evaluation of mapping for
-#' after data has been scaled. If you want to map the same aesthetic multiple
-#' times, e.g. map `x` to a data column for the stat, but remap it for the geom,
-#' you can use the `stage()` function to collect multiple mappings.
-#'
-#' `after_stat()` replaces the old approaches of using either `stat()` or
-#' surrounding the variable names with `..`.
-#'
-#' @note Evaluation after stat transformation will have access to the
-#' variables calculated by the stat, not the original mapped values. Evaluation
-#' after scaling will only have access to the final aesthetics of the layer
-#' (including non-mapped, default aesthetics). The original layer data can only
-#' be accessed at the first stage.
-#'
-#' @param x An aesthetic expression using variables calculated by the stat
-#' (`after_stat()`) or layer aesthetics (`after_scale()`).
-#' @param start An aesthetic expression using variables from the layer data.
-#' @param after_stat An aesthetic expression using variables calculated by the
-#' stat.
-#' @param after_scale An aesthetic expression using layer aesthetics.
+#' Evaluation after stat transformation will have access to the variables
+#' calculated by the stat, not the original mapped values. The 'computed
+#' variables' section in each stat lists which variables are available to
+#' access.
+#'
+#' ```r
+#' # The 'y' values for the histogram are computed by the stat
+#' ggplot(faithful, aes(x = waiting)) +
+#'   geom_histogram()
+#'
+#' # Choosing a different computed variable to display, matching up the
+#' # histogram with the density plot
+#' ggplot(faithful, aes(x = waiting)) +
+#'   geom_histogram(aes(y = after_stat(density))) +
+#'   geom_density()
+#' ```
+#'
+#' ## Stage 3: after scale transformation
+#' The third and last stage is after the data has been transformed and
+#' mapped by the plot scales. An example of mapping from scaled data could
+#' be to use a desaturated version of the stroke colour for fill. You should
+#' use `after_scale()` to flag that evaluation of the mapping should be
+#' postponed until after scaling. Evaluation after scaling only has access
+#' to the final aesthetics of the layer (including non-mapped, default aesthetics).
+#'
+#' ```r
+#' # The exact colour is known after scale transformation
+#' ggplot(mpg, aes(cty, colour = factor(cyl))) +
+#'   geom_density()
 #'
+#' # We re-use colour properties for the fill without a separate fill scale
+#' ggplot(mpg, aes(cty, colour = factor(cyl))) +
+#'   geom_density(aes(fill = after_scale(alpha(colour, 0.3))))
+#' ```
+#'
+#' ## Complex staging
+#' If you want to map the same aesthetic multiple times, e.g. map `x` to a
+#' data column for the stat, but remap it for the geom, you can use the
+#' `stage()` function to collect multiple mappings.
+#'
+#' ```r
+#' # Use stage to modify the scaled fill
+#' ggplot(mpg, aes(class, hwy)) +
+#'   geom_boxplot(aes(fill = stage(class, after_scale = alpha(fill, 0.4))))
+#'
+#' # Using data for computing the summary, but placing the label elsewhere.
+#' # Also, we're making our own computed variable to use for the label.
+#' ggplot(mpg, aes(class, displ)) +
+#'   geom_violin() +
+#'   stat_summary(
+#'     aes(
+#'       y = stage(displ, after_stat = 8),
+#'       label = after_stat(paste(mean, "±", sd))
+#'     ),
+#'     geom = "text",
+#'     fun.data = ~ round(data.frame(mean = mean(.x), sd = sd(.x)), 2)
+#'   )
+#' ```
 #' @rdname aes_eval
 #' @name aes_eval
 #'
@@ -55,6 +128,52 @@
 #' # Use stage to modify the scaled fill
 #' ggplot(mpg, aes(class, hwy)) +
 #'   geom_boxplot(aes(fill = stage(class, after_scale = alpha(fill, 0.4))))
+#'
+#' # Making a proportional stacked density plot
+#' ggplot(mpg, aes(cty)) +
+#'   geom_density(
+#'     aes(
+#'       colour = factor(cyl),
+#'       fill = after_scale(alpha(colour, 0.3)),
+#'       y = after_stat(count / sum(n[!duplicated(group)]))
+#'     ),
+#'     position = "stack", bw = 1
+#'   ) +
+#'   geom_density(bw = 1)
+#'
+#' # Imitating a ridgeline plot
+#' ggplot(mpg, aes(cty, colour = factor(cyl))) +
+#'   geom_ribbon(
+#'     stat = "density", outline.type = "upper",
+#'     aes(
+#'       fill = after_scale(alpha(colour, 0.3)),
+#'       ymin = after_stat(group),
+#'       ymax = after_stat(group + ndensity)
+#'     )
+#'   )
+#'
+#' # Labelling a bar plot
+#' ggplot(mpg, aes(class)) +
+#'   geom_bar() +
+#'   geom_text(
+#'     aes(
+#'       y = after_stat(count + 2),
+#'       label = after_stat(count)
+#'     ),
+#'     stat = "count"
+#'   )
+#'
+#' # Labelling the upper whisker of a boxplot,
+#' # inspired by June Choe
+#' ggplot(mpg, aes(displ, class)) +
+#'   geom_boxplot(outlier.shape = NA) +
+#'   geom_text(
+#'     aes(
+#'       label = after_stat(xmax),
+#'       x = stage(displ, after_stat = xmax)
+#'     ),
+#'     stat = "boxplot", hjust = -0.5
+#'   )
 NULL
 
 #' @rdname aes_eval
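To see which computed variables a stat exposes to `after_stat()`, one option is to build the plot and inspect a layer with ggplot2's `layer_data()`. The sketch below assumes ggplot2 is installed; the full set of columns returned can differ between ggplot2 versions, so it only looks at a few of them.

```r
library(ggplot2)

# stat_count() (used by geom_bar()) computes `count` and `prop` for each bar
p <- ggplot(mpg, aes(class)) + geom_bar()

# layer_data() builds the plot and returns the data of one layer,
# including the variables computed by the stat
computed <- layer_data(p)

# By default, the bar height `y` is the computed `count`
print(computed[, c("count", "prop", "y")])
```

Any column listed here can be referenced inside `after_stat()`, e.g. `aes(y = after_stat(count / max(count)))`.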

R/aes.r

Lines changed: 2 additions & 0 deletions
@@ -31,6 +31,8 @@ NULL
 #' are typically omitted because they are so common; all other aesthetics must be named.
 #' @seealso [vars()] for another quoting function designed for
 #' faceting specifications.
+#'
+#' [Delayed evaluation][aes_eval] for working with computed variables.
 #' @return A list with class `uneval`. Components of the list are either
 #' quosures or constants.
 #' @export

R/geom-dotplot.r

Lines changed: 11 additions & 12 deletions
@@ -17,18 +17,17 @@
 #' to match the number of dots.
 #'
 #' @eval rd_aesthetics("geom", "dotplot")
-#' @section Computed variables:
-#' \describe{
-#'   \item{x}{center of each bin, if binaxis is "x"}
-#'   \item{y}{center of each bin, if binaxis is "x"}
-#'   \item{binwidth}{max width of each bin if method is "dotdensity";
-#'     width of each bin if method is "histodot"}
-#'   \item{count}{number of points in bin}
-#'   \item{ncount}{count, scaled to maximum of 1}
-#'   \item{density}{density of points in bin, scaled to integrate to 1,
-#'     if method is "histodot"}
-#'   \item{ndensity}{density, scaled to maximum of 1, if method is "histodot"}
-#' }
+#' @eval rd_computed_vars(
+#'   x = 'center of each bin, if `binaxis` is `"x"`.',
+#'   y = 'center of each bin, if `binaxis` is `"y"`.',
+#'   binwidth = 'maximum width of each bin if method is `"dotdensity"`;
+#'     width of each bin if method is `"histodot"`.',
+#'   count = "number of points in bin.",
+#'   ncount = "count, scaled to a maximum of 1.",
+#'   density = 'density of points in bin, scaled to integrate to 1, if method
+#'     is `"histodot"`.',
+#'   ndensity = 'density, scaled to a maximum of 1, if method is `"histodot"`.'
+#' )
 #'
 #' @inheritParams layer
 #' @inheritParams geom_point
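The `@eval rd_computed_vars(...)` tags introduced in this commit replace hand-written `\describe{}` lists, but the helper's own source is not shown in this excerpt. The following is a hypothetical sketch of how such a helper could work, relying only on roxygen2's `@eval` behaviour (the character vector returned by the code is inserted as roxygen lines). The name `rd_computed_vars_sketch()`, its arguments and its exact output format are assumptions, not ggplot2's implementation; it reads the `"a|b"` and `"a,b,c"` argument names seen in the diff as alternatives and as variables sharing one description, respectively.

```r
# Hypothetical sketch: NOT ggplot2's internal rd_computed_vars().
# Returns a character vector of roxygen lines, which is what @eval expects.
rd_computed_vars_sketch <- function(..., .details = "") {
  descriptions <- c(...)  # named character vector: variable name = description
  labels <- paste0("`", names(descriptions), "`")
  # Assumed convention: "a,b,c" -> `a`, `b`, `c` (shared description)
  labels <- gsub(",", "`, `", labels, fixed = TRUE)
  # Assumed convention: "a|b" -> `a` *or* `b` (orientation-dependent names)
  labels <- gsub("|", "` *or* `", labels, fixed = TRUE)
  items <- paste0("  \\item{", labels, "}{", unname(descriptions), "}")
  c("@section Computed variables:", .details, "\\describe{", items, "}")
}

# Usage, mirroring two entries from the stat_boxplot() docs
rd_lines <- rd_computed_vars_sketch(
  width = "width of boxplot.",
  "ymin|xmin" = "lower whisker.",
  .details = "Some variables depend on the orientation."
)
cat(rd_lines, sep = "\n")
```

Placing `.details` after `...` means it must be spelled out in full, so it cannot be partially matched by a computed-variable name.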

R/stat-bin.r

Lines changed: 7 additions & 8 deletions
@@ -27,14 +27,13 @@
 #' or left edges of bins are included in the bin.
 #' @param pad If `TRUE`, adds empty bins at either end of x. This ensures
 #' frequency polygons touch 0. Defaults to `FALSE`.
-#' @section Computed variables:
-#' \describe{
-#'   \item{`count`}{number of points in bin}
-#'   \item{`density`}{density of points in bin, scaled to integrate to 1}
-#'   \item{`ncount`}{count, scaled to maximum of 1}
-#'   \item{`ndensity`}{density, scaled to maximum of 1}
-#'   \item{`width`}{widths of bins}
-#' }
+#' @eval rd_computed_vars(
+#'   count = "number of points in bin.",
+#'   density = "density of points in bin, scaled to integrate to 1.",
+#'   ncount = "count, scaled to a maximum of 1.",
+#'   ndensity = "density, scaled to a maximum of 1.",
+#'   width = "widths of bins."
+#' )
 #'
 #' @section Dropped variables:
 #' \describe{
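The `density` variable is described as "scaled to integrate to 1". A minimal check of that claim, assuming ggplot2 is installed: summing density times bar width over all bins should give 1. The sketch takes the bar width as `xmax - xmin` from the built layer rather than relying on a `width` column, since the histogram geom describes bar extents through `xmin` and `xmax`.

```r
library(ggplot2)

p <- ggplot(faithful, aes(waiting)) +
  geom_histogram(aes(y = after_stat(density)), bins = 20)
bins <- layer_data(p)

# Area of each bar = density * bar width; the areas should sum to 1
area <- sum(bins$density * (bins$xmax - bins$xmin))
print(area)

# Every observation falls into some bin
print(sum(bins$count))
```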

R/stat-bin2d.r

Lines changed: 6 additions & 7 deletions
@@ -5,13 +5,12 @@
 #' @param drop if `TRUE` removes all cells with 0 counts.
 #' @export
 #' @rdname geom_bin_2d
-#' @section Computed variables:
-#' \describe{
-#'   \item{count}{number of points in bin}
-#'   \item{density}{density of points in bin, scaled to integrate to 1}
-#'   \item{ncount}{count, scaled to maximum of 1}
-#'   \item{ndensity}{density, scaled to maximum of 1}
-#' }
+#' @eval rd_computed_vars(
+#'   count = "number of points in bin.",
+#'   density = "density of points in bin, scaled to integrate to 1.",
+#'   ncount = "count, scaled to a maximum of 1.",
+#'   ndensity = "density, scaled to a maximum of 1."
+#' )
 stat_bin_2d <- function(mapping = NULL, data = NULL,
                         geom = "tile", position = "identity",
                         ...,

R/stat-binhex.r

Lines changed: 6 additions & 7 deletions
@@ -1,13 +1,12 @@
 #' @export
 #' @rdname geom_hex
 #' @inheritParams stat_bin_2d
-#' @section Computed variables:
-#' \describe{
-#'   \item{count}{number of points in bin}
-#'   \item{density}{density of points in bin, scaled to integrate to 1}
-#'   \item{ncount}{count, scaled to maximum of 1}
-#'   \item{ndensity}{density, scaled to maximum of 1}
-#' }
+#' @eval rd_computed_vars(
+#'   count = "number of points in bin.",
+#'   density = "density of points in bin, scaled to integrate to 1.",
+#'   ncount = "count, scaled to a maximum of 1.",
+#'   ndensity = "density, scaled to a maximum of 1."
+#' )
 stat_bin_hex <- function(mapping = NULL, data = NULL,
                          geom = "hex", position = "identity",
                          ...,

R/stat-boxplot.r

Lines changed: 14 additions & 12 deletions
@@ -1,19 +1,21 @@
 #' @rdname geom_boxplot
 #' @param coef Length of the whiskers as multiple of IQR. Defaults to 1.5.
 #' @inheritParams stat_identity
-#' @section Computed variables:
-#' `stat_boxplot()` provides the following variables, some of which depend on the orientation:
-#' \describe{
-#'   \item{width}{width of boxplot}
-#'   \item{ymin *or* xmin}{lower whisker = smallest observation greater than or equal to lower hinge - 1.5 * IQR}
-#'   \item{lower *or* xlower}{lower hinge, 25% quantile}
-#'   \item{notchlower}{lower edge of notch = median - 1.58 * IQR / sqrt(n)}
-#'   \item{middle *or* xmiddle}{median, 50% quantile}
-#'   \item{notchupper}{upper edge of notch = median + 1.58 * IQR / sqrt(n)}
-#'   \item{upper *or* xupper}{upper hinge, 75% quantile}
-#'   \item{ymax *or* xmax}{upper whisker = largest observation less than or equal to upper hinge + 1.5 * IQR}
-#' }
 #' @export
+#' @eval rd_computed_vars(
+#'   .details = "`stat_boxplot()` provides the following variables, some of
+#'     which depend on the orientation:",
+#'   width = "width of boxplot.",
+#'   "ymin|xmin" = "lower whisker = smallest observation greater than or equal
+#'     to lower hinge - 1.5 * IQR.",
+#'   "lower|xlower" = "lower hinge, 25% quantile.",
+#'   notchlower = "lower edge of notch = median - 1.58 * IQR / sqrt(n).",
+#'   "middle|xmiddle" = "median, 50% quantile.",
+#'   notchupper = "upper edge of notch = median + 1.58 * IQR / sqrt(n).",
+#'   "upper|xupper" = "upper hinge, 75% quantile.",
+#'   "ymax|xmax" = "upper whisker = largest observation less than or equal to
+#'     upper hinge + 1.5 * IQR."
+#' )
 stat_boxplot <- function(mapping = NULL, data = NULL,
                          geom = "boxplot", position = "dodge2",
                          ...,
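Several `stat_boxplot()` variables "depend on the orientation". A sketch, assuming ggplot2 is installed, that builds vertical and horizontal boxplots of the same data and inspects them with `layer_data()`: the hinge and median statistics appear as `lower`/`middle`/`upper` in the vertical case and as `xlower`/`xmiddle`/`xupper` in the horizontal one.

```r
library(ggplot2)

# Vertical boxplots: hinges and whiskers are stored as y-statistics
vertical <- layer_data(ggplot(mpg, aes(class, hwy)) + geom_boxplot())

# Horizontal boxplots: the same statistics use the x-prefixed names
horizontal <- layer_data(ggplot(mpg, aes(hwy, class)) + geom_boxplot())

print(vertical[, c("ymin", "lower", "middle", "upper", "ymax")])
print(horizontal[, c("xmin", "xlower", "xmiddle", "xupper", "xmax")])
```

This is why `after_stat(xmax)` refers to the upper whisker when boxplots are drawn horizontally.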

R/stat-contour.r

Lines changed: 15 additions & 15 deletions
@@ -3,21 +3,21 @@
 #' @export
 #' @eval rd_aesthetics("stat", "contour")
 #' @eval rd_aesthetics("stat", "contour_filled")
-#' @section Computed variables:
-#' The computed variables differ somewhat for contour lines (computed by
-#' `stat_contour()`) and contour bands (filled contours, computed by `stat_contour_filled()`).
-#' The variables `nlevel` and `piece` are available for both, whereas `level_low`, `level_high`,
-#' and `level_mid` are only available for bands. The variable `level` is a numeric or a factor
-#' depending on whether lines or bands are calculated.
-#' \describe{
-#'   \item{`level`}{Height of contour. For contour lines, this is numeric vector that
-#'   represents bin boundaries. For contour bands, this is an ordered factor that
-#'   represents bin ranges.}
-#'   \item{`level_low`, `level_high`, `level_mid`}{(contour bands only) Lower and upper
-#'   bin boundaries for each band, as well the mid point between the boundaries.}
-#'   \item{`nlevel`}{Height of contour, scaled to maximum of 1.}
-#'   \item{`piece`}{Contour piece (an integer).}
-#' }
+#' @eval rd_computed_vars(
+#'   .details = "The computed variables differ somewhat for contour lines
+#'     (computed by `stat_contour()`) and contour bands (filled contours,
+#'     computed by `stat_contour_filled()`). The variables `nlevel` and `piece`
+#'     are available for both, whereas `level_low`, `level_high`, and `level_mid`
+#'     are only available for bands. The variable `level` is a numeric or a factor
+#'     depending on whether lines or bands are calculated.",
+#'   level = "Height of contour. For contour lines, this is a numeric vector
+#'     that represents bin boundaries. For contour bands, this is an ordered
+#'     factor that represents bin ranges.",
+#'   "level_low,level_high,level_mid" = "(contour bands only) Lower and upper
+#'     bin boundaries for each band, as well as the mid point between boundaries.",
+#'   nlevel = "Height of contour, scaled to a maximum of 1.",
+#'   piece = "Contour piece (an integer)."
+#' )
 #'
 #' @section Dropped variables:
 #' \describe{
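A sketch of the line/band difference described in the `.details` text, assuming ggplot2 is installed and using the `faithfuld` dataset that ships with it: `level` comes out numeric for contour lines and as an ordered factor for filled bands, and only the bands carry `level_low`, `level_high` and `level_mid`.

```r
library(ggplot2)

base <- ggplot(faithfuld, aes(waiting, eruptions, z = density))

lines_data <- layer_data(base + geom_contour())         # contour lines
bands_data <- layer_data(base + geom_contour_filled())  # contour bands

str(lines_data$level)  # numeric bin boundaries
str(bands_data$level)  # ordered factor of bin ranges
print(head(bands_data[, c("level_low", "level_high", "level_mid", "nlevel")]))
```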

R/stat-count.r

Lines changed: 4 additions & 5 deletions
@@ -1,8 +1,7 @@
-#' @section Computed variables:
-#' \describe{
-#'   \item{count}{number of points in bin}
-#'   \item{prop}{groupwise proportion}
-#' }
+#' @eval rd_computed_vars(
+#'   count = "number of points in bin.",
+#'   prop = "groupwise proportion."
+#' )
 #' @seealso [stat_bin()], which bins data in ranges and counts the
 #' cases in each range. It differs from `stat_count()`, which counts the
 #' number of cases at each `x` position (without binning into ranges).
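Because `prop` is a groupwise proportion, its value depends on how bars are grouped. A sketch, assuming ggplot2 is installed: with a discrete `x`, each bar forms its own group and every `prop` is 1, whereas mapping a constant `group = 1` turns `prop` into each bar's share of the total.

```r
library(ggplot2)

# Default grouping: each bar is its own group, so every `prop` is 1
per_bar <- layer_data(
  ggplot(mpg, aes(class)) + geom_bar(aes(y = after_stat(prop)))
)

# A single group across all bars turns `prop` into a share of the total
one_group <- layer_data(
  ggplot(mpg, aes(class)) + geom_bar(aes(y = after_stat(prop), group = 1))
)

print(per_bar$prop)
print(round(one_group$prop, 3))
```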
