-
Notifications
You must be signed in to change notification settings - Fork 2.1k
No retransform in mapping statistics #4835
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I need to sit down with this some more, but wouldn't that mean values mapped from the stat exist in a different untransformed scale than those mapped from the data? |
I'm not sure I can come up with a code example that would demonstrate this. The way I see it, is that this PR prevents the double-transform problem. It doesn't touch the mismatch between user-space and data-space that Claus mentions #4194 (comment). Towards that mismatch, it might worthwhile to simply document in |
At least, it's surprising that this change won't break any tests...! |
Indeed! I tried to come up with examples where it would show more unexpected behaviour than the current situation, but I'm having trouble imagining such examples. |
Can we collect a few more examples where the transform makes a difference (regardless of which version is correct, with the line in place or absent)? Maybe this will help us better understand what is happening. If the current code has a genuine bug then you'd expect it to only affect things that are currently broken, and nobody may be writing such code because it doesn't currently work. |
Here are some examples of things that go differently with PR and without PR: With PRlibrary(ggplot2)
p1 <- ggplot(head(economics, 24)) +
aes(x = date, xend = after_stat(x + 30),
y = unemploy, yend = unemploy) +
geom_segment()
p2 <- ggplot(data.frame(value = 16),
aes(value, 1)) +
geom_point() +
geom_point(
aes(stage(value, after_stat = x), 2),
) +
scale_x_sqrt(limits = c(0, 16))
p3 <- ggplot(data.frame(x = c(1, 10), y = 1:2)) +
geom_rect(
aes(xmin = x, xmax = after_stat(xmin + 1), ymin = y, ymax = y + 1)
) +
scale_x_log10()
patchwork::wrap_plots(p1, p2, p3) Created on 2022-05-14 by the reprex package (v2.0.1) With CRAN ggplot2# Repeat what was done in chunk above (omitted for brevity)
patchwork::wrap_plots(p2, p3) Differences in words
|
Looked a bit further into the history of these particular lines, and it seems that idea that the stat-output should be transformed came from this commit: 4cb2af3. That made me realise that this PR ruins the following (note that untransformed maximum count is at 25): library(ggplot2)
ggplot(faithful, aes(waiting)) +
geom_histogram() +
scale_y_sqrt()
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`. Created on 2022-05-15 by the reprex package (v2.0.1) So, it would probably be better not to merge this PR after all. I apologise for wasting everyone's time with this. |
No worries, this isn't a waste of time. It's really great that we at last figured out the case when no-retransform matters, which no one here were aware of (I tried to dive into the history, but couldn't do it well like you). Thanks for tackling the issue! Here's a bit simpler (or more complex?) example. library(ggplot2)
p_bar <- ggplot(mpg) +
geom_bar(aes(class)) +
scale_y_sqrt() +
ggtitle("geom_bar()")
d <- dplyr::count(mpg, class)
p_col <- ggplot(d) +
geom_col(aes(class, n)) +
scale_y_sqrt() +
ggtitle("geom_col()")
patchwork::wrap_plots(p_bar, p_col) ########## DANGER ZONE ##############
p_bar$layers[[1]]$stat$retransform <- FALSE
p_col$layers[[1]]$stat$retransform <- FALSE
patchwork::wrap_plots(p_bar, p_col) Created on 2022-05-15 by the reprex package (v2.0.1) |
I think it would be really helpful to add a test that fails when the retransform is disabled. @teunbrand, would you be willing to do that? |
This is the PR following-up on the comment left under #4194 intended to fix #4155.
As mentioned before: I'm not familiar with all the design decisions at stake, nor do I know the history of
map_statistic
very well, so feel free to ignore or correct me if I suggest something too brazenly.The idea of this PR is pretty simple. If #4194 has thought us that we may need to back-transform data before applying the
stage()
family of functions, why not just delete the forward transform?It should (1) resolve the examples posted by Hiroaki and myself in #4155 and (2) have a tiny performance boost. I can't imagine what case(s) this might break on top of my head, so it would be good for someone with a keener eye and/or foresight to have a second look.