Skip to content

Fix overlap identification in position_dodge2() #5939

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jul 11, 2024

Conversation

teunbrand
Copy link
Collaborator

@teunbrand teunbrand commented Jun 7, 2024

This PR aims to fix #4327 and fix #5938.

Briefly, find_x_overlap() is completely rewritten.
The trick was to search wether xmin[i + 1] is larger than cummax(xmax)[i], then start a new range.

The reprex from #4327, note no misplacement of points:

devtools::load_all("~/packages/ggplot2")
#> ℹ Loading ggplot2

df <- data.frame(
  dimension = paste0("dim ", c(1, 1, 2, 2)),
  group = c("A", "B", "A", "B"),
  value = c(0.6, 0.7, 0.4, 0.6)
)

ggplot(df, aes(dimension, value, colour = group)) +
  geom_point(position = position_dodge2(width = 0))

Reprex from #5938, note range 3 is correctly identified as overlapping with 1 & 2.

f <- function(x, xend) find_x_overlaps(data.frame(xmin = x, xmax = xend))

df <- data.frame(
  xmin = c(1, 2, 3, 5),
  xmax = c(4, 3, 4, 6),
  group = factor(1:4)
)

ggplot(df, aes(x = xmin, xend = xmax, y = group)) +
  geom_segment(aes(colour = factor(f(xmin, xmax))))

Better dodging behaviour:

ggplot(df, aes(xmin = xmin, xmax = xmax)) +
  geom_rect(aes(ymin = 0, ymax = 1, fill = group),
            alpha = 0.75,
            position = position_dodge2())

Time considerations in details (tl;dr: it's faster)

new <- function(df) {
  
  start   <- df$xmin
  nonzero <- df$xmax != df$xmin
  missing <- is.na(df$xmin) | is.na(df$xmax)
  end <- cummax(c(df$xmax[1], df$xmax[-nrow(df)]))
  overlaps <- cumsum(start > end | (start == end & nonzero))
  overlaps[missing] <- seq_len(sum(missing)) + max(overlaps, na.rm = TRUE)
  match(overlaps, unique0(overlaps))
}

old <- function(df) {
  overlaps <- numeric(nrow(df))
  overlaps[1] <- counter <- 1
  for (i in seq_asc(2, nrow(df))) {
    if (is.na(df$xmin[i]) || is.na(df$xmax[i - 1]) || df$xmin[i] >= df$xmax[i - 1]) {
      counter <- counter + 1
    }
    overlaps[i] <- counter
  }
  overlaps
}

df <- data.frame(xmin = runif(1000, max = 100))
df$xmax <- df$xmin + runif(1000, max = 2)
df <- df[order(df$xmin, df$xmax), ]

bench::mark(
  new(df),
  old(df),
  check = FALSE
)
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 new(df)      29.4µs   30.8µs    27374.    99.9KB   106.  
#> 2 old(df)      1.53ms   1.61ms      579.    95.5KB     8.42

Also tested for smaller nrow(df) and new is faster than old from nrow(df) == 5 onwards.

Created on 2024-06-07 with reprex v2.1.0

@teunbrand teunbrand changed the title Fix overlap identifier Fix overlap identification in position_dodge2() Jun 7, 2024
Copy link
Member

@thomasp85 thomasp85 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - nice work

@teunbrand teunbrand merged commit 6298aec into tidyverse:main Jul 11, 2024
11 checks passed
@teunbrand teunbrand deleted the fix_overlap branch July 11, 2024 19:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Internal function find_x_overlaps() has bug point is misplaced if width = 0 is used in position_dodge2 with geom_point
2 participants