Skip to content

[ADT] Make Zippy more iterator-like for lifetime safety #112441

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

dwblaikie
Copy link
Collaborator

@geoffromer identifier/encountered a lifetime issue when using concat+zip, zip would return by value, concat would take references to that value and use them in its result after they had expired.

This is a common problem with range adapters and the lifetime of values.

But it's also non-conforming with the C++ iterator requirements, I think

  • partly because op-> should be supported (which I haven't done here) and that basically has to return by pointer.

So the best thing is to stash a value in the iterator and return a pointer/reference to that.

(some context that may or may not be relevant to this part of the code may be in 981ce8f )

@geoffromer identifier/encountered a lifetime issue when using
concat+zip, zip would return by value, concat would take references to
that value and use them in its result after they had expired.

This is a common problem with range adapters and the lifetime of values.

But it's also non-conforming with the C++ iterator requirements, I think
- partly because op-> should be supported (which I haven't done here)
and that basically has to return by pointer.

So the best thing is to stash a value in the iterator and return a
pointer/reference to that.
@llvmbot
Copy link
Member

llvmbot commented Oct 15, 2024

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-adt

Author: David Blaikie (dwblaikie)

Changes

@geoffromer identifier/encountered a lifetime issue when using concat+zip, zip would return by value, concat would take references to that value and use them in its result after they had expired.

This is a common problem with range adapters and the lifetime of values.

But it's also non-conforming with the C++ iterator requirements, I think

  • partly because op-> should be supported (which I haven't done here) and that basically has to return by pointer.

So the best thing is to stash a value in the iterator and return a pointer/reference to that.

(some context that may or may not be relevant to this part of the code may be in 981ce8f )


Full diff: https://github.com/llvm/llvm-project/pull/112441.diff

3 Files Affected:

  • (modified) llvm/include/llvm/ADT/STLExtras.h (+5-3)
  • (modified) llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp (+1-1)
  • (modified) llvm/unittests/ADT/IteratorTest.cpp (+20-16)
diff --git a/llvm/include/llvm/ADT/STLExtras.h b/llvm/include/llvm/ADT/STLExtras.h
index eb441bb31c9bc8..3692b199c9ec1c 100644
--- a/llvm/include/llvm/ADT/STLExtras.h
+++ b/llvm/include/llvm/ADT/STLExtras.h
@@ -704,10 +704,12 @@ struct zip_common : public zip_traits<ZipType, ReferenceTupleType, Iters...> {
   using value_type = typename Base::value_type;
 
   std::tuple<Iters...> iterators;
+  mutable std::optional<value_type> value;
 
 protected:
-  template <size_t... Ns> value_type deref(std::index_sequence<Ns...>) const {
-    return value_type(*std::get<Ns>(iterators)...);
+  template <size_t... Ns> const value_type &deref(std::index_sequence<Ns...>) const {
+    value.emplace(*std::get<Ns>(iterators)...);
+    return *value;
   }
 
   template <size_t... Ns> void tup_inc(std::index_sequence<Ns...>) {
@@ -728,7 +730,7 @@ struct zip_common : public zip_traits<ZipType, ReferenceTupleType, Iters...> {
 public:
   zip_common(Iters &&... ts) : iterators(std::forward<Iters>(ts)...) {}
 
-  value_type operator*() const { return deref(IndexSequence{}); }
+  const value_type &operator*() const { return deref(IndexSequence{}); }
 
   ZipType &operator++() {
     tup_inc(IndexSequence{});
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 336126cc1fbc21..61aeb03634645c 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -9991,7 +9991,7 @@ class BoUpSLP::ShuffleCostEstimator : public BaseShuffleAnalysis {
       if (!E->ReorderIndices.empty() && CommonVF == E->ReorderIndices.size() &&
           CommonVF == CommonMask.size() &&
           any_of(enumerate(CommonMask),
-                 [](const auto &&P) {
+                 [](const auto &P) {
                    return P.value() != PoisonMaskElem &&
                           static_cast<unsigned>(P.value()) != P.index();
                  }) &&
diff --git a/llvm/unittests/ADT/IteratorTest.cpp b/llvm/unittests/ADT/IteratorTest.cpp
index a0d3c9b564d857..6c9aba8fcad2b8 100644
--- a/llvm/unittests/ADT/IteratorTest.cpp
+++ b/llvm/unittests/ADT/IteratorTest.cpp
@@ -482,39 +482,26 @@ TEST(ZipIteratorTest, ZipEqualConstCorrectness) {
   EXPECT_THAT(first, ElementsAre(0, 0, 0));
   EXPECT_THAT(second, ElementsAre(true, true, true));
 
-  std::vector<bool> nemesis = {true, false, true};
-  const std::vector<bool> c_nemesis = nemesis;
-
-  for (auto &&[a, b, c, d] : zip_equal(first, c_first, nemesis, c_nemesis)) {
+  for (auto &&[a, b] : zip_equal(first, c_first)) {
     a = 2;
-    c = true;
     static_assert(!IsConstRef<decltype(a)>);
     static_assert(IsConstRef<decltype(b)>);
-    static_assert(!IsBoolConstRef<decltype(c)>);
-    static_assert(IsBoolConstRef<decltype(d)>);
   }
 
   EXPECT_THAT(first, ElementsAre(2, 2, 2));
-  EXPECT_THAT(nemesis, ElementsAre(true, true, true));
 
   unsigned iters = 0;
-  for (const auto &[a, b, c, d] :
-       zip_equal(first, c_first, nemesis, c_nemesis)) {
+  for (const auto &[a, b] : zip_equal(first, c_first)) {
     static_assert(!IsConstRef<decltype(a)>);
     static_assert(IsConstRef<decltype(b)>);
-    static_assert(!IsBoolConstRef<decltype(c)>);
-    static_assert(IsBoolConstRef<decltype(d)>);
     ++iters;
   }
   EXPECT_EQ(iters, 3u);
   iters = 0;
 
-  for (const auto &[a, b, c, d] :
-       MakeConst(zip_equal(first, c_first, nemesis, c_nemesis))) {
+  for (const auto &[a, b] : MakeConst(zip_equal(first, c_first))) {
     static_assert(!IsConstRef<decltype(a)>);
     static_assert(IsConstRef<decltype(b)>);
-    static_assert(!IsBoolConstRef<decltype(c)>);
-    static_assert(IsBoolConstRef<decltype(d)>);
     ++iters;
   }
   EXPECT_EQ(iters, 3u);
@@ -643,6 +630,23 @@ TEST(ZipIteratorTest, Mutability) {
   }
 }
 
+TEST(ZipIteratorTest, Lifetime) {
+  SmallVector<unsigned> v1 = {1, 2, 3, 4};
+  SmallVector<unsigned> v2 = {5, 6, 7, 8};
+
+  auto zipper = zip(v1, v2);
+
+  auto zip_iter = zipper.begin();
+  EXPECT_NE(zip_iter, zipper.end());
+
+  auto *elem = &*zip_iter;
+  EXPECT_EQ(std::get<0>(*elem), 1u);
+  EXPECT_EQ(std::get<1>(*elem), 5u);
+
+  std::get<0>(*elem) = 42;
+  EXPECT_EQ(v1[0], 42u);
+}
+
 TEST(ZipIteratorTest, ZipFirstMutability) {
   using namespace std;
   vector<unsigned> pi{3, 1, 4, 1, 5, 9};

Copy link

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 23da16933b8ad48a967905369f576e5ec45b985f 75eb8c747a41c5e9a0275def57e534083d973921 --extensions h,cpp -- llvm/include/llvm/ADT/STLExtras.h llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp llvm/unittests/ADT/IteratorTest.cpp
View the diff from clang-format here.
diff --git a/llvm/include/llvm/ADT/STLExtras.h b/llvm/include/llvm/ADT/STLExtras.h
index 3692b199c9..672b068f9f 100644
--- a/llvm/include/llvm/ADT/STLExtras.h
+++ b/llvm/include/llvm/ADT/STLExtras.h
@@ -707,7 +707,8 @@ struct zip_common : public zip_traits<ZipType, ReferenceTupleType, Iters...> {
   mutable std::optional<value_type> value;
 
 protected:
-  template <size_t... Ns> const value_type &deref(std::index_sequence<Ns...>) const {
+  template <size_t... Ns>
+  const value_type &deref(std::index_sequence<Ns...>) const {
     value.emplace(*std::get<Ns>(iterators)...);
     return *value;
   }

Copy link
Member

@kuhar kuhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the impact on compilation times? If I'm reading this right, this will increase the size of Zippy to the size of the value type? For example, if I zip two vectors of std::string, the value_type will be something like tuple<string&, string&> ==> 16B?

I think concat works around it without an extra member?

template <size_t Index> ValueT *getHelper() const {
auto &Begin = std::get<Index>(Begins);
auto &End = std::get<Index>(Ends);
if (Begin == End)
return nullptr;
return &*Begin;
}
/// Finds the first non-end iterator, dereferences, and returns the resulting
/// reference.
///
/// It is an error to call this with all iterators at the end.
template <size_t... Ns> ValueT &get(std::index_sequence<Ns...>) const {
// Build a sequence of functions to get from iterator if possible.
ValueT *(concat_iterator::*GetHelperFns[])() const = {
&concat_iterator::getHelper<Ns>...};
// Loop over them, and return the first result we find.
for (auto &GetHelperFn : GetHelperFns)
if (ValueT *P = (this->*GetHelperFn)())
return *P;

identifier/encountered a lifetime issue when using concat+zip, zip would return by value, concat would take references to that value and use them in its result after they had expired.

Could you share a repro? Last time I checked, concat also forwarded ranges to its storage, so I'm not exactly sure what the issue is.

Comment on lines +711 to +712
value.emplace(*std::get<Ns>(iterators)...);
return *value;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the difference between emplace and assignment here?

Comment on lines -485 to -494
std::vector<bool> nemesis = {true, false, true};
const std::vector<bool> c_nemesis = nemesis;

for (auto &&[a, b, c, d] : zip_equal(first, c_first, nemesis, c_nemesis)) {
for (auto &&[a, b] : zip_equal(first, c_first)) {
a = 2;
c = true;
static_assert(!IsConstRef<decltype(a)>);
static_assert(IsConstRef<decltype(b)>);
static_assert(!IsBoolConstRef<decltype(c)>);
static_assert(IsBoolConstRef<decltype(d)>);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this removed?

@dwblaikie
Copy link
Collaborator Author

(sorry, missed some details in the original description that could've avoided the extra roundtrip/questions - thanks for asking!)

What's the impact on compilation times? If I'm reading this right, this will increase the size of Zippy to the size of the value type? For example, if I zip two vectors of std::string, the value_type will be something like tuple<string&, string&> ==> 16B?

That's correct.

I think concat works around it without an extra member?

concat doesn't synthesize a new value - its value are the values of the underlying iterators/ranges - so it doesn't have this problem/need a workaround.

I've tried to address/ensure others address this issue on iterators/range adapters in ADT, but I've certainly not been the most consistent.

mapped_iterator would be an example of another range/iterator with this problem, if the mapping function returns by value (if it returns by reference, it's fine - the lifetime of the object is managed elsewhere) - doesn't look like we do anything to ensure the mapping function returns by reference... - though at least when it does return by reference you get the good behavior, etc. And if it returns by value - it won't compose well/you'll have lifetime issues on composition.

make_second_range & similar seem to workaround the possibility that the underlying range might return a pair by value - returning by value if the underlying range returns by value...

template <size_t Index> ValueT *getHelper() const {
auto &Begin = std::get<Index>(Begins);
auto &End = std::get<Index>(Ends);
if (Begin == End)
return nullptr;
return &*Begin;
}
/// Finds the first non-end iterator, dereferences, and returns the resulting
/// reference.
///
/// It is an error to call this with all iterators at the end.
template <size_t... Ns> ValueT &get(std::index_sequence<Ns...>) const {
// Build a sequence of functions to get from iterator if possible.
ValueT *(concat_iterator::*GetHelperFns[])() const = {
&concat_iterator::getHelper<Ns>...};
// Loop over them, and return the first result we find.
for (auto &GetHelperFn : GetHelperFns)
if (ValueT *P = (this->*GetHelperFn)())
return *P;

identifier/encountered a lifetime issue when using concat+zip, zip would return by value, concat would take references to that value and use them in its result after they had expired.

Could you share a repro? Last time I checked, concat also forwarded ranges to its storage, so I'm not exactly sure what the issue is.

Not so much about the storage of the ranges itself, but of the values produced by the iterators.

I'll see about an example. I /think/ this is sufficient: https://godbolt.org/z/aK8drvGvY - (godbolt can't link to the LLVM libraries, it only has headers, so I can't create a runnable example) - I expect the place where this was encountered as an asan failure was where LLVM was installed as a system library, in which case the permissive error would've been silenced & the bug would've compiled without any compiler diagnostics.

@kuhar
Copy link
Member

kuhar commented Oct 16, 2024

OK, so the issue is that the lifetime of objects created by dereferencing zip iterators can be shorter than the iterators themselves? Did I get it right?

https://godbolt.org/z/aK8drvGvY

Thanks for the repro. Based on the error message, would it be something we could work around with some if constexpr (is_rvalue_ref_v<...>) in concat range instead?

I'm asking because IIRC returning values from the deref operator will become valid in C++20, so I wonder if we will have to fix it this way anyway later on.

@dwblaikie
Copy link
Collaborator Author

OK, so the issue is that the lifetime of objects created by dereferencing zip iterators can be shorter than the iterators themselves? Did I get it right?

Yep, pretty much.

Thanks for the repro. Based on the error message, would it be something we could work around with some if constexpr (is_rvalue_ref_v<...>) in concat range instead?

Possibly

I'm asking because IIRC returning values from the deref operator will become valid in C++20, so I wonder if we will have to fix it this way anyway later on.

I had some vague sense of that too, but didn't find (admittedly didn't look terribly hard) details on this - do you happen to have pointers to the specifics here?

@kuhar
Copy link
Member

kuhar commented Oct 16, 2024

I had some vague sense of that too, but didn't find (admittedly didn't look terribly hard) details on this - do you happen to have pointers to the specifics here?

IANALL but I found these:

@geoffromer
Copy link

I'm asking because IIRC returning values from the deref operator will become valid in C++20, so I wonder if we will have to fix it this way anyway later on.

I had some vague sense of that too, but didn't find (admittedly didn't look terribly hard) details on this - do you happen to have pointers to the specifics here?

It's more of an addition than a change: the new iterator concepts like std::forward_iterator don't require operator* to return an lvalue, but the pre-C++20 "named requirements" like ForwardIterator haven't changed, except that they're now called e.g. LegacyForwardIterator. See P0022 for background. Note that it was written in 2016, so the details may have changed, but I believe it reflects the overall direction that was eventually adopted.

Ultimately, this isn't a question of whether the iterator is "valid", it's a question of what usages the iterator supports. Right now, the zip iterator doesn't support usages that require operator* to return an lvalue, which means it doesn't support being nested inside concat. Nothing in C++20 will cause those usages to start working; if you want them to work, you'll need to change the iterator to make them work.

@kuhar
Copy link
Member

kuhar commented Oct 16, 2024

Ultimately, this isn't a question of whether the iterator is "valid", it's a question of what usages the iterator supports. Right now, the zip iterator doesn't support usages that require operator* to return an lvalue, which means it doesn't support being nested inside concat. Nothing in C++20 will cause those usages to start working; if you want them to work, you'll need to change the iterator to make them work.

My point is that there are two things we could change: zippy or concat. We could teach concat to support iterators that don't satisfy this requirement and cover more range types in the same sweep as fixing issues with concat of zip.

@dwblaikie
Copy link
Collaborator Author

Yeah, fair - I'll look into what it'd take to make concat compatible with the std::forward_iterator concept in this regard.

dwblaikie added a commit to dwblaikie/llvm-project that referenced this pull request Oct 17, 2024
… value (such as zip)

If any iterator in the concatenation returns by value, the result must
return by value otherwise it'll produce dangling references.

An alternative to llvm#112441
@dwblaikie
Copy link
Collaborator Author

Concat change posted here: #112783

dwblaikie added a commit that referenced this pull request Oct 18, 2024
… value (such as zip) (#112783)

If any iterator in the concatenation returns by value, the result must
return by value otherwise it'll produce dangling references.

(some context that may or may not be relevant to this part of the code
may be in
981ce8f
)

An alternative to #112441
@dwblaikie dwblaikie closed this Oct 18, 2024
@dwblaikie dwblaikie deleted the zip_return_by_reference_for_lifetime branch October 18, 2024 18:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants