-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[ADT] Make Zippy more iterator-like for lifetime safety #112441
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ADT] Make Zippy more iterator-like for lifetime safety #112441
Conversation
@geoffromer identifier/encountered a lifetime issue when using concat+zip, zip would return by value, concat would take references to that value and use them in its result after they had expired. This is a common problem with range adapters and the lifetime of values. But it's also non-conforming with the C++ iterator requirements, I think - partly because op-> should be supported (which I haven't done here) and that basically has to return by pointer. So the best thing is to stash a value in the iterator and return a pointer/reference to that.
@llvm/pr-subscribers-llvm-transforms @llvm/pr-subscribers-llvm-adt Author: David Blaikie (dwblaikie) Changes@geoffromer identifier/encountered a lifetime issue when using concat+zip, zip would return by value, concat would take references to that value and use them in its result after they had expired. This is a common problem with range adapters and the lifetime of values. But it's also non-conforming with the C++ iterator requirements, I think
So the best thing is to stash a value in the iterator and return a pointer/reference to that. (some context that may or may not be relevant to this part of the code may be in 981ce8f ) Full diff: https://github.com/llvm/llvm-project/pull/112441.diff 3 Files Affected:
diff --git a/llvm/include/llvm/ADT/STLExtras.h b/llvm/include/llvm/ADT/STLExtras.h
index eb441bb31c9bc8..3692b199c9ec1c 100644
--- a/llvm/include/llvm/ADT/STLExtras.h
+++ b/llvm/include/llvm/ADT/STLExtras.h
@@ -704,10 +704,12 @@ struct zip_common : public zip_traits<ZipType, ReferenceTupleType, Iters...> {
using value_type = typename Base::value_type;
std::tuple<Iters...> iterators;
+ mutable std::optional<value_type> value;
protected:
- template <size_t... Ns> value_type deref(std::index_sequence<Ns...>) const {
- return value_type(*std::get<Ns>(iterators)...);
+ template <size_t... Ns> const value_type &deref(std::index_sequence<Ns...>) const {
+ value.emplace(*std::get<Ns>(iterators)...);
+ return *value;
}
template <size_t... Ns> void tup_inc(std::index_sequence<Ns...>) {
@@ -728,7 +730,7 @@ struct zip_common : public zip_traits<ZipType, ReferenceTupleType, Iters...> {
public:
zip_common(Iters &&... ts) : iterators(std::forward<Iters>(ts)...) {}
- value_type operator*() const { return deref(IndexSequence{}); }
+ const value_type &operator*() const { return deref(IndexSequence{}); }
ZipType &operator++() {
tup_inc(IndexSequence{});
diff --git a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
index 336126cc1fbc21..61aeb03634645c 100644
--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -9991,7 +9991,7 @@ class BoUpSLP::ShuffleCostEstimator : public BaseShuffleAnalysis {
if (!E->ReorderIndices.empty() && CommonVF == E->ReorderIndices.size() &&
CommonVF == CommonMask.size() &&
any_of(enumerate(CommonMask),
- [](const auto &&P) {
+ [](const auto &P) {
return P.value() != PoisonMaskElem &&
static_cast<unsigned>(P.value()) != P.index();
}) &&
diff --git a/llvm/unittests/ADT/IteratorTest.cpp b/llvm/unittests/ADT/IteratorTest.cpp
index a0d3c9b564d857..6c9aba8fcad2b8 100644
--- a/llvm/unittests/ADT/IteratorTest.cpp
+++ b/llvm/unittests/ADT/IteratorTest.cpp
@@ -482,39 +482,26 @@ TEST(ZipIteratorTest, ZipEqualConstCorrectness) {
EXPECT_THAT(first, ElementsAre(0, 0, 0));
EXPECT_THAT(second, ElementsAre(true, true, true));
- std::vector<bool> nemesis = {true, false, true};
- const std::vector<bool> c_nemesis = nemesis;
-
- for (auto &&[a, b, c, d] : zip_equal(first, c_first, nemesis, c_nemesis)) {
+ for (auto &&[a, b] : zip_equal(first, c_first)) {
a = 2;
- c = true;
static_assert(!IsConstRef<decltype(a)>);
static_assert(IsConstRef<decltype(b)>);
- static_assert(!IsBoolConstRef<decltype(c)>);
- static_assert(IsBoolConstRef<decltype(d)>);
}
EXPECT_THAT(first, ElementsAre(2, 2, 2));
- EXPECT_THAT(nemesis, ElementsAre(true, true, true));
unsigned iters = 0;
- for (const auto &[a, b, c, d] :
- zip_equal(first, c_first, nemesis, c_nemesis)) {
+ for (const auto &[a, b] : zip_equal(first, c_first)) {
static_assert(!IsConstRef<decltype(a)>);
static_assert(IsConstRef<decltype(b)>);
- static_assert(!IsBoolConstRef<decltype(c)>);
- static_assert(IsBoolConstRef<decltype(d)>);
++iters;
}
EXPECT_EQ(iters, 3u);
iters = 0;
- for (const auto &[a, b, c, d] :
- MakeConst(zip_equal(first, c_first, nemesis, c_nemesis))) {
+ for (const auto &[a, b] : MakeConst(zip_equal(first, c_first))) {
static_assert(!IsConstRef<decltype(a)>);
static_assert(IsConstRef<decltype(b)>);
- static_assert(!IsBoolConstRef<decltype(c)>);
- static_assert(IsBoolConstRef<decltype(d)>);
++iters;
}
EXPECT_EQ(iters, 3u);
@@ -643,6 +630,23 @@ TEST(ZipIteratorTest, Mutability) {
}
}
+TEST(ZipIteratorTest, Lifetime) {
+ SmallVector<unsigned> v1 = {1, 2, 3, 4};
+ SmallVector<unsigned> v2 = {5, 6, 7, 8};
+
+ auto zipper = zip(v1, v2);
+
+ auto zip_iter = zipper.begin();
+ EXPECT_NE(zip_iter, zipper.end());
+
+ auto *elem = &*zip_iter;
+ EXPECT_EQ(std::get<0>(*elem), 1u);
+ EXPECT_EQ(std::get<1>(*elem), 5u);
+
+ std::get<0>(*elem) = 42;
+ EXPECT_EQ(v1[0], 42u);
+}
+
TEST(ZipIteratorTest, ZipFirstMutability) {
using namespace std;
vector<unsigned> pi{3, 1, 4, 1, 5, 9};
|
You can test this locally with the following command:git-clang-format --diff 23da16933b8ad48a967905369f576e5ec45b985f 75eb8c747a41c5e9a0275def57e534083d973921 --extensions h,cpp -- llvm/include/llvm/ADT/STLExtras.h llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp llvm/unittests/ADT/IteratorTest.cpp View the diff from clang-format here.diff --git a/llvm/include/llvm/ADT/STLExtras.h b/llvm/include/llvm/ADT/STLExtras.h
index 3692b199c9..672b068f9f 100644
--- a/llvm/include/llvm/ADT/STLExtras.h
+++ b/llvm/include/llvm/ADT/STLExtras.h
@@ -707,7 +707,8 @@ struct zip_common : public zip_traits<ZipType, ReferenceTupleType, Iters...> {
mutable std::optional<value_type> value;
protected:
- template <size_t... Ns> const value_type &deref(std::index_sequence<Ns...>) const {
+ template <size_t... Ns>
+ const value_type &deref(std::index_sequence<Ns...>) const {
value.emplace(*std::get<Ns>(iterators)...);
return *value;
}
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the impact on compilation times? If I'm reading this right, this will increase the size of Zippy to the size of the value type? For example, if I zip two vectors of std::string
, the value_type
will be something like tuple<string&, string&>
==> 16B?
I think concat works around it without an extra member?
llvm-project/llvm/include/llvm/ADT/STLExtras.h
Lines 1068 to 1089 in df05512
template <size_t Index> ValueT *getHelper() const { | |
auto &Begin = std::get<Index>(Begins); | |
auto &End = std::get<Index>(Ends); | |
if (Begin == End) | |
return nullptr; | |
return &*Begin; | |
} | |
/// Finds the first non-end iterator, dereferences, and returns the resulting | |
/// reference. | |
/// | |
/// It is an error to call this with all iterators at the end. | |
template <size_t... Ns> ValueT &get(std::index_sequence<Ns...>) const { | |
// Build a sequence of functions to get from iterator if possible. | |
ValueT *(concat_iterator::*GetHelperFns[])() const = { | |
&concat_iterator::getHelper<Ns>...}; | |
// Loop over them, and return the first result we find. | |
for (auto &GetHelperFn : GetHelperFns) | |
if (ValueT *P = (this->*GetHelperFn)()) | |
return *P; |
identifier/encountered a lifetime issue when using concat+zip, zip would return by value, concat would take references to that value and use them in its result after they had expired.
Could you share a repro? Last time I checked, concat
also forwarded ranges to its storage, so I'm not exactly sure what the issue is.
value.emplace(*std::get<Ns>(iterators)...); | ||
return *value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference between emplace and assignment here?
std::vector<bool> nemesis = {true, false, true}; | ||
const std::vector<bool> c_nemesis = nemesis; | ||
|
||
for (auto &&[a, b, c, d] : zip_equal(first, c_first, nemesis, c_nemesis)) { | ||
for (auto &&[a, b] : zip_equal(first, c_first)) { | ||
a = 2; | ||
c = true; | ||
static_assert(!IsConstRef<decltype(a)>); | ||
static_assert(IsConstRef<decltype(b)>); | ||
static_assert(!IsBoolConstRef<decltype(c)>); | ||
static_assert(IsBoolConstRef<decltype(d)>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this removed?
(sorry, missed some details in the original description that could've avoided the extra roundtrip/questions - thanks for asking!)
That's correct.
concat doesn't synthesize a new value - its value are the values of the underlying iterators/ranges - so it doesn't have this problem/need a workaround. I've tried to address/ensure others address this issue on iterators/range adapters in ADT, but I've certainly not been the most consistent. mapped_iterator would be an example of another range/iterator with this problem, if the mapping function returns by value (if it returns by reference, it's fine - the lifetime of the object is managed elsewhere) - doesn't look like we do anything to ensure the mapping function returns by reference... - though at least when it does return by reference you get the good behavior, etc. And if it returns by value - it won't compose well/you'll have lifetime issues on composition.
Not so much about the storage of the ranges itself, but of the values produced by the iterators. I'll see about an example. I /think/ this is sufficient: https://godbolt.org/z/aK8drvGvY - (godbolt can't link to the LLVM libraries, it only has headers, so I can't create a runnable example) - I expect the place where this was encountered as an asan failure was where LLVM was installed as a system library, in which case the permissive error would've been silenced & the bug would've compiled without any compiler diagnostics. |
OK, so the issue is that the lifetime of objects created by dereferencing zip iterators can be shorter than the iterators themselves? Did I get it right? Thanks for the repro. Based on the error message, would it be something we could work around with some I'm asking because IIRC returning values from the deref operator will become valid in C++20, so I wonder if we will have to fix it this way anyway later on. |
Yep, pretty much.
Possibly
I had some vague sense of that too, but didn't find (admittedly didn't look terribly hard) details on this - do you happen to have pointers to the specifics here? |
IANALL but I found these:
|
It's more of an addition than a change: the new iterator concepts like Ultimately, this isn't a question of whether the iterator is "valid", it's a question of what usages the iterator supports. Right now, the |
My point is that there are two things we could change: zippy or concat. We could teach concat to support iterators that don't satisfy this requirement and cover more range types in the same sweep as fixing issues with concat of zip. |
Yeah, fair - I'll look into what it'd take to make concat compatible with the |
… value (such as zip) If any iterator in the concatenation returns by value, the result must return by value otherwise it'll produce dangling references. An alternative to llvm#112441
Concat change posted here: #112783 |
@geoffromer identifier/encountered a lifetime issue when using concat+zip, zip would return by value, concat would take references to that value and use them in its result after they had expired.
This is a common problem with range adapters and the lifetime of values.
But it's also non-conforming with the C++ iterator requirements, I think
So the best thing is to stash a value in the iterator and return a pointer/reference to that.
(some context that may or may not be relevant to this part of the code may be in 981ce8f )