Skip to content

Make the lazy assignment of an assoociated type/wtable atomic #31768

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 14, 2020

Conversation

rjmccall
Copy link
Contributor

Should fix SR-12760.

@rjmccall rjmccall requested review from DougGregor and kubamracek May 13, 2020 22:02
@rjmccall
Copy link
Contributor Author

@swift-ci Please test.

@dabrahams
Copy link
Contributor

FWIW, we are still seeing TSan failures related to the metadata cache with this patch applied, e.g.

Previous read of size 8 at 0x7ffed2a13b80 by thread T69:
    #0 initializeWithCopy third_party/unsupported_toolchains/swift/toolchains/stable/runtime_srcs/swift/stdlib/public/runtime/MetadataImpl.h:103:25 (d4aeea9bceaeffeccd52bdb1b4a68d441e8cdec71d47e435d100dfcffb2263ef_020044d584a8+0x199b455d)
    #1 swift::metadataimpl::ValueWitnesses<swift::metadataimpl::NativeBox<unsigned long, 8ul, 8ul, 8ul> >::initializeWithCopy(swift::OpaqueValue*, swift::OpaqueValue*, swift::TargetMetadata<swift::InProcess> const*) third_party/unsupported_toolchains/swift/toolchains/stable/runtime_srcs/swift/stdlib/public/runtime/MetadataImpl.h:786:27 (d4aeea9bceaeffeccd52bdb1b4a68d441e8cdec71d47e435d100dfcffb2263ef_020044d584a8+0x199b455d)

@pschuh
Copy link
Contributor

pschuh commented May 13, 2020

I think you might want to still do some sort of lock on the slow path. Like so:

static std::recursive_mutex slow_path_lock;
template <typename T>
struct AtomicCachedPointer {
  template <typename SlowPathFunctor>
  T* get(SlowPathFunctor functor) {
     T* data_tmp = data.load(std::memory_order_acquire);
     if (isIninitialized(data_tmp)) { return data_tmp; }
     std::lock_guard<std::recursive_mutex> guard(slow_path_lock);
     // Inside the lock double-check that other threads haven't already initialized.
     T* data_tmp_try_two = data.load(std::memory_order_acquire);
     if (!isIninitialized(data_tmp_try_two)) {
        // Actually compute the slow path:
        data_tmp_try_two = functor();
        data.store(data_tmp_try_two);
     }
     return data_tmp_try_two;
  }
  std::atomic<T*> data;
};

Otherwise, the slow path will get called twice.

@rjmccall
Copy link
Contributor Author

rjmccall commented May 14, 2020

FWIW, we are still seeing TSan failures related to the metadata cache with this patch applied, e.g.

Previous read of size 8 at 0x7ffed2a13b80 by thread T69:
    #0 initializeWithCopy third_party/unsupported_toolchains/swift/toolchains/stable/runtime_srcs/swift/stdlib/public/runtime/MetadataImpl.h:103:25 (d4aeea9bceaeffeccd52bdb1b4a68d441e8cdec71d47e435d100dfcffb2263ef_020044d584a8+0x199b455d)
    #1 swift::metadataimpl::ValueWitnesses<swift::metadataimpl::NativeBox<unsigned long, 8ul, 8ul, 8ul> >::initializeWithCopy(swift::OpaqueValue*, swift::OpaqueValue*, swift::TargetMetadata<swift::InProcess> const*) third_party/unsupported_toolchains/swift/toolchains/stable/runtime_srcs/swift/stdlib/public/runtime/MetadataImpl.h:786:27 (d4aeea9bceaeffeccd52bdb1b4a68d441e8cdec71d47e435d100dfcffb2263ef_020044d584a8+0x199b455d)

Do you still have the original report around? TSan reports generally talk about multiple accesses, and it's important to include all of them.

It's likely there are multiple bugs. For example, the original bug you reported did not superficially involve any associated types or conformances. Is this report related to that bug (which you reported as SR-12761) or to the one I'm directly trying to fix (SR-12760)?

@rjmccall
Copy link
Contributor Author

rjmccall commented May 14, 2020

I think you might want to still do some sort of lock on the slow path. Like so:

...

Otherwise, the slow path will get called twice.

I don't mind the slow path getting called twice. Note that the slow path for associated type metadata doesn't even always cache its result. We rely on the slow path being confluent.

Anyway, adding locks would be problematic. We have a very specific concurrency-control scheme here, and as with most concurrency-control schemes, it cannot always be safely hybridized with another scheme without introducing deadlocks.

@pschuh
Copy link
Contributor

pschuh commented May 14, 2020

Here is a dump of the tsan results for the reproducer from SR-12760 with this change patched in:
basic-reproducer2.txt

@rjmccall
Copy link
Contributor Author

rjmccall commented May 14, 2020

If I understand those results correctly, it's saying that _dispatch_logv_init has racing reads and writes on a logging file descriptor; it's not related to the metadata system. @kubamracek can clarify.

@rjmccall rjmccall force-pushed the associated-witness-atomics branch from 46a8c1e to 6200df1 Compare May 14, 2020 00:37
@rjmccall
Copy link
Contributor Author

@swift-ci Please test.

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 46a8c1e1974ce3c6ea0c614ceefc64fa48841616

@swift-ci
Copy link
Contributor

Build failed
Swift Test Linux Platform
Git Sha - 46a8c1e1974ce3c6ea0c614ceefc64fa48841616

@rjmccall
Copy link
Contributor Author

FWIW, we are still seeing TSan failures related to the metadata cache with this patch applied

Okay, since I've tentatively concluded that SR-12761 is simply a stack overflow, I would be very interested in knowing more about this remaining TSan failure. Is that on the SR-12760 test?

@rjmccall
Copy link
Contributor Author

@swift-ci Please test Windows

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants