Skip to content

Fix RandomSample crash when 0 was out from random. #70

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Feb 12, 2021

Conversation

sidepelican
Copy link
Contributor

Fix #69 .

I changed the random ranges from [0, 1) to (0, 1] .
w must not be 0 so fixed nextW too. (in log(1 - w), log(1) is 0 so zero division happens.)

Checklist

  • I've added at least one test that validates that my change is working, if appropriate
  • I've followed the code style of the rest of the project
  • I've read the Contribution Guidelines
  • I've updated the documentation if necessary

@timvermeulen
Copy link
Contributor

Thanks for your contribution, @sidepelican! This looks great.

1 - .random(in: 0..<1, using: &rng) is a good way of excluding 0 from the result. It does now include the upper bound, 1, which the original algorithm does not, but as far as I can see this does not affect the algorithm in any meaningful way.

Your seeded number generator correctly produces 0 as its first random number, and this tests the change you made in nextW (which is where rng is first used). However, note that it does not test the change in nextOffset because at that point rng no longer  generates 0. Can you think of a way to test both changes, perhaps using a custom RandomNumberGenerator type?

@sidepelican
Copy link
Contributor Author

sidepelican commented Feb 3, 2021

Can you think of a way to test both changes, perhaps using a custom RandomNumberGenerator type?

I can make the RandomNumberGenerator type its anytime returns 0, but it doesn't work correctly becase Int.random(in: 0..<N, using: &g) (N > 2) falls into infinit loop.

let j = Int.random(in: 0..<result.count, using: &rng)

The test stops at this line when using such a generator.

The fundamental loop is here.
https://github.com/apple/swift/blob/a73a8087968f9111149073107c5242d83635107a/stdlib/public/core/Random.swift#L104


Based on the above, I changed the test to use internal function directly. 5acda56
How about this? @timvermeulen

Or, should make a generator that return 0 first 2 times?

Copy link
Contributor

@timvermeulen timvermeulen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make the RandomNumberGenerator type its anytime returns 0, but it doesn't work correctly becase Int.random(in: 0..<N, using: &g) (N > 2) falls into infinit loop.

Indeed, what I had in mind was either a generator that starts by generating at least two 0s like you mentioned, or one that generates a 0 with a certain (high) probability.

Based on the above, I changed the test to use internal function directly.

This I like even more! Now we're not making any assumptions about the exact order in which these functions are called.

@stephentyrone
Copy link
Member

stephentyrone commented Feb 3, 2021

I think this means that we can't test in release any longer, right? @natecook1000, how do you feel about that? Do we already use a testable import elsewhere?

(n.b. this comment is not a "we shouldn't do this," it's a "we should be aware that we are making a choice here.")

@stephentyrone
Copy link
Member

We chatted, and we're fine with this. Approved.

@stephentyrone
Copy link
Member

stephentyrone commented Feb 7, 2021

Upon more detailed review, this fixes the most obvious problem, but is not actually a complete solution. log(1-w) can underflow to zero without w being exactly zero; we simply need to have a sequence of w values whose product is less than .ulpOfOne/2. This should be replaced with log(onePlus: -w) to resolve the problem (numerically this only matters for very large collections--more than 2**53 elements, which can't happen on most systems supported by Swift--but it's trivially the right change to make, so we should do it).

The zero division itself is actually fine. Division by zero in floating-point is not an error; it produces infinity or NaN, which is fine. It's the conversion to Int that causes the problem. So we should be able to keep using 0..<1, but instead use:

internal func nextOffset<G: RandomNumberGenerator>(
  w: Double, using rng: inout G
) -> Int {
  let offset = Double.log(.random(in: 0..<1, using: &rng)) / .log(onePlus: -w)
  return offset < Double(Int.max) ? Int(offset) : Int.max
}

We should also have a clamping initializer for Int from BinaryFloatingPoint, so that people don't have to do this themselves. That belongs in the standard library, but I'll land it in Numerics first and we can switch over to using it in algorithms.

@timvermeulen
Copy link
Contributor

Nice catch, and TIL about log(onePlus:)!

numerically this only matters for very large collections--more than 2**53 elements

We can trivially construct a range larger than that, so we have real reasons to fix this.

@sidepelican
Copy link
Contributor Author

TIL about log(onePlus:) too!

I tried #70 (comment) , its looks fine but, I found crash when returning Int.max in nextOffset because arithmetic overflow happens at here.

var offset = nextOffset(w: w, using: &rng) + 1

I fixed this by just adding & but I'm not smart about the algorithm there so is this right?

@stephentyrone
Copy link
Member

stephentyrone commented Feb 9, 2021

Can't we eliminate the +1 there and use while offset > 0 in the loop following?

(&+ is wrong, because it would wrap to Int.min causing us to try to select an earlier element, which is broken.)

@kylemacomber kylemacomber merged commit 1a3c706 into apple:main Feb 12, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

randomSample crashes randomly
4 participants