Skip to content

safegaurd Box-Muller normal random number generation against u=0.0 #158

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Aug 29, 2023
Merged

safegaurd Box-Muller normal random number generation against u=0.0 #158

merged 3 commits into from
Aug 29, 2023

Conversation

dacarnazzola
Copy link
Contributor

@dacarnazzola dacarnazzola commented Aug 14, 2023

This fix also allows a single randn_vec subroutine to be used for any array rank, as well as improves performance ~3-10x. See example program below for that:

module nf_random
implicit none
private

    public :: randn, randn_vec

    real, parameter :: pi = 4 * atan(1.d0)

    interface randn
        module procedure randn_1d
        module procedure randn_2d
        module procedure randn_4d
    end interface randn

    contains

    function randn_1d(i) result(x)
        integer, intent(in) :: i
        real :: x(i)
        real :: u(i), v(i)
        call random_number(u)
        call random_number(v)
        x = sqrt(-2 * log(u)) * cos(2 * pi * v)
    end function randn_1d

    function randn_2d(i, j) result(x)
        integer, intent(in) :: i, j
        real :: x(i,j)
        real :: u(i,j), v(i,j)
        call random_number(u)
        call random_number(v)
        x = sqrt(-2 * log(u)) * cos(2 * pi * v)
    end function randn_2d

    function randn_4d(i, j, k, l) result(x)
        integer, intent(in) :: i, j, k, l
        real :: x(i,j,k,l)
        real :: u(i,j,k,l), v(i,j,k,l)
        call random_number(u)
        call random_number(v)
        x = sqrt(-2 * log(u)) * cos(2 * pi * v)
    end function randn_4d 

    impure subroutine randn_vec(x, n)
        real, intent(out) :: x(*)
        integer, intent(in), value :: n
        real :: u((n+1)/2), v((n+1)/2), r((n+1)/2)
        integer :: i, nu
        call random_number(u)
        nu = size(u)
        do i=1,nu
            do while (u(i) == 0.0)
                call random_number(u(i))
            end do
        end do
        call random_number(v)
        r = sqrt(-2.0*log(u))
        x(1:nu) = r*sin(2.0*pi*v)
        if (n > (nu+1)) x(nu+1:n) = r(1:(n-nu))*cos(2.0*pi*v(1:(n-nu)))
    end subroutine randn_vec

end module nf_random


program main
use, intrinsic :: iso_fortran_env, only: i64 => int64
use, non_intrinsic :: nf_random, only: randn, randn_vec

    integer, parameter :: r_max = 10

    real, allocatable :: x1(:), x2(:,:), x4(:,:,:,:)
    integer :: i, n, r
    integer(i64) :: c1, cr, c2
    real :: elapsed

    do i=20,30
        n = 2**i
        write(*,'(a,i0,a)') 'testing n=',n,' elements'
        if (allocated(x1)) deallocate(x1)
        if (allocated(x2)) deallocate(x2)
        if (allocated(x4)) deallocate(x4)
        allocate(x1(n), x2(1,n), x4(1,1,1,n))
        write(*,'(a)') 'nf_random::randn'
        call system_clock(c1, cr)
        do r=1,r_max
            x1 = randn(size(x1))
            x2 = randn(size(x2,1), size(x2,2))
            x4 = randn(size(x4,1), size(x4,2), size(x4,3), size(x4,4))
        end do
        call system_clock(c2)
        elapsed = real(max(c2 - c1, 1_i64))/real(cr)
        write(*,'(a,e13.6,a)') '  elapsed: ',elapsed,' seconds'
        write(*,'(a,3e13.6)') '  avg x1/x2/x4: ',sum(x1)/size(x1),sum(x2)/size(x2),sum(x4)/size(x4)
        write(*,'(a,3e13.6)') '  std x1/x2/x4: ',sqrt(sum((x1-sum(x1)/size(x1))**2)/real(size(x1)-1)), &
                                      sqrt(sum((x2-sum(x2)/size(x2))**2)/real(size(x2)-1)), &
                                      sqrt(sum((x4-sum(x4)/size(x4))**2)/real(size(x4)-1))
        write(*,'(a)') 'randn_vec'
        call system_clock(c1, cr)
        do r=1,r_max
            call randn_vec(x1, size(x1))
            call randn_vec(x2, size(x2))
            call randn_vec(x4, size(x4))
        end do
        call system_clock(c2)
        elapsed = real(max(c2 - c1, 1_i64))/real(cr)
        write(*,'(a,e13.6,a)') '  elapsed: ',elapsed,' seconds'
        write(*,'(a,3e13.6)') '  avg x1/x2/x4: ',sum(x1)/size(x1),sum(x2)/size(x2),sum(x4)/size(x4)
        write(*,'(a,3e13.6)') '  std x1/x2/x4: ',sqrt(sum((x1-sum(x1)/size(x1))**2)/real(size(x1)-1)), &
                                      sqrt(sum((x2-sum(x2)/size(x2))**2)/real(size(x2)-1)), &
                                      sqrt(sum((x4-sum(x4)/size(x4))**2)/real(size(x4)-1))
        write(*,'(a)') ''
    end do

end program main

Copy link
Member

@milancurcic milancurcic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR.

I'm not in favor of using assumed-size arrays. Alternatively, if we want to not duplicate the code, we could discuss replacing existing rank-specific functions with one impure elemental subroutine. In general for performance non-critical code I prefer using functions, but here a subroutine would be a good compromise to reduce code duplication.

As for guarding against u == 0, why not just do something like this:

u = max(tiny(u), u)

which is much simpler than the doubly-nested do loop.

@jvdp1
Copy link
Collaborator

jvdp1 commented Aug 19, 2023

As for guarding against u == 0, why not just do something like this:

For guarding against u==0, I usually do

call random_number(u)
u = 1 - u

because random_number will return a value in the range 0<=u <1

@dacarnazzola
Copy link
Contributor Author

As for guarding against u == 0, why not just do something like this:

u = max(tiny(u), u)

which is much simpler than the doubly-nested do loop.

That is indeed much simpler; performance should be tested. It may be significantly faster too!

As far as not using assumed size arrays, that is a matter of preference and if this project does not want to use them, then I would agree impure elemental subroutine at least meets the need of reducing code duplication.

@dacarnazzola
Copy link
Contributor Author

dacarnazzola commented Aug 24, 2023

I have went ahead and implemented u(1) = 1.0 - u(1) instead of the while loop to keep u > 0.0. On my machine, this measured ever so slightly faster than u = max(tiny(1.0), u) for a large array. I suspect for a size of 2 there will be no measurable difference.

The procedure is also now impure elemental subroutine as requested. I wasn't sure what you guys like to do with dead code, so I just commented out the unused module functions from nf_random, but left the actual implementations in the submodule alone.

EDIT: Turns out this approach runs rather slower than the previous method on my machine when compiled with gfortran. Using ifort brings it in line with the other methods, but this is a common occurrence I would say. Assumed size arrays seem to generate the fastest code in a lot of cases based on my experience.

@milancurcic
Copy link
Member

Great, thank you! You can remove the old functions from the code since the new subroutine will be used instead and we have git to keep history.

@milancurcic milancurcic added enhancement New feature or request bug Something isn't working labels Aug 25, 2023
@milancurcic
Copy link
Member

Great @dacarnazzola let's just rename randn_ele to random_normal (I don't know why I called it randn it was a long time ago, maybe after np.randn), and this is good to go.

@dacarnazzola
Copy link
Contributor Author

dacarnazzola commented Aug 25, 2023

Alright, all fair points. I have updated the fix to only provide one value at a time, keeping no internal state. In response to a couple of questions/comments

  • Only u(1) needs to be /= 0.0 because log(0.0) is invalid, but cos(0.0) is not.
  • I believe impure elemental should indicate that this routine is not safe to parallelize.
  • There is no need for the variable r when only calculating a single value.

If my comment about impure elemental is incorrect, then I wonder why there was ever a requirement on elemental to be pure, which was later lifted to include procedures marked impure. I will note that in general, I have never seen a procedure that compiled to anything better, or executed faster, after being marked pure or elemental, but I would absolutely love to be shown a counter example.

@dacarnazzola
Copy link
Contributor Author

Great @dacarnazzola let's just rename randn_ele to random_normal (I don't know why I called it randn it was a long time ago, maybe after np.randn), and this is good to go.

Do you want the interface renamed to random_normal, just randn_ele, or both?

@milancurcic
Copy link
Member

I will note that in general, I have never seen a procedure that compiled to anything better, or executed faster, after being marked pure or elemental, but I would absolutely love to be shown a counter example.

Same here, but that may change as compilers and hardware improve.

@milancurcic
Copy link
Member

Do you want the interface renamed to random_normal, just randn_ele, or both?

Since we now have just one specific procedure, I'd remove the interface altogether.

@dacarnazzola
Copy link
Contributor Author

Do you want the interface renamed to random_normal, just randn_ele, or both?

Since we now have just one specific procedure, I'd remove the interface altogether.

This has been done. I don't have a lot of experience with submodules, but the compiler wasn't liking the main nf_random module not having an interface block. Since other modules seem ok without a corresponding submodule, I have removed nf_random_submodule.

Copy link
Member

@milancurcic milancurcic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@milancurcic milancurcic merged commit b1b2cac into modern-fortran:main Aug 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants