Feat libfasttransforms #75

Merged
merged 17 commits into from
Sep 7, 2019

@MikaelSlevinsky commented Aug 30, 2019

This PR adds a Julia wrapper around the C library of the same name. It also removes the Julia code that the wrapper makes redundant. The 26 exported transforms are

julia> FastTransforms.kind2string.(0:25)
26-element Array{String,1}:
 "Legendre--Chebyshev"                                
 "Chebyshev--Legendre"                                
 "ultraspherical--ultraspherical"                     
 "Jacobi--Jacobi"                                     
 "Laguerre--Laguerre"                                 
 "Jacobi--ultraspherical"                             
 "ultraspherical--Jacobi"                             
 "Jacobi--Chebyshev"                                  
 "Chebyshev--Jacobi"                                  
 "ultraspherical--Chebyshev"                          
 "Chebyshev--ultraspherical"                          
 "Spherical harmonic--Fourier"                        
 "Spherical vector field--Fourier"                    
 "Zernike--Chebyshev×Fourier"                         
 "Proriol--Chebyshev²"                                
 "Proriol--Chebyshev³"                                
 "FFTW Fourier synthesis on the sphere"               
 "FFTW Fourier analysis on the sphere"                
 "FFTW Fourier synthesis on the sphere (vector field)"
 "FFTW Fourier analysis on the sphere (vector field)" 
 "FFTW Chebyshev×Fourier synthesis on the disk"       
 "FFTW Chebyshev×Fourier analysis on the disk"        
 "FFTW Chebyshev synthesis on the triangle"           
 "FFTW Chebyshev analysis on the triangle"            
 "FFTW Chebyshev synthesis on the tetrahedron"        
 "FFTW Chebyshev analysis on the tetrahedron"         

Each of them creates a parameterized FTPlan. The first 11 support both standard and orthonormal normalizations, selected via Bools.

Surprisingly, the Linux and macOS builds succeed. The build uses BinaryProvider to query the ABI of one's homebrew/apt gcc (via detect_compiler_abi()), then downloads the matching pre-compiled binary from the v0.2.6 release (https://github.com/MikaelSlevinsky/FastTransforms/releases/tag/v0.2.6).

Windows support will have to be dropped to Tier 3, and all other platforms and non-x86_64 chips to Tier 4.

TODO:

  • Full use of BinaryBuilder and BinaryProvider. It would be helpful if BinaryBuilder worked on macOS. Help with this would be appreciated.
  • Update documentation. Perhaps the best would be to refer to the C documentation (which also needs work).
  • Export the tetrahedral transform.

@MikaelSlevinsky

One niche, but really cool, improvement is to the multi-precision transforms. Previously, these were only available through the Toeplitz--Hankel transforms, which were slow:

julia> begin
    @time p = FastTransforms.th_leg2chebplan(BigFloat, 1000)
    @time x = rand(BigFloat, 1000)
    @time p*x
end;
3.462209 seconds (50.44 M allocations: 2.628 GiB, 26.64% gc time)
0.000181 seconds (2.01 k allocations: 117.625 KiB)
68.177021 seconds (902.38 M allocations: 46.999 GiB, 35.17% gc time)

compared with the direct mpfr_t routines from C:

julia> begin
    @time p = plan_leg2cheb(BigFloat, 1000)
    @time x = rand(BigFloat, 1000)
    @time p*x
end;
0.321481 seconds (5 allocations: 192 bytes)
0.000196 seconds (2.01 k allocations: 117.625 KiB)
0.019480 seconds (5.02 k allocations: 409.250 KiB)
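
To put the two listings side by side, here is a small sketch that computes the ratios from the timings printed above. The numbers are copied from those single runs, so the ratios are indicative only; the variable names are illustrative.

```julia
# Timings in seconds, copied from the two @time listings above.
th     = (plan = 3.462209, execute = 68.177021)  # Toeplitz--Hankel path
direct = (plan = 0.321481, execute = 0.019480)   # mpfr_t routines from C

# Ratio of old time to new time for each phase.
plan_speedup    = th.plan / direct.plan
execute_speedup = th.execute / direct.execute

println("planning:  ", round(plan_speedup, digits = 1), "x faster")
println("execution: ", round(Int, execute_speedup), "x faster")
```

On these numbers, planning is roughly 11 times faster and execution roughly 3500 times faster, with allocations dropping by a similar order of magnitude.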

@MikaelSlevinsky commented Aug 30, 2019

The main reason for this PR is that transforms are essentially imperative. Writing them in Julia was good for experimental purposes, but Julia's development is more active and volatile than C's.

While a 1024x2047 spherical harmonic transform used to take 4 seconds to plan and 0.6 seconds to execute when I first wrote it, careless syntax changes and compounding performance regressions led to an approximately 100-fold increase in execution time (see #69).

On this branch, we again have something reasonable:

julia> F = sphrandn(Float64, 1024, 2047); # note the change to `sphrandn`. Second integer denotes exact number of columns.

julia> @time G = sph2fourier(F);
  0.118161 seconds (9 allocations: 15.993 MiB)

julia> @time H = fourier2sph(F);
  0.122915 seconds (9 allocations: 15.993 MiB)

So, needless to say, closing #69 will be exciting! CC @AshtonSBradley

EDIT: the advanced interface, which plans once and then works in place, cuts this in half yet again:

julia> F = sphrandn(Float64, 1024, 2047);

julia> P = plan_sph2fourier(Float64, 1024)
FastTransforms Spherical harmonic--Fourier plan for 1024×2047-element array of Float64

julia> @time lmul!(P, F);
  0.062377 seconds (4 allocations: 160 bytes)

julia> @time ldiv!(P, F);
  0.056915 seconds (4 allocations: 160 bytes)

@MikaelSlevinsky

The one performance regression is in plan_leg2cheb and plan_cheb2leg, whose planning takes about 5 times longer. Execution time is almost the same. The planning gap can be closed over time.

But what's lost in pre-computation is gained in generality. The new method, which replaces the Alpert--Rokhlin scheme, solves triangular banded generalized eigenvalue problems, and thus applies to all of the Jacobi--Jacobi and Laguerre--Laguerre transforms as well. Look out for associated OP transforms in a future release!

@MikaelSlevinsky

I have come to learn a bit about building and providing binaries. And yet, I've concluded that the user should always have the right to build from source. One compelling reason is that Travis-hosted (cross-)compilation may not turn on all the best optimization flags for one's personal computer. A binary built that way would never use the AVX-512 instructions on my Mac Pro!

As well, notwithstanding the issue I've filed, BinaryBuilder.jl comes with compromises. It currently requires all of a binary library's dependencies to be installed as well. This is sub-optimal because a user does not need so many copies of, e.g., OpenBLAS (especially on macOS, which can use the system BLAS).

Therefore, three build strategies will ultimately be available to the user, selected by an environment variable: build from BinaryBuilder.jl (most reliable, and the default once it works), build from FastTransforms releases (fastest, but assumes the user has dependencies from the same package managers), and build from source (best optimization).
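
As a rough illustration of how such a switch might look in a build script, here is a minimal sketch. The variable name `FT_BUILD_STRATEGY`, its three values, and the function `select_build_strategy` are hypothetical, chosen only for this example; they are not the package's actual configuration.

```julia
# Hypothetical names throughout: FT_BUILD_STRATEGY and its values are
# illustrative only, not the package's actual configuration.
const STRATEGIES = ("binarybuilder", "release", "source")

# Read the strategy from an environment-like dictionary (ENV by default),
# falling back to `default` when the variable is unset.
function select_build_strategy(env::AbstractDict = ENV; default::String = "binarybuilder")
    # Normalize whitespace and case so that " Source " is accepted.
    choice = lowercase(strip(get(env, "FT_BUILD_STRATEGY", default)))
    # Reject anything that is not one of the three known strategies.
    choice in STRATEGIES ||
        error("unknown build strategy \"$choice\"; expected one of $(join(STRATEGIES, ", "))")
    return choice
end
```

A build script could then branch on the returned string. Passing a `Dict` instead of `ENV` keeps the function easy to test without touching the real environment.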

Since one of these already works with Travis CI, it's time to merge.

@MikaelSlevinsky MikaelSlevinsky merged commit 8dd3943 into master Sep 7, 2019
@MikaelSlevinsky MikaelSlevinsky deleted the feat-libfasttransforms branch October 2, 2020 17:40