Feat libfasttransforms #75
No BinaryBuilder yet.
Change tests.
Expert interface unchanged. Basically, with
x = 1.0./(1:10)
the round-trip error norm(leg2cheb(cheb2leg(x)) - x) is small.
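To make the round-trip claim concrete without depending on the package, here is a minimal dense O(n²) sketch of what `leg2cheb` and `cheb2leg` compute. It is not FastTransforms' fast algorithm, and the names `lambda_table`, `leg2cheb_matrix`, `leg2cheb_dense` and `cheb2leg_dense` are hypothetical. It assumes the standard closed form for the Legendre-to-Chebyshev connection coefficients with Λ(z) = Γ(z + 1/2)/Γ(z + 1): M[0,k] = Λ(k/2)²/π and M[j,k] = 2Λ((k − j)/2)Λ((k + j)/2)/π for 0 < j ≤ k with k − j even (for instance, P₂ = T₀/4 + 3T₂/4). Because M is upper triangular, the inverse conversion is a back-substitution.

```julia
using LinearAlgebra

# Hypothetical helper. Λ(z) = Γ(z + 1/2) / Γ(z + 1), tabulated at z = m/2 for m = 0, ..., 2n
# using Λ(z + 1) = Λ(z) (z + 1/2) / (z + 1) with Λ(0) = √π and Λ(1/2) = 2/√π.
function lambda_table(::Type{T}, n::Integer) where {T}
    Λ = zeros(T, 2n + 1)                 # Λ[m + 1] stores Λ(m/2)
    Λ[1] = sqrt(T(π))
    Λ[2] = 2 / sqrt(T(π))
    for m in 0:2n-2
        Λ[m + 3] = Λ[m + 1] * (m + 1) / (m + 2)
    end
    return Λ
end

# Hypothetical: dense upper-triangular matrix mapping Legendre to Chebyshev coefficients.
function leg2cheb_matrix(::Type{T}, n::Integer) where {T}
    Λ = lambda_table(T, n)
    M = zeros(T, n, n)
    for k in 0:n-1, j in k:-2:0          # entries with k - j odd vanish
        M[j + 1, k + 1] = (j == 0 ? 1 : 2) * Λ[k - j + 1] * Λ[k + j + 1] / T(π)
    end
    return UpperTriangular(M)
end

# Dense O(n^2) reference conversions (hypothetical names, not the package's functions).
leg2cheb_dense(c::AbstractVector{T}) where {T} = leg2cheb_matrix(T, length(c)) * c
cheb2leg_dense(a::AbstractVector{T}) where {T} = leg2cheb_matrix(T, length(a)) \ a

x = 1.0 ./ (1:10)
err = norm(leg2cheb_dense(cheb2leg_dense(x)) - x)   # should be near machine precision
```

Forming M costs O(n²) time and memory, which is exactly what the fast transforms avoid, so this is only useful as a correctness check for small n.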
Why gcc-8? I think that's what's current in Julia.
One niche, but really cool, improvement is to multi-precision transforms. These were previously available only through the Toeplitz--Hankel transforms, and they were slow:
julia> begin
@time p = FastTransforms.th_leg2chebplan(BigFloat, 1000)
@time x = rand(BigFloat, 1000)
@time p*x
end;
3.462209 seconds (50.44 M allocations: 2.628 GiB, 26.64% gc time)
0.000181 seconds (2.01 k allocations: 117.625 KiB)
68.177021 seconds (902.38 M allocations: 46.999 GiB, 35.17% gc time)
compared with the direct mpfr_t routines from C:
julia> begin
@time p = plan_leg2cheb(BigFloat, 1000)
@time x = rand(BigFloat, 1000)
@time p*x
end;
0.321481 seconds (5 allocations: 192 bytes)
0.000196 seconds (2.01 k allocations: 117.625 KiB)
0.019480 seconds (5.02 k allocations: 409.250 KiB)
That is roughly an 11-fold speedup in planning and a 3500-fold speedup in execution.
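The appeal of the mpfr_t path is that the same transform runs at any precision. As a plain-Julia illustration of precision-generic code (not the C routines, and not the Toeplitz--Hankel method), here is a direct O(n²) Legendre-to-Chebyshev conversion by n-point Gauss--Chebyshev quadrature, written over a generic element type `T` so it runs unchanged in `BigFloat`. The names `legendre_series` and `leg2cheb_quadrature` are hypothetical.

```julia
# Hypothetical helpers; T can be any floating-point type that represents π and has cos,
# including BigFloat.

# Evaluate f(x) = Σ c[k+1] P_k(x) via (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}.
function legendre_series(c::AbstractVector{T}, x::T) where {T}
    n = length(c)
    n == 0 && return zero(T)
    pprev, p = one(T), x                     # P_0(x), P_1(x)
    s = c[1] * pprev
    n == 1 && return s
    s += c[2] * p
    for k in 1:n-2
        pprev, p = p, ((2k + 1) * x * p - k * pprev) / (k + 1)   # p = P_{k+1}(x)
        s += c[k + 2] * p
    end
    return s
end

# Chebyshev coefficients of a Legendre series of degree < n by n-point Gauss-Chebyshev
# quadrature at x_m = cos(θ_m); exact in exact arithmetic since f T_j has degree ≤ 2n - 2.
function leg2cheb_quadrature(c::AbstractVector{T}) where {T}
    n = length(c)
    θ = [T(π) * (2m + 1) / (2n) for m in 0:n-1]
    fx = [legendre_series(c, cos(t)) for t in θ]
    a = [2 * sum(fx[m + 1] * cos(j * θ[m + 1]) for m in 0:n-1) / n for j in 0:n-1]
    a[1] /= 2                                # the T_0 coefficient carries weight 1/n
    return a
end

a = setprecision(BigFloat, 512) do
    leg2cheb_quadrature(BigFloat[0, 0, 1])  # P_2 = T_0/4 + 3T_2/4
end
```

Nothing in the functions mentions `BigFloat`; the working precision comes entirely from the element type and the surrounding `setprecision` block.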
The main reason for this PR is that transforms are essentially imperative. Writing them in Julia was good for experimental purposes, but Julia's development is more active and volatile than C's. When I first wrote it, a 1024×2047 spherical harmonic transform took 4 seconds to plan and 0.6 seconds to execute; careless syntax changes and compounding performance regressions have since led to an approximately 100-fold increase in execution time (see #69). On this branch, we again have something reasonable:
julia> F = sphrandn(Float64, 1024, 2047); # note the change to `sphrandn`. Second integer denotes exact number of columns.
julia> @time G = sph2fourier(F);
0.118161 seconds (9 allocations: 15.993 MiB)
julia> @time H = fourier2sph(F);
0.122915 seconds (9 allocations: 15.993 MiB)
So, needless to say, closing #69 will be exciting! CC @AshtonSBradley
EDIT: the advanced interface further cuts this in half:
julia> F = sphrandn(Float64, 1024, 2047);
julia> P = plan_sph2fourier(Float64, 1024)
FastTransforms Spherical harmonic--Fourier plan for 1024×2047-element array of Float64
julia> @time lmul!(P, F);
0.062377 seconds (4 allocations: 160 bytes)
julia> @time ldiv!(P, F);
0.056915 seconds (4 allocations: 160 bytes)
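One way to read the allocation counts above is that each out-of-place call allocates a fresh output array (15.993 MiB is exactly 1024 × 2047 `Float64` values), while `lmul!(P, F)` and `ldiv!(P, F)` reuse a precomputed plan and overwrite `F`. The sketch below shows that plan pattern on a deliberately tiny stand-in: `TriangularPlan` is a hypothetical type wrapping an upper-triangular matrix, not FastTransforms' `FTPlan`, though it extends `LinearAlgebra.lmul!` and `ldiv!` in the same spirit.

```julia
using LinearAlgebra

# Hypothetical plan type (not FastTransforms' FTPlan): all precomputation happens at
# construction, so applying the plan in place needs no new output array.
struct TriangularPlan{T}
    U::UpperTriangular{T, Matrix{T}}
end

# "Planning": here it only copies and wraps a matrix; in a real library this is the expensive step.
TriangularPlan(A::AbstractMatrix{T}) where {T} = TriangularPlan{T}(UpperTriangular(Matrix{T}(A)))

# Out-of-place application returns a new array.
Base.:*(P::TriangularPlan, x::AbstractVecOrMat) = P.U * x

# In-place application overwrites x with U * x or U \ x.
LinearAlgebra.lmul!(P::TriangularPlan, x::AbstractVecOrMat) = lmul!(P.U, x)
LinearAlgebra.ldiv!(P::TriangularPlan, x::AbstractVecOrMat) = ldiv!(P.U, x)

P = TriangularPlan([2.0 1.0; 0.0 4.0])
F = [1.0, 1.0]
lmul!(P, F)   # F is now [3.0, 4.0]
ldiv!(P, F)   # F is back to [1.0, 1.0]
```

Separating the plan from its application lets the expensive setup be paid once and amortized over many in-place transforms.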
The one performance regression is to … But what's lost in pre-computation is gained in generality. The new method, no longer the Alpert--Rokhlin scheme, solves triangular banded generalized eigenvalue problems, and is thus applicable to all of the Jacobi--Jacobi and Laguerre--Laguerre transforms as well. Look out for associated orthogonal polynomial (OP) transforms in a future release!
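The comment only names the technique, so here is a generic sketch of why triangular generalized eigenvalue problems are cheap, assuming A and B are upper triangular with distinct ratios λⱼ = A[j,j]/B[j,j]. The eigenvalues sit on the diagonals, and each column of the eigenvector matrix in A V = B V Λ follows by back-substitution, giving an upper-triangular V (presumably the role the connection matrix plays in this kind of method). `triangular_eigen` is a hypothetical name, and the matrices below are arbitrary placeholders, not the operators FastTransforms uses.

```julia
using LinearAlgebra

# Hypothetical generic sketch: solve A V = B V Λ for upper-triangular A and B whose diagonal
# ratios λ_j = A[j,j] / B[j,j] are distinct. V comes out upper triangular with unit diagonal.
function triangular_eigen(A::AbstractMatrix{T}, B::AbstractMatrix{T}) where {T<:AbstractFloat}
    n = size(A, 1)
    λ = [A[j, j] / B[j, j] for j in 1:n]
    V = zeros(T, n, n)
    for j in 1:n
        V[j, j] = one(T)                     # normalize each eigenvector
        for i in j-1:-1:1                    # back-substitute up column j
            s = zero(T)
            for k in i+1:j                   # with upper bandwidth b, only k ≤ i + b contribute
                s += (A[i, k] - λ[j] * B[i, k]) * V[k, j]
            end
            V[i, j] = -s / (A[i, i] - λ[j] * B[i, i])
        end
    end
    return λ, UpperTriangular(V)
end

A = [1.0 0.5 0.2; 0.0 2.0 0.3; 0.0 0.0 3.0]
B = [1.0 0.1 0.0; 0.0 1.0 0.1; 0.0 0.0 1.0]
λ, V = triangular_eigen(A, B)
println(norm(A * V - B * V * Diagonal(λ)))   # residual should be at rounding level
```

This sketch loops over the full triangle; when A and B are banded, restricting the inner sum to the band is what brings the cost of each column down to O(bandwidth) work per entry.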
I have come to learn a bit about building and providing binaries, and yet I've concluded that the user should always have the right to build from source. One compelling reason is that Travis-hosted (cross-)compilation may not turn on all the best optimization flags for one's personal computer: my Mac Pro with AVX-512 would never be able to use it! As well, notwithstanding the issue I've filed, BinaryBuilder.jl comes with compromises. It currently requires all of a binary library's dependencies to be installed as well, which is sub-optimal because a user does not need so many copies of, e.g., OpenBLAS (especially on macOS, which can use the system BLAS). Therefore, there will ultimately be three build strategies available to the user, selected by an environment variable: build from BinaryBuilder.jl (most reliable, and the default once this works), build from FastTransforms releases (fastest, but assumes the user's dependencies come from the same package managers), or build from source (best optimization). Since one of these already works with Travis CI, it's time to merge.
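A minimal sketch of how a `deps/build.jl` might dispatch on such an environment variable. The variable name `FT_BUILD_STRATEGY`, its three values, and `build_strategy` are all hypothetical, and the actual build steps for each branch are omitted.

```julia
# Hypothetical sketch for deps/build.jl: the variable name FT_BUILD_STRATEGY and its
# values are illustrative only; the real build steps for each branch are omitted.
function build_strategy(env = ENV)
    s = lowercase(get(env, "FT_BUILD_STRATEGY", "binarybuilder"))
    s == "binarybuilder" && return :binarybuilder  # most reliable; the default
    s == "releases" && return :releases            # fastest; assumes deps from the same package manager
    s == "source" && return :source                # best optimization for the local CPU
    error("unknown FT_BUILD_STRATEGY value: $s")
end

println(build_strategy(Dict("FT_BUILD_STRATEGY" => "source")))   # prints source
```

Taking the environment as an argument (defaulting to `ENV`) keeps the selection logic testable without touching the real process environment.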
This PR adds a Julia wrapper around the C library of the same name and removes the Julia code that it duplicates. The 26 exported transforms are … and they each create a parameterized `FTPlan`. The first 11 support standard and orthonormal normalizations via `Bool`s.
Surprisingly, the Linux and macOS builds succeed. This works by using BinaryProvider to query the version of one's Homebrew/apt gcc (via `detect_compiler_abi()`), then downloading the matching pre-compiled binary from the FastTransforms v0.2.6 release (https://github.com/MikaelSlevinsky/FastTransforms/releases/tag/v0.2.6). Windows support will have to be dropped to Tier 3, and all other platforms and non-x86_64 chips to Tier 4.
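The description doesn't spell out what the normalization `Bool`s change. As a hedged sketch for the Legendre side only, assuming the usual orthonormal Legendre polynomials on [−1, 1], P̃ₖ = √((2k + 1)/2) Pₖ, converting coefficients between the two conventions is a diagonal scaling. `legendre_scale`, `renormalize_legendre` and its flags are hypothetical, not the wrapper's API.

```julia
# Hypothetical helper (not the wrapper's API). Orthonormal Legendre polynomials on [-1, 1]
# are P̃_k = s_k P_k with s_k = sqrt((2k + 1)/2), so coefficient vectors differ by a diagonal scaling.
legendre_scale(::Type{T}, k::Integer) where {T<:AbstractFloat} = sqrt(T(2k + 1) / 2)

# Convert Legendre coefficients between normalizations, mimicking a pair of Bool flags.
function renormalize_legendre(c::AbstractVector{T}, from_ortho::Bool, to_ortho::Bool) where {T<:AbstractFloat}
    d = collect(c)
    for k in 0:length(c)-1
        s = legendre_scale(T, k)
        from_ortho && (d[k + 1] *= s)   # orthonormal -> standard: c_k = c̃_k * s_k
        to_ortho && (d[k + 1] /= s)     # standard -> orthonormal: c̃_k = c_k / s_k
    end
    return d
end

println(renormalize_legendre([1.0, 0.0], false, true))   # f(x) = 1 has orthonormal coefficients [√2, 0]
```

Because the scaling is diagonal, a transform can apply it on the way in or out at O(n) cost, which is presumably why it can be exposed as a simple flag.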
TODO: