Support clenshaw! with any DenseColumnMajor blas vector #113
Trying to reactivate coverage
The coefficients
So change the
I think so. This is actually really useful because the Julia API will let me fill in the C code incrementally
We should probably try multithreading
Is threading naively turned on in Julia? I thought you needed to use
src/libfasttransforms.jl
ccall((:ft_clenshawf, libfasttransforms), Cvoid, (Cint, Ptr{Float32}, Cint, Cint, Ptr{Float32}, Ptr{Float32}), length(c), c, 1, length(x), x, f)
function _clenshaw!(::AbstractStridedLayout, ::AbstractColumnMajor, ::AbstractColumnMajor, c::AbstractVector{Float32}, x::AbstractVector{Float32}, f::AbstractVector{Float32})
@boundscheck check_clenshaw_points(x, f)
ccall((:ft_clenshawf, libfasttransforms), Cvoid, (Cint, Ptr{Float32}, Cint, Cint, Ptr{Float32}, Ptr{Float32}), length(c), stride(c,1), 1, length(x), x, f)
I think this changed c instead of 1
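The point of this review comment can be illustrated with a small pure-Julia sketch. It assumes that the third argument of `ft_clenshawf` is the increment between consecutive coefficients, which is inferred from the literal `1` in the original call and not checked against the C header. `clenshaw_strided` is a hypothetical helper written for this illustration; it is not part of FastTransforms.jl.

```julia
# Hypothetical helper: Chebyshev (first-kind) Clenshaw evaluation that reads
# n coefficients from `buf`, starting at index 1 and stepping by `incc`,
# mimicking a C routine that takes (n, pointer, increment).
function clenshaw_strided(n::Integer, buf::AbstractVector{T}, incc::Integer, x::T) where {T<:Real}
    bk1 = zero(T)  # b_{k+1} in the recurrence
    bk2 = zero(T)  # b_{k+2} in the recurrence
    for k in n:-1:2
        ck = buf[1 + (k - 1) * incc]
        bk1, bk2 = ck + 2x * bk1 - bk2, bk1
    end
    return buf[1] + x * bk1 - bk2
end

parent_buf = Float32[1, 10, 2, 20, 3, 30]  # coefficients 1, 2, 3 interleaved with other data
c = view(parent_buf, 1:2:5)                # strided view of the coefficients
x = 0.5f0

# The data argument stays the coefficient buffer; only the increment becomes stride(c, 1).
strided = clenshaw_strided(length(c), parent_buf, stride(c, 1), x)
dense   = clenshaw_strided(length(c), collect(c), 1, x)
```

Under that assumption, the corrected call would keep `c` in the pointer slot and pass `stride(c,1)` where the `1` used to be.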
Codecov Report
@@ Coverage Diff @@
## master #113 +/- ##
==========================================
+ Coverage 78.32% 79.25% +0.92%
==========================================
Files 12 13 +1
Lines 1352 1417 +65
==========================================
+ Hits 1059 1123 +64
- Misses 293 294 +1
Yes that's what I mean, add
The C clenshaw isn't threaded. I thought an O(mn) method might best be threadsafe until further consideration. If the Julia code is faster for small point sets it's probably because of higher throughput for lower vectorization levels. Probably need about 16 - 32 points to start to see the difference? I've only been concerned with m = n > 1024, say.
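For reference, a minimal sketch of what threading the O(mn) evaluation over points could look like on the Julia side. This is not the code in this PR: `clenshaw_point` and `clenshaw_threaded!` are hypothetical names, and the series is assumed to be a first-kind Chebyshev expansion. Julia only runs the loop on more than one thread when started with several threads, for example via the `JULIA_NUM_THREADS` environment variable.

```julia
using Base.Threads: @threads

# Evaluate sum_{k=0}^{n-1} c[k+1] * T_k(x) at a single point by Clenshaw's recurrence.
function clenshaw_point(c::AbstractVector{T}, x::T) where {T<:Real}
    n = length(c)
    n == 0 && return zero(T)
    bk1 = zero(T)  # b_{k+1}
    bk2 = zero(T)  # b_{k+2}
    for k in n:-1:2
        bk1, bk2 = c[k] + 2x * bk1 - bk2, bk1
    end
    return c[1] + x * bk1 - bk2
end

# Hypothetical threaded driver: each point is independent, so iterations only
# write to their own f[j] and share no other mutable state.
function clenshaw_threaded!(f::AbstractVector{T}, c::AbstractVector{T}, x::AbstractVector{T}) where {T<:Real}
    length(f) == length(x) || throw(DimensionMismatch("f and x must have equal length"))
    @threads for j in eachindex(x, f)
        f[j] = clenshaw_point(c, x[j])
    end
    return f
end

c = [1.0, -0.5, 0.25, 2.0]
x = collect(range(-1.0, 1.0; length = 33))
f = clenshaw_threaded!(similar(x), c, x)
```

Whether this beats a vectorized single-threaded C loop depends on the number of points and coefficients, so it would need benchmarking at the sizes discussed above.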
OK, feel free to merge when you're happy
Got OrthogonalPolynomialsQuasi.jl to call
julia> T = Chebyshev();
julia> c = randn(10_000_000);
julia> u = T * [c; zeros(∞)];
julia> x = rand(1000);
julia> @time u[x]
1.587686 seconds (3 allocations: 8.000 KiB)
1000-element Array{Float64,1}:
-2312.029629930109
-2390.9279613489657
-1354.6587225463163
626.9532040037507
-3543.7729254722767
-2190.2464546247734
-2572.0111713445567
-746.4599775088337
514.8473134322153
1364.2562428541896
1155.2279350358774
-562.0256249035326
-5933.812790545215
-244.4875261855841
690.8312153482333
⋮
-2944.8625897604925
273.0743333139484
-2318.133247373495
808.2158822165528
-1636.3031403509444
-2500.700269418508
-881.2745832074958
-4010.298262876177
1400.478600291998
-724.9251230992776
103.64841191004098
-990.9982621498198
-1738.304836473942
-808.4222444061252
No description provided.