-
Notifications
You must be signed in to change notification settings - Fork 42
Square Kronecker products and sums #183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pinging @JeffFessler, @MKAbdElrahman who may have interesting use cases, IIRC. |
Codecov Report
@@ Coverage Diff @@
## master #183 +/- ##
==========================================
+ Coverage 98.60% 98.63% +0.03%
==========================================
Files 17 17
Lines 1360 1396 +36
==========================================
+ Hits 1341 1377 +36
Misses 19 19
Continue to review full report at Codecov.
|
Good to know! |
I was hoping you would use using LinearMaps, LinearAlgebra, BenchmarkTools, StaticArrays
As = [randn(2,2) for i in 1:10]
x = randn(2^10)
K = kron(LinearMap.(As)...)
K2 = squarekron(As...)
KS = kronsum(As...)
KS2 = sumkronsum(As...)
y = K*x
@btime mul!($y, $K, $x);
@btime mul!($y, $K2, $x);
@btime mul!($y, $KS, $x);
@btime mul!($y, $KS2, $x); gives 359.028 μs (1387 allocations: 223.53 KiB)
68.455 μs (90 allocations: 27.50 KiB)
332.506 μs (3210 allocations: 5.50 MiB)
64.878 μs (57 allocations: 10.62 KiB) |
The most famous kron in signal processing is probably the fact that the 2D FFT of an image is the kron of two 1D FFTs. I started to work on it but hit #185. |
Here's my current attempt to provide a test here. using FFTW: fft, bfft
using LinearMaps
M,N = 64, 32 # image size
T = ComplexF64
x = rand(T,M,N)
A1 = LinearMap{T}(x -> vec(fft(reshape(x, M,N))), M*N, M*N)
y1 = reshape(A1 * vec(x), M, N)
@assert y1 ≈ fft(x)
Ax = LinearMap{T}(fft, bfft, M, M)
Ay = LinearMap{T}(fft, bfft, N, N)
y0 = Ax * x * transpose(Ay) # 2D FFT via two 1D FFTs
y0 = Matrix(y0)
@assert y0 ≈ fft(x)
Ak = kron(Ay, Ax)
tmp = Ak * vec(x) # fails with method error |
9d641fb
to
3850953
Compare
Yes, the |
For later reference, here is the algorithm from that paper referred to in the docs. Since it copies the current column to a strided vector, it works for the FFT example. The "problem" with that example is that the 2d fft is well taken care of the FFTW.jl itself, so a performance comparison is, of course, horrible. 🥲 function anothermul!(y::AbstractVecOrMat, L::LinearMaps.KroneckerMap, x::AbstractVector)
all(LinearMaps._issquare, L.maps)
copyto!(y, x)
N = length(L.maps)
nleft = prod(A -> size(A, 1), LinearMaps._front(L.maps))
nright = 1
@inbounds for i in N:-1:1
Ai = L.maps[i]
ni = size(Ai, 1)
zi = similar(x, ni)
base = 0
jump = ni * nright
indices = range(0; step=nright, length=ni)
for _ in 1:nleft
yindices = base .+ indices
for j in 1:nright
yindices = yindices .+ 1
yi = view(y, yindices)
copyto!(zi, yi)
LinearMaps._unsafe_mul!(yi, Ai, zi)
end
base += jump
end
nleft = i > 1 ? nleft ÷ size(L.maps[i-1],1) : nleft
nright = jump
end
return y
end It sometimes happens to be faster than our current algorithm, but it targets the square case for which we now have the |
In the documentation of the (Python) fastmat package I found a reference to a paper that considers an alternative representation of Kronecker products and sums (of square factors and summands) and proposes an algorithm. The algorithm seems to be better than our current algorithms only in few cases, but the alternative representation seems to be very very good, especially for more than two factors/summands. In the square case, it turns out that one can represent Kronecker products/sums as products/sums of "normal factors", that are of the kind
I ⊗ Ai ⊗ I
. However, we do have pretty good infrastructure to representCompositeMap
s andLinearCombination
s, so let's try that. And there you go, it reduces allocations and runtime! For the time being, the current and the new representations are separated (kron
vs.squarekron
,kronsum
vs.sumkronsum
) and can therefore be easily benchmarked against each other by checking out this PR branch. So, I'd like to solicite benchmark results from our power users. Once we have an idea of what this does to real code, we can think about changing defaults, like havekronsum
of two arguments return aKroneckerSumMap
, butkronsum
of three or more arguments return aLinearCombination{KroneckerMaps}
. My local benchmarks suggest that this could be a sane default, but they may be stupidly simple, and may not reflect practical use cases.