
Square Kronecker products and sums #183


Merged
dkarrasch merged 5 commits from dk/kron into master on Jun 27, 2022
Conversation

dkarrasch
Member

In the documentation of the (Python) fastmat package I found a reference to a paper that considers an alternative representation of Kronecker products and sums (of square factors and summands) and proposes a corresponding algorithm. The algorithm seems to beat our current algorithms only in a few cases, but the alternative representation looks very promising, especially for more than two factors/summands. In the square case, it turns out that one can represent Kronecker products/sums as products/sums of "normal factors", i.e., factors of the form I ⊗ Ai ⊗ I. Since we already have pretty good infrastructure for representing CompositeMaps and LinearCombinations, let's build on that. And there you go: it reduces both allocations and runtime!

For the time being, the current and the new representations are kept separate (kron vs. squarekron, kronsum vs. sumkronsum) and can therefore be easily benchmarked against each other by checking out this PR branch. So I'd like to solicit benchmark results from our power users. Once we have an idea of what this does to real code, we can think about changing defaults, e.g., have kronsum of two arguments return a KroneckerSumMap, but kronsum of three or more arguments return a LinearCombination of KroneckerMaps. My local benchmarks suggest that this could be a sane default, but they may be overly simple and may not reflect practical use cases.
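
(For reference, a minimal dense check of the identity behind the "normal factor" representation; this uses only LinearAlgebra, and the small sizes are arbitrary.)

using LinearAlgebra
A, B, C = randn(3, 3), randn(4, 4), randn(2, 2)
IA, IB, IC = Matrix(I, 3, 3), Matrix(I, 4, 4), Matrix(I, 2, 2)
# two square factors: A ⊗ B = (A ⊗ I)(I ⊗ B) by the mixed-product property
@assert kron(A, B) ≈ kron(A, IB) * kron(IA, B)
# three square factors: A ⊗ B ⊗ C = (A ⊗ I ⊗ I)(I ⊗ B ⊗ I)(I ⊗ I ⊗ C)
@assert kron(kron(A, B), C) ≈ kron(A, kron(IB, IC)) * kron(kron(IA, B), IC) * kron(kron(IA, IB), C)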

@dkarrasch
Member Author

Pinging @JeffFessler and @MKAbdElrahman, who may have interesting use cases, IIRC.

@codecov

codecov bot commented Jun 21, 2022

Codecov Report

Merging #183 (12a4939) into master (10b6901) will increase coverage by 0.03%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #183      +/-   ##
==========================================
+ Coverage   98.60%   98.63%   +0.03%     
==========================================
  Files          17       17              
  Lines        1360     1396      +36     
==========================================
+ Hits         1341     1377      +36     
  Misses         19       19              
Impacted Files       Coverage Δ
src/LinearMaps.jl    100.00% <100.00%> (ø)
src/kronecker.jl      98.79% <100.00%> (+0.32%) ⬆️
src/left.jl          100.00% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@JeffFessler
Member

Good to know!
But I don't really have interesting use cases for the Kronecker sum. The case I teach in my course uses the Kronecker sum for finding common roots of two polynomials, but the degrees are small, so dense matrices are fine there.
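
(For illustration only, not the actual course material: a minimal dense sketch of that use case, with polynomials and companion matrices chosen ad hoc. A shared root shows up as a zero eigenvalue of the Kronecker sum of one companion matrix and the negated other, since the eigenvalues of a Kronecker sum are the pairwise sums of the factors' eigenvalues.)

using LinearMaps, LinearAlgebra
# p(x) = (x - 1)(x - 2) = x^2 - 3x + 2 and q(x) = (x - 1)(x - 3) = x^2 - 4x + 3 share the root 1
Cp = [0 -2; 1 3]        # companion matrix of p, eigenvalues 1 and 2
Cq = [0 -3; 1 4]        # companion matrix of q, eigenvalues 1 and 3
S = kronsum(Cp, -Cq)    # eigenvalues are all differences λᵢ - μⱼ
@assert minimum(abs, eigvals(Matrix(S))) < 1e-10   # a zero eigenvalue ⇒ p and q have a common root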

@dkarrasch
Member Author

I was hoping you would use kron with square operators. If that were the case, you could try constructing the operator with squarekron and see what happens. I just have no idea what kind of problems people solve out there. I found the following benchmark problem in Kronecker.jl and adapted it to LinearMaps.jl:

using LinearMaps, LinearAlgebra, BenchmarkTools, StaticArrays
As = [randn(2,2) for i in 1:10]    # ten square 2×2 factors
x = randn(2^10)
K   = kron(LinearMap.(As)...)      # current Kronecker-product representation
K2  = squarekron(As...)            # new product-of-normal-factors representation
KS  = kronsum(As...)               # current Kronecker-sum representation
KS2 = sumkronsum(As...)            # new sum-of-normal-factors representation
y = K*x
@btime mul!($y, $K, $x);
@btime mul!($y, $K2, $x);
@btime mul!($y, $KS, $x);
@btime mul!($y, $KS2, $x);

gives

359.028 μs (1387 allocations: 223.53 KiB)   # K   (kron)
 68.455 μs (90 allocations: 27.50 KiB)      # K2  (squarekron)
332.506 μs (3210 allocations: 5.50 MiB)     # KS  (kronsum)
 64.878 μs (57 allocations: 10.62 KiB)      # KS2 (sumkronsum)

@JeffFessler JeffFessler mentioned this pull request Jun 22, 2022
@JeffFessler
Member

The most famous kron in signal processing is probably the fact that the 2D FFT of an image is the kron of two 1D FFTs. I started to work on it but hit #185.
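
(Side note: with small dense DFT matrices, the identity reads vec(fft(X)) = (F_N ⊗ F_M) vec(X) for an M×N image X; the sizes below are arbitrary, and no LinearMaps machinery is involved.)

using FFTW, LinearAlgebra
M, N = 4, 3
FM = fft(Matrix{ComplexF64}(I, M, M), 1)   # M-point DFT matrix (column j = DFT of the j-th unit vector)
FN = fft(Matrix{ComplexF64}(I, N, N), 1)   # N-point DFT matrix
X = rand(ComplexF64, M, N)
@assert kron(FN, FM) * vec(X) ≈ vec(fft(X))   # 2D FFT as a Kronecker product of 1D DFTs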

@JeffFessler
Member

Here's my current attempt at a test.
Unfortunately, the last line throws a MethodError inside plan_fft that perplexes me.
Something inside kron might be producing a SubArray type that FFTW does not seem to like. (I am testing with the released v3.7.0 of LinearMaps.jl, btw.)
This is my only idea for a Kronecker product of square, non-identity matrices.

using FFTW: fft, bfft
using LinearMaps

M, N = 64, 32 # image size
T = ComplexF64
x = rand(T, M, N)
A1 = LinearMap{T}(x -> vec(fft(reshape(x, M, N))), M*N, M*N)  # 2D FFT as a single map
y1 = reshape(A1 * vec(x), M, N)
@assert y1 ≈ fft(x)
Ax = LinearMap{T}(fft, bfft, M, M)   # 1D FFT along the first dimension
Ay = LinearMap{T}(fft, bfft, N, N)   # 1D FFT along the second dimension
y0 = Ax * x * transpose(Ay) # 2D FFT via two 1D FFTs
y0 = Matrix(y0)
@assert y0 ≈ fft(x)
Ak = kron(Ay, Ax)
tmp = Ak * vec(x) # fails with method error

@dkarrasch dkarrasch force-pushed the dk/kron branch 2 times, most recently from 9d641fb to 3850953 on June 24, 2022
@dkarrasch
Member Author

Yes, the SubArray issue arises because we use the vec-trick, which requires us to compute the 1D FFT on the columns of the reshaped vector. I have another algorithm (locally) that avoids that by allocating one vector and copyto!ing the columns to it. Moreover, I started to work on a little caching mechanism specifically for this situation, but I need more benchmarking to see if it's worth it.

I think I'll merge this and release. For the time being, these are just alternative constructors, and they come with a recommendation to benchmark. If at some point we realize that it's always advantageous to represent a Kronecker sum with two summands as a KroneckerSumMap, but one with 3 or more as a LinearCombination, we can simply shuffle the internal construction code. As for Kronecker products, the two approaches will need to stand side by side, because the new approach only works for square maps/matrices.

@dkarrasch
Member Author

For later reference, here is the algorithm from the paper referred to in the docs. Since it copies each (possibly strided) slice into a contiguous work vector before multiplying, it works for the FFT example. The "problem" with that example is that the 2D FFT is already well taken care of by FFTW.jl itself, so the performance comparison is, of course, horrible. 🥲

function anothermul!(y::AbstractVecOrMat, L::LinearMaps.KroneckerMap, x::AbstractVector)
    @assert all(LinearMaps._issquare, L.maps)  # the normal-factor trick requires square factors
    copyto!(y, x)
    N = length(L.maps)
    nleft = prod(A -> size(A, 1), LinearMaps._front(L.maps))
    nright = 1
    # apply the normal factors I ⊗ Aᵢ ⊗ I in place, from the last factor to the first
    @inbounds for i in N:-1:1
        Ai = L.maps[i]
        ni = size(Ai, 1)
        zi = similar(x, ni)   # contiguous work vector for the current slice
        base = 0
        jump = ni * nright
        indices = range(0; step=nright, length=ni)
        for _ in 1:nleft
            yindices = base .+ indices
            for j in 1:nright
                yindices = yindices .+ 1
                yi = view(y, yindices)
                copyto!(zi, yi)                       # gather the (strided) slice
                LinearMaps._unsafe_mul!(yi, Ai, zi)   # overwrite it with Aᵢ * slice
            end
            base += jump
        end
        nleft = i > 1 ? nleft ÷ size(L.maps[i-1], 1) : nleft
        nright = jump
    end
    return y
end

It sometimes happens to be faster than our current algorithm, but it targets the square case for which we now have the squarekron constructor.
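
(A quick sanity check of the sketch above against the dense result; the small factors here are chosen arbitrarily.)

using LinearMaps, LinearAlgebra
As = [randn(3, 3) for _ in 1:3]
K = kron(LinearMap.(As)...)     # standard KroneckerMap of three square factors
x = randn(27)
y = similar(x)
anothermul!(y, K, x)
@assert y ≈ Matrix(K) * x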

@dkarrasch dkarrasch changed the title from "NFC: Square Kronecker products and sums" to "Square Kronecker products and sums" on Jun 27, 2022
@dkarrasch dkarrasch merged commit e39cbc1 into master Jun 27, 2022