module
Benchmark of dense decompositionsThis page presents a speed comparison of the dense matrix decompositions offered by Eigen for a wide range of square matrices and overconstrained problems.
For a more general overview on the features and numerical robustness of linear solvers and decompositions, check this table.
This benchmark has been run on a laptop equipped with an Intel core i7 @ 2,6 GHz, and compiled with clang with AVX and FMA instruction sets enabled but without multi-threading. It uses single precision float numbers. For double, you can get a good estimate by multiplying the timings by a factor 2.
The square matrices are symmetric, and for the overconstrained matrices, the reported timmings include the cost to compute the symmetric covariance matrix for the first four solvers based on Cholesky and LU, as denoted by the * symbol (top-right corner part of the table). Timings are in milliseconds, and factors are relative to the LLT decomposition which is the fastest but also the least general and robust.
solver/size | 8x8 | 100x100 | 1000x1000 | 4000x4000 | 10000x8 | 10000x100 | 10000x1000 | 10000x4000 |
---|---|---|---|---|---|---|---|---|
LLT | 0.05 | 0.42 | 5.83 | 374.55 | 6.79 | 30.15 | 236.34 | 3847.17 |
LDLT | 0.07 (x1.3) | 0.65 (x1.5) | 26.86 (x4.6) | 2361.18 (x6.3) | 6.81 (x1) | 31.91 (x1.1) | 252.61 (x1.1) | 5807.66 (x1.5) |
PartialPivLU | 0.08 (x1.5) | 0.69 (x1.6) | 15.63 (x2.7) | 709.32 (x1.9) | 6.81 (x1) | 31.32 (x1) | 241.68 (x1) | 4270.48 (x1.1) |
FullPivLU | 0.1 (x1.9) | 4.48 (x10.6) | 281.33 (x48.2) | - | 6.83 (x1) | 32.67 (x1.1) | 498.25 (x2.1) | - |
HouseholderQR | 0.19 (x3.5) | 2.18 (x5.2) | 23.42 (x4) | 1337.52 (x3.6) | 34.26 (x5) | 129.01 (x4.3) | 377.37 (x1.6) | 4839.1 (x1.3) |
ColPivHouseholderQR | 0.23 (x4.3) | 2.23 (x5.3) | 103.34 (x17.7) | 9987.16 (x26.7) | 36.05 (x5.3) | 163.18 (x5.4) | 2354.08 (x10) | 37860.5 (x9.8) |
CompleteOrthogonalDecomposition | 0.23 (x4.3) | 2.22 (x5.2) | 99.44 (x17.1) | 10555.3 (x28.2) | 35.75 (x5.3) | 169.39 (x5.6) | 2150.56 (x9.1) | 36981.8 (x9.6) |
FullPivHouseholderQR | 0.23 (x4.3) | 4.64 (x11) | 289.1 (x49.6) | - | 69.38 (x10.2) | 446.73 (x14.8) | 4852.12 (x20.5) | - |
JacobiSVD | 1.01 (x18.6) | 71.43 (x168.4) | - | - | 113.81 (x16.7) | 1179.66 (x39.1) | - | - |
BDCSVD | 1.07 (x19.7) | 21.83 (x51.5) | 331.77 (x56.9) | 18587.9 (x49.6) | 110.53 (x16.3) | 397.67 (x13.2) | 2975 (x12.6) | 48593.2 (x12.6) |
*: This decomposition do not support direct least-square solving for over-constrained problems, and the reported timing include the cost to form the symmetric covariance matrix .
Observations:
- LLT is always the fastest solvers.
- For largely over-constrained problems, the cost of Cholesky/LU decompositions is dominated by the computation of the symmetric covariance matrix.
- For large problem sizes, only the decomposition implementing a cache-friendly blocking strategy scale well. Those include LLT, PartialPivLU, HouseholderQR, and BDCSVD. This explain why for a 4k x 4k matrix, HouseholderQR is faster than LDLT. In the future, LDLT and ColPivHouseholderQR will also implement blocking strategies.
- CompleteOrthogonalDecomposition is based on ColPivHouseholderQR and they thus achieve the same level of performance.
The above table has been generated by the bench/dense_