random Singular Value Decomposition

Dec 13, 2024 · 5 min read

Let us consider a parameterized problem $\mathcal{P}$ we want to solve for different parameter values in a set $\mathcal{G}$.

In general, Reduced Basis Methods (RBMs) are decomposed into two stages:

  • an offline stage that generates a smaller subspace $X_h^N$ of dimension $N$,
  • and an online stage that finds an approximation for a new parameter $\mu \in \mathcal{G}$ in that subspace.

The offline stage is usually very costly but only executed once. We review below the random Singular Value Decomposition (rSVD). As seen in https://reducedbasis.github.io/post/offline/, it comes with lower computational costs than other offline methods. Here, we give additional details about this method.

Let us denote by $\mathbf{A} \in \mathbb{R}^{N_h \times Ntrain}$ the snapshot matrix (i.e. the matrix whose columns are the training solutions of $\mathcal{P}$ for different parameter values in $\mathcal{G}$), where $N_h$ is the number of degrees of freedom and $Ntrain$ is the number of snapshots.

Usually, we rely on a truncated SVD for the offline part of RBMs. Let us first quickly go over how it works:

The truncated SVD

SVD and POD are very close algorithms:

The singular value decomposition (SVD) is widely used in low-rank approximation. It consists in factorizing (in our context) the matrix $\mathbf{A} \in \mathbb{R}^{N_h \times Ntrain}$ under the form $\mathbf{A} = \mathbf{U} \Sigma \mathbf{V}^T$, where $\mathbf{U} \in \mathbb{R}^{N_h \times N_h}$ and $\mathbf{V} \in \mathbb{R}^{Ntrain \times Ntrain}$ are unitary matrices and $\Sigma = \mathrm{diag}(\sigma_1,\dots,\sigma_p) \in \mathbb{R}^{N_h \times Ntrain}$, with $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_p \geq 0$ and $p = \min(N_h, Ntrain)$.

The $(\sigma_i)_{i=1,\dots,p}$ are the singular values of $\mathbf{A}$, $\mathbf{U}=[\Phi_1,\dots,\Phi_{N_h}]$ is made up of the left singular vectors of $\mathbf{A}$, whereas the columns of $\mathbf{V}=[\Psi_1,\dots,\Psi_{Ntrain}]$ are its right singular vectors, such that: $\mathbf{A} \Psi_i=\sigma_i \Phi_i$.

The rank of a diagonal matrix is equal to the number of its nonzero diagonal entries, and rank($\mathbf{A}$) = rank($\Sigma$), so that if $\mathbf{A}$ has $r$ positive singular values, then rank($\mathbf{A}$) = $r$. We deduce that $\mathbf{A}$ can be written as the following sum:

$\mathbf{A} = \sum_{i=1}^r \sigma_i \Phi_i \Psi_i^T$.
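
As an illustration, here is a minimal NumPy sketch of this truncation (the sizes and the matrix `A` below are placeholders, not taken from a real problem):

```python
import numpy as np

# Hypothetical sizes: N_h degrees of freedom, Ntrain snapshots.
N_h, Ntrain = 1000, 50
A = np.random.rand(N_h, Ntrain)   # stands in for the snapshot matrix

# Thin SVD: A = U @ diag(s) @ Vt, with s sorted in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-r truncation: A_r = sum_{i=1}^r sigma_i Phi_i Psi_i^T.
r = 10
A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
```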

In the POD, $\mathbf{A}^T \mathbf{A} \in \mathbb{R}^{Ntrain \times Ntrain}$ is the correlation matrix and we use the fact that $\mathbf{A}^T \mathbf{A}= \mathbf{V} \Sigma^T \Sigma \mathbf{V}^T$. Solving this eigenvalue problem does not require much time if $Ntrain$ is small, although assembling the correlation matrix can be time consuming for a large number of degrees of freedom.
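
For comparison, a minimal sketch of this correlation-matrix (POD) route, reusing the matrix `A` and the rank `r` from the snippet above:

```python
# Method of snapshots: eigenvalue problem on the small Ntrain x Ntrain matrix.
C = A.T @ A                                         # correlation matrix
eigvals, eigvecs = np.linalg.eigh(C)                # eigh returns ascending order
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # reorder: largest first

sigma = np.sqrt(np.clip(eigvals, 0.0, None))        # singular values of A
Phi = (A @ eigvecs[:, :r]) / sigma[:r]              # first r POD modes = left singular vectors
```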

The dominant cost in the SVD on the snapshot matrix $\mathbf{A}$ is in $O (N_h \times Ntrain^2)$ flops.

Random SVD (rSVD) and fixed-rank approximation problem

The dominant cost in the random SVD on the snapshot matrix $\mathbf{A}$ is in $O((2q+2)\cdot N \cdot N_h \cdot Ntrain)$ flops if $N_h > Ntrain$, where $q$ is the number of power iterations (introduced below) and $N$ the number of modes.

The main idea is to construct a low-dimensional subspace that captures the action of the matrix $\mathbf{A}$.

To do so, we restrict $\mathbf{A}$ to that subspace, i.e. we project $\mathbf{A}$ onto it and compute a standard factorization (an SVD) there. Indeed, the reduced basis for the range of $\mathbf{A}$ is approximated by performing a truncated SVD on that smaller space, which is the key ingredient to lower the costs compared to other offline RB algorithms. Concretely, we look for a matrix $\mathbf{Q} \in \mathbb{R}^{N_h \times N}$ such that

$$\begin{equation*} \mathbf{A} \simeq \mathbf{Q} \mathbf{Q}^* \mathbf{A}, \end{equation*} $$

and such that $\mathbf{Q}^* \mathbf{Q}= \mathbf{I}$.

Indeed, setting $\mathbf{B}=\mathbf{Q}^* \mathbf{A}$, we have

$$ \begin{equation*} \mathbf{Q}^*\mathbf{A}-\mathbf{B}=0, \end{equation*} $$

$$ \begin{equation*} \mathbf{Q}^*(\mathbf{A}-\mathbf{Q}\mathbf{B})=0, \end{equation*} $$

$$ \begin{equation*} \mathbf{Q}^* (\mathbf{A}-\mathbf{Q} \mathbf{Q}^* \mathbf{A})=0. \end{equation*} $$

Therefore $\mathbf{Q} \mathbf{Q}^*$ acts like an orthogonal projector and $\mathbf{Q} \mathbf{Q}^* \mathbf{A}$ is the best approximation of $\mathbf{A}$ in the space spanned by the columns of $\mathbf{Q}$. In other words, $\mathbf{Q} \mathbf{Q}^* \mathbf{A}$ is the orthogonal projection of $\mathbf{A}$ onto the range of $\mathbf{Q}$ and thus,

$$\begin{equation*} \mathrm{col}(\mathbf{A}-\mathbf{Q} \mathbf{Q}^*\mathbf{A}) \perp \mathrm{Im}(\mathbf{Q}). \end{equation*}$$
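
A quick numerical sanity check of this orthogonality property (any $\mathbf{Q}$ with orthonormal columns will do; here one is built from a QR factorization of random vectors, reusing `A` from the snippets above):

```python
# Any matrix with N orthonormal columns.
N = 10
Q, _ = np.linalg.qr(np.random.rand(N_h, N))

residual = A - Q @ Q.T @ A               # part of A not captured by range(Q)
print(np.allclose(Q.T @ residual, 0.0))  # True: the residual is orthogonal to range(Q)
```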

Of course, we want the number of columns of $\mathbf{Q}$ to be as small as possible. This dimension $N$ is sometimes called the numerical rank of the matrix. Recall that the best error achievable by an approximation of rank at most $N$ is

$$ \begin{equation*} \underset{rank(\mathbf{X}) \leq N}{\mathrm{min}} \lVert \mathbf{A}-\mathbf{X} \rVert = \sigma_{N+1}, \end{equation*} $$

where $\lVert \cdot \rVert$ refers to the spectral norm and $\sigma_{N+1}$ is the $(N+1)$-th largest singular value of $\mathbf{A}$.
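
This can be checked numerically against the truncated SVD computed earlier (reusing `A`, `U`, `s`, `Vt` and `N` from the snippets above):

```python
# Best rank-N approximation of A (truncated SVD) and its spectral-norm error.
A_N = U[:, :N] @ np.diag(s[:N]) @ Vt[:N, :]
print(np.isclose(np.linalg.norm(A - A_N, 2), s[N]))  # error equals sigma_{N+1}
```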

Finally, a truncated SVD is performed on $\mathbf{B}=\mathbf{Q}^* \mathbf{A}$, which keeps only the first $N$ singular values.

The randomness of the method lies in the projection of $\mathbf{A}$: a random matrix $\Omega \in \mathbb{R}^{Ntrain \times N}$ is constructed whose columns form a linearly independent family of random vectors. The sketched columns of $\mathbf{A}\Omega$ are then orthonormalized to obtain an (approximate) orthonormal basis of the range of $\mathbf{A}$.

This leads to the following algorithm:

  • We first generate a random matrix (usually Gaussian) $\Omega \in \mathbb{R}^{Ntrain\times N}$.

  • Then $\mathbf{A}$ is projected onto a smaller space of dimension $N$ thanks to $\Omega$, where $N$ is the number of modes:

$$ \begin{equation*} \mathbf{Y}=\mathbf{A} \ \Omega. \end{equation*}$$

If the singular spectrum of the input matrix decays slowly, we proceed with an orthonormalized power iteration process (the orthonormalization avoids round-off errors at each step). If the number of iterations $q$ is small (1 or 2), this orthonormalization is not required. At each step, we thus do

$$\begin{equation*} \mathbf{Y}=\mathbf{A} \ \mathbf{A}^T \mathbf{Y}, \end{equation*} $$

and we orthonormalize $\mathbf{Y}$ if $q>2$.
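
A minimal sketch of these two steps (random sketching, then the optional power iterations), reusing `A`, `Ntrain` and `N` from the snippets above, with `q` the number of power iterations:

```python
q = 2                                  # number of power iterations

# Random Gaussian test matrix and sketch of the range of A.
Omega = np.random.randn(Ntrain, N)
Y = A @ Omega                          # Y in R^{N_h x N}

# Power iterations, with re-orthonormalization to limit round-off errors.
for _ in range(q):
    Y = A @ (A.T @ Y)
    if q > 2:
        Y, _ = np.linalg.qr(Y)
```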

With a QR orthonormalization of $\mathbf{Y}$, i.e. $\mathbf{Y} = \mathbf{Q}\mathbf{R}$, we construct the reduced matrix $\mathbf{B} \in \mathbb{R}^{N \times Ntrain} $ such that

$$ \mathbf{B}= \mathbf{Q}^T \mathbf{A}. $$

Then we perform a truncated SVD on $\mathbf{B}$ to generate the reduced basis of $\mathbf{A}$, denoted by $\mathbf{U}$:

$$ \begin{equation*} \mathbf{U}=\mathbf{Q} \ \tilde{\mathbf{U}}, \end{equation*}$$

where $\tilde{\mathbf{U}}$ contains the left singular vectors of $\mathbf{B}$.
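
Putting all the steps together, a possible sketch of the whole rSVD offline stage (the function name and default values below are illustrative only):

```python
def rsvd(A, N, q=2):
    """Sketch of a randomized SVD returning an N-dimensional reduced basis of A."""
    N_h, Ntrain = A.shape

    # 1) Random sketch of the range of A.
    Omega = np.random.randn(Ntrain, N)
    Y = A @ Omega

    # 2) Optional power iterations (re-orthonormalized if q is large).
    for _ in range(q):
        Y = A @ (A.T @ Y)
        if q > 2:
            Y, _ = np.linalg.qr(Y)

    # 3) Orthonormal basis Q of the sketch and reduced matrix B = Q^T A.
    Q, _ = np.linalg.qr(Y)
    B = Q.T @ A                               # B in R^{N x Ntrain}

    # 4) SVD of the small matrix B, lifted back through Q.
    U_tilde, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Q @ U_tilde, s, Vt                 # reduced basis U = Q @ U_tilde
```

For instance, `U, s, Vt = rsvd(A, N=10)` would return the reduced basis associated with $N = 10$ modes.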

The oversampling effect is detailed in Paragraph 1.3.2 of the paper “Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions” by Halko, Martinsson, and Tropp.
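
In practice, oversampling means drawing a few extra random columns, $N + p$ with a small $p$, and truncating back to $N$ modes at the very end; a hedged sketch of how the first step changes (reusing the notation of the snippets above; the value of `p` is only indicative):

```python
p = 5                                    # small oversampling parameter
Omega = np.random.randn(Ntrain, N + p)   # N + p random vectors instead of N
# ... run the algorithm as before, then keep only the first N singular triplets.
```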

References