version 1.6, 2011/07/22 07:38:04
|
version 1.7, 2011/11/21 20:42:50
|
Line 1
|
Line 1
|
|
*> \brief \b DGEJSV |
|
* |
|
* =========== DOCUMENTATION =========== |
|
* |
|
* Online html documentation available at |
|
* http://www.netlib.org/lapack/explore-html/ |
|
* |
|
*> \htmlonly |
|
*> Download DGEJSV + dependencies |
|
*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.tgz?format=tgz&filename=/lapack/lapack_routine/dgejsv.f"> |
|
*> [TGZ]</a> |
|
*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.zip?format=zip&filename=/lapack/lapack_routine/dgejsv.f"> |
|
*> [ZIP]</a> |
|
*> <a href="http://www.netlib.org/cgi-bin/netlibfiles.txt?format=txt&filename=/lapack/lapack_routine/dgejsv.f"> |
|
*> [TXT]</a> |
|
*> \endhtmlonly |
|
* |
|
* Definition: |
|
* =========== |
|
* |
|
* SUBROUTINE DGEJSV( JOBA, JOBU, JOBV, JOBR, JOBT, JOBP, |
|
* M, N, A, LDA, SVA, U, LDU, V, LDV, |
|
* WORK, LWORK, IWORK, INFO ) |
|
* |
|
* .. Scalar Arguments .. |
|
* IMPLICIT NONE |
|
* INTEGER INFO, LDA, LDU, LDV, LWORK, M, N |
|
* .. |
|
* .. Array Arguments .. |
|
* DOUBLE PRECISION A( LDA, * ), SVA( N ), U( LDU, * ), V( LDV, * ), |
|
* $ WORK( LWORK ) |
|
* INTEGER IWORK( * ) |
|
* CHARACTER*1 JOBA, JOBP, JOBR, JOBT, JOBU, JOBV |
|
* .. |
|
* |
|
* |
|
*> \par Purpose: |
|
* ============= |
|
*> |
|
*> \verbatim |
|
*> |
|
*> DGEJSV computes the singular value decomposition (SVD) of a real M-by-N |
|
*> matrix [A], where M >= N. The SVD of [A] is written as |
|
*> |
|
*> [A] = [U] * [SIGMA] * [V]^t, |
|
*> |
|
*> where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its N |
|
*> diagonal elements, [U] is an M-by-N (or M-by-M) orthonormal matrix, and |
|
*> [V] is an N-by-N orthogonal matrix. The diagonal elements of [SIGMA] are |
|
*> the singular values of [A]. The columns of [U] and [V] are the left and |
|
*> the right singular vectors of [A], respectively. The matrices [U] and [V] |
|
*> are computed and stored in the arrays U and V, respectively. The diagonal |
|
*> of [SIGMA] is computed and stored in the array SVA. |
|
*> \endverbatim |
|
* |
|
* Arguments: |
|
* ========== |
|
* |
|
*> \param[in] JOBA |
|
*> \verbatim |
|
*> JOBA is CHARACTER*1 |
|
*> Specifies the level of accuracy: |
|
*> = 'C': This option works well (high relative accuracy) if A = B * D, |
|
*> with well-conditioned B and arbitrary diagonal matrix D. |
|
*> The accuracy cannot be spoiled by COLUMN scaling. The |
|
*> accuracy of the computed output depends on the condition of |
|
*> B, and the procedure aims at the best theoretical accuracy. |
|
*> The relative error max_{i=1:N}|d sigma_i| / sigma_i is |
|
*> bounded by f(M,N)*epsilon* cond(B), independent of D. |
|
*> The input matrix is preprocessed with the QRF with column |
|
*> pivoting. This initial preprocessing and preconditioning by |
|
*> a rank revealing QR factorization is common for all values of |
|
*> JOBA. Additional actions are specified as follows: |
|
*> = 'E': Computation as with 'C' with an additional estimate of the |
|
*> condition number of B. It provides a realistic error bound. |
|
*> = 'F': If A = D1 * C * D2 with ill-conditioned diagonal scalings |
|
*> D1, D2, and well-conditioned matrix C, this option gives |
|
*> higher accuracy than the 'C' option. If the structure of the |
|
*> input matrix is not known, and relative accuracy is |
|
*> desirable, then this option is advisable. The input matrix A |
|
*> is preprocessed with QR factorization with FULL (row and |
|
*> column) pivoting. |
|
*> = 'G' Computation as with 'F' with an additional estimate of the |
|
*> condition number of B, where A=D*B. If A has heavily weighted |
|
*> rows, then using this condition number gives too pessimistic |
|
*> error bound. |
|
*> = 'A': Small singular values are the noise and the matrix is treated |
|
*> as numerically rank defficient. The error in the computed |
|
*> singular values is bounded by f(m,n)*epsilon*||A||. |
|
*> The computed SVD A = U * S * V^t restores A up to |
|
*> f(m,n)*epsilon*||A||. |
|
*> This gives the procedure the licence to discard (set to zero) |
|
*> all singular values below N*epsilon*||A||. |
|
*> = 'R': Similar as in 'A'. Rank revealing property of the initial |
|
*> QR factorization is used do reveal (using triangular factor) |
|
*> a gap sigma_{r+1} < epsilon * sigma_r in which case the |
|
*> numerical RANK is declared to be r. The SVD is computed with |
|
*> absolute error bounds, but more accurately than with 'A'. |
|
*> \endverbatim |
|
*> |
|
*> \param[in] JOBU |
|
*> \verbatim |
|
*> JOBU is CHARACTER*1 |
|
*> Specifies whether to compute the columns of U: |
|
*> = 'U': N columns of U are returned in the array U. |
|
*> = 'F': full set of M left sing. vectors is returned in the array U. |
|
*> = 'W': U may be used as workspace of length M*N. See the description |
|
*> of U. |
|
*> = 'N': U is not computed. |
|
*> \endverbatim |
|
*> |
|
*> \param[in] JOBV |
|
*> \verbatim |
|
*> JOBV is CHARACTER*1 |
|
*> Specifies whether to compute the matrix V: |
|
*> = 'V': N columns of V are returned in the array V; Jacobi rotations |
|
*> are not explicitly accumulated. |
|
*> = 'J': N columns of V are returned in the array V, but they are |
|
*> computed as the product of Jacobi rotations. This option is |
|
*> allowed only if JOBU .NE. 'N', i.e. in computing the full SVD. |
|
*> = 'W': V may be used as workspace of length N*N. See the description |
|
*> of V. |
|
*> = 'N': V is not computed. |
|
*> \endverbatim |
|
*> |
|
*> \param[in] JOBR |
|
*> \verbatim |
|
*> JOBR is CHARACTER*1 |
|
*> Specifies the RANGE for the singular values. Issues the licence to |
|
*> set to zero small positive singular values if they are outside |
|
*> specified range. If A .NE. 0 is scaled so that the largest singular |
|
*> value of c*A is around DSQRT(BIG), BIG=SLAMCH('O'), then JOBR issues |
|
*> the licence to kill columns of A whose norm in c*A is less than |
|
*> DSQRT(SFMIN) (for JOBR.EQ.'R'), or less than SMALL=SFMIN/EPSLN, |
|
*> where SFMIN=SLAMCH('S'), EPSLN=SLAMCH('E'). |
|
*> = 'N': Do not kill small columns of c*A. This option assumes that |
|
*> BLAS and QR factorizations and triangular solvers are |
|
*> implemented to work in that range. If the condition of A |
|
*> is greater than BIG, use DGESVJ. |
|
*> = 'R': RESTRICTED range for sigma(c*A) is [DSQRT(SFMIN), DSQRT(BIG)] |
|
*> (roughly, as described above). This option is recommended. |
|
*> ~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
*> For computing the singular values in the FULL range [SFMIN,BIG] |
|
*> use DGESVJ. |
|
*> \endverbatim |
|
*> |
|
*> \param[in] JOBT |
|
*> \verbatim |
|
*> JOBT is CHARACTER*1 |
|
*> If the matrix is square then the procedure may determine to use |
|
*> transposed A if A^t seems to be better with respect to convergence. |
|
*> If the matrix is not square, JOBT is ignored. This is subject to |
|
*> changes in the future. |
|
*> The decision is based on two values of entropy over the adjoint |
|
*> orbit of A^t * A. See the descriptions of WORK(6) and WORK(7). |
|
*> = 'T': transpose if entropy test indicates possibly faster |
|
*> convergence of Jacobi process if A^t is taken as input. If A is |
|
*> replaced with A^t, then the row pivoting is included automatically. |
|
*> = 'N': do not speculate. |
|
*> This option can be used to compute only the singular values, or the |
|
*> full SVD (U, SIGMA and V). For only one set of singular vectors |
|
*> (U or V), the caller should provide both U and V, as one of the |
|
*> matrices is used as workspace if the matrix A is transposed. |
|
*> The implementer can easily remove this constraint and make the |
|
*> code more complicated. See the descriptions of U and V. |
|
*> \endverbatim |
|
*> |
|
*> \param[in] JOBP |
|
*> \verbatim |
|
*> JOBP is CHARACTER*1 |
|
*> Issues the licence to introduce structured perturbations to drown |
|
*> denormalized numbers. This licence should be active if the |
|
*> denormals are poorly implemented, causing slow computation, |
|
*> especially in cases of fast convergence (!). For details see [1,2]. |
|
*> For the sake of simplicity, this perturbations are included only |
|
*> when the full SVD or only the singular values are requested. The |
|
*> implementer/user can easily add the perturbation for the cases of |
|
*> computing one set of singular vectors. |
|
*> = 'P': introduce perturbation |
|
*> = 'N': do not perturb |
|
*> \endverbatim |
|
*> |
|
*> \param[in] M |
|
*> \verbatim |
|
*> M is INTEGER |
|
*> The number of rows of the input matrix A. M >= 0. |
|
*> \endverbatim |
|
*> |
|
*> \param[in] N |
|
*> \verbatim |
|
*> N is INTEGER |
|
*> The number of columns of the input matrix A. M >= N >= 0. |
|
*> \endverbatim |
|
*> |
|
*> \param[in,out] A |
|
*> \verbatim |
|
*> A is DOUBLE PRECISION array, dimension (LDA,N) |
|
*> On entry, the M-by-N matrix A. |
|
*> \endverbatim |
|
*> |
|
*> \param[in] LDA |
|
*> \verbatim |
|
*> LDA is INTEGER |
|
*> The leading dimension of the array A. LDA >= max(1,M). |
|
*> \endverbatim |
|
*> |
|
*> \param[out] SVA |
|
*> \verbatim |
|
*> SVA is DOUBLE PRECISION array, dimension (N) |
|
*> On exit, |
|
*> - For WORK(1)/WORK(2) = ONE: The singular values of A. During the |
|
*> computation SVA contains Euclidean column norms of the |
|
*> iterated matrices in the array A. |
|
*> - For WORK(1) .NE. WORK(2): The singular values of A are |
|
*> (WORK(1)/WORK(2)) * SVA(1:N). This factored form is used if |
|
*> sigma_max(A) overflows or if small singular values have been |
|
*> saved from underflow by scaling the input matrix A. |
|
*> - If JOBR='R' then some of the singular values may be returned |
|
*> as exact zeros obtained by "set to zero" because they are |
|
*> below the numerical rank threshold or are denormalized numbers. |
|
*> \endverbatim |
|
*> |
|
*> \param[out] U |
|
*> \verbatim |
|
*> U is DOUBLE PRECISION array, dimension ( LDU, N ) |
|
*> If JOBU = 'U', then U contains on exit the M-by-N matrix of |
|
*> the left singular vectors. |
|
*> If JOBU = 'F', then U contains on exit the M-by-M matrix of |
|
*> the left singular vectors, including an ONB |
|
*> of the orthogonal complement of the Range(A). |
|
*> If JOBU = 'W' .AND. (JOBV.EQ.'V' .AND. JOBT.EQ.'T' .AND. M.EQ.N), |
|
*> then U is used as workspace if the procedure |
|
*> replaces A with A^t. In that case, [V] is computed |
|
*> in U as left singular vectors of A^t and then |
|
*> copied back to the V array. This 'W' option is just |
|
*> a reminder to the caller that in this case U is |
|
*> reserved as workspace of length N*N. |
|
*> If JOBU = 'N' U is not referenced. |
|
*> \endverbatim |
|
*> |
|
*> \param[in] LDU |
|
*> \verbatim |
|
*> LDU is INTEGER |
|
*> The leading dimension of the array U, LDU >= 1. |
|
*> IF JOBU = 'U' or 'F' or 'W', then LDU >= M. |
|
*> \endverbatim |
|
*> |
|
*> \param[out] V |
|
*> \verbatim |
|
*> V is DOUBLE PRECISION array, dimension ( LDV, N ) |
|
*> If JOBV = 'V', 'J' then V contains on exit the N-by-N matrix of |
|
*> the right singular vectors; |
|
*> If JOBV = 'W', AND (JOBU.EQ.'U' AND JOBT.EQ.'T' AND M.EQ.N), |
|
*> then V is used as workspace if the pprocedure |
|
*> replaces A with A^t. In that case, [U] is computed |
|
*> in V as right singular vectors of A^t and then |
|
*> copied back to the U array. This 'W' option is just |
|
*> a reminder to the caller that in this case V is |
|
*> reserved as workspace of length N*N. |
|
*> If JOBV = 'N' V is not referenced. |
|
*> \endverbatim |
|
*> |
|
*> \param[in] LDV |
|
*> \verbatim |
|
*> LDV is INTEGER |
|
*> The leading dimension of the array V, LDV >= 1. |
|
*> If JOBV = 'V' or 'J' or 'W', then LDV >= N. |
|
*> \endverbatim |
|
*> |
|
*> \param[out] WORK |
|
*> \verbatim |
|
*> WORK is DOUBLE PRECISION array, dimension at least LWORK. |
|
*> On exit, if N.GT.0 .AND. M.GT.0 (else not referenced), |
|
*> WORK(1) = SCALE = WORK(2) / WORK(1) is the scaling factor such |
|
*> that SCALE*SVA(1:N) are the computed singular values |
|
*> of A. (See the description of SVA().) |
|
*> WORK(2) = See the description of WORK(1). |
|
*> WORK(3) = SCONDA is an estimate for the condition number of |
|
*> column equilibrated A. (If JOBA .EQ. 'E' or 'G') |
|
*> SCONDA is an estimate of DSQRT(||(R^t * R)^(-1)||_1). |
|
*> It is computed using DPOCON. It holds |
|
*> N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA |
|
*> where R is the triangular factor from the QRF of A. |
|
*> However, if R is truncated and the numerical rank is |
|
*> determined to be strictly smaller than N, SCONDA is |
|
*> returned as -1, thus indicating that the smallest |
|
*> singular values might be lost. |
|
*> |
|
*> If full SVD is needed, the following two condition numbers are |
|
*> useful for the analysis of the algorithm. They are provied for |
|
*> a developer/implementer who is familiar with the details of |
|
*> the method. |
|
*> |
|
*> WORK(4) = an estimate of the scaled condition number of the |
|
*> triangular factor in the first QR factorization. |
|
*> WORK(5) = an estimate of the scaled condition number of the |
|
*> triangular factor in the second QR factorization. |
|
*> The following two parameters are computed if JOBT .EQ. 'T'. |
|
*> They are provided for a developer/implementer who is familiar |
|
*> with the details of the method. |
|
*> |
|
*> WORK(6) = the entropy of A^t*A :: this is the Shannon entropy |
|
*> of diag(A^t*A) / Trace(A^t*A) taken as point in the |
|
*> probability simplex. |
|
*> WORK(7) = the entropy of A*A^t. |
|
*> \endverbatim |
|
*> |
|
*> \param[in] LWORK |
|
*> \verbatim |
|
*> LWORK is INTEGER |
|
*> Length of WORK to confirm proper allocation of work space. |
|
*> LWORK depends on the job: |
|
*> |
|
*> If only SIGMA is needed ( JOBU.EQ.'N', JOBV.EQ.'N' ) and |
|
*> -> .. no scaled condition estimate required (JOBE.EQ.'N'): |
|
*> LWORK >= max(2*M+N,4*N+1,7). This is the minimal requirement. |
|
*> ->> For optimal performance (blocked code) the optimal value |
|
*> is LWORK >= max(2*M+N,3*N+(N+1)*NB,7). Here NB is the optimal |
|
*> block size for DGEQP3 and DGEQRF. |
|
*> In general, optimal LWORK is computed as |
|
*> LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DGEQRF), 7). |
|
*> -> .. an estimate of the scaled condition number of A is |
|
*> required (JOBA='E', 'G'). In this case, LWORK is the maximum |
|
*> of the above and N*N+4*N, i.e. LWORK >= max(2*M+N,N*N+4*N,7). |
|
*> ->> For optimal performance (blocked code) the optimal value |
|
*> is LWORK >= max(2*M+N,3*N+(N+1)*NB, N*N+4*N, 7). |
|
*> In general, the optimal length LWORK is computed as |
|
*> LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DGEQRF), |
|
*> N+N*N+LWORK(DPOCON),7). |
|
*> |
|
*> If SIGMA and the right singular vectors are needed (JOBV.EQ.'V'), |
|
*> -> the minimal requirement is LWORK >= max(2*M+N,4*N+1,7). |
|
*> -> For optimal performance, LWORK >= max(2*M+N,3*N+(N+1)*NB,7), |
|
*> where NB is the optimal block size for DGEQP3, DGEQRF, DGELQ, |
|
*> DORMLQ. In general, the optimal length LWORK is computed as |
|
*> LWORK >= max(2*M+N,N+LWORK(DGEQP3), N+LWORK(DPOCON), |
|
*> N+LWORK(DGELQ), 2*N+LWORK(DGEQRF), N+LWORK(DORMLQ)). |
|
*> |
|
*> If SIGMA and the left singular vectors are needed |
|
*> -> the minimal requirement is LWORK >= max(2*M+N,4*N+1,7). |
|
*> -> For optimal performance: |
|
*> if JOBU.EQ.'U' :: LWORK >= max(2*M+N,3*N+(N+1)*NB,7), |
|
*> if JOBU.EQ.'F' :: LWORK >= max(2*M+N,3*N+(N+1)*NB,N+M*NB,7), |
|
*> where NB is the optimal block size for DGEQP3, DGEQRF, DORMQR. |
|
*> In general, the optimal length LWORK is computed as |
|
*> LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DPOCON), |
|
*> 2*N+LWORK(DGEQRF), N+LWORK(DORMQR)). |
|
*> Here LWORK(DORMQR) equals N*NB (for JOBU.EQ.'U') or |
|
*> M*NB (for JOBU.EQ.'F'). |
|
*> |
|
*> If the full SVD is needed: (JOBU.EQ.'U' or JOBU.EQ.'F') and |
|
*> -> if JOBV.EQ.'V' |
|
*> the minimal requirement is LWORK >= max(2*M+N,6*N+2*N*N). |
|
*> -> if JOBV.EQ.'J' the minimal requirement is |
|
*> LWORK >= max(2*M+N, 4*N+N*N,2*N+N*N+6). |
|
*> -> For optimal performance, LWORK should be additionally |
|
*> larger than N+M*NB, where NB is the optimal block size |
|
*> for DORMQR. |
|
*> \endverbatim |
|
*> |
|
*> \param[out] IWORK |
|
*> \verbatim |
|
*> IWORK is INTEGER array, dimension M+3*N. |
|
*> On exit, |
|
*> IWORK(1) = the numerical rank determined after the initial |
|
*> QR factorization with pivoting. See the descriptions |
|
*> of JOBA and JOBR. |
|
*> IWORK(2) = the number of the computed nonzero singular values |
|
*> IWORK(3) = if nonzero, a warning message: |
|
*> If IWORK(3).EQ.1 then some of the column norms of A |
|
*> were denormalized floats. The requested high accuracy |
|
*> is not warranted by the data. |
|
*> \endverbatim |
|
*> |
|
*> \param[out] INFO |
|
*> \verbatim |
|
*> INFO is INTEGER |
|
*> < 0 : if INFO = -i, then the i-th argument had an illegal value. |
|
*> = 0 : successfull exit; |
|
*> > 0 : DGEJSV did not converge in the maximal allowed number |
|
*> of sweeps. The computed values may be inaccurate. |
|
*> \endverbatim |
|
* |
|
* Authors: |
|
* ======== |
|
* |
|
*> \author Univ. of Tennessee |
|
*> \author Univ. of California Berkeley |
|
*> \author Univ. of Colorado Denver |
|
*> \author NAG Ltd. |
|
* |
|
*> \date November 2011 |
|
* |
|
*> \ingroup doubleGEcomputational |
|
* |
|
*> \par Further Details: |
|
* ===================== |
|
*> |
|
*> \verbatim |
|
*> |
|
*> DGEJSV implements a preconditioned Jacobi SVD algorithm. It uses DGEQP3, |
|
*> DGEQRF, and DGELQF as preprocessors and preconditioners. Optionally, an |
|
*> additional row pivoting can be used as a preprocessor, which in some |
|
*> cases results in much higher accuracy. An example is matrix A with the |
|
*> structure A = D1 * C * D2, where D1, D2 are arbitrarily ill-conditioned |
|
*> diagonal matrices and C is well-conditioned matrix. In that case, complete |
|
*> pivoting in the first QR factorizations provides accuracy dependent on the |
|
*> condition number of C, and independent of D1, D2. Such higher accuracy is |
|
*> not completely understood theoretically, but it works well in practice. |
|
*> Further, if A can be written as A = B*D, with well-conditioned B and some |
|
*> diagonal D, then the high accuracy is guaranteed, both theoretically and |
|
*> in software, independent of D. For more details see [1], [2]. |
|
*> The computational range for the singular values can be the full range |
|
*> ( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic and the BLAS |
|
*> & LAPACK routines called by DGEJSV are implemented to work in that range. |
|
*> If that is not the case, then the restriction for safe computation with |
|
*> the singular values in the range of normalized IEEE numbers is that the |
|
*> spectral condition number kappa(A)=sigma_max(A)/sigma_min(A) does not |
|
*> overflow. This code (DGEJSV) is best used in this restricted range, |
|
*> meaning that singular values of magnitude below ||A||_2 / DLAMCH('O') are |
|
*> returned as zeros. See JOBR for details on this. |
|
*> Further, this implementation is somewhat slower than the one described |
|
*> in [1,2] due to replacement of some non-LAPACK components, and because |
|
*> the choice of some tuning parameters in the iterative part (DGESVJ) is |
|
*> left to the implementer on a particular machine. |
|
*> The rank revealing QR factorization (in this code: DGEQP3) should be |
|
*> implemented as in [3]. We have a new version of DGEQP3 under development |
|
*> that is more robust than the current one in LAPACK, with a cleaner cut in |
|
*> rank defficient cases. It will be available in the SIGMA library [4]. |
|
*> If M is much larger than N, it is obvious that the inital QRF with |
|
*> column pivoting can be preprocessed by the QRF without pivoting. That |
|
*> well known trick is not used in DGEJSV because in some cases heavy row |
|
*> weighting can be treated with complete pivoting. The overhead in cases |
|
*> M much larger than N is then only due to pivoting, but the benefits in |
|
*> terms of accuracy have prevailed. The implementer/user can incorporate |
|
*> this extra QRF step easily. The implementer can also improve data movement |
|
*> (matrix transpose, matrix copy, matrix transposed copy) - this |
|
*> implementation of DGEJSV uses only the simplest, naive data movement. |
|
*> \endverbatim |
|
* |
|
*> \par Contributors: |
|
* ================== |
|
*> |
|
*> Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany) |
|
* |
|
*> \par References: |
|
* ================ |
|
*> |
|
*> \verbatim |
|
*> |
|
*> [1] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I. |
|
*> SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342. |
|
*> LAPACK Working note 169. |
|
*> [2] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II. |
|
*> SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362. |
|
*> LAPACK Working note 170. |
|
*> [3] Z. Drmac and Z. Bujanovic: On the failure of rank-revealing QR |
|
*> factorization software - a case study. |
|
*> ACM Trans. Math. Softw. Vol. 35, No 2 (2008), pp. 1-28. |
|
*> LAPACK Working note 176. |
|
*> [4] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV, |
|
*> QSVD, (H,K)-SVD computations. |
|
*> Department of Mathematics, University of Zagreb, 2008. |
|
*> \endverbatim |
|
* |
|
*> \par Bugs, examples and comments: |
|
* ================================= |
|
*> |
|
*> Please report all bugs and send interesting examples and/or comments to |
|
*> drmac@math.hr. Thank you. |
|
*> |
|
* ===================================================================== |
SUBROUTINE DGEJSV( JOBA, JOBU, JOBV, JOBR, JOBT, JOBP, |
SUBROUTINE DGEJSV( JOBA, JOBU, JOBV, JOBR, JOBT, JOBP, |
$ M, N, A, LDA, SVA, U, LDU, V, LDV, |
$ M, N, A, LDA, SVA, U, LDU, V, LDV, |
$ WORK, LWORK, IWORK, INFO ) |
$ WORK, LWORK, IWORK, INFO ) |
* |
* |
* -- LAPACK routine (version 3.3.1) -- |
* -- LAPACK computational routine (version 3.4.0) -- |
* |
|
* -- Contributed by Zlatko Drmac of the University of Zagreb and -- |
|
* -- Kresimir Veselic of the Fernuniversitaet Hagen -- |
|
* -- April 2011 -- |
|
* |
|
* -- LAPACK is a software package provided by Univ. of Tennessee, -- |
* -- LAPACK is a software package provided by Univ. of Tennessee, -- |
* -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..-- |
* -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..-- |
* |
* November 2011 |
* This routine is also part of SIGMA (version 1.23, October 23. 2008.) |
|
* SIGMA is a library of algorithms for highly accurate algorithms for |
|
* computation of SVD, PSVD, QSVD, (H,K)-SVD, and for solution of the |
|
* eigenvalue problems Hx = lambda M x, H M x = lambda x with H, M > 0. |
|
* |
* |
* .. Scalar Arguments .. |
* .. Scalar Arguments .. |
IMPLICIT NONE |
IMPLICIT NONE |
Line 27
|
Line 490
|
CHARACTER*1 JOBA, JOBP, JOBR, JOBT, JOBU, JOBV |
CHARACTER*1 JOBA, JOBP, JOBR, JOBT, JOBU, JOBV |
* .. |
* .. |
* |
* |
* Purpose |
|
* ======= |
|
* |
|
* DGEJSV computes the singular value decomposition (SVD) of a real M-by-N |
|
* matrix [A], where M >= N. The SVD of [A] is written as |
|
* |
|
* [A] = [U] * [SIGMA] * [V]^t, |
|
* |
|
* where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its N |
|
* diagonal elements, [U] is an M-by-N (or M-by-M) orthonormal matrix, and |
|
* [V] is an N-by-N orthogonal matrix. The diagonal elements of [SIGMA] are |
|
* the singular values of [A]. The columns of [U] and [V] are the left and |
|
* the right singular vectors of [A], respectively. The matrices [U] and [V] |
|
* are computed and stored in the arrays U and V, respectively. The diagonal |
|
* of [SIGMA] is computed and stored in the array SVA. |
|
* |
|
* Arguments |
|
* ========= |
|
* |
|
* JOBA (input) CHARACTER*1 |
|
* Specifies the level of accuracy: |
|
* = 'C': This option works well (high relative accuracy) if A = B * D, |
|
* with well-conditioned B and arbitrary diagonal matrix D. |
|
* The accuracy cannot be spoiled by COLUMN scaling. The |
|
* accuracy of the computed output depends on the condition of |
|
* B, and the procedure aims at the best theoretical accuracy. |
|
* The relative error max_{i=1:N}|d sigma_i| / sigma_i is |
|
* bounded by f(M,N)*epsilon* cond(B), independent of D. |
|
* The input matrix is preprocessed with the QRF with column |
|
* pivoting. This initial preprocessing and preconditioning by |
|
* a rank revealing QR factorization is common for all values of |
|
* JOBA. Additional actions are specified as follows: |
|
* = 'E': Computation as with 'C' with an additional estimate of the |
|
* condition number of B. It provides a realistic error bound. |
|
* = 'F': If A = D1 * C * D2 with ill-conditioned diagonal scalings |
|
* D1, D2, and well-conditioned matrix C, this option gives |
|
* higher accuracy than the 'C' option. If the structure of the |
|
* input matrix is not known, and relative accuracy is |
|
* desirable, then this option is advisable. The input matrix A |
|
* is preprocessed with QR factorization with FULL (row and |
|
* column) pivoting. |
|
* = 'G' Computation as with 'F' with an additional estimate of the |
|
* condition number of B, where A=D*B. If A has heavily weighted |
|
* rows, then using this condition number gives too pessimistic |
|
* error bound. |
|
* = 'A': Small singular values are the noise and the matrix is treated |
|
* as numerically rank defficient. The error in the computed |
|
* singular values is bounded by f(m,n)*epsilon*||A||. |
|
* The computed SVD A = U * S * V^t restores A up to |
|
* f(m,n)*epsilon*||A||. |
|
* This gives the procedure the licence to discard (set to zero) |
|
* all singular values below N*epsilon*||A||. |
|
* = 'R': Similar as in 'A'. Rank revealing property of the initial |
|
* QR factorization is used do reveal (using triangular factor) |
|
* a gap sigma_{r+1} < epsilon * sigma_r in which case the |
|
* numerical RANK is declared to be r. The SVD is computed with |
|
* absolute error bounds, but more accurately than with 'A'. |
|
* |
|
* JOBU (input) CHARACTER*1 |
|
* Specifies whether to compute the columns of U: |
|
* = 'U': N columns of U are returned in the array U. |
|
* = 'F': full set of M left sing. vectors is returned in the array U. |
|
* = 'W': U may be used as workspace of length M*N. See the description |
|
* of U. |
|
* = 'N': U is not computed. |
|
* |
|
* JOBV (input) CHARACTER*1 |
|
* Specifies whether to compute the matrix V: |
|
* = 'V': N columns of V are returned in the array V; Jacobi rotations |
|
* are not explicitly accumulated. |
|
* = 'J': N columns of V are returned in the array V, but they are |
|
* computed as the product of Jacobi rotations. This option is |
|
* allowed only if JOBU .NE. 'N', i.e. in computing the full SVD. |
|
* = 'W': V may be used as workspace of length N*N. See the description |
|
* of V. |
|
* = 'N': V is not computed. |
|
* |
|
* JOBR (input) CHARACTER*1 |
|
* Specifies the RANGE for the singular values. Issues the licence to |
|
* set to zero small positive singular values if they are outside |
|
* specified range. If A .NE. 0 is scaled so that the largest singular |
|
* value of c*A is around DSQRT(BIG), BIG=SLAMCH('O'), then JOBR issues |
|
* the licence to kill columns of A whose norm in c*A is less than |
|
* DSQRT(SFMIN) (for JOBR.EQ.'R'), or less than SMALL=SFMIN/EPSLN, |
|
* where SFMIN=SLAMCH('S'), EPSLN=SLAMCH('E'). |
|
* = 'N': Do not kill small columns of c*A. This option assumes that |
|
* BLAS and QR factorizations and triangular solvers are |
|
* implemented to work in that range. If the condition of A |
|
* is greater than BIG, use DGESVJ. |
|
* = 'R': RESTRICTED range for sigma(c*A) is [DSQRT(SFMIN), DSQRT(BIG)] |
|
* (roughly, as described above). This option is recommended. |
|
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
* For computing the singular values in the FULL range [SFMIN,BIG] |
|
* use DGESVJ. |
|
* |
|
* JOBT (input) CHARACTER*1 |
|
* If the matrix is square then the procedure may determine to use |
|
* transposed A if A^t seems to be better with respect to convergence. |
|
* If the matrix is not square, JOBT is ignored. This is subject to |
|
* changes in the future. |
|
* The decision is based on two values of entropy over the adjoint |
|
* orbit of A^t * A. See the descriptions of WORK(6) and WORK(7). |
|
* = 'T': transpose if entropy test indicates possibly faster |
|
* convergence of Jacobi process if A^t is taken as input. If A is |
|
* replaced with A^t, then the row pivoting is included automatically. |
|
* = 'N': do not speculate. |
|
* This option can be used to compute only the singular values, or the |
|
* full SVD (U, SIGMA and V). For only one set of singular vectors |
|
* (U or V), the caller should provide both U and V, as one of the |
|
* matrices is used as workspace if the matrix A is transposed. |
|
* The implementer can easily remove this constraint and make the |
|
* code more complicated. See the descriptions of U and V. |
|
* |
|
* JOBP (input) CHARACTER*1 |
|
* Issues the licence to introduce structured perturbations to drown |
|
* denormalized numbers. This licence should be active if the |
|
* denormals are poorly implemented, causing slow computation, |
|
* especially in cases of fast convergence (!). For details see [1,2]. |
|
* For the sake of simplicity, this perturbations are included only |
|
* when the full SVD or only the singular values are requested. The |
|
* implementer/user can easily add the perturbation for the cases of |
|
* computing one set of singular vectors. |
|
* = 'P': introduce perturbation |
|
* = 'N': do not perturb |
|
* |
|
* M (input) INTEGER |
|
* The number of rows of the input matrix A. M >= 0. |
|
* |
|
* N (input) INTEGER |
|
* The number of columns of the input matrix A. M >= N >= 0. |
|
* |
|
* A (input/workspace) DOUBLE PRECISION array, dimension (LDA,N) |
|
* On entry, the M-by-N matrix A. |
|
* |
|
* LDA (input) INTEGER |
|
* The leading dimension of the array A. LDA >= max(1,M). |
|
* |
|
* SVA (workspace/output) DOUBLE PRECISION array, dimension (N) |
|
* On exit, |
|
* - For WORK(1)/WORK(2) = ONE: The singular values of A. During the |
|
* computation SVA contains Euclidean column norms of the |
|
* iterated matrices in the array A. |
|
* - For WORK(1) .NE. WORK(2): The singular values of A are |
|
* (WORK(1)/WORK(2)) * SVA(1:N). This factored form is used if |
|
* sigma_max(A) overflows or if small singular values have been |
|
* saved from underflow by scaling the input matrix A. |
|
* - If JOBR='R' then some of the singular values may be returned |
|
* as exact zeros obtained by "set to zero" because they are |
|
* below the numerical rank threshold or are denormalized numbers. |
|
* |
|
* U (workspace/output) DOUBLE PRECISION array, dimension ( LDU, N ) |
|
* If JOBU = 'U', then U contains on exit the M-by-N matrix of |
|
* the left singular vectors. |
|
* If JOBU = 'F', then U contains on exit the M-by-M matrix of |
|
* the left singular vectors, including an ONB |
|
* of the orthogonal complement of the Range(A). |
|
* If JOBU = 'W' .AND. (JOBV.EQ.'V' .AND. JOBT.EQ.'T' .AND. M.EQ.N), |
|
* then U is used as workspace if the procedure |
|
* replaces A with A^t. In that case, [V] is computed |
|
* in U as left singular vectors of A^t and then |
|
* copied back to the V array. This 'W' option is just |
|
* a reminder to the caller that in this case U is |
|
* reserved as workspace of length N*N. |
|
* If JOBU = 'N' U is not referenced. |
|
* |
|
* LDU (input) INTEGER |
|
* The leading dimension of the array U, LDU >= 1. |
|
* IF JOBU = 'U' or 'F' or 'W', then LDU >= M. |
|
* |
|
* V (workspace/output) DOUBLE PRECISION array, dimension ( LDV, N ) |
|
* If JOBV = 'V', 'J' then V contains on exit the N-by-N matrix of |
|
* the right singular vectors; |
|
* If JOBV = 'W', AND (JOBU.EQ.'U' AND JOBT.EQ.'T' AND M.EQ.N), |
|
* then V is used as workspace if the pprocedure |
|
* replaces A with A^t. In that case, [U] is computed |
|
* in V as right singular vectors of A^t and then |
|
* copied back to the U array. This 'W' option is just |
|
* a reminder to the caller that in this case V is |
|
* reserved as workspace of length N*N. |
|
* If JOBV = 'N' V is not referenced. |
|
* |
|
* LDV (input) INTEGER |
|
* The leading dimension of the array V, LDV >= 1. |
|
* If JOBV = 'V' or 'J' or 'W', then LDV >= N. |
|
* |
|
* WORK (workspace/output) DOUBLE PRECISION array, dimension at least LWORK. |
|
* On exit, if N.GT.0 .AND. M.GT.0 (else not referenced), |
|
* WORK(1) = SCALE = WORK(2) / WORK(1) is the scaling factor such |
|
* that SCALE*SVA(1:N) are the computed singular values |
|
* of A. (See the description of SVA().) |
|
* WORK(2) = See the description of WORK(1). |
|
* WORK(3) = SCONDA is an estimate for the condition number of |
|
* column equilibrated A. (If JOBA .EQ. 'E' or 'G') |
|
* SCONDA is an estimate of DSQRT(||(R^t * R)^(-1)||_1). |
|
* It is computed using DPOCON. It holds |
|
* N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA |
|
* where R is the triangular factor from the QRF of A. |
|
* However, if R is truncated and the numerical rank is |
|
* determined to be strictly smaller than N, SCONDA is |
|
* returned as -1, thus indicating that the smallest |
|
* singular values might be lost. |
|
* |
|
* If full SVD is needed, the following two condition numbers are |
|
* useful for the analysis of the algorithm. They are provied for |
|
* a developer/implementer who is familiar with the details of |
|
* the method. |
|
* |
|
* WORK(4) = an estimate of the scaled condition number of the |
|
* triangular factor in the first QR factorization. |
|
* WORK(5) = an estimate of the scaled condition number of the |
|
* triangular factor in the second QR factorization. |
|
* The following two parameters are computed if JOBT .EQ. 'T'. |
|
* They are provided for a developer/implementer who is familiar |
|
* with the details of the method. |
|
* |
|
* WORK(6) = the entropy of A^t*A :: this is the Shannon entropy |
|
* of diag(A^t*A) / Trace(A^t*A) taken as point in the |
|
* probability simplex. |
|
* WORK(7) = the entropy of A*A^t. |
|
* |
|
* LWORK (input) INTEGER |
|
* Length of WORK to confirm proper allocation of work space. |
|
* LWORK depends on the job: |
|
* |
|
* If only SIGMA is needed ( JOBU.EQ.'N', JOBV.EQ.'N' ) and |
|
* -> .. no scaled condition estimate required (JOBE.EQ.'N'): |
|
* LWORK >= max(2*M+N,4*N+1,7). This is the minimal requirement. |
|
* ->> For optimal performance (blocked code) the optimal value |
|
* is LWORK >= max(2*M+N,3*N+(N+1)*NB,7). Here NB is the optimal |
|
* block size for DGEQP3 and DGEQRF. |
|
* In general, optimal LWORK is computed as |
|
* LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DGEQRF), 7). |
|
* -> .. an estimate of the scaled condition number of A is |
|
* required (JOBA='E', 'G'). In this case, LWORK is the maximum |
|
* of the above and N*N+4*N, i.e. LWORK >= max(2*M+N,N*N+4*N,7). |
|
* ->> For optimal performance (blocked code) the optimal value |
|
* is LWORK >= max(2*M+N,3*N+(N+1)*NB, N*N+4*N, 7). |
|
* In general, the optimal length LWORK is computed as |
|
* LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DGEQRF), |
|
* N+N*N+LWORK(DPOCON),7). |
|
* |
|
* If SIGMA and the right singular vectors are needed (JOBV.EQ.'V'), |
|
* -> the minimal requirement is LWORK >= max(2*M+N,4*N+1,7). |
|
* -> For optimal performance, LWORK >= max(2*M+N,3*N+(N+1)*NB,7), |
|
* where NB is the optimal block size for DGEQP3, DGEQRF, DGELQ, |
|
* DORMLQ. In general, the optimal length LWORK is computed as |
|
* LWORK >= max(2*M+N,N+LWORK(DGEQP3), N+LWORK(DPOCON), |
|
* N+LWORK(DGELQ), 2*N+LWORK(DGEQRF), N+LWORK(DORMLQ)). |
|
* |
|
* If SIGMA and the left singular vectors are needed |
|
* -> the minimal requirement is LWORK >= max(2*M+N,4*N+1,7). |
|
* -> For optimal performance: |
|
* if JOBU.EQ.'U' :: LWORK >= max(2*M+N,3*N+(N+1)*NB,7), |
|
* if JOBU.EQ.'F' :: LWORK >= max(2*M+N,3*N+(N+1)*NB,N+M*NB,7), |
|
* where NB is the optimal block size for DGEQP3, DGEQRF, DORMQR. |
|
* In general, the optimal length LWORK is computed as |
|
* LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DPOCON), |
|
* 2*N+LWORK(DGEQRF), N+LWORK(DORMQR)). |
|
* Here LWORK(DORMQR) equals N*NB (for JOBU.EQ.'U') or |
|
* M*NB (for JOBU.EQ.'F'). |
|
* |
|
* If the full SVD is needed: (JOBU.EQ.'U' or JOBU.EQ.'F') and |
|
* -> if JOBV.EQ.'V' |
|
* the minimal requirement is LWORK >= max(2*M+N,6*N+2*N*N). |
|
* -> if JOBV.EQ.'J' the minimal requirement is |
|
* LWORK >= max(2*M+N, 4*N+N*N,2*N+N*N+6). |
|
* -> For optimal performance, LWORK should be additionally |
|
* larger than N+M*NB, where NB is the optimal block size |
|
* for DORMQR. |
|
* |
|
* IWORK (workspace/output) INTEGER array, dimension M+3*N. |
|
* On exit, |
|
* IWORK(1) = the numerical rank determined after the initial |
|
* QR factorization with pivoting. See the descriptions |
|
* of JOBA and JOBR. |
|
* IWORK(2) = the number of the computed nonzero singular values |
|
* IWORK(3) = if nonzero, a warning message: |
|
* If IWORK(3).EQ.1 then some of the column norms of A |
|
* were denormalized floats. The requested high accuracy |
|
* is not warranted by the data. |
|
* |
|
* INFO (output) INTEGER |
|
* < 0 : if INFO = -i, then the i-th argument had an illegal value. |
|
* = 0 : successfull exit; |
|
* > 0 : DGEJSV did not converge in the maximal allowed number |
|
* of sweeps. The computed values may be inaccurate. |
|
* |
|
* Further Details |
|
* =============== |
|
* |
|
* DGEJSV implements a preconditioned Jacobi SVD algorithm. It uses DGEQP3, |
|
* DGEQRF, and DGELQF as preprocessors and preconditioners. Optionally, an |
|
* additional row pivoting can be used as a preprocessor, which in some |
|
* cases results in much higher accuracy. An example is matrix A with the |
|
* structure A = D1 * C * D2, where D1, D2 are arbitrarily ill-conditioned |
|
* diagonal matrices and C is well-conditioned matrix. In that case, complete |
|
* pivoting in the first QR factorizations provides accuracy dependent on the |
|
* condition number of C, and independent of D1, D2. Such higher accuracy is |
|
* not completely understood theoretically, but it works well in practice. |
|
* Further, if A can be written as A = B*D, with well-conditioned B and some |
|
* diagonal D, then the high accuracy is guaranteed, both theoretically and |
|
* in software, independent of D. For more details see [1], [2]. |
|
* The computational range for the singular values can be the full range |
|
* ( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic and the BLAS |
|
* & LAPACK routines called by DGEJSV are implemented to work in that range. |
|
* If that is not the case, then the restriction for safe computation with |
|
* the singular values in the range of normalized IEEE numbers is that the |
|
* spectral condition number kappa(A)=sigma_max(A)/sigma_min(A) does not |
|
* overflow. This code (DGEJSV) is best used in this restricted range, |
|
* meaning that singular values of magnitude below ||A||_2 / DLAMCH('O') are |
|
* returned as zeros. See JOBR for details on this. |
|
* Further, this implementation is somewhat slower than the one described |
|
* in [1,2] due to replacement of some non-LAPACK components, and because |
|
* the choice of some tuning parameters in the iterative part (DGESVJ) is |
|
* left to the implementer on a particular machine. |
|
* The rank revealing QR factorization (in this code: DGEQP3) should be |
|
* implemented as in [3]. We have a new version of DGEQP3 under development |
|
* that is more robust than the current one in LAPACK, with a cleaner cut in |
|
* rank defficient cases. It will be available in the SIGMA library [4]. |
|
* If M is much larger than N, it is obvious that the inital QRF with |
|
* column pivoting can be preprocessed by the QRF without pivoting. That |
|
* well known trick is not used in DGEJSV because in some cases heavy row |
|
* weighting can be treated with complete pivoting. The overhead in cases |
|
* M much larger than N is then only due to pivoting, but the benefits in |
|
* terms of accuracy have prevailed. The implementer/user can incorporate |
|
* this extra QRF step easily. The implementer can also improve data movement |
|
* (matrix transpose, matrix copy, matrix transposed copy) - this |
|
* implementation of DGEJSV uses only the simplest, naive data movement. |
|
* |
|
* Contributors |
|
* |
|
* Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany) |
|
* |
|
* References |
|
* |
|
* [1] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I. |
|
* SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342. |
|
* LAPACK Working note 169. |
|
* [2] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II. |
|
* SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362. |
|
* LAPACK Working note 170. |
|
* [3] Z. Drmac and Z. Bujanovic: On the failure of rank-revealing QR |
|
* factorization software - a case study. |
|
* ACM Trans. Math. Softw. Vol. 35, No 2 (2008), pp. 1-28. |
|
* LAPACK Working note 176. |
|
* [4] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV, |
|
* QSVD, (H,K)-SVD computations. |
|
* Department of Mathematics, University of Zagreb, 2008. |
|
* |
|
* Bugs, examples and comments |
|
* |
|
* Please report all bugs and send interesting examples and/or comments to |
|
* drmac@math.hr. Thank you. |
|
* |
|
* =========================================================================== |
* =========================================================================== |
* |
* |
* .. Local Parameters .. |
* .. Local Parameters .. |