rpl/lapack/lapack/dgejsv.f - diff

Return to dgejsv.f CVS log

Up to [local] / rpl / lapack / lapack

Diff for /rpl/lapack/lapack/dgejsv.f between versions 1.6 and 1.7

-version 1.6, 2011/07/22 07:38:04
+version 1.7, 2011/11/21 20:42:50
  Line 1
+ *> \brief \b DGEJSV
+ *
+ *  =========== DOCUMENTATION ===========
+ *
+ * Online html documentation available at
+ *            http://www.netlib.org/lapack/explore-html/
+ *
+ *> \htmlonly
+ *> Download DGEJSV + dependencies
+ *> <a href="http://www.netlib.org/cgi-bin/netlibfiles.tgz?format=tgz&filename=/lapack/lapack_routine/dgejsv.f">
+ *> [TGZ]</a>
+ *> <a href="http://www.netlib.org/cgi-bin/netlibfiles.zip?format=zip&filename=/lapack/lapack_routine/dgejsv.f">
+ *> [ZIP]</a>
+ *> <a href="http://www.netlib.org/cgi-bin/netlibfiles.txt?format=txt&filename=/lapack/lapack_routine/dgejsv.f">
+ *> [TXT]</a>
+ *> \endhtmlonly
+ *
+ *  Definition:
+ *  ===========
+ *
+ *       SUBROUTINE DGEJSV( JOBA, JOBU, JOBV, JOBR, JOBT, JOBP,
+ *                          M, N, A, LDA, SVA, U, LDU, V, LDV,
+ *                          WORK, LWORK, IWORK, INFO )
+ *
+ *       .. Scalar Arguments ..
+ *       IMPLICIT    NONE
+ *       INTEGER     INFO, LDA, LDU, LDV, LWORK, M, N
+ *       ..
+ *       .. Array Arguments ..
+ *       DOUBLE PRECISION A( LDA, * ), SVA( N ), U( LDU, * ), V( LDV, * ),
+ *      $            WORK( LWORK )
+ *       INTEGER     IWORK( * )
+ *       CHARACTER*1 JOBA, JOBP, JOBR, JOBT, JOBU, JOBV
+ *       ..
+ *
+ *
+ *> \par Purpose:
+ *  =============
+ *>
+ *> \verbatim
+ *>
+ *> DGEJSV computes the singular value decomposition (SVD) of a real M-by-N
+ *> matrix [A], where M >= N. The SVD of [A] is written as
+ *>
+ *>              [A] = [U] * [SIGMA] * [V]^t,
+ *>
+ *> where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its N
+ *> diagonal elements, [U] is an M-by-N (or M-by-M) orthonormal matrix, and
+ *> [V] is an N-by-N orthogonal matrix. The diagonal elements of [SIGMA] are
+ *> the singular values of [A]. The columns of [U] and [V] are the left and
+ *> the right singular vectors of [A], respectively. The matrices [U] and [V]
+ *> are computed and stored in the arrays U and V, respectively. The diagonal
+ *> of [SIGMA] is computed and stored in the array SVA.
+ *> \endverbatim
+ *
+ *  Arguments:
+ *  ==========
+ *
+ *> \param[in] JOBA
+ *> \verbatim
+ *>          JOBA is CHARACTER*1
+ *>        Specifies the level of accuracy:
+ *>       = 'C': This option works well (high relative accuracy) if A = B * D,
+ *>             with well-conditioned B and arbitrary diagonal matrix D.
+ *>             The accuracy cannot be spoiled by COLUMN scaling. The
+ *>             accuracy of the computed output depends on the condition of
+ *>             B, and the procedure aims at the best theoretical accuracy.
+ *>             The relative error max_{i=1:N}|d sigma_i| / sigma_i is
+ *>             bounded by f(M,N)*epsilon* cond(B), independent of D.
+ *>             The input matrix is preprocessed with the QRF with column
+ *>             pivoting. This initial preprocessing and preconditioning by
+ *>             a rank revealing QR factorization is common for all values of
+ *>             JOBA. Additional actions are specified as follows:
+ *>       = 'E': Computation as with 'C' with an additional estimate of the
+ *>             condition number of B. It provides a realistic error bound.
+ *>       = 'F': If A = D1 * C * D2 with ill-conditioned diagonal scalings
+ *>             D1, D2, and well-conditioned matrix C, this option gives
+ *>             higher accuracy than the 'C' option. If the structure of the
+ *>             input matrix is not known, and relative accuracy is
+ *>             desirable, then this option is advisable. The input matrix A
+ *>             is preprocessed with QR factorization with FULL (row and
+ *>             column) pivoting.
+ *>       = 'G'  Computation as with 'F' with an additional estimate of the
+ *>             condition number of B, where A=D*B. If A has heavily weighted
+ *>             rows, then using this condition number gives too pessimistic
+ *>             error bound.
+ *>       = 'A': Small singular values are the noise and the matrix is treated
+ *>             as numerically rank defficient. The error in the computed
+ *>             singular values is bounded by f(m,n)*epsilon*||A||.
+ *>             The computed SVD A = U * S * V^t restores A up to
+ *>             f(m,n)*epsilon*||A||.
+ *>             This gives the procedure the licence to discard (set to zero)
+ *>             all singular values below N*epsilon*||A||.
+ *>       = 'R': Similar as in 'A'. Rank revealing property of the initial
+ *>             QR factorization is used do reveal (using triangular factor)
+ *>             a gap sigma_{r+1} < epsilon * sigma_r in which case the
+ *>             numerical RANK is declared to be r. The SVD is computed with
+ *>             absolute error bounds, but more accurately than with 'A'.
+ *> \endverbatim
+ *>
+ *> \param[in] JOBU
+ *> \verbatim
+ *>          JOBU is CHARACTER*1
+ *>        Specifies whether to compute the columns of U:
+ *>       = 'U': N columns of U are returned in the array U.
+ *>       = 'F': full set of M left sing. vectors is returned in the array U.
+ *>       = 'W': U may be used as workspace of length M*N. See the description
+ *>             of U.
+ *>       = 'N': U is not computed.
+ *> \endverbatim
+ *>
+ *> \param[in] JOBV
+ *> \verbatim
+ *>          JOBV is CHARACTER*1
+ *>        Specifies whether to compute the matrix V:
+ *>       = 'V': N columns of V are returned in the array V; Jacobi rotations
+ *>             are not explicitly accumulated.
+ *>       = 'J': N columns of V are returned in the array V, but they are
+ *>             computed as the product of Jacobi rotations. This option is
+ *>             allowed only if JOBU .NE. 'N', i.e. in computing the full SVD.
+ *>       = 'W': V may be used as workspace of length N*N. See the description
+ *>             of V.
+ *>       = 'N': V is not computed.
+ *> \endverbatim
+ *>
+ *> \param[in] JOBR
+ *> \verbatim
+ *>          JOBR is CHARACTER*1
+ *>        Specifies the RANGE for the singular values. Issues the licence to
+ *>        set to zero small positive singular values if they are outside
+ *>        specified range. If A .NE. 0 is scaled so that the largest singular
+ *>        value of c*A is around DSQRT(BIG), BIG=SLAMCH('O'), then JOBR issues
+ *>        the licence to kill columns of A whose norm in c*A is less than
+ *>        DSQRT(SFMIN) (for JOBR.EQ.'R'), or less than SMALL=SFMIN/EPSLN,
+ *>        where SFMIN=SLAMCH('S'), EPSLN=SLAMCH('E').
+ *>       = 'N': Do not kill small columns of c*A. This option assumes that
+ *>             BLAS and QR factorizations and triangular solvers are
+ *>             implemented to work in that range. If the condition of A
+ *>             is greater than BIG, use DGESVJ.
+ *>       = 'R': RESTRICTED range for sigma(c*A) is [DSQRT(SFMIN), DSQRT(BIG)]
+ *>             (roughly, as described above). This option is recommended.
+ *>                                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ *>        For computing the singular values in the FULL range [SFMIN,BIG]
+ *>        use DGESVJ.
+ *> \endverbatim
+ *>
+ *> \param[in] JOBT
+ *> \verbatim
+ *>          JOBT is CHARACTER*1
+ *>        If the matrix is square then the procedure may determine to use
+ *>        transposed A if A^t seems to be better with respect to convergence.
+ *>        If the matrix is not square, JOBT is ignored. This is subject to
+ *>        changes in the future.
+ *>        The decision is based on two values of entropy over the adjoint
+ *>        orbit of A^t * A. See the descriptions of WORK(6) and WORK(7).
+ *>       = 'T': transpose if entropy test indicates possibly faster
+ *>        convergence of Jacobi process if A^t is taken as input. If A is
+ *>        replaced with A^t, then the row pivoting is included automatically.
+ *>       = 'N': do not speculate.
+ *>        This option can be used to compute only the singular values, or the
+ *>        full SVD (U, SIGMA and V). For only one set of singular vectors
+ *>        (U or V), the caller should provide both U and V, as one of the
+ *>        matrices is used as workspace if the matrix A is transposed.
+ *>        The implementer can easily remove this constraint and make the
+ *>        code more complicated. See the descriptions of U and V.
+ *> \endverbatim
+ *>
+ *> \param[in] JOBP
+ *> \verbatim
+ *>          JOBP is CHARACTER*1
+ *>        Issues the licence to introduce structured perturbations to drown
+ *>        denormalized numbers. This licence should be active if the
+ *>        denormals are poorly implemented, causing slow computation,
+ *>        especially in cases of fast convergence (!). For details see [1,2].
+ *>        For the sake of simplicity, this perturbations are included only
+ *>        when the full SVD or only the singular values are requested. The
+ *>        implementer/user can easily add the perturbation for the cases of
+ *>        computing one set of singular vectors.
+ *>       = 'P': introduce perturbation
+ *>       = 'N': do not perturb
+ *> \endverbatim
+ *>
+ *> \param[in] M
+ *> \verbatim
+ *>          M is INTEGER
+ *>         The number of rows of the input matrix A.  M >= 0.
+ *> \endverbatim
+ *>
+ *> \param[in] N
+ *> \verbatim
+ *>          N is INTEGER
+ *>         The number of columns of the input matrix A. M >= N >= 0.
+ *> \endverbatim
+ *>
+ *> \param[in,out] A
+ *> \verbatim
+ *>          A is DOUBLE PRECISION array, dimension (LDA,N)
+ *>          On entry, the M-by-N matrix A.
+ *> \endverbatim
+ *>
+ *> \param[in] LDA
+ *> \verbatim
+ *>          LDA is INTEGER
+ *>          The leading dimension of the array A.  LDA >= max(1,M).
+ *> \endverbatim
+ *>
+ *> \param[out] SVA
+ *> \verbatim
+ *>          SVA is DOUBLE PRECISION array, dimension (N)
+ *>          On exit,
+ *>          - For WORK(1)/WORK(2) = ONE: The singular values of A. During the
+ *>            computation SVA contains Euclidean column norms of the
+ *>            iterated matrices in the array A.
+ *>          - For WORK(1) .NE. WORK(2): The singular values of A are
+ *>            (WORK(1)/WORK(2)) * SVA(1:N). This factored form is used if
+ *>            sigma_max(A) overflows or if small singular values have been
+ *>            saved from underflow by scaling the input matrix A.
+ *>          - If JOBR='R' then some of the singular values may be returned
+ *>            as exact zeros obtained by "set to zero" because they are
+ *>            below the numerical rank threshold or are denormalized numbers.
+ *> \endverbatim
+ *>
+ *> \param[out] U
+ *> \verbatim
+ *>          U is DOUBLE PRECISION array, dimension ( LDU, N )
+ *>          If JOBU = 'U', then U contains on exit the M-by-N matrix of
+ *>                         the left singular vectors.
+ *>          If JOBU = 'F', then U contains on exit the M-by-M matrix of
+ *>                         the left singular vectors, including an ONB
+ *>                         of the orthogonal complement of the Range(A).
+ *>          If JOBU = 'W'  .AND. (JOBV.EQ.'V' .AND. JOBT.EQ.'T' .AND. M.EQ.N),
+ *>                         then U is used as workspace if the procedure
+ *>                         replaces A with A^t. In that case, [V] is computed
+ *>                         in U as left singular vectors of A^t and then
+ *>                         copied back to the V array. This 'W' option is just
+ *>                         a reminder to the caller that in this case U is
+ *>                         reserved as workspace of length N*N.
+ *>          If JOBU = 'N'  U is not referenced.
+ *> \endverbatim
+ *>
+ *> \param[in] LDU
+ *> \verbatim
+ *>          LDU is INTEGER
+ *>          The leading dimension of the array U,  LDU >= 1.
+ *>          IF  JOBU = 'U' or 'F' or 'W',  then LDU >= M.
+ *> \endverbatim
+ *>
+ *> \param[out] V
+ *> \verbatim
+ *>          V is DOUBLE PRECISION array, dimension ( LDV, N )
+ *>          If JOBV = 'V', 'J' then V contains on exit the N-by-N matrix of
+ *>                         the right singular vectors;
+ *>          If JOBV = 'W', AND (JOBU.EQ.'U' AND JOBT.EQ.'T' AND M.EQ.N),
+ *>                         then V is used as workspace if the pprocedure
+ *>                         replaces A with A^t. In that case, [U] is computed
+ *>                         in V as right singular vectors of A^t and then
+ *>                         copied back to the U array. This 'W' option is just
+ *>                         a reminder to the caller that in this case V is
+ *>                         reserved as workspace of length N*N.
+ *>          If JOBV = 'N'  V is not referenced.
+ *> \endverbatim
+ *>
+ *> \param[in] LDV
+ *> \verbatim
+ *>          LDV is INTEGER
+ *>          The leading dimension of the array V,  LDV >= 1.
+ *>          If JOBV = 'V' or 'J' or 'W', then LDV >= N.
+ *> \endverbatim
+ *>
+ *> \param[out] WORK
+ *> \verbatim
+ *>          WORK is DOUBLE PRECISION array, dimension at least LWORK.
+ *>          On exit, if N.GT.0 .AND. M.GT.0 (else not referenced),
+ *>          WORK(1) = SCALE = WORK(2) / WORK(1) is the scaling factor such
+ *>                    that SCALE*SVA(1:N) are the computed singular values
+ *>                    of A. (See the description of SVA().)
+ *>          WORK(2) = See the description of WORK(1).
+ *>          WORK(3) = SCONDA is an estimate for the condition number of
+ *>                    column equilibrated A. (If JOBA .EQ. 'E' or 'G')
+ *>                    SCONDA is an estimate of DSQRT(||(R^t * R)^(-1)||_1).
+ *>                    It is computed using DPOCON. It holds
+ *>                    N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
+ *>                    where R is the triangular factor from the QRF of A.
+ *>                    However, if R is truncated and the numerical rank is
+ *>                    determined to be strictly smaller than N, SCONDA is
+ *>                    returned as -1, thus indicating that the smallest
+ *>                    singular values might be lost.
+ *>
+ *>          If full SVD is needed, the following two condition numbers are
+ *>          useful for the analysis of the algorithm. They are provied for
+ *>          a developer/implementer who is familiar with the details of
+ *>          the method.
+ *>
+ *>          WORK(4) = an estimate of the scaled condition number of the
+ *>                    triangular factor in the first QR factorization.
+ *>          WORK(5) = an estimate of the scaled condition number of the
+ *>                    triangular factor in the second QR factorization.
+ *>          The following two parameters are computed if JOBT .EQ. 'T'.
+ *>          They are provided for a developer/implementer who is familiar
+ *>          with the details of the method.
+ *>
+ *>          WORK(6) = the entropy of A^t*A :: this is the Shannon entropy
+ *>                    of diag(A^t*A) / Trace(A^t*A) taken as point in the
+ *>                    probability simplex.
+ *>          WORK(7) = the entropy of A*A^t.
+ *> \endverbatim
+ *>
+ *> \param[in] LWORK
+ *> \verbatim
+ *>          LWORK is INTEGER
+ *>          Length of WORK to confirm proper allocation of work space.
+ *>          LWORK depends on the job:
+ *>
+ *>          If only SIGMA is needed ( JOBU.EQ.'N', JOBV.EQ.'N' ) and
+ *>            -> .. no scaled condition estimate required (JOBE.EQ.'N'):
+ *>               LWORK >= max(2*M+N,4*N+1,7). This is the minimal requirement.
+ *>               ->> For optimal performance (blocked code) the optimal value
+ *>               is LWORK >= max(2*M+N,3*N+(N+1)*NB,7). Here NB is the optimal
+ *>               block size for DGEQP3 and DGEQRF.
+ *>               In general, optimal LWORK is computed as
+ *>               LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DGEQRF), 7).
+ *>            -> .. an estimate of the scaled condition number of A is
+ *>               required (JOBA='E', 'G'). In this case, LWORK is the maximum
+ *>               of the above and N*N+4*N, i.e. LWORK >= max(2*M+N,N*N+4*N,7).
+ *>               ->> For optimal performance (blocked code) the optimal value
+ *>               is LWORK >= max(2*M+N,3*N+(N+1)*NB, N*N+4*N, 7).
+ *>               In general, the optimal length LWORK is computed as
+ *>               LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DGEQRF),
+ *>                                                     N+N*N+LWORK(DPOCON),7).
+ *>
+ *>          If SIGMA and the right singular vectors are needed (JOBV.EQ.'V'),
+ *>            -> the minimal requirement is LWORK >= max(2*M+N,4*N+1,7).
+ *>            -> For optimal performance, LWORK >= max(2*M+N,3*N+(N+1)*NB,7),
+ *>               where NB is the optimal block size for DGEQP3, DGEQRF, DGELQ,
+ *>               DORMLQ. In general, the optimal length LWORK is computed as
+ *>               LWORK >= max(2*M+N,N+LWORK(DGEQP3), N+LWORK(DPOCON),
+ *>                       N+LWORK(DGELQ), 2*N+LWORK(DGEQRF), N+LWORK(DORMLQ)).
+ *>
+ *>          If SIGMA and the left singular vectors are needed
+ *>            -> the minimal requirement is LWORK >= max(2*M+N,4*N+1,7).
+ *>            -> For optimal performance:
+ *>               if JOBU.EQ.'U' :: LWORK >= max(2*M+N,3*N+(N+1)*NB,7),
+ *>               if JOBU.EQ.'F' :: LWORK >= max(2*M+N,3*N+(N+1)*NB,N+M*NB,7),
+ *>               where NB is the optimal block size for DGEQP3, DGEQRF, DORMQR.
+ *>               In general, the optimal length LWORK is computed as
+ *>               LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DPOCON),
+ *>                        2*N+LWORK(DGEQRF), N+LWORK(DORMQR)).
+ *>               Here LWORK(DORMQR) equals N*NB (for JOBU.EQ.'U') or
+ *>               M*NB (for JOBU.EQ.'F').
+ *>
+ *>          If the full SVD is needed: (JOBU.EQ.'U' or JOBU.EQ.'F') and
+ *>            -> if JOBV.EQ.'V'
+ *>               the minimal requirement is LWORK >= max(2*M+N,6*N+2*N*N).
+ *>            -> if JOBV.EQ.'J' the minimal requirement is
+ *>               LWORK >= max(2*M+N, 4*N+N*N,2*N+N*N+6).
+ *>            -> For optimal performance, LWORK should be additionally
+ *>               larger than N+M*NB, where NB is the optimal block size
+ *>               for DORMQR.
+ *> \endverbatim
+ *>
+ *> \param[out] IWORK
+ *> \verbatim
+ *>          IWORK is INTEGER array, dimension M+3*N.
+ *>          On exit,
+ *>          IWORK(1) = the numerical rank determined after the initial
+ *>                     QR factorization with pivoting. See the descriptions
+ *>                     of JOBA and JOBR.
+ *>          IWORK(2) = the number of the computed nonzero singular values
+ *>          IWORK(3) = if nonzero, a warning message:
+ *>                     If IWORK(3).EQ.1 then some of the column norms of A
+ *>                     were denormalized floats. The requested high accuracy
+ *>                     is not warranted by the data.
+ *> \endverbatim
+ *>
+ *> \param[out] INFO
+ *> \verbatim
+ *>          INFO is INTEGER
+ *>           < 0  : if INFO = -i, then the i-th argument had an illegal value.
+ *>           = 0 :  successfull exit;
+ *>           > 0 :  DGEJSV  did not converge in the maximal allowed number
+ *>                  of sweeps. The computed values may be inaccurate.
+ *> \endverbatim
+ *
+ *  Authors:
+ *  ========
+ *
+ *> \author Univ. of Tennessee
+ *> \author Univ. of California Berkeley
+ *> \author Univ. of Colorado Denver
+ *> \author NAG Ltd.
+ *
+ *> \date November 2011
+ *
+ *> \ingroup doubleGEcomputational
+ *
+ *> \par Further Details:
+ *  =====================
+ *>
+ *> \verbatim
+ *>
+ *>  DGEJSV implements a preconditioned Jacobi SVD algorithm. It uses DGEQP3,
+ *>  DGEQRF, and DGELQF as preprocessors and preconditioners. Optionally, an
+ *>  additional row pivoting can be used as a preprocessor, which in some
+ *>  cases results in much higher accuracy. An example is matrix A with the
+ *>  structure A = D1 * C * D2, where D1, D2 are arbitrarily ill-conditioned
+ *>  diagonal matrices and C is well-conditioned matrix. In that case, complete
+ *>  pivoting in the first QR factorizations provides accuracy dependent on the
+ *>  condition number of C, and independent of D1, D2. Such higher accuracy is
+ *>  not completely understood theoretically, but it works well in practice.
+ *>  Further, if A can be written as A = B*D, with well-conditioned B and some
+ *>  diagonal D, then the high accuracy is guaranteed, both theoretically and
+ *>  in software, independent of D. For more details see [1], [2].
+ *>     The computational range for the singular values can be the full range
+ *>  ( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic and the BLAS
+ *>  & LAPACK routines called by DGEJSV are implemented to work in that range.
+ *>  If that is not the case, then the restriction for safe computation with
+ *>  the singular values in the range of normalized IEEE numbers is that the
+ *>  spectral condition number kappa(A)=sigma_max(A)/sigma_min(A) does not
+ *>  overflow. This code (DGEJSV) is best used in this restricted range,
+ *>  meaning that singular values of magnitude below ||A||_2 / DLAMCH('O') are
+ *>  returned as zeros. See JOBR for details on this.
+ *>     Further, this implementation is somewhat slower than the one described
+ *>  in [1,2] due to replacement of some non-LAPACK components, and because
+ *>  the choice of some tuning parameters in the iterative part (DGESVJ) is
+ *>  left to the implementer on a particular machine.
+ *>     The rank revealing QR factorization (in this code: DGEQP3) should be
+ *>  implemented as in [3]. We have a new version of DGEQP3 under development
+ *>  that is more robust than the current one in LAPACK, with a cleaner cut in
+ *>  rank defficient cases. It will be available in the SIGMA library [4].
+ *>  If M is much larger than N, it is obvious that the inital QRF with
+ *>  column pivoting can be preprocessed by the QRF without pivoting. That
+ *>  well known trick is not used in DGEJSV because in some cases heavy row
+ *>  weighting can be treated with complete pivoting. The overhead in cases
+ *>  M much larger than N is then only due to pivoting, but the benefits in
+ *>  terms of accuracy have prevailed. The implementer/user can incorporate
+ *>  this extra QRF step easily. The implementer can also improve data movement
+ *>  (matrix transpose, matrix copy, matrix transposed copy) - this
+ *>  implementation of DGEJSV uses only the simplest, naive data movement.
+ *> \endverbatim
+ *
+ *> \par Contributors:
+ *  ==================
+ *>
+ *>  Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
+ *
+ *> \par References:
+ *  ================
+ *>
+ *> \verbatim
+ *>
+ *> [1] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
+ *>     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
+ *>     LAPACK Working note 169.
+ *> [2] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
+ *>     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
+ *>     LAPACK Working note 170.
+ *> [3] Z. Drmac and Z. Bujanovic: On the failure of rank-revealing QR
+ *>     factorization software - a case study.
+ *>     ACM Trans. Math. Softw. Vol. 35, No 2 (2008), pp. 1-28.
+ *>     LAPACK Working note 176.
+ *> [4] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
+ *>     QSVD, (H,K)-SVD computations.
+ *>     Department of Mathematics, University of Zagreb, 2008.
+ *> \endverbatim
+ *
+ *>  \par Bugs, examples and comments:
+ *   =================================
+ *>
+ *>  Please report all bugs and send interesting examples and/or comments to
+ *>  drmac@math.hr. Thank you.
+ *>
+ *  =====================================================================
        SUBROUTINE DGEJSV( JOBA, JOBU, JOBV, JOBR, JOBT, JOBP,
       $                   M, N, A, LDA, SVA, U, LDU, V, LDV,
       $                   WORK, LWORK, IWORK, INFO )
  *
- *  -- LAPACK routine (version 3.3.1)                                    --
+ *  -- LAPACK computational routine (version 3.4.0) --
- *
- *  -- Contributed by Zlatko Drmac of the University of Zagreb and     --
- *  -- Kresimir Veselic of the Fernuniversitaet Hagen                  --
- *  -- April 2011                                                      --
- *
  *  -- LAPACK is a software package provided by Univ. of Tennessee,    --
  *  -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
- *
+ *     November 2011
- * This routine is also part of SIGMA (version 1.23, October 23. 2008.)
- * SIGMA is a library of algorithms for highly accurate algorithms for
- * computation of SVD, PSVD, QSVD, (H,K)-SVD, and for solution of the
- * eigenvalue problems Hx = lambda M x, H M x = lambda x with H, M > 0.
  *
  *     .. Scalar Arguments ..
        IMPLICIT    NONE
- Line 27
+ Line 490
        CHARACTER*1 JOBA, JOBP, JOBR, JOBT, JOBU, JOBV
  *     ..
  *
- *  Purpose
- *  =======
- *
- *  DGEJSV computes the singular value decomposition (SVD) of a real M-by-N
- *  matrix [A], where M >= N. The SVD of [A] is written as
- *
- *               [A] = [U] * [SIGMA] * [V]^t,
- *
- *  where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its N
- *  diagonal elements, [U] is an M-by-N (or M-by-M) orthonormal matrix, and
- *  [V] is an N-by-N orthogonal matrix. The diagonal elements of [SIGMA] are
- *  the singular values of [A]. The columns of [U] and [V] are the left and
- *  the right singular vectors of [A], respectively. The matrices [U] and [V]
- *  are computed and stored in the arrays U and V, respectively. The diagonal
- *  of [SIGMA] is computed and stored in the array SVA.
- *
- *  Arguments
- *  =========
- *
- *  JOBA    (input) CHARACTER*1
- *        Specifies the level of accuracy:
- *       = 'C': This option works well (high relative accuracy) if A = B * D,
- *             with well-conditioned B and arbitrary diagonal matrix D.
- *             The accuracy cannot be spoiled by COLUMN scaling. The
- *             accuracy of the computed output depends on the condition of
- *             B, and the procedure aims at the best theoretical accuracy.
- *             The relative error max_{i=1:N}|d sigma_i| / sigma_i is
- *             bounded by f(M,N)*epsilon* cond(B), independent of D.
- *             The input matrix is preprocessed with the QRF with column
- *             pivoting. This initial preprocessing and preconditioning by
- *             a rank revealing QR factorization is common for all values of
- *             JOBA. Additional actions are specified as follows:
- *       = 'E': Computation as with 'C' with an additional estimate of the
- *             condition number of B. It provides a realistic error bound.
- *       = 'F': If A = D1 * C * D2 with ill-conditioned diagonal scalings
- *             D1, D2, and well-conditioned matrix C, this option gives
- *             higher accuracy than the 'C' option. If the structure of the
- *             input matrix is not known, and relative accuracy is
- *             desirable, then this option is advisable. The input matrix A
- *             is preprocessed with QR factorization with FULL (row and
- *             column) pivoting.
- *       = 'G'  Computation as with 'F' with an additional estimate of the
- *             condition number of B, where A=D*B. If A has heavily weighted
- *             rows, then using this condition number gives too pessimistic
- *             error bound.
- *       = 'A': Small singular values are the noise and the matrix is treated
- *             as numerically rank defficient. The error in the computed
- *             singular values is bounded by f(m,n)*epsilon*||A||.
- *             The computed SVD A = U * S * V^t restores A up to
- *             f(m,n)*epsilon*||A||.
- *             This gives the procedure the licence to discard (set to zero)
- *             all singular values below N*epsilon*||A||.
- *       = 'R': Similar as in 'A'. Rank revealing property of the initial
- *             QR factorization is used do reveal (using triangular factor)
- *             a gap sigma_{r+1} < epsilon * sigma_r in which case the
- *             numerical RANK is declared to be r. The SVD is computed with
- *             absolute error bounds, but more accurately than with 'A'.
- *
- *  JOBU    (input) CHARACTER*1
- *        Specifies whether to compute the columns of U:
- *       = 'U': N columns of U are returned in the array U.
- *       = 'F': full set of M left sing. vectors is returned in the array U.
- *       = 'W': U may be used as workspace of length M*N. See the description
- *             of U.
- *       = 'N': U is not computed.
- *
- *  JOBV    (input) CHARACTER*1
- *        Specifies whether to compute the matrix V:
- *       = 'V': N columns of V are returned in the array V; Jacobi rotations
- *             are not explicitly accumulated.
- *       = 'J': N columns of V are returned in the array V, but they are
- *             computed as the product of Jacobi rotations. This option is
- *             allowed only if JOBU .NE. 'N', i.e. in computing the full SVD.
- *       = 'W': V may be used as workspace of length N*N. See the description
- *             of V.
- *       = 'N': V is not computed.
- *
- *  JOBR    (input) CHARACTER*1
- *        Specifies the RANGE for the singular values. Issues the licence to
- *        set to zero small positive singular values if they are outside
- *        specified range. If A .NE. 0 is scaled so that the largest singular
- *        value of c*A is around DSQRT(BIG), BIG=SLAMCH('O'), then JOBR issues
- *        the licence to kill columns of A whose norm in c*A is less than
- *        DSQRT(SFMIN) (for JOBR.EQ.'R'), or less than SMALL=SFMIN/EPSLN,
- *        where SFMIN=SLAMCH('S'), EPSLN=SLAMCH('E').
- *       = 'N': Do not kill small columns of c*A. This option assumes that
- *             BLAS and QR factorizations and triangular solvers are
- *             implemented to work in that range. If the condition of A
- *             is greater than BIG, use DGESVJ.
- *       = 'R': RESTRICTED range for sigma(c*A) is [DSQRT(SFMIN), DSQRT(BIG)]
- *             (roughly, as described above). This option is recommended.
- *                                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~
- *        For computing the singular values in the FULL range [SFMIN,BIG]
- *        use DGESVJ.
- *
- *  JOBT    (input) CHARACTER*1
- *        If the matrix is square then the procedure may determine to use
- *        transposed A if A^t seems to be better with respect to convergence.
- *        If the matrix is not square, JOBT is ignored. This is subject to
- *        changes in the future.
- *        The decision is based on two values of entropy over the adjoint
- *        orbit of A^t * A. See the descriptions of WORK(6) and WORK(7).
- *       = 'T': transpose if entropy test indicates possibly faster
- *        convergence of Jacobi process if A^t is taken as input. If A is
- *        replaced with A^t, then the row pivoting is included automatically.
- *       = 'N': do not speculate.
- *        This option can be used to compute only the singular values, or the
- *        full SVD (U, SIGMA and V). For only one set of singular vectors
- *        (U or V), the caller should provide both U and V, as one of the
- *        matrices is used as workspace if the matrix A is transposed.
- *        The implementer can easily remove this constraint and make the
- *        code more complicated. See the descriptions of U and V.
- *
- *  JOBP    (input) CHARACTER*1
- *        Issues the licence to introduce structured perturbations to drown
- *        denormalized numbers. This licence should be active if the
- *        denormals are poorly implemented, causing slow computation,
- *        especially in cases of fast convergence (!). For details see [1,2].
- *        For the sake of simplicity, this perturbations are included only
- *        when the full SVD or only the singular values are requested. The
- *        implementer/user can easily add the perturbation for the cases of
- *        computing one set of singular vectors.
- *       = 'P': introduce perturbation
- *       = 'N': do not perturb
- *
- *  M       (input) INTEGER
- *         The number of rows of the input matrix A.  M >= 0.
- *
- *  N       (input) INTEGER
- *         The number of columns of the input matrix A. M >= N >= 0.
- *
- *  A       (input/workspace) DOUBLE PRECISION array, dimension (LDA,N)
- *          On entry, the M-by-N matrix A.
- *
- *  LDA     (input) INTEGER
- *          The leading dimension of the array A.  LDA >= max(1,M).
- *
- *  SVA     (workspace/output) DOUBLE PRECISION array, dimension (N)
- *          On exit,
- *          - For WORK(1)/WORK(2) = ONE: The singular values of A. During the
- *            computation SVA contains Euclidean column norms of the
- *            iterated matrices in the array A.
- *          - For WORK(1) .NE. WORK(2): The singular values of A are
- *            (WORK(1)/WORK(2)) * SVA(1:N). This factored form is used if
- *            sigma_max(A) overflows or if small singular values have been
- *            saved from underflow by scaling the input matrix A.
- *          - If JOBR='R' then some of the singular values may be returned
- *            as exact zeros obtained by "set to zero" because they are
- *            below the numerical rank threshold or are denormalized numbers.
- *
- *  U       (workspace/output) DOUBLE PRECISION array, dimension ( LDU, N )
- *          If JOBU = 'U', then U contains on exit the M-by-N matrix of
- *                         the left singular vectors.
- *          If JOBU = 'F', then U contains on exit the M-by-M matrix of
- *                         the left singular vectors, including an ONB
- *                         of the orthogonal complement of the Range(A).
- *          If JOBU = 'W'  .AND. (JOBV.EQ.'V' .AND. JOBT.EQ.'T' .AND. M.EQ.N),
- *                         then U is used as workspace if the procedure
- *                         replaces A with A^t. In that case, [V] is computed
- *                         in U as left singular vectors of A^t and then
- *                         copied back to the V array. This 'W' option is just
- *                         a reminder to the caller that in this case U is
- *                         reserved as workspace of length N*N.
- *          If JOBU = 'N'  U is not referenced.
- *
- * LDU      (input) INTEGER
- *          The leading dimension of the array U,  LDU >= 1.
- *          IF  JOBU = 'U' or 'F' or 'W',  then LDU >= M.
- *
- *  V       (workspace/output) DOUBLE PRECISION array, dimension ( LDV, N )
- *          If JOBV = 'V', 'J' then V contains on exit the N-by-N matrix of
- *                         the right singular vectors;
- *          If JOBV = 'W', AND (JOBU.EQ.'U' AND JOBT.EQ.'T' AND M.EQ.N),
- *                         then V is used as workspace if the pprocedure
- *                         replaces A with A^t. In that case, [U] is computed
- *                         in V as right singular vectors of A^t and then
- *                         copied back to the U array. This 'W' option is just
- *                         a reminder to the caller that in this case V is
- *                         reserved as workspace of length N*N.
- *          If JOBV = 'N'  V is not referenced.
- *
- *  LDV     (input) INTEGER
- *          The leading dimension of the array V,  LDV >= 1.
- *          If JOBV = 'V' or 'J' or 'W', then LDV >= N.
- *
- *  WORK    (workspace/output) DOUBLE PRECISION array, dimension at least LWORK.
- *          On exit, if N.GT.0 .AND. M.GT.0 (else not referenced),
- *          WORK(1) = SCALE = WORK(2) / WORK(1) is the scaling factor such
- *                    that SCALE*SVA(1:N) are the computed singular values
- *                    of A. (See the description of SVA().)
- *          WORK(2) = See the description of WORK(1).
- *          WORK(3) = SCONDA is an estimate for the condition number of
- *                    column equilibrated A. (If JOBA .EQ. 'E' or 'G')
- *                    SCONDA is an estimate of DSQRT(||(R^t * R)^(-1)||_1).
- *                    It is computed using DPOCON. It holds
- *                    N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
- *                    where R is the triangular factor from the QRF of A.
- *                    However, if R is truncated and the numerical rank is
- *                    determined to be strictly smaller than N, SCONDA is
- *                    returned as -1, thus indicating that the smallest
- *                    singular values might be lost.
- *
- *          If full SVD is needed, the following two condition numbers are
- *          useful for the analysis of the algorithm. They are provied for
- *          a developer/implementer who is familiar with the details of
- *          the method.
- *
- *          WORK(4) = an estimate of the scaled condition number of the
- *                    triangular factor in the first QR factorization.
- *          WORK(5) = an estimate of the scaled condition number of the
- *                    triangular factor in the second QR factorization.
- *          The following two parameters are computed if JOBT .EQ. 'T'.
- *          They are provided for a developer/implementer who is familiar
- *          with the details of the method.
- *
- *          WORK(6) = the entropy of A^t*A :: this is the Shannon entropy
- *                    of diag(A^t*A) / Trace(A^t*A) taken as point in the
- *                    probability simplex.
- *          WORK(7) = the entropy of A*A^t.
- *
- *  LWORK   (input) INTEGER
- *          Length of WORK to confirm proper allocation of work space.
- *          LWORK depends on the job:
- *
- *          If only SIGMA is needed ( JOBU.EQ.'N', JOBV.EQ.'N' ) and
- *            -> .. no scaled condition estimate required (JOBE.EQ.'N'):
- *               LWORK >= max(2*M+N,4*N+1,7). This is the minimal requirement.
- *               ->> For optimal performance (blocked code) the optimal value
- *               is LWORK >= max(2*M+N,3*N+(N+1)*NB,7). Here NB is the optimal
- *               block size for DGEQP3 and DGEQRF.
- *               In general, optimal LWORK is computed as
- *               LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DGEQRF), 7).
- *            -> .. an estimate of the scaled condition number of A is
- *               required (JOBA='E', 'G'). In this case, LWORK is the maximum
- *               of the above and N*N+4*N, i.e. LWORK >= max(2*M+N,N*N+4*N,7).
- *               ->> For optimal performance (blocked code) the optimal value
- *               is LWORK >= max(2*M+N,3*N+(N+1)*NB, N*N+4*N, 7).
- *               In general, the optimal length LWORK is computed as
- *               LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DGEQRF),
- *                                                     N+N*N+LWORK(DPOCON),7).
- *
- *          If SIGMA and the right singular vectors are needed (JOBV.EQ.'V'),
- *            -> the minimal requirement is LWORK >= max(2*M+N,4*N+1,7).
- *            -> For optimal performance, LWORK >= max(2*M+N,3*N+(N+1)*NB,7),
- *               where NB is the optimal block size for DGEQP3, DGEQRF, DGELQ,
- *               DORMLQ. In general, the optimal length LWORK is computed as
- *               LWORK >= max(2*M+N,N+LWORK(DGEQP3), N+LWORK(DPOCON),
- *                       N+LWORK(DGELQ), 2*N+LWORK(DGEQRF), N+LWORK(DORMLQ)).
- *
- *          If SIGMA and the left singular vectors are needed
- *            -> the minimal requirement is LWORK >= max(2*M+N,4*N+1,7).
- *            -> For optimal performance:
- *               if JOBU.EQ.'U' :: LWORK >= max(2*M+N,3*N+(N+1)*NB,7),
- *               if JOBU.EQ.'F' :: LWORK >= max(2*M+N,3*N+(N+1)*NB,N+M*NB,7),
- *               where NB is the optimal block size for DGEQP3, DGEQRF, DORMQR.
- *               In general, the optimal length LWORK is computed as
- *               LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DPOCON),
- *                        2*N+LWORK(DGEQRF), N+LWORK(DORMQR)).
- *               Here LWORK(DORMQR) equals N*NB (for JOBU.EQ.'U') or
- *               M*NB (for JOBU.EQ.'F').
- *
- *          If the full SVD is needed: (JOBU.EQ.'U' or JOBU.EQ.'F') and
- *            -> if JOBV.EQ.'V'
- *               the minimal requirement is LWORK >= max(2*M+N,6*N+2*N*N).
- *            -> if JOBV.EQ.'J' the minimal requirement is
- *               LWORK >= max(2*M+N, 4*N+N*N,2*N+N*N+6).
- *            -> For optimal performance, LWORK should be additionally
- *               larger than N+M*NB, where NB is the optimal block size
- *               for DORMQR.
- *
- *  IWORK   (workspace/output) INTEGER array, dimension M+3*N.
- *          On exit,
- *          IWORK(1) = the numerical rank determined after the initial
- *                     QR factorization with pivoting. See the descriptions
- *                     of JOBA and JOBR.
- *          IWORK(2) = the number of the computed nonzero singular values
- *          IWORK(3) = if nonzero, a warning message:
- *                     If IWORK(3).EQ.1 then some of the column norms of A
- *                     were denormalized floats. The requested high accuracy
- *                     is not warranted by the data.
- *
- *  INFO    (output) INTEGER
- *           < 0  : if INFO = -i, then the i-th argument had an illegal value.
- *           = 0 :  successfull exit;
- *           > 0 :  DGEJSV  did not converge in the maximal allowed number
- *                  of sweeps. The computed values may be inaccurate.
- *
- *  Further Details
- *  ===============
- *
- *  DGEJSV implements a preconditioned Jacobi SVD algorithm. It uses DGEQP3,
- *  DGEQRF, and DGELQF as preprocessors and preconditioners. Optionally, an
- *  additional row pivoting can be used as a preprocessor, which in some
- *  cases results in much higher accuracy. An example is matrix A with the
- *  structure A = D1 * C * D2, where D1, D2 are arbitrarily ill-conditioned
- *  diagonal matrices and C is well-conditioned matrix. In that case, complete
- *  pivoting in the first QR factorizations provides accuracy dependent on the
- *  condition number of C, and independent of D1, D2. Such higher accuracy is
- *  not completely understood theoretically, but it works well in practice.
- *  Further, if A can be written as A = B*D, with well-conditioned B and some
- *  diagonal D, then the high accuracy is guaranteed, both theoretically and
- *  in software, independent of D. For more details see [1], [2].
- *     The computational range for the singular values can be the full range
- *  ( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic and the BLAS
- *  & LAPACK routines called by DGEJSV are implemented to work in that range.
- *  If that is not the case, then the restriction for safe computation with
- *  the singular values in the range of normalized IEEE numbers is that the
- *  spectral condition number kappa(A)=sigma_max(A)/sigma_min(A) does not
- *  overflow. This code (DGEJSV) is best used in this restricted range,
- *  meaning that singular values of magnitude below ||A||_2 / DLAMCH('O') are
- *  returned as zeros. See JOBR for details on this.
- *     Further, this implementation is somewhat slower than the one described
- *  in [1,2] due to replacement of some non-LAPACK components, and because
- *  the choice of some tuning parameters in the iterative part (DGESVJ) is
- *  left to the implementer on a particular machine.
- *     The rank revealing QR factorization (in this code: DGEQP3) should be
- *  implemented as in [3]. We have a new version of DGEQP3 under development
- *  that is more robust than the current one in LAPACK, with a cleaner cut in
- *  rank defficient cases. It will be available in the SIGMA library [4].
- *  If M is much larger than N, it is obvious that the inital QRF with
- *  column pivoting can be preprocessed by the QRF without pivoting. That
- *  well known trick is not used in DGEJSV because in some cases heavy row
- *  weighting can be treated with complete pivoting. The overhead in cases
- *  M much larger than N is then only due to pivoting, but the benefits in
- *  terms of accuracy have prevailed. The implementer/user can incorporate
- *  this extra QRF step easily. The implementer can also improve data movement
- *  (matrix transpose, matrix copy, matrix transposed copy) - this
- *  implementation of DGEJSV uses only the simplest, naive data movement.
- *
- *  Contributors
- *
- *  Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
- *
- *  References
- *
- * [1] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
- *     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
- *     LAPACK Working note 169.
- * [2] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
- *     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
- *     LAPACK Working note 170.
- * [3] Z. Drmac and Z. Bujanovic: On the failure of rank-revealing QR
- *     factorization software - a case study.
- *     ACM Trans. Math. Softw. Vol. 35, No 2 (2008), pp. 1-28.
- *     LAPACK Working note 176.
- * [4] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
- *     QSVD, (H,K)-SVD computations.
- *     Department of Mathematics, University of Zagreb, 2008.
- *
- *  Bugs, examples and comments
- *
- *  Please report all bugs and send interesting examples and/or comments to
- *  drmac@math.hr. Thank you.
- *
  *  ===========================================================================
  *
  *     .. Local Parameters ..

CVSweb interface <joel.bertrand@systella.fr>

Removed from v.1.6
changed lines
	Added in v.1.7