Diff for /rpl/lapack/lapack/dgejsv.f between versions 1.6 and 1.7

version 1.6, 2011/07/22 07:38:04 version 1.7, 2011/11/21 20:42:50
Line 1 Line 1
   *> \brief \b DGEJSV
   *
   *  =========== DOCUMENTATION ===========
   *
   * Online html documentation available at 
   *            http://www.netlib.org/lapack/explore-html/ 
   *
   *> \htmlonly
   *> Download DGEJSV + dependencies 
   *> <a href="http://www.netlib.org/cgi-bin/netlibfiles.tgz?format=tgz&filename=/lapack/lapack_routine/dgejsv.f"> 
   *> [TGZ]</a> 
   *> <a href="http://www.netlib.org/cgi-bin/netlibfiles.zip?format=zip&filename=/lapack/lapack_routine/dgejsv.f"> 
   *> [ZIP]</a> 
   *> <a href="http://www.netlib.org/cgi-bin/netlibfiles.txt?format=txt&filename=/lapack/lapack_routine/dgejsv.f"> 
   *> [TXT]</a>
   *> \endhtmlonly 
   *
   *  Definition:
   *  ===========
   *
   *       SUBROUTINE DGEJSV( JOBA, JOBU, JOBV, JOBR, JOBT, JOBP,
   *                          M, N, A, LDA, SVA, U, LDU, V, LDV,
   *                          WORK, LWORK, IWORK, INFO )
   * 
   *       .. Scalar Arguments ..
   *       IMPLICIT    NONE
   *       INTEGER     INFO, LDA, LDU, LDV, LWORK, M, N
   *       ..
   *       .. Array Arguments ..
   *       DOUBLE PRECISION A( LDA, * ), SVA( N ), U( LDU, * ), V( LDV, * ),
   *      $            WORK( LWORK )
   *       INTEGER     IWORK( * )
   *       CHARACTER*1 JOBA, JOBP, JOBR, JOBT, JOBU, JOBV
   *       ..
   *  
   *
   *> \par Purpose:
   *  =============
   *>
   *> \verbatim
   *>
   *> DGEJSV computes the singular value decomposition (SVD) of a real M-by-N
   *> matrix [A], where M >= N. The SVD of [A] is written as
   *>
   *>              [A] = [U] * [SIGMA] * [V]^t,
   *>
   *> where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its N
   *> diagonal elements, [U] is an M-by-N (or M-by-M) orthonormal matrix, and
   *> [V] is an N-by-N orthogonal matrix. The diagonal elements of [SIGMA] are
   *> the singular values of [A]. The columns of [U] and [V] are the left and
   *> the right singular vectors of [A], respectively. The matrices [U] and [V]
   *> are computed and stored in the arrays U and V, respectively. The diagonal
   *> of [SIGMA] is computed and stored in the array SVA.
   *> \endverbatim
   *
   *  Arguments:
   *  ==========
   *
   *> \param[in] JOBA
   *> \verbatim
   *>          JOBA is CHARACTER*1
   *>        Specifies the level of accuracy:
   *>       = 'C': This option works well (high relative accuracy) if A = B * D,
   *>             with well-conditioned B and arbitrary diagonal matrix D.
   *>             The accuracy cannot be spoiled by COLUMN scaling. The
   *>             accuracy of the computed output depends on the condition of
   *>             B, and the procedure aims at the best theoretical accuracy.
   *>             The relative error max_{i=1:N}|d sigma_i| / sigma_i is
   *>             bounded by f(M,N)*epsilon* cond(B), independent of D.
   *>             The input matrix is preprocessed with the QRF with column
   *>             pivoting. This initial preprocessing and preconditioning by
   *>             a rank revealing QR factorization is common for all values of
   *>             JOBA. Additional actions are specified as follows:
   *>       = 'E': Computation as with 'C' with an additional estimate of the
   *>             condition number of B. It provides a realistic error bound.
   *>       = 'F': If A = D1 * C * D2 with ill-conditioned diagonal scalings
   *>             D1, D2, and well-conditioned matrix C, this option gives
   *>             higher accuracy than the 'C' option. If the structure of the
   *>             input matrix is not known, and relative accuracy is
   *>             desirable, then this option is advisable. The input matrix A
   *>             is preprocessed with QR factorization with FULL (row and
   *>             column) pivoting.
   *>       = 'G'  Computation as with 'F' with an additional estimate of the
   *>             condition number of B, where A=D*B. If A has heavily weighted
   *>             rows, then using this condition number gives too pessimistic
   *>             error bound.
   *>       = 'A': Small singular values are the noise and the matrix is treated
   *>             as numerically rank defficient. The error in the computed
   *>             singular values is bounded by f(m,n)*epsilon*||A||.
   *>             The computed SVD A = U * S * V^t restores A up to
   *>             f(m,n)*epsilon*||A||.
   *>             This gives the procedure the licence to discard (set to zero)
   *>             all singular values below N*epsilon*||A||.
   *>       = 'R': Similar as in 'A'. Rank revealing property of the initial
   *>             QR factorization is used do reveal (using triangular factor)
   *>             a gap sigma_{r+1} < epsilon * sigma_r in which case the
   *>             numerical RANK is declared to be r. The SVD is computed with
   *>             absolute error bounds, but more accurately than with 'A'.
   *> \endverbatim
   *>
   *> \param[in] JOBU
   *> \verbatim
   *>          JOBU is CHARACTER*1
   *>        Specifies whether to compute the columns of U:
   *>       = 'U': N columns of U are returned in the array U.
   *>       = 'F': full set of M left sing. vectors is returned in the array U.
   *>       = 'W': U may be used as workspace of length M*N. See the description
   *>             of U.
   *>       = 'N': U is not computed.
   *> \endverbatim
   *>
   *> \param[in] JOBV
   *> \verbatim
   *>          JOBV is CHARACTER*1
   *>        Specifies whether to compute the matrix V:
   *>       = 'V': N columns of V are returned in the array V; Jacobi rotations
   *>             are not explicitly accumulated.
   *>       = 'J': N columns of V are returned in the array V, but they are
   *>             computed as the product of Jacobi rotations. This option is
   *>             allowed only if JOBU .NE. 'N', i.e. in computing the full SVD.
   *>       = 'W': V may be used as workspace of length N*N. See the description
   *>             of V.
   *>       = 'N': V is not computed.
   *> \endverbatim
   *>
   *> \param[in] JOBR
   *> \verbatim
   *>          JOBR is CHARACTER*1
   *>        Specifies the RANGE for the singular values. Issues the licence to
   *>        set to zero small positive singular values if they are outside
   *>        specified range. If A .NE. 0 is scaled so that the largest singular
   *>        value of c*A is around DSQRT(BIG), BIG=SLAMCH('O'), then JOBR issues
   *>        the licence to kill columns of A whose norm in c*A is less than
   *>        DSQRT(SFMIN) (for JOBR.EQ.'R'), or less than SMALL=SFMIN/EPSLN,
   *>        where SFMIN=SLAMCH('S'), EPSLN=SLAMCH('E').
   *>       = 'N': Do not kill small columns of c*A. This option assumes that
   *>             BLAS and QR factorizations and triangular solvers are
   *>             implemented to work in that range. If the condition of A
   *>             is greater than BIG, use DGESVJ.
   *>       = 'R': RESTRICTED range for sigma(c*A) is [DSQRT(SFMIN), DSQRT(BIG)]
   *>             (roughly, as described above). This option is recommended.
   *>                                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~
   *>        For computing the singular values in the FULL range [SFMIN,BIG]
   *>        use DGESVJ.
   *> \endverbatim
   *>
   *> \param[in] JOBT
   *> \verbatim
   *>          JOBT is CHARACTER*1
   *>        If the matrix is square then the procedure may determine to use
   *>        transposed A if A^t seems to be better with respect to convergence.
   *>        If the matrix is not square, JOBT is ignored. This is subject to
   *>        changes in the future.
   *>        The decision is based on two values of entropy over the adjoint
   *>        orbit of A^t * A. See the descriptions of WORK(6) and WORK(7).
   *>       = 'T': transpose if entropy test indicates possibly faster
   *>        convergence of Jacobi process if A^t is taken as input. If A is
   *>        replaced with A^t, then the row pivoting is included automatically.
   *>       = 'N': do not speculate.
   *>        This option can be used to compute only the singular values, or the
   *>        full SVD (U, SIGMA and V). For only one set of singular vectors
   *>        (U or V), the caller should provide both U and V, as one of the
   *>        matrices is used as workspace if the matrix A is transposed.
   *>        The implementer can easily remove this constraint and make the
   *>        code more complicated. See the descriptions of U and V.
   *> \endverbatim
   *>
   *> \param[in] JOBP
   *> \verbatim
   *>          JOBP is CHARACTER*1
   *>        Issues the licence to introduce structured perturbations to drown
   *>        denormalized numbers. This licence should be active if the
   *>        denormals are poorly implemented, causing slow computation,
   *>        especially in cases of fast convergence (!). For details see [1,2].
   *>        For the sake of simplicity, this perturbations are included only
   *>        when the full SVD or only the singular values are requested. The
   *>        implementer/user can easily add the perturbation for the cases of
   *>        computing one set of singular vectors.
   *>       = 'P': introduce perturbation
   *>       = 'N': do not perturb
   *> \endverbatim
   *>
   *> \param[in] M
   *> \verbatim
   *>          M is INTEGER
   *>         The number of rows of the input matrix A.  M >= 0.
   *> \endverbatim
   *>
   *> \param[in] N
   *> \verbatim
   *>          N is INTEGER
   *>         The number of columns of the input matrix A. M >= N >= 0.
   *> \endverbatim
   *>
   *> \param[in,out] A
   *> \verbatim
   *>          A is DOUBLE PRECISION array, dimension (LDA,N)
   *>          On entry, the M-by-N matrix A.
   *> \endverbatim
   *>
   *> \param[in] LDA
   *> \verbatim
   *>          LDA is INTEGER
   *>          The leading dimension of the array A.  LDA >= max(1,M).
   *> \endverbatim
   *>
   *> \param[out] SVA
   *> \verbatim
   *>          SVA is DOUBLE PRECISION array, dimension (N)
   *>          On exit,
   *>          - For WORK(1)/WORK(2) = ONE: The singular values of A. During the
   *>            computation SVA contains Euclidean column norms of the
   *>            iterated matrices in the array A.
   *>          - For WORK(1) .NE. WORK(2): The singular values of A are
   *>            (WORK(1)/WORK(2)) * SVA(1:N). This factored form is used if
   *>            sigma_max(A) overflows or if small singular values have been
   *>            saved from underflow by scaling the input matrix A.
   *>          - If JOBR='R' then some of the singular values may be returned
   *>            as exact zeros obtained by "set to zero" because they are
   *>            below the numerical rank threshold or are denormalized numbers.
   *> \endverbatim
   *>
   *> \param[out] U
   *> \verbatim
   *>          U is DOUBLE PRECISION array, dimension ( LDU, N )
   *>          If JOBU = 'U', then U contains on exit the M-by-N matrix of
   *>                         the left singular vectors.
   *>          If JOBU = 'F', then U contains on exit the M-by-M matrix of
   *>                         the left singular vectors, including an ONB
   *>                         of the orthogonal complement of the Range(A).
   *>          If JOBU = 'W'  .AND. (JOBV.EQ.'V' .AND. JOBT.EQ.'T' .AND. M.EQ.N),
   *>                         then U is used as workspace if the procedure
   *>                         replaces A with A^t. In that case, [V] is computed
   *>                         in U as left singular vectors of A^t and then
   *>                         copied back to the V array. This 'W' option is just
   *>                         a reminder to the caller that in this case U is
   *>                         reserved as workspace of length N*N.
   *>          If JOBU = 'N'  U is not referenced.
   *> \endverbatim
   *>
   *> \param[in] LDU
   *> \verbatim
   *>          LDU is INTEGER
   *>          The leading dimension of the array U,  LDU >= 1.
   *>          IF  JOBU = 'U' or 'F' or 'W',  then LDU >= M.
   *> \endverbatim
   *>
   *> \param[out] V
   *> \verbatim
   *>          V is DOUBLE PRECISION array, dimension ( LDV, N )
   *>          If JOBV = 'V', 'J' then V contains on exit the N-by-N matrix of
   *>                         the right singular vectors;
   *>          If JOBV = 'W', AND (JOBU.EQ.'U' AND JOBT.EQ.'T' AND M.EQ.N),
   *>                         then V is used as workspace if the pprocedure
   *>                         replaces A with A^t. In that case, [U] is computed
   *>                         in V as right singular vectors of A^t and then
   *>                         copied back to the U array. This 'W' option is just
   *>                         a reminder to the caller that in this case V is
   *>                         reserved as workspace of length N*N.
   *>          If JOBV = 'N'  V is not referenced.
   *> \endverbatim
   *>
   *> \param[in] LDV
   *> \verbatim
   *>          LDV is INTEGER
   *>          The leading dimension of the array V,  LDV >= 1.
   *>          If JOBV = 'V' or 'J' or 'W', then LDV >= N.
   *> \endverbatim
   *>
   *> \param[out] WORK
   *> \verbatim
   *>          WORK is DOUBLE PRECISION array, dimension at least LWORK.
   *>          On exit, if N.GT.0 .AND. M.GT.0 (else not referenced), 
   *>          WORK(1) = SCALE = WORK(2) / WORK(1) is the scaling factor such
   *>                    that SCALE*SVA(1:N) are the computed singular values
   *>                    of A. (See the description of SVA().)
   *>          WORK(2) = See the description of WORK(1).
   *>          WORK(3) = SCONDA is an estimate for the condition number of
   *>                    column equilibrated A. (If JOBA .EQ. 'E' or 'G')
   *>                    SCONDA is an estimate of DSQRT(||(R^t * R)^(-1)||_1).
   *>                    It is computed using DPOCON. It holds
   *>                    N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
   *>                    where R is the triangular factor from the QRF of A.
   *>                    However, if R is truncated and the numerical rank is
   *>                    determined to be strictly smaller than N, SCONDA is
   *>                    returned as -1, thus indicating that the smallest
   *>                    singular values might be lost.
   *>
   *>          If full SVD is needed, the following two condition numbers are
   *>          useful for the analysis of the algorithm. They are provied for
   *>          a developer/implementer who is familiar with the details of
   *>          the method.
   *>
   *>          WORK(4) = an estimate of the scaled condition number of the
   *>                    triangular factor in the first QR factorization.
   *>          WORK(5) = an estimate of the scaled condition number of the
   *>                    triangular factor in the second QR factorization.
   *>          The following two parameters are computed if JOBT .EQ. 'T'.
   *>          They are provided for a developer/implementer who is familiar
   *>          with the details of the method.
   *>
   *>          WORK(6) = the entropy of A^t*A :: this is the Shannon entropy
   *>                    of diag(A^t*A) / Trace(A^t*A) taken as point in the
   *>                    probability simplex.
   *>          WORK(7) = the entropy of A*A^t.
   *> \endverbatim
   *>
   *> \param[in] LWORK
   *> \verbatim
   *>          LWORK is INTEGER
   *>          Length of WORK to confirm proper allocation of work space.
   *>          LWORK depends on the job:
   *>
   *>          If only SIGMA is needed ( JOBU.EQ.'N', JOBV.EQ.'N' ) and
   *>            -> .. no scaled condition estimate required (JOBE.EQ.'N'):
   *>               LWORK >= max(2*M+N,4*N+1,7). This is the minimal requirement.
   *>               ->> For optimal performance (blocked code) the optimal value
   *>               is LWORK >= max(2*M+N,3*N+(N+1)*NB,7). Here NB is the optimal
   *>               block size for DGEQP3 and DGEQRF.
   *>               In general, optimal LWORK is computed as 
   *>               LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DGEQRF), 7).        
   *>            -> .. an estimate of the scaled condition number of A is
   *>               required (JOBA='E', 'G'). In this case, LWORK is the maximum
   *>               of the above and N*N+4*N, i.e. LWORK >= max(2*M+N,N*N+4*N,7).
   *>               ->> For optimal performance (blocked code) the optimal value 
   *>               is LWORK >= max(2*M+N,3*N+(N+1)*NB, N*N+4*N, 7).
   *>               In general, the optimal length LWORK is computed as
   *>               LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DGEQRF), 
   *>                                                     N+N*N+LWORK(DPOCON),7).
   *>
   *>          If SIGMA and the right singular vectors are needed (JOBV.EQ.'V'),
   *>            -> the minimal requirement is LWORK >= max(2*M+N,4*N+1,7).
   *>            -> For optimal performance, LWORK >= max(2*M+N,3*N+(N+1)*NB,7),
   *>               where NB is the optimal block size for DGEQP3, DGEQRF, DGELQ,
   *>               DORMLQ. In general, the optimal length LWORK is computed as
   *>               LWORK >= max(2*M+N,N+LWORK(DGEQP3), N+LWORK(DPOCON), 
   *>                       N+LWORK(DGELQ), 2*N+LWORK(DGEQRF), N+LWORK(DORMLQ)).
   *>
   *>          If SIGMA and the left singular vectors are needed
   *>            -> the minimal requirement is LWORK >= max(2*M+N,4*N+1,7).
   *>            -> For optimal performance:
   *>               if JOBU.EQ.'U' :: LWORK >= max(2*M+N,3*N+(N+1)*NB,7),
   *>               if JOBU.EQ.'F' :: LWORK >= max(2*M+N,3*N+(N+1)*NB,N+M*NB,7),
   *>               where NB is the optimal block size for DGEQP3, DGEQRF, DORMQR.
   *>               In general, the optimal length LWORK is computed as
   *>               LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DPOCON),
   *>                        2*N+LWORK(DGEQRF), N+LWORK(DORMQR)). 
   *>               Here LWORK(DORMQR) equals N*NB (for JOBU.EQ.'U') or 
   *>               M*NB (for JOBU.EQ.'F').
   *>               
   *>          If the full SVD is needed: (JOBU.EQ.'U' or JOBU.EQ.'F') and 
   *>            -> if JOBV.EQ.'V'  
   *>               the minimal requirement is LWORK >= max(2*M+N,6*N+2*N*N). 
   *>            -> if JOBV.EQ.'J' the minimal requirement is 
   *>               LWORK >= max(2*M+N, 4*N+N*N,2*N+N*N+6).
   *>            -> For optimal performance, LWORK should be additionally
   *>               larger than N+M*NB, where NB is the optimal block size
   *>               for DORMQR.
   *> \endverbatim
   *>
   *> \param[out] IWORK
   *> \verbatim
   *>          IWORK is INTEGER array, dimension M+3*N.
   *>          On exit,
   *>          IWORK(1) = the numerical rank determined after the initial
   *>                     QR factorization with pivoting. See the descriptions
   *>                     of JOBA and JOBR.
   *>          IWORK(2) = the number of the computed nonzero singular values
   *>          IWORK(3) = if nonzero, a warning message:
   *>                     If IWORK(3).EQ.1 then some of the column norms of A
   *>                     were denormalized floats. The requested high accuracy
   *>                     is not warranted by the data.
   *> \endverbatim
   *>
   *> \param[out] INFO
   *> \verbatim
   *>          INFO is INTEGER
   *>           < 0  : if INFO = -i, then the i-th argument had an illegal value.
   *>           = 0 :  successfull exit;
   *>           > 0 :  DGEJSV  did not converge in the maximal allowed number
   *>                  of sweeps. The computed values may be inaccurate.
   *> \endverbatim
   *
   *  Authors:
   *  ========
   *
   *> \author Univ. of Tennessee 
   *> \author Univ. of California Berkeley 
   *> \author Univ. of Colorado Denver 
   *> \author NAG Ltd. 
   *
   *> \date November 2011
   *
   *> \ingroup doubleGEcomputational
   *
   *> \par Further Details:
   *  =====================
   *>
   *> \verbatim
   *>
   *>  DGEJSV implements a preconditioned Jacobi SVD algorithm. It uses DGEQP3,
   *>  DGEQRF, and DGELQF as preprocessors and preconditioners. Optionally, an
   *>  additional row pivoting can be used as a preprocessor, which in some
   *>  cases results in much higher accuracy. An example is matrix A with the
   *>  structure A = D1 * C * D2, where D1, D2 are arbitrarily ill-conditioned
   *>  diagonal matrices and C is well-conditioned matrix. In that case, complete
   *>  pivoting in the first QR factorizations provides accuracy dependent on the
   *>  condition number of C, and independent of D1, D2. Such higher accuracy is
   *>  not completely understood theoretically, but it works well in practice.
   *>  Further, if A can be written as A = B*D, with well-conditioned B and some
   *>  diagonal D, then the high accuracy is guaranteed, both theoretically and
   *>  in software, independent of D. For more details see [1], [2].
   *>     The computational range for the singular values can be the full range
   *>  ( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic and the BLAS
   *>  & LAPACK routines called by DGEJSV are implemented to work in that range.
   *>  If that is not the case, then the restriction for safe computation with
   *>  the singular values in the range of normalized IEEE numbers is that the
   *>  spectral condition number kappa(A)=sigma_max(A)/sigma_min(A) does not
   *>  overflow. This code (DGEJSV) is best used in this restricted range,
   *>  meaning that singular values of magnitude below ||A||_2 / DLAMCH('O') are
   *>  returned as zeros. See JOBR for details on this.
   *>     Further, this implementation is somewhat slower than the one described
   *>  in [1,2] due to replacement of some non-LAPACK components, and because
   *>  the choice of some tuning parameters in the iterative part (DGESVJ) is
   *>  left to the implementer on a particular machine.
   *>     The rank revealing QR factorization (in this code: DGEQP3) should be
   *>  implemented as in [3]. We have a new version of DGEQP3 under development
   *>  that is more robust than the current one in LAPACK, with a cleaner cut in
   *>  rank defficient cases. It will be available in the SIGMA library [4].
   *>  If M is much larger than N, it is obvious that the inital QRF with
   *>  column pivoting can be preprocessed by the QRF without pivoting. That
   *>  well known trick is not used in DGEJSV because in some cases heavy row
   *>  weighting can be treated with complete pivoting. The overhead in cases
   *>  M much larger than N is then only due to pivoting, but the benefits in
   *>  terms of accuracy have prevailed. The implementer/user can incorporate
   *>  this extra QRF step easily. The implementer can also improve data movement
   *>  (matrix transpose, matrix copy, matrix transposed copy) - this
   *>  implementation of DGEJSV uses only the simplest, naive data movement.
   *> \endverbatim
   *
   *> \par Contributors:
   *  ==================
   *>
   *>  Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
   *
   *> \par References:
   *  ================
   *>
   *> \verbatim
   *>
   *> [1] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
   *>     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
   *>     LAPACK Working note 169.
   *> [2] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
   *>     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
   *>     LAPACK Working note 170.
   *> [3] Z. Drmac and Z. Bujanovic: On the failure of rank-revealing QR
   *>     factorization software - a case study.
   *>     ACM Trans. Math. Softw. Vol. 35, No 2 (2008), pp. 1-28.
   *>     LAPACK Working note 176.
   *> [4] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
   *>     QSVD, (H,K)-SVD computations.
   *>     Department of Mathematics, University of Zagreb, 2008.
   *> \endverbatim
   *
   *>  \par Bugs, examples and comments:
   *   =================================
   *>
   *>  Please report all bugs and send interesting examples and/or comments to
   *>  drmac@math.hr. Thank you.
   *>
   *  =====================================================================
       SUBROUTINE DGEJSV( JOBA, JOBU, JOBV, JOBR, JOBT, JOBP,        SUBROUTINE DGEJSV( JOBA, JOBU, JOBV, JOBR, JOBT, JOBP,
      $                   M, N, A, LDA, SVA, U, LDU, V, LDV,       $                   M, N, A, LDA, SVA, U, LDU, V, LDV,
      $                   WORK, LWORK, IWORK, INFO )       $                   WORK, LWORK, IWORK, INFO )
 *  *
 *  -- LAPACK routine (version 3.3.1)                                    --  *  -- LAPACK computational routine (version 3.4.0) --
 *  
 *  -- Contributed by Zlatko Drmac of the University of Zagreb and     --  
 *  -- Kresimir Veselic of the Fernuniversitaet Hagen                  --  
 *  -- April 2011                                                      --  
 *  
 *  -- LAPACK is a software package provided by Univ. of Tennessee,    --  *  -- LAPACK is a software package provided by Univ. of Tennessee,    --
 *  -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--  *  -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
 *  *     November 2011
 * This routine is also part of SIGMA (version 1.23, October 23. 2008.)  
 * SIGMA is a library of algorithms for highly accurate algorithms for  
 * computation of SVD, PSVD, QSVD, (H,K)-SVD, and for solution of the  
 * eigenvalue problems Hx = lambda M x, H M x = lambda x with H, M > 0.  
 *  *
 *     .. Scalar Arguments ..  *     .. Scalar Arguments ..
       IMPLICIT    NONE        IMPLICIT    NONE
Line 27 Line 490
       CHARACTER*1 JOBA, JOBP, JOBR, JOBT, JOBU, JOBV        CHARACTER*1 JOBA, JOBP, JOBR, JOBT, JOBU, JOBV
 *     ..  *     ..
 *  *
 *  Purpose  
 *  =======  
 *  
 *  DGEJSV computes the singular value decomposition (SVD) of a real M-by-N  
 *  matrix [A], where M >= N. The SVD of [A] is written as  
 *  
 *               [A] = [U] * [SIGMA] * [V]^t,  
 *  
 *  where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its N  
 *  diagonal elements, [U] is an M-by-N (or M-by-M) orthonormal matrix, and  
 *  [V] is an N-by-N orthogonal matrix. The diagonal elements of [SIGMA] are  
 *  the singular values of [A]. The columns of [U] and [V] are the left and  
 *  the right singular vectors of [A], respectively. The matrices [U] and [V]  
 *  are computed and stored in the arrays U and V, respectively. The diagonal  
 *  of [SIGMA] is computed and stored in the array SVA.  
 *  
 *  Arguments  
 *  =========  
 *  
 *  JOBA    (input) CHARACTER*1  
 *        Specifies the level of accuracy:  
 *       = 'C': This option works well (high relative accuracy) if A = B * D,  
 *             with well-conditioned B and arbitrary diagonal matrix D.  
 *             The accuracy cannot be spoiled by COLUMN scaling. The  
 *             accuracy of the computed output depends on the condition of  
 *             B, and the procedure aims at the best theoretical accuracy.  
 *             The relative error max_{i=1:N}|d sigma_i| / sigma_i is  
 *             bounded by f(M,N)*epsilon* cond(B), independent of D.  
 *             The input matrix is preprocessed with the QRF with column  
 *             pivoting. This initial preprocessing and preconditioning by  
 *             a rank revealing QR factorization is common for all values of  
 *             JOBA. Additional actions are specified as follows:  
 *       = 'E': Computation as with 'C' with an additional estimate of the  
 *             condition number of B. It provides a realistic error bound.  
 *       = 'F': If A = D1 * C * D2 with ill-conditioned diagonal scalings  
 *             D1, D2, and well-conditioned matrix C, this option gives  
 *             higher accuracy than the 'C' option. If the structure of the  
 *             input matrix is not known, and relative accuracy is  
 *             desirable, then this option is advisable. The input matrix A  
 *             is preprocessed with QR factorization with FULL (row and  
 *             column) pivoting.  
 *       = 'G'  Computation as with 'F' with an additional estimate of the  
 *             condition number of B, where A=D*B. If A has heavily weighted  
 *             rows, then using this condition number gives too pessimistic  
 *             error bound.  
 *       = 'A': Small singular values are the noise and the matrix is treated  
 *             as numerically rank defficient. The error in the computed  
 *             singular values is bounded by f(m,n)*epsilon*||A||.  
 *             The computed SVD A = U * S * V^t restores A up to  
 *             f(m,n)*epsilon*||A||.  
 *             This gives the procedure the licence to discard (set to zero)  
 *             all singular values below N*epsilon*||A||.  
 *       = 'R': Similar as in 'A'. Rank revealing property of the initial  
 *             QR factorization is used do reveal (using triangular factor)  
 *             a gap sigma_{r+1} < epsilon * sigma_r in which case the  
 *             numerical RANK is declared to be r. The SVD is computed with  
 *             absolute error bounds, but more accurately than with 'A'.  
 *  
 *  JOBU    (input) CHARACTER*1  
 *        Specifies whether to compute the columns of U:  
 *       = 'U': N columns of U are returned in the array U.  
 *       = 'F': full set of M left sing. vectors is returned in the array U.  
 *       = 'W': U may be used as workspace of length M*N. See the description  
 *             of U.  
 *       = 'N': U is not computed.  
 *  
 *  JOBV    (input) CHARACTER*1  
 *        Specifies whether to compute the matrix V:  
 *       = 'V': N columns of V are returned in the array V; Jacobi rotations  
 *             are not explicitly accumulated.  
 *       = 'J': N columns of V are returned in the array V, but they are  
 *             computed as the product of Jacobi rotations. This option is  
 *             allowed only if JOBU .NE. 'N', i.e. in computing the full SVD.  
 *       = 'W': V may be used as workspace of length N*N. See the description  
 *             of V.  
 *       = 'N': V is not computed.  
 *  
 *  JOBR    (input) CHARACTER*1  
 *        Specifies the RANGE for the singular values. Issues the licence to  
 *        set to zero small positive singular values if they are outside  
 *        specified range. If A .NE. 0 is scaled so that the largest singular  
 *        value of c*A is around DSQRT(BIG), BIG=SLAMCH('O'), then JOBR issues  
 *        the licence to kill columns of A whose norm in c*A is less than  
 *        DSQRT(SFMIN) (for JOBR.EQ.'R'), or less than SMALL=SFMIN/EPSLN,  
 *        where SFMIN=SLAMCH('S'), EPSLN=SLAMCH('E').  
 *       = 'N': Do not kill small columns of c*A. This option assumes that  
 *             BLAS and QR factorizations and triangular solvers are  
 *             implemented to work in that range. If the condition of A  
 *             is greater than BIG, use DGESVJ.  
 *       = 'R': RESTRICTED range for sigma(c*A) is [DSQRT(SFMIN), DSQRT(BIG)]  
 *             (roughly, as described above). This option is recommended.  
 *                                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~  
 *        For computing the singular values in the FULL range [SFMIN,BIG]  
 *        use DGESVJ.  
 *  
 *  JOBT    (input) CHARACTER*1  
 *        If the matrix is square then the procedure may determine to use  
 *        transposed A if A^t seems to be better with respect to convergence.  
 *        If the matrix is not square, JOBT is ignored. This is subject to  
 *        changes in the future.  
 *        The decision is based on two values of entropy over the adjoint  
 *        orbit of A^t * A. See the descriptions of WORK(6) and WORK(7).  
 *       = 'T': transpose if entropy test indicates possibly faster  
 *        convergence of Jacobi process if A^t is taken as input. If A is  
 *        replaced with A^t, then the row pivoting is included automatically.  
 *       = 'N': do not speculate.  
 *        This option can be used to compute only the singular values, or the  
 *        full SVD (U, SIGMA and V). For only one set of singular vectors  
 *        (U or V), the caller should provide both U and V, as one of the  
 *        matrices is used as workspace if the matrix A is transposed.  
 *        The implementer can easily remove this constraint and make the  
 *        code more complicated. See the descriptions of U and V.  
 *  
 *  JOBP    (input) CHARACTER*1  
 *        Issues the licence to introduce structured perturbations to drown  
 *        denormalized numbers. This licence should be active if the  
 *        denormals are poorly implemented, causing slow computation,  
 *        especially in cases of fast convergence (!). For details see [1,2].  
 *        For the sake of simplicity, this perturbations are included only  
 *        when the full SVD or only the singular values are requested. The  
 *        implementer/user can easily add the perturbation for the cases of  
 *        computing one set of singular vectors.  
 *       = 'P': introduce perturbation  
 *       = 'N': do not perturb  
 *  
 *  M       (input) INTEGER  
 *         The number of rows of the input matrix A.  M >= 0.  
 *  
 *  N       (input) INTEGER  
 *         The number of columns of the input matrix A. M >= N >= 0.  
 *  
 *  A       (input/workspace) DOUBLE PRECISION array, dimension (LDA,N)  
 *          On entry, the M-by-N matrix A.  
 *  
 *  LDA     (input) INTEGER  
 *          The leading dimension of the array A.  LDA >= max(1,M).  
 *  
 *  SVA     (workspace/output) DOUBLE PRECISION array, dimension (N)  
 *          On exit,  
 *          - For WORK(1)/WORK(2) = ONE: The singular values of A. During the  
 *            computation SVA contains Euclidean column norms of the  
 *            iterated matrices in the array A.  
 *          - For WORK(1) .NE. WORK(2): The singular values of A are  
 *            (WORK(1)/WORK(2)) * SVA(1:N). This factored form is used if  
 *            sigma_max(A) overflows or if small singular values have been  
 *            saved from underflow by scaling the input matrix A.  
 *          - If JOBR='R' then some of the singular values may be returned  
 *            as exact zeros obtained by "set to zero" because they are  
 *            below the numerical rank threshold or are denormalized numbers.  
 *  
 *  U       (workspace/output) DOUBLE PRECISION array, dimension ( LDU, N )  
 *          If JOBU = 'U', then U contains on exit the M-by-N matrix of  
 *                         the left singular vectors.  
 *          If JOBU = 'F', then U contains on exit the M-by-M matrix of  
 *                         the left singular vectors, including an ONB  
 *                         of the orthogonal complement of the Range(A).  
 *          If JOBU = 'W'  .AND. (JOBV.EQ.'V' .AND. JOBT.EQ.'T' .AND. M.EQ.N),  
 *                         then U is used as workspace if the procedure  
 *                         replaces A with A^t. In that case, [V] is computed  
 *                         in U as left singular vectors of A^t and then  
 *                         copied back to the V array. This 'W' option is just  
 *                         a reminder to the caller that in this case U is  
 *                         reserved as workspace of length N*N.  
 *          If JOBU = 'N'  U is not referenced.  
 *  
 * LDU      (input) INTEGER  
 *          The leading dimension of the array U,  LDU >= 1.  
 *          IF  JOBU = 'U' or 'F' or 'W',  then LDU >= M.  
 *  
 *  V       (workspace/output) DOUBLE PRECISION array, dimension ( LDV, N )  
 *          If JOBV = 'V', 'J' then V contains on exit the N-by-N matrix of  
 *                         the right singular vectors;  
 *          If JOBV = 'W', AND (JOBU.EQ.'U' AND JOBT.EQ.'T' AND M.EQ.N),  
 *                         then V is used as workspace if the pprocedure  
 *                         replaces A with A^t. In that case, [U] is computed  
 *                         in V as right singular vectors of A^t and then  
 *                         copied back to the U array. This 'W' option is just  
 *                         a reminder to the caller that in this case V is  
 *                         reserved as workspace of length N*N.  
 *          If JOBV = 'N'  V is not referenced.  
 *  
 *  LDV     (input) INTEGER  
 *          The leading dimension of the array V,  LDV >= 1.  
 *          If JOBV = 'V' or 'J' or 'W', then LDV >= N.  
 *  
 *  WORK    (workspace/output) DOUBLE PRECISION array, dimension at least LWORK.  
 *          On exit, if N.GT.0 .AND. M.GT.0 (else not referenced),   
 *          WORK(1) = SCALE = WORK(2) / WORK(1) is the scaling factor such  
 *                    that SCALE*SVA(1:N) are the computed singular values  
 *                    of A. (See the description of SVA().)  
 *          WORK(2) = See the description of WORK(1).  
 *          WORK(3) = SCONDA is an estimate for the condition number of  
 *                    column equilibrated A. (If JOBA .EQ. 'E' or 'G')  
 *                    SCONDA is an estimate of DSQRT(||(R^t * R)^(-1)||_1).  
 *                    It is computed using DPOCON. It holds  
 *                    N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA  
 *                    where R is the triangular factor from the QRF of A.  
 *                    However, if R is truncated and the numerical rank is  
 *                    determined to be strictly smaller than N, SCONDA is  
 *                    returned as -1, thus indicating that the smallest  
 *                    singular values might be lost.  
 *  
 *          If full SVD is needed, the following two condition numbers are  
 *          useful for the analysis of the algorithm. They are provied for  
 *          a developer/implementer who is familiar with the details of  
 *          the method.  
 *  
 *          WORK(4) = an estimate of the scaled condition number of the  
 *                    triangular factor in the first QR factorization.  
 *          WORK(5) = an estimate of the scaled condition number of the  
 *                    triangular factor in the second QR factorization.  
 *          The following two parameters are computed if JOBT .EQ. 'T'.  
 *          They are provided for a developer/implementer who is familiar  
 *          with the details of the method.  
 *  
 *          WORK(6) = the entropy of A^t*A :: this is the Shannon entropy  
 *                    of diag(A^t*A) / Trace(A^t*A) taken as point in the  
 *                    probability simplex.  
 *          WORK(7) = the entropy of A*A^t.  
 *  
 *  LWORK   (input) INTEGER  
 *          Length of WORK to confirm proper allocation of work space.  
 *          LWORK depends on the job:  
 *  
 *          If only SIGMA is needed ( JOBU.EQ.'N', JOBV.EQ.'N' ) and  
 *            -> .. no scaled condition estimate required (JOBE.EQ.'N'):  
 *               LWORK >= max(2*M+N,4*N+1,7). This is the minimal requirement.  
 *               ->> For optimal performance (blocked code) the optimal value  
 *               is LWORK >= max(2*M+N,3*N+(N+1)*NB,7). Here NB is the optimal  
 *               block size for DGEQP3 and DGEQRF.  
 *               In general, optimal LWORK is computed as   
 *               LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DGEQRF), 7).          
 *            -> .. an estimate of the scaled condition number of A is  
 *               required (JOBA='E', 'G'). In this case, LWORK is the maximum  
 *               of the above and N*N+4*N, i.e. LWORK >= max(2*M+N,N*N+4*N,7).  
 *               ->> For optimal performance (blocked code) the optimal value   
 *               is LWORK >= max(2*M+N,3*N+(N+1)*NB, N*N+4*N, 7).  
 *               In general, the optimal length LWORK is computed as  
 *               LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DGEQRF),   
 *                                                     N+N*N+LWORK(DPOCON),7).  
 *  
 *          If SIGMA and the right singular vectors are needed (JOBV.EQ.'V'),  
 *            -> the minimal requirement is LWORK >= max(2*M+N,4*N+1,7).  
 *            -> For optimal performance, LWORK >= max(2*M+N,3*N+(N+1)*NB,7),  
 *               where NB is the optimal block size for DGEQP3, DGEQRF, DGELQ,  
 *               DORMLQ. In general, the optimal length LWORK is computed as  
 *               LWORK >= max(2*M+N,N+LWORK(DGEQP3), N+LWORK(DPOCON),   
 *                       N+LWORK(DGELQ), 2*N+LWORK(DGEQRF), N+LWORK(DORMLQ)).  
 *  
 *          If SIGMA and the left singular vectors are needed  
 *            -> the minimal requirement is LWORK >= max(2*M+N,4*N+1,7).  
 *            -> For optimal performance:  
 *               if JOBU.EQ.'U' :: LWORK >= max(2*M+N,3*N+(N+1)*NB,7),  
 *               if JOBU.EQ.'F' :: LWORK >= max(2*M+N,3*N+(N+1)*NB,N+M*NB,7),  
 *               where NB is the optimal block size for DGEQP3, DGEQRF, DORMQR.  
 *               In general, the optimal length LWORK is computed as  
 *               LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DPOCON),  
 *                        2*N+LWORK(DGEQRF), N+LWORK(DORMQR)).   
 *               Here LWORK(DORMQR) equals N*NB (for JOBU.EQ.'U') or   
 *               M*NB (for JOBU.EQ.'F').  
 *                 
 *          If the full SVD is needed: (JOBU.EQ.'U' or JOBU.EQ.'F') and   
 *            -> if JOBV.EQ.'V'    
 *               the minimal requirement is LWORK >= max(2*M+N,6*N+2*N*N).   
 *            -> if JOBV.EQ.'J' the minimal requirement is   
 *               LWORK >= max(2*M+N, 4*N+N*N,2*N+N*N+6).  
 *            -> For optimal performance, LWORK should be additionally  
 *               larger than N+M*NB, where NB is the optimal block size  
 *               for DORMQR.  
 *  
 *  IWORK   (workspace/output) INTEGER array, dimension M+3*N.  
 *          On exit,  
 *          IWORK(1) = the numerical rank determined after the initial  
 *                     QR factorization with pivoting. See the descriptions  
 *                     of JOBA and JOBR.  
 *          IWORK(2) = the number of the computed nonzero singular values  
 *          IWORK(3) = if nonzero, a warning message:  
 *                     If IWORK(3).EQ.1 then some of the column norms of A  
 *                     were denormalized floats. The requested high accuracy  
 *                     is not warranted by the data.  
 *  
 *  INFO    (output) INTEGER  
 *           < 0  : if INFO = -i, then the i-th argument had an illegal value.  
 *           = 0 :  successfull exit;  
 *           > 0 :  DGEJSV  did not converge in the maximal allowed number  
 *                  of sweeps. The computed values may be inaccurate.  
 *  
 *  Further Details  
 *  ===============  
 *  
 *  DGEJSV implements a preconditioned Jacobi SVD algorithm. It uses DGEQP3,  
 *  DGEQRF, and DGELQF as preprocessors and preconditioners. Optionally, an  
 *  additional row pivoting can be used as a preprocessor, which in some  
 *  cases results in much higher accuracy. An example is matrix A with the  
 *  structure A = D1 * C * D2, where D1, D2 are arbitrarily ill-conditioned  
 *  diagonal matrices and C is well-conditioned matrix. In that case, complete  
 *  pivoting in the first QR factorizations provides accuracy dependent on the  
 *  condition number of C, and independent of D1, D2. Such higher accuracy is  
 *  not completely understood theoretically, but it works well in practice.  
 *  Further, if A can be written as A = B*D, with well-conditioned B and some  
 *  diagonal D, then the high accuracy is guaranteed, both theoretically and  
 *  in software, independent of D. For more details see [1], [2].  
 *     The computational range for the singular values can be the full range  
 *  ( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic and the BLAS  
 *  & LAPACK routines called by DGEJSV are implemented to work in that range.  
 *  If that is not the case, then the restriction for safe computation with  
 *  the singular values in the range of normalized IEEE numbers is that the  
 *  spectral condition number kappa(A)=sigma_max(A)/sigma_min(A) does not  
 *  overflow. This code (DGEJSV) is best used in this restricted range,  
 *  meaning that singular values of magnitude below ||A||_2 / DLAMCH('O') are  
 *  returned as zeros. See JOBR for details on this.  
 *     Further, this implementation is somewhat slower than the one described  
 *  in [1,2] due to replacement of some non-LAPACK components, and because  
 *  the choice of some tuning parameters in the iterative part (DGESVJ) is  
 *  left to the implementer on a particular machine.  
 *     The rank revealing QR factorization (in this code: DGEQP3) should be  
 *  implemented as in [3]. We have a new version of DGEQP3 under development  
 *  that is more robust than the current one in LAPACK, with a cleaner cut in  
 *  rank defficient cases. It will be available in the SIGMA library [4].  
 *  If M is much larger than N, it is obvious that the inital QRF with  
 *  column pivoting can be preprocessed by the QRF without pivoting. That  
 *  well known trick is not used in DGEJSV because in some cases heavy row  
 *  weighting can be treated with complete pivoting. The overhead in cases  
 *  M much larger than N is then only due to pivoting, but the benefits in  
 *  terms of accuracy have prevailed. The implementer/user can incorporate  
 *  this extra QRF step easily. The implementer can also improve data movement  
 *  (matrix transpose, matrix copy, matrix transposed copy) - this  
 *  implementation of DGEJSV uses only the simplest, naive data movement.  
 *  
 *  Contributors  
 *  
 *  Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)  
 *  
 *  References  
 *  
 * [1] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.  
 *     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.  
 *     LAPACK Working note 169.  
 * [2] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.  
 *     SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.  
 *     LAPACK Working note 170.  
 * [3] Z. Drmac and Z. Bujanovic: On the failure of rank-revealing QR  
 *     factorization software - a case study.  
 *     ACM Trans. Math. Softw. Vol. 35, No 2 (2008), pp. 1-28.  
 *     LAPACK Working note 176.  
 * [4] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,  
 *     QSVD, (H,K)-SVD computations.  
 *     Department of Mathematics, University of Zagreb, 2008.  
 *  
 *  Bugs, examples and comments  
 *  
 *  Please report all bugs and send interesting examples and/or comments to  
 *  drmac@math.hr. Thank you.  
 *  
 *  ===========================================================================  *  ===========================================================================
 *  *
 *     .. Local Parameters ..  *     .. Local Parameters ..

Removed from v.1.6  
changed lines
  Added in v.1.7


CVSweb interface <joel.bertrand@systella.fr>