--- rpl/lapack/lapack/zgejsv.f 2016/08/27 15:27:12 1.2 +++ rpl/lapack/lapack/zgejsv.f 2023/08/07 08:39:17 1.9 @@ -2,18 +2,18 @@ * * =========== DOCUMENTATION =========== * -* Online html documentation available at -* http://www.netlib.org/lapack/explore-html/ +* Online html documentation available at +* http://www.netlib.org/lapack/explore-html/ * *> \htmlonly -*> Download ZGEJSV + dependencies -*> -*> [TGZ] -*> -*> [ZIP] -*> +*> Download ZGEJSV + dependencies +*> +*> [TGZ] +*> +*> [ZIP] +*> *> [TXT] -*> \endhtmlonly +*> \endhtmlonly * * Definition: * =========== @@ -21,18 +21,18 @@ * SUBROUTINE ZGEJSV( JOBA, JOBU, JOBV, JOBR, JOBT, JOBP, * M, N, A, LDA, SVA, U, LDU, V, LDV, * CWORK, LWORK, RWORK, LRWORK, IWORK, INFO ) -* +* * .. Scalar Arguments .. * IMPLICIT NONE * INTEGER INFO, LDA, LDU, LDV, LWORK, M, N * .. * .. Array Arguments .. * COMPLEX*16 A( LDA, * ), U( LDU, * ), V( LDV, * ), CWORK( LWORK ) -* DOUBLE PRECISION SVA( N ), RWORK( LRWORK ) +* DOUBLE PRECISION SVA( N ), RWORK( LRWORK ) * INTEGER IWORK( * ) * CHARACTER*1 JOBA, JOBP, JOBR, JOBT, JOBU, JOBV * .. -* +* * *> \par Purpose: * ============= @@ -80,12 +80,13 @@ *> desirable, then this option is advisable. The input matrix A *> is preprocessed with QR factorization with FULL (row and *> column) pivoting. -*> = 'G' Computation as with 'F' with an additional estimate of the -*> condition number of B, where A=D*B. If A has heavily weighted +*> = 'G': Computation as with 'F' with an additional estimate of the +*> condition number of B, where A=B*D. If A has heavily weighted *> rows, then using this condition number gives too pessimistic *> error bound. -*> = 'A': Small singular values are the noise and the matrix is treated -*> as numerically rank defficient. The error in the computed +*> = 'A': Small singular values are not well determined by the data +*> and are considered as noisy; the matrix is treated as +*> numerically rank deficient. The error in the computed *> singular values is bounded by f(m,n)*epsilon*||A||. *> The computed SVD A = U * S * V^* restores A up to *> f(m,n)*epsilon*||A||. @@ -97,7 +98,7 @@ *> numerical RANK is declared to be r. The SVD is computed with *> absolute error bounds, but more accurately than with 'A'. *> \endverbatim -*> +*> *> \param[in] JOBU *> \verbatim *> JOBU is CHARACTER*1 @@ -108,7 +109,7 @@ *> of U. *> = 'N': U is not computed. *> \endverbatim -*> +*> *> \param[in] JOBV *> \verbatim *> JOBV is CHARACTER*1 @@ -116,13 +117,12 @@ *> = 'V': N columns of V are returned in the array V; Jacobi rotations *> are not explicitly accumulated. *> = 'J': N columns of V are returned in the array V, but they are -*> computed as the product of Jacobi rotations. This option is -*> allowed only if JOBU .NE. 'N', i.e. in computing the full SVD. +*> computed as the product of Jacobi rotations, if JOBT = 'N'. *> = 'W': V may be used as workspace of length N*N. See the description *> of V. *> = 'N': V is not computed. *> \endverbatim -*> +*> *> \param[in] JOBR *> \verbatim *> JOBR is CHARACTER*1 @@ -131,7 +131,7 @@ *> specified range. If A .NE. 0 is scaled so that the largest singular *> value of c*A is around SQRT(BIG), BIG=DLAMCH('O'), then JOBR issues *> the licence to kill columns of A whose norm in c*A is less than -*> SQRT(SFMIN) (for JOBR.EQ.'R'), or less than SMALL=SFMIN/EPSLN, +*> SQRT(SFMIN) (for JOBR = 'R'), or less than SMALL=SFMIN/EPSLN, *> where SFMIN=DLAMCH('S'), EPSLN=DLAMCH('E'). *> = 'N': Do not kill small columns of c*A. This option assumes that *> BLAS and QR factorizations and triangular solvers are @@ -143,28 +143,29 @@ *> For computing the singular values in the FULL range [SFMIN,BIG] *> use ZGESVJ. *> \endverbatim -*> +*> *> \param[in] JOBT *> \verbatim *> JOBT is CHARACTER*1 *> If the matrix is square then the procedure may determine to use *> transposed A if A^* seems to be better with respect to convergence. -*> If the matrix is not square, JOBT is ignored. This is subject to -*> changes in the future. +*> If the matrix is not square, JOBT is ignored. *> The decision is based on two values of entropy over the adjoint *> orbit of A^* * A. See the descriptions of WORK(6) and WORK(7). *> = 'T': transpose if entropy test indicates possibly faster *> convergence of Jacobi process if A^* is taken as input. If A is *> replaced with A^*, then the row pivoting is included automatically. *> = 'N': do not speculate. -*> This option can be used to compute only the singular values, or the -*> full SVD (U, SIGMA and V). For only one set of singular vectors +*> The option 'T' can be used to compute only the singular values, or +*> the full SVD (U, SIGMA and V). For only one set of singular vectors *> (U or V), the caller should provide both U and V, as one of the *> matrices is used as workspace if the matrix A is transposed. *> The implementer can easily remove this constraint and make the *> code more complicated. See the descriptions of U and V. +*> In general, this option is considered experimental, and 'N'; should +*> be preferred. This is subject to changes in the future. *> \endverbatim -*> +*> *> \param[in] JOBP *> \verbatim *> JOBP is CHARACTER*1 @@ -228,7 +229,7 @@ *> If JOBU = 'F', then U contains on exit the M-by-M matrix of *> the left singular vectors, including an ONB *> of the orthogonal complement of the Range(A). -*> If JOBU = 'W' .AND. (JOBV.EQ.'V' .AND. JOBT.EQ.'T' .AND. M.EQ.N), +*> If JOBU = 'W' .AND. (JOBV = 'V' .AND. JOBT = 'T' .AND. M = N), *> then U is used as workspace if the procedure *> replaces A with A^*. In that case, [V] is computed *> in U as left singular vectors of A^* and then @@ -250,7 +251,7 @@ *> V is COMPLEX*16 array, dimension ( LDV, N ) *> If JOBV = 'V', 'J' then V contains on exit the N-by-N matrix of *> the right singular vectors; -*> If JOBV = 'W', AND (JOBU.EQ.'U' AND JOBT.EQ.'T' AND M.EQ.N), +*> If JOBV = 'W', AND (JOBU = 'U' AND JOBT = 'T' AND M = N), *> then V is used as workspace if the pprocedure *> replaces A with A^*. In that case, [U] is computed *> in V as right singular vectors of A^* and then @@ -269,7 +270,10 @@ *> *> \param[out] CWORK *> \verbatim -*> CWORK is COMPLEX*16 array, dimension at least LWORK. +*> CWORK is COMPLEX*16 array, dimension (MAX(2,LWORK)) +*> If the call to ZGEJSV is a workspace query (indicated by LWORK=-1 or +*> LRWORK=-1), then on exit CWORK(1) contains the required length of +*> CWORK for the job parameters used in the call. *> \endverbatim *> *> \param[in] LWORK @@ -278,62 +282,83 @@ *> Length of CWORK to confirm proper allocation of workspace. *> LWORK depends on the job: *> -*> 1. If only SIGMA is needed ( JOBU.EQ.'N', JOBV.EQ.'N' ) and +*> 1. If only SIGMA is needed ( JOBU = 'N', JOBV = 'N' ) and *> 1.1 .. no scaled condition estimate required (JOBA.NE.'E'.AND.JOBA.NE.'G'): *> LWORK >= 2*N+1. This is the minimal requirement. *> ->> For optimal performance (blocked code) the optimal value *> is LWORK >= N + (N+1)*NB. Here NB is the optimal *> block size for ZGEQP3 and ZGEQRF. -*> In general, optimal LWORK is computed as -*> LWORK >= max(N+LWORK(ZGEQP3),N+LWORK(ZGEQRF)). +*> In general, optimal LWORK is computed as +*> LWORK >= max(N+LWORK(ZGEQP3),N+LWORK(ZGEQRF), LWORK(ZGESVJ)). *> 1.2. .. an estimate of the scaled condition number of A is *> required (JOBA='E', or 'G'). In this case, LWORK the minimal -*> requirement is LWORK >= N*N + 3*N. -*> ->> For optimal performance (blocked code) the optimal value -*> is LWORK >= max(N+(N+1)*NB, N*N+3*N). +*> requirement is LWORK >= N*N + 2*N. +*> ->> For optimal performance (blocked code) the optimal value +*> is LWORK >= max(N+(N+1)*NB, N*N+2*N)=N**2+2*N. *> In general, the optimal length LWORK is computed as -*> LWORK >= max(N+LWORK(ZGEQP3),N+LWORK(ZGEQRF), -*> N+N*N+LWORK(ZPOCON)). -*> -*> 2. If SIGMA and the right singular vectors are needed (JOBV.EQ.'V'), -*> (JOBU.EQ.'N') +*> LWORK >= max(N+LWORK(ZGEQP3),N+LWORK(ZGEQRF), LWORK(ZGESVJ), +*> N*N+LWORK(ZPOCON)). +*> 2. If SIGMA and the right singular vectors are needed (JOBV = 'V'), +*> (JOBU = 'N') +*> 2.1 .. no scaled condition estimate requested (JOBE = 'N'): *> -> the minimal requirement is LWORK >= 3*N. -*> -> For optimal performance, LWORK >= max(N+(N+1)*NB, 3*N,2*N+N*NB), -*> where NB is the optimal block size for ZGEQP3, ZGEQRF, ZGELQF, +*> -> For optimal performance, +*> LWORK >= max(N+(N+1)*NB, 2*N+N*NB)=2*N+N*NB, +*> where NB is the optimal block size for ZGEQP3, ZGEQRF, ZGELQ, *> ZUNMLQ. In general, the optimal length LWORK is computed as -*> LWORK >= max(N+LWORK(ZGEQP3), N+LWORK(ZPOCON), N+LWORK(ZGESVJ), +*> LWORK >= max(N+LWORK(ZGEQP3), N+LWORK(ZGESVJ), *> N+LWORK(ZGELQF), 2*N+LWORK(ZGEQRF), N+LWORK(ZUNMLQ)). -*> +*> 2.2 .. an estimate of the scaled condition number of A is +*> required (JOBA='E', or 'G'). +*> -> the minimal requirement is LWORK >= 3*N. +*> -> For optimal performance, +*> LWORK >= max(N+(N+1)*NB, 2*N,2*N+N*NB)=2*N+N*NB, +*> where NB is the optimal block size for ZGEQP3, ZGEQRF, ZGELQ, +*> ZUNMLQ. In general, the optimal length LWORK is computed as +*> LWORK >= max(N+LWORK(ZGEQP3), LWORK(ZPOCON), N+LWORK(ZGESVJ), +*> N+LWORK(ZGELQF), 2*N+LWORK(ZGEQRF), N+LWORK(ZUNMLQ)). *> 3. If SIGMA and the left singular vectors are needed +*> 3.1 .. no scaled condition estimate requested (JOBE = 'N'): *> -> the minimal requirement is LWORK >= 3*N. *> -> For optimal performance: -*> if JOBU.EQ.'U' :: LWORK >= max(3*N, N+(N+1)*NB, 2*N+N*NB), +*> if JOBU = 'U' :: LWORK >= max(3*N, N+(N+1)*NB, 2*N+N*NB)=2*N+N*NB, +*> where NB is the optimal block size for ZGEQP3, ZGEQRF, ZUNMQR. +*> In general, the optimal length LWORK is computed as +*> LWORK >= max(N+LWORK(ZGEQP3), 2*N+LWORK(ZGEQRF), N+LWORK(ZUNMQR)). +*> 3.2 .. an estimate of the scaled condition number of A is +*> required (JOBA='E', or 'G'). +*> -> the minimal requirement is LWORK >= 3*N. +*> -> For optimal performance: +*> if JOBU = 'U' :: LWORK >= max(3*N, N+(N+1)*NB, 2*N+N*NB)=2*N+N*NB, *> where NB is the optimal block size for ZGEQP3, ZGEQRF, ZUNMQR. *> In general, the optimal length LWORK is computed as *> LWORK >= max(N+LWORK(ZGEQP3),N+LWORK(ZPOCON), -*> 2*N+LWORK(ZGEQRF), N+LWORK(ZUNMQR)). -*> -*> 4. If the full SVD is needed: (JOBU.EQ.'U' or JOBU.EQ.'F') and -*> 4.1. if JOBV.EQ.'V' +*> 2*N+LWORK(ZGEQRF), N+LWORK(ZUNMQR)). +*> 4. If the full SVD is needed: (JOBU = 'U' or JOBU = 'F') and +*> 4.1. if JOBV = 'V' *> the minimal requirement is LWORK >= 5*N+2*N*N. -*> 4.2. if JOBV.EQ.'J' the minimal requirement is +*> 4.2. if JOBV = 'J' the minimal requirement is *> LWORK >= 4*N+N*N. *> In both cases, the allocated CWORK can accommodate blocked runs -*> of ZGEQP3, ZGEQRF, ZGELQF, ZUNMQR, ZUNMLQ. +*> of ZGEQP3, ZGEQRF, ZGELQF, SUNMQR, ZUNMLQ. +*> +*> If the call to ZGEJSV is a workspace query (indicated by LWORK=-1 or +*> LRWORK=-1), then on exit CWORK(1) contains the optimal and CWORK(2) contains the +*> minimal length of CWORK for the job parameters used in the call. *> \endverbatim *> *> \param[out] RWORK *> \verbatim -*> RWORK is DOUBLE PRECISION array, dimension at least LRWORK. +*> RWORK is DOUBLE PRECISION array, dimension (MAX(7,LWORK)) *> On exit, *> RWORK(1) = Determines the scaling factor SCALE = RWORK(2) / RWORK(1) *> such that SCALE*SVA(1:N) are the computed singular values *> of A. (See the description of SVA().) *> RWORK(2) = See the description of RWORK(1). *> RWORK(3) = SCONDA is an estimate for the condition number of -*> column equilibrated A. (If JOBA .EQ. 'E' or 'G') +*> column equilibrated A. (If JOBA = 'E' or 'G') *> SCONDA is an estimate of SQRT(||(R^* * R)^(-1)||_1). -*> It is computed using SPOCON. It holds +*> It is computed using ZPOCON. It holds *> N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA *> where R is the triangular factor from the QRF of A. *> However, if R is truncated and the numerical rank is @@ -342,7 +367,7 @@ *> singular values might be lost. *> *> If full SVD is needed, the following two condition numbers are -*> useful for the analysis of the algorithm. They are provied for +*> useful for the analysis of the algorithm. They are provided for *> a developer/implementer who is familiar with the details of *> the method. *> @@ -350,13 +375,16 @@ *> triangular factor in the first QR factorization. *> RWORK(5) = an estimate of the scaled condition number of the *> triangular factor in the second QR factorization. -*> The following two parameters are computed if JOBT .EQ. 'T'. +*> The following two parameters are computed if JOBT = 'T'. *> They are provided for a developer/implementer who is familiar *> with the details of the method. *> RWORK(6) = the entropy of A^* * A :: this is the Shannon entropy *> of diag(A^* * A) / Trace(A^* * A) taken as point in the *> probability simplex. *> RWORK(7) = the entropy of A * A^*. (See the description of RWORK(6).) +*> If the call to ZGEJSV is a workspace query (indicated by LWORK=-1 or +*> LRWORK=-1), then on exit RWORK(1) contains the required length of +*> RWORK for the job parameters used in the call. *> \endverbatim *> *> \param[in] LRWORK @@ -365,75 +393,95 @@ *> Length of RWORK to confirm proper allocation of workspace. *> LRWORK depends on the job: *> -*> 1. If only singular values are requested i.e. if -*> LSAME(JOBU,'N') .AND. LSAME(JOBV,'N') +*> 1. If only the singular values are requested i.e. if +*> LSAME(JOBU,'N') .AND. LSAME(JOBV,'N') *> then: *> 1.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'), -*> then LRWORK = max( 7, N + 2 * M ). -*> 1.2. Otherwise, LRWORK = max( 7, 2 * N ). +*> then: LRWORK = max( 7, 2 * M ). +*> 1.2. Otherwise, LRWORK = max( 7, N ). *> 2. If singular values with the right singular vectors are requested -*> i.e. if -*> (LSAME(JOBV,'V').OR.LSAME(JOBV,'J')) .AND. +*> i.e. if +*> (LSAME(JOBV,'V').OR.LSAME(JOBV,'J')) .AND. *> .NOT.(LSAME(JOBU,'U').OR.LSAME(JOBU,'F')) *> then: *> 2.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'), -*> then LRWORK = max( 7, N + 2 * M ). -*> 2.2. Otherwise, LRWORK = max( 7, 2 * N ). -*> 3. If singular values with the left singular vectors are requested, i.e. if +*> then LRWORK = max( 7, 2 * M ). +*> 2.2. Otherwise, LRWORK = max( 7, N ). +*> 3. If singular values with the left singular vectors are requested, i.e. if *> (LSAME(JOBU,'U').OR.LSAME(JOBU,'F')) .AND. *> .NOT.(LSAME(JOBV,'V').OR.LSAME(JOBV,'J')) *> then: *> 3.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'), -*> then LRWORK = max( 7, N + 2 * M ). -*> 3.2. Otherwise, LRWORK = max( 7, 2 * N ). -*> 4. If singular values with both the left and the right singular vectors -*> are requested, i.e. if +*> then LRWORK = max( 7, 2 * M ). +*> 3.2. Otherwise, LRWORK = max( 7, N ). +*> 4. If singular values with both the left and the right singular vectors +*> are requested, i.e. if *> (LSAME(JOBU,'U').OR.LSAME(JOBU,'F')) .AND. *> (LSAME(JOBV,'V').OR.LSAME(JOBV,'J')) *> then: *> 4.1. If LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'), -*> then LRWORK = max( 7, N + 2 * M ). -*> 4.2. Otherwise, LRWORK = max( 7, 2 * N ). +*> then LRWORK = max( 7, 2 * M ). +*> 4.2. Otherwise, LRWORK = max( 7, N ). +*> +*> If, on entry, LRWORK = -1 or LWORK=-1, a workspace query is assumed and +*> the length of RWORK is returned in RWORK(1). *> \endverbatim -*> +*> *> \param[out] IWORK *> \verbatim -*> IWORK is INTEGER array, of dimension: -*> If LSAME(JOBA,'F') .OR. LSAME(JOBA,'G'), then -*> the dimension of IWORK is max( 3, 2 * N + M ). -*> Otherwise, the dimension of IWORK is -*> -> max( 3, 2*N ) for full SVD -*> -> max( 3, N ) for singular values only or singular -*> values with one set of singular vectors (left or right) +*> IWORK is INTEGER array, of dimension at least 4, that further depends +*> on the job: +*> +*> 1. If only the singular values are requested then: +*> If ( LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G') ) +*> then the length of IWORK is N+M; otherwise the length of IWORK is N. +*> 2. If the singular values and the right singular vectors are requested then: +*> If ( LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G') ) +*> then the length of IWORK is N+M; otherwise the length of IWORK is N. +*> 3. If the singular values and the left singular vectors are requested then: +*> If ( LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G') ) +*> then the length of IWORK is N+M; otherwise the length of IWORK is N. +*> 4. If the singular values with both the left and the right singular vectors +*> are requested, then: +*> 4.1. If LSAME(JOBV,'J') the length of IWORK is determined as follows: +*> If ( LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G') ) +*> then the length of IWORK is N+M; otherwise the length of IWORK is N. +*> 4.2. If LSAME(JOBV,'V') the length of IWORK is determined as follows: +*> If ( LSAME(JOBT,'T') .OR. LSAME(JOBA,'F') .OR. LSAME(JOBA,'G') ) +*> then the length of IWORK is 2*N+M; otherwise the length of IWORK is 2*N. +*> *> On exit, *> IWORK(1) = the numerical rank determined after the initial *> QR factorization with pivoting. See the descriptions *> of JOBA and JOBR. *> IWORK(2) = the number of the computed nonzero singular values *> IWORK(3) = if nonzero, a warning message: -*> If IWORK(3).EQ.1 then some of the column norms of A +*> If IWORK(3) = 1 then some of the column norms of A *> were denormalized floats. The requested high accuracy *> is not warranted by the data. +*> IWORK(4) = 1 or -1. If IWORK(4) = 1, then the procedure used A^* to +*> do the job as specified by the JOB parameters. +*> If the call to ZGEJSV is a workspace query (indicated by LWORK = -1 or +*> LRWORK = -1), then on exit IWORK(1) contains the required length of +*> IWORK for the job parameters used in the call. *> \endverbatim *> *> \param[out] INFO *> \verbatim *> INFO is INTEGER -*> < 0 : if INFO = -i, then the i-th argument had an illegal value. -*> = 0 : successfull exit; -*> > 0 : ZGEJSV did not converge in the maximal allowed number -*> of sweeps. The computed values may be inaccurate. +*> < 0: if INFO = -i, then the i-th argument had an illegal value. +*> = 0: successful exit; +*> > 0: ZGEJSV did not converge in the maximal allowed number +*> of sweeps. The computed values may be inaccurate. *> \endverbatim * * Authors: * ======== * -*> \author Univ. of Tennessee -*> \author Univ. of California Berkeley -*> \author Univ. of Colorado Denver -*> \author NAG Ltd. -* -*> \date June 2016 +*> \author Univ. of Tennessee +*> \author Univ. of California Berkeley +*> \author Univ. of Colorado Denver +*> \author NAG Ltd. * *> \ingroup complex16GEsing * @@ -470,8 +518,8 @@ *> The rank revealing QR factorization (in this code: ZGEQP3) should be *> implemented as in [3]. We have a new version of ZGEQP3 under development *> that is more robust than the current one in LAPACK, with a cleaner cut in -*> rank defficient cases. It will be available in the SIGMA library [4]. -*> If M is much larger than N, it is obvious that the inital QRF with +*> rank deficient cases. It will be available in the SIGMA library [4]. +*> If M is much larger than N, it is obvious that the initial QRF with *> column pivoting can be preprocessed by the QRF without pivoting. That *> well known trick is not used in ZGEJSV because in some cases heavy row *> weighting can be treated with complete pivoting. The overhead in cases @@ -480,11 +528,13 @@ *> this extra QRF step easily. The implementer can also improve data movement *> (matrix transpose, matrix copy, matrix transposed copy) - this *> implementation of ZGEJSV uses only the simplest, naive data movement. +*> \endverbatim * -*> \par Contributors: +*> \par Contributor: * ================== *> -*> Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany) +*> Zlatko Drmac, Department of Mathematics, Faculty of Science, +*> University of Zagreb (Zagreb, Croatia); drmac@math.hr * *> \par References: * ================ @@ -503,7 +553,7 @@ *> LAPACK Working note 176. *> [4] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV, *> QSVD, (H,K)-SVD computations. -*> Department of Mathematics, University of Zagreb, 2008. +*> Department of Mathematics, University of Zagreb, 2008, 2016. *> \endverbatim * *> \par Bugs, examples and comments: @@ -517,19 +567,18 @@ $ M, N, A, LDA, SVA, U, LDU, V, LDV, $ CWORK, LWORK, RWORK, LRWORK, IWORK, INFO ) * -* -- LAPACK computational routine (version 3.6.1) -- +* -- LAPACK computational routine -- * -- LAPACK is a software package provided by Univ. of Tennessee, -- * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..-- -* June 2016 * * .. Scalar Arguments .. IMPLICIT NONE INTEGER INFO, LDA, LDU, LDV, LWORK, LRWORK, M, N * .. * .. Array Arguments .. - COMPLEX*16 A( LDA, * ), U( LDU, * ), V( LDV, * ), + COMPLEX*16 A( LDA, * ), U( LDU, * ), V( LDV, * ), $ CWORK( LWORK ) - DOUBLE PRECISION SVA( N ), RWORK( * ) + DOUBLE PRECISION SVA( N ), RWORK( LRWORK ) INTEGER IWORK( * ) CHARACTER*1 JOBA, JOBP, JOBR, JOBT, JOBU, JOBV * .. @@ -537,25 +586,35 @@ * =========================================================================== * * .. Local Parameters .. - DOUBLE PRECISION ZERO, ONE + DOUBLE PRECISION ZERO, ONE PARAMETER ( ZERO = 0.0D0, ONE = 1.0D0 ) - COMPLEX*16 CZERO, CONE - PARAMETER ( CZERO = ( 0.0D0, 0.0D0 ), CONE = ( 1.0D0, 0.0D0 ) ) + COMPLEX*16 CZERO, CONE + PARAMETER ( CZERO = ( 0.0D0, 0.0D0 ), CONE = ( 1.0D0, 0.0D0 ) ) * .. * .. Local Scalars .. COMPLEX*16 CTEMP - DOUBLE PRECISION AAPP, AAQQ, AATMAX, AATMIN, BIG, BIG1, - $ COND_OK, CONDR1, CONDR2, ENTRA, ENTRAT, EPSLN, - $ MAXPRJ, SCALEM, SCONDA, SFMIN, SMALL, TEMP1, + DOUBLE PRECISION AAPP, AAQQ, AATMAX, AATMIN, BIG, BIG1, + $ COND_OK, CONDR1, CONDR2, ENTRA, ENTRAT, EPSLN, + $ MAXPRJ, SCALEM, SCONDA, SFMIN, SMALL, TEMP1, $ USCAL1, USCAL2, XSC INTEGER IERR, N1, NR, NUMRANK, p, q, WARNING - LOGICAL ALMORT, DEFR, ERREST, GOSCAL, JRACC, KILL, LSVEC, - $ L2ABER, L2KILL, L2PERT, L2RANK, L2TRAN, - $ NOSCAL, ROWPIV, RSVEC, TRANSP + LOGICAL ALMORT, DEFR, ERREST, GOSCAL, JRACC, KILL, LQUERY, + $ LSVEC, L2ABER, L2KILL, L2PERT, L2RANK, L2TRAN, NOSCAL, + $ ROWPIV, RSVEC, TRANSP +* + INTEGER OPTWRK, MINWRK, MINRWRK, MINIWRK + INTEGER LWCON, LWLQF, LWQP3, LWQRF, LWUNMLQ, LWUNMQR, LWUNMQRM, + $ LWSVDJ, LWSVDJV, LRWQP3, LRWCON, LRWSVDJ, IWOFF + INTEGER LWRK_ZGELQF, LWRK_ZGEQP3, LWRK_ZGEQP3N, LWRK_ZGEQRF, + $ LWRK_ZGESVJ, LWRK_ZGESVJV, LWRK_ZGESVJU, LWRK_ZUNMLQ, + $ LWRK_ZUNMQR, LWRK_ZUNMQRM * .. +* .. Local Arrays + COMPLEX*16 CDUMMY(1) + DOUBLE PRECISION RDUMMY(1) +* * .. Intrinsic Functions .. - INTRINSIC ABS, DCMPLX, DCONJG, DLOG, DMAX1, DMIN1, DBLE, - $ MAX0, MIN0, NINT, DSQRT + INTRINSIC ABS, DCMPLX, CONJG, DLOG, MAX, MIN, DBLE, NINT, SQRT * .. * .. External Functions .. DOUBLE PRECISION DLAMCH, DZNRM2 @@ -564,16 +623,16 @@ EXTERNAL IDAMAX, IZAMAX, LSAME, DLAMCH, DZNRM2 * .. * .. External Subroutines .. - EXTERNAL DLASSQ, ZCOPY, ZGELQF, ZGEQP3, ZGEQRF, ZLACPY, ZLASCL, - $ DLASCL, ZLASET, ZLASSQ, ZLASWP, ZUNGQR, ZUNMLQ, - $ ZUNMQR, ZPOCON, DSCAL, ZDSCAL, ZSWAP, ZTRSM, XERBLA + EXTERNAL DLASSQ, ZCOPY, ZGELQF, ZGEQP3, ZGEQRF, ZLACPY, ZLAPMR, + $ ZLASCL, DLASCL, ZLASET, ZLASSQ, ZLASWP, ZUNGQR, ZUNMLQ, + $ ZUNMQR, ZPOCON, DSCAL, ZDSCAL, ZSWAP, ZTRSM, ZLACGV, + $ XERBLA * EXTERNAL ZGESVJ * .. * * Test the input arguments * - LSVEC = LSAME( JOBU, 'U' ) .OR. LSAME( JOBU, 'F' ) JRACC = LSAME( JOBV, 'J' ) RSVEC = LSAME( JOBV, 'V' ) .OR. JRACC @@ -581,23 +640,25 @@ L2RANK = LSAME( JOBA, 'R' ) L2ABER = LSAME( JOBA, 'A' ) ERREST = LSAME( JOBA, 'E' ) .OR. LSAME( JOBA, 'G' ) - L2TRAN = LSAME( JOBT, 'T' ) + L2TRAN = LSAME( JOBT, 'T' ) .AND. ( M .EQ. N ) L2KILL = LSAME( JOBR, 'R' ) DEFR = LSAME( JOBR, 'N' ) L2PERT = LSAME( JOBP, 'P' ) * + LQUERY = ( LWORK .EQ. -1 ) .OR. ( LRWORK .EQ. -1 ) +* IF ( .NOT.(ROWPIV .OR. L2RANK .OR. L2ABER .OR. $ ERREST .OR. LSAME( JOBA, 'C' ) )) THEN INFO = - 1 - ELSE IF ( .NOT.( LSVEC .OR. LSAME( JOBU, 'N' ) .OR. - $ LSAME( JOBU, 'W' )) ) THEN + ELSE IF ( .NOT.( LSVEC .OR. LSAME( JOBU, 'N' ) .OR. + $ ( LSAME( JOBU, 'W' ) .AND. RSVEC .AND. L2TRAN ) ) ) THEN INFO = - 2 ELSE IF ( .NOT.( RSVEC .OR. LSAME( JOBV, 'N' ) .OR. - $ LSAME( JOBV, 'W' )) .OR. ( JRACC .AND. (.NOT.LSVEC) ) ) THEN + $ ( LSAME( JOBV, 'W' ) .AND. LSVEC .AND. L2TRAN ) ) ) THEN INFO = - 3 ELSE IF ( .NOT. ( L2KILL .OR. DEFR ) ) THEN INFO = - 4 - ELSE IF ( .NOT. ( L2TRAN .OR. LSAME( JOBT, 'N' ) ) ) THEN + ELSE IF ( .NOT. ( LSAME(JOBT,'T') .OR. LSAME(JOBT,'N') ) ) THEN INFO = - 5 ELSE IF ( .NOT. ( L2PERT .OR. LSAME( JOBP, 'N' ) ) ) THEN INFO = - 6 @@ -611,37 +672,294 @@ INFO = - 13 ELSE IF ( RSVEC .AND. ( LDV .LT. N ) ) THEN INFO = - 15 - ELSE IF ( (.NOT.(LSVEC .OR. RSVEC .OR. ERREST).AND. - $ (LWORK .LT. 2*N+1)) .OR. - $ (.NOT.(LSVEC .OR. RSVEC) .AND. ERREST .AND. - $ (LWORK .LT. N*N+3*N)) .OR. - $ (LSVEC .AND. (.NOT.RSVEC) .AND. (LWORK .LT. 3*N)) - $ .OR. - $ (RSVEC .AND. (.NOT.LSVEC) .AND. (LWORK .LT. 3*N)) - $ .OR. - $ (LSVEC .AND. RSVEC .AND. (.NOT.JRACC) .AND. - $ (LWORK.LT.5*N+2*N*N)) - $ .OR. (LSVEC .AND. RSVEC .AND. JRACC .AND. - $ LWORK.LT.4*N+N*N)) - $ THEN - INFO = - 17 - ELSE IF ( LRWORK.LT. MAX0(N+2*M,7)) THEN - INFO = -19 ELSE * #:) INFO = 0 END IF * + IF ( INFO .EQ. 0 ) THEN +* .. compute the minimal and the optimal workspace lengths +* [[The expressions for computing the minimal and the optimal +* values of LCWORK, LRWORK are written with a lot of redundancy and +* can be simplified. However, this verbose form is useful for +* maintenance and modifications of the code.]] +* +* .. minimal workspace length for ZGEQP3 of an M x N matrix, +* ZGEQRF of an N x N matrix, ZGELQF of an N x N matrix, +* ZUNMLQ for computing N x N matrix, ZUNMQR for computing N x N +* matrix, ZUNMQR for computing M x N matrix, respectively. + LWQP3 = N+1 + LWQRF = MAX( 1, N ) + LWLQF = MAX( 1, N ) + LWUNMLQ = MAX( 1, N ) + LWUNMQR = MAX( 1, N ) + LWUNMQRM = MAX( 1, M ) +* .. minimal workspace length for ZPOCON of an N x N matrix + LWCON = 2 * N +* .. minimal workspace length for ZGESVJ of an N x N matrix, +* without and with explicit accumulation of Jacobi rotations + LWSVDJ = MAX( 2 * N, 1 ) + LWSVDJV = MAX( 2 * N, 1 ) +* .. minimal REAL workspace length for ZGEQP3, ZPOCON, ZGESVJ + LRWQP3 = 2 * N + LRWCON = N + LRWSVDJ = N + IF ( LQUERY ) THEN + CALL ZGEQP3( M, N, A, LDA, IWORK, CDUMMY, CDUMMY, -1, + $ RDUMMY, IERR ) + LWRK_ZGEQP3 = INT( CDUMMY(1) ) + CALL ZGEQRF( N, N, A, LDA, CDUMMY, CDUMMY,-1, IERR ) + LWRK_ZGEQRF = INT( CDUMMY(1) ) + CALL ZGELQF( N, N, A, LDA, CDUMMY, CDUMMY,-1, IERR ) + LWRK_ZGELQF = INT( CDUMMY(1) ) + END IF + MINWRK = 2 + OPTWRK = 2 + MINIWRK = N + IF ( .NOT. (LSVEC .OR. RSVEC ) ) THEN +* .. minimal and optimal sizes of the complex workspace if +* only the singular values are requested + IF ( ERREST ) THEN + MINWRK = MAX( N+LWQP3, N**2+LWCON, N+LWQRF, LWSVDJ ) + ELSE + MINWRK = MAX( N+LWQP3, N+LWQRF, LWSVDJ ) + END IF + IF ( LQUERY ) THEN + CALL ZGESVJ( 'L', 'N', 'N', N, N, A, LDA, SVA, N, V, + $ LDV, CDUMMY, -1, RDUMMY, -1, IERR ) + LWRK_ZGESVJ = INT( CDUMMY(1) ) + IF ( ERREST ) THEN + OPTWRK = MAX( N+LWRK_ZGEQP3, N**2+LWCON, + $ N+LWRK_ZGEQRF, LWRK_ZGESVJ ) + ELSE + OPTWRK = MAX( N+LWRK_ZGEQP3, N+LWRK_ZGEQRF, + $ LWRK_ZGESVJ ) + END IF + END IF + IF ( L2TRAN .OR. ROWPIV ) THEN + IF ( ERREST ) THEN + MINRWRK = MAX( 7, 2*M, LRWQP3, LRWCON, LRWSVDJ ) + ELSE + MINRWRK = MAX( 7, 2*M, LRWQP3, LRWSVDJ ) + END IF + ELSE + IF ( ERREST ) THEN + MINRWRK = MAX( 7, LRWQP3, LRWCON, LRWSVDJ ) + ELSE + MINRWRK = MAX( 7, LRWQP3, LRWSVDJ ) + END IF + END IF + IF ( ROWPIV .OR. L2TRAN ) MINIWRK = MINIWRK + M + ELSE IF ( RSVEC .AND. (.NOT.LSVEC) ) THEN +* .. minimal and optimal sizes of the complex workspace if the +* singular values and the right singular vectors are requested + IF ( ERREST ) THEN + MINWRK = MAX( N+LWQP3, LWCON, LWSVDJ, N+LWLQF, + $ 2*N+LWQRF, N+LWSVDJ, N+LWUNMLQ ) + ELSE + MINWRK = MAX( N+LWQP3, LWSVDJ, N+LWLQF, 2*N+LWQRF, + $ N+LWSVDJ, N+LWUNMLQ ) + END IF + IF ( LQUERY ) THEN + CALL ZGESVJ( 'L', 'U', 'N', N,N, U, LDU, SVA, N, A, + $ LDA, CDUMMY, -1, RDUMMY, -1, IERR ) + LWRK_ZGESVJ = INT( CDUMMY(1) ) + CALL ZUNMLQ( 'L', 'C', N, N, N, A, LDA, CDUMMY, + $ V, LDV, CDUMMY, -1, IERR ) + LWRK_ZUNMLQ = INT( CDUMMY(1) ) + IF ( ERREST ) THEN + OPTWRK = MAX( N+LWRK_ZGEQP3, LWCON, LWRK_ZGESVJ, + $ N+LWRK_ZGELQF, 2*N+LWRK_ZGEQRF, + $ N+LWRK_ZGESVJ, N+LWRK_ZUNMLQ ) + ELSE + OPTWRK = MAX( N+LWRK_ZGEQP3, LWRK_ZGESVJ,N+LWRK_ZGELQF, + $ 2*N+LWRK_ZGEQRF, N+LWRK_ZGESVJ, + $ N+LWRK_ZUNMLQ ) + END IF + END IF + IF ( L2TRAN .OR. ROWPIV ) THEN + IF ( ERREST ) THEN + MINRWRK = MAX( 7, 2*M, LRWQP3, LRWSVDJ, LRWCON ) + ELSE + MINRWRK = MAX( 7, 2*M, LRWQP3, LRWSVDJ ) + END IF + ELSE + IF ( ERREST ) THEN + MINRWRK = MAX( 7, LRWQP3, LRWSVDJ, LRWCON ) + ELSE + MINRWRK = MAX( 7, LRWQP3, LRWSVDJ ) + END IF + END IF + IF ( ROWPIV .OR. L2TRAN ) MINIWRK = MINIWRK + M + ELSE IF ( LSVEC .AND. (.NOT.RSVEC) ) THEN +* .. minimal and optimal sizes of the complex workspace if the +* singular values and the left singular vectors are requested + IF ( ERREST ) THEN + MINWRK = N + MAX( LWQP3,LWCON,N+LWQRF,LWSVDJ,LWUNMQRM ) + ELSE + MINWRK = N + MAX( LWQP3, N+LWQRF, LWSVDJ, LWUNMQRM ) + END IF + IF ( LQUERY ) THEN + CALL ZGESVJ( 'L', 'U', 'N', N,N, U, LDU, SVA, N, A, + $ LDA, CDUMMY, -1, RDUMMY, -1, IERR ) + LWRK_ZGESVJ = INT( CDUMMY(1) ) + CALL ZUNMQR( 'L', 'N', M, N, N, A, LDA, CDUMMY, U, + $ LDU, CDUMMY, -1, IERR ) + LWRK_ZUNMQRM = INT( CDUMMY(1) ) + IF ( ERREST ) THEN + OPTWRK = N + MAX( LWRK_ZGEQP3, LWCON, N+LWRK_ZGEQRF, + $ LWRK_ZGESVJ, LWRK_ZUNMQRM ) + ELSE + OPTWRK = N + MAX( LWRK_ZGEQP3, N+LWRK_ZGEQRF, + $ LWRK_ZGESVJ, LWRK_ZUNMQRM ) + END IF + END IF + IF ( L2TRAN .OR. ROWPIV ) THEN + IF ( ERREST ) THEN + MINRWRK = MAX( 7, 2*M, LRWQP3, LRWSVDJ, LRWCON ) + ELSE + MINRWRK = MAX( 7, 2*M, LRWQP3, LRWSVDJ ) + END IF + ELSE + IF ( ERREST ) THEN + MINRWRK = MAX( 7, LRWQP3, LRWSVDJ, LRWCON ) + ELSE + MINRWRK = MAX( 7, LRWQP3, LRWSVDJ ) + END IF + END IF + IF ( ROWPIV .OR. L2TRAN ) MINIWRK = MINIWRK + M + ELSE +* .. minimal and optimal sizes of the complex workspace if the +* full SVD is requested + IF ( .NOT. JRACC ) THEN + IF ( ERREST ) THEN + MINWRK = MAX( N+LWQP3, N+LWCON, 2*N+N**2+LWCON, + $ 2*N+LWQRF, 2*N+LWQP3, + $ 2*N+N**2+N+LWLQF, 2*N+N**2+N+N**2+LWCON, + $ 2*N+N**2+N+LWSVDJ, 2*N+N**2+N+LWSVDJV, + $ 2*N+N**2+N+LWUNMQR,2*N+N**2+N+LWUNMLQ, + $ N+N**2+LWSVDJ, N+LWUNMQRM ) + ELSE + MINWRK = MAX( N+LWQP3, 2*N+N**2+LWCON, + $ 2*N+LWQRF, 2*N+LWQP3, + $ 2*N+N**2+N+LWLQF, 2*N+N**2+N+N**2+LWCON, + $ 2*N+N**2+N+LWSVDJ, 2*N+N**2+N+LWSVDJV, + $ 2*N+N**2+N+LWUNMQR,2*N+N**2+N+LWUNMLQ, + $ N+N**2+LWSVDJ, N+LWUNMQRM ) + END IF + MINIWRK = MINIWRK + N + IF ( ROWPIV .OR. L2TRAN ) MINIWRK = MINIWRK + M + ELSE + IF ( ERREST ) THEN + MINWRK = MAX( N+LWQP3, N+LWCON, 2*N+LWQRF, + $ 2*N+N**2+LWSVDJV, 2*N+N**2+N+LWUNMQR, + $ N+LWUNMQRM ) + ELSE + MINWRK = MAX( N+LWQP3, 2*N+LWQRF, + $ 2*N+N**2+LWSVDJV, 2*N+N**2+N+LWUNMQR, + $ N+LWUNMQRM ) + END IF + IF ( ROWPIV .OR. L2TRAN ) MINIWRK = MINIWRK + M + END IF + IF ( LQUERY ) THEN + CALL ZUNMQR( 'L', 'N', M, N, N, A, LDA, CDUMMY, U, + $ LDU, CDUMMY, -1, IERR ) + LWRK_ZUNMQRM = INT( CDUMMY(1) ) + CALL ZUNMQR( 'L', 'N', N, N, N, A, LDA, CDUMMY, U, + $ LDU, CDUMMY, -1, IERR ) + LWRK_ZUNMQR = INT( CDUMMY(1) ) + IF ( .NOT. JRACC ) THEN + CALL ZGEQP3( N,N, A, LDA, IWORK, CDUMMY,CDUMMY, -1, + $ RDUMMY, IERR ) + LWRK_ZGEQP3N = INT( CDUMMY(1) ) + CALL ZGESVJ( 'L', 'U', 'N', N, N, U, LDU, SVA, + $ N, V, LDV, CDUMMY, -1, RDUMMY, -1, IERR ) + LWRK_ZGESVJ = INT( CDUMMY(1) ) + CALL ZGESVJ( 'U', 'U', 'N', N, N, U, LDU, SVA, + $ N, V, LDV, CDUMMY, -1, RDUMMY, -1, IERR ) + LWRK_ZGESVJU = INT( CDUMMY(1) ) + CALL ZGESVJ( 'L', 'U', 'V', N, N, U, LDU, SVA, + $ N, V, LDV, CDUMMY, -1, RDUMMY, -1, IERR ) + LWRK_ZGESVJV = INT( CDUMMY(1) ) + CALL ZUNMLQ( 'L', 'C', N, N, N, A, LDA, CDUMMY, + $ V, LDV, CDUMMY, -1, IERR ) + LWRK_ZUNMLQ = INT( CDUMMY(1) ) + IF ( ERREST ) THEN + OPTWRK = MAX( N+LWRK_ZGEQP3, N+LWCON, + $ 2*N+N**2+LWCON, 2*N+LWRK_ZGEQRF, + $ 2*N+LWRK_ZGEQP3N, + $ 2*N+N**2+N+LWRK_ZGELQF, + $ 2*N+N**2+N+N**2+LWCON, + $ 2*N+N**2+N+LWRK_ZGESVJ, + $ 2*N+N**2+N+LWRK_ZGESVJV, + $ 2*N+N**2+N+LWRK_ZUNMQR, + $ 2*N+N**2+N+LWRK_ZUNMLQ, + $ N+N**2+LWRK_ZGESVJU, + $ N+LWRK_ZUNMQRM ) + ELSE + OPTWRK = MAX( N+LWRK_ZGEQP3, + $ 2*N+N**2+LWCON, 2*N+LWRK_ZGEQRF, + $ 2*N+LWRK_ZGEQP3N, + $ 2*N+N**2+N+LWRK_ZGELQF, + $ 2*N+N**2+N+N**2+LWCON, + $ 2*N+N**2+N+LWRK_ZGESVJ, + $ 2*N+N**2+N+LWRK_ZGESVJV, + $ 2*N+N**2+N+LWRK_ZUNMQR, + $ 2*N+N**2+N+LWRK_ZUNMLQ, + $ N+N**2+LWRK_ZGESVJU, + $ N+LWRK_ZUNMQRM ) + END IF + ELSE + CALL ZGESVJ( 'L', 'U', 'V', N, N, U, LDU, SVA, + $ N, V, LDV, CDUMMY, -1, RDUMMY, -1, IERR ) + LWRK_ZGESVJV = INT( CDUMMY(1) ) + CALL ZUNMQR( 'L', 'N', N, N, N, CDUMMY, N, CDUMMY, + $ V, LDV, CDUMMY, -1, IERR ) + LWRK_ZUNMQR = INT( CDUMMY(1) ) + CALL ZUNMQR( 'L', 'N', M, N, N, A, LDA, CDUMMY, U, + $ LDU, CDUMMY, -1, IERR ) + LWRK_ZUNMQRM = INT( CDUMMY(1) ) + IF ( ERREST ) THEN + OPTWRK = MAX( N+LWRK_ZGEQP3, N+LWCON, + $ 2*N+LWRK_ZGEQRF, 2*N+N**2, + $ 2*N+N**2+LWRK_ZGESVJV, + $ 2*N+N**2+N+LWRK_ZUNMQR,N+LWRK_ZUNMQRM ) + ELSE + OPTWRK = MAX( N+LWRK_ZGEQP3, 2*N+LWRK_ZGEQRF, + $ 2*N+N**2, 2*N+N**2+LWRK_ZGESVJV, + $ 2*N+N**2+N+LWRK_ZUNMQR, + $ N+LWRK_ZUNMQRM ) + END IF + END IF + END IF + IF ( L2TRAN .OR. ROWPIV ) THEN + MINRWRK = MAX( 7, 2*M, LRWQP3, LRWSVDJ, LRWCON ) + ELSE + MINRWRK = MAX( 7, LRWQP3, LRWSVDJ, LRWCON ) + END IF + END IF + MINWRK = MAX( 2, MINWRK ) + OPTWRK = MAX( MINWRK, OPTWRK ) + IF ( LWORK .LT. MINWRK .AND. (.NOT.LQUERY) ) INFO = - 17 + IF ( LRWORK .LT. MINRWRK .AND. (.NOT.LQUERY) ) INFO = - 19 + END IF +* IF ( INFO .NE. 0 ) THEN * #:( CALL XERBLA( 'ZGEJSV', - INFO ) RETURN + ELSE IF ( LQUERY ) THEN + CWORK(1) = OPTWRK + CWORK(2) = MINWRK + RWORK(1) = MINRWRK + IWORK(1) = MAX( 4, MINIWRK ) + RETURN END IF * * Quick return for void matrix (Y3K safe) * #:) IF ( ( M .EQ. 0 ) .OR. ( N .EQ. 0 ) ) THEN - IWORK(1:3) = 0 + IWORK(1:4) = 0 RWORK(1:7) = 0 RETURN ENDIF @@ -669,7 +987,7 @@ * overflow. It is possible that this scaling pushes the smallest * column norm left from the underflow threshold (extreme case). * - SCALEM = ONE / DSQRT(DBLE(M)*DBLE(N)) + SCALEM = ONE / SQRT(DBLE(M)*DBLE(N)) NOSCAL = .TRUE. GOSCAL = .TRUE. DO 1874 p = 1, N @@ -681,7 +999,7 @@ CALL XERBLA( 'ZGEJSV', -INFO ) RETURN END IF - AAQQ = DSQRT(AAQQ) + AAQQ = SQRT(AAQQ) IF ( ( AAPP .LT. (BIG / AAQQ) ) .AND. NOSCAL ) THEN SVA(p) = AAPP * AAQQ ELSE @@ -699,8 +1017,8 @@ AAPP = ZERO AAQQ = BIG DO 4781 p = 1, N - AAPP = DMAX1( AAPP, SVA(p) ) - IF ( SVA(p) .NE. ZERO ) AAQQ = DMIN1( AAQQ, SVA(p) ) + AAPP = MAX( AAPP, SVA(p) ) + IF ( SVA(p) .NE. ZERO ) AAQQ = MIN( AAQQ, SVA(p) ) 4781 CONTINUE * * Quick return for zero M x N matrix @@ -722,11 +1040,12 @@ IWORK(1) = 0 IWORK(2) = 0 IWORK(3) = 0 + IWORK(4) = -1 RETURN END IF * * Issue warning if denormalized column norms detected. Override the -* high relative accuracy request. Issue licence to kill columns +* high relative accuracy request. Issue licence to kill nonzero columns * (set them to zero) whose norm is less than sigma_max / BIG (roughly). * #:( WARNING = 0 @@ -770,7 +1089,8 @@ IWORK(1) = 0 IWORK(2) = 0 END IF - IWORK(3) = 0 + IWORK(3) = 0 + IWORK(4) = -1 IF ( ERREST ) RWORK(3) = ONE IF ( LSVEC .AND. RSVEC ) THEN RWORK(4) = ONE @@ -785,7 +1105,6 @@ END IF * TRANSP = .FALSE. - L2TRAN = L2TRAN .AND. ( M .EQ. N ) * AATMAX = -ONE AATMIN = BIG @@ -803,23 +1122,23 @@ CALL ZLASSQ( N, A(p,1), LDA, XSC, TEMP1 ) * ZLASSQ gets both the ell_2 and the ell_infinity norm * in one pass through the vector - RWORK(M+N+p) = XSC * SCALEM - RWORK(N+p) = XSC * (SCALEM*DSQRT(TEMP1)) - AATMAX = DMAX1( AATMAX, RWORK(N+p) ) - IF (RWORK(N+p) .NE. ZERO) - $ AATMIN = DMIN1(AATMIN,RWORK(N+p)) + RWORK(M+p) = XSC * SCALEM + RWORK(p) = XSC * (SCALEM*SQRT(TEMP1)) + AATMAX = MAX( AATMAX, RWORK(p) ) + IF (RWORK(p) .NE. ZERO) + $ AATMIN = MIN(AATMIN,RWORK(p)) 1950 CONTINUE ELSE DO 1904 p = 1, M - RWORK(M+N+p) = SCALEM*ABS( A(p,IZAMAX(N,A(p,1),LDA)) ) - AATMAX = DMAX1( AATMAX, RWORK(M+N+p) ) - AATMIN = DMIN1( AATMIN, RWORK(M+N+p) ) + RWORK(M+p) = SCALEM*ABS( A(p,IZAMAX(N,A(p,1),LDA)) ) + AATMAX = MAX( AATMAX, RWORK(M+p) ) + AATMIN = MIN( AATMIN, RWORK(M+p) ) 1904 CONTINUE END IF * END IF * -* For square matrix A try to determine whether A^* would be better +* For square matrix A try to determine whether A^* would be better * input for the preconditioned Jacobi SVD, with faster convergence. * The decision is based on an O(N) function of the vector of column * and row norms of A, based on the Shannon entropy. This should give @@ -849,7 +1168,7 @@ * same trace. * ENTRAT = ZERO - DO 1114 p = N+1, N+M + DO 1114 p = 1, M BIG1 = ( ( RWORK(p) / XSC )**2 ) * TEMP1 IF ( BIG1 .NE. ZERO ) ENTRAT = ENTRAT + BIG1 * DLOG(BIG1) 1114 CONTINUE @@ -859,27 +1178,26 @@ * usually means better input for the algorithm. * TRANSP = ( ENTRAT .LT. ENTRA ) - TRANSP = .TRUE. -* -* If A^* is better than A, take the adjoint of A. -* +* +* If A^* is better than A, take the adjoint of A. This is allowed +* only for square matrices, M=N. IF ( TRANSP ) THEN * In an optimal implementation, this trivial transpose * should be replaced with faster transpose. DO 1115 p = 1, N - 1 - A(p,p) = DCONJG(A(p,p)) + A(p,p) = CONJG(A(p,p)) DO 1116 q = p + 1, N - CTEMP = DCONJG(A(q,p)) - A(q,p) = DCONJG(A(p,q)) + CTEMP = CONJG(A(q,p)) + A(q,p) = CONJG(A(p,q)) A(p,q) = CTEMP 1116 CONTINUE 1115 CONTINUE - A(N,N) = DCONJG(A(N,N)) + A(N,N) = CONJG(A(N,N)) DO 1117 p = 1, N - RWORK(M+N+p) = SVA(p) - SVA(p) = RWORK(N+p) -* previously computed row 2-norms are now column 2-norms -* of the transposed matrix + RWORK(M+p) = SVA(p) + SVA(p) = RWORK(p) +* previously computed row 2-norms are now column 2-norms +* of the transposed matrix 1117 CONTINUE TEMP1 = AAPP AAPP = AATMAX @@ -890,7 +1208,7 @@ KILL = LSVEC LSVEC = RSVEC RSVEC = KILL - IF ( LSVEC ) N1 = N + IF ( LSVEC ) N1 = N * ROWPIV = .TRUE. END IF @@ -907,9 +1225,11 @@ * overflows in the intermediate results. If the singular values spread * from SFMIN to BIG, then ZGESVJ will compute them. So, in that case, * one should use ZGESVJ instead of ZGEJSV. -* - BIG1 = DSQRT( BIG ) - TEMP1 = DSQRT( BIG / DBLE(N) ) +* >> change in the April 2016 update: allow bigger range, i.e. the +* largest column is allowed up to BIG/N and ZGESVJ will do the rest. + BIG1 = SQRT( BIG ) + TEMP1 = SQRT( BIG / DBLE(N) ) +* TEMP1 = BIG/DBLE(N) * CALL DLASCL( 'G', 0, 0, AAPP, TEMP1, N, 1, SVA, N, IERR ) IF ( AAQQ .GT. (AAPP * SFMIN) ) THEN @@ -930,7 +1250,7 @@ * L2KILL enforces computation of nonzero singular values in * the restricted range of condition number of the initial A, * sigma_max(A) / sigma_min(A) approx. SQRT(BIG)/SQRT(SFMIN). - XSC = DSQRT( SFMIN ) + XSC = SQRT( SFMIN ) ELSE XSC = SMALL * @@ -942,7 +1262,7 @@ * time. Depending on the concrete implementation of BLAS and LAPACK * (i.e. how they behave in presence of extreme ill-conditioning) the * implementor may decide to remove this switch. - IF ( ( AAQQ.LT.DSQRT(SFMIN) ) .AND. LSVEC .AND. RSVEC ) THEN + IF ( ( AAQQ.LT.SQRT(SFMIN) ) .AND. LSVEC .AND. RSVEC ) THEN JRACC = .TRUE. END IF * @@ -964,18 +1284,22 @@ * row pivoting combined with standard column pivoting * has similar effect as Powell-Reid complete pivoting. * The ell-infinity norms of A are made nonincreasing. + IF ( ( LSVEC .AND. RSVEC ) .AND. .NOT.( JRACC ) ) THEN + IWOFF = 2*N + ELSE + IWOFF = N + END IF DO 1952 p = 1, M - 1 - q = IDAMAX( M-p+1, RWORK(M+N+p), 1 ) + p - 1 - IWORK(2*N+p) = q + q = IDAMAX( M-p+1, RWORK(M+p), 1 ) + p - 1 + IWORK(IWOFF+p) = q IF ( p .NE. q ) THEN - TEMP1 = RWORK(M+N+p) - RWORK(M+N+p) = RWORK(M+N+q) - RWORK(M+N+q) = TEMP1 + TEMP1 = RWORK(M+p) + RWORK(M+p) = RWORK(M+q) + RWORK(M+q) = TEMP1 END IF 1952 CONTINUE - CALL ZLASWP( N, A, LDA, 1, M-1, IWORK(2*N+1), 1 ) + CALL ZLASWP( N, A, LDA, 1, M-1, IWORK(IWOFF+1), 1 ) END IF - * * End of the preparation phase (scaling, optional sorting and * transposing, optional flushing of small columns). @@ -989,14 +1313,14 @@ * (eg speed by replacing global with restricted window pivoting, such * as in xGEQPX from TOMS # 782). Good results will be obtained using * xGEQPX with properly (!) chosen numerical parameters. -* Any improvement of ZGEQP3 improves overal performance of ZGEJSV. +* Any improvement of ZGEQP3 improves overall performance of ZGEJSV. * * A * P1 = Q1 * [ R1^* 0]^*: DO 1963 p = 1, N * .. all columns are free columns IWORK(p) = 0 1963 CONTINUE - CALL ZGEQP3( M, N, A, LDA, IWORK, CWORK, CWORK(N+1), LWORK-N, + CALL ZGEQP3( M, N, A, LDA, IWORK, CWORK, CWORK(N+1), LWORK-N, $ RWORK, IERR ) * * The upper triangular matrix R1 from the first QRF is inspected for @@ -1011,9 +1335,9 @@ IF ( L2ABER ) THEN * Standard absolute error bound suffices. All sigma_i with * sigma_i < N*EPSLN*||A|| are flushed to zero. This is an -* agressive enforcement of lower numerical rank by introducing a +* aggressive enforcement of lower numerical rank by introducing a * backward error of the order of N*EPSLN*||A||. - TEMP1 = DSQRT(DBLE(N))*EPSLN + TEMP1 = SQRT(DBLE(N))*EPSLN DO 3001 p = 2, N IF ( ABS(A(p,p)) .GE. (TEMP1*ABS(A(1,1))) ) THEN NR = NR + 1 @@ -1023,10 +1347,10 @@ 3001 CONTINUE 3002 CONTINUE ELSE IF ( L2RANK ) THEN -* .. similarly as above, only slightly more gentle (less agressive). +* .. similarly as above, only slightly more gentle (less aggressive). * Sudden drop on the diagonal of R1 is used as the criterion for -* close-to-rank-defficient. - TEMP1 = DSQRT(SFMIN) +* close-to-rank-deficient. + TEMP1 = SQRT(SFMIN) DO 3401 p = 2, N IF ( ( ABS(A(p,p)) .LT. (EPSLN*ABS(A(p-1,p-1))) ) .OR. $ ( ABS(A(p,p)) .LT. SMALL ) .OR. @@ -1043,7 +1367,7 @@ * Here we just remove the underflowed part of the triangular * factor. This prevents the situation in which the code is * working hard to get the accuracy not warranted by the data. - TEMP1 = DSQRT(SFMIN) + TEMP1 = SQRT(SFMIN) DO 3301 p = 2, N IF ( ( ABS(A(p,p)) .LT. SMALL ) .OR. $ ( L2KILL .AND. (ABS(A(p,p)) .LT. TEMP1) ) ) GO TO 3302 @@ -1058,7 +1382,7 @@ MAXPRJ = ONE DO 3051 p = 2, N TEMP1 = ABS(A(p,p)) / SVA(IWORK(p)) - MAXPRJ = DMIN1( MAXPRJ, TEMP1 ) + MAXPRJ = MIN( MAXPRJ, TEMP1 ) 3051 CONTINUE IF ( MAXPRJ**2 .GE. ONE - DBLE(N)*EPSLN ) ALMORT = .TRUE. END IF @@ -1077,8 +1401,13 @@ TEMP1 = SVA(IWORK(p)) CALL ZDSCAL( p, ONE/TEMP1, V(1,p), 1 ) 3053 CONTINUE - CALL ZPOCON( 'U', N, V, LDV, ONE, TEMP1, - $ CWORK(N+1), RWORK, IERR ) + IF ( LSVEC )THEN + CALL ZPOCON( 'U', N, V, LDV, ONE, TEMP1, + $ CWORK(N+1), RWORK, IERR ) + ELSE + CALL ZPOCON( 'U', N, V, LDV, ONE, TEMP1, + $ CWORK, RWORK, IERR ) + END IF * ELSE IF ( LSVEC ) THEN * .. U is available as workspace @@ -1090,17 +1419,27 @@ CALL ZPOCON( 'U', N, U, LDU, ONE, TEMP1, $ CWORK(N+1), RWORK, IERR ) ELSE - CALL ZLACPY( 'U', N, N, A, LDA, CWORK(N+1), N ) + CALL ZLACPY( 'U', N, N, A, LDA, CWORK, N ) +*[] CALL ZLACPY( 'U', N, N, A, LDA, CWORK(N+1), N ) +* Change: here index shifted by N to the left, CWORK(1:N) +* not needed for SIGMA only computation DO 3052 p = 1, N TEMP1 = SVA(IWORK(p)) - CALL ZDSCAL( p, ONE/TEMP1, CWORK(N+(p-1)*N+1), 1 ) +*[] CALL ZDSCAL( p, ONE/TEMP1, CWORK(N+(p-1)*N+1), 1 ) + CALL ZDSCAL( p, ONE/TEMP1, CWORK((p-1)*N+1), 1 ) 3052 CONTINUE * .. the columns of R are scaled to have unit Euclidean lengths. - CALL ZPOCON( 'U', N, CWORK(N+1), N, ONE, TEMP1, - $ CWORK(N+N*N+1), RWORK, IERR ) +*[] CALL ZPOCON( 'U', N, CWORK(N+1), N, ONE, TEMP1, +*[] $ CWORK(N+N*N+1), RWORK, IERR ) + CALL ZPOCON( 'U', N, CWORK, N, ONE, TEMP1, + $ CWORK(N*N+1), RWORK, IERR ) * END IF - SCONDA = ONE / DSQRT(TEMP1) + IF ( TEMP1 .NE. ZERO ) THEN + SCONDA = ONE / SQRT(TEMP1) + ELSE + SCONDA = - ONE + END IF * SCONDA is an estimate of SQRT(||(R^* * R)^(-1)||_1). * N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA ELSE @@ -1108,7 +1447,7 @@ END IF END IF * - L2PERT = L2PERT .AND. ( ABS( A(1,1)/A(NR,NR) ) .GT. DSQRT(BIG1) ) + L2PERT = L2PERT .AND. ( ABS( A(1,1)/A(NR,NR) ) .GT. SQRT(BIG1) ) * If there is no violent scaling, artificial perturbation is not needed. * * Phase 3: @@ -1118,11 +1457,11 @@ * Singular Values only * * .. transpose A(1:NR,1:N) - DO 1946 p = 1, MIN0( N-1, NR ) + DO 1946 p = 1, MIN( N-1, NR ) CALL ZCOPY( N-p, A(p,p+1), LDA, A(p+1,p), 1 ) CALL ZLACGV( N-p+1, A(p,p), 1 ) 1946 CONTINUE - IF ( NR .EQ. N ) A(N,N) = DCONJG(A(N,N)) + IF ( NR .EQ. N ) A(N,N) = CONJG(A(N,N)) * * The following two DO-loops introduce small relative perturbation * into the strict upper triangle of the lower triangular matrix. @@ -1179,7 +1518,7 @@ IF ( ( (p.GT.q) .AND. (ABS(A(p,q)).LE.TEMP1) ) $ .OR. ( p .LT. q ) ) * $ A(p,q) = TEMP1 * ( A(p,q) / ABS(A(p,q)) ) - $ A(p,q) = CTEMP + $ A(p,q) = CTEMP 1949 CONTINUE 1947 CONTINUE ELSE @@ -1190,14 +1529,16 @@ * triangular matrix (plus perturbation which is ignored in * the part which destroys triangular form (confusing?!)) * - CALL ZGESVJ( 'L', 'NoU', 'NoV', NR, NR, A, LDA, SVA, + CALL ZGESVJ( 'L', 'N', 'N', NR, NR, A, LDA, SVA, $ N, V, LDV, CWORK, LWORK, RWORK, LRWORK, INFO ) * SCALEM = RWORK(1) NUMRANK = NINT(RWORK(2)) * * - ELSE IF ( RSVEC .AND. ( .NOT. LSVEC ) ) THEN + ELSE IF ( ( RSVEC .AND. ( .NOT. LSVEC ) .AND. ( .NOT. JRACC ) ) + $ .OR. + $ ( JRACC .AND. ( .NOT. LSVEC ) .AND. ( NR .NE. N ) ) ) THEN * * -> Singular Values and Right Singular Vectors <- * @@ -1208,9 +1549,9 @@ CALL ZCOPY( N-p+1, A(p,p), LDA, V(p,p), 1 ) CALL ZLACGV( N-p+1, V(p,p), 1 ) 1998 CONTINUE - CALL ZLASET( 'Upper', NR-1,NR-1, CZERO, CZERO, V(1,2), LDV ) + CALL ZLASET( 'U', NR-1,NR-1, CZERO, CZERO, V(1,2), LDV ) * - CALL ZGESVJ( 'L','U','N', N, NR, V,LDV, SVA, NR, A,LDA, + CALL ZGESVJ( 'L','U','N', N, NR, V, LDV, SVA, NR, A, LDA, $ CWORK, LWORK, RWORK, LRWORK, INFO ) SCALEM = RWORK(1) NUMRANK = NINT(RWORK(2)) @@ -1220,19 +1561,19 @@ * .. two more QR factorizations ( one QRF is not enough, two require * accumulated product of Jacobi rotations, three are perfect ) * - CALL ZLASET( 'Lower', NR-1,NR-1, CZERO, CZERO, A(2,1), LDA ) + CALL ZLASET( 'L', NR-1,NR-1, CZERO, CZERO, A(2,1), LDA ) CALL ZGELQF( NR,N, A, LDA, CWORK, CWORK(N+1), LWORK-N, IERR) - CALL ZLACPY( 'Lower', NR, NR, A, LDA, V, LDV ) - CALL ZLASET( 'Upper', NR-1,NR-1, CZERO, CZERO, V(1,2), LDV ) + CALL ZLACPY( 'L', NR, NR, A, LDA, V, LDV ) + CALL ZLASET( 'U', NR-1,NR-1, CZERO, CZERO, V(1,2), LDV ) CALL ZGEQRF( NR, NR, V, LDV, CWORK(N+1), CWORK(2*N+1), $ LWORK-2*N, IERR ) DO 8998 p = 1, NR CALL ZCOPY( NR-p+1, V(p,p), LDV, V(p,p), 1 ) - CALL ZLACGV( NR-p+1, V(p,p), 1 ) + CALL ZLACGV( NR-p+1, V(p,p), 1 ) 8998 CONTINUE - CALL ZLASET('Upper', NR-1, NR-1, CZERO, CZERO, V(1,2), LDV) + CALL ZLASET('U', NR-1, NR-1, CZERO, CZERO, V(1,2), LDV) * - CALL ZGESVJ( 'Lower', 'U','N', NR, NR, V,LDV, SVA, NR, U, + CALL ZGESVJ( 'L', 'U','N', NR, NR, V,LDV, SVA, NR, U, $ LDU, CWORK(N+1), LWORK-N, RWORK, LRWORK, INFO ) SCALEM = RWORK(1) NUMRANK = NINT(RWORK(2)) @@ -1242,19 +1583,30 @@ CALL ZLASET( 'A',N-NR,N-NR,CZERO,CONE, V(NR+1,NR+1),LDV ) END IF * - CALL ZUNMLQ( 'Left', 'C', N, N, NR, A, LDA, CWORK, + CALL ZUNMLQ( 'L', 'C', N, N, NR, A, LDA, CWORK, $ V, LDV, CWORK(N+1), LWORK-N, IERR ) * END IF +* .. permute the rows of V +* DO 8991 p = 1, N +* CALL ZCOPY( N, V(p,1), LDV, A(IWORK(p),1), LDA ) +* 8991 CONTINUE +* CALL ZLACPY( 'All', N, N, A, LDA, V, LDV ) + CALL ZLAPMR( .FALSE., N, N, V, LDV, IWORK ) +* + IF ( TRANSP ) THEN + CALL ZLACPY( 'A', N, N, V, LDV, U, LDU ) + END IF * - DO 8991 p = 1, N - CALL ZCOPY( N, V(p,1), LDV, A(IWORK(p),1), LDA ) - 8991 CONTINUE - CALL ZLACPY( 'All', N, N, A, LDA, V, LDV ) + ELSE IF ( JRACC .AND. (.NOT. LSVEC) .AND. ( NR.EQ. N ) ) THEN +* + CALL ZLASET( 'L', N-1,N-1, CZERO, CZERO, A(2,1), LDA ) * - IF ( TRANSP ) THEN - CALL ZLACPY( 'All', N, N, V, LDV, U, LDU ) - END IF + CALL ZGESVJ( 'U','N','V', N, N, A, LDA, SVA, N, V, LDV, + $ CWORK, LWORK, RWORK, LRWORK, INFO ) + SCALEM = RWORK(1) + NUMRANK = NINT(RWORK(2)) + CALL ZLAPMR( .FALSE., N, N, V, LDV, IWORK ) * ELSE IF ( LSVEC .AND. ( .NOT. RSVEC ) ) THEN * @@ -1266,18 +1618,18 @@ CALL ZCOPY( N-p+1, A(p,p), LDA, U(p,p), 1 ) CALL ZLACGV( N-p+1, U(p,p), 1 ) 1965 CONTINUE - CALL ZLASET( 'Upper', NR-1, NR-1, CZERO, CZERO, U(1,2), LDU ) + CALL ZLASET( 'U', NR-1, NR-1, CZERO, CZERO, U(1,2), LDU ) * CALL ZGEQRF( N, NR, U, LDU, CWORK(N+1), CWORK(2*N+1), $ LWORK-2*N, IERR ) * DO 1967 p = 1, NR - 1 CALL ZCOPY( NR-p, U(p,p+1), LDU, U(p+1,p), 1 ) - CALL ZLACGV( N-p+1, U(p,p), 1 ) + CALL ZLACGV( N-p+1, U(p,p), 1 ) 1967 CONTINUE - CALL ZLASET( 'Upper', NR-1, NR-1, CZERO, CZERO, U(1,2), LDU ) + CALL ZLASET( 'U', NR-1, NR-1, CZERO, CZERO, U(1,2), LDU ) * - CALL ZGESVJ( 'Lower', 'U', 'N', NR,NR, U, LDU, SVA, NR, A, + CALL ZGESVJ( 'L', 'U', 'N', NR,NR, U, LDU, SVA, NR, A, $ LDA, CWORK(N+1), LWORK-N, RWORK, LRWORK, INFO ) SCALEM = RWORK(1) NUMRANK = NINT(RWORK(2)) @@ -1290,11 +1642,11 @@ END IF END IF * - CALL ZUNMQR( 'Left', 'No Tr', M, N1, N, A, LDA, CWORK, U, + CALL ZUNMQR( 'L', 'N', M, N1, N, A, LDA, CWORK, U, $ LDU, CWORK(N+1), LWORK-N, IERR ) * IF ( ROWPIV ) - $ CALL ZLASWP( N1, U, LDU, 1, M-1, IWORK(2*N+1), -1 ) + $ CALL ZLASWP( N1, U, LDU, 1, M-1, IWORK(IWOFF+1), -1 ) * DO 1974 p = 1, N1 XSC = ONE / DZNRM2( M, U(1,p), 1 ) @@ -1302,7 +1654,7 @@ 1974 CONTINUE * IF ( TRANSP ) THEN - CALL ZLACPY( 'All', N, N, U, LDU, V, LDV ) + CALL ZLACPY( 'A', N, N, U, LDU, V, LDV ) END IF * ELSE @@ -1338,14 +1690,14 @@ * transposed copy above. * IF ( L2PERT ) THEN - XSC = DSQRT(SMALL) + XSC = SQRT(SMALL) DO 2969 q = 1, NR CTEMP = DCMPLX(XSC*ABS( V(q,q) ),ZERO) DO 2968 p = 1, N IF ( ( p .GT. q ) .AND. ( ABS(V(p,q)) .LE. TEMP1 ) $ .OR. ( p .LT. q ) ) * $ V(p,q) = TEMP1 * ( V(p,q) / ABS(V(p,q)) ) - $ V(p,q) = CTEMP + $ V(p,q) = CTEMP IF ( p .LT. q ) V(p,q) = - V(p,q) 2968 CONTINUE 2969 CONTINUE @@ -1362,15 +1714,15 @@ TEMP1 = DZNRM2(NR-p+1,CWORK(2*N+(p-1)*NR+p),1) CALL ZDSCAL(NR-p+1,ONE/TEMP1,CWORK(2*N+(p-1)*NR+p),1) 3950 CONTINUE - CALL ZPOCON('Lower',NR,CWORK(2*N+1),NR,ONE,TEMP1, + CALL ZPOCON('L',NR,CWORK(2*N+1),NR,ONE,TEMP1, $ CWORK(2*N+NR*NR+1),RWORK,IERR) - CONDR1 = ONE / DSQRT(TEMP1) -* .. here need a second oppinion on the condition number + CONDR1 = ONE / SQRT(TEMP1) +* .. here need a second opinion on the condition number * .. then assume worst case scenario * R1 is OK for inverse <=> CONDR1 .LT. DBLE(N) * more conservative <=> CONDR1 .LT. SQRT(DBLE(N)) * - COND_OK = DSQRT(DSQRT(DBLE(NR))) + COND_OK = SQRT(SQRT(DBLE(NR))) *[TP] COND_OK is a tuning parameter. * IF ( CONDR1 .LT. COND_OK ) THEN @@ -1382,14 +1734,14 @@ $ LWORK-2*N, IERR ) * IF ( L2PERT ) THEN - XSC = DSQRT(SMALL)/EPSLN + XSC = SQRT(SMALL)/EPSLN DO 3959 p = 2, NR DO 3958 q = 1, p - 1 - CTEMP=DCMPLX(XSC*DMIN1(ABS(V(p,p)),ABS(V(q,q))), + CTEMP=DCMPLX(XSC*MIN(ABS(V(p,p)),ABS(V(q,q))), $ ZERO) IF ( ABS(V(q,p)) .LE. TEMP1 ) * $ V(q,p) = TEMP1 * ( V(q,p) / ABS(V(q,p)) ) - $ V(q,p) = CTEMP + $ V(q,p) = CTEMP 3958 CONTINUE 3959 CONTINUE END IF @@ -1403,14 +1755,14 @@ CALL ZCOPY( NR-p, V(p,p+1), LDV, V(p+1,p), 1 ) CALL ZLACGV(NR-p+1, V(p,p), 1 ) 1969 CONTINUE - V(NR,NR)=DCONJG(V(NR,NR)) + V(NR,NR)=CONJG(V(NR,NR)) * CONDR2 = CONDR1 * ELSE * * .. ill-conditioned case: second QRF with pivoting -* Note that windowed pivoting would be equaly good +* Note that windowed pivoting would be equally good * numerically, and more run-time efficient. So, in * an optimal implementation, the next call to ZGEQP3 * should be replaced with eg. CALL ZGEQPX (ACM TOMS #782) @@ -1425,14 +1777,14 @@ ** CALL ZGEQRF( N, NR, V, LDV, CWORK(N+1), CWORK(2*N+1), ** $ LWORK-2*N, IERR ) IF ( L2PERT ) THEN - XSC = DSQRT(SMALL) + XSC = SQRT(SMALL) DO 3969 p = 2, NR DO 3968 q = 1, p - 1 - CTEMP=DCMPLX(XSC*DMIN1(ABS(V(p,p)),ABS(V(q,q))), + CTEMP=DCMPLX(XSC*MIN(ABS(V(p,p)),ABS(V(q,q))), $ ZERO) IF ( ABS(V(q,p)) .LE. TEMP1 ) * $ V(q,p) = TEMP1 * ( V(q,p) / ABS(V(q,p)) ) - $ V(q,p) = CTEMP + $ V(q,p) = CTEMP 3968 CONTINUE 3969 CONTINUE END IF @@ -1440,13 +1792,13 @@ CALL ZLACPY( 'A', N, NR, V, LDV, CWORK(2*N+1), N ) * IF ( L2PERT ) THEN - XSC = DSQRT(SMALL) + XSC = SQRT(SMALL) DO 8970 p = 2, NR DO 8971 q = 1, p - 1 - CTEMP=DCMPLX(XSC*DMIN1(ABS(V(p,p)),ABS(V(q,q))), + CTEMP=DCMPLX(XSC*MIN(ABS(V(p,p)),ABS(V(q,q))), $ ZERO) * V(p,q) = - TEMP1*( V(q,p) / ABS(V(q,p)) ) - V(p,q) = - CTEMP + V(p,q) = - CTEMP 8971 CONTINUE 8970 CONTINUE ELSE @@ -1462,13 +1814,13 @@ CALL ZDSCAL( p, ONE/TEMP1, CWORK(2*N+N*NR+NR+p), NR ) 4950 CONTINUE CALL ZPOCON( 'L',NR,CWORK(2*N+N*NR+NR+1),NR,ONE,TEMP1, - $ CWORK(2*N+N*NR+NR+NR*NR+1),RWORK,IERR ) - CONDR2 = ONE / DSQRT(TEMP1) + $ CWORK(2*N+N*NR+NR+NR*NR+1),RWORK,IERR ) + CONDR2 = ONE / SQRT(TEMP1) * * IF ( CONDR2 .GE. COND_OK ) THEN * .. save the Householder vectors used for Q3 -* (this overwrittes the copy of R2, as it will not be +* (this overwrites the copy of R2, as it will not be * needed in this branch, but it does not overwritte the * Huseholder vectors of Q2.). CALL ZLACPY( 'U', NR, NR, V, LDV, CWORK(2*N+1), N ) @@ -1479,7 +1831,7 @@ END IF * IF ( L2PERT ) THEN - XSC = DSQRT(SMALL) + XSC = SQRT(SMALL) DO 4968 q = 2, NR CTEMP = XSC * V(q,q) DO 4969 p = 1, q - 1 @@ -1540,8 +1892,8 @@ * the lower triangular L3 from the LQ factorization of * R2=L3*Q3), pre-multiplied with the transposed Q3. CALL ZGESVJ( 'L', 'U', 'N', NR, NR, V, LDV, SVA, NR, U, - $ LDU, CWORK(2*N+N*NR+NR+1), LWORK-2*N-N*NR-NR, - $ RWORK, LRWORK, INFO ) + $ LDU, CWORK(2*N+N*NR+NR+1), LWORK-2*N-N*NR-NR, + $ RWORK, LRWORK, INFO ) SCALEM = RWORK(1) NUMRANK = NINT(RWORK(2)) DO 3870 p = 1, NR @@ -1579,7 +1931,7 @@ * Compute the full SVD of L3 using ZGESVJ with explicit * accumulation of Jacobi rotations. CALL ZGESVJ( 'L', 'U', 'V', NR, NR, V, LDV, SVA, NR, U, - $ LDU, CWORK(2*N+N*NR+NR+1), LWORK-2*N-N*NR-NR, + $ LDU, CWORK(2*N+N*NR+NR+1), LWORK-2*N-N*NR-NR, $ RWORK, LRWORK, INFO ) SCALEM = RWORK(1) NUMRANK = NINT(RWORK(2)) @@ -1609,7 +1961,7 @@ * first QRF. Also, scale the columns to make them unit in * Euclidean norm. This applies to all cases. * - TEMP1 = DSQRT(DBLE(N)) * EPSLN + TEMP1 = SQRT(DBLE(N)) * EPSLN DO 1972 q = 1, N DO 972 p = 1, N CWORK(2*N+N*NR+NR+IWORK(p)) = V(p,q) @@ -1635,11 +1987,11 @@ * The Q matrix from the first QRF is built into the left singular * matrix U. This applies to all cases. * - CALL ZUNMQR( 'Left', 'No_Tr', M, N1, N, A, LDA, CWORK, U, + CALL ZUNMQR( 'L', 'N', M, N1, N, A, LDA, CWORK, U, $ LDU, CWORK(N+1), LWORK-N, IERR ) * The columns of U are normalized. The cost is O(M*N) flops. - TEMP1 = DSQRT(DBLE(M)) * EPSLN + TEMP1 = SQRT(DBLE(M)) * EPSLN DO 1973 p = 1, NR XSC = ONE / DZNRM2( M, U(1,p), 1 ) IF ( (XSC .LT. (ONE-TEMP1)) .OR. (XSC .GT. (ONE+TEMP1)) ) @@ -1650,30 +2002,30 @@ * singular vectors must be adjusted. * IF ( ROWPIV ) - $ CALL ZLASWP( N1, U, LDU, 1, M-1, IWORK(2*N+1), -1 ) + $ CALL ZLASWP( N1, U, LDU, 1, M-1, IWORK(IWOFF+1), -1 ) * ELSE * * .. the initial matrix A has almost orthogonal columns and * the second QRF is not needed * - CALL ZLACPY( 'Upper', N, N, A, LDA, CWORK(N+1), N ) + CALL ZLACPY( 'U', N, N, A, LDA, CWORK(N+1), N ) IF ( L2PERT ) THEN - XSC = DSQRT(SMALL) + XSC = SQRT(SMALL) DO 5970 p = 2, N CTEMP = XSC * CWORK( N + (p-1)*N + p ) DO 5971 q = 1, p - 1 * CWORK(N+(q-1)*N+p)=-TEMP1 * ( CWORK(N+(p-1)*N+q) / * $ ABS(CWORK(N+(p-1)*N+q)) ) - CWORK(N+(q-1)*N+p)=-CTEMP + CWORK(N+(q-1)*N+p)=-CTEMP 5971 CONTINUE 5970 CONTINUE ELSE - CALL ZLASET( 'Lower',N-1,N-1,CZERO,CZERO,CWORK(N+2),N ) + CALL ZLASET( 'L',N-1,N-1,CZERO,CZERO,CWORK(N+2),N ) END IF * - CALL ZGESVJ( 'Upper', 'U', 'N', N, N, CWORK(N+1), N, SVA, - $ N, U, LDU, CWORK(N+N*N+1), LWORK-N-N*N, RWORK, LRWORK, + CALL ZGESVJ( 'U', 'U', 'N', N, N, CWORK(N+1), N, SVA, + $ N, U, LDU, CWORK(N+N*N+1), LWORK-N-N*N, RWORK, LRWORK, $ INFO ) * SCALEM = RWORK(1) @@ -1683,12 +2035,12 @@ CALL ZDSCAL( N, SVA(p), CWORK(N+(p-1)*N+1), 1 ) 6970 CONTINUE * - CALL ZTRSM( 'Left', 'Upper', 'NoTrans', 'No UD', N, N, + CALL ZTRSM( 'L', 'U', 'N', 'N', N, N, $ CONE, A, LDA, CWORK(N+1), N ) DO 6972 p = 1, N CALL ZCOPY( N, CWORK(N+p), N, V(IWORK(p),1), LDV ) 6972 CONTINUE - TEMP1 = DSQRT(DBLE(N))*EPSLN + TEMP1 = SQRT(DBLE(N))*EPSLN DO 6971 p = 1, N XSC = ONE / DZNRM2( N, V(1,p), 1 ) IF ( (XSC .LT. (ONE-TEMP1)) .OR. (XSC .GT. (ONE+TEMP1)) ) @@ -1704,9 +2056,9 @@ CALL ZLASET( 'A',M-N,N1-N, CZERO, CONE,U(N+1,N+1),LDU) END IF END IF - CALL ZUNMQR( 'Left', 'No Tr', M, N1, N, A, LDA, CWORK, U, + CALL ZUNMQR( 'L', 'N', M, N1, N, A, LDA, CWORK, U, $ LDU, CWORK(N+1), LWORK-N, IERR ) - TEMP1 = DSQRT(DBLE(M))*EPSLN + TEMP1 = SQRT(DBLE(M))*EPSLN DO 6973 p = 1, N1 XSC = ONE / DZNRM2( M, U(1,p), 1 ) IF ( (XSC .LT. (ONE-TEMP1)) .OR. (XSC .GT. (ONE+TEMP1)) ) @@ -1714,7 +2066,7 @@ 6973 CONTINUE * IF ( ROWPIV ) - $ CALL ZLASWP( N1, U, LDU, 1, M-1, IWORK(2*N+1), -1 ) + $ CALL ZLASWP( N1, U, LDU, 1, M-1, IWORK(IWOFF+1), -1 ) * END IF * @@ -1724,13 +2076,14 @@ * * This branch deploys a preconditioned Jacobi SVD with explicitly * accumulated rotations. It is included as optional, mainly for -* experimental purposes. It does perfom well, and can also be used. +* experimental purposes. It does perform well, and can also be used. * In this implementation, this branch will be automatically activated * if the condition number sigma_max(A) / sigma_min(A) is predicted * to be greater than the overflow threshold. This is because the * a posteriori computation of the singular vectors assumes robust * implementation of BLAS and some LAPACK procedures, capable of working -* in presence of extreme values. Since that is not always the case, ... +* in presence of extreme values, e.g. when the singular values spread from +* the underflow to the overflow threshold. * DO 7968 p = 1, NR CALL ZCOPY( N-p+1, A(p,p), LDA, V(p,p), 1 ) @@ -1738,14 +2091,14 @@ 7968 CONTINUE * IF ( L2PERT ) THEN - XSC = DSQRT(SMALL/EPSLN) + XSC = SQRT(SMALL/EPSLN) DO 5969 q = 1, NR CTEMP = DCMPLX(XSC*ABS( V(q,q) ),ZERO) DO 5968 p = 1, N IF ( ( p .GT. q ) .AND. ( ABS(V(p,q)) .LE. TEMP1 ) $ .OR. ( p .LT. q ) ) * $ V(p,q) = TEMP1 * ( V(p,q) / ABS(V(p,q)) ) - $ V(p,q) = CTEMP + $ V(p,q) = CTEMP IF ( p .LT. q ) V(p,q) = - V(p,q) 5968 CONTINUE 5969 CONTINUE @@ -1763,13 +2116,13 @@ 7969 CONTINUE IF ( L2PERT ) THEN - XSC = DSQRT(SMALL/EPSLN) + XSC = SQRT(SMALL/EPSLN) DO 9970 q = 2, NR DO 9971 p = 1, q - 1 - CTEMP = DCMPLX(XSC * DMIN1(ABS(U(p,p)),ABS(U(q,q))), + CTEMP = DCMPLX(XSC * MIN(ABS(U(p,p)),ABS(U(q,q))), $ ZERO) * U(p,q) = - TEMP1 * ( U(q,p) / ABS(U(q,p)) ) - U(p,q) = - CTEMP + U(p,q) = - CTEMP 9971 CONTINUE 9970 CONTINUE ELSE @@ -1777,7 +2130,7 @@ END IF CALL ZGESVJ( 'L', 'U', 'V', NR, NR, U, LDU, SVA, - $ N, V, LDV, CWORK(2*N+N*NR+1), LWORK-2*N-N*NR, + $ N, V, LDV, CWORK(2*N+N*NR+1), LWORK-2*N-N*NR, $ RWORK, LRWORK, INFO ) SCALEM = RWORK(1) NUMRANK = NINT(RWORK(2)) @@ -1795,7 +2148,7 @@ * first QRF. Also, scale the columns to make them unit in * Euclidean norm. This applies to all cases. * - TEMP1 = DSQRT(DBLE(N)) * EPSLN + TEMP1 = SQRT(DBLE(N)) * EPSLN DO 7972 q = 1, N DO 8972 p = 1, N CWORK(2*N+N*NR+NR+IWORK(p)) = V(p,q) @@ -1819,11 +2172,11 @@ END IF END IF * - CALL ZUNMQR( 'Left', 'No Tr', M, N1, N, A, LDA, CWORK, U, + CALL ZUNMQR( 'L', 'N', M, N1, N, A, LDA, CWORK, U, $ LDU, CWORK(N+1), LWORK-N, IERR ) * IF ( ROWPIV ) - $ CALL ZLASWP( N1, U, LDU, 1, M-1, IWORK(2*N+1), -1 ) + $ CALL ZLASWP( N1, U, LDU, 1, M-1, IWORK(IWOFF+1), -1 ) * * END IF @@ -1866,6 +2219,12 @@ IWORK(1) = NR IWORK(2) = NUMRANK IWORK(3) = WARNING + IF ( TRANSP ) THEN + IWORK(4) = 1 + ELSE + IWORK(4) = -1 + END IF + * RETURN * ..