![]() ![]() | ![]() |
Mise à jour de lapack vers la version 3.3.0.
1: SUBROUTINE DGESVJ( JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V, 2: + LDV, WORK, LWORK, INFO ) 3: * 4: * -- LAPACK routine (version 3.3.0) -- 5: * 6: * -- Contributed by Zlatko Drmac of the University of Zagreb and -- 7: * -- Kresimir Veselic of the Fernuniversitaet Hagen -- 8: * November 2010 9: * 10: * -- LAPACK is a software package provided by Univ. of Tennessee, -- 11: * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..-- 12: * 13: * This routine is also part of SIGMA (version 1.23, October 23. 2008.) 14: * SIGMA is a library of algorithms for highly accurate algorithms for 15: * computation of SVD, PSVD, QSVD, (H,K)-SVD, and for solution of the 16: * eigenvalue problems Hx = lambda M x, H M x = lambda x with H, M > 0. 17: * 18: IMPLICIT NONE 19: * .. 20: * .. Scalar Arguments .. 21: INTEGER INFO, LDA, LDV, LWORK, M, MV, N 22: CHARACTER*1 JOBA, JOBU, JOBV 23: * .. 24: * .. Array Arguments .. 25: DOUBLE PRECISION A( LDA, * ), SVA( N ), V( LDV, * ), 26: + WORK( LWORK ) 27: * .. 28: * 29: * Purpose 30: * ======= 31: * 32: * DGESVJ computes the singular value decomposition (SVD) of a real 33: * M-by-N matrix A, where M >= N. The SVD of A is written as 34: * [++] [xx] [x0] [xx] 35: * A = U * SIGMA * V^t, [++] = [xx] * [ox] * [xx] 36: * [++] [xx] 37: * where SIGMA is an N-by-N diagonal matrix, U is an M-by-N orthonormal 38: * matrix, and V is an N-by-N orthogonal matrix. The diagonal elements 39: * of SIGMA are the singular values of A. The columns of U and V are the 40: * left and the right singular vectors of A, respectively. 41: * 42: * Further Details 43: * ~~~~~~~~~~~~~~~ 44: * The orthogonal N-by-N matrix V is obtained as a product of Jacobi plane 45: * rotations. The rotations are implemented as fast scaled rotations of 46: * Anda and Park [1]. In the case of underflow of the Jacobi angle, a 47: * modified Jacobi transformation of Drmac [4] is used. Pivot strategy uses 48: * column interchanges of de Rijk [2]. The relative accuracy of the computed 49: * singular values and the accuracy of the computed singular vectors (in 50: * angle metric) is as guaranteed by the theory of Demmel and Veselic [3]. 51: * The condition number that determines the accuracy in the full rank case 52: * is essentially min_{D=diag} kappa(A*D), where kappa(.) is the 53: * spectral condition number. The best performance of this Jacobi SVD 54: * procedure is achieved if used in an accelerated version of Drmac and 55: * Veselic [5,6], and it is the kernel routine in the SIGMA library [7]. 56: * Some tunning parameters (marked with [TP]) are available for the 57: * implementer. 58: * The computational range for the nonzero singular values is the machine 59: * number interval ( UNDERFLOW , OVERFLOW ). In extreme cases, even 60: * denormalized singular values can be computed with the corresponding 61: * gradual loss of accurate digits. 62: * 63: * Contributors 64: * ~~~~~~~~~~~~ 65: * Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany) 66: * 67: * References 68: * ~~~~~~~~~~ 69: * [1] A. A. Anda and H. Park: Fast plane rotations with dynamic scaling. 70: * SIAM J. matrix Anal. Appl., Vol. 15 (1994), pp. 162-174. 71: * [2] P. P. M. De Rijk: A one-sided Jacobi algorithm for computing the 72: * singular value decomposition on a vector computer. 73: * SIAM J. Sci. Stat. Comp., Vol. 10 (1998), pp. 359-371. 74: * [3] J. Demmel and K. Veselic: Jacobi method is more accurate than QR. 75: * [4] Z. Drmac: Implementation of Jacobi rotations for accurate singular 76: * value computation in floating point arithmetic. 77: * SIAM J. Sci. Comp., Vol. 18 (1997), pp. 1200-1222. 78: * [5] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I. 79: * SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342. 80: * LAPACK Working note 169. 81: * [6] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II. 82: * SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362. 83: * LAPACK Working note 170. 84: * [7] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV, 85: * QSVD, (H,K)-SVD computations. 86: * Department of Mathematics, University of Zagreb, 2008. 87: * 88: * Bugs, Examples and Comments 89: * ~~~~~~~~~~~~~~~~~~~~~~~~~~~ 90: * Please report all bugs and send interesting test examples and comments to 91: * drmac@math.hr. Thank you. 92: * 93: * Arguments 94: * ========= 95: * 96: * JOBA (input) CHARACTER* 1 97: * Specifies the structure of A. 98: * = 'L': The input matrix A is lower triangular; 99: * = 'U': The input matrix A is upper triangular; 100: * = 'G': The input matrix A is general M-by-N matrix, M >= N. 101: * 102: * JOBU (input) CHARACTER*1 103: * Specifies whether to compute the left singular vectors 104: * (columns of U): 105: * = 'U': The left singular vectors corresponding to the nonzero 106: * singular values are computed and returned in the leading 107: * columns of A. See more details in the description of A. 108: * The default numerical orthogonality threshold is set to 109: * approximately TOL=CTOL*EPS, CTOL=DSQRT(M), EPS=DLAMCH('E'). 110: * = 'C': Analogous to JOBU='U', except that user can control the 111: * level of numerical orthogonality of the computed left 112: * singular vectors. TOL can be set to TOL = CTOL*EPS, where 113: * CTOL is given on input in the array WORK. 114: * No CTOL smaller than ONE is allowed. CTOL greater 115: * than 1 / EPS is meaningless. The option 'C' 116: * can be used if M*EPS is satisfactory orthogonality 117: * of the computed left singular vectors, so CTOL=M could 118: * save few sweeps of Jacobi rotations. 119: * See the descriptions of A and WORK(1). 120: * = 'N': The matrix U is not computed. However, see the 121: * description of A. 122: * 123: * JOBV (input) CHARACTER*1 124: * Specifies whether to compute the right singular vectors, that 125: * is, the matrix V: 126: * = 'V' : the matrix V is computed and returned in the array V 127: * = 'A' : the Jacobi rotations are applied to the MV-by-N 128: * array V. In other words, the right singular vector 129: * matrix V is not computed explicitly, instead it is 130: * applied to an MV-by-N matrix initially stored in the 131: * first MV rows of V. 132: * = 'N' : the matrix V is not computed and the array V is not 133: * referenced 134: * 135: * M (input) INTEGER 136: * The number of rows of the input matrix A. M >= 0. 137: * 138: * N (input) INTEGER 139: * The number of columns of the input matrix A. 140: * M >= N >= 0. 141: * 142: * A (input/output) DOUBLE PRECISION array, dimension (LDA,N) 143: * On entry, the M-by-N matrix A. 144: * On exit : 145: * If JOBU .EQ. 'U' .OR. JOBU .EQ. 'C' : 146: * If INFO .EQ. 0 : 147: * RANKA orthonormal columns of U are returned in the 148: * leading RANKA columns of the array A. Here RANKA <= N 149: * is the number of computed singular values of A that are 150: * above the underflow threshold DLAMCH('S'). The singular 151: * vectors corresponding to underflowed or zero singular 152: * values are not computed. The value of RANKA is returned 153: * in the array WORK as RANKA=NINT(WORK(2)). Also see the 154: * descriptions of SVA and WORK. The computed columns of U 155: * are mutually numerically orthogonal up to approximately 156: * TOL=DSQRT(M)*EPS (default); or TOL=CTOL*EPS (JOBU.EQ.'C'), 157: * see the description of JOBU. 158: * If INFO .GT. 0 : 159: * the procedure DGESVJ did not converge in the given number 160: * of iterations (sweeps). In that case, the computed 161: * columns of U may not be orthogonal up to TOL. The output 162: * U (stored in A), SIGMA (given by the computed singular 163: * values in SVA(1:N)) and V is still a decomposition of the 164: * input matrix A in the sense that the residual 165: * ||A-SCALE*U*SIGMA*V^T||_2 / ||A||_2 is small. 166: * 167: * If JOBU .EQ. 'N' : 168: * If INFO .EQ. 0 : 169: * Note that the left singular vectors are 'for free' in the 170: * one-sided Jacobi SVD algorithm. However, if only the 171: * singular values are needed, the level of numerical 172: * orthogonality of U is not an issue and iterations are 173: * stopped when the columns of the iterated matrix are 174: * numerically orthogonal up to approximately M*EPS. Thus, 175: * on exit, A contains the columns of U scaled with the 176: * corresponding singular values. 177: * If INFO .GT. 0 : 178: * the procedure DGESVJ did not converge in the given number 179: * of iterations (sweeps). 180: * 181: * LDA (input) INTEGER 182: * The leading dimension of the array A. LDA >= max(1,M). 183: * 184: * SVA (workspace/output) DOUBLE PRECISION array, dimension (N) 185: * On exit : 186: * If INFO .EQ. 0 : 187: * depending on the value SCALE = WORK(1), we have: 188: * If SCALE .EQ. ONE : 189: * SVA(1:N) contains the computed singular values of A. 190: * During the computation SVA contains the Euclidean column 191: * norms of the iterated matrices in the array A. 192: * If SCALE .NE. ONE : 193: * The singular values of A are SCALE*SVA(1:N), and this 194: * factored representation is due to the fact that some of the 195: * singular values of A might underflow or overflow. 196: * If INFO .GT. 0 : 197: * the procedure DGESVJ did not converge in the given number of 198: * iterations (sweeps) and SCALE*SVA(1:N) may not be accurate. 199: * 200: * MV (input) INTEGER 201: * If JOBV .EQ. 'A', then the product of Jacobi rotations in DGESVJ 202: * is applied to the first MV rows of V. See the description of JOBV. 203: * 204: * V (input/output) DOUBLE PRECISION array, dimension (LDV,N) 205: * If JOBV = 'V', then V contains on exit the N-by-N matrix of 206: * the right singular vectors; 207: * If JOBV = 'A', then V contains the product of the computed right 208: * singular vector matrix and the initial matrix in 209: * the array V. 210: * If JOBV = 'N', then V is not referenced. 211: * 212: * LDV (input) INTEGER 213: * The leading dimension of the array V, LDV .GE. 1. 214: * If JOBV .EQ. 'V', then LDV .GE. max(1,N). 215: * If JOBV .EQ. 'A', then LDV .GE. max(1,MV) . 216: * 217: * WORK (input/workspace/output) DOUBLE PRECISION array, dimension max(4,M+N). 218: * On entry : 219: * If JOBU .EQ. 'C' : 220: * WORK(1) = CTOL, where CTOL defines the threshold for convergence. 221: * The process stops if all columns of A are mutually 222: * orthogonal up to CTOL*EPS, EPS=DLAMCH('E'). 223: * It is required that CTOL >= ONE, i.e. it is not 224: * allowed to force the routine to obtain orthogonality 225: * below EPS. 226: * On exit : 227: * WORK(1) = SCALE is the scaling factor such that SCALE*SVA(1:N) 228: * are the computed singular values of A. 229: * (See description of SVA().) 230: * WORK(2) = NINT(WORK(2)) is the number of the computed nonzero 231: * singular values. 232: * WORK(3) = NINT(WORK(3)) is the number of the computed singular 233: * values that are larger than the underflow threshold. 234: * WORK(4) = NINT(WORK(4)) is the number of sweeps of Jacobi 235: * rotations needed for numerical convergence. 236: * WORK(5) = max_{i.NE.j} |COS(A(:,i),A(:,j))| in the last sweep. 237: * This is useful information in cases when DGESVJ did 238: * not converge, as it can be used to estimate whether 239: * the output is stil useful and for post festum analysis. 240: * WORK(6) = the largest absolute value over all sines of the 241: * Jacobi rotation angles in the last sweep. It can be 242: * useful for a post festum analysis. 243: * 244: * LWORK (input) INTEGER 245: * length of WORK, WORK >= MAX(6,M+N) 246: * 247: * INFO (output) INTEGER 248: * = 0 : successful exit. 249: * < 0 : if INFO = -i, then the i-th argument had an illegal value 250: * > 0 : DGESVJ did not converge in the maximal allowed number (30) 251: * of sweeps. The output may still be useful. See the 252: * description of WORK. 253: * 254: * ===================================================================== 255: * 256: * .. Local Parameters .. 257: DOUBLE PRECISION ZERO, HALF, ONE, TWO 258: PARAMETER ( ZERO = 0.0D0, HALF = 0.5D0, ONE = 1.0D0, 259: + TWO = 2.0D0 ) 260: INTEGER NSWEEP 261: PARAMETER ( NSWEEP = 30 ) 262: * .. 263: * .. Local Scalars .. 264: DOUBLE PRECISION AAPP, AAPP0, AAPQ, AAQQ, APOAQ, AQOAP, BIG, 265: + BIGTHETA, CS, CTOL, EPSLN, LARGE, MXAAPQ, 266: + MXSINJ, ROOTBIG, ROOTEPS, ROOTSFMIN, ROOTTOL, 267: + SKL, SFMIN, SMALL, SN, T, TEMP1, THETA, 268: + THSIGN, TOL 269: INTEGER BLSKIP, EMPTSW, i, ibr, IERR, igl, IJBLSK, ir1, 270: + ISWROT, jbc, jgl, KBL, LKAHEAD, MVL, N2, N34, 271: + N4, NBL, NOTROT, p, PSKIPPED, q, ROWSKIP, 272: + SWBAND 273: LOGICAL APPLV, GOSCALE, LOWER, LSVEC, NOSCALE, ROTOK, 274: + RSVEC, UCTOL, UPPER 275: * .. 276: * .. Local Arrays .. 277: DOUBLE PRECISION FASTR( 5 ) 278: * .. 279: * .. Intrinsic Functions .. 280: INTRINSIC DABS, DMAX1, DMIN1, DBLE, MIN0, DSIGN, DSQRT 281: * .. 282: * .. External Functions .. 283: * .. 284: * from BLAS 285: DOUBLE PRECISION DDOT, DNRM2 286: EXTERNAL DDOT, DNRM2 287: INTEGER IDAMAX 288: EXTERNAL IDAMAX 289: * from LAPACK 290: DOUBLE PRECISION DLAMCH 291: EXTERNAL DLAMCH 292: LOGICAL LSAME 293: EXTERNAL LSAME 294: * .. 295: * .. External Subroutines .. 296: * .. 297: * from BLAS 298: EXTERNAL DAXPY, DCOPY, DROTM, DSCAL, DSWAP 299: * from LAPACK 300: EXTERNAL DLASCL, DLASET, DLASSQ, XERBLA 301: * 302: EXTERNAL DGSVJ0, DGSVJ1 303: * .. 304: * .. Executable Statements .. 305: * 306: * Test the input arguments 307: * 308: LSVEC = LSAME( JOBU, 'U' ) 309: UCTOL = LSAME( JOBU, 'C' ) 310: RSVEC = LSAME( JOBV, 'V' ) 311: APPLV = LSAME( JOBV, 'A' ) 312: UPPER = LSAME( JOBA, 'U' ) 313: LOWER = LSAME( JOBA, 'L' ) 314: * 315: IF( .NOT.( UPPER .OR. LOWER .OR. LSAME( JOBA, 'G' ) ) ) THEN 316: INFO = -1 317: ELSE IF( .NOT.( LSVEC .OR. UCTOL .OR. LSAME( JOBU, 'N' ) ) ) THEN 318: INFO = -2 319: ELSE IF( .NOT.( RSVEC .OR. APPLV .OR. LSAME( JOBV, 'N' ) ) ) THEN 320: INFO = -3 321: ELSE IF( M.LT.0 ) THEN 322: INFO = -4 323: ELSE IF( ( N.LT.0 ) .OR. ( N.GT.M ) ) THEN 324: INFO = -5 325: ELSE IF( LDA.LT.M ) THEN 326: INFO = -7 327: ELSE IF( MV.LT.0 ) THEN 328: INFO = -9 329: ELSE IF( ( RSVEC .AND. ( LDV.LT.N ) ) .OR. 330: + ( APPLV .AND. ( LDV.LT.MV ) ) ) THEN 331: INFO = -11 332: ELSE IF( UCTOL .AND. ( WORK( 1 ).LE.ONE ) ) THEN 333: INFO = -12 334: ELSE IF( LWORK.LT.MAX0( M+N, 6 ) ) THEN 335: INFO = -13 336: ELSE 337: INFO = 0 338: END IF 339: * 340: * #:( 341: IF( INFO.NE.0 ) THEN 342: CALL XERBLA( 'DGESVJ', -INFO ) 343: RETURN 344: END IF 345: * 346: * #:) Quick return for void matrix 347: * 348: IF( ( M.EQ.0 ) .OR. ( N.EQ.0 ) )RETURN 349: * 350: * Set numerical parameters 351: * The stopping criterion for Jacobi rotations is 352: * 353: * max_{i<>j}|A(:,i)^T * A(:,j)|/(||A(:,i)||*||A(:,j)||) < CTOL*EPS 354: * 355: * where EPS is the round-off and CTOL is defined as follows: 356: * 357: IF( UCTOL ) THEN 358: * ... user controlled 359: CTOL = WORK( 1 ) 360: ELSE 361: * ... default 362: IF( LSVEC .OR. RSVEC .OR. APPLV ) THEN 363: CTOL = DSQRT( DBLE( M ) ) 364: ELSE 365: CTOL = DBLE( M ) 366: END IF 367: END IF 368: * ... and the machine dependent parameters are 369: *[!] (Make sure that DLAMCH() works properly on the target machine.) 370: * 371: EPSLN = DLAMCH( 'Epsilon' ) 372: ROOTEPS = DSQRT( EPSLN ) 373: SFMIN = DLAMCH( 'SafeMinimum' ) 374: ROOTSFMIN = DSQRT( SFMIN ) 375: SMALL = SFMIN / EPSLN 376: BIG = DLAMCH( 'Overflow' ) 377: * BIG = ONE / SFMIN 378: ROOTBIG = ONE / ROOTSFMIN 379: LARGE = BIG / DSQRT( DBLE( M*N ) ) 380: BIGTHETA = ONE / ROOTEPS 381: * 382: TOL = CTOL*EPSLN 383: ROOTTOL = DSQRT( TOL ) 384: * 385: IF( DBLE( M )*EPSLN.GE.ONE ) THEN 386: INFO = -5 387: CALL XERBLA( 'DGESVJ', -INFO ) 388: RETURN 389: END IF 390: * 391: * Initialize the right singular vector matrix. 392: * 393: IF( RSVEC ) THEN 394: MVL = N 395: CALL DLASET( 'A', MVL, N, ZERO, ONE, V, LDV ) 396: ELSE IF( APPLV ) THEN 397: MVL = MV 398: END IF 399: RSVEC = RSVEC .OR. APPLV 400: * 401: * Initialize SVA( 1:N ) = ( ||A e_i||_2, i = 1:N ) 402: *(!) If necessary, scale A to protect the largest singular value 403: * from overflow. It is possible that saving the largest singular 404: * value destroys the information about the small ones. 405: * This initial scaling is almost minimal in the sense that the 406: * goal is to make sure that no column norm overflows, and that 407: * DSQRT(N)*max_i SVA(i) does not overflow. If INFinite entries 408: * in A are detected, the procedure returns with INFO=-6. 409: * 410: SKL= ONE / DSQRT( DBLE( M )*DBLE( N ) ) 411: NOSCALE = .TRUE. 412: GOSCALE = .TRUE. 413: * 414: IF( LOWER ) THEN 415: * the input matrix is M-by-N lower triangular (trapezoidal) 416: DO 1874 p = 1, N 417: AAPP = ZERO 418: AAQQ = ONE 419: CALL DLASSQ( M-p+1, A( p, p ), 1, AAPP, AAQQ ) 420: IF( AAPP.GT.BIG ) THEN 421: INFO = -6 422: CALL XERBLA( 'DGESVJ', -INFO ) 423: RETURN 424: END IF 425: AAQQ = DSQRT( AAQQ ) 426: IF( ( AAPP.LT.( BIG / AAQQ ) ) .AND. NOSCALE ) THEN 427: SVA( p ) = AAPP*AAQQ 428: ELSE 429: NOSCALE = .FALSE. 430: SVA( p ) = AAPP*( AAQQ*SKL) 431: IF( GOSCALE ) THEN 432: GOSCALE = .FALSE. 433: DO 1873 q = 1, p - 1 434: SVA( q ) = SVA( q )*SKL 435: 1873 CONTINUE 436: END IF 437: END IF 438: 1874 CONTINUE 439: ELSE IF( UPPER ) THEN 440: * the input matrix is M-by-N upper triangular (trapezoidal) 441: DO 2874 p = 1, N 442: AAPP = ZERO 443: AAQQ = ONE 444: CALL DLASSQ( p, A( 1, p ), 1, AAPP, AAQQ ) 445: IF( AAPP.GT.BIG ) THEN 446: INFO = -6 447: CALL XERBLA( 'DGESVJ', -INFO ) 448: RETURN 449: END IF 450: AAQQ = DSQRT( AAQQ ) 451: IF( ( AAPP.LT.( BIG / AAQQ ) ) .AND. NOSCALE ) THEN 452: SVA( p ) = AAPP*AAQQ 453: ELSE 454: NOSCALE = .FALSE. 455: SVA( p ) = AAPP*( AAQQ*SKL) 456: IF( GOSCALE ) THEN 457: GOSCALE = .FALSE. 458: DO 2873 q = 1, p - 1 459: SVA( q ) = SVA( q )*SKL 460: 2873 CONTINUE 461: END IF 462: END IF 463: 2874 CONTINUE 464: ELSE 465: * the input matrix is M-by-N general dense 466: DO 3874 p = 1, N 467: AAPP = ZERO 468: AAQQ = ONE 469: CALL DLASSQ( M, A( 1, p ), 1, AAPP, AAQQ ) 470: IF( AAPP.GT.BIG ) THEN 471: INFO = -6 472: CALL XERBLA( 'DGESVJ', -INFO ) 473: RETURN 474: END IF 475: AAQQ = DSQRT( AAQQ ) 476: IF( ( AAPP.LT.( BIG / AAQQ ) ) .AND. NOSCALE ) THEN 477: SVA( p ) = AAPP*AAQQ 478: ELSE 479: NOSCALE = .FALSE. 480: SVA( p ) = AAPP*( AAQQ*SKL) 481: IF( GOSCALE ) THEN 482: GOSCALE = .FALSE. 483: DO 3873 q = 1, p - 1 484: SVA( q ) = SVA( q )*SKL 485: 3873 CONTINUE 486: END IF 487: END IF 488: 3874 CONTINUE 489: END IF 490: * 491: IF( NOSCALE )SKL= ONE 492: * 493: * Move the smaller part of the spectrum from the underflow threshold 494: *(!) Start by determining the position of the nonzero entries of the 495: * array SVA() relative to ( SFMIN, BIG ). 496: * 497: AAPP = ZERO 498: AAQQ = BIG 499: DO 4781 p = 1, N 500: IF( SVA( p ).NE.ZERO )AAQQ = DMIN1( AAQQ, SVA( p ) ) 501: AAPP = DMAX1( AAPP, SVA( p ) ) 502: 4781 CONTINUE 503: * 504: * #:) Quick return for zero matrix 505: * 506: IF( AAPP.EQ.ZERO ) THEN 507: IF( LSVEC )CALL DLASET( 'G', M, N, ZERO, ONE, A, LDA ) 508: WORK( 1 ) = ONE 509: WORK( 2 ) = ZERO 510: WORK( 3 ) = ZERO 511: WORK( 4 ) = ZERO 512: WORK( 5 ) = ZERO 513: WORK( 6 ) = ZERO 514: RETURN 515: END IF 516: * 517: * #:) Quick return for one-column matrix 518: * 519: IF( N.EQ.1 ) THEN 520: IF( LSVEC )CALL DLASCL( 'G', 0, 0, SVA( 1 ), SKL, M, 1, 521: + A( 1, 1 ), LDA, IERR ) 522: WORK( 1 ) = ONE / SKL 523: IF( SVA( 1 ).GE.SFMIN ) THEN 524: WORK( 2 ) = ONE 525: ELSE 526: WORK( 2 ) = ZERO 527: END IF 528: WORK( 3 ) = ZERO 529: WORK( 4 ) = ZERO 530: WORK( 5 ) = ZERO 531: WORK( 6 ) = ZERO 532: RETURN 533: END IF 534: * 535: * Protect small singular values from underflow, and try to 536: * avoid underflows/overflows in computing Jacobi rotations. 537: * 538: SN = DSQRT( SFMIN / EPSLN ) 539: TEMP1 = DSQRT( BIG / DBLE( N ) ) 540: IF( ( AAPP.LE.SN ) .OR. ( AAQQ.GE.TEMP1 ) .OR. 541: + ( ( SN.LE.AAQQ ) .AND. ( AAPP.LE.TEMP1 ) ) ) THEN 542: TEMP1 = DMIN1( BIG, TEMP1 / AAPP ) 543: * AAQQ = AAQQ*TEMP1 544: * AAPP = AAPP*TEMP1 545: ELSE IF( ( AAQQ.LE.SN ) .AND. ( AAPP.LE.TEMP1 ) ) THEN 546: TEMP1 = DMIN1( SN / AAQQ, BIG / ( AAPP*DSQRT( DBLE( N ) ) ) ) 547: * AAQQ = AAQQ*TEMP1 548: * AAPP = AAPP*TEMP1 549: ELSE IF( ( AAQQ.GE.SN ) .AND. ( AAPP.GE.TEMP1 ) ) THEN 550: TEMP1 = DMAX1( SN / AAQQ, TEMP1 / AAPP ) 551: * AAQQ = AAQQ*TEMP1 552: * AAPP = AAPP*TEMP1 553: ELSE IF( ( AAQQ.LE.SN ) .AND. ( AAPP.GE.TEMP1 ) ) THEN 554: TEMP1 = DMIN1( SN / AAQQ, BIG / ( DSQRT( DBLE( N ) )*AAPP ) ) 555: * AAQQ = AAQQ*TEMP1 556: * AAPP = AAPP*TEMP1 557: ELSE 558: TEMP1 = ONE 559: END IF 560: * 561: * Scale, if necessary 562: * 563: IF( TEMP1.NE.ONE ) THEN 564: CALL DLASCL( 'G', 0, 0, ONE, TEMP1, N, 1, SVA, N, IERR ) 565: END IF 566: SKL= TEMP1*SKL 567: IF( SKL.NE.ONE ) THEN 568: CALL DLASCL( JOBA, 0, 0, ONE, SKL, M, N, A, LDA, IERR ) 569: SKL= ONE / SKL 570: END IF 571: * 572: * Row-cyclic Jacobi SVD algorithm with column pivoting 573: * 574: EMPTSW = ( N*( N-1 ) ) / 2 575: NOTROT = 0 576: FASTR( 1 ) = ZERO 577: * 578: * A is represented in factored form A = A * diag(WORK), where diag(WORK) 579: * is initialized to identity. WORK is updated during fast scaled 580: * rotations. 581: * 582: DO 1868 q = 1, N 583: WORK( q ) = ONE 584: 1868 CONTINUE 585: * 586: * 587: SWBAND = 3 588: *[TP] SWBAND is a tuning parameter [TP]. It is meaningful and effective 589: * if DGESVJ is used as a computational routine in the preconditioned 590: * Jacobi SVD algorithm DGESVJ. For sweeps i=1:SWBAND the procedure 591: * works on pivots inside a band-like region around the diagonal. 592: * The boundaries are determined dynamically, based on the number of 593: * pivots above a threshold. 594: * 595: KBL = MIN0( 8, N ) 596: *[TP] KBL is a tuning parameter that defines the tile size in the 597: * tiling of the p-q loops of pivot pairs. In general, an optimal 598: * value of KBL depends on the matrix dimensions and on the 599: * parameters of the computer's memory. 600: * 601: NBL = N / KBL 602: IF( ( NBL*KBL ).NE.N )NBL = NBL + 1 603: * 604: BLSKIP = KBL**2 605: *[TP] BLKSKIP is a tuning parameter that depends on SWBAND and KBL. 606: * 607: ROWSKIP = MIN0( 5, KBL ) 608: *[TP] ROWSKIP is a tuning parameter. 609: * 610: LKAHEAD = 1 611: *[TP] LKAHEAD is a tuning parameter. 612: * 613: * Quasi block transformations, using the lower (upper) triangular 614: * structure of the input matrix. The quasi-block-cycling usually 615: * invokes cubic convergence. Big part of this cycle is done inside 616: * canonical subspaces of dimensions less than M. 617: * 618: IF( ( LOWER .OR. UPPER ) .AND. ( N.GT.MAX0( 64, 4*KBL ) ) ) THEN 619: *[TP] The number of partition levels and the actual partition are 620: * tuning parameters. 621: N4 = N / 4 622: N2 = N / 2 623: N34 = 3*N4 624: IF( APPLV ) THEN 625: q = 0 626: ELSE 627: q = 1 628: END IF 629: * 630: IF( LOWER ) THEN 631: * 632: * This works very well on lower triangular matrices, in particular 633: * in the framework of the preconditioned Jacobi SVD (xGEJSV). 634: * The idea is simple: 635: * [+ 0 0 0] Note that Jacobi transformations of [0 0] 636: * [+ + 0 0] [0 0] 637: * [+ + x 0] actually work on [x 0] [x 0] 638: * [+ + x x] [x x]. [x x] 639: * 640: CALL DGSVJ0( JOBV, M-N34, N-N34, A( N34+1, N34+1 ), LDA, 641: + WORK( N34+1 ), SVA( N34+1 ), MVL, 642: + V( N34*q+1, N34+1 ), LDV, EPSLN, SFMIN, TOL, 643: + 2, WORK( N+1 ), LWORK-N, IERR ) 644: * 645: CALL DGSVJ0( JOBV, M-N2, N34-N2, A( N2+1, N2+1 ), LDA, 646: + WORK( N2+1 ), SVA( N2+1 ), MVL, 647: + V( N2*q+1, N2+1 ), LDV, EPSLN, SFMIN, TOL, 2, 648: + WORK( N+1 ), LWORK-N, IERR ) 649: * 650: CALL DGSVJ1( JOBV, M-N2, N-N2, N4, A( N2+1, N2+1 ), LDA, 651: + WORK( N2+1 ), SVA( N2+1 ), MVL, 652: + V( N2*q+1, N2+1 ), LDV, EPSLN, SFMIN, TOL, 1, 653: + WORK( N+1 ), LWORK-N, IERR ) 654: * 655: CALL DGSVJ0( JOBV, M-N4, N2-N4, A( N4+1, N4+1 ), LDA, 656: + WORK( N4+1 ), SVA( N4+1 ), MVL, 657: + V( N4*q+1, N4+1 ), LDV, EPSLN, SFMIN, TOL, 1, 658: + WORK( N+1 ), LWORK-N, IERR ) 659: * 660: CALL DGSVJ0( JOBV, M, N4, A, LDA, WORK, SVA, MVL, V, LDV, 661: + EPSLN, SFMIN, TOL, 1, WORK( N+1 ), LWORK-N, 662: + IERR ) 663: * 664: CALL DGSVJ1( JOBV, M, N2, N4, A, LDA, WORK, SVA, MVL, V, 665: + LDV, EPSLN, SFMIN, TOL, 1, WORK( N+1 ), 666: + LWORK-N, IERR ) 667: * 668: * 669: ELSE IF( UPPER ) THEN 670: * 671: * 672: CALL DGSVJ0( JOBV, N4, N4, A, LDA, WORK, SVA, MVL, V, LDV, 673: + EPSLN, SFMIN, TOL, 2, WORK( N+1 ), LWORK-N, 674: + IERR ) 675: * 676: CALL DGSVJ0( JOBV, N2, N4, A( 1, N4+1 ), LDA, WORK( N4+1 ), 677: + SVA( N4+1 ), MVL, V( N4*q+1, N4+1 ), LDV, 678: + EPSLN, SFMIN, TOL, 1, WORK( N+1 ), LWORK-N, 679: + IERR ) 680: * 681: CALL DGSVJ1( JOBV, N2, N2, N4, A, LDA, WORK, SVA, MVL, V, 682: + LDV, EPSLN, SFMIN, TOL, 1, WORK( N+1 ), 683: + LWORK-N, IERR ) 684: * 685: CALL DGSVJ0( JOBV, N2+N4, N4, A( 1, N2+1 ), LDA, 686: + WORK( N2+1 ), SVA( N2+1 ), MVL, 687: + V( N2*q+1, N2+1 ), LDV, EPSLN, SFMIN, TOL, 1, 688: + WORK( N+1 ), LWORK-N, IERR ) 689: 690: END IF 691: * 692: END IF 693: * 694: * .. Row-cyclic pivot strategy with de Rijk's pivoting .. 695: * 696: DO 1993 i = 1, NSWEEP 697: * 698: * .. go go go ... 699: * 700: MXAAPQ = ZERO 701: MXSINJ = ZERO 702: ISWROT = 0 703: * 704: NOTROT = 0 705: PSKIPPED = 0 706: * 707: * Each sweep is unrolled using KBL-by-KBL tiles over the pivot pairs 708: * 1 <= p < q <= N. This is the first step toward a blocked implementation 709: * of the rotations. New implementation, based on block transformations, 710: * is under development. 711: * 712: DO 2000 ibr = 1, NBL 713: * 714: igl = ( ibr-1 )*KBL + 1 715: * 716: DO 1002 ir1 = 0, MIN0( LKAHEAD, NBL-ibr ) 717: * 718: igl = igl + ir1*KBL 719: * 720: DO 2001 p = igl, MIN0( igl+KBL-1, N-1 ) 721: * 722: * .. de Rijk's pivoting 723: * 724: q = IDAMAX( N-p+1, SVA( p ), 1 ) + p - 1 725: IF( p.NE.q ) THEN 726: CALL DSWAP( M, A( 1, p ), 1, A( 1, q ), 1 ) 727: IF( RSVEC )CALL DSWAP( MVL, V( 1, p ), 1, 728: + V( 1, q ), 1 ) 729: TEMP1 = SVA( p ) 730: SVA( p ) = SVA( q ) 731: SVA( q ) = TEMP1 732: TEMP1 = WORK( p ) 733: WORK( p ) = WORK( q ) 734: WORK( q ) = TEMP1 735: END IF 736: * 737: IF( ir1.EQ.0 ) THEN 738: * 739: * Column norms are periodically updated by explicit 740: * norm computation. 741: * Caveat: 742: * Unfortunately, some BLAS implementations compute DNRM2(M,A(1,p),1) 743: * as DSQRT(DDOT(M,A(1,p),1,A(1,p),1)), which may cause the result to 744: * overflow for ||A(:,p)||_2 > DSQRT(overflow_threshold), and to 745: * underflow for ||A(:,p)||_2 < DSQRT(underflow_threshold). 746: * Hence, DNRM2 cannot be trusted, not even in the case when 747: * the true norm is far from the under(over)flow boundaries. 748: * If properly implemented DNRM2 is available, the IF-THEN-ELSE 749: * below should read "AAPP = DNRM2( M, A(1,p), 1 ) * WORK(p)". 750: * 751: IF( ( SVA( p ).LT.ROOTBIG ) .AND. 752: + ( SVA( p ).GT.ROOTSFMIN ) ) THEN 753: SVA( p ) = DNRM2( M, A( 1, p ), 1 )*WORK( p ) 754: ELSE 755: TEMP1 = ZERO 756: AAPP = ONE 757: CALL DLASSQ( M, A( 1, p ), 1, TEMP1, AAPP ) 758: SVA( p ) = TEMP1*DSQRT( AAPP )*WORK( p ) 759: END IF 760: AAPP = SVA( p ) 761: ELSE 762: AAPP = SVA( p ) 763: END IF 764: * 765: IF( AAPP.GT.ZERO ) THEN 766: * 767: PSKIPPED = 0 768: * 769: DO 2002 q = p + 1, MIN0( igl+KBL-1, N ) 770: * 771: AAQQ = SVA( q ) 772: * 773: IF( AAQQ.GT.ZERO ) THEN 774: * 775: AAPP0 = AAPP 776: IF( AAQQ.GE.ONE ) THEN 777: ROTOK = ( SMALL*AAPP ).LE.AAQQ 778: IF( AAPP.LT.( BIG / AAQQ ) ) THEN 779: AAPQ = ( DDOT( M, A( 1, p ), 1, A( 1, 780: + q ), 1 )*WORK( p )*WORK( q ) / 781: + AAQQ ) / AAPP 782: ELSE 783: CALL DCOPY( M, A( 1, p ), 1, 784: + WORK( N+1 ), 1 ) 785: CALL DLASCL( 'G', 0, 0, AAPP, 786: + WORK( p ), M, 1, 787: + WORK( N+1 ), LDA, IERR ) 788: AAPQ = DDOT( M, WORK( N+1 ), 1, 789: + A( 1, q ), 1 )*WORK( q ) / AAQQ 790: END IF 791: ELSE 792: ROTOK = AAPP.LE.( AAQQ / SMALL ) 793: IF( AAPP.GT.( SMALL / AAQQ ) ) THEN 794: AAPQ = ( DDOT( M, A( 1, p ), 1, A( 1, 795: + q ), 1 )*WORK( p )*WORK( q ) / 796: + AAQQ ) / AAPP 797: ELSE 798: CALL DCOPY( M, A( 1, q ), 1, 799: + WORK( N+1 ), 1 ) 800: CALL DLASCL( 'G', 0, 0, AAQQ, 801: + WORK( q ), M, 1, 802: + WORK( N+1 ), LDA, IERR ) 803: AAPQ = DDOT( M, WORK( N+1 ), 1, 804: + A( 1, p ), 1 )*WORK( p ) / AAPP 805: END IF 806: END IF 807: * 808: MXAAPQ = DMAX1( MXAAPQ, DABS( AAPQ ) ) 809: * 810: * TO rotate or NOT to rotate, THAT is the question ... 811: * 812: IF( DABS( AAPQ ).GT.TOL ) THEN 813: * 814: * .. rotate 815: *[RTD] ROTATED = ROTATED + ONE 816: * 817: IF( ir1.EQ.0 ) THEN 818: NOTROT = 0 819: PSKIPPED = 0 820: ISWROT = ISWROT + 1 821: END IF 822: * 823: IF( ROTOK ) THEN 824: * 825: AQOAP = AAQQ / AAPP 826: APOAQ = AAPP / AAQQ 827: THETA = -HALF*DABS( AQOAP-APOAQ ) / 828: + AAPQ 829: * 830: IF( DABS( THETA ).GT.BIGTHETA ) THEN 831: * 832: T = HALF / THETA 833: FASTR( 3 ) = T*WORK( p ) / WORK( q ) 834: FASTR( 4 ) = -T*WORK( q ) / 835: + WORK( p ) 836: CALL DROTM( M, A( 1, p ), 1, 837: + A( 1, q ), 1, FASTR ) 838: IF( RSVEC )CALL DROTM( MVL, 839: + V( 1, p ), 1, 840: + V( 1, q ), 1, 841: + FASTR ) 842: SVA( q ) = AAQQ*DSQRT( DMAX1( ZERO, 843: + ONE+T*APOAQ*AAPQ ) ) 844: AAPP = AAPP*DSQRT( DMAX1( ZERO, 845: + ONE-T*AQOAP*AAPQ ) ) 846: MXSINJ = DMAX1( MXSINJ, DABS( T ) ) 847: * 848: ELSE 849: * 850: * .. choose correct signum for THETA and rotate 851: * 852: THSIGN = -DSIGN( ONE, AAPQ ) 853: T = ONE / ( THETA+THSIGN* 854: + DSQRT( ONE+THETA*THETA ) ) 855: CS = DSQRT( ONE / ( ONE+T*T ) ) 856: SN = T*CS 857: * 858: MXSINJ = DMAX1( MXSINJ, DABS( SN ) ) 859: SVA( q ) = AAQQ*DSQRT( DMAX1( ZERO, 860: + ONE+T*APOAQ*AAPQ ) ) 861: AAPP = AAPP*DSQRT( DMAX1( ZERO, 862: + ONE-T*AQOAP*AAPQ ) ) 863: * 864: APOAQ = WORK( p ) / WORK( q ) 865: AQOAP = WORK( q ) / WORK( p ) 866: IF( WORK( p ).GE.ONE ) THEN 867: IF( WORK( q ).GE.ONE ) THEN 868: FASTR( 3 ) = T*APOAQ 869: FASTR( 4 ) = -T*AQOAP 870: WORK( p ) = WORK( p )*CS 871: WORK( q ) = WORK( q )*CS 872: CALL DROTM( M, A( 1, p ), 1, 873: + A( 1, q ), 1, 874: + FASTR ) 875: IF( RSVEC )CALL DROTM( MVL, 876: + V( 1, p ), 1, V( 1, q ), 877: + 1, FASTR ) 878: ELSE 879: CALL DAXPY( M, -T*AQOAP, 880: + A( 1, q ), 1, 881: + A( 1, p ), 1 ) 882: CALL DAXPY( M, CS*SN*APOAQ, 883: + A( 1, p ), 1, 884: + A( 1, q ), 1 ) 885: WORK( p ) = WORK( p )*CS 886: WORK( q ) = WORK( q ) / CS 887: IF( RSVEC ) THEN 888: CALL DAXPY( MVL, -T*AQOAP, 889: + V( 1, q ), 1, 890: + V( 1, p ), 1 ) 891: CALL DAXPY( MVL, 892: + CS*SN*APOAQ, 893: + V( 1, p ), 1, 894: + V( 1, q ), 1 ) 895: END IF 896: END IF 897: ELSE 898: IF( WORK( q ).GE.ONE ) THEN 899: CALL DAXPY( M, T*APOAQ, 900: + A( 1, p ), 1, 901: + A( 1, q ), 1 ) 902: CALL DAXPY( M, -CS*SN*AQOAP, 903: + A( 1, q ), 1, 904: + A( 1, p ), 1 ) 905: WORK( p ) = WORK( p ) / CS 906: WORK( q ) = WORK( q )*CS 907: IF( RSVEC ) THEN 908: CALL DAXPY( MVL, T*APOAQ, 909: + V( 1, p ), 1, 910: + V( 1, q ), 1 ) 911: CALL DAXPY( MVL, 912: + -CS*SN*AQOAP, 913: + V( 1, q ), 1, 914: + V( 1, p ), 1 ) 915: END IF 916: ELSE 917: IF( WORK( p ).GE.WORK( q ) ) 918: + THEN 919: CALL DAXPY( M, -T*AQOAP, 920: + A( 1, q ), 1, 921: + A( 1, p ), 1 ) 922: CALL DAXPY( M, CS*SN*APOAQ, 923: + A( 1, p ), 1, 924: + A( 1, q ), 1 ) 925: WORK( p ) = WORK( p )*CS 926: WORK( q ) = WORK( q ) / CS 927: IF( RSVEC ) THEN 928: CALL DAXPY( MVL, 929: + -T*AQOAP, 930: + V( 1, q ), 1, 931: + V( 1, p ), 1 ) 932: CALL DAXPY( MVL, 933: + CS*SN*APOAQ, 934: + V( 1, p ), 1, 935: + V( 1, q ), 1 ) 936: END IF 937: ELSE 938: CALL DAXPY( M, T*APOAQ, 939: + A( 1, p ), 1, 940: + A( 1, q ), 1 ) 941: CALL DAXPY( M, 942: + -CS*SN*AQOAP, 943: + A( 1, q ), 1, 944: + A( 1, p ), 1 ) 945: WORK( p ) = WORK( p ) / CS 946: WORK( q ) = WORK( q )*CS 947: IF( RSVEC ) THEN 948: CALL DAXPY( MVL, 949: + T*APOAQ, V( 1, p ), 950: + 1, V( 1, q ), 1 ) 951: CALL DAXPY( MVL, 952: + -CS*SN*AQOAP, 953: + V( 1, q ), 1, 954: + V( 1, p ), 1 ) 955: END IF 956: END IF 957: END IF 958: END IF 959: END IF 960: * 961: ELSE 962: * .. have to use modified Gram-Schmidt like transformation 963: CALL DCOPY( M, A( 1, p ), 1, 964: + WORK( N+1 ), 1 ) 965: CALL DLASCL( 'G', 0, 0, AAPP, ONE, M, 966: + 1, WORK( N+1 ), LDA, 967: + IERR ) 968: CALL DLASCL( 'G', 0, 0, AAQQ, ONE, M, 969: + 1, A( 1, q ), LDA, IERR ) 970: TEMP1 = -AAPQ*WORK( p ) / WORK( q ) 971: CALL DAXPY( M, TEMP1, WORK( N+1 ), 1, 972: + A( 1, q ), 1 ) 973: CALL DLASCL( 'G', 0, 0, ONE, AAQQ, M, 974: + 1, A( 1, q ), LDA, IERR ) 975: SVA( q ) = AAQQ*DSQRT( DMAX1( ZERO, 976: + ONE-AAPQ*AAPQ ) ) 977: MXSINJ = DMAX1( MXSINJ, SFMIN ) 978: END IF 979: * END IF ROTOK THEN ... ELSE 980: * 981: * In the case of cancellation in updating SVA(q), SVA(p) 982: * recompute SVA(q), SVA(p). 983: * 984: IF( ( SVA( q ) / AAQQ )**2.LE.ROOTEPS ) 985: + THEN 986: IF( ( AAQQ.LT.ROOTBIG ) .AND. 987: + ( AAQQ.GT.ROOTSFMIN ) ) THEN 988: SVA( q ) = DNRM2( M, A( 1, q ), 1 )* 989: + WORK( q ) 990: ELSE 991: T = ZERO 992: AAQQ = ONE 993: CALL DLASSQ( M, A( 1, q ), 1, T, 994: + AAQQ ) 995: SVA( q ) = T*DSQRT( AAQQ )*WORK( q ) 996: END IF 997: END IF 998: IF( ( AAPP / AAPP0 ).LE.ROOTEPS ) THEN 999: IF( ( AAPP.LT.ROOTBIG ) .AND. 1000: + ( AAPP.GT.ROOTSFMIN ) ) THEN 1001: AAPP = DNRM2( M, A( 1, p ), 1 )* 1002: + WORK( p ) 1003: ELSE 1004: T = ZERO 1005: AAPP = ONE 1006: CALL DLASSQ( M, A( 1, p ), 1, T, 1007: + AAPP ) 1008: AAPP = T*DSQRT( AAPP )*WORK( p ) 1009: END IF 1010: SVA( p ) = AAPP 1011: END IF 1012: * 1013: ELSE 1014: * A(:,p) and A(:,q) already numerically orthogonal 1015: IF( ir1.EQ.0 )NOTROT = NOTROT + 1 1016: *[RTD] SKIPPED = SKIPPED + 1 1017: PSKIPPED = PSKIPPED + 1 1018: END IF 1019: ELSE 1020: * A(:,q) is zero column 1021: IF( ir1.EQ.0 )NOTROT = NOTROT + 1 1022: PSKIPPED = PSKIPPED + 1 1023: END IF 1024: * 1025: IF( ( i.LE.SWBAND ) .AND. 1026: + ( PSKIPPED.GT.ROWSKIP ) ) THEN 1027: IF( ir1.EQ.0 )AAPP = -AAPP 1028: NOTROT = 0 1029: GO TO 2103 1030: END IF 1031: * 1032: 2002 CONTINUE 1033: * END q-LOOP 1034: * 1035: 2103 CONTINUE 1036: * bailed out of q-loop 1037: * 1038: SVA( p ) = AAPP 1039: * 1040: ELSE 1041: SVA( p ) = AAPP 1042: IF( ( ir1.EQ.0 ) .AND. ( AAPP.EQ.ZERO ) ) 1043: + NOTROT = NOTROT + MIN0( igl+KBL-1, N ) - p 1044: END IF 1045: * 1046: 2001 CONTINUE 1047: * end of the p-loop 1048: * end of doing the block ( ibr, ibr ) 1049: 1002 CONTINUE 1050: * end of ir1-loop 1051: * 1052: * ... go to the off diagonal blocks 1053: * 1054: igl = ( ibr-1 )*KBL + 1 1055: * 1056: DO 2010 jbc = ibr + 1, NBL 1057: * 1058: jgl = ( jbc-1 )*KBL + 1 1059: * 1060: * doing the block at ( ibr, jbc ) 1061: * 1062: IJBLSK = 0 1063: DO 2100 p = igl, MIN0( igl+KBL-1, N ) 1064: * 1065: AAPP = SVA( p ) 1066: IF( AAPP.GT.ZERO ) THEN 1067: * 1068: PSKIPPED = 0 1069: * 1070: DO 2200 q = jgl, MIN0( jgl+KBL-1, N ) 1071: * 1072: AAQQ = SVA( q ) 1073: IF( AAQQ.GT.ZERO ) THEN 1074: AAPP0 = AAPP 1075: * 1076: * .. M x 2 Jacobi SVD .. 1077: * 1078: * Safe Gram matrix computation 1079: * 1080: IF( AAQQ.GE.ONE ) THEN 1081: IF( AAPP.GE.AAQQ ) THEN 1082: ROTOK = ( SMALL*AAPP ).LE.AAQQ 1083: ELSE 1084: ROTOK = ( SMALL*AAQQ ).LE.AAPP 1085: END IF 1086: IF( AAPP.LT.( BIG / AAQQ ) ) THEN 1087: AAPQ = ( DDOT( M, A( 1, p ), 1, A( 1, 1088: + q ), 1 )*WORK( p )*WORK( q ) / 1089: + AAQQ ) / AAPP 1090: ELSE 1091: CALL DCOPY( M, A( 1, p ), 1, 1092: + WORK( N+1 ), 1 ) 1093: CALL DLASCL( 'G', 0, 0, AAPP, 1094: + WORK( p ), M, 1, 1095: + WORK( N+1 ), LDA, IERR ) 1096: AAPQ = DDOT( M, WORK( N+1 ), 1, 1097: + A( 1, q ), 1 )*WORK( q ) / AAQQ 1098: END IF 1099: ELSE 1100: IF( AAPP.GE.AAQQ ) THEN 1101: ROTOK = AAPP.LE.( AAQQ / SMALL ) 1102: ELSE 1103: ROTOK = AAQQ.LE.( AAPP / SMALL ) 1104: END IF 1105: IF( AAPP.GT.( SMALL / AAQQ ) ) THEN 1106: AAPQ = ( DDOT( M, A( 1, p ), 1, A( 1, 1107: + q ), 1 )*WORK( p )*WORK( q ) / 1108: + AAQQ ) / AAPP 1109: ELSE 1110: CALL DCOPY( M, A( 1, q ), 1, 1111: + WORK( N+1 ), 1 ) 1112: CALL DLASCL( 'G', 0, 0, AAQQ, 1113: + WORK( q ), M, 1, 1114: + WORK( N+1 ), LDA, IERR ) 1115: AAPQ = DDOT( M, WORK( N+1 ), 1, 1116: + A( 1, p ), 1 )*WORK( p ) / AAPP 1117: END IF 1118: END IF 1119: * 1120: MXAAPQ = DMAX1( MXAAPQ, DABS( AAPQ ) ) 1121: * 1122: * TO rotate or NOT to rotate, THAT is the question ... 1123: * 1124: IF( DABS( AAPQ ).GT.TOL ) THEN 1125: NOTROT = 0 1126: *[RTD] ROTATED = ROTATED + 1 1127: PSKIPPED = 0 1128: ISWROT = ISWROT + 1 1129: * 1130: IF( ROTOK ) THEN 1131: * 1132: AQOAP = AAQQ / AAPP 1133: APOAQ = AAPP / AAQQ 1134: THETA = -HALF*DABS( AQOAP-APOAQ ) / 1135: + AAPQ 1136: IF( AAQQ.GT.AAPP0 )THETA = -THETA 1137: * 1138: IF( DABS( THETA ).GT.BIGTHETA ) THEN 1139: T = HALF / THETA 1140: FASTR( 3 ) = T*WORK( p ) / WORK( q ) 1141: FASTR( 4 ) = -T*WORK( q ) / 1142: + WORK( p ) 1143: CALL DROTM( M, A( 1, p ), 1, 1144: + A( 1, q ), 1, FASTR ) 1145: IF( RSVEC )CALL DROTM( MVL, 1146: + V( 1, p ), 1, 1147: + V( 1, q ), 1, 1148: + FASTR ) 1149: SVA( q ) = AAQQ*DSQRT( DMAX1( ZERO, 1150: + ONE+T*APOAQ*AAPQ ) ) 1151: AAPP = AAPP*DSQRT( DMAX1( ZERO, 1152: + ONE-T*AQOAP*AAPQ ) ) 1153: MXSINJ = DMAX1( MXSINJ, DABS( T ) ) 1154: ELSE 1155: * 1156: * .. choose correct signum for THETA and rotate 1157: * 1158: THSIGN = -DSIGN( ONE, AAPQ ) 1159: IF( AAQQ.GT.AAPP0 )THSIGN = -THSIGN 1160: T = ONE / ( THETA+THSIGN* 1161: + DSQRT( ONE+THETA*THETA ) ) 1162: CS = DSQRT( ONE / ( ONE+T*T ) ) 1163: SN = T*CS 1164: MXSINJ = DMAX1( MXSINJ, DABS( SN ) ) 1165: SVA( q ) = AAQQ*DSQRT( DMAX1( ZERO, 1166: + ONE+T*APOAQ*AAPQ ) ) 1167: AAPP = AAPP*DSQRT( DMAX1( ZERO, 1168: + ONE-T*AQOAP*AAPQ ) ) 1169: * 1170: APOAQ = WORK( p ) / WORK( q ) 1171: AQOAP = WORK( q ) / WORK( p ) 1172: IF( WORK( p ).GE.ONE ) THEN 1173: * 1174: IF( WORK( q ).GE.ONE ) THEN 1175: FASTR( 3 ) = T*APOAQ 1176: FASTR( 4 ) = -T*AQOAP 1177: WORK( p ) = WORK( p )*CS 1178: WORK( q ) = WORK( q )*CS 1179: CALL DROTM( M, A( 1, p ), 1, 1180: + A( 1, q ), 1, 1181: + FASTR ) 1182: IF( RSVEC )CALL DROTM( MVL, 1183: + V( 1, p ), 1, V( 1, q ), 1184: + 1, FASTR ) 1185: ELSE 1186: CALL DAXPY( M, -T*AQOAP, 1187: + A( 1, q ), 1, 1188: + A( 1, p ), 1 ) 1189: CALL DAXPY( M, CS*SN*APOAQ, 1190: + A( 1, p ), 1, 1191: + A( 1, q ), 1 ) 1192: IF( RSVEC ) THEN 1193: CALL DAXPY( MVL, -T*AQOAP, 1194: + V( 1, q ), 1, 1195: + V( 1, p ), 1 ) 1196: CALL DAXPY( MVL, 1197: + CS*SN*APOAQ, 1198: + V( 1, p ), 1, 1199: + V( 1, q ), 1 ) 1200: END IF 1201: WORK( p ) = WORK( p )*CS 1202: WORK( q ) = WORK( q ) / CS 1203: END IF 1204: ELSE 1205: IF( WORK( q ).GE.ONE ) THEN 1206: CALL DAXPY( M, T*APOAQ, 1207: + A( 1, p ), 1, 1208: + A( 1, q ), 1 ) 1209: CALL DAXPY( M, -CS*SN*AQOAP, 1210: + A( 1, q ), 1, 1211: + A( 1, p ), 1 ) 1212: IF( RSVEC ) THEN 1213: CALL DAXPY( MVL, T*APOAQ, 1214: + V( 1, p ), 1, 1215: + V( 1, q ), 1 ) 1216: CALL DAXPY( MVL, 1217: + -CS*SN*AQOAP, 1218: + V( 1, q ), 1, 1219: + V( 1, p ), 1 ) 1220: END IF 1221: WORK( p ) = WORK( p ) / CS 1222: WORK( q ) = WORK( q )*CS 1223: ELSE 1224: IF( WORK( p ).GE.WORK( q ) ) 1225: + THEN 1226: CALL DAXPY( M, -T*AQOAP, 1227: + A( 1, q ), 1, 1228: + A( 1, p ), 1 ) 1229: CALL DAXPY( M, CS*SN*APOAQ, 1230: + A( 1, p ), 1, 1231: + A( 1, q ), 1 ) 1232: WORK( p ) = WORK( p )*CS 1233: WORK( q ) = WORK( q ) / CS 1234: IF( RSVEC ) THEN 1235: CALL DAXPY( MVL, 1236: + -T*AQOAP, 1237: + V( 1, q ), 1, 1238: + V( 1, p ), 1 ) 1239: CALL DAXPY( MVL, 1240: + CS*SN*APOAQ, 1241: + V( 1, p ), 1, 1242: + V( 1, q ), 1 ) 1243: END IF 1244: ELSE 1245: CALL DAXPY( M, T*APOAQ, 1246: + A( 1, p ), 1, 1247: + A( 1, q ), 1 ) 1248: CALL DAXPY( M, 1249: + -CS*SN*AQOAP, 1250: + A( 1, q ), 1, 1251: + A( 1, p ), 1 ) 1252: WORK( p ) = WORK( p ) / CS 1253: WORK( q ) = WORK( q )*CS 1254: IF( RSVEC ) THEN 1255: CALL DAXPY( MVL, 1256: + T*APOAQ, V( 1, p ), 1257: + 1, V( 1, q ), 1 ) 1258: CALL DAXPY( MVL, 1259: + -CS*SN*AQOAP, 1260: + V( 1, q ), 1, 1261: + V( 1, p ), 1 ) 1262: END IF 1263: END IF 1264: END IF 1265: END IF 1266: END IF 1267: * 1268: ELSE 1269: IF( AAPP.GT.AAQQ ) THEN 1270: CALL DCOPY( M, A( 1, p ), 1, 1271: + WORK( N+1 ), 1 ) 1272: CALL DLASCL( 'G', 0, 0, AAPP, ONE, 1273: + M, 1, WORK( N+1 ), LDA, 1274: + IERR ) 1275: CALL DLASCL( 'G', 0, 0, AAQQ, ONE, 1276: + M, 1, A( 1, q ), LDA, 1277: + IERR ) 1278: TEMP1 = -AAPQ*WORK( p ) / WORK( q ) 1279: CALL DAXPY( M, TEMP1, WORK( N+1 ), 1280: + 1, A( 1, q ), 1 ) 1281: CALL DLASCL( 'G', 0, 0, ONE, AAQQ, 1282: + M, 1, A( 1, q ), LDA, 1283: + IERR ) 1284: SVA( q ) = AAQQ*DSQRT( DMAX1( ZERO, 1285: + ONE-AAPQ*AAPQ ) ) 1286: MXSINJ = DMAX1( MXSINJ, SFMIN ) 1287: ELSE 1288: CALL DCOPY( M, A( 1, q ), 1, 1289: + WORK( N+1 ), 1 ) 1290: CALL DLASCL( 'G', 0, 0, AAQQ, ONE, 1291: + M, 1, WORK( N+1 ), LDA, 1292: + IERR ) 1293: CALL DLASCL( 'G', 0, 0, AAPP, ONE, 1294: + M, 1, A( 1, p ), LDA, 1295: + IERR ) 1296: TEMP1 = -AAPQ*WORK( q ) / WORK( p ) 1297: CALL DAXPY( M, TEMP1, WORK( N+1 ), 1298: + 1, A( 1, p ), 1 ) 1299: CALL DLASCL( 'G', 0, 0, ONE, AAPP, 1300: + M, 1, A( 1, p ), LDA, 1301: + IERR ) 1302: SVA( p ) = AAPP*DSQRT( DMAX1( ZERO, 1303: + ONE-AAPQ*AAPQ ) ) 1304: MXSINJ = DMAX1( MXSINJ, SFMIN ) 1305: END IF 1306: END IF 1307: * END IF ROTOK THEN ... ELSE 1308: * 1309: * In the case of cancellation in updating SVA(q) 1310: * .. recompute SVA(q) 1311: IF( ( SVA( q ) / AAQQ )**2.LE.ROOTEPS ) 1312: + THEN 1313: IF( ( AAQQ.LT.ROOTBIG ) .AND. 1314: + ( AAQQ.GT.ROOTSFMIN ) ) THEN 1315: SVA( q ) = DNRM2( M, A( 1, q ), 1 )* 1316: + WORK( q ) 1317: ELSE 1318: T = ZERO 1319: AAQQ = ONE 1320: CALL DLASSQ( M, A( 1, q ), 1, T, 1321: + AAQQ ) 1322: SVA( q ) = T*DSQRT( AAQQ )*WORK( q ) 1323: END IF 1324: END IF 1325: IF( ( AAPP / AAPP0 )**2.LE.ROOTEPS ) THEN 1326: IF( ( AAPP.LT.ROOTBIG ) .AND. 1327: + ( AAPP.GT.ROOTSFMIN ) ) THEN 1328: AAPP = DNRM2( M, A( 1, p ), 1 )* 1329: + WORK( p ) 1330: ELSE 1331: T = ZERO 1332: AAPP = ONE 1333: CALL DLASSQ( M, A( 1, p ), 1, T, 1334: + AAPP ) 1335: AAPP = T*DSQRT( AAPP )*WORK( p ) 1336: END IF 1337: SVA( p ) = AAPP 1338: END IF 1339: * end of OK rotation 1340: ELSE 1341: NOTROT = NOTROT + 1 1342: *[RTD] SKIPPED = SKIPPED + 1 1343: PSKIPPED = PSKIPPED + 1 1344: IJBLSK = IJBLSK + 1 1345: END IF 1346: ELSE 1347: NOTROT = NOTROT + 1 1348: PSKIPPED = PSKIPPED + 1 1349: IJBLSK = IJBLSK + 1 1350: END IF 1351: * 1352: IF( ( i.LE.SWBAND ) .AND. ( IJBLSK.GE.BLSKIP ) ) 1353: + THEN 1354: SVA( p ) = AAPP 1355: NOTROT = 0 1356: GO TO 2011 1357: END IF 1358: IF( ( i.LE.SWBAND ) .AND. 1359: + ( PSKIPPED.GT.ROWSKIP ) ) THEN 1360: AAPP = -AAPP 1361: NOTROT = 0 1362: GO TO 2203 1363: END IF 1364: * 1365: 2200 CONTINUE 1366: * end of the q-loop 1367: 2203 CONTINUE 1368: * 1369: SVA( p ) = AAPP 1370: * 1371: ELSE 1372: * 1373: IF( AAPP.EQ.ZERO )NOTROT = NOTROT + 1374: + MIN0( jgl+KBL-1, N ) - jgl + 1 1375: IF( AAPP.LT.ZERO )NOTROT = 0 1376: * 1377: END IF 1378: * 1379: 2100 CONTINUE 1380: * end of the p-loop 1381: 2010 CONTINUE 1382: * end of the jbc-loop 1383: 2011 CONTINUE 1384: *2011 bailed out of the jbc-loop 1385: DO 2012 p = igl, MIN0( igl+KBL-1, N ) 1386: SVA( p ) = DABS( SVA( p ) ) 1387: 2012 CONTINUE 1388: *** 1389: 2000 CONTINUE 1390: *2000 :: end of the ibr-loop 1391: * 1392: * .. update SVA(N) 1393: IF( ( SVA( N ).LT.ROOTBIG ) .AND. ( SVA( N ).GT.ROOTSFMIN ) ) 1394: + THEN 1395: SVA( N ) = DNRM2( M, A( 1, N ), 1 )*WORK( N ) 1396: ELSE 1397: T = ZERO 1398: AAPP = ONE 1399: CALL DLASSQ( M, A( 1, N ), 1, T, AAPP ) 1400: SVA( N ) = T*DSQRT( AAPP )*WORK( N ) 1401: END IF 1402: * 1403: * Additional steering devices 1404: * 1405: IF( ( i.LT.SWBAND ) .AND. ( ( MXAAPQ.LE.ROOTTOL ) .OR. 1406: + ( ISWROT.LE.N ) ) )SWBAND = i 1407: * 1408: IF( ( i.GT.SWBAND+1 ) .AND. ( MXAAPQ.LT.DSQRT( DBLE( N ) )* 1409: + TOL ) .AND. ( DBLE( N )*MXAAPQ*MXSINJ.LT.TOL ) ) THEN 1410: GO TO 1994 1411: END IF 1412: * 1413: IF( NOTROT.GE.EMPTSW )GO TO 1994 1414: * 1415: 1993 CONTINUE 1416: * end i=1:NSWEEP loop 1417: * 1418: * #:( Reaching this point means that the procedure has not converged. 1419: INFO = NSWEEP - 1 1420: GO TO 1995 1421: * 1422: 1994 CONTINUE 1423: * #:) Reaching this point means numerical convergence after the i-th 1424: * sweep. 1425: * 1426: INFO = 0 1427: * #:) INFO = 0 confirms successful iterations. 1428: 1995 CONTINUE 1429: * 1430: * Sort the singular values and find how many are above 1431: * the underflow threshold. 1432: * 1433: N2 = 0 1434: N4 = 0 1435: DO 5991 p = 1, N - 1 1436: q = IDAMAX( N-p+1, SVA( p ), 1 ) + p - 1 1437: IF( p.NE.q ) THEN 1438: TEMP1 = SVA( p ) 1439: SVA( p ) = SVA( q ) 1440: SVA( q ) = TEMP1 1441: TEMP1 = WORK( p ) 1442: WORK( p ) = WORK( q ) 1443: WORK( q ) = TEMP1 1444: CALL DSWAP( M, A( 1, p ), 1, A( 1, q ), 1 ) 1445: IF( RSVEC )CALL DSWAP( MVL, V( 1, p ), 1, V( 1, q ), 1 ) 1446: END IF 1447: IF( SVA( p ).NE.ZERO ) THEN 1448: N4 = N4 + 1 1449: IF( SVA( p )*SKL.GT.SFMIN )N2 = N2 + 1 1450: END IF 1451: 5991 CONTINUE 1452: IF( SVA( N ).NE.ZERO ) THEN 1453: N4 = N4 + 1 1454: IF( SVA( N )*SKL.GT.SFMIN )N2 = N2 + 1 1455: END IF 1456: * 1457: * Normalize the left singular vectors. 1458: * 1459: IF( LSVEC .OR. UCTOL ) THEN 1460: DO 1998 p = 1, N2 1461: CALL DSCAL( M, WORK( p ) / SVA( p ), A( 1, p ), 1 ) 1462: 1998 CONTINUE 1463: END IF 1464: * 1465: * Scale the product of Jacobi rotations (assemble the fast rotations). 1466: * 1467: IF( RSVEC ) THEN 1468: IF( APPLV ) THEN 1469: DO 2398 p = 1, N 1470: CALL DSCAL( MVL, WORK( p ), V( 1, p ), 1 ) 1471: 2398 CONTINUE 1472: ELSE 1473: DO 2399 p = 1, N 1474: TEMP1 = ONE / DNRM2( MVL, V( 1, p ), 1 ) 1475: CALL DSCAL( MVL, TEMP1, V( 1, p ), 1 ) 1476: 2399 CONTINUE 1477: END IF 1478: END IF 1479: * 1480: * Undo scaling, if necessary (and possible). 1481: IF( ( ( SKL.GT.ONE ) .AND. ( SVA( 1 ).LT.( BIG / 1482: + SKL) ) ) .OR. ( ( SKL.LT.ONE ) .AND. ( SVA( N2 ).GT. 1483: + ( SFMIN / SKL) ) ) ) THEN 1484: DO 2400 p = 1, N 1485: SVA( p ) = SKL*SVA( p ) 1486: 2400 CONTINUE 1487: SKL= ONE 1488: END IF 1489: * 1490: WORK( 1 ) = SKL 1491: * The singular values of A are SKL*SVA(1:N). If SKL.NE.ONE 1492: * then some of the singular values may overflow or underflow and 1493: * the spectrum is given in this factored representation. 1494: * 1495: WORK( 2 ) = DBLE( N4 ) 1496: * N4 is the number of computed nonzero singular values of A. 1497: * 1498: WORK( 3 ) = DBLE( N2 ) 1499: * N2 is the number of singular values of A greater than SFMIN. 1500: * If N2<N, SVA(N2:N) contains ZEROS and/or denormalized numbers 1501: * that may carry some information. 1502: * 1503: WORK( 4 ) = DBLE( i ) 1504: * i is the index of the last sweep before declaring convergence. 1505: * 1506: WORK( 5 ) = MXAAPQ 1507: * MXAAPQ is the largest absolute value of scaled pivots in the 1508: * last sweep 1509: * 1510: WORK( 6 ) = MXSINJ 1511: * MXSINJ is the largest absolute value of the sines of Jacobi angles 1512: * in the last sweep 1513: * 1514: RETURN 1515: * .. 1516: * .. END OF DGESVJ 1517: * .. 1518: END