1: SUBROUTINE DGESVJ( JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V,
2: + LDV, WORK, LWORK, INFO )
3: *
4: * -- LAPACK routine (version 3.2.2) --
5: *
6: * -- Contributed by Zlatko Drmac of the University of Zagreb and --
7: * -- Kresimir Veselic of the Fernuniversitaet Hagen --
8: * -- June 2010 --
9: *
10: * -- LAPACK is a software package provided by Univ. of Tennessee, --
11: * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
12: *
13: * This routine is also part of SIGMA (version 1.23, October 23. 2008.)
14: * SIGMA is a library of algorithms for highly accurate algorithms for
15: * computation of SVD, PSVD, QSVD, (H,K)-SVD, and for solution of the
16: * eigenvalue problems Hx = lambda M x, H M x = lambda x with H, M > 0.
17: *
18: IMPLICIT NONE
19: * ..
20: * .. Scalar Arguments ..
21: INTEGER INFO, LDA, LDV, LWORK, M, MV, N
22: CHARACTER*1 JOBA, JOBU, JOBV
23: * ..
24: * .. Array Arguments ..
25: DOUBLE PRECISION A( LDA, * ), SVA( N ), V( LDV, * ),
26: + WORK( LWORK )
27: * ..
28: *
29: * Purpose
30: * =======
31: *
32: * DGESVJ computes the singular value decomposition (SVD) of a real
33: * M-by-N matrix A, where M >= N. The SVD of A is written as
34: * [++] [xx] [x0] [xx]
35: * A = U * SIGMA * V^t, [++] = [xx] * [ox] * [xx]
36: * [++] [xx]
37: * where SIGMA is an N-by-N diagonal matrix, U is an M-by-N orthonormal
38: * matrix, and V is an N-by-N orthogonal matrix. The diagonal elements
39: * of SIGMA are the singular values of A. The columns of U and V are the
40: * left and the right singular vectors of A, respectively.
41: *
42: * Further Details
43: * ~~~~~~~~~~~~~~~
44: * The orthogonal N-by-N matrix V is obtained as a product of Jacobi plane
45: * rotations. The rotations are implemented as fast scaled rotations of
46: * Anda and Park [1]. In the case of underflow of the Jacobi angle, a
47: * modified Jacobi transformation of Drmac [4] is used. Pivot strategy uses
48: * column interchanges of de Rijk [2]. The relative accuracy of the computed
49: * singular values and the accuracy of the computed singular vectors (in
50: * angle metric) is as guaranteed by the theory of Demmel and Veselic [3].
51: * The condition number that determines the accuracy in the full rank case
52: * is essentially min_{D=diag} kappa(A*D), where kappa(.) is the
53: * spectral condition number. The best performance of this Jacobi SVD
54: * procedure is achieved if used in an accelerated version of Drmac and
55: * Veselic [5,6], and it is the kernel routine in the SIGMA library [7].
56: * Some tunning parameters (marked with [TP]) are available for the
57: * implementer.
58: * The computational range for the nonzero singular values is the machine
59: * number interval ( UNDERFLOW , OVERFLOW ). In extreme cases, even
60: * denormalized singular values can be computed with the corresponding
61: * gradual loss of accurate digits.
62: *
63: * Contributors
64: * ~~~~~~~~~~~~
65: * Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
66: *
67: * References
68: * ~~~~~~~~~~
69: * [1] A. A. Anda and H. Park: Fast plane rotations with dynamic scaling.
70: * SIAM J. matrix Anal. Appl., Vol. 15 (1994), pp. 162-174.
71: * [2] P. P. M. De Rijk: A one-sided Jacobi algorithm for computing the
72: * singular value decomposition on a vector computer.
73: * SIAM J. Sci. Stat. Comp., Vol. 10 (1998), pp. 359-371.
74: * [3] J. Demmel and K. Veselic: Jacobi method is more accurate than QR.
75: * [4] Z. Drmac: Implementation of Jacobi rotations for accurate singular
76: * value computation in floating point arithmetic.
77: * SIAM J. Sci. Comp., Vol. 18 (1997), pp. 1200-1222.
78: * [5] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
79: * SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
80: * LAPACK Working note 169.
81: * [6] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
82: * SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
83: * LAPACK Working note 170.
84: * [7] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
85: * QSVD, (H,K)-SVD computations.
86: * Department of Mathematics, University of Zagreb, 2008.
87: *
88: * Bugs, Examples and Comments
89: * ~~~~~~~~~~~~~~~~~~~~~~~~~~~
90: * Please report all bugs and send interesting test examples and comments to
91: * drmac@math.hr. Thank you.
92: *
93: * Arguments
94: * =========
95: *
96: * JOBA (input) CHARACTER* 1
97: * Specifies the structure of A.
98: * = 'L': The input matrix A is lower triangular;
99: * = 'U': The input matrix A is upper triangular;
100: * = 'G': The input matrix A is general M-by-N matrix, M >= N.
101: *
102: * JOBU (input) CHARACTER*1
103: * Specifies whether to compute the left singular vectors
104: * (columns of U):
105: * = 'U': The left singular vectors corresponding to the nonzero
106: * singular values are computed and returned in the leading
107: * columns of A. See more details in the description of A.
108: * The default numerical orthogonality threshold is set to
109: * approximately TOL=CTOL*EPS, CTOL=DSQRT(M), EPS=DLAMCH('E').
110: * = 'C': Analogous to JOBU='U', except that user can control the
111: * level of numerical orthogonality of the computed left
112: * singular vectors. TOL can be set to TOL = CTOL*EPS, where
113: * CTOL is given on input in the array WORK.
114: * No CTOL smaller than ONE is allowed. CTOL greater
115: * than 1 / EPS is meaningless. The option 'C'
116: * can be used if M*EPS is satisfactory orthogonality
117: * of the computed left singular vectors, so CTOL=M could
118: * save few sweeps of Jacobi rotations.
119: * See the descriptions of A and WORK(1).
120: * = 'N': The matrix U is not computed. However, see the
121: * description of A.
122: *
123: * JOBV (input) CHARACTER*1
124: * Specifies whether to compute the right singular vectors, that
125: * is, the matrix V:
126: * = 'V' : the matrix V is computed and returned in the array V
127: * = 'A' : the Jacobi rotations are applied to the MV-by-N
128: * array V. In other words, the right singular vector
129: * matrix V is not computed explicitly, instead it is
130: * applied to an MV-by-N matrix initially stored in the
131: * first MV rows of V.
132: * = 'N' : the matrix V is not computed and the array V is not
133: * referenced
134: *
135: * M (input) INTEGER
136: * The number of rows of the input matrix A. M >= 0.
137: *
138: * N (input) INTEGER
139: * The number of columns of the input matrix A.
140: * M >= N >= 0.
141: *
142: * A (input/output) DOUBLE PRECISION array, dimension (LDA,N)
143: * On entry, the M-by-N matrix A.
144: * On exit :
145: * If JOBU .EQ. 'U' .OR. JOBU .EQ. 'C' :
146: * If INFO .EQ. 0 :
147: * RANKA orthonormal columns of U are returned in the
148: * leading RANKA columns of the array A. Here RANKA <= N
149: * is the number of computed singular values of A that are
150: * above the underflow threshold DLAMCH('S'). The singular
151: * vectors corresponding to underflowed or zero singular
152: * values are not computed. The value of RANKA is returned
153: * in the array WORK as RANKA=NINT(WORK(2)). Also see the
154: * descriptions of SVA and WORK. The computed columns of U
155: * are mutually numerically orthogonal up to approximately
156: * TOL=DSQRT(M)*EPS (default); or TOL=CTOL*EPS (JOBU.EQ.'C'),
157: * see the description of JOBU.
158: * If INFO .GT. 0 :
159: * the procedure DGESVJ did not converge in the given number
160: * of iterations (sweeps). In that case, the computed
161: * columns of U may not be orthogonal up to TOL. The output
162: * U (stored in A), SIGMA (given by the computed singular
163: * values in SVA(1:N)) and V is still a decomposition of the
164: * input matrix A in the sense that the residual
165: * ||A-SCALE*U*SIGMA*V^T||_2 / ||A||_2 is small.
166: *
167: * If JOBU .EQ. 'N' :
168: * If INFO .EQ. 0 :
169: * Note that the left singular vectors are 'for free' in the
170: * one-sided Jacobi SVD algorithm. However, if only the
171: * singular values are needed, the level of numerical
172: * orthogonality of U is not an issue and iterations are
173: * stopped when the columns of the iterated matrix are
174: * numerically orthogonal up to approximately M*EPS. Thus,
175: * on exit, A contains the columns of U scaled with the
176: * corresponding singular values.
177: * If INFO .GT. 0 :
178: * the procedure DGESVJ did not converge in the given number
179: * of iterations (sweeps).
180: *
181: * LDA (input) INTEGER
182: * The leading dimension of the array A. LDA >= max(1,M).
183: *
184: * SVA (workspace/output) DOUBLE PRECISION array, dimension (N)
185: * On exit :
186: * If INFO .EQ. 0 :
187: * depending on the value SCALE = WORK(1), we have:
188: * If SCALE .EQ. ONE :
189: * SVA(1:N) contains the computed singular values of A.
190: * During the computation SVA contains the Euclidean column
191: * norms of the iterated matrices in the array A.
192: * If SCALE .NE. ONE :
193: * The singular values of A are SCALE*SVA(1:N), and this
194: * factored representation is due to the fact that some of the
195: * singular values of A might underflow or overflow.
196: * If INFO .GT. 0 :
197: * the procedure DGESVJ did not converge in the given number of
198: * iterations (sweeps) and SCALE*SVA(1:N) may not be accurate.
199: *
200: * MV (input) INTEGER
201: * If JOBV .EQ. 'A', then the product of Jacobi rotations in DGESVJ
202: * is applied to the first MV rows of V. See the description of JOBV.
203: *
204: * V (input/output) DOUBLE PRECISION array, dimension (LDV,N)
205: * If JOBV = 'V', then V contains on exit the N-by-N matrix of
206: * the right singular vectors;
207: * If JOBV = 'A', then V contains the product of the computed right
208: * singular vector matrix and the initial matrix in
209: * the array V.
210: * If JOBV = 'N', then V is not referenced.
211: *
212: * LDV (input) INTEGER
213: * The leading dimension of the array V, LDV .GE. 1.
214: * If JOBV .EQ. 'V', then LDV .GE. max(1,N).
215: * If JOBV .EQ. 'A', then LDV .GE. max(1,MV) .
216: *
217: * WORK (input/workspace/output) DOUBLE PRECISION array, dimension max(4,M+N).
218: * On entry :
219: * If JOBU .EQ. 'C' :
220: * WORK(1) = CTOL, where CTOL defines the threshold for convergence.
221: * The process stops if all columns of A are mutually
222: * orthogonal up to CTOL*EPS, EPS=DLAMCH('E').
223: * It is required that CTOL >= ONE, i.e. it is not
224: * allowed to force the routine to obtain orthogonality
225: * below EPSILON.
226: * On exit :
227: * WORK(1) = SCALE is the scaling factor such that SCALE*SVA(1:N)
228: * are the computed singular values of A.
229: * (See description of SVA().)
230: * WORK(2) = NINT(WORK(2)) is the number of the computed nonzero
231: * singular values.
232: * WORK(3) = NINT(WORK(3)) is the number of the computed singular
233: * values that are larger than the underflow threshold.
234: * WORK(4) = NINT(WORK(4)) is the number of sweeps of Jacobi
235: * rotations needed for numerical convergence.
236: * WORK(5) = max_{i.NE.j} |COS(A(:,i),A(:,j))| in the last sweep.
237: * This is useful information in cases when DGESVJ did
238: * not converge, as it can be used to estimate whether
239: * the output is stil useful and for post festum analysis.
240: * WORK(6) = the largest absolute value over all sines of the
241: * Jacobi rotation angles in the last sweep. It can be
242: * useful for a post festum analysis.
243: *
244: * LWORK (input) INTEGER
245: * length of WORK, WORK >= MAX(6,M+N)
246: *
247: * INFO (output) INTEGER
248: * = 0 : successful exit.
249: * < 0 : if INFO = -i, then the i-th argument had an illegal value
250: * > 0 : DGESVJ did not converge in the maximal allowed number (30)
251: * of sweeps. The output may still be useful. See the
252: * description of WORK.
253: *
254: * =====================================================================
255: *
256: * .. Local Parameters ..
257: DOUBLE PRECISION ZERO, HALF, ONE, TWO
258: PARAMETER ( ZERO = 0.0D0, HALF = 0.5D0, ONE = 1.0D0,
259: + TWO = 2.0D0 )
260: INTEGER NSWEEP
261: PARAMETER ( NSWEEP = 30 )
262: * ..
263: * .. Local Scalars ..
264: DOUBLE PRECISION AAPP, AAPP0, AAPQ, AAQQ, APOAQ, AQOAP, BIG,
265: + BIGTHETA, CS, CTOL, EPSILON, LARGE, MXAAPQ,
266: + MXSINJ, ROOTBIG, ROOTEPS, ROOTSFMIN, ROOTTOL,
267: + SCALE, SFMIN, SMALL, SN, T, TEMP1, THETA,
268: + THSIGN, TOL
269: INTEGER BLSKIP, EMPTSW, i, ibr, IERR, igl, IJBLSK, ir1,
270: + ISWROT, jbc, jgl, KBL, LKAHEAD, MVL, N2, N34,
271: + N4, NBL, NOTROT, p, PSKIPPED, q, ROWSKIP,
272: + SWBAND
273: LOGICAL APPLV, GOSCALE, LOWER, LSVEC, NOSCALE, ROTOK,
274: + RSVEC, UCTOL, UPPER
275: * ..
276: * .. Local Arrays ..
277: DOUBLE PRECISION FASTR( 5 )
278: * ..
279: * .. Intrinsic Functions ..
280: INTRINSIC DABS, DMAX1, DMIN1, DBLE, MIN0, DSIGN, DSQRT
281: * ..
282: * .. External Functions ..
283: * ..
284: * from BLAS
285: DOUBLE PRECISION DDOT, DNRM2
286: EXTERNAL DDOT, DNRM2
287: INTEGER IDAMAX
288: EXTERNAL IDAMAX
289: * from LAPACK
290: DOUBLE PRECISION DLAMCH
291: EXTERNAL DLAMCH
292: LOGICAL LSAME
293: EXTERNAL LSAME
294: * ..
295: * .. External Subroutines ..
296: * ..
297: * from BLAS
298: EXTERNAL DAXPY, DCOPY, DROTM, DSCAL, DSWAP
299: * from LAPACK
300: EXTERNAL DLASCL, DLASET, DLASSQ, XERBLA
301: *
302: EXTERNAL DGSVJ0, DGSVJ1
303: * ..
304: * .. Executable Statements ..
305: *
306: * Test the input arguments
307: *
308: LSVEC = LSAME( JOBU, 'U' )
309: UCTOL = LSAME( JOBU, 'C' )
310: RSVEC = LSAME( JOBV, 'V' )
311: APPLV = LSAME( JOBV, 'A' )
312: UPPER = LSAME( JOBA, 'U' )
313: LOWER = LSAME( JOBA, 'L' )
314: *
315: IF( .NOT.( UPPER .OR. LOWER .OR. LSAME( JOBA, 'G' ) ) ) THEN
316: INFO = -1
317: ELSE IF( .NOT.( LSVEC .OR. UCTOL .OR. LSAME( JOBU, 'N' ) ) ) THEN
318: INFO = -2
319: ELSE IF( .NOT.( RSVEC .OR. APPLV .OR. LSAME( JOBV, 'N' ) ) ) THEN
320: INFO = -3
321: ELSE IF( M.LT.0 ) THEN
322: INFO = -4
323: ELSE IF( ( N.LT.0 ) .OR. ( N.GT.M ) ) THEN
324: INFO = -5
325: ELSE IF( LDA.LT.M ) THEN
326: INFO = -7
327: ELSE IF( MV.LT.0 ) THEN
328: INFO = -9
329: ELSE IF( ( RSVEC .AND. ( LDV.LT.N ) ) .OR.
330: + ( APPLV .AND. ( LDV.LT.MV ) ) ) THEN
331: INFO = -11
332: ELSE IF( UCTOL .AND. ( WORK( 1 ).LE.ONE ) ) THEN
333: INFO = -12
334: ELSE IF( LWORK.LT.MAX0( M+N, 6 ) ) THEN
335: INFO = -13
336: ELSE
337: INFO = 0
338: END IF
339: *
340: * #:(
341: IF( INFO.NE.0 ) THEN
342: CALL XERBLA( 'DGESVJ', -INFO )
343: RETURN
344: END IF
345: *
346: * #:) Quick return for void matrix
347: *
348: IF( ( M.EQ.0 ) .OR. ( N.EQ.0 ) )RETURN
349: *
350: * Set numerical parameters
351: * The stopping criterion for Jacobi rotations is
352: *
353: * max_{i<>j}|A(:,i)^T * A(:,j)|/(||A(:,i)||*||A(:,j)||) < CTOL*EPS
354: *
355: * where EPS is the round-off and CTOL is defined as follows:
356: *
357: IF( UCTOL ) THEN
358: * ... user controlled
359: CTOL = WORK( 1 )
360: ELSE
361: * ... default
362: IF( LSVEC .OR. RSVEC .OR. APPLV ) THEN
363: CTOL = DSQRT( DBLE( M ) )
364: ELSE
365: CTOL = DBLE( M )
366: END IF
367: END IF
368: * ... and the machine dependent parameters are
369: *[!] (Make sure that DLAMCH() works properly on the target machine.)
370: *
371: EPSILON = DLAMCH( 'Epsilon' )
372: ROOTEPS = DSQRT( EPSILON )
373: SFMIN = DLAMCH( 'SafeMinimum' )
374: ROOTSFMIN = DSQRT( SFMIN )
375: SMALL = SFMIN / EPSILON
376: BIG = DLAMCH( 'Overflow' )
377: * BIG = ONE / SFMIN
378: ROOTBIG = ONE / ROOTSFMIN
379: LARGE = BIG / DSQRT( DBLE( M*N ) )
380: BIGTHETA = ONE / ROOTEPS
381: *
382: TOL = CTOL*EPSILON
383: ROOTTOL = DSQRT( TOL )
384: *
385: IF( DBLE( M )*EPSILON.GE.ONE ) THEN
386: INFO = -5
387: CALL XERBLA( 'DGESVJ', -INFO )
388: RETURN
389: END IF
390: *
391: * Initialize the right singular vector matrix.
392: *
393: IF( RSVEC ) THEN
394: MVL = N
395: CALL DLASET( 'A', MVL, N, ZERO, ONE, V, LDV )
396: ELSE IF( APPLV ) THEN
397: MVL = MV
398: END IF
399: RSVEC = RSVEC .OR. APPLV
400: *
401: * Initialize SVA( 1:N ) = ( ||A e_i||_2, i = 1:N )
402: *(!) If necessary, scale A to protect the largest singular value
403: * from overflow. It is possible that saving the largest singular
404: * value destroys the information about the small ones.
405: * This initial scaling is almost minimal in the sense that the
406: * goal is to make sure that no column norm overflows, and that
407: * DSQRT(N)*max_i SVA(i) does not overflow. If INFinite entries
408: * in A are detected, the procedure returns with INFO=-6.
409: *
410: SCALE = ONE / DSQRT( DBLE( M )*DBLE( N ) )
411: NOSCALE = .TRUE.
412: GOSCALE = .TRUE.
413: *
414: IF( LOWER ) THEN
415: * the input matrix is M-by-N lower triangular (trapezoidal)
416: DO 1874 p = 1, N
417: AAPP = ZERO
418: AAQQ = ZERO
419: CALL DLASSQ( M-p+1, A( p, p ), 1, AAPP, AAQQ )
420: IF( AAPP.GT.BIG ) THEN
421: INFO = -6
422: CALL XERBLA( 'DGESVJ', -INFO )
423: RETURN
424: END IF
425: AAQQ = DSQRT( AAQQ )
426: IF( ( AAPP.LT.( BIG / AAQQ ) ) .AND. NOSCALE ) THEN
427: SVA( p ) = AAPP*AAQQ
428: ELSE
429: NOSCALE = .FALSE.
430: SVA( p ) = AAPP*( AAQQ*SCALE )
431: IF( GOSCALE ) THEN
432: GOSCALE = .FALSE.
433: DO 1873 q = 1, p - 1
434: SVA( q ) = SVA( q )*SCALE
435: 1873 CONTINUE
436: END IF
437: END IF
438: 1874 CONTINUE
439: ELSE IF( UPPER ) THEN
440: * the input matrix is M-by-N upper triangular (trapezoidal)
441: DO 2874 p = 1, N
442: AAPP = ZERO
443: AAQQ = ZERO
444: CALL DLASSQ( p, A( 1, p ), 1, AAPP, AAQQ )
445: IF( AAPP.GT.BIG ) THEN
446: INFO = -6
447: CALL XERBLA( 'DGESVJ', -INFO )
448: RETURN
449: END IF
450: AAQQ = DSQRT( AAQQ )
451: IF( ( AAPP.LT.( BIG / AAQQ ) ) .AND. NOSCALE ) THEN
452: SVA( p ) = AAPP*AAQQ
453: ELSE
454: NOSCALE = .FALSE.
455: SVA( p ) = AAPP*( AAQQ*SCALE )
456: IF( GOSCALE ) THEN
457: GOSCALE = .FALSE.
458: DO 2873 q = 1, p - 1
459: SVA( q ) = SVA( q )*SCALE
460: 2873 CONTINUE
461: END IF
462: END IF
463: 2874 CONTINUE
464: ELSE
465: * the input matrix is M-by-N general dense
466: DO 3874 p = 1, N
467: AAPP = ZERO
468: AAQQ = ZERO
469: CALL DLASSQ( M, A( 1, p ), 1, AAPP, AAQQ )
470: IF( AAPP.GT.BIG ) THEN
471: INFO = -6
472: CALL XERBLA( 'DGESVJ', -INFO )
473: RETURN
474: END IF
475: AAQQ = DSQRT( AAQQ )
476: IF( ( AAPP.LT.( BIG / AAQQ ) ) .AND. NOSCALE ) THEN
477: SVA( p ) = AAPP*AAQQ
478: ELSE
479: NOSCALE = .FALSE.
480: SVA( p ) = AAPP*( AAQQ*SCALE )
481: IF( GOSCALE ) THEN
482: GOSCALE = .FALSE.
483: DO 3873 q = 1, p - 1
484: SVA( q ) = SVA( q )*SCALE
485: 3873 CONTINUE
486: END IF
487: END IF
488: 3874 CONTINUE
489: END IF
490: *
491: IF( NOSCALE )SCALE = ONE
492: *
493: * Move the smaller part of the spectrum from the underflow threshold
494: *(!) Start by determining the position of the nonzero entries of the
495: * array SVA() relative to ( SFMIN, BIG ).
496: *
497: AAPP = ZERO
498: AAQQ = BIG
499: DO 4781 p = 1, N
500: IF( SVA( p ).NE.ZERO )AAQQ = DMIN1( AAQQ, SVA( p ) )
501: AAPP = DMAX1( AAPP, SVA( p ) )
502: 4781 CONTINUE
503: *
504: * #:) Quick return for zero matrix
505: *
506: IF( AAPP.EQ.ZERO ) THEN
507: IF( LSVEC )CALL DLASET( 'G', M, N, ZERO, ONE, A, LDA )
508: WORK( 1 ) = ONE
509: WORK( 2 ) = ZERO
510: WORK( 3 ) = ZERO
511: WORK( 4 ) = ZERO
512: WORK( 5 ) = ZERO
513: WORK( 6 ) = ZERO
514: RETURN
515: END IF
516: *
517: * #:) Quick return for one-column matrix
518: *
519: IF( N.EQ.1 ) THEN
520: IF( LSVEC )CALL DLASCL( 'G', 0, 0, SVA( 1 ), SCALE, M, 1,
521: + A( 1, 1 ), LDA, IERR )
522: WORK( 1 ) = ONE / SCALE
523: IF( SVA( 1 ).GE.SFMIN ) THEN
524: WORK( 2 ) = ONE
525: ELSE
526: WORK( 2 ) = ZERO
527: END IF
528: WORK( 3 ) = ZERO
529: WORK( 4 ) = ZERO
530: WORK( 5 ) = ZERO
531: WORK( 6 ) = ZERO
532: RETURN
533: END IF
534: *
535: * Protect small singular values from underflow, and try to
536: * avoid underflows/overflows in computing Jacobi rotations.
537: *
538: SN = DSQRT( SFMIN / EPSILON )
539: TEMP1 = DSQRT( BIG / DBLE( N ) )
540: IF( ( AAPP.LE.SN ) .OR. ( AAQQ.GE.TEMP1 ) .OR.
541: + ( ( SN.LE.AAQQ ) .AND. ( AAPP.LE.TEMP1 ) ) ) THEN
542: TEMP1 = DMIN1( BIG, TEMP1 / AAPP )
543: * AAQQ = AAQQ*TEMP1
544: * AAPP = AAPP*TEMP1
545: ELSE IF( ( AAQQ.LE.SN ) .AND. ( AAPP.LE.TEMP1 ) ) THEN
546: TEMP1 = DMIN1( SN / AAQQ, BIG / ( AAPP*DSQRT( DBLE( N ) ) ) )
547: * AAQQ = AAQQ*TEMP1
548: * AAPP = AAPP*TEMP1
549: ELSE IF( ( AAQQ.GE.SN ) .AND. ( AAPP.GE.TEMP1 ) ) THEN
550: TEMP1 = DMAX1( SN / AAQQ, TEMP1 / AAPP )
551: * AAQQ = AAQQ*TEMP1
552: * AAPP = AAPP*TEMP1
553: ELSE IF( ( AAQQ.LE.SN ) .AND. ( AAPP.GE.TEMP1 ) ) THEN
554: TEMP1 = DMIN1( SN / AAQQ, BIG / ( DSQRT( DBLE( N ) )*AAPP ) )
555: * AAQQ = AAQQ*TEMP1
556: * AAPP = AAPP*TEMP1
557: ELSE
558: TEMP1 = ONE
559: END IF
560: *
561: * Scale, if necessary
562: *
563: IF( TEMP1.NE.ONE ) THEN
564: CALL DLASCL( 'G', 0, 0, ONE, TEMP1, N, 1, SVA, N, IERR )
565: END IF
566: SCALE = TEMP1*SCALE
567: IF( SCALE.NE.ONE ) THEN
568: CALL DLASCL( JOBA, 0, 0, ONE, SCALE, M, N, A, LDA, IERR )
569: SCALE = ONE / SCALE
570: END IF
571: *
572: * Row-cyclic Jacobi SVD algorithm with column pivoting
573: *
574: EMPTSW = ( N*( N-1 ) ) / 2
575: NOTROT = 0
576: FASTR( 1 ) = ZERO
577: *
578: * A is represented in factored form A = A * diag(WORK), where diag(WORK)
579: * is initialized to identity. WORK is updated during fast scaled
580: * rotations.
581: *
582: DO 1868 q = 1, N
583: WORK( q ) = ONE
584: 1868 CONTINUE
585: *
586: *
587: SWBAND = 3
588: *[TP] SWBAND is a tuning parameter [TP]. It is meaningful and effective
589: * if DGESVJ is used as a computational routine in the preconditioned
590: * Jacobi SVD algorithm DGESVJ. For sweeps i=1:SWBAND the procedure
591: * works on pivots inside a band-like region around the diagonal.
592: * The boundaries are determined dynamically, based on the number of
593: * pivots above a threshold.
594: *
595: KBL = MIN0( 8, N )
596: *[TP] KBL is a tuning parameter that defines the tile size in the
597: * tiling of the p-q loops of pivot pairs. In general, an optimal
598: * value of KBL depends on the matrix dimensions and on the
599: * parameters of the computer's memory.
600: *
601: NBL = N / KBL
602: IF( ( NBL*KBL ).NE.N )NBL = NBL + 1
603: *
604: BLSKIP = KBL**2
605: *[TP] BLKSKIP is a tuning parameter that depends on SWBAND and KBL.
606: *
607: ROWSKIP = MIN0( 5, KBL )
608: *[TP] ROWSKIP is a tuning parameter.
609: *
610: LKAHEAD = 1
611: *[TP] LKAHEAD is a tuning parameter.
612: *
613: * Quasi block transformations, using the lower (upper) triangular
614: * structure of the input matrix. The quasi-block-cycling usually
615: * invokes cubic convergence. Big part of this cycle is done inside
616: * canonical subspaces of dimensions less than M.
617: *
618: IF( ( LOWER .OR. UPPER ) .AND. ( N.GT.MAX0( 64, 4*KBL ) ) ) THEN
619: *[TP] The number of partition levels and the actual partition are
620: * tuning parameters.
621: N4 = N / 4
622: N2 = N / 2
623: N34 = 3*N4
624: IF( APPLV ) THEN
625: q = 0
626: ELSE
627: q = 1
628: END IF
629: *
630: IF( LOWER ) THEN
631: *
632: * This works very well on lower triangular matrices, in particular
633: * in the framework of the preconditioned Jacobi SVD (xGEJSV).
634: * The idea is simple:
635: * [+ 0 0 0] Note that Jacobi transformations of [0 0]
636: * [+ + 0 0] [0 0]
637: * [+ + x 0] actually work on [x 0] [x 0]
638: * [+ + x x] [x x]. [x x]
639: *
640: CALL DGSVJ0( JOBV, M-N34, N-N34, A( N34+1, N34+1 ), LDA,
641: + WORK( N34+1 ), SVA( N34+1 ), MVL,
642: + V( N34*q+1, N34+1 ), LDV, EPSILON, SFMIN, TOL,
643: + 2, WORK( N+1 ), LWORK-N, IERR )
644: *
645: CALL DGSVJ0( JOBV, M-N2, N34-N2, A( N2+1, N2+1 ), LDA,
646: + WORK( N2+1 ), SVA( N2+1 ), MVL,
647: + V( N2*q+1, N2+1 ), LDV, EPSILON, SFMIN, TOL, 2,
648: + WORK( N+1 ), LWORK-N, IERR )
649: *
650: CALL DGSVJ1( JOBV, M-N2, N-N2, N4, A( N2+1, N2+1 ), LDA,
651: + WORK( N2+1 ), SVA( N2+1 ), MVL,
652: + V( N2*q+1, N2+1 ), LDV, EPSILON, SFMIN, TOL, 1,
653: + WORK( N+1 ), LWORK-N, IERR )
654: *
655: CALL DGSVJ0( JOBV, M-N4, N2-N4, A( N4+1, N4+1 ), LDA,
656: + WORK( N4+1 ), SVA( N4+1 ), MVL,
657: + V( N4*q+1, N4+1 ), LDV, EPSILON, SFMIN, TOL, 1,
658: + WORK( N+1 ), LWORK-N, IERR )
659: *
660: CALL DGSVJ0( JOBV, M, N4, A, LDA, WORK, SVA, MVL, V, LDV,
661: + EPSILON, SFMIN, TOL, 1, WORK( N+1 ), LWORK-N,
662: + IERR )
663: *
664: CALL DGSVJ1( JOBV, M, N2, N4, A, LDA, WORK, SVA, MVL, V,
665: + LDV, EPSILON, SFMIN, TOL, 1, WORK( N+1 ),
666: + LWORK-N, IERR )
667: *
668: *
669: ELSE IF( UPPER ) THEN
670: *
671: *
672: CALL DGSVJ0( JOBV, N4, N4, A, LDA, WORK, SVA, MVL, V, LDV,
673: + EPSILON, SFMIN, TOL, 2, WORK( N+1 ), LWORK-N,
674: + IERR )
675: *
676: CALL DGSVJ0( JOBV, N2, N4, A( 1, N4+1 ), LDA, WORK( N4+1 ),
677: + SVA( N4+1 ), MVL, V( N4*q+1, N4+1 ), LDV,
678: + EPSILON, SFMIN, TOL, 1, WORK( N+1 ), LWORK-N,
679: + IERR )
680: *
681: CALL DGSVJ1( JOBV, N2, N2, N4, A, LDA, WORK, SVA, MVL, V,
682: + LDV, EPSILON, SFMIN, TOL, 1, WORK( N+1 ),
683: + LWORK-N, IERR )
684: *
685: CALL DGSVJ0( JOBV, N2+N4, N4, A( 1, N2+1 ), LDA,
686: + WORK( N2+1 ), SVA( N2+1 ), MVL,
687: + V( N2*q+1, N2+1 ), LDV, EPSILON, SFMIN, TOL, 1,
688: + WORK( N+1 ), LWORK-N, IERR )
689:
690: END IF
691: *
692: END IF
693: *
694: * .. Row-cyclic pivot strategy with de Rijk's pivoting ..
695: *
696: DO 1993 i = 1, NSWEEP
697: *
698: * .. go go go ...
699: *
700: MXAAPQ = ZERO
701: MXSINJ = ZERO
702: ISWROT = 0
703: *
704: NOTROT = 0
705: PSKIPPED = 0
706: *
707: * Each sweep is unrolled using KBL-by-KBL tiles over the pivot pairs
708: * 1 <= p < q <= N. This is the first step toward a blocked implementation
709: * of the rotations. New implementation, based on block transformations,
710: * is under development.
711: *
712: DO 2000 ibr = 1, NBL
713: *
714: igl = ( ibr-1 )*KBL + 1
715: *
716: DO 1002 ir1 = 0, MIN0( LKAHEAD, NBL-ibr )
717: *
718: igl = igl + ir1*KBL
719: *
720: DO 2001 p = igl, MIN0( igl+KBL-1, N-1 )
721: *
722: * .. de Rijk's pivoting
723: *
724: q = IDAMAX( N-p+1, SVA( p ), 1 ) + p - 1
725: IF( p.NE.q ) THEN
726: CALL DSWAP( M, A( 1, p ), 1, A( 1, q ), 1 )
727: IF( RSVEC )CALL DSWAP( MVL, V( 1, p ), 1,
728: + V( 1, q ), 1 )
729: TEMP1 = SVA( p )
730: SVA( p ) = SVA( q )
731: SVA( q ) = TEMP1
732: TEMP1 = WORK( p )
733: WORK( p ) = WORK( q )
734: WORK( q ) = TEMP1
735: END IF
736: *
737: IF( ir1.EQ.0 ) THEN
738: *
739: * Column norms are periodically updated by explicit
740: * norm computation.
741: * Caveat:
742: * Unfortunately, some BLAS implementations compute DNRM2(M,A(1,p),1)
743: * as DSQRT(DDOT(M,A(1,p),1,A(1,p),1)), which may cause the result to
744: * overflow for ||A(:,p)||_2 > DSQRT(overflow_threshold), and to
745: * underflow for ||A(:,p)||_2 < DSQRT(underflow_threshold).
746: * Hence, DNRM2 cannot be trusted, not even in the case when
747: * the true norm is far from the under(over)flow boundaries.
748: * If properly implemented DNRM2 is available, the IF-THEN-ELSE
749: * below should read "AAPP = DNRM2( M, A(1,p), 1 ) * WORK(p)".
750: *
751: IF( ( SVA( p ).LT.ROOTBIG ) .AND.
752: + ( SVA( p ).GT.ROOTSFMIN ) ) THEN
753: SVA( p ) = DNRM2( M, A( 1, p ), 1 )*WORK( p )
754: ELSE
755: TEMP1 = ZERO
756: AAPP = ZERO
757: CALL DLASSQ( M, A( 1, p ), 1, TEMP1, AAPP )
758: SVA( p ) = TEMP1*DSQRT( AAPP )*WORK( p )
759: END IF
760: AAPP = SVA( p )
761: ELSE
762: AAPP = SVA( p )
763: END IF
764: *
765: IF( AAPP.GT.ZERO ) THEN
766: *
767: PSKIPPED = 0
768: *
769: DO 2002 q = p + 1, MIN0( igl+KBL-1, N )
770: *
771: AAQQ = SVA( q )
772: *
773: IF( AAQQ.GT.ZERO ) THEN
774: *
775: AAPP0 = AAPP
776: IF( AAQQ.GE.ONE ) THEN
777: ROTOK = ( SMALL*AAPP ).LE.AAQQ
778: IF( AAPP.LT.( BIG / AAQQ ) ) THEN
779: AAPQ = ( DDOT( M, A( 1, p ), 1, A( 1,
780: + q ), 1 )*WORK( p )*WORK( q ) /
781: + AAQQ ) / AAPP
782: ELSE
783: CALL DCOPY( M, A( 1, p ), 1,
784: + WORK( N+1 ), 1 )
785: CALL DLASCL( 'G', 0, 0, AAPP,
786: + WORK( p ), M, 1,
787: + WORK( N+1 ), LDA, IERR )
788: AAPQ = DDOT( M, WORK( N+1 ), 1,
789: + A( 1, q ), 1 )*WORK( q ) / AAQQ
790: END IF
791: ELSE
792: ROTOK = AAPP.LE.( AAQQ / SMALL )
793: IF( AAPP.GT.( SMALL / AAQQ ) ) THEN
794: AAPQ = ( DDOT( M, A( 1, p ), 1, A( 1,
795: + q ), 1 )*WORK( p )*WORK( q ) /
796: + AAQQ ) / AAPP
797: ELSE
798: CALL DCOPY( M, A( 1, q ), 1,
799: + WORK( N+1 ), 1 )
800: CALL DLASCL( 'G', 0, 0, AAQQ,
801: + WORK( q ), M, 1,
802: + WORK( N+1 ), LDA, IERR )
803: AAPQ = DDOT( M, WORK( N+1 ), 1,
804: + A( 1, p ), 1 )*WORK( p ) / AAPP
805: END IF
806: END IF
807: *
808: MXAAPQ = DMAX1( MXAAPQ, DABS( AAPQ ) )
809: *
810: * TO rotate or NOT to rotate, THAT is the question ...
811: *
812: IF( DABS( AAPQ ).GT.TOL ) THEN
813: *
814: * .. rotate
815: *[RTD] ROTATED = ROTATED + ONE
816: *
817: IF( ir1.EQ.0 ) THEN
818: NOTROT = 0
819: PSKIPPED = 0
820: ISWROT = ISWROT + 1
821: END IF
822: *
823: IF( ROTOK ) THEN
824: *
825: AQOAP = AAQQ / AAPP
826: APOAQ = AAPP / AAQQ
827: THETA = -HALF*DABS( AQOAP-APOAQ ) /
828: + AAPQ
829: *
830: IF( DABS( THETA ).GT.BIGTHETA ) THEN
831: *
832: T = HALF / THETA
833: FASTR( 3 ) = T*WORK( p ) / WORK( q )
834: FASTR( 4 ) = -T*WORK( q ) /
835: + WORK( p )
836: CALL DROTM( M, A( 1, p ), 1,
837: + A( 1, q ), 1, FASTR )
838: IF( RSVEC )CALL DROTM( MVL,
839: + V( 1, p ), 1,
840: + V( 1, q ), 1,
841: + FASTR )
842: SVA( q ) = AAQQ*DSQRT( DMAX1( ZERO,
843: + ONE+T*APOAQ*AAPQ ) )
844: AAPP = AAPP*DSQRT( ONE-T*AQOAP*
845: + AAPQ )
846: MXSINJ = DMAX1( MXSINJ, DABS( T ) )
847: *
848: ELSE
849: *
850: * .. choose correct signum for THETA and rotate
851: *
852: THSIGN = -DSIGN( ONE, AAPQ )
853: T = ONE / ( THETA+THSIGN*
854: + DSQRT( ONE+THETA*THETA ) )
855: CS = DSQRT( ONE / ( ONE+T*T ) )
856: SN = T*CS
857: *
858: MXSINJ = DMAX1( MXSINJ, DABS( SN ) )
859: SVA( q ) = AAQQ*DSQRT( DMAX1( ZERO,
860: + ONE+T*APOAQ*AAPQ ) )
861: AAPP = AAPP*DSQRT( DMAX1( ZERO,
862: + ONE-T*AQOAP*AAPQ ) )
863: *
864: APOAQ = WORK( p ) / WORK( q )
865: AQOAP = WORK( q ) / WORK( p )
866: IF( WORK( p ).GE.ONE ) THEN
867: IF( WORK( q ).GE.ONE ) THEN
868: FASTR( 3 ) = T*APOAQ
869: FASTR( 4 ) = -T*AQOAP
870: WORK( p ) = WORK( p )*CS
871: WORK( q ) = WORK( q )*CS
872: CALL DROTM( M, A( 1, p ), 1,
873: + A( 1, q ), 1,
874: + FASTR )
875: IF( RSVEC )CALL DROTM( MVL,
876: + V( 1, p ), 1, V( 1, q ),
877: + 1, FASTR )
878: ELSE
879: CALL DAXPY( M, -T*AQOAP,
880: + A( 1, q ), 1,
881: + A( 1, p ), 1 )
882: CALL DAXPY( M, CS*SN*APOAQ,
883: + A( 1, p ), 1,
884: + A( 1, q ), 1 )
885: WORK( p ) = WORK( p )*CS
886: WORK( q ) = WORK( q ) / CS
887: IF( RSVEC ) THEN
888: CALL DAXPY( MVL, -T*AQOAP,
889: + V( 1, q ), 1,
890: + V( 1, p ), 1 )
891: CALL DAXPY( MVL,
892: + CS*SN*APOAQ,
893: + V( 1, p ), 1,
894: + V( 1, q ), 1 )
895: END IF
896: END IF
897: ELSE
898: IF( WORK( q ).GE.ONE ) THEN
899: CALL DAXPY( M, T*APOAQ,
900: + A( 1, p ), 1,
901: + A( 1, q ), 1 )
902: CALL DAXPY( M, -CS*SN*AQOAP,
903: + A( 1, q ), 1,
904: + A( 1, p ), 1 )
905: WORK( p ) = WORK( p ) / CS
906: WORK( q ) = WORK( q )*CS
907: IF( RSVEC ) THEN
908: CALL DAXPY( MVL, T*APOAQ,
909: + V( 1, p ), 1,
910: + V( 1, q ), 1 )
911: CALL DAXPY( MVL,
912: + -CS*SN*AQOAP,
913: + V( 1, q ), 1,
914: + V( 1, p ), 1 )
915: END IF
916: ELSE
917: IF( WORK( p ).GE.WORK( q ) )
918: + THEN
919: CALL DAXPY( M, -T*AQOAP,
920: + A( 1, q ), 1,
921: + A( 1, p ), 1 )
922: CALL DAXPY( M, CS*SN*APOAQ,
923: + A( 1, p ), 1,
924: + A( 1, q ), 1 )
925: WORK( p ) = WORK( p )*CS
926: WORK( q ) = WORK( q ) / CS
927: IF( RSVEC ) THEN
928: CALL DAXPY( MVL,
929: + -T*AQOAP,
930: + V( 1, q ), 1,
931: + V( 1, p ), 1 )
932: CALL DAXPY( MVL,
933: + CS*SN*APOAQ,
934: + V( 1, p ), 1,
935: + V( 1, q ), 1 )
936: END IF
937: ELSE
938: CALL DAXPY( M, T*APOAQ,
939: + A( 1, p ), 1,
940: + A( 1, q ), 1 )
941: CALL DAXPY( M,
942: + -CS*SN*AQOAP,
943: + A( 1, q ), 1,
944: + A( 1, p ), 1 )
945: WORK( p ) = WORK( p ) / CS
946: WORK( q ) = WORK( q )*CS
947: IF( RSVEC ) THEN
948: CALL DAXPY( MVL,
949: + T*APOAQ, V( 1, p ),
950: + 1, V( 1, q ), 1 )
951: CALL DAXPY( MVL,
952: + -CS*SN*AQOAP,
953: + V( 1, q ), 1,
954: + V( 1, p ), 1 )
955: END IF
956: END IF
957: END IF
958: END IF
959: END IF
960: *
961: ELSE
962: * .. have to use modified Gram-Schmidt like transformation
963: CALL DCOPY( M, A( 1, p ), 1,
964: + WORK( N+1 ), 1 )
965: CALL DLASCL( 'G', 0, 0, AAPP, ONE, M,
966: + 1, WORK( N+1 ), LDA,
967: + IERR )
968: CALL DLASCL( 'G', 0, 0, AAQQ, ONE, M,
969: + 1, A( 1, q ), LDA, IERR )
970: TEMP1 = -AAPQ*WORK( p ) / WORK( q )
971: CALL DAXPY( M, TEMP1, WORK( N+1 ), 1,
972: + A( 1, q ), 1 )
973: CALL DLASCL( 'G', 0, 0, ONE, AAQQ, M,
974: + 1, A( 1, q ), LDA, IERR )
975: SVA( q ) = AAQQ*DSQRT( DMAX1( ZERO,
976: + ONE-AAPQ*AAPQ ) )
977: MXSINJ = DMAX1( MXSINJ, SFMIN )
978: END IF
979: * END IF ROTOK THEN ... ELSE
980: *
981: * In the case of cancellation in updating SVA(q), SVA(p)
982: * recompute SVA(q), SVA(p).
983: *
984: IF( ( SVA( q ) / AAQQ )**2.LE.ROOTEPS )
985: + THEN
986: IF( ( AAQQ.LT.ROOTBIG ) .AND.
987: + ( AAQQ.GT.ROOTSFMIN ) ) THEN
988: SVA( q ) = DNRM2( M, A( 1, q ), 1 )*
989: + WORK( q )
990: ELSE
991: T = ZERO
992: AAQQ = ZERO
993: CALL DLASSQ( M, A( 1, q ), 1, T,
994: + AAQQ )
995: SVA( q ) = T*DSQRT( AAQQ )*WORK( q )
996: END IF
997: END IF
998: IF( ( AAPP / AAPP0 ).LE.ROOTEPS ) THEN
999: IF( ( AAPP.LT.ROOTBIG ) .AND.
1000: + ( AAPP.GT.ROOTSFMIN ) ) THEN
1001: AAPP = DNRM2( M, A( 1, p ), 1 )*
1002: + WORK( p )
1003: ELSE
1004: T = ZERO
1005: AAPP = ZERO
1006: CALL DLASSQ( M, A( 1, p ), 1, T,
1007: + AAPP )
1008: AAPP = T*DSQRT( AAPP )*WORK( p )
1009: END IF
1010: SVA( p ) = AAPP
1011: END IF
1012: *
1013: ELSE
1014: * A(:,p) and A(:,q) already numerically orthogonal
1015: IF( ir1.EQ.0 )NOTROT = NOTROT + 1
1016: *[RTD] SKIPPED = SKIPPED + 1
1017: PSKIPPED = PSKIPPED + 1
1018: END IF
1019: ELSE
1020: * A(:,q) is zero column
1021: IF( ir1.EQ.0 )NOTROT = NOTROT + 1
1022: PSKIPPED = PSKIPPED + 1
1023: END IF
1024: *
1025: IF( ( i.LE.SWBAND ) .AND.
1026: + ( PSKIPPED.GT.ROWSKIP ) ) THEN
1027: IF( ir1.EQ.0 )AAPP = -AAPP
1028: NOTROT = 0
1029: GO TO 2103
1030: END IF
1031: *
1032: 2002 CONTINUE
1033: * END q-LOOP
1034: *
1035: 2103 CONTINUE
1036: * bailed out of q-loop
1037: *
1038: SVA( p ) = AAPP
1039: *
1040: ELSE
1041: SVA( p ) = AAPP
1042: IF( ( ir1.EQ.0 ) .AND. ( AAPP.EQ.ZERO ) )
1043: + NOTROT = NOTROT + MIN0( igl+KBL-1, N ) - p
1044: END IF
1045: *
1046: 2001 CONTINUE
1047: * end of the p-loop
1048: * end of doing the block ( ibr, ibr )
1049: 1002 CONTINUE
1050: * end of ir1-loop
1051: *
1052: * ... go to the off diagonal blocks
1053: *
1054: igl = ( ibr-1 )*KBL + 1
1055: *
1056: DO 2010 jbc = ibr + 1, NBL
1057: *
1058: jgl = ( jbc-1 )*KBL + 1
1059: *
1060: * doing the block at ( ibr, jbc )
1061: *
1062: IJBLSK = 0
1063: DO 2100 p = igl, MIN0( igl+KBL-1, N )
1064: *
1065: AAPP = SVA( p )
1066: IF( AAPP.GT.ZERO ) THEN
1067: *
1068: PSKIPPED = 0
1069: *
1070: DO 2200 q = jgl, MIN0( jgl+KBL-1, N )
1071: *
1072: AAQQ = SVA( q )
1073: IF( AAQQ.GT.ZERO ) THEN
1074: AAPP0 = AAPP
1075: *
1076: * .. M x 2 Jacobi SVD ..
1077: *
1078: * Safe Gram matrix computation
1079: *
1080: IF( AAQQ.GE.ONE ) THEN
1081: IF( AAPP.GE.AAQQ ) THEN
1082: ROTOK = ( SMALL*AAPP ).LE.AAQQ
1083: ELSE
1084: ROTOK = ( SMALL*AAQQ ).LE.AAPP
1085: END IF
1086: IF( AAPP.LT.( BIG / AAQQ ) ) THEN
1087: AAPQ = ( DDOT( M, A( 1, p ), 1, A( 1,
1088: + q ), 1 )*WORK( p )*WORK( q ) /
1089: + AAQQ ) / AAPP
1090: ELSE
1091: CALL DCOPY( M, A( 1, p ), 1,
1092: + WORK( N+1 ), 1 )
1093: CALL DLASCL( 'G', 0, 0, AAPP,
1094: + WORK( p ), M, 1,
1095: + WORK( N+1 ), LDA, IERR )
1096: AAPQ = DDOT( M, WORK( N+1 ), 1,
1097: + A( 1, q ), 1 )*WORK( q ) / AAQQ
1098: END IF
1099: ELSE
1100: IF( AAPP.GE.AAQQ ) THEN
1101: ROTOK = AAPP.LE.( AAQQ / SMALL )
1102: ELSE
1103: ROTOK = AAQQ.LE.( AAPP / SMALL )
1104: END IF
1105: IF( AAPP.GT.( SMALL / AAQQ ) ) THEN
1106: AAPQ = ( DDOT( M, A( 1, p ), 1, A( 1,
1107: + q ), 1 )*WORK( p )*WORK( q ) /
1108: + AAQQ ) / AAPP
1109: ELSE
1110: CALL DCOPY( M, A( 1, q ), 1,
1111: + WORK( N+1 ), 1 )
1112: CALL DLASCL( 'G', 0, 0, AAQQ,
1113: + WORK( q ), M, 1,
1114: + WORK( N+1 ), LDA, IERR )
1115: AAPQ = DDOT( M, WORK( N+1 ), 1,
1116: + A( 1, p ), 1 )*WORK( p ) / AAPP
1117: END IF
1118: END IF
1119: *
1120: MXAAPQ = DMAX1( MXAAPQ, DABS( AAPQ ) )
1121: *
1122: * TO rotate or NOT to rotate, THAT is the question ...
1123: *
1124: IF( DABS( AAPQ ).GT.TOL ) THEN
1125: NOTROT = 0
1126: *[RTD] ROTATED = ROTATED + 1
1127: PSKIPPED = 0
1128: ISWROT = ISWROT + 1
1129: *
1130: IF( ROTOK ) THEN
1131: *
1132: AQOAP = AAQQ / AAPP
1133: APOAQ = AAPP / AAQQ
1134: THETA = -HALF*DABS( AQOAP-APOAQ ) /
1135: + AAPQ
1136: IF( AAQQ.GT.AAPP0 )THETA = -THETA
1137: *
1138: IF( DABS( THETA ).GT.BIGTHETA ) THEN
1139: T = HALF / THETA
1140: FASTR( 3 ) = T*WORK( p ) / WORK( q )
1141: FASTR( 4 ) = -T*WORK( q ) /
1142: + WORK( p )
1143: CALL DROTM( M, A( 1, p ), 1,
1144: + A( 1, q ), 1, FASTR )
1145: IF( RSVEC )CALL DROTM( MVL,
1146: + V( 1, p ), 1,
1147: + V( 1, q ), 1,
1148: + FASTR )
1149: SVA( q ) = AAQQ*DSQRT( DMAX1( ZERO,
1150: + ONE+T*APOAQ*AAPQ ) )
1151: AAPP = AAPP*DSQRT( DMAX1( ZERO,
1152: + ONE-T*AQOAP*AAPQ ) )
1153: MXSINJ = DMAX1( MXSINJ, DABS( T ) )
1154: ELSE
1155: *
1156: * .. choose correct signum for THETA and rotate
1157: *
1158: THSIGN = -DSIGN( ONE, AAPQ )
1159: IF( AAQQ.GT.AAPP0 )THSIGN = -THSIGN
1160: T = ONE / ( THETA+THSIGN*
1161: + DSQRT( ONE+THETA*THETA ) )
1162: CS = DSQRT( ONE / ( ONE+T*T ) )
1163: SN = T*CS
1164: MXSINJ = DMAX1( MXSINJ, DABS( SN ) )
1165: SVA( q ) = AAQQ*DSQRT( DMAX1( ZERO,
1166: + ONE+T*APOAQ*AAPQ ) )
1167: AAPP = AAPP*DSQRT( ONE-T*AQOAP*
1168: + AAPQ )
1169: *
1170: APOAQ = WORK( p ) / WORK( q )
1171: AQOAP = WORK( q ) / WORK( p )
1172: IF( WORK( p ).GE.ONE ) THEN
1173: *
1174: IF( WORK( q ).GE.ONE ) THEN
1175: FASTR( 3 ) = T*APOAQ
1176: FASTR( 4 ) = -T*AQOAP
1177: WORK( p ) = WORK( p )*CS
1178: WORK( q ) = WORK( q )*CS
1179: CALL DROTM( M, A( 1, p ), 1,
1180: + A( 1, q ), 1,
1181: + FASTR )
1182: IF( RSVEC )CALL DROTM( MVL,
1183: + V( 1, p ), 1, V( 1, q ),
1184: + 1, FASTR )
1185: ELSE
1186: CALL DAXPY( M, -T*AQOAP,
1187: + A( 1, q ), 1,
1188: + A( 1, p ), 1 )
1189: CALL DAXPY( M, CS*SN*APOAQ,
1190: + A( 1, p ), 1,
1191: + A( 1, q ), 1 )
1192: IF( RSVEC ) THEN
1193: CALL DAXPY( MVL, -T*AQOAP,
1194: + V( 1, q ), 1,
1195: + V( 1, p ), 1 )
1196: CALL DAXPY( MVL,
1197: + CS*SN*APOAQ,
1198: + V( 1, p ), 1,
1199: + V( 1, q ), 1 )
1200: END IF
1201: WORK( p ) = WORK( p )*CS
1202: WORK( q ) = WORK( q ) / CS
1203: END IF
1204: ELSE
1205: IF( WORK( q ).GE.ONE ) THEN
1206: CALL DAXPY( M, T*APOAQ,
1207: + A( 1, p ), 1,
1208: + A( 1, q ), 1 )
1209: CALL DAXPY( M, -CS*SN*AQOAP,
1210: + A( 1, q ), 1,
1211: + A( 1, p ), 1 )
1212: IF( RSVEC ) THEN
1213: CALL DAXPY( MVL, T*APOAQ,
1214: + V( 1, p ), 1,
1215: + V( 1, q ), 1 )
1216: CALL DAXPY( MVL,
1217: + -CS*SN*AQOAP,
1218: + V( 1, q ), 1,
1219: + V( 1, p ), 1 )
1220: END IF
1221: WORK( p ) = WORK( p ) / CS
1222: WORK( q ) = WORK( q )*CS
1223: ELSE
1224: IF( WORK( p ).GE.WORK( q ) )
1225: + THEN
1226: CALL DAXPY( M, -T*AQOAP,
1227: + A( 1, q ), 1,
1228: + A( 1, p ), 1 )
1229: CALL DAXPY( M, CS*SN*APOAQ,
1230: + A( 1, p ), 1,
1231: + A( 1, q ), 1 )
1232: WORK( p ) = WORK( p )*CS
1233: WORK( q ) = WORK( q ) / CS
1234: IF( RSVEC ) THEN
1235: CALL DAXPY( MVL,
1236: + -T*AQOAP,
1237: + V( 1, q ), 1,
1238: + V( 1, p ), 1 )
1239: CALL DAXPY( MVL,
1240: + CS*SN*APOAQ,
1241: + V( 1, p ), 1,
1242: + V( 1, q ), 1 )
1243: END IF
1244: ELSE
1245: CALL DAXPY( M, T*APOAQ,
1246: + A( 1, p ), 1,
1247: + A( 1, q ), 1 )
1248: CALL DAXPY( M,
1249: + -CS*SN*AQOAP,
1250: + A( 1, q ), 1,
1251: + A( 1, p ), 1 )
1252: WORK( p ) = WORK( p ) / CS
1253: WORK( q ) = WORK( q )*CS
1254: IF( RSVEC ) THEN
1255: CALL DAXPY( MVL,
1256: + T*APOAQ, V( 1, p ),
1257: + 1, V( 1, q ), 1 )
1258: CALL DAXPY( MVL,
1259: + -CS*SN*AQOAP,
1260: + V( 1, q ), 1,
1261: + V( 1, p ), 1 )
1262: END IF
1263: END IF
1264: END IF
1265: END IF
1266: END IF
1267: *
1268: ELSE
1269: IF( AAPP.GT.AAQQ ) THEN
1270: CALL DCOPY( M, A( 1, p ), 1,
1271: + WORK( N+1 ), 1 )
1272: CALL DLASCL( 'G', 0, 0, AAPP, ONE,
1273: + M, 1, WORK( N+1 ), LDA,
1274: + IERR )
1275: CALL DLASCL( 'G', 0, 0, AAQQ, ONE,
1276: + M, 1, A( 1, q ), LDA,
1277: + IERR )
1278: TEMP1 = -AAPQ*WORK( p ) / WORK( q )
1279: CALL DAXPY( M, TEMP1, WORK( N+1 ),
1280: + 1, A( 1, q ), 1 )
1281: CALL DLASCL( 'G', 0, 0, ONE, AAQQ,
1282: + M, 1, A( 1, q ), LDA,
1283: + IERR )
1284: SVA( q ) = AAQQ*DSQRT( DMAX1( ZERO,
1285: + ONE-AAPQ*AAPQ ) )
1286: MXSINJ = DMAX1( MXSINJ, SFMIN )
1287: ELSE
1288: CALL DCOPY( M, A( 1, q ), 1,
1289: + WORK( N+1 ), 1 )
1290: CALL DLASCL( 'G', 0, 0, AAQQ, ONE,
1291: + M, 1, WORK( N+1 ), LDA,
1292: + IERR )
1293: CALL DLASCL( 'G', 0, 0, AAPP, ONE,
1294: + M, 1, A( 1, p ), LDA,
1295: + IERR )
1296: TEMP1 = -AAPQ*WORK( q ) / WORK( p )
1297: CALL DAXPY( M, TEMP1, WORK( N+1 ),
1298: + 1, A( 1, p ), 1 )
1299: CALL DLASCL( 'G', 0, 0, ONE, AAPP,
1300: + M, 1, A( 1, p ), LDA,
1301: + IERR )
1302: SVA( p ) = AAPP*DSQRT( DMAX1( ZERO,
1303: + ONE-AAPQ*AAPQ ) )
1304: MXSINJ = DMAX1( MXSINJ, SFMIN )
1305: END IF
1306: END IF
1307: * END IF ROTOK THEN ... ELSE
1308: *
1309: * In the case of cancellation in updating SVA(q)
1310: * .. recompute SVA(q)
1311: IF( ( SVA( q ) / AAQQ )**2.LE.ROOTEPS )
1312: + THEN
1313: IF( ( AAQQ.LT.ROOTBIG ) .AND.
1314: + ( AAQQ.GT.ROOTSFMIN ) ) THEN
1315: SVA( q ) = DNRM2( M, A( 1, q ), 1 )*
1316: + WORK( q )
1317: ELSE
1318: T = ZERO
1319: AAQQ = ZERO
1320: CALL DLASSQ( M, A( 1, q ), 1, T,
1321: + AAQQ )
1322: SVA( q ) = T*DSQRT( AAQQ )*WORK( q )
1323: END IF
1324: END IF
1325: IF( ( AAPP / AAPP0 )**2.LE.ROOTEPS ) THEN
1326: IF( ( AAPP.LT.ROOTBIG ) .AND.
1327: + ( AAPP.GT.ROOTSFMIN ) ) THEN
1328: AAPP = DNRM2( M, A( 1, p ), 1 )*
1329: + WORK( p )
1330: ELSE
1331: T = ZERO
1332: AAPP = ZERO
1333: CALL DLASSQ( M, A( 1, p ), 1, T,
1334: + AAPP )
1335: AAPP = T*DSQRT( AAPP )*WORK( p )
1336: END IF
1337: SVA( p ) = AAPP
1338: END IF
1339: * end of OK rotation
1340: ELSE
1341: NOTROT = NOTROT + 1
1342: *[RTD] SKIPPED = SKIPPED + 1
1343: PSKIPPED = PSKIPPED + 1
1344: IJBLSK = IJBLSK + 1
1345: END IF
1346: ELSE
1347: NOTROT = NOTROT + 1
1348: PSKIPPED = PSKIPPED + 1
1349: IJBLSK = IJBLSK + 1
1350: END IF
1351: *
1352: IF( ( i.LE.SWBAND ) .AND. ( IJBLSK.GE.BLSKIP ) )
1353: + THEN
1354: SVA( p ) = AAPP
1355: NOTROT = 0
1356: GO TO 2011
1357: END IF
1358: IF( ( i.LE.SWBAND ) .AND.
1359: + ( PSKIPPED.GT.ROWSKIP ) ) THEN
1360: AAPP = -AAPP
1361: NOTROT = 0
1362: GO TO 2203
1363: END IF
1364: *
1365: 2200 CONTINUE
1366: * end of the q-loop
1367: 2203 CONTINUE
1368: *
1369: SVA( p ) = AAPP
1370: *
1371: ELSE
1372: *
1373: IF( AAPP.EQ.ZERO )NOTROT = NOTROT +
1374: + MIN0( jgl+KBL-1, N ) - jgl + 1
1375: IF( AAPP.LT.ZERO )NOTROT = 0
1376: *
1377: END IF
1378: *
1379: 2100 CONTINUE
1380: * end of the p-loop
1381: 2010 CONTINUE
1382: * end of the jbc-loop
1383: 2011 CONTINUE
1384: *2011 bailed out of the jbc-loop
1385: DO 2012 p = igl, MIN0( igl+KBL-1, N )
1386: SVA( p ) = DABS( SVA( p ) )
1387: 2012 CONTINUE
1388: ***
1389: 2000 CONTINUE
1390: *2000 :: end of the ibr-loop
1391: *
1392: * .. update SVA(N)
1393: IF( ( SVA( N ).LT.ROOTBIG ) .AND. ( SVA( N ).GT.ROOTSFMIN ) )
1394: + THEN
1395: SVA( N ) = DNRM2( M, A( 1, N ), 1 )*WORK( N )
1396: ELSE
1397: T = ZERO
1398: AAPP = ZERO
1399: CALL DLASSQ( M, A( 1, N ), 1, T, AAPP )
1400: SVA( N ) = T*DSQRT( AAPP )*WORK( N )
1401: END IF
1402: *
1403: * Additional steering devices
1404: *
1405: IF( ( i.LT.SWBAND ) .AND. ( ( MXAAPQ.LE.ROOTTOL ) .OR.
1406: + ( ISWROT.LE.N ) ) )SWBAND = i
1407: *
1408: IF( ( i.GT.SWBAND+1 ) .AND. ( MXAAPQ.LT.DSQRT( DBLE( N ) )*
1409: + TOL ) .AND. ( DBLE( N )*MXAAPQ*MXSINJ.LT.TOL ) ) THEN
1410: GO TO 1994
1411: END IF
1412: *
1413: IF( NOTROT.GE.EMPTSW )GO TO 1994
1414: *
1415: 1993 CONTINUE
1416: * end i=1:NSWEEP loop
1417: *
1418: * #:( Reaching this point means that the procedure has not converged.
1419: INFO = NSWEEP - 1
1420: GO TO 1995
1421: *
1422: 1994 CONTINUE
1423: * #:) Reaching this point means numerical convergence after the i-th
1424: * sweep.
1425: *
1426: INFO = 0
1427: * #:) INFO = 0 confirms successful iterations.
1428: 1995 CONTINUE
1429: *
1430: * Sort the singular values and find how many are above
1431: * the underflow threshold.
1432: *
1433: N2 = 0
1434: N4 = 0
1435: DO 5991 p = 1, N - 1
1436: q = IDAMAX( N-p+1, SVA( p ), 1 ) + p - 1
1437: IF( p.NE.q ) THEN
1438: TEMP1 = SVA( p )
1439: SVA( p ) = SVA( q )
1440: SVA( q ) = TEMP1
1441: TEMP1 = WORK( p )
1442: WORK( p ) = WORK( q )
1443: WORK( q ) = TEMP1
1444: CALL DSWAP( M, A( 1, p ), 1, A( 1, q ), 1 )
1445: IF( RSVEC )CALL DSWAP( MVL, V( 1, p ), 1, V( 1, q ), 1 )
1446: END IF
1447: IF( SVA( p ).NE.ZERO ) THEN
1448: N4 = N4 + 1
1449: IF( SVA( p )*SCALE.GT.SFMIN )N2 = N2 + 1
1450: END IF
1451: 5991 CONTINUE
1452: IF( SVA( N ).NE.ZERO ) THEN
1453: N4 = N4 + 1
1454: IF( SVA( N )*SCALE.GT.SFMIN )N2 = N2 + 1
1455: END IF
1456: *
1457: * Normalize the left singular vectors.
1458: *
1459: IF( LSVEC .OR. UCTOL ) THEN
1460: DO 1998 p = 1, N2
1461: CALL DSCAL( M, WORK( p ) / SVA( p ), A( 1, p ), 1 )
1462: 1998 CONTINUE
1463: END IF
1464: *
1465: * Scale the product of Jacobi rotations (assemble the fast rotations).
1466: *
1467: IF( RSVEC ) THEN
1468: IF( APPLV ) THEN
1469: DO 2398 p = 1, N
1470: CALL DSCAL( MVL, WORK( p ), V( 1, p ), 1 )
1471: 2398 CONTINUE
1472: ELSE
1473: DO 2399 p = 1, N
1474: TEMP1 = ONE / DNRM2( MVL, V( 1, p ), 1 )
1475: CALL DSCAL( MVL, TEMP1, V( 1, p ), 1 )
1476: 2399 CONTINUE
1477: END IF
1478: END IF
1479: *
1480: * Undo scaling, if necessary (and possible).
1481: IF( ( ( SCALE.GT.ONE ) .AND. ( SVA( 1 ).LT.( BIG /
1482: + SCALE ) ) ) .OR. ( ( SCALE.LT.ONE ) .AND. ( SVA( N2 ).GT.
1483: + ( SFMIN / SCALE ) ) ) ) THEN
1484: DO 2400 p = 1, N
1485: SVA( p ) = SCALE*SVA( p )
1486: 2400 CONTINUE
1487: SCALE = ONE
1488: END IF
1489: *
1490: WORK( 1 ) = SCALE
1491: * The singular values of A are SCALE*SVA(1:N). If SCALE.NE.ONE
1492: * then some of the singular values may overflow or underflow and
1493: * the spectrum is given in this factored representation.
1494: *
1495: WORK( 2 ) = DBLE( N4 )
1496: * N4 is the number of computed nonzero singular values of A.
1497: *
1498: WORK( 3 ) = DBLE( N2 )
1499: * N2 is the number of singular values of A greater than SFMIN.
1500: * If N2<N, SVA(N2:N) contains ZEROS and/or denormalized numbers
1501: * that may carry some information.
1502: *
1503: WORK( 4 ) = DBLE( i )
1504: * i is the index of the last sweep before declaring convergence.
1505: *
1506: WORK( 5 ) = MXAAPQ
1507: * MXAAPQ is the largest absolute value of scaled pivots in the
1508: * last sweep
1509: *
1510: WORK( 6 ) = MXSINJ
1511: * MXSINJ is the largest absolute value of the sines of Jacobi angles
1512: * in the last sweep
1513: *
1514: RETURN
1515: * ..
1516: * .. END OF DGESVJ
1517: * ..
1518: END
CVSweb interface <joel.bertrand@systella.fr>