1: *> \brief \b DGESVJ
2: *
3: * =========== DOCUMENTATION ===========
4: *
5: * Online html documentation available at
6: * http://www.netlib.org/lapack/explore-html/
7: *
8: *> \htmlonly
9: *> Download DGESVJ + dependencies
10: *> <a href="http://www.netlib.org/cgi-bin/netlibfiles.tgz?format=tgz&filename=/lapack/lapack_routine/dgesvj.f">
11: *> [TGZ]</a>
12: *> <a href="http://www.netlib.org/cgi-bin/netlibfiles.zip?format=zip&filename=/lapack/lapack_routine/dgesvj.f">
13: *> [ZIP]</a>
14: *> <a href="http://www.netlib.org/cgi-bin/netlibfiles.txt?format=txt&filename=/lapack/lapack_routine/dgesvj.f">
15: *> [TXT]</a>
16: *> \endhtmlonly
17: *
18: * Definition:
19: * ===========
20: *
21: * SUBROUTINE DGESVJ( JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V,
22: * LDV, WORK, LWORK, INFO )
23: *
24: * .. Scalar Arguments ..
25: * INTEGER INFO, LDA, LDV, LWORK, M, MV, N
26: * CHARACTER*1 JOBA, JOBU, JOBV
27: * ..
28: * .. Array Arguments ..
29: * DOUBLE PRECISION A( LDA, * ), SVA( N ), V( LDV, * ),
30: * $ WORK( LWORK )
31: * ..
32: *
33: *
34: *> \par Purpose:
35: * =============
36: *>
37: *> \verbatim
38: *>
39: *> DGESVJ computes the singular value decomposition (SVD) of a real
40: *> M-by-N matrix A, where M >= N. The SVD of A is written as
41: *> [++] [xx] [x0] [xx]
42: *> A = U * SIGMA * V^t, [++] = [xx] * [ox] * [xx]
43: *> [++] [xx]
44: *> where SIGMA is an N-by-N diagonal matrix, U is an M-by-N orthonormal
45: *> matrix, and V is an N-by-N orthogonal matrix. The diagonal elements
46: *> of SIGMA are the singular values of A. The columns of U and V are the
47: *> left and the right singular vectors of A, respectively.
48: *> DGESVJ can sometimes compute tiny singular values and their singular vectors much
49: *> more accurately than other SVD routines, see below under Further Details.
50: *> \endverbatim
51: *
52: * Arguments:
53: * ==========
54: *
55: *> \param[in] JOBA
56: *> \verbatim
57: *> JOBA is CHARACTER* 1
58: *> Specifies the structure of A.
59: *> = 'L': The input matrix A is lower triangular;
60: *> = 'U': The input matrix A is upper triangular;
61: *> = 'G': The input matrix A is general M-by-N matrix, M >= N.
62: *> \endverbatim
63: *>
64: *> \param[in] JOBU
65: *> \verbatim
66: *> JOBU is CHARACTER*1
67: *> Specifies whether to compute the left singular vectors
68: *> (columns of U):
69: *> = 'U': The left singular vectors corresponding to the nonzero
70: *> singular values are computed and returned in the leading
71: *> columns of A. See more details in the description of A.
72: *> The default numerical orthogonality threshold is set to
73: *> approximately TOL=CTOL*EPS, CTOL=DSQRT(M), EPS=DLAMCH('E').
74: *> = 'C': Analogous to JOBU='U', except that user can control the
75: *> level of numerical orthogonality of the computed left
76: *> singular vectors. TOL can be set to TOL = CTOL*EPS, where
77: *> CTOL is given on input in the array WORK.
78: *> No CTOL smaller than ONE is allowed. CTOL greater
79: *> than 1 / EPS is meaningless. The option 'C'
80: *> can be used if M*EPS is satisfactory orthogonality
81: *> of the computed left singular vectors, so CTOL=M could
82: *> save few sweeps of Jacobi rotations.
83: *> See the descriptions of A and WORK(1).
84: *> = 'N': The matrix U is not computed. However, see the
85: *> description of A.
86: *> \endverbatim
87: *>
88: *> \param[in] JOBV
89: *> \verbatim
90: *> JOBV is CHARACTER*1
91: *> Specifies whether to compute the right singular vectors, that
92: *> is, the matrix V:
93: *> = 'V' : the matrix V is computed and returned in the array V
94: *> = 'A' : the Jacobi rotations are applied to the MV-by-N
95: *> array V. In other words, the right singular vector
96: *> matrix V is not computed explicitly, instead it is
97: *> applied to an MV-by-N matrix initially stored in the
98: *> first MV rows of V.
99: *> = 'N' : the matrix V is not computed and the array V is not
100: *> referenced
101: *> \endverbatim
102: *>
103: *> \param[in] M
104: *> \verbatim
105: *> M is INTEGER
106: *> The number of rows of the input matrix A. 1/DLAMCH('E') > M >= 0.
107: *> \endverbatim
108: *>
109: *> \param[in] N
110: *> \verbatim
111: *> N is INTEGER
112: *> The number of columns of the input matrix A.
113: *> M >= N >= 0.
114: *> \endverbatim
115: *>
116: *> \param[in,out] A
117: *> \verbatim
118: *> A is DOUBLE PRECISION array, dimension (LDA,N)
119: *> On entry, the M-by-N matrix A.
120: *> On exit :
121: *> If JOBU .EQ. 'U' .OR. JOBU .EQ. 'C' :
122: *> If INFO .EQ. 0 :
123: *> RANKA orthonormal columns of U are returned in the
124: *> leading RANKA columns of the array A. Here RANKA <= N
125: *> is the number of computed singular values of A that are
126: *> above the underflow threshold DLAMCH('S'). The singular
127: *> vectors corresponding to underflowed or zero singular
128: *> values are not computed. The value of RANKA is returned
129: *> in the array WORK as RANKA=NINT(WORK(2)). Also see the
130: *> descriptions of SVA and WORK. The computed columns of U
131: *> are mutually numerically orthogonal up to approximately
132: *> TOL=DSQRT(M)*EPS (default); or TOL=CTOL*EPS (JOBU.EQ.'C'),
133: *> see the description of JOBU.
134: *> If INFO .GT. 0 :
135: *> the procedure DGESVJ did not converge in the given number
136: *> of iterations (sweeps). In that case, the computed
137: *> columns of U may not be orthogonal up to TOL. The output
138: *> U (stored in A), SIGMA (given by the computed singular
139: *> values in SVA(1:N)) and V is still a decomposition of the
140: *> input matrix A in the sense that the residual
141: *> ||A-SCALE*U*SIGMA*V^T||_2 / ||A||_2 is small.
142: *>
143: *> If JOBU .EQ. 'N' :
144: *> If INFO .EQ. 0 :
145: *> Note that the left singular vectors are 'for free' in the
146: *> one-sided Jacobi SVD algorithm. However, if only the
147: *> singular values are needed, the level of numerical
148: *> orthogonality of U is not an issue and iterations are
149: *> stopped when the columns of the iterated matrix are
150: *> numerically orthogonal up to approximately M*EPS. Thus,
151: *> on exit, A contains the columns of U scaled with the
152: *> corresponding singular values.
153: *> If INFO .GT. 0 :
154: *> the procedure DGESVJ did not converge in the given number
155: *> of iterations (sweeps).
156: *> \endverbatim
157: *>
158: *> \param[in] LDA
159: *> \verbatim
160: *> LDA is INTEGER
161: *> The leading dimension of the array A. LDA >= max(1,M).
162: *> \endverbatim
163: *>
164: *> \param[out] SVA
165: *> \verbatim
166: *> SVA is DOUBLE PRECISION array, dimension (N)
167: *> On exit :
168: *> If INFO .EQ. 0 :
169: *> depending on the value SCALE = WORK(1), we have:
170: *> If SCALE .EQ. ONE :
171: *> SVA(1:N) contains the computed singular values of A.
172: *> During the computation SVA contains the Euclidean column
173: *> norms of the iterated matrices in the array A.
174: *> If SCALE .NE. ONE :
175: *> The singular values of A are SCALE*SVA(1:N), and this
176: *> factored representation is due to the fact that some of the
177: *> singular values of A might underflow or overflow.
178: *> If INFO .GT. 0 :
179: *> the procedure DGESVJ did not converge in the given number of
180: *> iterations (sweeps) and SCALE*SVA(1:N) may not be accurate.
181: *> \endverbatim
182: *>
183: *> \param[in] MV
184: *> \verbatim
185: *> MV is INTEGER
186: *> If JOBV .EQ. 'A', then the product of Jacobi rotations in DGESVJ
187: *> is applied to the first MV rows of V. See the description of JOBV.
188: *> \endverbatim
189: *>
190: *> \param[in,out] V
191: *> \verbatim
192: *> V is DOUBLE PRECISION array, dimension (LDV,N)
193: *> If JOBV = 'V', then V contains on exit the N-by-N matrix of
194: *> the right singular vectors;
195: *> If JOBV = 'A', then V contains the product of the computed right
196: *> singular vector matrix and the initial matrix in
197: *> the array V.
198: *> If JOBV = 'N', then V is not referenced.
199: *> \endverbatim
200: *>
201: *> \param[in] LDV
202: *> \verbatim
203: *> LDV is INTEGER
204: *> The leading dimension of the array V, LDV .GE. 1.
205: *> If JOBV .EQ. 'V', then LDV .GE. max(1,N).
206: *> If JOBV .EQ. 'A', then LDV .GE. max(1,MV) .
207: *> \endverbatim
208: *>
209: *> \param[in,out] WORK
210: *> \verbatim
211: *> WORK is DOUBLE PRECISION array, dimension MAX(6,M+N).
212: *> On entry :
213: *> If JOBU .EQ. 'C' :
214: *> WORK(1) = CTOL, where CTOL defines the threshold for convergence.
215: *> The process stops if all columns of A are mutually
216: *> orthogonal up to CTOL*EPS, EPS=DLAMCH('E').
217: *> It is required that CTOL >= ONE, i.e. it is not
218: *> allowed to force the routine to obtain orthogonality
219: *> below EPS.
220: *> On exit :
221: *> WORK(1) = SCALE is the scaling factor such that SCALE*SVA(1:N)
222: *> are the computed singular values of A.
223: *> (See description of SVA().)
224: *> WORK(2) = NINT(WORK(2)) is the number of the computed nonzero
225: *> singular values.
226: *> WORK(3) = NINT(WORK(3)) is the number of the computed singular
227: *> values that are larger than the underflow threshold.
228: *> WORK(4) = NINT(WORK(4)) is the number of sweeps of Jacobi
229: *> rotations needed for numerical convergence.
230: *> WORK(5) = max_{i.NE.j} |COS(A(:,i),A(:,j))| in the last sweep.
231: *> This is useful information in cases when DGESVJ did
232: *> not converge, as it can be used to estimate whether
233: *> the output is stil useful and for post festum analysis.
234: *> WORK(6) = the largest absolute value over all sines of the
235: *> Jacobi rotation angles in the last sweep. It can be
236: *> useful for a post festum analysis.
237: *> \endverbatim
238: *>
239: *> \param[in] LWORK
240: *> \verbatim
241: *> LWORK is INTEGER
242: *> length of WORK, WORK >= MAX(6,M+N)
243: *> \endverbatim
244: *>
245: *> \param[out] INFO
246: *> \verbatim
247: *> INFO is INTEGER
248: *> = 0 : successful exit.
249: *> < 0 : if INFO = -i, then the i-th argument had an illegal value
250: *> > 0 : DGESVJ did not converge in the maximal allowed number (30)
251: *> of sweeps. The output may still be useful. See the
252: *> description of WORK.
253: *> \endverbatim
254: *
255: * Authors:
256: * ========
257: *
258: *> \author Univ. of Tennessee
259: *> \author Univ. of California Berkeley
260: *> \author Univ. of Colorado Denver
261: *> \author NAG Ltd.
262: *
263: *> \date December 2016
264: *
265: *> \ingroup doubleGEcomputational
266: *
267: *> \par Further Details:
268: * =====================
269: *>
270: *> \verbatim
271: *>
272: *> The orthogonal N-by-N matrix V is obtained as a product of Jacobi plane
273: *> rotations. The rotations are implemented as fast scaled rotations of
274: *> Anda and Park [1]. In the case of underflow of the Jacobi angle, a
275: *> modified Jacobi transformation of Drmac [4] is used. Pivot strategy uses
276: *> column interchanges of de Rijk [2]. The relative accuracy of the computed
277: *> singular values and the accuracy of the computed singular vectors (in
278: *> angle metric) is as guaranteed by the theory of Demmel and Veselic [3].
279: *> The condition number that determines the accuracy in the full rank case
280: *> is essentially min_{D=diag} kappa(A*D), where kappa(.) is the
281: *> spectral condition number. The best performance of this Jacobi SVD
282: *> procedure is achieved if used in an accelerated version of Drmac and
283: *> Veselic [5,6], and it is the kernel routine in the SIGMA library [7].
284: *> Some tunning parameters (marked with [TP]) are available for the
285: *> implementer.
286: *> The computational range for the nonzero singular values is the machine
287: *> number interval ( UNDERFLOW , OVERFLOW ). In extreme cases, even
288: *> denormalized singular values can be computed with the corresponding
289: *> gradual loss of accurate digits.
290: *> \endverbatim
291: *
292: *> \par Contributors:
293: * ==================
294: *>
295: *> \verbatim
296: *>
297: *> ============
298: *>
299: *> Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
300: *> \endverbatim
301: *
302: *> \par References:
303: * ================
304: *>
305: *> \verbatim
306: *>
307: *> [1] A. A. Anda and H. Park: Fast plane rotations with dynamic scaling.
308: *> SIAM J. matrix Anal. Appl., Vol. 15 (1994), pp. 162-174.
309: *> [2] P. P. M. De Rijk: A one-sided Jacobi algorithm for computing the
310: *> singular value decomposition on a vector computer.
311: *> SIAM J. Sci. Stat. Comp., Vol. 10 (1998), pp. 359-371.
312: *> [3] J. Demmel and K. Veselic: Jacobi method is more accurate than QR.
313: *> [4] Z. Drmac: Implementation of Jacobi rotations for accurate singular
314: *> value computation in floating point arithmetic.
315: *> SIAM J. Sci. Comp., Vol. 18 (1997), pp. 1200-1222.
316: *> [5] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
317: *> SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
318: *> LAPACK Working note 169.
319: *> [6] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
320: *> SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
321: *> LAPACK Working note 170.
322: *> [7] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
323: *> QSVD, (H,K)-SVD computations.
324: *> Department of Mathematics, University of Zagreb, 2008.
325: *> \endverbatim
326: *
327: *> \par Bugs, examples and comments:
328: * =================================
329: *>
330: *> \verbatim
331: *> ===========================
332: *> Please report all bugs and send interesting test examples and comments to
333: *> drmac@math.hr. Thank you.
334: *> \endverbatim
335: *>
336: * =====================================================================
337: SUBROUTINE DGESVJ( JOBA, JOBU, JOBV, M, N, A, LDA, SVA, MV, V,
338: $ LDV, WORK, LWORK, INFO )
339: *
340: * -- LAPACK computational routine (version 3.7.0) --
341: * -- LAPACK is a software package provided by Univ. of Tennessee, --
342: * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
343: * December 2016
344: *
345: * .. Scalar Arguments ..
346: INTEGER INFO, LDA, LDV, LWORK, M, MV, N
347: CHARACTER*1 JOBA, JOBU, JOBV
348: * ..
349: * .. Array Arguments ..
350: DOUBLE PRECISION A( LDA, * ), SVA( N ), V( LDV, * ),
351: $ WORK( LWORK )
352: * ..
353: *
354: * =====================================================================
355: *
356: * .. Local Parameters ..
357: DOUBLE PRECISION ZERO, HALF, ONE
358: PARAMETER ( ZERO = 0.0D0, HALF = 0.5D0, ONE = 1.0D0)
359: INTEGER NSWEEP
360: PARAMETER ( NSWEEP = 30 )
361: * ..
362: * .. Local Scalars ..
363: DOUBLE PRECISION AAPP, AAPP0, AAPQ, AAQQ, APOAQ, AQOAP, BIG,
364: $ BIGTHETA, CS, CTOL, EPSLN, LARGE, MXAAPQ,
365: $ MXSINJ, ROOTBIG, ROOTEPS, ROOTSFMIN, ROOTTOL,
366: $ SKL, SFMIN, SMALL, SN, T, TEMP1, THETA,
367: $ THSIGN, TOL
368: INTEGER BLSKIP, EMPTSW, i, ibr, IERR, igl, IJBLSK, ir1,
369: $ ISWROT, jbc, jgl, KBL, LKAHEAD, MVL, N2, N34,
370: $ N4, NBL, NOTROT, p, PSKIPPED, q, ROWSKIP,
371: $ SWBAND
372: LOGICAL APPLV, GOSCALE, LOWER, LSVEC, NOSCALE, ROTOK,
373: $ RSVEC, UCTOL, UPPER
374: * ..
375: * .. Local Arrays ..
376: DOUBLE PRECISION FASTR( 5 )
377: * ..
378: * .. Intrinsic Functions ..
379: INTRINSIC DABS, MAX, MIN, DBLE, DSIGN, DSQRT
380: * ..
381: * .. External Functions ..
382: * ..
383: * from BLAS
384: DOUBLE PRECISION DDOT, DNRM2
385: EXTERNAL DDOT, DNRM2
386: INTEGER IDAMAX
387: EXTERNAL IDAMAX
388: * from LAPACK
389: DOUBLE PRECISION DLAMCH
390: EXTERNAL DLAMCH
391: LOGICAL LSAME
392: EXTERNAL LSAME
393: * ..
394: * .. External Subroutines ..
395: * ..
396: * from BLAS
397: EXTERNAL DAXPY, DCOPY, DROTM, DSCAL, DSWAP
398: * from LAPACK
399: EXTERNAL DLASCL, DLASET, DLASSQ, XERBLA
400: *
401: EXTERNAL DGSVJ0, DGSVJ1
402: * ..
403: * .. Executable Statements ..
404: *
405: * Test the input arguments
406: *
407: LSVEC = LSAME( JOBU, 'U' )
408: UCTOL = LSAME( JOBU, 'C' )
409: RSVEC = LSAME( JOBV, 'V' )
410: APPLV = LSAME( JOBV, 'A' )
411: UPPER = LSAME( JOBA, 'U' )
412: LOWER = LSAME( JOBA, 'L' )
413: *
414: IF( .NOT.( UPPER .OR. LOWER .OR. LSAME( JOBA, 'G' ) ) ) THEN
415: INFO = -1
416: ELSE IF( .NOT.( LSVEC .OR. UCTOL .OR. LSAME( JOBU, 'N' ) ) ) THEN
417: INFO = -2
418: ELSE IF( .NOT.( RSVEC .OR. APPLV .OR. LSAME( JOBV, 'N' ) ) ) THEN
419: INFO = -3
420: ELSE IF( M.LT.0 ) THEN
421: INFO = -4
422: ELSE IF( ( N.LT.0 ) .OR. ( N.GT.M ) ) THEN
423: INFO = -5
424: ELSE IF( LDA.LT.M ) THEN
425: INFO = -7
426: ELSE IF( MV.LT.0 ) THEN
427: INFO = -9
428: ELSE IF( ( RSVEC .AND. ( LDV.LT.N ) ) .OR.
429: $ ( APPLV .AND. ( LDV.LT.MV ) ) ) THEN
430: INFO = -11
431: ELSE IF( UCTOL .AND. ( WORK( 1 ).LE.ONE ) ) THEN
432: INFO = -12
433: ELSE IF( LWORK.LT.MAX( M+N, 6 ) ) THEN
434: INFO = -13
435: ELSE
436: INFO = 0
437: END IF
438: *
439: * #:(
440: IF( INFO.NE.0 ) THEN
441: CALL XERBLA( 'DGESVJ', -INFO )
442: RETURN
443: END IF
444: *
445: * #:) Quick return for void matrix
446: *
447: IF( ( M.EQ.0 ) .OR. ( N.EQ.0 ) )RETURN
448: *
449: * Set numerical parameters
450: * The stopping criterion for Jacobi rotations is
451: *
452: * max_{i<>j}|A(:,i)^T * A(:,j)|/(||A(:,i)||*||A(:,j)||) < CTOL*EPS
453: *
454: * where EPS is the round-off and CTOL is defined as follows:
455: *
456: IF( UCTOL ) THEN
457: * ... user controlled
458: CTOL = WORK( 1 )
459: ELSE
460: * ... default
461: IF( LSVEC .OR. RSVEC .OR. APPLV ) THEN
462: CTOL = DSQRT( DBLE( M ) )
463: ELSE
464: CTOL = DBLE( M )
465: END IF
466: END IF
467: * ... and the machine dependent parameters are
468: *[!] (Make sure that DLAMCH() works properly on the target machine.)
469: *
470: EPSLN = DLAMCH( 'Epsilon' )
471: ROOTEPS = DSQRT( EPSLN )
472: SFMIN = DLAMCH( 'SafeMinimum' )
473: ROOTSFMIN = DSQRT( SFMIN )
474: SMALL = SFMIN / EPSLN
475: BIG = DLAMCH( 'Overflow' )
476: * BIG = ONE / SFMIN
477: ROOTBIG = ONE / ROOTSFMIN
478: LARGE = BIG / DSQRT( DBLE( M*N ) )
479: BIGTHETA = ONE / ROOTEPS
480: *
481: TOL = CTOL*EPSLN
482: ROOTTOL = DSQRT( TOL )
483: *
484: IF( DBLE( M )*EPSLN.GE.ONE ) THEN
485: INFO = -4
486: CALL XERBLA( 'DGESVJ', -INFO )
487: RETURN
488: END IF
489: *
490: * Initialize the right singular vector matrix.
491: *
492: IF( RSVEC ) THEN
493: MVL = N
494: CALL DLASET( 'A', MVL, N, ZERO, ONE, V, LDV )
495: ELSE IF( APPLV ) THEN
496: MVL = MV
497: END IF
498: RSVEC = RSVEC .OR. APPLV
499: *
500: * Initialize SVA( 1:N ) = ( ||A e_i||_2, i = 1:N )
501: *(!) If necessary, scale A to protect the largest singular value
502: * from overflow. It is possible that saving the largest singular
503: * value destroys the information about the small ones.
504: * This initial scaling is almost minimal in the sense that the
505: * goal is to make sure that no column norm overflows, and that
506: * DSQRT(N)*max_i SVA(i) does not overflow. If INFinite entries
507: * in A are detected, the procedure returns with INFO=-6.
508: *
509: SKL= ONE / DSQRT( DBLE( M )*DBLE( N ) )
510: NOSCALE = .TRUE.
511: GOSCALE = .TRUE.
512: *
513: IF( LOWER ) THEN
514: * the input matrix is M-by-N lower triangular (trapezoidal)
515: DO 1874 p = 1, N
516: AAPP = ZERO
517: AAQQ = ONE
518: CALL DLASSQ( M-p+1, A( p, p ), 1, AAPP, AAQQ )
519: IF( AAPP.GT.BIG ) THEN
520: INFO = -6
521: CALL XERBLA( 'DGESVJ', -INFO )
522: RETURN
523: END IF
524: AAQQ = DSQRT( AAQQ )
525: IF( ( AAPP.LT.( BIG / AAQQ ) ) .AND. NOSCALE ) THEN
526: SVA( p ) = AAPP*AAQQ
527: ELSE
528: NOSCALE = .FALSE.
529: SVA( p ) = AAPP*( AAQQ*SKL)
530: IF( GOSCALE ) THEN
531: GOSCALE = .FALSE.
532: DO 1873 q = 1, p - 1
533: SVA( q ) = SVA( q )*SKL
534: 1873 CONTINUE
535: END IF
536: END IF
537: 1874 CONTINUE
538: ELSE IF( UPPER ) THEN
539: * the input matrix is M-by-N upper triangular (trapezoidal)
540: DO 2874 p = 1, N
541: AAPP = ZERO
542: AAQQ = ONE
543: CALL DLASSQ( p, A( 1, p ), 1, AAPP, AAQQ )
544: IF( AAPP.GT.BIG ) THEN
545: INFO = -6
546: CALL XERBLA( 'DGESVJ', -INFO )
547: RETURN
548: END IF
549: AAQQ = DSQRT( AAQQ )
550: IF( ( AAPP.LT.( BIG / AAQQ ) ) .AND. NOSCALE ) THEN
551: SVA( p ) = AAPP*AAQQ
552: ELSE
553: NOSCALE = .FALSE.
554: SVA( p ) = AAPP*( AAQQ*SKL)
555: IF( GOSCALE ) THEN
556: GOSCALE = .FALSE.
557: DO 2873 q = 1, p - 1
558: SVA( q ) = SVA( q )*SKL
559: 2873 CONTINUE
560: END IF
561: END IF
562: 2874 CONTINUE
563: ELSE
564: * the input matrix is M-by-N general dense
565: DO 3874 p = 1, N
566: AAPP = ZERO
567: AAQQ = ONE
568: CALL DLASSQ( M, A( 1, p ), 1, AAPP, AAQQ )
569: IF( AAPP.GT.BIG ) THEN
570: INFO = -6
571: CALL XERBLA( 'DGESVJ', -INFO )
572: RETURN
573: END IF
574: AAQQ = DSQRT( AAQQ )
575: IF( ( AAPP.LT.( BIG / AAQQ ) ) .AND. NOSCALE ) THEN
576: SVA( p ) = AAPP*AAQQ
577: ELSE
578: NOSCALE = .FALSE.
579: SVA( p ) = AAPP*( AAQQ*SKL)
580: IF( GOSCALE ) THEN
581: GOSCALE = .FALSE.
582: DO 3873 q = 1, p - 1
583: SVA( q ) = SVA( q )*SKL
584: 3873 CONTINUE
585: END IF
586: END IF
587: 3874 CONTINUE
588: END IF
589: *
590: IF( NOSCALE )SKL= ONE
591: *
592: * Move the smaller part of the spectrum from the underflow threshold
593: *(!) Start by determining the position of the nonzero entries of the
594: * array SVA() relative to ( SFMIN, BIG ).
595: *
596: AAPP = ZERO
597: AAQQ = BIG
598: DO 4781 p = 1, N
599: IF( SVA( p ).NE.ZERO )AAQQ = MIN( AAQQ, SVA( p ) )
600: AAPP = MAX( AAPP, SVA( p ) )
601: 4781 CONTINUE
602: *
603: * #:) Quick return for zero matrix
604: *
605: IF( AAPP.EQ.ZERO ) THEN
606: IF( LSVEC )CALL DLASET( 'G', M, N, ZERO, ONE, A, LDA )
607: WORK( 1 ) = ONE
608: WORK( 2 ) = ZERO
609: WORK( 3 ) = ZERO
610: WORK( 4 ) = ZERO
611: WORK( 5 ) = ZERO
612: WORK( 6 ) = ZERO
613: RETURN
614: END IF
615: *
616: * #:) Quick return for one-column matrix
617: *
618: IF( N.EQ.1 ) THEN
619: IF( LSVEC )CALL DLASCL( 'G', 0, 0, SVA( 1 ), SKL, M, 1,
620: $ A( 1, 1 ), LDA, IERR )
621: WORK( 1 ) = ONE / SKL
622: IF( SVA( 1 ).GE.SFMIN ) THEN
623: WORK( 2 ) = ONE
624: ELSE
625: WORK( 2 ) = ZERO
626: END IF
627: WORK( 3 ) = ZERO
628: WORK( 4 ) = ZERO
629: WORK( 5 ) = ZERO
630: WORK( 6 ) = ZERO
631: RETURN
632: END IF
633: *
634: * Protect small singular values from underflow, and try to
635: * avoid underflows/overflows in computing Jacobi rotations.
636: *
637: SN = DSQRT( SFMIN / EPSLN )
638: TEMP1 = DSQRT( BIG / DBLE( N ) )
639: IF( ( AAPP.LE.SN ) .OR. ( AAQQ.GE.TEMP1 ) .OR.
640: $ ( ( SN.LE.AAQQ ) .AND. ( AAPP.LE.TEMP1 ) ) ) THEN
641: TEMP1 = MIN( BIG, TEMP1 / AAPP )
642: * AAQQ = AAQQ*TEMP1
643: * AAPP = AAPP*TEMP1
644: ELSE IF( ( AAQQ.LE.SN ) .AND. ( AAPP.LE.TEMP1 ) ) THEN
645: TEMP1 = MIN( SN / AAQQ, BIG / ( AAPP*DSQRT( DBLE( N ) ) ) )
646: * AAQQ = AAQQ*TEMP1
647: * AAPP = AAPP*TEMP1
648: ELSE IF( ( AAQQ.GE.SN ) .AND. ( AAPP.GE.TEMP1 ) ) THEN
649: TEMP1 = MAX( SN / AAQQ, TEMP1 / AAPP )
650: * AAQQ = AAQQ*TEMP1
651: * AAPP = AAPP*TEMP1
652: ELSE IF( ( AAQQ.LE.SN ) .AND. ( AAPP.GE.TEMP1 ) ) THEN
653: TEMP1 = MIN( SN / AAQQ, BIG / ( DSQRT( DBLE( N ) )*AAPP ) )
654: * AAQQ = AAQQ*TEMP1
655: * AAPP = AAPP*TEMP1
656: ELSE
657: TEMP1 = ONE
658: END IF
659: *
660: * Scale, if necessary
661: *
662: IF( TEMP1.NE.ONE ) THEN
663: CALL DLASCL( 'G', 0, 0, ONE, TEMP1, N, 1, SVA, N, IERR )
664: END IF
665: SKL= TEMP1*SKL
666: IF( SKL.NE.ONE ) THEN
667: CALL DLASCL( JOBA, 0, 0, ONE, SKL, M, N, A, LDA, IERR )
668: SKL= ONE / SKL
669: END IF
670: *
671: * Row-cyclic Jacobi SVD algorithm with column pivoting
672: *
673: EMPTSW = ( N*( N-1 ) ) / 2
674: NOTROT = 0
675: FASTR( 1 ) = ZERO
676: *
677: * A is represented in factored form A = A * diag(WORK), where diag(WORK)
678: * is initialized to identity. WORK is updated during fast scaled
679: * rotations.
680: *
681: DO 1868 q = 1, N
682: WORK( q ) = ONE
683: 1868 CONTINUE
684: *
685: *
686: SWBAND = 3
687: *[TP] SWBAND is a tuning parameter [TP]. It is meaningful and effective
688: * if DGESVJ is used as a computational routine in the preconditioned
689: * Jacobi SVD algorithm DGESVJ. For sweeps i=1:SWBAND the procedure
690: * works on pivots inside a band-like region around the diagonal.
691: * The boundaries are determined dynamically, based on the number of
692: * pivots above a threshold.
693: *
694: KBL = MIN( 8, N )
695: *[TP] KBL is a tuning parameter that defines the tile size in the
696: * tiling of the p-q loops of pivot pairs. In general, an optimal
697: * value of KBL depends on the matrix dimensions and on the
698: * parameters of the computer's memory.
699: *
700: NBL = N / KBL
701: IF( ( NBL*KBL ).NE.N )NBL = NBL + 1
702: *
703: BLSKIP = KBL**2
704: *[TP] BLKSKIP is a tuning parameter that depends on SWBAND and KBL.
705: *
706: ROWSKIP = MIN( 5, KBL )
707: *[TP] ROWSKIP is a tuning parameter.
708: *
709: LKAHEAD = 1
710: *[TP] LKAHEAD is a tuning parameter.
711: *
712: * Quasi block transformations, using the lower (upper) triangular
713: * structure of the input matrix. The quasi-block-cycling usually
714: * invokes cubic convergence. Big part of this cycle is done inside
715: * canonical subspaces of dimensions less than M.
716: *
717: IF( ( LOWER .OR. UPPER ) .AND. ( N.GT.MAX( 64, 4*KBL ) ) ) THEN
718: *[TP] The number of partition levels and the actual partition are
719: * tuning parameters.
720: N4 = N / 4
721: N2 = N / 2
722: N34 = 3*N4
723: IF( APPLV ) THEN
724: q = 0
725: ELSE
726: q = 1
727: END IF
728: *
729: IF( LOWER ) THEN
730: *
731: * This works very well on lower triangular matrices, in particular
732: * in the framework of the preconditioned Jacobi SVD (xGEJSV).
733: * The idea is simple:
734: * [+ 0 0 0] Note that Jacobi transformations of [0 0]
735: * [+ + 0 0] [0 0]
736: * [+ + x 0] actually work on [x 0] [x 0]
737: * [+ + x x] [x x]. [x x]
738: *
739: CALL DGSVJ0( JOBV, M-N34, N-N34, A( N34+1, N34+1 ), LDA,
740: $ WORK( N34+1 ), SVA( N34+1 ), MVL,
741: $ V( N34*q+1, N34+1 ), LDV, EPSLN, SFMIN, TOL,
742: $ 2, WORK( N+1 ), LWORK-N, IERR )
743: *
744: CALL DGSVJ0( JOBV, M-N2, N34-N2, A( N2+1, N2+1 ), LDA,
745: $ WORK( N2+1 ), SVA( N2+1 ), MVL,
746: $ V( N2*q+1, N2+1 ), LDV, EPSLN, SFMIN, TOL, 2,
747: $ WORK( N+1 ), LWORK-N, IERR )
748: *
749: CALL DGSVJ1( JOBV, M-N2, N-N2, N4, A( N2+1, N2+1 ), LDA,
750: $ WORK( N2+1 ), SVA( N2+1 ), MVL,
751: $ V( N2*q+1, N2+1 ), LDV, EPSLN, SFMIN, TOL, 1,
752: $ WORK( N+1 ), LWORK-N, IERR )
753: *
754: CALL DGSVJ0( JOBV, M-N4, N2-N4, A( N4+1, N4+1 ), LDA,
755: $ WORK( N4+1 ), SVA( N4+1 ), MVL,
756: $ V( N4*q+1, N4+1 ), LDV, EPSLN, SFMIN, TOL, 1,
757: $ WORK( N+1 ), LWORK-N, IERR )
758: *
759: CALL DGSVJ0( JOBV, M, N4, A, LDA, WORK, SVA, MVL, V, LDV,
760: $ EPSLN, SFMIN, TOL, 1, WORK( N+1 ), LWORK-N,
761: $ IERR )
762: *
763: CALL DGSVJ1( JOBV, M, N2, N4, A, LDA, WORK, SVA, MVL, V,
764: $ LDV, EPSLN, SFMIN, TOL, 1, WORK( N+1 ),
765: $ LWORK-N, IERR )
766: *
767: *
768: ELSE IF( UPPER ) THEN
769: *
770: *
771: CALL DGSVJ0( JOBV, N4, N4, A, LDA, WORK, SVA, MVL, V, LDV,
772: $ EPSLN, SFMIN, TOL, 2, WORK( N+1 ), LWORK-N,
773: $ IERR )
774: *
775: CALL DGSVJ0( JOBV, N2, N4, A( 1, N4+1 ), LDA, WORK( N4+1 ),
776: $ SVA( N4+1 ), MVL, V( N4*q+1, N4+1 ), LDV,
777: $ EPSLN, SFMIN, TOL, 1, WORK( N+1 ), LWORK-N,
778: $ IERR )
779: *
780: CALL DGSVJ1( JOBV, N2, N2, N4, A, LDA, WORK, SVA, MVL, V,
781: $ LDV, EPSLN, SFMIN, TOL, 1, WORK( N+1 ),
782: $ LWORK-N, IERR )
783: *
784: CALL DGSVJ0( JOBV, N2+N4, N4, A( 1, N2+1 ), LDA,
785: $ WORK( N2+1 ), SVA( N2+1 ), MVL,
786: $ V( N2*q+1, N2+1 ), LDV, EPSLN, SFMIN, TOL, 1,
787: $ WORK( N+1 ), LWORK-N, IERR )
788:
789: END IF
790: *
791: END IF
792: *
793: * .. Row-cyclic pivot strategy with de Rijk's pivoting ..
794: *
795: DO 1993 i = 1, NSWEEP
796: *
797: * .. go go go ...
798: *
799: MXAAPQ = ZERO
800: MXSINJ = ZERO
801: ISWROT = 0
802: *
803: NOTROT = 0
804: PSKIPPED = 0
805: *
806: * Each sweep is unrolled using KBL-by-KBL tiles over the pivot pairs
807: * 1 <= p < q <= N. This is the first step toward a blocked implementation
808: * of the rotations. New implementation, based on block transformations,
809: * is under development.
810: *
811: DO 2000 ibr = 1, NBL
812: *
813: igl = ( ibr-1 )*KBL + 1
814: *
815: DO 1002 ir1 = 0, MIN( LKAHEAD, NBL-ibr )
816: *
817: igl = igl + ir1*KBL
818: *
819: DO 2001 p = igl, MIN( igl+KBL-1, N-1 )
820: *
821: * .. de Rijk's pivoting
822: *
823: q = IDAMAX( N-p+1, SVA( p ), 1 ) + p - 1
824: IF( p.NE.q ) THEN
825: CALL DSWAP( M, A( 1, p ), 1, A( 1, q ), 1 )
826: IF( RSVEC )CALL DSWAP( MVL, V( 1, p ), 1,
827: $ V( 1, q ), 1 )
828: TEMP1 = SVA( p )
829: SVA( p ) = SVA( q )
830: SVA( q ) = TEMP1
831: TEMP1 = WORK( p )
832: WORK( p ) = WORK( q )
833: WORK( q ) = TEMP1
834: END IF
835: *
836: IF( ir1.EQ.0 ) THEN
837: *
838: * Column norms are periodically updated by explicit
839: * norm computation.
840: * Caveat:
841: * Unfortunately, some BLAS implementations compute DNRM2(M,A(1,p),1)
842: * as DSQRT(DDOT(M,A(1,p),1,A(1,p),1)), which may cause the result to
843: * overflow for ||A(:,p)||_2 > DSQRT(overflow_threshold), and to
844: * underflow for ||A(:,p)||_2 < DSQRT(underflow_threshold).
845: * Hence, DNRM2 cannot be trusted, not even in the case when
846: * the true norm is far from the under(over)flow boundaries.
847: * If properly implemented DNRM2 is available, the IF-THEN-ELSE
848: * below should read "AAPP = DNRM2( M, A(1,p), 1 ) * WORK(p)".
849: *
850: IF( ( SVA( p ).LT.ROOTBIG ) .AND.
851: $ ( SVA( p ).GT.ROOTSFMIN ) ) THEN
852: SVA( p ) = DNRM2( M, A( 1, p ), 1 )*WORK( p )
853: ELSE
854: TEMP1 = ZERO
855: AAPP = ONE
856: CALL DLASSQ( M, A( 1, p ), 1, TEMP1, AAPP )
857: SVA( p ) = TEMP1*DSQRT( AAPP )*WORK( p )
858: END IF
859: AAPP = SVA( p )
860: ELSE
861: AAPP = SVA( p )
862: END IF
863: *
864: IF( AAPP.GT.ZERO ) THEN
865: *
866: PSKIPPED = 0
867: *
868: DO 2002 q = p + 1, MIN( igl+KBL-1, N )
869: *
870: AAQQ = SVA( q )
871: *
872: IF( AAQQ.GT.ZERO ) THEN
873: *
874: AAPP0 = AAPP
875: IF( AAQQ.GE.ONE ) THEN
876: ROTOK = ( SMALL*AAPP ).LE.AAQQ
877: IF( AAPP.LT.( BIG / AAQQ ) ) THEN
878: AAPQ = ( DDOT( M, A( 1, p ), 1, A( 1,
879: $ q ), 1 )*WORK( p )*WORK( q ) /
880: $ AAQQ ) / AAPP
881: ELSE
882: CALL DCOPY( M, A( 1, p ), 1,
883: $ WORK( N+1 ), 1 )
884: CALL DLASCL( 'G', 0, 0, AAPP,
885: $ WORK( p ), M, 1,
886: $ WORK( N+1 ), LDA, IERR )
887: AAPQ = DDOT( M, WORK( N+1 ), 1,
888: $ A( 1, q ), 1 )*WORK( q ) / AAQQ
889: END IF
890: ELSE
891: ROTOK = AAPP.LE.( AAQQ / SMALL )
892: IF( AAPP.GT.( SMALL / AAQQ ) ) THEN
893: AAPQ = ( DDOT( M, A( 1, p ), 1, A( 1,
894: $ q ), 1 )*WORK( p )*WORK( q ) /
895: $ AAQQ ) / AAPP
896: ELSE
897: CALL DCOPY( M, A( 1, q ), 1,
898: $ WORK( N+1 ), 1 )
899: CALL DLASCL( 'G', 0, 0, AAQQ,
900: $ WORK( q ), M, 1,
901: $ WORK( N+1 ), LDA, IERR )
902: AAPQ = DDOT( M, WORK( N+1 ), 1,
903: $ A( 1, p ), 1 )*WORK( p ) / AAPP
904: END IF
905: END IF
906: *
907: MXAAPQ = MAX( MXAAPQ, DABS( AAPQ ) )
908: *
909: * TO rotate or NOT to rotate, THAT is the question ...
910: *
911: IF( DABS( AAPQ ).GT.TOL ) THEN
912: *
913: * .. rotate
914: *[RTD] ROTATED = ROTATED + ONE
915: *
916: IF( ir1.EQ.0 ) THEN
917: NOTROT = 0
918: PSKIPPED = 0
919: ISWROT = ISWROT + 1
920: END IF
921: *
922: IF( ROTOK ) THEN
923: *
924: AQOAP = AAQQ / AAPP
925: APOAQ = AAPP / AAQQ
926: THETA = -HALF*DABS(AQOAP-APOAQ)/AAPQ
927: *
928: IF( DABS( THETA ).GT.BIGTHETA ) THEN
929: *
930: T = HALF / THETA
931: FASTR( 3 ) = T*WORK( p ) / WORK( q )
932: FASTR( 4 ) = -T*WORK( q ) /
933: $ WORK( p )
934: CALL DROTM( M, A( 1, p ), 1,
935: $ A( 1, q ), 1, FASTR )
936: IF( RSVEC )CALL DROTM( MVL,
937: $ V( 1, p ), 1,
938: $ V( 1, q ), 1,
939: $ FASTR )
940: SVA( q ) = AAQQ*DSQRT( MAX( ZERO,
941: $ ONE+T*APOAQ*AAPQ ) )
942: AAPP = AAPP*DSQRT( MAX( ZERO,
943: $ ONE-T*AQOAP*AAPQ ) )
944: MXSINJ = MAX( MXSINJ, DABS( T ) )
945: *
946: ELSE
947: *
948: * .. choose correct signum for THETA and rotate
949: *
950: THSIGN = -DSIGN( ONE, AAPQ )
951: T = ONE / ( THETA+THSIGN*
952: $ DSQRT( ONE+THETA*THETA ) )
953: CS = DSQRT( ONE / ( ONE+T*T ) )
954: SN = T*CS
955: *
956: MXSINJ = MAX( MXSINJ, DABS( SN ) )
957: SVA( q ) = AAQQ*DSQRT( MAX( ZERO,
958: $ ONE+T*APOAQ*AAPQ ) )
959: AAPP = AAPP*DSQRT( MAX( ZERO,
960: $ ONE-T*AQOAP*AAPQ ) )
961: *
962: APOAQ = WORK( p ) / WORK( q )
963: AQOAP = WORK( q ) / WORK( p )
964: IF( WORK( p ).GE.ONE ) THEN
965: IF( WORK( q ).GE.ONE ) THEN
966: FASTR( 3 ) = T*APOAQ
967: FASTR( 4 ) = -T*AQOAP
968: WORK( p ) = WORK( p )*CS
969: WORK( q ) = WORK( q )*CS
970: CALL DROTM( M, A( 1, p ), 1,
971: $ A( 1, q ), 1,
972: $ FASTR )
973: IF( RSVEC )CALL DROTM( MVL,
974: $ V( 1, p ), 1, V( 1, q ),
975: $ 1, FASTR )
976: ELSE
977: CALL DAXPY( M, -T*AQOAP,
978: $ A( 1, q ), 1,
979: $ A( 1, p ), 1 )
980: CALL DAXPY( M, CS*SN*APOAQ,
981: $ A( 1, p ), 1,
982: $ A( 1, q ), 1 )
983: WORK( p ) = WORK( p )*CS
984: WORK( q ) = WORK( q ) / CS
985: IF( RSVEC ) THEN
986: CALL DAXPY( MVL, -T*AQOAP,
987: $ V( 1, q ), 1,
988: $ V( 1, p ), 1 )
989: CALL DAXPY( MVL,
990: $ CS*SN*APOAQ,
991: $ V( 1, p ), 1,
992: $ V( 1, q ), 1 )
993: END IF
994: END IF
995: ELSE
996: IF( WORK( q ).GE.ONE ) THEN
997: CALL DAXPY( M, T*APOAQ,
998: $ A( 1, p ), 1,
999: $ A( 1, q ), 1 )
1000: CALL DAXPY( M, -CS*SN*AQOAP,
1001: $ A( 1, q ), 1,
1002: $ A( 1, p ), 1 )
1003: WORK( p ) = WORK( p ) / CS
1004: WORK( q ) = WORK( q )*CS
1005: IF( RSVEC ) THEN
1006: CALL DAXPY( MVL, T*APOAQ,
1007: $ V( 1, p ), 1,
1008: $ V( 1, q ), 1 )
1009: CALL DAXPY( MVL,
1010: $ -CS*SN*AQOAP,
1011: $ V( 1, q ), 1,
1012: $ V( 1, p ), 1 )
1013: END IF
1014: ELSE
1015: IF( WORK( p ).GE.WORK( q ) )
1016: $ THEN
1017: CALL DAXPY( M, -T*AQOAP,
1018: $ A( 1, q ), 1,
1019: $ A( 1, p ), 1 )
1020: CALL DAXPY( M, CS*SN*APOAQ,
1021: $ A( 1, p ), 1,
1022: $ A( 1, q ), 1 )
1023: WORK( p ) = WORK( p )*CS
1024: WORK( q ) = WORK( q ) / CS
1025: IF( RSVEC ) THEN
1026: CALL DAXPY( MVL,
1027: $ -T*AQOAP,
1028: $ V( 1, q ), 1,
1029: $ V( 1, p ), 1 )
1030: CALL DAXPY( MVL,
1031: $ CS*SN*APOAQ,
1032: $ V( 1, p ), 1,
1033: $ V( 1, q ), 1 )
1034: END IF
1035: ELSE
1036: CALL DAXPY( M, T*APOAQ,
1037: $ A( 1, p ), 1,
1038: $ A( 1, q ), 1 )
1039: CALL DAXPY( M,
1040: $ -CS*SN*AQOAP,
1041: $ A( 1, q ), 1,
1042: $ A( 1, p ), 1 )
1043: WORK( p ) = WORK( p ) / CS
1044: WORK( q ) = WORK( q )*CS
1045: IF( RSVEC ) THEN
1046: CALL DAXPY( MVL,
1047: $ T*APOAQ, V( 1, p ),
1048: $ 1, V( 1, q ), 1 )
1049: CALL DAXPY( MVL,
1050: $ -CS*SN*AQOAP,
1051: $ V( 1, q ), 1,
1052: $ V( 1, p ), 1 )
1053: END IF
1054: END IF
1055: END IF
1056: END IF
1057: END IF
1058: *
1059: ELSE
1060: * .. have to use modified Gram-Schmidt like transformation
1061: CALL DCOPY( M, A( 1, p ), 1,
1062: $ WORK( N+1 ), 1 )
1063: CALL DLASCL( 'G', 0, 0, AAPP, ONE, M,
1064: $ 1, WORK( N+1 ), LDA,
1065: $ IERR )
1066: CALL DLASCL( 'G', 0, 0, AAQQ, ONE, M,
1067: $ 1, A( 1, q ), LDA, IERR )
1068: TEMP1 = -AAPQ*WORK( p ) / WORK( q )
1069: CALL DAXPY( M, TEMP1, WORK( N+1 ), 1,
1070: $ A( 1, q ), 1 )
1071: CALL DLASCL( 'G', 0, 0, ONE, AAQQ, M,
1072: $ 1, A( 1, q ), LDA, IERR )
1073: SVA( q ) = AAQQ*DSQRT( MAX( ZERO,
1074: $ ONE-AAPQ*AAPQ ) )
1075: MXSINJ = MAX( MXSINJ, SFMIN )
1076: END IF
1077: * END IF ROTOK THEN ... ELSE
1078: *
1079: * In the case of cancellation in updating SVA(q), SVA(p)
1080: * recompute SVA(q), SVA(p).
1081: *
1082: IF( ( SVA( q ) / AAQQ )**2.LE.ROOTEPS )
1083: $ THEN
1084: IF( ( AAQQ.LT.ROOTBIG ) .AND.
1085: $ ( AAQQ.GT.ROOTSFMIN ) ) THEN
1086: SVA( q ) = DNRM2( M, A( 1, q ), 1 )*
1087: $ WORK( q )
1088: ELSE
1089: T = ZERO
1090: AAQQ = ONE
1091: CALL DLASSQ( M, A( 1, q ), 1, T,
1092: $ AAQQ )
1093: SVA( q ) = T*DSQRT( AAQQ )*WORK( q )
1094: END IF
1095: END IF
1096: IF( ( AAPP / AAPP0 ).LE.ROOTEPS ) THEN
1097: IF( ( AAPP.LT.ROOTBIG ) .AND.
1098: $ ( AAPP.GT.ROOTSFMIN ) ) THEN
1099: AAPP = DNRM2( M, A( 1, p ), 1 )*
1100: $ WORK( p )
1101: ELSE
1102: T = ZERO
1103: AAPP = ONE
1104: CALL DLASSQ( M, A( 1, p ), 1, T,
1105: $ AAPP )
1106: AAPP = T*DSQRT( AAPP )*WORK( p )
1107: END IF
1108: SVA( p ) = AAPP
1109: END IF
1110: *
1111: ELSE
1112: * A(:,p) and A(:,q) already numerically orthogonal
1113: IF( ir1.EQ.0 )NOTROT = NOTROT + 1
1114: *[RTD] SKIPPED = SKIPPED + 1
1115: PSKIPPED = PSKIPPED + 1
1116: END IF
1117: ELSE
1118: * A(:,q) is zero column
1119: IF( ir1.EQ.0 )NOTROT = NOTROT + 1
1120: PSKIPPED = PSKIPPED + 1
1121: END IF
1122: *
1123: IF( ( i.LE.SWBAND ) .AND.
1124: $ ( PSKIPPED.GT.ROWSKIP ) ) THEN
1125: IF( ir1.EQ.0 )AAPP = -AAPP
1126: NOTROT = 0
1127: GO TO 2103
1128: END IF
1129: *
1130: 2002 CONTINUE
1131: * END q-LOOP
1132: *
1133: 2103 CONTINUE
1134: * bailed out of q-loop
1135: *
1136: SVA( p ) = AAPP
1137: *
1138: ELSE
1139: SVA( p ) = AAPP
1140: IF( ( ir1.EQ.0 ) .AND. ( AAPP.EQ.ZERO ) )
1141: $ NOTROT = NOTROT + MIN( igl+KBL-1, N ) - p
1142: END IF
1143: *
1144: 2001 CONTINUE
1145: * end of the p-loop
1146: * end of doing the block ( ibr, ibr )
1147: 1002 CONTINUE
1148: * end of ir1-loop
1149: *
1150: * ... go to the off diagonal blocks
1151: *
1152: igl = ( ibr-1 )*KBL + 1
1153: *
1154: DO 2010 jbc = ibr + 1, NBL
1155: *
1156: jgl = ( jbc-1 )*KBL + 1
1157: *
1158: * doing the block at ( ibr, jbc )
1159: *
1160: IJBLSK = 0
1161: DO 2100 p = igl, MIN( igl+KBL-1, N )
1162: *
1163: AAPP = SVA( p )
1164: IF( AAPP.GT.ZERO ) THEN
1165: *
1166: PSKIPPED = 0
1167: *
1168: DO 2200 q = jgl, MIN( jgl+KBL-1, N )
1169: *
1170: AAQQ = SVA( q )
1171: IF( AAQQ.GT.ZERO ) THEN
1172: AAPP0 = AAPP
1173: *
1174: * .. M x 2 Jacobi SVD ..
1175: *
1176: * Safe Gram matrix computation
1177: *
1178: IF( AAQQ.GE.ONE ) THEN
1179: IF( AAPP.GE.AAQQ ) THEN
1180: ROTOK = ( SMALL*AAPP ).LE.AAQQ
1181: ELSE
1182: ROTOK = ( SMALL*AAQQ ).LE.AAPP
1183: END IF
1184: IF( AAPP.LT.( BIG / AAQQ ) ) THEN
1185: AAPQ = ( DDOT( M, A( 1, p ), 1, A( 1,
1186: $ q ), 1 )*WORK( p )*WORK( q ) /
1187: $ AAQQ ) / AAPP
1188: ELSE
1189: CALL DCOPY( M, A( 1, p ), 1,
1190: $ WORK( N+1 ), 1 )
1191: CALL DLASCL( 'G', 0, 0, AAPP,
1192: $ WORK( p ), M, 1,
1193: $ WORK( N+1 ), LDA, IERR )
1194: AAPQ = DDOT( M, WORK( N+1 ), 1,
1195: $ A( 1, q ), 1 )*WORK( q ) / AAQQ
1196: END IF
1197: ELSE
1198: IF( AAPP.GE.AAQQ ) THEN
1199: ROTOK = AAPP.LE.( AAQQ / SMALL )
1200: ELSE
1201: ROTOK = AAQQ.LE.( AAPP / SMALL )
1202: END IF
1203: IF( AAPP.GT.( SMALL / AAQQ ) ) THEN
1204: AAPQ = ( DDOT( M, A( 1, p ), 1, A( 1,
1205: $ q ), 1 )*WORK( p )*WORK( q ) /
1206: $ AAQQ ) / AAPP
1207: ELSE
1208: CALL DCOPY( M, A( 1, q ), 1,
1209: $ WORK( N+1 ), 1 )
1210: CALL DLASCL( 'G', 0, 0, AAQQ,
1211: $ WORK( q ), M, 1,
1212: $ WORK( N+1 ), LDA, IERR )
1213: AAPQ = DDOT( M, WORK( N+1 ), 1,
1214: $ A( 1, p ), 1 )*WORK( p ) / AAPP
1215: END IF
1216: END IF
1217: *
1218: MXAAPQ = MAX( MXAAPQ, DABS( AAPQ ) )
1219: *
1220: * TO rotate or NOT to rotate, THAT is the question ...
1221: *
1222: IF( DABS( AAPQ ).GT.TOL ) THEN
1223: NOTROT = 0
1224: *[RTD] ROTATED = ROTATED + 1
1225: PSKIPPED = 0
1226: ISWROT = ISWROT + 1
1227: *
1228: IF( ROTOK ) THEN
1229: *
1230: AQOAP = AAQQ / AAPP
1231: APOAQ = AAPP / AAQQ
1232: THETA = -HALF*DABS(AQOAP-APOAQ)/AAPQ
1233: IF( AAQQ.GT.AAPP0 )THETA = -THETA
1234: *
1235: IF( DABS( THETA ).GT.BIGTHETA ) THEN
1236: T = HALF / THETA
1237: FASTR( 3 ) = T*WORK( p ) / WORK( q )
1238: FASTR( 4 ) = -T*WORK( q ) /
1239: $ WORK( p )
1240: CALL DROTM( M, A( 1, p ), 1,
1241: $ A( 1, q ), 1, FASTR )
1242: IF( RSVEC )CALL DROTM( MVL,
1243: $ V( 1, p ), 1,
1244: $ V( 1, q ), 1,
1245: $ FASTR )
1246: SVA( q ) = AAQQ*DSQRT( MAX( ZERO,
1247: $ ONE+T*APOAQ*AAPQ ) )
1248: AAPP = AAPP*DSQRT( MAX( ZERO,
1249: $ ONE-T*AQOAP*AAPQ ) )
1250: MXSINJ = MAX( MXSINJ, DABS( T ) )
1251: ELSE
1252: *
1253: * .. choose correct signum for THETA and rotate
1254: *
1255: THSIGN = -DSIGN( ONE, AAPQ )
1256: IF( AAQQ.GT.AAPP0 )THSIGN = -THSIGN
1257: T = ONE / ( THETA+THSIGN*
1258: $ DSQRT( ONE+THETA*THETA ) )
1259: CS = DSQRT( ONE / ( ONE+T*T ) )
1260: SN = T*CS
1261: MXSINJ = MAX( MXSINJ, DABS( SN ) )
1262: SVA( q ) = AAQQ*DSQRT( MAX( ZERO,
1263: $ ONE+T*APOAQ*AAPQ ) )
1264: AAPP = AAPP*DSQRT( MAX( ZERO,
1265: $ ONE-T*AQOAP*AAPQ ) )
1266: *
1267: APOAQ = WORK( p ) / WORK( q )
1268: AQOAP = WORK( q ) / WORK( p )
1269: IF( WORK( p ).GE.ONE ) THEN
1270: *
1271: IF( WORK( q ).GE.ONE ) THEN
1272: FASTR( 3 ) = T*APOAQ
1273: FASTR( 4 ) = -T*AQOAP
1274: WORK( p ) = WORK( p )*CS
1275: WORK( q ) = WORK( q )*CS
1276: CALL DROTM( M, A( 1, p ), 1,
1277: $ A( 1, q ), 1,
1278: $ FASTR )
1279: IF( RSVEC )CALL DROTM( MVL,
1280: $ V( 1, p ), 1, V( 1, q ),
1281: $ 1, FASTR )
1282: ELSE
1283: CALL DAXPY( M, -T*AQOAP,
1284: $ A( 1, q ), 1,
1285: $ A( 1, p ), 1 )
1286: CALL DAXPY( M, CS*SN*APOAQ,
1287: $ A( 1, p ), 1,
1288: $ A( 1, q ), 1 )
1289: IF( RSVEC ) THEN
1290: CALL DAXPY( MVL, -T*AQOAP,
1291: $ V( 1, q ), 1,
1292: $ V( 1, p ), 1 )
1293: CALL DAXPY( MVL,
1294: $ CS*SN*APOAQ,
1295: $ V( 1, p ), 1,
1296: $ V( 1, q ), 1 )
1297: END IF
1298: WORK( p ) = WORK( p )*CS
1299: WORK( q ) = WORK( q ) / CS
1300: END IF
1301: ELSE
1302: IF( WORK( q ).GE.ONE ) THEN
1303: CALL DAXPY( M, T*APOAQ,
1304: $ A( 1, p ), 1,
1305: $ A( 1, q ), 1 )
1306: CALL DAXPY( M, -CS*SN*AQOAP,
1307: $ A( 1, q ), 1,
1308: $ A( 1, p ), 1 )
1309: IF( RSVEC ) THEN
1310: CALL DAXPY( MVL, T*APOAQ,
1311: $ V( 1, p ), 1,
1312: $ V( 1, q ), 1 )
1313: CALL DAXPY( MVL,
1314: $ -CS*SN*AQOAP,
1315: $ V( 1, q ), 1,
1316: $ V( 1, p ), 1 )
1317: END IF
1318: WORK( p ) = WORK( p ) / CS
1319: WORK( q ) = WORK( q )*CS
1320: ELSE
1321: IF( WORK( p ).GE.WORK( q ) )
1322: $ THEN
1323: CALL DAXPY( M, -T*AQOAP,
1324: $ A( 1, q ), 1,
1325: $ A( 1, p ), 1 )
1326: CALL DAXPY( M, CS*SN*APOAQ,
1327: $ A( 1, p ), 1,
1328: $ A( 1, q ), 1 )
1329: WORK( p ) = WORK( p )*CS
1330: WORK( q ) = WORK( q ) / CS
1331: IF( RSVEC ) THEN
1332: CALL DAXPY( MVL,
1333: $ -T*AQOAP,
1334: $ V( 1, q ), 1,
1335: $ V( 1, p ), 1 )
1336: CALL DAXPY( MVL,
1337: $ CS*SN*APOAQ,
1338: $ V( 1, p ), 1,
1339: $ V( 1, q ), 1 )
1340: END IF
1341: ELSE
1342: CALL DAXPY( M, T*APOAQ,
1343: $ A( 1, p ), 1,
1344: $ A( 1, q ), 1 )
1345: CALL DAXPY( M,
1346: $ -CS*SN*AQOAP,
1347: $ A( 1, q ), 1,
1348: $ A( 1, p ), 1 )
1349: WORK( p ) = WORK( p ) / CS
1350: WORK( q ) = WORK( q )*CS
1351: IF( RSVEC ) THEN
1352: CALL DAXPY( MVL,
1353: $ T*APOAQ, V( 1, p ),
1354: $ 1, V( 1, q ), 1 )
1355: CALL DAXPY( MVL,
1356: $ -CS*SN*AQOAP,
1357: $ V( 1, q ), 1,
1358: $ V( 1, p ), 1 )
1359: END IF
1360: END IF
1361: END IF
1362: END IF
1363: END IF
1364: *
1365: ELSE
1366: IF( AAPP.GT.AAQQ ) THEN
1367: CALL DCOPY( M, A( 1, p ), 1,
1368: $ WORK( N+1 ), 1 )
1369: CALL DLASCL( 'G', 0, 0, AAPP, ONE,
1370: $ M, 1, WORK( N+1 ), LDA,
1371: $ IERR )
1372: CALL DLASCL( 'G', 0, 0, AAQQ, ONE,
1373: $ M, 1, A( 1, q ), LDA,
1374: $ IERR )
1375: TEMP1 = -AAPQ*WORK( p ) / WORK( q )
1376: CALL DAXPY( M, TEMP1, WORK( N+1 ),
1377: $ 1, A( 1, q ), 1 )
1378: CALL DLASCL( 'G', 0, 0, ONE, AAQQ,
1379: $ M, 1, A( 1, q ), LDA,
1380: $ IERR )
1381: SVA( q ) = AAQQ*DSQRT( MAX( ZERO,
1382: $ ONE-AAPQ*AAPQ ) )
1383: MXSINJ = MAX( MXSINJ, SFMIN )
1384: ELSE
1385: CALL DCOPY( M, A( 1, q ), 1,
1386: $ WORK( N+1 ), 1 )
1387: CALL DLASCL( 'G', 0, 0, AAQQ, ONE,
1388: $ M, 1, WORK( N+1 ), LDA,
1389: $ IERR )
1390: CALL DLASCL( 'G', 0, 0, AAPP, ONE,
1391: $ M, 1, A( 1, p ), LDA,
1392: $ IERR )
1393: TEMP1 = -AAPQ*WORK( q ) / WORK( p )
1394: CALL DAXPY( M, TEMP1, WORK( N+1 ),
1395: $ 1, A( 1, p ), 1 )
1396: CALL DLASCL( 'G', 0, 0, ONE, AAPP,
1397: $ M, 1, A( 1, p ), LDA,
1398: $ IERR )
1399: SVA( p ) = AAPP*DSQRT( MAX( ZERO,
1400: $ ONE-AAPQ*AAPQ ) )
1401: MXSINJ = MAX( MXSINJ, SFMIN )
1402: END IF
1403: END IF
1404: * END IF ROTOK THEN ... ELSE
1405: *
1406: * In the case of cancellation in updating SVA(q)
1407: * .. recompute SVA(q)
1408: IF( ( SVA( q ) / AAQQ )**2.LE.ROOTEPS )
1409: $ THEN
1410: IF( ( AAQQ.LT.ROOTBIG ) .AND.
1411: $ ( AAQQ.GT.ROOTSFMIN ) ) THEN
1412: SVA( q ) = DNRM2( M, A( 1, q ), 1 )*
1413: $ WORK( q )
1414: ELSE
1415: T = ZERO
1416: AAQQ = ONE
1417: CALL DLASSQ( M, A( 1, q ), 1, T,
1418: $ AAQQ )
1419: SVA( q ) = T*DSQRT( AAQQ )*WORK( q )
1420: END IF
1421: END IF
1422: IF( ( AAPP / AAPP0 )**2.LE.ROOTEPS ) THEN
1423: IF( ( AAPP.LT.ROOTBIG ) .AND.
1424: $ ( AAPP.GT.ROOTSFMIN ) ) THEN
1425: AAPP = DNRM2( M, A( 1, p ), 1 )*
1426: $ WORK( p )
1427: ELSE
1428: T = ZERO
1429: AAPP = ONE
1430: CALL DLASSQ( M, A( 1, p ), 1, T,
1431: $ AAPP )
1432: AAPP = T*DSQRT( AAPP )*WORK( p )
1433: END IF
1434: SVA( p ) = AAPP
1435: END IF
1436: * end of OK rotation
1437: ELSE
1438: NOTROT = NOTROT + 1
1439: *[RTD] SKIPPED = SKIPPED + 1
1440: PSKIPPED = PSKIPPED + 1
1441: IJBLSK = IJBLSK + 1
1442: END IF
1443: ELSE
1444: NOTROT = NOTROT + 1
1445: PSKIPPED = PSKIPPED + 1
1446: IJBLSK = IJBLSK + 1
1447: END IF
1448: *
1449: IF( ( i.LE.SWBAND ) .AND. ( IJBLSK.GE.BLSKIP ) )
1450: $ THEN
1451: SVA( p ) = AAPP
1452: NOTROT = 0
1453: GO TO 2011
1454: END IF
1455: IF( ( i.LE.SWBAND ) .AND.
1456: $ ( PSKIPPED.GT.ROWSKIP ) ) THEN
1457: AAPP = -AAPP
1458: NOTROT = 0
1459: GO TO 2203
1460: END IF
1461: *
1462: 2200 CONTINUE
1463: * end of the q-loop
1464: 2203 CONTINUE
1465: *
1466: SVA( p ) = AAPP
1467: *
1468: ELSE
1469: *
1470: IF( AAPP.EQ.ZERO )NOTROT = NOTROT +
1471: $ MIN( jgl+KBL-1, N ) - jgl + 1
1472: IF( AAPP.LT.ZERO )NOTROT = 0
1473: *
1474: END IF
1475: *
1476: 2100 CONTINUE
1477: * end of the p-loop
1478: 2010 CONTINUE
1479: * end of the jbc-loop
1480: 2011 CONTINUE
1481: *2011 bailed out of the jbc-loop
1482: DO 2012 p = igl, MIN( igl+KBL-1, N )
1483: SVA( p ) = DABS( SVA( p ) )
1484: 2012 CONTINUE
1485: ***
1486: 2000 CONTINUE
1487: *2000 :: end of the ibr-loop
1488: *
1489: * .. update SVA(N)
1490: IF( ( SVA( N ).LT.ROOTBIG ) .AND. ( SVA( N ).GT.ROOTSFMIN ) )
1491: $ THEN
1492: SVA( N ) = DNRM2( M, A( 1, N ), 1 )*WORK( N )
1493: ELSE
1494: T = ZERO
1495: AAPP = ONE
1496: CALL DLASSQ( M, A( 1, N ), 1, T, AAPP )
1497: SVA( N ) = T*DSQRT( AAPP )*WORK( N )
1498: END IF
1499: *
1500: * Additional steering devices
1501: *
1502: IF( ( i.LT.SWBAND ) .AND. ( ( MXAAPQ.LE.ROOTTOL ) .OR.
1503: $ ( ISWROT.LE.N ) ) )SWBAND = i
1504: *
1505: IF( ( i.GT.SWBAND+1 ) .AND. ( MXAAPQ.LT.DSQRT( DBLE( N ) )*
1506: $ TOL ) .AND. ( DBLE( N )*MXAAPQ*MXSINJ.LT.TOL ) ) THEN
1507: GO TO 1994
1508: END IF
1509: *
1510: IF( NOTROT.GE.EMPTSW )GO TO 1994
1511: *
1512: 1993 CONTINUE
1513: * end i=1:NSWEEP loop
1514: *
1515: * #:( Reaching this point means that the procedure has not converged.
1516: INFO = NSWEEP - 1
1517: GO TO 1995
1518: *
1519: 1994 CONTINUE
1520: * #:) Reaching this point means numerical convergence after the i-th
1521: * sweep.
1522: *
1523: INFO = 0
1524: * #:) INFO = 0 confirms successful iterations.
1525: 1995 CONTINUE
1526: *
1527: * Sort the singular values and find how many are above
1528: * the underflow threshold.
1529: *
1530: N2 = 0
1531: N4 = 0
1532: DO 5991 p = 1, N - 1
1533: q = IDAMAX( N-p+1, SVA( p ), 1 ) + p - 1
1534: IF( p.NE.q ) THEN
1535: TEMP1 = SVA( p )
1536: SVA( p ) = SVA( q )
1537: SVA( q ) = TEMP1
1538: TEMP1 = WORK( p )
1539: WORK( p ) = WORK( q )
1540: WORK( q ) = TEMP1
1541: CALL DSWAP( M, A( 1, p ), 1, A( 1, q ), 1 )
1542: IF( RSVEC )CALL DSWAP( MVL, V( 1, p ), 1, V( 1, q ), 1 )
1543: END IF
1544: IF( SVA( p ).NE.ZERO ) THEN
1545: N4 = N4 + 1
1546: IF( SVA( p )*SKL.GT.SFMIN )N2 = N2 + 1
1547: END IF
1548: 5991 CONTINUE
1549: IF( SVA( N ).NE.ZERO ) THEN
1550: N4 = N4 + 1
1551: IF( SVA( N )*SKL.GT.SFMIN )N2 = N2 + 1
1552: END IF
1553: *
1554: * Normalize the left singular vectors.
1555: *
1556: IF( LSVEC .OR. UCTOL ) THEN
1557: DO 1998 p = 1, N2
1558: CALL DSCAL( M, WORK( p ) / SVA( p ), A( 1, p ), 1 )
1559: 1998 CONTINUE
1560: END IF
1561: *
1562: * Scale the product of Jacobi rotations (assemble the fast rotations).
1563: *
1564: IF( RSVEC ) THEN
1565: IF( APPLV ) THEN
1566: DO 2398 p = 1, N
1567: CALL DSCAL( MVL, WORK( p ), V( 1, p ), 1 )
1568: 2398 CONTINUE
1569: ELSE
1570: DO 2399 p = 1, N
1571: TEMP1 = ONE / DNRM2( MVL, V( 1, p ), 1 )
1572: CALL DSCAL( MVL, TEMP1, V( 1, p ), 1 )
1573: 2399 CONTINUE
1574: END IF
1575: END IF
1576: *
1577: * Undo scaling, if necessary (and possible).
1578: IF( ( ( SKL.GT.ONE ) .AND. ( SVA( 1 ).LT.( BIG / SKL) ) )
1579: $ .OR. ( ( SKL.LT.ONE ) .AND. ( SVA( MAX( N2, 1 ) ) .GT.
1580: $ ( SFMIN / SKL) ) ) ) THEN
1581: DO 2400 p = 1, N
1582: SVA( P ) = SKL*SVA( P )
1583: 2400 CONTINUE
1584: SKL= ONE
1585: END IF
1586: *
1587: WORK( 1 ) = SKL
1588: * The singular values of A are SKL*SVA(1:N). If SKL.NE.ONE
1589: * then some of the singular values may overflow or underflow and
1590: * the spectrum is given in this factored representation.
1591: *
1592: WORK( 2 ) = DBLE( N4 )
1593: * N4 is the number of computed nonzero singular values of A.
1594: *
1595: WORK( 3 ) = DBLE( N2 )
1596: * N2 is the number of singular values of A greater than SFMIN.
1597: * If N2<N, SVA(N2:N) contains ZEROS and/or denormalized numbers
1598: * that may carry some information.
1599: *
1600: WORK( 4 ) = DBLE( i )
1601: * i is the index of the last sweep before declaring convergence.
1602: *
1603: WORK( 5 ) = MXAAPQ
1604: * MXAAPQ is the largest absolute value of scaled pivots in the
1605: * last sweep
1606: *
1607: WORK( 6 ) = MXSINJ
1608: * MXSINJ is the largest absolute value of the sines of Jacobi angles
1609: * in the last sweep
1610: *
1611: RETURN
1612: * ..
1613: * .. END OF DGESVJ
1614: * ..
1615: END
CVSweb interface <joel.bertrand@systella.fr>