1: SUBROUTINE DGEJSV( JOBA, JOBU, JOBV, JOBR, JOBT, JOBP,
2: & M, N, A, LDA, SVA, U, LDU, V, LDV,
3: & WORK, LWORK, IWORK, INFO )
4: *
5: * -- LAPACK routine (version 3.3.0) --
6: *
7: * -- Contributed by Zlatko Drmac of the University of Zagreb and --
8: * -- Kresimir Veselic of the Fernuniversitaet Hagen --
9: * November 2010
10: *
11: * -- LAPACK is a software package provided by Univ. of Tennessee, --
12: * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
13: *
14: * This routine is also part of SIGMA (version 1.23, October 23. 2008.)
15: * SIGMA is a library of algorithms for highly accurate algorithms for
16: * computation of SVD, PSVD, QSVD, (H,K)-SVD, and for solution of the
17: * eigenvalue problems Hx = lambda M x, H M x = lambda x with H, M > 0.
18: *
19: * .. Scalar Arguments ..
20: IMPLICIT NONE
21: INTEGER INFO, LDA, LDU, LDV, LWORK, M, N
22: * ..
23: * .. Array Arguments ..
24: DOUBLE PRECISION A( LDA, * ), SVA( N ), U( LDU, * ), V( LDV, * ),
25: & WORK( LWORK )
26: INTEGER IWORK( * )
27: CHARACTER*1 JOBA, JOBP, JOBR, JOBT, JOBU, JOBV
28: * ..
29: *
30: * Purpose
31: * =======
32: *
33: * DGEJSV computes the singular value decomposition (SVD) of a real M-by-N
34: * matrix [A], where M >= N. The SVD of [A] is written as
35: *
36: * [A] = [U] * [SIGMA] * [V]^t,
37: *
38: * where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its N
39: * diagonal elements, [U] is an M-by-N (or M-by-M) orthonormal matrix, and
40: * [V] is an N-by-N orthogonal matrix. The diagonal elements of [SIGMA] are
41: * the singular values of [A]. The columns of [U] and [V] are the left and
42: * the right singular vectors of [A], respectively. The matrices [U] and [V]
43: * are computed and stored in the arrays U and V, respectively. The diagonal
44: * of [SIGMA] is computed and stored in the array SVA.
45: *
46: * Arguments
47: * =========
48: *
49: * JOBA (input) CHARACTER*1
50: * Specifies the level of accuracy:
51: * = 'C': This option works well (high relative accuracy) if A = B * D,
52: * with well-conditioned B and arbitrary diagonal matrix D.
53: * The accuracy cannot be spoiled by COLUMN scaling. The
54: * accuracy of the computed output depends on the condition of
55: * B, and the procedure aims at the best theoretical accuracy.
56: * The relative error max_{i=1:N}|d sigma_i| / sigma_i is
57: * bounded by f(M,N)*epsilon* cond(B), independent of D.
58: * The input matrix is preprocessed with the QRF with column
59: * pivoting. This initial preprocessing and preconditioning by
60: * a rank revealing QR factorization is common for all values of
61: * JOBA. Additional actions are specified as follows:
62: * = 'E': Computation as with 'C' with an additional estimate of the
63: * condition number of B. It provides a realistic error bound.
64: * = 'F': If A = D1 * C * D2 with ill-conditioned diagonal scalings
65: * D1, D2, and well-conditioned matrix C, this option gives
66: * higher accuracy than the 'C' option. If the structure of the
67: * input matrix is not known, and relative accuracy is
68: * desirable, then this option is advisable. The input matrix A
69: * is preprocessed with QR factorization with FULL (row and
70: * column) pivoting.
71: * = 'G' Computation as with 'F' with an additional estimate of the
72: * condition number of B, where A=D*B. If A has heavily weighted
73: * rows, then using this condition number gives too pessimistic
74: * error bound.
75: * = 'A': Small singular values are the noise and the matrix is treated
76: * as numerically rank defficient. The error in the computed
77: * singular values is bounded by f(m,n)*epsilon*||A||.
78: * The computed SVD A = U * S * V^t restores A up to
79: * f(m,n)*epsilon*||A||.
80: * This gives the procedure the licence to discard (set to zero)
81: * all singular values below N*epsilon*||A||.
82: * = 'R': Similar as in 'A'. Rank revealing property of the initial
83: * QR factorization is used do reveal (using triangular factor)
84: * a gap sigma_{r+1} < epsilon * sigma_r in which case the
85: * numerical RANK is declared to be r. The SVD is computed with
86: * absolute error bounds, but more accurately than with 'A'.
87: *
88: * JOBU (input) CHARACTER*1
89: * Specifies whether to compute the columns of U:
90: * = 'U': N columns of U are returned in the array U.
91: * = 'F': full set of M left sing. vectors is returned in the array U.
92: * = 'W': U may be used as workspace of length M*N. See the description
93: * of U.
94: * = 'N': U is not computed.
95: *
96: * JOBV (input) CHARACTER*1
97: * Specifies whether to compute the matrix V:
98: * = 'V': N columns of V are returned in the array V; Jacobi rotations
99: * are not explicitly accumulated.
100: * = 'J': N columns of V are returned in the array V, but they are
101: * computed as the product of Jacobi rotations. This option is
102: * allowed only if JOBU .NE. 'N', i.e. in computing the full SVD.
103: * = 'W': V may be used as workspace of length N*N. See the description
104: * of V.
105: * = 'N': V is not computed.
106: *
107: * JOBR (input) CHARACTER*1
108: * Specifies the RANGE for the singular values. Issues the licence to
109: * set to zero small positive singular values if they are outside
110: * specified range. If A .NE. 0 is scaled so that the largest singular
111: * value of c*A is around DSQRT(BIG), BIG=SLAMCH('O'), then JOBR issues
112: * the licence to kill columns of A whose norm in c*A is less than
113: * DSQRT(SFMIN) (for JOBR.EQ.'R'), or less than SMALL=SFMIN/EPSLN,
114: * where SFMIN=SLAMCH('S'), EPSLN=SLAMCH('E').
115: * = 'N': Do not kill small columns of c*A. This option assumes that
116: * BLAS and QR factorizations and triangular solvers are
117: * implemented to work in that range. If the condition of A
118: * is greater than BIG, use DGESVJ.
119: * = 'R': RESTRICTED range for sigma(c*A) is [DSQRT(SFMIN), DSQRT(BIG)]
120: * (roughly, as described above). This option is recommended.
121: * ~~~~~~~~~~~~~~~~~~~~~~~~~~~
122: * For computing the singular values in the FULL range [SFMIN,BIG]
123: * use DGESVJ.
124: *
125: * JOBT (input) CHARACTER*1
126: * If the matrix is square then the procedure may determine to use
127: * transposed A if A^t seems to be better with respect to convergence.
128: * If the matrix is not square, JOBT is ignored. This is subject to
129: * changes in the future.
130: * The decision is based on two values of entropy over the adjoint
131: * orbit of A^t * A. See the descriptions of WORK(6) and WORK(7).
132: * = 'T': transpose if entropy test indicates possibly faster
133: * convergence of Jacobi process if A^t is taken as input. If A is
134: * replaced with A^t, then the row pivoting is included automatically.
135: * = 'N': do not speculate.
136: * This option can be used to compute only the singular values, or the
137: * full SVD (U, SIGMA and V). For only one set of singular vectors
138: * (U or V), the caller should provide both U and V, as one of the
139: * matrices is used as workspace if the matrix A is transposed.
140: * The implementer can easily remove this constraint and make the
141: * code more complicated. See the descriptions of U and V.
142: *
143: * JOBP (input) CHARACTER*1
144: * Issues the licence to introduce structured perturbations to drown
145: * denormalized numbers. This licence should be active if the
146: * denormals are poorly implemented, causing slow computation,
147: * especially in cases of fast convergence (!). For details see [1,2].
148: * For the sake of simplicity, this perturbations are included only
149: * when the full SVD or only the singular values are requested. The
150: * implementer/user can easily add the perturbation for the cases of
151: * computing one set of singular vectors.
152: * = 'P': introduce perturbation
153: * = 'N': do not perturb
154: *
155: * M (input) INTEGER
156: * The number of rows of the input matrix A. M >= 0.
157: *
158: * N (input) INTEGER
159: * The number of columns of the input matrix A. M >= N >= 0.
160: *
161: * A (input/workspace) DOUBLE PRECISION array, dimension (LDA,N)
162: * On entry, the M-by-N matrix A.
163: *
164: * LDA (input) INTEGER
165: * The leading dimension of the array A. LDA >= max(1,M).
166: *
167: * SVA (workspace/output) DOUBLE PRECISION array, dimension (N)
168: * On exit,
169: * - For WORK(1)/WORK(2) = ONE: The singular values of A. During the
170: * computation SVA contains Euclidean column norms of the
171: * iterated matrices in the array A.
172: * - For WORK(1) .NE. WORK(2): The singular values of A are
173: * (WORK(1)/WORK(2)) * SVA(1:N). This factored form is used if
174: * sigma_max(A) overflows or if small singular values have been
175: * saved from underflow by scaling the input matrix A.
176: * - If JOBR='R' then some of the singular values may be returned
177: * as exact zeros obtained by "set to zero" because they are
178: * below the numerical rank threshold or are denormalized numbers.
179: *
180: * U (workspace/output) DOUBLE PRECISION array, dimension ( LDU, N )
181: * If JOBU = 'U', then U contains on exit the M-by-N matrix of
182: * the left singular vectors.
183: * If JOBU = 'F', then U contains on exit the M-by-M matrix of
184: * the left singular vectors, including an ONB
185: * of the orthogonal complement of the Range(A).
186: * If JOBU = 'W' .AND. (JOBV.EQ.'V' .AND. JOBT.EQ.'T' .AND. M.EQ.N),
187: * then U is used as workspace if the procedure
188: * replaces A with A^t. In that case, [V] is computed
189: * in U as left singular vectors of A^t and then
190: * copied back to the V array. This 'W' option is just
191: * a reminder to the caller that in this case U is
192: * reserved as workspace of length N*N.
193: * If JOBU = 'N' U is not referenced.
194: *
195: * LDU (input) INTEGER
196: * The leading dimension of the array U, LDU >= 1.
197: * IF JOBU = 'U' or 'F' or 'W', then LDU >= M.
198: *
199: * V (workspace/output) DOUBLE PRECISION array, dimension ( LDV, N )
200: * If JOBV = 'V', 'J' then V contains on exit the N-by-N matrix of
201: * the right singular vectors;
202: * If JOBV = 'W', AND (JOBU.EQ.'U' AND JOBT.EQ.'T' AND M.EQ.N),
203: * then V is used as workspace if the pprocedure
204: * replaces A with A^t. In that case, [U] is computed
205: * in V as right singular vectors of A^t and then
206: * copied back to the U array. This 'W' option is just
207: * a reminder to the caller that in this case V is
208: * reserved as workspace of length N*N.
209: * If JOBV = 'N' V is not referenced.
210: *
211: * LDV (input) INTEGER
212: * The leading dimension of the array V, LDV >= 1.
213: * If JOBV = 'V' or 'J' or 'W', then LDV >= N.
214: *
215: * WORK (workspace/output) DOUBLE PRECISION array, dimension at least LWORK.
216: * On exit,
217: * WORK(1) = SCALE = WORK(2) / WORK(1) is the scaling factor such
218: * that SCALE*SVA(1:N) are the computed singular values
219: * of A. (See the description of SVA().)
220: * WORK(2) = See the description of WORK(1).
221: * WORK(3) = SCONDA is an estimate for the condition number of
222: * column equilibrated A. (If JOBA .EQ. 'E' or 'G')
223: * SCONDA is an estimate of DSQRT(||(R^t * R)^(-1)||_1).
224: * It is computed using DPOCON. It holds
225: * N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
226: * where R is the triangular factor from the QRF of A.
227: * However, if R is truncated and the numerical rank is
228: * determined to be strictly smaller than N, SCONDA is
229: * returned as -1, thus indicating that the smallest
230: * singular values might be lost.
231: *
232: * If full SVD is needed, the following two condition numbers are
233: * useful for the analysis of the algorithm. They are provied for
234: * a developer/implementer who is familiar with the details of
235: * the method.
236: *
237: * WORK(4) = an estimate of the scaled condition number of the
238: * triangular factor in the first QR factorization.
239: * WORK(5) = an estimate of the scaled condition number of the
240: * triangular factor in the second QR factorization.
241: * The following two parameters are computed if JOBT .EQ. 'T'.
242: * They are provided for a developer/implementer who is familiar
243: * with the details of the method.
244: *
245: * WORK(6) = the entropy of A^t*A :: this is the Shannon entropy
246: * of diag(A^t*A) / Trace(A^t*A) taken as point in the
247: * probability simplex.
248: * WORK(7) = the entropy of A*A^t.
249: *
250: * LWORK (input) INTEGER
251: * Length of WORK to confirm proper allocation of work space.
252: * LWORK depends on the job:
253: *
254: * If only SIGMA is needed ( JOBU.EQ.'N', JOBV.EQ.'N' ) and
255: * -> .. no scaled condition estimate required ( JOBE.EQ.'N'):
256: * LWORK >= max(2*M+N,4*N+1,7). This is the minimal requirement.
257: * For optimal performance (blocked code) the optimal value
258: * is LWORK >= max(2*M+N,3*N+(N+1)*NB,7). Here NB is the optimal
259: * block size for xGEQP3/xGEQRF.
260: * -> .. an estimate of the scaled condition number of A is
261: * required (JOBA='E', 'G'). In this case, LWORK is the maximum
262: * of the above and N*N+4*N, i.e. LWORK >= max(2*M+N,N*N+4N,7).
263: *
264: * If SIGMA and the right singular vectors are needed (JOBV.EQ.'V'),
265: * -> the minimal requirement is LWORK >= max(2*N+M,7).
266: * -> For optimal performance, LWORK >= max(2*N+M,2*N+N*NB,7),
267: * where NB is the optimal block size.
268: *
269: * If SIGMA and the left singular vectors are needed
270: * -> the minimal requirement is LWORK >= max(2*N+M,7).
271: * -> For optimal performance, LWORK >= max(2*N+M,2*N+N*NB,7),
272: * where NB is the optimal block size.
273: *
274: * If full SVD is needed ( JOBU.EQ.'U' or 'F', JOBV.EQ.'V' ) and
275: * -> .. the singular vectors are computed without explicit
276: * accumulation of the Jacobi rotations, LWORK >= 6*N+2*N*N
277: * -> .. in the iterative part, the Jacobi rotations are
278: * explicitly accumulated (option, see the description of JOBV),
279: * then the minimal requirement is LWORK >= max(M+3*N+N*N,7).
280: * For better performance, if NB is the optimal block size,
281: * LWORK >= max(3*N+N*N+M,3*N+N*N+N*NB,7).
282: *
283: * IWORK (workspace/output) INTEGER array, dimension M+3*N.
284: * On exit,
285: * IWORK(1) = the numerical rank determined after the initial
286: * QR factorization with pivoting. See the descriptions
287: * of JOBA and JOBR.
288: * IWORK(2) = the number of the computed nonzero singular values
289: * IWORK(3) = if nonzero, a warning message:
290: * If IWORK(3).EQ.1 then some of the column norms of A
291: * were denormalized floats. The requested high accuracy
292: * is not warranted by the data.
293: *
294: * INFO (output) INTEGER
295: * < 0 : if INFO = -i, then the i-th argument had an illegal value.
296: * = 0 : successfull exit;
297: * > 0 : DGEJSV did not converge in the maximal allowed number
298: * of sweeps. The computed values may be inaccurate.
299: *
300: * Further Details
301: * ===============
302: *
303: * DGEJSV implements a preconditioned Jacobi SVD algorithm. It uses SGEQP3,
304: * SGEQRF, and SGELQF as preprocessors and preconditioners. Optionally, an
305: * additional row pivoting can be used as a preprocessor, which in some
306: * cases results in much higher accuracy. An example is matrix A with the
307: * structure A = D1 * C * D2, where D1, D2 are arbitrarily ill-conditioned
308: * diagonal matrices and C is well-conditioned matrix. In that case, complete
309: * pivoting in the first QR factorizations provides accuracy dependent on the
310: * condition number of C, and independent of D1, D2. Such higher accuracy is
311: * not completely understood theoretically, but it works well in practice.
312: * Further, if A can be written as A = B*D, with well-conditioned B and some
313: * diagonal D, then the high accuracy is guaranteed, both theoretically and
314: * in software, independent of D. For more details see [1], [2].
315: * The computational range for the singular values can be the full range
316: * ( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic and the BLAS
317: * & LAPACK routines called by DGEJSV are implemented to work in that range.
318: * If that is not the case, then the restriction for safe computation with
319: * the singular values in the range of normalized IEEE numbers is that the
320: * spectral condition number kappa(A)=sigma_max(A)/sigma_min(A) does not
321: * overflow. This code (DGEJSV) is best used in this restricted range,
322: * meaning that singular values of magnitude below ||A||_2 / SLAMCH('O') are
323: * returned as zeros. See JOBR for details on this.
324: * Further, this implementation is somewhat slower than the one described
325: * in [1,2] due to replacement of some non-LAPACK components, and because
326: * the choice of some tuning parameters in the iterative part (DGESVJ) is
327: * left to the implementer on a particular machine.
328: * The rank revealing QR factorization (in this code: SGEQP3) should be
329: * implemented as in [3]. We have a new version of SGEQP3 under development
330: * that is more robust than the current one in LAPACK, with a cleaner cut in
331: * rank defficient cases. It will be available in the SIGMA library [4].
332: * If M is much larger than N, it is obvious that the inital QRF with
333: * column pivoting can be preprocessed by the QRF without pivoting. That
334: * well known trick is not used in DGEJSV because in some cases heavy row
335: * weighting can be treated with complete pivoting. The overhead in cases
336: * M much larger than N is then only due to pivoting, but the benefits in
337: * terms of accuracy have prevailed. The implementer/user can incorporate
338: * this extra QRF step easily. The implementer can also improve data movement
339: * (matrix transpose, matrix copy, matrix transposed copy) - this
340: * implementation of DGEJSV uses only the simplest, naive data movement.
341: *
342: * Contributors
343: *
344: * Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
345: *
346: * References
347: *
348: * [1] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
349: * SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
350: * LAPACK Working note 169.
351: * [2] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
352: * SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
353: * LAPACK Working note 170.
354: * [3] Z. Drmac and Z. Bujanovic: On the failure of rank-revealing QR
355: * factorization software - a case study.
356: * ACM Trans. Math. Softw. Vol. 35, No 2 (2008), pp. 1-28.
357: * LAPACK Working note 176.
358: * [4] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
359: * QSVD, (H,K)-SVD computations.
360: * Department of Mathematics, University of Zagreb, 2008.
361: *
362: * Bugs, examples and comments
363: *
364: * Please report all bugs and send interesting examples and/or comments to
365: * drmac@math.hr. Thank you.
366: *
367: * ==========================================================================
368: *
369: * .. Local Parameters ..
370: DOUBLE PRECISION ZERO, ONE
371: PARAMETER ( ZERO = 0.0D0, ONE = 1.0D0 )
372: * ..
373: * .. Local Scalars ..
374: DOUBLE PRECISION AAPP, AAQQ, AATMAX, AATMIN, BIG, BIG1, COND_OK,
375: & CONDR1, CONDR2, ENTRA, ENTRAT, EPSLN, MAXPRJ, SCALEM,
376: & SCONDA, SFMIN, SMALL, TEMP1, USCAL1, USCAL2, XSC
377: INTEGER IERR, N1, NR, NUMRANK, p, q, WARNING
378: LOGICAL ALMORT, DEFR, ERREST, GOSCAL, JRACC, KILL, LSVEC,
379: & L2ABER, L2KILL, L2PERT, L2RANK, L2TRAN,
380: & NOSCAL, ROWPIV, RSVEC, TRANSP
381: * ..
382: * .. Intrinsic Functions ..
383: INTRINSIC DABS, DLOG, DMAX1, DMIN1, DBLE,
384: & MAX0, MIN0, IDNINT, DSIGN, DSQRT
385: * ..
386: * .. External Functions ..
387: DOUBLE PRECISION DLAMCH, DNRM2
388: INTEGER IDAMAX
389: LOGICAL LSAME
390: EXTERNAL IDAMAX, LSAME, DLAMCH, DNRM2
391: * ..
392: * .. External Subroutines ..
393: EXTERNAL DCOPY, DGELQF, DGEQP3, DGEQRF, DLACPY, DLASCL,
394: & DLASET, DLASSQ, DLASWP, DORGQR, DORMLQ,
395: & DORMQR, DPOCON, DSCAL, DSWAP, DTRSM, XERBLA
396: *
397: EXTERNAL DGESVJ
398: * ..
399: *
400: * Test the input arguments
401: *
402: LSVEC = LSAME( JOBU, 'U' ) .OR. LSAME( JOBU, 'F' )
403: JRACC = LSAME( JOBV, 'J' )
404: RSVEC = LSAME( JOBV, 'V' ) .OR. JRACC
405: ROWPIV = LSAME( JOBA, 'F' ) .OR. LSAME( JOBA, 'G' )
406: L2RANK = LSAME( JOBA, 'R' )
407: L2ABER = LSAME( JOBA, 'A' )
408: ERREST = LSAME( JOBA, 'E' ) .OR. LSAME( JOBA, 'G' )
409: L2TRAN = LSAME( JOBT, 'T' )
410: L2KILL = LSAME( JOBR, 'R' )
411: DEFR = LSAME( JOBR, 'N' )
412: L2PERT = LSAME( JOBP, 'P' )
413: *
414: IF ( .NOT.(ROWPIV .OR. L2RANK .OR. L2ABER .OR.
415: & ERREST .OR. LSAME( JOBA, 'C' ) )) THEN
416: INFO = - 1
417: ELSE IF ( .NOT.( LSVEC .OR. LSAME( JOBU, 'N' ) .OR.
418: & LSAME( JOBU, 'W' )) ) THEN
419: INFO = - 2
420: ELSE IF ( .NOT.( RSVEC .OR. LSAME( JOBV, 'N' ) .OR.
421: & LSAME( JOBV, 'W' )) .OR. ( JRACC .AND. (.NOT.LSVEC) ) ) THEN
422: INFO = - 3
423: ELSE IF ( .NOT. ( L2KILL .OR. DEFR ) ) THEN
424: INFO = - 4
425: ELSE IF ( .NOT. ( L2TRAN .OR. LSAME( JOBT, 'N' ) ) ) THEN
426: INFO = - 5
427: ELSE IF ( .NOT. ( L2PERT .OR. LSAME( JOBP, 'N' ) ) ) THEN
428: INFO = - 6
429: ELSE IF ( M .LT. 0 ) THEN
430: INFO = - 7
431: ELSE IF ( ( N .LT. 0 ) .OR. ( N .GT. M ) ) THEN
432: INFO = - 8
433: ELSE IF ( LDA .LT. M ) THEN
434: INFO = - 10
435: ELSE IF ( LSVEC .AND. ( LDU .LT. M ) ) THEN
436: INFO = - 13
437: ELSE IF ( RSVEC .AND. ( LDV .LT. N ) ) THEN
438: INFO = - 14
439: ELSE IF ( (.NOT.(LSVEC .OR. RSVEC .OR. ERREST).AND.
440: & (LWORK .LT. MAX0(7,4*N+1,2*M+N))) .OR.
441: & (.NOT.(LSVEC .OR. LSVEC) .AND. ERREST .AND.
442: & (LWORK .LT. MAX0(7,4*N+N*N,2*M+N))) .OR.
443: & (LSVEC .AND. (.NOT.RSVEC) .AND. (LWORK .LT. MAX0(7,2*N+M))) .OR.
444: & (RSVEC .AND. (.NOT.LSVEC) .AND. (LWORK .LT. MAX0(7,2*N+M))) .OR.
445: & (LSVEC .AND. RSVEC .AND. .NOT.JRACC .AND. (LWORK.LT.6*N+2*N*N))
446: & .OR. (LSVEC.AND.RSVEC.AND.JRACC.AND.LWORK.LT.MAX0(7,M+3*N+N*N)))
447: & THEN
448: INFO = - 17
449: ELSE
450: * #:)
451: INFO = 0
452: END IF
453: *
454: IF ( INFO .NE. 0 ) THEN
455: * #:(
456: CALL XERBLA( 'DGEJSV', - INFO )
457: END IF
458: *
459: * Quick return for void matrix (Y3K safe)
460: * #:)
461: IF ( ( M .EQ. 0 ) .OR. ( N .EQ. 0 ) ) RETURN
462: *
463: * Determine whether the matrix U should be M x N or M x M
464: *
465: IF ( LSVEC ) THEN
466: N1 = N
467: IF ( LSAME( JOBU, 'F' ) ) N1 = M
468: END IF
469: *
470: * Set numerical parameters
471: *
472: *! NOTE: Make sure DLAMCH() does not fail on the target architecture.
473: *
474:
475: EPSLN = DLAMCH('Epsilon')
476: SFMIN = DLAMCH('SafeMinimum')
477: SMALL = SFMIN / EPSLN
478: BIG = DLAMCH('O')
479: * BIG = ONE / SFMIN
480: *
481: * Initialize SVA(1:N) = diag( ||A e_i||_2 )_1^N
482: *
483: *(!) If necessary, scale SVA() to protect the largest norm from
484: * overflow. It is possible that this scaling pushes the smallest
485: * column norm left from the underflow threshold (extreme case).
486: *
487: SCALEM = ONE / DSQRT(DBLE(M)*DBLE(N))
488: NOSCAL = .TRUE.
489: GOSCAL = .TRUE.
490: DO 1874 p = 1, N
491: AAPP = ZERO
492: AAQQ = ONE
493: CALL DLASSQ( M, A(1,p), 1, AAPP, AAQQ )
494: IF ( AAPP .GT. BIG ) THEN
495: INFO = - 9
496: CALL XERBLA( 'DGEJSV', -INFO )
497: RETURN
498: END IF
499: AAQQ = DSQRT(AAQQ)
500: IF ( ( AAPP .LT. (BIG / AAQQ) ) .AND. NOSCAL ) THEN
501: SVA(p) = AAPP * AAQQ
502: ELSE
503: NOSCAL = .FALSE.
504: SVA(p) = AAPP * ( AAQQ * SCALEM )
505: IF ( GOSCAL ) THEN
506: GOSCAL = .FALSE.
507: CALL DSCAL( p-1, SCALEM, SVA, 1 )
508: END IF
509: END IF
510: 1874 CONTINUE
511: *
512: IF ( NOSCAL ) SCALEM = ONE
513: *
514: AAPP = ZERO
515: AAQQ = BIG
516: DO 4781 p = 1, N
517: AAPP = DMAX1( AAPP, SVA(p) )
518: IF ( SVA(p) .NE. ZERO ) AAQQ = DMIN1( AAQQ, SVA(p) )
519: 4781 CONTINUE
520: *
521: * Quick return for zero M x N matrix
522: * #:)
523: IF ( AAPP .EQ. ZERO ) THEN
524: IF ( LSVEC ) CALL DLASET( 'G', M, N1, ZERO, ONE, U, LDU )
525: IF ( RSVEC ) CALL DLASET( 'G', N, N, ZERO, ONE, V, LDV )
526: WORK(1) = ONE
527: WORK(2) = ONE
528: IF ( ERREST ) WORK(3) = ONE
529: IF ( LSVEC .AND. RSVEC ) THEN
530: WORK(4) = ONE
531: WORK(5) = ONE
532: END IF
533: IF ( L2TRAN ) THEN
534: WORK(6) = ZERO
535: WORK(7) = ZERO
536: END IF
537: IWORK(1) = 0
538: IWORK(2) = 0
539: RETURN
540: END IF
541: *
542: * Issue warning if denormalized column norms detected. Override the
543: * high relative accuracy request. Issue licence to kill columns
544: * (set them to zero) whose norm is less than sigma_max / BIG (roughly).
545: * #:(
546: WARNING = 0
547: IF ( AAQQ .LE. SFMIN ) THEN
548: L2RANK = .TRUE.
549: L2KILL = .TRUE.
550: WARNING = 1
551: END IF
552: *
553: * Quick return for one-column matrix
554: * #:)
555: IF ( N .EQ. 1 ) THEN
556: *
557: IF ( LSVEC ) THEN
558: CALL DLASCL( 'G',0,0,SVA(1),SCALEM, M,1,A(1,1),LDA,IERR )
559: CALL DLACPY( 'A', M, 1, A, LDA, U, LDU )
560: * computing all M left singular vectors of the M x 1 matrix
561: IF ( N1 .NE. N ) THEN
562: CALL DGEQRF( M, N, U,LDU, WORK, WORK(N+1),LWORK-N,IERR )
563: CALL DORGQR( M,N1,1, U,LDU,WORK,WORK(N+1),LWORK-N,IERR )
564: CALL DCOPY( M, A(1,1), 1, U(1,1), 1 )
565: END IF
566: END IF
567: IF ( RSVEC ) THEN
568: V(1,1) = ONE
569: END IF
570: IF ( SVA(1) .LT. (BIG*SCALEM) ) THEN
571: SVA(1) = SVA(1) / SCALEM
572: SCALEM = ONE
573: END IF
574: WORK(1) = ONE / SCALEM
575: WORK(2) = ONE
576: IF ( SVA(1) .NE. ZERO ) THEN
577: IWORK(1) = 1
578: IF ( ( SVA(1) / SCALEM) .GE. SFMIN ) THEN
579: IWORK(2) = 1
580: ELSE
581: IWORK(2) = 0
582: END IF
583: ELSE
584: IWORK(1) = 0
585: IWORK(2) = 0
586: END IF
587: IF ( ERREST ) WORK(3) = ONE
588: IF ( LSVEC .AND. RSVEC ) THEN
589: WORK(4) = ONE
590: WORK(5) = ONE
591: END IF
592: IF ( L2TRAN ) THEN
593: WORK(6) = ZERO
594: WORK(7) = ZERO
595: END IF
596: RETURN
597: *
598: END IF
599: *
600: TRANSP = .FALSE.
601: L2TRAN = L2TRAN .AND. ( M .EQ. N )
602: *
603: AATMAX = -ONE
604: AATMIN = BIG
605: IF ( ROWPIV .OR. L2TRAN ) THEN
606: *
607: * Compute the row norms, needed to determine row pivoting sequence
608: * (in the case of heavily row weighted A, row pivoting is strongly
609: * advised) and to collect information needed to compare the
610: * structures of A * A^t and A^t * A (in the case L2TRAN.EQ..TRUE.).
611: *
612: IF ( L2TRAN ) THEN
613: DO 1950 p = 1, M
614: XSC = ZERO
615: TEMP1 = ONE
616: CALL DLASSQ( N, A(p,1), LDA, XSC, TEMP1 )
617: * DLASSQ gets both the ell_2 and the ell_infinity norm
618: * in one pass through the vector
619: WORK(M+N+p) = XSC * SCALEM
620: WORK(N+p) = XSC * (SCALEM*DSQRT(TEMP1))
621: AATMAX = DMAX1( AATMAX, WORK(N+p) )
622: IF (WORK(N+p) .NE. ZERO) AATMIN = DMIN1(AATMIN,WORK(N+p))
623: 1950 CONTINUE
624: ELSE
625: DO 1904 p = 1, M
626: WORK(M+N+p) = SCALEM*DABS( A(p,IDAMAX(N,A(p,1),LDA)) )
627: AATMAX = DMAX1( AATMAX, WORK(M+N+p) )
628: AATMIN = DMIN1( AATMIN, WORK(M+N+p) )
629: 1904 CONTINUE
630: END IF
631: *
632: END IF
633: *
634: * For square matrix A try to determine whether A^t would be better
635: * input for the preconditioned Jacobi SVD, with faster convergence.
636: * The decision is based on an O(N) function of the vector of column
637: * and row norms of A, based on the Shannon entropy. This should give
638: * the right choice in most cases when the difference actually matters.
639: * It may fail and pick the slower converging side.
640: *
641: ENTRA = ZERO
642: ENTRAT = ZERO
643: IF ( L2TRAN ) THEN
644: *
645: XSC = ZERO
646: TEMP1 = ONE
647: CALL DLASSQ( N, SVA, 1, XSC, TEMP1 )
648: TEMP1 = ONE / TEMP1
649: *
650: ENTRA = ZERO
651: DO 1113 p = 1, N
652: BIG1 = ( ( SVA(p) / XSC )**2 ) * TEMP1
653: IF ( BIG1 .NE. ZERO ) ENTRA = ENTRA + BIG1 * DLOG(BIG1)
654: 1113 CONTINUE
655: ENTRA = - ENTRA / DLOG(DBLE(N))
656: *
657: * Now, SVA().^2/Trace(A^t * A) is a point in the probability simplex.
658: * It is derived from the diagonal of A^t * A. Do the same with the
659: * diagonal of A * A^t, compute the entropy of the corresponding
660: * probability distribution. Note that A * A^t and A^t * A have the
661: * same trace.
662: *
663: ENTRAT = ZERO
664: DO 1114 p = N+1, N+M
665: BIG1 = ( ( WORK(p) / XSC )**2 ) * TEMP1
666: IF ( BIG1 .NE. ZERO ) ENTRAT = ENTRAT + BIG1 * DLOG(BIG1)
667: 1114 CONTINUE
668: ENTRAT = - ENTRAT / DLOG(DBLE(M))
669: *
670: * Analyze the entropies and decide A or A^t. Smaller entropy
671: * usually means better input for the algorithm.
672: *
673: TRANSP = ( ENTRAT .LT. ENTRA )
674: *
675: * If A^t is better than A, transpose A.
676: *
677: IF ( TRANSP ) THEN
678: * In an optimal implementation, this trivial transpose
679: * should be replaced with faster transpose.
680: DO 1115 p = 1, N - 1
681: DO 1116 q = p + 1, N
682: TEMP1 = A(q,p)
683: A(q,p) = A(p,q)
684: A(p,q) = TEMP1
685: 1116 CONTINUE
686: 1115 CONTINUE
687: DO 1117 p = 1, N
688: WORK(M+N+p) = SVA(p)
689: SVA(p) = WORK(N+p)
690: 1117 CONTINUE
691: TEMP1 = AAPP
692: AAPP = AATMAX
693: AATMAX = TEMP1
694: TEMP1 = AAQQ
695: AAQQ = AATMIN
696: AATMIN = TEMP1
697: KILL = LSVEC
698: LSVEC = RSVEC
699: RSVEC = KILL
700: IF ( LSVEC ) N1 = N
701: *
702: ROWPIV = .TRUE.
703: END IF
704: *
705: END IF
706: * END IF L2TRAN
707: *
708: * Scale the matrix so that its maximal singular value remains less
709: * than DSQRT(BIG) -- the matrix is scaled so that its maximal column
710: * has Euclidean norm equal to DSQRT(BIG/N). The only reason to keep
711: * DSQRT(BIG) instead of BIG is the fact that DGEJSV uses LAPACK and
712: * BLAS routines that, in some implementations, are not capable of
713: * working in the full interval [SFMIN,BIG] and that they may provoke
714: * overflows in the intermediate results. If the singular values spread
715: * from SFMIN to BIG, then DGESVJ will compute them. So, in that case,
716: * one should use DGESVJ instead of DGEJSV.
717: *
718: BIG1 = DSQRT( BIG )
719: TEMP1 = DSQRT( BIG / DBLE(N) )
720: *
721: CALL DLASCL( 'G', 0, 0, AAPP, TEMP1, N, 1, SVA, N, IERR )
722: IF ( AAQQ .GT. (AAPP * SFMIN) ) THEN
723: AAQQ = ( AAQQ / AAPP ) * TEMP1
724: ELSE
725: AAQQ = ( AAQQ * TEMP1 ) / AAPP
726: END IF
727: TEMP1 = TEMP1 * SCALEM
728: CALL DLASCL( 'G', 0, 0, AAPP, TEMP1, M, N, A, LDA, IERR )
729: *
730: * To undo scaling at the end of this procedure, multiply the
731: * computed singular values with USCAL2 / USCAL1.
732: *
733: USCAL1 = TEMP1
734: USCAL2 = AAPP
735: *
736: IF ( L2KILL ) THEN
737: * L2KILL enforces computation of nonzero singular values in
738: * the restricted range of condition number of the initial A,
739: * sigma_max(A) / sigma_min(A) approx. DSQRT(BIG)/DSQRT(SFMIN).
740: XSC = DSQRT( SFMIN )
741: ELSE
742: XSC = SMALL
743: *
744: * Now, if the condition number of A is too big,
745: * sigma_max(A) / sigma_min(A) .GT. DSQRT(BIG/N) * EPSLN / SFMIN,
746: * as a precaution measure, the full SVD is computed using DGESVJ
747: * with accumulated Jacobi rotations. This provides numerically
748: * more robust computation, at the cost of slightly increased run
749: * time. Depending on the concrete implementation of BLAS and LAPACK
750: * (i.e. how they behave in presence of extreme ill-conditioning) the
751: * implementor may decide to remove this switch.
752: IF ( ( AAQQ.LT.DSQRT(SFMIN) ) .AND. LSVEC .AND. RSVEC ) THEN
753: JRACC = .TRUE.
754: END IF
755: *
756: END IF
757: IF ( AAQQ .LT. XSC ) THEN
758: DO 700 p = 1, N
759: IF ( SVA(p) .LT. XSC ) THEN
760: CALL DLASET( 'A', M, 1, ZERO, ZERO, A(1,p), LDA )
761: SVA(p) = ZERO
762: END IF
763: 700 CONTINUE
764: END IF
765: *
766: * Preconditioning using QR factorization with pivoting
767: *
768: IF ( ROWPIV ) THEN
769: * Optional row permutation (Bjoerck row pivoting):
770: * A result by Cox and Higham shows that the Bjoerck's
771: * row pivoting combined with standard column pivoting
772: * has similar effect as Powell-Reid complete pivoting.
773: * The ell-infinity norms of A are made nonincreasing.
774: DO 1952 p = 1, M - 1
775: q = IDAMAX( M-p+1, WORK(M+N+p), 1 ) + p - 1
776: IWORK(2*N+p) = q
777: IF ( p .NE. q ) THEN
778: TEMP1 = WORK(M+N+p)
779: WORK(M+N+p) = WORK(M+N+q)
780: WORK(M+N+q) = TEMP1
781: END IF
782: 1952 CONTINUE
783: CALL DLASWP( N, A, LDA, 1, M-1, IWORK(2*N+1), 1 )
784: END IF
785: *
786: * End of the preparation phase (scaling, optional sorting and
787: * transposing, optional flushing of small columns).
788: *
789: * Preconditioning
790: *
791: * If the full SVD is needed, the right singular vectors are computed
792: * from a matrix equation, and for that we need theoretical analysis
793: * of the Businger-Golub pivoting. So we use DGEQP3 as the first RR QRF.
794: * In all other cases the first RR QRF can be chosen by other criteria
795: * (eg speed by replacing global with restricted window pivoting, such
796: * as in SGEQPX from TOMS # 782). Good results will be obtained using
797: * SGEQPX with properly (!) chosen numerical parameters.
798: * Any improvement of DGEQP3 improves overal performance of DGEJSV.
799: *
800: * A * P1 = Q1 * [ R1^t 0]^t:
801: DO 1963 p = 1, N
802: * .. all columns are free columns
803: IWORK(p) = 0
804: 1963 CONTINUE
805: CALL DGEQP3( M,N,A,LDA, IWORK,WORK, WORK(N+1),LWORK-N, IERR )
806: *
807: * The upper triangular matrix R1 from the first QRF is inspected for
808: * rank deficiency and possibilities for deflation, or possible
809: * ill-conditioning. Depending on the user specified flag L2RANK,
810: * the procedure explores possibilities to reduce the numerical
811: * rank by inspecting the computed upper triangular factor. If
812: * L2RANK or L2ABER are up, then DGEJSV will compute the SVD of
813: * A + dA, where ||dA|| <= f(M,N)*EPSLN.
814: *
815: NR = 1
816: IF ( L2ABER ) THEN
817: * Standard absolute error bound suffices. All sigma_i with
818: * sigma_i < N*EPSLN*||A|| are flushed to zero. This is an
819: * agressive enforcement of lower numerical rank by introducing a
820: * backward error of the order of N*EPSLN*||A||.
821: TEMP1 = DSQRT(DBLE(N))*EPSLN
822: DO 3001 p = 2, N
823: IF ( DABS(A(p,p)) .GE. (TEMP1*DABS(A(1,1))) ) THEN
824: NR = NR + 1
825: ELSE
826: GO TO 3002
827: END IF
828: 3001 CONTINUE
829: 3002 CONTINUE
830: ELSE IF ( L2RANK ) THEN
831: * .. similarly as above, only slightly more gentle (less agressive).
832: * Sudden drop on the diagonal of R1 is used as the criterion for
833: * close-to-rank-defficient.
834: TEMP1 = DSQRT(SFMIN)
835: DO 3401 p = 2, N
836: IF ( ( DABS(A(p,p)) .LT. (EPSLN*DABS(A(p-1,p-1))) ) .OR.
837: & ( DABS(A(p,p)) .LT. SMALL ) .OR.
838: & ( L2KILL .AND. (DABS(A(p,p)) .LT. TEMP1) ) ) GO TO 3402
839: NR = NR + 1
840: 3401 CONTINUE
841: 3402 CONTINUE
842: *
843: ELSE
844: * The goal is high relative accuracy. However, if the matrix
845: * has high scaled condition number the relative accuracy is in
846: * general not feasible. Later on, a condition number estimator
847: * will be deployed to estimate the scaled condition number.
848: * Here we just remove the underflowed part of the triangular
849: * factor. This prevents the situation in which the code is
850: * working hard to get the accuracy not warranted by the data.
851: TEMP1 = DSQRT(SFMIN)
852: DO 3301 p = 2, N
853: IF ( ( DABS(A(p,p)) .LT. SMALL ) .OR.
854: & ( L2KILL .AND. (DABS(A(p,p)) .LT. TEMP1) ) ) GO TO 3302
855: NR = NR + 1
856: 3301 CONTINUE
857: 3302 CONTINUE
858: *
859: END IF
860: *
861: ALMORT = .FALSE.
862: IF ( NR .EQ. N ) THEN
863: MAXPRJ = ONE
864: DO 3051 p = 2, N
865: TEMP1 = DABS(A(p,p)) / SVA(IWORK(p))
866: MAXPRJ = DMIN1( MAXPRJ, TEMP1 )
867: 3051 CONTINUE
868: IF ( MAXPRJ**2 .GE. ONE - DBLE(N)*EPSLN ) ALMORT = .TRUE.
869: END IF
870: *
871: *
872: SCONDA = - ONE
873: CONDR1 = - ONE
874: CONDR2 = - ONE
875: *
876: IF ( ERREST ) THEN
877: IF ( N .EQ. NR ) THEN
878: IF ( RSVEC ) THEN
879: * .. V is available as workspace
880: CALL DLACPY( 'U', N, N, A, LDA, V, LDV )
881: DO 3053 p = 1, N
882: TEMP1 = SVA(IWORK(p))
883: CALL DSCAL( p, ONE/TEMP1, V(1,p), 1 )
884: 3053 CONTINUE
885: CALL DPOCON( 'U', N, V, LDV, ONE, TEMP1,
886: & WORK(N+1), IWORK(2*N+M+1), IERR )
887: ELSE IF ( LSVEC ) THEN
888: * .. U is available as workspace
889: CALL DLACPY( 'U', N, N, A, LDA, U, LDU )
890: DO 3054 p = 1, N
891: TEMP1 = SVA(IWORK(p))
892: CALL DSCAL( p, ONE/TEMP1, U(1,p), 1 )
893: 3054 CONTINUE
894: CALL DPOCON( 'U', N, U, LDU, ONE, TEMP1,
895: & WORK(N+1), IWORK(2*N+M+1), IERR )
896: ELSE
897: CALL DLACPY( 'U', N, N, A, LDA, WORK(N+1), N )
898: DO 3052 p = 1, N
899: TEMP1 = SVA(IWORK(p))
900: CALL DSCAL( p, ONE/TEMP1, WORK(N+(p-1)*N+1), 1 )
901: 3052 CONTINUE
902: * .. the columns of R are scaled to have unit Euclidean lengths.
903: CALL DPOCON( 'U', N, WORK(N+1), N, ONE, TEMP1,
904: & WORK(N+N*N+1), IWORK(2*N+M+1), IERR )
905: END IF
906: SCONDA = ONE / DSQRT(TEMP1)
907: * SCONDA is an estimate of DSQRT(||(R^t * R)^(-1)||_1).
908: * N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
909: ELSE
910: SCONDA = - ONE
911: END IF
912: END IF
913: *
914: L2PERT = L2PERT .AND. ( DABS( A(1,1)/A(NR,NR) ) .GT. DSQRT(BIG1) )
915: * If there is no violent scaling, artificial perturbation is not needed.
916: *
917: * Phase 3:
918: *
919:
920: IF ( .NOT. ( RSVEC .OR. LSVEC ) ) THEN
921: *
922: * Singular Values only
923: *
924: * .. transpose A(1:NR,1:N)
925: DO 1946 p = 1, MIN0( N-1, NR )
926: CALL DCOPY( N-p, A(p,p+1), LDA, A(p+1,p), 1 )
927: 1946 CONTINUE
928: *
929: * The following two DO-loops introduce small relative perturbation
930: * into the strict upper triangle of the lower triangular matrix.
931: * Small entries below the main diagonal are also changed.
932: * This modification is useful if the computing environment does not
933: * provide/allow FLUSH TO ZERO underflow, for it prevents many
934: * annoying denormalized numbers in case of strongly scaled matrices.
935: * The perturbation is structured so that it does not introduce any
936: * new perturbation of the singular values, and it does not destroy
937: * the job done by the preconditioner.
938: * The licence for this perturbation is in the variable L2PERT, which
939: * should be .FALSE. if FLUSH TO ZERO underflow is active.
940: *
941: IF ( .NOT. ALMORT ) THEN
942: *
943: IF ( L2PERT ) THEN
944: * XSC = DSQRT(SMALL)
945: XSC = EPSLN / DBLE(N)
946: DO 4947 q = 1, NR
947: TEMP1 = XSC*DABS(A(q,q))
948: DO 4949 p = 1, N
949: IF ( ( (p.GT.q) .AND. (DABS(A(p,q)).LE.TEMP1) )
950: & .OR. ( p .LT. q ) )
951: & A(p,q) = DSIGN( TEMP1, A(p,q) )
952: 4949 CONTINUE
953: 4947 CONTINUE
954: ELSE
955: CALL DLASET( 'U', NR-1,NR-1, ZERO,ZERO, A(1,2),LDA )
956: END IF
957: *
958: * .. second preconditioning using the QR factorization
959: *
960: CALL DGEQRF( N,NR, A,LDA, WORK, WORK(N+1),LWORK-N, IERR )
961: *
962: * .. and transpose upper to lower triangular
963: DO 1948 p = 1, NR - 1
964: CALL DCOPY( NR-p, A(p,p+1), LDA, A(p+1,p), 1 )
965: 1948 CONTINUE
966: *
967: END IF
968: *
969: * Row-cyclic Jacobi SVD algorithm with column pivoting
970: *
971: * .. again some perturbation (a "background noise") is added
972: * to drown denormals
973: IF ( L2PERT ) THEN
974: * XSC = DSQRT(SMALL)
975: XSC = EPSLN / DBLE(N)
976: DO 1947 q = 1, NR
977: TEMP1 = XSC*DABS(A(q,q))
978: DO 1949 p = 1, NR
979: IF ( ( (p.GT.q) .AND. (DABS(A(p,q)).LE.TEMP1) )
980: & .OR. ( p .LT. q ) )
981: & A(p,q) = DSIGN( TEMP1, A(p,q) )
982: 1949 CONTINUE
983: 1947 CONTINUE
984: ELSE
985: CALL DLASET( 'U', NR-1, NR-1, ZERO, ZERO, A(1,2), LDA )
986: END IF
987: *
988: * .. and one-sided Jacobi rotations are started on a lower
989: * triangular matrix (plus perturbation which is ignored in
990: * the part which destroys triangular form (confusing?!))
991: *
992: CALL DGESVJ( 'L', 'NoU', 'NoV', NR, NR, A, LDA, SVA,
993: & N, V, LDV, WORK, LWORK, INFO )
994: *
995: SCALEM = WORK(1)
996: NUMRANK = IDNINT(WORK(2))
997: *
998: *
999: ELSE IF ( RSVEC .AND. ( .NOT. LSVEC ) ) THEN
1000: *
1001: * -> Singular Values and Right Singular Vectors <-
1002: *
1003: IF ( ALMORT ) THEN
1004: *
1005: * .. in this case NR equals N
1006: DO 1998 p = 1, NR
1007: CALL DCOPY( N-p+1, A(p,p), LDA, V(p,p), 1 )
1008: 1998 CONTINUE
1009: CALL DLASET( 'Upper', NR-1, NR-1, ZERO, ZERO, V(1,2), LDV )
1010: *
1011: CALL DGESVJ( 'L','U','N', N, NR, V,LDV, SVA, NR, A,LDA,
1012: & WORK, LWORK, INFO )
1013: SCALEM = WORK(1)
1014: NUMRANK = IDNINT(WORK(2))
1015:
1016: ELSE
1017: *
1018: * .. two more QR factorizations ( one QRF is not enough, two require
1019: * accumulated product of Jacobi rotations, three are perfect )
1020: *
1021: CALL DLASET( 'Lower', NR-1, NR-1, ZERO, ZERO, A(2,1), LDA )
1022: CALL DGELQF( NR, N, A, LDA, WORK, WORK(N+1), LWORK-N, IERR)
1023: CALL DLACPY( 'Lower', NR, NR, A, LDA, V, LDV )
1024: CALL DLASET( 'Upper', NR-1, NR-1, ZERO, ZERO, V(1,2), LDV )
1025: CALL DGEQRF( NR, NR, V, LDV, WORK(N+1), WORK(2*N+1),
1026: & LWORK-2*N, IERR )
1027: DO 8998 p = 1, NR
1028: CALL DCOPY( NR-p+1, V(p,p), LDV, V(p,p), 1 )
1029: 8998 CONTINUE
1030: CALL DLASET( 'Upper', NR-1, NR-1, ZERO, ZERO, V(1,2), LDV )
1031: *
1032: CALL DGESVJ( 'Lower', 'U','N', NR, NR, V,LDV, SVA, NR, U,
1033: & LDU, WORK(N+1), LWORK, INFO )
1034: SCALEM = WORK(N+1)
1035: NUMRANK = IDNINT(WORK(N+2))
1036: IF ( NR .LT. N ) THEN
1037: CALL DLASET( 'A',N-NR, NR, ZERO,ZERO, V(NR+1,1), LDV )
1038: CALL DLASET( 'A',NR, N-NR, ZERO,ZERO, V(1,NR+1), LDV )
1039: CALL DLASET( 'A',N-NR,N-NR,ZERO,ONE, V(NR+1,NR+1), LDV )
1040: END IF
1041: *
1042: CALL DORMLQ( 'Left', 'Transpose', N, N, NR, A, LDA, WORK,
1043: & V, LDV, WORK(N+1), LWORK-N, IERR )
1044: *
1045: END IF
1046: *
1047: DO 8991 p = 1, N
1048: CALL DCOPY( N, V(p,1), LDV, A(IWORK(p),1), LDA )
1049: 8991 CONTINUE
1050: CALL DLACPY( 'All', N, N, A, LDA, V, LDV )
1051: *
1052: IF ( TRANSP ) THEN
1053: CALL DLACPY( 'All', N, N, V, LDV, U, LDU )
1054: END IF
1055: *
1056: ELSE IF ( LSVEC .AND. ( .NOT. RSVEC ) ) THEN
1057: *
1058: * .. Singular Values and Left Singular Vectors ..
1059: *
1060: * .. second preconditioning step to avoid need to accumulate
1061: * Jacobi rotations in the Jacobi iterations.
1062: DO 1965 p = 1, NR
1063: CALL DCOPY( N-p+1, A(p,p), LDA, U(p,p), 1 )
1064: 1965 CONTINUE
1065: CALL DLASET( 'Upper', NR-1, NR-1, ZERO, ZERO, U(1,2), LDU )
1066: *
1067: CALL DGEQRF( N, NR, U, LDU, WORK(N+1), WORK(2*N+1),
1068: & LWORK-2*N, IERR )
1069: *
1070: DO 1967 p = 1, NR - 1
1071: CALL DCOPY( NR-p, U(p,p+1), LDU, U(p+1,p), 1 )
1072: 1967 CONTINUE
1073: CALL DLASET( 'Upper', NR-1, NR-1, ZERO, ZERO, U(1,2), LDU )
1074: *
1075: CALL DGESVJ( 'Lower', 'U', 'N', NR,NR, U, LDU, SVA, NR, A,
1076: & LDA, WORK(N+1), LWORK-N, INFO )
1077: SCALEM = WORK(N+1)
1078: NUMRANK = IDNINT(WORK(N+2))
1079: *
1080: IF ( NR .LT. M ) THEN
1081: CALL DLASET( 'A', M-NR, NR,ZERO, ZERO, U(NR+1,1), LDU )
1082: IF ( NR .LT. N1 ) THEN
1083: CALL DLASET( 'A',NR, N1-NR, ZERO, ZERO, U(1,NR+1), LDU )
1084: CALL DLASET( 'A',M-NR,N1-NR,ZERO,ONE,U(NR+1,NR+1), LDU )
1085: END IF
1086: END IF
1087: *
1088: CALL DORMQR( 'Left', 'No Tr', M, N1, N, A, LDA, WORK, U,
1089: & LDU, WORK(N+1), LWORK-N, IERR )
1090: *
1091: IF ( ROWPIV )
1092: & CALL DLASWP( N1, U, LDU, 1, M-1, IWORK(2*N+1), -1 )
1093: *
1094: DO 1974 p = 1, N1
1095: XSC = ONE / DNRM2( M, U(1,p), 1 )
1096: CALL DSCAL( M, XSC, U(1,p), 1 )
1097: 1974 CONTINUE
1098: *
1099: IF ( TRANSP ) THEN
1100: CALL DLACPY( 'All', N, N, U, LDU, V, LDV )
1101: END IF
1102: *
1103: ELSE
1104: *
1105: * .. Full SVD ..
1106: *
1107: IF ( .NOT. JRACC ) THEN
1108: *
1109: IF ( .NOT. ALMORT ) THEN
1110: *
1111: * Second Preconditioning Step (QRF [with pivoting])
1112: * Note that the composition of TRANSPOSE, QRF and TRANSPOSE is
1113: * equivalent to an LQF CALL. Since in many libraries the QRF
1114: * seems to be better optimized than the LQF, we do explicit
1115: * transpose and use the QRF. This is subject to changes in an
1116: * optimized implementation of DGEJSV.
1117: *
1118: DO 1968 p = 1, NR
1119: CALL DCOPY( N-p+1, A(p,p), LDA, V(p,p), 1 )
1120: 1968 CONTINUE
1121: *
1122: * .. the following two loops perturb small entries to avoid
1123: * denormals in the second QR factorization, where they are
1124: * as good as zeros. This is done to avoid painfully slow
1125: * computation with denormals. The relative size of the perturbation
1126: * is a parameter that can be changed by the implementer.
1127: * This perturbation device will be obsolete on machines with
1128: * properly implemented arithmetic.
1129: * To switch it off, set L2PERT=.FALSE. To remove it from the
1130: * code, remove the action under L2PERT=.TRUE., leave the ELSE part.
1131: * The following two loops should be blocked and fused with the
1132: * transposed copy above.
1133: *
1134: IF ( L2PERT ) THEN
1135: XSC = DSQRT(SMALL)
1136: DO 2969 q = 1, NR
1137: TEMP1 = XSC*DABS( V(q,q) )
1138: DO 2968 p = 1, N
1139: IF ( ( p .GT. q ) .AND. ( DABS(V(p,q)) .LE. TEMP1 )
1140: & .OR. ( p .LT. q ) )
1141: & V(p,q) = DSIGN( TEMP1, V(p,q) )
1142: IF ( p. LT. q ) V(p,q) = - V(p,q)
1143: 2968 CONTINUE
1144: 2969 CONTINUE
1145: ELSE
1146: CALL DLASET( 'U', NR-1, NR-1, ZERO, ZERO, V(1,2), LDV )
1147: END IF
1148: *
1149: * Estimate the row scaled condition number of R1
1150: * (If R1 is rectangular, N > NR, then the condition number
1151: * of the leading NR x NR submatrix is estimated.)
1152: *
1153: CALL DLACPY( 'L', NR, NR, V, LDV, WORK(2*N+1), NR )
1154: DO 3950 p = 1, NR
1155: TEMP1 = DNRM2(NR-p+1,WORK(2*N+(p-1)*NR+p),1)
1156: CALL DSCAL(NR-p+1,ONE/TEMP1,WORK(2*N+(p-1)*NR+p),1)
1157: 3950 CONTINUE
1158: CALL DPOCON('Lower',NR,WORK(2*N+1),NR,ONE,TEMP1,
1159: & WORK(2*N+NR*NR+1),IWORK(M+2*N+1),IERR)
1160: CONDR1 = ONE / DSQRT(TEMP1)
1161: * .. here need a second oppinion on the condition number
1162: * .. then assume worst case scenario
1163: * R1 is OK for inverse <=> CONDR1 .LT. DBLE(N)
1164: * more conservative <=> CONDR1 .LT. DSQRT(DBLE(N))
1165: *
1166: COND_OK = DSQRT(DBLE(NR))
1167: *[TP] COND_OK is a tuning parameter.
1168:
1169: IF ( CONDR1 .LT. COND_OK ) THEN
1170: * .. the second QRF without pivoting. Note: in an optimized
1171: * implementation, this QRF should be implemented as the QRF
1172: * of a lower triangular matrix.
1173: * R1^t = Q2 * R2
1174: CALL DGEQRF( N, NR, V, LDV, WORK(N+1), WORK(2*N+1),
1175: & LWORK-2*N, IERR )
1176: *
1177: IF ( L2PERT ) THEN
1178: XSC = DSQRT(SMALL)/EPSLN
1179: DO 3959 p = 2, NR
1180: DO 3958 q = 1, p - 1
1181: TEMP1 = XSC * DMIN1(DABS(V(p,p)),DABS(V(q,q)))
1182: IF ( DABS(V(q,p)) .LE. TEMP1 )
1183: & V(q,p) = DSIGN( TEMP1, V(q,p) )
1184: 3958 CONTINUE
1185: 3959 CONTINUE
1186: END IF
1187: *
1188: IF ( NR .NE. N )
1189: * .. save ...
1190: & CALL DLACPY( 'A', N, NR, V, LDV, WORK(2*N+1), N )
1191: *
1192: * .. this transposed copy should be better than naive
1193: DO 1969 p = 1, NR - 1
1194: CALL DCOPY( NR-p, V(p,p+1), LDV, V(p+1,p), 1 )
1195: 1969 CONTINUE
1196: *
1197: CONDR2 = CONDR1
1198: *
1199: ELSE
1200: *
1201: * .. ill-conditioned case: second QRF with pivoting
1202: * Note that windowed pivoting would be equaly good
1203: * numerically, and more run-time efficient. So, in
1204: * an optimal implementation, the next call to DGEQP3
1205: * should be replaced with eg. CALL SGEQPX (ACM TOMS #782)
1206: * with properly (carefully) chosen parameters.
1207: *
1208: * R1^t * P2 = Q2 * R2
1209: DO 3003 p = 1, NR
1210: IWORK(N+p) = 0
1211: 3003 CONTINUE
1212: CALL DGEQP3( N, NR, V, LDV, IWORK(N+1), WORK(N+1),
1213: & WORK(2*N+1), LWORK-2*N, IERR )
1214: ** CALL DGEQRF( N, NR, V, LDV, WORK(N+1), WORK(2*N+1),
1215: ** & LWORK-2*N, IERR )
1216: IF ( L2PERT ) THEN
1217: XSC = DSQRT(SMALL)
1218: DO 3969 p = 2, NR
1219: DO 3968 q = 1, p - 1
1220: TEMP1 = XSC * DMIN1(DABS(V(p,p)),DABS(V(q,q)))
1221: IF ( DABS(V(q,p)) .LE. TEMP1 )
1222: & V(q,p) = DSIGN( TEMP1, V(q,p) )
1223: 3968 CONTINUE
1224: 3969 CONTINUE
1225: END IF
1226: *
1227: CALL DLACPY( 'A', N, NR, V, LDV, WORK(2*N+1), N )
1228: *
1229: IF ( L2PERT ) THEN
1230: XSC = DSQRT(SMALL)
1231: DO 8970 p = 2, NR
1232: DO 8971 q = 1, p - 1
1233: TEMP1 = XSC * DMIN1(DABS(V(p,p)),DABS(V(q,q)))
1234: V(p,q) = - DSIGN( TEMP1, V(q,p) )
1235: 8971 CONTINUE
1236: 8970 CONTINUE
1237: ELSE
1238: CALL DLASET( 'L',NR-1,NR-1,ZERO,ZERO,V(2,1),LDV )
1239: END IF
1240: * Now, compute R2 = L3 * Q3, the LQ factorization.
1241: CALL DGELQF( NR, NR, V, LDV, WORK(2*N+N*NR+1),
1242: & WORK(2*N+N*NR+NR+1), LWORK-2*N-N*NR-NR, IERR )
1243: * .. and estimate the condition number
1244: CALL DLACPY( 'L',NR,NR,V,LDV,WORK(2*N+N*NR+NR+1),NR )
1245: DO 4950 p = 1, NR
1246: TEMP1 = DNRM2( p, WORK(2*N+N*NR+NR+p), NR )
1247: CALL DSCAL( p, ONE/TEMP1, WORK(2*N+N*NR+NR+p), NR )
1248: 4950 CONTINUE
1249: CALL DPOCON( 'L',NR,WORK(2*N+N*NR+NR+1),NR,ONE,TEMP1,
1250: & WORK(2*N+N*NR+NR+NR*NR+1),IWORK(M+2*N+1),IERR )
1251: CONDR2 = ONE / DSQRT(TEMP1)
1252: *
1253: IF ( CONDR2 .GE. COND_OK ) THEN
1254: * .. save the Householder vectors used for Q3
1255: * (this overwrittes the copy of R2, as it will not be
1256: * needed in this branch, but it does not overwritte the
1257: * Huseholder vectors of Q2.).
1258: CALL DLACPY( 'U', NR, NR, V, LDV, WORK(2*N+1), N )
1259: * .. and the rest of the information on Q3 is in
1260: * WORK(2*N+N*NR+1:2*N+N*NR+N)
1261: END IF
1262: *
1263: END IF
1264: *
1265: IF ( L2PERT ) THEN
1266: XSC = DSQRT(SMALL)
1267: DO 4968 q = 2, NR
1268: TEMP1 = XSC * V(q,q)
1269: DO 4969 p = 1, q - 1
1270: * V(p,q) = - DSIGN( TEMP1, V(q,p) )
1271: V(p,q) = - DSIGN( TEMP1, V(p,q) )
1272: 4969 CONTINUE
1273: 4968 CONTINUE
1274: ELSE
1275: CALL DLASET( 'U', NR-1,NR-1, ZERO,ZERO, V(1,2), LDV )
1276: END IF
1277: *
1278: * Second preconditioning finished; continue with Jacobi SVD
1279: * The input matrix is lower trinagular.
1280: *
1281: * Recover the right singular vectors as solution of a well
1282: * conditioned triangular matrix equation.
1283: *
1284: IF ( CONDR1 .LT. COND_OK ) THEN
1285: *
1286: CALL DGESVJ( 'L','U','N',NR,NR,V,LDV,SVA,NR,U,
1287: & LDU,WORK(2*N+N*NR+NR+1),LWORK-2*N-N*NR-NR,INFO )
1288: SCALEM = WORK(2*N+N*NR+NR+1)
1289: NUMRANK = IDNINT(WORK(2*N+N*NR+NR+2))
1290: DO 3970 p = 1, NR
1291: CALL DCOPY( NR, V(1,p), 1, U(1,p), 1 )
1292: CALL DSCAL( NR, SVA(p), V(1,p), 1 )
1293: 3970 CONTINUE
1294:
1295: * .. pick the right matrix equation and solve it
1296: *
1297: IF ( NR. EQ. N ) THEN
1298: * :)) .. best case, R1 is inverted. The solution of this matrix
1299: * equation is Q2*V2 = the product of the Jacobi rotations
1300: * used in DGESVJ, premultiplied with the orthogonal matrix
1301: * from the second QR factorization.
1302: CALL DTRSM( 'L','U','N','N', NR,NR,ONE, A,LDA, V,LDV )
1303: ELSE
1304: * .. R1 is well conditioned, but non-square. Transpose(R2)
1305: * is inverted to get the product of the Jacobi rotations
1306: * used in DGESVJ. The Q-factor from the second QR
1307: * factorization is then built in explicitly.
1308: CALL DTRSM('L','U','T','N',NR,NR,ONE,WORK(2*N+1),
1309: & N,V,LDV)
1310: IF ( NR .LT. N ) THEN
1311: CALL DLASET('A',N-NR,NR,ZERO,ZERO,V(NR+1,1),LDV)
1312: CALL DLASET('A',NR,N-NR,ZERO,ZERO,V(1,NR+1),LDV)
1313: CALL DLASET('A',N-NR,N-NR,ZERO,ONE,V(NR+1,NR+1),LDV)
1314: END IF
1315: CALL DORMQR('L','N',N,N,NR,WORK(2*N+1),N,WORK(N+1),
1316: & V,LDV,WORK(2*N+N*NR+NR+1),LWORK-2*N-N*NR-NR,IERR)
1317: END IF
1318: *
1319: ELSE IF ( CONDR2 .LT. COND_OK ) THEN
1320: *
1321: * :) .. the input matrix A is very likely a relative of
1322: * the Kahan matrix :)
1323: * The matrix R2 is inverted. The solution of the matrix equation
1324: * is Q3^T*V3 = the product of the Jacobi rotations (appplied to
1325: * the lower triangular L3 from the LQ factorization of
1326: * R2=L3*Q3), pre-multiplied with the transposed Q3.
1327: CALL DGESVJ( 'L', 'U', 'N', NR, NR, V, LDV, SVA, NR, U,
1328: & LDU, WORK(2*N+N*NR+NR+1), LWORK-2*N-N*NR-NR, INFO )
1329: SCALEM = WORK(2*N+N*NR+NR+1)
1330: NUMRANK = IDNINT(WORK(2*N+N*NR+NR+2))
1331: DO 3870 p = 1, NR
1332: CALL DCOPY( NR, V(1,p), 1, U(1,p), 1 )
1333: CALL DSCAL( NR, SVA(p), U(1,p), 1 )
1334: 3870 CONTINUE
1335: CALL DTRSM('L','U','N','N',NR,NR,ONE,WORK(2*N+1),N,U,LDU)
1336: * .. apply the permutation from the second QR factorization
1337: DO 873 q = 1, NR
1338: DO 872 p = 1, NR
1339: WORK(2*N+N*NR+NR+IWORK(N+p)) = U(p,q)
1340: 872 CONTINUE
1341: DO 874 p = 1, NR
1342: U(p,q) = WORK(2*N+N*NR+NR+p)
1343: 874 CONTINUE
1344: 873 CONTINUE
1345: IF ( NR .LT. N ) THEN
1346: CALL DLASET( 'A',N-NR,NR,ZERO,ZERO,V(NR+1,1),LDV )
1347: CALL DLASET( 'A',NR,N-NR,ZERO,ZERO,V(1,NR+1),LDV )
1348: CALL DLASET( 'A',N-NR,N-NR,ZERO,ONE,V(NR+1,NR+1),LDV )
1349: END IF
1350: CALL DORMQR( 'L','N',N,N,NR,WORK(2*N+1),N,WORK(N+1),
1351: & V,LDV,WORK(2*N+N*NR+NR+1),LWORK-2*N-N*NR-NR,IERR )
1352: ELSE
1353: * Last line of defense.
1354: * #:( This is a rather pathological case: no scaled condition
1355: * improvement after two pivoted QR factorizations. Other
1356: * possibility is that the rank revealing QR factorization
1357: * or the condition estimator has failed, or the COND_OK
1358: * is set very close to ONE (which is unnecessary). Normally,
1359: * this branch should never be executed, but in rare cases of
1360: * failure of the RRQR or condition estimator, the last line of
1361: * defense ensures that DGEJSV completes the task.
1362: * Compute the full SVD of L3 using DGESVJ with explicit
1363: * accumulation of Jacobi rotations.
1364: CALL DGESVJ( 'L', 'U', 'V', NR, NR, V, LDV, SVA, NR, U,
1365: & LDU, WORK(2*N+N*NR+NR+1), LWORK-2*N-N*NR-NR, INFO )
1366: SCALEM = WORK(2*N+N*NR+NR+1)
1367: NUMRANK = IDNINT(WORK(2*N+N*NR+NR+2))
1368: IF ( NR .LT. N ) THEN
1369: CALL DLASET( 'A',N-NR,NR,ZERO,ZERO,V(NR+1,1),LDV )
1370: CALL DLASET( 'A',NR,N-NR,ZERO,ZERO,V(1,NR+1),LDV )
1371: CALL DLASET( 'A',N-NR,N-NR,ZERO,ONE,V(NR+1,NR+1),LDV )
1372: END IF
1373: CALL DORMQR( 'L','N',N,N,NR,WORK(2*N+1),N,WORK(N+1),
1374: & V,LDV,WORK(2*N+N*NR+NR+1),LWORK-2*N-N*NR-NR,IERR )
1375: *
1376: CALL DORMLQ( 'L', 'T', NR, NR, NR, WORK(2*N+1), N,
1377: & WORK(2*N+N*NR+1), U, LDU, WORK(2*N+N*NR+NR+1),
1378: & LWORK-2*N-N*NR-NR, IERR )
1379: DO 773 q = 1, NR
1380: DO 772 p = 1, NR
1381: WORK(2*N+N*NR+NR+IWORK(N+p)) = U(p,q)
1382: 772 CONTINUE
1383: DO 774 p = 1, NR
1384: U(p,q) = WORK(2*N+N*NR+NR+p)
1385: 774 CONTINUE
1386: 773 CONTINUE
1387: *
1388: END IF
1389: *
1390: * Permute the rows of V using the (column) permutation from the
1391: * first QRF. Also, scale the columns to make them unit in
1392: * Euclidean norm. This applies to all cases.
1393: *
1394: TEMP1 = DSQRT(DBLE(N)) * EPSLN
1395: DO 1972 q = 1, N
1396: DO 972 p = 1, N
1397: WORK(2*N+N*NR+NR+IWORK(p)) = V(p,q)
1398: 972 CONTINUE
1399: DO 973 p = 1, N
1400: V(p,q) = WORK(2*N+N*NR+NR+p)
1401: 973 CONTINUE
1402: XSC = ONE / DNRM2( N, V(1,q), 1 )
1403: IF ( (XSC .LT. (ONE-TEMP1)) .OR. (XSC .GT. (ONE+TEMP1)) )
1404: & CALL DSCAL( N, XSC, V(1,q), 1 )
1405: 1972 CONTINUE
1406: * At this moment, V contains the right singular vectors of A.
1407: * Next, assemble the left singular vector matrix U (M x N).
1408: IF ( NR .LT. M ) THEN
1409: CALL DLASET( 'A', M-NR, NR, ZERO, ZERO, U(NR+1,1), LDU )
1410: IF ( NR .LT. N1 ) THEN
1411: CALL DLASET('A',NR,N1-NR,ZERO,ZERO,U(1,NR+1),LDU)
1412: CALL DLASET('A',M-NR,N1-NR,ZERO,ONE,U(NR+1,NR+1),LDU)
1413: END IF
1414: END IF
1415: *
1416: * The Q matrix from the first QRF is built into the left singular
1417: * matrix U. This applies to all cases.
1418: *
1419: CALL DORMQR( 'Left', 'No_Tr', M, N1, N, A, LDA, WORK, U,
1420: & LDU, WORK(N+1), LWORK-N, IERR )
1421:
1422: * The columns of U are normalized. The cost is O(M*N) flops.
1423: TEMP1 = DSQRT(DBLE(M)) * EPSLN
1424: DO 1973 p = 1, NR
1425: XSC = ONE / DNRM2( M, U(1,p), 1 )
1426: IF ( (XSC .LT. (ONE-TEMP1)) .OR. (XSC .GT. (ONE+TEMP1)) )
1427: & CALL DSCAL( M, XSC, U(1,p), 1 )
1428: 1973 CONTINUE
1429: *
1430: * If the initial QRF is computed with row pivoting, the left
1431: * singular vectors must be adjusted.
1432: *
1433: IF ( ROWPIV )
1434: & CALL DLASWP( N1, U, LDU, 1, M-1, IWORK(2*N+1), -1 )
1435: *
1436: ELSE
1437: *
1438: * .. the initial matrix A has almost orthogonal columns and
1439: * the second QRF is not needed
1440: *
1441: CALL DLACPY( 'Upper', N, N, A, LDA, WORK(N+1), N )
1442: IF ( L2PERT ) THEN
1443: XSC = DSQRT(SMALL)
1444: DO 5970 p = 2, N
1445: TEMP1 = XSC * WORK( N + (p-1)*N + p )
1446: DO 5971 q = 1, p - 1
1447: WORK(N+(q-1)*N+p)=-DSIGN(TEMP1,WORK(N+(p-1)*N+q))
1448: 5971 CONTINUE
1449: 5970 CONTINUE
1450: ELSE
1451: CALL DLASET( 'Lower',N-1,N-1,ZERO,ZERO,WORK(N+2),N )
1452: END IF
1453: *
1454: CALL DGESVJ( 'Upper', 'U', 'N', N, N, WORK(N+1), N, SVA,
1455: & N, U, LDU, WORK(N+N*N+1), LWORK-N-N*N, INFO )
1456: *
1457: SCALEM = WORK(N+N*N+1)
1458: NUMRANK = IDNINT(WORK(N+N*N+2))
1459: DO 6970 p = 1, N
1460: CALL DCOPY( N, WORK(N+(p-1)*N+1), 1, U(1,p), 1 )
1461: CALL DSCAL( N, SVA(p), WORK(N+(p-1)*N+1), 1 )
1462: 6970 CONTINUE
1463: *
1464: CALL DTRSM( 'Left', 'Upper', 'NoTrans', 'No UD', N, N,
1465: & ONE, A, LDA, WORK(N+1), N )
1466: DO 6972 p = 1, N
1467: CALL DCOPY( N, WORK(N+p), N, V(IWORK(p),1), LDV )
1468: 6972 CONTINUE
1469: TEMP1 = DSQRT(DBLE(N))*EPSLN
1470: DO 6971 p = 1, N
1471: XSC = ONE / DNRM2( N, V(1,p), 1 )
1472: IF ( (XSC .LT. (ONE-TEMP1)) .OR. (XSC .GT. (ONE+TEMP1)) )
1473: & CALL DSCAL( N, XSC, V(1,p), 1 )
1474: 6971 CONTINUE
1475: *
1476: * Assemble the left singular vector matrix U (M x N).
1477: *
1478: IF ( N .LT. M ) THEN
1479: CALL DLASET( 'A', M-N, N, ZERO, ZERO, U(N+1,1), LDU )
1480: IF ( N .LT. N1 ) THEN
1481: CALL DLASET( 'A',N, N1-N, ZERO, ZERO, U(1,N+1),LDU )
1482: CALL DLASET( 'A',M-N,N1-N, ZERO, ONE,U(N+1,N+1),LDU )
1483: END IF
1484: END IF
1485: CALL DORMQR( 'Left', 'No Tr', M, N1, N, A, LDA, WORK, U,
1486: & LDU, WORK(N+1), LWORK-N, IERR )
1487: TEMP1 = DSQRT(DBLE(M))*EPSLN
1488: DO 6973 p = 1, N1
1489: XSC = ONE / DNRM2( M, U(1,p), 1 )
1490: IF ( (XSC .LT. (ONE-TEMP1)) .OR. (XSC .GT. (ONE+TEMP1)) )
1491: & CALL DSCAL( M, XSC, U(1,p), 1 )
1492: 6973 CONTINUE
1493: *
1494: IF ( ROWPIV )
1495: & CALL DLASWP( N1, U, LDU, 1, M-1, IWORK(2*N+1), -1 )
1496: *
1497: END IF
1498: *
1499: * end of the >> almost orthogonal case << in the full SVD
1500: *
1501: ELSE
1502: *
1503: * This branch deploys a preconditioned Jacobi SVD with explicitly
1504: * accumulated rotations. It is included as optional, mainly for
1505: * experimental purposes. It does perfom well, and can also be used.
1506: * In this implementation, this branch will be automatically activated
1507: * if the condition number sigma_max(A) / sigma_min(A) is predicted
1508: * to be greater than the overflow threshold. This is because the
1509: * a posteriori computation of the singular vectors assumes robust
1510: * implementation of BLAS and some LAPACK procedures, capable of working
1511: * in presence of extreme values. Since that is not always the case, ...
1512: *
1513: DO 7968 p = 1, NR
1514: CALL DCOPY( N-p+1, A(p,p), LDA, V(p,p), 1 )
1515: 7968 CONTINUE
1516: *
1517: IF ( L2PERT ) THEN
1518: XSC = DSQRT(SMALL/EPSLN)
1519: DO 5969 q = 1, NR
1520: TEMP1 = XSC*DABS( V(q,q) )
1521: DO 5968 p = 1, N
1522: IF ( ( p .GT. q ) .AND. ( DABS(V(p,q)) .LE. TEMP1 )
1523: & .OR. ( p .LT. q ) )
1524: & V(p,q) = DSIGN( TEMP1, V(p,q) )
1525: IF ( p. LT. q ) V(p,q) = - V(p,q)
1526: 5968 CONTINUE
1527: 5969 CONTINUE
1528: ELSE
1529: CALL DLASET( 'U', NR-1, NR-1, ZERO, ZERO, V(1,2), LDV )
1530: END IF
1531:
1532: CALL DGEQRF( N, NR, V, LDV, WORK(N+1), WORK(2*N+1),
1533: & LWORK-2*N, IERR )
1534: CALL DLACPY( 'L', N, NR, V, LDV, WORK(2*N+1), N )
1535: *
1536: DO 7969 p = 1, NR
1537: CALL DCOPY( NR-p+1, V(p,p), LDV, U(p,p), 1 )
1538: 7969 CONTINUE
1539:
1540: IF ( L2PERT ) THEN
1541: XSC = DSQRT(SMALL/EPSLN)
1542: DO 9970 q = 2, NR
1543: DO 9971 p = 1, q - 1
1544: TEMP1 = XSC * DMIN1(DABS(U(p,p)),DABS(U(q,q)))
1545: U(p,q) = - DSIGN( TEMP1, U(q,p) )
1546: 9971 CONTINUE
1547: 9970 CONTINUE
1548: ELSE
1549: CALL DLASET('U', NR-1, NR-1, ZERO, ZERO, U(1,2), LDU )
1550: END IF
1551:
1552: CALL DGESVJ( 'G', 'U', 'V', NR, NR, U, LDU, SVA,
1553: & N, V, LDV, WORK(2*N+N*NR+1), LWORK-2*N-N*NR, INFO )
1554: SCALEM = WORK(2*N+N*NR+1)
1555: NUMRANK = IDNINT(WORK(2*N+N*NR+2))
1556:
1557: IF ( NR .LT. N ) THEN
1558: CALL DLASET( 'A',N-NR,NR,ZERO,ZERO,V(NR+1,1),LDV )
1559: CALL DLASET( 'A',NR,N-NR,ZERO,ZERO,V(1,NR+1),LDV )
1560: CALL DLASET( 'A',N-NR,N-NR,ZERO,ONE,V(NR+1,NR+1),LDV )
1561: END IF
1562:
1563: CALL DORMQR( 'L','N',N,N,NR,WORK(2*N+1),N,WORK(N+1),
1564: & V,LDV,WORK(2*N+N*NR+NR+1),LWORK-2*N-N*NR-NR,IERR )
1565: *
1566: * Permute the rows of V using the (column) permutation from the
1567: * first QRF. Also, scale the columns to make them unit in
1568: * Euclidean norm. This applies to all cases.
1569: *
1570: TEMP1 = DSQRT(DBLE(N)) * EPSLN
1571: DO 7972 q = 1, N
1572: DO 8972 p = 1, N
1573: WORK(2*N+N*NR+NR+IWORK(p)) = V(p,q)
1574: 8972 CONTINUE
1575: DO 8973 p = 1, N
1576: V(p,q) = WORK(2*N+N*NR+NR+p)
1577: 8973 CONTINUE
1578: XSC = ONE / DNRM2( N, V(1,q), 1 )
1579: IF ( (XSC .LT. (ONE-TEMP1)) .OR. (XSC .GT. (ONE+TEMP1)) )
1580: & CALL DSCAL( N, XSC, V(1,q), 1 )
1581: 7972 CONTINUE
1582: *
1583: * At this moment, V contains the right singular vectors of A.
1584: * Next, assemble the left singular vector matrix U (M x N).
1585: *
1586: IF ( NR .LT. M ) THEN
1587: CALL DLASET( 'A', M-NR, NR, ZERO, ZERO, U(NR+1,1), LDU )
1588: IF ( NR .LT. N1 ) THEN
1589: CALL DLASET( 'A',NR, N1-NR, ZERO, ZERO, U(1,NR+1),LDU )
1590: CALL DLASET( 'A',M-NR,N1-NR, ZERO, ONE,U(NR+1,NR+1),LDU )
1591: END IF
1592: END IF
1593: *
1594: CALL DORMQR( 'Left', 'No Tr', M, N1, N, A, LDA, WORK, U,
1595: & LDU, WORK(N+1), LWORK-N, IERR )
1596: *
1597: IF ( ROWPIV )
1598: & CALL DLASWP( N1, U, LDU, 1, M-1, IWORK(2*N+1), -1 )
1599: *
1600: *
1601: END IF
1602: IF ( TRANSP ) THEN
1603: * .. swap U and V because the procedure worked on A^t
1604: DO 6974 p = 1, N
1605: CALL DSWAP( N, U(1,p), 1, V(1,p), 1 )
1606: 6974 CONTINUE
1607: END IF
1608: *
1609: END IF
1610: * end of the full SVD
1611: *
1612: * Undo scaling, if necessary (and possible)
1613: *
1614: IF ( USCAL2 .LE. (BIG/SVA(1))*USCAL1 ) THEN
1615: CALL DLASCL( 'G', 0, 0, USCAL1, USCAL2, NR, 1, SVA, N, IERR )
1616: USCAL1 = ONE
1617: USCAL2 = ONE
1618: END IF
1619: *
1620: IF ( NR .LT. N ) THEN
1621: DO 3004 p = NR+1, N
1622: SVA(p) = ZERO
1623: 3004 CONTINUE
1624: END IF
1625: *
1626: WORK(1) = USCAL2 * SCALEM
1627: WORK(2) = USCAL1
1628: IF ( ERREST ) WORK(3) = SCONDA
1629: IF ( LSVEC .AND. RSVEC ) THEN
1630: WORK(4) = CONDR1
1631: WORK(5) = CONDR2
1632: END IF
1633: IF ( L2TRAN ) THEN
1634: WORK(6) = ENTRA
1635: WORK(7) = ENTRAT
1636: END IF
1637: *
1638: IWORK(1) = NR
1639: IWORK(2) = NUMRANK
1640: IWORK(3) = WARNING
1641: *
1642: RETURN
1643: * ..
1644: * .. END OF DGEJSV
1645: * ..
1646: END
1647: *
CVSweb interface <joel.bertrand@systella.fr>