Annotation of rpl/lapack/lapack/dgejsv.f, revision 1.6
1.1 bertrand 1: SUBROUTINE DGEJSV( JOBA, JOBU, JOBV, JOBR, JOBT, JOBP,
1.6 ! bertrand 2: $ M, N, A, LDA, SVA, U, LDU, V, LDV,
! 3: $ WORK, LWORK, IWORK, INFO )
1.1 bertrand 4: *
1.6 ! bertrand 5: * -- LAPACK routine (version 3.3.1) --
1.1 bertrand 6: *
7: * -- Contributed by Zlatko Drmac of the University of Zagreb and --
8: * -- Kresimir Veselic of the Fernuniversitaet Hagen --
1.6 ! bertrand 9: * -- April 2011 --
1.1 bertrand 10: *
11: * -- LAPACK is a software package provided by Univ. of Tennessee, --
12: * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
13: *
14: * This routine is also part of SIGMA (version 1.23, October 23. 2008.)
15: * SIGMA is a library of algorithms for highly accurate algorithms for
16: * computation of SVD, PSVD, QSVD, (H,K)-SVD, and for solution of the
17: * eigenvalue problems Hx = lambda M x, H M x = lambda x with H, M > 0.
18: *
19: * .. Scalar Arguments ..
20: IMPLICIT NONE
21: INTEGER INFO, LDA, LDU, LDV, LWORK, M, N
22: * ..
23: * .. Array Arguments ..
24: DOUBLE PRECISION A( LDA, * ), SVA( N ), U( LDU, * ), V( LDV, * ),
1.6 ! bertrand 25: $ WORK( LWORK )
1.1 bertrand 26: INTEGER IWORK( * )
27: CHARACTER*1 JOBA, JOBP, JOBR, JOBT, JOBU, JOBV
28: * ..
29: *
30: * Purpose
31: * =======
32: *
33: * DGEJSV computes the singular value decomposition (SVD) of a real M-by-N
34: * matrix [A], where M >= N. The SVD of [A] is written as
35: *
36: * [A] = [U] * [SIGMA] * [V]^t,
37: *
38: * where [SIGMA] is an N-by-N (M-by-N) matrix which is zero except for its N
39: * diagonal elements, [U] is an M-by-N (or M-by-M) orthonormal matrix, and
40: * [V] is an N-by-N orthogonal matrix. The diagonal elements of [SIGMA] are
41: * the singular values of [A]. The columns of [U] and [V] are the left and
42: * the right singular vectors of [A], respectively. The matrices [U] and [V]
43: * are computed and stored in the arrays U and V, respectively. The diagonal
44: * of [SIGMA] is computed and stored in the array SVA.
45: *
46: * Arguments
47: * =========
48: *
49: * JOBA (input) CHARACTER*1
50: * Specifies the level of accuracy:
51: * = 'C': This option works well (high relative accuracy) if A = B * D,
52: * with well-conditioned B and arbitrary diagonal matrix D.
53: * The accuracy cannot be spoiled by COLUMN scaling. The
54: * accuracy of the computed output depends on the condition of
55: * B, and the procedure aims at the best theoretical accuracy.
56: * The relative error max_{i=1:N}|d sigma_i| / sigma_i is
57: * bounded by f(M,N)*epsilon* cond(B), independent of D.
58: * The input matrix is preprocessed with the QRF with column
59: * pivoting. This initial preprocessing and preconditioning by
60: * a rank revealing QR factorization is common for all values of
61: * JOBA. Additional actions are specified as follows:
62: * = 'E': Computation as with 'C' with an additional estimate of the
63: * condition number of B. It provides a realistic error bound.
64: * = 'F': If A = D1 * C * D2 with ill-conditioned diagonal scalings
65: * D1, D2, and well-conditioned matrix C, this option gives
66: * higher accuracy than the 'C' option. If the structure of the
67: * input matrix is not known, and relative accuracy is
68: * desirable, then this option is advisable. The input matrix A
69: * is preprocessed with QR factorization with FULL (row and
70: * column) pivoting.
71: * = 'G' Computation as with 'F' with an additional estimate of the
72: * condition number of B, where A=D*B. If A has heavily weighted
73: * rows, then using this condition number gives too pessimistic
74: * error bound.
75: * = 'A': Small singular values are the noise and the matrix is treated
76: * as numerically rank defficient. The error in the computed
77: * singular values is bounded by f(m,n)*epsilon*||A||.
78: * The computed SVD A = U * S * V^t restores A up to
79: * f(m,n)*epsilon*||A||.
80: * This gives the procedure the licence to discard (set to zero)
81: * all singular values below N*epsilon*||A||.
82: * = 'R': Similar as in 'A'. Rank revealing property of the initial
83: * QR factorization is used do reveal (using triangular factor)
84: * a gap sigma_{r+1} < epsilon * sigma_r in which case the
85: * numerical RANK is declared to be r. The SVD is computed with
86: * absolute error bounds, but more accurately than with 'A'.
87: *
88: * JOBU (input) CHARACTER*1
89: * Specifies whether to compute the columns of U:
90: * = 'U': N columns of U are returned in the array U.
91: * = 'F': full set of M left sing. vectors is returned in the array U.
92: * = 'W': U may be used as workspace of length M*N. See the description
93: * of U.
94: * = 'N': U is not computed.
95: *
96: * JOBV (input) CHARACTER*1
97: * Specifies whether to compute the matrix V:
98: * = 'V': N columns of V are returned in the array V; Jacobi rotations
99: * are not explicitly accumulated.
100: * = 'J': N columns of V are returned in the array V, but they are
101: * computed as the product of Jacobi rotations. This option is
102: * allowed only if JOBU .NE. 'N', i.e. in computing the full SVD.
103: * = 'W': V may be used as workspace of length N*N. See the description
104: * of V.
105: * = 'N': V is not computed.
106: *
107: * JOBR (input) CHARACTER*1
108: * Specifies the RANGE for the singular values. Issues the licence to
109: * set to zero small positive singular values if they are outside
110: * specified range. If A .NE. 0 is scaled so that the largest singular
111: * value of c*A is around DSQRT(BIG), BIG=SLAMCH('O'), then JOBR issues
112: * the licence to kill columns of A whose norm in c*A is less than
113: * DSQRT(SFMIN) (for JOBR.EQ.'R'), or less than SMALL=SFMIN/EPSLN,
114: * where SFMIN=SLAMCH('S'), EPSLN=SLAMCH('E').
115: * = 'N': Do not kill small columns of c*A. This option assumes that
116: * BLAS and QR factorizations and triangular solvers are
117: * implemented to work in that range. If the condition of A
118: * is greater than BIG, use DGESVJ.
119: * = 'R': RESTRICTED range for sigma(c*A) is [DSQRT(SFMIN), DSQRT(BIG)]
120: * (roughly, as described above). This option is recommended.
121: * ~~~~~~~~~~~~~~~~~~~~~~~~~~~
122: * For computing the singular values in the FULL range [SFMIN,BIG]
123: * use DGESVJ.
124: *
125: * JOBT (input) CHARACTER*1
126: * If the matrix is square then the procedure may determine to use
127: * transposed A if A^t seems to be better with respect to convergence.
128: * If the matrix is not square, JOBT is ignored. This is subject to
129: * changes in the future.
130: * The decision is based on two values of entropy over the adjoint
131: * orbit of A^t * A. See the descriptions of WORK(6) and WORK(7).
132: * = 'T': transpose if entropy test indicates possibly faster
133: * convergence of Jacobi process if A^t is taken as input. If A is
134: * replaced with A^t, then the row pivoting is included automatically.
135: * = 'N': do not speculate.
136: * This option can be used to compute only the singular values, or the
137: * full SVD (U, SIGMA and V). For only one set of singular vectors
138: * (U or V), the caller should provide both U and V, as one of the
139: * matrices is used as workspace if the matrix A is transposed.
140: * The implementer can easily remove this constraint and make the
141: * code more complicated. See the descriptions of U and V.
142: *
143: * JOBP (input) CHARACTER*1
144: * Issues the licence to introduce structured perturbations to drown
145: * denormalized numbers. This licence should be active if the
146: * denormals are poorly implemented, causing slow computation,
147: * especially in cases of fast convergence (!). For details see [1,2].
148: * For the sake of simplicity, this perturbations are included only
149: * when the full SVD or only the singular values are requested. The
150: * implementer/user can easily add the perturbation for the cases of
151: * computing one set of singular vectors.
152: * = 'P': introduce perturbation
153: * = 'N': do not perturb
154: *
155: * M (input) INTEGER
156: * The number of rows of the input matrix A. M >= 0.
157: *
158: * N (input) INTEGER
159: * The number of columns of the input matrix A. M >= N >= 0.
160: *
161: * A (input/workspace) DOUBLE PRECISION array, dimension (LDA,N)
162: * On entry, the M-by-N matrix A.
163: *
164: * LDA (input) INTEGER
165: * The leading dimension of the array A. LDA >= max(1,M).
166: *
167: * SVA (workspace/output) DOUBLE PRECISION array, dimension (N)
168: * On exit,
169: * - For WORK(1)/WORK(2) = ONE: The singular values of A. During the
170: * computation SVA contains Euclidean column norms of the
171: * iterated matrices in the array A.
172: * - For WORK(1) .NE. WORK(2): The singular values of A are
173: * (WORK(1)/WORK(2)) * SVA(1:N). This factored form is used if
174: * sigma_max(A) overflows or if small singular values have been
175: * saved from underflow by scaling the input matrix A.
176: * - If JOBR='R' then some of the singular values may be returned
177: * as exact zeros obtained by "set to zero" because they are
178: * below the numerical rank threshold or are denormalized numbers.
179: *
180: * U (workspace/output) DOUBLE PRECISION array, dimension ( LDU, N )
181: * If JOBU = 'U', then U contains on exit the M-by-N matrix of
182: * the left singular vectors.
183: * If JOBU = 'F', then U contains on exit the M-by-M matrix of
184: * the left singular vectors, including an ONB
185: * of the orthogonal complement of the Range(A).
186: * If JOBU = 'W' .AND. (JOBV.EQ.'V' .AND. JOBT.EQ.'T' .AND. M.EQ.N),
187: * then U is used as workspace if the procedure
188: * replaces A with A^t. In that case, [V] is computed
189: * in U as left singular vectors of A^t and then
190: * copied back to the V array. This 'W' option is just
191: * a reminder to the caller that in this case U is
192: * reserved as workspace of length N*N.
193: * If JOBU = 'N' U is not referenced.
194: *
195: * LDU (input) INTEGER
196: * The leading dimension of the array U, LDU >= 1.
197: * IF JOBU = 'U' or 'F' or 'W', then LDU >= M.
198: *
199: * V (workspace/output) DOUBLE PRECISION array, dimension ( LDV, N )
200: * If JOBV = 'V', 'J' then V contains on exit the N-by-N matrix of
201: * the right singular vectors;
202: * If JOBV = 'W', AND (JOBU.EQ.'U' AND JOBT.EQ.'T' AND M.EQ.N),
203: * then V is used as workspace if the pprocedure
204: * replaces A with A^t. In that case, [U] is computed
205: * in V as right singular vectors of A^t and then
206: * copied back to the U array. This 'W' option is just
207: * a reminder to the caller that in this case V is
208: * reserved as workspace of length N*N.
209: * If JOBV = 'N' V is not referenced.
210: *
211: * LDV (input) INTEGER
212: * The leading dimension of the array V, LDV >= 1.
213: * If JOBV = 'V' or 'J' or 'W', then LDV >= N.
214: *
215: * WORK (workspace/output) DOUBLE PRECISION array, dimension at least LWORK.
1.6 ! bertrand 216: * On exit, if N.GT.0 .AND. M.GT.0 (else not referenced),
1.1 bertrand 217: * WORK(1) = SCALE = WORK(2) / WORK(1) is the scaling factor such
218: * that SCALE*SVA(1:N) are the computed singular values
219: * of A. (See the description of SVA().)
220: * WORK(2) = See the description of WORK(1).
221: * WORK(3) = SCONDA is an estimate for the condition number of
222: * column equilibrated A. (If JOBA .EQ. 'E' or 'G')
223: * SCONDA is an estimate of DSQRT(||(R^t * R)^(-1)||_1).
224: * It is computed using DPOCON. It holds
225: * N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
226: * where R is the triangular factor from the QRF of A.
227: * However, if R is truncated and the numerical rank is
228: * determined to be strictly smaller than N, SCONDA is
229: * returned as -1, thus indicating that the smallest
230: * singular values might be lost.
231: *
232: * If full SVD is needed, the following two condition numbers are
233: * useful for the analysis of the algorithm. They are provied for
234: * a developer/implementer who is familiar with the details of
235: * the method.
236: *
237: * WORK(4) = an estimate of the scaled condition number of the
238: * triangular factor in the first QR factorization.
239: * WORK(5) = an estimate of the scaled condition number of the
240: * triangular factor in the second QR factorization.
241: * The following two parameters are computed if JOBT .EQ. 'T'.
242: * They are provided for a developer/implementer who is familiar
243: * with the details of the method.
244: *
245: * WORK(6) = the entropy of A^t*A :: this is the Shannon entropy
246: * of diag(A^t*A) / Trace(A^t*A) taken as point in the
247: * probability simplex.
248: * WORK(7) = the entropy of A*A^t.
249: *
250: * LWORK (input) INTEGER
251: * Length of WORK to confirm proper allocation of work space.
252: * LWORK depends on the job:
253: *
254: * If only SIGMA is needed ( JOBU.EQ.'N', JOBV.EQ.'N' ) and
1.6 ! bertrand 255: * -> .. no scaled condition estimate required (JOBE.EQ.'N'):
1.1 bertrand 256: * LWORK >= max(2*M+N,4*N+1,7). This is the minimal requirement.
1.6 ! bertrand 257: * ->> For optimal performance (blocked code) the optimal value
1.1 bertrand 258: * is LWORK >= max(2*M+N,3*N+(N+1)*NB,7). Here NB is the optimal
1.6 ! bertrand 259: * block size for DGEQP3 and DGEQRF.
! 260: * In general, optimal LWORK is computed as
! 261: * LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DGEQRF), 7).
1.1 bertrand 262: * -> .. an estimate of the scaled condition number of A is
263: * required (JOBA='E', 'G'). In this case, LWORK is the maximum
1.6 ! bertrand 264: * of the above and N*N+4*N, i.e. LWORK >= max(2*M+N,N*N+4*N,7).
! 265: * ->> For optimal performance (blocked code) the optimal value
! 266: * is LWORK >= max(2*M+N,3*N+(N+1)*NB, N*N+4*N, 7).
! 267: * In general, the optimal length LWORK is computed as
! 268: * LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DGEQRF),
! 269: * N+N*N+LWORK(DPOCON),7).
1.1 bertrand 270: *
271: * If SIGMA and the right singular vectors are needed (JOBV.EQ.'V'),
1.6 ! bertrand 272: * -> the minimal requirement is LWORK >= max(2*M+N,4*N+1,7).
! 273: * -> For optimal performance, LWORK >= max(2*M+N,3*N+(N+1)*NB,7),
! 274: * where NB is the optimal block size for DGEQP3, DGEQRF, DGELQ,
! 275: * DORMLQ. In general, the optimal length LWORK is computed as
! 276: * LWORK >= max(2*M+N,N+LWORK(DGEQP3), N+LWORK(DPOCON),
! 277: * N+LWORK(DGELQ), 2*N+LWORK(DGEQRF), N+LWORK(DORMLQ)).
1.1 bertrand 278: *
279: * If SIGMA and the left singular vectors are needed
1.6 ! bertrand 280: * -> the minimal requirement is LWORK >= max(2*M+N,4*N+1,7).
! 281: * -> For optimal performance:
! 282: * if JOBU.EQ.'U' :: LWORK >= max(2*M+N,3*N+(N+1)*NB,7),
! 283: * if JOBU.EQ.'F' :: LWORK >= max(2*M+N,3*N+(N+1)*NB,N+M*NB,7),
! 284: * where NB is the optimal block size for DGEQP3, DGEQRF, DORMQR.
! 285: * In general, the optimal length LWORK is computed as
! 286: * LWORK >= max(2*M+N,N+LWORK(DGEQP3),N+LWORK(DPOCON),
! 287: * 2*N+LWORK(DGEQRF), N+LWORK(DORMQR)).
! 288: * Here LWORK(DORMQR) equals N*NB (for JOBU.EQ.'U') or
! 289: * M*NB (for JOBU.EQ.'F').
! 290: *
! 291: * If the full SVD is needed: (JOBU.EQ.'U' or JOBU.EQ.'F') and
! 292: * -> if JOBV.EQ.'V'
! 293: * the minimal requirement is LWORK >= max(2*M+N,6*N+2*N*N).
! 294: * -> if JOBV.EQ.'J' the minimal requirement is
! 295: * LWORK >= max(2*M+N, 4*N+N*N,2*N+N*N+6).
! 296: * -> For optimal performance, LWORK should be additionally
! 297: * larger than N+M*NB, where NB is the optimal block size
! 298: * for DORMQR.
1.1 bertrand 299: *
300: * IWORK (workspace/output) INTEGER array, dimension M+3*N.
301: * On exit,
302: * IWORK(1) = the numerical rank determined after the initial
303: * QR factorization with pivoting. See the descriptions
304: * of JOBA and JOBR.
305: * IWORK(2) = the number of the computed nonzero singular values
306: * IWORK(3) = if nonzero, a warning message:
307: * If IWORK(3).EQ.1 then some of the column norms of A
308: * were denormalized floats. The requested high accuracy
309: * is not warranted by the data.
310: *
311: * INFO (output) INTEGER
312: * < 0 : if INFO = -i, then the i-th argument had an illegal value.
313: * = 0 : successfull exit;
314: * > 0 : DGEJSV did not converge in the maximal allowed number
315: * of sweeps. The computed values may be inaccurate.
316: *
317: * Further Details
318: * ===============
319: *
1.6 ! bertrand 320: * DGEJSV implements a preconditioned Jacobi SVD algorithm. It uses DGEQP3,
! 321: * DGEQRF, and DGELQF as preprocessors and preconditioners. Optionally, an
1.1 bertrand 322: * additional row pivoting can be used as a preprocessor, which in some
323: * cases results in much higher accuracy. An example is matrix A with the
324: * structure A = D1 * C * D2, where D1, D2 are arbitrarily ill-conditioned
325: * diagonal matrices and C is well-conditioned matrix. In that case, complete
326: * pivoting in the first QR factorizations provides accuracy dependent on the
327: * condition number of C, and independent of D1, D2. Such higher accuracy is
328: * not completely understood theoretically, but it works well in practice.
329: * Further, if A can be written as A = B*D, with well-conditioned B and some
330: * diagonal D, then the high accuracy is guaranteed, both theoretically and
331: * in software, independent of D. For more details see [1], [2].
332: * The computational range for the singular values can be the full range
333: * ( UNDERFLOW,OVERFLOW ), provided that the machine arithmetic and the BLAS
334: * & LAPACK routines called by DGEJSV are implemented to work in that range.
335: * If that is not the case, then the restriction for safe computation with
336: * the singular values in the range of normalized IEEE numbers is that the
337: * spectral condition number kappa(A)=sigma_max(A)/sigma_min(A) does not
338: * overflow. This code (DGEJSV) is best used in this restricted range,
1.6 ! bertrand 339: * meaning that singular values of magnitude below ||A||_2 / DLAMCH('O') are
1.1 bertrand 340: * returned as zeros. See JOBR for details on this.
341: * Further, this implementation is somewhat slower than the one described
342: * in [1,2] due to replacement of some non-LAPACK components, and because
343: * the choice of some tuning parameters in the iterative part (DGESVJ) is
344: * left to the implementer on a particular machine.
1.6 ! bertrand 345: * The rank revealing QR factorization (in this code: DGEQP3) should be
! 346: * implemented as in [3]. We have a new version of DGEQP3 under development
1.1 bertrand 347: * that is more robust than the current one in LAPACK, with a cleaner cut in
348: * rank defficient cases. It will be available in the SIGMA library [4].
349: * If M is much larger than N, it is obvious that the inital QRF with
350: * column pivoting can be preprocessed by the QRF without pivoting. That
351: * well known trick is not used in DGEJSV because in some cases heavy row
352: * weighting can be treated with complete pivoting. The overhead in cases
353: * M much larger than N is then only due to pivoting, but the benefits in
354: * terms of accuracy have prevailed. The implementer/user can incorporate
355: * this extra QRF step easily. The implementer can also improve data movement
356: * (matrix transpose, matrix copy, matrix transposed copy) - this
357: * implementation of DGEJSV uses only the simplest, naive data movement.
358: *
359: * Contributors
360: *
361: * Zlatko Drmac (Zagreb, Croatia) and Kresimir Veselic (Hagen, Germany)
362: *
363: * References
364: *
365: * [1] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm I.
366: * SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1322-1342.
367: * LAPACK Working note 169.
368: * [2] Z. Drmac and K. Veselic: New fast and accurate Jacobi SVD algorithm II.
369: * SIAM J. Matrix Anal. Appl. Vol. 35, No. 2 (2008), pp. 1343-1362.
370: * LAPACK Working note 170.
371: * [3] Z. Drmac and Z. Bujanovic: On the failure of rank-revealing QR
372: * factorization software - a case study.
373: * ACM Trans. Math. Softw. Vol. 35, No 2 (2008), pp. 1-28.
374: * LAPACK Working note 176.
375: * [4] Z. Drmac: SIGMA - mathematical software library for accurate SVD, PSV,
376: * QSVD, (H,K)-SVD computations.
377: * Department of Mathematics, University of Zagreb, 2008.
378: *
379: * Bugs, examples and comments
1.6 ! bertrand 380: *
1.1 bertrand 381: * Please report all bugs and send interesting examples and/or comments to
382: * drmac@math.hr. Thank you.
383: *
1.6 ! bertrand 384: * ===========================================================================
1.1 bertrand 385: *
386: * .. Local Parameters ..
387: DOUBLE PRECISION ZERO, ONE
388: PARAMETER ( ZERO = 0.0D0, ONE = 1.0D0 )
389: * ..
390: * .. Local Scalars ..
391: DOUBLE PRECISION AAPP, AAQQ, AATMAX, AATMIN, BIG, BIG1, COND_OK,
1.6 ! bertrand 392: $ CONDR1, CONDR2, ENTRA, ENTRAT, EPSLN, MAXPRJ, SCALEM,
! 393: $ SCONDA, SFMIN, SMALL, TEMP1, USCAL1, USCAL2, XSC
1.1 bertrand 394: INTEGER IERR, N1, NR, NUMRANK, p, q, WARNING
395: LOGICAL ALMORT, DEFR, ERREST, GOSCAL, JRACC, KILL, LSVEC,
1.6 ! bertrand 396: $ L2ABER, L2KILL, L2PERT, L2RANK, L2TRAN,
! 397: $ NOSCAL, ROWPIV, RSVEC, TRANSP
1.1 bertrand 398: * ..
399: * .. Intrinsic Functions ..
400: INTRINSIC DABS, DLOG, DMAX1, DMIN1, DBLE,
1.6 ! bertrand 401: $ MAX0, MIN0, IDNINT, DSIGN, DSQRT
1.1 bertrand 402: * ..
403: * .. External Functions ..
404: DOUBLE PRECISION DLAMCH, DNRM2
405: INTEGER IDAMAX
406: LOGICAL LSAME
407: EXTERNAL IDAMAX, LSAME, DLAMCH, DNRM2
408: * ..
409: * .. External Subroutines ..
410: EXTERNAL DCOPY, DGELQF, DGEQP3, DGEQRF, DLACPY, DLASCL,
1.6 ! bertrand 411: $ DLASET, DLASSQ, DLASWP, DORGQR, DORMLQ,
! 412: $ DORMQR, DPOCON, DSCAL, DSWAP, DTRSM, XERBLA
1.1 bertrand 413: *
414: EXTERNAL DGESVJ
415: * ..
416: *
417: * Test the input arguments
418: *
419: LSVEC = LSAME( JOBU, 'U' ) .OR. LSAME( JOBU, 'F' )
420: JRACC = LSAME( JOBV, 'J' )
421: RSVEC = LSAME( JOBV, 'V' ) .OR. JRACC
422: ROWPIV = LSAME( JOBA, 'F' ) .OR. LSAME( JOBA, 'G' )
423: L2RANK = LSAME( JOBA, 'R' )
424: L2ABER = LSAME( JOBA, 'A' )
425: ERREST = LSAME( JOBA, 'E' ) .OR. LSAME( JOBA, 'G' )
426: L2TRAN = LSAME( JOBT, 'T' )
427: L2KILL = LSAME( JOBR, 'R' )
428: DEFR = LSAME( JOBR, 'N' )
429: L2PERT = LSAME( JOBP, 'P' )
430: *
431: IF ( .NOT.(ROWPIV .OR. L2RANK .OR. L2ABER .OR.
1.6 ! bertrand 432: $ ERREST .OR. LSAME( JOBA, 'C' ) )) THEN
1.1 bertrand 433: INFO = - 1
434: ELSE IF ( .NOT.( LSVEC .OR. LSAME( JOBU, 'N' ) .OR.
1.6 ! bertrand 435: $ LSAME( JOBU, 'W' )) ) THEN
1.1 bertrand 436: INFO = - 2
437: ELSE IF ( .NOT.( RSVEC .OR. LSAME( JOBV, 'N' ) .OR.
1.6 ! bertrand 438: $ LSAME( JOBV, 'W' )) .OR. ( JRACC .AND. (.NOT.LSVEC) ) ) THEN
1.1 bertrand 439: INFO = - 3
440: ELSE IF ( .NOT. ( L2KILL .OR. DEFR ) ) THEN
441: INFO = - 4
442: ELSE IF ( .NOT. ( L2TRAN .OR. LSAME( JOBT, 'N' ) ) ) THEN
443: INFO = - 5
444: ELSE IF ( .NOT. ( L2PERT .OR. LSAME( JOBP, 'N' ) ) ) THEN
445: INFO = - 6
446: ELSE IF ( M .LT. 0 ) THEN
447: INFO = - 7
448: ELSE IF ( ( N .LT. 0 ) .OR. ( N .GT. M ) ) THEN
449: INFO = - 8
450: ELSE IF ( LDA .LT. M ) THEN
451: INFO = - 10
452: ELSE IF ( LSVEC .AND. ( LDU .LT. M ) ) THEN
453: INFO = - 13
454: ELSE IF ( RSVEC .AND. ( LDV .LT. N ) ) THEN
455: INFO = - 14
456: ELSE IF ( (.NOT.(LSVEC .OR. RSVEC .OR. ERREST).AND.
457: & (LWORK .LT. MAX0(7,4*N+1,2*M+N))) .OR.
1.6 ! bertrand 458: & (.NOT.(LSVEC .OR. RSVEC) .AND. ERREST .AND.
1.1 bertrand 459: & (LWORK .LT. MAX0(7,4*N+N*N,2*M+N))) .OR.
1.6 ! bertrand 460: & (LSVEC .AND. (.NOT.RSVEC) .AND. (LWORK .LT. MAX0(7,2*M+N,4*N+1)))
! 461: & .OR.
! 462: & (RSVEC .AND. (.NOT.LSVEC) .AND. (LWORK .LT. MAX0(7,2*M+N,4*N+1)))
! 463: & .OR.
! 464: & (LSVEC .AND. RSVEC .AND. (.NOT.JRACC) .AND.
! 465: & (LWORK.LT.MAX0(2*M+N,6*N+2*N*N)))
! 466: & .OR. (LSVEC .AND. RSVEC .AND. JRACC .AND.
! 467: & LWORK.LT.MAX0(2*M+N,4*N+N*N,2*N+N*N+6)))
1.1 bertrand 468: & THEN
469: INFO = - 17
470: ELSE
471: * #:)
472: INFO = 0
473: END IF
474: *
475: IF ( INFO .NE. 0 ) THEN
476: * #:(
477: CALL XERBLA( 'DGEJSV', - INFO )
1.6 ! bertrand 478: RETURN
1.1 bertrand 479: END IF
480: *
481: * Quick return for void matrix (Y3K safe)
482: * #:)
483: IF ( ( M .EQ. 0 ) .OR. ( N .EQ. 0 ) ) RETURN
484: *
485: * Determine whether the matrix U should be M x N or M x M
486: *
487: IF ( LSVEC ) THEN
488: N1 = N
489: IF ( LSAME( JOBU, 'F' ) ) N1 = M
490: END IF
491: *
492: * Set numerical parameters
493: *
494: *! NOTE: Make sure DLAMCH() does not fail on the target architecture.
495: *
496: EPSLN = DLAMCH('Epsilon')
497: SFMIN = DLAMCH('SafeMinimum')
498: SMALL = SFMIN / EPSLN
499: BIG = DLAMCH('O')
500: * BIG = ONE / SFMIN
501: *
502: * Initialize SVA(1:N) = diag( ||A e_i||_2 )_1^N
503: *
504: *(!) If necessary, scale SVA() to protect the largest norm from
505: * overflow. It is possible that this scaling pushes the smallest
506: * column norm left from the underflow threshold (extreme case).
507: *
508: SCALEM = ONE / DSQRT(DBLE(M)*DBLE(N))
509: NOSCAL = .TRUE.
510: GOSCAL = .TRUE.
511: DO 1874 p = 1, N
512: AAPP = ZERO
1.4 bertrand 513: AAQQ = ONE
1.1 bertrand 514: CALL DLASSQ( M, A(1,p), 1, AAPP, AAQQ )
515: IF ( AAPP .GT. BIG ) THEN
516: INFO = - 9
517: CALL XERBLA( 'DGEJSV', -INFO )
518: RETURN
519: END IF
520: AAQQ = DSQRT(AAQQ)
521: IF ( ( AAPP .LT. (BIG / AAQQ) ) .AND. NOSCAL ) THEN
522: SVA(p) = AAPP * AAQQ
523: ELSE
524: NOSCAL = .FALSE.
525: SVA(p) = AAPP * ( AAQQ * SCALEM )
526: IF ( GOSCAL ) THEN
527: GOSCAL = .FALSE.
528: CALL DSCAL( p-1, SCALEM, SVA, 1 )
529: END IF
530: END IF
531: 1874 CONTINUE
532: *
533: IF ( NOSCAL ) SCALEM = ONE
534: *
535: AAPP = ZERO
536: AAQQ = BIG
537: DO 4781 p = 1, N
538: AAPP = DMAX1( AAPP, SVA(p) )
539: IF ( SVA(p) .NE. ZERO ) AAQQ = DMIN1( AAQQ, SVA(p) )
540: 4781 CONTINUE
541: *
542: * Quick return for zero M x N matrix
543: * #:)
544: IF ( AAPP .EQ. ZERO ) THEN
545: IF ( LSVEC ) CALL DLASET( 'G', M, N1, ZERO, ONE, U, LDU )
546: IF ( RSVEC ) CALL DLASET( 'G', N, N, ZERO, ONE, V, LDV )
547: WORK(1) = ONE
548: WORK(2) = ONE
549: IF ( ERREST ) WORK(3) = ONE
550: IF ( LSVEC .AND. RSVEC ) THEN
551: WORK(4) = ONE
552: WORK(5) = ONE
553: END IF
554: IF ( L2TRAN ) THEN
555: WORK(6) = ZERO
556: WORK(7) = ZERO
557: END IF
558: IWORK(1) = 0
559: IWORK(2) = 0
1.6 ! bertrand 560: IWORK(3) = 0
1.1 bertrand 561: RETURN
562: END IF
563: *
564: * Issue warning if denormalized column norms detected. Override the
565: * high relative accuracy request. Issue licence to kill columns
566: * (set them to zero) whose norm is less than sigma_max / BIG (roughly).
567: * #:(
568: WARNING = 0
569: IF ( AAQQ .LE. SFMIN ) THEN
570: L2RANK = .TRUE.
571: L2KILL = .TRUE.
572: WARNING = 1
573: END IF
574: *
575: * Quick return for one-column matrix
576: * #:)
577: IF ( N .EQ. 1 ) THEN
578: *
579: IF ( LSVEC ) THEN
580: CALL DLASCL( 'G',0,0,SVA(1),SCALEM, M,1,A(1,1),LDA,IERR )
581: CALL DLACPY( 'A', M, 1, A, LDA, U, LDU )
582: * computing all M left singular vectors of the M x 1 matrix
583: IF ( N1 .NE. N ) THEN
584: CALL DGEQRF( M, N, U,LDU, WORK, WORK(N+1),LWORK-N,IERR )
585: CALL DORGQR( M,N1,1, U,LDU,WORK,WORK(N+1),LWORK-N,IERR )
586: CALL DCOPY( M, A(1,1), 1, U(1,1), 1 )
587: END IF
588: END IF
589: IF ( RSVEC ) THEN
590: V(1,1) = ONE
591: END IF
592: IF ( SVA(1) .LT. (BIG*SCALEM) ) THEN
593: SVA(1) = SVA(1) / SCALEM
594: SCALEM = ONE
595: END IF
596: WORK(1) = ONE / SCALEM
597: WORK(2) = ONE
598: IF ( SVA(1) .NE. ZERO ) THEN
599: IWORK(1) = 1
600: IF ( ( SVA(1) / SCALEM) .GE. SFMIN ) THEN
601: IWORK(2) = 1
602: ELSE
603: IWORK(2) = 0
604: END IF
605: ELSE
606: IWORK(1) = 0
607: IWORK(2) = 0
608: END IF
609: IF ( ERREST ) WORK(3) = ONE
610: IF ( LSVEC .AND. RSVEC ) THEN
611: WORK(4) = ONE
612: WORK(5) = ONE
613: END IF
614: IF ( L2TRAN ) THEN
615: WORK(6) = ZERO
616: WORK(7) = ZERO
617: END IF
618: RETURN
619: *
620: END IF
621: *
622: TRANSP = .FALSE.
623: L2TRAN = L2TRAN .AND. ( M .EQ. N )
624: *
625: AATMAX = -ONE
626: AATMIN = BIG
627: IF ( ROWPIV .OR. L2TRAN ) THEN
628: *
629: * Compute the row norms, needed to determine row pivoting sequence
630: * (in the case of heavily row weighted A, row pivoting is strongly
631: * advised) and to collect information needed to compare the
632: * structures of A * A^t and A^t * A (in the case L2TRAN.EQ..TRUE.).
633: *
634: IF ( L2TRAN ) THEN
635: DO 1950 p = 1, M
636: XSC = ZERO
1.4 bertrand 637: TEMP1 = ONE
1.1 bertrand 638: CALL DLASSQ( N, A(p,1), LDA, XSC, TEMP1 )
639: * DLASSQ gets both the ell_2 and the ell_infinity norm
640: * in one pass through the vector
641: WORK(M+N+p) = XSC * SCALEM
642: WORK(N+p) = XSC * (SCALEM*DSQRT(TEMP1))
643: AATMAX = DMAX1( AATMAX, WORK(N+p) )
644: IF (WORK(N+p) .NE. ZERO) AATMIN = DMIN1(AATMIN,WORK(N+p))
645: 1950 CONTINUE
646: ELSE
647: DO 1904 p = 1, M
648: WORK(M+N+p) = SCALEM*DABS( A(p,IDAMAX(N,A(p,1),LDA)) )
649: AATMAX = DMAX1( AATMAX, WORK(M+N+p) )
650: AATMIN = DMIN1( AATMIN, WORK(M+N+p) )
651: 1904 CONTINUE
652: END IF
653: *
654: END IF
655: *
656: * For square matrix A try to determine whether A^t would be better
657: * input for the preconditioned Jacobi SVD, with faster convergence.
658: * The decision is based on an O(N) function of the vector of column
659: * and row norms of A, based on the Shannon entropy. This should give
660: * the right choice in most cases when the difference actually matters.
661: * It may fail and pick the slower converging side.
662: *
663: ENTRA = ZERO
664: ENTRAT = ZERO
665: IF ( L2TRAN ) THEN
666: *
667: XSC = ZERO
1.4 bertrand 668: TEMP1 = ONE
1.1 bertrand 669: CALL DLASSQ( N, SVA, 1, XSC, TEMP1 )
670: TEMP1 = ONE / TEMP1
671: *
672: ENTRA = ZERO
673: DO 1113 p = 1, N
674: BIG1 = ( ( SVA(p) / XSC )**2 ) * TEMP1
675: IF ( BIG1 .NE. ZERO ) ENTRA = ENTRA + BIG1 * DLOG(BIG1)
676: 1113 CONTINUE
677: ENTRA = - ENTRA / DLOG(DBLE(N))
678: *
679: * Now, SVA().^2/Trace(A^t * A) is a point in the probability simplex.
680: * It is derived from the diagonal of A^t * A. Do the same with the
681: * diagonal of A * A^t, compute the entropy of the corresponding
682: * probability distribution. Note that A * A^t and A^t * A have the
683: * same trace.
684: *
685: ENTRAT = ZERO
686: DO 1114 p = N+1, N+M
687: BIG1 = ( ( WORK(p) / XSC )**2 ) * TEMP1
688: IF ( BIG1 .NE. ZERO ) ENTRAT = ENTRAT + BIG1 * DLOG(BIG1)
689: 1114 CONTINUE
690: ENTRAT = - ENTRAT / DLOG(DBLE(M))
691: *
692: * Analyze the entropies and decide A or A^t. Smaller entropy
693: * usually means better input for the algorithm.
694: *
695: TRANSP = ( ENTRAT .LT. ENTRA )
696: *
697: * If A^t is better than A, transpose A.
698: *
699: IF ( TRANSP ) THEN
700: * In an optimal implementation, this trivial transpose
701: * should be replaced with faster transpose.
702: DO 1115 p = 1, N - 1
703: DO 1116 q = p + 1, N
704: TEMP1 = A(q,p)
705: A(q,p) = A(p,q)
706: A(p,q) = TEMP1
707: 1116 CONTINUE
708: 1115 CONTINUE
709: DO 1117 p = 1, N
710: WORK(M+N+p) = SVA(p)
711: SVA(p) = WORK(N+p)
712: 1117 CONTINUE
713: TEMP1 = AAPP
714: AAPP = AATMAX
715: AATMAX = TEMP1
716: TEMP1 = AAQQ
717: AAQQ = AATMIN
718: AATMIN = TEMP1
719: KILL = LSVEC
720: LSVEC = RSVEC
721: RSVEC = KILL
1.4 bertrand 722: IF ( LSVEC ) N1 = N
1.1 bertrand 723: *
724: ROWPIV = .TRUE.
725: END IF
726: *
727: END IF
728: * END IF L2TRAN
729: *
730: * Scale the matrix so that its maximal singular value remains less
731: * than DSQRT(BIG) -- the matrix is scaled so that its maximal column
732: * has Euclidean norm equal to DSQRT(BIG/N). The only reason to keep
733: * DSQRT(BIG) instead of BIG is the fact that DGEJSV uses LAPACK and
734: * BLAS routines that, in some implementations, are not capable of
735: * working in the full interval [SFMIN,BIG] and that they may provoke
736: * overflows in the intermediate results. If the singular values spread
737: * from SFMIN to BIG, then DGESVJ will compute them. So, in that case,
738: * one should use DGESVJ instead of DGEJSV.
739: *
740: BIG1 = DSQRT( BIG )
741: TEMP1 = DSQRT( BIG / DBLE(N) )
742: *
743: CALL DLASCL( 'G', 0, 0, AAPP, TEMP1, N, 1, SVA, N, IERR )
744: IF ( AAQQ .GT. (AAPP * SFMIN) ) THEN
745: AAQQ = ( AAQQ / AAPP ) * TEMP1
746: ELSE
747: AAQQ = ( AAQQ * TEMP1 ) / AAPP
748: END IF
749: TEMP1 = TEMP1 * SCALEM
750: CALL DLASCL( 'G', 0, 0, AAPP, TEMP1, M, N, A, LDA, IERR )
751: *
752: * To undo scaling at the end of this procedure, multiply the
753: * computed singular values with USCAL2 / USCAL1.
754: *
755: USCAL1 = TEMP1
756: USCAL2 = AAPP
757: *
758: IF ( L2KILL ) THEN
759: * L2KILL enforces computation of nonzero singular values in
760: * the restricted range of condition number of the initial A,
761: * sigma_max(A) / sigma_min(A) approx. DSQRT(BIG)/DSQRT(SFMIN).
762: XSC = DSQRT( SFMIN )
763: ELSE
764: XSC = SMALL
765: *
766: * Now, if the condition number of A is too big,
767: * sigma_max(A) / sigma_min(A) .GT. DSQRT(BIG/N) * EPSLN / SFMIN,
768: * as a precaution measure, the full SVD is computed using DGESVJ
769: * with accumulated Jacobi rotations. This provides numerically
770: * more robust computation, at the cost of slightly increased run
771: * time. Depending on the concrete implementation of BLAS and LAPACK
772: * (i.e. how they behave in presence of extreme ill-conditioning) the
773: * implementor may decide to remove this switch.
774: IF ( ( AAQQ.LT.DSQRT(SFMIN) ) .AND. LSVEC .AND. RSVEC ) THEN
775: JRACC = .TRUE.
776: END IF
777: *
778: END IF
779: IF ( AAQQ .LT. XSC ) THEN
780: DO 700 p = 1, N
781: IF ( SVA(p) .LT. XSC ) THEN
782: CALL DLASET( 'A', M, 1, ZERO, ZERO, A(1,p), LDA )
783: SVA(p) = ZERO
784: END IF
785: 700 CONTINUE
786: END IF
787: *
788: * Preconditioning using QR factorization with pivoting
789: *
790: IF ( ROWPIV ) THEN
791: * Optional row permutation (Bjoerck row pivoting):
792: * A result by Cox and Higham shows that the Bjoerck's
793: * row pivoting combined with standard column pivoting
794: * has similar effect as Powell-Reid complete pivoting.
795: * The ell-infinity norms of A are made nonincreasing.
796: DO 1952 p = 1, M - 1
797: q = IDAMAX( M-p+1, WORK(M+N+p), 1 ) + p - 1
798: IWORK(2*N+p) = q
799: IF ( p .NE. q ) THEN
800: TEMP1 = WORK(M+N+p)
801: WORK(M+N+p) = WORK(M+N+q)
802: WORK(M+N+q) = TEMP1
803: END IF
804: 1952 CONTINUE
805: CALL DLASWP( N, A, LDA, 1, M-1, IWORK(2*N+1), 1 )
806: END IF
807: *
808: * End of the preparation phase (scaling, optional sorting and
809: * transposing, optional flushing of small columns).
810: *
811: * Preconditioning
812: *
813: * If the full SVD is needed, the right singular vectors are computed
814: * from a matrix equation, and for that we need theoretical analysis
815: * of the Businger-Golub pivoting. So we use DGEQP3 as the first RR QRF.
816: * In all other cases the first RR QRF can be chosen by other criteria
817: * (eg speed by replacing global with restricted window pivoting, such
818: * as in SGEQPX from TOMS # 782). Good results will be obtained using
819: * SGEQPX with properly (!) chosen numerical parameters.
820: * Any improvement of DGEQP3 improves overal performance of DGEJSV.
821: *
822: * A * P1 = Q1 * [ R1^t 0]^t:
823: DO 1963 p = 1, N
824: * .. all columns are free columns
825: IWORK(p) = 0
826: 1963 CONTINUE
827: CALL DGEQP3( M,N,A,LDA, IWORK,WORK, WORK(N+1),LWORK-N, IERR )
828: *
829: * The upper triangular matrix R1 from the first QRF is inspected for
830: * rank deficiency and possibilities for deflation, or possible
831: * ill-conditioning. Depending on the user specified flag L2RANK,
832: * the procedure explores possibilities to reduce the numerical
833: * rank by inspecting the computed upper triangular factor. If
834: * L2RANK or L2ABER are up, then DGEJSV will compute the SVD of
835: * A + dA, where ||dA|| <= f(M,N)*EPSLN.
836: *
837: NR = 1
838: IF ( L2ABER ) THEN
839: * Standard absolute error bound suffices. All sigma_i with
840: * sigma_i < N*EPSLN*||A|| are flushed to zero. This is an
841: * agressive enforcement of lower numerical rank by introducing a
842: * backward error of the order of N*EPSLN*||A||.
843: TEMP1 = DSQRT(DBLE(N))*EPSLN
844: DO 3001 p = 2, N
845: IF ( DABS(A(p,p)) .GE. (TEMP1*DABS(A(1,1))) ) THEN
846: NR = NR + 1
847: ELSE
848: GO TO 3002
849: END IF
850: 3001 CONTINUE
851: 3002 CONTINUE
852: ELSE IF ( L2RANK ) THEN
853: * .. similarly as above, only slightly more gentle (less agressive).
854: * Sudden drop on the diagonal of R1 is used as the criterion for
855: * close-to-rank-defficient.
856: TEMP1 = DSQRT(SFMIN)
857: DO 3401 p = 2, N
858: IF ( ( DABS(A(p,p)) .LT. (EPSLN*DABS(A(p-1,p-1))) ) .OR.
1.6 ! bertrand 859: $ ( DABS(A(p,p)) .LT. SMALL ) .OR.
! 860: $ ( L2KILL .AND. (DABS(A(p,p)) .LT. TEMP1) ) ) GO TO 3402
1.1 bertrand 861: NR = NR + 1
862: 3401 CONTINUE
863: 3402 CONTINUE
864: *
865: ELSE
866: * The goal is high relative accuracy. However, if the matrix
867: * has high scaled condition number the relative accuracy is in
868: * general not feasible. Later on, a condition number estimator
869: * will be deployed to estimate the scaled condition number.
870: * Here we just remove the underflowed part of the triangular
871: * factor. This prevents the situation in which the code is
872: * working hard to get the accuracy not warranted by the data.
873: TEMP1 = DSQRT(SFMIN)
874: DO 3301 p = 2, N
875: IF ( ( DABS(A(p,p)) .LT. SMALL ) .OR.
1.6 ! bertrand 876: $ ( L2KILL .AND. (DABS(A(p,p)) .LT. TEMP1) ) ) GO TO 3302
1.1 bertrand 877: NR = NR + 1
878: 3301 CONTINUE
879: 3302 CONTINUE
880: *
881: END IF
882: *
883: ALMORT = .FALSE.
884: IF ( NR .EQ. N ) THEN
885: MAXPRJ = ONE
886: DO 3051 p = 2, N
887: TEMP1 = DABS(A(p,p)) / SVA(IWORK(p))
888: MAXPRJ = DMIN1( MAXPRJ, TEMP1 )
889: 3051 CONTINUE
890: IF ( MAXPRJ**2 .GE. ONE - DBLE(N)*EPSLN ) ALMORT = .TRUE.
891: END IF
892: *
893: *
894: SCONDA = - ONE
895: CONDR1 = - ONE
896: CONDR2 = - ONE
897: *
898: IF ( ERREST ) THEN
899: IF ( N .EQ. NR ) THEN
900: IF ( RSVEC ) THEN
901: * .. V is available as workspace
902: CALL DLACPY( 'U', N, N, A, LDA, V, LDV )
903: DO 3053 p = 1, N
904: TEMP1 = SVA(IWORK(p))
905: CALL DSCAL( p, ONE/TEMP1, V(1,p), 1 )
906: 3053 CONTINUE
907: CALL DPOCON( 'U', N, V, LDV, ONE, TEMP1,
1.6 ! bertrand 908: $ WORK(N+1), IWORK(2*N+M+1), IERR )
1.1 bertrand 909: ELSE IF ( LSVEC ) THEN
910: * .. U is available as workspace
911: CALL DLACPY( 'U', N, N, A, LDA, U, LDU )
912: DO 3054 p = 1, N
913: TEMP1 = SVA(IWORK(p))
914: CALL DSCAL( p, ONE/TEMP1, U(1,p), 1 )
915: 3054 CONTINUE
916: CALL DPOCON( 'U', N, U, LDU, ONE, TEMP1,
1.6 ! bertrand 917: $ WORK(N+1), IWORK(2*N+M+1), IERR )
1.1 bertrand 918: ELSE
919: CALL DLACPY( 'U', N, N, A, LDA, WORK(N+1), N )
920: DO 3052 p = 1, N
921: TEMP1 = SVA(IWORK(p))
922: CALL DSCAL( p, ONE/TEMP1, WORK(N+(p-1)*N+1), 1 )
923: 3052 CONTINUE
924: * .. the columns of R are scaled to have unit Euclidean lengths.
925: CALL DPOCON( 'U', N, WORK(N+1), N, ONE, TEMP1,
1.6 ! bertrand 926: $ WORK(N+N*N+1), IWORK(2*N+M+1), IERR )
1.1 bertrand 927: END IF
928: SCONDA = ONE / DSQRT(TEMP1)
929: * SCONDA is an estimate of DSQRT(||(R^t * R)^(-1)||_1).
930: * N^(-1/4) * SCONDA <= ||R^(-1)||_2 <= N^(1/4) * SCONDA
931: ELSE
932: SCONDA = - ONE
933: END IF
934: END IF
935: *
936: L2PERT = L2PERT .AND. ( DABS( A(1,1)/A(NR,NR) ) .GT. DSQRT(BIG1) )
937: * If there is no violent scaling, artificial perturbation is not needed.
938: *
939: * Phase 3:
940: *
941: IF ( .NOT. ( RSVEC .OR. LSVEC ) ) THEN
942: *
943: * Singular Values only
944: *
945: * .. transpose A(1:NR,1:N)
946: DO 1946 p = 1, MIN0( N-1, NR )
947: CALL DCOPY( N-p, A(p,p+1), LDA, A(p+1,p), 1 )
948: 1946 CONTINUE
949: *
950: * The following two DO-loops introduce small relative perturbation
951: * into the strict upper triangle of the lower triangular matrix.
952: * Small entries below the main diagonal are also changed.
953: * This modification is useful if the computing environment does not
954: * provide/allow FLUSH TO ZERO underflow, for it prevents many
955: * annoying denormalized numbers in case of strongly scaled matrices.
956: * The perturbation is structured so that it does not introduce any
957: * new perturbation of the singular values, and it does not destroy
958: * the job done by the preconditioner.
959: * The licence for this perturbation is in the variable L2PERT, which
960: * should be .FALSE. if FLUSH TO ZERO underflow is active.
961: *
962: IF ( .NOT. ALMORT ) THEN
963: *
964: IF ( L2PERT ) THEN
965: * XSC = DSQRT(SMALL)
966: XSC = EPSLN / DBLE(N)
967: DO 4947 q = 1, NR
968: TEMP1 = XSC*DABS(A(q,q))
969: DO 4949 p = 1, N
970: IF ( ( (p.GT.q) .AND. (DABS(A(p,q)).LE.TEMP1) )
1.6 ! bertrand 971: $ .OR. ( p .LT. q ) )
! 972: $ A(p,q) = DSIGN( TEMP1, A(p,q) )
1.1 bertrand 973: 4949 CONTINUE
974: 4947 CONTINUE
975: ELSE
976: CALL DLASET( 'U', NR-1,NR-1, ZERO,ZERO, A(1,2),LDA )
977: END IF
978: *
979: * .. second preconditioning using the QR factorization
980: *
981: CALL DGEQRF( N,NR, A,LDA, WORK, WORK(N+1),LWORK-N, IERR )
982: *
983: * .. and transpose upper to lower triangular
984: DO 1948 p = 1, NR - 1
985: CALL DCOPY( NR-p, A(p,p+1), LDA, A(p+1,p), 1 )
986: 1948 CONTINUE
987: *
988: END IF
989: *
990: * Row-cyclic Jacobi SVD algorithm with column pivoting
991: *
992: * .. again some perturbation (a "background noise") is added
993: * to drown denormals
994: IF ( L2PERT ) THEN
995: * XSC = DSQRT(SMALL)
996: XSC = EPSLN / DBLE(N)
997: DO 1947 q = 1, NR
998: TEMP1 = XSC*DABS(A(q,q))
999: DO 1949 p = 1, NR
1000: IF ( ( (p.GT.q) .AND. (DABS(A(p,q)).LE.TEMP1) )
1.6 ! bertrand 1001: $ .OR. ( p .LT. q ) )
! 1002: $ A(p,q) = DSIGN( TEMP1, A(p,q) )
1.1 bertrand 1003: 1949 CONTINUE
1004: 1947 CONTINUE
1005: ELSE
1006: CALL DLASET( 'U', NR-1, NR-1, ZERO, ZERO, A(1,2), LDA )
1007: END IF
1008: *
1009: * .. and one-sided Jacobi rotations are started on a lower
1010: * triangular matrix (plus perturbation which is ignored in
1011: * the part which destroys triangular form (confusing?!))
1012: *
1013: CALL DGESVJ( 'L', 'NoU', 'NoV', NR, NR, A, LDA, SVA,
1.6 ! bertrand 1014: $ N, V, LDV, WORK, LWORK, INFO )
1.1 bertrand 1015: *
1016: SCALEM = WORK(1)
1017: NUMRANK = IDNINT(WORK(2))
1018: *
1019: *
1020: ELSE IF ( RSVEC .AND. ( .NOT. LSVEC ) ) THEN
1021: *
1022: * -> Singular Values and Right Singular Vectors <-
1023: *
1024: IF ( ALMORT ) THEN
1025: *
1026: * .. in this case NR equals N
1027: DO 1998 p = 1, NR
1028: CALL DCOPY( N-p+1, A(p,p), LDA, V(p,p), 1 )
1029: 1998 CONTINUE
1030: CALL DLASET( 'Upper', NR-1, NR-1, ZERO, ZERO, V(1,2), LDV )
1031: *
1032: CALL DGESVJ( 'L','U','N', N, NR, V,LDV, SVA, NR, A,LDA,
1.6 ! bertrand 1033: $ WORK, LWORK, INFO )
1.1 bertrand 1034: SCALEM = WORK(1)
1035: NUMRANK = IDNINT(WORK(2))
1036:
1037: ELSE
1038: *
1039: * .. two more QR factorizations ( one QRF is not enough, two require
1040: * accumulated product of Jacobi rotations, three are perfect )
1041: *
1042: CALL DLASET( 'Lower', NR-1, NR-1, ZERO, ZERO, A(2,1), LDA )
1043: CALL DGELQF( NR, N, A, LDA, WORK, WORK(N+1), LWORK-N, IERR)
1044: CALL DLACPY( 'Lower', NR, NR, A, LDA, V, LDV )
1045: CALL DLASET( 'Upper', NR-1, NR-1, ZERO, ZERO, V(1,2), LDV )
1046: CALL DGEQRF( NR, NR, V, LDV, WORK(N+1), WORK(2*N+1),
1.6 ! bertrand 1047: $ LWORK-2*N, IERR )
1.1 bertrand 1048: DO 8998 p = 1, NR
1049: CALL DCOPY( NR-p+1, V(p,p), LDV, V(p,p), 1 )
1050: 8998 CONTINUE
1051: CALL DLASET( 'Upper', NR-1, NR-1, ZERO, ZERO, V(1,2), LDV )
1052: *
1053: CALL DGESVJ( 'Lower', 'U','N', NR, NR, V,LDV, SVA, NR, U,
1.6 ! bertrand 1054: $ LDU, WORK(N+1), LWORK, INFO )
1.1 bertrand 1055: SCALEM = WORK(N+1)
1056: NUMRANK = IDNINT(WORK(N+2))
1057: IF ( NR .LT. N ) THEN
1058: CALL DLASET( 'A',N-NR, NR, ZERO,ZERO, V(NR+1,1), LDV )
1059: CALL DLASET( 'A',NR, N-NR, ZERO,ZERO, V(1,NR+1), LDV )
1060: CALL DLASET( 'A',N-NR,N-NR,ZERO,ONE, V(NR+1,NR+1), LDV )
1061: END IF
1062: *
1063: CALL DORMLQ( 'Left', 'Transpose', N, N, NR, A, LDA, WORK,
1.6 ! bertrand 1064: $ V, LDV, WORK(N+1), LWORK-N, IERR )
1.1 bertrand 1065: *
1066: END IF
1067: *
1068: DO 8991 p = 1, N
1069: CALL DCOPY( N, V(p,1), LDV, A(IWORK(p),1), LDA )
1070: 8991 CONTINUE
1071: CALL DLACPY( 'All', N, N, A, LDA, V, LDV )
1072: *
1073: IF ( TRANSP ) THEN
1074: CALL DLACPY( 'All', N, N, V, LDV, U, LDU )
1075: END IF
1076: *
1077: ELSE IF ( LSVEC .AND. ( .NOT. RSVEC ) ) THEN
1078: *
1079: * .. Singular Values and Left Singular Vectors ..
1080: *
1081: * .. second preconditioning step to avoid need to accumulate
1082: * Jacobi rotations in the Jacobi iterations.
1083: DO 1965 p = 1, NR
1084: CALL DCOPY( N-p+1, A(p,p), LDA, U(p,p), 1 )
1085: 1965 CONTINUE
1086: CALL DLASET( 'Upper', NR-1, NR-1, ZERO, ZERO, U(1,2), LDU )
1087: *
1088: CALL DGEQRF( N, NR, U, LDU, WORK(N+1), WORK(2*N+1),
1.6 ! bertrand 1089: $ LWORK-2*N, IERR )
1.1 bertrand 1090: *
1091: DO 1967 p = 1, NR - 1
1092: CALL DCOPY( NR-p, U(p,p+1), LDU, U(p+1,p), 1 )
1093: 1967 CONTINUE
1094: CALL DLASET( 'Upper', NR-1, NR-1, ZERO, ZERO, U(1,2), LDU )
1095: *
1096: CALL DGESVJ( 'Lower', 'U', 'N', NR,NR, U, LDU, SVA, NR, A,
1.6 ! bertrand 1097: $ LDA, WORK(N+1), LWORK-N, INFO )
1.1 bertrand 1098: SCALEM = WORK(N+1)
1099: NUMRANK = IDNINT(WORK(N+2))
1100: *
1101: IF ( NR .LT. M ) THEN
1102: CALL DLASET( 'A', M-NR, NR,ZERO, ZERO, U(NR+1,1), LDU )
1103: IF ( NR .LT. N1 ) THEN
1104: CALL DLASET( 'A',NR, N1-NR, ZERO, ZERO, U(1,NR+1), LDU )
1105: CALL DLASET( 'A',M-NR,N1-NR,ZERO,ONE,U(NR+1,NR+1), LDU )
1106: END IF
1107: END IF
1108: *
1109: CALL DORMQR( 'Left', 'No Tr', M, N1, N, A, LDA, WORK, U,
1.6 ! bertrand 1110: $ LDU, WORK(N+1), LWORK-N, IERR )
1.1 bertrand 1111: *
1112: IF ( ROWPIV )
1.6 ! bertrand 1113: $ CALL DLASWP( N1, U, LDU, 1, M-1, IWORK(2*N+1), -1 )
1.1 bertrand 1114: *
1115: DO 1974 p = 1, N1
1116: XSC = ONE / DNRM2( M, U(1,p), 1 )
1117: CALL DSCAL( M, XSC, U(1,p), 1 )
1118: 1974 CONTINUE
1119: *
1120: IF ( TRANSP ) THEN
1121: CALL DLACPY( 'All', N, N, U, LDU, V, LDV )
1122: END IF
1123: *
1124: ELSE
1125: *
1126: * .. Full SVD ..
1127: *
1128: IF ( .NOT. JRACC ) THEN
1129: *
1130: IF ( .NOT. ALMORT ) THEN
1131: *
1132: * Second Preconditioning Step (QRF [with pivoting])
1133: * Note that the composition of TRANSPOSE, QRF and TRANSPOSE is
1134: * equivalent to an LQF CALL. Since in many libraries the QRF
1135: * seems to be better optimized than the LQF, we do explicit
1136: * transpose and use the QRF. This is subject to changes in an
1137: * optimized implementation of DGEJSV.
1138: *
1139: DO 1968 p = 1, NR
1140: CALL DCOPY( N-p+1, A(p,p), LDA, V(p,p), 1 )
1141: 1968 CONTINUE
1142: *
1143: * .. the following two loops perturb small entries to avoid
1144: * denormals in the second QR factorization, where they are
1145: * as good as zeros. This is done to avoid painfully slow
1146: * computation with denormals. The relative size of the perturbation
1147: * is a parameter that can be changed by the implementer.
1148: * This perturbation device will be obsolete on machines with
1149: * properly implemented arithmetic.
1150: * To switch it off, set L2PERT=.FALSE. To remove it from the
1151: * code, remove the action under L2PERT=.TRUE., leave the ELSE part.
1152: * The following two loops should be blocked and fused with the
1153: * transposed copy above.
1154: *
1155: IF ( L2PERT ) THEN
1156: XSC = DSQRT(SMALL)
1157: DO 2969 q = 1, NR
1158: TEMP1 = XSC*DABS( V(q,q) )
1159: DO 2968 p = 1, N
1160: IF ( ( p .GT. q ) .AND. ( DABS(V(p,q)) .LE. TEMP1 )
1.6 ! bertrand 1161: $ .OR. ( p .LT. q ) )
! 1162: $ V(p,q) = DSIGN( TEMP1, V(p,q) )
! 1163: IF ( p .LT. q ) V(p,q) = - V(p,q)
1.1 bertrand 1164: 2968 CONTINUE
1165: 2969 CONTINUE
1166: ELSE
1167: CALL DLASET( 'U', NR-1, NR-1, ZERO, ZERO, V(1,2), LDV )
1168: END IF
1169: *
1170: * Estimate the row scaled condition number of R1
1171: * (If R1 is rectangular, N > NR, then the condition number
1172: * of the leading NR x NR submatrix is estimated.)
1173: *
1174: CALL DLACPY( 'L', NR, NR, V, LDV, WORK(2*N+1), NR )
1175: DO 3950 p = 1, NR
1176: TEMP1 = DNRM2(NR-p+1,WORK(2*N+(p-1)*NR+p),1)
1177: CALL DSCAL(NR-p+1,ONE/TEMP1,WORK(2*N+(p-1)*NR+p),1)
1178: 3950 CONTINUE
1179: CALL DPOCON('Lower',NR,WORK(2*N+1),NR,ONE,TEMP1,
1.6 ! bertrand 1180: $ WORK(2*N+NR*NR+1),IWORK(M+2*N+1),IERR)
1.1 bertrand 1181: CONDR1 = ONE / DSQRT(TEMP1)
1182: * .. here need a second oppinion on the condition number
1183: * .. then assume worst case scenario
1184: * R1 is OK for inverse <=> CONDR1 .LT. DBLE(N)
1185: * more conservative <=> CONDR1 .LT. DSQRT(DBLE(N))
1186: *
1187: COND_OK = DSQRT(DBLE(NR))
1188: *[TP] COND_OK is a tuning parameter.
1189:
1190: IF ( CONDR1 .LT. COND_OK ) THEN
1191: * .. the second QRF without pivoting. Note: in an optimized
1192: * implementation, this QRF should be implemented as the QRF
1193: * of a lower triangular matrix.
1194: * R1^t = Q2 * R2
1195: CALL DGEQRF( N, NR, V, LDV, WORK(N+1), WORK(2*N+1),
1.6 ! bertrand 1196: $ LWORK-2*N, IERR )
1.1 bertrand 1197: *
1198: IF ( L2PERT ) THEN
1199: XSC = DSQRT(SMALL)/EPSLN
1200: DO 3959 p = 2, NR
1201: DO 3958 q = 1, p - 1
1202: TEMP1 = XSC * DMIN1(DABS(V(p,p)),DABS(V(q,q)))
1203: IF ( DABS(V(q,p)) .LE. TEMP1 )
1.6 ! bertrand 1204: $ V(q,p) = DSIGN( TEMP1, V(q,p) )
1.1 bertrand 1205: 3958 CONTINUE
1206: 3959 CONTINUE
1207: END IF
1208: *
1209: IF ( NR .NE. N )
1.6 ! bertrand 1210: $ CALL DLACPY( 'A', N, NR, V, LDV, WORK(2*N+1), N )
1.1 bertrand 1211: * .. save ...
1212: *
1213: * .. this transposed copy should be better than naive
1214: DO 1969 p = 1, NR - 1
1215: CALL DCOPY( NR-p, V(p,p+1), LDV, V(p+1,p), 1 )
1216: 1969 CONTINUE
1217: *
1218: CONDR2 = CONDR1
1219: *
1220: ELSE
1221: *
1222: * .. ill-conditioned case: second QRF with pivoting
1223: * Note that windowed pivoting would be equaly good
1224: * numerically, and more run-time efficient. So, in
1225: * an optimal implementation, the next call to DGEQP3
1226: * should be replaced with eg. CALL SGEQPX (ACM TOMS #782)
1227: * with properly (carefully) chosen parameters.
1228: *
1229: * R1^t * P2 = Q2 * R2
1230: DO 3003 p = 1, NR
1231: IWORK(N+p) = 0
1232: 3003 CONTINUE
1233: CALL DGEQP3( N, NR, V, LDV, IWORK(N+1), WORK(N+1),
1.6 ! bertrand 1234: $ WORK(2*N+1), LWORK-2*N, IERR )
1.1 bertrand 1235: ** CALL DGEQRF( N, NR, V, LDV, WORK(N+1), WORK(2*N+1),
1.6 ! bertrand 1236: ** $ LWORK-2*N, IERR )
1.1 bertrand 1237: IF ( L2PERT ) THEN
1238: XSC = DSQRT(SMALL)
1239: DO 3969 p = 2, NR
1240: DO 3968 q = 1, p - 1
1241: TEMP1 = XSC * DMIN1(DABS(V(p,p)),DABS(V(q,q)))
1242: IF ( DABS(V(q,p)) .LE. TEMP1 )
1.6 ! bertrand 1243: $ V(q,p) = DSIGN( TEMP1, V(q,p) )
1.1 bertrand 1244: 3968 CONTINUE
1245: 3969 CONTINUE
1246: END IF
1247: *
1248: CALL DLACPY( 'A', N, NR, V, LDV, WORK(2*N+1), N )
1249: *
1250: IF ( L2PERT ) THEN
1251: XSC = DSQRT(SMALL)
1252: DO 8970 p = 2, NR
1253: DO 8971 q = 1, p - 1
1254: TEMP1 = XSC * DMIN1(DABS(V(p,p)),DABS(V(q,q)))
1255: V(p,q) = - DSIGN( TEMP1, V(q,p) )
1256: 8971 CONTINUE
1257: 8970 CONTINUE
1258: ELSE
1259: CALL DLASET( 'L',NR-1,NR-1,ZERO,ZERO,V(2,1),LDV )
1260: END IF
1261: * Now, compute R2 = L3 * Q3, the LQ factorization.
1262: CALL DGELQF( NR, NR, V, LDV, WORK(2*N+N*NR+1),
1.6 ! bertrand 1263: $ WORK(2*N+N*NR+NR+1), LWORK-2*N-N*NR-NR, IERR )
1.1 bertrand 1264: * .. and estimate the condition number
1265: CALL DLACPY( 'L',NR,NR,V,LDV,WORK(2*N+N*NR+NR+1),NR )
1266: DO 4950 p = 1, NR
1267: TEMP1 = DNRM2( p, WORK(2*N+N*NR+NR+p), NR )
1268: CALL DSCAL( p, ONE/TEMP1, WORK(2*N+N*NR+NR+p), NR )
1269: 4950 CONTINUE
1270: CALL DPOCON( 'L',NR,WORK(2*N+N*NR+NR+1),NR,ONE,TEMP1,
1.6 ! bertrand 1271: $ WORK(2*N+N*NR+NR+NR*NR+1),IWORK(M+2*N+1),IERR )
1.1 bertrand 1272: CONDR2 = ONE / DSQRT(TEMP1)
1273: *
1274: IF ( CONDR2 .GE. COND_OK ) THEN
1275: * .. save the Householder vectors used for Q3
1276: * (this overwrittes the copy of R2, as it will not be
1277: * needed in this branch, but it does not overwritte the
1278: * Huseholder vectors of Q2.).
1279: CALL DLACPY( 'U', NR, NR, V, LDV, WORK(2*N+1), N )
1280: * .. and the rest of the information on Q3 is in
1281: * WORK(2*N+N*NR+1:2*N+N*NR+N)
1282: END IF
1283: *
1284: END IF
1285: *
1286: IF ( L2PERT ) THEN
1287: XSC = DSQRT(SMALL)
1288: DO 4968 q = 2, NR
1289: TEMP1 = XSC * V(q,q)
1290: DO 4969 p = 1, q - 1
1291: * V(p,q) = - DSIGN( TEMP1, V(q,p) )
1292: V(p,q) = - DSIGN( TEMP1, V(p,q) )
1293: 4969 CONTINUE
1294: 4968 CONTINUE
1295: ELSE
1296: CALL DLASET( 'U', NR-1,NR-1, ZERO,ZERO, V(1,2), LDV )
1297: END IF
1298: *
1299: * Second preconditioning finished; continue with Jacobi SVD
1300: * The input matrix is lower trinagular.
1301: *
1302: * Recover the right singular vectors as solution of a well
1303: * conditioned triangular matrix equation.
1304: *
1305: IF ( CONDR1 .LT. COND_OK ) THEN
1306: *
1307: CALL DGESVJ( 'L','U','N',NR,NR,V,LDV,SVA,NR,U,
1.6 ! bertrand 1308: $ LDU,WORK(2*N+N*NR+NR+1),LWORK-2*N-N*NR-NR,INFO )
1.1 bertrand 1309: SCALEM = WORK(2*N+N*NR+NR+1)
1310: NUMRANK = IDNINT(WORK(2*N+N*NR+NR+2))
1311: DO 3970 p = 1, NR
1312: CALL DCOPY( NR, V(1,p), 1, U(1,p), 1 )
1313: CALL DSCAL( NR, SVA(p), V(1,p), 1 )
1314: 3970 CONTINUE
1315:
1316: * .. pick the right matrix equation and solve it
1317: *
1.6 ! bertrand 1318: IF ( NR .EQ. N ) THEN
1.1 bertrand 1319: * :)) .. best case, R1 is inverted. The solution of this matrix
1320: * equation is Q2*V2 = the product of the Jacobi rotations
1321: * used in DGESVJ, premultiplied with the orthogonal matrix
1322: * from the second QR factorization.
1323: CALL DTRSM( 'L','U','N','N', NR,NR,ONE, A,LDA, V,LDV )
1324: ELSE
1325: * .. R1 is well conditioned, but non-square. Transpose(R2)
1326: * is inverted to get the product of the Jacobi rotations
1327: * used in DGESVJ. The Q-factor from the second QR
1328: * factorization is then built in explicitly.
1329: CALL DTRSM('L','U','T','N',NR,NR,ONE,WORK(2*N+1),
1.6 ! bertrand 1330: $ N,V,LDV)
1.1 bertrand 1331: IF ( NR .LT. N ) THEN
1332: CALL DLASET('A',N-NR,NR,ZERO,ZERO,V(NR+1,1),LDV)
1333: CALL DLASET('A',NR,N-NR,ZERO,ZERO,V(1,NR+1),LDV)
1334: CALL DLASET('A',N-NR,N-NR,ZERO,ONE,V(NR+1,NR+1),LDV)
1335: END IF
1336: CALL DORMQR('L','N',N,N,NR,WORK(2*N+1),N,WORK(N+1),
1.6 ! bertrand 1337: $ V,LDV,WORK(2*N+N*NR+NR+1),LWORK-2*N-N*NR-NR,IERR)
1.1 bertrand 1338: END IF
1339: *
1340: ELSE IF ( CONDR2 .LT. COND_OK ) THEN
1341: *
1342: * :) .. the input matrix A is very likely a relative of
1343: * the Kahan matrix :)
1344: * The matrix R2 is inverted. The solution of the matrix equation
1345: * is Q3^T*V3 = the product of the Jacobi rotations (appplied to
1346: * the lower triangular L3 from the LQ factorization of
1347: * R2=L3*Q3), pre-multiplied with the transposed Q3.
1348: CALL DGESVJ( 'L', 'U', 'N', NR, NR, V, LDV, SVA, NR, U,
1.6 ! bertrand 1349: $ LDU, WORK(2*N+N*NR+NR+1), LWORK-2*N-N*NR-NR, INFO )
1.1 bertrand 1350: SCALEM = WORK(2*N+N*NR+NR+1)
1351: NUMRANK = IDNINT(WORK(2*N+N*NR+NR+2))
1352: DO 3870 p = 1, NR
1353: CALL DCOPY( NR, V(1,p), 1, U(1,p), 1 )
1354: CALL DSCAL( NR, SVA(p), U(1,p), 1 )
1355: 3870 CONTINUE
1356: CALL DTRSM('L','U','N','N',NR,NR,ONE,WORK(2*N+1),N,U,LDU)
1357: * .. apply the permutation from the second QR factorization
1358: DO 873 q = 1, NR
1359: DO 872 p = 1, NR
1360: WORK(2*N+N*NR+NR+IWORK(N+p)) = U(p,q)
1361: 872 CONTINUE
1362: DO 874 p = 1, NR
1363: U(p,q) = WORK(2*N+N*NR+NR+p)
1364: 874 CONTINUE
1365: 873 CONTINUE
1366: IF ( NR .LT. N ) THEN
1367: CALL DLASET( 'A',N-NR,NR,ZERO,ZERO,V(NR+1,1),LDV )
1368: CALL DLASET( 'A',NR,N-NR,ZERO,ZERO,V(1,NR+1),LDV )
1369: CALL DLASET( 'A',N-NR,N-NR,ZERO,ONE,V(NR+1,NR+1),LDV )
1370: END IF
1371: CALL DORMQR( 'L','N',N,N,NR,WORK(2*N+1),N,WORK(N+1),
1.6 ! bertrand 1372: $ V,LDV,WORK(2*N+N*NR+NR+1),LWORK-2*N-N*NR-NR,IERR )
1.1 bertrand 1373: ELSE
1374: * Last line of defense.
1375: * #:( This is a rather pathological case: no scaled condition
1376: * improvement after two pivoted QR factorizations. Other
1377: * possibility is that the rank revealing QR factorization
1378: * or the condition estimator has failed, or the COND_OK
1379: * is set very close to ONE (which is unnecessary). Normally,
1380: * this branch should never be executed, but in rare cases of
1381: * failure of the RRQR or condition estimator, the last line of
1382: * defense ensures that DGEJSV completes the task.
1383: * Compute the full SVD of L3 using DGESVJ with explicit
1384: * accumulation of Jacobi rotations.
1385: CALL DGESVJ( 'L', 'U', 'V', NR, NR, V, LDV, SVA, NR, U,
1.6 ! bertrand 1386: $ LDU, WORK(2*N+N*NR+NR+1), LWORK-2*N-N*NR-NR, INFO )
1.1 bertrand 1387: SCALEM = WORK(2*N+N*NR+NR+1)
1388: NUMRANK = IDNINT(WORK(2*N+N*NR+NR+2))
1389: IF ( NR .LT. N ) THEN
1390: CALL DLASET( 'A',N-NR,NR,ZERO,ZERO,V(NR+1,1),LDV )
1391: CALL DLASET( 'A',NR,N-NR,ZERO,ZERO,V(1,NR+1),LDV )
1392: CALL DLASET( 'A',N-NR,N-NR,ZERO,ONE,V(NR+1,NR+1),LDV )
1393: END IF
1394: CALL DORMQR( 'L','N',N,N,NR,WORK(2*N+1),N,WORK(N+1),
1.6 ! bertrand 1395: $ V,LDV,WORK(2*N+N*NR+NR+1),LWORK-2*N-N*NR-NR,IERR )
1.1 bertrand 1396: *
1397: CALL DORMLQ( 'L', 'T', NR, NR, NR, WORK(2*N+1), N,
1.6 ! bertrand 1398: $ WORK(2*N+N*NR+1), U, LDU, WORK(2*N+N*NR+NR+1),
! 1399: $ LWORK-2*N-N*NR-NR, IERR )
1.1 bertrand 1400: DO 773 q = 1, NR
1401: DO 772 p = 1, NR
1402: WORK(2*N+N*NR+NR+IWORK(N+p)) = U(p,q)
1403: 772 CONTINUE
1404: DO 774 p = 1, NR
1405: U(p,q) = WORK(2*N+N*NR+NR+p)
1406: 774 CONTINUE
1407: 773 CONTINUE
1408: *
1409: END IF
1410: *
1411: * Permute the rows of V using the (column) permutation from the
1412: * first QRF. Also, scale the columns to make them unit in
1413: * Euclidean norm. This applies to all cases.
1414: *
1415: TEMP1 = DSQRT(DBLE(N)) * EPSLN
1416: DO 1972 q = 1, N
1417: DO 972 p = 1, N
1418: WORK(2*N+N*NR+NR+IWORK(p)) = V(p,q)
1419: 972 CONTINUE
1420: DO 973 p = 1, N
1421: V(p,q) = WORK(2*N+N*NR+NR+p)
1422: 973 CONTINUE
1423: XSC = ONE / DNRM2( N, V(1,q), 1 )
1424: IF ( (XSC .LT. (ONE-TEMP1)) .OR. (XSC .GT. (ONE+TEMP1)) )
1.6 ! bertrand 1425: $ CALL DSCAL( N, XSC, V(1,q), 1 )
1.1 bertrand 1426: 1972 CONTINUE
1427: * At this moment, V contains the right singular vectors of A.
1428: * Next, assemble the left singular vector matrix U (M x N).
1429: IF ( NR .LT. M ) THEN
1430: CALL DLASET( 'A', M-NR, NR, ZERO, ZERO, U(NR+1,1), LDU )
1431: IF ( NR .LT. N1 ) THEN
1432: CALL DLASET('A',NR,N1-NR,ZERO,ZERO,U(1,NR+1),LDU)
1433: CALL DLASET('A',M-NR,N1-NR,ZERO,ONE,U(NR+1,NR+1),LDU)
1434: END IF
1435: END IF
1436: *
1437: * The Q matrix from the first QRF is built into the left singular
1438: * matrix U. This applies to all cases.
1439: *
1440: CALL DORMQR( 'Left', 'No_Tr', M, N1, N, A, LDA, WORK, U,
1.6 ! bertrand 1441: $ LDU, WORK(N+1), LWORK-N, IERR )
1.1 bertrand 1442:
1443: * The columns of U are normalized. The cost is O(M*N) flops.
1444: TEMP1 = DSQRT(DBLE(M)) * EPSLN
1445: DO 1973 p = 1, NR
1446: XSC = ONE / DNRM2( M, U(1,p), 1 )
1447: IF ( (XSC .LT. (ONE-TEMP1)) .OR. (XSC .GT. (ONE+TEMP1)) )
1.6 ! bertrand 1448: $ CALL DSCAL( M, XSC, U(1,p), 1 )
1.1 bertrand 1449: 1973 CONTINUE
1450: *
1451: * If the initial QRF is computed with row pivoting, the left
1452: * singular vectors must be adjusted.
1453: *
1454: IF ( ROWPIV )
1.6 ! bertrand 1455: $ CALL DLASWP( N1, U, LDU, 1, M-1, IWORK(2*N+1), -1 )
1.1 bertrand 1456: *
1457: ELSE
1458: *
1459: * .. the initial matrix A has almost orthogonal columns and
1460: * the second QRF is not needed
1461: *
1462: CALL DLACPY( 'Upper', N, N, A, LDA, WORK(N+1), N )
1463: IF ( L2PERT ) THEN
1464: XSC = DSQRT(SMALL)
1465: DO 5970 p = 2, N
1466: TEMP1 = XSC * WORK( N + (p-1)*N + p )
1467: DO 5971 q = 1, p - 1
1468: WORK(N+(q-1)*N+p)=-DSIGN(TEMP1,WORK(N+(p-1)*N+q))
1469: 5971 CONTINUE
1470: 5970 CONTINUE
1471: ELSE
1472: CALL DLASET( 'Lower',N-1,N-1,ZERO,ZERO,WORK(N+2),N )
1473: END IF
1474: *
1475: CALL DGESVJ( 'Upper', 'U', 'N', N, N, WORK(N+1), N, SVA,
1.6 ! bertrand 1476: $ N, U, LDU, WORK(N+N*N+1), LWORK-N-N*N, INFO )
1.1 bertrand 1477: *
1478: SCALEM = WORK(N+N*N+1)
1479: NUMRANK = IDNINT(WORK(N+N*N+2))
1480: DO 6970 p = 1, N
1481: CALL DCOPY( N, WORK(N+(p-1)*N+1), 1, U(1,p), 1 )
1482: CALL DSCAL( N, SVA(p), WORK(N+(p-1)*N+1), 1 )
1483: 6970 CONTINUE
1484: *
1485: CALL DTRSM( 'Left', 'Upper', 'NoTrans', 'No UD', N, N,
1.6 ! bertrand 1486: $ ONE, A, LDA, WORK(N+1), N )
1.1 bertrand 1487: DO 6972 p = 1, N
1488: CALL DCOPY( N, WORK(N+p), N, V(IWORK(p),1), LDV )
1489: 6972 CONTINUE
1490: TEMP1 = DSQRT(DBLE(N))*EPSLN
1491: DO 6971 p = 1, N
1492: XSC = ONE / DNRM2( N, V(1,p), 1 )
1493: IF ( (XSC .LT. (ONE-TEMP1)) .OR. (XSC .GT. (ONE+TEMP1)) )
1.6 ! bertrand 1494: $ CALL DSCAL( N, XSC, V(1,p), 1 )
1.1 bertrand 1495: 6971 CONTINUE
1496: *
1497: * Assemble the left singular vector matrix U (M x N).
1498: *
1499: IF ( N .LT. M ) THEN
1.4 bertrand 1500: CALL DLASET( 'A', M-N, N, ZERO, ZERO, U(N+1,1), LDU )
1.1 bertrand 1501: IF ( N .LT. N1 ) THEN
1502: CALL DLASET( 'A',N, N1-N, ZERO, ZERO, U(1,N+1),LDU )
1.4 bertrand 1503: CALL DLASET( 'A',M-N,N1-N, ZERO, ONE,U(N+1,N+1),LDU )
1.1 bertrand 1504: END IF
1505: END IF
1506: CALL DORMQR( 'Left', 'No Tr', M, N1, N, A, LDA, WORK, U,
1.6 ! bertrand 1507: $ LDU, WORK(N+1), LWORK-N, IERR )
1.1 bertrand 1508: TEMP1 = DSQRT(DBLE(M))*EPSLN
1509: DO 6973 p = 1, N1
1510: XSC = ONE / DNRM2( M, U(1,p), 1 )
1511: IF ( (XSC .LT. (ONE-TEMP1)) .OR. (XSC .GT. (ONE+TEMP1)) )
1.6 ! bertrand 1512: $ CALL DSCAL( M, XSC, U(1,p), 1 )
1.1 bertrand 1513: 6973 CONTINUE
1514: *
1515: IF ( ROWPIV )
1.6 ! bertrand 1516: $ CALL DLASWP( N1, U, LDU, 1, M-1, IWORK(2*N+1), -1 )
1.1 bertrand 1517: *
1518: END IF
1519: *
1520: * end of the >> almost orthogonal case << in the full SVD
1521: *
1522: ELSE
1523: *
1524: * This branch deploys a preconditioned Jacobi SVD with explicitly
1525: * accumulated rotations. It is included as optional, mainly for
1526: * experimental purposes. It does perfom well, and can also be used.
1527: * In this implementation, this branch will be automatically activated
1528: * if the condition number sigma_max(A) / sigma_min(A) is predicted
1529: * to be greater than the overflow threshold. This is because the
1530: * a posteriori computation of the singular vectors assumes robust
1531: * implementation of BLAS and some LAPACK procedures, capable of working
1532: * in presence of extreme values. Since that is not always the case, ...
1533: *
1534: DO 7968 p = 1, NR
1535: CALL DCOPY( N-p+1, A(p,p), LDA, V(p,p), 1 )
1536: 7968 CONTINUE
1537: *
1538: IF ( L2PERT ) THEN
1539: XSC = DSQRT(SMALL/EPSLN)
1540: DO 5969 q = 1, NR
1541: TEMP1 = XSC*DABS( V(q,q) )
1542: DO 5968 p = 1, N
1543: IF ( ( p .GT. q ) .AND. ( DABS(V(p,q)) .LE. TEMP1 )
1.6 ! bertrand 1544: $ .OR. ( p .LT. q ) )
! 1545: $ V(p,q) = DSIGN( TEMP1, V(p,q) )
! 1546: IF ( p .LT. q ) V(p,q) = - V(p,q)
1.1 bertrand 1547: 5968 CONTINUE
1548: 5969 CONTINUE
1549: ELSE
1550: CALL DLASET( 'U', NR-1, NR-1, ZERO, ZERO, V(1,2), LDV )
1551: END IF
1552:
1553: CALL DGEQRF( N, NR, V, LDV, WORK(N+1), WORK(2*N+1),
1.6 ! bertrand 1554: $ LWORK-2*N, IERR )
1.1 bertrand 1555: CALL DLACPY( 'L', N, NR, V, LDV, WORK(2*N+1), N )
1556: *
1557: DO 7969 p = 1, NR
1558: CALL DCOPY( NR-p+1, V(p,p), LDV, U(p,p), 1 )
1559: 7969 CONTINUE
1560:
1561: IF ( L2PERT ) THEN
1562: XSC = DSQRT(SMALL/EPSLN)
1563: DO 9970 q = 2, NR
1564: DO 9971 p = 1, q - 1
1565: TEMP1 = XSC * DMIN1(DABS(U(p,p)),DABS(U(q,q)))
1566: U(p,q) = - DSIGN( TEMP1, U(q,p) )
1567: 9971 CONTINUE
1568: 9970 CONTINUE
1569: ELSE
1570: CALL DLASET('U', NR-1, NR-1, ZERO, ZERO, U(1,2), LDU )
1571: END IF
1572:
1573: CALL DGESVJ( 'G', 'U', 'V', NR, NR, U, LDU, SVA,
1.6 ! bertrand 1574: $ N, V, LDV, WORK(2*N+N*NR+1), LWORK-2*N-N*NR, INFO )
1.1 bertrand 1575: SCALEM = WORK(2*N+N*NR+1)
1576: NUMRANK = IDNINT(WORK(2*N+N*NR+2))
1577:
1578: IF ( NR .LT. N ) THEN
1579: CALL DLASET( 'A',N-NR,NR,ZERO,ZERO,V(NR+1,1),LDV )
1580: CALL DLASET( 'A',NR,N-NR,ZERO,ZERO,V(1,NR+1),LDV )
1581: CALL DLASET( 'A',N-NR,N-NR,ZERO,ONE,V(NR+1,NR+1),LDV )
1582: END IF
1583:
1584: CALL DORMQR( 'L','N',N,N,NR,WORK(2*N+1),N,WORK(N+1),
1.6 ! bertrand 1585: $ V,LDV,WORK(2*N+N*NR+NR+1),LWORK-2*N-N*NR-NR,IERR )
1.1 bertrand 1586: *
1587: * Permute the rows of V using the (column) permutation from the
1588: * first QRF. Also, scale the columns to make them unit in
1589: * Euclidean norm. This applies to all cases.
1590: *
1591: TEMP1 = DSQRT(DBLE(N)) * EPSLN
1592: DO 7972 q = 1, N
1593: DO 8972 p = 1, N
1594: WORK(2*N+N*NR+NR+IWORK(p)) = V(p,q)
1595: 8972 CONTINUE
1596: DO 8973 p = 1, N
1597: V(p,q) = WORK(2*N+N*NR+NR+p)
1598: 8973 CONTINUE
1599: XSC = ONE / DNRM2( N, V(1,q), 1 )
1600: IF ( (XSC .LT. (ONE-TEMP1)) .OR. (XSC .GT. (ONE+TEMP1)) )
1.6 ! bertrand 1601: $ CALL DSCAL( N, XSC, V(1,q), 1 )
1.1 bertrand 1602: 7972 CONTINUE
1603: *
1604: * At this moment, V contains the right singular vectors of A.
1605: * Next, assemble the left singular vector matrix U (M x N).
1606: *
1.4 bertrand 1607: IF ( NR .LT. M ) THEN
1608: CALL DLASET( 'A', M-NR, NR, ZERO, ZERO, U(NR+1,1), LDU )
1609: IF ( NR .LT. N1 ) THEN
1610: CALL DLASET( 'A',NR, N1-NR, ZERO, ZERO, U(1,NR+1),LDU )
1611: CALL DLASET( 'A',M-NR,N1-NR, ZERO, ONE,U(NR+1,NR+1),LDU )
1.1 bertrand 1612: END IF
1613: END IF
1614: *
1615: CALL DORMQR( 'Left', 'No Tr', M, N1, N, A, LDA, WORK, U,
1.6 ! bertrand 1616: $ LDU, WORK(N+1), LWORK-N, IERR )
1.1 bertrand 1617: *
1618: IF ( ROWPIV )
1.6 ! bertrand 1619: $ CALL DLASWP( N1, U, LDU, 1, M-1, IWORK(2*N+1), -1 )
1.1 bertrand 1620: *
1621: *
1622: END IF
1623: IF ( TRANSP ) THEN
1624: * .. swap U and V because the procedure worked on A^t
1625: DO 6974 p = 1, N
1626: CALL DSWAP( N, U(1,p), 1, V(1,p), 1 )
1627: 6974 CONTINUE
1628: END IF
1629: *
1630: END IF
1631: * end of the full SVD
1632: *
1633: * Undo scaling, if necessary (and possible)
1634: *
1635: IF ( USCAL2 .LE. (BIG/SVA(1))*USCAL1 ) THEN
1636: CALL DLASCL( 'G', 0, 0, USCAL1, USCAL2, NR, 1, SVA, N, IERR )
1637: USCAL1 = ONE
1638: USCAL2 = ONE
1639: END IF
1640: *
1641: IF ( NR .LT. N ) THEN
1642: DO 3004 p = NR+1, N
1643: SVA(p) = ZERO
1644: 3004 CONTINUE
1645: END IF
1646: *
1647: WORK(1) = USCAL2 * SCALEM
1648: WORK(2) = USCAL1
1649: IF ( ERREST ) WORK(3) = SCONDA
1650: IF ( LSVEC .AND. RSVEC ) THEN
1651: WORK(4) = CONDR1
1652: WORK(5) = CONDR2
1653: END IF
1654: IF ( L2TRAN ) THEN
1655: WORK(6) = ENTRA
1656: WORK(7) = ENTRAT
1657: END IF
1658: *
1659: IWORK(1) = NR
1660: IWORK(2) = NUMRANK
1661: IWORK(3) = WARNING
1662: *
1663: RETURN
1664: * ..
1665: * .. END OF DGEJSV
1666: * ..
1667: END
1668: *
CVSweb interface <joel.bertrand@systella.fr>