1: *> \brief \b ZLATRS3 solves a triangular system of equations with the scale factors set to prevent overflow.
2: *
3: * Definition:
4: * ===========
5: *
6: * SUBROUTINE ZLATRS3( UPLO, TRANS, DIAG, NORMIN, N, NRHS, A, LDA,
7: * X, LDX, SCALE, CNORM, WORK, LWORK, INFO )
8: *
9: * .. Scalar Arguments ..
10: * CHARACTER DIAG, NORMIN, TRANS, UPLO
11: * INTEGER INFO, LDA, LWORK, LDX, N, NRHS
12: * ..
13: * .. Array Arguments ..
14: * DOUBLE PRECISION CNORM( * ), SCALE( * ), WORK( * )
15: * COMPLEX*16 A( LDA, * ), X( LDX, * )
16: * ..
17: *
18: *
19: *> \par Purpose:
20: * =============
21: *>
22: *> \verbatim
23: *>
24: *> ZLATRS3 solves one of the triangular systems
25: *>
26: *> A * X = B * diag(scale), A**T * X = B * diag(scale), or
27: *> A**H * X = B * diag(scale)
28: *>
29: *> with scaling to prevent overflow. Here A is an upper or lower
30: *> triangular matrix, A**T denotes the transpose of A, A**H denotes the
31: *> conjugate transpose of A. X and B are n-by-nrhs matrices and scale
32: *> is an nrhs-element vector of scaling factors. A scaling factor scale(j)
33: *> is usually less than or equal to 1, chosen such that X(:,j) is less
34: *> than the overflow threshold. If the matrix A is singular (A(j,j) = 0
35: *> for some j), then a non-trivial solution to A*X = 0 is returned. If
36: *> the system is so badly scaled that the solution cannot be represented
37: *> as (1/scale(k))*X(:,k), then x(:,k) = 0 and scale(k) is returned.
38: *>
39: *> This is a BLAS-3 version of LATRS for solving several right
40: *> hand sides simultaneously.
41: *>
42: *> \endverbatim
43: *
44: * Arguments:
45: * ==========
46: *
47: *> \param[in] UPLO
48: *> \verbatim
49: *> UPLO is CHARACTER*1
50: *> Specifies whether the matrix A is upper or lower triangular.
51: *> = 'U': Upper triangular
52: *> = 'L': Lower triangular
53: *> \endverbatim
54: *>
55: *> \param[in] TRANS
56: *> \verbatim
57: *> TRANS is CHARACTER*1
58: *> Specifies the operation applied to A.
59: *> = 'N': Solve A * x = s*b (No transpose)
60: *> = 'T': Solve A**T* x = s*b (Transpose)
61: *> = 'C': Solve A**T* x = s*b (Conjugate transpose)
62: *> \endverbatim
63: *>
64: *> \param[in] DIAG
65: *> \verbatim
66: *> DIAG is CHARACTER*1
67: *> Specifies whether or not the matrix A is unit triangular.
68: *> = 'N': Non-unit triangular
69: *> = 'U': Unit triangular
70: *> \endverbatim
71: *>
72: *> \param[in] NORMIN
73: *> \verbatim
74: *> NORMIN is CHARACTER*1
75: *> Specifies whether CNORM has been set or not.
76: *> = 'Y': CNORM contains the column norms on entry
77: *> = 'N': CNORM is not set on entry. On exit, the norms will
78: *> be computed and stored in CNORM.
79: *> \endverbatim
80: *>
81: *> \param[in] N
82: *> \verbatim
83: *> N is INTEGER
84: *> The order of the matrix A. N >= 0.
85: *> \endverbatim
86: *>
87: *> \param[in] NRHS
88: *> \verbatim
89: *> NRHS is INTEGER
90: *> The number of columns of X. NRHS >= 0.
91: *> \endverbatim
92: *>
93: *> \param[in] A
94: *> \verbatim
95: *> A is COMPLEX*16 array, dimension (LDA,N)
96: *> The triangular matrix A. If UPLO = 'U', the leading n by n
97: *> upper triangular part of the array A contains the upper
98: *> triangular matrix, and the strictly lower triangular part of
99: *> A is not referenced. If UPLO = 'L', the leading n by n lower
100: *> triangular part of the array A contains the lower triangular
101: *> matrix, and the strictly upper triangular part of A is not
102: *> referenced. If DIAG = 'U', the diagonal elements of A are
103: *> also not referenced and are assumed to be 1.
104: *> \endverbatim
105: *>
106: *> \param[in] LDA
107: *> \verbatim
108: *> LDA is INTEGER
109: *> The leading dimension of the array A. LDA >= max (1,N).
110: *> \endverbatim
111: *>
112: *> \param[in,out] X
113: *> \verbatim
114: *> X is COMPLEX*16 array, dimension (LDX,NRHS)
115: *> On entry, the right hand side B of the triangular system.
116: *> On exit, X is overwritten by the solution matrix X.
117: *> \endverbatim
118: *>
119: *> \param[in] LDX
120: *> \verbatim
121: *> LDX is INTEGER
122: *> The leading dimension of the array X. LDX >= max (1,N).
123: *> \endverbatim
124: *>
125: *> \param[out] SCALE
126: *> \verbatim
127: *> SCALE is DOUBLE PRECISION array, dimension (NRHS)
128: *> The scaling factor s(k) is for the triangular system
129: *> A * x(:,k) = s(k)*b(:,k) or A**T* x(:,k) = s(k)*b(:,k).
130: *> If SCALE = 0, the matrix A is singular or badly scaled.
131: *> If A(j,j) = 0 is encountered, a non-trivial vector x(:,k)
132: *> that is an exact or approximate solution to A*x(:,k) = 0
133: *> is returned. If the system so badly scaled that solution
134: *> cannot be presented as x(:,k) * 1/s(k), then x(:,k) = 0
135: *> is returned.
136: *> \endverbatim
137: *>
138: *> \param[in,out] CNORM
139: *> \verbatim
140: *> CNORM is DOUBLE PRECISION array, dimension (N)
141: *>
142: *> If NORMIN = 'Y', CNORM is an input argument and CNORM(j)
143: *> contains the norm of the off-diagonal part of the j-th column
144: *> of A. If TRANS = 'N', CNORM(j) must be greater than or equal
145: *> to the infinity-norm, and if TRANS = 'T' or 'C', CNORM(j)
146: *> must be greater than or equal to the 1-norm.
147: *>
148: *> If NORMIN = 'N', CNORM is an output argument and CNORM(j)
149: *> returns the 1-norm of the offdiagonal part of the j-th column
150: *> of A.
151: *> \endverbatim
152: *>
153: *> \param[out] WORK
154: *> \verbatim
155: *> WORK is DOUBLE PRECISION array, dimension (LWORK).
156: *> On exit, if INFO = 0, WORK(1) returns the optimal size of
157: *> WORK.
158: *> \endverbatim
159: *>
160: *> \param[in] LWORK
161: *> LWORK is INTEGER
162: *> LWORK >= MAX(1, 2*NBA * MAX(NBA, MIN(NRHS, 32)), where
163: *> NBA = (N + NB - 1)/NB and NB is the optimal block size.
164: *>
165: *> If LWORK = -1, then a workspace query is assumed; the routine
166: *> only calculates the optimal dimensions of the WORK array, returns
167: *> this value as the first entry of the WORK array, and no error
168: *> message related to LWORK is issued by XERBLA.
169: *>
170: *> \param[out] INFO
171: *> \verbatim
172: *> INFO is INTEGER
173: *> = 0: successful exit
174: *> < 0: if INFO = -k, the k-th argument had an illegal value
175: *> \endverbatim
176: *
177: * Authors:
178: * ========
179: *
180: *> \author Univ. of Tennessee
181: *> \author Univ. of California Berkeley
182: *> \author Univ. of Colorado Denver
183: *> \author NAG Ltd.
184: *
185: *> \ingroup doubleOTHERauxiliary
186: *> \par Further Details:
187: * =====================
188: * \verbatim
189: * The algorithm follows the structure of a block triangular solve.
190: * The diagonal block is solved with a call to the robust the triangular
191: * solver LATRS for every right-hand side RHS = 1, ..., NRHS
192: * op(A( J, J )) * x( J, RHS ) = SCALOC * b( J, RHS ),
193: * where op( A ) = A or op( A ) = A**T or op( A ) = A**H.
194: * The linear block updates operate on block columns of X,
195: * B( I, K ) - op(A( I, J )) * X( J, K )
196: * and use GEMM. To avoid overflow in the linear block update, the worst case
197: * growth is estimated. For every RHS, a scale factor s <= 1.0 is computed
198: * such that
199: * || s * B( I, RHS )||_oo
200: * + || op(A( I, J )) ||_oo * || s * X( J, RHS ) ||_oo <= Overflow threshold
201: *
202: * Once all columns of a block column have been rescaled (BLAS-1), the linear
203: * update is executed with GEMM without overflow.
204: *
205: * To limit rescaling, local scale factors track the scaling of column segments.
206: * There is one local scale factor s( I, RHS ) per block row I = 1, ..., NBA
207: * per right-hand side column RHS = 1, ..., NRHS. The global scale factor
208: * SCALE( RHS ) is chosen as the smallest local scale factor s( I, RHS )
209: * I = 1, ..., NBA.
210: * A triangular solve op(A( J, J )) * x( J, RHS ) = SCALOC * b( J, RHS )
211: * updates the local scale factor s( J, RHS ) := s( J, RHS ) * SCALOC. The
212: * linear update of potentially inconsistently scaled vector segments
213: * s( I, RHS ) * b( I, RHS ) - op(A( I, J )) * ( s( J, RHS )* x( J, RHS ) )
214: * computes a consistent scaling SCAMIN = MIN( s(I, RHS ), s(J, RHS) ) and,
215: * if necessary, rescales the blocks prior to calling GEMM.
216: *
217: * \endverbatim
218: * =====================================================================
219: * References:
220: * C. C. Kjelgaard Mikkelsen, A. B. Schwarz and L. Karlsson (2019).
221: * Parallel robust solution of triangular linear systems. Concurrency
222: * and Computation: Practice and Experience, 31(19), e5064.
223: *
224: * Contributor:
225: * Angelika Schwarz, Umea University, Sweden.
226: *
227: * =====================================================================
228: SUBROUTINE ZLATRS3( UPLO, TRANS, DIAG, NORMIN, N, NRHS, A, LDA,
229: $ X, LDX, SCALE, CNORM, WORK, LWORK, INFO )
230: IMPLICIT NONE
231: *
232: * .. Scalar Arguments ..
233: CHARACTER DIAG, TRANS, NORMIN, UPLO
234: INTEGER INFO, LDA, LWORK, LDX, N, NRHS
235: * ..
236: * .. Array Arguments ..
237: COMPLEX*16 A( LDA, * ), X( LDX, * )
238: DOUBLE PRECISION CNORM( * ), SCALE( * ), WORK( * )
239: * ..
240: *
241: * =====================================================================
242: *
243: * .. Parameters ..
244: DOUBLE PRECISION ZERO, ONE
245: PARAMETER ( ZERO = 0.0D+0, ONE = 1.0D+0 )
246: COMPLEX*16 CZERO, CONE
247: PARAMETER ( CONE = ( 1.0D+0, 0.0D+0 ) )
248: PARAMETER ( CZERO = ( 0.0D+0, 0.0D+0 ) )
249: INTEGER NBMAX, NBMIN, NBRHS, NRHSMIN
250: PARAMETER ( NRHSMIN = 2, NBRHS = 32 )
251: PARAMETER ( NBMIN = 8, NBMAX = 64 )
252: * ..
253: * .. Local Arrays ..
254: DOUBLE PRECISION W( NBMAX ), XNRM( NBRHS )
255: * ..
256: * .. Local Scalars ..
257: LOGICAL LQUERY, NOTRAN, NOUNIT, UPPER
258: INTEGER AWRK, I, IFIRST, IINC, ILAST, II, I1, I2, J,
259: $ JFIRST, JINC, JLAST, J1, J2, K, KK, K1, K2,
260: $ LANRM, LDS, LSCALE, NB, NBA, NBX, RHS
261: DOUBLE PRECISION ANRM, BIGNUM, BNRM, RSCAL, SCAL, SCALOC,
262: $ SCAMIN, SMLNUM, TMAX
263: * ..
264: * .. External Functions ..
265: LOGICAL LSAME
266: INTEGER ILAENV
267: DOUBLE PRECISION DLAMCH, ZLANGE, DLARMM
268: EXTERNAL ILAENV, LSAME, DLAMCH, ZLANGE, DLARMM
269: * ..
270: * .. External Subroutines ..
271: EXTERNAL ZLATRS, ZDSCAL, XERBLA
272: * ..
273: * .. Intrinsic Functions ..
274: INTRINSIC ABS, MAX, MIN
275: * ..
276: * .. Executable Statements ..
277: *
278: INFO = 0
279: UPPER = LSAME( UPLO, 'U' )
280: NOTRAN = LSAME( TRANS, 'N' )
281: NOUNIT = LSAME( DIAG, 'N' )
282: LQUERY = ( LWORK.EQ.-1 )
283: *
284: * Partition A and X into blocks.
285: *
286: NB = MAX( NBMIN, ILAENV( 1, 'ZLATRS', '', N, N, -1, -1 ) )
287: NB = MIN( NBMAX, NB )
288: NBA = MAX( 1, (N + NB - 1) / NB )
289: NBX = MAX( 1, (NRHS + NBRHS - 1) / NBRHS )
290: *
291: * Compute the workspace
292: *
293: * The workspace comprises two parts.
294: * The first part stores the local scale factors. Each simultaneously
295: * computed right-hand side requires one local scale factor per block
296: * row. WORK( I + KK * LDS ) is the scale factor of the vector
297: * segment associated with the I-th block row and the KK-th vector
298: * in the block column.
299: LSCALE = NBA * MAX( NBA, MIN( NRHS, NBRHS ) )
300: LDS = NBA
301: * The second part stores upper bounds of the triangular A. There are
302: * a total of NBA x NBA blocks, of which only the upper triangular
303: * part or the lower triangular part is referenced. The upper bound of
304: * the block A( I, J ) is stored as WORK( AWRK + I + J * NBA ).
305: LANRM = NBA * NBA
306: AWRK = LSCALE
307: WORK( 1 ) = LSCALE + LANRM
308: *
309: * Test the input parameters.
310: *
311: IF( .NOT.UPPER .AND. .NOT.LSAME( UPLO, 'L' ) ) THEN
312: INFO = -1
313: ELSE IF( .NOT.NOTRAN .AND. .NOT.LSAME( TRANS, 'T' ) .AND. .NOT.
314: $ LSAME( TRANS, 'C' ) ) THEN
315: INFO = -2
316: ELSE IF( .NOT.NOUNIT .AND. .NOT.LSAME( DIAG, 'U' ) ) THEN
317: INFO = -3
318: ELSE IF( .NOT.LSAME( NORMIN, 'Y' ) .AND. .NOT.
319: $ LSAME( NORMIN, 'N' ) ) THEN
320: INFO = -4
321: ELSE IF( N.LT.0 ) THEN
322: INFO = -5
323: ELSE IF( NRHS.LT.0 ) THEN
324: INFO = -6
325: ELSE IF( LDA.LT.MAX( 1, N ) ) THEN
326: INFO = -8
327: ELSE IF( LDX.LT.MAX( 1, N ) ) THEN
328: INFO = -10
329: ELSE IF( .NOT.LQUERY .AND. LWORK.LT.WORK( 1 ) ) THEN
330: INFO = -14
331: END IF
332: IF( INFO.NE.0 ) THEN
333: CALL XERBLA( 'ZLATRS3', -INFO )
334: RETURN
335: ELSE IF( LQUERY ) THEN
336: RETURN
337: END IF
338: *
339: * Initialize scaling factors
340: *
341: DO KK = 1, NRHS
342: SCALE( KK ) = ONE
343: END DO
344: *
345: * Quick return if possible
346: *
347: IF( MIN( N, NRHS ).EQ.0 )
348: $ RETURN
349: *
350: * Determine machine dependent constant to control overflow.
351: *
352: BIGNUM = DLAMCH( 'Overflow' )
353: SMLNUM = DLAMCH( 'Safe Minimum' )
354: *
355: * Use unblocked code for small problems
356: *
357: IF( NRHS.LT.NRHSMIN ) THEN
358: CALL ZLATRS( UPLO, TRANS, DIAG, NORMIN, N, A, LDA, X( 1, 1),
359: $ SCALE( 1 ), CNORM, INFO )
360: DO K = 2, NRHS
361: CALL ZLATRS( UPLO, TRANS, DIAG, 'Y', N, A, LDA, X( 1, K ),
362: $ SCALE( K ), CNORM, INFO )
363: END DO
364: RETURN
365: END IF
366: *
367: * Compute norms of blocks of A excluding diagonal blocks and find
368: * the block with the largest norm TMAX.
369: *
370: TMAX = ZERO
371: DO J = 1, NBA
372: J1 = (J-1)*NB + 1
373: J2 = MIN( J*NB, N ) + 1
374: IF ( UPPER ) THEN
375: IFIRST = 1
376: ILAST = J - 1
377: ELSE
378: IFIRST = J + 1
379: ILAST = NBA
380: END IF
381: DO I = IFIRST, ILAST
382: I1 = (I-1)*NB + 1
383: I2 = MIN( I*NB, N ) + 1
384: *
385: * Compute upper bound of A( I1:I2-1, J1:J2-1 ).
386: *
387: IF( NOTRAN ) THEN
388: ANRM = ZLANGE( 'I', I2-I1, J2-J1, A( I1, J1 ), LDA, W )
389: WORK( AWRK + I+(J-1)*NBA ) = ANRM
390: ELSE
391: ANRM = ZLANGE( '1', I2-I1, J2-J1, A( I1, J1 ), LDA, W )
392: WORK( AWRK + J+(I-1) * NBA ) = ANRM
393: END IF
394: TMAX = MAX( TMAX, ANRM )
395: END DO
396: END DO
397: *
398: IF( .NOT. TMAX.LE.DLAMCH('Overflow') ) THEN
399: *
400: * Some matrix entries have huge absolute value. At least one upper
401: * bound norm( A(I1:I2-1, J1:J2-1), 'I') is not a valid floating-point
402: * number, either due to overflow in LANGE or due to Inf in A.
403: * Fall back to LATRS. Set normin = 'N' for every right-hand side to
404: * force computation of TSCAL in LATRS to avoid the likely overflow
405: * in the computation of the column norms CNORM.
406: *
407: DO K = 1, NRHS
408: CALL ZLATRS( UPLO, TRANS, DIAG, 'N', N, A, LDA, X( 1, K ),
409: $ SCALE( K ), CNORM, INFO )
410: END DO
411: RETURN
412: END IF
413: *
414: * Every right-hand side requires workspace to store NBA local scale
415: * factors. To save workspace, X is computed successively in block columns
416: * of width NBRHS, requiring a total of NBA x NBRHS space. If sufficient
417: * workspace is available, larger values of NBRHS or NBRHS = NRHS are viable.
418: DO K = 1, NBX
419: * Loop over block columns (index = K) of X and, for column-wise scalings,
420: * over individual columns (index = KK).
421: * K1: column index of the first column in X( J, K )
422: * K2: column index of the first column in X( J, K+1 )
423: * so the K2 - K1 is the column count of the block X( J, K )
424: K1 = (K-1)*NBRHS + 1
425: K2 = MIN( K*NBRHS, NRHS ) + 1
426: *
427: * Initialize local scaling factors of current block column X( J, K )
428: *
429: DO KK = 1, K2 - K1
430: DO I = 1, NBA
431: WORK( I+KK*LDS ) = ONE
432: END DO
433: END DO
434: *
435: IF( NOTRAN ) THEN
436: *
437: * Solve A * X(:, K1:K2-1) = B * diag(scale(K1:K2-1))
438: *
439: IF( UPPER ) THEN
440: JFIRST = NBA
441: JLAST = 1
442: JINC = -1
443: ELSE
444: JFIRST = 1
445: JLAST = NBA
446: JINC = 1
447: END IF
448: ELSE
449: *
450: * Solve op(A) * X(:, K1:K2-1) = B * diag(scale(K1:K2-1))
451: * where op(A) = A**T or op(A) = A**H
452: *
453: IF( UPPER ) THEN
454: JFIRST = 1
455: JLAST = NBA
456: JINC = 1
457: ELSE
458: JFIRST = NBA
459: JLAST = 1
460: JINC = -1
461: END IF
462: END IF
463:
464: DO J = JFIRST, JLAST, JINC
465: * J1: row index of the first row in A( J, J )
466: * J2: row index of the first row in A( J+1, J+1 )
467: * so that J2 - J1 is the row count of the block A( J, J )
468: J1 = (J-1)*NB + 1
469: J2 = MIN( J*NB, N ) + 1
470: *
471: * Solve op(A( J, J )) * X( J, RHS ) = SCALOC * B( J, RHS )
472: *
473: DO KK = 1, K2 - K1
474: RHS = K1 + KK - 1
475: IF( KK.EQ.1 ) THEN
476: CALL ZLATRS( UPLO, TRANS, DIAG, 'N', J2-J1,
477: $ A( J1, J1 ), LDA, X( J1, RHS ),
478: $ SCALOC, CNORM, INFO )
479: ELSE
480: CALL ZLATRS( UPLO, TRANS, DIAG, 'Y', J2-J1,
481: $ A( J1, J1 ), LDA, X( J1, RHS ),
482: $ SCALOC, CNORM, INFO )
483: END IF
484: * Find largest absolute value entry in the vector segment
485: * X( J1:J2-1, RHS ) as an upper bound for the worst case
486: * growth in the linear updates.
487: XNRM( KK ) = ZLANGE( 'I', J2-J1, 1, X( J1, RHS ),
488: $ LDX, W )
489: *
490: IF( SCALOC .EQ. ZERO ) THEN
491: * LATRS found that A is singular through A(j,j) = 0.
492: * Reset the computation x(1:n) = 0, x(j) = 1, SCALE = 0
493: * and compute op(A)*x = 0. Note that X(J1:J2-1, KK) is
494: * set by LATRS.
495: SCALE( RHS ) = ZERO
496: DO II = 1, J1-1
497: X( II, KK ) = CZERO
498: END DO
499: DO II = J2, N
500: X( II, KK ) = CZERO
501: END DO
502: * Discard the local scale factors.
503: DO II = 1, NBA
504: WORK( II+KK*LDS ) = ONE
505: END DO
506: SCALOC = ONE
507: ELSE IF( SCALOC*WORK( J+KK*LDS ) .EQ. ZERO ) THEN
508: * LATRS computed a valid scale factor, but combined with
509: * the current scaling the solution does not have a
510: * scale factor > 0.
511: *
512: * Set WORK( J+KK*LDS ) to smallest valid scale
513: * factor and increase SCALOC accordingly.
514: SCAL = WORK( J+KK*LDS ) / SMLNUM
515: SCALOC = SCALOC * SCAL
516: WORK( J+KK*LDS ) = SMLNUM
517: * If LATRS overestimated the growth, x may be
518: * rescaled to preserve a valid combined scale
519: * factor WORK( J, KK ) > 0.
520: RSCAL = ONE / SCALOC
521: IF( XNRM( KK )*RSCAL .LE. BIGNUM ) THEN
522: XNRM( KK ) = XNRM( KK ) * RSCAL
523: CALL ZDSCAL( J2-J1, RSCAL, X( J1, RHS ), 1 )
524: SCALOC = ONE
525: ELSE
526: * The system op(A) * x = b is badly scaled and its
527: * solution cannot be represented as (1/scale) * x.
528: * Set x to zero. This approach deviates from LATRS
529: * where a completely meaningless non-zero vector
530: * is returned that is not a solution to op(A) * x = b.
531: SCALE( RHS ) = ZERO
532: DO II = 1, N
533: X( II, KK ) = CZERO
534: END DO
535: * Discard the local scale factors.
536: DO II = 1, NBA
537: WORK( II+KK*LDS ) = ONE
538: END DO
539: SCALOC = ONE
540: END IF
541: END IF
542: SCALOC = SCALOC * WORK( J+KK*LDS )
543: WORK( J+KK*LDS ) = SCALOC
544: END DO
545: *
546: * Linear block updates
547: *
548: IF( NOTRAN ) THEN
549: IF( UPPER ) THEN
550: IFIRST = J - 1
551: ILAST = 1
552: IINC = -1
553: ELSE
554: IFIRST = J + 1
555: ILAST = NBA
556: IINC = 1
557: END IF
558: ELSE
559: IF( UPPER ) THEN
560: IFIRST = J + 1
561: ILAST = NBA
562: IINC = 1
563: ELSE
564: IFIRST = J - 1
565: ILAST = 1
566: IINC = -1
567: END IF
568: END IF
569: *
570: DO I = IFIRST, ILAST, IINC
571: * I1: row index of the first column in X( I, K )
572: * I2: row index of the first column in X( I+1, K )
573: * so the I2 - I1 is the row count of the block X( I, K )
574: I1 = (I-1)*NB + 1
575: I2 = MIN( I*NB, N ) + 1
576: *
577: * Prepare the linear update to be executed with GEMM.
578: * For each column, compute a consistent scaling, a
579: * scaling factor to survive the linear update, and
580: * rescale the column segments, if necesssary. Then
581: * the linear update is safely executed.
582: *
583: DO KK = 1, K2 - K1
584: RHS = K1 + KK - 1
585: * Compute consistent scaling
586: SCAMIN = MIN( WORK( I+KK*LDS), WORK( J+KK*LDS ) )
587: *
588: * Compute scaling factor to survive the linear update
589: * simulating consistent scaling.
590: *
591: BNRM = ZLANGE( 'I', I2-I1, 1, X( I1, RHS ), LDX, W )
592: BNRM = BNRM*( SCAMIN / WORK( I+KK*LDS ) )
593: XNRM( KK ) = XNRM( KK )*( SCAMIN / WORK( J+KK*LDS) )
594: ANRM = WORK( AWRK + I+(J-1)*NBA )
595: SCALOC = DLARMM( ANRM, XNRM( KK ), BNRM )
596: *
597: * Simultaneously apply the robust update factor and the
598: * consistency scaling factor to X( I, KK ) and X( J, KK ).
599: *
600: SCAL = ( SCAMIN / WORK( I+KK*LDS) )*SCALOC
601: IF( SCAL.NE.ONE ) THEN
602: CALL ZDSCAL( I2-I1, SCAL, X( I1, RHS ), 1 )
603: WORK( I+KK*LDS ) = SCAMIN*SCALOC
604: END IF
605: *
606: SCAL = ( SCAMIN / WORK( J+KK*LDS ) )*SCALOC
607: IF( SCAL.NE.ONE ) THEN
608: CALL ZDSCAL( J2-J1, SCAL, X( J1, RHS ), 1 )
609: WORK( J+KK*LDS ) = SCAMIN*SCALOC
610: END IF
611: END DO
612: *
613: IF( NOTRAN ) THEN
614: *
615: * B( I, K ) := B( I, K ) - A( I, J ) * X( J, K )
616: *
617: CALL ZGEMM( 'N', 'N', I2-I1, K2-K1, J2-J1, -CONE,
618: $ A( I1, J1 ), LDA, X( J1, K1 ), LDX,
619: $ CONE, X( I1, K1 ), LDX )
620: ELSE IF( LSAME( TRANS, 'T' ) ) THEN
621: *
622: * B( I, K ) := B( I, K ) - A( I, J )**T * X( J, K )
623: *
624: CALL ZGEMM( 'T', 'N', I2-I1, K2-K1, J2-J1, -CONE,
625: $ A( J1, I1 ), LDA, X( J1, K1 ), LDX,
626: $ CONE, X( I1, K1 ), LDX )
627: ELSE
628: *
629: * B( I, K ) := B( I, K ) - A( I, J )**H * X( J, K )
630: *
631: CALL ZGEMM( 'C', 'N', I2-I1, K2-K1, J2-J1, -CONE,
632: $ A( J1, I1 ), LDA, X( J1, K1 ), LDX,
633: $ CONE, X( I1, K1 ), LDX )
634: END IF
635: END DO
636: END DO
637:
638: *
639: * Reduce local scaling factors
640: *
641: DO KK = 1, K2 - K1
642: RHS = K1 + KK - 1
643: DO I = 1, NBA
644: SCALE( RHS ) = MIN( SCALE( RHS ), WORK( I+KK*LDS ) )
645: END DO
646: END DO
647: *
648: * Realize consistent scaling
649: *
650: DO KK = 1, K2 - K1
651: RHS = K1 + KK - 1
652: IF( SCALE( RHS ).NE.ONE .AND. SCALE( RHS ).NE. ZERO ) THEN
653: DO I = 1, NBA
654: I1 = (I - 1) * NB + 1
655: I2 = MIN( I * NB, N ) + 1
656: SCAL = SCALE( RHS ) / WORK( I+KK*LDS )
657: IF( SCAL.NE.ONE )
658: $ CALL ZDSCAL( I2-I1, SCAL, X( I1, RHS ), 1 )
659: END DO
660: END IF
661: END DO
662: END DO
663: RETURN
664: *
665: * End of ZLATRS3
666: *
667: END
CVSweb interface <joel.bertrand@systella.fr>