1: *> \brief \b ZUNHR_COL
2: *
3: * =========== DOCUMENTATION ===========
4: *
5: * Online html documentation available at
6: * http://www.netlib.org/lapack/explore-html/
7: *
8: *> \htmlonly
9: *> Download ZUNHR_COL + dependencies
10: *> <a href="http://www.netlib.org/cgi-bin/netlibfiles.tgz?format=tgz&filename=/lapack/lapack_routine/zunhr_col.f">
11: *> [TGZ]</a>
12: *> <a href="http://www.netlib.org/cgi-bin/netlibfiles.zip?format=zip&filename=/lapack/lapack_routine/zunhr_col.f">
13: *> [ZIP]</a>
14: *> <a href="http://www.netlib.org/cgi-bin/netlibfiles.txt?format=txt&filename=/lapack/lapack_routine/zunhr_col.f">
15: *> [TXT]</a>
16: *>
17: * Definition:
18: * ===========
19: *
20: * SUBROUTINE ZUNHR_COL( M, N, NB, A, LDA, T, LDT, D, INFO )
21: *
22: * .. Scalar Arguments ..
23: * INTEGER INFO, LDA, LDT, M, N, NB
24: * ..
25: * .. Array Arguments ..
26: * COMPLEX*16 A( LDA, * ), D( * ), T( LDT, * )
27: * ..
28: *
29: *> \par Purpose:
30: * =============
31: *>
32: *> \verbatim
33: *>
34: *> ZUNHR_COL takes an M-by-N complex matrix Q_in with orthonormal columns
35: *> as input, stored in A, and performs Householder Reconstruction (HR),
36: *> i.e. reconstructs Householder vectors V(i) implicitly representing
37: *> another M-by-N matrix Q_out, with the property that Q_in = Q_out*S,
38: *> where S is an N-by-N diagonal matrix with diagonal entries
39: *> equal to +1 or -1. The Householder vectors (columns V(i) of V) are
40: *> stored in A on output, and the diagonal entries of S are stored in D.
41: *> Block reflectors are also returned in T
42: *> (same output format as ZGEQRT).
43: *> \endverbatim
44: *
45: * Arguments:
46: * ==========
47: *
48: *> \param[in] M
49: *> \verbatim
50: *> M is INTEGER
51: *> The number of rows of the matrix A. M >= 0.
52: *> \endverbatim
53: *>
54: *> \param[in] N
55: *> \verbatim
56: *> N is INTEGER
57: *> The number of columns of the matrix A. M >= N >= 0.
58: *> \endverbatim
59: *>
60: *> \param[in] NB
61: *> \verbatim
62: *> NB is INTEGER
63: *> The column block size to be used in the reconstruction
64: *> of Householder column vector blocks in the array A and
65: *> corresponding block reflectors in the array T. NB >= 1.
66: *> (Note that if NB > N, then N is used instead of NB
67: *> as the column block size.)
68: *> \endverbatim
69: *>
70: *> \param[in,out] A
71: *> \verbatim
72: *> A is COMPLEX*16 array, dimension (LDA,N)
73: *>
74: *> On entry:
75: *>
76: *> The array A contains an M-by-N orthonormal matrix Q_in,
77: *> i.e the columns of A are orthogonal unit vectors.
78: *>
79: *> On exit:
80: *>
81: *> The elements below the diagonal of A represent the unit
82: *> lower-trapezoidal matrix V of Householder column vectors
83: *> V(i). The unit diagonal entries of V are not stored
84: *> (same format as the output below the diagonal in A from
85: *> ZGEQRT). The matrix T and the matrix V stored on output
86: *> in A implicitly define Q_out.
87: *>
88: *> The elements above the diagonal contain the factor U
89: *> of the "modified" LU-decomposition:
90: *> Q_in - ( S ) = V * U
91: *> ( 0 )
92: *> where 0 is a (M-N)-by-(M-N) zero matrix.
93: *> \endverbatim
94: *>
95: *> \param[in] LDA
96: *> \verbatim
97: *> LDA is INTEGER
98: *> The leading dimension of the array A. LDA >= max(1,M).
99: *> \endverbatim
100: *>
101: *> \param[out] T
102: *> \verbatim
103: *> T is COMPLEX*16 array,
104: *> dimension (LDT, N)
105: *>
106: *> Let NOCB = Number_of_output_col_blocks
107: *> = CEIL(N/NB)
108: *>
109: *> On exit, T(1:NB, 1:N) contains NOCB upper-triangular
110: *> block reflectors used to define Q_out stored in compact
111: *> form as a sequence of upper-triangular NB-by-NB column
112: *> blocks (same format as the output T in ZGEQRT).
113: *> The matrix T and the matrix V stored on output in A
114: *> implicitly define Q_out. NOTE: The lower triangles
115: *> below the upper-triangular blcoks will be filled with
116: *> zeros. See Further Details.
117: *> \endverbatim
118: *>
119: *> \param[in] LDT
120: *> \verbatim
121: *> LDT is INTEGER
122: *> The leading dimension of the array T.
123: *> LDT >= max(1,min(NB,N)).
124: *> \endverbatim
125: *>
126: *> \param[out] D
127: *> \verbatim
128: *> D is COMPLEX*16 array, dimension min(M,N).
129: *> The elements can be only plus or minus one.
130: *>
131: *> D(i) is constructed as D(i) = -SIGN(Q_in_i(i,i)), where
132: *> 1 <= i <= min(M,N), and Q_in_i is Q_in after performing
133: *> i-1 steps of “modified” Gaussian elimination.
134: *> See Further Details.
135: *> \endverbatim
136: *>
137: *> \param[out] INFO
138: *> \verbatim
139: *> INFO is INTEGER
140: *> = 0: successful exit
141: *> < 0: if INFO = -i, the i-th argument had an illegal value
142: *> \endverbatim
143: *>
144: *> \par Further Details:
145: * =====================
146: *>
147: *> \verbatim
148: *>
149: *> The computed M-by-M unitary factor Q_out is defined implicitly as
150: *> a product of unitary matrices Q_out(i). Each Q_out(i) is stored in
151: *> the compact WY-representation format in the corresponding blocks of
152: *> matrices V (stored in A) and T.
153: *>
154: *> The M-by-N unit lower-trapezoidal matrix V stored in the M-by-N
155: *> matrix A contains the column vectors V(i) in NB-size column
156: *> blocks VB(j). For example, VB(1) contains the columns
157: *> V(1), V(2), ... V(NB). NOTE: The unit entries on
158: *> the diagonal of Y are not stored in A.
159: *>
160: *> The number of column blocks is
161: *>
162: *> NOCB = Number_of_output_col_blocks = CEIL(N/NB)
163: *>
164: *> where each block is of order NB except for the last block, which
165: *> is of order LAST_NB = N - (NOCB-1)*NB.
166: *>
167: *> For example, if M=6, N=5 and NB=2, the matrix V is
168: *>
169: *>
170: *> V = ( VB(1), VB(2), VB(3) ) =
171: *>
172: *> = ( 1 )
173: *> ( v21 1 )
174: *> ( v31 v32 1 )
175: *> ( v41 v42 v43 1 )
176: *> ( v51 v52 v53 v54 1 )
177: *> ( v61 v62 v63 v54 v65 )
178: *>
179: *>
180: *> For each of the column blocks VB(i), an upper-triangular block
181: *> reflector TB(i) is computed. These blocks are stored as
182: *> a sequence of upper-triangular column blocks in the NB-by-N
183: *> matrix T. The size of each TB(i) block is NB-by-NB, except
184: *> for the last block, whose size is LAST_NB-by-LAST_NB.
185: *>
186: *> For example, if M=6, N=5 and NB=2, the matrix T is
187: *>
188: *> T = ( TB(1), TB(2), TB(3) ) =
189: *>
190: *> = ( t11 t12 t13 t14 t15 )
191: *> ( t22 t24 )
192: *>
193: *>
194: *> The M-by-M factor Q_out is given as a product of NOCB
195: *> unitary M-by-M matrices Q_out(i).
196: *>
197: *> Q_out = Q_out(1) * Q_out(2) * ... * Q_out(NOCB),
198: *>
199: *> where each matrix Q_out(i) is given by the WY-representation
200: *> using corresponding blocks from the matrices V and T:
201: *>
202: *> Q_out(i) = I - VB(i) * TB(i) * (VB(i))**T,
203: *>
204: *> where I is the identity matrix. Here is the formula with matrix
205: *> dimensions:
206: *>
207: *> Q(i){M-by-M} = I{M-by-M} -
208: *> VB(i){M-by-INB} * TB(i){INB-by-INB} * (VB(i))**T {INB-by-M},
209: *>
210: *> where INB = NB, except for the last block NOCB
211: *> for which INB=LAST_NB.
212: *>
213: *> =====
214: *> NOTE:
215: *> =====
216: *>
217: *> If Q_in is the result of doing a QR factorization
218: *> B = Q_in * R_in, then:
219: *>
220: *> B = (Q_out*S) * R_in = Q_out * (S * R_in) = O_out * R_out.
221: *>
222: *> So if one wants to interpret Q_out as the result
223: *> of the QR factorization of B, then corresponding R_out
224: *> should be obtained by R_out = S * R_in, i.e. some rows of R_in
225: *> should be multiplied by -1.
226: *>
227: *> For the details of the algorithm, see [1].
228: *>
229: *> [1] "Reconstructing Householder vectors from tall-skinny QR",
230: *> G. Ballard, J. Demmel, L. Grigori, M. Jacquelin, H.D. Nguyen,
231: *> E. Solomonik, J. Parallel Distrib. Comput.,
232: *> vol. 85, pp. 3-31, 2015.
233: *> \endverbatim
234: *>
235: * Authors:
236: * ========
237: *
238: *> \author Univ. of Tennessee
239: *> \author Univ. of California Berkeley
240: *> \author Univ. of Colorado Denver
241: *> \author NAG Ltd.
242: *
243: *> \date November 2019
244: *
245: *> \ingroup complex16OTHERcomputational
246: *
247: *> \par Contributors:
248: * ==================
249: *>
250: *> \verbatim
251: *>
252: *> November 2019, Igor Kozachenko,
253: *> Computer Science Division,
254: *> University of California, Berkeley
255: *>
256: *> \endverbatim
257: *
258: * =====================================================================
259: SUBROUTINE ZUNHR_COL( M, N, NB, A, LDA, T, LDT, D, INFO )
260: IMPLICIT NONE
261: *
262: * -- LAPACK computational routine (version 3.9.0) --
263: * -- LAPACK is a software package provided by Univ. of Tennessee, --
264: * -- Univ. of California Berkeley, Univ. of Colorado Denver and NAG Ltd..--
265: * November 2019
266: *
267: * .. Scalar Arguments ..
268: INTEGER INFO, LDA, LDT, M, N, NB
269: * ..
270: * .. Array Arguments ..
271: COMPLEX*16 A( LDA, * ), D( * ), T( LDT, * )
272: * ..
273: *
274: * =====================================================================
275: *
276: * .. Parameters ..
277: COMPLEX*16 CONE, CZERO
278: PARAMETER ( CONE = ( 1.0D+0, 0.0D+0 ),
279: $ CZERO = ( 0.0D+0, 0.0D+0 ) )
280: * ..
281: * .. Local Scalars ..
282: INTEGER I, IINFO, J, JB, JBTEMP1, JBTEMP2, JNB,
283: $ NPLUSONE
284: * ..
285: * .. External Subroutines ..
286: EXTERNAL ZCOPY, ZLAUNHR_COL_GETRFNP, ZSCAL, ZTRSM,
287: $ XERBLA
288: * ..
289: * .. Intrinsic Functions ..
290: INTRINSIC MAX, MIN
291: * ..
292: * .. Executable Statements ..
293: *
294: * Test the input parameters
295: *
296: INFO = 0
297: IF( M.LT.0 ) THEN
298: INFO = -1
299: ELSE IF( N.LT.0 .OR. N.GT.M ) THEN
300: INFO = -2
301: ELSE IF( NB.LT.1 ) THEN
302: INFO = -3
303: ELSE IF( LDA.LT.MAX( 1, M ) ) THEN
304: INFO = -5
305: ELSE IF( LDT.LT.MAX( 1, MIN( NB, N ) ) ) THEN
306: INFO = -7
307: END IF
308: *
309: * Handle error in the input parameters.
310: *
311: IF( INFO.NE.0 ) THEN
312: CALL XERBLA( 'ZUNHR_COL', -INFO )
313: RETURN
314: END IF
315: *
316: * Quick return if possible
317: *
318: IF( MIN( M, N ).EQ.0 ) THEN
319: RETURN
320: END IF
321: *
322: * On input, the M-by-N matrix A contains the unitary
323: * M-by-N matrix Q_in.
324: *
325: * (1) Compute the unit lower-trapezoidal V (ones on the diagonal
326: * are not stored) by performing the "modified" LU-decomposition.
327: *
328: * Q_in - ( S ) = V * U = ( V1 ) * U,
329: * ( 0 ) ( V2 )
330: *
331: * where 0 is an (M-N)-by-N zero matrix.
332: *
333: * (1-1) Factor V1 and U.
334:
335: CALL ZLAUNHR_COL_GETRFNP( N, N, A, LDA, D, IINFO )
336: *
337: * (1-2) Solve for V2.
338: *
339: IF( M.GT.N ) THEN
340: CALL ZTRSM( 'R', 'U', 'N', 'N', M-N, N, CONE, A, LDA,
341: $ A( N+1, 1 ), LDA )
342: END IF
343: *
344: * (2) Reconstruct the block reflector T stored in T(1:NB, 1:N)
345: * as a sequence of upper-triangular blocks with NB-size column
346: * blocking.
347: *
348: * Loop over the column blocks of size NB of the array A(1:M,1:N)
349: * and the array T(1:NB,1:N), JB is the column index of a column
350: * block, JNB is the column block size at each step JB.
351: *
352: NPLUSONE = N + 1
353: DO JB = 1, N, NB
354: *
355: * (2-0) Determine the column block size JNB.
356: *
357: JNB = MIN( NPLUSONE-JB, NB )
358: *
359: * (2-1) Copy the upper-triangular part of the current JNB-by-JNB
360: * diagonal block U(JB) (of the N-by-N matrix U) stored
361: * in A(JB:JB+JNB-1,JB:JB+JNB-1) into the upper-triangular part
362: * of the current JNB-by-JNB block T(1:JNB,JB:JB+JNB-1)
363: * column-by-column, total JNB*(JNB+1)/2 elements.
364: *
365: JBTEMP1 = JB - 1
366: DO J = JB, JB+JNB-1
367: CALL ZCOPY( J-JBTEMP1, A( JB, J ), 1, T( 1, J ), 1 )
368: END DO
369: *
370: * (2-2) Perform on the upper-triangular part of the current
371: * JNB-by-JNB diagonal block U(JB) (of the N-by-N matrix U) stored
372: * in T(1:JNB,JB:JB+JNB-1) the following operation in place:
373: * (-1)*U(JB)*S(JB), i.e the result will be stored in the upper-
374: * triangular part of T(1:JNB,JB:JB+JNB-1). This multiplication
375: * of the JNB-by-JNB diagonal block U(JB) by the JNB-by-JNB
376: * diagonal block S(JB) of the N-by-N sign matrix S from the
377: * right means changing the sign of each J-th column of the block
378: * U(JB) according to the sign of the diagonal element of the block
379: * S(JB), i.e. S(J,J) that is stored in the array element D(J).
380: *
381: DO J = JB, JB+JNB-1
382: IF( D( J ).EQ.CONE ) THEN
383: CALL ZSCAL( J-JBTEMP1, -CONE, T( 1, J ), 1 )
384: END IF
385: END DO
386: *
387: * (2-3) Perform the triangular solve for the current block
388: * matrix X(JB):
389: *
390: * X(JB) * (A(JB)**T) = B(JB), where:
391: *
392: * A(JB)**T is a JNB-by-JNB unit upper-triangular
393: * coefficient block, and A(JB)=V1(JB), which
394: * is a JNB-by-JNB unit lower-triangular block
395: * stored in A(JB:JB+JNB-1,JB:JB+JNB-1).
396: * The N-by-N matrix V1 is the upper part
397: * of the M-by-N lower-trapezoidal matrix V
398: * stored in A(1:M,1:N);
399: *
400: * B(JB) is a JNB-by-JNB upper-triangular right-hand
401: * side block, B(JB) = (-1)*U(JB)*S(JB), and
402: * B(JB) is stored in T(1:JNB,JB:JB+JNB-1);
403: *
404: * X(JB) is a JNB-by-JNB upper-triangular solution
405: * block, X(JB) is the upper-triangular block
406: * reflector T(JB), and X(JB) is stored
407: * in T(1:JNB,JB:JB+JNB-1).
408: *
409: * In other words, we perform the triangular solve for the
410: * upper-triangular block T(JB):
411: *
412: * T(JB) * (V1(JB)**T) = (-1)*U(JB)*S(JB).
413: *
414: * Even though the blocks X(JB) and B(JB) are upper-
415: * triangular, the routine ZTRSM will access all JNB**2
416: * elements of the square T(1:JNB,JB:JB+JNB-1). Therefore,
417: * we need to set to zero the elements of the block
418: * T(1:JNB,JB:JB+JNB-1) below the diagonal before the call
419: * to ZTRSM.
420: *
421: * (2-3a) Set the elements to zero.
422: *
423: JBTEMP2 = JB - 2
424: DO J = JB, JB+JNB-2
425: DO I = J-JBTEMP2, NB
426: T( I, J ) = CZERO
427: END DO
428: END DO
429: *
430: * (2-3b) Perform the triangular solve.
431: *
432: CALL ZTRSM( 'R', 'L', 'C', 'U', JNB, JNB, CONE,
433: $ A( JB, JB ), LDA, T( 1, JB ), LDT )
434: *
435: END DO
436: *
437: RETURN
438: *
439: * End of ZUNHR_COL
440: *
441: END
CVSweb interface <joel.bertrand@systella.fr>