Provided by: scalapack-doc_1.5-11_all

NAME

       PDLAHQR - is an auxiliary routine used to find the Schur decomposition and/or eigenvalues of a matrix
       already in Hessenberg form from columns ILO to IHI

SYNOPSIS

       SUBROUTINE PDLAHQR( WANTT, WANTZ, N, ILO, IHI, A, DESCA, WR, WI,  ILOZ,  IHIZ,  Z,  DESCZ,  WORK,  LWORK,
                           IWORK, ILWORK, INFO )

           LOGICAL         WANTT, WANTZ

           INTEGER         IHI, IHIZ, ILO, ILOZ, ILWORK, INFO, LWORK, N, ROTN

           INTEGER         DESCA( * ), DESCZ( * ), IWORK( * )

           DOUBLE PRECISION A( * ), WI( * ), WORK( * ), WR( * ), Z( * )

PURPOSE

       PDLAHQR is an auxiliary routine used to find the Schur decomposition
         and/or eigenvalues of a matrix already in Hessenberg form from
         columns ILO to IHI.

       Notes
       =====

       Each  global  data  object  is  described  by  an  associated description vector.  This vector stores the
       information required to establish the mapping between an object element and its corresponding process and
       memory location.

       Let A be a generic term for any 2D block-cyclically distributed array.  Such a global array has an
       associated description vector DESCA.  In the following comments, the character _ should be read as "of
       the global array".

       NOTATION        STORED IN      EXPLANATION
       --------------- -------------- --------------------------------------
       DTYPE_A(global) DESCA( DTYPE_ )The descriptor type.  In this case,
                                      DTYPE_A = 1.
       CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating
                                      the BLACS process grid A is
                                      distributed over. The context itself
                                      is global, but the handle (the
                                      integer value) may vary.
       M_A    (global) DESCA( M_ )    The number of rows in the global
                                      array A.
       N_A    (global) DESCA( N_ )    The number of columns in the global
                                      array A.
       MB_A   (global) DESCA( MB_ )   The blocking factor used to distribute
                                      the rows of the array.
       NB_A   (global) DESCA( NB_ )   The blocking factor used to distribute
                                      the columns of the array.
       RSRC_A (global) DESCA( RSRC_ ) The process row over which the first
                                      row of the array A is distributed.
       CSRC_A (global) DESCA( CSRC_ ) The process column over which the
                                      first column of the array A is
                                      distributed.
       LLD_A  (local)  DESCA( LLD_ )  The leading dimension of the local
                                      array.  LLD_A >= MAX(1,LOCr(M_A)).
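       The descriptor is what lets each process find where a global element lives.  The Python sketch
       below assumes the conventional 1-based ScaLAPACK descriptor layout (DTYPE_=1, CTXT_=2, M_=3, N_=4,
       MB_=5, NB_=6, RSRC_=7, CSRC_=8, LLD_=9) and mirrors the index arithmetic of the ScaLAPACK tool
       functions INDXG2P and INDXG2L; the helper names indxg2p, indxg2l and locate are hypothetical
       illustrations, not ScaLAPACK entry points.

```python
# Assumed conventional 1-based descriptor offsets (DLEN_ = 9).
DTYPE_, CTXT_, M_, N_, MB_, NB_, RSRC_, CSRC_, LLD_ = range(1, 10)


def indxg2p(ig, nb, isrc, nprocs):
    """Process coordinate that owns 1-based global index ig (same arithmetic as INDXG2P)."""
    return (isrc + (ig - 1) // nb) % nprocs


def indxg2l(ig, nb, nprocs):
    """1-based local index of global index ig on its owner (same arithmetic as INDXG2L)."""
    return nb * ((ig - 1) // (nb * nprocs)) + (ig - 1) % nb + 1


def locate(desc, i, j, nprow, npcol):
    """Return ((prow, pcol), (li, lj)) for the global entry A(i, j).

    desc is a Python list holding DESCA(1..9), so entry k is desc[k - 1].
    """
    mb, nb = desc[MB_ - 1], desc[NB_ - 1]
    rsrc, csrc = desc[RSRC_ - 1], desc[CSRC_ - 1]
    owner = (indxg2p(i, mb, rsrc, nprow), indxg2p(j, nb, csrc, npcol))
    local = (indxg2l(i, mb, nprow), indxg2l(j, nb, npcol))
    return owner, local


# A 100 x 100 matrix in 6 x 6 blocks on a 2 x 3 grid, first block on process (0, 0).
desca = [1, 0, 100, 100, 6, 6, 0, 0, 60]
print(locate(desca, 14, 20, 2, 3))
```

       With 6 x 6 blocks, row 14 falls in row block 2, which wraps back to process row 0 as its second
       local block, so it is local row 8 there; column 20 likewise lands in local column 8 of process
       column 0.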

       Let K be the number of rows or columns of a distributed matrix, and assume  that  its  process  grid  has
       dimension p x q.
       LOCr( K ) denotes the number of elements of K that a process would receive if K were distributed over the
       p processes of its process column.
       Similarly,  LOCc(  K  )  denotes  the  number  of  elements  of  K that a process would receive if K were
       distributed over the q processes of its process row.
       The values of LOCr() and LOCc() may be determined via a call to the ScaLAPACK tool function, NUMROC:
               LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
               LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).
       An upper bound for these quantities may be computed by:
               LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
               LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A
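       To make LOCr/LOCc concrete, here is a Python sketch of the block-counting logic that NUMROC is
       documented to perform, together with the ceiling bound quoted above.  The function names are
       illustrative; in real code, call NUMROC itself.

```python
def numroc(n, nb, iproc, isrcproc, nprocs):
    """Entries of an n-long block-cyclic dimension owned by process iproc."""
    mydist = (nprocs + iproc - isrcproc) % nprocs  # distance from the source process
    nblocks = n // nb                              # number of full blocks
    count = (nblocks // nprocs) * nb               # full rounds every process receives
    extrablks = nblocks % nprocs                   # full blocks left after those rounds
    if mydist < extrablks:
        count += nb                                # one extra full block
    elif mydist == extrablks:
        count += n % nb                            # the trailing partial block
    return count


def loc_upper_bound(n, nb, nprocs):
    """ceil( ceil(n/nb)/nprocs )*nb, using integer ceiling division."""
    blocks = -(-n // nb)
    return -(-blocks // nprocs) * nb


# 100 rows in blocks of 6 over 3 process rows, first block on process row 0.
print([numroc(100, 6, p, 0, 3) for p in range(3)], loc_upper_bound(100, 6, 3))
```

       Here 100 rows form 16 full blocks plus 4 leftover rows; process row 0 gets 6 full blocks (36 rows),
       process row 1 gets 5 full blocks plus the partial one (34), and process row 2 gets 30.  All three
       counts stay within the bound of 36.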

ARGUMENTS

       WANTT   (global input) LOGICAL
               = .TRUE. : the full Schur form T is required;
               = .FALSE.: only eigenvalues are required.

       WANTZ   (global input) LOGICAL
               = .TRUE. : the matrix of Schur vectors Z is required;
               = .FALSE.: Schur vectors are not required.

       N       (global input) INTEGER
               The order of the Hessenberg matrix A (and Z if WANTZ).  N >= 0.

       ILO     (global input) INTEGER
       IHI     (global input) INTEGER
               It is assumed that A is already upper quasi-triangular in rows and columns IHI+1:N, and that
               A(ILO,ILO-1) = 0 (unless ILO = 1).  PDLAHQR works primarily with the Hessenberg submatrix in
               rows and columns ILO to IHI, but applies transformations to all of A if WANTT is .TRUE..
               1 <= ILO <= max(1,IHI); IHI <= N.

       A       (global input/output) DOUBLE PRECISION array, dimension (DESCA(LLD_),*)
               On entry, the upper Hessenberg matrix A.  On exit, if WANTT is .TRUE., A is upper
               quasi-triangular in rows and columns ILO:IHI, with any 2-by-2 or larger diagonal blocks not yet
               in standard form.  If WANTT is .FALSE., the contents of A are unspecified on exit.

       DESCA   (global and local input) INTEGER array of dimension DLEN_.
               The array descriptor for the distributed matrix A.

       WR      (global replicated output) DOUBLE PRECISION array, dimension (N)
       WI      (global replicated output) DOUBLE PRECISION array, dimension (N)
               The real and imaginary parts, respectively, of the computed eigenvalues ILO to IHI are stored
               in the corresponding elements of WR and WI.  If two eigenvalues are computed as a complex
               conjugate pair, they are stored in consecutive elements of WR and WI, say the i-th and
               (i+1)-th, with WI(i) > 0 and WI(i+1) < 0.  If WANTT is .TRUE., the eigenvalues are stored in
               the same order as on the diagonal of the Schur form returned in A.  A may be returned with
               larger diagonal blocks until the next release.
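       The storage convention for a conjugate pair can be illustrated with a single real 2-by-2 diagonal
       block.  This Python sketch uses the textbook quadratic formula on the characteristic polynomial;
       it is an assumption-level illustration of the WR/WI ordering only, not the numerically careful
       standardization LAPACK performs in DLANV2, and eig2x2 is a hypothetical name.

```python
import math


def eig2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] as ((wr1, wi1), (wr2, wi2)).

    A complex conjugate pair is ordered with wi1 > 0 and wi2 < 0, matching
    the WR/WI convention.  Naive formula: no scaling or cancellation guards.
    """
    half_tr = 0.5 * (a + d)
    disc = (0.5 * (a - d)) ** 2 + b * c  # (tr/2)^2 - det, rewritten
    if disc >= 0.0:
        r = math.sqrt(disc)
        return (half_tr + r, 0.0), (half_tr - r, 0.0)  # two real eigenvalues
    im = math.sqrt(-disc)
    return (half_tr, im), (half_tr, -im)                # positive imaginary part first


print(eig2x2(0.0, -1.0, 1.0, 0.0))  # a rotation block: the pair +i, -i
```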

       ILOZ    (global input) INTEGER
       IHIZ    (global input) INTEGER
               Specify the rows of Z to which transformations must be applied if WANTZ is .TRUE..
               1 <= ILOZ <= ILO; IHI <= IHIZ <= N.
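       The index-range requirements on N, ILO, IHI, ILOZ and IHIZ can be checked before the call.  The
       Python sketch below encodes those inequalities as stated; check_ranges is a hypothetical helper, it
       assumes the Z bounds only matter when WANTZ is .TRUE., and it does not reproduce the INFO codes
       that PDLAHQR itself returns.

```python
def check_ranges(n, ilo, ihi, iloz, ihiz, wantz):
    """Return the list of violated range constraints (empty if all hold)."""
    problems = []
    if n < 0:
        problems.append("N >= 0")
    if not (1 <= ilo <= max(1, ihi)):
        problems.append("1 <= ILO <= max(1,IHI)")
    if ihi > n:
        problems.append("IHI <= N")
    if wantz:  # assumption: ILOZ/IHIZ are only relevant when Z is updated
        if not (1 <= iloz <= ilo):
            problems.append("1 <= ILOZ <= ILO")
        if not (ihi <= ihiz <= n):
            problems.append("IHI <= IHIZ <= N")
    return problems


print(check_ranges(10, 3, 8, 4, 10, True))  # ILOZ exceeds ILO
```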

       Z       (global input/output) DOUBLE PRECISION array.
               If WANTZ is .TRUE., on entry Z must contain the current matrix Z of  transformations  accumulated
               by  PDHSEQR,  and  on  exit Z has been updated; transformations are applied only to the submatrix
               Z(ILOZ:IHIZ,ILO:IHI).  If WANTZ is .FALSE., Z is not referenced.

       DESCZ   (global and local input) INTEGER array of dimension DLEN_.
               The array descriptor for the distributed matrix Z.

       WORK    (local output) DOUBLE PRECISION array of size LWORK
               (Unless LWORK=-1, in which case WORK must be at least size 1)

       LWORK   (local input) INTEGER
               WORK(LWORK) is a local array and LWORK is assumed big enough so that
                   LWORK >= 3*N + MAX( 2*MAX(DESCZ(LLD_),DESCA(LLD_)) + 2*LOCc(N),
                                       7*Ceil(N/HBL)/LCM(NPROW,NPCOL) )
                                + MAX( 2*N, (8*LCM(NPROW,NPCOL)+2)**2 )
               If LWORK=-1, then WORK(1) is set to the above number and the code returns immediately.
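       The LWORK bound can be evaluated directly, as in the Python sketch below.  It assumes HBL equals
       the square block size, that 7*Ceil(N/HBL)/LCM(NPROW,NPCOL) uses Fortran left-to-right integer
       division, and that LOCc(N) is computed with NUMROC logic; pdlahqr_lwork is a hypothetical helper.
       The LWORK=-1 workspace query remains the authoritative way to obtain the size.

```python
import math


def numroc(n, nb, iproc, isrcproc, nprocs):
    """Local count of a block-cyclic dimension (same logic as ScaLAPACK NUMROC)."""
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb
    count = (nblocks // nprocs) * nb
    extra = nblocks % nprocs
    if mydist < extra:
        count += nb
    elif mydist == extra:
        count += n % nb
    return count


def pdlahqr_lwork(n, llda, lldz, hbl, nprow, npcol, mycol, csrc=0):
    """Evaluate the documented lower bound on LWORK for one process."""
    lcm = nprow * npcol // math.gcd(nprow, npcol)
    locc_n = numroc(n, hbl, mycol, csrc, npcol)   # LOCc(N)
    ceil_n_hbl = -(-n // hbl)                     # Ceil(N/HBL)
    term1 = max(2 * max(lldz, llda) + 2 * locc_n, 7 * ceil_n_hbl // lcm)
    term2 = max(2 * n, (8 * lcm + 2) ** 2)
    return 3 * n + term1 + term2


# N = 100, HBL = 6, 2 x 2 grid, local leading dimensions of 52, process column 0.
print(pdlahqr_lwork(100, 52, 52, 6, 2, 2, 0))
```

       For a small grid the (8*LCM+2)**2 term already dominates 2*N, so the workspace grows quickly with
       LCM(NPROW,NPCOL) rather than with N alone.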

       IWORK   (global and local input) INTEGER array of size ILWORK
               This will hold some of the IBLK integer arrays.  This is held as a placeholder for a future
               release.  Currently unreferenced.

       ILWORK  (local input) INTEGER
               This will hold the size of the IWORK array.  This is held as a placeholder for a future release.
               Currently unreferenced.

       INFO    (global output) INTEGER
               < 0: parameter number -INFO incorrect or inconsistent
               = 0: successful exit
               > 0: PDLAHQR failed to compute all the eigenvalues ILO to IHI in a total of 30*(IHI-ILO+1)
               iterations; if INFO = i, elements i+1:IHI of WR and WI contain those eigenvalues which have
               been successfully computed.
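               A caller typically branches on INFO to decide which entries of WR and WI to trust.  The Python
               sketch below restates the rules above; converged_range and iteration_limit are hypothetical
               helpers, and the indices are the 1-based positions used in this page.

```python
def converged_range(info, ilo, ihi):
    """1-based inclusive range of WR/WI entries that hold computed eigenvalues."""
    if info < 0:
        raise ValueError(f"argument {-info} was incorrect or inconsistent")
    if info == 0:
        return ilo, ihi      # every eigenvalue ILO..IHI was computed
    return info + 1, ihi     # only INFO+1..IHI converged


def iteration_limit(ilo, ihi):
    """Total QR iterations allowed before PDLAHQR gives up."""
    return 30 * (ihi - ilo + 1)


print(converged_range(4, 1, 10), iteration_limit(1, 10))
```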

               Logic: This algorithm is very similar to _LAHQR.  Unlike _LAHQR, instead of sending one double
               shift through the largest unreduced submatrix, this algorithm sends multiple double shifts and
               spaces them apart so that there can be parallelism across several processor rows/columns.
               Another critical difference is that this algorithm aggregates multiple transforms together in
               order to apply them in a block fashion.
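               Each double shift starts its bulge from the first column of (H - s1*I)(H - s2*I), which for an
               upper Hessenberg H has only three nonzeros.  The Python sketch below computes them; for
               illustration it takes s1, s2 as the eigenvalues of the trailing 2-by-2 block (the classic
               Francis choice), which is an assumption and not PDLAHQR's multi-shift strategy.  The name
               francis_first_column is hypothetical.

```python
def francis_first_column(h):
    """Nonzero part (x, y, z) of (H - s1 I)(H - s2 I) e1 for upper Hessenberg H.

    h is an n-by-n list of lists (0-based indices), n >= 3.  Only the sum
    s = s1 + s2 and product p = s1 * s2 are needed, and both are real.
    """
    n = len(h)
    s = h[n - 2][n - 2] + h[n - 1][n - 1]                                      # trace of trailing 2x2
    p = h[n - 2][n - 2] * h[n - 1][n - 1] - h[n - 2][n - 1] * h[n - 1][n - 2]  # its determinant
    x = h[0][0] ** 2 + h[0][1] * h[1][0] - s * h[0][0] + p
    y = h[1][0] * (h[0][0] + h[1][1] - s)
    z = h[1][0] * h[2][1]
    return x, y, z


H = [[4, 1, 2, 3],
     [2, 5, 1, 1],
     [0, 3, 6, 2],
     [0, 0, 1, 7]]
print(francis_first_column(H))
```

               A reflector built from (x, y, z) creates a small bulge below the subdiagonal that is then
               chased down the matrix; in the multi-shift scheme described above, several such bulges are in
               flight at once, spaced apart.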

               Important Local Variables:
               IBLK        = The maximum number of bulges that can be computed.  Currently fixed; in future
                             releases this will not be fixed.
               HBL         = The square block size (HBL=DESCA(MB_)=DESCA(NB_)).
               ROTN        = The number of transforms to block together.
               NBULGE      = The number of bulges that will be attempted on the current submatrix.
               IBULGE      = The current number of bulges started.
               K1(*),K2(*) = The current bulge loops from K1(*) to K2(*).

               Subroutines: From LAPACK, this routine calls:
               DLAHQR     -> Serial QR used to determine shifts and eigenvalues.
               DLARFG     -> Determine the Householder transforms.

               From ScaLAPACK, this routine calls:
               PDLACONSB  -> To determine where to start each iteration.
               DLAMSH     -> Sends multiple shifts through a small submatrix to see how the consecutive
                             subdiagonals change (if PDLACONSB indicates we can start a run in the middle).
               PDLAWIL    -> Given the shift, get the transformation.
               DLASORTE   -> Pair up eigenvalues so that reals are paired.
               PDLACP3    -> Parallel array to local replicated array copy & back.
               DLAREF     -> Row/column reflector applier.  Core routine here.
               PDLASMSUB  -> Finds negligible subdiagonal elements.
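               DLARFG is listed above as the source of the Householder transforms.  The Python sketch below
               follows the documented DLARFG convention, H = I - tau*[1; v]*[1; v]**T with
               H*[alpha; x] = [beta; 0], but omits DLARFG's rescaling for tiny norms; householder is a
               hypothetical name, not a LAPACK binding.

```python
import math


def householder(alpha, x):
    """Return (beta, tau, v) such that (I - tau*u*u^T)[alpha; x] = [beta; 0], u = [1; v]."""
    xnorm = math.sqrt(sum(t * t for t in x))
    if xnorm == 0.0:
        return alpha, 0.0, [0.0] * len(x)                  # H is the identity
    beta = -math.copysign(math.hypot(alpha, xnorm), alpha)  # opposite sign avoids cancellation
    tau = (beta - alpha) / beta
    scale = 1.0 / (alpha - beta)
    v = [t * scale for t in x]                              # u(1) = 1 is implicit
    return beta, tau, v


print(householder(3.0, [4.0]))
```

               Choosing beta with the sign opposite to alpha keeps alpha - beta away from zero, which is the
               same stability reason LAPACK gives for its sign choice.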

               Current Notes and/or Restrictions:
               1.) This code requires the distributed block size to be square and at least six (6); unlike
                   simpler codes like LU, this algorithm is extremely sensitive to block size.  Unwise choices
                   of too small a block size can lead to bad performance.
               2.) This code requires A and Z to be distributed identically and have identical contexts.  A
                   future version may allow Z to have a different context, so that Z can be 1D row mapped to
                   all nodes (so no communication on Z is necessary).
               3.) This release currently does not have a routine for resolving the Schur blocks into regular
                   2x2 form after this code is completed.  Because of this, a significant performance penalty
                   is incurred, since the deflation is sometimes done by a single column of processors.
               4.) This code does not currently block the initial transforms so that none of the rows or
                   columns for any bulge are completed until all are started.  To offset pipeline start-up, it
                   is recommended that at least 2*LCM(NPROW,NPCOL) bulges be used (if possible).
               5.) The maximum number of bulges currently supported is fixed at 32.  In future versions this
                   will be limited only by the incoming WORK and IWORK arrays.
               6.) The matrix A must be in upper Hessenberg form.  If elements below the subdiagonal are
                   nonzero, the resulting transforms may be nonsimilar.  This is also true with the LAPACK
                   routine DLAHQR.
               7.) For this release, this code has only been tested for RSRC_=CSRC_=0, but it has been written
                   for the general case.
               8.) Currently, all the eigenvalues are distributed to all the nodes.  Future releases will
                   probably distribute the eigenvalues by the column partitioning.
               9.) The internals of this routine are subject to change.
               10.) To optimize this for your architecture, try tuning DLAREF.
               11.) This code has only been tested for WANTZ = .TRUE. and may behave unpredictably for WANTZ
                    set to .FALSE.
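               Several of these restrictions can be checked from the descriptors before calling PDLAHQR.  The
               Python sketch below tests notes 1 and 2 and turns notes 4 and 5 into a suggested bulge count.
               It assumes the conventional 1-based descriptor layout; check_layout and suggested_bulges are
               hypothetical helpers, and capping 2*LCM(NPROW,NPCOL) at 32 is an illustrative reading of the
               notes, not the routine's internal policy (which may also be limited by the submatrix size).

```python
import math

# Assumed conventional 1-based descriptor offsets (DLEN_ = 9).
DTYPE_, CTXT_, M_, N_, MB_, NB_, RSRC_, CSRC_, LLD_ = range(1, 10)


def check_layout(desca, descz):
    """Return the list of violated layout restrictions (notes 1 and 2)."""
    issues = []
    if desca[MB_ - 1] != desca[NB_ - 1]:
        issues.append("block size must be square (MB_A == NB_A)")
    if desca[MB_ - 1] < 6:
        issues.append("block size must be at least 6")
    if any(desca[k - 1] != descz[k - 1] for k in (CTXT_, MB_, NB_, RSRC_, CSRC_)):
        issues.append("A and Z must be distributed identically with the same context")
    return issues


def suggested_bulges(nprow, npcol, max_bulges=32):
    """At least 2*LCM(NPROW,NPCOL) bulges (note 4), capped at the fixed maximum (note 5)."""
    lcm = nprow * npcol // math.gcd(nprow, npcol)
    return min(2 * lcm, max_bulges)


desc = [1, 0, 100, 100, 6, 6, 0, 0, 60]
print(check_layout(desc, desc), suggested_bulges(2, 3))
```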

               Implemented by:  G. Henry, May 1, 1997

LAPACK version 1.5                                 12 May 1997                                        PDLAHQR(l)