Provided by: scalapack-doc_1.5-11_all bug

NAME

       PDSTEIN - compute the eigenvectors of a symmetric tridiagonal matrix in parallel, using inverse iteration

SYNOPSIS

       SUBROUTINE PDSTEIN( N,  D,  E, M, W, IBLOCK, ISPLIT, ORFAC, Z, IZ, JZ, DESCZ, WORK, LWORK, IWORK, LIWORK,
                           IFAIL, ICLUSTR, GAP, INFO )

           INTEGER         INFO, IZ, JZ, LIWORK, LWORK, M, N

           DOUBLE          PRECISION ORFAC

           INTEGER         DESCZ( * ), IBLOCK( * ), ICLUSTR( * ), IFAIL( * ), ISPLIT( * ), IWORK( * )

           DOUBLE          PRECISION D( * ), E( * ), GAP( * ), W( * ), WORK( * ), Z( * )

PURPOSE

       PDSTEIN computes the eigenvectors of a symmetric tridiagonal matrix in parallel, using inverse iteration.
       The eigenvectors found correspond to user specified eigenvalues. PDSTEIN does not  orthogonalize  vectors
       that  are  on  different  processes. The extent of orthogonalization is controlled by the input parameter
       LWORK.  Eigenvectors that are to be orthogonalized are computed by the same process. PDSTEIN  decides  on
       the  allocation  of  work  among  the  processes and then calls DSTEIN2 (modified LAPACK routine) on each
       individual process. If insufficient workspace is allocated, the expected  orthogonalization  may  not  be
       done.

       Note : If the eigenvectors obtained are not orthogonal, increase
              LWORK and run the code again.

       Notes
       =====

       Each  global  data  object  is  described  by  an  associated description vector.  This vector stores the
       information required to establish the mapping between an object element and its corresponding process and
       memory location.

       Let A be a generic term for any 2D block  cyclicly  distributed  array.   Such  a  global  array  has  an
       associated  description  vector  DESCA.  In the following comments, the character _ should be read as "of
       the global array".

       NOTATION        STORED IN      EXPLANATION
       --------------- -------------- -------------------------------------- DTYPE_A(global) DESCA( DTYPE_  )The
       descriptor type.  In this case,
                                      DTYPE_A = 1.
       CTXT_A (global) DESCA( CTXT_ ) The BLACS context handle, indicating
                                      the BLACS process grid A is distribu-
                                      ted over. The context itself is glo-
                                      bal, but the handle (the integer
                                      value) may vary.
       M_A    (global) DESCA( M_ )    The number of rows in the global
                                      array A.
       N_A    (global) DESCA( N_ )    The number of columns in the global
                                      array A.
       MB_A   (global) DESCA( MB_ )   The blocking factor used to distribute
                                      the rows of the array.
       NB_A   (global) DESCA( NB_ )   The blocking factor used to distribute
                                      the columns of the array.
       RSRC_A (global) DESCA( RSRC_ ) The process row over which the first
                                      row  of  the  array  A is distributed.  CSRC_A (global) DESCA( CSRC_ ) The
       process column over which the
                                      first column of the array A is
                                      distributed.
       LLD_A  (local)  DESCA( LLD_ )  The leading dimension of the local
                                      array.  LLD_A >= MAX(1,LOCr(M_A)).

       Let K be the number of rows or columns of a distributed matrix, and assume  that  its  process  grid  has
       dimension p x q.
       LOCr( K ) denotes the number of elements of K that a process would receive if K were distributed over the
       p processes of its process column.
       Similarly,  LOCc(  K  )  denotes  the  number  of  elements  of  K that a process would receive if K were
       distributed over the q processes of its process row.
       The values of LOCr() and LOCc() may be determined via a call to the ScaLAPACK tool function, NUMROC:
               LOCr( M ) = NUMROC( M, MB_A, MYROW, RSRC_A, NPROW ),
               LOCc( N ) = NUMROC( N, NB_A, MYCOL, CSRC_A, NPCOL ).  An upper bound for these quantities may  be
       computed by:
               LOCr( M ) <= ceil( ceil(M/MB_A)/NPROW )*MB_A
               LOCc( N ) <= ceil( ceil(N/NB_A)/NPCOL )*NB_A

ARGUMENTS

       P = NPROW * NPCOL is the total number of processes

       N       (global input) INTEGER
               The order of the tridiagonal matrix T.  N >= 0.

       D       (global input) DOUBLE PRECISION array, dimension (N)
               The n diagonal elements of the tridiagonal matrix T.

       E       (global input) DOUBLE PRECISION array, dimension (N-1)
               The (n-1) off-diagonal elements of the tridiagonal matrix T.

       M       (global input) INTEGER
               The total number of eigenvectors to be found. 0 <= M <= N.

       W       (global input/global output) DOUBLE PRECISION array, dim (M)
               On  input, the first M elements of W contain all the eigenvalues for which eigenvectors are to be
               computed. The eigenvalues should be grouped by split-off  block  and  ordered  from  smallest  to
               largest  within the block (The output array W from PDSTEBZ with ORDER='b' is expected here). This
               array should be replicated on all processes.  On output, the first M elements contain  the  input
               eigenvalues in ascending order.

               Note : To obtain orthogonal vectors, it is best if eigenvalues are computed to highest accuracy (
               this  can  be  done  by  setting ABSTOL to the underflow threshold = DLAMCH('U') --- ABSTOL is an
               input parameter to PDSTEBZ )

       IBLOCK  (global input) INTEGER array, dimension (N)
               The submatrix indices associated with the corresponding eigenvalues in W  --  1  for  eigenvalues
               belonging  to  the  first  submatrix from the top, 2 for those belonging to the second submatrix,
               etc. (The output array IBLOCK from PDSTEBZ is expected here).

       ISPLIT  (global input) INTEGER array, dimension (N)
               The splitting points, at which T breaks up into submatrices.  The  first  submatrix  consists  of
               rows/columns  1 to ISPLIT(1), the second of rows/columns ISPLIT(1)+1 through ISPLIT(2), etc., and
               the NSPLIT-th consists of rows/columns ISPLIT(NSPLIT-1)+1 through  ISPLIT(NSPLIT)=N  (The  output
               array ISPLIT from PDSTEBZ is expected here.)

       ORFAC   (global input) DOUBLE PRECISION
               ORFAC  specifies  which  eigenvectors  should be orthogonalized.  Eigenvectors that correspond to
               eigenvalues which are within ORFAC*||T|| of each other are to be orthogonalized.  However, if the
               workspace is insufficient (see LWORK), this tolerance may be decreased until all eigenvectors  to
               be  orthogonalized  can  be  stored  in  one process.  No orthogonalization will be done if ORFAC
               equals zero.  A default value of 10^-3 is used if ORFAC is negative.  ORFAC should  be  identical
               on all processes.

       Z       (local output) DOUBLE PRECISION array,
               dimension  (DESCZ(DLEN_),  N/npcol + NB) Z contains the computed eigenvectors associated with the
               specified eigenvalues. Any vector which fails to converge is set to  its  current  iterate  after
               MAXITS  iterations  ( See DSTEIN2 ).  On output, Z is distributed across the P processes in block
               cyclic format.

       IZ      (global input) INTEGER
               Z's global row index, which points to the beginning of the submatrix which is to be operated on.

       JZ      (global input) INTEGER
               Z's global column index, which points to the beginning of the submatrix which is to  be  operated
               on.

       DESCZ   (global and local input) INTEGER array of dimension DLEN_.
               The array descriptor for the distributed matrix Z.

       WORK    (local workspace/global output) DOUBLE PRECISION array,
               dimension  (  LWORK  )  On  output,  WORK(1)  gives a lower bound on the workspace ( LWORK ) that
               guarantees the user desired orthogonalization (see ORFAC).  Note that this may  overestimate  the
               minimum workspace needed.

       LWORK   (local input) integer
               LWORK  controls the extent of orthogonalization which can be done. The number of eigenvectors for
               which  storage  is  allocated  on  each process is NVEC = floor(( LWORK- max(5*N,NP00*MQ00) )/N).
               Eigenvectors corresponding to eigenvalue clusters of size NVEC - ceil(M/P) + 1 are guaranteed  to
               be  orthogonal ( the orthogonality is similar to that obtained from DSTEIN2).  Note : LWORK  must
               be no smaller than: max(5*N,NP00*MQ00) + ceil(M/P)*N, and should have the same input value on all
               processes.  It is the minimum value of LWORK input on different processes that is significant.

               If LWORK = -1, then LWORK is global input and a workspace query  is  assumed;  the  routine  only
               calculates  the minimum and optimal size for all work arrays. Each of these values is returned in
               the first entry of the corresponding work array, and no error message is issued by PXERBLA.

       IWORK   (local workspace/global output) INTEGER array,
               dimension ( 3*N+P+1 ) On return, IWORK(1) contains the amount of integer workspace required.   On
               return,  the  IWORK(2)  through  IWORK(P+2)  indicate  the eigenvectors computed by each process.
               Process I computes eigenvectors indexed IWORK(I+2)+1 thru' IWORK(I+3).

       LIWORK  (local input) INTEGER
               Size of array IWORK.  Must be >= 3*N + P + 1

               If LIWORK = -1, then LIWORK is global input and a workspace query is assumed;  the  routine  only
               calculates  the minimum and optimal size for all work arrays. Each of these values is returned in
               the first entry of the corresponding work array, and no error message is issued by PXERBLA.

       IFAIL   (global output) integer array, dimension (M)
               On normal exit, all elements of IFAIL are zero.  If one or more  eigenvectors  fail  to  converge
               after  MAXITS iterations (as in DSTEIN), then INFO > 0 is returned.  If mod(INFO,M+1)>0, then for
               I=1 to mod(INFO,M+1), the eigenvector corresponding  to  the  eigenvalue  W(IFAIL(I))  failed  to
               converge ( W refers to the array of eigenvalues on output ).

               ICLUSTR  (global  output)  integer  array,  dimension (2*P) This output array contains indices of
               eigenvectors corresponding to a cluster of eigenvalues that could not be  orthogonalized  due  to
               insufficient  workspace  (see  LWORK,  ORFAC and INFO). Eigenvectors corresponding to clusters of
               eigenvalues  indexed  ICLUSTR(2*I-1)  to  ICLUSTR(2*I),  I  =  1  to  INFO/(M+1),  could  not  be
               orthogonalized  due  to lack of workspace. Hence the eigenvectors corresponding to these clusters
               may not be orthogonal.  ICLUSTR  is  a  zero  terminated  array  ---  (  ICLUSTR(2*K).NE.0  .AND.
               ICLUSTR(2*K+1).EQ.0 ) if and only if K is the number of clusters.

       GAP     (global output) DOUBLE PRECISION array, dimension (P)
               This  output  array  contains  the  gap  between  eigenvalues  whose  eigenvectors  could  not be
               orthogonalized. The INFO/M output values in this array  correspond  to  the  INFO/(M+1)  clusters
               indicated  by  the array ICLUSTR. As a result, the dot product between eigenvectors corresponding
               to the I^th cluster may be as high as ( O(n)*macheps ) / GAP(I).

       INFO    (global output) INTEGER
               = 0:  successful exit
               < 0:  If the i-th argument is an array and  the  j-entry  had  an  illegal  value,  then  INFO  =
               -(i*100+j), if the i-th argument is a scalar and had an illegal value, then INFO = -i.  < 0 :  if
               INFO = -I, the I-th argument had an illegal value
               >  0  :  if mod(INFO,M+1) = I, then I eigenvectors failed to converge in MAXITS iterations. Their
               indices are stored in the array IFAIL.  if INFO/(M+1) = I, then eigenvectors corresponding  to  I
               clusters of eigenvalues could not be orthogonalized due to insufficient workspace. The indices of
               the clusters are stored in the array ICLUSTR.

LAPACK version 1.5                                 12 May 1997                                        PDSTEIN(l)