HPL_pdpancrN man page on DragonFly

HPL_pdpancrN(3)		     HPL Library Functions	       HPL_pdpancrN(3)

NAME
       HPL_pdpancrN - Crout panel factorization.

SYNOPSIS
       #include "hpl.h"

       void HPL_pdpancrN( HPL_T_panel * PANEL, const int M, const int N, const
       int ICOFF, double * WORK );

DESCRIPTION
       HPL_pdpancrN factorizes a panel of columns that is a sub-array of a
       larger one-dimensional panel A, using the Crout variant of the usual
       one-dimensional algorithm. The lower triangular N0-by-N0 upper block
       of the panel is stored in no-transpose form (i.e. just like the input
       matrix itself).

       Bi-directional exchange is used to perform the swap::broadcast
       operations at once for one column in the panel. This results in
       fewer, slightly larger messages than usual. On P processes and
       assuming bi-directional links, the running time of this function
       can be approximated by (when N is equal to N0):

	  N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
	  N0^2 * ( M - N0/3 ) * gam2-3

       where M is the local number of rows of the panel, lat and bdwth are
       the latency and bandwidth of the network for double precision real
       words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS
       rate of execution. The recursive algorithm makes it possible to
       achieve almost Level 3 BLAS performance in the panel factorization.
       On a large number of modern machines, however, this operation is
       latency bound, meaning that its cost can be estimated by the latency
       portion alone, N0 * log_2(P) * lat. Mono-directional links will
       double this communication cost.
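
       The estimate above can be evaluated with a few lines of C. This is
       a minimal sketch and not part of HPL: the function names are
       hypothetical, the caller passes log_2(P) directly (for example from
       log2() in <math.h>), and gam is assumed to be expressed as time per
       floating point operation so that both terms have units of time.

```c
#include <assert.h>

/* Hypothetical helpers, not part of HPL.
 *   m       local number of rows of the panel (M)
 *   n0      panel width (N0), with N == N0 assumed
 *   log2_p  log_2(P) for P processes
 *   lat     network latency, time per message
 *   bdwth   network bandwidth, double precision words per unit time
 *   gam     assumed time per flop at the Level 2/3 BLAS rate (gam2-3)
 */
double pancr_time_estimate(int m, int n0, double log2_p,
                           double lat, double bdwth, double gam)
{
    assert(n0 >= 0 && m >= n0 && log2_p >= 0.0 && bdwth > 0.0);

    /* N0 * log_2(P) * (lat + (2*N0 + 4) / bdwth) */
    double comm = n0 * log2_p * (lat + (2.0 * n0 + 4.0) / bdwth);

    /* N0^2 * (M - N0/3) * gam2-3 */
    double comp = (double)n0 * (double)n0 * (m - n0 / 3.0) * gam;

    return comm + comp;
}

/* Latency-bound approximation: N0 * log_2(P) * lat. */
double pancr_latency_estimate(int n0, double log2_p, double lat)
{
    assert(n0 >= 0 && log2_p >= 0.0);
    return n0 * log2_p * lat;
}
```

       Comparing the two results for a given machine shows how much of the
       estimated cost is pure latency; doubling the communication term
       models mono-directional links.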

       Note that one iteration of the main loop is unrolled. The local
       computation of the absolute value max of the next column is performed
       just after its update by the current column. This brings the current
       column through cache only once at each step. The current
       implementation does not perform any blocking for this sequence of
       BLAS operations; however, the design allows for plugging in an
       optimal (machine-specific) specialized BLAS-like kernel. This idea
       was suggested to us by Fred Gustavson, IBM T.J. Watson Research
       Center.
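
       The idea of searching for the absolute value max right after the
       update can be illustrated with a single fused loop. This is a
       simplified sketch under stated assumptions, not the HPL source: the
       function name is hypothetical, and the update is reduced to one
       scaled column subtraction on contiguous local data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel, not HPL code: applies next[i] -= alpha * cur[i]
 * and, in the same pass, finds the index of the entry of largest
 * absolute value in the updated column. Each element of cur and next
 * is therefore read once per step. Ties keep the lowest index. */
size_t fused_update_absmax(size_t m, double alpha,
                           const double *cur, double *next)
{
    assert(m > 0 && cur != NULL && next != NULL);

    size_t imax = 0;
    double vmax = -1.0;  /* smaller than any absolute value */

    for (size_t i = 0; i < m; ++i) {
        next[i] -= alpha * cur[i];
        double a = next[i] < 0.0 ? -next[i] : next[i];
        if (a > vmax) {
            vmax = a;
            imax = i;
        }
    }
    return imax;
}
```

       A machine-specific kernel could replace this loop with a blocked or
       vectorized version while keeping the same interface.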

ARGUMENTS
       PANEL   (local input/output)    HPL_T_panel *
	       On  entry,   PANEL  points to the data structure containing the
	       panel information.

       M       (local input)	       const int
	       On entry,  M specifies the local number of rows of sub(A).

       N       (local input)	       const int
	       On entry,  N specifies the local number of columns of sub(A).

       ICOFF   (global input)	       const int
	       On entry, ICOFF specifies the row and column offset  of	sub(A)
	       in A.

       WORK    (local workspace)       double *
	       On entry, WORK is a work array of size at least 2*(4+2*N0).
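
       The minimum WORK size stated above can be computed and allocated as
       follows. This is a sketch only: both helper names are hypothetical
       and not part of HPL, and plain malloc() is an assumption about how a
       caller might obtain the buffer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: minimum number of doubles in WORK,
 * i.e. 2*(4+2*N0) as stated in the ARGUMENTS section. */
size_t pancr_work_len(int n0)
{
    assert(n0 >= 0);
    return 2u * (4u + 2u * (size_t)n0);
}

/* Hypothetical helper: allocates a WORK buffer of the minimum size.
 * Returns NULL on allocation failure; the caller must free() it. */
double *pancr_alloc_work(int n0)
{
    return malloc(pancr_work_len(n0) * sizeof(double));
}
```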

SEE ALSO
       HPL_dlocmax(3), HPL_dlocswpN(3), HPL_dlocswpT(3), HPL_pdmxswp(3),
       HPL_pdpancrT(3), HPL_pdpanllN(3), HPL_pdpanllT(3), HPL_pdpanrlN(3),
       HPL_pdpanrlT(3).

HPL 2.1			       October 26, 2012		       HPL_pdpancrN(3)
