HPL_pdpancrT man page on DragonFly

HPL_pdpancrT(3)		     HPL Library Functions	       HPL_pdpancrT(3)

NAME
       HPL_pdpancrT - Crout panel factorization.

SYNOPSIS
       #include "hpl.h"

       void HPL_pdpancrT( HPL_T_panel * PANEL, const int M, const int N, const
       int ICOFF, double * WORK );

DESCRIPTION
       HPL_pdpancrT factorizes a panel of columns that is a sub-array of a
       larger one-dimensional panel A, using the Crout variant of the usual
       one-dimensional algorithm. The upper N0-by-N0 block of the panel,
       which is lower triangular, is stored in transposed form.
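       The Crout ordering can be illustrated with a serial, unblocked sketch.
       It assumes a single process holding an m-by-n column-major panel with
       m >= n, LAPACK-style storage (unit lower L below the diagonal, U on and
       above it) and 0-based pivot indices. It ignores the process grid, all
       communication, and the transposed storage of the upper block. The name
       crout_panel_lu is hypothetical; this is not HPL's code.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Element (i, j) of a column-major array `a` with leading dimension `lda`. */
#define A_(i, j) a[(size_t)(j) * (size_t)lda + (size_t)(i)]

/*
 * Hypothetical sketch: Crout-variant LU with partial pivoting of an m-by-n
 * panel. At step j, column j is brought up to date, its pivot is chosen,
 * rows are swapped, and then row j of U is completed to the right.
 * Returns 0 on success, or j + 1 if an exactly zero pivot is found at step j.
 */
int crout_panel_lu(int m, int n, double *a, int lda, int *ipiv)
{
    assert(m >= n && lda >= m);
    for (int j = 0; j < n; ++j) {
        /* 1. A(j:m, j) -= L(j:m, 0:j) * U(0:j, j), using earlier columns. */
        for (int i = j; i < m; ++i) {
            double s = A_(i, j);
            for (int k = 0; k < j; ++k)
                s -= A_(i, k) * A_(k, j);
            A_(i, j) = s;
        }
        /* 2. Pivot: the row with the largest |A(i, j)|, i >= j. */
        int p = j;
        for (int i = j + 1; i < m; ++i)
            if (fabs(A_(i, j)) > fabs(A_(p, j)))
                p = i;
        ipiv[j] = p;
        if (A_(p, j) == 0.0)
            return j + 1;
        /* 3. Swap whole rows j and p (L part, column j, untouched right part). */
        if (p != j)
            for (int k = 0; k < n; ++k) {
                double t = A_(j, k);
                A_(j, k) = A_(p, k);
                A_(p, k) = t;
            }
        /* 4. U(j, j+1:n) = A(j, j+1:n) - L(j, 0:j) * U(0:j, j+1:n). */
        for (int c = j + 1; c < n; ++c) {
            double s = A_(j, c);
            for (int k = 0; k < j; ++k)
                s -= A_(j, k) * A_(k, c);
            A_(j, c) = s;
        }
        /* 5. Scale below the diagonal to obtain L(j+1:m, j). */
        for (int i = j + 1; i < m; ++i)
            A_(i, j) /= A_(j, j);
    }
    return 0;
}

#undef A_
```

       A convenient check is to apply the recorded swaps to a copy of the
       original panel and compare it with the product of the computed L and U.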

       Bi-directional exchange is used to perform the swap::broadcast
       operations at once for one column in the panel. This results in fewer,
       slightly larger messages than usual. On P processes and assuming
       bi-directional links, the running time of this function can be
       approximated (when N is equal to N0) by:

	  N0 * log_2( P ) * ( lat + ( 2*N0 + 4 ) / bdwth ) +
	  N0^2 * ( M - N0/3 ) * gam2-3

       where M is the local number of rows of the panel, lat and bdwth are
       the latency and bandwidth of the network for double precision real
       words, and gam2-3 is an estimate of the Level 2 and Level 3 BLAS rate
       of execution. The recursive algorithm indeed makes it possible to
       achieve nearly Level 3 BLAS performance in the panel factorization. On
       a large number of modern machines, however, this operation is latency
       bound, meaning that its cost can be estimated by the latency portion
       alone, N0 * log_2(P) * lat. Mono-directional links double this
       communication cost.

       Note that one iteration of the main loop is unrolled. The local
       computation of the maximum absolute value of the next column is
       performed just after its update by the current column. This brings the
       current column through the cache only once at each step. The current
       implementation does not perform any blocking for this sequence of BLAS
       operations; however, the design allows an optimal (machine-specific)
       BLAS-like kernel to be plugged in. This idea was suggested to us by
       Fred Gustavson, IBM T.J. Watson Research Center.
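       The fusion idea can be sketched as one loop that updates the next
       column and tracks its maximum absolute value while each entry is still
       in a register. The update is shown as a single AXPY-style step purely
       to illustrate the fusion; the name axpy_then_iamax is hypothetical and
       this is not HPL's kernel. The returned index is 0-based, with ties
       resolved to the first occurrence as in the BLAS routine idamax.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical fused kernel: y := y - alpha * x, then argmax |y| in one pass. */
int axpy_then_iamax(int n, double alpha, const double *x, double *y)
{
    assert(n >= 1);
    int imax = 0;
    double vmax = -1.0;
    for (int i = 0; i < n; ++i) {
        y[i] -= alpha * x[i];      /* update the next column by the current one */
        double v = fabs(y[i]);     /* compare immediately, no second sweep over y */
        if (v > vmax) {
            vmax = v;
            imax = i;
        }
    }
    return imax;
}
```

       Two separate passes (update, then search) would stream the column
       through the cache twice; the fused loop touches each entry once.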

ARGUMENTS
       PANEL   (local input/output)    HPL_T_panel *
               On entry, PANEL points to the data structure containing the
               panel information.

       M       (local input)           const int
               On entry, M specifies the local number of rows of sub(A).

       N       (local input)           const int
               On entry, N specifies the local number of columns of sub(A).

       ICOFF   (global input)          const int
               On entry, ICOFF specifies the row and column offset of sub(A)
               in A.

       WORK    (local workspace)       double *
               On entry, WORK is a work array of size at least 2*(4+2*N0).
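       The minimum WORK length can be computed and allocated as below. This
       sketch only encodes the size 2*(4+2*N0) stated above; the helper names
       are hypothetical and do not belong to the HPL API.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimum WORK length, in doubles, for a panel of width N0: 2*(4+2*N0). */
size_t pancrt_work_len(int n0)
{
    assert(n0 >= 0);
    return 2u * (4u + 2u * (size_t)n0);
}

/* Hypothetical allocation helper; the caller releases the buffer with free(). */
double *pancrt_alloc_work(int n0)
{
    return malloc(pancrt_work_len(n0) * sizeof(double));
}
```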

SEE ALSO
       HPL_dlocmax (3), HPL_dlocswpN (3),  HPL_dlocswpT (3),  HPL_pdmxswp (3),
       HPL_pdpancrN (3), HPL_pdpanllN (3), HPL_pdpanllT (3), HPL_pdpanrlN (3),
       HPL_pdpanrlT (3).

HPL 2.1			       October 26, 2012		       HPL_pdpancrT(3)