Provided by: libfabric-dev_1.17.0-3build2_amd64

NAME

       fi_psm - The PSM Fabric Provider

OVERVIEW

       The  psm provider runs over the PSM 1.x interface that is currently supported by the Intel TrueScale Fab‐
       ric.  PSM provides tag-matching message queue functions that are optimized for MPI implementations.   PSM
       also  has  limited Active Message support, which is not officially published but is quite stable and well
       documented in the source code (part of the OFED release).  The psm provider makes use of  both  the  tag-
       matching  message queue functions and the Active Message functions to support a variety of libfabric data
       transfer APIs, including tagged message queue, message queue, RMA, and atomic operations.

       The psm provider can work with the psm2-compat library, which exposes a PSM 1.x interface over the  Intel
       Omni-Path Fabric.

LIMITATIONS

       The  psm  provider  doesn’t  support all the features defined in the libfabric API.  Here are some of the
       limitations:

       Endpoint types
              Only the connectionless endpoint types FI_DGRAM and FI_RDM are supported.

       Endpoint capabilities
              Endpoints can support any combination of data transfer capabilities FI_TAGGED, FI_MSG, FI_ATOMICS,
              and FI_RMA.  These capabilities can be further refined by  FI_SEND,  FI_RECV,  FI_READ,  FI_WRITE,
              FI_REMOTE_READ,  and FI_REMOTE_WRITE to limit the direction of operations.  The limitation is that
              no two endpoints can have overlapping receive or RMA target capabilities in any of the above
              categories.  For example, it is fine to have two endpoints with FI_TAGGED | FI_SEND, one endpoint
              with FI_TAGGED | FI_RECV, one endpoint with FI_MSG, and one endpoint with FI_RMA | FI_ATOMICS.
              However, it is not allowed to have two endpoints with FI_TAGGED, or two endpoints with FI_RMA.

       FI_MULTI_RECV is supported for non-tagged message queue only.

       Other supported capabilities include FI_TRIGGER.
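
       The overlap rule for receive and RMA target capabilities can be sketched as a small check.  This is only
       an illustration under stated assumptions: the EX_* bit values are hypothetical stand-ins (real code
       takes the FI_* constants from <rdma/fabric.h>), the helper names are invented here, and the sketch
       assumes that a category given without any direction bit enables all directions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the FI_* capability bits; not the real values. */
enum {
    EX_TAGGED       = 1 << 0,
    EX_MSG          = 1 << 1,
    EX_RMA          = 1 << 2,
    EX_ATOMICS      = 1 << 3,
    EX_SEND         = 1 << 4,
    EX_RECV         = 1 << 5,
    EX_READ         = 1 << 6,
    EX_WRITE        = 1 << 7,
    EX_REMOTE_READ  = 1 << 8,
    EX_REMOTE_WRITE = 1 << 9
};

/* Categories in which an endpoint acts as a receiver or as an RMA target. */
static uint32_t target_roles(uint32_t caps)
{
    uint32_t roles = 0;
    uint32_t msg_dirs = caps & (EX_SEND | EX_RECV);
    uint32_t rma_dirs = caps & (EX_READ | EX_WRITE |
                                EX_REMOTE_READ | EX_REMOTE_WRITE);

    /* Assumption: no direction bit given means every direction is enabled. */
    int receives  = !msg_dirs || (caps & EX_RECV);
    int is_target = !rma_dirs || (caps & (EX_REMOTE_READ | EX_REMOTE_WRITE));

    if (receives)
        roles |= caps & (EX_TAGGED | EX_MSG);
    if (is_target)
        roles |= caps & (EX_RMA | EX_ATOMICS);
    return roles;
}

/* Returns 1 if any two endpoints overlap in a receive/target role, else 0. */
int caps_conflict(const uint32_t *caps, size_t n)
{
    uint32_t seen = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        uint32_t roles = target_roles(caps[i]);
        if (roles & seen)
            return 1;
        seen |= roles;
    }
    return 0;
}
```

       Under these assumptions, the five-endpoint combination listed above passes the check, while two
       endpoints that both request plain EX_TAGGED, or both request plain EX_RMA, are reported as a conflict.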

       Modes  FI_CONTEXT is required for the FI_TAGGED and FI_MSG capabilities.  That is, any request belonging
              to these two categories that generates a completion must pass, as the operation context, a valid
              pointer to type struct fi_context, and the space referenced by the pointer must remain untouched
              until the request has completed.  If neither FI_TAGGED nor FI_MSG is requested, the FI_CONTEXT
              mode is not required.
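
       One common way to meet the FI_CONTEXT requirement is to embed the context in a per-request structure
       that lives until the completion is reaped.  The sketch below is an illustration only: struct
       fi_context_stub is a stand-in for struct fi_context (real code includes <rdma/fabric.h> and uses the
       real type), and struct request, request_start, and request_complete are hypothetical names.

```c
#include <stddef.h>

/* Stand-in for struct fi_context: opaque scratch space owned by the provider. */
struct fi_context_stub {
    void *internal[4];
};

/* Hypothetical per-request bookkeeping; ctx must not be touched while in flight. */
struct request {
    struct fi_context_stub ctx;
    void *buf;
    size_t len;
    int in_flight;
};

/* Recover the enclosing request from the context pointer. */
#define REQ_FROM_CTX(p) \
    ((struct request *)((char *)(p) - offsetof(struct request, ctx)))

/* Mark the request as posted and return the pointer to pass as the
 * operation context of the data transfer call. */
void *request_start(struct request *req)
{
    req->in_flight = 1;
    return &req->ctx;
}

/* Called with the operation context reported by a completion; only now
 * may the request (and its embedded context) be reused or freed. */
struct request *request_complete(void *op_context)
{
    struct request *req = REQ_FROM_CTX(op_context);

    req->in_flight = 0;
    return req;
}
```

       In an application, the pointer returned by request_start would be passed as the context argument of a
       call such as fi_tsend, and request_complete would be applied to the op_context field of the completion
       entry.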

       Progress
              The psm provider requires manual progress.  The application is expected to call the fi_cq_read
              or fi_cntr_read function periodically when no other libfabric function is called, to ensure
              that progress is made in a timely manner.  The provider also supports auto progress mode, but
              performance can be significantly impacted if the application relies solely on the provider to
              make auto progress.

       Unsupported features
              These features are unsupported: connection management, scalable endpoint, passive endpoint, shared
              receive context, send/inject with immediate data.

RUNTIME PARAMETERS

       The psm provider checks for the following environment variables:

       FI_PSM_UUID
              PSM requires that each job has a unique ID (UUID).  All the processes in the same job need to use
              the same UUID in order to be able to talk to each other.  The PSM reference manual advises
              keeping the UUID unique to each job.  In practice, it generally works fine to reuse a UUID as
              long as (1) no two jobs with the same UUID are running at the same time; and (2) previous jobs
              with the same UUID have exited normally.  If “resource busy” or “connection failure” issues of
              unknown cause occur, it is advisable to manually set the UUID to a value different from the
              default.

       The default UUID is 0FFF0FFF-0000-0000-0000-0FFF0FFF0FFF.
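
       A launcher or application can set the variable programmatically instead of through the shell.  The
       sketch below is an illustration only: format_uuid and export_job_uuid are hypothetical helper names,
       the upper-case 8-4-4-4-12 layout simply mirrors the default value shown above, and the sketch assumes
       that the variable must be set before the first libfabric call.  Because every process in a job must
       use the same UUID, the 16 bytes should be chosen once per job and distributed to all processes.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv() */

#include <stdio.h>
#include <stdlib.h>

/* Format 16 bytes as XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX (36 chars + NUL). */
void format_uuid(const unsigned char bytes[16], char out[37])
{
    snprintf(out, 37,
             "%02X%02X%02X%02X-%02X%02X-%02X%02X-%02X%02X-"
             "%02X%02X%02X%02X%02X%02X",
             bytes[0], bytes[1], bytes[2], bytes[3],
             bytes[4], bytes[5], bytes[6], bytes[7],
             bytes[8], bytes[9], bytes[10], bytes[11],
             bytes[12], bytes[13], bytes[14], bytes[15]);
}

/* Export the UUID for the psm provider; returns 0 on success, -1 on error. */
int export_job_uuid(const char *uuid)
{
    return setenv("FI_PSM_UUID", uuid, 1);
}
```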

       FI_PSM_NAME_SERVER
              The psm provider has a simple built-in name server that can be used to resolve an IP address or
              host name into a transport address needed by the fi_av_insert call.  The main purpose of this
              name server is to allow simple client-server type applications (such as those in fabtests) to be
              written purely with libfabric, without using any out-of-band communication mechanism.  For such
              applications, the server would run first to allow endpoints to be created and registered with
              the name server, and then the client would call fi_getinfo with the node parameter set to the IP
              address or host name of the server.  The resulting fi_info structure would have the transport
              address of the endpoint created by the server in the dest_addr field.  Optionally, the service
              parameter can be used in addition to node.  Note that the service number is interpreted by the
              provider and is not a TCP/IP port number.

       The name server is on by default.  It can be turned off by setting the variable to 0.  This may save a
       small amount of resources, since a separate thread is created when the name server is on.

       The provider detects Open MPI and MPICH runs and changes the default setting to off.

       FI_PSM_TAGGED_RMA
              The RMA functions are implemented on top of the PSM Active Message functions.  The Active
              Message functions have a limit on the size of data that can be transferred in a single message.
              Large transfers can be divided into small chunks and pipelined.  However, this approach yields
              suboptimal bandwidth.

       The psm provider uses the PSM tag-matching message queue functions to achieve higher bandwidth for large
       RMA transfers.  For this purpose, a bit is reserved from the tag space to separate the RMA traffic from
       the regular tagged message queue.

       The option is on by default.  To turn it off, set the variable to 0.

       FI_PSM_AM_MSG
              The psm provider implements the non-tagged message queue over the PSM tag-matching message queue.
              One tag bit is reserved for this purpose.  Alternatively, the non-tagged message queue can be
              implemented over Active Message.  This experimental feature has slightly higher latency.

       This option is off by default.  To turn it on, set the variable to 1.

       FI_PSM_DELAY
              Time (seconds) to sleep before closing PSM endpoints.  This is a workaround for a bug in some
              versions of the PSM library.

       The default setting is 1.

       FI_PSM_TIMEOUT
              Timeout (seconds) for gracefully closing PSM endpoints.  A forced close is issued if the
              timeout expires.

       The default setting is 5.

       FI_PSM_PROG_INTERVAL
              When auto progress is enabled (requested via the hints to fi_getinfo), a progress thread is
              created to make progress calls periodically.  This option sets the interval (microseconds)
              between progress calls.

       The default setting is 1 if affinity is set, or 1000 if not.  See FI_PSM_PROG_AFFINITY.

       FI_PSM_PROG_AFFINITY
              When set, specifies the set of CPU cores to which the progress thread affinity is set.  The
              format is <start>[:<end>[:<stride>]][,<start>[:<end>[:<stride>]]]*, where each triplet
              <start>:<end>:<stride> defines a block of core_ids.  Both <start> and <end> can be either the
              core_id (when >=0) or core_id - num_cores (when <0).

       By default affinity is not set.
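
       The affinity format can be illustrated with a small stand-alone parser.  This is a sketch, not the
       provider's code: parse_affinity is a hypothetical name, and the sketch assumes that a missing <end>
       defaults to <start>, that a missing <stride> defaults to 1, and that out-of-range or malformed values
       are rejected.

```c
#include <stdlib.h>
#include <string.h>

/* Read one decimal integer at *p and advance past it; returns 1 on success. */
static int parse_int(const char **p, int *out)
{
    char *endp;
    long v = strtol(*p, &endp, 10);

    if (endp == *p)
        return 0;
    *out = (int)v;
    *p = endp;
    return 1;
}

/* Map core_id or (core_id - num_cores) onto [0, num_cores); -1 if invalid. */
static int normalize_core(int id, int num_cores)
{
    if (id < 0)
        id += num_cores;
    return (id < 0 || id >= num_cores) ? -1 : id;
}

/*
 * Mark every core selected by spec in selected[0..num_cores-1].
 * Returns the number of distinct cores selected, or -1 on malformed input.
 */
int parse_affinity(const char *spec, int num_cores, unsigned char *selected)
{
    const char *p = spec;
    int count = 0;
    int i;

    if (!spec || !selected || num_cores <= 0)
        return -1;
    memset(selected, 0, (size_t)num_cores);

    while (*p) {
        int start, end, stride = 1;

        if (!parse_int(&p, &start))
            return -1;
        end = start;                      /* assumed default for <end> */
        if (*p == ':') {
            p++;
            if (!parse_int(&p, &end))
                return -1;
            if (*p == ':') {
                p++;
                if (!parse_int(&p, &stride))
                    return -1;
            }
        }
        if (*p == ',')
            p++;
        else if (*p != '\0')
            return -1;

        start = normalize_core(start, num_cores);
        end = normalize_core(end, num_cores);
        if (start < 0 || end < 0 || stride < 1)
            return -1;

        for (i = start; i <= end; i += stride) {
            if (!selected[i]) {
                selected[i] = 1;
                count++;
            }
        }
    }
    return count;
}
```

       With these assumptions and num_cores set to 8, the value 0:6:2,-1 selects cores 0, 2, 4, 6, and 7, and
       the value -2:-1 selects cores 6 and 7.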

SEE ALSO

       fabric(7), fi_provider(7), fi_psm2(7), fi_psm3(7)

AUTHORS

       OpenFabrics.

Libfabric Programmer’s Manual                      2022-12-11                                          fi_psm(7)