Ubuntu Manpage: fi_export_fid / fi_import

Provided by: libfabric-dev_2.1.0-1.1_amd64

NAME

       fi_export_fid / fi_import_fid
              Share a fabric object between different providers or resources

       struct fid_peer_av
              An address vector sharable between independent providers

       struct fid_peer_av_set
              An AV set sharable between independent providers

       struct fid_peer_cq
              A completion queue that may be shared between independent providers

       struct fid_peer_cntr
              A counter that may be shared between independent providers

       struct fid_peer_srx
              A shared receive context that may be shared between independent providers

SYNOPSIS

              #include <rdma/fabric.h>
              #include <rdma/fi_ext.h>
              #include <rdma/providers/fi_peer.h>

              int fi_export_fid(struct fid *fid, uint64_t flags,
                  struct fid **expfid, void *context);

              int fi_import_fid(struct fid *fid, struct fid *expfid, uint64_t flags);

ARGUMENTS

       fid    Returned fabric identifier for opened object.

       expfid Exported fabric object that may be shared with another provider.

       flags  Control flags for the operation.

       context
              User defined context that will be associated with a fabric object.

DESCRIPTION

NOTICE: The peer APIs described by this man page are developmental and may change between libfabric
versions. The data structures and API definitions should not be considered stable between versions.
Providers being used as peers must target the same libfabric version.

Functions defined in this man page are typically used by providers to communicate with other providers,
known as peer providers, or by other libraries to communicate with the libfabric core, known as peer li‐
braries. Most middleware and applications should not need to access this functionality, as the documen‐
tation mainly targets provider developers.

Peer providers are a way for independently developed providers to be used together in a tight fashion,
such that layering overhead and duplicate provider functionality can be avoided. Peer providers are
linked by having one provider export specific functionality to another. This is done by having one
provider export a sharable fabric object (fid), which is imported by one or more peer providers.

As an example, a provider which uses TCP to communicate with remote peers may wish to use the shared mem‐
ory provider to communicate with local peers. To remove layering overhead, the TCP based provider may
export its completion queue and shared receive context and import those into the shared memory provider.

The general mechanisms used to share fabric objects between peer providers are similar, independent from
the object being shared. However, because the goal of using peer providers is to avoid overhead,
providers must be explicitly written to support the peer provider mechanisms.

When importing any shared fabric object into a peer, the owner creates a separate fid_peer_* for each
peer provider it intends to import into. The owner passes this unique fid_peer_* to each peer through
the context parameter of the init call for the resource (e.g. fi_cq_open, fi_srx_context,
fi_cntr_open). The fi_peer_*_context identifies the owner-allocated fid_peer_* for the peer to use, but
it is only valid for the duration of the init call and must not be accessed by the peer after
initialization. The peer sets only the peer_ops of the owner-allocated fid and saves a reference to
the imported fid_peer_* for use in the peer API flow. The peer allocates its own fid for internal use
and returns that fid to the owner through the regular fid parameter of the init call (as if it were
just another opened resource). The owner is responsible for saving the peer fid returned from the
open call in order to close it later (or to drive progress in the case of the cq_fid).

There are two peer provider models. In the example listed above, both peers are full providers in their
own right and usable in a stand-alone fashion. In a second model, one of the peers is known as an of‐
fload provider. An offload provider implements a subset of the libfabric API and targets the use of spe‐
cific acceleration hardware. For example, network switches may support collective operations, such as
barrier or broadcast. An offload provider may be written specifically to leverage this capability; how‐
ever, such a provider is not usable for general purposes. As a result, an offload provider is paired
with a main peer provider.

PEER AV

       The peer AV allows the sharing of addressing metadata between providers.  It specifically targets the use
       case  of having a main provider paired with an offload provider, where the offload provider leverages the
       communication that has already been established through the main provider.  In other situations, such as
       the pairing of a tcp provider with a shared memory provider mentioned above, each peer will likely have
       its own AV that is not shared.

       The setup for a peer AV is similar to the setup for a shared CQ, described below.  The owner  of  the  AV
       creates  a  fid_peer_av object that links back to its actual fid_av.  The fid_peer_av is then imported by
       the offload provider.

       Peer AVs are configured by the owner calling the peer’s fi_av_open() call, passing in the  FI_PEER  flag,
       and pointing the context parameter to struct fi_peer_av_context.

       The data structures to support peer AVs are:

              struct fid_peer_av;

              struct fi_ops_av_owner {
                  size_t  size;
                  int (*query)(struct fid_peer_av *av, struct fi_av_attr *attr);
                  fi_addr_t (*ep_addr)(struct fid_peer_av *av, struct fid_ep *ep);
              };

              struct fid_peer_av {
                  struct fid fid;
                  struct fi_ops_av_owner *owner_ops;
              };

              struct fi_peer_av_context {
                  size_t size;
                  struct fid_peer_av *av;
              };

   fi_ops_av_owner::query()
       This  call  returns  current  attributes  for the peer AV.  The owner sets the fields of the input struct
       fi_av_attr based on the current state of the AV for return to the caller.

   fi_ops_av_owner::ep_addr()
       This lookup function returns the fi_addr of the address associated with the given local endpoint.  If the
       address of the local endpoint has not been inserted into the AV, the function should  return  FI_ADDR_NO‐
       TAVAIL.
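
       As a rough illustration of the two owner callbacks, the sketch below implements query() and ep_addr()
       for a hypothetical owner AV that keeps its endpoints in a small table.  The struct definitions are
       trimmed local stand-ins so the example builds without libfabric headers; struct owner_av,
       owner_av_init() and owner_av_insert() are invented names, and the FI_ADDR_NOTAVAIL value is an
       assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins so the sketch builds without libfabric headers. */
typedef uint64_t fi_addr_t;
#define FI_ADDR_NOTAVAIL ((fi_addr_t) -1)

struct fid { void *context; };                /* trimmed */
struct fid_ep { struct fid fid; };            /* trimmed */
struct fi_av_attr { size_t count; };          /* trimmed to one field */

struct fid_peer_av;
struct fi_ops_av_owner {
    size_t size;
    int (*query)(struct fid_peer_av *av, struct fi_av_attr *attr);
    fi_addr_t (*ep_addr)(struct fid_peer_av *av, struct fid_ep *ep);
};
struct fid_peer_av {
    struct fid fid;
    struct fi_ops_av_owner *owner_ops;
};

/* Hypothetical owner AV: a table of the endpoints inserted so far. */
#define OWNER_AV_MAX 8
struct owner_av {
    struct fid_peer_av peer_av;   /* first member, so the casts below are valid */
    struct fid_ep *eps[OWNER_AV_MAX];
    size_t used;
};

static int owner_av_query(struct fid_peer_av *av, struct fi_av_attr *attr)
{
    struct owner_av *oav = (struct owner_av *) av;

    assert(attr);
    attr->count = oav->used;      /* report the current state of the AV */
    return 0;
}

static fi_addr_t owner_av_ep_addr(struct fid_peer_av *av, struct fid_ep *ep)
{
    struct owner_av *oav = (struct owner_av *) av;

    for (size_t i = 0; i < oav->used; i++) {
        if (oav->eps[i] == ep)
            return (fi_addr_t) i;
    }
    return FI_ADDR_NOTAVAIL;      /* local endpoint was never inserted */
}

static struct fi_ops_av_owner owner_av_ops = {
    .size = sizeof(struct fi_ops_av_owner),
    .query = owner_av_query,
    .ep_addr = owner_av_ep_addr,
};

static void owner_av_init(struct owner_av *oav)
{
    oav->peer_av.fid.context = oav;
    oav->peer_av.owner_ops = &owner_av_ops;
    oav->used = 0;
}

/* Returns the new fi_addr, or FI_ADDR_NOTAVAIL if the table is full. */
static fi_addr_t owner_av_insert(struct owner_av *oav, struct fid_ep *ep)
{
    if (oav->used == OWNER_AV_MAX)
        return FI_ADDR_NOTAVAIL;
    oav->eps[oav->used] = ep;
    return (fi_addr_t) oav->used++;
}
```

       An offload provider holding &oav.peer_av would call peer_av->owner_ops->ep_addr() to translate one of
       the main provider's endpoints into an fi_addr.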

PEER AV SET

       The peer AV set allows the sharing of collective addressing data between providers.  It specifically tar‐
       gets  the  use  case pairing a main provider with a collective offload provider.  The setup for a peer AV
       set is similar to a shared CQ, described below.  The owner of the AV set creates a fid_peer_av_set object
       that links back to its fid_av_set.  The fid_peer_av_set is imported by the offload provider.

       Peer AV sets are configured by the owner  calling  the  peer’s  fi_av_set_open()  call,  passing  in  the
       FI_PEER_AV flag, and pointing the context parameter to struct fi_peer_av_set_context.

       The data structures to support peer AV sets are:

              struct fi_ops_av_set_owner {
                  size_t  size;
                  int (*members)(struct fid_peer_av_set *av, fi_addr_t *addr,
                             size_t *count);
              };

              struct fid_peer_av_set {
                  struct fid fid;
                  struct fi_ops_av_set_owner *owner_ops;
              };

              struct fi_peer_av_set_context {
                  size_t size;
                  struct fid_peer_av_set *av_set;
              };

   fi_ops_av_set_owner::members()
       This  call  returns  an  array  of AV addresses that are members of the AV set.  The size of the array is
       specified through the count parameter.  On return, count is set to the number of addresses in the AV set.
       If the input count value is too small, the function returns -FI_ETOOSMALL.  Otherwise, the  function  re‐
       turns an array of fi_addr values.
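
       The count handling can be sketched as follows.  This is a minimal illustration, not the library's
       implementation: the structs are trimmed local stand-ins, struct owner_av_set, owner_av_set_init() and
       peer_fetch_members() are hypothetical, the FI_ETOOSMALL value is a placeholder, and the sketch assumes
       that count is updated even when -FI_ETOOSMALL is returned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins so the sketch builds without libfabric headers. */
typedef uint64_t fi_addr_t;
#define FI_ETOOSMALL 257   /* placeholder; take the real value from rdma/fi_errno.h */

struct fid { void *context; };                /* trimmed */
struct fid_peer_av_set;
struct fi_ops_av_set_owner {
    size_t size;
    int (*members)(struct fid_peer_av_set *av, fi_addr_t *addr, size_t *count);
};
struct fid_peer_av_set {
    struct fid fid;
    struct fi_ops_av_set_owner *owner_ops;
};

/* Hypothetical owner AV set with three fixed members. */
struct owner_av_set {
    struct fid_peer_av_set peer_set;   /* first member, so the cast is valid */
    fi_addr_t addrs[3];
    size_t n;
};

static int owner_members(struct fid_peer_av_set *av, fi_addr_t *addr,
                         size_t *count)
{
    struct owner_av_set *set = (struct owner_av_set *) av;
    size_t space;

    assert(count);
    space = *count;
    *count = set->n;                   /* always report the member count */
    if (space < set->n)
        return -FI_ETOOSMALL;
    for (size_t i = 0; i < set->n; i++)
        addr[i] = set->addrs[i];
    return 0;
}

static struct fi_ops_av_set_owner owner_set_ops = {
    .size = sizeof(struct fi_ops_av_set_owner),
    .members = owner_members,
};

static void owner_av_set_init(struct owner_av_set *set)
{
    set->peer_set.fid.context = set;
    set->peer_set.owner_ops = &owner_set_ops;
    set->addrs[0] = 10;
    set->addrs[1] = 11;
    set->addrs[2] = 12;
    set->n = 3;
}

/* Peer side: returns the number of members copied, or a negative error. */
static long peer_fetch_members(struct fid_peer_av_set *set, fi_addr_t *out,
                               size_t max)
{
    size_t count = max;
    int ret = set->owner_ops->members(set, out, &count);

    return ret ? ret : (long) count;
}
```

       A peer that receives -FI_ETOOSMALL can use the updated count to size a larger array and call
       members() again.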

PEER CQ

       The  peer CQ defines a mechanism by which a peer provider may insert completions into the CQ owned by an‐
       other provider.  This avoids the overhead of the libfabric user needing to access multiple CQs.

       To set up a peer CQ, a provider creates a fid_peer_cq object, which links back to the provider’s actual
       fid_cq.  The fid_peer_cq object is then imported by a peer provider.  The fid_peer_cq defines callbacks
       that the providers use to communicate with each other.  The provider that allocates the fid_peer_cq is
       known as the owner, with the other provider referred to as the peer.  An owner may set up peer
       relationships with multiple providers.

       Peer CQs are configured by the owner calling the peer’s fi_cq_open()  call.   The  owner  passes  in  the
       FI_PEER  flag to fi_cq_open().  When FI_PEER is specified, the context parameter passed into fi_cq_open()
       must reference a struct fi_peer_cq_context.  Providers that  do  not  support  peer  CQs  must  fail  the
       fi_cq_open()  call  with  -FI_EINVAL  (indicating an invalid flag).  The fid_peer_cq referenced by struct
       fi_peer_cq_context must remain valid until the peer’s CQ is closed.

       The data structures to support peer CQs are defined as follows:

              struct fi_ops_cq_owner {
                  size_t  size;
                  ssize_t (*write)(struct fid_peer_cq *cq, void *context, uint64_t flags,
                      size_t len, void *buf, uint64_t data, uint64_t tag, fi_addr_t src);
                  ssize_t (*writeerr)(struct fid_peer_cq *cq,
                      const struct fi_cq_err_entry *err_entry);
              };

              struct fid_peer_cq {
                  struct fid fid;
                  struct fi_ops_cq_owner *owner_ops;
              };

              struct fi_peer_cq_context {
                  size_t size;
                  struct fid_peer_cq *cq;
              };

       For struct fid_peer_cq, the owner initializes the fid and owner_ops fields.   struct  fi_ops_cq_owner  is
       used by the peer to communicate with the owning provider.

       If manual progress is needed on the peer CQ, the owner should drive progress by calling fi_cq_read()
       with the buf parameter set to NULL and count equal to 0.  The peer provider should set other functions
       that attempt to read the peer’s CQ (e.g. fi_cq_readerr, fi_cq_sread) to return -FI_ENOSYS.

   fi_ops_cq_owner::write()
       This  call  directs the owner to insert new completions into the CQ.  The fi_cq_attr::format field, along
       with other related attributes, determines which input parameters are valid.  Parameters that are not  re‐
       ported as part of a completion are ignored by the owner, and should be set to 0, NULL, or other appropri‐
       ate value by the user.  For example, if source addressing is not returned with a completion, then the src
       parameter should be set to FI_ADDR_NOTAVAIL and ignored on input.

       The owner is responsible for locking, event signaling, and handling CQ overflow.  Data passed through the
       write  callback  is  relative to the user.  For example, the fi_addr_t is relative to the peer’s AV.  The
       owner is responsible for converting the address if source addressing is needed.

       (TBD: should CQ overflow push back to the user for flow control?  Do we need backoff /  resume  callbacks
       in ops_cq_user?)

   fi_ops_cq_owner::writeerr()
       The  behavior of this call is similar to the write() ops.  It inserts a completion indicating that a data
       transfer has failed into the CQ.
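
       A minimal sketch of owner-side write() and writeerr() callbacks follows, assuming a context-only CQ
       format so that every parameter other than context is ignored.  The structs are trimmed local
       stand-ins, struct owner_cq, owner_cq_init() and peer_report() are hypothetical names, and returning
       -FI_EAGAIN on overflow is only one possible policy chosen for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>   /* ssize_t */

/* Local stand-ins so the sketch builds without libfabric headers. */
typedef uint64_t fi_addr_t;
#define FI_ADDR_NOTAVAIL ((fi_addr_t) -1)
#define FI_EAGAIN EAGAIN

struct fid { void *context; };                           /* trimmed */
struct fi_cq_err_entry { void *op_context; int err; };   /* trimmed */

struct fid_peer_cq;
struct fi_ops_cq_owner {
    size_t size;
    ssize_t (*write)(struct fid_peer_cq *cq, void *context, uint64_t flags,
        size_t len, void *buf, uint64_t data, uint64_t tag, fi_addr_t src);
    ssize_t (*writeerr)(struct fid_peer_cq *cq,
        const struct fi_cq_err_entry *err_entry);
};
struct fid_peer_cq {
    struct fid fid;
    struct fi_ops_cq_owner *owner_ops;
};

/* Hypothetical owner CQ that stores only the operation context. */
#define OWNER_CQ_DEPTH 2
struct owner_cq {
    struct fid_peer_cq peer_cq;    /* first member, so the casts are valid */
    void *comp[OWNER_CQ_DEPTH];
    size_t ncomp;
    int last_err;
};

static ssize_t owner_cq_write(struct fid_peer_cq *cq, void *context,
        uint64_t flags, size_t len, void *buf, uint64_t data, uint64_t tag,
        fi_addr_t src)
{
    struct owner_cq *ocq = (struct owner_cq *) cq;

    /* Context-only format: the remaining parameters are not reported. */
    (void) flags; (void) len; (void) buf; (void) data; (void) tag; (void) src;
    if (ocq->ncomp == OWNER_CQ_DEPTH)
        return -FI_EAGAIN;         /* overflow policy chosen for this sketch */
    ocq->comp[ocq->ncomp++] = context;
    return 0;
}

static ssize_t owner_cq_writeerr(struct fid_peer_cq *cq,
        const struct fi_cq_err_entry *err_entry)
{
    struct owner_cq *ocq = (struct owner_cq *) cq;

    assert(err_entry);
    ocq->last_err = err_entry->err;
    return 0;
}

static struct fi_ops_cq_owner owner_cq_ops = {
    .size = sizeof(struct fi_ops_cq_owner),
    .write = owner_cq_write,
    .writeerr = owner_cq_writeerr,
};

static void owner_cq_init(struct owner_cq *ocq)
{
    ocq->peer_cq.fid.context = ocq;
    ocq->peer_cq.owner_ops = &owner_cq_ops;
    ocq->ncomp = 0;
    ocq->last_err = 0;
}

/* Peer side: report one finished transfer through the owner's CQ. */
static ssize_t peer_report(struct fid_peer_cq *cq, void *op_context, int err)
{
    if (err) {
        struct fi_cq_err_entry entry = { .op_context = op_context, .err = err };
        return cq->owner_ops->writeerr(cq, &entry);
    }
    return cq->owner_ops->write(cq, op_context, 0, 0, NULL, 0, 0,
                                FI_ADDR_NOTAVAIL);
}
```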

   EXAMPLE PEER CQ SETUP
       The above description defines the generic mechanism for sharing CQs between providers.  This section out‐
       lines one possible implementation to demonstrate the use of the APIs.  In the example,  provider  A  uses
       provider B as a peer for data transfers targeting endpoints on the local node.

              1. Provider A is configured to use provider B as a peer.  This may be coded
                 into provider A or set through an environment variable.
              2. The application calls:
                 fi_cq_open(domain_a, attr, &cq_a, app_context)
              3. Provider A allocates cq_a and automatically configures it to be used
                 as a peer cq.
              4. Provider A takes these steps:
                 allocate peer_cq and reference cq_a
                 set peer_cq_context->cq = peer_cq
                 set attr_b.flags |= FI_PEER
                 fi_cq_open(domain_b, attr_b, &cq_b, peer_cq_context)
              5. Provider B allocates a cq, but configures it such that all completions
                 are written to the peer_cq.  The cq ops to read from the cq are
                 set to enosys calls.
              6. Provider B inserts its own callbacks into the peer_cq object.  It
                 creates a reference between the peer_cq object and its own cq.
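
       Steps 4 and 5 can be sketched in C as shown below.  This is an illustration under stated assumptions
       rather than real provider code: b_cq_open() and b_cq_read() stand in for what provider B would do
       behind fi_cq_open() and fi_cq_read(), all a_* and b_* names are hypothetical, the structs are trimmed
       local stand-ins, and the FI_PEER value is a placeholder.  For brevity, provider B here supports only
       the peer path and rejects any read other than the NULL/0 progress form; both choices are assumptions
       of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>   /* ssize_t */

/* Local stand-ins so the sketch builds without libfabric headers. */
typedef uint64_t fi_addr_t;
#define FI_ADDR_NOTAVAIL ((fi_addr_t) -1)
#define FI_PEER   (1ULL << 43)   /* placeholder; use the value from the headers */
#define FI_EINVAL EINVAL
#define FI_ENOSYS ENOSYS

struct fid { void *context; };            /* trimmed */
struct fi_cq_attr { uint64_t flags; };    /* trimmed */
struct fi_cq_err_entry;

struct fid_peer_cq;
struct fi_ops_cq_owner {
    size_t size;
    ssize_t (*write)(struct fid_peer_cq *cq, void *context, uint64_t flags,
        size_t len, void *buf, uint64_t data, uint64_t tag, fi_addr_t src);
    ssize_t (*writeerr)(struct fid_peer_cq *cq,
        const struct fi_cq_err_entry *err_entry);
};
struct fid_peer_cq { struct fid fid; struct fi_ops_cq_owner *owner_ops; };
struct fi_peer_cq_context { size_t size; struct fid_peer_cq *cq; };

/* Provider A (owner): counts the completions written by its peer. */
struct a_cq { struct fid_peer_cq peer_cq; size_t written; };

static ssize_t a_write(struct fid_peer_cq *cq, void *context, uint64_t flags,
        size_t len, void *buf, uint64_t data, uint64_t tag, fi_addr_t src)
{
    (void) context; (void) flags; (void) len; (void) buf;
    (void) data; (void) tag; (void) src;
    ((struct a_cq *) cq)->written++;
    return 0;
}

static struct fi_ops_cq_owner a_ops = {
    .size = sizeof(struct fi_ops_cq_owner),
    .write = a_write,
};

/* Provider B (peer): keeps a reference to the imported fid_peer_cq. */
struct b_cq { struct fid_peer_cq *peer_cq; int pending; };

static int b_cq_open(const struct fi_cq_attr *attr, struct b_cq *cq,
                     void *context)
{
    const struct fi_peer_cq_context *ctx = context;

    if (!(attr->flags & FI_PEER) || !ctx || ctx->size < sizeof(*ctx) || !ctx->cq)
        return -FI_EINVAL;
    cq->peer_cq = ctx->cq;     /* save the fid_peer_cq, not the context */
    cq->pending = 0;
    return 0;
}

/* Only the progress form (buf == NULL, count == 0) is supported. */
static ssize_t b_cq_read(struct b_cq *cq, void *buf, size_t count)
{
    if (buf || count)
        return -FI_ENOSYS;
    while (cq->pending > 0) {  /* forward finished work to the owner's CQ */
        cq->peer_cq->owner_ops->write(cq->peer_cq, NULL, 0, 0, NULL, 0, 0,
                                      FI_ADDR_NOTAVAIL);
        cq->pending--;
    }
    return 0;
}

/* Step 4: provider A builds the context and opens provider B's CQ. */
static int a_setup(struct a_cq *cq_a, struct b_cq *cq_b)
{
    struct fi_peer_cq_context ctx = {
        .size = sizeof(ctx),
        .cq = &cq_a->peer_cq,
    };
    struct fi_cq_attr attr_b = { .flags = FI_PEER };

    cq_a->peer_cq.fid.context = cq_a;
    cq_a->peer_cq.owner_ops = &a_ops;
    cq_a->written = 0;
    assert(ctx.cq->owner_ops->write);
    return b_cq_open(&attr_b, cq_b, &ctx);
}
```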

PEER COUNTER

       The peer counter defines a mechanism by which a peer provider may increment the value or error count of
       a counter owned by another provider.

       The setup of a peer counter is similar to the setup for a peer CQ outlined above.   The  owner’s  counter
       object is imported directly into the peer.

       The data structures to support peer counters are defined as follows:

              struct fi_ops_cntr_owner {
                  size_t size;
                  void (*inc)(struct fid_peer_cntr *cntr);
                  void (*incerr)(struct fid_peer_cntr *cntr);
              };

              struct fid_peer_cntr {
                  struct fid fid;
                  struct fi_ops_cntr_owner *owner_ops;
              };

              struct fi_peer_cntr_context {
                  size_t size;
                  struct fid_peer_cntr *cntr;
              };

       Similar to the peer CQ, if manual progress is needed on the peer counter, the owner should drive progress
       by calling fi_cntr_read() on the peer’s counter.  In that case, fi_cntr_read() should do nothing but
       progress the peer cntr.  The peer provider should set other functions that attempt to access the peer’s
       cntr (e.g. fi_cntr_readerr, fi_cntr_set) to return -FI_ENOSYS.

   fi_ops_cntr_owner::inc()
       This call directs the owner to increment the value of the cntr.

   fi_ops_cntr_owner::incerr()
       The behavior of this call is similar to the inc() ops.  It increments the error count of the cntr,
       indicating that a data transfer has failed.
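
       The counter callbacks are small enough to sketch in full.  The example below uses trimmed local
       stand-ins for the structures above; struct owner_cntr, owner_cntr_init() and peer_transfer_done()
       are hypothetical names, and any locking the owner would need is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins so the sketch builds without libfabric headers. */
struct fid { void *context; };           /* trimmed */
struct fid_peer_cntr;
struct fi_ops_cntr_owner {
    size_t size;
    void (*inc)(struct fid_peer_cntr *cntr);
    void (*incerr)(struct fid_peer_cntr *cntr);
};
struct fid_peer_cntr {
    struct fid fid;
    struct fi_ops_cntr_owner *owner_ops;
};

/* Hypothetical owner counter holding a success value and an error count. */
struct owner_cntr {
    struct fid_peer_cntr peer_cntr;      /* first member, so casts are valid */
    uint64_t value;
    uint64_t err;
};

static void owner_inc(struct fid_peer_cntr *cntr)
{
    ((struct owner_cntr *) cntr)->value++;
}

static void owner_incerr(struct fid_peer_cntr *cntr)
{
    ((struct owner_cntr *) cntr)->err++;
}

static struct fi_ops_cntr_owner owner_cntr_ops = {
    .size = sizeof(struct fi_ops_cntr_owner),
    .inc = owner_inc,
    .incerr = owner_incerr,
};

static void owner_cntr_init(struct owner_cntr *c)
{
    c->peer_cntr.fid.context = c;
    c->peer_cntr.owner_ops = &owner_cntr_ops;
    c->value = 0;
    c->err = 0;
}

/* Peer side: account for one finished transfer (status 0 means success). */
static void peer_transfer_done(struct fid_peer_cntr *cntr, int status)
{
    assert(cntr && cntr->owner_ops);
    if (status)
        cntr->owner_ops->incerr(cntr);
    else
        cntr->owner_ops->inc(cntr);
}
```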

PEER DOMAIN

       The peer domain allows a provider to access the operations of a domain object of its peer.  For  example,
       an offload provider can use a peer domain to register memory buffers with the main provider.

       The setup of a peer domain is similar to the setup for a peer CQ outlined above.  The owner’s domain
       object is imported directly into the peer.

       Peer domains are configured by the owner calling the peer’s fi_domain2() call.  The owner passes  in  the
       FI_PEER  flag to fi_domain2().  When FI_PEER is specified, the context parameter passed into fi_domain2()
       must reference a struct fi_peer_domain_context.  Providers that do not support peer domains must fail the
       fi_domain2() call with -FI_EINVAL.  The fid_domain referenced by struct fi_peer_domain_context  must  re‐
       main valid until the peer’s domain is closed.

       The data structures to support peer domains are defined as follows:

              struct fi_peer_domain_context {
                  size_t size;
                  struct fid_domain *domain;
              };

PEER EQ

       The  peer  EQ defines a mechanism by which a peer provider may insert events into the EQ owned by another
       provider.  This avoids the overhead of the libfabric user needing to access multiple EQs.

       The setup of a peer EQ is similar to the setup for a peer CQ outlined above.  The owner’s EQ object is
       imported directly into the peer provider.

       Peer EQs are configured by the owner calling the peer’s fi_eq_open()  call.   The  owner  passes  in  the
       FI_PEER  flag to fi_eq_open().  When FI_PEER is specified, the context parameter passed into fi_eq_open()
       must reference a struct fi_peer_eq_context.  Providers that  do  not  support  peer  EQs  must  fail  the
       fi_eq_open()  call  with  -FI_EINVAL  (indicating  an  invalid  flag).   The  fid_eq referenced by struct
       fi_peer_eq_context must remain valid until the peer’s EQ is closed.

       The data structures to support peer EQs are defined as follows:

              struct fi_peer_eq_context {
                  size_t size;
                  struct fid_eq *eq;
              };

PEER SRX

       The peer SRX defines a mechanism by which peer providers may share a common shared receive context.  This
       avoids the overhead of having separate receive queues, can eliminate memory copies, and  ensures  correct
       application level message ordering.

       The  setup  of  a  peer  SRX is similar to the setup for a peer CQ outlined above.  A fid_peer_srx object
       links the owner of the SRX with the peer provider.  Peer SRXs are configured by  the  owner  calling  the
       peer’s fi_srx_context() call with the FI_PEER flag set.  The context parameter passed to fi_srx_context()
       must be a struct fi_peer_srx_context.

       The   owner   provider   initializes   all   elements  of  the  fid_peer_srx  and  referenced  structures
       (fi_ops_srx_owner and fi_ops_srx_peer), with the exception of  the  fi_ops_srx_peer  callback  functions.
       Those must be initialized by the peer provider prior to returning from the fi_srx_context() call and are
       used by the owner to control peer actions.

       The data structures to support peer SRXs are defined as follows:

              struct fid_peer_srx;

              /* Castable to dlist_entry */
              struct fi_peer_rx_entry {
                  struct fi_peer_rx_entry *next;
                  struct fi_peer_rx_entry *prev;
                  struct fid_peer_srx *srx;
                  fi_addr_t addr;
                  size_t msg_size;
                  uint64_t tag;
                  uint64_t cq_data;
                  uint64_t flags;
                  void *context;
                  size_t count;
                  void **desc;
                  void *peer_context;
                  void *owner_context;
                  struct iovec *iov;
              };

              struct fi_peer_match_attr {
                  fi_addr_t addr;
                  size_t msg_size;
                  uint64_t tag;
              };

              struct fi_ops_srx_owner {
                  size_t size;
                  int (*get_msg)(struct fid_peer_srx *srx,
                                 struct fi_peer_match_attr *attr,
                                 struct fi_peer_rx_entry **entry);
                  int (*get_tag)(struct fid_peer_srx *srx,
                                 struct fi_peer_match_attr *attr,
                                 uint64_t tag, struct fi_peer_rx_entry **entry);
                  int (*queue_msg)(struct fi_peer_rx_entry *entry);
                  int (*queue_tag)(struct fi_peer_rx_entry *entry);
                  void (*foreach_unspec_addr)(struct fid_peer_srx *srx,
                                fi_addr_t (*get_addr)(struct fi_peer_rx_entry *));

                  void (*free_entry)(struct fi_peer_rx_entry *entry);
              };

              struct fi_ops_srx_peer {
                  size_t size;
                  int (*start_msg)(struct fi_peer_rx_entry *entry);
                  int (*start_tag)(struct fi_peer_rx_entry *entry);
                  int (*discard_msg)(struct fi_peer_rx_entry *entry);
                  int (*discard_tag)(struct fi_peer_rx_entry *entry);
              };

              struct fid_peer_srx {
                  struct fid_ep ep_fid;
                  struct fi_ops_srx_owner *owner_ops;
                  struct fi_ops_srx_peer *peer_ops;
              };

              struct fi_peer_srx_context {
                  size_t size;
                  struct fid_peer_srx *srx;
              };

       The ownership of structure field values and callback functions is similar to those defined for peer  CQs,
       relative to owner versus peer ops.

       The  owner  is  responsible  for  acquiring any necessary locks before anything that could result in peer
       callbacks.  The following functions are progress  level  functions:  get_msg(),  get_tag(),  queue_msg(),
       queue_tag(),  free_entry(), start_msg(), start_tag(), discard_msg(), discard_tag().  If needed, it is the
       owner’s responsibility to acquire the appropriate lock prior to calling into a  peer’s  fi_cq_read(),  or
       similar, function that drives progress.

       The following functions are domain level functions: foreach_unspec_addr().  This function is used outside
       of message progress flow (i.e. during fi_av_insert()).  The owner of the srx is responsible for acquiring
       the same lock, if needed.

   fi_peer_rx_entry
       fi_peer_rx_entry  defines  a common receive entry for use between the owner and peer.  The entry is allo‐
       cated and set by the owner and passed between owner and peer to communicate details of  the  application-
       posted  receive  entry.   All  fields are initialized by the owner, except in the unexpected message case
       where the peer can initialize any extra available data before queuing the message with  the  owner.   The
       peer_context  and  owner_context fields are only modifiable by the peer and owner, respectively, to store
       extra provider-specific information.

   fi_ops_srx_owner::get_msg() / get_tag()
       These calls are invoked by the peer provider to obtain the receive buffer(s) where an incoming message
       should be placed.  The peer provider passes in the relevant fields to request a matching rx_entry from
       the owner.  If source addressing is required, the addr will be passed in; otherwise, the address will be
       set to FI_ADDR_NOTAVAIL.  The msg_size field indicates the received message size.  This field may be
       needed by the owner when handling FI_MULTI_RECV or FI_PEEK.  The owner sets the rx_entry->msg_size
       field on get_msg/tag() for the owner and peer to use later, if needed.  This field is set on both the
       expected and unexpected paths.  The tag parameter is only used for tagged messages and must be set to 0
       for the non-tagged cases.

       The rx_entry->iov returned from the owner refers to the full size of the posted receive passed to the
       peer.  The peer provider is responsible for checking that an incoming message fits within the provided
       buffer space and for generating truncation errors.

       An fi_peer_rx_entry is allocated by the owner, whether or not a match was found.  If a match was found,
       the owner returns FI_SUCCESS and the rx_entry is filled in with the known receive fields for the peer
       to process accordingly.  This includes the information that was passed into the calls as well as the
       rx_entry->flags with either FI_MSG | FI_RECV (for get_msg()) or FI_TAGGED | FI_RECV (for get_tag()).
       The peer provider is responsible for completing with any other flags, if needed.

       If no match was found, the owner returns -FI_ENOENT; the rx_entry is still valid but does not match an
       existing posted receive.  When the peer gets -FI_ENOENT, it should allocate whatever resources it needs
       to process the message later (on start_msg/tag) and set the rx_entry->peer_context appropriately,
       followed by a call to the owner’s queue_msg/tag.  The get and queue calls should be serialized.  When
       the owner gets a matching receive for the queued unexpected message, it calls the peer’s start function
       to notify the peer of the updated rx_entry (or the peer’s discard function if the message is to be
       discarded).
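
       The peer's truncation check can be sketched as a copy loop over the returned iov.  The example below
       is an illustration only: struct fi_peer_rx_entry is trimmed to the fields the sketch touches,
       peer_place_msg() is a hypothetical helper, and the FI_ETRUNC value is a placeholder.  The sketch
       copies as much of the message as fits and then reports the truncation to its caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */
#include <sys/uio.h>     /* struct iovec */

#define FI_ETRUNC 265    /* placeholder; take the real value from rdma/fi_errno.h */

/* Local stand-in, trimmed to the fields this sketch touches. */
struct fi_peer_rx_entry {
    size_t msg_size;     /* size of the received message */
    size_t count;        /* number of iov elements */
    struct iovec *iov;   /* full posted receive buffer(s) */
};

/*
 * Peer side: copy an incoming message into the buffers returned by
 * get_msg()/get_tag().  Returns the number of bytes placed, or -FI_ETRUNC
 * if the message does not fit in the posted receive.
 */
static ssize_t peer_place_msg(const struct fi_peer_rx_entry *entry,
                              const void *msg, size_t len)
{
    const char *src = msg;
    size_t left = len;

    assert(entry && (entry->iov || entry->count == 0));
    for (size_t i = 0; i < entry->count && left > 0; i++) {
        size_t n = entry->iov[i].iov_len < left ? entry->iov[i].iov_len : left;

        memcpy(entry->iov[i].iov_base, src, n);
        src += n;
        left -= n;
    }
    /* Anything left over did not fit: the peer reports truncation. */
    return left ? -FI_ETRUNC : (ssize_t) len;
}
```

       On -FI_ETRUNC a real peer would write an error completion for the receive instead of a normal
       completion.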

   fi_ops_srx_owner::queue_msg() / queue_tag()
       Called by the peer to queue an incoming unexpected message to the srx.  Once the peer has queued the
       message, the owner is responsible for starting it when it is matched to a receive buffer, or for
       discarding it if needed.

   fi_ops_srx_owner::foreach_unspec_addr()
       Called by the peer when any addressing updates have occurred with the peer.  This triggers the owner to
       iterate over any entries whose address is still unknown and call the supplied get_addr function on each
       to retrieve updated address information.

   fi_ops_srx_owner::free_entry()
       Called by the peer when it is completely done using an owner-allocated peer entry.

   fi_ops_srx_peer::start_msg() / start_tag()
       These calls indicate that an asynchronous get_msg() or get_tag() has completed and a buffer is now
       available to receive the message.  Control of the fi_peer_rx_entry is returned to the peer provider and
       has been initialized for receiving the incoming message.

   fi_ops_srx_peer::discard_msg() / discard_tag()
       Indicates that the message and data associated with the specified fi_peer_rx_entry should be discarded.
       This often indicates that the application has canceled or discarded the receive operation.  No
       completion should be generated by the peer provider for a discarded message.  Control of the
       fi_peer_rx_entry is returned to the peer provider.

   EXAMPLE PEER SRX SETUP
       The above description defines the generic mechanism for sharing SRXs between providers.  This section
       outlines one possible implementation to demonstrate the use of the APIs.  In the example, provider A
       uses provider B as a peer for data transfers targeting endpoints on the local node.

              1. Provider A is configured to use provider B as a peer.  This may be coded
                 into provider A or set through an environment variable.
              2. The application calls:
                 fi_srx_context(domain_a, attr, &srx_a, app_context)
              3. Provider A allocates srx_a and automatically configures it to be used
                 as a peer srx.
              4. Provider A takes these steps:
                 allocate peer_srx and reference srx_a
                 set peer_srx_context->srx = peer_srx
                 set attr_b.flags |= FI_PEER
                 fi_srx_context(domain_b, attr_b, &srx_b, peer_srx_context)
              5. Provider B allocates an srx, but configures it such that all receive
                 buffers are obtained from the peer_srx.  The srx ops to post receives are
                 set to enosys calls.
              6. Provider B inserts its own callbacks into the peer_srx object.  It
                 creates a reference between the peer_srx object and its own srx.

   EXAMPLE PEER SRX RECEIVE FLOW
       The following outlines show simplified, example software flows for receive message handling using a
       peer SRX.  The first flow demonstrates the case where a receive buffer is waiting when the message
       arrives.

              1. Application calls fi_recv() / fi_trecv() on owner.
              2. Owner queues the receive buffer.
              3. A message is received by the peer provider.
              4. The peer calls owner->get_msg() / get_tag().
              5. The owner removes the queued receive buffer and returns it to
                 the peer.  The get entry call will complete with FI_SUCCESS.
              6. When the peer finishes processing the message and completes it on its own
                 CQ, the peer will call free_entry to free the entry with the owner.

       The second case below shows the flow when a message arrives before the application has posted the
       matching receive buffer.

              1. A message is received by the peer provider.
              2. The peer calls owner->get_msg() / get_tag().  If the incoming address is
                 FI_ADDR_UNSPEC, the owner cannot match this message to a receive posted with
                 FI_DIRECTED_RECV and can only match to receives posted with FI_ADDR_UNSPEC.
              3. The owner fails to find a matching receive buffer.
              4. The owner allocates a rx_entry with any known fields and returns -FI_ENOENT.
              5. The peer allocates any resources needed to handle the asynchronous processing
                 and sets peer_context accordingly, calling the owner's queue
                 function when ready to queue the unexpected message from the peer.
              6. The application calls fi_recv() / fi_trecv() on owner, posting the
                 matching receive buffer.
              7. The owner matches the receive with the queued message on the peer.  Note that
                 the owner cannot match a directed receive with an unexpected message whose
                 address is unknown.
              8. The owner removes the queued request, fills in the rest of the known fields
                 and calls the peer->start_msg() / start_tag() function.
              9. When the peer finishes processing the message and completes it on its own
                 CQ, the peer will call free_entry to free the entry with the owner.
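
       The unexpected-message flow above can be sketched end to end with a toy owner that holds at most one
       posted receive and one queued message.  Everything here is a simplified stand-in: the structs and ops
       tables are trimmed to the members the sketch uses, all owner_* and peer_* names are hypothetical,
       srx_init() installs both ops tables for brevity (in the real flow the peer sets its own callbacks
       during fi_srx_context()), and address and tag matching are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t fi_addr_t;
#define FI_ENOENT ENOENT
#define FI_ENOMEM ENOMEM

struct fid_peer_srx;
struct fi_peer_rx_entry {            /* trimmed to the fields used here */
    struct fid_peer_srx *srx;
    fi_addr_t addr;
    size_t msg_size;
    void *context;
    void *peer_context;
};
struct fi_peer_match_attr { fi_addr_t addr; size_t msg_size; uint64_t tag; };
struct fi_ops_srx_owner {            /* trimmed to three callbacks */
    size_t size;
    int (*get_msg)(struct fid_peer_srx *srx, struct fi_peer_match_attr *attr,
                   struct fi_peer_rx_entry **entry);
    int (*queue_msg)(struct fi_peer_rx_entry *entry);
    void (*free_entry)(struct fi_peer_rx_entry *entry);
};
struct fi_ops_srx_peer {             /* trimmed to one callback */
    size_t size;
    int (*start_msg)(struct fi_peer_rx_entry *entry);
};
struct fid_peer_srx {                /* ep_fid member omitted */
    struct fi_ops_srx_owner *owner_ops;
    struct fi_ops_srx_peer *peer_ops;
};

struct owner_srx {                   /* hypothetical owner state */
    struct fid_peer_srx peer_srx;    /* first member, so casts are valid */
    void *posted;                    /* app context of the posted receive */
    int has_posted;
    struct fi_peer_rx_entry *unexp;  /* queued unexpected message, if any */
    int live_entries;
};

static int owner_get_msg(struct fid_peer_srx *srx,
        struct fi_peer_match_attr *attr, struct fi_peer_rx_entry **entry)
{
    struct owner_srx *o = (struct owner_srx *) srx;
    struct fi_peer_rx_entry *e = calloc(1, sizeof(*e));

    if (!e)
        return -FI_ENOMEM;
    e->srx = srx;
    e->addr = attr->addr;
    e->msg_size = attr->msg_size;
    o->live_entries++;
    *entry = e;
    if (!o->has_posted)
        return -FI_ENOENT;           /* steps 3-4: valid entry, no match */
    e->context = o->posted;
    o->has_posted = 0;
    return 0;
}

static int owner_queue_msg(struct fi_peer_rx_entry *entry)
{
    ((struct owner_srx *) entry->srx)->unexp = entry;
    return 0;
}

static void owner_free_entry(struct fi_peer_rx_entry *entry)
{
    ((struct owner_srx *) entry->srx)->live_entries--;
    free(entry);
}

/* Steps 6-8: the path an application fi_recv() would reach in the owner. */
static int owner_post_recv(struct owner_srx *o, void *app_context)
{
    struct fi_peer_rx_entry *e = o->unexp;

    if (!e) {
        o->posted = app_context;
        o->has_posted = 1;
        return 0;
    }
    o->unexp = NULL;
    e->context = app_context;
    return o->peer_srx.peer_ops->start_msg(e);
}

static int peer_started;

static int peer_start_msg(struct fi_peer_rx_entry *entry)
{
    peer_started++;                  /* a real peer would copy the data here */
    entry->srx->owner_ops->free_entry(entry);   /* step 9 */
    return 0;
}

/* Steps 1-5: peer handling of an arriving message. */
static int peer_on_message(struct fid_peer_srx *srx, fi_addr_t src,
                           size_t len, void *saved)
{
    struct fi_peer_match_attr attr = { .addr = src, .msg_size = len };
    struct fi_peer_rx_entry *entry;
    int ret = srx->owner_ops->get_msg(srx, &attr, &entry);

    if (ret == -FI_ENOENT) {
        entry->peer_context = saved;
        return srx->owner_ops->queue_msg(entry);
    }
    if (!ret)
        srx->owner_ops->free_entry(entry);      /* expected path */
    return ret;
}

static struct fi_ops_srx_owner owner_ops = {
    sizeof(owner_ops), owner_get_msg, owner_queue_msg, owner_free_entry };
static struct fi_ops_srx_peer peer_ops = { sizeof(peer_ops), peer_start_msg };

static void srx_init(struct owner_srx *o)
{
    *o = (struct owner_srx) { .peer_srx = { &owner_ops, &peer_ops } };
}
```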

       Whenever a peer’s addressing is updated (e.g. via fi_av_insert()), it needs to call the owner’s
       foreach_unspec_addr() call to trigger any necessary updating of unknown entries.  The owner is expected
       to iterate over any necessary entries and call the supplied get_addr() function on each one in order to
       get updated addressing information.  Once the address is known, the owner can proceed to match directed
       receives against those entries.
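
       A minimal sketch of this address refresh follows.  The structs are trimmed local stand-ins, the
       owner_* and peer_* helpers are hypothetical, the FI_ADDR_UNSPEC value is an assumption, and the sketch
       supposes that the peer stored a raw source identifier in peer_context when it queued each unexpected
       message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t fi_addr_t;
#define FI_ADDR_UNSPEC ((fi_addr_t) -1)   /* stand-in value */

struct fid_peer_srx;
struct fi_peer_rx_entry {                 /* trimmed to the fields used here */
    struct fid_peer_srx *srx;
    fi_addr_t addr;
    void *peer_context;                   /* here: pointer to a raw source id */
};
struct fi_ops_srx_owner {                 /* trimmed to one callback */
    size_t size;
    void (*foreach_unspec_addr)(struct fid_peer_srx *srx,
                  fi_addr_t (*get_addr)(struct fi_peer_rx_entry *));
};
struct fid_peer_srx { struct fi_ops_srx_owner *owner_ops; };

/* Hypothetical owner: a fixed array of queued unexpected entries. */
#define OWNER_MAX_UNEXP 4
struct owner_srx {
    struct fid_peer_srx peer_srx;         /* first member, so casts are valid */
    struct fi_peer_rx_entry unexp[OWNER_MAX_UNEXP];
    size_t n;
};

static void owner_foreach_unspec_addr(struct fid_peer_srx *srx,
        fi_addr_t (*get_addr)(struct fi_peer_rx_entry *))
{
    struct owner_srx *o = (struct owner_srx *) srx;

    for (size_t i = 0; i < o->n; i++) {
        if (o->unexp[i].addr == FI_ADDR_UNSPEC)
            o->unexp[i].addr = get_addr(&o->unexp[i]);
    }
}

static struct fi_ops_srx_owner owner_ops = {
    .size = sizeof(struct fi_ops_srx_owner),
    .foreach_unspec_addr = owner_foreach_unspec_addr,
};

static void owner_srx_init(struct owner_srx *o)
{
    o->peer_srx.owner_ops = &owner_ops;
    o->n = 0;
}

/* Queue an unexpected entry whose source is not yet in the peer's AV. */
static void owner_add_unexp(struct owner_srx *o, uint32_t *raw_id)
{
    assert(o->n < OWNER_MAX_UNEXP);
    o->unexp[o->n].srx = &o->peer_srx;
    o->unexp[o->n].addr = FI_ADDR_UNSPEC;
    o->unexp[o->n].peer_context = raw_id;
    o->n++;
}

/* Hypothetical peer AV: index in this table doubles as the fi_addr. */
#define PEER_AV_MAX 4
static uint32_t peer_known_id[PEER_AV_MAX];
static size_t peer_known;

static fi_addr_t peer_get_addr(struct fi_peer_rx_entry *entry)
{
    uint32_t raw = *(uint32_t *) entry->peer_context;

    for (size_t i = 0; i < peer_known; i++) {
        if (peer_known_id[i] == raw)
            return (fi_addr_t) i;
    }
    return FI_ADDR_UNSPEC;                /* still unknown */
}

/* What the peer would do after its fi_av_insert() path adds an address. */
static void peer_av_insert(struct fid_peer_srx *srx, uint32_t raw)
{
    assert(peer_known < PEER_AV_MAX);
    peer_known_id[peer_known++] = raw;
    srx->owner_ops->foreach_unspec_addr(srx, peer_get_addr);
}
```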

fi_export_fid / fi_import_fid

       The fi_export_fid function is reserved for future use.

       The fi_import_fid call may be used to import a fabric object created and owned  by  the  libfabric  user.
       This  allows upper level libraries or the application to override or define low-level libfabric behavior.
       Details on specific uses of fi_import_fid are outside the scope of this documentation.

FI_PEER_TRANSFER

       Providers frequently send control messages to their remote counterparts as part of their  wire  protocol.
       For  example, a provider may send an ACK message to guarantee reliable delivery of a message or to meet a
       requested completion semantic.  When two or more providers are coordinating as peers, it can be more  ef‐
       ficient  if  control messages for both peer providers go over the same transport.  In some cases, such as
       when one of the peers is an offload provider, it may even be required.  Peer transfers define the  mecha‐
       nism by which such communication occurs.

       Peer  transfers  enable  one peer to send and receive data transfers over its associated peer.  Providers
       that require this functionality indicate this by  setting  the  FI_PEER_TRANSFER  flag  as  a  mode  bit,
       i.e. fi_info::mode.

       To use such a provider as a peer, the main, or owner, provider must set up peer transfers by opening a
       peer transfer endpoint and accepting transfers with this flag set.  Setup of peer transfers involves the
       following data structures:

              struct fi_ops_transfer_peer {
                  size_t size;
                  ssize_t (*complete)(struct fid_ep *ep, struct fi_cq_tagged_entry *buf,
                          fi_addr_t *src_addr);
                  ssize_t (*comperr)(struct fid_ep *ep, struct fi_cq_err_entry *buf);
              };

              struct fi_peer_transfer_context {
                  size_t size;
                  struct fi_info *info;
                  struct fid_ep *ep;
                  struct fi_ops_transfer_peer *peer_ops;
              };

       Peer  transfer  contexts  form  a virtual link between endpoints allocated on each of the peer providers.
       The setup of a peer transfer context occurs through the  fi_endpoint()  API.   The  main  provider  calls
       fi_endpoint()  with  the  FI_PEER_TRANSFER  mode bit set in the info parameter, and the context parameter
       must reference the struct fi_peer_transfer_context defined above.

       The size field indicates the size of struct fi_peer_transfer_context being passed to the peer.   This  is
       used for backward compatibility.  The info field is optional.  If given, it defines the attributes of the
       main  provider’s  objects.  It may be used to report the capabilities and restrictions on peer transfers,
       such as whether memory registration is required, maximum message sizes, data and completion ordering  se‐
       mantics,  and  so  forth.   If  the  importing  provider cannot meet these restrictions, it must fail the
       fi_endpoint() call.

       The peer_ops field contains callbacks from the main provider into the peer and is used to report the
       completion (success or failure) of peer initiated data transfers.  The callback functions defined in
       struct fi_ops_transfer_peer must be set by the peer provider before returning from the fi_endpoint()
       call.  Actions that the peer provider can take from within the completion callbacks are mostly
       unrestricted, and can include any of the following types of operations: initiation of additional data
       transfers, writing events to the owner’s CQ or EQ, and memory registration/deregistration.  The owner
       must ensure that deadlock cannot occur prior to invoking the peer’s callback should the peer invoke any
       of these operations.  Further, the owner must avoid recursive calls into the completion callbacks.
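
       One way an owner might avoid recursive calls into the completion callbacks is to defer completions
       that arrive while a callback is already running.  The sketch below shows that idea under several
       assumptions: the structs are trimmed local stand-ins, all owner_* and peer_* names are hypothetical,
       peer_endpoint() stands in for the peer's handling of the context inside fi_endpoint(), the owner is
       assumed to allocate the fi_ops_transfer_peer structure that the peer fills in, and the peer's
       follow-up transfer is simulated by calling back into the owner directly.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>   /* ssize_t */

typedef uint64_t fi_addr_t;
#define FI_ADDR_NOTAVAIL ((fi_addr_t) -1)
#define FI_EINVAL EINVAL

struct fi_info;
struct fid_ep { void *context; };                             /* trimmed */
struct fi_cq_tagged_entry { void *op_context; size_t len; };  /* trimmed */
struct fi_cq_err_entry { void *op_context; int err; };        /* trimmed */

struct fi_ops_transfer_peer {
    size_t size;
    ssize_t (*complete)(struct fid_ep *ep, struct fi_cq_tagged_entry *buf,
            fi_addr_t *src_addr);
    ssize_t (*comperr)(struct fid_ep *ep, struct fi_cq_err_entry *buf);
};
struct fi_peer_transfer_context {
    size_t size;
    struct fi_info *info;
    struct fid_ep *ep;
    struct fi_ops_transfer_peer *peer_ops;
};

/* Hypothetical owner state for one peer transfer endpoint. */
struct owner_xfer {
    struct fid_ep ep;
    struct fi_ops_transfer_peer ops;   /* callbacks filled in by the peer */
    int in_callback;
    int deferred;                      /* completions waiting to be reported */
    int delivered;
};

/* Owner: report one completion without re-entering the peer's callback. */
static void owner_report(struct owner_xfer *x)
{
    struct fi_cq_tagged_entry comp = { .op_context = NULL, .len = 0 };
    fi_addr_t src = FI_ADDR_NOTAVAIL;

    assert(x->ops.complete);
    x->deferred++;
    if (x->in_callback)
        return;                        /* the outer call drains it below */
    x->in_callback = 1;
    while (x->deferred > 0) {
        x->deferred--;
        x->ops.complete(&x->ep, &comp, &src);
        x->delivered++;
    }
    x->in_callback = 0;
}

static int peer_depth, peer_max_depth, peer_followups = 1;

static ssize_t peer_complete(struct fid_ep *ep, struct fi_cq_tagged_entry *buf,
        fi_addr_t *src_addr)
{
    (void) buf; (void) src_addr;
    if (++peer_depth > peer_max_depth)
        peer_max_depth = peer_depth;
    if (peer_followups > 0) {
        peer_followups--;
        owner_report(ep->context);     /* simulated follow-up that completes inline */
    }
    peer_depth--;
    return 0;
}

static ssize_t peer_comperr(struct fid_ep *ep, struct fi_cq_err_entry *buf)
{
    (void) ep; (void) buf;
    return 0;
}

/* Peer: validate the context and install callbacks before returning. */
static int peer_endpoint(struct fi_peer_transfer_context *ctx)
{
    if (!ctx || ctx->size < sizeof(*ctx) || !ctx->ep || !ctx->peer_ops)
        return -FI_EINVAL;
    ctx->peer_ops->complete = peer_complete;
    ctx->peer_ops->comperr = peer_comperr;
    return 0;
}

static int owner_setup(struct owner_xfer *x)
{
    struct fi_peer_transfer_context ctx = {
        .size = sizeof(ctx), .info = NULL, .ep = &x->ep, .peer_ops = &x->ops,
    };

    x->ep.context = x;
    x->ops.size = sizeof(x->ops);
    x->in_callback = x->deferred = x->delivered = 0;
    return peer_endpoint(&ctx);
}
```

       With this guard, a completion triggered from inside complete() is reported only after the outer
       callback returns, so the callback depth never exceeds one.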

RETURN VALUE

       Returns FI_SUCCESS on success.  On error, a negative value corresponding to  fabric  errno  is  returned.
       Fabric errno values are defined in rdma/fi_errno.h.

AUTHORS

       OpenFabrics.

Libfabric Programmer’s Manual                      2024-12-10                                         fi_peer(3)