Ubuntu Manpage: librpmem - remote persistent memory support library (EXPERIMENTAL)

Provided by: librpmem-dev_1.11.1-3build1_amd64

NAME

       librpmem - remote persistent memory support library (EXPERIMENTAL)

SYNOPSIS

              #include <librpmem.h>
              cc ... -lrpmem

   Library API versioning:
              const char *rpmem_check_version(
                  unsigned major_required,
                  unsigned minor_required);

   Error handling:
              const char *rpmem_errormsg(void);

   Other library functions:
       A description of other librpmem functions can be found on the following manual pages:

       • rpmem_create(3), rpmem_persist(3)

DESCRIPTION

       librpmem  provides low-level support for remote access to persistent memory (pmem) utilizing RDMA-capable
       RNICs.  The library can be used to remotely replicate a memory region over the RDMA  protocol.   It  uti‐
       lizes  an  appropriate  persistency mechanism based on the remote node's platform capabilities.  librpmem
       utilizes the ssh(1) client to authenticate a user on the remote node, and for encryption of  the  connec‐
       tion's out-of-band configuration data.  See SSH, below, for details.

       The maximum replicated memory region size cannot exceed the maximum locked-in-memory address space
       limit.  See memlock in limits.conf(5) for more details.

       This library is for applications that use remote persistent memory directly, without the help of any  li‐
       brary-supplied  transactions  or  memory allocation.  Higher-level libraries that build on libpmem(7) are
       available and are recommended for most applications, see:

       • libpmemobj(7), a general use persistent memory API, providing memory allocation and transactional oper‐
         ations on variable-sized objects.

TARGET NODE ADDRESS FORMAT

              [<user>@]<hostname>[:<port>]

       The target node address consists of the hostname that the client connects to, with an optional user
       name.  The user must be able to authenticate to the remote machine without being prompted for a
       password or passphrase.  The optional port number is used to establish the SSH connection.  The
       default port number is 22.
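
       The address format above can be split into its parts with a few string operations.  The following is
       a minimal sketch only: struct target_parts and parse_target() are hypothetical names that are not
       part of librpmem, the buffer sizes are arbitrary, and bracketed IPv6 literals are assumed not to
       occur in the hostname.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type, not part of librpmem. */
struct target_parts {
    char user[64];   /* empty string when no user name was given */
    char host[256];
    unsigned port;   /* 22 when no port was given */
};

/*
 * Split "[<user>@]<hostname>[:<port>]" into its parts.
 * Returns 0 on success, -1 on malformed input.
 */
static int
parse_target(const char *target, struct target_parts *out)
{
    assert(target != NULL && out != NULL);

    const char *host = target;
    const char *at = strchr(target, '@');

    out->user[0] = '\0';
    out->port = 22; /* default SSH port named in the text above */

    if (at != NULL) {
        size_t ulen = (size_t)(at - target);
        if (ulen == 0 || ulen >= sizeof(out->user))
            return -1;
        memcpy(out->user, target, ulen);
        out->user[ulen] = '\0';
        host = at + 1;
    }

    /* everything after the last ':' is treated as the port */
    const char *colon = strrchr(host, ':');
    size_t hlen = colon ? (size_t)(colon - host) : strlen(host);
    if (hlen == 0 || hlen >= sizeof(out->host))
        return -1;
    memcpy(out->host, host, hlen);
    out->host[hlen] = '\0';

    if (colon != NULL) {
        char *end;
        unsigned long port = strtoul(colon + 1, &end, 10);
        if (end == colon + 1 || *end != '\0' || port == 0 || port > 65535)
            return -1;
        out->port = (unsigned)port;
    }

    return 0;
}
```

       With this sketch, "alice@node1:2222" yields user "alice", host "node1" and port 2222, while a bare
       "node2" yields an empty user name and the default port.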

REMOTE POOL ATTRIBUTES

       The rpmem_pool_attr structure describes a remote pool and is stored in the remote pool's metadata.
       The caller must pass this structure to rpmem_create(3) when creating a pool on a remote node.  When
       a pool is opened with rpmem_open(3), the appropriate fields are read from the pool's metadata and
       returned to the caller.

              #define RPMEM_POOL_HDR_SIG_LEN    8
              #define RPMEM_POOL_HDR_UUID_LEN   16
              #define RPMEM_POOL_USER_FLAGS_LEN 16

              struct rpmem_pool_attr {
                  char signature[RPMEM_POOL_HDR_SIG_LEN];
                  uint32_t major;
                  uint32_t compat_features;
                  uint32_t incompat_features;
                  uint32_t ro_compat_features;
                  unsigned char poolset_uuid[RPMEM_POOL_HDR_UUID_LEN];
                  unsigned char uuid[RPMEM_POOL_HDR_UUID_LEN];
                  unsigned char next_uuid[RPMEM_POOL_HDR_UUID_LEN];
                  unsigned char prev_uuid[RPMEM_POOL_HDR_UUID_LEN];
                  unsigned char user_flags[RPMEM_POOL_USER_FLAGS_LEN];
              };

       The signature field is an 8-byte field which describes the pool's on-media format.

       The major field is a major version number of the pool's on-media format.

       The compat_features field is a mask describing the optional features of the pool's on-media format.

       The incompat_features field is a mask describing the required features of the pool's on-media
       format.

       The ro_compat_features field is a mask describing features of the pool's on-media format.  If these
       features are not available, the pool shall be opened in read-only mode.

       The poolset_uuid field is a UUID of the pool which the remote pool is associated with.

       The uuid field is a UUID of the first part of the remote pool.  This field can be used to connect
       the remote pool with other pools in a list.

       The  next_uuid  and  prev_uuid fields are UUIDs of next and previous replicas respectively.  These fields
       can be used to connect the remote pool with other pools in a list.

       The user_flags field is a 16-byte field of user-defined flags.
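
       A caller typically zeroes the whole structure and then fills in only the fields it needs.  The
       following is a minimal sketch under stated assumptions: the macros and the structure are a local
       copy of the definitions shown above so that the fragment builds without <librpmem.h> (a real program
       would include the header instead), and init_pool_attr() is a hypothetical helper, not a library
       function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Local copy of the definitions shown above, for illustration only.
 * A real program would get them from <librpmem.h>.
 */
#define RPMEM_POOL_HDR_SIG_LEN    8
#define RPMEM_POOL_HDR_UUID_LEN   16
#define RPMEM_POOL_USER_FLAGS_LEN 16

struct rpmem_pool_attr {
    char signature[RPMEM_POOL_HDR_SIG_LEN];
    uint32_t major;
    uint32_t compat_features;
    uint32_t incompat_features;
    uint32_t ro_compat_features;
    unsigned char poolset_uuid[RPMEM_POOL_HDR_UUID_LEN];
    unsigned char uuid[RPMEM_POOL_HDR_UUID_LEN];
    unsigned char next_uuid[RPMEM_POOL_HDR_UUID_LEN];
    unsigned char prev_uuid[RPMEM_POOL_HDR_UUID_LEN];
    unsigned char user_flags[RPMEM_POOL_USER_FLAGS_LEN];
};

/*
 * Hypothetical helper: zero all fields, then set the signature and the
 * major version.  Signatures shorter than 8 bytes are zero-padded.
 */
static void
init_pool_attr(struct rpmem_pool_attr *attr, const char *sig, uint32_t major)
{
    assert(attr != NULL && sig != NULL);
    assert(strlen(sig) <= RPMEM_POOL_HDR_SIG_LEN);

    memset(attr, 0, sizeof(*attr));
    memcpy(attr->signature, sig, strlen(sig));
    attr->major = major;
}
```

       The resulting structure is what would be passed as the last argument of rpmem_create(3).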

SSH

       librpmem utilizes the ssh(1) client to log in and execute the rpmemd(1) process on the remote node.
       By default, ssh(1) is executed with the -4 option, which forces the use of IPv4 addressing.

       For  debugging purposes, both the ssh client and the commands executed on the remote node may be overrid‐
       den by setting the RPMEM_SSH and RPMEM_CMD environment variables, respectively.  See ENVIRONMENT for  de‐
       tails.

FORK

       The  ssh(1)  client  is executed by rpmem_open(3) and rpmem_create(3) after forking a child process using
       fork(2).  The application must take this into account when using wait(2) and waitpid(2), which may return
       the PID of the ssh(1) process executed by librpmem.

       If fork(2) support is not enabled in libibverbs, rpmem_open(3) and rpmem_create(3)  will  fail.   By  de‐
       fault,  fabric(7)  initializes  libibverbs with fork(2) support by calling the ibv_fork_init(3) function.
       See fi_verbs(7) for more details.
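
       One way for an application to avoid reaping the ssh(1) child by accident is to wait for its own
       children by PID instead of calling wait(2).  The following is a minimal sketch of that pattern using
       only POSIX calls; run_and_reap() is a hypothetical helper and the child here simply exits with the
       given code.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: start a child that exits with `code` and reap
 * exactly that child.  Returns the child's exit status, or -1 on error.
 */
static int
run_and_reap(int code)
{
    assert(code >= 0 && code <= 255);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code); /* child: stands in for the application's own worker */

    int status;
    pid_t ret;

    /*
     * Wait for this PID only.  wait(2) or waitpid(-1, ...) could instead
     * return an unrelated child, such as one started by a library.
     */
    do {
        ret = waitpid(pid, &status, 0);
    } while (ret < 0 && errno == EINTR);

    if (ret != pid || !WIFEXITED(status))
        return -1;

    return WEXITSTATUS(status);
}
```

       Applications that must use wait(2) can instead compare the returned PID against the set of children
       they started themselves and ignore any other value.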

CAVEATS

       librpmem relies on the library destructor being called from the main thread.  For this reason, all  func‐
       tions  that  might trigger destruction (e.g.  dlclose(3)) should be called in the main thread.  Otherwise
       some of the resources associated with that thread might not be cleaned up properly.

       librpmem registers a pool as a single memory region.  Due to a hardware bug, Chelsio T4 and T5
       hardware cannot handle a memory region greater than or equal to 8GB.  Therefore, the pool_size value
       passed to rpmem_create(3) and rpmem_open(3) on this hardware must be less than 8GB.

LIBRARY API VERSIONING

       This section describes how the library API is versioned, allowing applications to work with  an  evolving
       API.

       The  rpmem_check_version()  function is used to see if the installed librpmem supports the version of the
       library API required by an application.  The easiest way to do this is for the application to supply  the
       compile-time version information, supplied by defines in <librpmem.h>, like this:

              const char *reason = rpmem_check_version(RPMEM_MAJOR_VERSION,
                                                       RPMEM_MINOR_VERSION);
              if (reason != NULL) {
                  /* version check failed, reason string tells you why */
              }

       Any  mismatch  in the major version number is considered a failure, but a library with a newer minor ver‐
       sion number will pass this check since increasing minor versions imply backwards compatibility.

       An application can also check specifically for the existence of an interface by checking for the
       version where that interface was introduced.  These versions are documented in this man page as
       follows: unless otherwise specified, all interfaces described here are available in version 1.0 of
       the library.  Interfaces added after version 1.0 will contain the text "introduced in version x.y"
       in the section of this manual describing the feature.

       When  the version check performed by rpmem_check_version() is successful, the return value is NULL.  Oth‐
       erwise the return value is a static string describing the reason for  failing  the  version  check.   The
       string returned by rpmem_check_version() must not be modified or freed.
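
       The compatibility rule described in this section can be illustrated with a small stand-alone
       function.  This is a sketch of the rule only, not the library's implementation: the
       SKETCH_LIB_MAJOR and SKETCH_LIB_MINOR values, the sketch_check_version() name and the reason
       strings are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical version of the "installed" library in this sketch. */
#define SKETCH_LIB_MAJOR 1
#define SKETCH_LIB_MINOR 3

/*
 * Returns NULL when the required version is satisfied, otherwise a static
 * string describing the mismatch.  The strings are illustrative only.
 */
static const char *
sketch_check_version(unsigned major_required, unsigned minor_required)
{
    /* any difference in the major version is a failure */
    if (major_required != SKETCH_LIB_MAJOR)
        return "major version mismatch";

    /* a library with the same or a newer minor version passes */
    if (minor_required > SKETCH_LIB_MINOR)
        return "installed minor version is too old";

    return NULL;
}
```

       Under these assumptions, an application built against 1.0 or 1.3 passes the check, while one that
       requires 1.4 or 2.0 receives a non-NULL reason string.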

ENVIRONMENT

       librpmem can change its default behavior based on the following environment variables.  These are largely
       intended for testing and are not normally required.

       • RPMEM_SSH=ssh_client

       Setting this environment variable overrides the default ssh(1) client command name.

       • RPMEM_CMD=cmd

       Setting  this environment variable overrides the default command executed on the remote node using either
       ssh(1) or the alternative remote shell command specified by RPMEM_SSH.

       RPMEM_CMD can contain multiple commands separated by a vertical bar (|).  Each consecutive command
       is executed on the remote node in the order read from a pool set file.  This environment variable is
       read when the library is initialized, so RPMEM_CMD must be set prior to application launch (or prior
       to dlopen(3) if librpmem is being dynamically loaded).

       • RPMEM_ENABLE_SOCKETS=0|1

       Setting this variable to 1 enables the fi_sockets(7) provider for the in-band RDMA connection.  The
       sockets provider does not support IPv6.  IPv6 must be disabled system wide if RPMEM_ENABLE_SOCKETS
       is 1, the target is localhost (or any other loopback interface address), and the SSH_CONNECTION
       variable (see ssh(1) for more details) contains an IPv6 address after ssh to the loopback interface.
       By default the sockets provider is disabled.

       • RPMEM_ENABLE_VERBS=0|1

       Setting this variable to 0 disables the fi_verbs(7) provider for the in-band RDMA connection.  The
       verbs provider is enabled by default.

       • RPMEM_MAX_NLANES=num

       Limit the maximum number of lanes to num.  See LANES, in rpmem_create(3), for details.

       • RPMEM_WORK_QUEUE_SIZE=size

       Suggest the work queue size.  The effective work queue size can be greater than suggested if
       librpmem requires it, or smaller if the underlying hardware does not support the suggested size.
       The work queue size affects the performance of communication with the remote node.  rpmem_flush(3)
       operations can be added to the work queue up to the size of this queue.  When the work queue is
       full, any subsequent call has to wait until the work queue is drained.  rpmem_drain(3) and
       rpmem_persist(3), among other things, also drain the work queue.

DEBUGGING AND ERROR HANDLING

       If  an  error  is  detected during the call to a librpmem function, the application may retrieve an error
       message describing the reason for the failure from rpmem_errormsg().  This function returns a pointer  to
       a  static  buffer containing the last error message logged for the current thread.  If errno was set, the
       error message may include a description of the corresponding error code as returned by strerror(3).   The
       error  message  buffer is thread-local; errors encountered in one thread do not affect its value in other
       threads.  The buffer is never cleared by any library function; its content is significant only  when  the
       return value of the immediately preceding call to a librpmem function indicated an error, or if errno was
       set.   The application must not modify or free the error message string, but it may be modified by subse‐
       quent calls to other library functions.

       Two versions of librpmem are typically available on a development system.  The normal  version,  accessed
       when  a  program  is  linked  using the -lrpmem option, is optimized for performance.  That version skips
       checks that impact performance and never logs any trace information or performs any run-time assertions.

       A second version of librpmem, accessed when a program uses the libraries under /usr/lib/pmdk_debug,  con‐
       tains  run-time  assertions  and trace points.  The typical way to access the debug version is to set the
       environment variable LD_LIBRARY_PATH to /usr/lib/pmdk_debug  or  /usr/lib64/pmdk_debug,  as  appropriate.
       Debugging output is controlled using the following environment variables.  These variables have no effect
       on the non-debug version of the library.

              NOTE:  On Debian/Ubuntu systems, this extra debug version of the library is shipped in the respec‐
              tive -debug Debian package and placed in the /usr/lib/$ARCH/pmdk_dbg/ directory.

       • RPMEM_LOG_LEVEL

       The value of RPMEM_LOG_LEVEL enables trace points in the debug version of the library, as follows:

       • 0 - This is the default level when RPMEM_LOG_LEVEL is not set.  No log messages  are  emitted  at  this
         level.

       • 1  - Additional details on any errors detected are logged (in addition to returning the errno-based er‐
         rors as usual).  The same information may be retrieved using rpmem_errormsg().

       • 2 - A trace of basic operations is logged.

       • 3 - Enables a very verbose amount of function call tracing in the library.

       • 4 - Enables voluminous and fairly obscure tracing information that is likely only useful to the  librp‐
         mem developers.

       Unless RPMEM_LOG_FILE is set, debugging output is written to stderr.

       • RPMEM_LOG_FILE

       Specifies  the  name of a file where all logging information should be written.  If the last character in
       the name is “-”, the PID of the current process will be appended to the file name when the  log  file  is
       created.  If RPMEM_LOG_FILE is not set, logging output is written to stderr.
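
       The file naming rule for RPMEM_LOG_FILE can be illustrated as follows.  This is a sketch of the rule
       as described above, not the library's code; log_file_name() is a hypothetical helper and the path
       used in the note after the code is an arbitrary example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: build the effective log file name from the value of
 * RPMEM_LOG_FILE.  When the name ends with '-', the PID is appended.
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
static int
log_file_name(const char *name, pid_t pid, char *buf, size_t buflen)
{
    assert(name != NULL && buf != NULL);

    size_t len = strlen(name);
    int n;

    if (len > 0 && name[len - 1] == '-')
        n = snprintf(buf, buflen, "%s%ld", name, (long)pid);
    else
        n = snprintf(buf, buflen, "%s", name);

    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

       For a process with PID 1234, the value "/tmp/rpmem.log-" would give "/tmp/rpmem.log-1234", which
       keeps the logs of concurrently running processes apart.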

EXAMPLE

       The following example uses librpmem to create a remote pool on the given target node, identified by
       the given pool set name.  The associated local memory pool is zeroed and the data is made persistent
       on the remote node.  Upon success the remote pool is closed.

              #include <assert.h>
              #include <unistd.h>
              #include <stdio.h>
              #include <stdlib.h>
              #include <string.h>

              #include <librpmem.h>

              #define POOL_SIGNATURE  "MANPAGE"
              #define POOL_SIZE   (32 * 1024 * 1024)
              #define NLANES      4

              #define DATA_OFF    4096
              #define DATA_SIZE   (POOL_SIZE - DATA_OFF)

              static void
              parse_args(int argc, char *argv[], const char **target, const char **poolset)
              {
                  if (argc < 3) {
                      fprintf(stderr, "usage:\t%s <target> <poolset>\n", argv[0]);
                      exit(1);
                  }

                  *target = argv[1];
                  *poolset = argv[2];
              }

              static void *
              alloc_memory(void)
              {
                  long pagesize = sysconf(_SC_PAGESIZE);
                  if (pagesize < 0) {
                      perror("sysconf");
                      exit(1);
                  }

                  /* allocate a page size aligned local memory pool */
                  void *mem;
                  int ret = posix_memalign(&mem, pagesize, POOL_SIZE);
                  if (ret) {
                      fprintf(stderr, "posix_memalign: %s\n", strerror(ret));
                      exit(1);
                  }

                  assert(mem != NULL);

                  return mem;
              }

              int
              main(int argc, char *argv[])
              {
                  const char *target, *poolset;
                  parse_args(argc, argv, &target, &poolset);

                  unsigned nlanes = NLANES;
                  void *pool = alloc_memory();
                  int ret;

                  /* fill pool_attributes */
                  struct rpmem_pool_attr pool_attr;
                  memset(&pool_attr, 0, sizeof(pool_attr));
                  strncpy(pool_attr.signature, POOL_SIGNATURE, RPMEM_POOL_HDR_SIG_LEN);

                  /* create a remote pool */
                  RPMEMpool *rpp = rpmem_create(target, poolset, pool, POOL_SIZE,
                          &nlanes, &pool_attr);
                  if (!rpp) {
                      fprintf(stderr, "rpmem_create: %s\n", rpmem_errormsg());
                      return 1;
                  }

                  /* store data on local pool */
                  memset(pool, 0, POOL_SIZE);

                  /* make local data persistent on remote node */
                  ret = rpmem_persist(rpp, DATA_OFF, DATA_SIZE, 0, 0);
                  if (ret) {
                      fprintf(stderr, "rpmem_persist: %s\n", rpmem_errormsg());
                      return 1;
                  }

                  /* close the remote pool */
                  ret = rpmem_close(rpp);
                  if (ret) {
                      fprintf(stderr, "rpmem_close: %s\n", rpmem_errormsg());
                      return 1;
                  }

                  free(pool);

                  return 0;
              }

NOTE

       The librpmem API is experimental and may be subject to change in the future.  However, using  the  remote
       replication in libpmemobj(7) is safe and backward compatibility will be preserved.

ACKNOWLEDGEMENTS

       librpmem  builds on the persistent memory programming model recommended by the SNIA NVM Programming Tech‐
       nical Work Group: <https://snia.org/nvmp>
